Hello everybody, I am attempting to perform ASR using Vicuna-7B and Whisper Large V3 on a system with two NVIDIA GPUs (24GB each) connected via NVLink. However, I consistently encounter CUDA out of memory (OOM) errors, and only one GPU is utilized during execution, despite specifying multiple GPUs.
So far, I've tried:
Reducing Model Size: I tried a smaller Whisper checkpoint (whisper base) along with lower-precision settings.
Mixed Precision: Enabled mixed_precision=true, but lowering precision further raised errors.
FSDP & DeepSpeed: Enabled both enable_fsdp=true and enable_deepspeed=true to optimize memory usage.
Multi-GPU Configuration: Set CUDA_VISIBLE_DEVICES=0,1, but Vicuna and Whisper both load onto the same GPU, leaving the second GPU idle (I check per-GPU memory with the short snippet after this list).
Gradient Accumulation & Batch Size Adjustments: Lowered batch size (val_batch_size=1) and accumulation steps (gradient_accumulation_steps=1) without success.
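For reference, this is roughly how I check how memory is spread across the two GPUs while a run is going. It is only a minimal sketch; the way the repo actually launches its processes may differ:

```python
import torch

# Report per-GPU memory so I can see where Vicuna and Whisper ended up.
# With CUDA_VISIBLE_DEVICES=0,1 two devices are visible, but in my runs
# only cuda:0 ever shows significant allocation; cuda:1 stays near zero.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    print(f"cuda:{i} ({torch.cuda.get_device_name(i)}): "
          f"{allocated:.1f} GiB allocated, {reserved:.1f} GiB reserved")
```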
So the issue is: I consistently hit CUDA OOM, and only one GPU is used while the second remains idle. I would like to load Vicuna-7B on one GPU and Whisper Large V3 on the other to distribute memory usage, if possible. Is there a way to ensure that DeepSpeed or FSDP correctly distributes the models across both GPUs?
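To make the ask concrete, this is the kind of manual placement I have in mind. It is only a sketch assuming the models are loaded through Hugging Face transformers; the checkpoint names and loading classes are my own illustration, not necessarily how this repo wires things up internally:

```python
import torch
from transformers import AutoModelForCausalLM, WhisperForConditionalGeneration

# Illustrative placement: speech encoder on GPU 0, LLM on GPU 1.
# Checkpoint IDs below are assumptions for the sake of the example.
whisper = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=torch.float16
).to("cuda:0")

vicuna = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5", torch_dtype=torch.float16
).to("cuda:1")

# During decoding, the encoder output would then need to be moved across
# devices before going through the projector into Vicuna, e.g.
# feats = whisper.model.encoder(input_features.to("cuda:0")).last_hidden_state.to("cuda:1")
```

If something along these lines (or an equivalent device_map / DeepSpeed setting) is already supported through the config, a pointer to the relevant option would be great.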
I've attached my configuration file (decode_whisper_large_linear_vicuna_7b.txt).
Thank you in advance.