[ASR] - CUDA OOM with Vicuna-7B & Whisper Large V3 #214

Open

ritaaadr opened this issue Mar 10, 2025 · 1 comment

Comments

@ritaaadr

decode_whisper_large_linear_vicuna_7b.txt

Hello everybody, I am attempting to run ASR with Vicuna-7B and Whisper Large V3 on a system with two NVIDIA GPUs (24 GB each) connected via NVLink. However, I consistently hit CUDA out-of-memory (OOM) errors, and only one GPU is utilized during execution, even though I specify multiple GPUs.

So far, I've tried:

  • Reducing Model Size: I tried a smaller Whisper checkpoint (whisper base) together with lower-precision settings.
  • Mixed Precision: Enabled mixed_precision=true, but lowering precision further raised errors.
  • FSDP & DeepSpeed: Enabled both enable_fsdp=true and enable_deepspeed=true to optimize memory usage.
  • Multi-GPU Configuration: Set CUDA_VISIBLE_DEVICES=0,1, but Vicuna and Whisper both load onto the same GPU, leaving the second GPU idle (see the diagnostic snippet after this list).
  • Gradient Accumulation & Batch Size Adjustments: Lowered batch size (val_batch_size=1) and accumulation steps (gradient_accumulation_steps=1) without success.
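
To check what the process actually sees, I use a short generic PyTorch snippet (not part of this repo's code):

```python
import torch

# If CUDA_VISIBLE_DEVICES=0,1 took effect, device_count() reports 2;
# if it reports 1, the variable was set too late or overridden elsewhere.
print("visible devices:", torch.cuda.device_count())

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    used_gb = (total - free) / 1e9
    print(f"cuda:{i} ({torch.cuda.get_device_name(i)}): "
          f"{used_gb:.1f} GB used of {total / 1e9:.1f} GB")
```

This confirms that both devices are visible to the process, but all allocations land on cuda:0.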

So the issue is: I constantly hit CUDA OOM while only one GPU is used and the second sits idle. I would like to load Vicuna-7B on one GPU and Whisper Large V3 on the other to distribute memory usage effectively, if possible. Is there a way to make DeepSpeed or FSDP distribute the models across both GPUs correctly?
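
For reference, here is a minimal sketch of the split I have in mind, written with plain Hugging Face transformers rather than this repo's loading code (the model IDs lmsys/vicuna-7b-v1.5 and openai/whisper-large-v3 are my assumptions, and the projector wiring between encoder and LLM is omitted):

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

# Speech encoder on the first GPU; Whisper Large V3 in fp16 is roughly 3 GB.
whisper = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=torch.float16
).to("cuda:0")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")

# LLM on the second GPU; Vicuna-7B in fp16 is roughly 14 GB of weights,
# which fits on a 24 GB card with headroom for activations and the KV cache.
vicuna = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5", torch_dtype=torch.float16
).to("cuda:1")
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# Whatever features the encoder (or a projector) produces would then need
# an explicit .to("cuda:1") before being passed into the LLM.
```

Alternatively, if accelerate is installed, passing device_map="auto" to from_pretrained would let transformers shard each model across both visible GPUs automatically.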

I've attached my configuration file.

Thank you in advance!

@ddlBoJack
Collaborator

Sorry, we do not support multi-GPU inference yet ...
