Whisper V3 Large Turbo – Words/Sec Capped at ~284? Bottleneck or Parallelism Limit? #2593
Closed
AryanSakhala
started this conversation in
Show and tell
Replies: 0 comments
While benchmarking openai/whisper-large-v3-turbo, I noticed some strange behaviour.
Regardless of the request concurrency or the sampling rate of the audio, throughput stayed constant at roughly 284 words/sec.
Am I missing something?
My setup:
- The model is deployed with vLLM.
- nginx is used as the load balancer in front of it.
The architecture uses only 4 decoder layers (compared to 32 in Whisper Large), so I expected throughput to scale with concurrency, but it seems capped.
Is this a bottleneck somewhere in my setup, or a parallelism limit of the model itself?
Would love to hear from others who’ve tried pushing this model to its limits.
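
For anyone who wants to compare numbers, here is a minimal sketch of how aggregate words/sec can be measured at different concurrency levels. It is not the exact script behind the figure above. `transcribe` is a hypothetical callable that you supply yourself (for example, a function that sends one audio file to your endpoint and returns the transcript text); `words_per_second` and `sweep` are illustrative names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def words_per_second(
    transcribe: Callable[[str], str],  # hypothetical: audio path -> transcript text
    audio_paths: Sequence[str],
    concurrency: int,
) -> float:
    """Aggregate throughput: total transcribed words / wall-clock seconds."""
    start = time.perf_counter()
    # Each worker thread issues one request at a time, so `concurrency`
    # is the maximum number of requests in flight.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        transcripts = list(pool.map(transcribe, audio_paths))
    elapsed = time.perf_counter() - start
    total_words = sum(len(text.split()) for text in transcripts)
    return total_words / elapsed


def sweep(
    transcribe: Callable[[str], str],
    audio_paths: Sequence[str],
    levels: Sequence[int] = (1, 2, 4, 8),
) -> Dict[int, float]:
    """Measure words/sec once per concurrency level."""
    return {c: words_per_second(transcribe, audio_paths, c) for c in levels}
```

If the values returned by `sweep` barely change as concurrency grows, that matches the flat behaviour described above. The sketch only shows that throughput is capped; it does not show which component is responsible.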