
Added onEmitChunk callback to extract audio before onSpeechEnd (live audio) #122 #191


Open
wants to merge 1 commit into master

Conversation

gencerege

I was trying to implement whisper_streaming via a websocket. For the implementation I needed to access the frames before onSpeechEnd triggered. I first implemented it outside the package using the onFrameProcessed callback, then implemented it as an onEmitChunk callback after seeing issues #186, #122, and #68.

I have added a callback function, onEmitChunk, that returns an audio segment of length numFramesToEmit * frameSamples.
When speech end is detected, it returns all the frames accumulated since the last call to onEmitChunk.

After this implementation, I was able to do live transcription solely using this callback, which I believe is a nice simplification.
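The chunking behavior described above can be sketched roughly as follows. This is a hypothetical TypeScript illustration, not the PR's actual code; `ChunkEmitter` and its methods are invented names, while `numFramesToEmit` and `onEmitChunk` mirror the option names the PR introduces:

```typescript
type EmitChunkCallback = (audio: Float32Array) => void

// Illustrative sketch: accumulate frames while speech is active, emit a
// chunk every numFramesToEmit frames, and flush the remainder at speech end.
class ChunkEmitter {
  private buffer: Float32Array[] = []

  constructor(
    private numFramesToEmit: number,
    private onEmitChunk: EmitChunkCallback
  ) {}

  // Called once per processed frame while speech is active.
  processFrame(frame: Float32Array): void {
    this.buffer.push(frame)
    if (this.buffer.length >= this.numFramesToEmit) {
      this.flush()
    }
  }

  // Called when speech end is detected: emits whatever has accumulated
  // since the last onEmitChunk call.
  endSpeech(): void {
    if (this.buffer.length > 0) this.flush()
  }

  private flush(): void {
    const total = this.buffer.reduce((n, f) => n + f.length, 0)
    const audio = new Float32Array(total)
    let offset = 0
    for (const f of this.buffer) {
      audio.set(f, offset)
      offset += f.length
    }
    this.buffer = []
    this.onEmitChunk(audio)
  }
}
```

With numFramesToEmit = 2, three frames of speech followed by a speech end would yield one full chunk of two frames and one final chunk of one frame.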

I have also made a small modification to the algorithm by adding endSpeechPadFrames. Mainly, this allows flexibility in the ending region of the audio segment. As an example, one may want to wait 0.5s before ending a speech segment, but a padding of 0.5s can be overkill, and a smaller pad such as 0.2s may be preferable.
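For reference, converting those durations into frame counts depends on the sample rate and frame size. A rough helper, assuming 16 kHz audio and frameSamples = 1536 (a common default for this library, though it varies by model version):

```typescript
// Convert a desired pad duration (seconds) into a frame count.
// sampleRate and frameSamples are assumptions; adjust to your config.
function secondsToFrames(
  seconds: number,
  sampleRate: number,
  frameSamples: number
): number {
  return Math.round((seconds * sampleRate) / frameSamples)
}

// secondsToFrames(0.5, 16000, 1536) → 5 frames of redemption
// secondsToFrames(0.2, 16000, 1536) → 2 frames of end padding
```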

Using endSpeechPadFrames and redemptionFrames together, I changed how the audio buffer resets after speech end is detected. If there are frames that fall between endSpeechPadFrames and redemptionFrames, they are kept in the buffer to be used as preSpeechPadFrames in case speech starts right away.
Ideally preSpeechPadFrames + endSpeechPadFrames >= redemptionFrames, so that we can always pad with the desired preSpeechPadFrames. However, even if this is not the case, as long as endSpeechPadFrames < redemptionFrames there is extra padding compared to simply resetting the audio buffer.
This allows for better segmentation when speech starts right after speech end is raised, by removing or reducing the period where speech cannot be prepended and reducing the chance of the starting syllables of buffer t leaking into buffer t - 1. This is not a problem if consecutive buffers are appended before processing; however, if they are processed separately, it is not ideal.
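The retention logic described above can be sketched as follows. This is an illustrative TypeScript sketch, not the PR's actual implementation; `splitAtSpeechEnd` is an invented name, while `redemptionFrames` and `endSpeechPadFrames` mirror the option names:

```typescript
// At speech end, the emitted segment keeps endSpeechPadFrames of trailing
// silence; the remaining (redemptionFrames - endSpeechPadFrames) frames
// are retained in the buffer as potential pre-speech padding for the
// next segment, instead of being discarded.
function splitAtSpeechEnd(
  frames: Float32Array[],
  redemptionFrames: number,
  endSpeechPadFrames: number
): { emitted: Float32Array[]; retained: Float32Array[] } {
  const keep = Math.max(0, redemptionFrames - endSpeechPadFrames)
  const boundary = Math.max(0, frames.length - keep)
  return {
    emitted: frames.slice(0, boundary),
    retained: frames.slice(boundary),
  }
}
```

For example, with a 10-frame buffer, redemptionFrames = 5, and endSpeechPadFrames = 2, the emitted segment contains 7 frames and 3 frames are retained for the next segment's pre-speech padding.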

I would appreciate any feedback on things you might want to change. Please let me know!

  • [x] Verified that changes work on the test site, adding changes to the test site if necessary to try out your changes
  • [x] Updated relevant changelogs
  • [x] Ran npm run format


vercel bot commented Feb 22, 2025

The latest updates on your projects.

vad_test_site: ✅ Ready (updated Feb 22, 2025 6:54pm UTC)

@altyni86

This is cool!

@ricky0123
Owner

Hey, thanks for the PR, I really appreciate it. I'm going to play around with it and get back to you. Thanks!

frames.length -
(this.options.redemptionFrames - this.options.endSpeechPadFrames)
)
const audio = concatArrays(audioBufferPad)
handleEvent({ msg: Message.SpeechEnd, audio })


Shouldn't SpeechEnd be emitted after EmitChunk?

@yourenyouyu

When will the code be merged?

@niron1

niron1 commented May 25, 2025

I'm not sure how to use this commit. All the examples are based on vad-web, not vad, and vad-web does not expose the newly added onEmitChunk.

6 participants