Replies: 6 comments 8 replies
-
Have you tested the latest Git version?
I'm not sure this is the root of your problem, although I could imagine it giving "exactly" 56 characters for languages like Chinese or Japanese, which don't use spaces to separate words and therefore give Whisper no simple way to avoid wrapping a line in the middle of a word. In most languages that use spaces, lines will rarely be exactly 56 characters: if a word doesn't entirely fit at the end of the current line, it is moved to the next line, with spaces dictating the break points.
But I suspect your issue isn't really that lines are "exactly" 56 characters, but rather that Whisper has no idea of the natural points at which to split a line when sentences are either too short or too long. For example, you would want it to at least always split after each sentence. The problem here is again that Whisper doesn't have a reliable way to detect sentence endings, because it doesn't give you POS tags or anything like that. It can't just split the line after a ".", because then a line like "I saw Mr. Jones the other day." would be split after "Mr." So instead it uses a heuristic where it splits after a significant pause (a few seconds). Since a pause is more likely after "day." than after "Mr.", this gives a rough prediction of where to split without knowing anything about the POS tags.
If you need something more sophisticated than that, you are actually doing the right thing by taking it into your own hands and implementing your own splitting logic. In your own application you would also be able to include other dependencies such as spaCy, which would let you do POS and deprel tagging to develop a better heuristic for when to split the lines. spaCy is probably too heavyweight a dependency to include in core Whisper, but the API is extensible, so you could implement your own subtitle writer that uses tags from spaCy to choose smarter split points.
One other thing you can do is edit the current pause threshold and make it shorter. Note, though, that if you make it too short, it will split too often.
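
To make the heuristic described above concrete, here is a minimal sketch of the idea in plain Python. It is not Whisper's actual implementation: the `Word` class, the `split_lines` function, and the `pause_threshold` parameter are hypothetical names for illustration, and the 3.0 second default is only an assumption standing in for "a few seconds". It breaks a line when the next word would exceed the maximum width, or when there is a long pause before the next word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word with timestamps (hypothetical structure, for illustration only)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def split_lines(words, max_line_width=56, pause_threshold=3.0):
    """Group words into lines no wider than max_line_width.

    A new line starts when adding the next word would exceed the width,
    or when the silence before the next word is longer than pause_threshold.
    Both defaults are assumptions for this sketch, not Whisper's values.
    """
    lines = []
    current = []      # words on the line being built
    current_len = 0   # character length of that line, including spaces
    last_end = None   # end time of the previous word

    for w in words:
        token = w.text.strip()
        long_pause = last_end is not None and (w.start - last_end) > pause_threshold
        extra = len(token) + (1 if current else 0)  # +1 for the joining space
        too_wide = bool(current) and (current_len + extra) > max_line_width

        if current and (long_pause or too_wide):
            lines.append(" ".join(current))
            current, current_len = [], 0
            extra = len(token)  # first word on a line needs no leading space

        current.append(token)
        current_len += extra
        last_end = w.end

    if current:
        lines.append(" ".join(current))
    return lines


if __name__ == "__main__":
    demo = [
        Word("I", 0.0, 0.2), Word("saw", 0.2, 0.5), Word("Mr.", 0.5, 0.8),
        Word("Jones", 0.8, 1.2), Word("the", 1.2, 1.4), Word("other", 1.4, 1.7),
        Word("day.", 1.7, 2.1),
        # Long pause before the next sentence, so the line breaks here.
        Word("He", 6.0, 6.2), Word("looked", 6.2, 6.6), Word("well.", 6.6, 7.0),
    ]
    for line in split_lines(demo):
        print(line)
```

Because the width is only an upper bound here, short sentences followed by a pause stay short, and "Mr." never triggers a break on its own. Lowering `pause_threshold` makes the sketch split more often, which mirrors the trade-off mentioned above.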
-
I have a problem where the option doesn't work at all. It's just an unstable feature.
-
… On Mon, Jun 9, 2025, 17:31 ryanheise ***@***.***> wrote:
@wenchaoliu-93 <https://github.com/wenchaoliu-93> , can you share an
example output SRT or VTT file along with the command line options used to
produce it?
-
Well, YouTube does it way better from personal experience.
…On Mon, Jun 9, 2025 at 9:21 PM ryanheise ***@***.***> wrote:
Forced alignment is actually what Whisper does internally (when the
--word_timestamps option is invoked).
-
It's called "auto-sync": Add subtitles & captions - YouTube Help
<https://support.google.com/youtube/answer/2734796?hl=en#zippy=%2Cauto-sync>
-
I indeed have a similar problem, now that I have checked the actual output file. The line width varies, but the vast majority of lines are close to the max setting. I now wonder what YouTube uses for forced alignment, as it is so much better in my experience.
-
I realize that using --max_line_width doesn't seem to behave as expected (?)
When I add --max_line_width 56, all the sentences are exactly 56 characters.
It should be the maximum value allowed, but if the sentence happens to be shorter in context, it shouldn't be forced out to the max (same concept as width and max-width in CSS).
As a result, when I add --max_line_width, it often adds the next sentence as well, which really doesn't make sense for subtitles.
In the end, I cannot use --max_line_width at all. I don't use that setting, and instead manually truncate the ~10% of sentences that are really too long.