Bulk repetition #471

Closed

@garthk

Description

I'm not sure if this is a variant of #412, but check out this partial output:

[00:25:16.880 --> 00:25:20.240]   And you're like, this character needs some like thigh highs and like, it should have
[00:25:20.240 --> 00:25:21.240]   been a bit of a dresser.
[00:25:21.240 --> 00:25:22.240]   It should have been a dresser.
[00:25:22.240 --> 00:25:23.240]   It should have been a dresser.
[00:25:23.240 --> 00:25:24.240]   It should have been a dresser.
[00:25:24.240 --> 00:25:25.240]   It should have been a dresser.
[3333 additional repetitions elided]
[01:21:40.240 --> 01:21:41.240]   It should have been a dresser.
[01:21:41.240 --> 01:21:42.240]   It should have been a dresser.
[01:21:42.240 --> 01:21:43.240]   It should have been a dresser.
[01:21:43.240 --> 01:21:44.240]   It should have been a dresser.
[01:21:44.240 --> 01:21:45.240]   It should have been a dresser.
[01:21:45.240 --> 01:21:51.240]   Whether it's true or not is first and foremost a bluff to stop you from doing the right thing.
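Until the looping is addressed upstream, one workaround is to post-process the transcript and collapse consecutive segments whose text is identical. A minimal sketch in Python, assuming the plain-text segment format shown above (the sample lines are taken from this output):

```python
import itertools

def collapse_repeats(lines):
    """Keep only the first of each run of segments with identical text.

    Assumes whisper.cpp's plain-text output format:
    [hh:mm:ss.mmm --> hh:mm:ss.mmm]   text
    """
    def text(line):
        # Drop the timestamp bracket; compare on the spoken text only.
        return line.split("]", 1)[-1].strip()
    return [next(group) for _, group in itertools.groupby(lines, key=text)]

segments = [
    "[00:25:21.240 --> 00:25:22.240]   It should have been a dresser.",
    "[00:25:22.240 --> 00:25:23.240]   It should have been a dresser.",
    "[00:25:23.240 --> 00:25:24.240]   It should have been a dresser.",
]
print(collapse_repeats(segments))
```

Note this only hides the symptom in the text output; the decode time wasted on the repeated segments (see the timings below) is unaffected.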

Reproduction:

./models/download-ggml-model.sh base.en
make
curl -o episode.mp3 -L https://mcdn.podbean.com/mf/web/5ein65/07-31-Clear-Present-free.mp3
ffmpeg -i episode.mp3 -ar 16000 -ac 1 -c:a pcm_s16le episode.wav
./main -f episode.wav
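Since whisper.cpp's `main` expects 16 kHz, mono, 16-bit PCM WAV input, it is worth verifying the converted file before blaming the decoder. A quick header check using Python's stdlib `wave` module; the demo writes a tiny conforming file standing in for the real `episode.wav`:

```python
import wave

def is_whisper_ready(path):
    # whisper.cpp expects 16 kHz, mono, 16-bit PCM.
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getnchannels() == 1
                and w.getsampwidth() == 2)

# Demo: generate a minimal conforming WAV, then check it.
with wave.open("episode.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 160)  # 10 ms of silence

print(is_whisper_ready("episode.wav"))  # True
```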

Standard error:

whisper_init_from_file: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  215.00 MB (+    6.00 MB per decoder)
whisper_model_load: kv self size  =    5.25 MB
whisper_model_load: kv cross size =   17.58 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: processing 'episode.wav' (94221793 samples, 5888.9 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


whisper_print_timings:     fallbacks =   3 p /   9 h
whisper_print_timings:     load time =   120.07 ms
whisper_print_timings:      mel time =  8174.57 ms
whisper_print_timings:   sample time = 21253.98 ms / 46180 runs (    0.46 ms per run)
whisper_print_timings:   encode time = 84284.79 ms /   246 runs (  342.62 ms per run)
whisper_print_timings:   decode time = 139710.86 ms / 46321 runs (    3.02 ms per run)
whisper_print_timings:    total time = 253756.25 ms

I'm on the main branch at v1.2.0.

Labels: decoding (Decoding related issues), enhancement (New feature or request)