Bulk repetition

I'm not sure if this is a variant of #412, but check out this partial output:

```plain
[00:25:16.880 --> 00:25:20.240]   And you're like, this character needs some like thigh highs and like, it should have
[00:25:20.240 --> 00:25:21.240]   been a bit of a dresser.
[00:25:21.240 --> 00:25:22.240]   It should have been a dresser.
[00:25:22.240 --> 00:25:23.240]   It should have been a dresser.
[00:25:23.240 --> 00:25:24.240]   It should have been a dresser.
[00:25:24.240 --> 00:25:25.240]   It should have been a dresser.
[3333 additional repetitions elided]
[01:21:40.240 --> 01:21:41.240]   It should have been a dresser.
[01:21:41.240 --> 01:21:42.240]   It should have been a dresser.
[01:21:42.240 --> 01:21:43.240]   It should have been a dresser.
[01:21:43.240 --> 01:21:44.240]   It should have been a dresser.
[01:21:44.240 --> 01:21:45.240]   It should have been a dresser.
[01:21:45.240 --> 01:21:51.240]   Whether it's true or not is first and foremost a bluff to stop you from doing the right thing.
```
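Each segment carries its own timestamp, so a plain `uniq` will not collapse a loop like the one above. One way to measure such runs is to strip the timestamps first. This is a minimal POSIX shell sketch. It assumes the transcript was saved from `main`'s stdout to a file. The file name `transcript.txt` and the helper `longest_runs` are hypothetical.

```bash
#!/bin/sh
# Sketch: report the longest runs of identical consecutive segments.
# Assumption: transcript.txt (hypothetical name) holds main's stdout,
# e.g. saved with: ./main -f episode.wav > transcript.txt

longest_runs() {
  # 1. Drop the "[hh:mm:ss.mmm --> hh:mm:ss.mmm]" prefix and the spaces after it,
  #    so repeated text compares equal even though timestamps differ.
  # 2. uniq -c collapses each run of identical consecutive lines into "count text".
  # 3. Sort by count, descending, and keep the top N runs (default 5).
  sed -E 's/^\[[^]]*][[:space:]]*//' "$1" | uniq -c | sort -rn | head -n "${2:-5}"
}

# Small sample copied from the partial output above.
cat > transcript.txt <<'EOF'
[00:25:20.240 --> 00:25:21.240]   been a bit of a dresser.
[00:25:21.240 --> 00:25:22.240]   It should have been a dresser.
[00:25:22.240 --> 00:25:23.240]   It should have been a dresser.
[00:25:23.240 --> 00:25:24.240]   It should have been a dresser.
[00:25:24.240 --> 00:25:25.240]   It should have been a dresser.
EOF

longest_runs transcript.txt
```

On the full transcript, the top entry's count is the length of the repetition loop.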

Reproduction:

```bash
./models/download-ggml-model.sh base.en
make
curl -o episode.mp3 -L https://mcdn.podbean.com/mf/web/5ein65/07-31-Clear-Present-free.mp3
ffmpeg -i episode.mp3 -ar 16000 -ac 1 -c:a pcm_s16le episode.wav
./main -f episode.wav
```
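`main` only accepts 16 kHz WAV input, so it can help to confirm what the conversion step actually produced. You can do this by reading the WAV header fields directly. This is a minimal sketch with two assumptions. First, the header is canonical, with the `fmt ` chunk starting at byte 12, which is typical of ffmpeg's `pcm_s16le` output. Second, the host is little-endian, because `od` prints multi-byte values in host byte order. The helper name `wav_format` is hypothetical.

```bash
#!/bin/sh
# Sketch: print sample rate, channel count and bit depth from a WAV header.
# Assumes the "fmt " chunk begins at byte 12 (canonical layout), so:
#   bytes 22-23 = channels, 24-27 = sample rate, 34-35 = bits per sample.
# Assumes a little-endian host, since od decodes multi-byte values natively.
wav_format() {
  channels=$(od -An -t u2 -j 22 -N 2 "$1" | tr -d ' ')
  rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d ' ')
  bits=$(od -An -t u2 -j 34 -N 2 "$1" | tr -d ' ')
  echo "$rate Hz, $channels channel(s), $bits-bit"
}
```

For example, `wav_format episode.wav` should print `16000 Hz, 1 channel(s), 16-bit` if the file is in the format `main` expects.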

Standard error:

```plain
whisper_init_from_file: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  215.00 MB (+    6.00 MB per decoder)
whisper_model_load: kv self size  =    5.25 MB
whisper_model_load: kv cross size =   17.58 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: processing 'episode.wav' (94221793 samples, 5888.9 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


whisper_print_timings:     fallbacks =   3 p /   9 h
whisper_print_timings:     load time =   120.07 ms
whisper_print_timings:      mel time =  8174.57 ms
whisper_print_timings:   sample time = 21253.98 ms / 46180 runs (    0.46 ms per run)
whisper_print_timings:   encode time = 84284.79 ms /   246 runs (  342.62 ms per run)
whisper_print_timings:   decode time = 139710.86 ms / 46321 runs (    3.02 ms per run)
whisper_print_timings:    total time = 253756.25 ms
```
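The figures in this log are consistent with 16 kHz input. Dividing the sample count by 16,000 gives the reported duration. The per-run timings are the totals divided by the run counts. The quick awk check below uses numbers copied from the stderr output above. The function name `log_figures` is only for illustration.

```bash
#!/bin/sh
# Recompute derived values from the whisper_print_timings / main: lines above.
log_figures() {
  awk 'BEGIN {
    samples = 94221793; rate = 16000    # main expects 16 kHz input
    printf "duration: %.1f sec\n", samples / rate
    printf "sample:   %.2f ms per run\n", 21253.98 / 46180
    printf "encode:   %.2f ms per run\n", 84284.79 / 246
    printf "decode:   %.2f ms per run\n", 139710.86 / 46321
  }'
}

log_figures
```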

I'm on the main branch at v1.2.0.

Bulk repetition #471