WhisperTextStreamer token_ids must be a non-empty array of integers #1273

Closed
@SpeedyGonzaless

System Info

@huggingface/transformers 3.4.2

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

I am using AutomaticSpeechRecognitionPipeline (automatic-speech-recognition). When I try to create a new WhisperTextStreamer using the tokenizer from this pipeline, I get the error:
"token_ids must be a non-empty array of integers"

This problem did not occur in versions before 3.4.0.
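For readers unfamiliar with the message, here is a minimal sketch of the kind of input check that would produce it. The helper names `isValidTokenIds` and `assertValidTokenIds` are hypothetical and this is not the library's actual code; it only illustrates which inputs (an empty array, a missing value, non-integer or BigInt entries) would fail a "non-empty array of integers" condition.

```javascript
// Hypothetical helpers (not taken from @huggingface/transformers): they mirror
// the condition named in the error message so the failing inputs are easy to see.
function isValidTokenIds(token_ids) {
  return (
    Array.isArray(token_ids) &&
    token_ids.length > 0 &&
    token_ids.every((id) => Number.isInteger(id))
  );
}

function assertValidTokenIds(token_ids) {
  if (!isValidTokenIds(token_ids)) {
    throw new Error("token_ids must be a non-empty array of integers");
  }
  return token_ids;
}

// Inputs that would pass or fail such a check:
console.log(isValidTokenIds([50258, 50259])); // true
console.log(isValidTokenIds([]));             // false (empty array)
console.log(isValidTokenIds(undefined));      // false (not an array)
console.log(isValidTokenIds([1.5]));          // false (not an integer)
console.log(isValidTokenIds([10n]));          // false (BigInt, not a Number integer)
```

If a check like this is what fires, logging whatever value reaches the tokenizer's decode step just before the error would show which of these cases applies.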

Reproduction

Define pipeline:

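import { pipeline, WhisperTextStreamer } from '@huggingface/transformers';

// pipeline() returns a Promise, so it must be awaited before reading transcriber.tokenizer
const transcriber = await pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
    dtype: {
        encoder_model:
            this.model === "onnx-community/whisper-large-v3-turbo"
                ? "fp16"
                : "fp32",
        decoder_model_merged: 'q4',
    },
    device: 'webgpu',
    progress_callback,
});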

Then try to create the WhisperTextStreamer:

const streamer = new WhisperTextStreamer(transcriber.tokenizer, {
        time_precision,
        on_chunk_start: (x) => {
            const offset = (chunk_length_s - stride_length_s) * chunk_count;
            chunks.push({
                text: "",
                timestamp: [offset + x, null],
                finalised: false,
                offset,
            });
        },
        token_callback_function: () => {
            start_time = start_time || performance.now();
            if (num_tokens++ > 0) {
                tps = (num_tokens / (performance.now() - start_time)) * 1000;
            }
        },
        callback_function: (x) => {
            if (chunks.length === 0) return;
            chunks.at(-1).text += x;
            console.log('chunk', chunks.at(-1).text);
            chrome.runtime.sendMessage({
                status: 'update',
                data: {chunks, tps},
            });
        },
        on_chunk_end: (x) => {
            const current = chunks.at(-1);
            current.timestamp[1] = x + current.offset;
            current.finalised = true;
        },
        on_finalize: () => {
            start_time = null;
            num_tokens = 0;
            chunk_count++;
        },
    });
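The callbacks above rely on several variables declared outside the snippet (`chunks`, `chunk_count`, `chunk_length_s`, `stride_length_s`, `num_tokens`, `start_time`, `tps`). As a rough, library-free sketch of that bookkeeping, the same logic can be isolated as below. The names `createChunkTracker` and `tokensPerSecond` are hypothetical, and the values 30 and 5 used for `chunk_length_s` and `stride_length_s` are assumptions chosen for illustration, not values from the report.

```javascript
// Sketch only: reproduces the chunk bookkeeping from the callbacks above
// without the streamer, so it can be run and inspected on its own.
function createChunkTracker(chunk_length_s, stride_length_s) {
  const chunks = [];
  let chunk_count = 0;
  return {
    chunks,
    // Mirrors on_chunk_start: open a new chunk at the window offset plus x.
    onChunkStart(x) {
      const offset = (chunk_length_s - stride_length_s) * chunk_count;
      chunks.push({ text: "", timestamp: [offset + x, null], finalised: false, offset });
    },
    // Mirrors callback_function: append decoded text to the latest chunk.
    onText(text) {
      if (chunks.length === 0) return;
      chunks.at(-1).text += text;
    },
    // Mirrors on_chunk_end: close the latest chunk (guarded against an empty list).
    onChunkEnd(x) {
      const current = chunks.at(-1);
      if (!current) return;
      current.timestamp[1] = x + current.offset;
      current.finalised = true;
    },
    // Mirrors on_finalize: advance to the next audio window.
    onFinalize() {
      chunk_count++;
    },
  };
}

// Mirrors the tps formula: tokens divided by elapsed milliseconds, scaled to seconds.
function tokensPerSecond(num_tokens, elapsed_ms) {
  return (num_tokens / elapsed_ms) * 1000;
}

// Example with assumed values: 30 s windows, 5 s stride.
const tracker = createChunkTracker(30, 5);
tracker.onChunkStart(0);
tracker.onText("hello");
tracker.onChunkEnd(2.5);
tracker.onFinalize();
tracker.onChunkStart(1); // second window: offset is (30 - 5) * 1 = 25, start is 26
```

Separating the bookkeeping this way makes it possible to check that the callbacks themselves behave as intended, independently of the streamer error.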


Labels: bug (Something isn't working)
