issue in inference_s2s_batch.sh #218

Lalaramarya · 2025-03-31T16:10:03Z

Thank you for your help in resolving the earlier issues! However, I'm now facing a new problem during inference:

Generating: 0%| | 0/3000 [00:00<?, ?it/s]We detected that you are passing past_key_values as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate Cache class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
Generating: 16%|████████████████████████▊ | 469/3000 [00:24<02:12, 19.07it/s]
[2025-03-31 20:48:37][root][INFO] - LLM Inference Time: 25.14s
Error executing job with overrides: ['++model_config.llm_name=qwen2-0.5b', '++model_config.llm_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/Qwen2-0.5B', '++model_config.llm_dim=896', '++model_config.encoder_name=whisper', '++model_config.encoder_projector_ds_rate=5', '++model_config.encoder_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/small.pt', '++model_config.encoder_dim=768', '++model_config.encoder_projector=linear', '++model_config.codec_decoder_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/pretrained_models/CosyVoice-300M-SFT', '++model_config.codec_decode=true', '++model_config.vocab_config.code_layer=3', '++model_config.vocab_config.total_audio_vocabsize=4160', '++model_config.vocab_config.total_vocabsize=156160', '++model_config.code_type=CosyVoice', '++model_config.codec_decoder_type=CosyVoice', '++model_config.group_decode=true', '++model_config.group_decode_adapter_type=linear', '++dataset_config.dataset=speech_dataset_s2s', '++dataset_config.val_data_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/Dataset/VoiceAssistant-400K-SLAM-Omni/data/dev_manifest.jsonl', '++dataset_config.train_data_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/Dataset/VoiceAssistant-400K-SLAM-Omni/data/dev_manifest.jsonl', '++dataset_config.input_type=mel', '++dataset_config.mel_size=80', '++dataset_config.inference_mode=true', '++dataset_config.manifest_format=jsonl', '++dataset_config.split_size=0.002', '++dataset_config.load_from_cache_file=false', '++dataset_config.task_type=s2s', '++dataset_config.seed=777', '++dataset_config.vocab_config.code_layer=3', '++dataset_config.vocab_config.total_audio_vocabsize=4160', '++dataset_config.vocab_config.total_vocabsize=156160', '++dataset_config.code_type=CosyVoice', '++dataset_config.num_latency_tokens=0', '++dataset_config.do_layershift=false', '++train_config.model_name=s2s', '++train_config.freeze_encoder=true', '++train_config.freeze_llm=true', '++train_config.freeze_encoder_projector=true', '++train_config.freeze_group_decode_adapter=true', 
'++train_config.batching_strategy=custom', '++train_config.num_epochs=1', '++train_config.val_batch_size=1', '++train_config.num_workers_dataloader=2', '++train_config.task_type=s2s', '++decode_config.text_repetition_penalty=1.2', '++decode_config.audio_repetition_penalty=1.2', '++decode_config.max_new_tokens=3000', '++decode_config.task_type=s2s', '++decode_config.do_sample=false', '++decode_config.top_p=1.0', '++decode_config.top_k=0', '++decode_config.temperature=1.0', '++decode_config.decode_text_only=false', '++decode_config.do_layershift=false', '++decode_log=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/Qwen2-0.5b-whisper_small-latency0-group3-single-round-English-20250201T121121Z-002/Qwen2-0.5b-whisper_small-latency0-group3-single-round-English/s2s_decode__trp1.2_arp1.2_seed777_greedy', '++decode_config.num_latency_tokens=0', '++ckpt_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/Qwen2-0.5b-whisper_small-latency0-group3-single-round-English-20250201T121121Z-002/Qwen2-0.5b-whisper_small-latency0-group3-single-round-English/model.pt', '++output_text_only=false', '++inference_online=false', '++speech_sample_rate=22050', '++audio_prompt_path=/DATA/Lalaram/SLAM_omni_Jsn/SLAM-LLM/examples/s2s/audio_prompt/en/prompt_3.wav']
Traceback (most recent call last):
File "/DATA/Lalaram/SLAM_omni_Jsn/SLAM-LLM/examples/s2s/inference_s2s.py", line 102, in main_hydra
batch_inference(cfg)
File "/DATA/Lalaram/SLAM_omni_Jsn/SLAM-LLM/examples/s2s/generate/generate_s2s_batch.py", line 176, in main
q.write(key + "\t" + source_text + "\n")
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I'm facing this issue while running inference_s2s_batch.sh with both the pre-trained and fine-tuned models. However, when I load the pre-trained model using inference_s2s_online.sh, it successfully generates both the target text and audio. Please look into this.


cwx-worst-one · 2025-04-02T06:59:15Z

It seems that the JSONL file you provided doesn't contain the key field, so key is None when the script reaches q.write(key + "\t" + source_text + "\n"), and concatenating None with a string raises the TypeError. You can either add the missing key field to every entry in your data or remove that q.write line from the code manually.
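
If you would rather fix the data than edit the script, here is a minimal sketch for checking a manifest and filling in the missing field. It assumes the manifest holds one JSON object per line and that the field is named key, as suggested above. The function names and the utt_000001-style placeholder ids are made up for illustration and are not part of SLAM-LLM.

```python
import json


def find_missing_keys(manifest_path, field="key"):
    """Return 1-based line numbers of JSONL records whose `field` is absent or null."""
    missing = []
    with open(manifest_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            if record.get(field) is None:
                missing.append(lineno)
    return missing


def add_missing_keys(src_path, dst_path, field="key", prefix="utt"):
    """Copy a JSONL manifest to dst_path, adding a placeholder id where `field` is missing.

    The placeholder format (prefix + line number) is an arbitrary choice for this
    sketch; any unique string per entry would do. Returns the number of records fixed.
    """
    fixed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for lineno, line in enumerate(src, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get(field) is None:
                record[field] = f"{prefix}_{lineno:06d}"
                fixed += 1
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
    return fixed
```

Running find_missing_keys on dev_manifest.jsonl first shows whether the field is really absent. If it is, add_missing_keys writes a repaired copy that ++dataset_config.val_data_path can point to, leaving the original file untouched.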
