Dear @philschmid,
Thank you so much for sharing the great Mini-R1 tutorial.
I followed the tutorial to run the code. My workstation has three A6000 GPUs, and I set num_processes=7.
I got the following error. Any comments are highly appreciated.
2025-01-31 10:18:34,579 - __main__ - INFO - *** Starting training 2025-01-31 10:18:34 for 3.0 epochs***
INFO:__main__:*** Starting training 2025-01-31 10:18:34 for 3.0 epochs***
Parameter Offload: Total persistent parameters: 241664 in 181 params
0%|▏ | 1/450 [00:12<1:37:12, 12.99s/it][rank1]: Traceback (most recent call last):
[rank1]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 273, in <module>
[rank1]: main()
[rank1]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 269, in main
[rank1]: grpo_function(model_args, script_args, training_args)
[rank1]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 230, in grpo_function
[rank1]: train_result = trainer.train(resume_from_checkpoint=last_checkpoint)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
[rank1]: return inner_training_loop(
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
[rank1]: tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
[rank1]: loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 444, in compute_loss
[rank1]: per_token_logps = get_per_token_logps(model, prompt_completion_ids, num_logits_to_keep)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 432, in get_per_token_logps
[rank1]: logits = model(input_ids, num_logits_to_keep=num_logits_to_keep + 1).logits # (B, L, V)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank1]: ret_val = func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank1]: loss = self.module(*inputs, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank1]: return inner()
[rank1]: ^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in inner
[rank1]: args_result = hook(self, args)
[rank1]: ^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank1]: ret_val = func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 228, in _start_of_forward_hook
[rank1]: self.get_param_coordinator().reset_step()
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank1]: return fn(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 232, in reset_step
[rank1]: self.construct_parameter_trace_from_module_trace()
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 216, in construct_parameter_trace_from_module_trace
[rank1]: self.record_parameters(sub_module)
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 208, in record_parameters
[rank1]: step_id = self.__step_id_module_fetched_for[sub_module.id].popleft()
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: IndexError: pop from an empty deque
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 273, in <module>
[rank0]: main()
[rank0]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 269, in main
[rank0]: grpo_function(model_args, script_args, training_args)
[rank0]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 230, in grpo_function
[rank0]: train_result = trainer.train(resume_from_checkpoint=last_checkpoint)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
[rank0]: tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
[rank0]: loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 444, in compute_loss
[rank0]: per_token_logps = get_per_token_logps(model, prompt_completion_ids, num_logits_to_keep)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 432, in get_per_token_logps
[rank0]: logits = model(input_ids, num_logits_to_keep=num_logits_to_keep + 1).logits # (B, L, V)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]: ret_val = func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank0]: loss = self.module(*inputs, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: ^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in inner
[rank0]: args_result = hook(self, args)
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]: ret_val = func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 228, in _start_of_forward_hook
[rank0]: self.get_param_coordinator().reset_step()
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 232, in reset_step
[rank0]: self.construct_parameter_trace_from_module_trace()
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 216, in construct_parameter_trace_from_module_trace
[rank0]: self.record_parameters(sub_module)
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 208, in record_parameters
[rank0]: step_id = self.__step_id_module_fetched_for[sub_module.id].popleft()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: IndexError: pop from an empty deque
0%|▏ | 1/450 [00:54<6:48:11, 54.55s/it]
W0131 10:19:35.719000 192129 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 192383 closing signal SIGTERM
E0131 10:19:36.439000 192129 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 192384) of binary: /home/jma/anaconda3/envs/r1demo/bin/python
Traceback (most recent call last):
File "/home/jma/anaconda3/envs/r1demo/bin/accelerate", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1157, in launch_command
deepspeed_launcher(args)
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/launch.py", line 845, in deepspeed_launcher
distrib_run.run(args)
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
run_r1_grpo.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
Here is the config I used:
# Model arguments
model_name_or_path: Qwen2.5-3B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2
bf16: true
tf32: true
output_dir: log/qwen-2.5-3b-r1-countdown
# Dataset arguments
dataset_id_or_path: Countdown-Tasks-3to4
# Lora Arguments
# No LoRA is used here
# Training arguments
max_steps: 450
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 5.0e-7 # 1.0e-6 as in the deepseek math paper 5-e7 from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
lr_scheduler_type: cosine
warmup_ratio: 0.03
# GRPO specific parameters
beta: 0.001 # 0.04 as in the deepseek math paper 0.001 from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
max_prompt_length: 256
max_completion_length: 1024
num_generations: 8
use_vllm: true
vllm_device: "cuda:2"
vllm_gpu_memory_utilization: 0.5
# Logging arguments
logging_strategy: steps
logging_steps: 2
report_to:
- tensorboard
save_strategy: "steps"
save_steps: 25
seed: 42
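
In case it helps with reproducing, below is a small sanity check I use for the device split between the training ranks and the vLLM GPU. The helper name and the example values are my own, and the assumption that the training ranks occupy cuda:0..num_processes-1 while vllm_device sits on a separate, dedicated GPU is my reading of the tutorial, not something confirmed by the TRL docs.

# Sanity-check sketch for the GPU split implied by the config above.
# Assumption (my reading of the tutorial, not from the TRL docs): training ranks
# run on cuda:0..num_processes-1 and vllm_device is a separate, dedicated GPU.
import torch

def check_gpu_split(num_processes: int, vllm_device: str) -> None:
    total = torch.cuda.device_count()
    vllm_index = int(vllm_device.split(":")[1])
    # Training ranks plus the dedicated vLLM GPU must fit on the machine.
    assert num_processes + 1 <= total, (
        f"{num_processes} training processes + 1 vLLM GPU exceed {total} visible GPUs"
    )
    # The vLLM GPU should not overlap with any training rank.
    assert vllm_index >= num_processes, (
        f"{vllm_device} would collide with training rank {vllm_index}"
    )

if __name__ == "__main__":
    # Example values only: 2 training ranks on a 3-GPU box, vLLM on the third GPU.
    check_gpu_split(num_processes=2, vllm_device="cuda:2")
    print("GPU split looks consistent.")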