Dear @philschmid,
Thank you so much for sharing the great Mini-R1 tutorial.
I followed the tutorial to run the code. My workstation has three A6000 GPUs, and I set num_processes=7.
I got the following error. Any comments are highly appreciated.
2025-01-31 10:18:34,579 - __main__ - INFO - *** Starting training 2025-01-31 10:18:34 for 3.0 epochs***
INFO:__main__:*** Starting training 2025-01-31 10:18:34 for 3.0 epochs***
Parameter Offload: Total persistent parameters: 241664 in 181 params
0%|▏ | 1/450 [00:12<1:37:12, 12.99s/it][rank1]: Traceback (most recent call last):
[rank1]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 273, in <module>
[rank1]: main()
[rank1]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 269, in main
[rank1]: grpo_function(model_args, script_args, training_args)
[rank1]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 230, in grpo_function
[rank1]: train_result = trainer.train(resume_from_checkpoint=last_checkpoint)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
[rank1]: return inner_training_loop(
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
[rank1]: tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
[rank1]: loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 444, in compute_loss
[rank1]: per_token_logps = get_per_token_logps(model, prompt_completion_ids, num_logits_to_keep)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 432, in get_per_token_logps
[rank1]: logits = model(input_ids, num_logits_to_keep=num_logits_to_keep + 1).logits # (B, L, V)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank1]: ret_val = func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank1]: loss = self.module(*inputs, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank1]: return inner()
[rank1]: ^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in inner
[rank1]: args_result = hook(self, args)
[rank1]: ^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank1]: ret_val = func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 228, in _start_of_forward_hook
[rank1]: self.get_param_coordinator().reset_step()
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank1]: return fn(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 232, in reset_step
[rank1]: self.construct_parameter_trace_from_module_trace()
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 216, in construct_parameter_trace_from_module_trace
[rank1]: self.record_parameters(sub_module)
[rank1]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 208, in record_parameters
[rank1]: step_id = self.__step_id_module_fetched_for[sub_module.id].popleft()
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: IndexError: pop from an empty deque
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 273, in <module>
[rank0]: main()
[rank0]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 269, in main
[rank0]: grpo_function(model_args, script_args, training_args)
[rank0]: File "/home/jma/Documents/prototype/run_r1_grpo.py", line 230, in grpo_function
[rank0]: train_result = trainer.train(resume_from_checkpoint=last_checkpoint)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
[rank0]: tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
[rank0]: loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 444, in compute_loss
[rank0]: per_token_logps = get_per_token_logps(model, prompt_completion_ids, num_logits_to_keep)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 432, in get_per_token_logps
[rank0]: logits = model(input_ids, num_logits_to_keep=num_logits_to_keep + 1).logits # (B, L, V)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]: ret_val = func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank0]: loss = self.module(*inputs, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]: return inner()
[rank0]: ^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in inner
[rank0]: args_result = hook(self, args)
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]: ret_val = func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 228, in _start_of_forward_hook
[rank0]: self.get_param_coordinator().reset_step()
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 232, in reset_step
[rank0]: self.construct_parameter_trace_from_module_trace()
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 216, in construct_parameter_trace_from_module_trace
[rank0]: self.record_parameters(sub_module)
[rank0]: File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 208, in record_parameters
[rank0]: step_id = self.__step_id_module_fetched_for[sub_module.id].popleft()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: IndexError: pop from an empty deque
0%|▏ | 1/450 [00:54<6:48:11, 54.55s/it]
W0131 10:19:35.719000 192129 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 192383 closing signal SIGTERM
E0131 10:19:36.439000 192129 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 192384) of binary: /home/jma/anaconda3/envs/r1demo/bin/python
Traceback (most recent call last):
File "/home/jma/anaconda3/envs/r1demo/bin/accelerate", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1157, in launch_command
deepspeed_launcher(args)
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/accelerate/commands/launch.py", line 845, in deepspeed_launcher
distrib_run.run(args)
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jma/anaconda3/envs/r1demo/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
run_r1_grpo.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
Here is the config I used:
# Model arguments
model_name_or_path: Qwen2.5-3B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2
bf16: true
tf32: true
output_dir: log/qwen-2.5-3b-r1-countdown
# Dataset arguments
dataset_id_or_path: Countdown-Tasks-3to4
# Lora Arguments
# No LoRA is used here
# Training arguments
max_steps: 450
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 5.0e-7 # 1.0e-6 as in the deepseek math paper 5-e7 from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
lr_scheduler_type: cosine
warmup_ratio: 0.03
# GRPO specific parameters
beta: 0.001 # 0.04 as in the deepseek math paper 0.001 from https://hijkzzz.notion.site/unraveling-rlhf-and-its-variants-engineering-insights#147d9a33ecc9806090f3d5c749d31f05
max_prompt_length: 256
max_completion_length: 1024
num_generations: 8
use_vllm: true
vllm_device: "cuda:2"
vllm_gpu_memory_utilization: 0.5
# Logging arguments
logging_strategy: steps
logging_steps: 2
report_to:
- tensorboard
save_strategy: "steps"
save_steps: 25
seed: 42
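
In case it helps with reproducing, below is a small sanity check I use for the device split between the training ranks and the vLLM GPU. The helper name and the example values are my own, and the assumption that the training ranks occupy cuda:0..num_processes-1 while vllm_device sits on a separate, dedicated GPU is my reading of the tutorial, not something confirmed by the TRL docs.

# Sanity-check sketch for the GPU split implied by the config above.
# Assumption (my reading of the tutorial, not from the TRL docs): training ranks
# run on cuda:0..num_processes-1 and vllm_device is a separate, dedicated GPU.
import torch

def check_gpu_split(num_processes: int, vllm_device: str) -> None:
    total = torch.cuda.device_count()
    vllm_index = int(vllm_device.split(":")[1])
    # Training ranks plus the dedicated vLLM GPU must fit on the machine.
    assert num_processes + 1 <= total, (
        f"{num_processes} training processes + 1 vLLM GPU exceed {total} visible GPUs"
    )
    # The vLLM GPU should not overlap with any training rank.
    assert vllm_index >= num_processes, (
        f"{vllm_device} would collide with training rank {vllm_index}"
    )

if __name__ == "__main__":
    # Example values only: 2 training ranks on a 3-GPU box, vLLM on the third GPU.
    check_gpu_split(num_processes=2, vllm_device="cuda:2")
    print("GPU split looks consistent.")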