Closed
Description
Reproduction
When `mask_truncated_completions` is True, the values logged for the completion lengths are incorrect: the max reflects only the terminated completions, and the min is often zero. This is because we use the attention masks to calculate the lengths, and these are zeroed for the truncated sequences.
The issue is most obvious in the logged min completion lengths.
It should be a simple change to make a copy of the mask here:
trl/trl/trainer/grpo_trainer.py, line 1110 in a528b9c
and then gather the copied masks here:
trl/trl/trainer/grpo_trainer.py, line 1215 in a528b9c
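To make the effect concrete, here is a minimal sketch in plain Python (no TRL or PyTorch). It is not the trainer code: `length_stats`, `is_truncated`, and `mask_for_logging` are hypothetical names, and the cross-process gather is left out. It only shows how computing lengths from the zeroed mask skews the min and max, and how a copy taken beforehand avoids that.

```python
# Illustrative sketch only: hypothetical names, not the actual TRL implementation.

def length_stats(mask):
    """Return (mean, min, max) of per-sequence lengths from a 0/1 mask."""
    lengths = [sum(row) for row in mask]
    return sum(lengths) / len(lengths), min(lengths), max(lengths)

# 1 marks a real completion token, 0 marks padding.
completion_mask = [
    [1, 1, 1, 0, 0, 0],  # terminated after 3 tokens
    [1, 1, 1, 1, 0, 0],  # terminated after 4 tokens
    [1, 1, 1, 1, 1, 1],  # hit the length limit (truncated)
]
is_truncated = [False, False, True]

# Proposed fix: keep a copy for logging before truncated rows are masked out.
mask_for_logging = [row[:] for row in completion_mask]

# Zero the mask rows of truncated completions so they drop out of the loss.
completion_mask = [
    [0] * len(row) if truncated else row
    for row, truncated in zip(completion_mask, is_truncated)
]

buggy_stats = length_stats(completion_mask)   # lengths taken from the zeroed mask
fixed_stats = length_stats(mask_for_logging)  # lengths taken from the copy

print("from zeroed mask (mean, min, max):", buggy_stats)
print("from copied mask (mean, min, max):", fixed_stats)
```

With the zeroed mask the truncated row contributes a length of zero, so the min collapses to 0 and the max is the longest terminated completion. With the copy, the truncated completion is counted at its full length.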
System Info
- Platform: Linux-5.15.0-1049-aws-x86_64-with-glibc2.31
- Python version: 3.11.11
- TRL version: 0.18.0.dev0+a528b9c
- PyTorch version: 2.6.0
- accelerator(s): cpu
- Transformers version: 4.51.3
- Accelerate version: 1.3.0
- Accelerate config: not found
- Datasets version: 3.4.1
- HF Hub version: 0.30.2
- bitsandbytes version: 0.45.3
- DeepSpeed version: 0.16.4
- Diffusers version: 0.32.2
- Liger-Kernel version: 0.5.8
- LLM-Blender version: 0.0.2
- OpenAI version: 1.66.5
- PEFT version: 0.14.0
- vLLM version: 0.8.4
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete