I am fine-tuning the Qwen2-0.5B model with GRPO. Training launches successfully on 1 GPU, but it fails when I launch it on multiple GPUs.
Reproduction
The error says that mat2 in the linear layer must be a 2-D matrix, but it received a 1-D tensor.
I checked that the inputs to the model are the same in the 1-GPU and multi-GPU runs. What could cause this error?
I also hit a kernel assertion.
I saw some previous issues saying that removing 'device_map="auto"' resolves the problem, but that does not seem to help in my case.
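Since the inputs match, one thing that might be worth checking is the weights themselves. The sketch below is a hypothetical debugging helper, not part of trl or transformers. It assumes that mat2 corresponds to a linear layer's weight, and that normalization layers (whose weights are legitimately 1-D) have "norm" in their parameter names. It only needs (name, parameter) pairs where each parameter has a .shape, so the stand-in entries below use made-up shapes purely to show the behaviour.

```python
from types import SimpleNamespace


def find_non_2d_weights(named_params, skip_substrings=("norm",)):
    """Return (name, shape) for every '.weight' parameter that is not 2-D.

    named_params: iterable of (name, param) pairs, where param has a .shape.
    skip_substrings: name fragments for layers whose weights are expected
    to be 1-D (an assumption about the model's naming scheme).
    """
    suspicious = []
    for name, param in named_params:
        if not name.endswith(".weight"):
            continue  # biases and other tensors are not what mat2 refers to
        if any(fragment in name for fragment in skip_substrings):
            continue  # normalization weights are 1-D by design
        shape = tuple(param.shape)
        if len(shape) != 2:
            suspicious.append((name, shape))
    return suspicious


# Stand-in parameters with illustrative names and shapes (not real model data).
fake_params = [
    ("model.layers.0.self_attn.q_proj.weight", SimpleNamespace(shape=(896, 896))),
    ("model.layers.0.input_layernorm.weight", SimpleNamespace(shape=(896,))),
    ("model.layers.0.mlp.up_proj.weight", SimpleNamespace(shape=(0,))),
]

print(find_non_2d_weights(fake_params))
```

In a real run this could be called as find_non_2d_weights(model.named_parameters()) on each process just before the failing forward pass. If it reports weights in the multi-GPU run but not in the 1-GPU run, that would suggest the parameters are being flattened or sharded by the multi-GPU setup rather than the inputs being wrong.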
System Info
Python 3.9
torch 2.6.0
transformers 4.51.3
trl 0.17.0
H100 80GB