As shown below, during training the DPOTrainer loss drops to 0.0, yet at the end it reports a train_loss of 0.06. In another run with a different dataset, the training loss again drops to 0.0 before the end of the epoch, and the eval loss follows at 2.14e-6.
I am using a different dataset, but otherwise the implementation is mostly the same as https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/scripts/dpo/run_dpo.py
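For reference, a minimal sketch of the kind of setup I mean (the model name, dataset name, and hyperparameters below are placeholders, not the exact values from my run):

```python
# Minimal DPO setup sketch; model, dataset, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference dataset with "prompt", "chosen", "rejected" columns (placeholder name).
dataset = load_dataset("my_org/my_preference_dataset", split="train")

training_args = DPOConfig(
    output_dir="dpo-output",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=5e-6,
    beta=0.1,          # DPO temperature
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions take processing_class instead
)
trainer.train()  # the logged loss drops to 0.0 well before the epoch ends
```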