As shown below, during training the DPOTrainer loss drops to 0.0, yet at the end it reports a train_loss of 0.06. In another run with a different dataset, the training loss again drops to 0.0 before the end of the epoch, and the eval loss follows at 2.14e-6.
I am using a different dataset, but otherwise the implementation is mostly the same as https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/scripts/dpo/run_dpo.py
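For reference, a minimal sketch of the kind of setup I mean (the model name, dataset name, and hyperparameters below are placeholders, not the exact values from my run):

```python
# Minimal DPO setup sketch; model, dataset, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference dataset with "prompt", "chosen", "rejected" columns (placeholder name).
dataset = load_dataset("my_org/my_preference_dataset", split="train")

training_args = DPOConfig(
    output_dir="dpo-output",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=5e-6,
    beta=0.1,          # DPO temperature
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions take processing_class instead
)
trainer.train()  # the logged loss drops to 0.0 well before the epoch ends
```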