🤧 LD-DPO support #3458
Conversation
Congrats! Can you also kindly add the method to the DPO documentation with a short description?
Done.
Could you also add a small test?
Sorry, I don't know how to test whether it is effective. T_T
No problem, I can do it.
Thanks, LGTM!
What does this PR do?
This PR adds an LD-DPO implementation to TRL. The paper was accepted by CoRR 2024.
The paper proposes a method that aims to desensitize DPO to data length by decoupling the explicit length preference, which is relatively insignificant, from the other implicit preferences, thereby enabling more effective learning of the intrinsic preferences.
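As a rough illustration of the idea (not necessarily the exact code added in this PR), LD-DPO can be viewed as reweighting per-token log-probabilities: tokens up to the length of the shorter of the chosen/rejected completions (the "public" part) contribute fully, while the verbose remainder is scaled by a hyperparameter alpha in [0, 1]. The helper name `ld_sequence_logp` and its arguments below are illustrative, not part of the TRL API:

```python
import torch

def ld_sequence_logp(per_token_logps, completion_mask, public_length, alpha=0.5):
    """Length-desensitized sequence log-probability (illustrative sketch).

    per_token_logps: (seq_len,) log-probs of the completion tokens
    completion_mask: (seq_len,) 1.0 for real tokens, 0.0 for padding
    public_length:   length of the shorter of the chosen/rejected completions
    alpha:           weight applied to "verbose" tokens beyond public_length
    """
    weights = torch.ones_like(per_token_logps)
    weights[public_length:] = alpha  # damp the explicit length signal
    return (per_token_logps * weights * completion_mask).sum(-1)
```

With alpha = 1 this reduces to the standard DPO sequence log-probability; with alpha = 0 both completions are effectively truncated to the shared length, removing the explicit length preference entirely.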
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.