Skip to content

Reintroduce generate method for PPOTrainer #3374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

CloseChoice
Copy link
Contributor

@CloseChoice CloseChoice commented Apr 27, 2025

What does this PR do?

In the release v0.12.0 the generate and the _generate_batched methods were removed from the PPOTrainer. As there is feedback from the community (see the issues #3250 and #3270) (including a member's comment) this PR reintroduces the method and tests it properly. Note that the reintroduced methods are basically copy-pasted from version v0.11.4 and then slightly adapted to the current code.

Supports #3250 (not strictly closing since in the reproducible example the .step method is used as well, which this PR does not reintroduce).

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@CloseChoice CloseChoice changed the title Reintroduce generate method for Ppotrainer Reintroduce generate method for PPOTrainer Apr 27, 2025
@CloseChoice
Copy link
Contributor Author

CloseChoice commented Apr 27, 2025

Just realized that there is the generate_completions method and the batch_generation function already. I would assume that reusing this functionality should be the target of this PR, not duplicating functionality as it is done right now. Though right now this is not possible since the generate_completions method does not return anything. Would it be suitable to add a return to the generate_completions?

@qgallouedec
Copy link
Member

What do you think is the best?

@CloseChoice
Copy link
Contributor Author

CloseChoice commented May 2, 2025

What do you think is the best?

I would prefer reusing functionality if this does not add much complexity and doesn't break existing tests. I'll prepare an update to the generate_completions function and update this PR, will ping you once that is ready for a first review.

@CloseChoice CloseChoice force-pushed the ppotrainer_generate branch from 0bb2d8b to 4d61397 Compare May 3, 2025 08:10
@CloseChoice
Copy link
Contributor Author

@qgallouedec I updated the functionality and reused the generate method that we have already. I think this is way cleaner and simpler. My current implementation of generate does not match the one in 0.11.4 exactly, since I wanted to keep it simple. Let me know what you think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants