Reintroduce `generate` method for PPOTrainer #3374
base: main
Just realized that there is the …
What do you think is the best?
I would prefer reusing functionality if this does not add much complexity and doesn't break existing tests. I'll prepare an update to the …
Branch updated from 0bb2d8b to 4d61397.
@qgallouedec I updated the functionality and reused the …
What does this PR do?
In release v0.12.0, the `generate` and `_generate_batched` methods were removed from the `PPOTrainer`. Following feedback from the community (see issues #3250 and #3270, including a member's comment), this PR reintroduces these methods and adds proper tests for them. Note that the reintroduced methods are largely copied from v0.11.4 and then slightly adapted to the current code.

Supports #3250 (it does not strictly close that issue, because the reproducible example there also uses the `.step` method, which this PR does not reintroduce).

Before submitting
[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.