Support for TRL / GRPO training? #10

awsaf49 opened this issue May 6, 2025 · 2 comments

Comments

@awsaf49

awsaf49 commented May 6, 2025

I think it would be nice to have RL-based training here. I would be happy to work on a PR for this.

@lusxvr
Member

lusxvr commented May 6, 2025

We always welcome contributions! Feel free to open a PR :)

@DebjyotiRay

Same here. I was working on a TRL implementation as well.
However, how do you envision collecting or creating preference data for vision-language tasks?
Any advice on which of these to pursue: curating a dataset of preferred vs. rejected completions for VQA, or generating synthetic data by comparing completions to the ground truth?
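
To make the second option concrete, here is a minimal sketch of building synthetic preference pairs by checking sampled completions against a ground-truth answer. Everything in it is an assumption for illustration, not something from this repository: the helper names (`normalize`, `score`, `build_preference_pairs`), the exact-match scoring rule, and the `prompt` / `chosen` / `rejected` record layout, which is meant to mirror the usual preference-dataset shape. The candidate completions would come from sampling the model several times per question.

```python
from itertools import product


def normalize(text: str) -> str:
    """Lowercase, trim whitespace and a trailing period, collapse inner spaces."""
    return " ".join(text.lower().strip().rstrip(".").split())


def score(completion: str, ground_truth: str) -> float:
    """Hypothetical scoring rule: 1.0 on normalized exact match, else 0.0."""
    return 1.0 if normalize(completion) == normalize(ground_truth) else 0.0


def build_preference_pairs(question: str, candidates: list[str], ground_truth: str) -> list[dict]:
    """Pair every correct candidate with every incorrect one for the same question."""
    correct = [c for c in candidates if score(c, ground_truth) == 1.0]
    wrong = [c for c in candidates if score(c, ground_truth) == 0.0]
    # No pairs are produced when all candidates are correct or all are wrong.
    return [
        {"prompt": question, "chosen": c, "rejected": w}
        for c, w in product(correct, wrong)
    ]


pairs = build_preference_pairs(
    "What color is the car?",
    ["Red.", "blue", "red"],
    ground_truth="red",
)
for pair in pairs:
    print(pair)
```

Exact match is only workable for short VQA answers; longer answers would need a softer metric. Note also that GRPO scores groups of sampled completions with a reward function rather than training on preference pairs, so a function like `score` could in principle serve as the reward directly, with the pairing step needed only for pairwise methods such as DPO.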
