Support for TRL / GRPO training? #10

awsaf49 · 2025-05-06T18:13:45Z

I think it would be nice to have RL-based training here. I would be happy to work on a PR for this.

lusxvr · 2025-05-06T22:23:34Z

We always welcome contributions! Feel free to open a PR :)

DebjyotiRay · 2025-05-13T11:44:00Z

Same here. I was working on implementing TRL support.
However, how do you envision collecting or creating preference data for vision-language tasks?
Do you have any advice on which of these to use: curating a dataset of preferred vs. rejected completions for VQA, or generating synthetic data by comparing model outputs to the ground truth?
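
To make the second option concrete, here is a minimal sketch of building preference pairs by comparing candidate answers to the ground truth. Everything in it is an assumption for illustration: the `PreferencePair` fields, the `normalize` and `build_pairs` helpers, and the exact-match rule are hypothetical and are not part of this repo or of TRL.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Hypothetical container; field names are illustrative only.
    prompt: str
    chosen: str
    rejected: str


def normalize(answer: str) -> str:
    # Lowercase, drop a trailing period, and collapse whitespace
    # so that "Red." and "red" compare as equal.
    return " ".join(answer.lower().strip().rstrip(".").split())


def build_pairs(question: str, ground_truth: str, candidates: list[str]) -> list[PreferencePair]:
    # Split sampled completions into those that match the ground truth and those that do not.
    target = normalize(ground_truth)
    correct = [c for c in candidates if normalize(c) == target]
    wrong = [c for c in candidates if normalize(c) != target]
    # If no sampled completion is correct, fall back to the ground truth itself as "chosen".
    chosen = correct[0] if correct else ground_truth
    # Emit one pair per wrong completion.
    return [PreferencePair(question, chosen, w) for w in wrong]


pairs = build_pairs("What color is the car?", "red", ["Red.", "blue", "green"])
for p in pairs:
    print(p)
```

Exact match is a crude signal for VQA; a softer comparison (for example accepting several reference answers) may be needed in practice, but that choice is outside what this sketch covers.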
