Skip to content

[GPG][new trainer] Add support to new GPG method #3472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 tasks done
lerogo opened this issue May 20, 2025 · 0 comments
Open
3 tasks done

[GPG][new trainer] Add support to new GPG method #3472

lerogo opened this issue May 20, 2025 · 0 comments
Labels
✨ enhancement New feature or request

Comments

@lerogo
Copy link

lerogo commented May 20, 2025

Method description

  • short description: In this work, we revisit the traditional Policy Gradient (PG) mechanism and propose a minimalist RL approach termed Group Policy Gradient (GPG). Unlike conventional methods, GPG directly optimizes the original RL objective, thus obviating the need for surrogate loss functions. By eliminating the critic and reference models, avoiding KL divergence constraints, and addressing the advantage and gradient estimation bias, our approach significantly simplifies the training process compared to Group Relative Policy Optimization (GRPO). Our approach achieves superior performance without relying on auxiliary techniques or adjustments. Extensive experiments demonstrate that our method not only reduces computational costs but also consistently outperforms GRPO across various unimodal and multimodal tasks.
  • paper link: https://arxiv.org/abs/2504.02546

Open source status

  • The method implementation is available
  • The model weights are available
  • The training datasets are available

Provide useful links for the implementation

@github-actions github-actions bot added the ✨ enhancement New feature or request label May 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✨ enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant