inclusionAI/AReaL: Distributed RL System for LLM Reasoning, an open-source, efficient reinforcement learning system developed by the RL Lab at Ant Research, git
veRL: Volcano Engine Reinforcement Learning for LLM, a flexible, efficient and production-ready RL training library for large language models (LLMs), git; Awesome work using verl:
- DAPO: an open-source SOTA RL algorithm that reaches 50 points on AIME 2024 starting from the Qwen2.5-32B pre-trained model; the reproduction code is publicly available
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
- Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
- Process Reinforcement Through Implicit Rewards
- TinyZero: a reproduction of the DeepSeek R1 Zero recipe for reasoning tasks
- RAGEN: a general-purpose reasoning agent training framework
- Logic R1: a reproduction of DeepSeek R1 Zero on a 2K tiny logic-puzzle dataset
- deepscaler: iterative context scaling with GRPO
- critic-rl: Teaching Language Models to Critique via Reinforcement Learning
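Several of the entries above (DAPO, deepscaler) build on GRPO-style training, whose core idea is to replace a learned critic with group-relative reward normalization: sample several completions per prompt and score each against its own group. A minimal sketch of that advantage computation (pure Python; the function name is ours, not from any of the linked repos):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each reward by its sampling
    group's mean and std, so no separate value network (critic) is needed."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# e.g. four sampled completions for one prompt, scored by a rule-based verifier
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Advantages sum to zero within each group, so correct completions are pushed up exactly as hard as incorrect ones are pushed down.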
OpenRLHF: a high-performance RLHF framework built on Ray, DeepSpeed and HF Transformers, git
DeepSpeed: empowers ChatGPT-like model training with a single click, offering 15x speedup over SOTA RLHF systems with unprecedented cost reduction at all scales, blog, git
TRL - Transformer Reinforcement Learning, git
Kimi k1.5: Scaling Reinforcement Learning with LLMs, paper, git, 20250120
NVIDIA NeMo-Aligner: Scalable toolkit for efficient model alignment, git
RLHFlow: Open-Source Code for RLHF Workflow, git, paper; Code for Reward Modeling: git; Code for Online RLHF: git; Online-DPO-R1: Unlocking Effective Reasoning Without the PPO Overhead, git
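The DPO-based entries above share one loss: a logistic loss on the difference of implicit rewards between a chosen and a rejected completion, measured against a frozen reference policy. A hedged sketch for a single preference pair (pure Python; variable names are ours, not from any of the linked repos):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair. Inputs are summed log-probs of the
    chosen/rejected completions under the trained policy (pi_*) and the
    frozen reference model (ref_*); beta controls deviation from the reference."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy prefers the chosen answer more strongly than the reference
# does, the margin is positive and the loss falls below log(2).
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
```

At initialization the policy equals the reference, the margin is zero, and the loss is exactly log(2); this avoids the PPO machinery (rollouts, value network) that the "Online-DPO-R1" entry refers to.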
SGLang: Efficient Execution of Structured Language Model Programs, blog, paper, git
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs, blog, git
Efficient Memory Management for Large Language Model Serving with PagedAttention, paper
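The key idea in the PagedAttention paper is OS-style paging for the KV cache: each sequence's cache is split into fixed-size blocks that need not be contiguous in memory, and a per-sequence block table maps logical to physical blocks, so memory is allocated on demand and freed blocks are reused across sequences. A toy sketch of just the block-table bookkeeping (no attention math; class and method names are ours, not vLLM's):

```python
class BlockTableAllocator:
    """Toy paged KV-cache bookkeeping: fixed-size physical blocks handed out
    on demand, tracked per sequence in a block table, recycled on free."""
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # free physical block ids
        self.tables = {}                      # seq_id -> [physical block ids]
        self.lengths = {}                     # seq_id -> tokens stored

    def append_token(self, seq_id):
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:          # current block full (or none yet)
            if not self.free:
                raise MemoryError("no free KV blocks; preempt a sequence")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def free_seq(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

alloc = BlockTableAllocator(num_blocks=8, block_size=4)
for _ in range(6):   # 6 tokens need ceil(6/4) = 2 blocks
    alloc.append_token("seq0")
```

Because a sequence only ever wastes space in its last, partially filled block, fragmentation is bounded by one block per sequence, which is what lets vLLM batch many more sequences than contiguous pre-allocation would.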
Megatron-LM & Megatron-Core: Ongoing research training transformer models at scale, paper, git
Ray: Ray v2 Architecture doc, latest doc, git
DeepSeekMoE