inclusionAI/AReaL: Distributed RL System for LLM Reasoning, an open-source, efficient reinforcement learning system developed by the RL Lab at Ant Research, git
veRL: Volcano Engine Reinforcement Learning for LLM, a flexible, efficient and production-ready RL training library for large language models (LLMs), git; Awesome work using verl:
- DAPO: an open-source SOTA RL algorithm that reaches 50 points on AIME 2024 starting from the Qwen2.5-32B pre-trained model; the reproduction code is publicly available
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
- Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
- Process Reinforcement Through Implicit Rewards
- TinyZero: a reproduction of the DeepSeek R1 Zero recipe for reasoning tasks
- RAGEN: a general-purpose reasoning agent training framework
- Logic R1: a reproduction of DeepSeek R1 Zero on a 2K tiny logic-puzzle dataset
- deepscaler: iterative context scaling with GRPO
- critic-rl: Teaching Language Models to Critique via Reinforcement Learning
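Several of the entries above (DAPO, deepscaler) build on GRPO-style training, whose core idea is to replace a learned critic with group-relative reward normalization: sample several completions per prompt and score each against its own group. A minimal sketch of that advantage computation (pure Python; the function name is ours, not from any of the linked repos):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each reward by its sampling
    group's mean and std, so no separate value network (critic) is needed."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# e.g. four sampled completions for one prompt, scored by a rule-based verifier
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Advantages sum to zero within each group, so correct completions are pushed up exactly as hard as incorrect ones are pushed down.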
OpenRLHF: a high-performance RLHF framework built on Ray, DeepSpeed and HF Transformers, git
DeepSpeed: empowers ChatGPT-like model training with a single click, offering 15x speedup over SOTA RLHF systems with unprecedented cost reduction at all scales, blog, git
TRL - Transformer Reinforcement Learning, git
Kimi k1.5: Scaling Reinforcement Learning with LLMs, paper, git, 20250120
NVIDIA NeMo-Aligner: Scalable toolkit for efficient model alignment, git
RLHFlow: Open-Source Code for RLHF Workflow, git, paper; Code for Reward Modeling: git; Code for Online RLHF: git; Online-DPO-R1: Unlocking Effective Reasoning Without the PPO Overhead, git
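The DPO-based entries above share one loss: a logistic loss on the difference of implicit rewards between a chosen and a rejected completion, measured against a frozen reference policy. A hedged sketch for a single preference pair (pure Python; variable names are ours, not from any of the linked repos):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair. Inputs are summed log-probs of the
    chosen/rejected completions under the trained policy (pi_*) and the
    frozen reference model (ref_*); beta controls deviation from the reference."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy prefers the chosen answer more strongly than the reference
# does, the margin is positive and the loss falls below log(2).
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
```

At initialization the policy equals the reference, the margin is zero, and the loss is exactly log(2); this avoids the PPO machinery (rollouts, value network) that the "Online-DPO-R1" entry refers to.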
SGLang: Efficient Execution of Structured Language Model Programs, blog, paper, git
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs, blog, git
Efficient Memory Management for Large Language Model Serving with PagedAttention, paper
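The key idea in the PagedAttention paper is OS-style paging for the KV cache: each sequence's cache is split into fixed-size blocks that need not be contiguous in memory, and a per-sequence block table maps logical to physical blocks, so memory is allocated on demand and freed blocks are reused across sequences. A toy sketch of just the block-table bookkeeping (no attention math; class and method names are ours, not vLLM's):

```python
class BlockTableAllocator:
    """Toy paged KV-cache bookkeeping: fixed-size physical blocks handed out
    on demand, tracked per sequence in a block table, recycled on free."""
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # free physical block ids
        self.tables = {}                      # seq_id -> [physical block ids]
        self.lengths = {}                     # seq_id -> tokens stored

    def append_token(self, seq_id):
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:          # current block full (or none yet)
            if not self.free:
                raise MemoryError("no free KV blocks; preempt a sequence")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def free_seq(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

alloc = BlockTableAllocator(num_blocks=8, block_size=4)
for _ in range(6):   # 6 tokens need ceil(6/4) = 2 blocks
    alloc.append_token("seq0")
```

Because a sequence only ever wastes space in its last, partially filled block, fragmentation is bounded by one block per sequence, which is what lets vLLM batch many more sequences than contiguous pre-allocation would.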
Megatron-LM & Megatron-Core: Ongoing research training transformer models at scale, paper, git
Ray: Ray v2 Architecture doc, latest doc, git
DeepSeekMoE