Releases: modelscope/ms-swift
Patch release v3.6.1
Full Changelog: v3.6.0...v3.6.1
v3.6.0
English Version
New Features
- Megatron-SWIFT:
a. Support for more MoE model architectures, including: DeepseekV3ForCausalLM, Dots1ForCausalLM, and Ernie4_5_MoeForCausalLM. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/moe
b. Support for more Dense model architectures, including: MiMoForCausalLM, InternLM3ForCausalLM, and Ernie4_5_ForCausalLM. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/dense
c. DPO training supported. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/rlhf/dpo
d. FP8 training supported.
e. More rope scaling types supported, including: default, linear, yarn, dynamic, longrope, llama3, etc.
f. The --test_convert_precision parameter has been optimized to make it easier to test the precision of weight conversion between mcore and huggingface models.
- GRPO:
a. GRPO multi-turn training refactored, supporting accelerated multi-turn inference with AsyncEngine. Documentation: https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/DeveloperGuide/%E5%A4%9A%E8%BD%AE%E8%AE%AD%E7%BB%83.html
b. The offload_model parameter now also offloads the reference model.
c. Optimized GPU memory management under sleep_level and offload_model parameters.
d. Added trainer_state as an input parameter to reward_funcs, making it easier to obtain the current training step and the total number of steps.
- Training:
a. Reranker training supported. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/reranker
b. CPT/SFT/DPO/GRPO pure-text large model training supports ring-attention sequence length partitioning, reducing memory usage. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/long_text/ring_attention
c. Channel loss in CPT/SFT training is compatible with padding_free and packing. Thanks to the technical team at China Merchants Bank for their contribution.
d. Optimized remove_unused_columns parameter. When set to False, extra dataset columns are passed to the Trainer for custom loss functions.
e. The default value of split_dataset_ratio changed from 0.01 to 0, so no validation set is split off by default. To get one, set --split_dataset_ratio or --val_dataset explicitly.
f. Fixed loss alignment issue between packing/padding_free for multimodal models. For details, see this PR: #4838
g. Swanlab now supports a Feishu (Lark Suite) notification callback after training is completed.
- RLHF:
a. Pure-text and multimodal models support GKD training, with some scenarios supporting padding_free and packing. Training scripts:
i. Large models: https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh
ii. Multimodal large models: https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd.sh
b. Reward model training now supports the margin parameter. Documentation: https://swift.readthedocs.io/zh-cn/latest/Instruction/%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90.html#rm
- Full Pipeline:
a. The SGLang inference engine can be used to accelerate the ms-swift inference/deployment/evaluation/ui modules by setting --infer_backend sglang. Inference script reference: https://github.com/modelscope/ms-swift/tree/main/examples/infer/sglang
b. FP8 quantization supported. Quantization script reference: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/fp8.sh
- Web-UI:
a. Supports SFT/RLHF/GRPO training on separate tab pages, and supports saving the training command line.
b. Web-UI interface supports data sampling.
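As an illustration of item d under GRPO above, a reward function could use training progress to phase in a penalty gradually. The following is a minimal sketch under stated assumptions, not the ms-swift plugin interface: the function name, its signature, and the FakeTrainerState stand-in are hypothetical. Only the attribute names global_step and max_steps are taken from the TrainerState class in Hugging Face transformers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeTrainerState:
    """Hypothetical stand-in for the trainer state passed to a reward function."""
    global_step: int  # current training step
    max_steps: int    # total number of training steps


def length_penalty_reward(completions: List[str], trainer_state,
                          max_chars: int = 200) -> List[float]:
    """Start from a reward of 1.0 and subtract a length penalty.

    The penalty weight grows linearly with training progress, so overly
    long completions are tolerated early on and penalized later.
    """
    progress = trainer_state.global_step / max(trainer_state.max_steps, 1)
    rewards = []
    for text in completions:
        # Fraction by which the completion exceeds max_chars, capped at 1.0.
        overflow = min(max(len(text) - max_chars, 0) / max_chars, 1.0)
        rewards.append(1.0 - progress * overflow)
    return rewards


state = FakeTrainerState(global_step=50, max_steps=100)
rewards = length_penalty_reward(["short answer", "x" * 400], state)
print(rewards)
```

Halfway through training, the short completion keeps its full reward while the 400-character completion loses half of it. How ms-swift actually passes trainer_state to reward_funcs should be checked against the GRPO documentation linked above.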
New Models
- Multimodal Models:
a. ZhipuAI/GLM-4.1V-9B-Thinking series
b. Kwai-Keye/Keye-VL-8B-Preview
c. moonshotai/Kimi-VL-A3B-Thinking-2506
d. google/gemma-3n-E2B-it series
- Pure Text Models:
a. PaddlePaddle/ERNIE-4.5-21B-A3B-PT series
b. rednote-hilab/dots.llm1.inst series
c. Tencent-Hunyuan/Hunyuan-A13B-Instruct
d. MiniMax/MiniMax-M1-80k series (inference)
e. moonshotai/Kimi-Dev-72B
f. cognitivecomputations/DeepSeek-R1-0528-AWQ
What's Changed
- fix emb script and docs by @tastelikefeet in #4521
- [grpo] update doc about move_model_batches by @hjh0119 in #4523
- fix LoraModel by @Jintao-Huang in #4536
- support cognitivecomputations/DeepSeek-R1-0528-AWQ by @Jintao-Huang in #4537
- fix: handle INFONCE_HARD_NEGATIVES as integer if provided by @dlutwy in #4545
- fix qwen3 embedding saving by @tastelikefeet in #4548
- [megatron/dpo] fix megatron packing_cache & update DPOTrainer by @Jintao-Huang in #4556
- [megatron] support DPO by @Jintao-Huang in #4193
- support dots1 by @Jintao-Huang in #4560
- [grpo] support offloading reference model by @hjh0119 in #4554
- [grpo] fix the pickle data collator by @hjh0119 in #4562
- [dataset] fix toolbench (local) by @Jintao-Huang in #4563
- [Bug]Fix ulysses train steps, embedding negative sample length by @tastelikefeet in #4565
- fix args.json by @Jintao-Huang in #4566
- [model] fix ovis gradient_checkpointing vit no_grad by @Jintao-Huang in #4571
- [megatron] Fix megatron all_reduce warning by @Jintao-Huang in #4568
- [grpo] remove data collator to top-level to avoid pickle error in spawn mode by @hjh0119 in #4582
- [grpo] model weight synchronization before first turn rollout with async generation by @hjh0119 in #4584
- [megatron] support more rope_scaling & support deepseek-r1-qwen3-8b/internlm3/mimo-7b by @Jintao-Huang in #4576
- [grpo] restore num_generations check by @hjh0119 in #4590
- fix gc_kwargs by @Jintao-Huang in #4591
- Fix UI llm_train by @slin000111 in #4592
- [mirror] update swift mirror by @Jintao-Huang in #4601
- [megatron] compat megatron-core main branch by @Jintao-Huang in https://github.com/modelscope/ms-swift...
Patch release v3.5.3
Full Changelog: v3.5.2...v3.5.3
Patch release v3.5.2
Full Changelog: v3.5.1...v3.5.2
Patch release v3.5.1
Full Changelog: v3.5.0...v3.5.1
v3.5.0
English Version
New Features
- GRPO:
a. Code refactored; the vLLM mode (server or colocate) is now specified via the vllm_mode parameter. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#arguments-and-execution-script:~:text=vllm_mode%20server%20parameter,in%20colocate%20mode.
b. GRPO long-text optimization with Ulysses sequence parallelism, significantly reducing GPU memory usage during long-text training. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/long_text/sequence_parallel_grpo.sh
c. Added the sync_ref_model parameter to synchronize reference model weights during training.
d. Supports Liger Kernel loss via the use_liger_kernel parameter, reducing GPU memory consumption.
e. External mode supports move_model_batches to lower peak GPU memory during ZeRO-3 weight synchronization.
f. Integrated INTELLECT-2’s Two-Sided Clipping algorithm via the delta parameter.
g. Supports reward functions returning None, applicable for multi-task training. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#multi-task-training
h. Internal mode supports vllm_server_base_url for passing an external vLLM server URL.
i. Plugin extension: Added QwenLong-L1 reward model plugin.
j. Added the steps_per_generation and generation_batch_size parameters for customizing the sampling batch size.
k. Web-UI supports GRPO training.
l. The following parameters will be removed in v3.6: tensor_parallel_size, vllm_device, vllm_max_num_seqs, num_infer_workers.
- Training:
a. CPT/SFT/DPO/GRPO support padding-free training. By flattening batch data to avoid padding, GPU memory usage is reduced and training speed is improved. Script: https://github.com/modelscope/ms-swift/tree/main/examples/train/padding_free
b. Multimodal training enhancements: supports separate learning rates for the ViT and Aligner modules via the vit_lr and aligner_lr parameters. Added vit_gradient_checkpointing to independently control gradient checkpointing for the ViT module. Benchmark: https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh
c. CPT/SFT support channel_loss to calculate the loss separately for datasets from different channels. Thanks to the technical team at China Merchants Bank for their contribution.
d. CPT/SFT/DPO support use_logits_to_keep to reduce GPU memory usage and accelerate training.
e. Qwen2.5-VL/Omni support video training by passing image directories.
- Inference & Deployment:
a. Optimized swift infer batching with a new write_batch_size parameter that controls how often batched inference results are written to result_path.
b. vLLM inference engine now defaults to V1 engine and supports hybrid Tensor Parallelism (TP) and Data Parallelism (DP). Script: https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/dp_tp.sh
- Megatron-SWIFT:
a. For non-streaming datasets, train_iters is calculated automatically from max_epochs.
b. Added extra_megatron_kwargs to pass Megatron parameters that ms-swift does not expose as its own arguments.
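To make the effect of the delta parameter (GRPO item f above) concrete, here is a scalar sketch of a clipped policy objective with an optional upper bound on the probability ratio. It shows the general idea of two-sided clipping under stated assumptions and is not taken from the ms-swift source: the function name and argument names are hypothetical, and delta is assumed to be larger than 1 + epsilon.

```python
from typing import Optional


def clipped_objective(ratio: float, advantage: float, epsilon: float = 0.2,
                      delta: Optional[float] = None) -> float:
    """Per-token clipped policy objective (a quantity to be maximized).

    ratio:     probability ratio between the new and old policy for a token.
    advantage: advantage estimate for that token.
    epsilon:   standard clipping range, ratio is clipped to [1 - eps, 1 + eps].
    delta:     optional upper bound applied to the otherwise unclipped ratio.
    """
    # Without delta, a negative advantage combined with a very large ratio
    # makes the unclipped term arbitrarily negative.
    bounded_ratio = ratio if delta is None else min(ratio, delta)
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(bounded_ratio * advantage, clipped_ratio * advantage)


one_sided = clipped_objective(ratio=10.0, advantage=-1.0)
two_sided = clipped_objective(ratio=10.0, advantage=-1.0, delta=4.0)
print(one_sided, two_sided)
```

With a negative advantage and a ratio of 10, the one-sided objective is -10.0, while the bound delta=4.0 limits it to -4.0. For positive advantages the result is unchanged, because the epsilon clip already caps the term.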
New Models
- Qwen/Qwen3-Embedding-0.6B series. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_emb.sh
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B series. Best practices: https://mp.weixin.qq.com/s/-hhfGiiGTqXUybwPH525gw
- iic/QwenLong-L1-32B
- XiaomiMiMo/MiMo-7B-RL-0530 & XiaomiMiMo/MiMo-VL-7B-SFT series
- OpenBMB/MiniCPM4-0.5B series
What's Changed
- [grpo] code refactor by @hjh0119 in #4097
- support yarn by @tastelikefeet in #4197
- fix ppo init model by @hjh0119 in #4199
- fix ppo reward model by @hjh0119 in #4200
- [doc] remove vllm version warning in grpo by @hjh0119 in #4204
- [grpo] fix colocate + tp by @hjh0119 in #4209
- Refactor packing by @Jintao-Huang in #4207
- [grpo] set system in inputs by @hjh0119 in #4214
- fix mm packing by @Jintao-Huang in #4217
- fix packing multi_node by @Jintao-Huang in #4222
- fix get reward model by @hjh0119 in #4225
- fix val_dataset_shuffle by @Jintao-Huang in #4226
- fix task type judgement in rlhf by @hjh0119 in #4228
- fix eval extral args by @Yunnglin in #4227
- fix loss_scale by @Jintao-Huang in #4229
- update docs by @Jintao-Huang in #4235
- [rlhf] prepare_model for ref_model & reduce peak memory in dpo by @hjh0119 in #4232
- fix qwen2_5_vl VIDEO_TOTAL_PIXELS by @Jintao-Huang in #4236
- Support super long length sft by @tastelikefeet in #4237
- compat transformers 4.52 by @Jintao-Huang in #4238
- update liger_kernel docs by @Jintao-Huang in #4241
- [grpo] support synchronizing ref model by @hjh0119 in #4242
- optimize packing io by @Jintao-Huang in #4244
- fix register_post_encode_hook by @Jintao-Huang in #4247
- compat megatron-core 0.11 by @Jintao-Huang in #4250
- fix qwen2_5_omni by @Jintao-Huang in #4253
- fix readme by @Jintao-Huang in #4256
- [grpo] set v1 engine as default in external rollout by @hjh0119 in #4258
- fix ddp_timeout by @Jintao-Huang in #4259
- Add tqdm by @Jintao-Huang in #4260
- Fix is_master by @Jintao-Huang in #4262
- fix ppo zero3 by @Jintao-Huang in #4263
- test link valid by @Jintao-Huang in #4265
- update docs & fix quant by @Jintao-Huang in #4268
- [grpo] fix external mode&multi turn by @hjh0119 in #4255
- fix ulysses eval by @tastelikefeet in #4271
- support IndexedDataset shard by @Jintao-Huang in #4269
- Support vit_lr aligner_lr by @Jintao-Huang in #4273
- support padding_free CPT/SFT by @Jintao-Huang in #4274
- [grpo] fix num of reward_model > 1 by @hjh0119 in #4287
- fix n > 1 with vLLM V1 Engine by @hjh0119 in #4295
- update load_args by @Jintao-Huang in #4296
- update swift image by @Jintao-Huang in #4309
- Fix ulysses pending by @tastelikefeet in https://github...
v3.4.1.post1
Full Changelog: v3.4.1...v3.4.1.post1
v3.4.1
English Version
New Features
- Sequence Parallelism: Supports the use of Ulysses sequence parallelism during PT/SFT/DPO stages. Compatible with training techniques such as DeepSpeed, packing, flash_attn, and streaming. Refer to the training script here.
- GRPO: Supports custom reward model logic. Includes a built-in example of a generative reward model. Refer to the training script here.
- Megatron-SWIFT: Updated megatron-core to version 0.12.0. Added the max_epochs parameter to stop training and save weights when the epoch reaches max_epochs. Added the wandb parameter to log training metrics.
- Best Practices: Added best practices for quickly training vision-language models from scratch. Refer to the guide here.
- External Contributions: Supports GRPO using judge0 for executing generated code. Allows specifying freeze/activate parameters using regular expressions. Supports defining initialization strategies for uninitialized parameters in the initial model. Thanks to the contributions from the technical team at China Merchants Bank.
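The regex-based freeze/activate feature mentioned above can be pictured as a filter over parameter names. The sketch below is a generic illustration, not the ms-swift implementation: the function name, the argument names freeze_regex and activate_regex, the precedence rule (activate overrides freeze), and the example parameter names are all assumptions.

```python
import re
from typing import Dict, Iterable, Optional


def select_trainable(param_names: Iterable[str],
                     freeze_regex: Optional[str] = None,
                     activate_regex: Optional[str] = None) -> Dict[str, bool]:
    """Map each parameter name to True (trainable) or False (frozen).

    Every parameter starts as trainable. Names matching freeze_regex are
    frozen; names matching activate_regex are then re-enabled.
    """
    trainable = {}
    for name in param_names:
        flag = True
        if freeze_regex is not None and re.search(freeze_regex, name):
            flag = False
        if activate_regex is not None and re.search(activate_regex, name):
            flag = True
        trainable[name] = flag
    return trainable


# Hypothetical parameter names for a vision-language model.
names = [
    "visual.blocks.0.attn.qkv.weight",
    "visual.merger.mlp.0.weight",
    "model.layers.0.self_attn.q_proj.weight",
]
result = select_trainable(names, freeze_regex=r"^visual\.",
                          activate_regex=r"^visual\.merger\.")
print(result)
```

Here the vision tower is frozen except for its merger, and the language model layers stay trainable. With a real PyTorch model, the resulting flags would typically be applied by setting requires_grad on each named parameter.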
New Models
- XiaomiMiMo/MiMo-7B-RL series
- deepseek-ai/DeepSeek-Prover-V2-7B series
- OpenGVLab/InternVL3-1B-Pretrained series
What's Changed
- Fix grpo eval when gas > 1 by @hjh0119 in #4057
- support qwen3-moe awq by @Jintao-Huang in #4059
- Support empty think loss scale by @Jintao-Huang in #4065
- fix packing eval streaming by @Jintao-Huang in #4066
- support MiMo-7B by @Jintao-Huang in #4067
- fix padding_side left by @Jintao-Huang in #4069
- feat: add run name support by @firefighter-eric in #4072
- feat: support megatron wandb by @firefighter-eric in #4074
- update docs by @Jintao-Huang in #4078
- Support ulysses for llm/mllm,dpo/sft by @tastelikefeet in #4085
- fix enable_cache by @Jintao-Huang in #4091
- Update liger code by @tastelikefeet in #4095
- support max_epochs by @Jintao-Huang in #4102
- [megatron] Update long text shell by @Jintao-Huang in #4106
- fix requirements by @Jintao-Huang in #4108
- fix enable_cache by @Jintao-Huang in #4109
- fix packing by @Jintao-Huang in #4113
- Fix ulysses eval by @tastelikefeet in #4114
- fix omni aligner by @Jintao-Huang in #4117
- fix sequence_parallel by @Jintao-Huang in #4122
- update qwen3 more models by @Jintao-Huang in #4123
- [grpo] fix labels pop and peftmodel parameter check by @hjh0119 in #4136
- [megatron] support max_epochs by @Jintao-Huang in #4125
- grpo code reward by judge0 by @kevssim in #4140
- Feature freezing/activating parameters via regex by @lincq2000 in #4143
- Support init parameters by @lincq2000 in #4141
- fix ulysses dpo by @tastelikefeet in #4149
- Fix bugs by @Jintao-Huang in #4150
- fix init parameters by @lincq2000 in #4148
- Add sp script by @tastelikefeet in #4154
- Add more evaluation args by @Yunnglin in #4155
- update readme by @Jintao-Huang in #4157
- Support ulysses streaming by @tastelikefeet in #4160
- [megatron]Support packing & CP by @Jintao-Huang in #4163
- support internvl3 pretrain instruct by @Jintao-Huang in #4164
- [grpo] support gen rm by @hjh0119 in #4151
- [grpo] fix multi modal doc by @hjh0119 in #4124
- fix _tp_plan by @Jintao-Huang in #4167
- [doc] VL model training best practice by @hjh0119 in #4168
- fix val_dataset streaming packing by @Jintao-Huang in #4172
- fix kto by @tastelikefeet in #4180
- fix max_length by @Jintao-Huang in #4178
- fix loss_scale by @Jintao-Huang in #4183
- support deepseek_prover_v2 by @Jintao-Huang in #4184
- update docs by @Jintao-Huang in #4189
New Contributors
- @firefighter-eric made their first contribution in #4072
- @kevssim made their first contribution in #4140
- @lincq2000 made their first contribution in #4143
Full Changelog: v3.4.0...v3.4.1
v3.4.0
English Version
New Features
- Support for Megatron training (CPT/SFT) of Qwen3/Qwen2-MoE/Qwen3-MoE, with training speeds nearly 10 times faster on MoE models compared to the Transformers implementation. For best practices on Qwen3-MoE training, refer to: #4030
New Models
- Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B series
- Qwen/Qwen2.5-Omni-3B
What's Changed
- 🐛 fix: fix reward model train seq_cls by @gaohongkui in #3921
- Support vllm quantization by @tastelikefeet in #4003
- [megatron] Support Qwen3 by @Jintao-Huang in #3995
- Fix merge sentence transformers by @tastelikefeet in #4011
- Fix gte training and compatible with ds3 by @tastelikefeet in #4022
- fix truncation_strategy by @Jintao-Huang in #4025
- [Megatron] support MoE (Qwen2-Moe & Qwen3-MoE) by @Jintao-Huang in #4012
- Support Qwen3 series by @Jintao-Huang in #4029
- fix bugs by @Jintao-Huang in #4031
- fix grpo resume_from_checkpoint by @Jintao-Huang in #4035
- support qwen3_self_cognition by @Jintao-Huang in #4039
- Update readme & fix generate by @Jintao-Huang in #4041
- update wechat by @tastelikefeet in #4047
- support Qwen2.5-Omni-3B by @Jintao-Huang in #4052
- updates GRPOTrainer compatible with trl 0.17 by @hjh0119 in #3969
- fix rollout by @hjh0119 in #4055
New Contributors
- @gaohongkui made their first contribution in #3921
Full Changelog: v3.3.1...v3.4.0
v3.3.1
English Version
New Features
- The Agent training and deployment module introduces agent templates, with more than 10 available, including hermes, glm4_0414, and llama4. With these templates, the same agent dataset can be used to train different models, so switching models does not require reformatting the data. For documentation, refer to here.
- GRPO training now supports calling an external vLLM server, allowing for more flexible allocation of GPU memory during training and deployment. For the training script, refer to here.
New Models
- OpenGVLab/InternVL3-1B series
- moonshotai/Kimi-VL-A3B-Instruct series
- ZhipuAI/GLM-4-9B-0414, ZhipuAI/GLM-Z1-9B-0414 series
What's Changed
- Fix sampling and rft by @tastelikefeet in #3847
- Fix incorrect retry count check in LazyLLMDataset.getitem by @IamLihua in #3845
- support internvl3 by @hjh0119 in #3842
- fix grpo filter overlong by @hjh0119 in #3844
- dapo-bug by @Evilxya in #3846
- support agent packing by @Jintao-Huang in #3853
- Fix internvl2.5/3 deepspeed packing by @Jintao-Huang in #3855
- fix multimodal target_modules by @Jintao-Huang in #3856
- Fix multimodal target modules by @Jintao-Huang in #3858
- Update FAQ by @slin000111 in #3841
- fix grpo completion length equal zero by @hjh0119 in #3857
- support val_dataset_shuffle by @Jintao-Huang in #3860
- Update swift docker by @Jintao-Huang in #3866
- fix citest & minimax link by @Jintao-Huang in #3868
- fix grpo save checkpoint by @hjh0119 in #3865
- support glm4-z1 by @hjh0119 in #3862
- add paper link by @tastelikefeet in #3886
- refactor mm target_regex (compat peft/vllm) by @Jintao-Huang in #3879
- Support kimi-vl by @Jintao-Huang in #3884
- Fix glm4 z1 by @Jintao-Huang in #3889
- fix bugs by @Jintao-Huang in #3893
- fix typealias to be compatible with Python 3.9 by @hjh0119 in #3895
- Fix ui by @tastelikefeet in #3903
- Fix fp16 bf16 by @Jintao-Huang in #3909
- add rm center_rewards_coefficient argument by @hjh0119 in #3917
- revert swift_from_pretrained by @Jintao-Huang in #3914
- fix grpo doc by @hjh0119 in #3920
- update qwen2_5_omni by @Jintao-Huang in #3908
- Support qwen3 by @Jintao-Huang in #3945
- Decouple vLLM engine and GRPOTrainer. by @hjh0119 in #3911
- Refactor Agent Template by @Jintao-Huang in #3918
- update docs by @Jintao-Huang in #3961
- fix bugs by @Jintao-Huang in #3962
- Support hermes loss_scale by @Jintao-Huang in #3963
- fix parse tools by @Jintao-Huang in #3975
- Update unsloth compatibility by @tastelikefeet in #3970
- Fix qwen2.5-omni use_audio_in_video by @Jintao-Huang in #3987
- Fix web-ui by @tastelikefeet in #3997
- fix get_toolcall & fix ci by @Jintao-Huang in #3999
- fix bugs by @Jintao-Huang in #4001
- fix seq_cls by @Jintao-Huang in #4002
New Contributors
Full Changelog: v3.3.0...v3.3.1