
MT-MegatronLM

MT-MegatronLM is an extension to Megatron-LM. It enables Megatron-LM to run large-scale distributed training on Moore Threads GPUs by applying a set of hardware-dependent patches.
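One common way such hardware patch layers are applied (stated here as an assumption, not a description of MT-MegatronLM's internals) is runtime monkey-patching: an already-imported attribute is rebound to a hardware-aware replacement before training starts. The sketch below is illustrative only; the names are hypothetical and not taken from this repository.

```python
# Illustrative monkey-patching sketch; names are hypothetical, not MT-MegatronLM code.
import types

# Stand-in for a module that hard-codes a CUDA device string.
backend = types.SimpleNamespace(device_type=lambda: "cuda")

def musa_device_type() -> str:
    """Replacement that targets Moore Threads' MUSA backend."""
    return "musa"

# The "patch": rebind the attribute so every later caller sees the new behavior.
backend.device_type = musa_device_type

assert backend.device_type() == "musa"
```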

Installation

Create a directory named megatron_dev and use the commands below to clone Megatron-LM and MT-MegatronLM into it.

```bash
# Megatron-LM
git clone https://github.com/NVIDIA/Megatron-LM.git
pushd Megatron-LM
git checkout -b core_r0.9.0 core_r0.9.0
popd

# megatron-lm-musa-patch
git clone https://github.com/MooreThreads/MT-MegatronLM.git
pushd MT-MegatronLM
popd
```

Getting started

Llama3

```bash
cd MT-MegatronLM/examples/llama3
bash dist_run_pretrain_megatron_llama3_musa.sh
```

Mixtral

```bash
cd MT-MegatronLM/examples/mixtral
bash dist_run_pretrain_megatron_llama3_musa.sh
```

LLAVA

```bash
cd MT-MegatronLM/examples/llava
```

DeepSeekV3

```bash
cd MT-MegatronLM/examples/deepseekv3
```

In DeepSeek-V2/V3, the FFN hidden size of the first few dense layers differs from the MoE FFN hidden size, so some code in Megatron-LM needs to be modified to support this case when GroupedGEMM is not used.

Modify the following code in Megatron-LM:

Megatron-LM/megatron/core/transformer/mlp.py

Add at line 63:

```python
if is_expert:
    ffn_hidden_size = self.config.moe_ffn_hidden_size
```

Change line 83:

```diff
-            self.config.ffn_hidden_size,
+            self.config.ffn_hidden_size if not is_expert else self.config.moe_ffn_hidden_size,
```

Megatron-LM/megatron/core/transformer/moe/experts.py

Comment out lines 757-760:

```python
        # assert (
        #     self.config.moe_ffn_hidden_size == self.config.ffn_hidden_size
        # ), "Please use GroupedMLP or TEGroupedMLP when moe_ffn_hidden_size is \
        #         different from ffn_hidden_size"
```
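To make the intent of these edits concrete, here is a minimal standalone sketch (plain Python, not Megatron-LM code) of the size selection the patched mlp.py performs. The numeric widths are illustrative DeepSeek-V3-style values and are assumptions, not values read from any config in this repository.

```python
# Standalone illustration of the patched selection logic; not Megatron-LM code.
from dataclasses import dataclass

@dataclass
class Config:
    ffn_hidden_size: int = 18432      # dense-layer FFN width (illustrative assumption)
    moe_ffn_hidden_size: int = 2048   # per-expert FFN width (illustrative assumption)

def select_ffn_hidden_size(config: Config, is_expert: bool) -> int:
    # Mirrors the mlp.py edit: expert MLPs use the MoE width,
    # the first dense layers keep the original FFN width.
    return config.moe_ffn_hidden_size if is_expert else config.ffn_hidden_size

if __name__ == "__main__":
    cfg = Config()
    print(select_ffn_hidden_size(cfg, is_expert=False))  # 18432 (dense layer)
    print(select_ffn_hidden_size(cfg, is_expert=True))   # 2048  (expert MLP)
```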

Acknowledgement

This project adapts some training scripts from FlagScale.
