MT-MegatronLM is an extension of Megatron-LM. It enables Megatron-LM to perform large-scale distributed training on Moore Threads GPUs through a set of hardware-dependent patches.
You can create a directory named megatron_dev and use the commands below to clone Megatron-LM and MT-MegatronLM into it.
```bash
# create the working directory
mkdir -p megatron_dev && cd megatron_dev

# Megatron-LM
git clone https://github.com/NVIDIA/Megatron-LM.git
pushd Megatron-LM
git checkout -b core_r0.9.0 core_r0.9.0
popd

# megatron-lm-musa-patch
git clone https://github.com/MooreThreads/MT-MegatronLM.git
pushd MT-MegatronLM
popd
```
## Getting started

### Llama3

```bash
cd MT-MegatronLM/examples/llama3
bash dist_run_pretrain_megatron_llama3_musa.sh
```

### Mixtral

```bash
cd MT-MegatronLM/examples/mixtral
bash dist_run_pretrain_megatron_llama3_musa.sh
```

### LLaVA

```bash
cd MT-MegatronLM/examples/llava
```

### DeepSeek-V3

```bash
cd MT-MegatronLM/examples/deepseekv3
```
In DeepSeek-V2/V3, the FFN hidden size of the first several dense layers differs from the MoE FFN hidden size, so a few changes to Megatron-LM are needed to support this case when grouped GEMM is not used.
In `Megatron-LM/megatron/core/transformer/mlp.py`, add at line 63:

```python
if is_expert:
    ffn_hidden_size = self.config.moe_ffn_hidden_size
```

and change line 83 from

```python
self.config.ffn_hidden_size,
```

to

```python
self.config.ffn_hidden_size if not is_expert else self.config.moe_ffn_hidden_size,
```
In `Megatron-LM/megatron/core/transformer/moe/experts.py`, comment out lines 757-760:

```python
# assert (
#     self.config.moe_ffn_hidden_size == self.config.ffn_hidden_size
# ), "Please use GroupedMLP or TEGroupedMLP when moe_ffn_hidden_size is \
#     different from ffn_hidden_size"
```
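
For reference, here is a minimal, self-contained sketch of the size-selection logic that these edits introduce. The `MLPConfig` dataclass, the `select_ffn_hidden_size` helper, and the example sizes are illustrative stand-ins, not actual Megatron-LM code:

```python
# Hypothetical illustration: dense MLPs keep ffn_hidden_size, while
# expert MLPs use moe_ffn_hidden_size, which differs in DeepSeek-V2/V3.
from dataclasses import dataclass


@dataclass
class MLPConfig:
    ffn_hidden_size: int      # FFN size of the dense layers
    moe_ffn_hidden_size: int  # FFN size of the expert (MoE) layers


def select_ffn_hidden_size(config: MLPConfig, is_expert: bool) -> int:
    # Mirrors the edits above: expert MLPs use moe_ffn_hidden_size
    # instead of asserting that the two sizes are equal.
    return config.moe_ffn_hidden_size if is_expert else config.ffn_hidden_size


if __name__ == "__main__":
    cfg = MLPConfig(ffn_hidden_size=18432, moe_ffn_hidden_size=2048)  # illustrative sizes
    print(select_ffn_hidden_size(cfg, is_expert=False))  # 18432 -> dense layer
    print(select_ffn_hidden_size(cfg, is_expert=True))   # 2048  -> expert layer
```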
This project adapts some of its training scripts from FlagScale.