Update Axolotl Examples #2502

Merged 2 commits on Apr 17, 2025
2 changes: 1 addition & 1 deletion docs/examples.md
@@ -62,7 +62,7 @@ hide:
</h3>

<p>
Fine-tune Llama 3 on a custom dataset using Axolotl.
Fine-tune Llama 4 on a custom dataset using Axolotl.
</p>
</a>

24 changes: 18 additions & 6 deletions examples/accelerators/amd/README.md
@@ -161,13 +161,18 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by

```yaml
type: task
# The name is optional, if not specified, generated randomly
name: axolotl-amd-llama31-train

# Using RunPod's ROCm Docker image
image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
# Required environment variables
env:
- HF_TOKEN
- WANDB_API_KEY
- WANDB_PROJECT

Contributor

How is WANDB_API_KEY not enough?

Collaborator Author

No, we need to set WANDB_PROJECT and WANDB_NAME.

The difference is that on our current master it is set in the config file, while in this PR we pass it as an argument. When we pass it as an argument, we do not need to include the config.yaml in our repo.
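
For illustration, a minimal sketch of the two approaches (file names and values here are assumptions, not the exact contents of this repo):

```yaml
# Option A (current master): W&B settings live inside the Axolotl config file
# committed to the repo, e.g. examples/fine-tuning/axolotl/config.yaml
wandb_project: my-project               # hardcoded in the config
wandb_name: axolotl-amd-llama31-train

# Option B (this PR): no config.yaml in the repo; the values come from the
# task's environment variables and are passed on the command line
# commands:
#   - accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
#     --wandb-project "$WANDB_PROJECT"
#     --wandb-name "$WANDB_NAME"
```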

Contributor

Okay, then at least let's hardcode the value of WANDB_NAME, e.g. to axolotl-amd-llama31-train. If the user wants, they can change it.

Contributor

BTW, this is another use case where we could set it to $DSTACK_RUN_NAME.
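
A minimal sketch of that suggestion, assuming `DSTACK_RUN_NAME` is exposed to the task at runtime (worth double-checking against the dstack docs):

```yaml
env:
  - HF_TOKEN
  - WANDB_API_KEY
  - WANDB_PROJECT
commands:
  # Reuse the dstack run name as the W&B run name instead of hardcoding it
  - export WANDB_NAME=$DSTACK_RUN_NAME
  # ...rest of the training commands, passing --wandb-name "$WANDB_NAME"
```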

- WANDB_NAME=axolotl-amd-llama31-train
- HUB_MODEL_ID
# Commands of the task
commands:
- export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -177,6 +182,9 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
- cd axolotl
- git checkout d4f6c65

Contributor

Why this particular revision??

Collaborator Author

xformers is incompatible with ROCm. Axolotl recommends applying the workaround described in this link.

In revision d4f6c65, the workaround is already implemented. This is also how ROCm builds its Axolotl image. link

Contributor

Then a comment is needed, I suppose.

Collaborator Author

Yes. I will update it accordingly.

- pip install -e .
# Latest pynvml is not compatible with axolotl commit d4f6c65, so we need to fall back to version 11.5.3
- pip uninstall pynvml -y
- pip install pynvml==11.5.3

Contributor

Should we add a note or at least a comment on it?

Collaborator Author

Yes. I will add.

- cd ..
- wget https://dstack-binaries.s3.amazonaws.com/flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl
- pip install flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl
@@ -190,18 +198,18 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
- make
- pip install .
- cd ..
- accelerate launch -m axolotl.cli.train axolotl/examples/llama-3/fft-8b.yaml

# Uncomment to leverage spot instances
#spot_policy: auto
- accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
--wandb-project "$WANDB_PROJECT"
--wandb-name "$WANDB_NAME"
--hub-model-id "$HUB_MODEL_ID"

resources:
gpu: MI300X
disk: 150GB
```
</div>

Note, to support ROCm, we need to checkout to commit `d4f6c65`. You can find the installation instruction in [rocm-blogs :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rocm-blogs/blob/release/blogs/artificial-intelligence/axolotl/src/Dockerfile.rocm){:target="_blank"}.
Note that to support ROCm, we need to check out commit `d4f6c65`. This commit eliminates the need to manually modify the Axolotl source code to make xformers compatible with ROCm, as described in the [xformers workaround :material-arrow-top-right-thin:{ .external }](https://docs.axolotl.ai/docs/amd_hpc.html#apply-xformers-workaround). The same installation approach is used to build the Axolotl ROCm Docker image [(see Dockerfile) :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rocm-blogs/blob/release/blogs/artificial-intelligence/axolotl/src/Dockerfile.rocm){:target="_blank"}.

> To speed up installation of `flash-attention` and `xformers`, we use pre-built binaries uploaded to S3.
> You can find the tasks that build and upload the binaries
@@ -216,6 +224,10 @@ cloud resources and run the configuration.

```shell
$ HF_TOKEN=...
$ WANDB_API_KEY=...
$ WANDB_PROJECT=...
$ WANDB_NAME=axolotl-amd-llama31-train
$ HUB_MODEL_ID=...
$ dstack apply -f examples/deployment/vllm/amd/.dstack.yml
```

18 changes: 10 additions & 8 deletions examples/fine-tuning/axolotl/.dstack.yaml
@@ -1,23 +1,25 @@
type: task
# The name is optional, if not specified, generated randomly
name: axolotl-train
name: axolotl-nvidia-llama-scout-train

# Using the official Axolotl's Docker image
image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
image: axolotlai/axolotl:main-latest

# Required environment variables
env:
- HF_TOKEN
- WANDB_API_KEY
- WANDB_PROJECT
- WANDB_NAME=axolotl-nvidia-llama-scout-train
- HUB_MODEL_ID
# Commands of the task
commands:
- accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml
- wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-4/scout-qlora-fsdp1.yaml
- axolotl train scout-qlora-fsdp1.yaml --wandb-project $WANDB_PROJECT --wandb-name $WANDB_NAME --hub-model-id $HUB_MODEL_ID

resources:
gpu:
# 24GB or more vRAM
memory: 24GB..
# Two or more GPU (required by FSDP)
count: 2..
# Two GPUs (required by FSDP)
gpu: H100:2
# Shared memory size for inter-process communication
shm_size: 24GB
disk: 500GB..
40 changes: 22 additions & 18 deletions examples/fine-tuning/axolotl/README.md
@@ -1,7 +1,7 @@
# Axolotl

This example shows how to use [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/OpenAccess-AI-Collective/axolotl){:target="_blank"}
with `dstack` to fine-tune Llama3 8B using FSDP and QLoRA.
with `dstack` to fine-tune 4-bit Quantized [Llama-4-Scout-17B-16E :material-arrow-top-right-thin:{ .external }](https://huggingface.co/axolotl-quants/Llama-4-Scout-17B-16E-Linearized-bnb-nf4-bf16){:target="_blank"} using FSDP and QLoRA.

??? info "Prerequisites"
Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
@@ -18,44 +18,45 @@ with `dstack` to fine-tune Llama3 8B using FSDP and QLoRA.

## Training configuration recipe

Axolotl reads the model, LoRA, and dataset arguments, as well as trainer configuration from a YAML file. This file can
be found at [`examples/fine-tuning/axolotl/config.yaml` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/config.yaml){:target="_blank"}.
You can modify it as needed.
Axolotl reads the model, QLoRA, and dataset arguments, as well as the trainer configuration, from a [`scout-qlora-fsdp1.yaml` :material-arrow-top-right-thin:{ .external }](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/llama-4/scout-qlora-fsdp1.yaml){:target="_blank"} file. The configuration uses the 4-bit Axolotl-quantized version of `meta-llama/Llama-4-Scout-17B-16E`, requiring only ~43GB of VRAM per GPU with a 4K context length.
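
For orientation, a rough sketch of the kind of fields such an Axolotl config contains; the actual `scout-qlora-fsdp1.yaml` in the Axolotl repo is authoritative, and the values below are illustrative assumptions only:

```yaml
# Illustrative sketch, not the real scout-qlora-fsdp1.yaml
base_model: axolotl-quants/Llama-4-Scout-17B-16E-Linearized-bnb-nf4-bf16  # pre-quantized base model
load_in_4bit: true              # QLoRA keeps base weights in 4-bit NF4
adapter: qlora                  # train low-rank adapters instead of full weights
lora_r: 32
lora_alpha: 64
sequence_len: 4096              # ~4K context, matching the ~43GB VRAM/GPU figure
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 2e-4
datasets:
  - path: tatsu-lab/alpaca      # placeholder dataset for illustration
    type: alpaca
fsdp:
  - full_shard                  # shard parameters and optimizer state across the GPUs
  - auto_wrap
```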

> Before you proceed with training, make sure to update the `hub_model_id` in [`examples/fine-tuning/axolotl/config.yaml` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/alignment-handbook/config.yaml){:target="_blank"}
> with your HuggingFace username.

## Single-node training

The easiest way to run a training script with `dstack` is by creating a task configuration file.
This file can be found at [`examples/fine-tuning/axolotl/train.dstack.yml` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/train.dstack.yml){:target="_blank"}.
This file can be found at [`examples/fine-tuning/axolotl/.dstack.yml` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/.dstack.yaml){:target="_blank"}.

<div editor-title="examples/fine-tuning/axolotl/.dstack.yml">

```yaml
type: task
name: axolotl-train
# The name is optional, if not specified, generated randomly
name: axolotl-nvidia-llama-scout-train

# Using the official Axolotl's Docker image
image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
image: axolotlai/axolotl:main-latest

# Required environment variables
env:
- HF_TOKEN
- WANDB_API_KEY
- WANDB_PROJECT
- WANDB_NAME=axolotl-nvidia-llama-scout-train
- HUB_MODEL_ID
# Commands of the task
commands:
- accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml

# Uncomment to leverage spot instances
#spot_policy: auto
- wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-4/scout-qlora-fsdp1.yaml
- axolotl train scout-qlora-fsdp1.yaml
--wandb-project $WANDB_PROJECT
--wandb-name $WANDB_NAME
--hub-model-id $HUB_MODEL_ID

resources:
gpu:
# 24GB or more vRAM
memory: 24GB..
# Two or more GPU
count: 2..
# Two GPUs (required by FSDP)
gpu: H100:2
# Shared memory size for inter-process communication
shm_size: 24GB
disk: 500GB..
```

</div>
@@ -75,6 +76,9 @@ cloud resources and run the configuration.
```shell
$ HF_TOKEN=...
$ WANDB_API_KEY=...
$ WANDB_PROJECT=...
$ WANDB_NAME=axolotl-nvidia-llama-scout-train
$ HUB_MODEL_ID=...
$ dstack apply -f examples/fine-tuning/axolotl/.dstack.yml
```

14 changes: 11 additions & 3 deletions examples/fine-tuning/axolotl/amd/.dstack.yml
@@ -1,12 +1,14 @@
type: task
# The name is optional, if not specified, generated randomly
name: axolotl-amd-llama31-train

image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04

# Required environment variables
env:
- HF_TOKEN
- WANDB_API_KEY
- WANDB_PROJECT
- WANDB_NAME=axolotl-amd-llama31-train
- HUB_MODEL_ID
# Commands of the task
commands:
- export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -16,6 +18,9 @@ commands:
- cd axolotl
- git checkout d4f6c65
- pip install -e .
# Latest pynvml is not compatible with axolotl commit d4f6c65, so we need to fall back to version 11.5.3
- pip uninstall pynvml -y
- pip install pynvml==11.5.3
- cd ..
- wget https://dstack-binaries.s3.amazonaws.com/flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl
- pip install flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl
@@ -29,7 +34,10 @@ commands:
- make
- pip install .
- cd ..
- accelerate launch -m axolotl.cli.train axolotl/examples/llama-3/fft-8b.yaml
- accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
--wandb-project "$WANDB_PROJECT"
--wandb-name "$WANDB_NAME"
--hub-model-id "$HUB_MODEL_ID"

resources:
gpu: MI300X