Commit e238c0d (1 parent: b0e159b)

Update Llama4 Readme with Axolotl fine-tuning example (#2545)

1 file changed: examples/llms/llama/README.md (+80 −3)

@@ -200,13 +200,90 @@ is available at `https://<run name>.<gateway domain>/`.

[//]: # (TODO: https://github.com/dstackai/dstack/issues/1777)

## Fine-tuning

Here's an example of FSDP and QLoRA fine-tuning of the 4-bit quantized [Llama-4-Scout-17B-16E :material-arrow-top-right-thin:{ .external }](https://huggingface.co/axolotl-quants/Llama-4-Scout-17B-16E-Linearized-bnb-nf4-bf16) on two NVIDIA H100 GPUs using [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/OpenAccess-AI-Collective/axolotl){:target="_blank"}.

<div editor-title="examples/fine-tuning/axolotl/.dstack.yml">

```yaml
type: task
# The name is optional; if not specified, it's generated randomly
name: axolotl-nvidia-llama-scout-train

# Use Axolotl's official Docker image
image: axolotlai/axolotl:main-latest

# Required environment variables
env:
  - HF_TOKEN
  - WANDB_API_KEY
  - WANDB_PROJECT
  - WANDB_NAME=axolotl-nvidia-llama-scout-train
  - HUB_MODEL_ID
# Commands of the task
commands:
  - wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-4/scout-qlora-fsdp1.yaml
  - axolotl train scout-qlora-fsdp1.yaml
      --wandb-project $WANDB_PROJECT
      --wandb-name $WANDB_NAME
      --hub-model-id $HUB_MODEL_ID

resources:
  # Two GPUs (required by FSDP)
  gpu: H100:2
  # Shared memory size for inter-process communication
  shm_size: 24GB
  # At least 500GB of disk
  disk: 500GB..
```

</div>

The task uses Axolotl's official Docker image, which has Axolotl pre-installed, and downloads Axolotl's ready-made `scout-qlora-fsdp1.yaml` training recipe (sketched below).
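
If you're curious what such a recipe configures, here's a minimal illustrative sketch of typical Axolotl QLoRA options. All values below are assumptions for illustration only, not the contents of the actual `scout-qlora-fsdp1.yaml`; refer to the file in the Axolotl repository for the real settings, including its FSDP configuration.

```yaml
# Illustrative sketch only — not the actual scout-qlora-fsdp1.yaml.
# The real recipe also configures FSDP sharding; see the Axolotl repo.
base_model: axolotl-quants/Llama-4-Scout-17B-16E-Linearized-bnb-nf4-bf16
load_in_4bit: true   # QLoRA: base weights stay quantized to 4 bits
adapter: qlora       # train low-rank adapters instead of the full model
lora_r: 16           # adapter rank (assumed value)
lora_alpha: 32       # adapter scaling factor (assumed value)
sequence_len: 4096   # training context length (assumed value)
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 2e-4
```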

### Memory requirements

Below are the approximate memory requirements for loading the model.
This excludes memory for the model context and CUDA kernel reservations.

| Model      | Size     | Full fine-tuning | LoRA  | QLoRA |
|------------|----------|------------------|-------|-------|
| `Behemoth` | **2T**   | 32TB             | 4.3TB | 1.3TB |
| `Maverick` | **400B** | 6.5TB            | 864GB | 264GB |
| `Scout`    | **109B** | 1.75TB           | 236GB | 72GB  |

The memory estimates assume FP16 precision for the base model weights, with low-rank adaptation (LoRA/QLoRA) layers comprising 1% of the total model parameters. The 16 bytes per trainable parameter roughly account for the weights themselves plus their gradients and optimizer states.

| Fine-tuning type | Calculation                                  |
|------------------|----------------------------------------------|
| Full fine-tuning | 2T × 16 bytes = 32TB                         |
| LoRA             | 2T × 2 bytes + 1% of 2T × 16 bytes = 4.3TB   |
| QLoRA (4-bit)    | 2T × 0.5 bytes + 1% of 2T × 16 bytes = 1.3TB |
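
As a sanity check, these formulas are easy to reproduce. Below is a rough back-of-the-envelope sketch (not part of the example's source code) that recomputes the table's estimates; it uses the same 1% trainable-parameter assumption as above, and the results round to the values in the tables.

```python
# Back-of-the-envelope estimates mirroring the formulas in the table above.
# Assumption (as in the text): trainable adapter params are ~1% of the model.

GB = 1e9

def estimate_gb(n_params: float, trainable_frac: float = 0.01) -> dict[str, float]:
    """Approximate load-time memory in GB for a model with n_params parameters."""
    full = n_params * 16                                     # weights + grads + optimizer states
    lora = n_params * 2 + trainable_frac * n_params * 16     # FP16 base + trained adapters
    qlora = n_params * 0.5 + trainable_frac * n_params * 16  # 4-bit base + trained adapters
    return {"full": full / GB, "lora": lora / GB, "qlora": qlora / GB}

for name, params in [("Behemoth", 2e12), ("Maverick", 400e9), ("Scout", 109e9)]:
    est = estimate_gb(params)
    print(f"{name:9s} full={est['full']:,.0f}GB  lora={est['lora']:,.0f}GB  qlora={est['qlora']:,.0f}GB")
```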

## Running a configuration

Once the configuration is ready, run `dstack apply -f <configuration file>`, and `dstack` will automatically provision
the cloud resources and run the configuration.

<div class="termy">
268+
269+
```shell
270+
$ HF_TOKEN=...
271+
$ WANDB_API_KEY=...
272+
$ WANDB_PROJECT=...
273+
$ WANDB_NAME=axolotl-nvidia-llama-scout-train
274+
$ HUB_MODEL_ID=...
275+
$ dstack apply -f examples/fine-tuning/axolotl/.dstack.yml
276+
```
277+
278+
</div>
279+
203280
## Source code

The source code for the deployment examples can be found in
[`examples/llms/llama` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/llms/llama), and the source code for the fine-tuning example can be found in [`examples/fine-tuning/axolotl` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl){:target="_blank"}.

## What's next?

1. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks),
   [services](https://dstack.ai/docs/services), and [protips](https://dstack.ai/docs/protips).
2. Browse [Llama 4 with SGLang :material-arrow-top-right-thin:{ .external }](https://github.com/sgl-project/sglang/blob/main/docs/references/llama4.md), [Llama 4 with vLLM :material-arrow-top-right-thin:{ .external }](https://blog.vllm.ai/2025/04/05/llama4.html), [Llama 4 with AMD :material-arrow-top-right-thin:{ .external }](https://rocm.blogs.amd.com/artificial-intelligence/llama4-day-0-support/README.html), and [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/OpenAccess-AI-Collective/axolotl){:target="_blank"}.
