Update Axolotl Examples #2502

Merged
merged 2 commits into from
Apr 17, 2025
Conversation

@Bihan (Collaborator) commented Apr 11, 2025

  1. Updated the Axolotl NVIDIA example with Llama 4 Scout.

  2. Resolved a module-not-found error for AMD.
     Error:
     ModuleNotFoundError: No module named 'pynvml.nvml'; 'pynvml' is not a package
     Solution: pin pynvml to the previous release:
     pip install pynvml==11.5.3

Updated Axolotl Nvidia Example with Llama 4 Scout

Update AMD axolotl example for dependency error
@Bihan force-pushed the update_axolotl_example branch from 0c9cb3b to 46c3a47 on April 11, 2025 10:18
@Bihan requested a review from peterschmidt85 on April 16, 2025 08:41
# Using RunPod's ROCm Docker image
image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
# Required environment variables
env:
- HF_TOKEN
- WANDB_API_KEY
- WANDB_PROJECT
Contributor

How is WANDB_API_KEY not enough?

Collaborator Author

No, we need to set WANDB_PROJECT and WANDB_NAME.

The difference is that on our current master it is set in the config file, while in this PR we pass it as an argument. When we pass it as an argument, we do not need to include the config.yaml in our repo.
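For context, the two approaches discussed here are baking W&B settings into the Axolotl config file versus supplying them at launch time. A hypothetical sketch of building the launch command with such overrides (the flag syntax is an assumption, not confirmed by this PR):

```python
def axolotl_cmd(config: str, overrides: dict[str, str]) -> list[str]:
    """Build a training command with key=value overrides appended as flags.

    Passing settings like wandb_project at launch time means the values
    do not have to live in a config.yaml checked into the repo.
    """
    cmd = ["axolotl", "train", config]
    for key, value in overrides.items():
        cmd.append(f"--{key}={value}")  # hypothetical override syntax
    return cmd


print(axolotl_cmd("llama4.yaml", {"wandb_project": "axolotl-amd"}))
```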

Contributor

Okay, then let's at least hardcode the value of WANDB_NAME, e.g. to axolotl-amd-llama31-train. If the user wants, they can change it.

Contributor

BTW, this is another use case where we could set it to $DSTACK_RUN_NAME.

@@ -177,6 +182,8 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
- cd axolotl
- git checkout d4f6c65
- pip install -e .
- pip uninstall pynvml -y
- pip install pynvml==11.5.3
Contributor

Should we add a note or at least a comment on it?

Collaborator Author

Yes. I will add.

@@ -177,6 +182,8 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
- cd axolotl
- git checkout d4f6c65
Contributor

Why this particular revision?

Collaborator Author

xformers is incompatible with ROCm. Axolotl suggests applying the workarounds described in this link.

In revision d4f6c65, the workaround is implemented; this is how ROCm builds the Axolotl image. link

Contributor

Then a comment is needed, I suppose.

Collaborator Author

Yes. I will update it accordingly.

@Bihan requested a review from peterschmidt85 on April 16, 2025 16:34
@peterschmidt85 (Contributor) left a comment

Please feel free to merge

@Bihan merged commit 99a88d3 into dstackai:master on Apr 17, 2025
24 checks passed