Update Axolotl Examples #2502
Conversation
Updated Axolotl NVIDIA example with Llama 4 Scout
Updated AMD Axolotl example to fix a dependency error
# Using RunPod's ROCm Docker image
image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
# Required environment variables
env:
  - HF_TOKEN
  - WANDB_API_KEY
  - WANDB_PROJECT
How is WANDB_API_KEY not enough?
No, we need to set WANDB_PROJECT and WANDB_NAME. The difference is that in our current master they are set in the config file, while in this PR we pass them as arguments. When we pass them as arguments, we don't need to include the config.yaml in our repo.
Okay, then let's at least hardcode the value of WANDB_NAME, e.g. to axolotl-amd-llama31-train. If the user wants, they can change it.
BTW, this is another use case where we could set it to $DSTACK_RUN_NAME.
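For illustration, a minimal sketch of how that could look in the task config; it assumes dstack exports DSTACK_RUN_NAME inside the container, and the env keys mirror the diff above:

env:
  - HF_TOKEN
  - WANDB_API_KEY
  - WANDB_PROJECT
commands:
  # Assumption: dstack sets DSTACK_RUN_NAME in the container environment,
  # so the W&B run name follows the dstack run name automatically.
  - export WANDB_NAME=$DSTACK_RUN_NAME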
@@ -177,6 +182,8 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
- cd axolotl
- git checkout d4f6c65
- pip install -e .
- pip uninstall pynvml -y
- pip install pynvml==11.5.3
Should we add a note or at least a comment on it?
Yes. I will add.
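For instance, the pin could be annotated roughly like this (the wording is only a suggestion, based on the error this PR fixes):

- pip install -e .
# Recent pynvml releases no longer ship the pynvml.nvml module that
# this Axolotl revision imports, so pin the last compatible release.
- pip uninstall pynvml -y
- pip install pynvml==11.5.3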
@@ -177,6 +182,8 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
- cd axolotl
- git checkout d4f6c65
Why this particular revision?
Then a comment is needed, I suppose.
Yes. I will update it accordingly.
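Something short would do; the thread doesn't state why d4f6c65 was chosen, so the wording below is only a placeholder:

- cd axolotl
# Pin Axolotl to a revision tested with this example
- git checkout d4f6c65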
Please feel free to merge
Updated Axolotl NVIDIA example with Llama 4 Scout.
Resolved a module-not-found error for AMD.

Error:
ModuleNotFoundError: No module named 'pynvml.nvml'; 'pynvml' is not a package

Solution: install the previous release of pynvml:
pip install pynvml==11.5.3
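Put together, the AMD example's setup commands end up roughly as follows (the clone URL is an assumption; the rest mirrors the diff above):

- git clone https://github.com/axolotl-ai-cloud/axolotl.git
- cd axolotl
- git checkout d4f6c65
- pip install -e .
# Pin pynvml to the last release that still ships the pynvml.nvml module
- pip uninstall pynvml -y
- pip install pynvml==11.5.3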