Initial tests for files and fine-tuning #51
Merged
9 commits
ef10e5f Initial tests for files and finetuning (orangetin)
30cd1ae Add dependency (orangetin)
572cba6 Add assertion for API key pre-test (orangetin)
4d6fc3b Add tests readme (orangetin)
fc44ba2 Fix linting (orangetin)
9516764 Add dependency to workflow (orangetin)
e4e0103 Add warning about test job charges and use upload new dataset for fin… (orangetin)
9cb17f2 Set url and save_path as parameters (orangetin)
fcdd833 Add upload_file function (orangetin)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# How to run tests | ||
> 🚧 Warning: test_finetune.py can take a while. Please have at least one prior successful finetuning run in your account for successful results. | ||
|
||
> 🚧 Please have enough space on disk to download your latest successful fine-tuned model's weights into the `tests` directory of this repo. All downloaded files will be deleted after successful test runs but may not be deleted if tests fail. | ||
|
||
> 🚧 Warning: This test will start 2 fine-tune jobs on small datasets from your account. You WILL be charged for the amount of one job on a 7B model. The second job will be cancelled soon after creation so you will likely not be charged for it. | ||
|
||
1. Clone the repo locally | ||
```bash | ||
git clone https://github.com/togethercomputer/together.git | ||
``` | ||
2. Change directory | ||
```bash | ||
cd together | ||
``` | ||
3. [Optional] Checkout the commit you'd like to test | ||
```bash | ||
git checkout COMMIT_HASH | ||
``` | ||
4. Install together package and dependencies | ||
```bash | ||
pip install . && pip install ".[tests]" | ||
``` | ||
5. Change directory into `tests` | ||
```bash | ||
cd tests | ||
``` | ||
6. Export API key | ||
```bash | ||
export TOGETHER_API_KEY=<API_KEY> | ||
``` | ||
7. Run pytest | ||
```bash | ||
pytest | ||
``` |
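Note on step 6: the API-key assertion in both test modules sits under `if __name__ == "__main__":`, so it only runs when a file is executed directly, not when step 7 invokes `pytest`. A minimal sketch of a check that could instead be called from a session-scoped fixture in a `conftest.py` follows. The helper name `get_api_key` and its `env` parameter are hypothetical and not part of this PR; only the `TOGETHER_API_KEY` variable name and the error message come from the diff.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Together API key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    key = env.get("TOGETHER_API_KEY", "").strip()
    if not key:
        # Same message the test modules use in their __main__ guard.
        raise RuntimeError(
            "No API key found, please run `export TOGETHER_API_KEY=<API_KEY>`"
        )
    return key
```

Failing early this way would keep a missing key from surfacing as a less obvious API error partway through the test run.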
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
import os | ||
from typing import Any, List | ||
|
||
import pytest | ||
import requests | ||
|
||
import together | ||
from together.utils import extract_time | ||
|
||
|
||
def test_upload() -> None: | ||
    url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_joke_explanations.jsonl" | ||
    save_path = "unified_joke_explanations.jsonl" | ||
    download_response = requests.get(url) | ||
|
||
    assert download_response.status_code == 200 | ||
|
||
    with open(save_path, "wb") as file: | ||
        file.write(download_response.content) | ||
|
||
    # upload file | ||
    response = together.Files.upload(save_path) | ||
|
||
    assert isinstance(response, dict) | ||
    assert response["filename"] == os.path.basename(save_path) | ||
    assert response["object"] == "file" | ||
|
||
    os.remove(save_path) | ||
|
||
|
||
def test_list() -> None: | ||
    response = together.Files.list() | ||
    assert isinstance(response, dict) | ||
    assert isinstance(response["data"], list) | ||
|
||
|
||
def test_retrieve() -> None: | ||
    # extract file id | ||
    files: List[Any] | ||
    files = together.Files.list()["data"] | ||
    files.sort(key=extract_time) | ||
    file_id = str(files[-1]["id"]) | ||
|
||
    response = together.Files.retrieve(file_id) | ||
    assert isinstance(response, dict) | ||
    assert isinstance(response["filename"], str) | ||
    assert isinstance(response["bytes"], int) | ||
    assert isinstance(response["Processed"], bool) | ||
    assert response["Processed"] is True | ||
|
||
|
||
def test_retrieve_content() -> None: | ||
    # extract file id | ||
    files: List[Any] | ||
    files = together.Files.list()["data"] | ||
    files.sort(key=extract_time) | ||
    file_id = str(files[-1]["id"]) | ||
|
||
    file_path = "retrieved_file.jsonl" | ||
|
||
    response = together.Files.retrieve_content(file_id, file_path) | ||
    print(response) | ||
    assert os.path.exists(file_path) | ||
    assert os.path.getsize(file_path) > 0 | ||
    os.remove(file_path) | ||
|
||
|
||
def test_delete() -> None: | ||
    # extract file id | ||
    files: List[Any] | ||
    files = together.Files.list()["data"] | ||
    files.sort(key=extract_time) | ||
    file_id = str(files[-1]["id"]) | ||
|
||
    # delete file | ||
    response = together.Files.delete(file_id) | ||
|
||
    # tests | ||
    assert isinstance(response, dict) | ||
    assert response["id"] == file_id | ||
    assert response["deleted"] == "true" | ||
|
||
|
||
if __name__ == "__main__": | ||
    assert ( | ||
        together.api_key | ||
    ), "No API key found, please run `export TOGETHER_API_KEY=<API_KEY>`" | ||
    pytest.main([__file__]) |
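The README in this PR warns that downloaded files may not be deleted if tests fail. In `test_upload` and `test_retrieve_content`, `os.remove` is the last statement, so a failing assertion skips it. One possible way to make the cleanup unconditional is sketched below using only the standard library. The context manager `removed_afterwards` is a hypothetical helper, not something this PR or the `together` package provides; pytest's built-in `tmp_path` fixture would be another option.

```python
import contextlib
import os
from typing import Iterator


@contextlib.contextmanager
def removed_afterwards(path: str) -> Iterator[str]:
    """Yield `path` and delete the file on exit, even if the body raises."""
    try:
        yield path
    finally:
        # A failing assertion inside the `with` body still reaches this point.
        if os.path.exists(path):
            os.remove(path)
```

Under this assumption, the body of `test_retrieve_content` could wrap its download and assertions in `with removed_afterwards("retrieved_file.jsonl") as file_path:` and drop the trailing `os.remove(file_path)`.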
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,206 @@ | ||
import os | ||
import time | ||
from typing import Any, Dict, List | ||
|
||
import pytest | ||
import requests | ||
|
||
import together | ||
from together.utils import parse_timestamp | ||
|
||
|
||
MODEL = "togethercomputer/llama-2-7b" | ||
N_EPOCHS = 1 | ||
N_CHECKPOINTS = 1 | ||
BATCH_SIZE = 32 | ||
LEARNING_RATE = 0.00001 | ||
SUFFIX = "pytest" | ||
|
||
CANCEL_TIMEOUT = 60 | ||
|
||
FT_STATUSES = [ | ||
    "pending", | ||
    "queued", | ||
    "running", | ||
    "cancel_requested", | ||
    "cancelled", | ||
    "error", | ||
    "completed", | ||
] | ||
|
||
|
||
def list_models() -> List[Any]: | ||
    model_list = together.Models.list() | ||
    model: Dict[str, Any] | ||
|
||
    finetunable_models = [] | ||
    for model in model_list: | ||
        if model.get("finetuning_supported"): | ||
            finetunable_models.append(model.get("name")) | ||
    return finetunable_models | ||
|
||
|
||
# Download, save, and upload dataset | ||
def upload_file( | ||
    url: str = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_joke_explanations.jsonl", | ||
    save_path: str = "unified_joke_explanations.jsonl", | ||
) -> str: | ||
    download_response = requests.get(url) | ||
|
||
    assert download_response.status_code == 200 | ||
|
||
    with open(save_path, "wb") as file: | ||
        file.write(download_response.content) | ||
|
||
    response = together.Files.upload(save_path) | ||
    os.remove(save_path) | ||
|
||
    assert isinstance(response, dict) | ||
    file_id = str(response["id"]) | ||
    return file_id | ||
|
||
|
||
def create_ft( | ||
    model: str, | ||
    n_epochs: int, | ||
    n_checkpoints: int, | ||
    batch_size: int, | ||
    learning_rate: float, | ||
    suffix: str, | ||
    file_id: str, | ||
) -> Dict[Any, Any]: | ||
    response = together.Finetune.create( | ||
        training_file=file_id, | ||
        model=model, | ||
        n_epochs=n_epochs, | ||
        n_checkpoints=n_checkpoints, | ||
        batch_size=batch_size, | ||
        learning_rate=learning_rate, | ||
        suffix=suffix, | ||
    ) | ||
    return response | ||
|
||
|
||
def test_create() -> None: | ||
    file_id = upload_file() | ||
    response = create_ft( | ||
        MODEL, N_EPOCHS, N_CHECKPOINTS, BATCH_SIZE, LEARNING_RATE, SUFFIX, file_id | ||
    ) | ||
|
||
    assert isinstance(response, dict) | ||
    assert response["training_file"] == file_id | ||
    assert response["model"] == MODEL | ||
    assert SUFFIX in str(response["model_output_name"]) | ||
|
||
|
||
def test_list() -> None: | ||
    response = together.Finetune.list() | ||
    assert isinstance(response, dict) | ||
    assert isinstance(response["data"], list) | ||
|
||
|
||
def test_retrieve() -> None: | ||
    ft_list = together.Finetune.list()["data"] | ||
    ft_list.sort(key=lambda x: parse_timestamp(x["created_at"])) | ||
    ft_id = ft_list[-1]["id"] | ||
    response = together.Finetune.retrieve(ft_id) | ||
|
||
    assert isinstance(response, dict) | ||
    assert str(response["training_file"]).startswith("file-") | ||
    assert str(response["id"]).startswith("ft-") | ||
|
||
|
||
def test_list_events() -> None: | ||
    ft_list = together.Finetune.list()["data"] | ||
    ft_list.sort(key=lambda x: parse_timestamp(x["created_at"])) | ||
    ft_id = ft_list[-1]["id"] | ||
    response = together.Finetune.list_events(ft_id) | ||
|
||
    assert isinstance(response, dict) | ||
    assert isinstance(response["data"], list) | ||
|
||
|
||
def test_status() -> None: | ||
    ft_list = together.Finetune.list()["data"] | ||
    ft_list.sort(key=lambda x: parse_timestamp(x["created_at"])) | ||
    ft_id = ft_list[-1]["id"] | ||
    response = together.Finetune.get_job_status(ft_id) | ||
|
||
    assert isinstance(response, str) | ||
    assert response in FT_STATUSES | ||
|
||
|
||
def test_download() -> None: | ||
    ft_list = together.Finetune.list()["data"] | ||
    ft_list.sort(key=lambda x: parse_timestamp(x["created_at"])) | ||
    ft_list.reverse() | ||
|
||
    # pick the most recent job that completed successfully | ||
    ft_id = None | ||
    for item in ft_list: | ||
        job_id = item["id"] | ||
        if together.Finetune.get_job_status(job_id) == "completed": | ||
            ft_id = job_id | ||
            break | ||
|
||
    # fail with a clear message if no model is available to download | ||
    assert ft_id is not None, "no completed fine-tune job found to download" | ||
|
||
    output_file = together.Finetune.download(ft_id) | ||
|
||
    assert os.path.exists(output_file) | ||
    assert os.path.getsize(output_file) > 0 | ||
|
||
    os.remove(output_file) | ||
|
||
|
||
def test_cancel() -> None: | ||
    cancelled = False | ||
    file_id = upload_file() | ||
    # create_ft returns only the response dict, so do not unpack it into two names | ||
    response = create_ft( | ||
        MODEL, N_EPOCHS, N_CHECKPOINTS, BATCH_SIZE, LEARNING_RATE, SUFFIX, file_id | ||
    ) | ||
    ft_id = response["id"] | ||
    response = together.Finetune.cancel(ft_id) | ||
|
||
    # loop to check if job was cancelled | ||
    start = time.time() | ||
    while time.time() - start < CANCEL_TIMEOUT: | ||
        status = together.Finetune.get_job_status(ft_id) | ||
        # accept "cancelled" too, in case the job leaves "cancel_requested" between polls | ||
        if status in ("cancel_requested", "cancelled"): | ||
            cancelled = True | ||
            break | ||
        time.sleep(1) | ||
|
||
    assert cancelled | ||
|
||
    # delete file after cancelling | ||
    together.Files.delete(file_id) | ||
|
||
|
||
def test_checkpoints() -> None: | ||
    ft_list = together.Finetune.list()["data"] | ||
    ft_list.sort(key=lambda x: parse_timestamp(x["created_at"])) | ||
    ft_list.reverse() | ||
|
||
    # pick the most recent job that completed successfully | ||
    ft_id = None | ||
    for item in ft_list: | ||
        job_id = item["id"] | ||
        if together.Finetune.get_job_status(job_id) == "completed": | ||
            ft_id = job_id | ||
            break | ||
|
||
    # fail with a clear message if no completed job is available | ||
    assert ft_id is not None, "no completed fine-tune job found to list checkpoints for" | ||
|
||
    response = together.Finetune.get_checkpoints(ft_id) | ||
|
||
    assert isinstance(response, list) | ||
|
||
|
||
if __name__ == "__main__": | ||
    assert ( | ||
        together.api_key | ||
    ), "No API key found, please run `export TOGETHER_API_KEY=<API_KEY>`" | ||
    pytest.main([__file__]) |
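The polling loop in `test_cancel` (check the job status once a second until `CANCEL_TIMEOUT` elapses) is a general pattern that could be factored into a small helper. The sketch below is one way to do that with only the standard library. `wait_for_status` and its parameters are hypothetical names, not part of this PR or the `together` package, and the status callable is injected so the helper can be exercised without any API calls.

```python
import time
from typing import Callable, Collection


def wait_for_status(
    get_status: Callable[[], str],
    targets: Collection[str],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll `get_status` until it returns a value in `targets` or `timeout` elapses.

    Returns True if a target status was seen, False on timeout.
    """
    # time.monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout
    while True:
        if get_status() in targets:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Possible use inside test_cancel (assumes `together`, `ft_id`, and
# CANCEL_TIMEOUT as defined in the test module above):
#
#   assert wait_for_status(
#       lambda: together.Finetune.get_job_status(ft_id),
#       {"cancel_requested", "cancelled"},
#       timeout=CANCEL_TIMEOUT,
#   )
```

The status is checked at least once before the deadline is consulted, so a job that is already in a target state passes even with a very short timeout.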