
[WIP] v1.0.0 Updates #71


Merged: 80 commits into main, Apr 1, 2024
Conversation

@orangetin (Member) commented Jan 31, 2024

To-Do:

  • Async client across all classes
  • Abstracted engine
  • Persistent session
  • Client class
  • Structured typing
  • Chat Completions support
  • Update CLI
  • Update price-estimate function
  • Update README and examples
  • Add unit tests
  • Add integration tests
  • Add pre-commit linter & update GitHub workflow
  • Replace requests/aiohttp with httpx
  • Allow strict pydantic typing
  • Add overloads for all API classes
  • Add timeout to header
  • Fix CLI comments and help hints
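
As a sketch of what "overloads for all API classes" with structured typing can look like, the stub below uses typing.overload so a type checker narrows the return type based on a stream flag. The response class and create function here are illustrative stand-ins, not the library's actual API:

```python
from typing import Iterator, Literal, Union, overload


# Hypothetical response type standing in for the library's pydantic models.
class ChatCompletionResponse:
    def __init__(self, text: str) -> None:
        self.text = text


@overload
def create(prompt: str, stream: Literal[False] = ...) -> ChatCompletionResponse: ...
@overload
def create(prompt: str, stream: Literal[True]) -> Iterator[ChatCompletionResponse]: ...


def create(
    prompt: str, stream: bool = False
) -> Union[ChatCompletionResponse, Iterator[ChatCompletionResponse]]:
    # Stub body: a real client would call the API here. When streaming,
    # yield one "chunk" per token; otherwise return a single response.
    if stream:
        return iter(ChatCompletionResponse(tok) for tok in prompt.split())
    return ChatCompletionResponse(prompt)
```

With this pattern, `create("hi")` type-checks as a single response while `create("hi", stream=True)` type-checks as an iterator, without a cast at the call site.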

Endpoints to support:

  • Completions
  • Chat Completions
  • Embeddings
  • Finetune
  • Files
  • Images
  • Models

Example usage:

import json
import os

import together

api_key = os.getenv("TOGETHER_API_KEY")

client = together.Together(api_key=api_key)

# Chat Completions
response = client.chat.completions.create(
    model="togethercomputer/llama-2-7b-chat",
    max_tokens=10,
    messages=[{"role": "user", "content": "hello there"}],
)
print(response.choices[0].message.content)

# Completions
response = client.completions.create(
    model="togethercomputer/llama-2-7b",
    max_tokens=10,
    prompt="hello there",
)
print(response.choices[0].text)

# Embeddings
response = client.embeddings.create(model="bert-base-uncased", input=["test"])
print(response.data[0].embedding)

# Fine Tuning
response = client.fine_tuning.create(
    training_file="file-6e432514-18e8-407d-b36e-ba904e4d4856",
    model="togethercomputer/llama-2-7b",
)
print(json.dumps(response.model_dump(), indent=4))

# Files
response = client.files.upload("unified_joke_explanations.jsonl")
print(json.dumps(response.model_dump(), indent=4))

# Images
response = client.images.generate(
    prompt="space robots",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    steps=10,
    n=4,
)
print(response.data[0].b64_json)

# Models
response = client.models.list()
print(response[0].id)

Updated contribution style

Setting up pre-commit for dev:

poetry install --with quality,tests
pre-commit install
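
The hook set itself isn't shown in this thread; a minimal .pre-commit-config.yaml for a setup like this (the repos and pinned revs below are assumptions for illustration, not this repo's actual config) might look like:

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.4
    hooks:
      - id: ruff
```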


linear bot commented Jan 31, 2024

ENG-900 Support messages (chat endpoint) in together python library

Multiple customers are confused because passing prompt through the together Python library sends the raw prompt as-is, whereas the preferred path through the REST API and the OpenAI package is messages, which applies prompt formatting.

Therefore, we want to support messages so that all 3 ways of using our inference API are consistent.

More context here: https://www.notion.so/together-docs/Prompt-template-discrepancy-proposal-a557d4fb7f5d49d59a9b79480e0926b9
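
The discrepancy is easiest to see with a sketch of what a chat template does. The function below is illustrative only (not the library's actual formatter), using the widely documented Llama-2 [INST] chat layout:

```python
def format_llama2_chat(messages: list[dict[str, str]]) -> str:
    """Illustrative Llama-2-style formatting: wrap each user turn in
    [INST] ... [/INST] and append assistant turns verbatim."""
    prompt = ""
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            prompt += f" {msg['content']} "
    return prompt


# With messages, the template is applied for the caller:
formatted = format_llama2_chat([{"role": "user", "content": "hello there"}])
print(formatted)  # [INST] hello there [/INST]
```

With a raw prompt, the caller would have to build (and keep in sync) this model-specific string themselves, which is exactly what trips people up.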

@orangetin orangetin changed the title [WIP] v0.3 Updates [WIP] v1.0.0 Updates Feb 22, 2024
@clam004 (Contributor) commented Mar 14, 2024

Here is my async demo:

import asyncio
import os
import time

from together import AsyncTogether, Together

TOGETHER_API_KEY = os.getenv('TOGETHER_API_KEY')

def sync_chat_completion(messages, max_tokens):
    client = Together(api_key=TOGETHER_API_KEY)
    
    start_time = time.time()
    
    for message in messages:
        response = client.chat.completions.create(
            model="togethercomputer/llama-2-7b-chat", 
            max_tokens=max_tokens, 
            messages=[{"role": "user", "content": message}]
        )
        print(response.choices[0].message.content)
    
    end_time = time.time()
    print("Synchronous total execution time:", end_time - start_time, "seconds")

async def async_chat_completion(messages, max_tokens):
    async_client = AsyncTogether(api_key=TOGETHER_API_KEY)
    
    start_time = time.time()
    
    tasks = [async_client.chat.completions.create(
                model="togethercomputer/llama-2-7b-chat", 
                max_tokens=max_tokens, 
                messages=[{"role": "user", "content": message}]
             ) for message in messages]
             
    responses = await asyncio.gather(*tasks)
    
    for response in responses:
        print(response.choices[0].message.content)
    
    end_time = time.time()
    print("Asynchronous total execution time:", end_time - start_time, "seconds")

In a Jupyter notebook (where an event loop is already running):

messages = ["hi there what is the meaning of life?", "What country is Paris in?"]
sync_chat_completion(messages, 32)
await async_chat_completion(messages, 32)

Otherwise:

messages = ["hi there what is the meaning of life?", "What country is Paris in?"]
sync_chat_completion(messages, 32)
asyncio.run(async_chat_completion(messages, 32))

Expected output:

  The meaning of life is a question that has puzzled philosophers, theologians, and scientists for centuries. There are many different perspectives
  Paris is located in France. It is the capital and largest city of France, situated in the northern central part of the country.
Synchronous total execution time: 0.7738921642303467 seconds
  The meaning of life is a question that has puzzled philosophers, theologians, and scientists for centuries. There are many different perspectives
  Paris is located in France. It is the capital and largest city of France, situated in the northern central part of the country.
Asynchronous total execution time: 0.4429478645324707 seconds
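
The speedup in the output above comes from the requests overlapping instead of running back to back. The effect is reproducible without the API by substituting asyncio.sleep for the network call (all names below are illustrative, not part of the library):

```python
import asyncio
import time


async def fake_api_call(delay: float) -> str:
    # Stand-in for a network request that takes `delay` seconds.
    await asyncio.sleep(delay)
    return f"done after {delay}s"


async def run_both() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Two 0.2s "requests" overlap under gather, so the total is
    # roughly 0.2s rather than the 0.4s a sequential loop would take.
    results = await asyncio.gather(fake_api_call(0.2), fake_api_call(0.2))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_both())
print(results, f"{elapsed:.2f}s")
```

The same reasoning explains why the async demo's wall-clock time is close to the single slowest request, not the sum of all of them.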

@orangetin orangetin marked this pull request as ready for review March 19, 2024 17:26
@orangetin orangetin requested a review from Nutlope March 19, 2024 17:26
@orangetin orangetin changed the title [WIP] v1.0.0 Updates [WIP] v0.3.0 Updates Mar 23, 2024
@Nutlope (Contributor) left a comment:

Looks great Abhy, amazing work! Feel free to merge, then I can make a new PR to update the README (and update our actual docs), then we can open source this repo + announce.

@orangetin orangetin changed the title [WIP] v0.3.0 Updates [WIP] v1.0.0 Updates Apr 1, 2024
@orangetin orangetin merged commit fc781ee into main Apr 1, 2024
@orangetin orangetin deleted the orangetin/eng-900 branch April 5, 2024 19:18