201 ml streaming endpoint #202

Merged
Changes from all commits (11 commits):
- 1527b76: bump project deps (grillazz)
- 79349ff: bump project dev deps (grillazz)
- f261fb3: add llm service (grillazz)
- c8ff69e: refactor llm service (grillazz)
- 849f02c: add chat endpoint (grillazz)
- 6fd874c: add chat testing client (grillazz)
- 61ba8cc: refactor chat testing client (grillazz)
- 6f2db27: format code (grillazz)
- b5fcd04: connect to local ollama (grillazz)
- e215876: add README.md and test (grillazz)
- 2f484d6: lint and format (grillazz)
New file (+16 lines): the streaming chat endpoint router, exposed at `/v1/ml/chat/` per the test client below.

```python
from typing import Annotated

from fastapi import APIRouter, Depends, Form
from fastapi.responses import StreamingResponse

from app.services.llm import get_llm_service
from app.utils.logging import AppLogger

logger = AppLogger().get_logger()

router = APIRouter()


@router.post("/chat/")
async def chat(prompt: Annotated[str, Form()], llm_service=Depends(get_llm_service)):
    return StreamingResponse(llm_service.stream_chat(prompt), media_type="text/plain")
```
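Because the endpoint receives its service through `Depends(get_llm_service)`, it can be exercised without a running Ollama instance by overriding the dependency. A minimal sketch, assuming the FastAPI app object lives at `app.main:app`, the router is mounted under `/v1/ml`, and the anyio pytest plugin is installed (none of which is shown in this diff):

```python
import httpx
import pytest

from app.main import app  # assumed location of the FastAPI app
from app.services.llm import get_llm_service


class StubLLMService:
    async def stream_chat(self, prompt: str):
        # Yield one NDJSON line in the same shape the real service produces.
        yield b'{"role": "model", "content": "stubbed reply"}\n'


@pytest.mark.anyio
async def test_chat_streams_without_ollama():
    app.dependency_overrides[get_llm_service] = lambda: StubLLMService()
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        async with client.stream(
            "POST", "/v1/ml/chat/", data={"prompt": "hi"}
        ) as response:
            body = b"".join([chunk async for chunk in response.aiter_bytes()])
    assert b"stubbed reply" in body
    app.dependency_overrides.clear()
```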
New file (+52 lines): `app/services/llm.py` (the path follows from the router's import above), the streaming LLM service.

```python
from collections.abc import AsyncGenerator

import httpx
import orjson


class StreamLLMService:
    def __init__(self, base_url: str = "http://localhost:11434/v1"):
        self.base_url = base_url
        self.model = "llama3.2"

    async def stream_chat(self, prompt: str) -> AsyncGenerator[bytes]:
        """Stream chat completion responses from LLM."""
        # Send the user a message first
        user_msg = {
            "role": "user",
            "content": prompt,
        }
        yield orjson.dumps(user_msg) + b"\n"

        # Open client as context manager and stream responses
        async with httpx.AsyncClient(base_url=self.base_url) as client:
            async with client.stream(
                "POST",
                "/chat/completions",
                json={
                    "model": self.model,
                    "messages": [{"role": "user", "content": prompt}],
                    "stream": True,
                },
                timeout=60.0,
            ) as response:
                async for line in response.aiter_lines():
                    if line.startswith("data: ") and line != "data: [DONE]":
                        try:
                            json_line = line[6:]  # Remove "data: " prefix
                            data = orjson.loads(json_line)
                            content = (
                                data.get("choices", [{}])[0]
                                .get("delta", {})
                                .get("content", "")
                            )
                            if content:
                                model_msg = {"role": "model", "content": content}
                                yield orjson.dumps(model_msg) + b"\n"
                        except Exception:
                            pass


# FastAPI dependency
def get_llm_service(base_url: str | None = None) -> StreamLLMService:
    return StreamLLMService(base_url=base_url or "http://localhost:11434/v1")
```
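For reference, each line of an OpenAI-compatible streaming response arrives as an SSE `data:` line, and the service digs `choices[0].delta.content` out of it. A minimal sketch with a made-up payload (the field values are illustrative, not captured from Ollama):

```python
import orjson

# A representative, made-up SSE line from an OpenAI-compatible stream:
line = 'data: {"choices": [{"delta": {"content": "Hel"}}]}'

payload = orjson.loads(line[6:])  # strip the "data: " prefix, as the service does
content = payload.get("choices", [{}])[0].get("delta", {}).get("content", "")
assert content == "Hel"
```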
New file (+32 lines): an interactive command-line client for manually testing the streaming endpoint.

```python
import anyio
import httpx
import orjson


async def chat_with_endpoint():
    async with httpx.AsyncClient() as client:
        while True:
            # Get user input
            prompt = input("\nYou: ")

            if prompt.lower() == "exit":
                break

            # Send request to the API
            print("\nModel: ", end="", flush=True)
            async with client.stream(
                "POST",
                "http://0.0.0.0:8080/v1/ml/chat/",
                data={"prompt": prompt},
                timeout=60,
            ) as response:
                async for chunk in response.aiter_lines():
                    if chunk:
                        try:
                            data = orjson.loads(chunk)
                            print(data["content"], end="", flush=True)
                        except Exception as e:
                            print(f"\nError parsing chunk: {e}")


if __name__ == "__main__":
    anyio.run(chat_with_endpoint)
```

Copilot review comment (on the `input()` call): Using the synchronous input() call inside an async function may block the event loop. Consider using an asynchronous input strategy or executing the blocking call in a separate thread to avoid potential performance issues.
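One way to address that comment, as a minimal sketch (an illustration, not necessarily the reviewer's suggested change): hand the blocking `input()` off to a worker thread with anyio, which the client already imports.

```python
import anyio


async def read_prompt() -> str:
    # Run the blocking input() call in a worker thread so the
    # event loop stays free while waiting for the user to type.
    return await anyio.to_thread.run_sync(input, "\nYou: ")
```

Inside `chat_with_endpoint`, `prompt = input("\nYou: ")` would then become `prompt = await read_prompt()`.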
Copilot review comment (on the bare except in `StreamLLMService.stream_chat`): Avoid silently passing exceptions using a bare 'except Exception:' block. Consider logging the error details or handling the exception explicitly to aid in debugging.
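A minimal sketch of how that could look, using the project's existing AppLogger (already used by the router above) and a hypothetical `parse_chunk` helper; this is an illustration, not code from the PR:

```python
import orjson

from app.utils.logging import AppLogger  # already used by the router in this PR

logger = AppLogger().get_logger()


def parse_chunk(line: str) -> str:
    """Extract delta content from one SSE line, logging parse failures
    instead of silently discarding them."""
    try:
        data = orjson.loads(line[6:])
        return data.get("choices", [{}])[0].get("delta", {}).get("content", "")
    except Exception:
        # Log instead of swallowing, per the review comment.
        logger.exception("Failed to parse streaming chunk: %r", line)
        return ""
```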