Use makeRequestOptions to generate inference snippets #1273
Conversation
reviewed the generated snippets, all good!
This PR does nothing other than update `packages/tasks-gen/scripts/generate-snippets-fixtures.ts`, an internal script used to test the inference snippets. The goal of this PR is to store generated snippets under a new file structure like this:

```
./snippets-fixtures/automatic-speech-recognition/python/huggingface_hub/1.hf-inference.py
```

instead of

```
./snippets-fixtures/automatic-speech-recognition/1.huggingface_hub.hf-inference.py
```

In practice the previous file naming was annoying as it meant that adding a new snippet for a client type could lead to renaming another file (due to the `0.`, `1.`, ... prefixes).

---

Typically in #1273 it makes the PR much bigger by e.g. deleting [`1.openai.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-4759b74a67cc4caa7b2d273d7c2a8015ba062a19a8fad5cb2e227ca935dcb749) and creating [`2.openai.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-522e7173f8dd851189bb9b7ff311f4ee78ca65a3994caae803ff4fda5fe59733) just because a new [`1.requests.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-c8c5536f5af1631e8f1802155b66b0a23a4316eaaf5fcfce1a036da490acaa22) has been added. Separating files by language + client avoids these unnecessary renames.
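For illustration, here is a minimal sketch of how a path could be built under the new layout (the helper and its parameters are hypothetical, not part of the actual script):

```ts
import * as path from "node:path";

// Hypothetical helper, not the actual script code: builds a fixture path under the
// new layout ./snippets-fixtures/<task>/<language>/<client>/<index>.<provider>.<ext>
function fixturePath(
	task: string, // e.g. "automatic-speech-recognition"
	language: string, // e.g. "python"
	client: string, // e.g. "huggingface_hub"
	index: number, // position of the snippet within this client's folder
	provider: string, // e.g. "hf-inference"
	ext: string // e.g. "py"
): string {
	return path.join("snippets-fixtures", task, language, client, `${index}.${provider}.${ext}`);
}

// fixturePath("automatic-speech-recognition", "python", "huggingface_hub", 1, "hf-inference", "py")
// => "snippets-fixtures/automatic-speech-recognition/python/huggingface_hub/1.hf-inference.py"
```

Under this layout the numeric prefix is scoped to a single language + client folder, so adding a new client never forces renaming files in another folder.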
Maybe adding unit tests to check that the generated snippets look as expected would be nice.
Still need to update the E2E tests, probably.
This is what all these files are doing: https://github.com/huggingface/huggingface.js/pull/1273/files#diff-167c61699d82d2eb043e51674cba555d467fb72f0bf52d7ca30eef6da4f66732. The test is here: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks-gen/scripts/generate-snippets-fixtures.ts
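Roughly the kind of check being suggested, sketched below (a vitest setup is assumed, and the generator function is hypothetical, standing in for what the fixture script produces):

```ts
import { readFile } from "node:fs/promises";
import { expect, it } from "vitest";

// Hypothetical generator, standing in for whatever the fixture-generation script
// produces for a given task/language/client/provider combination.
declare function generateSnippet(task: string, language: string, client: string, provider: string): Promise<string>;

it("generated snippet matches the stored fixture", async () => {
	const expected = await readFile(
		"./snippets-fixtures/automatic-speech-recognition/python/huggingface_hub/1.hf-inference.py",
		"utf-8"
	);
	const generated = await generateSnippet("automatic-speech-recognition", "python", "huggingface_hub", "hf-inference");
	expect(generated).toBe(expected);
});
```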
```ts
const { accessToken, endpointUrl, provider: maybeProvider, ...remainingArgs } = args;
delete remainingArgs.model; // Remove model from remainingArgs to avoid duplication
```
FYI you can use this syntax:

```diff
- const { accessToken, endpointUrl, provider: maybeProvider, ...remainingArgs } = args;
- delete remainingArgs.model; // Remove model from remainingArgs to avoid duplication
+ const { accessToken, endpointUrl, provider: maybeProvider, model, ...remainingArgs } = args;
```
Now I remember why I did that in the first place. If I destructure `model` and don't use it, the linter complains with `'model' is assigned a value but never used`.

@SBrandeis any way to get rid of this? (otherwise I can go back to using `delete`)
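One way around it, sketched below; the rule names and options refer to ESLint's `no-unused-vars` / `@typescript-eslint/no-unused-vars`, and whether this repo's config enables them is an assumption:

```ts
// Standalone sketch of the lint issue and two usual workarounds (types are illustrative).
interface Args {
	accessToken?: string;
	endpointUrl?: string;
	provider?: string;
	model: string;
	[key: string]: unknown;
}

function splitArgs(args: Args) {
	// Workaround 1: set "ignoreRestSiblings": true in the rule options, so a binding that
	// only exists to keep `model` out of the rest element is not flagged.
	// Workaround 2 (shown here; works if an ignore pattern such as "varsIgnorePattern": "^_"
	// is configured): rename the intentionally unused binding.
	const { accessToken, endpointUrl, provider: maybeProvider, model: _model, ...remainingArgs } = args;
	return { accessToken, endpointUrl, maybeProvider, remainingArgs };
}
```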
This is great!!
I have a few minor comments, but overall it looks good
```ts
if (
	model.pipeline_tag &&
	["text-generation", "image-text-to-text"].includes(model.pipeline_tag) &&
	model.tags.includes("conversational")
) {
	templateName = opts?.streaming ? "conversationalStream" : "conversational";
	inputPreparationFn = prepareConversationalInput;
}
```
I guess another solution would be to have the caller pass an `isConversational: boolean` flag or similar - wdyt?
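A rough sketch of that alternative (names and types are illustrative and reuse identifiers from the block above, not the actual module):

```ts
// The caller decides whether the model is conversational and passes a flag,
// instead of the generator re-deriving it from pipeline_tag + tags.
declare function prepareConversationalInput(model: unknown, opts?: unknown): unknown; // assumed to exist in the surrounding module

interface SnippetOptions {
	streaming?: boolean;
	isConversational?: boolean; // the proposed caller-provided flag
}

function selectTemplate(opts?: SnippetOptions): { templateName: string; inputPreparationFn?: unknown } {
	if (opts?.isConversational) {
		return {
			templateName: opts.streaming ? "conversationalStream" : "conversational",
			inputPreparationFn: prepareConversationalInput,
		};
	}
	return { templateName: "default" }; // placeholder fallback, not the actual default
}
```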
```ts
function formatBody(obj: object, format: "python" | "json" | "js" | "curl"): string {
	if (format === "python") {
		return Object.entries(obj)
			.map(([key, value]) => {
				const formattedValue = JSON.stringify(value, null, 4).replace(/"/g, '"');
				return `${key}=${formattedValue},`;
			})
			.join("\n");
	// ... (rest of the function not shown in this excerpt)
```
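For context on the discussion below, roughly what the `python` branch produces for a small input, taking the `.replace` call as written (illustrative only, not part of the PR):

```ts
// formatBody({ max_tokens: 512, messages: [{ role: "user", content: "Hi" }] }, "python")
// returns the following string (each top-level key becomes a `key=value,` line,
// with JSON.stringify handling the 4-space indentation of nested values):
//
// max_tokens=512,
// messages=[
//     {
//         "role": "user",
//         "content": "Hi"
//     }
// ],
```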
tbh i was wondering if we shouldn't use prettier or ruff to format snippets at runtime... not sure.
I thought so as well but I'm not fully convinced. I feel that for some snippets it would condense the code more, leading to slightly less readable snippets compared to what we have (maybe a negligible side effect?).
Anyway, maybe tooling for another day 😄
Co-authored-by: Simon Brandeis <[email protected]>
PR built on top of #1273. This is supposed to be the last PR refactoring inference snippets :hear_no_evil:

`python.ts`, `curl.ts` and `js.ts` have been merged into a single `getInferenceSnippets.ts` which handles snippet generation for all languages and all providers at once. Here is how to use it:

```ts
import { snippets } from "@huggingface/inference";

const generatedSnippets = snippets.getInferenceSnippets(model, "api_token", provider, providerModelId, opts);
```

It returns a list of `InferenceSnippet` objects, defined as

```ts
export interface InferenceSnippet {
	language: InferenceSnippetLanguage; // e.g. `python`, `curl`, `js`
	client: string; // e.g. `huggingface_hub`, `openai`, `fetch`, etc.
	content: string;
}
```

---

### How to review?

It's hard to track all the atomic changes made to the inference snippets, so the best way IMO to review this PR is to check the generated snippets in the tests. Many inconsistencies in the URLs, sent parameters and indentation have been fixed.

---

### What's next?

- [x] get #1273 approved
- [ ] merge this one (#1291) into #1273
- [ ] merge #1273
- [ ] open PR in moon-landing to adapt to the new interface (basically use `snippets.getInferenceSnippets` instead of `python.getPythonSnippets`, etc.)
- [ ] open PR to fix hub-docs automatic generation
- [ ] fully ready for Inference Providers documentation!

---------

Co-authored-by: Simon Brandeis <[email protected]>
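As a small follow-up to the usage example above (it assumes the same imports and variables; the filtering itself is illustrative):

```ts
// Pick the python/huggingface_hub snippet out of the returned list, if present.
const pythonSnippet = generatedSnippets.find((s) => s.language === "python" && s.client === "huggingface_hub");
if (pythonSnippet !== undefined) {
	console.log(pythonSnippet.content);
}
```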
Merging this PR to move forward with https://github.com/huggingface-internal/moon-landing/pull/13013
The broader goal of this PR is to use `makeRequestOptions` from the JS InferenceClient in order to get all the implementation details (correct URL, correct authorization header, correct payload, etc.). The JS InferenceClient is supposed to be the ground truth in this case.

In practice:
- `makeUrl` when chatCompletion + image-text-to-text (review here + other providers)
- `openai` python snippet (e.g. here, here)
- `requests` snippet (here)

Technically, this PR:
- splits `makeRequestOptions` in two parts: the async part that does the model ID resolution (depending on task + provider) and the sync part which generates the url, headers, body, etc. For snippets we only need the second part, which is a sync call. => new (internal) method `makeRequestOptionsFromResolvedModel`
- updates `snippetGenerator` to use the `/chat/completions` endpoint when `chatCompletion` is enabled, not only for `text-generation` => now we will also use /chat/completion on "image-text-to-text" models.
- updates `./packages/inference/package.json` to allow dev mode. Now running `pnpm run dev` in `@inference` makes it much easier to work with `@tasks-gen` (no need to rebuild each time I make a change)

EDIT: ~~there is definitely a breaking change in how I handle the `makeRequestOptions` split (hence the broken CI). Will fix this.~~ => fixed.
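To make the described split concrete, a rough sketch (illustrative signatures and placeholder logic, not the actual `@huggingface/inference` implementation):

```ts
interface RequestArgs {
	accessToken?: string;
	endpointUrl?: string;
	provider?: string;
	model?: string;
	[key: string]: unknown;
}

interface RequestOptions {
	url: string;
	headers: Record<string, string>;
	body: string;
}

// Hypothetical stand-in for the task/provider-dependent model ID resolution.
declare function resolveProviderModelId(args: RequestArgs): Promise<string>;

// Sync part: everything that can be computed once the provider model ID is known
// (URL, headers, payload). This is the part a snippet generator can call directly.
function makeRequestOptionsFromResolvedModel(resolvedModelId: string, args: RequestArgs): RequestOptions {
	const { accessToken, endpointUrl, ...payload } = args; // provider/URL handling is elided in this sketch
	return {
		url: endpointUrl ?? "https://example.invalid/placeholder", // placeholder; real URL building is provider-specific
		headers: { Authorization: `Bearer ${accessToken ?? ""}`, "Content-Type": "application/json" },
		body: JSON.stringify({ ...payload, model: resolvedModelId }),
	};
}

// Async part: resolves the provider-specific model ID, then delegates to the sync part.
async function makeRequestOptions(args: RequestArgs): Promise<RequestOptions> {
	const resolvedModelId = await resolveProviderModelId(args);
	return makeRequestOptionsFromResolvedModel(resolvedModelId, args);
}
```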