
Use makeRequestOptions to generate inference snippets #1273


Merged: 38 commits into main from fix-openai-inference-snippets, Mar 19, 2025

Conversation

@Wauplin (Contributor) commented Mar 12, 2025

The broader goal of this PR is to use makeRequestOptions from JS InferenceClient in order to get all the implementation details (correct URL, correct authorization header, correct payload, etc.). JS InferenceClient is supposed to be the ground truth in this case.

In practice:

  • fixed makeUrl when chatCompletion + image-text-to-text (review here + other providers)
  • fixed wrong URL in openai python snippet (e.g. here, here)
  • fixed the document question answering (DQA) requests snippet (here)

Technically, this PR:

  • splits makeRequestOptions into two parts: an async part that resolves the model ID (depending on task + provider) and a sync part that generates the URL, headers, body, etc. For snippets we only need the second, sync part. => new (internal) method makeRequestOptionsFromResolvedModel (see the sketch after this list)
  • moves most of the logic inside snippetGenerator
    • logic is: get inputs => make request options => prepare template data => iterate over clients => generate snippets
    • Next: now that the logic is unified, adapting cURL and JS to use the same logic should be fairly easy (e.g. "just" need to create the jinja templates)
      • => final goal is to handle all languages/clients/providers with the same code and swap the templates
  • updates most providers to use the /chat/completions endpoint when chatCompletion is enabled
    • Previously we were also checking that the task is text-generation => now we also use /chat/completions for "image-text-to-text" models
    • that was mostly a bug in the existing codebase => detected thanks to the snippets
  • updates ./packages/inference/package.json to allow dev mode. Running pnpm run dev in @inference now makes it much easier to work with @tasks-gen (no need to rebuild after each change)
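
To make the split concrete, here is a minimal sketch of the idea, including the /chat/completions routing mentioned above. The type names, the resolver, and the router URL are placeholders, not the actual `@huggingface/inference` implementation:

```ts
interface SnippetRequestArgs {
	model: string;
	provider: string;
	accessToken?: string;
	chatCompletion?: boolean;
	payload: Record<string, unknown>;
}

interface ResolvedRequestOptions {
	url: string;
	headers: Record<string, string>;
	body: string;
}

// Hypothetical resolver: in the real client this step is async because it may
// need to look up the provider-specific model ID for the given task + provider.
async function resolveProviderModelId(model: string, _provider: string): Promise<string> {
	return model; // placeholder: assume the provider reuses the Hub model ID
}

// Sync part: builds everything the snippet generator needs (URL, headers, body)
// without any network call, so snippets can be generated synchronously.
function makeRequestOptionsFromResolvedModel(
	resolvedModel: string,
	args: SnippetRequestArgs
): ResolvedRequestOptions {
	const baseUrl = `https://router.example.invalid/${args.provider}/models/${resolvedModel}`;
	// When chatCompletion is enabled, route to the OpenAI-compatible endpoint,
	// whether the task is text-generation or image-text-to-text.
	const url = args.chatCompletion ? `${baseUrl}/v1/chat/completions` : baseUrl;
	return {
		url,
		headers: {
			"Content-Type": "application/json",
			...(args.accessToken ? { Authorization: `Bearer ${args.accessToken}` } : {}),
		},
		body: JSON.stringify(args.payload),
	};
}

// Async wrapper: resolve the model ID, then delegate to the sync helper.
async function makeRequestOptions(args: SnippetRequestArgs): Promise<ResolvedRequestOptions> {
	const resolvedModel = await resolveProviderModelId(args.model, args.provider);
	return makeRequestOptionsFromResolvedModel(resolvedModel, args);
}
```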

EDIT: there is definitely a breaking change in how I handle the makeRequestOptions split (hence the broken CI). Will fix this. => fixed.

@Wauplin Wauplin changed the title Use makeRequestOptional to generate inference snippets Use makeRequestOptions to generate inference snippets Mar 12, 2025
@hanouticelina (Contributor) left a comment


reviewed the generated snippets, all good!

Wauplin added a commit that referenced this pull request Mar 14, 2025
This PR does nothing more than update
`packages/tasks-gen/scripts/generate-snippets-fixtures.ts`, an internal
script used to test the inference snippets. The goal is to store
generated snippets under a new file structure like this:

```
./snippets-fixtures/automatic-speech-recognition/python/huggingface_hub/1.hf-inference.py
```

instead of 

```
./snippets-fixtures/automatic-speech-recognition/1.huggingface_hub.hf-inference.py
```

In practice the previous file naming was annoying: adding a new snippet
for a client type could force renaming another file (because of the
`0.`, `1.`, ... prefixes).

---

Typically in #1273 it
makes the PR much bigger by e.g. deleting
[`1.openai.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-4759b74a67cc4caa7b2d273d7c2a8015ba062a19a8fad5cb2e227ca935dcb749)
and creating
[`2.openai.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-522e7173f8dd851189bb9b7ff311f4ee78ca65a3994caae803ff4fda5fe59733)
just because a new
[`1.requests.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-c8c5536f5af1631e8f1802155b66b0a23a4316eaaf5fcfce1a036da490acaa22)
has been added.

Separating files by language + client avoids these unnecessary renames.
@coyotte508 (Member) left a comment


maybe adding unit tests to check that the generated snippets look as expected would be nice

@coyotte508 (Member) commented:
> fixed makeUrl when chatCompletion + image-text-to-text (review here + other providers)

Still need to update the E2E tests probably, with `VCR_MODE="cache" pnpm --filter inference test`

@Wauplin (Contributor, Author) commented Mar 17, 2025

Comment on lines 114 to 116
```ts
const { accessToken, endpointUrl, provider: maybeProvider, ...remainingArgs } = args;
delete remainingArgs.model; // Remove model from remainingArgs to avoid duplication
```
A Contributor commented:

FYI you can use this syntax

Suggested change
```diff
- const { accessToken, endpointUrl, provider: maybeProvider, ...remainingArgs } = args;
- delete remainingArgs.model; // Remove model from remainingArgs to avoid duplication
+ const { accessToken, endpointUrl, provider: maybeProvider, model, ...remainingArgs } = args;
```

@Wauplin (Contributor, Author) replied on Mar 18, 2025:

Now I remember why I did that in the first place: if I destructure `model` and don't use it, the linter complains with `'model' is assigned a value but never used`.


@SBrandeis any way to get rid of this? (Otherwise I can go back to using `delete`.)
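
For reference, two common ways to silence that warning in a TypeScript + ESLint setup are sketched below; this assumes `@typescript-eslint/no-unused-vars` is the rule firing and is not necessarily what the PR ended up doing:

```ts
interface Args {
	accessToken?: string;
	endpointUrl?: string;
	provider?: string;
	model: string;
	[key: string]: unknown;
}

function withoutModel(args: Args): Record<string, unknown> {
	// Option 1: silence the rule locally, since the bindings exist only so that
	// `remainingArgs` excludes them.
	// eslint-disable-next-line @typescript-eslint/no-unused-vars
	const { accessToken, endpointUrl, provider: maybeProvider, model, ...remainingArgs } = args;
	return remainingArgs;
}

// Option 2: configure the rule once so bindings that sit next to a rest element
// are always ignored:
//   "@typescript-eslint/no-unused-vars": ["error", { "ignoreRestSiblings": true }]
```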

@SBrandeis (Contributor) left a comment

This is great!!
I have a few minor comments, but overall it looks good

Comment on lines 102 to 109
```ts
if (
	model.pipeline_tag &&
	["text-generation", "image-text-to-text"].includes(model.pipeline_tag) &&
	model.tags.includes("conversational")
) {
	templateName = opts?.streaming ? "conversationalStream" : "conversational";
	inputPreparationFn = prepareConversationalInput;
}
```

I guess another solution would be to have the caller pass an `isConversational: boolean` flag or similar - wdyt?
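
For context, the suggested alternative would look roughly like this; `pickTemplate`, `SnippetOpts` and the `"basic"` fallback are hypothetical names used only for the sketch:

```ts
type TemplateName = "conversational" | "conversationalStream" | "basic";

interface SnippetOpts {
	streaming?: boolean;
	isConversational?: boolean; // caller-provided flag, replacing the pipeline_tag/tags check
}

// Pick the template from the caller-provided flag instead of re-deriving it
// from the model's pipeline_tag and tags.
function pickTemplate(opts?: SnippetOpts): TemplateName {
	if (opts?.isConversational) {
		return opts.streaming ? "conversationalStream" : "conversational";
	}
	return "basic";
}
```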

Comment on lines 271 to 278
```ts
function formatBody(obj: object, format: "python" | "json" | "js" | "curl"): string {
	if (format === "python") {
		return Object.entries(obj)
			.map(([key, value]) => {
				const formattedValue = JSON.stringify(value, null, 4).replace(/"/g, '"');
				return `${key}=${formattedValue},`;
			})
			.join("\n");
```
A Member commented:
tbh I was wondering if we shouldn't use prettier or ruff to format snippets at runtime... not sure.

@Wauplin (Contributor, Author) replied:
I thought so as well, but I'm not fully convinced. I feel that for some snippets it would condense the code further, leading to slightly less readable snippets than what we have now (maybe a negligible side effect?).

Anyway, maybe tooling for another day 😄
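
For reference, formatting a generated JS/TS snippet at runtime would look roughly like this with prettier v3's async `format` API; this is only an illustration of the idea discussed above, not part of the PR:

```ts
import { format } from "prettier";

// Format a generated JS/TS snippet before returning it (prettier v3: async API).
async function formatJsSnippet(code: string): Promise<string> {
	return format(code, { parser: "typescript" });
}
```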

Wauplin and others added 5 commits March 18, 2025 11:59
PR built on top of
#1273.

This is supposed to be the last PR refactoring inference snippets
:hear_no_evil:
`python.ts`, `curl.ts` and `js.ts` have been merged into a single
`getInferenceSnippets.ts`, which handles snippet generation for all
languages and all providers at once. Here is how to use it:

```ts
import { snippets } from "@huggingface/inference";

const generatedSnippets = snippets.getInferenceSnippets(model, "api_token", provider, providerModelId, opts);
```

It returns a list, `InferenceSnippet[]`, defined as:

```ts
export interface InferenceSnippet {
    language: InferenceSnippetLanguage; // e.g. `python`, `curl`, `js`
    client: string; // e.g. `huggingface_hub`, `openai`, `fetch`, etc.
    content: string;
}
```
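
For example, a consumer could group the returned snippets by language before rendering them; this is a hypothetical helper, with the interface redeclared locally so the sketch stands alone:

```ts
// Redeclared locally for the sketch; in practice it comes from the package above.
interface InferenceSnippet {
	language: string; // e.g. `python`, `curl`, `js`
	client: string; // e.g. `huggingface_hub`, `openai`, `fetch`
	content: string;
}

// Group snippets by language so a UI can render one tab per language,
// with a client selector inside each tab.
function groupByLanguage(generated: InferenceSnippet[]): Map<string, InferenceSnippet[]> {
	const byLanguage = new Map<string, InferenceSnippet[]>();
	for (const snippet of generated) {
		const existing = byLanguage.get(snippet.language) ?? [];
		existing.push(snippet);
		byLanguage.set(snippet.language, existing);
	}
	return byLanguage;
}
```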

---

### How to review?

It's hard to track all the atomic changes made to the inference snippets,
but the best way IMO to review this PR is to check the generated snippets
in the tests. Many inconsistencies in the URLs, sent parameters, and
indentation have been fixed.

---

### What's next?

- [x] get #1273
approved
- [ ] merge this one
(#1291) into
#1273
- [ ] merge #1273
- [ ] open PR in moon-landing to adapt to the new interface (basically use
`snippets.getInferenceSnippets` instead of `python.getPythonSnippets`,
etc.)
- [ ] open PR to fix hub-docs automatic generation
- [ ] fully ready for Inference Providers documentation!

---------

Co-authored-by: Simon Brandeis <[email protected]>
@Wauplin (Contributor, Author) commented Mar 19, 2025

Merging this PR to move forward with https://github.com/huggingface-internal/moon-landing/pull/13013

@Wauplin Wauplin merged commit 43b9364 into main Mar 19, 2025
5 checks passed
@Wauplin Wauplin deleted the fix-openai-inference-snippets branch March 19, 2025 09:59