
Use makeRequestOptions to generate inference snippets #1273


Merged: 38 commits into main from fix-openai-inference-snippets, Mar 19, 2025

Conversation

@Wauplin (Contributor) commented Mar 12, 2025

The broader goal of this PR is to use makeRequestOptions from JS InferenceClient in order to get all the implementation details (correct URL, correct authorization header, correct payload, etc.). JS InferenceClient is supposed to be the ground truth in this case.

In practice:

  • fixed makeUrl when chatCompletion + image-text-to-text (review here + other providers)
  • fixed wrong URL in openai python snippet (e.g. here, here)
  • fixed the document question answering (DQA) requests snippet (here)

Technically, this PR:

  • splits makeRequestOptions into two parts: an async part that resolves the model ID (depending on task + provider) and a sync part that generates the URL, headers, body, etc. For snippets we only need the second, sync part. => new (internal) method makeRequestOptionsFromResolvedModel (see the sketch after this list)
  • moves most of the logic inside snippetGenerator
    • logic is: get inputs => make request options => prepare template data => iterate over clients => generate snippets
    • Next: now that the logic is unified, adapting cURL and JS to use the same logic should be fairly easy (e.g. "just" need to create the jinja templates)
      • => final goal is to handle all languages/clients/providers with the same code and swap the templates
  • updates most providers to use the /chat/completions endpoint when chatCompletion is enabled
    • Previously we were also checking that the task is text-generation => now we also use /chat/completions for "image-text-to-text" models
    • that was mostly a bug in the existing codebase => detected thanks to the snippets
  • updates ./packages/inference/package.json to allow dev mode. Running pnpm run dev in @inference now makes it much easier to work with @tasks-gen (no need to rebuild after each change)
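
To make the split concrete, here is a minimal sketch of the idea, including the /chat/completions routing mentioned above. The type names, the resolver, and the router URL are placeholders, not the actual `@huggingface/inference` implementation:

```ts
interface SnippetRequestArgs {
	model: string;
	provider: string;
	accessToken?: string;
	chatCompletion?: boolean;
	payload: Record<string, unknown>;
}

interface ResolvedRequestOptions {
	url: string;
	headers: Record<string, string>;
	body: string;
}

// Hypothetical resolver: in the real client this step is async because it may
// need to look up the provider-specific model ID for the given task + provider.
async function resolveProviderModelId(model: string, _provider: string): Promise<string> {
	return model; // placeholder: assume the provider reuses the Hub model ID
}

// Sync part: builds everything the snippet generator needs (URL, headers, body)
// without any network call, so snippets can be generated synchronously.
function makeRequestOptionsFromResolvedModel(
	resolvedModel: string,
	args: SnippetRequestArgs
): ResolvedRequestOptions {
	const baseUrl = `https://router.example.invalid/${args.provider}/models/${resolvedModel}`;
	// When chatCompletion is enabled, route to the OpenAI-compatible endpoint,
	// whether the task is text-generation or image-text-to-text.
	const url = args.chatCompletion ? `${baseUrl}/v1/chat/completions` : baseUrl;
	return {
		url,
		headers: {
			"Content-Type": "application/json",
			...(args.accessToken ? { Authorization: `Bearer ${args.accessToken}` } : {}),
		},
		body: JSON.stringify(args.payload),
	};
}

// Async wrapper: resolve the model ID, then delegate to the sync helper.
async function makeRequestOptions(args: SnippetRequestArgs): Promise<ResolvedRequestOptions> {
	const resolvedModel = await resolveProviderModelId(args.model, args.provider);
	return makeRequestOptionsFromResolvedModel(resolvedModel, args);
}
```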

EDIT: there is definitely a breaking change in how I handle the makeRequestOptions split (hence the broken CI). Will fix this. => fixed.

@Wauplin Wauplin changed the title Use makeRequestOptional to generate inference snippets Use makeRequestOptions to generate inference snippets Mar 12, 2025
@hanouticelina (Contributor) left a comment


reviewed the generated snippets, all good!

Wauplin added a commit that referenced this pull request Mar 14, 2025
This PR does nothing more than update
`packages/tasks-gen/scripts/generate-snippets-fixtures.ts`, an internal
script used to test the inference snippets. The goal is to store
generated snippets under a new file structure like this:

```
./snippets-fixtures/automatic-speech-recognition/python/huggingface_hub/1.hf-inference.py
```

instead of 

```
./snippets-fixtures/automatic-speech-recognition/1.huggingface_hub.hf-inference.py
```

In practice the previous file naming was annoying: adding a new snippet
for a client type could force renaming another file (because of the
`0.`, `1.`, ... prefixes).

---

Typically in #1273 it
makes the PR much bigger by e.g. deleting
[`1.openai.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-4759b74a67cc4caa7b2d273d7c2a8015ba062a19a8fad5cb2e227ca935dcb749)
and creating
[`2.openai.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-522e7173f8dd851189bb9b7ff311f4ee78ca65a3994caae803ff4fda5fe59733)
just because a new
[`1.requests.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-c8c5536f5af1631e8f1802155b66b0a23a4316eaaf5fcfce1a036da490acaa22)
has been added.

Separating files by language + client avoids these unnecessary renames.
@coyotte508 (Member) left a comment


maybe adding unit tests to check that the generated snippets look as expected would be nice

@coyotte508 (Member) commented:
> fixed makeUrl when chatCompletion + image-text-to-text (review here + other providers)

Still need to update the E2E tests probably, with `VCR_MODE="cache" pnpm --filter inference test`

@Wauplin (Contributor, Author) commented Mar 17, 2025

Comment on lines 114 to 116
```ts
const { accessToken, endpointUrl, provider: maybeProvider, ...remainingArgs } = args;
delete remainingArgs.model; // Remove model from remainingArgs to avoid duplication
```
A Contributor commented:

FYI you can use this syntax

Suggested change
```diff
- const { accessToken, endpointUrl, provider: maybeProvider, ...remainingArgs } = args;
- delete remainingArgs.model; // Remove model from remainingArgs to avoid duplication
+ const { accessToken, endpointUrl, provider: maybeProvider, model, ...remainingArgs } = args;
```

@Wauplin (Contributor, Author) replied on Mar 18, 2025:

Now I remember why I did that in the first place: if I destructure `model` and don't use it, the linter complains with `'model' is assigned a value but never used`.


@SBrandeis any way to get rid of this? (Otherwise I can go back to using `delete`.)
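
For reference, two common ways to silence that warning in a TypeScript + ESLint setup are sketched below; this assumes `@typescript-eslint/no-unused-vars` is the rule firing and is not necessarily what the PR ended up doing:

```ts
interface Args {
	accessToken?: string;
	endpointUrl?: string;
	provider?: string;
	model: string;
	[key: string]: unknown;
}

function withoutModel(args: Args): Record<string, unknown> {
	// Option 1: silence the rule locally, since the bindings exist only so that
	// `remainingArgs` excludes them.
	// eslint-disable-next-line @typescript-eslint/no-unused-vars
	const { accessToken, endpointUrl, provider: maybeProvider, model, ...remainingArgs } = args;
	return remainingArgs;
}

// Option 2: configure the rule once so bindings that sit next to a rest element
// are always ignored:
//   "@typescript-eslint/no-unused-vars": ["error", { "ignoreRestSiblings": true }]
```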

@SBrandeis (Contributor) left a comment

This is great!!
I have a few minor comments, but overall it looks good

Comment on lines 102 to 109
```ts
if (
	model.pipeline_tag &&
	["text-generation", "image-text-to-text"].includes(model.pipeline_tag) &&
	model.tags.includes("conversational")
) {
	templateName = opts?.streaming ? "conversationalStream" : "conversational";
	inputPreparationFn = prepareConversationalInput;
}
```

I guess another solution would be to have the caller pass an `isConversational: boolean` flag or similar - wdyt?
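
For context, the suggested alternative would look roughly like this; `pickTemplate`, `SnippetOpts` and the `"basic"` fallback are hypothetical names used only for the sketch:

```ts
type TemplateName = "conversational" | "conversationalStream" | "basic";

interface SnippetOpts {
	streaming?: boolean;
	isConversational?: boolean; // caller-provided flag, replacing the pipeline_tag/tags check
}

// Pick the template from the caller-provided flag instead of re-deriving it
// from the model's pipeline_tag and tags.
function pickTemplate(opts?: SnippetOpts): TemplateName {
	if (opts?.isConversational) {
		return opts.streaming ? "conversationalStream" : "conversational";
	}
	return "basic";
}
```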

Comment on lines 271 to 278
```ts
function formatBody(obj: object, format: "python" | "json" | "js" | "curl"): string {
	if (format === "python") {
		return Object.entries(obj)
			.map(([key, value]) => {
				const formattedValue = JSON.stringify(value, null, 4).replace(/"/g, '"');
				return `${key}=${formattedValue},`;
			})
			.join("\n");
```
A Member commented:
tbh I was wondering if we shouldn't use prettier or ruff to format snippets at runtime... not sure.

@Wauplin (Contributor, Author) replied:
I thought so as well, but I'm not fully convinced. I feel that for some snippets it would condense the code further, leading to slightly less readable snippets than what we have now (maybe a negligible side effect?).

Anyway, maybe tooling for another day 😄
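
For reference, formatting a generated JS/TS snippet at runtime would look roughly like this with prettier v3's async `format` API; this is only an illustration of the idea discussed above, not part of the PR:

```ts
import { format } from "prettier";

// Format a generated JS/TS snippet before returning it (prettier v3: async API).
async function formatJsSnippet(code: string): Promise<string> {
	return format(code, { parser: "typescript" });
}
```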

Wauplin and others added 5 commits March 18, 2025 11:59
PR built on top of
#1273.

This is supposed to be the last PR refactoring inference snippets
:hear_no_evil:
`python.ts`, `curl.ts` and `js.ts` have been merged into a single
`getInferenceSnippets.ts`, which handles snippet generation for all
languages and all providers at once. Here is how to use it:

```ts
import { snippets } from "@huggingface/inference";

const generatedSnippets = snippets.getInferenceSnippets(model, "api_token", provider, providerModelId, opts);
```

It returns a list, `InferenceSnippet[]`, defined as:

```ts
export interface InferenceSnippet {
    language: InferenceSnippetLanguage; // e.g. `python`, `curl`, `js`
    client: string; // e.g. `huggingface_hub`, `openai`, `fetch`, etc.
    content: string;
}
```
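
For example, a consumer could group the returned snippets by language before rendering them; this is a hypothetical helper, with the interface redeclared locally so the sketch stands alone:

```ts
// Redeclared locally for the sketch; in practice it comes from the package above.
interface InferenceSnippet {
	language: string; // e.g. `python`, `curl`, `js`
	client: string; // e.g. `huggingface_hub`, `openai`, `fetch`
	content: string;
}

// Group snippets by language so a UI can render one tab per language,
// with a client selector inside each tab.
function groupByLanguage(generated: InferenceSnippet[]): Map<string, InferenceSnippet[]> {
	const byLanguage = new Map<string, InferenceSnippet[]>();
	for (const snippet of generated) {
		const existing = byLanguage.get(snippet.language) ?? [];
		existing.push(snippet);
		byLanguage.set(snippet.language, existing);
	}
	return byLanguage;
}
```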

---

### How to review?

It's hard to track all the atomic changes made to the inference snippets,
but the best way IMO to review this PR is to check the generated snippets
in the tests. Many inconsistencies in the URLs, sent parameters, and
indentation have been fixed.

---

### What's next?

- [x] get #1273
approved
- [ ] merge this one
(#1291) into
#1273
- [ ] merge #1273
- [ ] open PR in moon-landing to adapt to the new interface (basically use
`snippets.getInferenceSnippets` instead of `python.getPythonSnippets`,
etc.)
- [ ] open PR to fix hub-docs automatic generation
- [ ] fully ready for Inference Providers documentation!

---------

Co-authored-by: Simon Brandeis <[email protected]>
@Wauplin (Contributor, Author) commented Mar 19, 2025

Merging this PR to move forward with https://github.com/huggingface-internal/moon-landing/pull/13013

@Wauplin Wauplin merged commit 43b9364 into main Mar 19, 2025
5 checks passed
@Wauplin Wauplin deleted the fix-openai-inference-snippets branch March 19, 2025 09:59