
[Inference Providers] Fix structured output schema in chat completion #3082


Merged
merged 12 commits into from
May 22, 2025

Conversation

hanouticelina
Contributor

This PR fixes compatibility issues with structured outputs across providers by ensuring the InferenceClient follows the OpenAI API specification for structured output.

Originally raised by @akseljoonas on Slack:

I have been trying out structured outputs through the InferenceClient in the Hub's Python package, and I saw that each inference provider (Nebius, Novita, Together, etc.) expects a slightly different format for a call with structured output. So I wanted to ask whether any mapping is done under the hood to match the provider's format. I'm currently passing in the schema below and getting 500 errors with Novita, while it works with Nebius:

```python
response_format = {
    "type": "json_object",
    "value": response_format["json_schema"]["schema"],
}
```

It turns out some providers don’t fully follow OpenAI’s spec for the response_format field. This PR ensures the client always uses the OpenAI-compliant format and adds internal mappings for each provider when needed.
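To make the discrepancy concrete, here is a minimal sketch (not the actual huggingface_hub implementation) contrasting the OpenAI-spec `response_format` with the TGI-style format some providers expect, and a hypothetical mapping function of the kind this PR adds internally. The schema contents and the function name `to_tgi_style` are illustrative assumptions.

```python
# OpenAI-spec structured output format, as accepted by the client after this PR.
# The schema itself ("weather" with a "city" field) is just an example.
openai_response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather",
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "strict": True,
    },
}


def to_tgi_style(response_format: dict) -> dict:
    """Hypothetical mapping from the OpenAI-spec format to the TGI-style
    {"type": "json_object", "value": <schema>} format some providers expect."""
    if response_format.get("type") == "json_schema":
        return {
            "type": "json_object",
            "value": response_format["json_schema"]["schema"],
        }
    # Anything else is passed through unchanged.
    return response_format
```

With a mapping like this applied per provider, users can always pass the OpenAI-compliant format and the client handles the translation.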

Note: When integrating a new provider into our clients, we should ensure that structured output and function calling are compatible with the OpenAI specs. If that’s not the case, a custom mapping should be added as part of the integration.

@hanouticelina hanouticelina marked this pull request as draft May 14, 2025 10:31
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin
Contributor

Wauplin commented May 14, 2025

Is this PR a follow-up to huggingface/huggingface.js#1380, or were the types added manually? (just curious)

@hanouticelina
Contributor Author

I added the types manually for now (huggingface/huggingface.js#1380 is more of an experiment and not totally ready to be used). Also, I'm not sure it's worth prioritizing the switch to OpenAI's OpenAPI specs for now.

That said, since we don't do any input validation, adding the new types here is not "necessary", it was mainly to show the discrepancy between TGI and OpenAI specs for structured output.

@hanouticelina hanouticelina marked this pull request as ready for review May 21, 2025 10:41
Contributor
@Wauplin Wauplin left a comment


Thanks for working on this! Now I understand better why we are always testing MCP servers on Nebius ^^

@hanouticelina hanouticelina requested a review from Wauplin May 21, 2025 15:14
Contributor

@Wauplin Wauplin left a comment


I haven't tested it myself for all providers, but it looks good to me! Just a nit about removing parameters from the input mapping (not sure it's necessary).

Pre-approving so that I'm not a blocker on this :)

@hanouticelina hanouticelina merged commit 417ad89 into main May 22, 2025
25 checks passed
@hanouticelina hanouticelina deleted the fix-response-format-providers branch May 22, 2025 10:16