Description
Vertex now supports extraction of thinking tokens in certain Gemini models.
I have opened PR #6261 with a suggested implementation.
Thinking budget is "technically supported" via:
providerOptions: {
  google: {
    thinkingConfig: {
      thinkingBudget: 2048,
    },
  }
},
But actual extraction and usage of the thinking tokens requires additional logic.
Ideally, you'd send something like:
providerOptions: {
  google: {
    thinkingConfig: {
      thinkingBudget: 2048,
      includeThoughts: true // This line WOULD make vertex output thinking tokens
    },
  }
},
This would be identical to how the request is shaped on the vertex side.
The proper request body sent to vertex looks something like:
{"generationConfig":{"maxOutputTokens":65535,"temperature":0.7,"frequencyPenalty":0,"presencePenalty":0, "thinkingConfig": {"includeThoughts": true, "thinking_budget": 2048}},"contents":[{"role":"user","parts":[{"text":"Describe the most unusual or striking architectural feature you've ever seen in a building or structure."}]}]}
When the includeThoughts option is passed to the AI SDK via providerOptions, it is stripped from the request sent to Vertex, so the response contains no thoughts.
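To illustrate the behaviour being requested, here is a minimal sketch of passing thinkingConfig through to generationConfig without dropping includeThoughts. The types and the buildGenerationConfig helper are hypothetical names for illustration only; they are not taken from the AI SDK or from the PR.

```typescript
// Hypothetical types; field names mirror the request body shown above.
interface ThinkingConfig {
  thinkingBudget?: number;
  includeThoughts?: boolean;
}

interface GenerationConfig {
  maxOutputTokens?: number;
  temperature?: number;
  thinkingConfig?: ThinkingConfig;
}

// Hypothetical helper: copies thinkingConfig through unchanged,
// so includeThoughts survives into the outgoing request body.
function buildGenerationConfig(
  base: Omit<GenerationConfig, "thinkingConfig">,
  thinkingConfig?: ThinkingConfig,
): GenerationConfig {
  if (thinkingConfig === undefined) return { ...base };
  return { ...base, thinkingConfig: { ...thinkingConfig } };
}

const config = buildGenerationConfig(
  { maxOutputTokens: 65535, temperature: 0.7 },
  { thinkingBudget: 2048, includeThoughts: true },
);
console.log(JSON.stringify({ generationConfig: config }));
```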
Streamed response with thoughts
Vertex sends thought parts in the stream like this:
{"candidates": [{"content": {"role": "model","parts": [{"text": "Thinking... \n\n","thought": true}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyy","responseId": "xxxxxx"}
And like this for normal text parts:
{"candidates": [{"content": {"role": "model","parts": [{"text": " form.\n\nWhile historical examples exist (like cliff dwellings), seeing this concept applied in modern, high-design architecture is particularly striking because it feels both primal and cutting-edge simultaneously. It's a feature that grounds the building quite literally and figuratively, making it feel less like an object placed *on* the earth"}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyy","responseId": "xxxxxx"}
For completeness, here's the last data part with token usage metadata:
{"candidates": [{"content": {"role": "model","parts": [{"text": " and more like something emerging *from* it."}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 157,"candidatesTokenCount": 417,"totalTokenCount": 1930,"trafficType": "ON_DEMAND","promptTokensDetails": [{"modality": "TEXT","tokenCount": 157}],"candidatesTokensDetails": [{"modality": "TEXT","tokenCount": 417}],"thoughtsTokenCount": 1356},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyyyy","responseId": "xxxxxxxxx"}
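As a rough sketch of how the streamed chunks above could be split into reasoning and text, the following maps each part to a delta based on its thought flag. The Chunk and Delta types and the toDeltas function are assumptions made for this example, not the SDK's actual internals.

```typescript
// Hypothetical shapes covering only the fields used here.
interface Part {
  text?: string;
  thought?: boolean;
}

interface Chunk {
  candidates?: { content?: { role?: string; parts?: Part[] }; finishReason?: string }[];
}

interface Delta {
  type: "reasoning" | "text";
  text: string;
}

// Parts with thought === true become reasoning deltas; all other text parts become text deltas.
function toDeltas(chunk: Chunk): Delta[] {
  const parts = chunk.candidates?.[0]?.content?.parts ?? [];
  const deltas: Delta[] = [];
  for (const part of parts) {
    if (typeof part.text !== "string") continue; // skip non-text parts
    deltas.push({ type: part.thought === true ? "reasoning" : "text", text: part.text });
  }
  return deltas;
}

// Abbreviated versions of the chunks shown above.
const thoughtChunk: Chunk = {
  candidates: [{ content: { role: "model", parts: [{ text: "Thinking... \n\n", thought: true }] } }],
};
const textChunk: Chunk = {
  candidates: [{ content: { role: "model", parts: [{ text: " form." }] } }],
};

for (const chunk of [thoughtChunk, textChunk]) {
  for (const delta of toDeltas(chunk)) {
    console.log(delta.type, JSON.stringify(delta.text));
  }
}
```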
Non-streamed response with thoughts
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "**My Selection: The Nativity Facade of the Sagrada Familia**...",
            "thought": true
          },
          {
            "text": "Okay, drawing from the vast amount of architectural data I've processed, the most unusual and striking architectural feature I can describe is ..."
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -1.2357349219472042
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 19,
    "candidatesTokenCount": 541,
    "totalTokenCount": 1785,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 19
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 541
      }
    ],
    "thoughtsTokenCount": 1225
  },
  "modelVersion": "gemini-2.5-flash-preview-04-17",
  "createTime": "yyyyyyy",
  "responseId": "xxxxxxx"
}
As in the streamed case, reasoning parts carry a thought key, so they should be straightforward to extract.
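A minimal sketch of that extraction for the non-streamed response, assuming the shape shown above. The GenerateResponse type and extractResult function are hypothetical names for this example. In the sample data, totalTokenCount equals the sum of the prompt, candidates, and thoughts token counts (19 + 541 + 1225 = 1785), so thoughtsTokenCount can be reported separately as reasoning usage.

```typescript
// Hypothetical shapes covering only the fields used here.
interface Part {
  text?: string;
  thought?: boolean;
}

interface GenerateResponse {
  candidates?: { content?: { parts?: Part[] } }[];
  usageMetadata?: {
    promptTokenCount?: number;
    candidatesTokenCount?: number;
    thoughtsTokenCount?: number;
    totalTokenCount?: number;
  };
}

// Splits the first candidate's parts into reasoning and answer text,
// and surfaces the thoughts token count from usageMetadata.
function extractResult(response: GenerateResponse) {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const join = (wantThought: boolean): string =>
    parts
      .filter((p) => (p.thought === true) === wantThought)
      .map((p) => p.text ?? "")
      .join("");
  return {
    reasoning: join(true),
    text: join(false),
    reasoningTokens: response.usageMetadata?.thoughtsTokenCount ?? 0,
  };
}

// Abbreviated copy of the response shown above.
const sample: GenerateResponse = {
  candidates: [
    {
      content: {
        parts: [
          { text: "**My Selection: The Nativity Facade of the Sagrada Familia**...", thought: true },
          { text: "Okay, drawing from the vast amount of architectural data I've processed, ..." },
        ],
      },
    },
  ],
  usageMetadata: {
    promptTokenCount: 19,
    candidatesTokenCount: 541,
    thoughtsTokenCount: 1225,
    totalTokenCount: 1785,
  },
};

const result = extractResult(sample);
console.log(result.reasoningTokens, result.reasoning.length > 0, result.text.length > 0);
```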