
Google-Vertex: Support include_thinking in reasoning configuration and extraction of model thoughts. #6259

@Und3rf10w

Description

Vertex now supports extraction of thinking tokens in certain Gemini models.

I have opened PR #6261 with a suggested implementation.

Thinking budget is "technically supported" via:

  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 2048,
      },
    }
  },

But actual extraction and usage of the thinking tokens requires additional logic.

Ideally, you'd send something like:

  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 2048,
        includeThoughts: true  // This line WOULD make vertex output thinking tokens
      },
    }
  },

This mirrors how the request is shaped on the Vertex side.

The proper request body sent to vertex looks something like:

{"generationConfig":{"maxOutputTokens":65535,"temperature":0.7,"frequencyPenalty":0,"presencePenalty":0, "thinkingConfig": {"includeThoughts": true, "thinking_budget": 2048}},"contents":[{"role":"user","parts":[{"text":"Describe the most unusual or striking architectural feature you've ever seen in a building or structure."}]}]}

When the includeThoughts option is passed to the AI SDK via providerOptions, it is stripped from the request sent to Vertex, so Vertex never returns any thoughts.
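To illustrate the desired pass-through, here is a minimal sketch of forwarding `thinkingConfig` into `generationConfig`. The types and the `buildGenerationConfig` helper are hypothetical names made up for this example; they are not the AI SDK's actual internals.

```typescript
// Hypothetical shapes for illustration only; not the AI SDK's real types.
interface ThinkingConfig {
  thinkingBudget?: number;
  includeThoughts?: boolean;
}

interface GoogleProviderOptions {
  thinkingConfig?: ThinkingConfig;
}

// Merge the provider's thinkingConfig into the generationConfig that is
// sent to Vertex, keeping includeThoughts instead of dropping it.
function buildGenerationConfig(
  base: Record<string, unknown>,
  options: GoogleProviderOptions | undefined,
): Record<string, unknown> {
  const thinkingConfig = options?.thinkingConfig;
  if (thinkingConfig === undefined) {
    return { ...base };
  }

  // Copy only the keys that were actually set, so no undefined values
  // end up in the serialized request.
  const forwarded: ThinkingConfig = {};
  if (thinkingConfig.thinkingBudget !== undefined) {
    forwarded.thinkingBudget = thinkingConfig.thinkingBudget;
  }
  if (thinkingConfig.includeThoughts !== undefined) {
    forwarded.includeThoughts = thinkingConfig.includeThoughts;
  }
  return { ...base, thinkingConfig: forwarded };
}

const generationConfig = buildGenerationConfig(
  { maxOutputTokens: 65535, temperature: 0.7 },
  { thinkingConfig: { thinkingBudget: 2048, includeThoughts: true } },
);
console.log(JSON.stringify({ generationConfig }));
```

With this shape, `includeThoughts` survives into the request body alongside `thinkingBudget`.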

Example streamed response with thoughts

Vertex sends thought parts like this:

{"candidates": [{"content": {"role": "model","parts": [{"text": "Thinking... \n\n","thought": true}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyy","responseId": "xxxxxx"}

And like this for normal text parts:

{"candidates": [{"content": {"role": "model","parts": [{"text": " form.\n\nWhile historical examples exist (like cliff dwellings), seeing this concept applied in modern, high-design architecture is particularly striking because it feels both primal and cutting-edge simultaneously. It's a feature that grounds the building quite literally and figuratively, making it feel less like an object placed *on* the earth"}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyy","responseId": "xxxxxx"}

For completeness, here's the last data part with token usage metadata:

{"candidates": [{"content": {"role": "model","parts": [{"text": " and more like something emerging *from* it."}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 157,"candidatesTokenCount": 417,"totalTokenCount": 1930,"trafficType": "ON_DEMAND","promptTokensDetails": [{"modality": "TEXT","tokenCount": 157}],"candidatesTokensDetails": [{"modality": "TEXT","tokenCount": 417}],"thoughtsTokenCount": 1356},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyyyy","responseId": "xxxxxxxxx"}

Non-streamed response with thoughts

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "**My Selection: The Nativity Facade of the Sagrada Familia**...",
            "thought": true
          },
          {
            "text": "Okay, drawing from the vast amount of architectural data I've processed, the most unusual and striking architectural feature I can describe is ..."
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -1.2357349219472042
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 19,
    "candidatesTokenCount": 541,
    "totalTokenCount": 1785,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 19
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 541
      }
    ],
    "thoughtsTokenCount": 1225
  },
  "modelVersion": "gemini-2.5-flash-preview-04-17",
  "createTime": "yyyyyyy",
  "responseId": "xxxxxxx"
}

As in the streamed case, reasoning parts carry a `thought: true` key, so they should be straightforward to extract.
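For the non-streamed case, extraction could look roughly like the sketch below. It assumes the response shape shown above; `VertexResponse`, `ExtractedResult`, and `extractResult` are hypothetical names for this example, and the sample text is shortened.

```typescript
// Hypothetical minimal types covering only the fields used here.
interface ResponsePart {
  text?: string;
  thought?: boolean;
}

interface VertexResponse {
  candidates?: Array<{
    content?: { role?: string; parts?: ResponsePart[] };
    finishReason?: string;
  }>;
  usageMetadata?: {
    promptTokenCount?: number;
    candidatesTokenCount?: number;
    totalTokenCount?: number;
    thoughtsTokenCount?: number;
  };
}

interface ExtractedResult {
  reasoning: string;
  text: string;
  reasoningTokens: number;
}

// Split the first candidate's parts into reasoning and answer text,
// and surface the separately reported thinking token count.
function extractResult(response: VertexResponse): ExtractedResult {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  let reasoning = "";
  let text = "";
  for (const part of parts) {
    if (typeof part.text !== "string") {
      continue;
    }
    if (part.thought === true) {
      reasoning += part.text;
    } else {
      text += part.text;
    }
  }
  return {
    reasoning,
    text,
    reasoningTokens: response.usageMetadata?.thoughtsTokenCount ?? 0,
  };
}

// Shortened version of the non-streamed response above.
const sampleResponse: VertexResponse = {
  candidates: [
    {
      content: {
        role: "model",
        parts: [
          { text: "**My Selection: The Nativity Facade of the Sagrada Familia**...", thought: true },
          { text: "Okay, drawing from the vast amount of architectural data I've processed, ..." },
        ],
      },
      finishReason: "STOP",
    },
  ],
  usageMetadata: {
    promptTokenCount: 19,
    candidatesTokenCount: 541,
    totalTokenCount: 1785,
    thoughtsTokenCount: 1225,
  },
};

const extracted = extractResult(sampleResponse);
console.log(extracted.reasoningTokens, extracted.reasoning.length > 0, extracted.text.length > 0);
```

Defaulting `reasoningTokens` to 0 is a design choice for this sketch, so that responses from models without thinking produce the same result shape.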
