Google-Vertex: Support `include_thinking` in reasoning configuration and extraction of model thoughts.

### Description

[Vertex now supports extraction of thinking tokens in certain Gemini models](https://cloud.google.com/vertex-ai/generative-ai/docs/thinking).

I have opened PR #6261 with a suggested implementation.

Thinking budget is "technically supported" via:

```typescript
  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 2048,
      },
    }
  },
```

But actual extraction and usage of the thinking tokens requires additional logic.

Ideally, you'd send something like:

```typescript
  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 2048,
        includeThoughts: true  // This line WOULD make vertex output thinking tokens
      },
    }
  },
```

This mirrors how the request is shaped on the Vertex side.

The request body that Vertex expects looks something like:

```json
{"generationConfig":{"maxOutputTokens":65535,"temperature":0.7,"frequencyPenalty":0,"presencePenalty":0, "thinkingConfig": {"includeThoughts": true, "thinking_budget": 2048}},"contents":[{"role":"user","parts":[{"text":"Describe the most unusual or striking architectural feature you've ever seen in a building or structure."}]}]}
```

When the `includeThoughts` option is passed to the AI SDK via `providerOptions`, it is stripped from the request sent to Vertex, so Vertex never returns the thoughts.
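As a rough illustration of the pass-through being asked for, here is a minimal sketch that forwards both `thinkingConfig` fields into `generationConfig`. The names `ThinkingConfig`, `GoogleProviderOptions`, and `withThinkingConfig` are hypothetical and are not the AI SDK's actual internals; the real change may look quite different.

```typescript
// Hypothetical shapes for illustration only; not the AI SDK's internal types.
interface ThinkingConfig {
  thinkingBudget?: number;
  includeThoughts?: boolean;
}

interface GoogleProviderOptions {
  thinkingConfig?: ThinkingConfig;
}

// Copies every defined thinkingConfig field into generationConfig,
// so includeThoughts is forwarded to Vertex instead of being dropped.
function withThinkingConfig(
  generationConfig: Record<string, unknown>,
  options: GoogleProviderOptions | undefined,
): Record<string, unknown> {
  const thinking = options?.thinkingConfig;
  if (thinking === undefined) return generationConfig;

  const forwarded: ThinkingConfig = {};
  if (thinking.thinkingBudget !== undefined) {
    forwarded.thinkingBudget = thinking.thinkingBudget;
  }
  if (thinking.includeThoughts !== undefined) {
    forwarded.includeThoughts = thinking.includeThoughts;
  }
  return { ...generationConfig, thinkingConfig: forwarded };
}

// Usage: build the generationConfig portion of the request body.
const generationConfig = withThinkingConfig(
  { maxOutputTokens: 65535, temperature: 0.7 },
  { thinkingConfig: { thinkingBudget: 2048, includeThoughts: true } },
);
console.log(JSON.stringify({ generationConfig }));
```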

## Example Streamed response with thought
A streamed chunk from Vertex that carries thought tokens looks like this:

```json
{"candidates": [{"content": {"role": "model","parts": [{"text": "Thinking... \n\n","thought": true}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyy","responseId": "xxxxxx"}
```

And like this for normal text parts:

```json
{"candidates": [{"content": {"role": "model","parts": [{"text": " form.\n\nWhile historical examples exist (like cliff dwellings), seeing this concept applied in modern, high-design architecture is particularly striking because it feels both primal and cutting-edge simultaneously. It's a feature that grounds the building quite literally and figuratively, making it feel less like an object placed *on* the earth"}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyy","responseId": "xxxxxx"}
```

For completeness, here's the last data part with token usage metadata:

```json
{"candidates": [{"content": {"role": "model","parts": [{"text": " and more like something emerging *from* it."}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 157,"candidatesTokenCount": 417,"totalTokenCount": 1930,"trafficType": "ON_DEMAND","promptTokensDetails": [{"modality": "TEXT","tokenCount": 157}],"candidatesTokensDetails": [{"modality": "TEXT","tokenCount": 417}],"thoughtsTokenCount": 1356},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyyyy","responseId": "xxxxxxxxx"}
```
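Based on the three chunk shapes above, a stream consumer could route each part by its `thought` flag and pick up `thoughtsTokenCount` from the final chunk. The sketch below assumes the chunks have already been parsed from JSON; the `StreamChunk` type and `accumulateStream` function are made up for illustration and cover only the fields shown in the examples.

```typescript
// Hypothetical minimal types covering only the fields shown in the examples.
interface StreamPart {
  text?: string;
  thought?: boolean;
}

interface StreamChunk {
  candidates?: { content?: { parts?: StreamPart[] }; finishReason?: string }[];
  usageMetadata?: { thoughtsTokenCount?: number };
}

interface Accumulated {
  reasoning: string;
  text: string;
  thoughtsTokenCount: number | undefined;
}

// Concatenates thought parts and normal text parts separately,
// keeping the last thoughtsTokenCount that any chunk reported.
function accumulateStream(chunks: StreamChunk[]): Accumulated {
  const result: Accumulated = { reasoning: "", text: "", thoughtsTokenCount: undefined };
  for (const chunk of chunks) {
    for (const part of chunk.candidates?.[0]?.content?.parts ?? []) {
      if (part.text === undefined) continue;
      if (part.thought === true) {
        result.reasoning += part.text;
      } else {
        result.text += part.text;
      }
    }
    const count = chunk.usageMetadata?.thoughtsTokenCount;
    if (count !== undefined) result.thoughtsTokenCount = count;
  }
  return result;
}

// Usage with abbreviated versions of the chunks above.
const accumulated = accumulateStream([
  { candidates: [{ content: { parts: [{ text: "Thinking... \n\n", thought: true }] } }] },
  { candidates: [{ content: { parts: [{ text: " form." }] } }] },
  {
    candidates: [{ content: { parts: [{ text: " and more." }] }, finishReason: "STOP" }],
    usageMetadata: { thoughtsTokenCount: 1356 },
  },
]);
console.log(accumulated);
```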

## Non-streamed response with thoughts

```json
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "**My Selection: The Nativity Facade of the Sagrada Familia**...",
            "thought": true
          },
          {
            "text": "Okay, drawing from the vast amount of architectural data I've processed, the most unusual and striking architectural feature I can describe is ..."
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -1.2357349219472042
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 19,
    "candidatesTokenCount": 541,
    "totalTokenCount": 1785,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 19
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 541
      }
    ],
    "thoughtsTokenCount": 1225
  },
  "modelVersion": "gemini-2.5-flash-preview-04-17",
  "createTime": "yyyyyyy",
  "responseId": "xxxxxxx"
}
```


As in the streamed case, reasoning parts carry a `thought` key, so they should be straightforward to extract.
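For the non-streamed case, the extraction could be as small as splitting the first candidate's parts on that key. This is a sketch under the assumption that only the first candidate matters; `GenerateContentResponse` here is a hand-written minimal type, and `splitThoughts` is a hypothetical helper, not an existing SDK function.

```typescript
// Hypothetical minimal types covering only the fields used below.
interface ResponsePart {
  text?: string;
  thought?: boolean;
}

interface GenerateContentResponse {
  candidates?: { content?: { parts?: ResponsePart[] } }[];
}

// Splits the first candidate's parts into reasoning and answer text,
// preserving the order in which the parts appeared.
function splitThoughts(
  response: GenerateContentResponse,
): { reasoning: string[]; text: string[] } {
  const reasoning: string[] = [];
  const text: string[] = [];
  for (const part of response.candidates?.[0]?.content?.parts ?? []) {
    if (typeof part.text !== "string") continue;
    if (part.thought === true) {
      reasoning.push(part.text);
    } else {
      text.push(part.text);
    }
  }
  return { reasoning, text };
}

// Usage with an abbreviated version of the response above.
const split = splitThoughts({
  candidates: [
    {
      content: {
        parts: [
          { text: "**My Selection: The Nativity Facade of the Sagrada Familia**...", thought: true },
          { text: "Okay, drawing from the vast amount of architectural data I've processed, ..." },
        ],
      },
    },
  ],
});
console.log(split.reasoning.length, split.text.length);
```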

