Misc. bug: Reasoning content is not separated when streaming #13867

Closed
@Edremon

Description

Name and Version

version: 5523 (aa6dff0)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m models/Qwen3-30B-A3B-IQ4_XS.gguf --jinja

Problem description & steps to reproduce

Reasoning ("thinking") content should be separated into a reasoning_content field when streaming too, as it already is for non-streaming responses.

@ochafik in #12379 said:

Note: Ideally, we'd stream the thoughts as a reasoning_content delta (now trivial to implement), but for now we are just aiming for compatibility w/ DeepSeek's API (if --reasoning-format deepseek, which is the default).

I just tested against the official DeepSeek API, and thoughts are separated there.

Official DeepSeek API (streaming chunk delta):
"choices":[{"index":0,"delta":{"content":null,"reasoning_content":"Okay"},"logprobs":null,"finish_reason":null}]}
llama.cpp server API (streaming chunk delta):
"choices":[{"finish_reason":null,"index":0,"delta":{"content":"<think>Okay"}}]
