Name and Version
version: 5523 (aa6dff0)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m models/Qwen3-30B-A3B-IQ4_XS.gguf --jinja
Problem description & steps to reproduce
Thinking content should be separated out in streaming responses too, not only in non-streaming ones.
Note: Ideally, we'd stream the thoughts as a reasoning_content delta (now trivial to implement), but for now we are just aiming for compatibility w/ DeepSeek's API (if --reasoning-format deepseek, which is the default).
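For illustration only (this is not llama.cpp code, and all names here are hypothetical): a minimal sketch of the kind of incremental splitting the note above refers to, assuming the model wraps its thoughts in `<think>...</think>` and that a tag can be split across two streamed chunks, so a possible partial tag at the end of a chunk has to be held back.

```python
class ThinkSplitter:
    """Route streamed text into 'reasoning_content' (inside <think>...</think>)
    or 'content' (outside), buffering a possible partial tag across chunks."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self):
        self.in_think = False
        self.pending = ""

    def feed(self, chunk):
        """Return a list of delta dicts for one incoming chunk of text."""
        self.pending += chunk
        out = []
        while True:
            tag = self.CLOSE if self.in_think else self.OPEN
            idx = self.pending.find(tag)
            if idx != -1:
                if idx:
                    out.append(self._delta(self.pending[:idx]))
                self.pending = self.pending[idx + len(tag):]
                self.in_think = not self.in_think
                continue
            # No complete tag: emit everything except a possible tag prefix
            # dangling at the end of the buffer.
            keep = self._dangling_prefix_len(tag)
            emit = self.pending[:len(self.pending) - keep]
            if emit:
                out.append(self._delta(emit))
            self.pending = self.pending[len(self.pending) - keep:]
            return out

    def _dangling_prefix_len(self, tag):
        # Length of the longest suffix of the buffer that is a proper prefix of `tag`.
        for n in range(min(len(tag) - 1, len(self.pending)), 0, -1):
            if self.pending.endswith(tag[:n]):
                return n
        return 0

    def _delta(self, text):
        field = "reasoning_content" if self.in_think else "content"
        return {field: text}


# Example: feeding "<think>Okay" then " done</think>Hi" yields
# [{'reasoning_content': 'Okay'}] and then
# [{'reasoning_content': ' done'}, {'content': 'Hi'}].
```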
I just tested against the official DeepSeek API, and the thoughts are separated there.
Official DeepSeek API (streamed delta):
"choices":[{"index":0,"delta":{"content":null,"reasoning_content":"Okay"},"logprobs":null,"finish_reason":null}]}
llama.cpp server API (streamed delta):
"choices":[{"finish_reason":null,"index":0,"delta":{"content":"<think>Okay"}}]