
models_usage shows prompt_tokens and completion_tokens as 0 #6548


Closed
quantexperts opened this issue May 16, 2025 · 5 comments · Fixed by #6578


What happened?

Describe the bug
I was previously on v0.5.1 and upgraded to v0.5.6 sometime over the last few weeks. While checking the history data, I noticed that prompt_tokens and completion_tokens under models_usage are reported as 0. They used to show actual token counts, but now I only see zeros.

To Reproduce

```
from autogen_core import CancellationToken

# `agent` and `request` are defined elsewhere in my app
response = await agent.on_messages(
    messages=[request],
    cancellation_token=CancellationToken(),
)
print('>>>>', response.chat_message.json())
```
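
For reference, the usage object can also be read directly off the message rather than parsed out of the serialized JSON (a small sketch; `response` is the object returned above):

```
usage = response.chat_message.models_usage  # a RequestUsage, or None
if usage is not None:
    print(usage.prompt_tokens, usage.completion_tokens)
```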

Expected behavior
Token usage counts should be reported in models_usage.

Screenshots
I see output like:

```
{"source":"assistant","models_usage":{"prompt_tokens":0,"completion_tokens":0},"metadata":{},"content":"Hello! How can I assist you today?","type":"TextMessage"}
```

Which package was the bug in?

Python Core (autogen-core)

AutoGen library version.

Python 0.5.6

Other library version.

No response

Model used

gpt-4o-mini, gemini-2.0-flash

Model provider

OpenAI

Other model provider

Gemini

Python version

3.12

.NET version

None

Operating system

None

ekzhu (Collaborator) commented May 17, 2025

On v0.5.7, I cannot reproduce this:

```
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    client = OpenAIChatCompletionClient(model="gemini-2.0-flash")
    assistant = AssistantAgent(
        "assistant",
        model_client=client,
        system_message="You are a helpful assistant.",
    )
    result = await assistant.run(task="What is the capital of France?")
    print(result)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```

Output:

```
messages=[TextMessage(source='user', models_usage=None, metadata={}, content='What is the capital of France?', type='TextMessage'), TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=13, completion_tokens=8), metadata={}, content='The capital of France is Paris.\n', type='TextMessage')] stop_reason=None
```

ekzhu added the awaiting-op-response label May 17, 2025
quantexperts (Author) commented May 18, 2025

Hi, below is sample code using Azure OpenAI, but I noticed the same behaviour with Gemini as well. Please let me know what I'm missing:

```
from autogen_agentchat.agents import AssistantAgent
from autogen_core.models import ChatCompletionClient, UserMessage
from autogen_agentchat.messages import TextMessage

import importlib.metadata
import os  # needed for os.getenv below

# Print loaded versions
def print_versions():
    packages = ["autogen-agentchat", "autogen-core"]
    for pkg in packages:
        try:
            version = importlib.metadata.version(pkg)
            print(f"{pkg} version: {version}")
        except importlib.metadata.PackageNotFoundError:
            print(f"{pkg} is not installed or version not found")

print_versions()

model_config = {
    'provider': 'autogen_ext.models.openai.AzureOpenAIChatCompletionClient',
    'config': {
        'model': 'gpt-4o-mini',
        'azure_endpoint': os.getenv('AZURE_ENDPOINT_URL'),
        'api_version': '2025-02-01-preview',
        'max_tokens': 500,
        'temperature': 0.1
    }
}
model_client = ChatCompletionClient.load_component(model_config)

async def main():
    agent = AssistantAgent(
        name="assistant",
        model_client=model_client,
        model_client_stream=True,
        system_message="You are a helpful assistant",
        reflect_on_tool_use=True
    )
    request = TextMessage(content="What is the capital of France?", source="user")
    stream = agent.run_stream(task=request)
    async for message in stream:
        if isinstance(message, TextMessage):
            print(message.model_dump())

await main()  # top-level await: running in a notebook
```

Output:

```
autogen-agentchat version: 0.5.7
autogen-core version: 0.5.7
{'source': 'user', 'models_usage': None, 'metadata': {}, 'content': 'What is the capital of France?', 'type': 'TextMessage'}
{'source': 'assistant', 'models_usage': {'prompt_tokens': 0, 'completion_tokens': 0}, 'metadata': {}, 'content': 'The capital of France is Paris.', 'type': 'TextMessage'}
```

github-actions bot removed the awaiting-op-response label May 18, 2025
peterychang (Collaborator) commented

The problem appears to be limited to streaming responses. Investigating.
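
To see the split, here is a minimal A/B sketch (reusing `model_client` and the imports from the snippet above, inside an async context): the non-streaming path reports real counts, while the streaming path reports zeros.

```
# Toggle streaming on an otherwise identical agent and compare the usage
# reported on the assistant's final TextMessage.
for stream in (False, True):
    agent = AssistantAgent(
        "assistant",
        model_client=model_client,
        model_client_stream=stream,
    )
    result = await agent.run(task="What is the capital of France?")
    print(stream, result.messages[-1].models_usage)
```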

peterychang (Collaborator) commented May 22, 2025

The problem here is a missing parameter. As a temporary workaround while I fix the issue, you can add the following parameter on your side:

OpenAI/Azure OpenAI:

```
model_client = OpenAIChatCompletionClient(
    ...,
    stream_options={"include_usage": True},
)
```
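
A fuller end-to-end sketch of the workaround (assumes `OPENAI_API_KEY` is set in the environment; the model name and task are placeholders): with `stream_options` passed through, the streamed response's final message should carry non-zero token counts.

```
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    model_client = OpenAIChatCompletionClient(
        model="gpt-4o-mini",
        stream_options={"include_usage": True},  # the temporary workaround
    )
    agent = AssistantAgent(
        "assistant",
        model_client=model_client,
        model_client_stream=True,  # streaming path, where the zeros appeared
    )
    async for message in agent.run_stream(task="What is the capital of France?"):
        if isinstance(message, TextMessage) and message.models_usage is not None:
            print(message.models_usage)  # expect non-zero prompt/completion tokens

asyncio.run(main())
```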

Baukebrenninkmeijer commented

I was encountering this problem with both streaming and non-streaming calls from agents when using Azure OpenAI. Even if I now disable the stream_options parameter for batch (non-streaming) calls, usage is still returned as 0 tokens. In any case, this workaround solved the problem I was having, so thank you, @peterychang!

peterychang added a commit that referenced this issue May 28, 2025
## Why are these changes needed?

Enables usage statistics for streaming responses by default.

There is a similar bug in the AzureAI client. Theoretically, adding the parameter

```
model_extras={"stream_options": {"include_usage": True}}
```

should fix the problem, but I'm currently unable to test that workflow.
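
For illustration, an untested sketch of that AzureAI workaround (the endpoint, key, model name, and model_info capabilities are placeholders; whether `model_extras` is passed through correctly is exactly what remains unverified):

```
# Hypothetical wiring of the untested AzureAI workaround described above.
from autogen_ext.models.azure import AzureAIChatCompletionClient
from azure.core.credentials import AzureKeyCredential

client = AzureAIChatCompletionClient(
    model="gpt-4o-mini",  # placeholder model name
    endpoint="https://<your-endpoint>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),  # placeholder
    model_info={  # assumed capabilities; adjust for your deployment
        "json_output": True,
        "function_calling": True,
        "vision": True,
        "family": "gpt-4o",
        "structured_output": True,
    },
    model_extras={"stream_options": {"include_usage": True}},  # per the PR note
)
```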

## Related issue number

closes #6548

## Checks

- [ ] I've included any doc changes needed for
<https://microsoft.github.io/autogen/>. See
<https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md> to
build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ ] I've made sure all auto checks have passed.