
support for bedrock prompt caching #750


Open
eluo28 opened this issue May 24, 2025 · 0 comments

eluo28 commented May 24, 2025

Please read this first

  • Have you read the custom model provider docs, including the 'Common issues' section? Model provider docs
  • Have you searched for related issues? Others may have faced similar issues.

Describe the question

I don't believe prompt caching is supported when Bedrock is used as the model provider. I tested the same prompt both through the Bedrock Converse API directly, which reported prompt cache tokens used, and through the Agents SDK; the Agents SDK never returned >0 cached tokens when the same request was run again immediately afterwards.
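
For reference, the direct Converse call looks roughly like the following (a minimal sketch assuming boto3; the region, model ID, and prompts are placeholders, and the cachePoint block marks the system prompt as cacheable):

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # placeholder region

long_system_prompt = "some prompt long enough to exceed the minimum cacheable token count"

def converse_once():
    return client.converse(
        modelId="anthropic.claude-3-5-haiku-20241022-v1:0",  # placeholder model ID
        system=[
            {"text": long_system_prompt},
            {"cachePoint": {"type": "default"}},  # cache the prompt prefix up to this point
        ],
        messages=[{"role": "user", "content": [{"text": "hello"}]}],
    )

first = converse_once()
second = converse_once()  # repeated immediately; usage should show cacheReadInputTokens > 0

for label, response in (("first", first), ("second", second)):
    # usage includes cacheReadInputTokens / cacheWriteInputTokens when caching applies
    print(label, response["usage"])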

Debug information

  • Agents SDK version: (e.g. v0.0.3)
  • Python version (e.g. Python 3.10)

Repro steps

Ideally provide a minimal python script that can be run to reproduce the issue.

from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel

agent = Agent(
    name="big prompt agent",
    instructions="some prompt that needs prompt caching > token requirement",
    model=LitellmModel(
        # BedrockModelIdentifier is a user-defined enum of Bedrock model IDs
        model=f"bedrock/{BedrockModelIdentifier.CLAUDE35_HAIKU}",
    ),
)

prompt = "hello"  # placeholder user message
result = Runner.run_sync(agent, prompt)

Method 1: Get total usage from context wrapper

total_usage = result.context_wrapper.usage
print("First request usage:")
print(
    total_usage.input_tokens_details,
    total_usage.output_tokens_details,
    total_usage.input_tokens,
    total_usage.output_tokens,
)

result2 = Runner.run_sync(agent, prompt)

print("\nSecond request usage (should show cached tokens):")
total_usage2 = result2.context_wrapper.usage
print(
    total_usage2.input_tokens_details,
    total_usage2.output_tokens_details,
    total_usage2.input_tokens,
    total_usage2.output_tokens,
)
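
To help narrow down whether the gap is in LiteLLM or in the SDK's usage accounting, the same caching request can also be issued through LiteLLM directly. This is a minimal sketch only: it assumes LiteLLM translates cache_control blocks into Bedrock cache points for Claude models and that cache reads surface under usage.prompt_tokens_details.cached_tokens; the model ID and prompts are placeholders.

import litellm

long_system_prompt = "some prompt long enough to exceed the minimum cacheable token count"

def ask(question: str):
    return litellm.completion(
        model="bedrock/anthropic.claude-3-5-haiku-20241022-v1:0",  # placeholder model ID
        messages=[
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": long_system_prompt,
                        "cache_control": {"type": "ephemeral"},  # mark the prompt prefix as cacheable
                    }
                ],
            },
            {"role": "user", "content": question},
        ],
    )

first = ask("hello")
second = ask("hello")  # same prefix repeated immediately, so it should read from the cache

for label, response in (("first", first), ("second", second)):
    usage = response.usage
    # cached_tokens is expected to be > 0 on the second call if caching worked end to end
    print(label, usage.prompt_tokens, getattr(usage, "prompt_tokens_details", None))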

Expected behavior

A clear and concise description of what you expected to happen.

total_usage2.input_tokens_details should report cached_tokens > 0 on the second run.

eluo28 added the bug label on May 24, 2025