
Feat/anthropic extended ttl #6205


Open
wants to merge 10 commits into main

Conversation

@md2k commented Jun 19, 2025

Description

Implements granular per-message-type caching for Anthropic models to improve token efficiency in Agent mode. Adds new CacheBehavior options to specify how many of each message type to cache (user messages, tool results, assistant tool calls, etc.) instead of only caching the last 2 user messages.
This is related to issue #6135.
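
For illustration, a minimal sketch of what such a configuration could look like (the per-type field names are assumptions based on the options discussed in the review below; `useExtendedCacheTtlBeta` and `cacheTtl` come from this PR's schema changes):

```
// Hypothetical cacheBehavior configuration with per-message-type limits.
const cacheBehavior = {
  cacheSystemMessage: true,      // existing option: cache the system message
  cacheConversation: true,       // existing option: cache conversation history
  cacheUserMessages: 2,          // cache the last 2 user messages
  cacheToolResults: 2,           // cache the last 2 tool-result messages
  cacheAssistantMessages: 1,     // cache the last assistant tool-call message
  useExtendedCacheTtlBeta: true, // opt in to Anthropic's extended caching beta
  cacheTtl: "1h",                // "5m" or "1h"
};
```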

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Screenshots

N/A - Backend caching enhancement with no visual changes.

Tests

Added a comprehensive test suite, `core/llm/llms/Anthropic.enhanced-caching.test.ts`, with 6 test cases covering:

  • Tool result message caching
  • Assistant tool call message caching
  • Per-type caching limits validation
  • Disabled caching behavior
  • Fallback TTL handling
  • Core shouldCacheMessage logic

All tests pass and validate the new per-type caching functionality while maintaining backward compatibility.
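
As an illustration of the per-type limit logic, a Jest-style sketch (the `shouldCacheMessage` signature below is an assumption made for the example, not the actual code in the test file):

```
// Assumed signature, declared here only so the sketch is self-contained.
declare function shouldCacheMessage(
  message: { role: string },
  behavior: { cacheToolResults?: number },
  cachedSoFar: { toolResults: number },
): boolean;

test("caches only the configured number of tool results", () => {
  const behavior = { cacheToolResults: 1 };
  // Walking backward from the newest message, tool results within the
  // configured limit are cached; anything beyond it is not.
  expect(shouldCacheMessage({ role: "tool" }, behavior, { toolResults: 0 })).toBe(true);
  expect(shouldCacheMessage({ role: "tool" }, behavior, { toolResults: 1 })).toBe(false);
});
```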

[screenshot: test results]

md2k added 5 commits June 19, 2025 22:52
Added extra optional parameters to `cacheBehaviorSchema`:
```
  // Opt in to Anthropic's beta extended prompt caching.
  useExtendedCacheTtlBeta: z.boolean().optional(),
  // Cache time-to-live: "5m" (Anthropic's default) or "1h" (beta).
  cacheTtl: z.enum(["5m", "1h"]).optional(),
```
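
A minimal sketch of how these additions behave at config-load time (the two pre-existing fields are taken from the `CacheBehavior` interface in this PR's diff; the exact shape of `cacheBehaviorSchema` is otherwise assumed):

```
import { z } from "zod";

// Assumed full shape of the schema after this commit.
const cacheBehaviorSchema = z.object({
  cacheSystemMessage: z.boolean().optional(),
  cacheConversation: z.boolean().optional(),
  useExtendedCacheTtlBeta: z.boolean().optional(),
  cacheTtl: z.enum(["5m", "1h"]).optional(),
});

// Valid: opts in to the beta and requests the 1-hour TTL.
cacheBehaviorSchema.parse({ useExtendedCacheTtlBeta: true, cacheTtl: "1h" });

// Invalid: "2h" is not in the enum, so parsing fails at config load.
console.log(cacheBehaviorSchema.safeParse({ cacheTtl: "2h" }).success); // false
```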
md2k requested a review from a team as a code owner June 19, 2025 22:54
md2k requested review from sestinj and removed request for a team June 19, 2025 22:54

netlify bot commented Jun 19, 2025

👷 Deploy request for continuedev pending review.

Visit the deploys page to approve it.

🔨 Latest commit: e3140dd

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Jun 19, 2025

github-actions bot commented Jun 19, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.


recurseml bot commented Jun 19, 2025

😱 Found 3 issues. Time to roll up your sleeves! 😱

@md2k (Author) commented Jun 19, 2025

I have read the CLA Document and I hereby sign the CLA

@md2k (Author) commented Jun 19, 2025

Some details on how a long session with a big context looks, and what it costs, with the 5-minute cache vs. the 1-hour cache:
5-minute cache:
[screenshots: session token usage and cost with the 5-minute cache]

@md2k (Author) commented Jun 19, 2025

1-hour TTL:
[screenshots: session token usage and cost with the 1-hour cache]

dosubot bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) and removed the size:L label Jun 19, 2025
@sestinj (Contributor) left a comment

This is a great PR as far as the code goes. I kind of want to step back, though, to better understand whether you think this could be a sensible default rather than a configuration option. I'm wary of too many options, and if everyone would benefit from the way you are configuring your Anthropic models, maybe we should just ship that as the default. (I repeated all this in a comment below.)

@@ -62,18 +62,40 @@ Anthropic currently does not offer any reranking models.

Anthropic supports [prompt caching with Claude](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching), which allows Claude models to cache system messages and conversation history between requests to improve performance and reduce costs.

> **NOTE:** As part of their `Beta` support, Anthropic offers [Extended caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#1-hour-cache-duration) with a 1-hour cache duration.
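
For reference, a minimal sketch (not this PR's code) of how the beta is exercised against the raw Messages API: each `cache_control` block gains an optional `ttl`, and the request carries Anthropic's beta header (header name per Anthropic's docs at the time of writing; treat it as an assumption):

```
// Sketch of a Messages API call opting in to the 1-hour cache TTL beta.
async function callWithExtendedCache(apiKey: string): Promise<Response> {
  return fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "anthropic-beta": "extended-cache-ttl-2025-04-11", // beta opt-in
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-3-5-sonnet-latest",
      max_tokens: 1024,
      system: [
        {
          type: "text",
          text: "You are a helpful coding assistant.",
          // Without "ttl", cached blocks default to the 5-minute TTL.
          cache_control: { type: "ephemeral", ttl: "1h" },
        },
      ],
      messages: [{ role: "user", content: "Hello" }],
    }),
  });
}
```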
Review comment (Contributor):
The docs here feel a bit extensive since they take up most of this page now. I think we should either make a collapsible block or create a dedicated page for prompt caching. If possible, I think the collapsible would be the better option.

@@ -927,6 +927,13 @@ export interface RequestOptions {
export interface CacheBehavior {
  cacheSystemMessage?: boolean;
  cacheConversation?: boolean;
  useExtendedCacheTtlBeta?: boolean;
Review comment (Contributor):

I'm coming at this review with the lens of "if we add it now, we'll have to support it forever (or go through a deliberate deprecation process)". I'm worried there are a large number of options here that aren't going to be relevant forever, or that they might not be the final form of this configuration.

It would be helpful to better understand whether all of the `cacheUserMessages`, `cacheAssistantMessages`, etc. options are truly necessary for people to customize, or whether we just need to set a more sensible default. For example, I'd be curious what values you set here and whether you think we should just ship those as the defaults for everyone. Usage patterns in Continue are probably similar across a variety of users. Not that we couldn't eventually also allow this customization, but it might save a lot of maintenance (and give many users money back without needing to configure anything).
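
For context, a minimal sketch of the two shapes under discussion (all names and default values here are illustrative, not a final API):

```
// Option A (this PR): user-configurable per-message-type limits.
interface GranularCacheBehavior {
  cacheUserMessages?: number;      // how many recent user messages to cache
  cacheToolResults?: number;       // how many recent tool results to cache
  cacheAssistantMessages?: number; // how many recent assistant tool calls to cache
  useExtendedCacheTtlBeta?: boolean;
  cacheTtl?: "5m" | "1h";
}

// Option B (the reviewer's suggestion): ship fixed, sensible defaults so
// every user benefits without any configuration.
const DEFAULT_CACHE_LIMITS = {
  cacheUserMessages: 2, // mirrors the previous "last 2 user messages" behavior
  cacheToolResults: 2,
  cacheAssistantMessages: 1,
} as const;
```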

@chezsmithy (Contributor) commented:

@sestinj maybe we align with this previous PR: #5371

It introduced a single caching setting that controls all the options. Whatever we do here, I should likely bring it to Bedrock as well.

Watching.

@sestinj (Contributor) commented Jun 23, 2025

Agreed, @chezsmithy! Thanks for linking the PR here; that's what I had in mind.

Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
Status: Todo
3 participants