Remove old tool outputs from previous responses to save tokens & context? #1050
Replies: 1 comment
-
To give a specific example, I'm following this official smolagents tutorial (https://huggingface.co/learn/agents-course/en/unit2/smolagents/code_agents#lets-see-some-examples), and I'm on this example task:
The model searches once, thinks "Hmm, this is all party music, I should search for more appropriate formal music", and searches again. At this point the old search results for party music are taking up ~8k tokens of context, when we can be reasonably sure they can be removed, since the agent is now performing a new search. You might argue the results of the old search can still be useful for completing the final response ("I also found some party music if you want to lighten the vibe...") but I think the dev should decide whether they want to make that tradeoff.
-
Hi,
I'm trying to understand if there's a simple way to remove old tool outputs (or perhaps truncate them to some max size) in future requests, after the output has already been used.
I guess it's not always possible to say "the agent is done using the output". But in many flows, the code author might know for sure that a tool's message will never be relevant except for the first time it is emitted. So tokens can be saved (and context can be shortened) in those cases by pruning the old tool calls.
Is there already some flag for this? If not, what's the recommended pattern for hooking into the transcript and invoking some pruning behavior?
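To make the question concrete, here is a minimal sketch of the kind of pruning I have in mind. It is not smolagents API: `Step` is a hypothetical stand-in for a transcript step, and `prune_old_observations`, `keep_last`, and `max_chars` are names I made up for illustration. It truncates the stored tool output on every step except the most recent ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """Hypothetical stand-in for one agent step in the transcript."""
    step_number: int
    observations: Optional[str] = None  # raw tool output recorded for this step


def prune_old_observations(steps: List[Step], keep_last: int = 1, max_chars: int = 200) -> List[Step]:
    """Truncate tool outputs on all but the `keep_last` most recent steps (in place)."""
    cutoff = max(len(steps) - keep_last, 0)
    for step in steps[:cutoff]:
        obs = step.observations
        if obs is not None and len(obs) > max_chars:
            omitted = len(obs) - max_chars
            # Leave a short marker so the model can tell something was removed.
            step.observations = obs[:max_chars] + f"... [{omitted} characters pruned]"
    return steps


if __name__ == "__main__":
    transcript = [
        Step(1, "party music result " * 400),   # large, stale search output
        Step(2, "formal music result " * 400),  # latest search output, kept whole
    ]
    prune_old_observations(transcript, keep_last=1, max_chars=80)
    for s in transcript:
        print(s.step_number, len(s.observations))
```

My understanding is that smolagents agents accept a `step_callbacks` argument and keep their steps in `agent.memory.steps`, with tool output stored on each step's `observations` attribute, so a callback could apply the same loop there. I haven't confirmed that against the current version, so treat that wiring as an assumption.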