Description
🔧 Feature Request: Native Prompt Caching & Token Optimization in Flowise v2
❌ Problem
Flowise v2 lacks native support for system prompt persistence across multiple queries without re-injection. This creates unnecessary token consumption, especially in agent workflows involving:
- Search & scraping agents
- Validation agents
- Multi-step reasoning agents
Currently, the only reliable way to preserve the system prompt is to:
- Re-send it on every query (`agentMessages`)
- Or use memory nodes (BufferMemory / BufferWindowMemory)

However, memory nodes also retain user/assistant interactions, which bloats the token context over time and defeats the optimization goal.
📊 Feature Comparison

| Feature | Status | Notes |
| --- | --- | --- |
| `startState` loaded into LLM memory | ❌ | Exists in flow state, but not injected into the LLM prompt context. |
| Native system prompt caching | ❌ | No support for fingerprinting or prompt reuse across multiple queries. |
| OpenAI `system_fingerprint` support | ❌ | Can't leverage OpenAI's native prompt caching to reduce token costs. |
| Selective memory (system prompt only) | ❌ | No option to drop user/assistant messages while retaining the system prompt alone. |
| Memory window size configuration (UI) | ✅ | Can adjust `k` in the BufferWindowMemory node to limit how much memory is retained. |
| Agent memory toggle (UI) | ✅ | Memory can be enabled or disabled per agent in the Flowise UI. |
| Token usage monitoring per agent | ❌ | No visibility or caps on tokens consumed per node or step. |
| Prompt injection only once (query 1) | ❌ | No built-in conditional logic to inject the prompt only on the first query. |
| Prompt TTL / expiry | ❌ | No control over how long memory or prompts persist during a session. |
✅ Proposed Enhancements
1. Sticky Prompt Memory Type
- Memory mode that retains only the initial system prompt
- Automatically drops user/assistant messages after each interaction
- Ideal for long sessions with a static instruction context (see the sketch below)
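A minimal sketch of what such a memory type could look like. The `StickyPromptMemory` class and its method names are hypothetical, not an existing Flowise or LangChain API:

```typescript
// Hypothetical sticky-prompt memory: keeps only the first system prompt
// it sees and discards every user/assistant turn afterwards.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

class StickyPromptMemory {
  private systemPrompt: ChatMessage | null = null;

  // Capture the system prompt once, on the first query; later turns
  // are intentionally not stored.
  saveContext(messages: ChatMessage[]): void {
    if (this.systemPrompt === null) {
      this.systemPrompt = messages.find((m) => m.role === "system") ?? null;
    }
  }

  // Every subsequent query is rebuilt as: cached system prompt + new input.
  buildPrompt(userInput: string): ChatMessage[] {
    return [
      ...(this.systemPrompt ? [this.systemPrompt] : []),
      { role: "user", content: userInput },
    ];
  }
}
```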
2. Prompt Caching & `system_fingerprint` Support
- Allow prompts to be reused across multiple LLM calls without re-sending them
- Compatible with OpenAI's `system_fingerprint` for efficient pricing (see the sketch below)
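For reference, here is where the OpenAI SDK already surfaces these signals; a Flowise node could log or compare them across calls. This is a sketch using the `openai` npm package, and the `callWithFingerprint` wrapper is illustrative:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function callWithFingerprint(systemPrompt: string, userInput: string) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userInput },
    ],
  });

  // system_fingerprint identifies the backend configuration; a change
  // between calls means cached-prompt/determinism assumptions may no
  // longer hold.
  console.log("system_fingerprint:", res.system_fingerprint);

  // With OpenAI's automatic prompt caching, reused prompt prefixes are
  // reported here and billed at a discount.
  console.log(
    "cached prompt tokens:",
    res.usage?.prompt_tokens_details?.cached_tokens ?? 0
  );

  return res.choices[0].message.content;
}
```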
3. Token Budget Control
- Ability to cap token usage per node or per agent
- Alerts or guards when the token budget is exceeded (see the sketch below)
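A rough sketch of how a per-node budget guard could work. All names here are hypothetical; a real version would hook into the usage data Flowise already receives from LLM callbacks:

```typescript
// Hypothetical per-agent/per-node token budget guard.
class TokenBudgetExceededError extends Error {}

class TokenBudget {
  private used = 0;

  constructor(private readonly maxTokens: number) {}

  // Record usage.total_tokens after each LLM response; throw (or emit
  // an alert) once the cap is crossed.
  record(totalTokens: number): void {
    this.used += totalTokens;
    if (this.used > this.maxTokens) {
      throw new TokenBudgetExceededError(
        `Token budget of ${this.maxTokens} exceeded (used ${this.used}).`
      );
    }
  }
}

// Usage: one budget per agent, fed from each response's usage block,
// e.g. budget.record(res.usage?.total_tokens ?? 0).
const budget = new TokenBudget(50_000);
budget.record(1_234);
```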
4. Memory TTL / Expiration
- Expire memory context after `X` minutes or `N` queries
- Prevents silent prompt reuse in long, unrelated sessions (see the sketch below)
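A sketch of the expiry logic; the `ExpiringPromptCache` class and its `ttlMs` / `maxQueries` parameters are illustrative:

```typescript
// Hypothetical TTL'd prompt cache: the stored prompt is invalidated
// after `ttlMs` milliseconds or `maxQueries` reads, whichever comes first.
class ExpiringPromptCache {
  private prompt: string | null = null;
  private storedAt = 0;
  private reads = 0;

  constructor(
    private readonly ttlMs: number,
    private readonly maxQueries: number
  ) {}

  set(prompt: string): void {
    this.prompt = prompt;
    this.storedAt = Date.now();
    this.reads = 0;
  }

  // Returns null once the prompt has expired, forcing re-injection.
  get(): string | null {
    const expired =
      Date.now() - this.storedAt > this.ttlMs || this.reads >= this.maxQueries;
    if (expired) this.prompt = null;
    if (this.prompt !== null) this.reads += 1;
    return this.prompt;
  }
}
```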
📈 Benefits
- Reduces token usage significantly for long-running agents
- Makes Flowise cost-efficient for production environments
- Enables better orchestration of multi-agent flows (search, scrape, validate)
- Aligns with prompt-reuse best practices in LangChain and the OpenAI Assistants API
📌 Request
Would love to see native support for prompt caching and sticky memory in a future Flowise release.
Happy to collaborate or test if needed! Let me know.