
[FEATURE] Native Prompt Caching & Token Optimization #4634

Open

@saakai-dev

Description

🧠 Feature Request: Native Prompt Caching & Token Optimization in Flowise v2

❗ Problem

Flowise v2 lacks native support for system prompt persistence across multiple queries without re-injection. This creates unnecessary token consumption, especially in agent workflows involving:

  • Search & scraping agents
  • Validation agents
  • Multi-step reasoning agents

Currently, the only reliable ways to preserve the system prompt are to:

  • Re-send it on every query (agentMessages), or
  • Use memory nodes (BufferMemory / BufferWindowMemory)

However, memory nodes also retain user/assistant interactions, which bloats the token context over time and defeats the optimization goal. The sketch below illustrates the first workaround and its recurring cost.
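
As a minimal sketch, assuming the official `openai` Node SDK (the `SYSTEM_PROMPT` constant and `ask` helper are hypothetical): the static system prompt is serialized into every request, so its tokens are billed on every single query.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Static instruction context -- identical on every call, yet re-sent and
// re-billed each time because there is no way to pin it across queries.
const SYSTEM_PROMPT = "You are a validation agent. Check every claim ...";

async function ask(userQuery: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // re-injected on every query
      { role: "user", content: userQuery },
    ],
  });
  return res.choices[0].message.content ?? "";
}
```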


📊 Feature Comparison

| Feature | Status | Notes |
|---------|--------|-------|
| `startState` loaded into LLM memory | ❌ | Exists in flow state, but not injected into the LLM prompt context. |
| Native system prompt caching | ❌ | No support for fingerprinting or prompt reuse across multiple queries. |
| OpenAI `system_fingerprint` support | ❌ | Can't leverage OpenAI's native prompt caching to reduce token costs. |
| Selective memory (system prompt only) | ❌ | No option to drop user/assistant messages while retaining the system prompt alone. |
| Memory window size configuration (UI) | ✅ | Can adjust `k` in the BufferWindowMemory node to limit how much memory is retained. |
| Agent memory toggle (UI) | ✅ | Memory can be enabled or disabled per agent using the Flowise UI. |
| Token usage monitoring per agent | ❌ | No visibility or caps on tokens consumed per node or step. |
| Prompt injection only once (query 1) | ❌ | No built-in condition logic to inject the prompt only on the first query. |
| Prompt TTL / expiry | ❌ | No control over how long memory or prompts persist during a session. |


✅ Proposed Enhancements

1. Sticky Prompt Memory Type

  • Memory mode that only retains the initial system prompt
  • Automatically drops user/assistant messages after each interaction
  • Ideal for long sessions with a static instruction context (a minimal sketch follows below)
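
A minimal sketch of what this could look like, written as a plain TypeScript class rather than a real Flowise node (`StickyPromptMemory` and its methods are hypothetical, not an existing API):

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

class StickyPromptMemory {
  constructor(private readonly systemPrompt: string) {}

  // Build the context for the next call: always just the pinned system
  // prompt plus the current query. Earlier turns are never included.
  getMessages(userQuery: string): ChatMessage[] {
    return [
      { role: "system", content: this.systemPrompt },
      { role: "user", content: userQuery },
    ];
  }

  // Intentionally a no-op: unlike BufferMemory, past user/assistant turns
  // are dropped, so context size stays constant for the whole session.
  saveContext(_input: string, _output: string): void {}
}
```

Because `saveContext` discards everything, the context sent to the LLM stays the same size no matter how long the session runs.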

2. Prompt Caching & system_fingerprint Support

  • Allow prompts to be reused across multiple LLM calls without re-sending
  • Compatible with OpenAI's `system_fingerprint`, so cached-prompt pricing can be verified (see the sketch below)
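
For reference, a sketch of how this could be verified today with the official `openai` Node SDK: `system_fingerprint` is returned on each chat completion, and recent SDK versions also expose `usage.prompt_tokens_details.cached_tokens` when OpenAI's automatic prompt caching is hit (field availability varies by model and SDK version):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

async function queryWithCacheStats(
  messages: OpenAI.Chat.ChatCompletionMessageParam[]
) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
  });
  // The same fingerprint across calls suggests the same backend config
  // served them; cached_tokens shows how much of the prompt was reused.
  console.log("system_fingerprint:", res.system_fingerprint);
  console.log(
    "cached prompt tokens:",
    res.usage?.prompt_tokens_details?.cached_tokens ?? 0
  );
  return res;
}
```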

3. Token Budget Control

  • Ability to cap token usage per node or per agent
  • Alerts or hard guards when the token budget is exceeded (a rough sketch follows)
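
A rough sketch of the guard (`TokenBudget` is hypothetical, not an existing Flowise construct; the counts would come from each LLM response's `usage` field):

```typescript
class TokenBudgetExceededError extends Error {}

class TokenBudget {
  private used = 0;

  constructor(private readonly maxTokens: number) {}

  // Record usage after each LLM call and fail fast once the cap is hit.
  record(promptTokens: number, completionTokens: number): void {
    this.used += promptTokens + completionTokens;
    if (this.used > this.maxTokens) {
      throw new TokenBudgetExceededError(
        `Token budget exceeded: ${this.used}/${this.maxTokens}`
      );
    }
  }

  get remaining(): number {
    return Math.max(0, this.maxTokens - this.used);
  }
}

// Example: cap one agent at 50k tokens, then feed in real usage numbers.
// const budget = new TokenBudget(50_000);
// budget.record(res.usage!.prompt_tokens, res.usage!.completion_tokens);
```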

4. Memory TTL / Expiration

  • Expire memory context after X minutes or N queries
  • Prevents silent prompt reuse in long, unrelated sessions (sketched below)
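
A sketch of the expiry logic, combining a wall-clock TTL with a query-count limit (all names here are hypothetical):

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

class ExpiringMemory {
  private messages: ChatMessage[] = [];
  private createdAt = Date.now();
  private queryCount = 0;

  constructor(
    private readonly ttlMs: number,     // e.g. 10 * 60 * 1000 for 10 minutes
    private readonly maxQueries: number // e.g. expire after 20 queries
  ) {}

  // Called once per query: wipes the context if either limit was crossed,
  // preventing silent prompt reuse in long, unrelated sessions.
  load(): ChatMessage[] {
    this.queryCount += 1;
    if (
      Date.now() - this.createdAt > this.ttlMs ||
      this.queryCount > this.maxQueries
    ) {
      this.reset();
    }
    return this.messages;
  }

  save(msg: ChatMessage): void {
    this.messages.push(msg);
  }

  private reset(): void {
    this.messages = [];
    this.createdAt = Date.now();
    this.queryCount = 0;
  }
}
```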

🚀 Benefits

  • Reduces token usage significantly for long-running agents
  • Makes Flowise cost-efficient for production environments
  • Enables better orchestration of multi-agent flows (search, scrape, validate)
  • Aligns with best practices in LangChain / OpenAI Assistants API prompt reuse

πŸ™ Request

Would love to see native support for prompt caching and sticky memory in a future Flowise release.
Happy to collaborate or test if needed! Let me know.
