Description
🔧 Feature Request: Native Prompt Caching & Token Optimization in Flowise v2
❌ Problem
Flowise v2 lacks native support for system prompt persistence across multiple queries without re-injection. This creates unnecessary token consumption, especially in agent workflows involving:
- Search & scraping agents
- Validation agents
- Multi-step reasoning agents
Currently, the only reliable way to preserve the system prompt is to:
- Re-send it on every query (`agentMessages`)
- Or use memory nodes (BufferMemory / BufferWindowMemory)

However, memory nodes also retain user/assistant interactions, which bloats the token context over time and defeats the optimization goal.
📊 Feature Comparison

| Feature | Status | Notes |
| --- | --- | --- |
| `startState` loaded into LLM memory | ❌ | Exists in flow state, but not injected into the LLM prompt context. |
| Native system prompt caching | ❌ | No support for fingerprinting or prompt reuse across multiple queries. |
| OpenAI `system_fingerprint` support | ❌ | Can't leverage OpenAI's native prompt caching to reduce token costs. |
| Selective memory (system prompt only) | ❌ | No option to drop user/assistant messages while retaining the system prompt alone. |
| Memory window size configuration (UI) | ✅ | Can adjust `k` in the BufferWindowMemory node to limit how much memory is retained. |
| Agent memory toggle (UI) | ✅ | Memory can be enabled or disabled per agent in the Flowise UI. |
| Token usage monitoring per agent | ❌ | No visibility or caps on tokens consumed per node or step. |
| Prompt injection only once (query 1) | ❌ | No built-in conditional logic to inject the prompt only on the first query. |
| Prompt TTL / expiry | ❌ | No control over how long memory or prompts persist during a session. |
✅ Proposed Enhancements
1. Sticky Prompt Memory Type
- Memory mode that retains only the initial system prompt
- Automatically drops user/assistant messages after each interaction
- Ideal for long sessions with a static instruction context (see the sketch below)
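A minimal sketch of what such a memory type could look like. The `StickyPromptMemory` class and its method names are hypothetical, not an existing Flowise or LangChain API:

```typescript
// Hypothetical sticky-prompt memory: keeps only the first system prompt
// it sees and discards every user/assistant turn afterwards.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

class StickyPromptMemory {
  private systemPrompt: ChatMessage | null = null;

  // Capture the system prompt once, on the first query; later turns
  // are intentionally not stored.
  saveContext(messages: ChatMessage[]): void {
    if (this.systemPrompt === null) {
      this.systemPrompt = messages.find((m) => m.role === "system") ?? null;
    }
  }

  // Every subsequent query is rebuilt as: cached system prompt + new input.
  buildPrompt(userInput: string): ChatMessage[] {
    return [
      ...(this.systemPrompt ? [this.systemPrompt] : []),
      { role: "user", content: userInput },
    ];
  }
}
```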
2. Prompt Caching & `system_fingerprint` Support
- Allow prompts to be reused across multiple LLM calls without re-sending them
- Compatible with OpenAI's `system_fingerprint` for efficient pricing (see the sketch below)
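For reference, here is where the OpenAI SDK already surfaces these signals; a Flowise node could log or compare them across calls. This is a sketch using the `openai` npm package, and the `callWithFingerprint` wrapper is illustrative:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function callWithFingerprint(systemPrompt: string, userInput: string) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userInput },
    ],
  });

  // system_fingerprint identifies the backend configuration; a change
  // between calls means cached-prompt/determinism assumptions may no
  // longer hold.
  console.log("system_fingerprint:", res.system_fingerprint);

  // With OpenAI's automatic prompt caching, reused prompt prefixes are
  // reported here and billed at a discount.
  console.log(
    "cached prompt tokens:",
    res.usage?.prompt_tokens_details?.cached_tokens ?? 0
  );

  return res.choices[0].message.content;
}
```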
3. Token Budget Control
- Ability to cap token usage per node or per agent
- Alerts or guards when the token budget is exceeded (see the sketch below)
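A rough sketch of how a per-node budget guard could work. All names here are hypothetical; a real version would hook into the usage data Flowise already receives from LLM callbacks:

```typescript
// Hypothetical per-agent/per-node token budget guard.
class TokenBudgetExceededError extends Error {}

class TokenBudget {
  private used = 0;

  constructor(private readonly maxTokens: number) {}

  // Record usage.total_tokens after each LLM response; throw (or emit
  // an alert) once the cap is crossed.
  record(totalTokens: number): void {
    this.used += totalTokens;
    if (this.used > this.maxTokens) {
      throw new TokenBudgetExceededError(
        `Token budget of ${this.maxTokens} exceeded (used ${this.used}).`
      );
    }
  }
}

// Usage: one budget per agent, fed from each response's usage block,
// e.g. budget.record(res.usage?.total_tokens ?? 0).
const budget = new TokenBudget(50_000);
budget.record(1_234);
```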
4. Memory TTL / Expiration
- Expire memory context after `X` minutes or `N` queries
- Prevents silent prompt reuse in long, unrelated sessions (see the sketch below)
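A sketch of the expiry logic; the `ExpiringPromptCache` class and its `ttlMs` / `maxQueries` parameters are illustrative:

```typescript
// Hypothetical TTL'd prompt cache: the stored prompt is invalidated
// after `ttlMs` milliseconds or `maxQueries` reads, whichever comes first.
class ExpiringPromptCache {
  private prompt: string | null = null;
  private storedAt = 0;
  private reads = 0;

  constructor(
    private readonly ttlMs: number,
    private readonly maxQueries: number
  ) {}

  set(prompt: string): void {
    this.prompt = prompt;
    this.storedAt = Date.now();
    this.reads = 0;
  }

  // Returns null once the prompt has expired, forcing re-injection.
  get(): string | null {
    const expired =
      Date.now() - this.storedAt > this.ttlMs || this.reads >= this.maxQueries;
    if (expired) this.prompt = null;
    if (this.prompt !== null) this.reads += 1;
    return this.prompt;
  }
}
```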
📈 Benefits
- Reduces token usage significantly for long-running agents
- Makes Flowise cost-efficient for production environments
- Enables better orchestration of multi-agent flows (search, scrape, validate)
- Aligns with prompt-reuse best practices in LangChain and the OpenAI Assistants API
📌 Request
Would love to see native support for prompt caching and sticky memory in a future Flowise release.
Happy to collaborate or test if needed! Let me know.