Agent Memory
AI agents in Tensorify can maintain conversation context across messages using memory plugins. This guide explains when to use each memory type, how to configure session keys, and patterns for production deployments.
Tensorify provides two memory plugins:
| Memory Type | Best For | Storage | Recall Method |
|---|---|---|---|
| Window Memory | Short conversations, prototyping | Workflow state (SQLite) | Last N messages (chronological) |
| Qdrant Memory | Long-term recall, support agents | Qdrant vector database | Recent + semantic search (hybrid) |
Use Window Memory when:
- You're building a prototype or simple chatbot
- Conversations are short (under ~20 messages)
- You don't need to recall information from past conversations
- You want zero external dependencies
Use Qdrant Memory when:
- The agent needs to recall information from days or weeks ago
- Users return for multiple sessions and expect the agent to remember past context
- You need semantic search ("what did the user say about their billing issue?")
- You're building a production support agent or knowledge worker
- Add a memory node (Window Memory or Qdrant Memory) to the canvas
- Connect it to the AI Agent's
Memoryhandle — a dashed purple edge appears - Configure the memory settings (window size, session key, etc.)
Only one memory provider can be connected to an agent at a time.
Session keys isolate conversations between different users or threads. Without a session key, all messages share a single memory buffer.
| Pattern | Session Key | Use Case |
|---|---|---|
| Per-user | user:{{ api_request.body.user_id }} | Each user has their own conversation history |
| Per-thread | thread:{{ api_request.body.thread_id }} | Multiple conversations per user (like Slack threads) |
| Per-session | session:{{ webhook.headers.x-session-id }} | Ephemeral sessions that can be reset |
| Global (shared) | global | All users share the same memory (rare) |
When using the OpenAI Chat protocol on the API trigger, the session ID from the X-Tensorify-Session-Id header is available as api_request.headers["x-tensorify-session-id"]. The Playground Chat mode sends this automatically.
Always use session keys in production. Without them, User A's messages become part of User B's context — a privacy issue and a confusing experience.
Both the AI Agent and memory plugins have a sessionKey setting. If set on the Agent, it's passed to the memory provider automatically. If set on both, the memory plugin's key takes precedence.
Recommendation: set sessionKey on the memory plugin for clarity.
When no sessionKey is set on either the agent or the memory plugin, the agent falls back to agent_memory:{workflow_id}. This means all users share a single conversation buffer — always set a session key in production.
Window Memory keeps a sliding window of the most recent messages. When the window exceeds windowSize, the oldest messages are dropped.
How it maps to LLM context:
[system prompt] + [last N messages from memory] + [current user message]
↓ ↓ ↓
Fixed context Window Memory output New input
Choosing window size:
- 5–10: Very short conversations, saves tokens
- 15–20: Good default for most chatbots
- 30–50: Long technical discussions, but watch token limits
- 50+: Risk exceeding context window — use Qdrant Memory instead
Persistence modes:
persistent: true— memory saved to workflow state, survives restartspersistent: false— in-memory only, resets on restart (useful for testing)
Qdrant Memory uses a hybrid strategy combining recent messages with semantic search. This means the agent always has both immediate context and relevant historical context.
How hybrid recall works:
Current message: "What was the refund policy we discussed?"
↓
┌─────────────────────┴─────────────────────┐
│ │
▼ ▼
Recent Window (last 4) Semantic Search (top 5)
"Hi, I need help" "Our refund policy is 30 days"
"What's your return policy?" "You mentioned wanting a refund"
"I bought item #123" "The refund was processed on..."
"It arrived damaged"
↓
Merged + Deduplicated
↓
Final context for the LLM
Tuning parameters:
recentWindow: 4— enough for immediate conversational flowtopK: 5— retrieves the most relevant past messages- Increase
topKfor agents that need to recall many past details - Decrease
recentWindowif you want to rely more on semantic relevance
If no memory plugin is connected, the agent is stateless — each message is processed independently. This is fine for:
- One-shot tasks (summarization, classification)
- Workflows where the trigger always sends the full context
- High-throughput scenarios where you want minimal overhead
- Token limits: Memory content counts toward the LLM's context window. A window of 50 messages with long responses can easily exceed 128K tokens.
- Memory + structured output: If using
outputSchema, memory messages still use the LLM's context but don't affect the output schema validation. - Clearing memory: To reset memory for a session, use a new session key value. There is no dedicated API to clear memory — changing the key effectively starts a fresh conversation.
- Cross-workflow memory: Memory is scoped to the workflow. Two different workflows with the same session key have separate memory stores (unless both use the same Qdrant collection).
- AI Agent Reference — full agent settings
- Window Memory Reference — sliding window memory
- Qdrant Memory Reference — vector store memory
- Build an AI Agent — complete build guide
