Agent Memory

AI agents in Tensorify can maintain conversation context across messages using memory plugins. This guide explains when to use each memory type, how to configure session keys, and patterns for production deployments.

Memory Types

Tensorify provides two memory plugins:

Memory Type	Best For	Storage	Recall Method
Window Memory	Short conversations, prototyping	Workflow state (SQLite)	Last N messages (chronological)
Qdrant Memory	Long-term recall, support agents	Qdrant vector database	Recent + semantic search (hybrid)

When to Use Each

Use Window Memory when:

You're building a prototype or simple chatbot
Conversations are short (under ~20 messages)
You don't need to recall information from past conversations
You want zero external dependencies

Use Qdrant Memory when:

The agent needs to recall information from days or weeks ago
Users return for multiple sessions and expect the agent to remember past context
You need semantic search ("what did the user say about their billing issue?")
You're building a production support agent or knowledge worker

Connecting Memory to an Agent

Add a memory node (Window Memory or Qdrant Memory) to the canvas
Connect it to the AI Agent's Memory handle — a dashed purple edge appears
Configure the memory settings (window size, session key, etc.)

Only one memory provider can be connected to an agent at a time.

Session Keys

Session keys isolate conversations between different users or threads. Without a session key, all messages share a single memory buffer.

Common Patterns

Pattern	Session Key	Use Case
Per-user	`user:{{ api_request.body.user_id }}`	Each user has their own conversation history
Per-thread	`thread:{{ api_request.body.thread_id }}`	Multiple conversations per user (like Slack threads)
Per-session	`session:{{ webhook.headers.x-session-id }}`	Ephemeral sessions that can be reset
Global (shared)	`global`	All users share the same memory (rare)

When using the OpenAI Chat protocol on the API trigger, the session ID from the X-Tensorify-Session-Id header is available as api_request.headers["x-tensorify-session-id"]. The Playground Chat mode sends this automatically.

Always use session keys in production. Without them, User A's messages become part of User B's context — a privacy issue and a confusing experience.

Session Key on Agent vs Memory

Both the AI Agent and memory plugins have a sessionKey setting. If set on the Agent, it's passed to the memory provider automatically. If set on both, the memory plugin's key takes precedence.

Recommendation: set sessionKey on the memory plugin for clarity.

Default Session Key

When no sessionKey is set on either the agent or the memory plugin, the agent falls back to agent_memory:{workflow_id}. This means all users share a single conversation buffer — always set a session key in production.

Window Memory Deep Dive

Window Memory keeps a sliding window of the most recent messages. When the window exceeds windowSize, the oldest messages are dropped.

How it maps to LLM context:

[system prompt] + [last N messages from memory] + [current user message]
         ↓                    ↓                           ↓
   Fixed context     Window Memory output        New input

Choosing window size:

5–10: Very short conversations, saves tokens
15–20: Good default for most chatbots
30–50: Long technical discussions, but watch token limits
50+: Risk exceeding context window — use Qdrant Memory instead

Persistence modes:

persistent: true — memory saved to workflow state, survives restarts
persistent: false — in-memory only, resets on restart (useful for testing)

Qdrant Memory Deep Dive

Qdrant Memory uses a hybrid strategy combining recent messages with semantic search. This means the agent always has both immediate context and relevant historical context.

How hybrid recall works:

Current message: "What was the refund policy we discussed?"
                          ↓
    ┌─────────────────────┴─────────────────────┐
    │                                           │
    ▼                                           ▼
Recent Window (last 4)                   Semantic Search (top 5)
  "Hi, I need help"                     "Our refund policy is 30 days"
  "What's your return policy?"          "You mentioned wanting a refund"
  "I bought item #123"                  "The refund was processed on..."
  "It arrived damaged"                  
                          ↓
                   Merged + Deduplicated
                          ↓
               Final context for the LLM

Tuning parameters:

recentWindow: 4 — enough for immediate conversational flow
topK: 5 — retrieves the most relevant past messages
Increase topK for agents that need to recall many past details
Decrease recentWindow if you want to rely more on semantic relevance

No Memory (Stateless)

If no memory plugin is connected, the agent is stateless — each message is processed independently. This is fine for:

One-shot tasks (summarization, classification)
Workflows where the trigger always sends the full context
High-throughput scenarios where you want minimal overhead

Common Gotchas

Token limits: Memory content counts toward the LLM's context window. A window of 50 messages with long responses can easily exceed 128K tokens.
Memory + structured output: If using outputSchema, memory messages still use the LLM's context but don't affect the output schema validation.
Clearing memory: To reset memory for a session, use a new session key value. There is no dedicated API to clear memory — changing the key effectively starts a fresh conversation.
Cross-workflow memory: Memory is scoped to the workflow. Two different workflows with the same session key have separate memory stores (unless both use the same Qdrant collection).