Agent Memory

AI agents in Tensorify can maintain conversation context across messages using memory plugins. This guide explains when to use each memory type, how to configure session keys, and patterns for production deployments.


Memory Types

Tensorify provides two memory plugins:

Memory TypeBest ForStorageRecall Method
Window MemoryShort conversations, prototypingWorkflow state (SQLite)Last N messages (chronological)
Qdrant MemoryLong-term recall, support agentsQdrant vector databaseRecent + semantic search (hybrid)

When to Use Each

Use Window Memory when:

  • You're building a prototype or simple chatbot
  • Conversations are short (under ~20 messages)
  • You don't need to recall information from past conversations
  • You want zero external dependencies

Use Qdrant Memory when:

  • The agent needs to recall information from days or weeks ago
  • Users return for multiple sessions and expect the agent to remember past context
  • You need semantic search ("what did the user say about their billing issue?")
  • You're building a production support agent or knowledge worker

Connecting Memory to an Agent

  1. Add a memory node (Window Memory or Qdrant Memory) to the canvas
  2. Connect it to the AI Agent's Memory handle — a dashed purple edge appears
  3. Configure the memory settings (window size, session key, etc.)

Only one memory provider can be connected to an agent at a time.

Session Keys

Session keys isolate conversations between different users or threads. Without a session key, all messages share a single memory buffer.

Common Patterns

PatternSession KeyUse Case
Per-useruser:{{ api_request.body.user_id }}Each user has their own conversation history
Per-threadthread:{{ api_request.body.thread_id }}Multiple conversations per user (like Slack threads)
Per-sessionsession:{{ webhook.headers.x-session-id }}Ephemeral sessions that can be reset
Global (shared)globalAll users share the same memory (rare)

When using the OpenAI Chat protocol on the API trigger, the session ID from the X-Tensorify-Session-Id header is available as api_request.headers["x-tensorify-session-id"]. The Playground Chat mode sends this automatically.

Always use session keys in production. Without them, User A's messages become part of User B's context — a privacy issue and a confusing experience.

Session Key on Agent vs Memory

Both the AI Agent and memory plugins have a sessionKey setting. If set on the Agent, it's passed to the memory provider automatically. If set on both, the memory plugin's key takes precedence.

Recommendation: set sessionKey on the memory plugin for clarity.

Default Session Key

When no sessionKey is set on either the agent or the memory plugin, the agent falls back to agent_memory:{workflow_id}. This means all users share a single conversation buffer — always set a session key in production.

Window Memory Deep Dive

Window Memory keeps a sliding window of the most recent messages. When the window exceeds windowSize, the oldest messages are dropped.

How it maps to LLM context:

[system prompt] + [last N messages from memory] + [current user message]
         ↓                    ↓                           ↓
   Fixed context     Window Memory output        New input

Choosing window size:

  • 5–10: Very short conversations, saves tokens
  • 15–20: Good default for most chatbots
  • 30–50: Long technical discussions, but watch token limits
  • 50+: Risk exceeding context window — use Qdrant Memory instead

Persistence modes:

  • persistent: true — memory saved to workflow state, survives restarts
  • persistent: false — in-memory only, resets on restart (useful for testing)

Qdrant Memory Deep Dive

Qdrant Memory uses a hybrid strategy combining recent messages with semantic search. This means the agent always has both immediate context and relevant historical context.

How hybrid recall works:

Current message: "What was the refund policy we discussed?"
                          ↓
    ┌─────────────────────┴─────────────────────┐
    │                                           │
    ▼                                           ▼
Recent Window (last 4)                   Semantic Search (top 5)
  "Hi, I need help"                     "Our refund policy is 30 days"
  "What's your return policy?"          "You mentioned wanting a refund"
  "I bought item #123"                  "The refund was processed on..."
  "It arrived damaged"                  
                          ↓
                   Merged + Deduplicated
                          ↓
               Final context for the LLM

Tuning parameters:

  • recentWindow: 4 — enough for immediate conversational flow
  • topK: 5 — retrieves the most relevant past messages
  • Increase topK for agents that need to recall many past details
  • Decrease recentWindow if you want to rely more on semantic relevance

No Memory (Stateless)

If no memory plugin is connected, the agent is stateless — each message is processed independently. This is fine for:

  • One-shot tasks (summarization, classification)
  • Workflows where the trigger always sends the full context
  • High-throughput scenarios where you want minimal overhead

Common Gotchas

  • Token limits: Memory content counts toward the LLM's context window. A window of 50 messages with long responses can easily exceed 128K tokens.
  • Memory + structured output: If using outputSchema, memory messages still use the LLM's context but don't affect the output schema validation.
  • Clearing memory: To reset memory for a session, use a new session key value. There is no dedicated API to clear memory — changing the key effectively starts a fresh conversation.
  • Cross-workflow memory: Memory is scoped to the workflow. Two different workflows with the same session key have separate memory stores (unless both use the same Qdrant collection).

See Also

On this page