Build a RAG System

This guide walks you through building a retrieval-augmented generation (RAG) system with Tensorify. By the end, you'll have a deployed AI agent that answers questions using your own documents.

Uses: API Endpoint, AI Agent, Qdrant Memory, HTTP Request, Code, Return
Time: ~20 minutes

What is RAG?

Retrieval-Augmented Generation combines vector search with LLM generation. When a user asks a question, the system:

1. Searches a vector database for relevant documents
2. Injects those documents into the LLM's context
3. Generates an answer grounded in the retrieved content

Tensorify's Qdrant Memory plugin handles steps 1-2 automatically via hybrid retrieval (recent messages + semantic search).
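
The three steps can be sketched in plain Python. This is a toy illustration, not how the Qdrant Memory plugin is implemented: the embed function is a hypothetical bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the final LLM call is omitted.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, top_k=2):
    """Step 1: rank documents by similarity to the question and keep the best top_k."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

def build_prompt(question, context_docs):
    """Step 2: inject the retrieved documents into the prompt sent to the LLM."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

documents = [
    "Tensorify is a visual backend builder",
    "Qdrant is a vector database",
    "Bananas are yellow",
]
question = "What is Tensorify?"
prompt = build_prompt(question, retrieve(question, documents, top_k=1))
# Step 3 would send `prompt` to an LLM; that call is omitted here.
print(prompt)
```

In a real system the count vectors are replaced by dense embeddings and the sorted() call by a vector database query, but the shape of the loop stays the same.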

Prerequisites

A Tensorify workspace
An OpenAI API key (for the LLM and embeddings)
A Qdrant instance — Qdrant Cloud (free tier available) or self-hosted via Docker:

docker run -p 6333:6333 qdrant/qdrant

Architecture

You'll build two workflows:

Ingestion Workflow: Receives documents via webhook, chunks them, and upserts embeddings into Qdrant
Query Workflow: Receives user questions via an OpenAI-compatible endpoint, retrieves relevant context from Qdrant, and generates answers

Step 1: Set Up Environment Variables

Go to Settings > Environment Variables and add:

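OPENAI_API_KEY: your OpenAI API key (used for embeddings and the agent's LLM)
QDRANT_API_KEY: your Qdrant Cloud API key (skip if self-hosting without auth)
QDRANT_URL: your Qdrant instance URL (the ingestion code in Step 3 falls back to http://localhost:6333 when this is unset)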
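
The Code nodes later in this guide read these values through os.environ. A minimal sketch of a fail-fast check you could place at the top of a Code node, assuming the variables are exposed to the node's process environment as that code implies; missing_env is a hypothetical helper, not part of Tensorify.

```python
import os

def missing_env(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
example_env = {"OPENAI_API_KEY": "sk-example", "QDRANT_API_KEY": ""}
print(missing_env(["OPENAI_API_KEY", "QDRANT_API_KEY"], example_env))
# prints ['QDRANT_API_KEY']
```

An empty QDRANT_API_KEY is only a problem when your Qdrant instance requires authentication, so treat it as a warning rather than an error when self-hosting.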

Step 2: Build the Query Workflow

This is the main workflow users interact with.

Add an API Endpoint trigger:
- Path: /v1
- Protocol: OpenAI Chat
- Allowed Methods: GET, POST
- Auth: bearer-token with a secret of your choice
Add an AI Agent node. Connect the trigger's POST handle to message.
- Provider: OpenAI
- Model: gpt-4o
- System Prompt: You are a helpful assistant. Answer questions based on the context provided by your memory. If the context doesn't contain the answer, say so.
- Streaming: true
Add a Qdrant Memory node. Connect it to the agent's Memory handle.
- Qdrant URL: your Qdrant instance URL (e.g. https://your-cluster.qdrant.io)
- Collection Name: my-documents
- Embedding Model: text-embedding-3-small
- Top K: 5 (retrieves the 5 most relevant chunks)
- Session Key: session:{{ api_request.headers["x-tensorify-session-id"] }}
Add a Return node. Connect the agent's response to it.
Deploy the workflow.

Step 3: Build the Ingestion Workflow

This workflow receives documents and stores them in Qdrant.
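
The upsert step in this workflow writes to a collection named my-documents, and Qdrant rejects upserts into a collection that does not exist. This guide does not say whether the Qdrant Memory plugin creates the collection for you, so the following is a sketch of creating it yourself through Qdrant's REST API, assuming it is not created automatically. The vector size of 1536 matches the default output dimension of text-embedding-3-small; the cosine distance metric and the build_create_collection_request helper are assumptions made for illustration.

```python
import json
from urllib.request import Request

def build_create_collection_request(qdrant_url, collection, vector_size, api_key=""):
    """Build (but do not send) a PUT /collections/{name} request for Qdrant."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key  # only needed when the instance requires auth
    body = {"vectors": {"size": vector_size, "distance": "Cosine"}}
    return Request(
        f"{qdrant_url.rstrip('/')}/collections/{collection}",
        method="PUT",
        headers=headers,
        data=json.dumps(body).encode(),
    )

req = build_create_collection_request("http://localhost:6333", "my-documents", 1536)
print(req.get_method(), req.full_url)
# To actually create the collection: urllib.request.urlopen(req)
```

The vector size must match the embedding model used at both ingestion and query time, so changing the Embedding Model setting means recreating the collection with the new dimension.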

Add a Webhook Trigger with a test payload:

{
  "text": "Tensorify is a visual backend builder...",
  "metadata": { "source": "docs", "page": 1 }
}

Add a Code node connected to the trigger. This chunks the document:

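text = input["body"]["text"]
metadata = input["body"].get("metadata", {})
chunk_size = 500  # characters per chunk
overlap = 50      # characters shared between consecutive chunks

chunks = []
step = chunk_size - overlap  # each window starts 450 characters after the previous one
for i in range(0, len(text), step):
    chunk = text[i:i + chunk_size]
    if chunk.strip():
        chunks.append({
            "text": chunk,
            "metadata": {**metadata, "chunk_index": len(chunks)}
        })
    # Stop once a window reaches the end of the text, so the tail is not
    # emitted again as a short chunk fully contained in the previous one.
    if i + chunk_size >= len(text):
        break

output = {"chunks": chunks}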

Add another Code node to generate embeddings and upsert to Qdrant:

import os, json, uuid
from urllib.request import Request, urlopen

openai_key = os.environ["OPENAI_API_KEY"]
# Falls back to a local Qdrant instance when QDRANT_URL is not set
qdrant_url = os.environ.get("QDRANT_URL", "http://localhost:6333")
qdrant_key = os.environ.get("QDRANT_API_KEY", "")

chunks = input["chunks"]
texts = [c["text"] for c in chunks]

# Generate embeddings via OpenAI (one request for all chunks)
req = Request("https://api.openai.com/v1/embeddings", method="POST",
    headers={"Authorization": f"Bearer {openai_key}", "Content-Type": "application/json"},
    data=json.dumps({"input": texts, "model": "text-embedding-3-small"}).encode())
resp = json.loads(urlopen(req).read())
# Sort by index so each embedding lines up with its chunk
embeddings = [d["embedding"] for d in sorted(resp["data"], key=lambda d: d["index"])]

# Upsert to Qdrant: one point per chunk, with the chunk text and metadata as payload
points = [{"id": str(uuid.uuid4()), "vector": emb, "payload": {"text": c["text"], **c["metadata"]}}
          for emb, c in zip(embeddings, chunks)]

# The api-key header is only sent when QDRANT_API_KEY is set
req = Request(f"{qdrant_url}/collections/my-documents/points", method="PUT",
    headers={"Content-Type": "application/json", **({"api-key": qdrant_key} if qdrant_key else {})},
    data=json.dumps({"points": points}).encode())
urlopen(req)

output = {"upserted": len(points)}

Add a Return node to confirm ingestion.
Deploy and send documents to the webhook URL.
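
Sending a document is an HTTP POST whose JSON body matches the test payload above. A minimal sketch using only the standard library; the webhook URL below is a placeholder (copy the real one from your deployed workflow), build_ingest_request is a hypothetical helper, and any authentication your webhook requires is not shown.

```python
import json
from urllib.request import Request

# Placeholder: replace with the URL shown for your deployed Webhook Trigger.
WEBHOOK_URL = "https://example.invalid/your-webhook-url"

def build_ingest_request(text, metadata=None, url=WEBHOOK_URL):
    """Build (but do not send) a POST with the text/metadata body the Code node reads."""
    body = {"text": text, "metadata": metadata or {}}
    return Request(url, method="POST",
                   headers={"Content-Type": "application/json"},
                   data=json.dumps(body).encode())

req = build_ingest_request("Tensorify is a visual backend builder...",
                           {"source": "docs", "page": 1})
print(req.get_method(), json.loads(req.data)["metadata"])
# To send it: urllib.request.urlopen(req)
```

Each request carries one document; to ingest many, loop over your files and send one request per document so that the metadata stays attached to the right text.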

Step 4: Test with the Playground

Open the Query Workflow's canvas and click the Playground tab. Chat mode activates automatically because the protocol is openai-chat.

Ask questions about your ingested documents. The Qdrant Memory plugin automatically performs semantic search on each message, injecting relevant chunks into the agent's context.
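
If the agent answers as though it has no context, it can help to confirm that ingestion actually stored points. The sketch below builds a request to Qdrant's point-count endpoint for the my-documents collection and parses the response shape Qdrant returns. build_count_request and parse_count are hypothetical helper names, and the URL assumes the local Docker instance from the prerequisites.

```python
import json
from urllib.request import Request

def build_count_request(qdrant_url, collection, api_key=""):
    """Build (but do not send) a POST /collections/{name}/points/count request."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key
    return Request(f"{qdrant_url.rstrip('/')}/collections/{collection}/points/count",
                   method="POST", headers=headers,
                   data=json.dumps({"exact": True}).encode())

def parse_count(response_body):
    """Extract the point count from Qdrant's JSON response."""
    return json.loads(response_body)["result"]["count"]

req = build_count_request("http://localhost:6333", "my-documents")
# With a running instance: parse_count(urllib.request.urlopen(req).read())
sample = '{"result": {"count": 3}, "status": "ok", "time": 0.0}'
print(parse_count(sample))  # prints 3
```

A count of zero points to the ingestion workflow; a non-zero count with poor answers points to the Query Workflow settings, such as a collection name or embedding model that differs from the one used at ingestion.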

Step 5: Connect from Your App

Use any OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://triggers.tensorify.io/h/YOUR_HOOK_ID/v1",
    api_key="your-bearer-token"
)

response = client.chat.completions.create(
    model="rag-agent",
    messages=[{"role": "user", "content": "What is Tensorify?"}],
    stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
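
The Session Key configured in Step 2 reads the x-tensorify-session-id request header, which the call above does not set. A sketch of a raw request that sets it, assuming the endpoint accepts the standard OpenAI path /chat/completions under the base URL; the session value user-123 and the build_chat_request helper are made up for illustration. With the OpenAI Python SDK, the default_headers argument of the OpenAI constructor serves the same purpose.

```python
import json
from urllib.request import Request

BASE_URL = "https://triggers.tensorify.io/h/YOUR_HOOK_ID/v1"

def build_chat_request(question, session_id, token, base_url=BASE_URL):
    """Build (but do not send) a non-streaming chat completion request."""
    body = {
        "model": "rag-agent",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "x-tensorify-session-id": session_id,  # read by the Session Key template
    }
    return Request(f"{base_url}/chat/completions", method="POST",
                   headers=headers, data=json.dumps(body).encode())

req = build_chat_request("What is Tensorify?", "user-123", "your-bearer-token")
print(req.full_url)
```

Reusing the same session value across requests is what lets the memory's "recent messages" half of hybrid retrieval see earlier turns of one conversation.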

Alternative: Explicit RAG Pipeline

For more control over the retrieval step, skip the Qdrant Memory plugin and build an explicit pipeline:

API Trigger → Code (preprocess query) → HTTP Request (search Qdrant API) → Transform (format context) → AI Agent → Return

This gives you full control over query preprocessing, filtering, re-ranking, and context formatting.
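
A sketch of the two middle stages of that pipeline, written as they might appear in Code nodes: building a search request against Qdrant's REST API and formatting the hits as context. It assumes the query has already been embedded with the same model used at ingestion, that payloads carry a text field as written by the ingestion workflow, and that the response has the result, score, and payload shape of Qdrant's points/search endpoint. The helper names and the min_score threshold are hypothetical.

```python
import json
from urllib.request import Request

def build_search_request(qdrant_url, collection, query_vector, top_k=5, api_key=""):
    """Build (but do not send) a POST /collections/{name}/points/search request."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key
    body = {"vector": query_vector, "limit": top_k, "with_payload": True}
    return Request(f"{qdrant_url.rstrip('/')}/collections/{collection}/points/search",
                   method="POST", headers=headers, data=json.dumps(body).encode())

def format_context(hits, min_score=0.0):
    """Drop low-scoring hits and join the remaining chunk texts into one context string."""
    kept = [h for h in hits if h["score"] >= min_score]
    return "\n\n".join(f"[{i + 1}] {h['payload']['text']}" for i, h in enumerate(kept))

# Hits shaped like the "result" array of a search response (values are made up).
sample_hits = [
    {"id": "a", "score": 0.82, "payload": {"text": "Tensorify is a visual backend builder..."}},
    {"id": "b", "score": 0.31, "payload": {"text": "Unrelated chunk"}},
]
print(format_context(sample_hits, min_score=0.5))
```

The score filter is one example of the control this pipeline adds: the formatted string goes into the AI Agent's prompt, so anything dropped here never reaches the model.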