Build a RAG System
This guide walks you through building a retrieval-augmented generation (RAG) system with Tensorify. By the end, you'll have a deployed AI agent that answers questions using your own documents.
Uses: API Endpoint, AI Agent, Qdrant Memory, HTTP Request, Code, Return
Time: ~20 minutes
Retrieval-Augmented Generation combines vector search with LLM generation. When a user asks a question, the system:
1. Searches a vector database for relevant documents
2. Injects those documents into the LLM's context
3. Generates an answer grounded in the retrieved content
Tensorify's Qdrant Memory plugin handles steps 1-2 automatically via hybrid retrieval (recent messages + semantic search).
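The search, inject, and generate steps can be sketched in plain Python. This is a toy illustration, not how Tensorify or the Qdrant Memory plugin is implemented: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the documents are sample data, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, top_k=2):
    """Step 1: rank documents by similarity to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_docs):
    """Step 2: inject the retrieved documents into the LLM's context."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

documents = [
    "Tensorify is a visual backend builder.",
    "Qdrant is a vector database.",
    "Bananas are rich in potassium.",
]
question = "What is Tensorify?"
top = retrieve(question, documents, top_k=1)
prompt = build_prompt(question, top)
# Step 3 would send `prompt` to the LLM to generate the grounded answer.
print(prompt)
```

In the workflow built in this guide, the vector database and the embedding model replace `embed` and `retrieve`, and the AI Agent node performs the generation step.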
Before you start, you need:
- A Tensorify workspace
- An OpenAI API key (for the LLM and embeddings)
- A Qdrant instance, either on Qdrant Cloud (free tier available) or self-hosted via Docker:
docker run -p 6333:6333 qdrant/qdrant
You'll build two workflows:
- Ingestion Workflow: Receives documents via webhook, chunks them, and upserts embeddings into Qdrant
- Query Workflow: Receives user questions via an OpenAI-compatible endpoint, retrieves relevant context from Qdrant, and generates answers
Go to Settings > Environment Variables and add:
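- OPENAI_API_KEY: your OpenAI API key (used for embeddings and the agent's LLM)
- QDRANT_API_KEY: your Qdrant Cloud API key (skip if self-hosting without auth)
- QDRANT_URL: your Qdrant instance URL (read by the ingestion code below, which falls back to http://localhost:6333 when it is unset)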
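A quick way to confirm the variables are visible is a small check like the one below. It is a sketch that assumes Code nodes read workspace variables through `os.environ`, as the ingestion code later in this guide does; `missing_variables` is a hypothetical helper, not part of Tensorify.

```python
import os

# Names taken from this guide; only the OpenAI key is strictly required.
REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = ["QDRANT_API_KEY", "QDRANT_URL"]

def missing_variables(env, required=REQUIRED):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]

missing = missing_variables(os.environ)
if missing:
    print("Missing required variables:", ", ".join(missing))
else:
    print("All required variables are set.")

for name in OPTIONAL:
    if not os.environ.get(name):
        print(f"Optional variable {name} is not set.")
```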
The Query Workflow is the main workflow users interact with.
- Add an API Endpoint trigger:
  - Path: /v1
  - Protocol: OpenAI Chat
  - Allowed Methods: GET, POST
  - Auth: bearer-token with a secret of your choice
- Add an AI Agent node. Connect the trigger's POST handle to message.
  - Provider: OpenAI
  - Model: gpt-4o
  - System Prompt: You are a helpful assistant. Answer questions based on the context provided by your memory. If the context doesn't contain the answer, say so.
  - Streaming: true
- Add a Qdrant Memory node. Connect it to the agent's Memory handle.
  - Qdrant URL: your Qdrant instance URL (e.g. https://your-cluster.qdrant.io)
  - Collection Name: my-documents
  - Embedding Model: text-embedding-3-small
  - Top K: 5 (retrieves the 5 most relevant chunks)
  - Session Key: session:{{ api_request.headers["x-tensorify-session-id"] }}
- Add a Return node. Connect the agent's response to it.
- Deploy the workflow.
The Ingestion Workflow receives documents and stores them in Qdrant.
- Add a Webhook Trigger with a test payload:
{
  "text": "Tensorify is a visual backend builder...",
  "metadata": { "source": "docs", "page": 1 }
}
- Add a Code node connected to the trigger. This chunks the document:
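text = input["body"]["text"]
metadata = input["body"].get("metadata", {})

chunk_size = 500  # characters per chunk
overlap = 50      # characters shared between consecutive chunks

chunks = []
# Step by chunk_size - overlap so each chunk repeats the last 50 characters of the previous one
for i in range(0, len(text), chunk_size - overlap):
    chunk = text[i:i + chunk_size]
    if chunk.strip():  # skip whitespace-only chunks
        chunks.append({
            "text": chunk,
            "metadata": {**metadata, "chunk_index": len(chunks)}
        })

output = {"chunks": chunks}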
- Add another Code node to generate embeddings and upsert to Qdrant:
import json
import os
import uuid
from urllib.request import Request, urlopen

openai_key = os.environ["OPENAI_API_KEY"]
qdrant_url = os.environ.get("QDRANT_URL", "http://localhost:6333")
qdrant_key = os.environ.get("QDRANT_API_KEY", "")

chunks = input["chunks"]
texts = [c["text"] for c in chunks]

# Generate embeddings via OpenAI (one request for all chunks)
req = Request(
    "https://api.openai.com/v1/embeddings",
    method="POST",
    headers={"Authorization": f"Bearer {openai_key}", "Content-Type": "application/json"},
    data=json.dumps({"input": texts, "model": "text-embedding-3-small"}).encode(),
)
resp = json.loads(urlopen(req).read())
embeddings = [d["embedding"] for d in resp["data"]]

# Upsert to Qdrant: one point per chunk, with the chunk text and metadata as payload
points = [
    {"id": str(uuid.uuid4()), "vector": emb, "payload": {"text": c["text"], **c["metadata"]}}
    for emb, c in zip(embeddings, chunks)
]
req = Request(
    f"{qdrant_url}/collections/my-documents/points",
    method="PUT",
    # Send the api-key header only when a key is configured
    headers={"Content-Type": "application/json", **({"api-key": qdrant_key} if qdrant_key else {})},
    data=json.dumps({"points": points}).encode(),
)
urlopen(req)

output = {"upserted": len(points)}
- Add a Return node to confirm ingestion.
- Deploy and send documents to the webhook URL.
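Sending a document to the deployed webhook could look like the sketch below. It assumes the webhook accepts a JSON POST shaped like the test payload above. `WEBHOOK_URL` is a placeholder for the URL your deployment shows, `build_ingest_request` is a hypothetical helper, and any authentication your webhook requires is not covered here.

```python
import json
from urllib.request import Request, urlopen

# Placeholder: replace with the webhook URL of your deployed Ingestion Workflow.
WEBHOOK_URL = "https://example.invalid/your-webhook-url"
SEND = False  # set to True once WEBHOOK_URL points at a real deployment

def build_ingest_request(webhook_url, text, metadata=None):
    """Build a JSON POST carrying the same fields as the webhook test payload."""
    payload = {"text": text, "metadata": metadata or {}}
    return Request(
        webhook_url,
        method="POST",
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload).encode("utf-8"),
    )

req = build_ingest_request(
    WEBHOOK_URL,
    "Tensorify is a visual backend builder...",
    {"source": "docs", "page": 1},
)

if SEND:
    with urlopen(req) as resp:
        print(resp.status, resp.read().decode("utf-8"))
```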
Open the Query Workflow's canvas and click the Playground tab. Chat mode activates automatically because the protocol is openai-chat.
Ask questions about your ingested documents. The Qdrant Memory plugin automatically performs semantic search on each message, injecting relevant chunks into the agent's context.
Use any OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    base_url="https://triggers.tensorify.io/h/YOUR_HOOK_ID/v1",
    api_key="your-bearer-token"
)

response = client.chat.completions.create(
    model="rag-agent",
    messages=[{"role": "user", "content": "What is Tensorify?"}],
    stream=True
)

# Print tokens as they arrive
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
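The Qdrant Memory node's Session Key reads the `x-tensorify-session-id` request header, so a client that wants per-user memory presumably has to send that header. The sketch below builds such a request with the standard library only. It assumes the endpoint accepts the standard OpenAI Chat Completions request shape at `/chat/completions` under the base URL; `build_chat_request` and the session ID value are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "https://triggers.tensorify.io/h/YOUR_HOOK_ID/v1"

def build_chat_request(base_url, token, session_id, question):
    """Build a non-streaming chat request that carries a session ID header."""
    body = {
        "model": "rag-agent",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }
    return Request(
        f"{base_url}/chat/completions",
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Header name taken from the Session Key setting in this guide
            "x-tensorify-session-id": session_id,
        },
        data=json.dumps(body).encode("utf-8"),
    )

req = build_chat_request(BASE_URL, "your-bearer-token", "user-42", "What is Tensorify?")
print(req.full_url)
```

With the OpenAI Python SDK, the same header can be passed through the `default_headers` argument of the client or the `extra_headers` argument of `chat.completions.create`.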
For more control over the retrieval step, skip the Qdrant Memory plugin and build an explicit pipeline:
API Trigger → Code (preprocess query) → HTTP Request (search Qdrant API) → Transform (format context) → AI Agent → Return
This gives you full control over query preprocessing, filtering, re-ranking, and context formatting.
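The search and context-formatting stages of that pipeline might look like the following sketch. It assumes Qdrant's REST search endpoint (`POST /collections/{name}/points/search`) and the payload layout written by the ingestion code, where each point stores its chunk under `text`. Both helper functions, the `min_score` threshold, and the sample response are illustrative, not Tensorify APIs.

```python
import json
from urllib.request import Request

def build_search_request(qdrant_url, collection, query_vector, top_k=5, api_key=""):
    """Build a Qdrant vector search request for the HTTP Request step."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key
    body = {"vector": query_vector, "limit": top_k, "with_payload": True}
    return Request(
        f"{qdrant_url}/collections/{collection}/points/search",
        method="POST",
        headers=headers,
        data=json.dumps(body).encode("utf-8"),
    )

def format_context(search_response, min_score=0.0):
    """Turn search hits into a numbered context block, dropping weak matches."""
    hits = [h for h in search_response.get("result", []) if h.get("score", 0.0) >= min_score]
    return "\n".join(f"[{i}] {h['payload']['text']}" for i, h in enumerate(hits, start=1))

req = build_search_request("http://localhost:6333", "my-documents", [0.1, 0.2, 0.3])

# Hand-written sample shaped like a Qdrant search response (not real output)
sample_response = {
    "result": [
        {"id": "a", "score": 0.91, "payload": {"text": "Tensorify is a visual backend builder..."}},
        {"id": "b", "score": 0.12, "payload": {"text": "Unrelated chunk"}},
    ]
}
context = format_context(sample_response, min_score=0.5)
print(context)
```

Filtering by score before formatting is one simple form of the re-ranking control mentioned above; the query vector itself would come from embedding the preprocessed question with the same model used at ingestion.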
- Qdrant Memory Reference — vector store memory settings
- AI Agent Reference — agent configuration
- Agent Memory Guide — choosing and configuring memory
- Playground Guide — testing your RAG system
