Build a RAG System

This guide walks you through building a retrieval-augmented generation (RAG) system with Tensorify. By the end, you'll have a deployed AI agent that answers questions using your own documents.

Uses: API Endpoint, AI Agent, Qdrant Memory, HTTP Request, Code, Return
Time: ~20 minutes


What is RAG?

Retrieval-Augmented Generation combines vector search with LLM generation. When a user asks a question, the system:

  1. Searches a vector database for relevant documents
  2. Injects those documents into the LLM's context
  3. Generates an answer grounded in the retrieved content

Tensorify's Qdrant Memory plugin handles steps 1-2 automatically via hybrid retrieval (recent messages + semantic search).
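
The three steps can be sketched in plain Python, independent of Tensorify. This is a toy illustration rather than how the Qdrant Memory plugin is implemented: the hand-written 3-dimensional vectors stand in for real embeddings, and the helper names `cosine`, `top_k`, and `build_prompt` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, docs, k=2):
    """Step 1: rank stored documents by similarity to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return ranked[:k]

def build_prompt(question, hits):
    """Step 2: inject the retrieved text into the prompt sent to the LLM."""
    context = "\n".join(f"- {h['text']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy "vector database": hand-written vectors stand in for real embeddings.
docs = [
    {"text": "Tensorify is a visual backend builder.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Qdrant stores vectors for semantic search.", "vector": [0.1, 0.9, 0.0]},
    {"text": "Bananas are rich in potassium.", "vector": [0.0, 0.0, 1.0]},
]

query_vec = [0.8, 0.2, 0.0]  # pretend embedding of "What is Tensorify?"
hits = top_k(query_vec, docs, k=2)
prompt = build_prompt("What is Tensorify?", hits)
# Step 3 would send `prompt` to the LLM to generate the grounded answer.
print(prompt)
```

In a real system, the vectors come from an embedding model and the ranking is done by the vector database rather than by a Python sort.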

Prerequisites

  • A Tensorify workspace
  • An OpenAI API key (for the LLM and embeddings)
  • A Qdrant instance — Qdrant Cloud (free tier available) or self-hosted via Docker:
docker run -p 6333:6333 qdrant/qdrant

Architecture

You'll build two workflows:

  1. Ingestion Workflow: Receives documents via webhook, chunks them, and upserts embeddings into Qdrant
  2. Query Workflow: Receives user questions via an OpenAI-compatible endpoint, retrieves relevant context from Qdrant, and generates answers

Step 1: Set Up Environment Variables

Go to Settings > Environment Variables and add:

  • OPENAI_API_KEY — your OpenAI API key (used for embeddings and the agent's LLM)
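  • QDRANT_API_KEY — your Qdrant Cloud API key (skip if self-hosting without auth)

The ingestion code in Step 3 also reads QDRANT_URL and falls back to http://localhost:6333 when it is unset. Add it here as well if your Qdrant instance runs anywhere else, for example on Qdrant Cloud.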
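
A quick way to catch a missing variable before the first run is a small check like the one below. It is a sketch that assumes Code nodes expose these settings through `os.environ`, as the ingestion code in Step 3 does. The helper name `missing_vars` and the example values are hypothetical.

```python
import os

REQUIRED = ["OPENAI_API_KEY"]                # the Step 3 code fails without this
OPTIONAL = ["QDRANT_API_KEY", "QDRANT_URL"]  # the Step 3 code has fallbacks for these

def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [n for n in names if not env.get(n)]

# Example with an explicit dict so the check is easy to try out locally.
example_env = {"OPENAI_API_KEY": "sk-example", "QDRANT_API_KEY": ""}
print(missing_vars(REQUIRED, example_env))  # []
print(missing_vars(OPTIONAL, example_env))  # ['QDRANT_API_KEY', 'QDRANT_URL']
```

Calling `missing_vars(REQUIRED)` without the second argument checks the real environment instead of the example dict.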

Step 2: Build the Query Workflow

This is the main workflow users interact with.

  1. Add an API Endpoint trigger:

    • Path: /v1
    • Protocol: OpenAI Chat
    • Allowed Methods: GET, POST
    • Auth: bearer-token with a secret of your choice
  2. Add an AI Agent node. Connect the trigger's POST handle to the agent's message handle.

    • Provider: OpenAI
    • Model: gpt-4o
    • System Prompt: You are a helpful assistant. Answer questions based on the context provided by your memory. If the context doesn't contain the answer, say so.
    • Streaming: true
  3. Add a Qdrant Memory node. Connect it to the agent's Memory handle.

    • Qdrant URL: your Qdrant instance URL (e.g. https://your-cluster.qdrant.io)
    • Collection Name: my-documents
    • Embedding Model: text-embedding-3-small
    • Top K: 5 (retrieves the 5 most relevant chunks)
    • Session Key: session:{{ api_request.headers["x-tensorify-session-id"] }}
  4. Add a Return node. Connect the agent's response to it.

  5. Deploy the workflow.

Step 3: Build the Ingestion Workflow

This workflow receives documents and stores them in Qdrant.
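
Qdrant rejects upserts into a collection that does not exist, and none of the steps below create one. If nothing in your setup creates my-documents for you (this guide does not say whether the Qdrant Memory plugin does), a one-time call along these lines could do it. This is a sketch under assumptions: it uses the default 1536-dimensional output of text-embedding-3-small and cosine distance, which may not match what the plugin expects, and the helper name `build_create_collection_request` is hypothetical.

```python
import json
from urllib.request import Request, urlopen

def build_create_collection_request(qdrant_url, name, size=1536, api_key=""):
    """Build (but do not send) a Qdrant 'create collection' request."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key  # only needed when the instance requires auth
    # Assumption: a single unnamed vector per point, compared by cosine distance.
    body = {"vectors": {"size": size, "distance": "Cosine"}}
    return Request(f"{qdrant_url}/collections/{name}", method="PUT",
                   headers=headers, data=json.dumps(body).encode())

req = build_create_collection_request("http://localhost:6333", "my-documents")
print(req.get_method(), req.full_url)
# To actually create the collection (requires a reachable Qdrant instance):
# urlopen(req).read()
```

The vector size must match the embedding model. If you change the embedding model, change `size` to that model's output dimension.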

  1. Add a Webhook Trigger with a test payload:
{
  "text": "Tensorify is a visual backend builder...",
  "metadata": { "source": "docs", "page": 1 }
}
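  2. Add a Code node connected to the trigger. This chunks the document:
# Read the document text and optional metadata from the webhook payload
text = input["body"]["text"]
metadata = input["body"].get("metadata", {})
chunk_size = 500  # characters per chunk
overlap = 50      # characters shared between consecutive chunks

chunks = []
# Advance by chunk_size - overlap so each chunk repeats the tail of the previous one
for i in range(0, len(text), chunk_size - overlap):
    chunk = text[i:i + chunk_size]
    if chunk.strip():  # skip whitespace-only chunks
        chunks.append({
            "text": chunk,
            "metadata": {**metadata, "chunk_index": len(chunks)}
        })

output = {"chunks": chunks}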
  3. Add another Code node to generate embeddings and upsert to Qdrant:
import json
import os
import uuid
from urllib.request import Request, urlopen

openai_key = os.environ["OPENAI_API_KEY"]
qdrant_url = os.environ.get("QDRANT_URL", "http://localhost:6333")
qdrant_key = os.environ.get("QDRANT_API_KEY", "")

chunks = input["chunks"]
texts = [c["text"] for c in chunks]

# Generate embeddings via OpenAI (one request for all chunks)
req = Request("https://api.openai.com/v1/embeddings", method="POST",
    headers={"Authorization": f"Bearer {openai_key}", "Content-Type": "application/json"},
    data=json.dumps({"input": texts, "model": "text-embedding-3-small"}).encode())
with urlopen(req) as r:
    resp = json.loads(r.read())
embeddings = [d["embedding"] for d in resp["data"]]

# Build one Qdrant point per chunk; the chunk text and its metadata form the payload
points = [{"id": str(uuid.uuid4()), "vector": emb, "payload": {"text": c["text"], **c["metadata"]}}
          for emb, c in zip(embeddings, chunks)]

# Upsert to Qdrant (the api-key header is sent only when a key is configured)
req = Request(f"{qdrant_url}/collections/my-documents/points", method="PUT",
    headers={"Content-Type": "application/json", **({"api-key": qdrant_key} if qdrant_key else {})},
    data=json.dumps({"points": points}).encode())
with urlopen(req) as r:
    r.read()

output = {"upserted": len(points)}
  4. Add a Return node to confirm ingestion.

  5. Deploy and send documents to the webhook URL.
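
To see how the chunking loop in the first Code node behaves, it helps to run the same logic on a short string with small sizes. The sketch below wraps that loop in a function so it can be tried outside Tensorify. The function name `chunk_text` and the overlap check are additions for illustration and are not part of the workflow code.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each iteration
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size].strip()]  # skip whitespace-only windows

pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij', 'j']
```

Each piece starts with the last character of the previous one, which is the overlap at work. The final piece, 'j', is a short tail already contained in the piece before it. The same can happen with the default sizes, so you may want to drop or merge such tails before embedding.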

Step 4: Test with the Playground

Open the Query Workflow's canvas and click the Playground tab. Chat mode activates automatically because the protocol is openai-chat.

Ask questions about your ingested documents. The Qdrant Memory plugin automatically performs semantic search on each message, injecting relevant chunks into the agent's context.

Step 5: Connect from Your App

Use any OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://triggers.tensorify.io/h/YOUR_HOOK_ID/v1",
    api_key="your-bearer-token"
)

response = client.chat.completions.create(
    model="rag-agent",
    messages=[{"role": "user", "content": "What is Tensorify?"}],
    stream=True
)
for chunk in response:
    # Some streamed chunks may carry no choices or no text; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
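
The Qdrant Memory node in Step 2 builds its session key from the x-tensorify-session-id request header, which the snippet above does not send. The sketch below shows one way to supply it using only the standard library. It assumes the endpoint accepts the same POST to /chat/completions that the OpenAI SDK issues against `base_url`. The helper name `build_chat_request` and the session value `user-123` are hypothetical. With the OpenAI Python SDK, the `default_headers` argument of the `OpenAI(...)` constructor serves the same purpose.

```python
import json
from urllib.request import Request

def build_chat_request(base_url, token, session_id, question, model="rag-agent"):
    """Return (url, headers, body) for a non-streaming chat completion call."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "x-tensorify-session-id": session_id,  # read by the Session Key template
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }).encode()
    return url, headers, body

url, headers, body = build_chat_request(
    "https://triggers.tensorify.io/h/YOUR_HOOK_ID/v1",
    "your-bearer-token", "user-123", "What is Tensorify?")
req = Request(url, data=body, headers=headers, method="POST")
print(req.full_url)
# Sending it needs a deployed workflow: urllib.request.urlopen(req)
```

Reusing the same session ID across requests should keep them in one conversation. Use a different ID per user or per chat to keep histories separate.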

Alternative: Explicit RAG Pipeline

For more control over the retrieval step, skip the Qdrant Memory plugin and build an explicit pipeline:

API Trigger → Code (preprocess query) → HTTP Request (search Qdrant API) → Transform (format context) → AI Agent → Return

This gives you full control over query preprocessing, filtering, re-ranking, and context formatting.
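
As a rough idea of what the search and formatting stages might contain, the sketch below builds a request for Qdrant's REST search endpoint and turns the resulting hits into a context string. It assumes the payload layout written by the ingestion workflow (chunk text stored under "text"). The helper names, the `min_score` threshold, and the sample hits are hypothetical. The query vector would come from the same embedding model used at ingestion time.

```python
import json
from urllib.request import Request

def build_search_request(qdrant_url, collection, query_vector, top_k=5, api_key=""):
    """Build (but do not send) a Qdrant vector search request."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key
    body = {"vector": query_vector, "limit": top_k, "with_payload": True}
    return Request(f"{qdrant_url}/collections/{collection}/points/search",
                   method="POST", headers=headers, data=json.dumps(body).encode())

def format_context(hits, min_score=0.0):
    """Turn search hits into a numbered context block, dropping weak matches."""
    kept = [h for h in hits if h["score"] >= min_score]
    return "\n\n".join(f"[{i}] {h['payload']['text']}" for i, h in enumerate(kept, 1))

# Hits shaped like the "result" array of a Qdrant search response (made-up values).
sample_hits = [
    {"id": "a", "score": 0.82,
     "payload": {"text": "Tensorify is a visual backend builder...", "source": "docs"}},
    {"id": "b", "score": 0.31,
     "payload": {"text": "Unrelated chunk.", "source": "docs"}},
]
context = format_context(sample_hits, min_score=0.5)
print(context)  # [1] Tensorify is a visual backend builder...
```

The score threshold is one example of the filtering this pipeline allows. Metadata filters or a re-ranking step would slot in at the same point, between the search and the agent.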
