Self-Host an AI Agent on Your Machine
Deploy an AI agent that runs entirely on your own machine. Your data never leaves your infrastructure, you can use local LLMs (zero API costs), and the agent can access local files, databases, and services.
Uses: API Trigger, AI Agent, Window Memory, Return, CLI Runner
Time: ~15 minutes
- Privacy — Data stays on your machine. No third-party cloud processing.
- Cost — Use Ollama with open-source models like Llama 3.2 for zero API costs.
- Access — The agent can read local files, connect to localhost databases, and call internal services.
- Control — Update models, prompts, and tools without redeploying infrastructure.
- A Tensorify workspace (sign up free)
- A machine to run the agent (your laptop, a VPS, or a server)
- (Optional) Ollama installed for local LLMs
The fastest way to start is with the OpenAI Chatbot template:
- Go to Templates → AI Agents → OpenAI Chatbot
- Click Use template
This creates a workflow with:
- API Trigger (openai-chat protocol) — exposes your agent as an OpenAI-compatible endpoint
- AI Agent — processes messages with an LLM
- Return — sends the response back
You can also build this from scratch: drag an API Trigger, AI Agent, and Return node onto the canvas and wire them together.
In the AI Agent settings:
- Provider: openai (or custom for local LLMs — see Step 6)
- Model: gpt-4o (or any OpenAI model)
- System Prompt: Customize for your use case
- Streaming: Enable for real-time token delivery
Go to Settings → Environment Variables and add:
OPENAI_API_KEY — your OpenAI API key (skip if using a local LLM)
On the machine where you want to run the agent:
curl -fsSL https://cli.tensorify.io/install | sh
Then initialize the runner:
tensorify init
This creates ~/.tensorify/config.json with your runner configuration.
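To confirm that the file was written, a short script can list its top-level keys. This is only a sketch: the path comes from the step above, but the file's schema is not described in this guide, so the code makes no assumptions about which keys exist. The function name `config_keys` is made up for illustration.

```python
import json
from pathlib import Path

# Path taken from the guide; the schema is unknown, so only keys are listed.
DEFAULT_CONFIG = Path.home() / ".tensorify" / "config.json"


def config_keys(path: Path = DEFAULT_CONFIG) -> list[str]:
    """Return the sorted top-level keys of the runner config, or [] if absent."""
    if not path.is_file():
        return []
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    # A non-object top level (for example a list) has no keys to report.
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    keys = config_keys()
    print(keys if keys else f"No readable config at {DEFAULT_CONFIG}")
```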
tensorify login
This opens a browser window to authenticate with your Tensorify account.
In the Tensorify dashboard:
- Open your workflow
- Click Deploy
- Set Execution Mode to CLI
- Choose your runner
- Click Deploy
Your workflow is now assigned to your CLI runner.
tensorify runner start
The runner connects to Tensorify via WebSocket and starts processing incoming requests. You'll see logs for each request in the terminal.
To run as a background service:
tensorify runner install
This installs the runner as a systemd service (Linux) or launchd agent (macOS) that starts automatically on boot.
Your agent is now accessible via the OpenAI SDK. The URL is shown in your deployment dashboard.
from openai import OpenAI

# Point the SDK at your Tensorify hook instead of api.openai.com
client = OpenAI(
    base_url="https://triggers.tensorify.io/h/YOUR_HOOK_PATH",
    api_key="your-tensorify-api-key",
)

response = client.chat.completions.create(
    model="tensorify",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# Print tokens as they arrive; skip chunks with no choices or an empty delta
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
curl -X POST https://triggers.tensorify.io/h/YOUR_HOOK_PATH/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorify",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
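The same request can be made from Python without the OpenAI SDK. The sketch below mirrors the curl command using only the standard library. It sends no Authorization header because the curl command does not; whether your deployment requires one is not stated here. The helper names `build_chat_request` and `send_chat` are illustrative, and the response parsing assumes an OpenAI-style body.

```python
import json
import urllib.request


def build_chat_request(base_url: str, user_message: str) -> urllib.request.Request:
    """Build the same non-streaming POST that the curl command sends."""
    payload = {
        "model": "tensorify",
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style response: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

To try it, replace the placeholder and call `send_chat(build_chat_request("https://triggers.tensorify.io/h/YOUR_HOOK_PATH", "Hello!"))`.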
To eliminate cloud API costs entirely, use Ollama:
- Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
- Pull a model: ollama pull llama3.2
- In your AI Agent settings, change:
  - Provider: custom
  - Custom Base URL: http://localhost:11434/v1
  - Model: llama3.2
- Remove OPENAI_API_KEY from environment variables (no longer needed)
The agent now runs 100% locally — no data leaves your machine, and there are no API costs.
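Before relying on the local setup, it can help to check that Ollama is answering on the base URL configured above and that the model has been pulled. The sketch below queries the OpenAI-style `/models` route under that base URL. The function names are hypothetical, and the prefix match on the model name assumes Ollama reports ids with a tag suffix such as `llama3.2:latest`.

```python
import json
import urllib.error
import urllib.request

# Base URL from the guide; /models is the OpenAI-style model listing route.
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} body."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def has_model(model_ids: list[str], name: str) -> bool:
    """True if `name` is listed either bare or with a ':tag' suffix."""
    return any(m == name or m.startswith(name + ":") for m in model_ids)


def list_local_models(base_url: str = OLLAMA_BASE_URL, timeout: float = 2.0):
    """Return the available model ids, or None if the server is unreachable."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return None
```

A result of `None` from `list_local_models()` suggests Ollama is not running. A list for which `has_model(ids, "llama3.2")` is false suggests the model still needs to be pulled.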
Tip: Use the Local LLM Agent template for a pre-configured workflow with Ollama settings and conversation memory.
Because the CLI runner executes on your machine, Code nodes can access local files:
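import os

project_dir = "/path/to/your/project"

# List the directory, then read one file; the with-block closes the handle
files = os.listdir(project_dir)
with open(os.path.join(project_dir, "README.md"), encoding="utf-8") as f:
    content = f.read()
result = {"files": files, "readme": content}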
Add a Code node as a tool to your AI Agent, and the agent can read, analyze, and summarize files from your local filesystem.
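When an agent chooses which file to read, you will likely want to restrict it to one directory and cap how much text it can pull in. The sketch below shows one way to do that in plain Python. How a Code node receives its inputs is not covered here, so `read_project_file` and its parameters are illustrative names, not part of any Tensorify API.

```python
from pathlib import Path


def read_project_file(root: str, relative_path: str, max_bytes: int = 100_000) -> str:
    """Read a text file under `root`, refusing paths that escape it."""
    root_dir = Path(root).resolve()
    target = (root_dir / relative_path).resolve()
    # resolve() collapses ".." and symlinks, so this check catches traversal.
    if root_dir != target and root_dir not in target.parents:
        raise ValueError(f"{relative_path!r} is outside {root_dir}")
    if not target.is_file():
        raise FileNotFoundError(target)
    with target.open("rb") as fh:
        data = fh.read(max_bytes)  # cap what gets handed to the model
    # Replace undecodable bytes instead of failing on binary content.
    return data.decode("utf-8", errors="replace")
```

For example, `read_project_file("/path/to/your/project", "README.md")` returns the README text, while a path such as `"../secrets.txt"` raises `ValueError`.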
- Process Local Files with an AI Agent — Build a codebase assistant
- Deploy as an OpenAI Endpoint — Expose your workflow as an OpenAI-compatible API
- Build a RAG System — Add vector search for document Q&A
- Agent Memory — Add conversation memory to your agent
