Streaming AI Responses
Tensorify AI agents emit real-time, token-by-token responses using Server-Sent Events (SSE) in the OpenAI chat.completion.chunk format. Streaming works on both managed cloud deployments and self-hosted CLI runner deployments — your client receives tokens as the agent generates them, not after the full response completes.
Streaming requires two settings:
- AI Agent node: set streaming: true in the AI Agent node settings in the workflow editor.
- Request signal: tell the endpoint you want a stream by either:
  - sending the header Accept: text/event-stream, or
  - including "stream": true in the request body.
Sending both signals is recommended for maximum compatibility with OpenAI SDKs and custom clients.
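Below is a minimal sketch of a request that carries both signals, using only Python's standard library. The helper name build_stream_request is hypothetical. The payload mirrors the fetch and curl examples in this guide, and the webhook URL keeps its placeholder ID. Building the request does not send anything.

```python
import json
import urllib.request


def build_stream_request(url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST that asks the agent endpoint to stream."""
    payload = {
        "body": {"message": message},  # same payload shape as the fetch and curl examples
        "stream": True,                # signal 1: stream flag in the request body
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",  # signal 2: ask for an SSE response
        },
        method="POST",
    )


req = build_stream_request("https://triggers.tensorify.io/h/whk_.../agent", "Hello")
print(req.get_method(), req.get_header("Accept"))  # POST text/event-stream
```

Note that urllib stores header names with only the first letter capitalized, so a later lookup must use "Content-type" rather than "Content-Type".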
Tensorify streams chunks in the standard OpenAI chat completion chunk shape:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1719000000,"model":"tensorify","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
A typical stream follows this sequence:
- Role delta: the first chunk may include "delta": {"role": "assistant"}.
- Content deltas: subsequent chunks carry token text in "delta": {"content": "..."}.
- Finish: the final chunk sets "finish_reason": "stop".
- Done: the stream ends with data: [DONE].
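The sequence is easier to see in code. The hypothetical helper below folds already-parsed chunks back into one assistant message. It assumes a single choice (index 0). The sample chunks omit the id, created and model fields for brevity.

```python
from typing import Iterable, Optional


def assemble_reply(chunks: Iterable[dict]) -> dict:
    """Fold parsed chat.completion.chunk objects into a single assistant message."""
    role: Optional[str] = None
    parts = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch follows only the first choice
            delta = choice.get("delta") or {}
            if delta.get("role"):
                role = delta["role"]  # typically present only on the first chunk
            if delta.get("content"):
                parts.append(delta["content"])  # token text, in arrival order
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]  # "stop" on the final chunk
    return {"role": role or "assistant", "content": "".join(parts), "finish_reason": finish_reason}


chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
print(assemble_reply(chunks))
# {'role': 'assistant', 'content': 'Hello', 'finish_reason': 'stop'}
```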
During tool execution, the connection stays open with keepalive comments (: keepalive) so proxies do not drop idle connections.
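Keepalives are SSE comment lines, so a client only needs to act on data: lines. The sketch below, built around a hypothetical generator named iter_chunks, skips comments and blank lines and stops at data: [DONE]. It assumes each event carries a single data: line, as in the chunk format above. A general-purpose SSE parser would also join multi-line data fields.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chunk objects from decoded SSE lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank event separator or a comment such as ": keepalive"
        if not line.startswith("data:"):
            continue  # other SSE fields (event:, id:, retry:) are ignored in this sketch
        data = line[len("data:"):]
        if data.startswith(" "):
            data = data[1:]  # SSE drops one leading space after the colon
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)


sample = [
    ": keepalive",
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}',
    "",
    ": keepalive",
    "data: [DONE]",
]
for chunk in iter_chunks(sample):
    print(chunk["choices"][0]["delta"]["content"])  # Hi
```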
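Reading the stream with fetch:

```javascript
const response = await fetch("https://triggers.tensorify.io/h/whk_.../agent", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
  },
  body: JSON.stringify({
    body: { message: "What is quantum computing?" },
    stream: true,
  }),
});

if (!response.ok || !response.body) {
  throw new Error(`Request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Append the new bytes and keep any incomplete trailing line for the next read.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() || "";

  for (const rawLine of lines) {
    const line = rawLine.trim(); // tolerate CRLF line endings
    if (!line.startsWith("data:")) continue; // skip blank lines and ": keepalive" comments
    const data = line.slice(5).trim();
    if (data === "[DONE]") continue; // end-of-stream sentinel
    const chunk = JSON.parse(data);
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) process.stdout.write(content); // or append to the DOM
  }
}
```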
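The fetch loop keeps a buffer because a single read can end partway through a line, or even partway through a multi-byte character. The sketch below applies the same idea in Python. It uses a hypothetical iter_sse_lines generator that accepts any iterable of byte chunks, for example successive reads from a socket.

```python
import codecs
from typing import Iterable, Iterator


def iter_sse_lines(byte_chunks: Iterable[bytes]) -> Iterator[str]:
    """Reassemble complete text lines from arbitrarily split byte reads."""
    # The incremental decoder holds back a multi-byte character split across reads.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in byte_chunks:
        buffer += decoder.decode(chunk)
        # Everything before the last "\n" is complete; the tail waits for more data.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            yield line.rstrip("\r")  # tolerate CRLF line endings
    buffer += decoder.decode(b"", final=True)  # flush the decoder at end of stream
    if buffer:
        yield buffer.rstrip("\r")


raw = 'data: {"text":"café"}\n\n: keepalive\n'.encode("utf-8")
one_byte_reads = [raw[i:i + 1] for i in range(len(raw))]  # worst-case splitting
print(list(iter_sse_lines(one_byte_reads)))
# ['data: {"text":"café"}', '', ': keepalive']
```

The lines it yields can be handed straight to a data:-line parser.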
The OpenAI SDK examples use an API Trigger configured with the openai-chat protocol, which exposes /v1/chat/completions natively. With the Python SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://triggers.tensorify.io/h/whk_.../agent/v1",
    api_key="not-needed",  # or your API key if auth is configured
)

stream = client.chat.completions.create(
    model="tensorify",
    messages=[{"role": "user", "content": "Explain SSE streaming"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # guard against chunks with an empty choices list
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```
With the OpenAI Node.js SDK:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://triggers.tensorify.io/h/whk_.../agent/v1",
  apiKey: "not-needed", // or your API key if auth is configured
});

const stream = await client.chat.completions.create({
  model: "tensorify",
  messages: [{ role: "user", content: "Explain SSE streaming" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```
With the Vercel AI SDK useChat hook in React:

```jsx
import { useChat } from "ai/react";

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "https://triggers.tensorify.io/h/whk_.../agent",
    body: { stream: true },
    headers: { Accept: "text/event-stream" },
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}
```
With curl:

```bash
curl -N \
  -X POST https://triggers.tensorify.io/h/whk_.../agent \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"body":{"message":"Hello"},"stream":true}'
```

The -N flag (long form --no-buffer) disables curl's output buffering, so each SSE line is printed as soon as it arrives.
Ensure streaming: true is set in the AI Agent node settings. Without it, the agent returns a complete response in a single JSON payload even when the client requests a stream.
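A client can also detect this misconfiguration from the response itself. The sketch below branches on the Content-Type header and uses a hypothetical function name, extract_reply. The non-streaming branch assumes the single JSON payload follows the OpenAI chat.completion shape (choices[0].message.content). This guide does not confirm that shape.

```python
import json


def extract_reply(content_type: str, body: bytes) -> str:
    """Return the reply text whether or not the endpoint actually streamed."""
    text = body.decode("utf-8")
    media_type = content_type.split(";")[0].strip().lower()

    if media_type == "text/event-stream":
        parts = []
        for line in text.splitlines():
            if not line.startswith("data:"):
                continue  # blank lines and ": keepalive" comments
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                break
            choices = json.loads(data).get("choices") or []
            if choices:
                parts.append((choices[0].get("delta") or {}).get("content") or "")
        return "".join(parts)

    # Not a stream: most likely streaming: true is missing on the AI Agent node.
    # ASSUMPTION: the single JSON payload uses the OpenAI chat.completion shape.
    payload = json.loads(text)
    return payload["choices"][0]["message"]["content"]
```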
Some proxies buffer SSE responses and deliver them in large batches. Tensorify sets X-Accel-Buffering: no automatically. If you run nginx in front of your endpoint, add proxy_buffering off; to the relevant location block.
Tensorify's hooks service allows all origins for trigger endpoints. If you route requests through a custom proxy, ensure CORS headers (Access-Control-Allow-Origin, etc.) are forwarded or set correctly.
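If you run your own relay, a helper along these lines can choose which headers to send downstream. The function name relay_stream_headers and the list of forwarded headers are hypothetical. The helper forwards upstream CORS headers and falls back to allowing all origins. It does not copy headers that make no sense for a stream, such as Content-Length. The Cache-Control: no-cache value is an assumption, not something Tensorify documents.

```python
from typing import Dict, Mapping

# CORS headers a relay should pass through from the upstream trigger response.
FORWARDED_CORS_HEADERS = (
    "access-control-allow-origin",
    "access-control-allow-headers",
    "access-control-allow-methods",
    "access-control-expose-headers",
)


def relay_stream_headers(upstream: Mapping[str, str], default_origin: str = "*") -> Dict[str, str]:
    """Choose response headers for a hypothetical proxy relaying an SSE stream."""
    lowered = {name.lower(): value for name, value in upstream.items()}
    headers = {name: lowered[name] for name in FORWARDED_CORS_HEADERS if name in lowered}
    # Set an origin if upstream sent none (Tensorify's hooks service allows all origins).
    headers.setdefault("access-control-allow-origin", default_origin)
    headers.update({
        "content-type": "text/event-stream",
        "cache-control": "no-cache",  # ASSUMPTION: keep intermediaries from caching the stream
        "x-accel-buffering": "no",    # ask nginx not to buffer, mirroring what Tensorify sends
    })
    # Content-Length and other upstream headers are deliberately not copied.
    return headers
```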
Keepalive comments are sent every 15 seconds during long-running tool calls. If your proxy or load balancer closes idle connections sooner than 60 seconds, raise its idle timeout to at least 60 seconds. That leaves ample headroom over the 15-second keepalive interval, so the stream stays open until the agent finishes.
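Client read timeouts follow the same rule. In Python's urllib, the timeout argument applies to each blocking socket operation rather than to the whole response. A value well above the 15-second keepalive interval therefore lets the stream run for as long as the agent needs. The function names below are hypothetical. iter_content works on any binary file object, so it can be tested without a network.

```python
import json
import urllib.request
from typing import IO, Iterator

KEEPALIVE_INTERVAL_S = 15  # keepalive interval stated in this guide


def iter_content(resp: IO[bytes]) -> Iterator[str]:
    """Yield token text from an SSE body read line by line."""
    for raw in resp:  # binary file objects, including HTTP responses, iterate by line
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank lines and ": keepalive" comments carry no tokens
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        choices = json.loads(data).get("choices") or []
        content = (choices[0].get("delta") or {}).get("content") if choices else None
        if content:
            yield content


def stream_reply(url: str, message: str, timeout_s: float = 60.0) -> Iterator[str]:
    """POST a streaming request and yield tokens as they arrive."""
    if timeout_s <= KEEPALIVE_INTERVAL_S:
        raise ValueError("read timeout must exceed the keepalive interval")
    req = urllib.request.Request(
        url,
        data=json.dumps({"body": {"message": message}, "stream": True}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    # The timeout bounds each blocking read, not the whole response, so keepalive
    # comments arriving every 15 s stop long tool calls from tripping it.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        yield from iter_content(resp)
```

Typical use is to print tokens as they arrive: for token in stream_reply(url, "Hello"): print(token, end="", flush=True).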
