Streaming AI Responses

Tensorify AI agents stream responses token by token over Server-Sent Events (SSE), using the OpenAI chat.completion.chunk format. Streaming works on both managed cloud deployments and self-hosted CLI runner deployments: your client receives tokens as the agent generates them rather than after the full response completes.

How to Enable Streaming

Streaming requires two settings:

AI Agent node — Set streaming: true in the AI Agent node settings in the workflow editor.
Request signal — Tell the endpoint you want a stream by either:
- Sending the header Accept: text/event-stream, or
- Including "stream": true in the request body.

Sending both the header and the body flag is recommended for maximum compatibility with OpenAI SDKs and custom clients.
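
As a rough illustration of the request signal, the sketch below builds (but does not send) a POST request that sets both the Accept header and the "stream" body flag. The helper name build_stream_request is hypothetical, and the payload shape simply mirrors the curl and fetch examples on this page.

```python
import json
import urllib.request


def build_stream_request(url: str, message: str) -> urllib.request.Request:
    """Build a POST request that asks for a stream via both signals.

    Hypothetical helper for illustration; the payload shape follows the
    other examples on this page.
    """
    payload = {"body": {"message": message}, "stream": True}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",  # header signal
        },
        method="POST",
    )


# The elided hook ID is kept as-is; substitute your own endpoint URL.
req = build_stream_request("https://triggers.tensorify.io/h/whk_.../agent", "Hello")
print(req.get_method(), req.get_header("Accept"))
```

Passing the returned object to urllib.request.urlopen would send it; any HTTP client that lets you set headers and a JSON body works the same way.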

SSE Wire Format

Tensorify streams chunks in the standard OpenAI chat completion chunk shape:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1719000000,"model":"tensorify","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

A typical stream follows this sequence:

Role delta — First chunk may include "delta": {"role": "assistant"}.
Content deltas — Subsequent chunks carry token text in "delta": {"content": "..."}.
Finish — Final chunk sets "finish_reason": "stop".
Done — Stream ends with data: [DONE].

During tool execution, the connection stays open with keepalive comments (: keepalive) so proxies do not drop idle connections.
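
The sequence above can be sketched as a small line-oriented parser. This is a minimal illustration rather than a full SSE client: the function name iter_content is hypothetical, the sample lines are hand-written and trimmed to the fields the parser reads, and the empty delta on the finish chunk is an assumption based on how OpenAI-style streams usually look.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content tokens from OpenAI-style SSE lines (illustrative sketch)."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank separator or a comment such as ": keepalive"
        if not line.startswith("data:"):
            continue  # ignore other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:  # the role delta and the finish chunk carry no content
            yield content


# Hand-written sample stream: role delta, keepalive, content, finish, done.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    ": keepalive",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints "Hello world"
```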

Browser JavaScript

const response = await fetch("https://triggers.tensorify.io/h/whk_.../agent", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
  },
  body: JSON.stringify({
    body: { message: "What is quantum computing?" },
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() || "";

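  for (const rawLine of lines) {
    const line = rawLine.trimEnd(); // drop a trailing "\r" if the server sends CRLF
    // Skip blank lines, keepalive comments, and the [DONE] sentinel
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
    const chunk = JSON.parse(line.slice(6));
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) document.body.append(content); // or append to your own DOM element
  }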
}

OpenAI Python SDK

The Python SDK path uses an API Trigger with the openai-chat protocol, which exposes /v1/chat/completions natively.

from openai import OpenAI

client = OpenAI(
    base_url="https://triggers.tensorify.io/h/whk_.../agent/v1",
    api_key="not-needed",  # or your API key if auth is configured
)

stream = client.chat.completions.create(
    model="tensorify",
    messages=[{"role": "user", "content": "Explain SSE streaming"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # guard against chunks that carry no choices
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

OpenAI Node.js SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://triggers.tensorify.io/h/whk_.../agent/v1",
  apiKey: "not-needed",
});

const stream = await client.chat.completions.create({
  model: "tensorify",
  messages: [{ role: "user", content: "Explain SSE streaming" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Vercel AI SDK

import { useChat } from "ai/react";

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "https://triggers.tensorify.io/h/whk_.../agent",
    body: { stream: true },
    headers: { Accept: "text/event-stream" },
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}

curl

curl -N \
  -X POST https://triggers.tensorify.io/h/whk_.../agent \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"body":{"message":"Hello"},"stream":true}'

Use -N (the short form of --no-buffer) so curl prints each SSE line as it arrives instead of buffering its output.

Troubleshooting

No streaming output

Ensure streaming: true is set in the AI Agent node settings. Without it, the agent returns a complete response in a single JSON payload even when the client requests a stream.
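
One way to tell which mode you received is to inspect the response's Content-Type header. The sketch below assumes a streamed reply is served as text/event-stream (the standard SSE media type) and that the single JSON payload arrives as application/json; the second point is an assumption, and is_event_stream is a hypothetical helper name.

```python
from typing import Optional


def is_event_stream(content_type: Optional[str]) -> bool:
    """Return True when a Content-Type header value denotes an SSE stream.

    Parameters such as "; charset=utf-8" are ignored and the comparison is
    case-insensitive.
    """
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"


print(is_event_stream("text/event-stream; charset=utf-8"))  # prints True
print(is_event_stream("application/json"))                  # prints False
```

If this returns False even though the client asked for a stream, the node-level streaming setting is the first thing to check.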

Buffered output

Some proxies buffer SSE responses and deliver them in large batches. Tensorify sets X-Accel-Buffering: no automatically. If you run nginx in front of your endpoint, add proxy_buffering off; to the relevant location block.

CORS errors

Tensorify's hooks service allows all origins for trigger endpoints. If you route requests through a custom proxy, ensure CORS headers (Access-Control-Allow-Origin, etc.) are forwarded or set correctly.

Connection drops

Tensorify sends keepalive comments every 15 seconds during long-running tool calls. If your proxy or load balancer has an idle timeout shorter than that interval, it can close the connection between keepalives; raise the timeout to at least 60 seconds so the stream stays open until the agent finishes.
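
To check whether an idle timeout is the likely culprit, you can record when each line (keepalive comments included) arrives and compare the longest gap with your proxy's timeout. The sketch below is a diagnostic aid under that assumption; both helper names and the sample timestamps are hypothetical.

```python
from typing import Sequence


def longest_silence(arrival_times: Sequence[float]) -> float:
    """Return the longest gap, in seconds, between consecutive arrivals."""
    gaps = (later - earlier for earlier, later in zip(arrival_times, arrival_times[1:]))
    return max(gaps, default=0.0)


def would_be_dropped(arrival_times: Sequence[float], idle_timeout_s: float) -> bool:
    """Return True if some gap reaches the given idle timeout."""
    return longest_silence(arrival_times) >= idle_timeout_s


# Hypothetical arrival times: two tokens, two keepalives during a tool call,
# then a final token.
times = [0.0, 0.5, 15.5, 30.5, 31.0]
print(longest_silence(times))         # prints 15.0
print(would_be_dropped(times, 10.0))  # prints True
print(would_be_dropped(times, 60.0))  # prints False
```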