Streaming

Receive response tokens in real time via Server-Sent Events instead of waiting for the complete response. Use streaming when you want tokens to appear as they are generated, show progress for long responses, or reduce perceived latency in a chat-like interface. This guide covers how to enable streaming, parse the event stream, and handle the connection lifecycle.

Key concepts

  • Server-Sent Events (SSE): A standard for streaming data over HTTP. The server sends a sequence of data: lines, each containing a JSON event. The stream ends with a data: [DONE] signal.
  • Streaming mode: When stream is true, the API returns an SSE stream instead of a single JSON response. You must also enable streaming on the HTTP client side.
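To make the framing concrete, here is a minimal sketch that classifies a single decoded line of the stream. The function name parse_sse_line is hypothetical and is not part of any TwelveLabs SDK. The sketch assumes each event fits on one data: line, as in the examples on this page. Following the general SSE specification, it treats the space after data: as optional and ignores blank lines, comment lines that start with a colon, and other SSE fields.

```python
import json
from typing import Any, Optional, Tuple

DONE = "[DONE]"


def parse_sse_line(line: str) -> Optional[Tuple[str, Any]]:
    """Classify one decoded SSE line (hypothetical helper).

    Returns ("event", obj) for a JSON data line, ("done", None) for the
    data: [DONE] sentinel, and None for anything else (blank lines,
    ":" comments, or other SSE fields such as event: or id:).
    """
    if not line.startswith("data:"):
        return None  # Blank line, comment, or a non-data field
    payload = line[len("data:"):]
    if payload.startswith(" "):
        payload = payload[1:]  # SSE allows one optional space after the colon
    if payload == DONE:
        return ("done", None)
    return ("event", json.loads(payload))
```

A line such as data: [DONE] returns ("done", None), which tells the caller to stop reading.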

Prerequisites

  • You’ve already uploaded your content, and the asset has reached the ready status. See the Upload content page for details.
  • You’ve already created a knowledge store. See the Create a knowledge store page for details.
  • You’ve already added at least one asset to the knowledge store, and the item has reached the ready status. See the Add assets to a knowledge store page for details.
  • You’ve already read the Create a response page and understand the basic request and response format.

Enable streaming

To stream a response, set stream to true in the request body. You must also pass stream=True to the requests.post() call so the HTTP client reads the response incrementally rather than buffering the entire body.

```python
import requests

HEADERS = {"x-api-key": "YOUR_API_KEY", "Content-Type": "application/json"}
BASE_URL = "https://api.twelvelabs.io/v1.3"

response = requests.post(
    f"{BASE_URL}/responses",
    headers=HEADERS,
    json={
        "model": "jockey1.0",
        "stream": True,  # Ask the API to return an SSE stream
        "input": [
            {"type": "message", "role": "user", "content": "Describe what happens in these videos"}
        ],
        "knowledge_store_id": "your_store_id"
    },
    stream=True  # Tell requests to read the body incrementally
)
response.raise_for_status()  # Fail fast on HTTP errors before reading the stream
```

Parse the event stream

The response is a sequence of data: lines. Each line contains a JSON object representing one event. The final line is data: [DONE], which signals the end of the stream.

```python
import json

for line in response.iter_lines():
    if not line:
        continue  # Skip the blank lines that separate events
    decoded = line.decode("utf-8")
    if decoded.startswith("data: "):
        data = decoded[len("data: "):]  # Strip the "data: " prefix
        if data == "[DONE]":
            break  # End of stream
        event = json.loads(data)
        print(event)
```
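If you parse streams in more than one place, you can wrap the loop above in a generator. This is a sketch, not an official helper: iter_events and IncompleteStreamError are hypothetical names. It accepts any iterable of byte lines, such as response.iter_lines(), so you can exercise it without a network call. It also raises an error if the lines run out before data: [DONE] arrives, which suggests the connection closed early.

```python
import json
from typing import Any, Dict, Iterable, Iterator


class IncompleteStreamError(Exception):
    """Raised when the stream ends without the data: [DONE] sentinel."""


def iter_events(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield each JSON event from an SSE stream until data: [DONE]."""
    for line in lines:
        if not line:
            continue  # Blank lines separate events
        decoded = line.decode("utf-8")
        if not decoded.startswith("data: "):
            continue  # Ignore comments and other SSE fields
        data = decoded[len("data: "):]
        if data == "[DONE]":
            return  # Normal end of stream
        yield json.loads(data)
    # The iterable was exhausted without a [DONE] sentinel
    raise IncompleteStreamError("stream ended before data: [DONE]")
```

With a live response, you would call it as for event in iter_events(response.iter_lines()): print(event).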

A typical event looks like this:

```json
{"type": "response.output_text.delta", "delta": "The main"}
```

Events arrive incrementally: each response.output_text.delta event carries a fragment of the generated text in its delta field. Concatenate the deltas in the order they arrive to build the full response.
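A minimal sketch of the concatenation step, assuming the events have already been decoded into dictionaries (for example, by the parsing loop above). The name collect_text is hypothetical. It keeps only response.output_text.delta events, the one event type shown on this page, and skips any other type it sees. Optionally, it echoes each fragment as it arrives so the text appears in real time.

```python
from typing import Any, Dict, Iterable


def collect_text(events: Iterable[Dict[str, Any]], echo: bool = False) -> str:
    """Concatenate text deltas into the full response text."""
    parts = []
    for event in events:
        if event.get("type") != "response.output_text.delta":
            continue  # Skip event types this sketch does not handle
        delta = event.get("delta", "")
        parts.append(delta)
        if echo:
            print(delta, end="", flush=True)  # Show tokens as they arrive
    if echo:
        print()  # End the echoed line
    return "".join(parts)
```

Appending to a list and joining once at the end avoids rebuilding the string on every delta, which matters for long responses.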

Combine with other features

Streaming works with instructions, structured output, and multi-turn sessions. Set stream to true alongside any other parameters:

```python
session_id = "your_session_id"  # Placeholder: ID of an existing multi-turn session

response = requests.post(
    f"{BASE_URL}/responses",
    headers=HEADERS,
    json={
        "model": "jockey1.0",
        "stream": True,
        "session_id": session_id,  # Multi-turn
        "instructions": "You are a sports analyst.",
        "input": [
            {"type": "message", "role": "user", "content": "Summarize the key plays"}
        ],
        "knowledge_store_id": "your_store_id"
    },
    stream=True
)
```
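If you toggle these options per request, a small builder can keep stream set to true and leave out parameters you don't use. This is a sketch: build_streaming_body is a hypothetical helper, and leaving unused keys out of the body (rather than sending them as null) is an assumption about what the API expects.

```python
from typing import Any, Dict, Optional


def build_streaming_body(
    prompt: str,
    knowledge_store_id: str,
    model: str = "jockey1.0",
    session_id: Optional[str] = None,
    instructions: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a /responses request body with streaming enabled."""
    body: Dict[str, Any] = {
        "model": model,
        "stream": True,  # Still pass stream=True to requests.post() as well
        "input": [{"type": "message", "role": "user", "content": prompt}],
        "knowledge_store_id": knowledge_store_id,
    }
    if session_id is not None:
        body["session_id"] = session_id  # Continue a multi-turn session
    if instructions is not None:
        body["instructions"] = instructions
    return body
```

You would pass the result as json=build_streaming_body(...) in the requests.post() call, together with stream=True.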

Common pitfalls

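  • Set stream=True in both places. The JSON body tells the API to stream; the requests.post() parameter tells the HTTP client to read incrementally. If the body flag is missing, the API returns a single JSON response instead of an SSE stream. If the client flag is missing, requests downloads the entire body before iter_lines() yields anything, so tokens no longer appear in real time.
  • Handle the [DONE] signal. It marks the end of the stream. Without a check for it, your code may pass [DONE] to json.loads(), which raises an error because it is not valid JSON, or hang waiting for more data.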
  • No automatic reconnection. If the connection drops, start a new request. SSE reconnection is not built in for this endpoint.
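Because the endpoint does not reconnect automatically, one option is to retry the whole request and discard any partial text. This sketch assumes a hypothetical open_stream callable that sends a new request each time it is called and yields text deltas; you could build one from the snippets above. By default, it catches OSError. That covers Python's built-in ConnectionError, and because requests' exception classes derive from IOError, it also covers requests' connection and chunked-encoding errors. The retry count and backoff are arbitrary.

```python
import time
from typing import Callable, Iterable, Tuple, Type


def stream_with_retry(
    open_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> str:
    """Run a streaming request, starting over if the connection drops."""
    for attempt in range(1, max_attempts + 1):
        parts = []  # Partial text from a failed attempt is discarded
        try:
            for delta in open_stream():  # Each call sends a brand-new request
                parts.append(delta)
            return "".join(parts)
        except retry_on:
            if attempt == max_attempts:
                raise  # Give up and surface the last error
            time.sleep(backoff_seconds * attempt)  # Simple linear backoff
    raise ValueError("max_attempts must be at least 1")
```

Restarting from scratch is deliberate. Since the stream cannot be resumed, keeping text from a dropped attempt could duplicate fragments when the new request starts over.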

Next steps

  • Jupyter notebook: Download the notebook to run this guide interactively.
  • API reference