Streaming responses

How HiWay forwards Server-Sent Events end-to-end.

HiWay supports the full SSE streaming protocol natively. Set stream: true in your request and you'll get the standard data: {...}\n\n chunks, one token at a time, exactly as if you were talking to the provider directly.
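
To make the wire format concrete, here is a minimal sketch of reading those data: lines without an SDK. It assumes each event's JSON fits on a single data: line and that the stream ends with the OpenAI-style data: [DONE] sentinel. The helper name iter_sse_json and the sample payloads are illustrative and are not part of HiWay.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Blank lines separate events; lines starting with ":" are SSE comments.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue  # ignore other SSE fields such as "event:" or "id:"
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            return
        yield json.loads(payload)


# Hypothetical wire data, shaped like OpenAI-compatible chat completion chunks.
sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    "data: [DONE]\n\n"
)
text = "".join(
    event["choices"][0]["delta"].get("content", "")
    for event in iter_sse_json(sample.splitlines())
)
print(text)  # prints "Hello"
```

In practice an OpenAI-compatible SDK does this parsing for you; the sketch only shows what travels over the connection.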

Latency impact

First-token latency is provider latency + ~5 ms of routing. We don't buffer the stream, don't rewrite chunks, don't add a JSON wrapper. Your client sees the same SSE events the provider would have sent.
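
If you want to check first-token latency from your own client, a rough sketch follows. The function time_to_first_token and the fake_stream stand-in are hypothetical names used for illustration. The timer starts before the stream is opened, so the measured value includes connection setup and provider latency, not only the routing overhead.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_token(
    open_stream: Callable[[], Iterable[str]],
) -> Tuple[Optional[float], str]:
    """Return (seconds until the first non-empty text chunk, full text)."""
    start = time.perf_counter()  # start before the request is opened
    first: Optional[float] = None
    parts = []
    for text in open_stream():
        if text and first is None:
            first = time.perf_counter() - start
        parts.append(text)
    return first, "".join(parts)


def fake_stream():
    """Stand-in for a real stream: an empty chunk, a delay, then two text chunks."""
    yield ""
    time.sleep(0.01)
    yield "Hello"
    yield " world"


ttft, full_text = time_to_first_token(fake_stream)
print(f"first token after {ttft * 1000:.1f} ms: {full_text!r}")
```

To estimate the routing share alone, one option is to run the same measurement against the provider directly and compare medians over many requests, since a single sample is dominated by network and provider variance.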

Tool calls in streams

Tool/function call deltas stream through unchanged. Whatever the provider puts in tool_calls inside a streaming chunk, HiWay forwards it as-is, so your OpenAI-compatible client can parse the deltas without any adapter.
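
Because the deltas arrive as fragments, the client has to reassemble them. The sketch below assumes the OpenAI chat-completions streaming shape, where each fragment carries an index and the function arguments arrive as pieces of a JSON string. It works on plain dicts, and merge_tool_call_deltas and the sample data are illustrative names, not HiWay or SDK APIs.

```python
import json
from typing import Dict, Iterable, List


def merge_tool_call_deltas(deltas: Iterable[dict]) -> List[dict]:
    """Merge streamed tool_call fragments into complete calls, keyed by index."""
    calls: Dict[int, dict] = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(
                frag["index"], {"id": None, "name": "", "arguments": ""}
            )
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            # The name normally arrives once; arguments arrive as JSON text pieces.
            call["name"] += fn.get("name") or ""
            call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Hypothetical deltas in the OpenAI chat-completions streaming shape.
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
    {"content": None},  # a delta without tool calls is skipped
]
calls = merge_tool_call_deltas(deltas)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
```

Parse the arguments string with json.loads only once the stream has finished, since intermediate fragments are not valid JSON on their own.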

Client-side example

python
from openai import OpenAI

client = OpenAI(base_url="https://app.hiway2llm.com/v1", api_key="hw_live_YOUR_KEY")

stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a haiku about routers"}],
    stream=True,
)

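for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices.
    if not chunk.choices:
        continue
    # delta.content is None on role-only and tool-call chunks.
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)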