Why max_tokens controls your worst-case cost

Understanding how the strict spend reservation works.

HiWay never lets you overdraft. Before forwarding a request to the provider, we compute a worst-case cost and atomically reserve it from your balance. If your balance can't cover the worst case, the request is rejected with 402 Insufficient Credits — the provider is never called.
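The check-then-reserve step can be pictured with a small in-memory sketch. This is not HiWay's implementation: the `Balance` class, the `InsufficientCredits` exception, and the use of integer micro-dollars are all assumptions made for illustration. The point it shows is that the balance check and the deduction happen under one lock, so two concurrent requests cannot both spend the same credits.

```python
import threading


class InsufficientCredits(Exception):
    """Hypothetical stand-in for the 402 Insufficient Credits response."""

    status_code = 402


class Balance:
    """Toy in-memory balance held in integer micro-dollars (assumed unit)."""

    def __init__(self, micro_usd: int) -> None:
        self._available = micro_usd
        self._lock = threading.Lock()

    @property
    def available(self) -> int:
        return self._available

    def reserve(self, worst_case_micro_usd: int) -> None:
        # Check and deduct inside a single critical section so the
        # reservation is atomic with respect to other requests.
        with self._lock:
            if worst_case_micro_usd > self._available:
                # Rejected before any provider call would be made.
                raise InsufficientCredits("balance cannot cover worst case")
            self._available -= worst_case_micro_usd


balance = Balance(1_000)      # $0.001 of credit
balance.reserve(300)          # succeeds, 700 micro-dollars remain
try:
    balance.reserve(800)      # worst case exceeds what is left
except InsufficientCredits as err:
    print(err.status_code)    # prints 402
```

A production system would more likely do this with a single atomic database update than with a process-local lock, but the invariant is the same: the balance never goes below zero.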

The formula

worst_case = (input_chars / 4 / 1_000_000) * price_in
           + (max_tokens / 1_000_000) * price_out

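Input cost is fixed by what you sent: we count characters and divide by roughly 4 characters per token to estimate input tokens. Output cost is bounded by max_tokens, because the provider will never emit more tokens than that. price_in and price_out are the model's prices per million input and output tokens, hence the division by 1_000_000. The sum is your worst-case ceiling.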
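As a rough sketch, the formula translates directly into a few lines of Python. The function name `worst_case_cost` and the example prices ($1 and $5 per million tokens) are assumptions for illustration, not real HiWay or provider pricing.

```python
def worst_case_cost(
    input_chars: int,
    max_tokens: int,
    price_in: float,   # USD per 1,000,000 input tokens
    price_out: float,  # USD per 1,000,000 output tokens
) -> float:
    """Worst-case cost in USD, following the formula above."""
    est_input_tokens = input_chars / 4  # ~4 characters per token
    input_cost = est_input_tokens / 1_000_000 * price_in
    output_cost = max_tokens / 1_000_000 * price_out
    return input_cost + output_cost


# Hypothetical request: 2,000 characters of input, max_tokens of 500.
cost = worst_case_cost(2_000, 500, price_in=1.0, price_out=5.0)
print(f"${cost:.4f}")  # prints $0.0030
```

In this example the output term ($0.0025) is five times the input term ($0.0005), which is why max_tokens tends to dominate the reservation for short prompts.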

Why tuning max_tokens matters

SDK defaults often send max_tokens: 4096 or higher. For simple classification or summarization, that can be 8× or more than you'll actually use, and it inflates the reservation accordingly. Set max_tokens to a realistic ceiling for the task and less of your balance is held per request, which leaves more headroom for other calls.
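One way to see the headroom effect is to count how many reservations a given balance can hold at once. The sketch below makes several simplifying assumptions: it ignores the input term, uses a hypothetical output price of $5 per million tokens, and works in integer micro-dollars. With those units, max_tokens / 1_000_000 * price_out dollars is simply max_tokens * price_out micro-dollars.

```python
def requests_covered(balance_micro_usd: int, max_tokens: int, price_out: int) -> int:
    """How many output-only reservations fit in the balance at the same time.

    price_out is USD per 1,000,000 output tokens, so one reservation
    costs max_tokens * price_out micro-dollars.
    """
    reservation = max_tokens * price_out
    return balance_micro_usd // reservation


balance = 1_000_000  # $1.00, hypothetical
for limit in (4096, 500):
    print(limit, "->", requests_covered(balance, limit, price_out=5))
# prints:
# 4096 -> 48
# 500 -> 400
```

Under these assumptions, lowering max_tokens from 4096 to 500 lets the same $1.00 cover roughly eight times as many simultaneous holds.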

Over-reservations are refunded

After the stream ends, HiWay sees the actual token count and refunds the difference back to your balance. The reservation is a hold, not a charge. You only pay for what you actually consumed.
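The hold-then-refund arithmetic can be sketched as follows. The `settle` function and the $1.0000 starting balance are hypothetical; `Decimal` is used only to keep the cents exact. The sketch assumes the actual cost never exceeds the reservation.

```python
from decimal import Decimal


def settle(balance: Decimal, reserved: Decimal, actual: Decimal) -> tuple[Decimal, Decimal]:
    """Return (new_balance, refund) once the actual cost is known.

    `balance` is the balance after the hold was taken.
    """
    refund = reserved - actual
    if refund < 0:
        # Outside the scope of this sketch.
        raise ValueError("actual cost exceeds the reservation")
    return balance + refund, refund


start = Decimal("1.0000")
reserved = Decimal("0.0025")             # hold taken before the request
held_balance = start - reserved          # 0.9975 while the request runs
new_balance, refund = settle(held_balance, reserved, actual=Decimal("0.0011"))
print(refund, new_balance)               # prints 0.0014 0.9989
```

The net change to the balance is $0.0011, exactly the actual cost, which is what "a hold, not a charge" means in practice.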

Example

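| Scenario       | max_tokens | Reserved | Actual  | Refund  |
|----------------|------------|----------|---------|---------|
| Classification | 50         | $0.0003  | $0.0002 | $0.0001 |
| Short summary  | 500        | $0.0025  | $0.0011 | $0.0014 |
| Long report    | 4096       | $0.0205  | $0.0180 | $0.0025 |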