Why max_tokens controls your worst-case cost
Understanding how the strict spend reservation works.
HiWay never lets you overdraft. Before forwarding a request to the provider, we compute a worst-case cost and atomically reserve it from your balance. If your balance can't cover the worst case, the request is rejected with 402 Insufficient Credits — the provider is never called.
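The check-then-reserve step above can be pictured as a guarded decrement. The sketch below is a minimal in-memory model, not HiWay's implementation: `Balance`, `reserve`, and `InsufficientCredits` are hypothetical names, and a real service would most likely do this atomically in its datastore rather than with a process-local lock.

```python
import threading
from decimal import Decimal


class InsufficientCredits(Exception):
    """Hypothetical error standing in for a 402 Insufficient Credits response."""


class Balance:
    """Toy in-memory balance; a lock makes check-and-deduct one atomic step."""

    def __init__(self, amount: Decimal) -> None:
        self._amount = amount
        self._lock = threading.Lock()

    @property
    def amount(self) -> Decimal:
        with self._lock:
            return self._amount

    def reserve(self, worst_case: Decimal) -> None:
        # Check and deduct under the same lock so two concurrent requests
        # cannot both pass the check against the same funds.
        with self._lock:
            if worst_case > self._amount:
                # Rejected here, before any provider call would be made.
                raise InsufficientCredits(
                    f"need {worst_case}, have {self._amount}"
                )
            self._amount -= worst_case


balance = Balance(Decimal("0.0100"))
balance.reserve(Decimal("0.0025"))      # fits: 0.0075 remains
try:
    balance.reserve(Decimal("0.0205"))  # larger than the remaining balance
    rejected = False
except InsufficientCredits:
    rejected = True
```

The point of the design is that the check and the deduction cannot be separated, so concurrent requests can never jointly reserve more than the balance holds.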
The formula
worst_case = (input_chars / 4 / 1_000_000) * price_in
           + (max_tokens / 1_000_000) * price_out
Input cost is bounded by what you sent (we count characters and divide by roughly 4 characters per token). Output cost is bounded by max_tokens, because the provider will never emit more tokens than that. Both price_in and price_out are per million tokens, hence the division by 1_000_000. The sum is your worst-case ceiling.
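As a rough illustration, the formula translates directly into a small function. The prices used here ($1.00 and $5.00 per million tokens) are made-up values for the example, not HiWay or provider pricing, and `worst_case_cost` is a hypothetical helper name.

```python
def worst_case_cost(input_chars: int, max_tokens: int,
                    price_in: float, price_out: float) -> float:
    """Worst-case cost in dollars; both prices are per 1,000,000 tokens."""
    # Input side: characters -> estimated tokens (about 4 chars per token).
    input_cost = (input_chars / 4 / 1_000_000) * price_in
    # Output side: the provider cannot emit more than max_tokens.
    output_cost = (max_tokens / 1_000_000) * price_out
    return input_cost + output_cost


# Hypothetical prices, for illustration only.
PRICE_IN = 1.00   # dollars per million input tokens
PRICE_OUT = 5.00  # dollars per million output tokens

estimate = worst_case_cost(
    input_chars=2_000, max_tokens=500,
    price_in=PRICE_IN, price_out=PRICE_OUT,
)
print(f"${estimate:.4f}")  # prints $0.0030
```

With these assumed prices, a 2,000-character prompt contributes about $0.0005 and the 500-token output cap contributes $0.0025, so the output cap dominates the reservation.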
Why tuning max_tokens matters
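SDK defaults often send max_tokens: 4096 or higher. For a short summary that needs around 500 tokens, that is roughly 8× more than you'll actually use, and for classification the gap is larger still. The output side of the reservation is inflated by the same factor. Set max_tokens to a realistic ceiling and each request holds less of your balance, which leaves more headroom for other requests.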
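To get a feel for the headroom effect, the sketch below counts how many simultaneous reservations a balance could hold at different max_tokens values. It looks only at the output side of the formula (input cost is ignored for simplicity), and the $1.00 balance and $5.00-per-million output price are assumptions chosen for the example. `output_reservation` and `reservations_that_fit` are hypothetical helper names.

```python
from decimal import Decimal


def output_reservation(max_tokens: int, price_out: Decimal) -> Decimal:
    """Output-side reservation in dollars; price_out is per 1,000,000 tokens."""
    return Decimal(max_tokens) / Decimal(1_000_000) * price_out


def reservations_that_fit(balance: Decimal, max_tokens: int,
                          price_out: Decimal) -> int:
    """How many output-side reservations of this size the balance can hold."""
    return int(balance // output_reservation(max_tokens, price_out))


# Assumed values, for illustration only.
BALANCE = Decimal("1.00")
PRICE_OUT = Decimal("5.00")

at_default = reservations_that_fit(BALANCE, 4096, PRICE_OUT)  # 48
at_tuned = reservations_that_fit(BALANCE, 50, PRICE_OUT)      # 4000
```

Under these assumptions the same balance covers 48 in-flight requests at max_tokens 4096 and 4,000 at max_tokens 50, even if the actual output is identical in both cases.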
Over-reservations are refunded
After the stream ends, HiWay sees the actual token count and refunds the difference between the reserved amount and the actual cost to your balance. The reservation is a hold, not a charge. You only pay for what you actually consumed.
Example
| Scenario | max_tokens | Reserved | Actual | Refund |
|---|---|---|---|---|
| Classification | 50 | $0.0003 | $0.0002 | $0.0001 |
| Short summary | 500 | $0.0025 | $0.0011 | $0.0014 |
| Long report | 4096 | $0.0205 | $0.0180 | $0.0025 |
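The Refund column is simply Reserved minus Actual. The short check below recomputes it from the table's own figures. The `refund` helper is a hypothetical name, and the clamp at zero is a defensive assumption rather than documented HiWay behavior.

```python
from decimal import Decimal


def refund(reserved: Decimal, actual: Decimal) -> Decimal:
    """Amount returned to the balance once actual usage is known."""
    # Actual cost should never exceed the worst-case reservation;
    # clamp at zero as a defensive assumption.
    return max(reserved - actual, Decimal("0"))


# (reserved, actual) pairs copied from the table above, in dollars.
rows = {
    "Classification": (Decimal("0.0003"), Decimal("0.0002")),
    "Short summary": (Decimal("0.0025"), Decimal("0.0011")),
    "Long report": (Decimal("0.0205"), Decimal("0.0180")),
}

refunds = {name: refund(reserved, actual)
           for name, (reserved, actual) in rows.items()}

for name, amount in refunds.items():
    print(f"{name}: ${amount}")
```

`Decimal` is used instead of `float` so that small currency amounts subtract exactly.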