Router bypass (per-key direct routing)
Skip CORTEX routing for explicit model requests. Forward straight to the provider. Standard markup still applies.
By default, every request goes through HiWay's CORTEX routing pipeline: smart model selection, fallback chains, A/B experiments. Router bypass is a per-key toggle that short-circuits all of that when the client sends an explicit qualified model (e.g. anthropic/claude-sonnet-4-6). The request goes directly to the matching provider: no auto-downgrade, no fallback to a different model, no A/B assignment.
When to use it
Production traffic where you need deterministic model selection: paying customers, regulated workflows, anything that must run on a specific model. Also migrations where you want byte-for-byte parity with what a direct provider call would do.
What changes when bypass = true
- Requests with an explicit qualified model (provider/model-name) skip CORTEX entirely.
- No fallback: if the model is unreachable, you get the upstream error (5xx, rate-limit) directly.
- No auto-downgrade to a cheaper tier on cost-sensitive workloads.
- No A/B experiment assignment, even if the key is enrolled.
- Markup still applies at the normal rate. Bypass is about routing behavior, not pricing.
- Requests with model: "auto" continue to route normally (bypass is ignored for those).
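The rules above can be summarized as a small predicate. This is a minimal sketch of the documented behavior, not HiWay's actual implementation: the function name `skips_cortex` and the regular expression for a "qualified" model ID are assumptions made for illustration, and the treatment of unqualified names (no `provider/` prefix) as normally routed is inferred from the wording above.

```python
import re

# Assumed shape of a qualified model ID: "provider/model-name".
# The exact grammar HiWay accepts is not specified in this page.
QUALIFIED_MODEL = re.compile(r"^[^/\s]+/[^/\s]+$")


def skips_cortex(router_bypass: bool, model: str) -> bool:
    """Return True when a request would go straight to the provider."""
    if not router_bypass:
        return False  # default: every request goes through CORTEX
    if model == "auto":
        return False  # bypass is ignored for model: "auto"
    # Only explicit provider/model-name requests are bypassed (assumption:
    # unqualified names fall back to normal routing).
    return bool(QUALIFIED_MODEL.match(model))
```

For example, `skips_cortex(True, "anthropic/claude-sonnet-4-6")` is true, while `skips_cortex(True, "auto")` is false regardless of the key setting.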
What stays on
Auth, budget control, rate limiting, Security Shield, semantic cache, Anthropic prompt cache injection, and observability all still run. Bypass affects model selection, not the gateway features.
Enable it
curl -X PATCH https://app.hiway2llm.com/api/v1/workspaces/WS_ID/keys/KEY_ID \
  -H "Authorization: Bearer hw_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"router_bypass": true}'
Verify it's active
The response envelope shows the routing decision. With bypass enabled and an explicit model:
{
  "_hiway": {
    "routed_model": "anthropic/claude-sonnet-4-6",
    "routed_tier": "bypass",
    "fallback_chain": []
  }
}
You give up the savings
CORTEX's smart routing typically saves 10-40% on mixed workloads by picking the cheapest capable model per request. Bypass disables this. Use it where determinism matters more than cost.
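If you rely on bypass for determinism, the envelope check described under "Verify it's active" can be automated in a client or smoke test. The sketch below assumes the `_hiway` object arrives as a top-level field of the JSON response body with the three fields shown in the sample; the helper name `is_bypassed` is hypothetical.

```python
import json


def is_bypassed(response_body: str, requested_model: str) -> bool:
    """Check the `_hiway` envelope for a bypass routing decision."""
    envelope = json.loads(response_body).get("_hiway", {})
    return (
        envelope.get("routed_model") == requested_model  # no model swap
        and envelope.get("routed_tier") == "bypass"      # CORTEX was skipped
        and envelope.get("fallback_chain") == []         # no fallback chain
    )


# Sample body mirroring the envelope shown in this page.
sample = (
    '{"_hiway": {"routed_model": "anthropic/claude-sonnet-4-6", '
    '"routed_tier": "bypass", "fallback_chain": []}}'
)
print(is_bypassed(sample, "anthropic/claude-sonnet-4-6"))  # True
```

A response with no `_hiway` field, or with a different `routed_tier`, makes the helper return False, which is a reasonable signal that the key's `router_bypass` setting did not take effect for that request.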