How smart routing works

Seven analyzers, sub-millisecond decision, no LLM call.

Most LLM proxies route based on the model name you ask for. HiWay routes based on what your prompt actually needs. The decision is made by a deterministic CPU-only scoring engine in less than a millisecond - no second LLM call, no model in the loop, no surprise latency.

The seven analyzers

Each request runs through seven independent analyzers, each producing a score between 0 and 1. The weighted sum of those scores is the request's complexity score, which maps to a tier (ultra-light, light, standard, or heavy).

  • Intent - greeting, simple question, confirmation, action request, expert query…
  • Complexity - number of constraints, structured output requirements, multi-step instructions
  • Tools - presence of function/tool definitions and how many
  • System prompt - length and density (a 4 K-token system prompt usually means an agent context)
  • Code presence - does the user attach code blocks, mention file paths, talk about debugging?
  • Domain - finance, medical, legal, security, etc. → bumps to a higher tier
  • Conversation context - total tokens already exchanged, depth of the conversation
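The weighted sum described above can be sketched as follows. This is a minimal illustration, not HiWay's actual engine: the weights, the threshold values, and the function names are all assumptions, since the page does not publish them. The sketch only maps onto light, standard, and heavy, and assumes any cheaper tier is decided by a separate check.

```python
# Hypothetical weights, one per analyzer. The real values are not documented;
# these are chosen only so that they sum to 1.0.
WEIGHTS = {
    "intent": 0.20,
    "complexity": 0.25,
    "tools": 0.15,
    "system_prompt": 0.10,
    "code_presence": 0.10,
    "domain": 0.10,
    "conversation_context": 0.10,
}

# Hypothetical cut-offs: scores below 0.33 are light, below 0.66 standard,
# everything else heavy.
TIER_THRESHOLDS = [(0.33, "light"), (0.66, "standard")]


def complexity_score(scores: dict[str, float]) -> float:
    """Combine the seven analyzer scores (each in [0, 1]) into one weighted sum."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing analyzer scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} score must be between 0 and 1")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def tier_for(score: float) -> str:
    """Map a complexity score to a tier using the assumed thresholds."""
    for upper_bound, tier in TIER_THRESHOLDS:
        if score < upper_bound:
            return tier
    return "heavy"
```

Because the computation is a fixed number of multiplications and comparisons, it involves no I/O and no model call, which is consistent with the sub-millisecond claim.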

From score to tier

There are four tiers, in ascending order of capability and cost: ultra-light → light → standard → heavy. The ultra-light tier (added 2026-05-03) catches single-shot trivial tasks - classification, sentiment scoring, JSON extraction, keyword extraction, parsing - where a 10x-cheaper model gives the same answer. A request skips ultra-light and lands on light or higher if it is multi-turn, carries a complexity cue (reasoning, code, tools), has a prompt longer than 600 characters, or has more than 500 tokens of context.
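The ultra-light gate can be sketched as a handful of boolean checks. The 600-character and 500-token limits come from the text above. The `Request` type, its field names, and the rule that only a request already scored as light can be downgraded are assumptions made for illustration, and detecting whether a task is actually trivial is left out.

```python
from dataclasses import dataclass

MAX_PROMPT_CHARS = 600    # limit stated in the text
MAX_CONTEXT_TOKENS = 500  # limit stated in the text


@dataclass
class Request:
    # Hypothetical request shape, for illustration only.
    prompt: str
    turns: int = 1                    # number of user turns so far
    context_tokens: int = 0           # tokens already exchanged
    has_complexity_cue: bool = False  # reasoning, code, or tools detected


def qualifies_for_ultra_light(req: Request) -> bool:
    """True only for single-turn, short, cue-free requests."""
    return (
        req.turns <= 1
        and not req.has_complexity_cue
        and len(req.prompt) <= MAX_PROMPT_CHARS
        and req.context_tokens <= MAX_CONTEXT_TOKENS
    )


def final_tier(scored_tier: str, req: Request) -> str:
    """Downgrade a light request to ultra-light when the gate passes (assumed rule)."""
    if scored_tier == "light" and qualifies_for_ultra_light(req):
        return "ultra-light"
    return scored_tier
```

Any single failed check keeps the request on light or higher, which is what makes the gate narrow.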

Tier cheatsheet

Tier        | Use case                                         | Reference model       | Input cost ($ / M tokens)
Ultra-light | Classification, sentiment, parsing, JSON extract | Gemini 2.5 Flash Lite | $0.10
Light       | Conversational simple, multi-turn chat           | Claude Haiku 4.5      | $1.00
Standard    | Production workhorse, summarization, RAG         | Claude Sonnet 4.6     | $3.00
Heavy       | Complex reasoning, code generation, agents       | Claude Opus 4.7       | $5.00

From tier to provider

Once HiWay knows the tier, it picks the actual model based on your settings: which providers you enabled in Settings → Providers and your routing profile (Budget / Balanced / Quality). The resolver looks at every model that matches the tier across your enabled providers and picks one according to the strategy.

Routing profile cheatsheet

Profile  | Selection strategy          | Typical winner for the light tier
Budget   | Cheapest input price        | Mistral Small ($0.10/M)
Balanced | Best quality / dollar ratio | Claude Haiku 4.5 ($1.00/M, quality 60)
Quality  | Highest quality_score       | Claude Haiku 4.5
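The three selection strategies can be sketched as a filter followed by a min or max over the remaining candidates. The catalog below is entirely hypothetical (placeholder model ids, prices, and quality scores), and reading "quality / dollar" as `quality_score / input_price` is an assumption. The real resolver may weigh other factors, so this sketch is not expected to reproduce the winners listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    id: str
    provider: str
    tier: str
    input_price: float    # USD per million input tokens
    quality_score: float  # higher is better


# Hypothetical catalog, for illustration only.
CATALOG = [
    Model("vendor-a/small", "vendor-a", "light", 0.10, 20),
    Model("vendor-b/value", "vendor-b", "light", 0.20, 50),
    Model("vendor-c/premium", "vendor-c", "light", 1.00, 60),
    Model("vendor-c/large", "vendor-c", "heavy", 5.00, 95),
]


def resolve(tier: str, profile: str, enabled_providers: set[str],
            catalog: list[Model] = CATALOG) -> Model:
    """Pick one model for the tier among the enabled providers."""
    candidates = [m for m in catalog
                  if m.tier == tier and m.provider in enabled_providers]
    if not candidates:
        raise LookupError(f"no enabled model for tier {tier!r}")
    if profile == "budget":
        return min(candidates, key=lambda m: m.input_price)
    if profile == "balanced":
        return max(candidates, key=lambda m: m.quality_score / m.input_price)
    if profile == "quality":
        return max(candidates, key=lambda m: m.quality_score)
    raise ValueError(f"unknown routing profile: {profile!r}")
```

With all three hypothetical vendors enabled, each profile picks a different light-tier model. Disabling a provider simply removes its models from the candidate list before the strategy runs.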

Override anything

Set model to a fully-qualified id (e.g. "anthropic/claude-haiku-4-5", "openai/gpt-4o") to bypass scoring and pin the request to that model. Use "auto" to let the router decide.
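The override rule amounts to one branch on the model field. The sketch below is an assumption about the shape of that logic, not HiWay's code: `choose_model` is a hypothetical name, and rejecting ids without a `provider/` prefix is a choice made here because the page does not say how such values are handled.

```python
def choose_model(requested: str, routed_model_id: str) -> str:
    """Return the model to call: the router's pick for "auto", else the pinned id."""
    if requested == "auto":
        return routed_model_id
    provider, separator, name = requested.partition("/")
    if not (separator and provider and name):
        # Assumed behavior: only fully-qualified "provider/model" ids can be pinned.
        raise ValueError(f"expected 'auto' or 'provider/model', got {requested!r}")
    return requested


# "auto" defers to whatever the scoring engine selected.
print(choose_model("auto", "anthropic/claude-haiku-4-5"))
# A fully-qualified id bypasses scoring entirely.
print(choose_model("openai/gpt-4o", "anthropic/claude-haiku-4-5"))
```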

Why ultra-light matters

A typical agent codebase sends ~30-40% of its calls to dispatch, classification, or one-line transforms ("is this prompt safe?", "extract the author from this email", "label this support ticket"). Routing those to Gemini 2.5 Flash Lite at $0.10/M instead of Claude Haiku 4.5 at $1/M is a flat 10x cost reduction on that slice, which adds 3-5 percentage points of overall savings on top of the multi-tier router. The narrow gate (no complexity cues, short prompt, single turn) keeps quality intact.
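The arithmetic behind these figures can be sketched as below. The two prices come from the text. The function names and the 5% spend share in the example are hypothetical: the page gives the share of calls, not the share of spend, and short classification calls typically account for a much smaller share of spend than of calls. The sketch also assumes token counts are the same on both models.

```python
HAIKU_PRICE = 1.00       # USD per million input tokens, from the text
FLASH_LITE_PRICE = 0.10  # USD per million input tokens, from the text


def slice_saving_fraction(old_price: float, new_price: float) -> float:
    """Fraction of the slice's cost removed by moving it to the cheaper model."""
    return 1.0 - new_price / old_price


def overall_points_saved(slice_spend_share: float,
                         old_price: float = HAIKU_PRICE,
                         new_price: float = FLASH_LITE_PRICE) -> float:
    """Percentage points of total spend saved, given the slice's share of spend."""
    return 100.0 * slice_spend_share * slice_saving_fraction(old_price, new_price)


# A 10x cheaper model removes 90% of the slice's cost.
print(slice_saving_fraction(HAIKU_PRICE, FLASH_LITE_PRICE))
# Hypothetical: if the slice were 5% of total spend, that is roughly 4.5 points.
print(overall_points_saved(0.05))
```

Under this simple model, the quoted 3-5 point range corresponds to the ultra-light slice making up roughly 3-6% of baseline spend.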