How smart routing works

Seven analyzers, sub-millisecond decision, no LLM call.

Most LLM proxies route based on the model name you ask for. HiWay routes based on what your prompt actually needs. The decision is made by a deterministic CPU-only scoring engine in less than a millisecond - no second LLM call, no model in the loop, no surprise latency.

The seven analyzers

Each request runs through seven independent analyzers, each producing a score between 0 and 1. The weighted sum gives a complexity score, which maps to a tier (light, standard, heavy).

Intent - greeting, simple question, confirmation, action request, expert query…
Complexity - number of constraints, structured output requirements, multi-step instructions
Tools - presence of function/tool definitions and how many
System prompt - length and density (a 4 K-token system prompt usually means an agent context)
Code presence - does the user attach code blocks, mention file paths, talk about debugging?
Domain - finance, medical, legal, security, etc. → bumps to a higher tier
Conversation context - total tokens already exchanged, depth of the conversation

From score to tier

Four tiers, in ascending capability/cost order: ultra-light → light → standard → heavy. The ultra-light tier (added 2026-05-03) catches single-shot trivial tasks - classification, sentiment scoring, JSON extraction, keyword pulls, parsing - where a 10x-cheaper model gives the same answer. Anything multi-turn, anything with a complexity cue (reasoning, code, tools), and anything with > 600-char prompt or > 500 tokens of context skips ultra-light and lands at least on light.

Tier cheatsheet

Tier	Use case	Reference model	Cost / M tokens (in)
Ultra-light	Classification, sentiment, parsing, JSON extract	Gemini 2.5 Flash Lite	$0.10
Light	Conversational simple, multi-turn chat	Claude Haiku 4.5	$1.00
Standard	Production workhorse, summarization, RAG	Claude Sonnet 4.6	$3.00
Heavy	Complex reasoning, code generation, agents	Claude Opus 4.7	$5.00

From tier to provider

Once HiWay knows the tier, it picks the actual model based on your settings: which providers you enabled in Settings → Providers and your routing profile (Budget / Balanced / Quality). The resolver looks at every model that matches the tier across your enabled providers and picks one according to the strategy.

Routing profile cheatsheet

Profile	Selection strategy	Typical winner for the light tier
Budget	Cheapest input price	Mistral Small ($0.10/M)
Balanced	Best quality / dollar ratio	Claude Haiku 4.5 ($0.80/M, quality 60)
Quality	Highest quality_score	Claude Haiku 4.5

Override anything

Set model to a fully-qualified id (e.g. "anthropic/claude-haiku-4-5", "openai/gpt-4o") to bypass scoring and pin the request to that model. Use "auto" to let the router decide.

Why ultra-light matters

A typical agent codebase has ~30-40% of its calls going to dispatch / classification / one-line transforms ("is this prompt safe?", "extract the author from this email", "label this support ticket"). Routing those to Gemini Flash Lite at $0.10/M instead of Haiku at $1/M is a flat 10x cost reduction on that slice - adds 3-5 percentage points of overall savings on top of the multi-tier router. The narrow gate (no complexity cues, short prompt, single turn) keeps quality intact.