June 2026 · 3 min read · Johan Bretonneau

A 200 OK Is Not a Good Answer: Routing LLMs on Quality, Not Just Cost
Cheap routing that ignores quality is a tax you pay later, in bad outputs

Most LLM routers optimize for price and a successful HTTP call. But a cheap model that returns a confident, wrong answer still costs you, in rework and lost trust. Here is why HiWay measures the real output quality of every model and routes on quality-for-the-price, not just the cheapest call that does not error.

Every LLM cost-cutting story ends the same way: "we moved to a cheaper model and saved 60%." Almost nobody writes the sequel, the one where support tickets creep up two weeks later and the team quietly switches half of it back.

The reason is a number most teams never track. Not cost. Not latency. Not error rate. Quality.

A successful call is not a good answer

A cheaper model returns a 200 OK. The pipeline is green, the latency chart looks great, the bill drops. Everyone is happy.

Except the answer was shallow, or subtly wrong, or only addressed half of what was asked. No exception was thrown. No alert fired. Your product just got a little worse, one response at a time.

That is the hidden tax of naive cost routing. You optimize the invoice and pay it back somewhere you are not measuring: rework, escalations, a user who trusts the output a little less each time.

What naive routing sees	What the user actually gets
HTTP 200	A confident answer that misses the point
Low latency	A fast wrong answer
Lower token cost	A second prompt to fix the first
Green dashboard	A quietly worse product

Cheap is not the same as good enough, at least not everywhere

For a large share of tasks, a small model is genuinely as good as a flagship: classification, extraction, short summaries, routine formatting. Routing those down is free money, and you should take it.

For others (careful reasoning, nuanced writing, multi-step tool use), the gap is real. Route those down and you feel it in the output.

The hard part: the line between the two is different for every workload, and it moves every time a model is updated. A static "use the cheap model for everything" rule is how you end up writing the switch-back sequel.

We do not assume quality. We measure it.

HiWay scores the real output quality of the models in your routing mix. An independent AI judge rates a sample of actual responses on what matters to a user: accuracy, completeness, and relevance. A model that returns a confident but weak answer therefore ranks below one that genuinely answers well.
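To make the idea concrete, here is a minimal sketch of how judge ratings on those three dimensions could be rolled up into one quality number per model. The 0 to 1 scale, the equal weighting, and the names JudgeRating and quality_by_model are assumptions for illustration, not HiWay's actual formula or API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class JudgeRating:
    """One judged response; each dimension is on an assumed 0-1 scale."""
    model: str
    accuracy: float
    completeness: float
    relevance: float

    def score(self) -> float:
        # Equal weighting is an assumption made for this sketch.
        return (self.accuracy + self.completeness + self.relevance) / 3


def quality_by_model(ratings):
    """Average the per-response scores for each model in the sample."""
    by_model = {}
    for rating in ratings:
        by_model.setdefault(rating.model, []).append(rating.score())
    return {model: mean(scores) for model, scores in by_model.items()}


# Hypothetical sample of judged responses (model names are placeholders).
ratings = [
    JudgeRating("small-model", accuracy=0.5, completeness=0.5, relevance=1.0),
    JudgeRating("small-model", accuracy=0.5, completeness=1.0, relevance=1.0),
    JudgeRating("large-model", accuracy=1.0, completeness=1.0, relevance=1.0),
]
quality = quality_by_model(ratings)
```

In this toy sample the small model's relevant but incomplete answers pull its average below the large model's, even though every call would have returned a 200 OK.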

That signal feeds the router. It learns the best quality-for-the-price model for each kind of task in your workspace, and keeps learning as your traffic and the model landscape change.

The result is the part that makes the savings durable: you capture the discount exactly where a cheaper model is genuinely good enough, and you keep the strong model exactly where it earns its price. Quality-aware, not just cost-aware.
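One simple way to read "quality-for-the-price" is: per task type, pick the cheapest model whose measured quality clears a bar, and fall back to the strongest model when none does. The sketch below assumes exactly that rule; the scores, prices, threshold, and the pick_model name are all hypothetical, and a production router would likely learn these values rather than hard-code them.

```python
# Hypothetical measured quality (0-1) per task type and model.
QUALITY = {
    "classification": {"small-model": 0.94, "large-model": 0.96},
    "multi_step_reasoning": {"small-model": 0.61, "large-model": 0.92},
}

# Hypothetical prices in USD per million tokens.
PRICE = {"small-model": 0.5, "large-model": 10.0}


def pick_model(task_type, min_quality=0.9):
    """Cheapest model that clears the quality bar, else the best available."""
    scores = QUALITY[task_type]
    good_enough = [m for m, q in scores.items() if q >= min_quality]
    if good_enough:
        # Take the discount only where quality is already sufficient.
        return min(good_enough, key=PRICE.__getitem__)
    # No model clears the bar: keep the strongest one for this task.
    return max(scores, key=scores.__getitem__)
```

With these made-up numbers, classification routes to the small model and multi-step reasoning stays on the large one, which is the split the paragraph above describes.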

Why "independent" is the whole point

A model grading its own work flatters itself. If the quality signal comes from the same model that produced the answer, you are measuring confidence, not correctness. The judgment has to come from outside the model being judged, on the dimensions a user cares about, not on the single question most routers ask: "did it return something?"
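The independence rule is easy to enforce in code. As a minimal sketch, assuming a pool of judge models identified by name (pick_judge and the model names are hypothetical), the guard is simply that the judge must never be the model that produced the answer:

```python
def pick_judge(producer, judge_pool):
    """Return the first judge in the pool that is not the model being judged."""
    for judge in judge_pool:
        if judge != producer:
            return judge
    # Refuse to score rather than let a model grade its own work.
    raise ValueError(f"no independent judge available for {producer!r}")


JUDGES = ["judge-a", "judge-b"]  # hypothetical model names
```

Failing loudly when no independent judge exists is a deliberate choice here: a missing score is safer than a flattering one.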

What this changes for you

You stop choosing between cheap and good. You get cheap where it is safe and strong where it counts, decided per task, automatically.
The savings hold up, because they are not bought with a silent quality regression that surfaces a month later as churn.
It adapts. When a new model ships or an existing one drifts, the measurement catches it before your users do.
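Catching drift can be as simple as comparing recent judge scores for a model against its earlier baseline. This is a sketch under stated assumptions: scores on a 0 to 1 scale, a fixed tolerance of 0.05, and a hypothetical has_drifted helper. It is not a description of how HiWay detects drift internally.

```python
from statistics import mean


def has_drifted(baseline_scores, recent_scores, max_drop=0.05):
    """True if average recent quality fell more than max_drop below baseline."""
    return mean(baseline_scores) - mean(recent_scores) > max_drop


# Hypothetical judge scores for one model on one task type.
baseline = [0.9, 0.92, 0.88, 0.9]
after_update = [0.8, 0.78, 0.82, 0.8]
stable = [0.9, 0.89, 0.91, 0.9]

drifted = has_drifted(baseline, after_update)
steady = has_drifted(baseline, stable)
```

A flagged model would then be re-evaluated against the alternatives for that task, so the routing decision follows the measurement instead of last month's assumption.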

Cutting your LLM bill is easy. Cutting it without quietly degrading your product is the actual job. That is the difference between routing on cost and routing on quality.
