LLM gateway pricing models explained: per-token, per-request, BYOK, flat
Four pricing models, four incentive structures: pick the one aligned with your use case
LLM gateways charge four different ways: provider markup, metered, per-request flat, and OSS self-host. Each one aligns incentives differently. Here's how to pick.
Most teams pick an LLM gateway based on features. That's the wrong first filter. The first filter should be the pricing model, because the pricing model determines what the gateway is structurally incentivized to optimize for — and therefore what it will ship, deprecate, and quietly resist over time.
There are four pricing models in the LLM gateway market. Each creates a different alignment between you and your gateway. Once you see the pattern, the feature differences start to make sense.
Model 1: Provider markup (OpenRouter-style)
How it works. You send requests to the gateway. The gateway calls providers on its account. The gateway bills you for tokens at a small markup over wholesale rates. Your provider keys are not involved — in fact, you don't even need provider accounts.
What it's optimized for. Ease of onboarding. One credit card, one signup, you're live. Massive model catalog because the gateway pre-negotiates with every provider.
The incentive. The gateway's revenue grows with your token spend. Every token you save is a token of revenue lost. Features that reduce your spend — aggressive routing to cheaper models, advanced caching, burn-rate alerts — are features the gateway has a structural disincentive to ship well.
Where it works. Prototyping. Side projects. Use cases where total spend is small and convenience dominates. At a few hundred euros a month, a 5% markup is noise.
Where it breaks. At scale. Once your spend crosses a few thousand euros a month, the markup grows into a real line item: a 5% markup on €10K/month is €500/month, recurring. And the gateway has no structural reason to help you reduce the other €10K.
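A quick way to see how the markup scales is to compute it at a few spend levels. This is a minimal Python sketch that assumes the markup is a flat percentage of wholesale token cost; real gateways may structure it differently (per model, or as a fee on credit purchases), so treat `markup_bill` as a hypothetical helper, not any gateway's billing logic.

```python
def markup_bill(wholesale_eur: float, markup_rate: float) -> tuple[float, float]:
    """Return (markup_eur, total_eur) for a flat percentage markup on wholesale token cost.

    Assumption: the markup is a single percentage applied to all token spend.
    """
    markup = wholesale_eur * markup_rate
    return markup, wholesale_eur + markup


if __name__ == "__main__":
    # The markup grows linearly with spend: noise at €300, a real line item at €10K.
    for wholesale in (300, 3_000, 10_000):
        markup, total = markup_bill(wholesale, 0.05)
        print(f"wholesale €{wholesale:>6,}: markup €{markup:>6,.0f}, total €{total:>7,.0f}")
```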
Model 2: Metered / usage-based (Vercel AI Gateway-style)
How it works. You pay the provider for tokens directly, often via BYOK. On top of that, the gateway charges a metered fee: per request, per token routed, or per MB processed. Sometimes there's a free tier.
What it's optimized for. Platforms where the gateway is embedded in a broader product. Vercel bundles it with their platform. Cloudflare bundles it with their edge. The meter is designed to scale gently with your usage so it doesn't feel punitive at low volume.
The incentive. Softer than provider markup, but still aligned with usage growth. The gateway makes more money when your request volume goes up. Smart routing to cheaper models can still reduce its revenue, though less dramatically than with provider markup.
Where it works. When the metered fee is small relative to your provider spend and you value the platform integration. For teams already committed to Vercel or Cloudflare, this fits naturally.
Where it breaks. At very high volumes, where even a small per-request meter adds up. And when the incentive misalignment starts to show — the gateway might ship observability and fallbacks before it ships features that cut your costs.
Model 3: Per-request flat subscription (HiWay-style)
How it works. You pay providers directly at wholesale (BYOK). You pay the gateway a flat monthly subscription based on request volume tier — e.g. Free at 2,500 req/mo, Build at €15 for 100K, Scale at €39 for 500K, Business at €249 for 5M, enterprise custom. The same subscription covers whichever model you route to.
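The tier lookup fits in a few lines. The caps and prices below are copied from the example tiers above; `pick_tier` is a hypothetical helper, and it assumes you simply buy the smallest tier that covers your monthly volume (overage rules, if any, aren't modeled).

```python
from typing import Optional

# (name, monthly request cap, price in EUR), taken from the example tiers above.
# Verify against the current price list before relying on these numbers.
TIERS = [
    ("Free", 2_500, 0),
    ("Build", 100_000, 15),
    ("Scale", 500_000, 39),
    ("Business", 5_000_000, 249),
]


def pick_tier(monthly_requests: int) -> tuple[str, Optional[int]]:
    """Return (tier name, monthly price) for the smallest tier whose cap covers the volume.

    Above the largest listed cap, pricing is custom (enterprise), so the price
    is returned as None rather than guessed.
    """
    for name, cap, price_eur in TIERS:
        if monthly_requests <= cap:
            return name, price_eur
    return "Enterprise", None


if __name__ == "__main__":
    for volume in (2_000, 80_000, 500_000, 10_000_000):
        print(volume, pick_tier(volume))
```

For example, `pick_tier(500_000)` returns `("Scale", 39)`, and `pick_tier(10_000_000)` falls through to the enterprise case. The incentive argument is visible in the signature: the only input is request count, so which model those requests hit never changes the price.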
What it's optimized for. Predictable cost. Your gateway cost is a known line item for the month, independent of how you route traffic.
The incentive. The gateway's revenue is decoupled from your token spend. It doesn't matter whether you route to Haiku or Opus — the subscription is the same. Features that reduce your token spend (routing to cheaper models, caching, burn-rate alerts) don't cost the gateway anything, so the gateway has a structural incentive to ship them well.
Where it works. Production teams with non-trivial spend that care about cost controls and predictable margins. EU teams that also want BYOK + EU hosting alongside this pricing model.
Where it breaks. At near-zero volumes, where even the Build tier at €15/month feels heavy for a hobby project. For those, the Free plan (2,500 req/mo) or direct provider APIs are simpler.
Model 4: OSS self-hosted (LiteLLM-style)
How it works. You download the software, deploy it yourself, pay the provider directly. The gateway itself has no runtime cost beyond the infrastructure you run it on.
What it's optimized for. Control, cost at scale (if you already have the platform), avoidance of vendor lock-in.
The incentive. None — there's no vendor to be incentivized. The software's roadmap is driven by maintainers and community contributions. Features land when someone writes them.
Where it works. Teams with platform engineers and existing ops infrastructure. Regulated environments where running your own gateway is the only compliant option.
Where it breaks. Small teams without platform engineering capacity. The "free" label hides 4-8 hours/week of ops, on-call, and upgrade work. See our separate post on self-host economics.
The alignment matrix
Here's the same information in matrix form:
| Model | Who pays for tokens | Gateway revenue model | Aligned to help you save? |
|---|---|---|---|
| Provider markup | Gateway (then bills you) | % of token spend | No — revenue grows with spend |
| Metered | You (BYOK) | Per-request meter | Partially — tied to volume |
| Per-request flat | You (BYOK) | Flat subscription | Yes — revenue independent of spend |
| OSS self-host | You (BYOK) | None | N/A — no vendor |
The third row is the one that changes things. When the gateway's revenue is disconnected from your token spend, the features it ships start to look different. That's not magic; it's just incentive alignment.
The break-even reasoning
Let's work through a realistic break-even between two of the managed options, provider markup and per-request flat, at different spend levels.
Scenario: 500K requests/month, mixed workload
Assume each request has an average token profile that costs you €0.008 in wholesale provider fees. Total wholesale spend: €4,000/month.
Provider markup model (5% markup)
- Wholesale tokens: €4,000
- Markup: €200
- Total: €4,200/month
Per-request flat model (Scale tier at €39, which covers up to 500K requests)
- Wholesale tokens (BYOK, paid to providers): €4,000
- Gateway subscription: €39
- Total: €4,039/month
Per-request flat wins by €161/month at this scale — and that's before smart routing even kicks in.
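The first scenario can be reproduced in a few lines of Python, which makes it easy to plug in your own request volume and per-request cost. `markup_total` and `flat_total` are hypothetical helpers, and the 5% markup and €39 Scale fee are the example figures used here, not quotes from any specific gateway.

```python
def markup_total(wholesale_eur: float, markup_rate: float) -> float:
    """Monthly total when the gateway resells tokens at a percentage markup."""
    return wholesale_eur * (1 + markup_rate)


def flat_total(wholesale_eur: float, subscription_eur: float) -> float:
    """Monthly total with BYOK (wholesale paid to providers) plus a flat gateway fee."""
    return wholesale_eur + subscription_eur


if __name__ == "__main__":
    requests = 500_000
    cost_per_request_eur = 0.008  # average wholesale cost per request, from the scenario
    wholesale = requests * cost_per_request_eur  # about €4,000
    a = markup_total(wholesale, 0.05)  # about €4,200
    b = flat_total(wholesale, 39)      # about €4,039 (Scale tier)
    print(f"markup €{a:,.0f} vs flat €{b:,.0f}, difference €{a - b:,.0f}")
```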
Scale it up: 5M requests/month, €40,000 of provider spend
- Provider markup (5%): €42,000/month
- Per-request flat (Business tier at €249, 5M requests): €40,249/month
Per-request flat wins by €1,751/month. That's over €21,000/year, entirely from the absence of percentage markup.
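Both scenarios compare fixed points. The break-even itself is where the markup equals the subscription: with markup rate r and flat fee S, the markup costs more once wholesale spend exceeds S / r, which at 5% is 20 × S. A minimal sketch under the same assumed 5% markup and example tier prices (`break_even_wholesale` is a hypothetical helper):

```python
def break_even_wholesale(subscription_eur: float, markup_rate: float) -> float:
    """Wholesale spend at which a percentage markup costs exactly the flat subscription.

    Above this spend the flat subscription is cheaper; below it the markup is.
    """
    if markup_rate <= 0:
        raise ValueError("markup_rate must be positive")
    return subscription_eur / markup_rate


if __name__ == "__main__":
    for tier, fee in (("Build", 15), ("Scale", 39), ("Business", 249)):
        w = break_even_wholesale(fee, 0.05)
        print(f"{tier}: flat wins above €{w:,.0f}/month of wholesale spend")
```

At 5%, that puts the break-even at €300/month of wholesale spend for Build, €780 for Scale, and €4,980 for Business. Because the tier is chosen by request volume rather than spend, it helps to read this per request too: at 500K requests on the Scale tier, the flat plan wins once the average wholesale cost exceeds €780 / 500,000 = €0.00156 per request, about a fifth of the €0.008 assumed above.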
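Add smart routing to the second scenario, sending 60% of traffic to Haiku-tier models instead of Sonnet, and the wholesale bill drops further. How far depends on the price gap between the tiers: if requests cost about the same on average, rerouting 60% of them cuts the bill by something less than 60%, and the cheaper the Haiku tier, the closer you get. Where the pricing models differ is who loses when that happens. Under provider markup, every euro you cut from wholesale spend also cuts the gateway's 5%, so your savings come straight out of its revenue. Under per-request flat, the subscription doesn't change, so the gateway gives up nothing when you save. Smart routing is also volume-independent: it applies to every request, whether you run 5K or 5M a month.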
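The routing effect follows a simple formula: if a share s of requests moves to a model that costs a fraction f of the original, and requests cost about the same on average, wholesale spend scales by 1 - s(1 - f). The sketch below applies it to the €40,000 scenario and shows what happens to the gateway's revenue under each pricing model. The price ratio of 0.25 is a hypothetical value chosen for illustration, not a quoted Haiku/Sonnet price ratio.

```python
def routed_wholesale(wholesale_eur: float, share_routed: float, price_ratio: float) -> float:
    """Wholesale spend after moving `share_routed` of requests to a cheaper model.

    Assumes every request costs the same on average (as in the scenario above)
    and that a rerouted request costs `price_ratio` times what it did before.
    """
    if not (0 <= share_routed <= 1 and 0 <= price_ratio <= 1):
        raise ValueError("share_routed and price_ratio must be in [0, 1]")
    return wholesale_eur * (1 - share_routed * (1 - price_ratio))


if __name__ == "__main__":
    before = 40_000  # scenario 2 wholesale spend, EUR/month
    # Hypothetical price ratio for illustration, not a quoted price.
    after = routed_wholesale(before, share_routed=0.60, price_ratio=0.25)
    print(f"wholesale: €{before:,} -> €{after:,.0f}")
    # Gateway revenue before and after routing under each pricing model.
    print(f"markup gateway revenue: €{0.05 * before:,.0f} -> €{0.05 * after:,.0f}")
    print("flat gateway revenue:   €249 -> €249")
```

Under those assumptions, wholesale spend falls from €40,000 to €22,000. The markup gateway's revenue falls with it, from €2,000 to €1,100 a month, while the flat gateway's €249 is untouched. That gap is the incentive difference in one number.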
What this looks like in practice
Three concrete patterns I see when teams change pricing model:
Pattern 1: Markup team tries BYOK + per-request flat at 10M req/mo. They were running at €8,000/month including a 4% markup. They move to per-request flat on an Enterprise custom tier + BYOK. Their percentage markup disappears entirely. Their provider bill stays €7,700/month initially. But more importantly, the incentive flip kicks in — they start enabling aggressive smart routing because the gateway actively helps them route cheaper. Six months later their token bill is 35% lower. Total annual savings: around €38,000.
Pattern 2: Self-hosted team of 4 engineers moves to managed. They were "saving money" on a gateway by self-hosting LiteLLM, but spending 6 hours/week of engineering time on it. They move to a per-request flat gateway on the Business tier at €249/month. Their engineering capacity for product work goes up by ~26 hours/month. Features ship faster. The subscription pays for itself on the first avoided delay, and smart routing starts trimming the underlying provider bill too.
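The self-host comparison comes down to how much an engineering hour is worth. A minimal sketch using the figures from this pattern (6 hours/week, €249/month); `break_even_hourly_rate` is a hypothetical helper, and it deliberately ignores the infrastructure bill for running the self-hosted gateway, which would only make the subscription look better.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33


def ops_hours_per_month(hours_per_week: float) -> float:
    """Convert a weekly ops burden into hours per month."""
    return hours_per_week * WEEKS_PER_MONTH


def break_even_hourly_rate(subscription_eur: float, hours_per_week: float) -> float:
    """Cost of an engineering hour above which the subscription beats self-hosting.

    Ignores the self-hosted infrastructure bill; including it would lower this rate.
    """
    return subscription_eur / ops_hours_per_month(hours_per_week)


if __name__ == "__main__":
    hours = ops_hours_per_month(6)         # about 26 h/month
    rate = break_even_hourly_rate(249, 6)  # about €9.58/hour
    print(f"{hours:.0f} h/month; subscription wins above €{rate:.2f} per engineering hour")
```

Six hours a week is about 26 hours a month, so the subscription is cheaper whenever an engineering hour costs more than roughly €9.58, well below typical loaded engineering rates.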
Pattern 3: Solo founder stays on markup. Total monthly spend is €80. A markup model with instant signup is right for them. The 5% markup is about €4, and setting up BYOK plus a subscription isn't worth the friction to save it. They'll migrate in 18 months if the business grows.
All three decisions are correct for their contexts. The mistake is picking the wrong one for your context, usually because you only looked at features and not at the pricing model underneath.
Watch for these anti-patterns
"We charge a small percentage on token usage" + "We have great cost-reduction features"
These are in tension. If the gateway's revenue grows when you spend more, shipping features that make you spend less is a counter-incentive. Some companies ship them anyway because their brand demands it; most deprioritize them silently.
Tier structures that feel generous at the low end and gouge at the high end
Especially common in metered models. The first 100K requests are free, the next 400K are cheap, and somehow above 1M/month the cost per request triples. Always look at the curve across your projected 12-month usage, not just today's bucket.
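Checking the curve is easy to automate. This sketch models a graduated meter shaped like the example above: first 100K requests free, a low rate after that (assumed here to run all the way to 1M), tripling above 1M. The per-request rates are made-up placeholders, and real price sheets may instead apply a single rate to all requests once you cross a threshold.

```python
# Hypothetical graduated meter: (upper bound of bracket in requests, EUR per request).
# The rates are placeholders for illustration; plug in a real price sheet.
BRACKETS = [
    (100_000, 0.0),
    (1_000_000, 0.0001),
    (float("inf"), 0.0003),
]


def metered_fee(requests: int) -> float:
    """Monthly fee under a graduated meter.

    Each bracket's rate applies only to the requests that fall inside that bracket.
    """
    fee, lower = 0.0, 0
    for upper, rate in BRACKETS:
        if requests <= lower:
            break
        fee += (min(requests, upper) - lower) * rate
        lower = upper
    return fee


if __name__ == "__main__":
    # Project the curve over a year of growth, not just today's bucket.
    for monthly in (200_000, 800_000, 1_500_000, 3_000_000):
        fee = metered_fee(monthly)
        print(f"{monthly:>9,} req: €{fee:>7,.2f}/month, €{fee / monthly * 1000:.3f} per 1K requests")
```

With these placeholder rates, the effective price per 1K requests rises from €0.05 at 200K requests/month to €0.23 at 3M, which is the shape to look for when you project 12 months ahead.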
Opaque markup
Some gateways don't disclose their markup openly. You pay them, they pay the provider, you never see the wholesale rate. If you can't tell what the markup is, it's probably larger than you'd guess.
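When a gateway doesn't disclose its markup, you can often estimate it yourself from per-model token counts and the providers' public list prices. A minimal sketch, where `implied_markup` is a hypothetical helper: it assumes standard list pricing and ignores cached-token discounts, batch pricing, and currency conversion (provider prices are often quoted in USD, so convert both sides at the same rate), so treat the result as an estimate.

```python
def implied_markup(
    billed_eur: float,
    tokens_by_model: dict[str, tuple[int, int]],
    list_prices_eur_per_m: dict[str, tuple[float, float]],
) -> float:
    """Estimate the effective markup as billed / wholesale - 1.

    tokens_by_model: model -> (input_tokens, output_tokens), from your own logs.
    list_prices_eur_per_m: model -> (EUR per 1M input tokens, EUR per 1M output tokens).
    """
    wholesale = 0.0
    for model, (tokens_in, tokens_out) in tokens_by_model.items():
        price_in, price_out = list_prices_eur_per_m[model]
        wholesale += tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out
    if wholesale == 0:
        raise ValueError("no wholesale cost to compare against")
    return billed_eur / wholesale - 1


if __name__ == "__main__":
    # Hypothetical model name, token counts, and prices for illustration only.
    tokens = {"model-a": (100_000_000, 20_000_000)}  # (input, output) tokens
    prices = {"model-a": (3.0, 15.0)}                # EUR per 1M (input, output) tokens
    print(f"implied markup: {implied_markup(630.0, tokens, prices):.1%}")
```

For example, 100M input and 20M output tokens on a model listed at €3 / €15 per 1M tokens is €600 wholesale; a bill of €630 implies a 5% markup.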
"Free forever" tiers that trap your data
A free tier that doesn't let you export your request logs, prompt history, or observability data is a lock-in moat disguised as generosity. Always verify export terms.
How to pick your pricing model
Three questions:
1. How big is your spend and how fast is it growing? Small and flat → markup is fine. Growing quickly → BYOK + flat pricing compounds in your favor.
2. Do you want cost-reduction features? If you want smart routing, caching, burn-rate alerts — you want a gateway whose revenue doesn't go down when those features work. That's BYOK + flat, or OSS self-host.
3. Do you have platform engineering capacity? If yes, OSS self-host is on the table. If no, eliminate it and pick among the managed options.
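The three questions can be written down as a rough decision rule. This is a sketch, not a formula: `pick_pricing_model` is hypothetical, and the €500/month cut-off for "small" is an assumed value loosely anchored on the few-hundred-a-month range mentioned earlier, so tune it to your own numbers.

```python
# Assumed cut-off for "small" monthly spend; adjust to your context.
SMALL_SPEND_EUR = 500


def pick_pricing_model(
    monthly_spend_eur: float,
    growing_fast: bool,
    wants_cost_features: bool,
    has_platform_engineers: bool,
) -> str:
    """Map the three questions onto a suggested pricing model."""
    # Question 1: small, flat spend with no need for cost controls, so markup is noise.
    if monthly_spend_eur < SMALL_SPEND_EUR and not growing_fast and not wants_cost_features:
        return "provider markup"
    # Otherwise, larger or growing spend and cost-reduction features (question 2)
    # favor a gateway whose revenue doesn't depend on token spend.
    # Question 3 decides whether self-hosting is on the table.
    if has_platform_engineers:
        return "BYOK + flat subscription, or OSS self-host"
    return "BYOK + flat subscription"


if __name__ == "__main__":
    print(pick_pricing_model(80, growing_fast=False, wants_cost_features=False,
                             has_platform_engineers=False))
```

Pattern 3's solo founder (€80/month, not growing, no need for cost features, no platform team) comes out as provider markup; a fast-growing team that wants smart routing but has no platform engineers comes out as BYOK + flat.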
Answer those three, and you'll have your answer in 30 seconds.
The takeaway
LLM gateways don't just differ on features. They differ on the shape of the incentives baked into their pricing. Over 12-24 months of usage, that incentive structure matters more than any specific feature, because it determines which features keep improving and which quietly stall.
Pick the pricing model first. The right features tend to follow.