
LLM gateway pricing models explained: per-token, per-request, BYOK, flat
Four pricing models, four incentive structures - pick the one aligned with your use
LLM gateways charge in four different ways: provider markup, metered usage fees, degressive BYOK markup, and OSS self-hosting. Each one aligns incentives differently. Here's how to pick.
Figure: Gateway pricing models compared at $10k/mo spend - monthly all-in cost ($) across three billing architectures.
Most teams pick an LLM gateway based on features. That's the wrong first filter. The first filter should be the pricing model, because the pricing model determines what the gateway is structurally incentivized to optimize for - and therefore what it will ship, deprecate, and quietly resist over time.
There are four pricing models in the LLM gateway market. Each creates a different alignment between you and your gateway. Once you see the pattern, the feature differences start to make sense.
Model 1: Provider markup (OpenRouter-style)
How it works. You send requests to the gateway. The gateway calls providers on its account. The gateway bills you for tokens at a small markup over wholesale rates. Your provider keys are not involved - in fact, you don't even need provider accounts.
What it's optimized for. Ease of onboarding. One credit card, one signup, you're live. Massive model catalog because the gateway pre-negotiates with every provider.
The incentive. The gateway's revenue grows with your token spend. Every token you save is a token of revenue lost. Features that reduce your spend - aggressive routing to cheaper models, advanced caching, burn-rate alerts - are features the gateway has a structural disincentive to ship well.
Where it works. Prototyping. Side projects. Use cases where the total spend is small and convenience dominates. At a few hundred dollars a month, a 5% markup is noise.
Where it breaks. At scale. Once your spend crosses a few thousand dollars a month, the markup compounds into a real line item. A 5% markup on €10K/month is €500/month, recurring. And the gateway has no structural reason to help you reduce the other €10K.
Model 2: Metered / usage-based (Vercel AI Gateway-style)
How it works. You pay the provider for tokens directly (BYOK, often). The gateway charges you a metered fee - per request, per token routed, per MB processed - on top. Sometimes there's a free tier.
What it's optimized for. Platforms where the gateway is embedded in a broader product. Vercel bundles it with their platform. Cloudflare bundles it with their edge. The meter is designed to scale gently with your usage so it doesn't feel punitive at low volume.
The incentive. Softer than provider markup, but still aligned with usage growth. The gateway makes more money when your request volume goes up. Intelligence-based routing to cheaper models still reduces their revenue, though less dramatically than with provider markup.
Where it works. When the metered fee is small relative to your provider spend and you value the platform integration. For teams already committed to Vercel or Cloudflare, this fits naturally.
Where it breaks. At very high volumes, where even a small per-request meter adds up. And when the incentive misalignment starts to show - the gateway might ship observability and fallbacks before it ships features that cut your costs.
Model 3: Degressive BYOK markup (HiWay-style)
How it works. You pay providers directly at wholesale (BYOK). You pay the gateway a degressive markup on your monthly API spend: for example, a Free plan to get started, a Scale plan with a 9-12.5% markup that falls with volume (12.5% below $500/mo, 11% between $500 and $5,000/mo, 10% between $5,000 and $20,000/mo), and Enterprise pricing negotiated above that. The same markup structure applies regardless of which model you route to.
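As a rough illustration of how a degressive schedule like this can be evaluated, here is a minimal Python sketch. The tier boundaries and rates are the ones quoted above; everything else is an assumption. The names `SCALE_TIERS`, `scale_markup_rate`, and `gateway_fee` are hypothetical. The rate is assumed to apply to the whole monthly spend rather than marginally per band, and a spend sitting exactly on a boundary is assumed to fall into the higher band. Spend above $20,000/mo returns `None` because Enterprise pricing is negotiated.

```python
from typing import Optional

# (exclusive upper bound of monthly spend in USD, markup rate), as quoted in the text.
SCALE_TIERS = [
    (500.0, 0.125),
    (5_000.0, 0.11),
    (20_000.0, 0.10),
]


def scale_markup_rate(monthly_spend_usd: float) -> Optional[float]:
    """Return the markup rate for a monthly spend, or None above the listed tiers."""
    if monthly_spend_usd < 0:
        raise ValueError("monthly spend cannot be negative")
    for upper_bound, rate in SCALE_TIERS:
        if monthly_spend_usd < upper_bound:
            return rate
    return None  # Enterprise: negotiated, not computable from the schedule


def gateway_fee(monthly_spend_usd: float) -> Optional[float]:
    """Gateway fee, assuming the tier rate applies to the whole monthly spend."""
    rate = scale_markup_rate(monthly_spend_usd)
    return None if rate is None else monthly_spend_usd * rate


if __name__ == "__main__":
    for spend in (300, 4_000, 12_000, 25_000):
        print(spend, scale_markup_rate(spend), gateway_fee(spend))
```

If the real schedule is marginal rather than whole-spend, only `gateway_fee` would need to change; the lookup stays the same.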
What it's optimized for. Cost proportional to value. Your gateway cost evolves with your actual spend, but stays at a known proportion. The more you route intelligently to cheaper models, the lower the absolute cost of the markup.
The incentive. The gateway's revenue is proportional to your token spend, but routing savings benefit you 100% on the provider side. Features that reduce your token spend (routing to cheaper models, caching, burn-rate alerts) lower your total cost - gateway included - so the gateway has an incentive to ship them.
Where it works. Production teams with variable spend wanting a simple, transparent cost structure. EU teams that also want BYOK + EU hosting. Mixed workloads where smart routing (40-85% savings) more than covers the markup.
Where it breaks. For very predictable, stable volumes, a flat subscription might be easier to budget. For hobby projects or minimal traffic, the Free plan covers basic features at no cost.
Model 4: OSS self-hosted (LiteLLM-style)
How it works. You download the software, deploy it yourself, pay the provider directly. The gateway itself has no runtime cost beyond the infrastructure you run it on.
What it's optimized for. Control, cost at scale (if you already have the platform), avoidance of vendor lock-in.
The incentive. None - there's no vendor to be incentivized. The software's roadmap is driven by maintainers and community contributions. Features land when someone writes them.
Where it works. Teams with platform engineers and existing ops infrastructure. Regulated environments where running your own gateway is the only compliant option.
Where it breaks. Small teams without platform engineering capacity. The "free" label hides 4-8 hours/week of ops, on-call, upgrade burden. See our separate post on self-host economics.
The alignment matrix
Here's the same information in matrix form:
| Model | Who pays for tokens | Gateway revenue model | Aligned to help you save? |
|---|---|---|---|
| Provider markup | Gateway (then bills you) | % of token spend | No - revenue grows with spend |
| Metered | You (BYOK) | Per-request meter | Partially - tied to volume |
| Degressive BYOK markup | You (BYOK) | Degressive % markup on API spend | Yes - routing savings reduce both provider and gateway cost |
| OSS self-host | You (BYOK) | None | N/A - no vendor |
The third row is the one that changes things. With a degressive BYOK markup, smart routing savings reduce both your provider bill and your gateway cost simultaneously. That's not magic; it's just incentive alignment.
The break-even reasoning
Let's work through a realistic break-even between two of the managed options - provider markup and degressive BYOK markup - at a representative spend level.
Scenario: 500K requests/month, mixed workload
Assume each request has an average token profile that costs you €0.008 in wholesale provider fees. Total wholesale spend: €4,000/month.
Provider markup model (5% markup)
- Wholesale tokens: €4,000
- Markup: €200
- Total: €4,200/month
Degressive BYOK markup model (HiWay Scale, 11% for $500-$5,000/mo spend)
- Wholesale tokens (BYOK, paid to providers): €4,000
- HiWay markup (11% on €4,000): €440
- Total: €4,440/month
At this raw spend level both models are comparable. The differentiator is what happens when smart routing kicks in.
With smart routing: route 60% to Haiku-tier models
Wholesale bill drops to €2,000/month.
- Provider markup (5%, assuming the bill stays at the pre-routing €4,000): about €200, unchanged
- Degressive BYOK markup (11% on €2,000): €220
Under degressive BYOK markup, the gateway markup drops proportionally with your spend. 100% of provider savings flow to you - and the markup on the reduced base is lower too. Smart routing is volume-independent - it kicks in on every request, whether you run 5K or 5M a month.
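The arithmetic in this scenario can be checked with a short Python sketch. The request volume, per-request cost, and post-routing bill are the scenario's own figures. The helper name `all_in_cost` and the constants are hypothetical, and the gateway rate is left as a parameter so you can plug in whichever tier applies to you.

```python
def all_in_cost(provider_spend: float, gateway_rate: float) -> tuple[float, float]:
    """Return (gateway_fee, total) for a fee charged as a share of provider spend."""
    fee = provider_spend * gateway_rate
    return fee, provider_spend + fee


REQUESTS_PER_MONTH = 500_000
WHOLESALE_COST_PER_REQUEST = 0.008  # EUR, scenario assumption
BASE_SPEND = REQUESTS_PER_MONTH * WHOLESALE_COST_PER_REQUEST
ROUTED_SPEND = 2_000.0  # EUR, the scenario's bill after routing 60% of traffic


def print_comparison() -> None:
    # 5% is the provider-markup rate from the scenario; the others are Scale-tier rates.
    for rate in (0.05, 0.10, 0.11, 0.125):
        fee_before, total_before = all_in_cost(BASE_SPEND, rate)
        fee_after, total_after = all_in_cost(ROUTED_SPEND, rate)
        print(
            f"rate={rate:.3f}  before: fee={fee_before:.2f} total={total_before:.2f}  "
            f"after: fee={fee_after:.2f} total={total_after:.2f}"
        )


if __name__ == "__main__":
    print_comparison()
```

Whatever rate you pass in, halving the provider bill halves a percentage-based fee. The comparison only diverges between models if one of them leaves the provider bill at its pre-routing level.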
What this looks like in practice
Three concrete patterns I see when teams change pricing model:
Pattern 1: Markup team tries BYOK + degressive markup at 10M req/mo. They were running at €8,000/month, including a 4% markup. They move to a degressive BYOK markup gateway on the Enterprise plan. Their gateway fee now tracks their actual provider spend. More importantly, the incentive flip kicks in - they start enabling aggressive smart routing because every euro saved on the provider side also reduces the gateway markup. Six months later their token bill is 35% lower. Total annual savings: around €38,000.
Pattern 2: Self-hosted team of 4 engineers moves to managed. They were "saving money" on a gateway by self-hosting LiteLLM, but spending 6 hours/week of engineering time on it. They move to a BYOK degressive markup gateway (Scale plan). Their engineering capacity for product work goes up by ~25 hours/month. Features ship faster. The markup pays for itself on the first avoided delay - plus smart routing starts trimming the underlying provider bill too.
Pattern 3: Solo founder stays on markup. Total monthly spend is €80. A markup model with instant signup is right for them. The 5% markup is €4. The friction of setting up BYOK and a separate gateway plan isn't worth saving €4. They'll migrate in 18 months if the business grows.
All three decisions are correct for their contexts. The mistake is picking the wrong one for your context, usually because you only looked at features and not at the pricing model underneath.
Watch for these anti-patterns
"We charge a small percentage on token usage" + "We have great cost-reduction features"
These are in tension. If the gateway's revenue grows when you spend more, shipping features that make you spend less is a counter-incentive. Some companies ship them anyway because their brand demands it; most deprioritize them silently.
Tier structures that feel generous at the low end and gouge at the high end
Especially common in metered models. The first 100K requests are free, the next 400K are cheap, and somehow above 1M/month the cost per request triples. Always look at the curve across your projected 12-month usage, not just today's bucket.
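One way to look at the whole curve is to price a projected 12 months of traffic against the tier schedule. The Python sketch below does that for a graduated (per-band) schedule. The prices in `TIERS`, the growth rate, and the function names are all hypothetical; substitute the published schedule of the gateway you are evaluating.

```python
import math

# Hypothetical graduated schedule: (upper bound of band in requests, EUR per request).
TIERS = [
    (100_000, 0.0),       # free band
    (1_000_000, 0.0001),  # cheap band
    (math.inf, 0.0003),   # per-request price triples above 1M/month
]


def metered_fee(requests: int) -> float:
    """Monthly fee when each band of requests is billed at its own price."""
    fee = 0.0
    lower = 0
    for upper, price in TIERS:
        if requests <= lower:
            break
        fee += (min(requests, upper) - lower) * price
        lower = upper
    return fee


def projected_fees(start_requests: int, monthly_growth: float, months: int = 12):
    """Return (month, requests, fee, fee_per_request) for each projected month."""
    rows = []
    volume = float(start_requests)
    for month in range(1, months + 1):
        requests = int(round(volume))
        fee = metered_fee(requests)
        per_request = fee / requests if requests else 0.0
        rows.append((month, requests, fee, per_request))
        volume *= 1 + monthly_growth
    return rows


if __name__ == "__main__":
    for month, requests, fee, per_request in projected_fees(200_000, 0.20):
        print(f"month {month:2d}: {requests:>9,d} req  fee={fee:8.2f}  per-req={per_request:.6f}")
```

The column to watch is the effective fee per request: if it climbs as volume grows, the schedule gets more expensive exactly when you scale.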
Opaque markup
Some gateways don't disclose their markup openly. You pay them, they pay the provider, you never see the wholesale rate. If you can't tell what the markup is, it's probably larger than you'd guess.
"Free forever" tiers that trap your data
A free tier that doesn't let you export your request logs, prompt history, or observability data is a lock-in moat disguised as generosity. Always verify export terms.
How to pick your pricing model
Three questions:
1. How big is your spend and how fast is it growing? Small and flat → markup is fine. Growing quickly → BYOK + degressive markup compounds in your favor (the rate decreases at each tier).
2. Do you want cost-reduction features? If you want smart routing, caching, burn-rate alerts - you want a gateway whose revenue decreases when those features work. That's BYOK + degressive markup, or OSS self-host.
3. Do you have platform engineering capacity? If yes, OSS self-host is on the table. If no, eliminate it and pick among the managed options.
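The three questions can be read as a small filter. The Python sketch below encodes one such reading; the function name, parameter names, and the exact way the answers combine are assumptions, not a rule from any vendor. The metered model is not covered because the three questions don't address it.

```python
def candidate_models(
    spend_small_and_flat: bool,
    wants_cost_reduction: bool,
    has_platform_engineering: bool,
) -> list[str]:
    """Return the pricing models left on the table after the three questions."""
    candidates = []
    # Question 1: small, flat spend with no appetite for cost features -> markup is fine.
    if spend_small_and_flat and not wants_cost_reduction:
        candidates.append("provider markup")
    # Questions 1 and 2: growing spend or a wish for cost-reduction features.
    if not spend_small_and_flat or wants_cost_reduction:
        candidates.append("BYOK + degressive markup")
    # Question 3: self-hosting only stays on the table with platform engineering capacity.
    if has_platform_engineering:
        candidates.append("OSS self-host")
    return candidates


if __name__ == "__main__":
    # A solo founder with small, flat spend and no platform team.
    print(candidate_models(True, False, False))
    # A growing production team that wants smart routing and has a platform team.
    print(candidate_models(False, True, True))
```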
Answer those three, and you'll have your answer in 30 seconds.
The takeaway
LLM gateways don't just differ on features. They differ on the shape of the incentives baked into their pricing. Over 12-24 months of usage, that incentive structure matters more than any specific feature, because it determines which features keep improving and which quietly stall.
Pick the pricing model first. The right features tend to follow.