BYOK LLM Gateway — what it is and why it matters
BYOK stands for Bring Your Own Keys. A BYOK LLM gateway is a routing layer to which you hand the provider API keys you already own — your OpenAI key, your Anthropic key, your Google key — and which calls those providers on your behalf. The providers bill the inference directly to your account, at their wholesale rate. The gateway charges you a flat fee for the routing, observability, and safety layer on top.
This sounds like a small architectural detail. It is not. It is the biggest shift happening in LLM infrastructure right now, because it realigns the incentives between you and the middle layer.
BYOK vs reseller — the concrete difference
Most LLM gateways you hear about in 2026 are one of two things, even if the marketing blurs the line.
A reseller gateway holds its own accounts with OpenAI, Anthropic, Google and friends. You top up a balance with the gateway, and every request you make, the gateway charges your balance at its own rate. That rate sits on top of the provider's wholesale rate — typically a single-digit percentage markup on every token. No subscription, pure pay-as-you-go. OpenRouter is the canonical example.
A BYOK gateway does not hold accounts with the upstream providers. You hold those accounts. You give the gateway your API keys (encrypted at rest), and the gateway routes your requests through your keys. Inference shows up on your OpenAI/Anthropic/Google invoice, at the price you already negotiated. The gateway makes money from a flat monthly fee, not from a percentage of your token spend.
Mechanically both products let you send a request and get a response. Financially they are opposite animals.
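The mechanic is worth seeing concretely. Here is a minimal, hypothetical sketch of the BYOK flow — the tenant names, key store, and endpoints are invented for illustration, not any real gateway's API — showing why the bill lands on your provider invoice: the gateway forwards each request with *your* stored key, never its own.

```python
# Minimal sketch of the BYOK routing mechanic (all names hypothetical).
# The gateway never resells tokens: it forwards each request upstream with
# the customer's own provider key, so usage bills to the customer's account.

PROVIDER_ENDPOINTS = {
    "openai": "https://api.openai.com/v1/chat/completions",
    "anthropic": "https://api.anthropic.com/v1/messages",
}

# Stand-in for an encrypted-at-rest key store, keyed by (tenant, provider).
KEY_STORE = {
    ("acme-corp", "openai"): "sk-acme-own-key",
}

def route_byok(tenant: str, provider: str, payload: dict) -> dict:
    """Build the upstream request using the tenant's own key."""
    key = KEY_STORE[(tenant, provider)]  # a real system decrypts here
    return {
        "url": PROVIDER_ENDPOINTS[provider],
        # The tenant's key, not the gateway's — this is the whole trick.
        "headers": {"Authorization": f"Bearer {key}"},
        "json": payload,
    }

req = route_byok("acme-corp", "openai", {"model": "gpt-4o", "messages": []})
assert "acme-own-key" in req["headers"]["Authorization"]
```

A reseller gateway would substitute its own key at that `Authorization` line and re-bill you at a markup; everything else about the request looks identical.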
| | Reseller gateway | BYOK gateway |
|---|---|---|
| Who holds provider accounts | The gateway | You |
| Who appears on provider invoice | The gateway | You |
| Pricing model | Per-token markup (~5%) | Flat subscription |
| Your cost when you scale | Grows linearly with usage | Flat (up to a tier) |
| Incentive of the gateway | More of your tokens = more revenue | More of your tokens = same revenue |
| Who absorbs provider price changes | The gateway (good) | You (transparent) |
| Who gets the enterprise pricing you negotiated | The gateway | You |
That last row is the one enterprise teams notice first. If your company has a committed spend agreement with Anthropic that gives you 15% off list, a reseller cannot use that pricing — they route through their own account at their own negotiated rate. A BYOK gateway routes through yours and preserves the discount.
Why the incentive alignment matters
Here is the uncomfortable truth about reseller gateways: they make money when you spend more. That means every feature that reduces your spend — smart routing to cheaper models, aggressive caching, budget caps, auto-downgrade rules — is a feature that eats their own revenue.
Some reseller products still ship those features because they are good citizens and they know buyers want them. But there is a gravity pulling them the other way. When an engineering team inside the reseller chooses between "build feature A that helps the customer save 10%" or "build feature B that gives us 10% more throughput", A wins half the time, not every time.
A BYOK gateway has the opposite gravity. Every dollar of inference it helps you save is a dollar of value it can point to on renewal. Cost controls, smart routing, prompt caching, burn-rate alerts — these are the product, not features in tension with the product. That is why BYOK gateways tend to ship them first and ship them harder.
This is not a moral argument. It is a structural one. Two years of watching the managed-LLM SaaS wave has made it clear: the products that charge a percentage of inference are systematically slower to ship the cost-reduction features customers keep asking for. That is not because the people building them are bad. It is because they are rational, and they are shipping the features that make money.
When BYOK is the right fit
BYOK is not universally better. It has real trade-offs.
BYOK wins when:
- You already have accounts with the main providers and want to pay them directly at wholesale.
- You want smart routing to trim the inference bill (40-85% typical on a mixed workload) — that savings is volume-independent and stacks on top of 0% markup.
- You have negotiated enterprise rates you want to preserve.
- You care about the incentive alignment — you want the routing layer to help you spend less, not more.
- Compliance / procurement wants a clear separation between the infrastructure vendor and the model vendor.
Reseller wins when:
- You are prototyping and do not want to sign up with five different providers to try five different models.
- You want pure pay-as-you-go with zero fixed cost and your volume is low enough that the Free plan on a BYOK gateway (2,500 req/mo on HiWay) wouldn't cover it.
- You want access to niche or community-hosted models (Together, Fireworks, DeepInfra, open-source finetunes) that you cannot reach with direct provider accounts.
- Setup speed matters more than anything else.
The honest exception: if your inference spend is near zero, a $0 subscription (reseller with a tiny absolute markup) is mechanically cheaper than a $15 BYOK subscription. The HiWay Free plan (2,500 req/mo) covers that case without forcing you off BYOK. Beyond that edge, BYOK + smart routing wins regardless of volume — the inference savings dwarf the subscription within hours of real use.
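The break-even arithmetic is easy to check yourself. A quick sketch, using an assumed 5% reseller markup and a $15/mo flat fee (illustrative numbers, not quotes from any product):

```python
# Illustrative break-even: ~5% reseller markup vs a flat $15/mo BYOK fee.
# Both numbers are assumptions for the sketch, not real pricing.

MARKUP = 0.05        # reseller per-token markup
SUBSCRIPTION = 15.0  # flat BYOK fee, $/mo

def reseller_cost(spend: float) -> float:
    # wholesale spend plus the reseller's margin on every token
    return spend * (1 + MARKUP)

def byok_cost(spend: float, routing_savings: float = 0.0) -> float:
    # smart routing trims the wholesale bill itself, then add the flat fee
    return spend * (1 - routing_savings) + SUBSCRIPTION

# Without routing savings, break-even sits at $15 / 0.05 = $300/mo of spend.
print(SUBSCRIPTION / MARKUP)

# With 40% routing savings (the low end of the range above), BYOK wins
# even at modest volume: at $50/mo, 52.50 vs 45.00.
print(reseller_cost(50), byok_cost(50, routing_savings=0.40))
```

The point of the sketch: the subscription-vs-markup comparison only matters when routing savings are zero; once routing cuts the wholesale bill at all, the crossover moves to very low volumes.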
Which gateways offer BYOK in 2026
According to each product's public documentation as of 2026-04-22, the BYOK landscape looks like this:
- HiWay2LLM — BYOK-native. Flat subscription, 0% inference markup, smart routing across providers, EU-hosted.
- Portkey — BYOK-first. Flat subscription. Strong observability focus, with a model router on top.
- LiteLLM — BYOK in both the OSS library and the Cloud product. You self-host OSS with your own keys, or run Cloud and bring your keys.
- Vercel AI Gateway — offers BYOK mode. Sits on Vercel's edge. Most natural for teams already on Vercel.
- Cloudflare AI Gateway — BYOK by default; Cloudflare does not resell tokens, it routes your requests through your keys with caching and observability on top.
- Requesty — BYOK-first routing gateway with a focus on cost optimization.
- OpenRouter — NOT BYOK. Reseller model with per-token markup; no BYOK mode at time of writing.
- Helicone — primarily an observability product. BYOK for the proxying; pricing is on logged requests, not inference markup.
The pattern: the gateways built after 2024 are mostly BYOK. The ones that were built when LLMs were still a novelty — when "giving you access" was the actual product — are mostly reseller. The market is moving in the BYOK direction, and the reseller model is increasingly defended as "convenience for prototyping" rather than "the right way to run production".
BYOK vs reseller — side by side
| Feature | HiWay2LLM | Reseller gateways |
|---|---|---|
| You hold the provider accounts | Yes | No |
| Flat pricing, 0% inference markup | Yes | No — reseller margins sit on top of every token |
| Your enterprise rate is preserved | Yes | No |
| Incentive to help you spend less | Yes | No — reseller revenue grows with your spend |
| Provider price drops hit you immediately | Yes | No — resellers can pocket the delta |
| Time to first call | ~5 min | ~2 min — reseller wins on setup speed |
| Access to community/niche providers | Partial | Yes — resellers often aggregate broader catalogs |
| Predictable monthly cost | Yes | No |
What a good BYOK gateway actually ships
A passthrough proxy with a billing page is not a gateway. A serious BYOK gateway should give you at least five things, because the incentive to ship them is now aligned with yours:
- Smart routing across models. Read the request in under a millisecond, score its difficulty, pick the cheapest model capable of answering. Not every call needs the top tier.
- Prompt caching. Anthropic and OpenAI both expose caching APIs. The gateway should use them automatically and report the hit rate.
- Burn-rate alerts. If an agent starts spending $500/hour at 3am, you want to know before morning, not at the next invoice.
- Per-workspace audit log. Who called what model, when, with which key, for how many tokens, at what cost. Exportable.
- Automatic fallback. When one provider is down or rate-limited, route to the next-best model. Configurable by workspace, not hardcoded.
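To make the first and last items concrete, here is a toy sketch of smart routing with fallback — a difficulty score, a cheapest-capable-first candidate list, and a retry loop. Model names, capability tiers, and prices are all made up for illustration; real routers score requests far more carefully.

```python
# Hypothetical smart-router sketch. Score a request's difficulty, pick the
# cheapest model whose capability covers it, and fall back down the list
# when a provider errors or rate-limits. All tiers/prices are invented.

MODELS = [  # (name, capability score 0-1, $ per 1M output tokens)
    ("small-fast", 0.3, 0.60),
    ("mid-tier",   0.6, 3.00),
    ("frontier",   0.9, 15.00),
]

def difficulty(prompt: str) -> float:
    """Toy heuristic: longer and code-heavy prompts score as harder."""
    score = min(len(prompt) / 4000, 1.0)
    if "```" in prompt or "stack trace" in prompt.lower():
        score = max(score, 0.7)
    return score

def pick_candidates(prompt: str) -> list:
    """Cheapest capable model first; pricier ones become fallbacks."""
    need = difficulty(prompt)
    capable = [m for m in MODELS if m[1] >= need]
    # if nothing qualifies, fall back to the top-tier model
    return sorted(capable or MODELS[-1:], key=lambda m: m[2])

def call_with_fallback(prompt: str, send):
    """Try candidates in order; skip providers that are down/rate-limited."""
    for name, _, _ in pick_candidates(prompt):
        try:
            return send(name, prompt)  # a real gateway calls the provider here
        except RuntimeError:
            continue
    raise RuntimeError("all candidates failed")
```

An easy prompt routes to `small-fast` first; a prompt containing a code fence routes straight to `frontier`; and if the first choice raises, the loop quietly tries the next-cheapest model instead of failing the request.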
If a BYOK gateway does not ship these, it is a thin proxy with a marketing page, not infrastructure. The test is simple: can you name five decisions the gateway makes for you that you would otherwise have to build yourself? If the answer is fewer than three, you are overpaying for a wrapper.
HiWay in this landscape
HiWay is a BYOK gateway with smart routing as its core bet. The five items above are the product. Pricing is flat: Free at 2,500 req/mo, Build at $15/mo for 100K, Scale at $39/mo for 500K, Business at $249/mo for 5M. No percentage markup on inference — ever. EU-hosted on OVH, zero prompt logging by default, DPA on every plan.
The positioning is simple: if you want the routing layer's incentives aligned with your bill going down — smart routing, 0% markup, BYOK, EU hosting — HiWay is built for that. If you just want one key to prototype with 100+ models in two minutes and you do not care about those levers, a reseller model is likely the right call. The two are not in direct competition; they solve different problems.
Bottom line
BYOK is not a feature — it is a category. It separates buying intelligence from operating intelligence. It realigns the infrastructure layer's incentives with yours. And it future-proofs your stack against a market where token prices keep falling and the margin in the middle is getting harder to defend.
If you are on a reseller gateway today, the math almost always tilts toward BYOK + smart routing — the savings are volume-independent, not gated on a spend threshold. If you are starting out, BYOK is still worth understanding now, so you know what you are building on.