Why we built HiWay: an EU-based BYOK alternative
The three problems that pushed us from 'we'll make do' to 'we'll build it ourselves'
We tried LiteLLM self-hosted, OpenRouter, and direct APIs. None of them worked for an EU team that wanted BYOK, transparent billing, and real cost controls. So we built HiWay.
I'm Johan Bretonneau, co-founder of Mytm-Group with Antoine Brodiez. We build AI apps and tools for European companies. HiWay2LLM is our LLM gateway, and this is why we ended up shipping it instead of using something that already existed.
It's not a marketing story. It's the honest sequence of "we tried X, here's what broke, we tried Y, here's what broke, OK, we'll build it."
The starting point: $200 at 3 AM
In early 2026 we were running a handful of AI products — some internal, some for clients. Our monthly LLM spend was growing fast. Not startup-scale growing — we weren't burning VC money — but growing enough to notice.
Then one Tuesday morning I opened the Anthropic console and saw a $200 charge from 3 AM. An agent had entered a retry loop against Claude Opus. No human was awake. No alert fired. The bill just kept climbing for four hours until the rate limiter finally caught it.
That's the moment we stopped shopping for "a good LLM gateway" and started seriously evaluating every option on the table, because our existing stack had failed in a specific, expensive way.
What we tried: LiteLLM self-hosted
LiteLLM is the obvious OSS answer. We deployed it on our VPS, pointed our apps at it, and it worked. For a few weeks.
Here's what broke for us:
Ops burden. Every provider API change meant a config update, a test cycle, a redeploy. Provider quota upgrades weren't centralized. When a client's key needed rotating, we had to do it by hand.
On-call. LiteLLM going down meant our AI features went down. No one pays you to fix your own gateway at 2 AM.
Feature gaps. LiteLLM had routing and fallbacks. It didn't have the things we actually needed — burn-rate alerts, agentic loop detection, per-endpoint budgets that auto-downgrade instead of just blocking. We started patching.
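To make "we started patching" concrete, here's a simplified sketch of the shape those patches took: a sliding-window burn-rate check that downgrades the model instead of rejecting the request. It's illustrative, not our production code; the names, thresholds, and model IDs are made up.

```python
import time
from collections import deque

class BurnRateGuard:
    """Sliding-window spend tracker: downgrade instead of block
    when the window's spend crosses a threshold."""

    def __init__(self, window_s: float = 300.0, limit_usd: float = 5.0):
        self.window_s = window_s
        self.limit_usd = limit_usd
        self._events: deque = deque()  # (timestamp, cost_usd)

    def record(self, cost_usd: float) -> None:
        self._events.append((time.monotonic(), cost_usd))

    def burn(self) -> float:
        # Drop events that fell out of the window, then sum what's left.
        cutoff = time.monotonic() - self.window_s
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        return sum(cost for _, cost in self._events)

    def should_downgrade(self) -> bool:
        return self.burn() >= self.limit_usd

# Hypothetical tier map: route expensive models to cheap ones under pressure.
DOWNGRADE = {"big-expensive-model": "small-cheap-model"}

def pick_model(requested: str, guard: BurnRateGuard) -> str:
    if guard.should_downgrade():
        return DOWNGRADE.get(requested, requested)
    return requested
```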
The patches piled up. Every week we spent 4-6 hours on gateway work instead of shipping product. We calculated the fully-loaded cost of self-hosting: at our scale, it was worse than paying a managed service.
LiteLLM is a great piece of software. It's the wrong piece of software for a team that wants to ship product and doesn't have a dedicated platform engineer.
What we tried: OpenRouter
We gave OpenRouter a serious look. Fast signup, broad model catalog, genuinely impressive breadth. We piloted it on one product.
Three problems surfaced:
The markup on growing bills. OpenRouter's markup on inference is small in percentage terms, but our spend trajectory was vertical. At $500/month it was noise. At a projected $5K-10K/month, even a few percent is hundreds of dollars a month, every month, forever. It became a line item we'd have to justify. Our clients would see it. We'd have to explain it.
US-hosted, no EU option. Most of our clients are European. Several are in regulated sectors. "We route your user prompts through a US-based third party whose sub-processor list is X and whose DPA reads Y" was a conversation I did not want to have repeatedly. The EU AI Act was coming into force. Schrems II was still a live issue. US hosting became a non-starter.
No BYOK. OpenRouter holds the provider relationship: it pays the providers, and our clients' usage showed up under our OpenRouter account, not under their own Anthropic accounts. Quota increases went through OpenRouter. Key rotation went through OpenRouter. Everything went through OpenRouter. The moment we wanted to leave, we'd have to touch every integration.
None of this is a dig at OpenRouter. They're good at what they do. They're just not fit for the specific shape of problem we had: EU team, serving EU clients, with growing spend and a hard BYOK preference.
What we tried: direct APIs with a homegrown shim
For a few weeks we considered just calling Anthropic and OpenAI directly and building a minimal in-house routing/budget layer.
This is the classic "just a weekend project" trap. You sketch it out, it looks easy, and then:
- You need retry logic with backoff.
- You need per-endpoint rate limiting.
- You need a budget cap that downgrades instead of erroring.
- You need burn-rate detection.
- You need per-request cost tracking.
- You need streaming.
- You need tool use.
- You need structured outputs.
- You need to keep up with every provider's API changes.
What starts as "200 lines of glue code" becomes a six-month platform project. We had product to ship. This path was a trap.
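For a sense of scale, here's a minimal sketch of just the first bullet, retry with backoff, under the usual assumptions (jittered exponential delays, a retryable status set). It's the easiest item on the list, and it's already not trivial:

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 529}  # typical transient statuses

def call_with_backoff(send, max_attempts: int = 5,
                      base_delay: float = 1.0, cap: float = 30.0):
    """Retry `send` (a zero-arg callable returning (status, body))
    with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        # Sleep a random amount up to the exponential ceiling.
        time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
    return status, body  # give up after max_attempts

```

And that still ignores Retry-After headers, idempotency on retried writes, retries mid-stream, and per-provider status quirks. Every other bullet on the list has the same hidden depth.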
What we decided to build
After those three dead ends, we wrote down what we actually wanted. The constraints were specific:
- BYOK. Our clients' provider keys, their accounts, their billing. No markup on inference.
- EU-hosted. On European infrastructure, with a DPA we could hand to compliance, sub-processors we could name.
- Cost controls worth the name. Burn-rate alerts, per-endpoint budgets, automatic downgrade at thresholds, the stuff no provider offers because it would hurt their revenue.
- OpenAI-compatible. Zero SDK rewrites for migration in or out.
- Flat pricing. Our margin shouldn't compound against our clients as they grow.
HiWay2LLM is what those constraints produced:
- BYOK on Anthropic, OpenAI, Google, Mistral, Groq, DeepSeek, xAI, Cerebras — 60+ models across them all.
- Hosted on OVH, French infrastructure. DPA available. Zero prompt logging by default.
- Smart routing that reads every request in under 1ms and picks the optimal tier. Burn-rate alerts via Slack. Per-endpoint budgets. Automatic downgrade logic.
- OpenAI-compatible endpoints. Drop-in swap for any OpenAI-SDK app (see the snippet below).
- Flat per-request pricing: Free at 2,500 req/mo, Build at $15/mo for 100K, Scale at $39/mo for 500K, Business at $249/mo for 5M, enterprise custom.
No markup on inference, ever. If you route 100M tokens through us, you pay your providers for 100M tokens and you pay us nothing extra. Our revenue is the flat subscription. That's it.
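What the swap looks like in practice, as a minimal sketch. The base URL is a placeholder, not the real endpoint, and the model ID is illustrative; everything else is the standard OpenAI SDK:

```python
from openai import OpenAI

# Same SDK, same call sites; only the base URL and key change.
# "api.hiway.example" is a placeholder; use the endpoint from your dashboard.
client = OpenAI(
    base_url="https://api.hiway.example/v1",
    api_key="YOUR_HIWAY_KEY",  # gateway key; provider keys stay in your own accounts
)

resp = client.chat.completions.create(
    model="claude-sonnet-4",  # any model from the catalog
    messages=[{"role": "user", "content": "Hello through the gateway"}],
)
print(resp.choices[0].message.content)
```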
Why flat per-request pricing specifically
This one is worth unpacking because it's the most opinionated choice we made.
Per-token markup aligns the gateway's interests with provider upsell. More tokens, more revenue. The gateway makes more money when your bill goes up, which means the features that would bring your bill down — smart routing to cheaper models, aggressive caching, burn-rate alerts — are features the gateway has no incentive to ship.
Metered pricing on top of provider costs (Vercel AI Gateway-style) is a softer version of the same problem. The gateway's revenue still grows with your usage.
Per-request flat pricing breaks that. We charge the same for a request that routes to Haiku and a request that routes to Opus. Which means our best-aligned move is to route your traffic to the cheapest model that still does the job. When we ship smart routing improvements, you save money and we don't lose any.
The math works because our costs are per-request, not per-token. Running the gateway, the observability stack, the burn-rate engine: none of that costs more when a request hits Opus instead of Haiku. Those costs scale with request count. So we charge on request count.
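To put numbers on the incentive gap, here's a back-of-envelope comparison. The 5% markup and the $39 tier are illustrative, not anyone's actual rate card:

```python
# Illustrative only: per-token markup vs. a flat tier as spend grows.
MARKUP_RATE = 0.05   # hypothetical percentage markup
FLAT_TIER_USD = 39   # e.g. a 500K-requests/month tier

for spend in (500, 2_000, 5_000, 10_000):
    print(f"${spend:>6}/mo provider spend -> markup ${spend * MARKUP_RATE:>5.0f}, flat ${FLAT_TIER_USD}")

# $   500/mo provider spend -> markup $   25, flat $39
# $  2000/mo provider spend -> markup $  100, flat $39
# $  5000/mo provider spend -> markup $  250, flat $39
# $ 10000/mo provider spend -> markup $  500, flat $39
```

One column grows with your bill; the other doesn't. That's the whole argument in two lines of arithmetic.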
What being an EU company actually means for us
Being an EU company isn't just a marketing angle. It means:
- Our infrastructure runs on OVH, a French cloud provider.
- Our DPA is built against the GDPR, not the EU-US Data Privacy Framework.
- Our sub-processor list is short, and European wherever we can make it.
- We don't log prompt bodies by default. If you want prompt logging for debugging, you opt in, per API key, and the retention is bounded.
- We follow EU AI Act guidance on high-risk systems, even when our clients' use isn't classified as high-risk, because the definitions are drifting.
For a French SME selling to French hospitals, German banks, Belgian insurers — this is the difference between "we can deploy this" and "we can't, legal won't sign off."
Some of our US-based competitors have EU regions. Many don't. None of them have us beat on "we are actually an EU entity, under EU law, with EU-based people answering your data subject requests." That's not a small thing.
What we learned along the way
Three lessons that might be useful if you're thinking about similar infra work.
1. The "I'll just build it myself" math is almost always wrong at first estimate. We estimated 2 weeks for the MVP. The honest count including ops tooling, burn-rate engine, observability, and onboarding was closer to 4 months of real engineering. We're glad we did it, but it was not a weekend project.
2. Flat pricing is a discipline, not just a label. It forces you to be efficient because you can't hide costs in the margin. When Anthropic releases a new model, we add support the same week. Not because we love the work, but because our revenue doesn't depend on you staying on older, more expensive models.
3. Your customer's incentives teach you what to build. We shipped burn-rate alerts because we'd been bitten by $200 at 3 AM. We shipped per-request pricing because our clients kept asking "can I model this cost predictably for next year's budget?" We shipped EU hosting because every enterprise sales call hit the same compliance wall. The roadmap basically wrote itself from customer friction.
Who HiWay is for
Not everyone. If your spend is $20/month and your only constraint is "does it work," use Anthropic directly. If you're US-only and love Vercel, use the Vercel AI Gateway. If your team has a dedicated platform engineer and you want maximum control, self-host LiteLLM.
HiWay is for teams that:
- Run LLM calls in production with growing spend.
- Care about EU hosting for compliance or preference reasons.
- Want BYOK — keys in their accounts, billing directly from providers.
- Want cost controls that actually fire before the budget is gone.
- Want to spend their weeks shipping product, not gateway ops.
If that sounds like you, we'd like to earn your traffic. If not, one of the ten alternatives in our other post is probably a better fit, and we genuinely hope you find the right one.
Next: LiteLLM vs managed gateways — the honest trade-off.