HiWay2LLM vs Cloudflare AI Gateway
Head-to-head comparison of HiWay2LLM and Cloudflare AI Gateway. Why an edge gateway and a model router do different jobs, how their pricing and hosting compare, and when you might even use both.
Cloudflare AI Gateway is an edge CDN-style layer for LLM calls: caching, rate-limiting, analytics, very cheap at scale. HiWay is a router that picks the cheapest capable model per request with BYOK and 0% markup. Different layers. Cloudflare wins if your traffic is cacheable and you want edge latency. HiWay wins if your bill is driven by overpaying on every unique request. Stacking both is a legitimate setup.
Cloudflare AI Gateway and HiWay2LLM both claim the word "gateway", both sit between your app and the upstream LLM, and both are OpenAI-compatible on the wire. If you are skimming landing pages, they look substitutable. They are not. They operate at different layers and they optimize for different things.
Cloudflare's product is an edge gateway: a caching, rate-limiting and analytics layer deployed on Cloudflare's global network. You proxy your LLM calls through a URL like https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/openai, and you get caching, retries, rate limits and analytics for basically free at hobby volume. It is the AI equivalent of putting Cloudflare in front of a website.
HiWay's product is a model router: it reads each request, scores complexity in under 1ms, and picks the cheapest capable model — with BYOK and 0% markup on inference. It is not a CDN. It is not trying to cache at the edge. Its job is to make sure the right model answers each request.
These are not competing products in the strict sense. They operate at different layers of the stack. But since most teams only have budget and integration bandwidth for one middleware, the practical comparison matters.
Quick decision
- A lot of your LLM traffic is repeat questions (support bots, doc Q&A with hot spots, classification loops)? Cloudflare's cache hits are nearly free and will drop your bill hard on the cacheable slice.
- Your traffic is mostly unique requests (agents, custom prompts, per-user context)? Caching does little; you need a router that picks cheaper models per request. HiWay.
- You are already all-in on Cloudflare Workers and want everything at the edge? Cloudflare AI Gateway is the native choice; it sits next to your Workers.
- You want EU-hosted middleware with a signed DPA and 0% inference markup? HiWay is EU-hosted on OVH, with BYOK and per-provider wholesale billing.
- You want both edge caching AND complexity-based routing? Stack them. Cloudflare in front for caching + rate-limiting, HiWay for routing. The migration examples below show the shape.
Pricing
Cloudflare AI Gateway is famously cheap. At hobby/low-volume tiers it is free, and paid tiers scale with advanced features and volume (check the Cloudflare Workers and AI Gateway pricing pages as of 2026-04-22 for current details). You still pay the upstream LLM provider for inference — Cloudflare is a proxy, not a reseller.
HiWay charges a flat monthly fee for the routing layer. Inference is billed by the provider directly on your card at wholesale (BYOK, 0% markup on tokens):
| Plan | Price | Routed requests / mo |
|---|---|---|
| Free | $0 | 2,500 |
| Build | $15/mo | 100,000 |
| Scale | $39/mo | 500,000 |
| Business | $249/mo | 5,000,000 |
| Enterprise | on request | custom quotas, SSO, DPA |
Smart routing also auto-downgrades simple requests to cheaper models (40-85% savings on a typical mix), so on real traffic the savings typically recoup the $15/mo Build subscription within hours.
These are not directly comparable because they do different jobs. Cloudflare's ultra-cheap pricing is possible partly because caching and analytics are commodified at scale for them. HiWay's pricing is in line with dedicated LLM middleware because the routing intelligence is the product. If pure-edge caching is what you need, Cloudflare's price floor is unbeatable. If you need a router that lowers per-request cost, HiWay is priced for that job.
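To make the flat-fee-vs-savings trade concrete, here is a small break-even sketch. The per-request cost and savings rate below are illustrative assumptions, not quoted provider prices; the 40% figure is just the low end of the savings range cited above.

```python
import math

# Illustrative assumptions -- not quoted rates.
PREMIUM_COST_PER_REQ = 0.01  # assumed cost of one premium-model call, $
SAVINGS_RATE = 0.40          # low end of the 40-85% routing savings range
BUILD_FEE = 15.00            # Build plan flat fee, $/mo

def breakeven_requests(fee: float, cost_per_req: float, savings: float) -> int:
    """Requests per month at which routing savings cover the flat fee."""
    saved_per_req = cost_per_req * savings
    return math.ceil(fee / saved_per_req)

print(breakeven_requests(BUILD_FEE, PREMIUM_COST_PER_REQ, SAVINGS_RATE))
# 15 / (0.01 * 0.40) = 3,750 routed requests
```

Under these assumptions the fee pays for itself after a few thousand routed requests per month; with higher savings rates or pricier base models, the break-even point drops further.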
Feature-by-feature
| Feature | HiWay2LLM | Cloudflare AI Gateway |
|---|---|---|
| Bring your own keys (BYOK) | native (stores provider keys centrally and fans out) | native (proxies your provider key) |
| Smart routing by request complexity | native | not offered (forwards to the model you specify; does not score prompts) |
| Edge caching (CDN-style) | not offered | native (core strength) |
| Analytics dashboards | native (deeper per-workspace) | native (cleaner at the edge) |
| Rate limiting | native | native |
| OpenAI-compatible API | native | native |
| Automatic fallback across providers | native | partial |
| Cost-based model auto-selection | native | not offered |
| EU hosting (GDPR) | native | partial (global edge; check residency controls for your plan) |
| Zero prompt logging by default | native | partial (analytics can capture prompts; configurable) |
| Pricing model | flat €/mo per request tier, 0% inference markup | very cheap at hobby tier, scales with volume + features |
| Primary job | cost optimization via routing | edge caching + analytics |

native · partial or plugin · not offered
When to pick which
Pick HiWay2LLM if
- Your traffic is mostly unique requests where caching does not help and the bill comes from overpaying on every call
- You want a router that picks the cheapest capable model per request instead of forwarding whatever your code asked for
- You want BYOK with zero inference markup and flat per-request pricing
- You are in the EU or serve EU customers and need GDPR-aligned hosting + a signed DPA
- Zero prompt logging by default is a compliance requirement
- You want burn-rate alerts and hard budget caps on inference spend
Pick Cloudflare AI Gateway if
- A lot of your LLM traffic is repetitive and cacheable — support bots, classification loops, FAQ answers
- You want the cheapest possible middleware bill and you accept that you still overpay on the model choice itself
- You are already deep in the Cloudflare ecosystem (Workers, Pages, KV, D1) and want AI traffic at the same edge
- You need global edge latency — low TTFB for users everywhere in the world
- Your volume at the hobby tier is small enough that Cloudflare's free allocation covers you fully
- Your pain is 'protect origin' and 'rate limit abuse', not 'pick a cheaper model'
Migration — what actually changes in your code
If you are on Cloudflare AI Gateway, your base URL is the gateway proxy URL pattern (https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway>/openai). Switching to HiWay is a single base URL swap plus an API key change — the rest of your client code does not move.
Before, on Cloudflare AI Gateway:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/openai",
    api_key="sk-openai-...",
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

After, on HiWay:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://app.hiway2llm.com/v1",
    api_key="hw_live_...",
)
response = client.chat.completions.create(
    model="auto",  # let the router pick
    messages=[{"role": "user", "content": "Hello"}],
)
```

Two extra steps before the switch: add your provider keys once in the HiWay dashboard (Settings → Providers), and keep model: "auto" if you want the router to pick, or pin a specific model if you want to force it.
Edge gateway vs model router — two different jobs
The clearest way to see why these are not the same product: imagine your LLM traffic today, and look at where the money is being burned.
If money is burned on repeated identical requests, an edge gateway saves it. Cloudflare caches the response for identical prompts at the edge, serves the cached reply in tens of milliseconds, and you do not pay the upstream LLM at all on a cache hit. This is the classic CDN shape, applied to AI. It is extraordinarily cheap because it is commodity infrastructure running at Cloudflare's scale.
If money is burned on unique requests hitting the wrong (too-expensive) model, an edge gateway does nothing. Every prompt is different, nothing caches, and you still pay GPT-4 prices to answer "what is 2+2?" because that is what your code asked for. Here what saves money is routing: reading the prompt, scoring its complexity, sending the easy ones to Haiku-class models and keeping the big ones for jobs that need them. That is what HiWay does.
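To show the shape of that decision, here is a toy complexity router. This is not HiWay's actual scoring logic (which is proprietary); the keyword heuristic, thresholds, and model-tier names are all illustrative stand-ins.

```python
# Toy illustration of complexity-based routing. The scorer, thresholds,
# and tier names below are invented for the example, not HiWay's logic.
def score_complexity(prompt: str) -> float:
    """Crude proxy: longer prompts and reasoning keywords score higher."""
    keywords = ("prove", "analyze", "step by step", "refactor")
    score = min(len(prompt) / 2000, 1.0)
    score += 0.5 * sum(kw in prompt.lower() for kw in keywords)
    return min(score, 1.0)

def pick_model(prompt: str) -> str:
    s = score_complexity(prompt)
    if s < 0.2:
        return "haiku-class"   # cheap model for trivial prompts
    if s < 0.6:
        return "mid-tier"
    return "frontier"          # expensive model only when the job needs it

print(pick_model("what is 2+2?"))  # haiku-class
```

Even this crude version captures the economics: the trivial arithmetic question never reaches a frontier-priced model, which is exactly the overpayment an edge cache cannot fix.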
Two layers, two jobs: not competing, but complementary in theory. The reason teams often pick one is not that the other is bad; it is that each extra middleware layer is another hop in the critical path, another thing to operate, another thing that can fail. If one hop gets you 80% of the savings, you stop there.
Some teams do stack them: Cloudflare as the outer edge for caching, rate limiting and DDoS protection, then HiWay as the router inside. The flow is app → Cloudflare AI Gateway → HiWay → upstream provider. Cache hits never touch HiWay or the provider. Cache misses go through HiWay, get scored, get routed to the cheapest capable model, and pay wholesale. It is a legitimate architecture if the caching layer earns its keep on your traffic pattern.
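The control flow of that stacked setup can be sketched in-process. In production the outer cache is Cloudflare's edge and the inner call is HiWay plus the upstream provider; this minimal version with a local dict only shows how a cache hit short-circuits the router entirely.

```python
import hashlib

# In production: Cloudflare's edge cache. Here: a local dict, for shape only.
cache: dict[str, str] = {}

def route_and_call(prompt: str) -> str:
    """Stand-in for HiWay routing plus the upstream provider call."""
    return f"answer({prompt})"

def gateway(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:
        return cache[key]            # cache hit: no router, no provider cost
    answer = route_and_call(prompt)  # cache miss: score, route, pay wholesale
    cache[key] = answer
    return answer
```

The key property is visible in the code: repeat prompts are answered from the cache without ever reaching the router or the provider, while unique prompts always fall through to routing.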
Data & compliance
Cloudflare AI Gateway is deployed on Cloudflare's global edge. That is the point — low latency everywhere. Analytics can capture prompts and responses depending on configuration; check Cloudflare's current docs for residency options and data retention on your plan. If strict EU residency is a hard requirement, validate the plan before committing.
HiWay is operated from France by Mytm-Group, hosted on OVH servers in the EU. Zero prompt logging is the default — prompts transit in-memory and are never persisted. We sign a DPA on request (even on the free plan) and publish our sub-processors. If you need request logs for your own debugging, it is opt-in per workspace.
For teams whose compliance posture requires EU residency without additional configuration work, HiWay's default fits. For teams already on Cloudflare's infrastructure with a compliance stance that accepts Cloudflare's data handling, Cloudflare AI Gateway is a zero-new-vendor option.
Bottom line
Cloudflare AI Gateway and HiWay are both legitimate, but they are not substitutes. Cloudflare is an edge caching and analytics layer — fantastic when your traffic is cacheable or when you want everything at the edge for almost free. HiWay is a model router that picks cheaper capable models per request — fantastic when your bill is driven by overpaying on unique calls.
If your traffic is repetitive, cache it at the edge. If your bill is driven by per-request overpayment, route it smarter. If it is both, stack them.
When the bill is the number to move, plug your current spend into the savings calculator and see what complexity-based routing does to it.
BYOK, EU-hosted, no credit card