LiteLLM vs managed gateways: when self-hosting actually costs more
The fully-loaded cost of running your own LLM gateway, honestly
LiteLLM is a well-built, widely-used OSS LLM gateway. It's a legitimately good piece of software. It's also not the right choice for every team, and the "it's free" argument hides a set of costs that only show up after you've been running it for three months.
This isn't a hit piece. We used LiteLLM. We'd use it again for the right shape of team. The question is: when does self-hosting actually win, and when does a managed gateway win?
Let's do the math honestly.
What LiteLLM actually gives you
The core offering: an OSS Python library and proxy that exposes 100+ LLM providers behind an OpenAI-compatible API. Deploy it in Docker, Kubernetes, or a VM. Point your app at it. Done.
Out of the box you get:
- Provider abstraction (one SDK, many backends)
- Fallback routing (try provider A, then B on failure)
- Basic caching
- Request logging
- Key management
- Budget tracking (in the enterprise tier)
That's a real list. Self-hosted, with zero per-request cost going to a vendor.
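The fallback item in that list is simple to picture. Here's a minimal sketch of the core pattern, assuming each provider is a callable that either returns a response or raises; this is an illustration of the idea, not LiteLLM's actual API:

```python
def complete_with_fallback(providers, prompt):
    """Try each provider in order; return the first successful response.

    `providers` is an ordered list of (name, callable) pairs, where each
    callable takes a prompt and either returns a response or raises.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in practice: catch provider-specific errors
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")
```

LiteLLM's real router layers retries, cooldowns, and health tracking on top of this basic loop, which is exactly the kind of depth you're paying for with either approach.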
The hidden costs of self-hosting
Here's where the "it's free" framing starts to leak.
Cost 1: Ops time
A realistic LiteLLM deployment needs, at minimum:
- Docker or Kubernetes to run it.
- A PostgreSQL instance (for logs, budgets, key state).
- Redis (for cache and rate limiting).
- Monitoring (Prometheus/Grafana or equivalent).
- Log aggregation.
- Secret management for provider keys.
- CI/CD to deploy updates.
If you already run this stack for other reasons, the marginal cost is small. If you don't, you're provisioning and maintaining five or more pieces of infrastructure you otherwise wouldn't.
A conservative estimate: 4-8 hours/week of platform engineering for a non-trivial deployment. At a loaded cost of €100/hour for an experienced engineer, that's €400-800/week, or roughly €1,700-3,400/month at 4.3 weeks per month.
Cost 2: On-call
If your LLM calls are in the critical path of a production app, your gateway going down means your product goes down. Someone has to be reachable.
On-call rotations have a cost even when the pager never fires: they constrain engineers' lives in ways that rarely show up in a budget. For a small team, adding a new on-call surface for the LLM gateway is not free.
If your gateway has an incident at 2 AM, the cost of the incident is the engineer's sleep, the backlog delay, the actual debugging time, and potentially customer impact. A managed gateway takes that incident off your plate — their on-call rotation fires, not yours.
Cost 3: Feature lag
Providers ship new models constantly. New API fields (structured outputs, extended thinking, prompt caching parameters). New pricing tiers. New error codes. New rate limit semantics.
LiteLLM OSS keeps up with most of this, but there's always a lag. The cycle is: provider ships feature → LiteLLM issue filed → PR lands → new release cut → you upgrade your deployment → you test → you roll out.
For us, that cycle was typically 1-4 weeks behind a managed gateway that ships continuously. In a fast-moving space, four weeks matters.
Cost 4: Feature parity
LiteLLM OSS has routing and fallback. It doesn't have (or doesn't match managed-gateway depth on):
- Intelligence-based routing that reads prompt complexity in real time
- Burn-rate detection and pre-budget-blown alerting
- Zombie-agent detection (off-hours agent activity)
- Loop detection (identical requests firing repeatedly)
- Fine-grained per-endpoint budgets with auto-downgrade
- Polished observability UI for non-engineers
You can build any of these. You end up building all of them. Each one is a week of engineering you're not spending on your actual product.
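Loop detection is a good example of how small each feature looks in isolation. A minimal sketch, assuming requests are hashed and a repeat count within a sliding window flags a loop; the threshold and window are illustrative choices, not anyone's production defaults:

```python
import hashlib
import time
from collections import defaultdict, deque


class LoopDetector:
    """Flag identical requests repeating within a short time window."""

    def __init__(self, threshold=5, window_seconds=60.0):
        self.threshold = threshold
        self.window = window_seconds
        self.seen = defaultdict(deque)  # request hash -> recent timestamps

    def check(self, request_body, now=None):
        """Record a request; return True if it looks like a loop."""
        now = time.monotonic() if now is None else now
        key = hashlib.sha256(request_body.encode()).hexdigest()
        stamps = self.seen[key]
        stamps.append(now)
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        return len(stamps) >= self.threshold
```

The sketch is the easy part. The week of engineering goes into evicting stale keys, surviving restarts, and deciding what actually happens when a loop fires: block, alert, or downgrade.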
Cost 5: Upgrade burden
Every few weeks LiteLLM ships a new version. Usually minor, occasionally breaking. You have to:
- Read the changelog.
- Test the new version against your config.
- Upgrade in a staging environment.
- Smoke-test.
- Roll out.
When it breaks (and it occasionally does; that's the nature of a fast-moving OSS project), you debug it yourself, sometimes with help from the OSS community, sometimes by reading the source.
That's real work, and it's work that scales linearly with how many deployments you run.
The fully-loaded comparison
Let's model a concrete scenario: a team doing 1M LLM requests/month with €3,000/month of provider spend.
LiteLLM self-hosted
| Cost line | Monthly |
|---|---|
| Provider spend | €3,000 |
| Infrastructure (VM, DB, Redis, monitoring) | €150 |
| Platform engineering (6h/week × €100/h × 4.3) | €2,580 |
| On-call stipend (rotational) | €400 |
| Total | €6,130 |
Managed gateway (HiWay-style, per-request flat)
| Cost line | Monthly |
|---|---|
| Provider spend | €3,000 |
| Gateway subscription (Business tier, 5M req/mo) | €249 |
| Platform engineering (integration only, ~1h/month) | €100 |
| Total | €3,349 |
Managed is €2,781/month cheaper at this scale, almost entirely on engineering labor. And that's before factoring in smart routing, which typically cuts the €3,000 provider bill by 40-85% on top — a lever you don't get with self-hosted LiteLLM out of the box.
These are illustrative numbers. Your engineer hourly rate might be €80 or €150. Your ops overhead might be 3h/week or 12h/week. But the shape of the calculation is robust: once you price engineering time honestly, self-hosted stops being "free."
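The arithmetic behind the tables is easy to re-run with your own numbers. A sketch of the fully-loaded calculation, with defaults chosen to reproduce the scenario above (the 4.3 factor converts weeks to months):

```python
WEEKS_PER_MONTH = 4.3


def self_hosted_monthly(provider_spend=3000, infra=150,
                        ops_hours_per_week=6, hourly_rate=100,
                        oncall_stipend=400):
    """Fully-loaded monthly cost of a self-hosted gateway, in EUR."""
    engineering = ops_hours_per_week * hourly_rate * WEEKS_PER_MONTH
    return provider_spend + infra + engineering + oncall_stipend


def managed_monthly(provider_spend=3000, subscription=249,
                    integration_hours=1, hourly_rate=100):
    """Fully-loaded monthly cost of a managed gateway, in EUR."""
    return provider_spend + subscription + integration_hours * hourly_rate
```

Swap in your own hourly rate and ops hours; the point is that the engineering line, not the subscription line, dominates the comparison.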
When self-hosting actually wins
Self-hosted LiteLLM wins when at least one of these is true:
You already have the platform
If your team runs a Kubernetes cluster, has PostgreSQL, Redis, Grafana, and on-call rotations in place for other reasons, adding LiteLLM is close to free. The infrastructure is already there; the marginal operational burden is small.
You have a dedicated platform engineer
A team with someone whose job is platform work — not a part-time ops rotation — can absorb LiteLLM cleanly. The 4-8 hours/week we discussed earlier is inside their existing workload, not incremental.
You have hard data residency requirements
If you need your gateway to run in a specific region, on specific hardware, with specific sub-processors — and no managed service can meet those constraints — self-hosted is the only option. This is rare but real. Defense, healthcare in some jurisdictions, government.
You're cost-sensitive and at scale
At very high request volumes (10M+ requests/month) where gateway subscription costs become a real line item, self-hosted can be more efficient if your platform costs are amortized across multiple workloads. But "very high" means high enough that the platform-engineering math also changes — you probably have the team to support it.
You want the OSS escape hatch on principle
Some teams will not depend on a closed-source vendor for a critical path component, full stop. That's a legitimate choice. LiteLLM gives you an OSS option that a closed managed service doesn't.
When a managed gateway wins
You're a small team shipping product
If your engineering team is under 10 people and your job is shipping features, spending 4-8 hours/week on gateway ops is 4-8 hours not spent on product. The ROI on offloading this is immediate.
You want features you'd never build yourself
Burn-rate detection. Zombie-agent blocking. Intelligence-based routing. Per-endpoint auto-downgrade. These are 2-4 weeks of engineering each. For most teams, they'd never happen — the product roadmap crowds them out. A managed gateway ships them as defaults.
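Burn-rate detection, for instance, is conceptually just pace-keeping: compare spend so far against the pro-rata spend you'd expect at this point in the month. A minimal sketch; the alert ratio is an illustrative choice:

```python
def burn_rate_alert(spent, budget, day_of_month,
                    days_in_month=30, ratio=1.5):
    """Return True if spend is pacing to blow the monthly budget.

    Fires when actual spend exceeds `ratio` times the pro-rata
    expected spend for this point in the month.
    """
    expected = budget * day_of_month / days_in_month
    return spent > ratio * expected
```

Again, the math is trivial. The product work is wiring it to real-time spend data per key and per endpoint, and routing the alert somewhere someone will see it.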
You want observability without building it
A polished observability UI — cost breakdowns, latency percentiles, per-endpoint dashboards — is a real product in itself. LiteLLM's UI is functional. Managed gateways are typically ahead here because it's their product.
You want incident response on someone else's pager
The moment a gateway incident is not your team's problem, you've moved a whole category of worry off your shoulders. For teams that don't have on-call infrastructure, this is substantial.
You want EU hosting without running an EU data center
For European teams needing EU data residency, using a managed gateway that's already EU-hosted (on OVH, Scaleway, or another European cloud) is far faster than standing up your own regionally compliant deployment.
The head-to-head
| Dimension | LiteLLM self-hosted | Managed gateway |
|---|---|---|
| Upfront cost | Free (software) | Subscription |
| Ops time | 4-8h/week | ~0 |
| Feature velocity | Lags by 1-4 weeks | Shipped continuously |
| Feature depth | Core features | Core + guardrails + observability |
| Control | Full | Vendor-bounded |
| Lock-in risk | Low (OSS) | Medium (migrate by swapping base_url) |
| Upgrade burden | You | Them |
| On-call | You | Them |
| Best for | Teams with platform engineers | Teams shipping product |
The hybrid path (which is real)
Some teams run LiteLLM in development and test, and a managed gateway in production. Others do the inverse. Both can work.
If you're comfortable running LiteLLM locally for engineers to test against — no ops burden, no scale — and using a managed gateway for production where reliability and feature depth matter, you get most of the best of both. The main cost is configuration drift between environments.
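Because both sides speak the OpenAI-compatible API, the hybrid setup can come down to one environment variable. A sketch, using a local LiteLLM proxy for development and a placeholder managed-gateway URL (both URLs and the env-var name are illustrative):

```python
import os


def gateway_base_url(env=None):
    """Pick the gateway base URL for the current environment.

    Development points at a local LiteLLM proxy; everything else
    points at the managed gateway. URLs here are placeholders.
    """
    env = env or os.environ.get("APP_ENV", "development")
    if env == "development":
        return "http://localhost:4000/v1"    # local LiteLLM proxy
    return "https://gateway.example.com/v1"  # managed gateway
```

Keeping model names and request shapes identical across both environments is what keeps the configuration drift manageable.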
The honest bottom line
LiteLLM is a great piece of software. It's also not free. When you honestly account for engineering time, on-call burden, feature lag, and the operational surface area of a production gateway, self-hosted LiteLLM is cheaper than a managed service only at specific combinations of team shape and scale.
For most teams — small engineering groups, product-focused, no dedicated platform engineer — a managed gateway is the cheaper, faster, less distracting choice. For teams that already run the platform infrastructure, have the headcount, or need OSS on principle, LiteLLM is the right answer.
There's no universal winner. There's just the honest math for your specific team.
Next: LLM gateway pricing models explained — per-token, per-request, BYOK, flat.