April 202610 min readJohan Bretonneau

LiteLLM vs managed gateways: when self-hosting actually costs more
The fully-loaded cost of running your own LLM gateway, honestly

LiteLLM OSS is great. It's also not free. The real cost of self-hosting vs a managed gateway — ops time, on-call, feature lag, upgrade burden — broken down.

LiteLLM is a well-built, widely-used OSS LLM gateway. It's a legitimately good piece of software. It's also not the right choice for every team, and the "it's free" argument hides a set of costs that only show up after you've been running it for three months.

This isn't a hit piece. We used LiteLLM. We'd use it again for the right shape of team. The question is: when does self-hosting actually win, and when does a managed gateway win?

Let's do the math honestly.

What LiteLLM actually gives you

The core offering: an OSS Python library and proxy that exposes 100+ LLM providers behind an OpenAI-compatible API. Deploy it in Docker, Kubernetes, or a VM. Point your app at it. Done.

Out of the box you get:

  • Provider abstraction (one SDK, many backends)
  • Fallback routing (try provider A, then B on failure)
  • Basic caching
  • Request logging
  • Key management
  • Budget tracking (in the enterprise tier)

That's a real list. Self-hosted, with zero per-request cost going to a vendor.

The hidden costs of self-hosting

Here's where the "it's free" framing starts to leak.

Cost 1: Ops time

A realistic LiteLLM deployment needs, at minimum:

  • Docker or Kubernetes to run it.
  • A PostgreSQL instance (for logs, budgets, key state).
  • Redis (for cache and rate limiting).
  • Monitoring (Prometheus/Grafana or equivalent).
  • Log aggregation.
  • Secret management for provider keys.
  • CI/CD to deploy updates.

If you already run this stack for other reasons, the marginal cost is small. If you don't, you're provisioning and maintaining 3-5 pieces of infrastructure you otherwise wouldn't.

A conservative estimate: 4-8 hours/week of platform engineering for a non-trivial deployment. At a loaded cost of €100/hour for an experienced engineer, that's €400-800/week, or €1,600-3,200/month.

Cost 2: On-call

If your LLM calls are in the critical path of a production app, your gateway going down means your product goes down. Someone has to be reachable.

On-call rotations have a cost even when the pager doesn't fire — it's a constraint on engineer life that management often doesn't want to admit is real. For a small team, adding a new on-call surface for the LLM gateway is not free.

If your gateway has an incident at 2 AM, the cost of the incident is the engineer's sleep, the backlog delay, the actual debugging time, and potentially customer impact. A managed gateway takes that incident off your plate — their on-call rotation fires, not yours.

Cost 3: Feature lag

Providers ship new models constantly. New API fields (structured outputs, extended thinking, prompt caching parameters). New pricing tiers. New error codes. New rate limit semantics.

LiteLLM OSS keeps up with most of this, but there's always a lag. The cycle is: provider ships feature → LiteLLM issue filed → PR lands → new release cut → you upgrade your deployment → you test → you roll out.

For us, that cycle was typically 1-4 weeks behind a managed gateway that ships continuously. In a fast-moving space, four weeks matters.

Cost 4: Feature parity

LiteLLM OSS has routing and fallback. It doesn't have (or doesn't match managed-gateway depth on):

  • Intelligence-based routing that reads prompt complexity in real-time
  • Burn-rate detection and pre-budget-blown alerting
  • Zombie-agent detection (off-hours agent activity)
  • Loop detection (identical requests firing repeatedly)
  • Fine-grained per-endpoint budgets with auto-downgrade
  • Polished observability UI for non-engineers

You can build any of these. You end up building all of them. Each one is a week of engineering you're not spending on your actual product.

Cost 5: Upgrade burden

Every few weeks LiteLLM ships a new version. Usually minor, occasionally breaking. You have to:

  • Read the changelog.
  • Test the new version against your config.
  • Upgrade in a staging environment.
  • Smoke-test.
  • Roll out.

When it breaks — and it occasionally does, that's the nature of a fast-moving OSS project — you debug, sometimes against the OSS community, sometimes by reading code.

That's real work, and it's work that scales linearly with how many deployments you run.

The fully-loaded comparison

Let's model a concrete scenario: a team doing 1M LLM requests/month with €3,000/month of provider spend.

LiteLLM self-hosted

Cost lineMonthly
Provider spend€3,000
Infrastructure (VM, DB, Redis, monitoring)€150
Platform engineering (6h/week × €100/h × 4.3)€2,580
On-call stipend (rotational)€400
Total€6,130

Managed gateway (HiWay-style, per-request flat)

Cost lineMonthly
Provider spend€3,000
Gateway subscription (Business tier, 5M req/mo)€249
Platform engineering (integration only, ~1h/month)€100
Total€3,349

Managed is €2,781/month cheaper at this scale, almost entirely on engineering labor. And that's before factoring in smart routing, which typically cuts the €3,000 provider bill by 40-85% on top — a lever you don't get with self-hosted LiteLLM out of the box.

These are illustrative numbers. Your engineer hourly rate might be €80 or €150. Your ops overhead might be 3h/week or 12h/week. But the shape of the calculation is robust: once you price engineering time honestly, self-hosted stops being "free."

Start Saving →

No credit card required

When self-hosting actually wins

Self-hosted LiteLLM wins when at least one of these is true:

You already have the platform

If your team runs a Kubernetes cluster, has PostgreSQL, Redis, Grafana, and on-call rotations in place for other reasons, adding LiteLLM is close to free. The infrastructure is already there; the marginal operational burden is small.

You have a dedicated platform engineer

A team with someone whose job is platform work — not a part-time ops rotation — can absorb LiteLLM cleanly. The 4-8 hours/week we discussed earlier is inside their existing workload, not incremental.

You have hard data residency requirements

If you need your gateway to run in a specific region, on specific hardware, with specific sub-processors — and no managed service can meet those constraints — self-hosted is the only option. This is rare but real. Defense, healthcare in some jurisdictions, government.

You're cost-sensitive and at scale

At very high request volumes (10M+ requests/month) where gateway subscription costs become a real line item, self-hosted can be more efficient if your platform costs are amortized across multiple workloads. But "very high" means high enough that the platform-engineering math also changes — you probably have the team to support it.

You want the OSS escape hatch on principle

Some teams will not depend on a closed-source vendor for a critical path component, full stop. That's a legitimate choice. LiteLLM gives you an OSS option that a closed managed service doesn't.

When a managed gateway wins

You're a small team shipping product

If your engineering team is under 10 people and your job is shipping features, spending 4-8 hours/week on gateway ops is 4-8 hours not spent on product. The ROI on offloading this is immediate.

You want features you'd never build yourself

Burn-rate detection. Zombie-agent blocking. Intelligence-based routing. Per-endpoint auto-downgrade. These are 2-4 weeks of engineering each. For most teams, they'd never happen — the product roadmap crowds them out. A managed gateway ships them as defaults.

You want observability without building it

A polished observability UI — cost breakdowns, latency percentiles, per-endpoint dashboards — is a real product in itself. LiteLLM's UI is functional. Managed gateways are typically ahead here because it's their product.

You want incident response on someone else's pager

The moment a gateway incident is not your team's problem, you've moved a whole category of worry off your shoulders. For teams that don't have on-call infrastructure, this is substantial.

You want EU hosting without running an EU data center

For European teams needing EU data residency, using a managed gateway that's already EU-hosted (on OVH, on Scaleway, on a European cloud) is infinitely faster than setting up your own regionally-compliant deployment.

The head-to-head

DimensionLiteLLM self-hostedManaged gateway
Upfront costFree (software)Subscription
Ops time4-8h/week~0
Feature velocityLags by 1-4 weeksShipped continuously
Feature depthCore featuresCore + guardrails + observability
ControlFullVendor-bounded
Lock-in riskLow (OSS)Medium (migrate by swapping base_url)
Upgrade burdenYouThem
On-callYouThem
Best forTeams with platform engineersTeams shipping product

The hybrid path (which is real)

Some teams run LiteLLM in development and test, and a managed gateway in production. Others do the inverse. Both can work.

If you're comfortable running LiteLLM locally for engineers to test against — no ops burden, no scale — and using a managed gateway for production where reliability and feature depth matter, you get most of the best of both. The main cost is configuration drift between environments.

The honest bottom line

LiteLLM is a great piece of software. It's also not free. When you honestly account for engineering time, on-call burden, feature lag, and the operational surface area of a production gateway, self-hosted LiteLLM is cheaper than a managed service only at specific combinations of team shape and scale.

For most teams — small engineering groups, product-focused, no dedicated platform engineer — a managed gateway is the cheaper, faster, less distracting choice. For teams that already run the platform infrastructure, have the headcount, or need OSS on principle, LiteLLM is the right answer.

There's no universal winner. There's just the honest math for your specific team.


Next: LLM gateway pricing models explained — per-token, per-request, BYOK, flat.

Share

Was this useful?

Comments

Be the first to comment.