April 2026 · 5 min read · Johan Bretonneau

Switch Your LLM Provider in 3 Minutes
A Safety-Net Migration Guide

Moving from OpenAI to Claude, from Claude to Mistral, or running both in parallel, without rewriting your app. Here's the two-line change that gives you provider optionality and a safety net.

Most teams that want to switch LLM providers never do. Not because switching is hard, but because switching safely (without breaking production, without losing weeks to a rewrite, without committing to a vendor you haven't validated yet) feels hard.

It doesn't have to. Here's the minimum-viable migration path from OpenAI to Claude (or between any two OpenAI-compatible providers), with a safety net that lets you roll back in seconds.

The Two-Line Change

If your code uses the OpenAI SDK today, and most Python and JavaScript code does, the smallest possible switch is:

# Before
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After
client = OpenAI(
    api_key=os.environ["HIWAY_API_KEY"],
    base_url="https://app.hiway2llm.com/v1",
)

Two lines. Your client.chat.completions.create(...) calls don't change. Your streaming code doesn't change. Your tool use doesn't change.

The proxy speaks OpenAI's API on the inbound side and translates to whatever provider you configured on the outbound side. Anthropic, Google, Mistral, xAI. You pick the provider in a dashboard, not in code.

This works because the industry has converged on "OpenAI-compatible" as the lingua franca. Most modern proxies, gateways, and routers implement the OpenAI chat completions API, even if they're calling Claude or Gemini underneath.

The Safety Net

Here's the part most migration guides skip: you don't have to commit to one provider immediately. The correct migration path is:

Phase 1, run both in shadow. Your primary traffic still hits OpenAI. A small percentage (say, 5%) gets duplicated to Claude via the proxy. You log the latency, cost, and output for each. After a week, you have real data on quality, latency, and cost delta for your specific workload.

Phase 2, graduate. Once the shadow data is clean, you switch primary traffic. The proxy is still there, and you keep OpenAI as a fallback: if Claude errors or latency spikes, the proxy rolls over automatically.

Phase 3, retire the fallback. Months later, once you're confident, you drop the fallback. Or you don't. Keeping provider redundancy is cheap and saves you the day a provider has an outage.

Most teams try to do all three phases at once, in a weekend, and end up rolling back when the first incident hits.
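The shadow-sampling decision in Phase 1 is worth making deterministic. Here's a minimal sketch (the function name and the 5% figure are just illustrations): hashing the request ID instead of calling random() means a given request always lands in the same bucket, so your shadow logs are reproducible.

```python
import zlib

SHADOW_PCT = 5  # percentage of traffic to duplicate to the shadow provider

def should_shadow(request_id: str, pct: int = SHADOW_PCT) -> bool:
    """Deterministically select ~pct% of requests for shadow duplication.

    Hashing the request ID (rather than rolling a random number) means
    the same request always gets the same decision.
    """
    return zlib.crc32(request_id.encode()) % 100 < pct

# The same request always lands in the same bucket:
same = should_shadow("req-123") == should_shadow("req-123")

# Roughly pct% of a large sample gets selected:
sample = [f"req-{i}" for i in range(10_000)]
rate = sum(should_shadow(r) for r in sample) / len(sample)
```

Whether this runs in your middleware or in the proxy itself doesn't matter; the property you want is that re-running the sampler over the same traffic reproduces the same shadow set.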

Implementation

Step 1: Get an API key at your target provider

Create an account at Anthropic, OpenAI, Google, whoever. Get an API key. Fund the account. This takes 3 minutes with a credit card.

Critical: the key lives in your account, not in some middleware's reseller account. This is what BYOK means. If the middleware goes down, your key still works directly.
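A quick way to confirm the BYOK property is to hit the provider directly with your own key, no middleware in the path. Here's a sketch using only the standard library against Anthropic's Messages API (endpoint and headers per Anthropic's docs; the model name follows the example later in this post):

```python
import json
import os
import urllib.request

# One-off sanity check: does this key work provider-direct, with no
# middleware in the path? Builds the request; sending is opt-in below.
req = urllib.request.Request(
    "https://api.anthropic.com/v1/messages",
    data=json.dumps({
        "model": "claude-sonnet-4-6",
        "max_tokens": 16,
        "messages": [{"role": "user", "content": "ping"}],
    }).encode(),
    headers={
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
)

# Uncomment to actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["content"][0]["text"])
```

If this works but the proxy route doesn't, the problem is the middleware, not your account. That separation is the whole point of BYOK.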

Step 2: Wire the proxy

You can skip this if you want to go provider-direct. But if you want the routing, budget controls, and safety net, you point at a proxy:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["HIWAY_API_KEY"],
    base_url="https://app.hiway2llm.com/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or "auto" for smart routing
    messages=[
        {"role": "user", "content": "What's 2+2?"},
    ],
)

The model parameter now accepts any provider's model name. Set it to "auto" and the router picks per request.

Step 3: Turn on shadow mode

In the proxy dashboard, configure shadow routing: 5% of requests go to Claude in parallel with OpenAI. The primary response is returned to the user; the shadow response is logged.

After 1,000 shadow requests you have a decent dataset. Compare:

  • Cost per request (Anthropic is usually cheaper on Sonnet-tier)
  • Latency (Anthropic's TTFT is usually better; OpenAI's throughput is often higher)
  • Output quality (eyeball 20-30 pairs yourself; don't trust an LLM judge for this)
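The cost and latency comparison is a few lines of arithmetic over your shadow logs. A sketch with made-up numbers (the record shape is hypothetical; substitute whatever your logging emits):

```python
from statistics import mean

# Hypothetical shadow-log records: one primary and one shadow
# measurement per request. Latency in seconds, cost in USD.
primary = [
    {"latency_s": 1.20, "cost_usd": 0.0041},
    {"latency_s": 0.95, "cost_usd": 0.0038},
    {"latency_s": 1.40, "cost_usd": 0.0045},
]
shadow = [
    {"latency_s": 0.88, "cost_usd": 0.0029},
    {"latency_s": 0.91, "cost_usd": 0.0027},
    {"latency_s": 1.02, "cost_usd": 0.0031},
]

def delta(metric: str) -> float:
    """Percent change, shadow vs. primary (negative = shadow is better)."""
    p = mean(r[metric] for r in primary)
    s = mean(r[metric] for r in shadow)
    return (s - p) / p * 100

latency_delta = delta("latency_s")
cost_delta = delta("cost_usd")
```

Quality is the one column this can't compute; that's why the eyeballing step above isn't optional.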

Step 4: Flip the primary

Once you're happy with shadow results, flip the routing: Claude is now primary, OpenAI is the fallback. This is a dashboard change, not a deploy.

Watch your error rate for 24-48 hours. If it's stable, you're done.

Step 5: Decide about the fallback

Keeping OpenAI as a fallback costs nothing until there's an incident. When Anthropic had its January outage, our traffic failed over to OpenAI for 90 minutes with no user-visible impact. That day alone paid for keeping the fallback wired for the year.

If you're on a tight budget and you trust your primary, you can drop the fallback. But for production services, I'd keep it.

The Things That Will Differ

A few behaviors that aren't perfectly identical between providers, even through an OpenAI-compatible proxy:

System prompts. Anthropic takes the system prompt as a separate top-level parameter; OpenAI passes it as just another message in the list. Most proxies normalize this, but if you have very long system prompts, test carefully.

Tool use JSON strictness. Claude is stricter about malformed tool schemas. If your tool definitions have loose typing, OpenAI will tolerate it; Claude will error. Fix the tool schemas, don't blame the proxy.
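"Fixing the tool schemas" mostly means tightening the JSON Schema in your tool definitions. A sketch of what a strict definition looks like in the OpenAI function-calling format (the tool itself is invented for illustration):

```python
# A tightly-typed tool definition (OpenAI function-calling format).
# Every property has an explicit type, required fields are listed, and
# additionalProperties is disabled -- the things a stricter provider
# will actually check.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}
```

Schemas written this way tend to pass on every provider; loosely-typed ones pass only on the forgiving ones.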

Streaming deltas. Streaming chunk shape is slightly different. If your frontend parses raw deltas instead of the normalized content field, you may see artifacts. Use the SDK's abstraction layer.
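The robust pattern is to read the normalized delta.content field and tolerate chunks where it's None (role-only and finish chunks look like this on every provider). A sketch, using tiny stand-in classes for the SDK's chunk shape so it runs without a live connection:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Minimal stand-ins mirroring the streaming chunk shape the OpenAI SDK
# exposes (chunk.choices[0].delta.content), so the pattern runs offline.
@dataclass
class Delta:
    content: Optional[str] = None

@dataclass
class Choice:
    delta: Delta

@dataclass
class Chunk:
    choices: List[Choice] = field(default_factory=list)

def accumulate(chunks) -> str:
    """Build the full reply from streamed deltas, skipping chunks whose
    delta.content is None instead of crashing on them."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)

stream = [
    Chunk([Choice(Delta(None))]),   # role-only chunk
    Chunk([Choice(Delta("Hel"))]),
    Chunk([Choice(Delta("lo"))]),
    Chunk([Choice(Delta(None))]),   # finish chunk
]
result = accumulate(stream)
```

Code written against this normalized field keeps working when the proxy swaps providers underneath; code that parses raw SSE payloads by hand is what breaks.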

Rate limits. Anthropic's tier structure is different from OpenAI's. You may need to request tier increases separately on each.

None of these are deal-breakers. They're the usual friction of switching any infrastructure component.

What You Don't Need To Change

  • Your prompts. Move them as-is first, then tune later if needed.
  • Your retry logic. The OpenAI SDK's retries work unchanged through the proxy.
  • Your streaming code. Chunks look the same to your code.
  • Your tool use code. Same schema, same response handling.
  • Your caching. If you were caching on the client side, keep doing it.

The switch is primarily a configuration change. The code stays.

Why Bother

The usual reasons: lower cost, better fit, provider redundancy. The underrated reason: optionality. Once you can switch providers with a config change, you negotiate differently. You're never locked in. When Anthropic releases a better model, you try it immediately. When OpenAI cuts prices, you evaluate. When Mistral opens up a new model, you shadow-test it in an afternoon.

That optionality is worth more than the cost savings for most production teams.


Related: BYOK Explained, why bringing your own keys is the foundation of this whole approach.
