Anthropic prompt caching (auto-injected)

We add `cache_control` breakpoints to your Anthropic requests automatically. Cached input tokens cost ~10x less, with zero config.

Anthropic's prompt caching can cut input cost by ~10x on repeated prefixes (system prompt, tool schemas, long context). But the API silently ignores cache_control when the prefix is below the model's minimum size, and most BYOK users never learn the feature exists. HiWay injects breakpoints on the right blocks of every eligible Anthropic request, invisibly. On a cache hit, the cached tokens are billed at 0.1x the base input price, and you never have to think about it.

On by default

No setup. Anthropic models (Sonnet, Opus, Haiku) automatically get cache injection. Hits show up in your usage as cache_read_input_tokens.

What gets cached

System prompt: the most stable prefix across an agent loop or multi-turn conversation.
Tool definitions: your full toolbelt is written to the cache once, then read back at 0.1x on every subsequent call within the cache lifetime.
Skipped on purpose: the last user turn. It changes on every call, so caching it would waste a write.
Skipped on purpose: requests where you already set cache_control yourself. We never override your intent.
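A minimal sketch of the rules above, assuming the request is a plain dict in Anthropic Messages API shape. The function names are hypothetical and this is not HiWay's actual code; the minimum-size check is omitted. Because Anthropic builds the cache prefix in the order tools, system, messages, a breakpoint on the last system block also covers the tools. The extra breakpoint on the last tool lets the tool prefix be reused even when the system prompt changes.

```python
import copy
from typing import Any

EPHEMERAL = {"type": "ephemeral"}  # Anthropic's cache_control marker (5-minute TTL by default)


def _has_cache_control(request: dict[str, Any]) -> bool:
    """True if the caller already placed a cache_control breakpoint anywhere."""
    if any("cache_control" in tool for tool in request.get("tools", [])):
        return True
    system = request.get("system")
    if isinstance(system, list) and any("cache_control" in block for block in system):
        return True
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(block, dict) and "cache_control" in block for block in content
        ):
            return True
    return False


def inject_breakpoints(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with breakpoints on the last tool and the last system block.

    Messages, including the final user turn, are never touched.
    """
    if _has_cache_control(request):
        return request  # respect the caller's own breakpoints unchanged
    out = copy.deepcopy(request)  # never mutate the caller's request
    tools = out.get("tools")
    if tools:
        tools[-1]["cache_control"] = dict(EPHEMERAL)
    system = out.get("system")
    if isinstance(system, str) and system:
        # A plain-string system prompt must become a text block to carry cache_control.
        out["system"] = [{"type": "text", "text": system, "cache_control": dict(EPHEMERAL)}]
    elif isinstance(system, list) and system:
        system[-1]["cache_control"] = dict(EPHEMERAL)
    return out
```

Returning the caller's request object untouched when any breakpoint is present keeps the "never override your intent" rule trivially auditable.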

When it actually fires

Anthropic enforces a minimum prefix size before caching kicks in (it varies by model; newer Haiku families need a larger prefix than Sonnet/Opus). HiWay estimates the prefix size and only injects when the threshold is met. Below it, no breakpoint is added, since there is nothing to gain.
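The gating step can be sketched as follows, assuming a crude four-characters-per-token estimate and a hypothetical table of per-family minimums. The numbers below are placeholders in line with Anthropic's published minimums for some model generations; check the current table before relying on them. A production gateway might instead use Anthropic's token-counting endpoint or add a safety margin.

```python
import json
from typing import Any, Optional

# Placeholder minimums keyed by model family (assumption: verify against
# Anthropic's prompt-caching docs for the model you actually call).
MIN_CACHEABLE_TOKENS = {"haiku": 2048, "sonnet": 1024, "opus": 1024}
CHARS_PER_TOKEN = 4  # crude heuristic, not Anthropic's tokenizer


def min_tokens_for(model: str) -> Optional[int]:
    """Minimum cacheable prefix for a model id, or None if the family is unknown."""
    for family, minimum in MIN_CACHEABLE_TOKENS.items():
        if family in model:
            return minimum
    return None


def estimate_prefix_tokens(request: dict[str, Any]) -> int:
    """Estimate tokens in the cacheable prefix: tools first, then system."""
    parts = [json.dumps(tool) for tool in request.get("tools", [])]
    system = request.get("system", "")
    if isinstance(system, str):
        parts.append(system)
    else:
        parts.extend(block.get("text", "") for block in system)
    return sum(len(part) for part in parts) // CHARS_PER_TOKEN


def should_inject(request: dict[str, Any]) -> bool:
    """Only add breakpoints when the estimated prefix clears the model's minimum."""
    minimum = min_tokens_for(request.get("model", ""))
    if minimum is None:
        return False  # unknown model: skip rather than guess
    return estimate_prefix_tokens(request) >= minimum
```

Skipping unknown models errs toward doing nothing, which matches the rule that a breakpoint is only worth adding when caching can actually fire.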

Combines with semantic cache

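Both layers run. The semantic cache is checked first; if it hits, no upstream call happens at all (100% savings on that request). On a miss, the request goes to Anthropic with cache_control injected, and the prefix is billed at 0.1x whenever it hits Anthropic's cache. The two layers catch different things (the semantic cache matches whole requests that repeat an earlier one in meaning; prompt caching matches requests that share a prefix but differ at the end), so their savings add up.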
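The ordering of the two layers reduces to a few lines. In this sketch, `semantic_lookup`, `call_anthropic`, and `inject` are hypothetical stand-ins passed in as callables, not HiWay APIs; the point is only that injection happens on the miss path.

```python
from typing import Any, Callable, Optional

Request = dict[str, Any]
Response = dict[str, Any]


def handle(
    request: Request,
    semantic_lookup: Callable[[Request], Optional[Response]],
    call_anthropic: Callable[[Request], Response],
    inject: Callable[[Request], Request],
) -> Response:
    """Semantic cache first; on a miss, forward upstream with breakpoints injected."""
    cached = semantic_lookup(request)
    if cached is not None:
        return cached  # semantic hit: no upstream call, nothing billed by Anthropic
    return call_anthropic(inject(request))  # miss: prompt caching applies upstream
```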

Opting out

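A cache write costs 1.25x the base input price (5-min TTL) or 2x (1h TTL). With the 5-min TTL, a single repeat call already comes out ahead (1.25x + 0.1x = 1.35x vs 2x uncached). With the 1h TTL it takes two repeats (2x + 0.2x = 2.2x vs 3x uncached; one repeat costs 2.1x vs 2x). For workloads that genuinely never repeat (one-shot scripts), the writes are pure overhead. Disable auto-injection per workspace in Dashboard → Settings → Prompt caching → Anthropic auto-cache, or set the corresponding env flag in your self-hosted deployment.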
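The break-even arithmetic can be checked directly. This sketch prices the cached prefix in units of one uncached send and assumes the prefix is sent once (written) and then repeated `hits` times, every repeat landing inside the TTL. The multipliers are the ones quoted above.

```python
READ_MULTIPLIER = 0.1                       # cache hit price vs base input
WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}  # cache write price vs base input


def cost_with_cache(ttl: str, hits: int) -> float:
    """One cache write plus `hits` cache reads of the same prefix."""
    return WRITE_MULTIPLIER[ttl] + READ_MULTIPLIER * hits


def cost_without_cache(hits: int) -> float:
    """The same prefix sent 1 + hits times at full price."""
    return 1.0 + hits


def break_even_hits(ttl: str) -> int:
    """Smallest number of repeats at which caching is strictly cheaper."""
    hits = 0
    while cost_with_cache(ttl, hits) >= cost_without_cache(hits):
        hits += 1
    return hits


for ttl in WRITE_MULTIPLIER:
    print(ttl, break_even_hits(ttl))
```

With these multipliers the 5-min TTL pays off after one repeat and the 1h TTL after two.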

How to verify it's working

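Look for cache_creation_input_tokens (tokens written to the cache on this request) and cache_read_input_tokens (tokens served from the cache) in the usage block of Anthropic's response. A first call typically shows a write; repeats within the TTL show reads. They also show up in Dashboard → Usage under "Cache hits" and in the per-request log.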
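A small helper can turn the `usage` object from the raw JSON response into a quick health check. It assumes the 5-min write price (1.25x) and treats `input_tokens` as the uncached remainder, as in Anthropic's usage block. The function name and returned keys are hypothetical, not part of any SDK or HiWay API.

```python
from typing import Any


def summarize_cache_usage(usage: dict[str, Any]) -> dict[str, Any]:
    """Summarize cache activity from an Anthropic `usage` block."""
    uncached = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0  # may be missing or null
    read = usage.get("cache_read_input_tokens") or 0
    total = uncached + written + read
    # Price input in units of base input tokens (assumes 5-min TTL writes at 1.25x).
    billed = uncached * 1.0 + written * 1.25 + read * 0.1
    return {
        "total_input_tokens": total,
        "hit": read > 0,
        "cache_hit_ratio": read / total if total else 0.0,
        "effective_input_multiplier": billed / total if total else 0.0,
    }
```

On a healthy agent loop, the first call should report `hit` as false with a large write, and later calls should show a high `cache_hit_ratio` and an effective multiplier well below 1.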