Prompt Caching Break-Even Calculator

Find the minimum cache-reuse rate at which prompt caching (Anthropic, OpenAI, Google) saves money compared with not caching, once the write premium is accounted for.

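Math: amortized cost per cached token, as a multiple of the base input price = (writePremium + reuseRate * cacheDiscount) / (1 + reuseRate). Here writePremium is the cache-write price multiplier (e.g. 1.25), cacheDiscount is the cache-read price multiplier (e.g. 0.1, not the fraction taken off), and reuseRate is the number of cache reads per cache write. Setting this equal to the base price (1.0) gives the minimum reuseRate.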
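
For readers who want the break-even point in closed form, the formula above can be solved directly. The single-letter symbols are shorthand introduced here, not names used by the calculator: w is writePremium, d is cacheDiscount (the cache-read price as a multiple of the base price), and r is reuseRate.

```latex
\[
\frac{w + r\,d}{1 + r} = 1
\;\Longrightarrow\;
w + r\,d = 1 + r
\;\Longrightarrow\;
r_{\min} = \frac{w - 1}{1 - d},
\qquad 0 \le d < 1 .
\]
```

For example, with w = 1.25 and d = 0.1, r_min = 0.25 / 0.9 ≈ 0.28, so a single reuse of the cached prefix already clears the threshold.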

How prompt caching break-even works

Prompt caching lets an LLM provider store the prefix of a prompt so that repeat requests are billed at a steeply discounted cache-read rate instead of the full input rate. The catch: where the provider charges for cache writes, the first write of that prefix is billed at a premium over the base input price (Anthropic charges 1.25x the base price for its 5-minute cache and 2x for its 1-hour cache; OpenAI's automatic caching carries no write surcharge). Caching only pays off once you reuse the same cached prefix enough times to amortize that upfront surcharge.

This prompt caching break-even calculator solves for the tipping point. Given your write premium, cache-read discount, and the number of times you expect to reuse each cached block before its TTL expires, it returns the minimum reuse count required for caching to beat the no-cache baseline, plus the effective cost per token and percentage savings at your actual reuse level. If your real reuse count clears the threshold, caching is a net win; if it falls short, you pay more than you would without caching.
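
The calculation described above can be sketched in a few lines of Python. This is a minimal sketch of the amortized-cost model, not the calculator's actual source: the names `break_even`, `BreakEvenResult`, `read_multiplier`, and `base_price_per_mtok` are hypothetical, and the $3 per 1M tokens base price in the example call is an arbitrary illustrative figure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakEvenResult:
    min_reuse_rate: float            # fractional break-even reads per write
    min_whole_reuses: int            # smallest whole reuse count that at least breaks even
    effective_price_per_mtok: float  # amortized $ per 1M cached tokens
    savings_pct: float               # positive means cheaper than no caching


def break_even(write_premium: float, read_multiplier: float,
               reuses: float, base_price_per_mtok: float) -> BreakEvenResult:
    """Amortize one cache write over `reuses` cache reads.

    write_premium and read_multiplier are multiples of the base input price
    (for example 1.25 and 0.1); both names are illustrative assumptions.
    """
    if write_premium <= 0 or base_price_per_mtok < 0 or reuses < 0:
        raise ValueError("prices must be positive and reuses non-negative")
    if not 0.0 <= read_multiplier < 1.0:
        raise ValueError("read_multiplier must be in [0, 1)")

    # Solve (w + r*d) / (1 + r) = 1 for r; no write premium means no threshold.
    min_rate = max(0.0, (write_premium - 1.0) / (1.0 - read_multiplier))
    # Average price multiplier across the one write and all reads.
    amortized = (write_premium + reuses * read_multiplier) / (1.0 + reuses)

    return BreakEvenResult(
        min_reuse_rate=min_rate,
        # Small tolerance so float noise does not push an exact integer up by one.
        min_whole_reuses=math.ceil(min_rate - 1e-9),
        effective_price_per_mtok=base_price_per_mtok * amortized,
        savings_pct=(1.0 - amortized) * 100.0,
    )


# Example: 1.25x write premium, 0.1x cache reads, 10 reuses per write.
print(break_even(write_premium=1.25, read_multiplier=0.1,
                 reuses=10, base_price_per_mtok=3.0))
```

A negative `savings_pct` signals that the expected reuse count falls below the threshold, so caching would cost more than the no-cache baseline.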

Typical break-even thresholds are low (often 1-2 reuses) because cache-read discounts are aggressive (Anthropic Sonnet reads cached tokens at 0.1x the base input price, OpenAI at 0.5x or lower depending on model). The threshold itself depends only on the two price multipliers, not on prompt size; what balloons for large system prompts, long RAG contexts, or rarely changing few-shot examples is the dollar amount at stake, whether saved or wasted. Use this tool before wiring caching into a production pipeline so you know the reuse volume your traffic must sustain.
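
To see where prefix size enters, here is a rough sketch under the same amortized-cost model: the break-even reuse count depends only on the two multipliers, while the dollars saved (or lost) scale linearly with the number of cached tokens. The function names, the 20-reuse traffic level, and the $3 per 1M tokens base price are illustrative assumptions, not provider quotes.

```python
def min_reuse_rate(write_premium: float, read_multiplier: float) -> float:
    """Break-even reads per write; independent of how many tokens are cached."""
    return max(0.0, (write_premium - 1.0) / (1.0 - read_multiplier))


def dollars_saved(prefix_tokens: int, reuses: int, base_price_per_mtok: float,
                  write_premium: float, read_multiplier: float) -> float:
    """Savings versus no caching for one write plus `reuses` reads of a prefix."""
    uncached_units = 1 + reuses                              # every request pays full price
    cached_units = write_premium + reuses * read_multiplier  # one write, then discounted reads
    token_equivalents = (uncached_units - cached_units) * prefix_tokens
    return token_equivalents * base_price_per_mtok / 1_000_000


# Same multipliers and reuse count, two very different prefix sizes.
for prefix_tokens in (2_000, 100_000):
    saved = dollars_saved(prefix_tokens, reuses=20, base_price_per_mtok=3.0,
                          write_premium=1.25, read_multiplier=0.1)
    threshold = min_reuse_rate(1.25, 0.1)
    print(f"{prefix_tokens:>7} tokens: threshold {threshold:.2f} reuses, saved ${saved:.4f}")
```

With zero reuses the same function returns a negative number, which is the write premium paid for nothing; that loss also grows with prefix size.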