Claude API Pricing Guide — May 2026

Compare Haiku 4.5, Sonnet 4.6, and Opus 4.7 costs per million tokens. Calculate monthly spend for your specific workload.

Current Claude API Pricing (May 2026)

Anthropic offers three tiers of Claude models, each designed for different workload profiles. Pricing is based on input and output tokens separately, which means the cost of a request depends on both how much context you send and how long the response is. All prices below are per one million tokens.

| Model | Input / 1M tokens | Output / 1M tokens | Context window |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M |

These prices apply to the standard real-time API. Opus 4.7 and Sonnet 4.6 now support 1M token context at flat rates with no surcharge. Anthropic also offers prompt caching, which reduces the cost of repeated prefixes in your prompts. Cached input tokens cost 90% less than standard input tokens, making it particularly effective for system prompts and few-shot examples that remain constant across requests.
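As a rough illustration of how the per-token prices and the caching discount combine, the following sketch estimates the cost of a single request. The model keys (`haiku-4.5` and so on) are hypothetical labels for this example, not official API model identifiers, and cache-write charges are deliberately left out because the text above does not quantify them.

```python
# Prices in USD per one million tokens, copied from the table above.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "haiku-4.5": {"input": 1.00, "output": 5.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.7": {"input": 5.00, "output": 25.00},
}

CACHE_DISCOUNT = 0.90  # cached input tokens cost 90% less


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    cached_tokens is the portion of input_tokens read from the prompt cache.
    Cache-write charges are not modeled (assumption of this sketch).
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    cached_rate = price["input"] * (1 - CACHE_DISCOUNT)
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * cached_rate
        + output_tokens * price["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    print(f"{request_cost('sonnet-4.6', 2_000, 500):.5f}")
    print(f"{request_cost('sonnet-4.6', 2_000, 500, cached_tokens=1_500):.5f}")
```

Under these assumptions, a Sonnet 4.6 request with 2,000 input tokens and 500 output tokens comes to about $0.0135, and about $0.0095 when 1,500 of the input tokens are served from the cache.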

When to Use Each Claude Tier

Choosing the right Claude model is the single most impactful cost decision you will make. At current prices, running Opus when Haiku would suffice inflates your bill by 5x. Here is a practical breakdown of when each tier makes sense.

Claude Haiku 4.5 is ideal for high-volume, low-complexity tasks: content classification, entity extraction, simple question answering, data validation, and routing queries to more expensive models. At $1.00 per million input tokens, one dollar covers the input for roughly 10,000 requests with 100-token prompts. Haiku responds in under 500 milliseconds for most requests, making it suitable for latency-sensitive applications.

Claude Sonnet 4.6 is the workhorse for production applications. It handles multi-step reasoning, code generation, summarization, and conversational AI with quality that approaches Opus at a fraction of the cost. Most teams should default to Sonnet and only escalate to Opus for tasks where Sonnet measurably underperforms. At $3/$15 per million tokens, a chatbot handling 10,000 conversations per day with 2,000 tokens each (20 million tokens per day) costs roughly $3,600/month if each conversation splits into 1,500 input and 500 output tokens. Depending on that split, the figure ranges from about $1,800/month (nearly all input) to $9,000/month (nearly all output).

Claude Opus 4.7 is reserved for tasks requiring the highest level of reasoning: complex code architecture decisions, nuanced legal or medical text analysis, multi-document synthesis, and advanced mathematical proofs. At $5/$25 pricing (significantly reduced from earlier generations), Opus is now more accessible for production use cases that demand frontier-class quality. Note that Opus 4.7 uses a new tokenizer that can generate up to 35% more tokens for the same input text, so effective cost per request may be higher than per-token pricing suggests.
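To make the tokenizer caveat concrete, here is a minimal sketch of how a higher token count changes effective input cost. The idea of a "baseline" token count (what the same text would measure under an earlier tokenizer) and the function name are assumptions introduced for illustration, and 35% is treated as a worst case, as stated above.

```python
MAX_TOKEN_INFLATION = 0.35  # "up to 35% more tokens", per the text above


def effective_input_cost(baseline_tokens, price_per_million,
                         inflation=MAX_TOKEN_INFLATION):
    """USD cost when the same text counts as (1 + inflation) times as many tokens.

    baseline_tokens is a hypothetical count under an earlier tokenizer.
    """
    if inflation < 0:
        raise ValueError("inflation must be non-negative")
    tokens = baseline_tokens * (1 + inflation)
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    nominal = effective_input_cost(100_000, 5.00, inflation=0.0)
    worst_case = effective_input_cost(100_000, 5.00)
    print(f"nominal: ${nominal:.3f}, worst case: ${worst_case:.3f}")
```

For text that would otherwise measure 100,000 tokens, the Opus 4.7 input cost moves from about $0.50 to about $0.675 in the worst case, which is equivalent to paying roughly $6.75 rather than $5.00 per million baseline tokens.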

Cost Optimization Tips

There are several strategies to reduce your Claude API bill without sacrificing quality. First, implement a model routing layer that sends simple requests to Haiku and only escalates to Sonnet or Opus based on complexity signals. This alone can reduce costs by 50-80% for mixed workloads.
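A routing layer can start out very simple. The sketch below is one possible shape, not a recommended policy: the keyword list, the 200-word threshold, the `needs_deep_reasoning` flag, and the model labels are all hypothetical choices made for illustration. The second function estimates the saving from a given traffic mix relative to sending everything to one model.

```python
# Input prices in USD per one million tokens, from the pricing table.
# Output prices follow the same 1:3:5 ratio, so the saving is identical.
INPUT_PRICES = {"haiku-4.5": 1.00, "sonnet-4.6": 3.00, "opus-4.7": 5.00}

# Hypothetical complexity signals; a real router would tune these.
MULTI_STEP_HINTS = ("step by step", "refactor", "summarize", "compare")
LONG_PROMPT_WORDS = 200


def choose_model(prompt, needs_deep_reasoning=False):
    """Pick a tier from simple, hand-written complexity signals."""
    if needs_deep_reasoning:
        return "opus-4.7"
    lowered = prompt.lower()
    looks_multi_step = any(hint in lowered for hint in MULTI_STEP_HINTS)
    if looks_multi_step or len(prompt.split()) > LONG_PROMPT_WORDS:
        return "sonnet-4.6"
    return "haiku-4.5"


def blended_savings(traffic_share, baseline_model):
    """Fractional saving of a traffic mix versus sending all traffic to baseline_model.

    traffic_share maps model label -> share of tokens (shares should sum to 1).
    """
    blended_price = sum(
        share * INPUT_PRICES[model] for model, share in traffic_share.items()
    )
    return 1 - blended_price / INPUT_PRICES[baseline_model]


if __name__ == "__main__":
    print(choose_model("Classify this review as positive or negative."))
    print(choose_model("Summarize the following report in five bullet points."))
    mix = {"haiku-4.5": 0.8, "sonnet-4.6": 0.2}
    print(f"{blended_savings(mix, 'sonnet-4.6'):.0%}")
```

With 80% of tokens routed to Haiku and 20% to Sonnet, the blended price is about 53% lower than an all-Sonnet setup, which sits at the low end of the range quoted above.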

Second, use Anthropic's Batch API for non-real-time workloads. The Batch API processes requests within a 24-hour window and offers a 50% discount across all models. This makes Sonnet 4.6 batch processing cost just $1.50/$7.50 per million tokens, which is competitive with many open-source hosted alternatives.
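The batch arithmetic above can be checked with a short sketch. It only models the 50% discount described in the text; the function names and the example job size are assumptions for illustration.

```python
BATCH_DISCOUNT = 0.50  # 50% off across all models, per the text above


def batch_price(standard_price_per_million):
    """Per-million-token price after the batch discount."""
    return standard_price_per_million * (1 - BATCH_DISCOUNT)


def job_cost(input_tokens, output_tokens, input_price, output_price, batch=False):
    """USD cost of a job at standard or batch rates (prices per million tokens)."""
    if batch:
        input_price = batch_price(input_price)
        output_price = batch_price(output_price)
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Sonnet 4.6 at $3 / $15 per million tokens.
    print(batch_price(3.00), batch_price(15.00))
    standard = job_cost(50_000_000, 10_000_000, 3.00, 15.00)
    batched = job_cost(50_000_000, 10_000_000, 3.00, 15.00, batch=True)
    print(f"standard: ${standard:.2f}, batch: ${batched:.2f}")
```

For a hypothetical offline job of 50 million input tokens and 10 million output tokens on Sonnet 4.6, this works out to about $300 at standard rates and about $150 through the Batch API.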

Third, leverage prompt caching for repetitive system prompts. Cached input tokens cost 90% less. If your system prompt is 4,000 tokens and you send 100,000 requests per day, that prefix alone is 400 million input tokens per day: about $1,200 uncached on Sonnet 4.6 versus about $120 cached, a saving of roughly $1,080 per day before any cache-write charges.

Fourth, minimize output tokens by instructing the model to be concise and using structured output formats like JSON, which tend to be more token-efficient than natural language responses.
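The prompt-caching example above (a 4,000-token system prompt sent 100,000 times per day on Sonnet 4.6) can be sketched as follows. The `hit_rate` parameter is an assumption added for illustration, since not every request is guaranteed to hit the cache, and cache-write charges are not modeled.

```python
CACHE_DISCOUNT = 0.90  # cached input tokens cost 90% less


def daily_caching_savings(prompt_tokens, requests_per_day,
                          input_price_per_million, hit_rate=1.0):
    """Estimated USD saved per day by caching a repeated prompt prefix.

    hit_rate is the assumed fraction of requests that read the prefix from cache.
    """
    if not 0 <= hit_rate <= 1:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_tokens = prompt_tokens * requests_per_day * hit_rate
    return cached_tokens * input_price_per_million * CACHE_DISCOUNT / 1_000_000


if __name__ == "__main__":
    best_case = daily_caching_savings(4_000, 100_000, 3.00)
    partial = daily_caching_savings(4_000, 100_000, 3.00, hit_rate=0.8)
    print(f"all hits: ${best_case:,.0f}/day, 80% hits: ${partial:,.0f}/day")
```

With every request hitting the cache, the saving is about $1,080 per day; at an assumed 80% hit rate it drops to about $864 per day.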

Finally, consider the total cost of ownership. While Haiku is the cheapest per token, a task that requires three Haiku retries to get right may cost more than a single Sonnet call. Track your success rates per model tier and optimize for cost-per-successful-completion rather than raw token cost.
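One way to compare tiers on cost-per-successful-completion is to divide the cost of one attempt by the success rate, which gives the expected spend per success if failed attempts are retried independently. That independence is a simplifying assumption, and the success rates below are made-up numbers for illustration, not measurements.

```python
def attempt_cost(input_tokens, output_tokens, input_price, output_price):
    """USD cost of a single attempt (prices per million tokens)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_success(cost_per_attempt, success_rate):
    """Expected USD spent per successful completion.

    Assumes independent retries, so the expected number of attempts
    is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    # Same task size on both tiers: 1,000 input and 500 output tokens.
    haiku = attempt_cost(1_000, 500, 1.00, 5.00)
    sonnet = attempt_cost(1_000, 500, 3.00, 15.00)
    # Hypothetical success rates, for illustration only.
    print(f"Haiku:  ${cost_per_success(haiku, 0.25):.4f} per success")
    print(f"Sonnet: ${cost_per_success(sonnet, 0.95):.4f} per success")
```

Because Sonnet costs three times as much per token as Haiku, the two tiers break even when Haiku needs three attempts for each one Sonnet needs; in this example, Haiku at a 25% success rate (about $0.0140 per success) ends up costlier than Sonnet at 95% (about $0.0111).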

Frequently Asked Questions

How much does Claude API cost per month?

Monthly costs depend on your model choice and usage volume. At 1 million tokens per day, split evenly between input and output over a 30-day month, Claude Haiku 4.5 costs roughly $90/month, Sonnet 4.6 costs $270/month, and Opus 4.7 costs $450/month. Use the KickLLM calculator for exact estimates based on your workload.
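The monthly figures above are consistent with an even input/output split and a 30-day month. A minimal sketch of that calculation, with `input_share` and `days` exposed as adjustable assumptions:

```python
# (input, output) prices in USD per one million tokens, from the pricing table.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
}


def monthly_cost(tokens_per_day, input_price, output_price,
                 input_share=0.5, days=30):
    """Estimated USD per month for a given daily token volume.

    input_share is the assumed fraction of tokens that are input tokens.
    """
    if not 0 <= input_share <= 1:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens = tokens_per_day * input_share
    output_tokens = tokens_per_day * (1 - input_share)
    daily = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return daily * days


if __name__ == "__main__":
    for name, (input_price, output_price) in PRICES.items():
        print(f"{name}: ${monthly_cost(1_000_000, input_price, output_price):.0f}/month")
```

Changing `input_share` shifts the result noticeably because output tokens cost five times as much as input tokens on every tier; for example, an input-heavy workload at `input_share=0.9` on Sonnet 4.6 comes to about $126/month for the same 1 million tokens per day.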

What is the difference between Claude Haiku 4.5, Sonnet 4.6, and Opus 4.7?

Haiku 4.5 ($1/$5 per 1M tokens) is the fastest and cheapest, ideal for classification and simple tasks. Sonnet 4.6 ($3/$15) balances quality and cost for most production workloads. Opus 4.7 ($5/$25) is the most capable model for complex reasoning, coding, and analysis with 1M token context.

Does Anthropic offer batch API discounts for Claude?

Yes. Anthropic's Batch API provides a 50% discount on all Claude models. Batch requests are processed within 24 hours rather than in real-time, making them ideal for bulk processing, evaluations, and offline workloads.

How does Claude pricing compare to GPT-4o?

Claude Sonnet 4.6 ($3/$15) is more expensive than GPT-4o ($2.50/$10): 20% more on input tokens and 50% more on output tokens. Claude Opus 4.7 ($5/$25) competes at the premium tier. Claude Haiku 4.5 ($1/$5) is more expensive than GPT-4o mini ($0.15/$0.60) but offers stronger structured output quality.

What is the Claude API context window and how does it affect cost?

Claude Haiku 4.5 supports up to 200K tokens of context, while Sonnet 4.6 and Opus 4.7 support up to 1M tokens at flat rates. Longer contexts increase input token costs proportionally. For cost efficiency, keep prompts concise and use retrieval-augmented generation (RAG) instead of stuffing entire documents into the context window.

Related Guides

GPT-4 API Pricing Guide
LLM Model Comparison
LLM API Cost Calculator Guide

Built by Michael Lip. Pricing data updated regularly from official provider pages.