# GPT-4 API Pricing Guide 2026
Compare GPT-4o, GPT-4 Turbo, and GPT-4o mini costs. Calculate monthly API spend for any workload.
## Current GPT-4 API Pricing
OpenAI offers several GPT-4 variants at different price points. Usage is billed per token, with separate rates for input (prompt) tokens and output (completion) tokens. Understanding these tiers is essential for budgeting your AI infrastructure costs.
| Model | Input / 1M tokens | Output / 1M tokens | Context Window |
|---|---|---|---|
| GPT-4o | $5.00 | $15.00 | 128K |
| GPT-4 Turbo | $10.00 | $30.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
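The table above maps directly to a per-call cost formula. A minimal sketch of a monthly-spend estimator (the price dictionary mirrors the table; the 30-day month and model keys are illustrative):

```python
# Prices in USD per 1M tokens (input, output), as listed in the table above.
PRICES = {
    "gpt-4o":      (5.00, 15.00),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def monthly_cost(model, calls_per_day, in_tokens, out_tokens, days=30):
    """Estimated monthly spend in USD for a fixed per-call token budget."""
    in_price, out_price = PRICES[model]
    per_call = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_call * calls_per_day * days

# 10,000 calls/day at 1,000 input + 500 output tokens each:
print(round(monthly_cost("gpt-4o", 10_000, 1_000, 500), 2))  # 3750.0
```

Swapping the model key shows the spread: the same workload on GPT-4o mini comes to $135/month.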
GPT-4o replaced GPT-4 Turbo as OpenAI's flagship model, offering the same 128K context window at half the price. GPT-4o mini targets the budget tier, competing directly with Claude Haiku and Gemini Flash for high-volume, cost-sensitive workloads. The original GPT-4 (8K context) at $30/$60 per million tokens is now deprecated in favor of these newer variants.
## GPT-4o vs Claude: Head-to-Head Pricing
The most common comparison developers make is between GPT-4o and Claude Sonnet, as both target the same mid-tier quality segment. On input tokens, Claude Sonnet is 40% cheaper at $3 versus $5 per million tokens. Output tokens are identically priced at $15 per million. For input-heavy workloads like document processing or RAG pipelines, Claude Sonnet offers a meaningful cost advantage.
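To make the input-heavy point concrete, here is a rough sketch of one hypothetical RAG call (20K input tokens of retrieved context, 500 output tokens) priced at the rates quoted above:

```python
def call_cost(in_price, out_price, in_tok, out_tok):
    """USD cost of a single call, given per-1M-token prices."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

# Hypothetical RAG call: 20K input tokens, 500 output tokens.
gpt4o  = call_cost(5.00, 15.00, 20_000, 500)
sonnet = call_cost(3.00, 15.00, 20_000, 500)
print(f"GPT-4o: ${gpt4o:.4f}  Sonnet: ${sonnet:.4f}")  # GPT-4o: $0.1075  Sonnet: $0.0675
```

Because output tokens are priced identically, the entire ~37% saving here comes from the input side, which is why the gap widens as context grows.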
At the budget tier, GPT-4o mini ($0.15/$0.60) edges out Claude Haiku ($0.25/$1.25) on per-token cost. GPT-4o mini is 40% cheaper on input and 52% cheaper on output. However, Claude Haiku tends to produce more reliable structured outputs in benchmarks, so the effective cost per successful completion may be closer than the raw token prices suggest. The right choice depends on your quality requirements and error tolerance.
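One way to model "effective cost per successful completion": if failed structured outputs are retried, average cost scales with 1/success_rate. The success rates below are illustrative placeholders, not benchmark results:

```python
def cost_per_success(in_price, out_price, in_tok, out_tok, success_rate):
    """Effective USD cost per *successful* call, assuming failed calls are
    retried (average attempts per success = 1 / success_rate)."""
    per_call = (in_tok * in_price + out_tok * out_price) / 1_000_000
    return per_call / success_rate

# Illustrative (not benchmarked) success rates for a structured-output task:
mini  = cost_per_success(0.15, 0.60, 2_000, 500, success_rate=0.90)
haiku = cost_per_success(0.25, 1.25, 2_000, 500, success_rate=0.97)
print(f"GPT-4o mini: ${mini:.6f}  Claude Haiku: ${haiku:.6f}")
```

Under these assumptions the mini model stays cheaper, but the gap narrows relative to the raw token prices, which is the effect described above.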
At the top tier, Claude Opus ($15/$75) is more expensive than GPT-4o ($5/$15) but targets a different quality level. OpenAI has no model at Opus-level pricing: GPT-4o is both its most capable and its cheapest full-size GPT-4 variant, with only GPT-4o mini undercutting it.
## Cost Optimization Strategies for GPT-4
The most impactful optimization is migrating from GPT-4 Turbo to GPT-4o if you have not already done so. This single change cuts costs by 50% with no quality degradation. OpenAI has indicated that GPT-4 Turbo will eventually be sunset, so this migration is inevitable.
For applications with mixed complexity, implement a model router that sends simple queries to GPT-4o mini and escalates complex ones to GPT-4o. A well-tuned router can handle 70-80% of requests with the mini model, reducing average costs by 60% or more. Use classification signals like query length, topic complexity, and required reasoning depth to make routing decisions.
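The routing signals described above can be sketched as a simple heuristic classifier. The keyword hints and length threshold below are placeholders you would tune against real traffic:

```python
# Illustrative complexity hints; in practice, derive these from your own logs.
COMPLEX_HINTS = ("prove", "analyze", "compare", "step by step", "debug")

def pick_model(query: str) -> str:
    """Route cheap/simple queries to gpt-4o-mini, escalate the rest to gpt-4o."""
    long_query = len(query.split()) > 150                        # length signal
    needs_reasoning = any(h in query.lower() for h in COMPLEX_HINTS)
    return "gpt-4o" if (long_query or needs_reasoning) else "gpt-4o-mini"

print(pick_model("What's the capital of France?"))               # gpt-4o-mini
print(pick_model("Analyze this stack trace step by step: ..."))  # gpt-4o
```

A production router would typically add a small classifier model or embedding similarity on top of these rules, but even this keyword-and-length heuristic captures the cost-saving pattern.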
OpenAI's Batch API processes requests asynchronously within 24 hours and offers a 50% discount. For workloads like content generation, data extraction, or evaluation pipelines that do not require real-time responses, batch processing cuts GPT-4o costs to $2.50/$7.50 per million tokens, which is cheaper than Claude Sonnet's standard pricing.
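Batch requests are submitted as a JSONL file, one request per line. The sketch below follows the request shape in OpenAI's Batch API documentation at the time of writing; verify field names against the current docs before relying on it:

```python
import json

# Build a JSONL payload for OpenAI's Batch API (one request object per line).
prompts = ["Summarize article A", "Summarize article B"]

lines = []
for i, prompt in enumerate(prompts):
    lines.append(json.dumps({
        "custom_id": f"req-{i}",                 # your own correlation ID
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 300,
        },
    }))

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
# Next steps (not shown): upload the file with purpose="batch", then create a
# batch against /v1/chat/completions with a 24h completion window.
```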
Additional optimizations include:

- Setting appropriate max_tokens limits to prevent unnecessarily long outputs.
- Using system prompt caching where available.
- Compressing context with summarization before sending it to the API.
- Using function calling with structured outputs to reduce token waste from conversational filler in responses.
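Context compression need not be elaborate. A minimal sketch of budget-based trimming, using the rough ~4-characters-per-token heuristic rather than a real tokenizer:

```python
def trim_context(chunks, budget_tokens, chars_per_token=4):
    """Greedy context trimming: keep the most recent chunks that fit under an
    approximate token budget (~4 chars/token is a heuristic, not a tokenizer)."""
    kept, used = [], 0
    for chunk in reversed(chunks):               # newest first
        est = len(chunk) // chars_per_token + 1  # rough token estimate
        if used + est > budget_tokens:
            break
        kept.append(chunk)
        used += est
    return list(reversed(kept))                  # restore chronological order

history = ["old message " * 50, "recent question?"]
print(trim_context(history, budget_tokens=50))   # ['recent question?']
```

For higher fidelity, the dropped chunks can be replaced with a one-line summary (itself generated by GPT-4o mini) instead of being discarded outright.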
## Frequently Asked Questions
### How much does GPT-4o cost per API call?
GPT-4o costs $5 per million input tokens and $15 per million output tokens. A typical API call with 1,000 input tokens and 500 output tokens costs $0.0125. At 10,000 calls per day, that is roughly $3,750 per month.
### What is the difference between GPT-4o and GPT-4 Turbo?
GPT-4o is OpenAI's latest multimodal model at $5/$15 per million tokens, while GPT-4 Turbo is the previous generation at $10/$30. GPT-4o is faster, cheaper, and handles text, images, and audio natively. For most use cases, GPT-4o is the better choice.
### Is GPT-4o mini cheaper than Claude Haiku?
GPT-4o mini costs $0.15/$0.60 per million tokens while Claude Haiku costs $0.25/$1.25. GPT-4o mini is cheaper on both input and output tokens, though Claude Haiku may offer better quality for certain tasks. Test both for your specific use case.
### Does OpenAI offer volume discounts on GPT-4?
OpenAI offers tiered rate limits based on usage level but does not publicly offer per-token volume discounts. Enterprise customers can negotiate custom pricing. OpenAI's Batch API provides a 50% discount for non-real-time workloads processed within 24 hours.
### How do I reduce GPT-4 API costs?
Use GPT-4o mini for simple tasks, limit max_tokens in responses, cache frequent prompts, use the Batch API for offline processing, and implement model routing to send only complex queries to GPT-4o. Switching from GPT-4 Turbo to GPT-4o alone saves 50%.
Built by Michael Lip. Pricing data updated regularly from official provider pages.