GPT-4o API Pricing 2026 — Complete Cost Breakdown
GPT-4o costs $2.50 per 1M input tokens and $10.00 per 1M output tokens as of May 2026. The Batch API drops this to $1.25 / $5.00 (50% off). Prompt caching bills repeated prompt prefixes at $1.25 per 1M input tokens. A typical API call (1K input + 500 output tokens) costs $0.0075.
GPT-4o Cost Calculator
Enter your token volumes below to calculate the exact API cost. Switch between Standard (real-time) and Batch (async, 50% off) pricing.
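The same calculation can be sketched in a few lines of Python. This is a minimal illustration using the rates quoted on this page; the function name `gpt4o_cost` and the `RATES` layout are assumptions made for the example, not part of any official SDK.

```python
# Per-1M-token rates quoted on this page (USD, May 2026).
RATES = {
    "standard": {"input": 2.50, "output": 10.00},
    "batch": {"input": 1.25, "output": 5.00},
}

def gpt4o_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the USD cost of a workload under the given pricing mode."""
    rate = RATES[mode]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

# A typical call: 1K input + 500 output tokens.
print(f"standard: ${gpt4o_cost(1_000, 500):.5f}")           # standard: $0.00750
print(f"batch:    ${gpt4o_cost(1_000, 500, 'batch'):.5f}")  # batch:    $0.00375
```

Token counts are totals for whatever period you want to price, so passing daily or monthly volumes gives daily or monthly cost directly.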
GPT-4o vs Alternatives: Full Pricing Comparison (2026)
The table below shows all current major frontier models ranked by input token cost. Quality scores reflect aggregate benchmark performance across MMLU, HumanEval, MATH, and GPQA.
| Model | Provider | Input / 1M | Output / 1M | Batch Input / 1M | Context | Quality | Speed |
|---|---|---|---|---|---|---|---|
| GPT-4o-mini | OpenAI | $0.15 | $0.60 | $0.075 | 128K | 79/100 | ~120 tok/s |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | N/A | 1M | 82/100 | ~150 tok/s |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | N/A | 1M | 89/100 | ~80 tok/s |
| ★ GPT-4o | OpenAI | $2.50 | $10.00 | $1.25 | 128K | 90/100 | ~90 tok/s |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $1.50 | 200K | 88/100 | ~80 tok/s |
| Claude Opus 4.6 | Anthropic | $15.00 | $75.00 | $7.50 | 200K | 92/100 | ~30 tok/s |
| GPT-4.5 | OpenAI | $75.00 | $150.00 | $37.50 | 128K | 93/100 | ~40 tok/s |
Highlighted row (★) = subject of this page. Prices verified May 2026. Quality scores are KickLLM aggregate estimates across public benchmarks.
Key Takeaways from the Comparison
- GPT-4o is the input-price sweet spot among frontier-quality models. Gemini 2.5 Pro undercuts it at $1.25/1M input, but GPT-4o's output pricing is identical and it has broader ecosystem support.
- GPT-4o-mini is about 16.7x cheaper on both input and output. Use it for any task where 79/100 quality suffices. The break-even point: if mini passes your quality bar, you save ~94% per token.
- Claude Sonnet 4.6 output is 50% more expensive ($15 vs $10/1M). For output-heavy tasks (long-form generation, code), GPT-4o has a meaningful cost advantage.
- GPT-4.5 is a research/flagship tier: 30x the input cost and 15x the output cost of GPT-4o for ~3 quality points. Only defensible for tasks where state-of-the-art reasoning is the product differentiator.
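The price multiples behind these takeaways can be recomputed directly from the table. The sketch below is illustrative only: the dictionary keys are labels chosen for this example, and the prices are copied from the comparison table above.

```python
# (input, output) USD per 1M tokens, copied from the comparison table.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gpt-4.5": (75.00, 150.00),
}

def price_ratio(model: str, baseline: str = "gpt-4o") -> tuple[float, float]:
    """Return (input_ratio, output_ratio) of a model's price to the baseline's."""
    m_in, m_out = PRICES[model]
    b_in, b_out = PRICES[baseline]
    return m_in / b_in, m_out / b_out

for name in PRICES:
    r_in, r_out = price_ratio(name)
    print(f"{name:18s} input x{r_in:.2f}  output x{r_out:.2f}")
```

Computing input and output ratios separately matters because the two can differ a lot for the same pair of models.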
When to Use GPT-4o vs Alternatives: Decision Matrix
The right model choice depends on your quality threshold, output volume, and latency requirements. Use this matrix to route workloads efficiently.
| Workload Type | GPT-4o | GPT-4o-mini | Claude Sonnet 4.6 | Gemini 2.5 Pro | Recommendation |
|---|---|---|---|---|---|
| Customer-facing chatbot (quality errors are visible, real-time) | Use | Test first | Use | Use | GPT-4o or Sonnet 4.6; A/B test mini for deflection |
| Classification / routing (intent detection, triage, tagging) | Overkill | Use | Overkill | Overkill | GPT-4o-mini saves 94% vs GPT-4o |
| Code generation / debugging (multi-step, tool-use, agentic) | Use | Insufficient | Strong | Use | Claude Sonnet 4.6 leads on code; GPT-4o close second |
| Document summarization (long PDFs, reports, contracts) | Use | Test | Preferred | Preferred | Gemini 2.5 Pro wins on 1M context; Sonnet for quality |
| Batch data extraction (async, non-real-time, high volume) | Batch API | Batch API | Batch API | No batch | GPT-4o Batch API at $1.25/$5.00 is competitive |
| Vision / image understanding (screenshots, charts, diagrams) | Native | Native | Native | Native | GPT-4o excels; mini good for simple extraction |
| Research / complex reasoning (multi-step, expert-level tasks) | Sub-optimal | Insufficient | Opus 4.6 | Sub-optimal | Upgrade to Claude Opus 4.6 or GPT-4.5 for frontier reasoning |
| RAG over large corpus (retrieval, 100K+ token context) | Use | With caching | 200K context | 1M context | Gemini 2.5 Pro or Sonnet for very long context |
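One way to act on the matrix is a small lookup that maps a workload label to a default model. This is a hypothetical sketch: the workload keys, the model label strings, and the `pick_model` helper are all invented for illustration, and the mapping simply mirrors the Recommendation column.

```python
# Workload label -> default model, following the Recommendation column.
# Keys and model labels are illustrative, not official model identifiers.
ROUTES = {
    "classification": "gpt-4o-mini",
    "chatbot": "gpt-4o",
    "code": "claude-sonnet-4.6",
    "long_document": "gemini-2.5-pro",
    "batch_extraction": "gpt-4o-batch",
    "complex_reasoning": "claude-opus-4.6",
}

def pick_model(workload: str, default: str = "gpt-4o") -> str:
    """Return the routed model for a workload, falling back to a default."""
    return ROUTES.get(workload, default)

print(pick_model("classification"))  # gpt-4o-mini
print(pick_model("unknown_task"))    # gpt-4o
```

In practice the routing table would be tuned against your own quality tests rather than taken from a generic matrix.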
GPT-4o Batch API: 50% Off for Async Workloads
The OpenAI Batch API processes requests within a 24-hour window rather than immediately. In exchange, you pay half the standard rate. This is the single most impactful cost lever available for GPT-4o workloads that do not require real-time response.
Batch API Pricing Breakdown
| Pricing Type | Input / 1M tokens | Output / 1M tokens | Latency |
|---|---|---|---|
| Standard API | $2.50 | $10.00 | Real-time (<30s) |
| Batch API | $1.25 | $5.00 | Up to 24 hours |
When Batch API Pays Off: A Real Example
Suppose you run a nightly pipeline that extracts structured data from 50,000 product descriptions. Each request: 800 input tokens, 300 output tokens.
- Standard API: (50,000 × 800 / 1,000,000 × $2.50) + (50,000 × 300 / 1,000,000 × $10.00) = $100 + $150 = $250/night
- Batch API: (50,000 × 800 / 1,000,000 × $1.25) + (50,000 × 300 / 1,000,000 × $5.00) = $50 + $75 = $125/night
- Monthly savings: $3,750 — just by switching to async processing.
Workloads that are good candidates for Batch API: document processing, data extraction, embedding generation, offline classification, content moderation queues, research pipelines, and any scheduled nightly jobs.
Prompt Caching: Saving on Repeated Prefixes
GPT-4o automatically applies prompt caching when a request shares a long prefix with a recent prior request. Cached input tokens cost $1.25/1M instead of $2.50/1M — exactly 50% off. The cache is maintained for approximately one hour of inactivity.
How Prompt Caching Works
If your system prompt is 5,000 tokens and you send 10,000 API calls per day, the first call pays full price. Every subsequent call within the cache window pays half-price on those 5,000 tokens:
- Without caching: 10,000 × 5,000 / 1,000,000 × $2.50 = $125/day just for the system prompt
- With caching (every call after the first hits the cache, a 99.99% hit rate): First call $0.0125 + 9,999 × 5,000 / 1,000,000 × $1.25 = $62.51/day
- Daily savings: ~$62.50 | Annual: ~$22,800
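The arithmetic above generalizes to any hit rate. The following is a minimal sketch under the rates quoted on this page; `daily_prefix_cost` is a hypothetical helper name, and it assumes every cache hit covers the whole shared prefix.

```python
def daily_prefix_cost(calls: int, prefix_tokens: int, hit_rate: float,
                      in_rate: float = 2.50, cached_rate: float = 1.25) -> float:
    """USD per day spent on a shared prompt prefix at a given cache hit rate."""
    cached_calls = calls * hit_rate
    uncached_calls = calls - cached_calls
    prefix_millions = prefix_tokens / 1_000_000
    return (uncached_calls * in_rate + cached_calls * cached_rate) * prefix_millions

no_cache = daily_prefix_cost(10_000, 5_000, hit_rate=0.0)
all_but_first = daily_prefix_cost(10_000, 5_000, hit_rate=9_999 / 10_000)

print(f"without caching: ${no_cache:.2f}/day")       # without caching: $125.00/day
print(f"with caching:    ${all_but_first:.2f}/day")  # with caching:    $62.51/day
print(f"annual savings:  ${(no_cache - all_but_first) * 365:,.0f}")
```

Real hit rates depend on traffic patterns, so it is worth plugging in a measured rate rather than the best case.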
Prompt caching is automatic — you do not need to change your API calls. Maximize its value by placing static content (system prompts, RAG context, few-shot examples) at the beginning of messages and dynamic content (user messages) at the end.
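The static-first ordering can be enforced with a small message builder. This is a sketch only: the prompt text, the few-shot pair, and the `build_messages` helper are hypothetical placeholders, not content from any real deployment.

```python
# Hypothetical static content; it should be byte-identical on every request.
SYSTEM_PROMPT = "You are a support assistant. Answer using the store policy."
FEW_SHOT = [
    {"role": "user", "content": "Where is my order?"},
    {"role": "assistant", "content": "Let me check the tracking status for you."},
]

def build_messages(user_query: str) -> list[dict]:
    """Put the static prefix first and the per-request content last."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *FEW_SHOT,
        {"role": "user", "content": user_query},
    ]

a = build_messages("Can I return a gift?")
b = build_messages("Do you ship to Canada?")
print(a[:-1] == b[:-1])  # True: only the final message differs
```

Anything that changes per request (timestamps, user names, retrieved snippets) belongs in the last message; placing it earlier changes the prefix and defeats the cache.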
Cached vs Uncached Input Price
| Token Type | Price / 1M tokens | Notes |
|---|---|---|
| Standard input (uncached) | $2.50 | All new input tokens |
| Cached input | $1.25 | Repeated prefix within cache TTL |
| Standard output | $10.00 | All generated tokens (no caching) |
5 Cost Optimization Strategies with Exact Math
These are the five highest-leverage changes you can make to reduce GPT-4o API costs, in rough order of impact for most workloads.
Route simple tasks to GPT-4o-mini
Audit your API calls. Any call that does classification, extraction, intent detection, or simple Q&A is likely over-spending on GPT-4o. GPT-4o-mini handles these at $0.15/$0.60 per 1M tokens, about 16.7x cheaper than GPT-4o on both input and output.
If 40% of a $1,000/month GPT-4o spend is routed to mini: 0.40 × $1,000 × (1 - 0.06) = ~$376/month saved
Use Batch API for all non-real-time workloads
Any pipeline that does not need results in under 30 seconds is a Batch API candidate. Scheduled jobs, nightly reports, offline enrichment, async moderation queues — all qualify. Submit via the /v1/batches endpoint.
If 60% of a $1,000/month spend is async: 0.60 × $1,000 × 0.50 = $300/month saved
Maximize prompt cache hit rate
Structure every message with static content first (system prompt, documents, examples) and dynamic content last (user query). This ensures the maximum prefix is eligible for caching. A 5K-token system prompt with a 95% cache hit rate on 5,000 daily calls saves about $30/day.
Savings = 0.95 × 5,000 × 5,000 / 1M × ($2.50 - $1.25) = $29.69/day
Compress system prompts aggressively
Every token in your system prompt is billed on every request. A 3,000-token system prompt trimmed to 1,000 tokens (one third of its original size) saves $0.005 per call, or $150/month at 1,000 requests/day. Techniques: remove redundant examples, use structured formats, cut explanations that are clear from context.
2,000 tokens saved / 1M × $2.50 × 1,000 requests/day × 30 days = $150/month saved
Set max_tokens to bound output costs
Output tokens cost 4x input tokens. Unbounded generation is the most common source of cost surprises. Set max_tokens to the minimum your use case requires. For classification: 5–20 tokens. For summaries: 200–500. For code generation: 1,000–2,000. Add output length monitoring to catch regressions.
500 output tokens saved per call × 10K calls/day × 30 days / 1M × $10 = $1,500/month saved
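The five savings formulas can be checked side by side. The snippet below is a rough sketch that reuses the scenario numbers from each strategy; the $1,000/month baseline and the dictionary keys are illustrative assumptions, and the figures are estimates rather than guarantees.

```python
IN_RATE, OUT_RATE, CACHED_RATE = 2.50, 10.00, 1.25  # USD per 1M tokens
MINI_PRICE_FRACTION = 0.15 / 2.50  # mini costs 6% of GPT-4o per input token

monthly_spend = 1_000.0  # hypothetical GPT-4o bill for strategies 1 and 2

savings = {
    # 1. Route 40% of spend to GPT-4o-mini (USD/month).
    "route_to_mini": 0.40 * monthly_spend * (1 - MINI_PRICE_FRACTION),
    # 2. Move 60% of volume to the Batch API at 50% off (USD/month).
    "batch_api": 0.60 * monthly_spend * 0.50,
    # 3. 5K-token prefix, 5,000 calls/day, 95% cache hits (USD/day).
    "prompt_cache_per_day": 0.95 * 5_000 * 5_000 / 1e6 * (IN_RATE - CACHED_RATE),
    # 4. Trim 2,000 prompt tokens at 1,000 requests/day (USD/month).
    "compress_prompt": 2_000 / 1e6 * IN_RATE * 1_000 * 30,
    # 5. Cut 500 output tokens at 10,000 requests/day (USD/month).
    "cap_output": 500 * 10_000 * 30 / 1e6 * OUT_RATE,
}

for name, value in savings.items():
    print(f"{name:22s} ${value:,.2f}")
```

The strategies overlap (tokens routed to mini cannot also be discounted at GPT-4o batch rates), so the individual figures should not simply be added together.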
GPT-4o Monthly Cost Estimates by Usage Tier
Reference costs at common usage levels. All figures assume a 30-day month and a 2:1 input-to-output token ratio (roughly 67/33), which is typical for conversational workloads.
| Tier | Requests/Day | Avg Input Tokens | Avg Output Tokens | Monthly Cost (Standard) | Monthly Cost (Batch) |
|---|---|---|---|---|---|
| Hobby / Side Project | 50 | 800 | 400 | $9.00 | $4.50 |
| Small SaaS (launch) | 500 | 1,200 | 600 | $135.00 | $67.50 |
| Medium SaaS | 5,000 | 1,500 | 750 | $1,687.50 | $843.75 |
| Production App | 50,000 | 2,000 | 1,000 | $22,500.00 | $11,250.00 |
| Enterprise Scale | 500,000 | 2,000 | 1,000 | $225,000.00 | $112,500.00 |
At the Enterprise Scale tier, consider negotiating a committed-use discount directly with OpenAI — volume commitments typically unlock 15%–30% off list pricing. The Batch API provides the easiest guaranteed 50% reduction for eligible workloads.
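A monthly figure for any tier follows from requests per day, tokens per request, and the per-token rates. The sketch below assumes a 30-day month and the standard rates quoted on this page; `monthly_cost` is a hypothetical helper, and the tier inputs are copied from the table.

```python
def monthly_cost(requests_per_day: int, in_tokens: int, out_tokens: int,
                 in_rate: float = 2.50, out_rate: float = 10.00,
                 days: int = 30) -> float:
    """Return USD per month for a steady workload at per-1M-token rates."""
    per_request = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return requests_per_day * days * per_request

# (requests/day, avg input tokens, avg output tokens) per tier.
TIERS = {
    "Hobby / Side Project": (50, 800, 400),
    "Small SaaS (launch)": (500, 1_200, 600),
    "Medium SaaS": (5_000, 1_500, 750),
    "Production App": (50_000, 2_000, 1_000),
    "Enterprise Scale": (500_000, 2_000, 1_000),
}

for name, (rpd, tin, tout) in TIERS.items():
    standard = monthly_cost(rpd, tin, tout)
    batch = monthly_cost(rpd, tin, tout, in_rate=1.25, out_rate=5.00)
    print(f"{name:22s} standard ${standard:>12,.2f}   batch ${batch:>12,.2f}")
```

Swapping in your own request volume and average token counts gives a quick first estimate before committing to a tier.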
GPT-4o Cost Per Common Task
| Task | Input Tokens | Output Tokens | Standard Cost | Batch Cost |
|---|---|---|---|---|
| Email reply draft | ~400 | ~300 | $0.0040 | $0.0020 |
| 1-page document summary | ~800 | ~400 | $0.0060 | $0.0030 |
| Customer support response | ~1,200 | ~500 | $0.0080 | $0.0040 |
| Code function generation | ~800 | ~600 | $0.0080 | $0.0040 |
| 10-turn conversation | ~8,000 | ~2,500 | $0.0450 | N/A (real-time) |
| 10K token research brief | ~3,000 | ~2,500 | $0.0325 | $0.0163 |
| Classify 1,000 records | ~300K | ~10K | $0.85 | $0.425 |
| Full codebase review (20K tokens) | ~20,000 | ~4,000 | $0.09 | $0.045 |
GPT-4o API: Code Example with Cost Tracking
Use the usage field from the API response to calculate exact costs per call. The pattern below works for all OpenAI models — swap in the per-token rates for the model you are using.
```python
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from env

# GPT-4o pricing (USD per 1M tokens)
GPT4O_INPUT_RATE = 2.50    # standard input
GPT4O_OUTPUT_RATE = 10.00  # standard output
GPT4O_CACHED_RATE = 1.25   # cached input (prompt caching)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this document: ..."},
    ],
    max_tokens=500,  # always set to bound output cost
)

usage = response.usage
# prompt_tokens_details (or its cached_tokens field) may be missing or None
details = getattr(usage, "prompt_tokens_details", None)
cached_tokens = (getattr(details, "cached_tokens", 0) or 0) if details else 0
uncached_tokens = usage.prompt_tokens - cached_tokens

# Bill uncached input, cached input, and output at their own rates
input_cost = (uncached_tokens / 1_000_000) * GPT4O_INPUT_RATE
cached_cost = (cached_tokens / 1_000_000) * GPT4O_CACHED_RATE
output_cost = (usage.completion_tokens / 1_000_000) * GPT4O_OUTPUT_RATE
total_cost = input_cost + cached_cost + output_cost

print(f"Tokens in : {usage.prompt_tokens:,} ({cached_tokens:,} cached)")
print(f"Tokens out: {usage.completion_tokens:,}")
print(f"Cost : ${total_cost:.6f}")
# Example: Tokens in: 1,248 (1,024 cached) | Tokens out: 412 | Cost: $0.005960
```
Frequently Asked Questions
GPT-4o costs $2.50 per 1M input tokens and $10.00 per 1M output tokens at standard (real-time) pricing as of May 2026. OpenAI's Batch API halves both rates to $1.25 input / $5.00 output for asynchronous workloads. Prompt caching discounts repeated input prefixes to $1.25/1M. These rates have been stable since the model's pricing revision in late 2025.
GPT-4o costs $0.0025 per 1K input tokens and $0.01 per 1K output tokens at standard pricing. A typical API call with 1,000 input + 500 output tokens costs $0.0075 total. With Batch API: $0.00125 input + $0.005 output per 1K tokens. These numbers are easy to sanity-check: the per-1M price ÷ 1,000 = the per-1K rate.
GPT-4o is cheaper than Claude Sonnet 4.6 at list price, and the size of the gap depends on workload composition. GPT-4o is cheaper on input ($2.50 vs $3.00/1M, about 17% less) and on output ($10.00 vs $15.00/1M, about 33% less). For a typical chatbot with a 70% input / 30% output token split, the blended rate is $4.75 vs $6.60 per 1M tokens, so GPT-4o is about 28% cheaper overall. For code generation or long-form writing (output-heavy), the gap widens toward 33% in GPT-4o's favor. Claude Sonnet 4.6 has a 200K context window vs GPT-4o's 128K, which is relevant for large document workloads.
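The price gap between the two models can be computed for any input/output mix. This is a minimal sketch using the list prices quoted on this page; `blended_rate` and `gpt4o_discount_vs_sonnet` are hypothetical helper names, and `output_share` is the fraction of all tokens that are output tokens.

```python
def blended_rate(in_rate: float, out_rate: float, output_share: float) -> float:
    """USD per 1M total tokens when output_share of the tokens are output."""
    return (1 - output_share) * in_rate + output_share * out_rate

def gpt4o_discount_vs_sonnet(output_share: float) -> float:
    """Fraction by which GPT-4o undercuts Claude Sonnet 4.6 at this mix."""
    gpt4o = blended_rate(2.50, 10.00, output_share)
    sonnet = blended_rate(3.00, 15.00, output_share)
    return 1 - gpt4o / sonnet

for share in (0.0, 0.3, 0.7, 1.0):
    pct = gpt4o_discount_vs_sonnet(share)
    print(f"{share:.0%} output tokens: GPT-4o is {pct:.1%} cheaper")
```

The result is bounded by the two pure cases: an all-input workload reflects only the input price difference, and an all-output workload reflects only the output price difference.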
GPT-4o-mini costs $0.15/1M input and $0.60/1M output, about 16.7x cheaper than GPT-4o on both. For a workload spending $1,000/month on GPT-4o, routing to mini drops that to roughly $60/month if quality is comparable. GPT-4o-mini scores approximately 79/100 on aggregate benchmarks versus 90/100 for GPT-4o, so the quality drop is real. For classification, extraction, routing, and simple Q&A, mini is almost always the right call. For complex reasoning, nuanced writing, or multi-step tool use, GPT-4o or higher is required.
The Batch API accepts a JSONL file of up to 50,000 requests and processes them within 24 hours at 50% off standard rates. GPT-4o batch pricing: $1.25/1M input, $5.00/1M output. You submit via POST /v1/batches, poll for status with GET /v1/batches/{batch_id}, and retrieve results via a file download. Ideal for: document processing pipelines, nightly data enrichment, offline classification, and any job where 24-hour latency is acceptable. Not suitable for: customer-facing chat, real-time suggestions, or interactive applications.
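The submit step can be sketched with the official `openai` Python package. Treat this as an outline under assumptions rather than a production pipeline: the helper names `build_batch_lines` and `submit_batch`, the `req-N` custom IDs, and the example prompts are all illustrative, and `submit_batch` is defined but not called here because it needs a valid API key and network access.

```python
import json

def build_batch_lines(prompts, model="gpt-4o", max_tokens=300):
    """Return one JSONL line per prompt in the Batch API request format."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"req-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        lines.append(json.dumps(request))
    return lines

def submit_batch(jsonl_path):
    """Upload a JSONL file and create a batch job (needs OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the builder works offline
    client = OpenAI()
    with open(jsonl_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="batch")
    return client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )

lines = build_batch_lines(["Extract the brand from: ...", "Extract the price from: ..."])
print(len(lines), json.loads(lines[0])["custom_id"])  # 2 req-0
```

Writing `"\n".join(lines)` to a `.jsonl` file and passing its path to `submit_batch` would start the job; the returned batch object carries the ID to poll for status.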
Prompt caching automatically discounts repeated input prefixes. When your request begins with tokens that match a recently cached sequence, those tokens are billed at $1.25/1M (half the standard $2.50). The cache is maintained for approximately 1 hour of inactivity. To maximize hits: put your system prompt and any static context first, then the dynamic user message last. A 10K-token system prompt with a 95% cache hit rate on 5,000 daily calls saves roughly $59/day (about $1,780/month) purely from cache hits on the static portion. The usage.prompt_tokens_details.cached_tokens field shows how many tokens were served from cache.
GPT-4o-mini ($0.15/$0.60): classification, extraction, routing, simple Q&A, high-volume low-stakes tasks. Your default starting point — only upgrade if quality testing reveals failures.
GPT-4o ($2.50/$10.00): complex reasoning, customer-facing chat, multi-step tool use, vision, nuanced writing. The balanced frontier choice — strong quality without GPT-4.5's premium.
GPT-4.5 ($75/$150): frontier research, elite writing quality, tasks where state-of-the-art performance is the product's core value proposition. 30x the input cost and 15x the output cost of GPT-4o for ~3 quality points on benchmarks. Hard to justify unless you've exhausted GPT-4o's capability ceiling.
Monthly cost varies significantly by volume and configuration. Reference points, assuming a 30-day month and a 2:1 input-to-output token ratio:
Small SaaS (500 req/day, 1,200/600 tokens): ~$135/month standard, ~$67.50/month batch
Medium SaaS (5K req/day, 1,500/750 tokens): ~$1,687.50/month standard, ~$843.75/month batch
Production (50K req/day, 2,000/1,000 tokens): ~$22,500/month standard, ~$11,250/month batch
Tip: at $9K+/month, contact OpenAI sales for a volume commitment discount (typically 15–30% off).