GPT-4o API Pricing 2026 — Complete Cost Breakdown

Direct Answer

GPT-4o costs $2.50 per 1M input tokens and $10.00 per 1M output tokens as of May 2026. The Batch API halves both rates to $1.25 / $5.00. Cached input tokens (repeated prompt prefixes) are billed at $1.25 per 1M. A typical API call (1K input + 500 output tokens) costs $0.0075 at standard rates.

Input (standard): $2.50 per 1M tokens
Output (standard): $10.00 per 1M tokens
Batch Input: $1.25 per 1M tokens (50% off, async)
Batch Output: $5.00 per 1M tokens (50% off, async)
Cached Input: $1.25 per 1M tokens (prompt caching)
Context Window: 128K tokens

GPT-4o Cost Calculator

Cost scales linearly with token volume: multiply your input and output token counts by the per-token rate for Standard (real-time) or Batch (async, 50% off) pricing. Default example, one request at Standard pricing:

Input Cost: $0.0025 (1,000 tokens)
Output Cost: $0.0050 (500 tokens)
Cost per Request: $0.0075 (input + output)
Total Cost: $0.0075 (1 request)
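
The same arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the `RATES` table and the `request_cost` helper are names made up for this example, and the rates are simply the list prices quoted on this page.

```python
# Assumed list prices in USD per 1M tokens, copied from this page.
RATES = {
    "standard": {"input": 2.50, "output": 10.00},
    "batch": {"input": 1.25, "output": 5.00},
}


def request_cost(input_tokens, output_tokens, mode="standard", requests=1):
    """Return the USD cost of `requests` identical calls (hypothetical helper)."""
    rate = RATES[mode]
    per_request = (
        input_tokens * rate["input"] + output_tokens * rate["output"]
    ) / 1_000_000
    return per_request * requests


print(f"${request_cost(1_000, 500):.4f}")           # prints $0.0075
print(f"${request_cost(1_000, 500, 'batch'):.5f}")  # prints $0.00375
```

Passing `requests=` scales the result linearly, so daily or monthly totals are just a multiplication away.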

GPT-4o vs Alternatives: Full Pricing Comparison (2026)

The table below shows all current major frontier models ranked by input token cost. Quality scores reflect aggregate benchmark performance across MMLU, HumanEval, MATH, and GPQA.

Model | Provider | Input / 1M | Output / 1M | Batch Input / 1M | Context | Quality | Speed
GPT-4o-mini | OpenAI | $0.15 | $0.60 | $0.075 | 128K | 79/100 | ~120 tok/s
Gemini 2.5 Flash | Google | $0.30 | $2.50 | N/A | 1M | 82/100 | ~150 tok/s
Gemini 2.5 Pro | Google | $1.25 | $10.00 | N/A | 1M | 89/100 | ~80 tok/s
★ GPT-4o | OpenAI | $2.50 | $10.00 | $1.25 | 128K | 90/100 | ~90 tok/s
Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $1.50 | 200K | 88/100 | ~80 tok/s
Claude Opus 4.6 | Anthropic | $15.00 | $75.00 | $7.50 | 200K | 92/100 | ~30 tok/s
GPT-4.5 | OpenAI | $75.00 | $150.00 | $37.50 | 128K | 93/100 | ~40 tok/s

Highlighted row (★) = subject of this page. Prices verified May 2026. Quality scores are KickLLM aggregate estimates across public benchmarks.

Key Takeaways from the Comparison

When to Use GPT-4o vs Alternatives: Decision Matrix

The right model choice depends on your quality threshold, output volume, and latency requirements. Use this matrix to route workloads efficiently.

Workload Type | GPT-4o | GPT-4o-mini | Claude Sonnet 4.6 | Gemini 2.5 Pro | Recommendation
Customer-facing chatbot (quality errors are visible, real-time) | Use | Test first | Use | Use | GPT-4o or Sonnet 4.6; A/B test mini for deflection
Classification / routing (intent detection, triage, tagging) | Overkill | Use | Overkill | Overkill | GPT-4o-mini saves 94% vs GPT-4o
Code generation / debugging (multi-step, tool-use, agentic) | Use | Insufficient | Strong | Use | Claude Sonnet 4.6 leads on code; GPT-4o close second
Document summarization (long PDFs, reports, contracts) | Use | Test | Preferred | Preferred | Gemini 2.5 Pro wins on 1M context; Sonnet for quality
Batch data extraction (async, non-real-time, high volume) | Batch API | Batch API | Batch API | No batch | GPT-4o Batch API at $1.25/$5.00 is competitive
Vision / image understanding (screenshots, charts, diagrams) | Native | Native | Native | Native | GPT-4o excels; mini good for simple extraction
Research / complex reasoning (multi-step, expert-level tasks) | Sub-optimal | Insufficient | Opus 4.6 | Sub-optimal | Upgrade to Claude Opus 4.6 or GPT-4.5 for frontier reasoning
RAG over large corpus (retrieval, 100K+ token context) | Use | With caching | 200K context | 1M context | Gemini 2.5 Pro or Sonnet for very long context

GPT-4o Batch API: 50% Off for Async Workloads

The OpenAI Batch API processes requests within a 24-hour window rather than immediately. In exchange, you pay half the standard rate. This is the single most impactful cost lever available for GPT-4o workloads that do not require real-time response.

Batch API Pricing Breakdown

Pricing Type | Input / 1M tokens | Output / 1M tokens | Latency
Standard API | $2.50 | $10.00 | Real-time (<30s)
Batch API | $1.25 | $5.00 | Up to 24 hours

When Batch API Pays Off: A Real Example

Suppose you run a nightly pipeline that extracts structured data from 50,000 product descriptions. Each request: 800 input tokens, 300 output tokens.
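
The cost of that pipeline can be worked out directly from the rates quoted on this page. The sketch below is illustrative only; `nightly_cost` is a hypothetical helper, and the 30-day month is an assumption.

```python
REQUESTS = 50_000
INPUT_TOKENS, OUTPUT_TOKENS = 800, 300  # per request, from the example above


def nightly_cost(input_rate, output_rate):
    """USD cost of one nightly run, given rates in USD per 1M tokens."""
    input_total = REQUESTS * INPUT_TOKENS    # 40,000,000 input tokens
    output_total = REQUESTS * OUTPUT_TOKENS  # 15,000,000 output tokens
    return (input_total * input_rate + output_total * output_rate) / 1_000_000


standard = nightly_cost(2.50, 10.00)  # $100 input + $150 output
batch = nightly_cost(1.25, 5.00)      # half of each

print(f"Standard: ${standard:,.2f}/night")
print(f"Batch:    ${batch:,.2f}/night")
print(f"Savings:  ${standard - batch:,.2f}/night, "
      f"${(standard - batch) * 30:,.2f} over 30 nights")
```

At these volumes the run costs $250 per night at standard rates and $125 per night through the Batch API, a saving of $3,750 over 30 nights.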

Workloads that are good candidates for Batch API: document processing, data extraction, embedding generation, offline classification, content moderation queues, research pipelines, and any scheduled nightly jobs.
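
A rough sketch of how such a job might be prepared and submitted with the official `openai` Python SDK follows. The helper names, the `product-{i}` ID scheme, and the system prompt are assumptions made for illustration; `submit_batch` needs the `openai` package and an `OPENAI_API_KEY`, and is defined here but not called.

```python
import json


def build_batch_lines(descriptions, model="gpt-4o", max_tokens=300):
    """Turn raw texts into Batch API request lines (one JSON object per line)."""
    lines = []
    for i, text in enumerate(descriptions):
        request = {
            "custom_id": f"product-{i}",  # hypothetical ID scheme for matching results
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    # Placeholder instruction; replace with your own prompt.
                    {"role": "system", "content": "Extract structured product data as JSON."},
                    {"role": "user", "content": text},
                ],
                "max_tokens": max_tokens,
            },
        }
        lines.append(json.dumps(request))
    return lines


def submit_batch(jsonl_path):
    """Upload a JSONL file and start a batch with a 24-hour completion window."""
    from openai import OpenAI  # imported lazily so the builder works without the SDK

    client = OpenAI()
    with open(jsonl_path, "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")
    return client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )


lines = build_batch_lines(["Blue cotton t-shirt, size M", "Steel water bottle, 750 ml"])
print(len(lines), "request lines ready")
```

Write the lines to a `.jsonl` file (one per line) before calling `submit_batch`; the `custom_id` is what lets you match each result back to its input once the batch completes.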

Prompt Caching: Saving on Repeated Prefixes

GPT-4o automatically applies prompt caching when a request shares a long prefix with a recent prior request. Cached input tokens cost $1.25/1M instead of $2.50/1M — exactly 50% off. The cache is maintained for approximately one hour of inactivity.

How Prompt Caching Works

If your system prompt is 5,000 tokens and you send 10,000 API calls per day, the first call pays full price. Every subsequent call within the cache window pays half-price on those 5,000 tokens:
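
A back-of-the-envelope version of that calculation is sketched below. It assumes the best case, in which the cache never expires between calls and every call after the first is a hit; real hit rates will be lower. `system_prompt_cost_per_day` is a hypothetical helper.

```python
PROMPT_TOKENS = 5_000          # static system prompt
CALLS_PER_DAY = 10_000
UNCACHED_RATE = 2.50           # USD per 1M input tokens
CACHED_RATE = 1.25             # USD per 1M cached input tokens


def system_prompt_cost_per_day(cache_hits):
    """Daily USD cost of the system prompt alone, given the number of cache hits."""
    misses = CALLS_PER_DAY - cache_hits
    billed = misses * UNCACHED_RATE + cache_hits * CACHED_RATE
    return PROMPT_TOKENS * billed / 1_000_000


no_cache = system_prompt_cost_per_day(0)
best_case = system_prompt_cost_per_day(CALLS_PER_DAY - 1)  # only the first call misses

print(f"No caching: ${no_cache:.2f}/day")
print(f"Best case:  ${best_case:.2f}/day")
print(f"Savings:    ${no_cache - best_case:.2f}/day")
```

Under these assumptions the system prompt costs $125.00 per day uncached and about $62.51 per day with caching, so the ceiling on savings is just under $62.50 per day.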

Prompt caching is automatic — you do not need to change your API calls. Maximize its value by placing static content (system prompts, RAG context, few-shot examples) at the beginning of messages and dynamic content (user messages) at the end.
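
One way to keep the static prefix identical across requests is to build messages through a single helper. The sketch below is an assumption about how you might structure this, not an OpenAI requirement; `build_messages` and the sample strings are hypothetical.

```python
def build_messages(system_prompt, static_context, user_query):
    """Put everything that never changes first, and the per-request query last."""
    return [
        # Static prefix: byte-for-byte identical on every call, so it can be cached.
        {"role": "system", "content": f"{system_prompt}\n\n{static_context}"},
        # Dynamic suffix: changes per request, so it is always billed at full price.
        {"role": "user", "content": user_query},
    ]


SYSTEM_PROMPT = "You are a support assistant for ExampleCo."       # placeholder
STATIC_CONTEXT = "Refund policy: ...\nShipping policy: ..."        # placeholder

first = build_messages(SYSTEM_PROMPT, STATIC_CONTEXT, "Where is my order?")
second = build_messages(SYSTEM_PROMPT, STATIC_CONTEXT, "Can I get a refund?")

# The leading message is identical, so both requests share the same prefix.
print(first[0] == second[0])
```

Avoid interpolating timestamps, user names, or request IDs into the system message, since any change near the start of the prompt prevents the rest of the prefix from matching.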

Cached vs Uncached Input Price

Token Type | Price / 1M tokens | Notes
Standard input (uncached) | $2.50 | All new input tokens
Cached input | $1.25 | Repeated prefix within cache TTL
Standard output | $10.00 | All generated tokens (no caching)

5 Cost Optimization Strategies with Exact Math

These are the five highest-leverage changes you can make to reduce GPT-4o API costs, in rough order of impact for most workloads.

Strategy 01

Route simple tasks to GPT-4o-mini

Audit your API calls. Any call that does classification, extraction, intent detection, or simple Q&A is likely over-spending on GPT-4o. GPT-4o-mini handles these at $0.15/$0.60 per 1M tokens, about 16.7x cheaper on both input and output (roughly 6% of the GPT-4o rate).

If 40% of your GPT-4o calls are simple tasks and you spend $1,000/month:
Routed to mini: 0.40 × $1,000 × (1 - 0.06) = saves ~$376/month

Strategy 02

Use Batch API for all non-real-time workloads

Any pipeline that does not need results in under 30 seconds is a Batch API candidate. Scheduled jobs, nightly reports, offline enrichment, async moderation queues — all qualify. Submit via the /v1/batches endpoint.

If all volume is async: $1,000/month standard → $500/month batch.
If 60% of volume is async: 0.60 × $1,000 × 0.50 = saves $300/month

Strategy 03

Maximize prompt cache hit rate

Structure every message with static content first (system prompt, documents, examples) and dynamic content last (user query). This ensures the maximum prefix is eligible for caching. A 5K-token system prompt with a 95% cache hit rate on 5,000 daily calls saves ~$30/day.

System prompt tokens cached at 95% hit rate, 5K requests/day:
Savings = 0.95 × 5,000 × 5,000 / 1M × ($2.50 - $1.25) = $29.69/day

Strategy 04

Compress system prompts aggressively

Every token in your system prompt is billed on every request. A 3,000-token system prompt trimmed to 1,000 tokens (a 67% reduction) saves $0.005 per call, or $150/month at 1,000 requests/day. Techniques: remove redundant examples, use structured formats, cut explanations that are clear from context.

Trim 2K tokens/request, 1K req/day, 30 days:
2,000 / 1M × $2.50 × 1,000 × 30 = saves $150/month

Strategy 05

Set max_tokens to bound output costs

Output tokens cost 4x input tokens. Unbounded generation is the most common source of cost surprises. Set max_tokens to the minimum your use case requires. For classification: 5–20 tokens. For summaries: 200–500. For code generation: 1,000–2,000. Add output length monitoring to catch regressions.

Avg output 800 tokens → capped at 300 tokens, 10K req/day:
500 tokens saved × 10K × 30 / 1M × $10 = saves $1,500/month
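
The five calculations above can be reproduced in one short script, which makes it easy to substitute your own volumes. This is only a sketch: the variable names are made up for this example, the inputs are the illustrative figures from each strategy, and the units differ (Strategy 03 is per day, the rest per month).

```python
IN_RATE, OUT_RATE, CACHED_RATE = 2.50, 10.00, 1.25  # GPT-4o, USD per 1M tokens
MINI_IN_RATE = 0.15                                  # GPT-4o-mini input, USD per 1M
DAYS = 30                                            # assumed billing month
MONTHLY_SPEND = 1_000.0                              # example GPT-4o bill in USD

# GPT-4o-mini costs 6% of GPT-4o (0.15/2.50 on input; 0.60/10.00 is the same ratio).
mini_ratio = MINI_IN_RATE / IN_RATE

savings = {
    "01 route 40% to mini ($/month)": 0.40 * MONTHLY_SPEND * (1 - mini_ratio),
    "02 batch 60% of volume ($/month)": 0.60 * MONTHLY_SPEND * 0.50,
    "03 cache 5K-token prompt ($/day)": 0.95 * 5_000 * 5_000 / 1e6 * (IN_RATE - CACHED_RATE),
    "04 trim 2K prompt tokens ($/month)": 2_000 / 1e6 * IN_RATE * 1_000 * DAYS,
    "05 cap output at 300 tokens ($/month)": 500 * 10_000 * DAYS / 1e6 * OUT_RATE,
}

for name, amount in savings.items():
    print(f"{name}: ${amount:,.2f}")
```

These figures are not additive: tokens moved to GPT-4o-mini or the Batch API are no longer billed at the standard GPT-4o rate, so apply the strategies in sequence when estimating a combined saving.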

GPT-4o Monthly Cost Estimates by Usage Tier

Reference costs at common usage levels over a 30-day month. All figures use a 2:1 input-to-output token ratio (about 67/33), typical for conversational workloads.

Tier | Requests/Day | Avg Input Tokens | Avg Output Tokens | Monthly Cost (Standard) | Monthly Cost (Batch)
Hobby / Side Project | 50 | 800 | 400 | $9.00 | $4.50
Small SaaS (launch) | 500 | 1,200 | 600 | $135.00 | $67.50
Medium SaaS | 5,000 | 1,500 | 750 | $1,687.50 | $843.75
Production App | 50,000 | 2,000 | 1,000 | $22,500.00 | $11,250.00
Enterprise Scale | 500,000 | 2,000 | 1,000 | $225,000.00 | $112,500.00

At the Enterprise Scale tier, consider negotiating a committed-use discount directly with OpenAI — volume commitments typically unlock 15%–30% off list pricing. The Batch API provides the easiest guaranteed 50% reduction for eligible workloads.
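
To estimate a tier that is not listed, a monthly figure can be derived from requests per day and average token counts. The sketch below assumes a 30-day month and the list prices on this page; `monthly_cost` is a hypothetical helper, and the 1,000 requests/day workload is just an example.

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30, batch=False):
    """Estimated USD spend per month for a steady workload."""
    # Rates in USD per 1M tokens; the Batch API is half the standard price.
    in_rate, out_rate = (1.25, 5.00) if batch else (2.50, 10.00)
    per_request = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return per_request * requests_per_day * days


# Example: 1,000 requests/day at 1,000 input + 500 output tokens each.
standard = monthly_cost(1_000, 1_000, 500)
batched = monthly_cost(1_000, 1_000, 500, batch=True)

print(f"Standard: ${standard:,.2f}/month")
print(f"Batch:    ${batched:,.2f}/month")
```

That example workload comes to $225.00 per month at standard rates and $112.50 through the Batch API, before any caching or committed-use discount.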

GPT-4o Cost Per Common Task

Task | Input Tokens | Output Tokens | Standard Cost | Batch Cost
Email reply draft | ~400 | ~300 | $0.0040 | $0.0020
1-page document summary | ~800 | ~400 | $0.0060 | $0.0030
Customer support response | ~1,200 | ~500 | $0.0080 | $0.0040
Code function generation | ~800 | ~600 | $0.0080 | $0.0040
10-turn conversation | ~8,000 | ~2,500 | $0.0450 | N/A (real-time)
10K token research brief | ~3,000 | ~2,500 | $0.0325 | $0.0163
Classify 1,000 records | ~300K | ~10K | $0.85 | $0.425
Full codebase review (20K tokens) | ~20,000 | ~4,000 | $0.09 | $0.045

GPT-4o API: Code Example with Cost Tracking

Use the usage field from the API response to calculate exact costs per call. The pattern below works for all OpenAI models — swap in the per-token rates for the model you are using.

from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from env

# GPT-4o pricing (per 1M tokens)
GPT4O_INPUT_RATE  = 2.50   # standard input
GPT4O_OUTPUT_RATE = 10.00  # standard output
GPT4O_CACHED_RATE = 1.25   # cached input (prompt caching)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "Summarize this document: ..."}
    ],
    max_tokens=500  # always set to bound output cost
)

usage = response.usage
details = getattr(usage, "prompt_tokens_details", None)
# cached_tokens can be missing or None when nothing was served from cache
cached_tokens = (getattr(details, "cached_tokens", 0) or 0) if details else 0
uncached_tokens = usage.prompt_tokens - cached_tokens

input_cost  = (uncached_tokens / 1_000_000) * GPT4O_INPUT_RATE
cached_cost = (cached_tokens   / 1_000_000) * GPT4O_CACHED_RATE
output_cost = (usage.completion_tokens / 1_000_000) * GPT4O_OUTPUT_RATE
total_cost  = input_cost + cached_cost + output_cost

print(f"Tokens in : {usage.prompt_tokens:,} ({cached_tokens:,} cached)")
print(f"Tokens out: {usage.completion_tokens:,}")
print(f"Cost      : ${total_cost:.6f}")
# Example: Tokens in: 1,248 (800 cached) | Tokens out: 412 | Cost: $0.006240

Frequently Asked Questions

How much does GPT-4o cost per 1M tokens?

GPT-4o costs $2.50 per 1M input tokens and $10.00 per 1M output tokens at standard (real-time) pricing as of May 2026. OpenAI's Batch API halves both rates to $1.25 input / $5.00 output for asynchronous workloads. Prompt caching discounts repeated input prefixes to $1.25/1M. These rates have been stable since the model's pricing revision in late 2025.

How much does GPT-4o cost per 1K tokens?

GPT-4o costs $0.0025 per 1K input tokens and $0.01 per 1K output tokens at standard pricing. A typical API call with 1,000 input + 500 output tokens costs $0.0075 total. With the Batch API: $0.00125 per 1K input tokens and $0.005 per 1K output tokens. These numbers are easy to sanity-check: divide the per-1M price by 1,000 to get the per-1K rate.

Is GPT-4o cheaper than Claude Sonnet 4.6?

Yes. At list prices GPT-4o is cheaper on both input ($2.50 vs $3.00/1M) and output ($10.00 vs $15.00/1M); how much cheaper depends on the workload composition. For a typical chatbot with a 70% input / 30% output token split, GPT-4o is about 28% cheaper overall ($4.75 vs $6.60 per blended 1M tokens). For code generation or long-form writing (output-heavy), the gap approaches 33%; for input-heavy workloads it narrows toward 17%. Claude Sonnet 4.6 has a 200K context window vs GPT-4o's 128K, which matters for large document workloads.
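
Because the gap depends on the input/output mix, it can help to compute it for your own split. The sketch below uses the list prices from the comparison table; `blended_rate` and `gpt4o_discount` are hypothetical helpers, and the splits shown are arbitrary examples.

```python
def blended_rate(input_rate, output_rate, input_share):
    """USD per 1M tokens for a workload where `input_share` of tokens are input."""
    return input_share * input_rate + (1 - input_share) * output_rate


def gpt4o_discount(input_share):
    """Fraction by which GPT-4o undercuts Claude Sonnet 4.6 at a given token split."""
    gpt4o = blended_rate(2.50, 10.00, input_share)
    sonnet = blended_rate(3.00, 15.00, input_share)
    return 1 - gpt4o / sonnet


for share in (0.9, 0.7, 0.5, 0.3):
    print(f"{share:.0%} input tokens: GPT-4o is {gpt4o_discount(share):.1%} cheaper")
```

The discount is bounded by the two per-token price ratios: it can never be smaller than the input-side gap or larger than the output-side gap.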

How much cheaper is GPT-4o-mini than GPT-4o?

GPT-4o-mini costs $0.15/1M input and $0.60/1M output, about 16.7x cheaper than GPT-4o on both. For a workload spending $1,000/month on GPT-4o, routing everything to mini drops that to roughly $60/month if quality is comparable. GPT-4o-mini scores approximately 79/100 on aggregate benchmarks versus 90/100 for GPT-4o, so the quality drop is real. For classification, extraction, routing, and simple Q&A, mini is almost always the right call. For complex reasoning, nuanced writing, or multi-step tool use, GPT-4o or higher is required.

How does the GPT-4o Batch API work?

The Batch API accepts a JSONL file of up to 50,000 requests and processes them within 24 hours at 50% off standard rates. GPT-4o batch pricing: $1.25/1M input, $5.00/1M output. You submit via POST /v1/batches, poll for status with GET /v1/batches/{batch_id}, and retrieve results via a file download. Ideal for: document processing pipelines, nightly data enrichment, offline classification, and any job where 24-hour latency is acceptable. Not suitable for: customer-facing chat, real-time suggestions, or interactive applications.

How does prompt caching work for GPT-4o?

Prompt caching automatically discounts repeated input prefixes. When your request begins with tokens that match a recently cached sequence, those tokens are billed at $1.25/1M (half the standard $2.50). The cache is maintained for approximately 1 hour of inactivity. To maximize hits, put your system prompt and any static context first and the dynamic user message last. A 10K-token system prompt with a 95% cache hit rate on 5,000 daily calls saves roughly $59/day (about $1,780/month) purely from cache hits on the static portion. The usage.prompt_tokens_details.cached_tokens field shows how many tokens were served from cache.

Which should I use: GPT-4o-mini, GPT-4o, or GPT-4.5?

GPT-4o-mini ($0.15/$0.60): classification, extraction, routing, simple Q&A, high-volume low-stakes tasks. Your default starting point; only upgrade if quality testing reveals failures.

GPT-4o ($2.50/$10.00): complex reasoning, customer-facing chat, multi-step tool use, vision, nuanced writing. The balanced frontier choice, with strong quality and without GPT-4.5's premium.

GPT-4.5 ($75/$150): frontier research, elite writing quality, tasks where state-of-the-art performance is the product's core value proposition. It costs 30x as much as GPT-4o on input and 15x on output for about 3 quality points in the comparison table above. Hard to justify unless you've exhausted GPT-4o's capability ceiling.

How much does GPT-4o cost per month?

Monthly cost varies significantly by volume and configuration. Reference points at a 2:1 input/output token ratio over a 30-day month:

Small SaaS (500 req/day, 1,200/600 tokens): ~$135/month standard, ~$67.50/month batch
Medium SaaS (5K req/day, 1,500/750 tokens): ~$1,687.50/month standard, ~$843.75/month batch
Production (50K req/day, 2,000/1,000 tokens): ~$22,500/month standard, ~$11,250/month batch

Tip: at Production-scale spend and above, contact OpenAI sales about a volume commitment discount (typically 15–30% off). Use the cost calculator section above to model your exact scenario.

Pricing data sourced from OpenAI, Anthropic, and Google pricing pages. LLM pricing changes frequently — always verify current rates before finalizing budget estimates.
