Question 1

How much have LLM API prices dropped since 2023?

Accepted Answer

LLM API prices have dropped approximately 100x at the frontier tier and over 500x at the budget tier since early 2023. GPT-3.5 Turbo launched at $2.00/1M input tokens and equivalent-capability models now cost $0.01-0.03/1M tokens. GPT-4 launched at $30/1M input tokens; today's comparable-quality GPT-4o costs $2.50/1M tokens (12x cheaper). The biggest drops came from competition, hardware improvements, and distillation techniques.

Question 2

Why did LLM prices drop so dramatically?

Accepted Answer

Four main factors drove the price collapse: (1) Competition from open-source models (Llama, Mistral) forced proprietary providers to lower prices; (2) Hardware improvements including custom chips (Google TPU v5, AWS Trainium) reduced inference costs; (3) Architectural innovations like Mixture-of-Experts reduced compute per token; (4) Distillation and quantization techniques enabled smaller models to match larger model quality at a fraction of the cost.

Question 3

Which LLM API is cheapest in 2026?

Accepted Answer

As of April 2026, the cheapest LLM APIs are: Gemini 2.5 Flash at $0.15/1M input tokens, GPT-4o mini at $0.15/1M, DeepSeek V3 at $0.27/1M, and Mistral Small 3.1 at $0.10/1M. For self-hosted options, Llama 3.3 70B on Groq or Together AI can cost under $0.50/1M tokens. The cheapest option depends on your quality requirements, latency needs, and volume.

Question 4

Will LLM prices continue to drop?

Accepted Answer

Prices are expected to continue declining but at a slower rate. The rapid 100x drops from 2023-2025 were driven by one-time factors (competition entry, architecture shifts). Future reductions will come from incremental hardware improvements, better quantization, and speculative decoding. Reasoning models (o3, DeepSeek R1) may actually cost more per token due to extended inference compute, creating a two-tier pricing structure.

Question 5

How do I estimate my LLM API costs?

Accepted Answer

Use KickLLM's free calculator at kickllm.com to estimate costs. Key inputs: average input tokens per request, average output tokens, requests per day, and your chosen model. Common benchmarks: a typical chatbot message is 100-500 input tokens and 200-800 output tokens. A RAG query with context is 2,000-8,000 input tokens. Batch processing with long documents can be 50,000-128,000 input tokens per request.

Date	Model	Provider	Input $/1M	Output $/1M	Event	Drop from Prior
2023-03	GPT-4 (8K)	OpenAI	$30.00	$60.00	Launch	—
2023-03	GPT-3.5 Turbo	OpenAI	$2.00	$2.00	Launch	—
2023-03	Claude 1	Anthropic	$11.02	$32.68	Launch	—
2023-07	Claude 2	Anthropic	$8.00	$24.00	Launch	-27%
2023-07	Llama 2 70B	Meta	Free*	Free*	Open-source release	—
2023-11	GPT-4 Turbo	OpenAI	$10.00	$30.00	Launch (3x cheaper than GPT-4)	-67%
2023-11	GPT-3.5 Turbo (new)	OpenAI	$1.00	$2.00	Price cut	-50%
2024-02	Gemini 1.0 Pro	Google	$0.50	$1.50	Launch	—
2024-03	Claude 3 Opus	Anthropic	$15.00	$75.00	Launch (frontier)	—
2024-03	Claude 3 Sonnet	Anthropic	$3.00	$15.00	Launch	—
2024-03	Claude 3 Haiku	Anthropic	$0.25	$1.25	Launch (budget tier)	—
2024-04	Llama 3 70B	Meta	Free*	Free*	Open-source release	—
2024-05	GPT-4o	OpenAI	$5.00	$15.00	Launch (50% cheaper than Turbo)	-50%
2024-05	Gemini 1.5 Pro	Google	$3.50	$10.50	Launch	—
2024-05	Gemini 1.5 Flash	Google	$0.35	$1.05	Launch	—
2024-07	GPT-4o mini	OpenAI	$0.15	$0.60	Launch (replaces 3.5 Turbo)	-97% vs GPT-4
2024-07	Llama 3.1 405B	Meta	Free*	Free*	Largest open model	—
2024-09	o1-preview	OpenAI	$15.00	$60.00	Reasoning model launch	—
2024-10	GPT-4o (reduced)	OpenAI	$2.50	$10.00	Price cut	-50%
2024-10	Claude 3.5 Sonnet	Anthropic	$3.00	$15.00	Launch (same price, better quality)	—
2024-11	Gemini 1.5 Flash (reduced)	Google	$0.075	$0.30	Price cut	-79%
2024-11	Gemini 1.5 Pro (reduced)	Google	$1.25	$5.00	Price cut	-64%
2024-12	DeepSeek V3	DeepSeek	$0.27	$1.10	Launch (frontier quality at flash price)	—
2025-01	DeepSeek R1	DeepSeek	$0.55	$2.19	Open reasoning model	—
2025-02	GPT-4.5	OpenAI	$75.00	$150.00	Launch (premium research tier)	—
2025-03	Gemini 2.5 Pro	Google	$1.25	$10.00	Launch	—
2025-03	Gemini 2.5 Flash	Google	$0.15	$0.60	Launch	—
2025-04	o3	OpenAI	$10.00	$40.00	Launch	-33% vs o1
2025-04	o4-mini	OpenAI	$1.10	$4.40	Launch (budget reasoning)	—
2025-06	Claude Opus 4.6	Anthropic	$15.00	$75.00	Launch (frontier)	—
2025-06	Claude Sonnet 4	Anthropic	$3.00	$15.00	Launch	—
2025-10	Claude Haiku 3.5	Anthropic	$0.80	$4.00	Launch	—
2025-10	Mistral Small 3.1	Mistral	$0.10	$0.30	Launch	—

Milestone	Date	Significance
GPT-4 Launch	2023-03	Set the $30/1M input benchmark for frontier models
Llama 2 Open-Source	2023-07	First competitive open-weights model, triggered price war
GPT-3.5 Turbo price halved	2023-11	Signal that budget tier would race to zero
GPT-4o mini at $0.15	2024-07	97% cheaper than original GPT-4 with comparable quality
DeepSeek V3 at $0.27	2024-12	Frontier-quality open model at flash prices
Gemini 2.5 Flash at $0.15	2025-03	Google matches GPT-4o mini price with 1M context
GPT-4.5 at $75	2025-02	Premium research tier creates new price ceiling

LLM Pricing History — How API Costs Dropped 100x Since 2023

Methodology

Key Milestones

Frequently Asked Questions