What Are the Context Window Sizes in 2026?

Gemini 2.0 Pro: 2M tokens. Claude (Opus/Sonnet): 200K tokens. GPT-4o: 128K tokens. Llama 3.1: 128K. Mistral Large: 128K. For most models, retrieval quality degrades noticeably beyond ~32K tokens.

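Before choosing a model for its window, it helps to measure your actual prompt. Here is a minimal sketch using OpenAI's tiktoken tokenizer as a rough proxy; Anthropic, Google, and Meta use different tokenizers, so counts for their models are estimates, and the window figures are simply taken from the table below.

```python
# pip install tiktoken
import tiktoken

# Advertised context windows (tokens), from the comparison table below.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-opus": 200_000,
    "gemini-2.0-pro": 2_000_000,
}

def fits_in_window(text: str, model: str = "gpt-4o") -> bool:
    """Estimate whether `text` fits in a model's advertised window.

    Uses the o200k_base encoding (GPT-4o's) as a proxy for all models;
    other providers tokenize differently, so this is approximate.
    """
    enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(text)) <= CONTEXT_WINDOWS[model]

print(fits_in_window("Summarize this contract: ..."))  # True
```
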
Full Comparison Table

| Model | Context Window (tokens) | ~Pages of Text | Provider |
|---|---|---|---|
| Gemini 2.0 Pro | 2,000,000 | ~6,154 | Google |
| Gemini 2.0 Flash | 1,000,000 | ~3,077 | Google |
| Claude Opus/Sonnet | 200,000 | ~615 | Anthropic |
| GPT-4o | 128,000 | ~394 | OpenAI |
| GPT-4o Mini | 128,000 | ~394 | OpenAI |
| Llama 3.1 (all sizes) | 128,000 | ~394 | Meta |
| Mistral Large | 128,000 | ~394 | Mistral |
| Mistral Small 3.1 | 128,000 | ~394 | Mistral |
| DeepSeek V3 | 128,000 | ~394 | DeepSeek |
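
The pages column assumes roughly 325 tokens per page, a ratio implied by the table itself (128,000 tokens / 394 pages ≈ 325). A quick sketch that reproduces those estimates:

```python
TOKENS_PER_PAGE = 325  # implied by the table: 128,000 / 394 ≈ 325

def tokens_to_pages(tokens: int) -> int:
    """Convert a token count to an approximate page count."""
    return round(tokens / TOKENS_PER_PAGE)

for name, window in [
    ("Gemini 2.0 Pro", 2_000_000),
    ("Claude Opus/Sonnet", 200_000),
    ("GPT-4o", 128_000),
]:
    print(f"{name}: ~{tokens_to_pages(window):,} pages")
# Gemini 2.0 Pro: ~6,154 pages
# Claude Opus/Sonnet: ~615 pages
# GPT-4o: ~394 pages
```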

The "Lost in the Middle" Problem

Research shows that LLMs retrieve information most reliably from the beginning and end of the context window but struggle with information buried in the middle (Liu et al., 2023, "Lost in the Middle: How Language Models Use Long Contexts"). This means a 128K context window doesn't give you 128K tokens of equally reliable retrieval: for most models, recall starts degrading noticeably beyond ~32K tokens.
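
A common mitigation in retrieval pipelines is to reorder retrieved documents so the most relevant ones sit at the edges of the prompt, leaving the middle for material you can afford to lose. A minimal sketch; the alternating placement below is one simple strategy, not a standard API:

```python
def reorder_for_long_context(docs: list[str]) -> list[str]:
    """Place the most relevant documents at the edges of the prompt.

    Assumes `docs` is sorted by descending relevance. Alternates items
    between the front and the back so the least relevant material ends
    up in the middle, where recall is weakest.
    """
    front: list[str] = []
    back: list[str] = []
    for i, doc in enumerate(docs):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

# docs sorted most-relevant-first: A > B > C > D > E
print(reorder_for_long_context(["A", "B", "C", "D", "E"]))
# ['A', 'C', 'E', 'D', 'B']  -> top items (A, B) sit at the edges
```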

Practical Implications

Context Window vs. Effective Context

The advertised context window is the maximum. The effective context — where the model reliably retrieves and reasons about information — is typically much smaller. Claude and Gemini have shown the best long-context performance in benchmarks like NIAH (Needle In A Haystack), but all models degrade with length.
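
You can measure effective context yourself with a scaled-down NIAH probe: hide a fact at varying depths inside filler text and check whether the model recalls it. A minimal sketch, where `complete` is a placeholder for whatever provider call you use (it is not a real client method):

```python
import random

def make_haystack(needle: str, n_filler: int, depth: float) -> str:
    """Build a long context with `needle` inserted at a relative depth.

    depth=0.0 places the needle at the start, 1.0 at the end, and 0.5
    in the middle, where recall tends to be weakest.
    """
    filler = ["The sky was clear and the market was quiet that day."] * n_filler
    filler.insert(int(depth * n_filler), needle)
    return " ".join(filler)

def niah_trial(complete, depth: float) -> bool:
    """Run one needle-in-a-haystack trial.

    `complete` is any function that takes a prompt string and returns
    the model's text response (e.g. a wrapper around your API client).
    """
    secret = str(random.randint(100_000, 999_999))
    needle = f"The magic number is {secret}."
    prompt = (
        make_haystack(needle, n_filler=5_000, depth=depth)
        + "\n\nWhat is the magic number? Answer with digits only."
    )
    return secret in complete(prompt)

# Sweep depths to see where recall falls off:
# for d in (0.0, 0.25, 0.5, 0.75, 1.0):
#     print(d, niah_trial(call_model, d))  # call_model: your API wrapper
```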

Calculate your LLM API costs with KickLLM — free, no sign-up required.