Vision Token Cost Calculator

Estimate how many tokens an image costs and translate that into USD across GPT-4o, Claude, and Gemini. Adjust image dimensions, detail level, and model to see cost in real time.

Pixels per side. Common: 512×512, 768×768, 1024×1024.
High resamples into 512px tiles; Low uses a single fixed tile.
Batch cost across N identical images.
Add prompt/answer text tokens to the total cost.

Live estimate

Image + text cost (USD)
$0.0000
0 image tokens + 0 text tokens
Image tokens (per image): 0
Tiles processed: 0
All images tokens: 0
Text tokens: 0
Total tokens: 0
Price per 1M tokens: $0.00
Same image on GPT-4o: $0.00
Same image on Claude Sonnet: $0.00
Same image on Gemini Flash: $0.00

How vision token cost is calculated

Multimodal models do not read pixels directly. They slice an image into fixed-size tiles and embed each tile, and the provider bills the resulting tokens just like text. This vision token cost calculator makes that hidden cost visible before you send a batch of images to an API.

The tile math

OpenAI's GPT-4o handles the two detail levels differently. At High detail, the image is scaled to fit within 2048×2048, then scaled so the shortest side is 768px, and the result is split into 512×512 tiles. Each tile costs 170 tokens, and every image adds a fixed base of 85 tokens. At Low detail, the image costs a flat 85 tokens regardless of its size. So a 1024×1024 photo at High detail is scaled to 768×768 and becomes four tiles: 4 × 170 + 85 = 765 tokens. Larger documents with many tiles add up quickly.
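The tile calculation can be sketched in a few lines of Python. This is an illustration, not the calculator's own source: it assumes OpenAI's documented GPT-4o constants (an 85-token base plus 170 tokens per 512px tile), and the function name `gpt4o_image_tokens`, the rounding of scaled dimensions, and the choice never to upscale small images are assumptions made for this example.

```python
import math

BASE_TOKENS = 85        # flat per-image cost; the whole cost at Low detail
TOKENS_PER_TILE = 170   # cost of each 512x512 tile at High detail
TILE = 512              # tile edge in pixels


def gpt4o_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate GPT-4o image tokens (hypothetical helper for illustration)."""
    if detail == "low":
        return BASE_TOKENS

    # Step 1: fit inside a 2048x2048 square, keeping the aspect ratio.
    # Assumption: images that already fit are not enlarged.
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)

    # Step 2: shrink so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count the 512px tiles needed to cover the scaled image.
    tiles = math.ceil(w / TILE) * math.ceil(h / TILE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


print(gpt4o_image_tokens(1024, 1024))          # 765
print(gpt4o_image_tokens(2048, 4096))          # 1105
print(gpt4o_image_tokens(1024, 1024, "low"))   # 85
```

The 2048×4096 case scales to 1024×2048, then to 768×1536, which needs 2 × 3 = 6 tiles: 6 × 170 + 85 = 1105 tokens.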

Claude and Gemini differences

For Anthropic's Claude models, vision tokens can be approximated as roughly (width × height) / 750 for typical screenshots, with a 1.6x safety multiplier applied to dense content. Google's Gemini is estimated at about (width × height) / 285 tokens for inline images. This tool uses the conservative tile model for GPT-4o and the per-pixel approximations for Claude and Gemini, so the comparison row shows how the same picture costs different amounts on each provider.
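A rough Python sketch of the per-pixel approximations and the conversion to USD follows. The divisors 750 and 285 and the 1.6x multiplier come from the text above; the function names, the rounding up to whole tokens, and the $2.50 per 1M tokens price in the usage lines are hypothetical values chosen for illustration, not published rates.

```python
import math


def claude_image_tokens(width: int, height: int, dense: bool = False) -> int:
    """Approximate Claude image tokens as (width * height) / 750."""
    tokens = width * height / 750
    if dense:
        tokens *= 1.6  # safety multiplier for dense content, per the text
    return math.ceil(tokens)  # assumption: round up to whole tokens


def gemini_image_tokens(width: int, height: int) -> int:
    """Approximate Gemini inline-image tokens as (width * height) / 285."""
    return math.ceil(width * height / 285)


def cost_usd(image_tokens: int, n_images: int, text_tokens: int,
             price_per_million: float) -> float:
    """Total cost for N identical images plus prompt/answer text tokens."""
    total_tokens = image_tokens * n_images + text_tokens
    return total_tokens * price_per_million / 1_000_000


print(claude_image_tokens(1024, 1024))               # 1399
print(claude_image_tokens(1024, 1024, dense=True))   # 2237
print(gemini_image_tokens(1024, 1024))               # 3680

# Ten 765-token images plus 500 text tokens at a hypothetical $2.50 per 1M.
print(f"${cost_usd(765, 10, 500, 2.50):.6f}")        # $0.020375
```

Keeping the token estimate separate from the price lookup means the same `cost_usd` helper works for any provider once its per-image token count is known.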

Practical cost-saving tips