# How Many LLM Tokens Per Word?
English averages about 1.3 tokens per word, so 1,000 words ≈ 1,300 tokens. Code averages about 1.5 tokens per word. Chinese and Japanese run 1-2 tokens per character.
## Quick Reference Table
| Content Type | Tokens per Word | Words per 1K Tokens |
|---|---|---|
| English (conversational) | ~1.2 | ~830 |
| English (formal/technical) | ~1.3 | ~770 |
| Code (Python, JS) | ~1.5 | ~670 |
| Code (verbose: Java, C#) | ~1.7 | ~590 |
| JSON/structured data | ~2.0 | ~500 |
| Chinese | 1-2 per character | ~500-1000 chars |
| Japanese | 1-2 per character | ~500-1000 chars |
| Spanish/French | ~1.4 | ~710 |
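The ratios in the table translate directly into a back-of-the-envelope estimator. The sketch below is illustrative, not from any library; `estimate_tokens` and the `TOKENS_PER_WORD` keys are names made up for this example.

```python
# Rough token estimator built from the ratios in the table above.
# These are averages; real counts vary by tokenizer and content.
TOKENS_PER_WORD = {
    "english_conversational": 1.2,
    "english_technical": 1.3,
    "code": 1.5,
    "code_verbose": 1.7,
    "json": 2.0,
    "spanish_french": 1.4,
}

def estimate_tokens(word_count: int, content_type: str = "english_technical") -> int:
    """Return an approximate token count for the given word count."""
    return round(word_count * TOKENS_PER_WORD[content_type])

print(estimate_tokens(1000))           # 1300 for formal English
print(estimate_tokens(1000, "json"))   # 2000 for structured data
```

Treat the result as a planning number only; the same text can tokenize differently across models.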
## Common Text Length Estimates
| Text Length | Approximate Tokens |
|---|---|
| 1 sentence | 15-25 tokens |
| 1 paragraph | 50-100 tokens |
| 1 page (250 words) | ~325 tokens |
| 1,000 words (blog post) | ~1,300 tokens |
| 5,000 words (long article) | ~6,500 tokens |
| 80,000 words (novel) | ~100,000 tokens |
## Why Tokens Matter for Cost
LLM APIs charge per token, not per word. When estimating costs, use the 1.3× multiplier for English text. For example, a 2,000-word document sent to GPT-4o is about 2,000 × 1.3 = 2,600 input tokens, or roughly $0.0065 at $2.50 per million input tokens.
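The arithmetic above can be wrapped in a small helper. This is a sketch: the function name is made up, and the default $2.50-per-million rate simply mirrors the example; check your provider's current pricing before relying on it.

```python
# Approximate input cost from word count using the 1.3 tokens-per-word rule.
def estimate_input_cost(word_count: int,
                        tokens_per_word: float = 1.3,
                        price_per_million: float = 2.50) -> float:
    """Return an approximate input cost in dollars for an English document."""
    tokens = word_count * tokens_per_word          # 2,000 words -> 2,600 tokens
    return tokens / 1_000_000 * price_per_million  # 2,600 tokens -> $0.0065

print(f"${estimate_input_cost(2000):.4f}")  # $0.0065
```

Output tokens are usually priced separately (and higher), so budget those on top of this figure.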
## How Tokenization Works
LLMs use subword tokenization (BPE or similar). Common words like "the" are one token. Uncommon words are split into parts: "tokenization" might be "token" + "ization" (2 tokens). Numbers, punctuation, and whitespace all consume tokens. This is why the ratio varies by content type.
Calculate your LLM API costs with KickLLM — free, no sign-up required.