How Many LLM Tokens Per Word?

English averages about 1.3 tokens per word, so 1,000 words ≈ 1,300 tokens. Code averages about 1.5 tokens per word. In Chinese and Japanese, 1 character is typically 1-2 tokens.

Quick Reference Table

| Content Type | Tokens per Word | Words per 1K Tokens |
|---|---|---|
| English (conversational) | ~1.2 | ~830 |
| English (formal/technical) | ~1.3 | ~770 |
| Code (Python, JS) | ~1.5 | ~670 |
| Code (verbose: Java, C#) | ~1.7 | ~590 |
| JSON/structured data | ~2.0 | ~500 |
| Chinese | 1-2 per character | ~500-1,000 characters |
| Japanese | 1-2 per character | ~500-1,000 characters |
| Spanish/French | ~1.4 | ~710 |
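The ratios above can be wrapped in a small estimator. This is a sketch, not tokenizer output: the ratios are the heuristics from the table, and the category names are my own labels.

```python
# Rough token estimator based on the per-word ratios in the table above.
# These are heuristics, not exact tokenizer counts.

TOKENS_PER_WORD = {
    "english_conversational": 1.2,
    "english_formal": 1.3,
    "code": 1.5,
    "code_verbose": 1.7,
    "json": 2.0,
    "spanish_french": 1.4,
}

def estimate_tokens(word_count: int, content_type: str = "english_formal") -> int:
    """Estimate token count from a word count using a heuristic ratio."""
    return round(word_count * TOKENS_PER_WORD[content_type])

print(estimate_tokens(1000))          # 1300 tokens for formal English
print(estimate_tokens(1000, "code"))  # 1500 tokens for typical code
```

For exact counts, run the actual tokenizer for your model; the heuristic is only for quick budgeting.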

Common Text Length Estimates

| Text Length | Approximate Tokens |
|---|---|
| 1 sentence | 15-25 |
| 1 paragraph | 50-100 |
| 1 page (250 words) | ~325 |
| 1,000 words (blog post) | ~1,300 |
| 5,000 words (long article) | ~6,500 |
| 80,000 words (novel) | ~100,000 |

Why Tokens Matter for Cost

LLM APIs charge per token, not per word. If you're estimating costs, use the 1.3x multiplier for English text. For example, processing a 2,000-word document with GPT-4o costs roughly: 2,000 × 1.3 = 2,600 input tokens = $0.0065 (at $2.50/1M tokens).
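The arithmetic above generalizes to a one-line cost function. The $2.50/1M rate is just the example price quoted here; substitute your provider's current pricing.

```python
# Worked cost estimate from the paragraph above: words -> tokens -> dollars.
# The default rate ($2.50 per 1M input tokens) is the example price in the text.

def estimate_input_cost(words: int, tokens_per_word: float = 1.3,
                        price_per_million: float = 2.50) -> float:
    """Estimate input-token cost in dollars for a given word count."""
    tokens = words * tokens_per_word
    return tokens / 1_000_000 * price_per_million

cost = estimate_input_cost(2000)
print(f"${cost:.4f}")  # $0.0065 for a 2,000-word document
```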

How Tokenization Works

LLMs use subword tokenization (BPE or similar). Common words like "the" are one token. Uncommon words are split into parts: "tokenization" might be "token" + "ization" (2 tokens). Numbers, punctuation, and whitespace all consume tokens. This is why the ratio varies by content type.
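The "the" vs. "tokenization" behavior can be illustrated with a toy greedy longest-match splitter. Real BPE learns its merge rules from data; the tiny vocabulary here is hand-made purely for illustration.

```python
# Toy illustration of subword splitting: greedy longest-match against a
# tiny hand-made vocabulary. Real BPE vocabularies are learned from data.

VOCAB = {"the", "token", "ization", "un", "common", "word"}

def toy_tokenize(word: str) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest substring first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: its own token
            i += 1
    return pieces

print(toy_tokenize("the"))           # ['the'], a common word is one token
print(toy_tokenize("tokenization"))  # ['token', 'ization'], two tokens
```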

Calculate your LLM API costs with KickLLM — free, no sign-up required.