Model the true per-conversation token spend of an LLM chatbot — including multi-turn context replay — then scale it to a monthly bill across your active user base.

How the cost is calculated

Most cost calculators multiply a single prompt by its price and stop. Real chatbots are stateful: every turn replays the entire prior conversation plus the system prompt, so the input sent per turn grows linearly with the turn number and the cumulative billed input grows quadratically over a session. This tool models that replay explicitly, which is why its per-conversation numbers run higher, and closer to the real bill, than a naive single-call estimate.

The replay formula

For a conversation of T turns, turn n sends the system prompt plus all previous user and bot messages plus the new user message. Billed input tokens accumulate as:

inputTokens = Σ(n=1..T) [ sys + (n−1)·(inTok + outTok) + inTok ]

Output tokens are simpler: the model generates only a fresh reply each turn, so outputTokens = T · outTok. The growing history term has the closed form (inTok+outTok)·T·(T−1)/2, the classic arithmetic-series sum, which is what the script evaluates. Adding the fixed per-turn part gives the full total: inputTokens = T·(sys + inTok) + (inTok+outTok)·T·(T−1)/2.
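As a quick check on the formula, here is a minimal Python sketch that computes the sum turn by turn and compares it with the closed form. The function name `replay_tokens` and the sample token counts are illustrative assumptions, not taken from the tool's script.

```python
def replay_tokens(turns, sys, in_tok, out_tok):
    """Return (input_tokens, output_tokens) billed over one conversation."""
    # Loop form: turn n sends the system prompt, the n-1 earlier
    # user/bot exchanges, and the new user message.
    looped = sum(
        sys + (n - 1) * (in_tok + out_tok) + in_tok
        for n in range(1, turns + 1)
    )
    # Closed form: fixed per-turn part plus the arithmetic-series history term.
    # turns * (turns - 1) is always even, so integer division is exact.
    closed = turns * (sys + in_tok) + (in_tok + out_tok) * turns * (turns - 1) // 2
    assert looped == closed
    return closed, turns * out_tok


# Hypothetical conversation: 10 turns, 500-token system prompt,
# 100-token user messages, 300-token replies.
input_tokens, output_tokens = replay_tokens(turns=10, sys=500, in_tok=100, out_tok=300)
print(input_tokens, output_tokens)  # 24000 3000
```

With these sample numbers, a naive estimate of 10 independent calls would count 10 · (500 + 100) = 6,000 input tokens, while replay bills 24,000, four times as many.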

Pricing and caching

Cost is (inputTokens/1e6)·inPrice + (outputTokens/1e6)·outPrice, where inPrice and outPrice are prices per million tokens. When you enable a cached-input discount, the repeated system-prompt and history portion is re-priced at the reduced rate. This models prompt-caching features where unchanged prefix tokens bill at a fraction of the standard input price. Only the genuinely new user tokens each turn stay full price.
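The pricing step can be sketched as follows. This is an approximation under stated assumptions rather than the tool's actual code: `cached_rate` is a hypothetical parameter (the fraction of the input price charged for cached tokens), every system-prompt and history token is treated as cached, including the first turn's system prompt, and the prices are placeholders, not any provider's real rates.

```python
def conversation_cost(turns, sys, in_tok, out_tok,
                      in_price, out_price, cached_rate=1.0):
    """Return (input_dollars, output_dollars) for one conversation.

    Prices are dollars per million tokens. cached_rate is a hypothetical
    knob: the fraction of in_price charged for cached prefix tokens
    (1.0 means caching is off).
    """
    # New user tokens each turn always bill at the full input price.
    fresh = turns * in_tok
    # Replayed prefix: system prompt every turn plus the growing history.
    # Assumption: all of it bills at the cached rate.
    history = (in_tok + out_tok) * turns * (turns - 1) // 2
    prefix = turns * sys + history
    input_cost = (fresh + prefix * cached_rate) / 1e6 * in_price
    output_cost = turns * out_tok / 1e6 * out_price
    return input_cost, output_cost


# Placeholder prices: $3 per million input tokens, $15 per million output tokens.
no_cache = conversation_cost(10, 500, 100, 300, in_price=3.0, out_price=15.0)
with_cache = conversation_cost(10, 500, 100, 300, in_price=3.0, out_price=15.0,
                               cached_rate=0.1)
print(no_cache, with_cache)
```

In this example the input cost falls from about $0.072 to about $0.0099 per conversation with caching on, while the output cost stays at about $0.045, because 23,000 of the 24,000 input tokens are replayed prefix.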

Scaling to a monthly bill

Monthly spend is per-conversation cost times conversations per user per month times active users. The breakdown table separates input from output dollars so you can see whether your bot is context-heavy (cut the system prompt or add caching) or generation-heavy (cap max_tokens or use a cheaper output tier). Trim turns, shrink the RAG window, or raise cache hit rate and watch the monthly figure move in real time.
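The scaling step is a plain multiplication, sketched below with the input and output split kept separate. The function name `monthly_bill`, the dictionary keys, and the usage figures are illustrative assumptions.

```python
def monthly_bill(input_cost, output_cost, conversations_per_user, active_users):
    """Scale per-conversation dollars to a monthly bill, split by input/output."""
    conversations = conversations_per_user * active_users
    input_total = input_cost * conversations
    output_total = output_cost * conversations
    total = input_total + output_total
    return {
        "input": input_total,
        "output": output_total,
        "total": total,
        # Share of spend going to input tokens; a high value suggests a
        # context-heavy bot, a low value a generation-heavy one.
        "input_share": input_total / total if total else 0.0,
    }


# Hypothetical usage: $0.072 input and $0.045 output per conversation,
# 20 conversations per user per month, 5,000 active users.
bill = monthly_bill(0.072, 0.045, conversations_per_user=20, active_users=5000)
print(bill)
```

For these sample figures the bill comes to roughly $7,200 of input and $4,500 of output per month, so about 62% of spend is input and the bot would count as context-heavy.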

Estimate Your LLM Cost Per Chatbot Conversation
