Project the real dollar cost of vectorizing a corpus for retrieval-augmented generation. Enter your token volume, the model's price per million tokens, and how often you re-embed.

How this calculator works

Embedding pricing is metered purely on input tokens — unlike chat models there are no output tokens, so cost is a clean linear function of how much text you push through the API. The calculator first derives your one-time corpus volume, then layers on the recurring work that RAG systems quietly accumulate: scheduled re-indexing, incremental churn as documents change, and the live query embeddings every search request generates.

corpus_tokens = documents × tokens_per_document
init_cost = corpus_tokens / 1e6 × price_per_M
reembed_cost = init_cost × reembeds_per_month
churn_cost = init_cost × (churn% / 100)
query_cost = query_count × tokens_per_document / 1e6 × price_per_M
monthly = reembed_cost + churn_cost + query_cost
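
The formulas above can be sketched as a small Python function. This is an illustrative sketch, not the calculator's actual source: the names `EmbeddingInputs` and `monthly_costs` are hypothetical, `query_count` is assumed to be queries per month, and the $0.02 per million tokens price in the example is a placeholder rather than a quote for any particular model.

```python
from dataclasses import dataclass


@dataclass
class EmbeddingInputs:
    documents: int               # number of documents in the corpus
    tokens_per_document: float   # average tokens per document
    price_per_m: float           # USD per 1M input tokens (placeholder value below)
    reembeds_per_month: float    # full re-index runs per month
    churn_pct: float             # percent of the corpus that changes per month
    query_count: int             # assumed: search queries per month


def monthly_costs(inp: EmbeddingInputs) -> dict:
    """Apply the calculator's formulas line by line."""
    corpus_tokens = inp.documents * inp.tokens_per_document
    init_cost = corpus_tokens / 1e6 * inp.price_per_m
    reembed_cost = init_cost * inp.reembeds_per_month
    churn_cost = init_cost * (inp.churn_pct / 100)
    # Query size is approximated with the per-document token figure,
    # mirroring the query_cost formula.
    query_cost = inp.query_count * inp.tokens_per_document / 1e6 * inp.price_per_m
    return {
        "corpus_tokens": corpus_tokens,
        "init_cost": init_cost,
        "reembed_cost": reembed_cost,
        "churn_cost": churn_cost,
        "query_cost": query_cost,
        "monthly": reembed_cost + churn_cost + query_cost,
    }


if __name__ == "__main__":
    example = EmbeddingInputs(
        documents=50_000,
        tokens_per_document=500,
        price_per_m=0.02,        # placeholder price, not a real quote
        reembeds_per_month=1,
        churn_pct=10,
        query_count=1_000_000,
    )
    costs = monthly_costs(example)
    print(f"one-time embed: ${costs['init_cost']:.2f}")   # $0.50
    print(f"monthly total:  ${costs['monthly']:.2f}")     # $10.55
```

With these placeholder inputs, query traffic ($10.00) dwarfs the re-embed ($0.50) and churn ($0.05) lines, which is the pattern the next paragraph describes.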

The often-missed line items are churn and query traffic. Teams budget for the headline "embed everything once" number and forget that a corpus is alive: a 10% monthly change rate on a 25M-token corpus quietly re-bills 2.5M tokens every month. Query embedding is sneakier still — each search must embed the user's text before the vector lookup, so a high-QPS product can spend more on query vectors than on its entire document store. We approximate query token size with your per-document token figure, which you can lower if your queries are short.
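
To make the churn and query arithmetic concrete, here is a minimal sketch. The helper names `churn_tokens` and `breakeven_queries` and the 50-token query size are assumptions chosen for illustration; they are not inputs defined by the calculator.

```python
def churn_tokens(corpus_tokens: float, churn_pct: float) -> float:
    """Tokens re-billed each month because documents changed."""
    return corpus_tokens * churn_pct / 100


def breakeven_queries(corpus_tokens: float, tokens_per_query: float) -> float:
    """Monthly query count at which query embedding costs as much as
    embedding the whole corpus once (the price per token cancels out)."""
    return corpus_tokens / tokens_per_query


if __name__ == "__main__":
    corpus = 25_000_000  # the 25M-token corpus from the text

    # 10% monthly churn re-bills 2.5M tokens.
    print(churn_tokens(corpus, 10))        # 2500000.0

    # Assuming short 50-token queries, 500,000 queries a month
    # already match the cost of one full embed of the corpus.
    print(breakeven_queries(corpus, 50))   # 500000.0
```

Because embedding cost is linear in tokens, the break-even point depends only on token counts, not on the model's price.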

To trim spend, the biggest levers are model tier and dimensionality. Smaller models usually cost less per token, and Matryoshka-truncated dimensions shrink vector storage, often without wrecking recall. Splitting documents into fewer, larger chunks also reduces the overlap tokens that get embedded more than once. The annual projection multiplies the recurring monthly figure by twelve and adds the one-time initial embed, giving you a realistic first-year line for a finance spreadsheet rather than a misleading single-pass quote.
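
The annual projection described above reduces to one line of arithmetic. A minimal sketch, assuming a hypothetical helper named `first_year_cost` and placeholder dollar amounts:

```python
def first_year_cost(init_cost: float, monthly: float) -> float:
    """First-year spend: twelve months of recurring cost plus the one-time embed."""
    return monthly * 12 + init_cost


if __name__ == "__main__":
    # Placeholder figures: a $0.50 initial embed and $10.55 of recurring monthly cost.
    total = first_year_cost(init_cost=0.50, monthly=10.55)
    print(f"first-year total: ${total:.2f}")  # $127.10
```

Here the single-pass quote ($0.50) is a small fraction of the first-year total, which is why the projection reports both.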
