How Much Does a RAG Pipeline Cost?

A RAG pipeline costs roughly $0.0005-$0.02 per query with the models compared below: embedding (well under $0.0001), vector DB lookup (~$0.0001), and LLM generation ($0.0004-$0.02). At 10K queries/day (about 300K queries/month), expect roughly $130-$5,850/month for generation, plus $20-$200/month for vector DB hosting.

RAG Pipeline Cost Breakdown

| Component | Cost Per Query | Monthly at 10K Queries/Day | Notes |
| --- | --- | --- | --- |
| Embedding (query) | Under $0.0001 | ~$1 | text-embedding-3-small at $0.02/1M tokens |
| Vector DB lookup | ~$0.0001 | $20-$200 (hosting) | Pinecone, Weaviate, or pgvector |
| LLM generation | $0.0004-$0.02 | $128-$5,850 | Varies by model (see below) |

LLM Generation Cost by Model

Estimates assume ~2,500 input tokens (query + retrieved context) and ~800 output tokens per query. Monthly totals use a 30-day month.

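| Model | Cost/Query | Monthly at 10K/Day | Monthly at 100K/Day |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | $0.0004 | $128.25 | $1,282.50 |
| GPT-4o Mini | $0.0009 | $256.50 | $2,565.00 |
| DeepSeek V3 | $0.0016 | $466.50 | $4,665.00 |
| Claude Haiku 4 | $0.0052 | $1,560.00 | $15,600.00 |
| GPT-4o | $0.0143 | $4,275.00 | $42,750.00 |
| Claude Sonnet 4 | $0.0195 | $5,850.00 | $58,500.00 |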
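
The figures in the table above follow from simple token arithmetic. The sketch below reproduces the Gemini 2.0 Flash row, assuming the $0.075 input / $0.30 output per 1M token prices quoted in the FAQ and a 30-day month. The function names are illustrative and not part of any provider SDK.

```python
def cost_per_query(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of one LLM call, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(per_query, queries_per_day, days_per_month=30):
    """Scale a per-query cost to a monthly bill (30-day month is an assumption)."""
    return per_query * queries_per_day * days_per_month


# Gemini 2.0 Flash: $0.075 input / $0.30 output per 1M tokens (as quoted in the FAQ)
per_query = cost_per_query(2_500, 800, 0.075, 0.30)
print(f"per query: ${per_query:.7f}")
print(f"10K/day:   ${monthly_cost(per_query, 10_000):.2f}/month")
print(f"100K/day:  ${monthly_cost(per_query, 100_000):.2f}/month")
```

Swapping in another model's per-1M-token prices, or your own measured token counts, gives the corresponding row.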

One-Time Indexing Costs

Embedding your document corpus is a one-time cost (plus re-indexing for updates). The estimates below use text-embedding-3-small at $0.02/1M tokens and about 1,500 tokens per page.

| Corpus Size | Tokens (est.) | Embedding Cost |
| --- | --- | --- |
| 1,000 pages | ~1.5M tokens | $0.03 |
| 10,000 pages | ~15M tokens | $0.30 |
| 100,000 pages | ~150M tokens | $3.00 |
| 1M pages | ~1.5B tokens | $30.00 |
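
The indexing estimates can be reproduced with a few lines of arithmetic. This sketch assumes about 1,500 tokens per page (implied by the table: 1,000 pages is roughly 1.5M tokens) and the $0.02/1M token price for text-embedding-3-small. The function name is illustrative.

```python
TOKENS_PER_PAGE = 1_500      # assumption implied by the table: 1,000 pages ~ 1.5M tokens
PRICE_PER_M_TOKENS = 0.02    # text-embedding-3-small, USD per 1M tokens


def indexing_cost(pages, tokens_per_page=TOKENS_PER_PAGE, price_per_m=PRICE_PER_M_TOKENS):
    """Return the one-time USD cost of embedding a corpus of `pages` pages."""
    tokens = pages * tokens_per_page
    return tokens / 1_000_000 * price_per_m


for pages in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{pages:>9,} pages -> ${indexing_cost(pages):.2f}")
```

If your pages are denser or sparser than 1,500 tokens, pass a measured `tokens_per_page` value. The cost scales linearly with it.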

Vector Database Costs

| Provider | Free Tier | Production | Best For |
| --- | --- | --- | --- |
| Pinecone | 100K vectors | $70+/month | Managed, scalable |
| Weaviate Cloud | 1M vectors | $25+/month | Hybrid search |
| pgvector (self-hosted) | Unlimited | $10-$50/month (VPS) | Simple, PostgreSQL users |
| Qdrant Cloud | 1M vectors | $25+/month | Performance-focused |

FAQ

How much does a RAG pipeline cost per query?

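A RAG query costs roughly $0.0005-$0.02 in total with the models compared above: well under $0.0001 for embedding, ~$0.0001 for vector lookup, and $0.0004-$0.02 for LLM generation, depending on model choice. Generation dominates, at roughly 80-90% of the per-query cost with the cheapest models and 95%+ with the most expensive ones.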

What is the cheapest way to build a RAG pipeline?

Use Gemini 2.0 Flash ($0.075 input / $0.30 output per 1M tokens) for generation, text-embedding-3-small for embeddings, and pgvector on a $10/month VPS for storage. Total cost: about $140/month at 10K queries/day, almost all of it generation ($128.25).

Do I need a vector database for RAG?

For small corpora (under 10K documents), you can use simple in-memory search or SQLite FTS. For larger corpora, a vector database provides faster retrieval and better relevance. pgvector is a good free starting point.
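
For a small corpus, in-memory search can be as simple as a brute-force cosine similarity scan. The sketch below is one minimal way to do it in pure Python. The toy 3-dimensional vectors, document IDs, and function names are made up for illustration, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=3):
    """Score every (doc_id, vector) pair and return the k best (doc_id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy corpus: (doc_id, embedding) pairs held in a plain list
corpus = [
    ("pricing", [0.9, 0.1, 0.0]),
    ("setup", [0.1, 0.9, 0.0]),
    ("faq", [0.5, 0.5, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
print(results)
```

A linear scan like this touches every vector on each query, which is why it only suits small corpora. A vector database replaces the scan with an index.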

Prices last verified: April 2026. Pricing may change, so always check provider websites for current rates.
