How Much Does a RAG Pipeline Cost?
A RAG pipeline costs $0.001-$0.05 per query: embedding (~$0.0001), vector DB lookup (~$0.0001), and LLM generation ($0.001-$0.05). At 10K queries/day, expect $10-$500/month plus $20-$200/month for vector DB hosting.
RAG Pipeline Cost Breakdown
| Component | Cost Per Query | 10K Queries/Day | Notes |
|---|---|---|---|
| Embedding (query) | ~$0.0001 | ~$1/month | text-embedding-3-small at $0.02/1M tokens |
| Vector DB lookup | ~$0.0001 | $20-200/month (hosting) | Pinecone, Weaviate, or pgvector |
| LLM generation | $0.001-$0.05 | $10-$500/month | Varies by model (see below) |
LLM Generation Cost by Model
Per query: ~2,500 input tokens (query + retrieved context), ~800 output tokens.
| Model | Cost/Query | 10K/Day Monthly | 100K/Day Monthly |
|---|---|---|---|
| Gemini 2.0 Flash | $0.0004 | $128.25 | $1282.50 |
| GPT-4o Mini | $0.0009 | $256.50 | $2565.00 |
| Claude Haiku 4 | $0.0052 | $1560.00 | $15600.00 |
| DeepSeek V3 | $0.0016 | $466.50 | $4665.00 |
| Claude Sonnet 4 | $0.02 | $5850.00 | $58500.00 |
| GPT-4o | $0.01 | $4275.00 | $42750.00 |
One-Time Indexing Costs
Embedding your document corpus is a one-time cost (plus re-indexing for updates).
| Corpus Size | Tokens (est.) | Embedding Cost |
|---|---|---|
| 1,000 pages | ~1.5M tokens | $0.03 |
| 10,000 pages | ~15M tokens | $0.30 |
| 100,000 pages | ~150M tokens | $3.00 |
| 1M pages | ~1.5B tokens | $30.00 |
Vector Database Costs
| Provider | Free Tier | Production | Best For |
|---|---|---|---|
| Pinecone | 100K vectors | $70+/month | Managed, scalable |
| Weaviate Cloud | 1M vectors | $25+/month | Hybrid search |
| pgvector (self-hosted) | Unlimited | $10-50/month (VPS) | Simple, PostgreSQL users |
| Qdrant Cloud | 1M vectors | $25+/month | Performance-focused |
FAQ
How much does a RAG pipeline cost per query?
A RAG query costs $0.001-$0.05 total: ~$0.0001 for embedding, ~$0.0001 for vector lookup, and $0.001-$0.05 for LLM generation depending on model choice. The LLM generation step is 95%+ of the cost.
What is the cheapest way to build a RAG pipeline?
Use Gemini 2.0 Flash ($0.075/$0.30) for generation, text-embedding-3-small for embeddings, and pgvector on a $10/month VPS for storage. Total cost: under $20/month for 10K queries/day.
Do I need a vector database for RAG?
For small corpora (under 10K documents), you can use simple in-memory search or SQLite FTS. For larger corpora, a vector database provides faster retrieval and better relevance. pgvector is a good free starting point.
Prices last verified: April 2026. Pricing may change — always check provider websites for current rates.
Calculate your LLM API costs with KickLLM — free, no sign-up required.