How Much Does a RAG Pipeline Cost?

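A RAG pipeline costs roughly $0.0004-$0.02 per query at the token counts assumed below, and almost all of that is LLM generation; embedding the query costs under $0.00001 and the vector DB lookup about $0.0001. At 10K queries/day (about 300K queries/month), expect roughly $130-$5,850/month for generation, depending on the model, plus $20-$200/month for vector DB hosting.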

RAG Pipeline Cost Breakdown

Component	Cost Per Query	Monthly at 10K Queries/Day	Notes
Embedding (query)	<$0.00001	~$1/month	text-embedding-3-small at $0.02/1M tokens
Vector DB lookup	~$0.0001	$20-$200/month (hosting)	Pinecone, Weaviate, or pgvector
LLM generation	$0.0004-$0.02	$130-$5,850/month	Varies by model (see below)

LLM Generation Cost by Model

Assumes ~2,500 input tokens (query + retrieved context) and ~800 output tokens per query. Monthly figures are cost per query × queries per day × 30 days.

Model	Cost/Query	Monthly at 10K/Day	Monthly at 100K/Day
Gemini 2.0 Flash	$0.0004	$128.25	$1,282.50
GPT-4o Mini	$0.0009	$256.50	$2,565.00
DeepSeek V3	$0.0016	$466.50	$4,665.00
Claude Haiku 4	$0.0052	$1,560.00	$15,600.00
GPT-4o	$0.0143	$4,275.00	$42,750.00
Claude Sonnet 4	$0.0195	$5,850.00	$58,500.00
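The arithmetic behind the table can be sketched in a few lines of Python. This is a minimal sketch, assuming the Gemini 2.0 Flash prices quoted in the FAQ ($0.075 input and $0.30 output per 1M tokens) and a 30-day month; the function and constant names are illustrative, and provider prices change, so check current pricing before relying on the result.

```python
# Prices are USD per 1M tokens. These are the Gemini 2.0 Flash figures quoted
# in this article's FAQ (an assumption; verify against current provider pricing).
INPUT_PRICE_PER_M = 0.075
OUTPUT_PRICE_PER_M = 0.30


def cost_per_query(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the LLM generation cost in USD for a single query."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(per_query, queries_per_day, days_per_month=30):
    """Scale a per-query cost to a monthly bill (30-day month assumed)."""
    return per_query * queries_per_day * days_per_month


# Token counts from the assumption above: ~2,500 input, ~800 output.
per_query = cost_per_query(2_500, 800, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M)
print(f"per query: ${per_query:.7f}")
print(f"10K/day:   ${monthly_cost(per_query, 10_000):,.2f}/month")
print(f"100K/day:  ${monthly_cost(per_query, 100_000):,.2f}/month")
```

With these inputs the sketch reproduces the Gemini 2.0 Flash row: about $0.0004 per query, $128.25/month at 10K queries/day, and $1,282.50/month at 100K queries/day. Swapping in another model's per-token prices gives the other rows.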

One-Time Indexing Costs

Embedding your document corpus is a one-time cost (plus re-indexing for updates). The estimates below assume ~1,500 tokens per page and text-embedding-3-small at $0.02/1M tokens.

Corpus Size	Tokens (est.)	Embedding Cost
1,000 pages	~1.5M tokens	$0.03
10,000 pages	~15M tokens	$0.30
100,000 pages	~150M tokens	$3.00
1M pages	~1.5B tokens	$30.00
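The indexing estimates follow from a single multiplication, sketched below. The ~1,500 tokens per page and the $0.02/1M-token price are the assumptions implied by the table; `indexing_cost` is a hypothetical helper name, and real corpora will vary in tokens per page.

```python
def indexing_cost(pages, tokens_per_page=1_500, price_per_m_tokens=0.02):
    """Estimate total tokens and one-time embedding cost (USD) for a corpus.

    Defaults are assumptions: ~1,500 tokens per page and $0.02 per 1M tokens.
    """
    tokens = pages * tokens_per_page
    return tokens, tokens * price_per_m_tokens / 1_000_000


for pages in (1_000, 10_000, 100_000, 1_000_000):
    tokens, cost = indexing_cost(pages)
    print(f"{pages:>9,} pages  {tokens:>13,} tokens  ${cost:,.2f}")
```

Re-indexing an updated subset costs the same rate applied to only the changed pages, so incremental updates are usually a small fraction of the initial run.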

Vector Database Costs

Provider	Free Tier	Production	Best For
Pinecone	100K vectors	$70+/month	Managed, scalable
Weaviate Cloud	1M vectors	$25+/month	Hybrid search
pgvector (self-hosted)	Unlimited	$10-50/month (VPS)	Simple, PostgreSQL users
Qdrant Cloud	1M vectors	$25+/month	Performance-focused

FAQ

How much does a RAG pipeline cost per query?

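At ~2,500 input and ~800 output tokens, a RAG query costs roughly $0.0004-$0.02 in total: under $0.00001 for embedding the query, ~$0.0001 for the vector lookup, and $0.0004-$0.02 for LLM generation depending on model choice. LLM generation dominates the per-query cost, and its share grows with larger models.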

What is the cheapest way to build a RAG pipeline?

Use Gemini 2.0 Flash ($0.075/$0.30 per 1M input/output tokens) for generation, text-embedding-3-small for embeddings, and pgvector on a $10/month VPS for storage. At 10K queries/day, that comes to about $140/month: roughly $128 for generation, $1 for embeddings, and $10 for the VPS.

Do I need a vector database for RAG?

For small corpora (under 10K documents), simple in-memory search or SQLite FTS (full-text search) is often enough. For larger corpora, a vector database provides faster retrieval and better relevance. pgvector is a good free starting point.
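As an illustration of what simple in-memory search can look like, here is a minimal brute-force sketch that ranks documents by cosine similarity. The three-dimensional vectors are toy stand-ins for real embeddings, and `cosine_similarity`, `top_k`, and the document IDs are hypothetical names chosen for the example; a production setup would likely use a numeric library and real embedding vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=3):
    """Score every document against the query and return the k best (doc_id, score) pairs."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy vectors standing in for real embeddings (illustrative only).
corpus = {
    "pricing": [0.9, 0.1, 0.0],
    "setup": [0.1, 0.9, 0.0],
    "faq": [0.5, 0.5, 0.1],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
print(results)
```

Because every query scans the whole corpus, the cost grows linearly with the number of documents, which is why this approach suits small corpora and an indexed vector database suits larger ones.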