When to Self-Host an LLM

Self-host when: (1) API costs exceed $3,000/month, (2) you need sub-50ms time-to-first-token latency, (3) data can't leave your network, (4) you need full model control. Otherwise, APIs are cheaper and simpler.
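The checklist above can be sketched as a small decision helper. This is illustrative only; the function name and thresholds ($3,000/month, the 50ms requirement encoded as a boolean) are taken from the summary, not from any real library.

```python
# Illustrative decision helper encoding the four self-hosting criteria.
# Any single criterion being met is enough to justify self-hosting.

def should_self_host(monthly_api_spend_usd: float,
                     needs_sub_50ms_first_token: bool,
                     data_must_stay_on_network: bool,
                     needs_full_model_control: bool) -> bool:
    """Return True if any of the four self-hosting criteria is met."""
    return (monthly_api_spend_usd > 3_000
            or needs_sub_50ms_first_token
            or data_must_stay_on_network
            or needs_full_model_control)

print(should_self_host(4_500, False, False, False))  # high API spend -> True
print(should_self_host(800, False, False, False))    # no criterion met -> False
```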

The 4 Reasons to Self-Host

1. Cost Threshold: $3,000+/month in API spend

Below $3K/month, the overhead of self-hosting (GPU rental, engineering time, monitoring, updates) exceeds what you'd save. Above $3K/month with consistent throughput, self-hosting a 70B model can reduce costs by 40-60%. The break-even depends on utilization — GPUs you're not using still cost money.
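A back-of-envelope break-even sketch of the argument above. The GPU rental price, engineering overhead, and per-token API price below are hypothetical placeholders; substitute your own numbers.

```python
# Self-hosting is a fixed cost (GPUs bill around the clock, used or not);
# API spend scales with usage. Break-even is where the two lines cross.

def self_host_monthly_cost(gpu_rental_usd: float, overhead_usd: float) -> float:
    """Fixed monthly cost: GPU rental plus engineering/monitoring overhead."""
    return gpu_rental_usd + overhead_usd

def api_monthly_cost(tokens: float, usd_per_million_tokens: float) -> float:
    """Usage-based monthly cost for the same traffic via an API."""
    return tokens / 1e6 * usd_per_million_tokens

fixed = self_host_monthly_cost(2_600, 1_000)   # e.g. 2x A100 plus ops time
breakeven_tokens = fixed / 5.00 * 1e6          # at a hypothetical $5/M tokens
print(f"break-even: {breakeven_tokens / 1e6:.0f}M tokens/month")
```

At these placeholder numbers, you need roughly 720M tokens per month of steady traffic before the fixed cost beats the API; below that volume, the API is cheaper.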

2. Latency: Sub-50ms first-token requirements

API calls include network latency (50-200ms) plus queue time. If your application needs sub-50ms time-to-first-token (real-time voice, gaming, high-frequency trading), self-hosting on local GPUs eliminates network overhead.
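The latency budget above is simple arithmetic: an API call pays network round-trip plus queue time on top of inference, while a local GPU pays only inference. The millisecond figures below are illustrative, not benchmarks.

```python
# Time-to-first-token budget: API vs. local GPU (illustrative numbers).

def api_ttft_ms(network_ms: float, queue_ms: float, inference_ms: float) -> float:
    """API path: network round-trip + provider queue + inference."""
    return network_ms + queue_ms + inference_ms

def local_ttft_ms(inference_ms: float) -> float:
    """Local path: inference only, no network hop."""
    return inference_ms

print(api_ttft_ms(80, 20, 30))   # 130 ms total -- blows a 50 ms budget
print(local_ttft_ms(30))         # 30 ms -- within budget
```

Note that even a zero-queue API call with 80ms of network latency can never meet a 50ms budget, which is why this criterion forces local hardware.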

3. Data Privacy: Data cannot leave your network

Regulated industries (healthcare, finance, defense) may require that data never leaves your infrastructure. Self-hosting ensures prompts, completions, and fine-tuning data stay on your hardware. Note: some API providers offer on-premises deployments, but at significant cost.

4. Model Control: Fine-tuning, custom inference, research

If you need to modify model weights, run custom inference pipelines, or experiment with architectures, self-hosting gives you full control. APIs only expose what the provider chooses to expose.

Self-Hosting Cost Breakdown

| Setup | Monthly Cost | Good For |
|---|---|---|
| 1x A100 (cloud) | ~$1,100-1,500 | 7B-13B models |
| 2x A100 (cloud) | ~$2,200-3,000 | 34B-70B models |
| 4x A100 (cloud) | ~$4,400-6,000 | 70B+ models, high throughput |
| RunPod spot (A100) | ~$800-1,200 | Non-critical workloads |
| Mac Studio M2 Ultra (owned) | ~$30 electricity | 7B-13B hobby/dev |
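The monthly prices above bill 24/7 whether the GPU is serving or idle, so the effective cost per busy hour rises as utilization falls. A minimal sketch, using the 1x A100 price from the table; the utilization figures are hypothetical.

```python
# A rented GPU costs the same each month regardless of load, so spreading
# the bill over only the hours it actually serves shows the real unit cost.

def effective_cost_per_busy_hour(monthly_cost_usd: float,
                                 utilization: float,
                                 hours_per_month: float = 730) -> float:
    """Monthly cost divided by the hours the GPU is actually serving."""
    return monthly_cost_usd / (hours_per_month * utilization)

print(round(effective_cost_per_busy_hour(1_300, 1.00), 2))  # fully utilized
print(round(effective_cost_per_busy_hour(1_300, 0.25), 2))  # 25% utilized: 4x the rate
```

This is the utilization point from the cost section: at 25% utilization your effective hourly rate quadruples, which pushes the real break-even well above the sticker price suggests.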

When to Stick with APIs

If none of the four criteria applies, stay on APIs: below $3,000/month in spend, with latency needs above ~50ms, no data-residency constraints, and no need to modify the model, you pay only for the tokens you use and avoid GPU rental, monitoring, and upgrade overhead entirely.

Calculate your LLM API costs with KickLLM — free, no sign-up required.