What Is the Cheapest Way to Run Llama?
Groq API: $0.05-0.59 per 1M input tokens (fastest). Together.ai: from $0.20/1M. Self-hosting on a RunPod A100: ~$1.50/hr, which works out to ~$0.30/1M tokens if the GPU sustains roughly 1,400 tok/s. For hobby use, Ollama on an M-series Mac is free.
Option 1: Inference APIs (Easiest)
| Provider | Model | Input/1M | Output/1M | Speed |
|---|---|---|---|---|
| Groq | Llama 3 8B | $0.05 | $0.08 | ~800 tok/s |
| Groq | Llama 3 70B | $0.59 | $0.79 | ~300 tok/s |
| Together.ai | Llama 3.1 8B | $0.20 | $0.20 | ~100 tok/s |
| Together.ai | Llama 3.1 70B | $0.88 | $0.88 | ~50 tok/s |
| Fireworks | Llama 3.1 8B | $0.20 | $0.20 | ~80 tok/s |
Best for: Most users. No infrastructure to manage, pay only for what you use, instant start.
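To compare providers for a specific workload, a minimal sketch like the following multiplies token volumes by the per-1M prices in the table. The function name and the sample workload (500M input and 100M output tokens per month) are illustrative assumptions, and the prices are only as current as the table.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    ("Groq", "Llama 3 8B"): (0.05, 0.08),
    ("Groq", "Llama 3 70B"): (0.59, 0.79),
    ("Together.ai", "Llama 3.1 8B"): (0.20, 0.20),
    ("Together.ai", "Llama 3.1 70B"): (0.88, 0.88),
    ("Fireworks", "Llama 3.1 8B"): (0.20, 0.20),
}


def monthly_api_cost(provider, model, input_tokens, output_tokens):
    """Return the USD cost of the given input and output token volumes."""
    input_price, output_price = PRICES[(provider, model)]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 500M input tokens and 100M output tokens per month.
    for provider, model in PRICES:
        cost = monthly_api_cost(provider, model, 500_000_000, 100_000_000)
        print(f"{provider:12s} {model:14s} ${cost:,.2f}/month")
```

Input and output are priced separately because some providers charge more for generated tokens, so a chat-heavy workload and a summarization-heavy workload can rank the providers differently.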
Option 2: Self-Host on Cloud GPUs
| Provider | GPU | Hourly | ~Monthly | Best For |
|---|---|---|---|---|
| RunPod | A100 80GB | $1.50 | ~$1,100 | 70B models |
| RunPod (spot) | A100 80GB | $0.80 | ~$580 | Non-critical |
| Lambda | A100 80GB | $1.25 | ~$900 | 70B models |
| Vast.ai | RTX 4090 | $0.30 | ~$220 | 8B-13B models |
Best for: High volume (self-hosting saves 40-60% versus API pricing once API spend passes roughly $3K/month), full control, and custom fine-tuned models.
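Whether self-hosting pays off depends on how many tokens the GPU actually serves. The sketch below converts an hourly GPU rate into a monthly bill, a per-1M-token cost, and a break-even volume against an API price. The 1,400 tok/s throughput is an assumed figure for a well-batched server, not a measured one, and the calculation ignores engineering time and idle capacity.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def monthly_gpu_cost(hourly_rate):
    """USD per month for a GPU that runs around the clock."""
    return hourly_rate * HOURS_PER_MONTH


def cost_per_million_tokens(hourly_rate, tokens_per_second):
    """USD per 1M tokens when the GPU sustains the given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate / tokens_per_hour * 1_000_000


def break_even_tokens_per_month(hourly_rate, api_price_per_million):
    """Monthly token volume above which an always-on GPU beats the API price."""
    return monthly_gpu_cost(hourly_rate) / api_price_per_million * 1_000_000


if __name__ == "__main__":
    rate = 1.50        # RunPod A100 80GB hourly rate from the table above
    throughput = 1400  # assumed aggregate tok/s under batching
    print(f"Monthly GPU cost:   ${monthly_gpu_cost(rate):,.0f}")
    print(f"Cost per 1M tokens: ${cost_per_million_tokens(rate, throughput):.2f}")
    # Compared against a $0.88/1M API price for a 70B model.
    volume = break_even_tokens_per_month(rate, 0.88)
    print(f"Break-even volume:  {volume / 1e6:,.0f}M tokens/month")
```

If real traffic keeps the GPU busy only part of the day, the effective per-token cost rises in proportion, which is why the API usually wins at low volume.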
Option 3: Run Locally (Free)
- Ollama on Mac: Install Ollama, then run `ollama run llama3`. An M1/M2/M3 Mac with 16GB+ RAM handles 8B models smoothly at ~30 tok/s. Free.
- llama.cpp: Run quantized models on CPU or GPU. The GGUF format supports 4-bit quantization, which shrinks a 70B model to roughly 35-40GB of weights, so it needs more than 32GB of RAM (48GB+ is a safer target).
- Desktop GPU: An RTX 3090/4090 (24GB VRAM) runs 8B-13B models at ~50-80 tok/s. The only running cost is power (~$0.10/hr).
Best for: Development, hobby projects, privacy-sensitive work, experimentation.
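A rough way to check whether a model fits on local hardware is to estimate the size of its weights from the parameter count and the bits per weight. This is a back-of-the-envelope sketch: the 20% overhead for the KV cache and runtime is an assumed allowance, and real GGUF files vary by quantization variant.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits_in_memory(params_billions, bits_per_weight, memory_gb, overhead=1.2):
    """Return True if weights plus an assumed 20% overhead fit in memory_gb."""
    return weight_memory_gb(params_billions, bits_per_weight) * overhead <= memory_gb


if __name__ == "__main__":
    for params in (8, 13, 70):
        for bits in (16, 8, 4):
            size = weight_memory_gb(params, bits)
            print(f"{params}B at {bits}-bit: ~{size:.1f} GB of weights")
```

For example, `fits_in_memory(8, 4, 16)` checks an 8B model at 4-bit against a 16GB machine, and `fits_in_memory(70, 4, 48)` checks a 70B model against 48GB.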
Which Option Should You Choose?
- Just trying Llama -> Ollama on your Mac (free)
- Building a product, low volume -> Groq API (from $0.05/1M input tokens)
- Building a product, high volume -> Self-host on RunPod when API spend exceeds $3K/month
- Need maximum speed -> Groq API (~800 tok/s on Llama 3 8B, faster than any self-hosted setup)
- Data must stay private -> Local Ollama or self-hosted cloud with private networking
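The rules of thumb above can be written as a small decision function. This is only a sketch: the function name, its parameters, and the order in which the rules are checked (privacy first, then speed, then volume) are assumptions, since the list does not say which rule wins when several apply.

```python
def recommend(monthly_api_spend_usd=0.0, building_product=False,
              need_max_speed=False, data_must_stay_private=False):
    """Map the decision list above to a single recommendation string."""
    # Assumed priority: privacy constraints override everything else.
    if data_must_stay_private:
        return "Local Ollama or self-hosted cloud with private networking"
    if need_max_speed:
        return "Groq API"
    if not building_product:
        return "Ollama on your Mac"
    # The $3K/month threshold comes from the list above.
    if monthly_api_spend_usd > 3000:
        return "Self-host on RunPod"
    return "Groq API"


if __name__ == "__main__":
    print(recommend())
    print(recommend(building_product=True, monthly_api_spend_usd=500))
    print(recommend(building_product=True, monthly_api_spend_usd=5000))
```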