What Is the Cheapest Way to Run Llama?

Groq API: $0.05-0.59/1M input tokens (fastest). Together.ai: from $0.20/1M. Self-hosting on a RunPod A100: ~$1.50/hr, which works out to roughly $0.30/1M tokens at sustained throughput. For hobby use, Ollama on a Mac M-series is free.

Option 1: Inference APIs (Easiest)

| Provider | Model | Input/1M | Output/1M | Speed |
|---|---|---|---|---|
| Groq | Llama 3 8B | $0.05 | $0.08 | ~800 tok/s |
| Groq | Llama 3 70B | $0.59 | $0.79 | ~300 tok/s |
| Together.ai | Llama 3.1 8B | $0.20 | $0.20 | ~100 tok/s |
| Together.ai | Llama 3.1 70B | $0.88 | $0.88 | ~50 tok/s |
| Fireworks | Llama 3.1 8B | $0.20 | $0.20 | ~80 tok/s |

Best for: Most users. No infrastructure to manage, pay only for what you use, instant start.
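To see what API pricing means for your workload, you can turn the table above into a quick estimator. This is a minimal sketch; the prices are the ones listed above and the model keys are made-up labels, so verify current rates on each provider's pricing page.

```python
# Rough monthly cost estimate from the per-token prices above.
# Prices are USD per 1M tokens (input, output). The dictionary keys are
# arbitrary labels for this sketch, not real API model IDs.
PRICES = {
    "groq-llama3-8b": (0.05, 0.08),
    "groq-llama3-70b": (0.59, 0.79),
    "together-llama31-70b": (0.88, 0.88),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """USD cost for input_m / output_m million tokens per month."""
    in_price, out_price = PRICES[model]
    return input_m * in_price + output_m * out_price

# Example: 50M input + 10M output tokens on Groq Llama 3 8B
print(f"${monthly_cost('groq-llama3-8b', 50, 10):.2f}")  # $3.30
```

At that kind of volume, the 8B models are close to free; costs only become a real line item with 70B models or heavy traffic.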

Option 2: Self-Host on Cloud GPUs

| Provider | GPU | Hourly | ~Monthly | Best For |
|---|---|---|---|---|
| RunPod | A100 80GB | $1.50 | ~$1,100 | 70B models |
| RunPod (spot) | A100 80GB | $0.80 | ~$580 | Non-critical workloads |
| Lambda | A100 80GB | $1.25 | ~$900 | 70B models |
| Vast.ai | RTX 4090 | $0.30 | ~$220 | 8B-13B models |

Best for: High volume (above roughly $3K/month of API spend, self-hosting can save 40-60%), full control, and custom fine-tuned models.
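The break-even point is easy to estimate from the two tables. As a sketch, assume a dedicated A100 at ~$1,100/month (RunPod rate above) and Together.ai's Llama 3.1 70B at $0.88 per 1M tokens; the actual crossover depends on your utilization and whether the GPU can keep up with your traffic.

```python
# Break-even sketch: at what monthly volume does renting a GPU beat the API?
# Assumptions (from the tables above): A100 at ~$1,100/month, API at
# $0.88 per 1M tokens (input and output priced equally).
GPU_MONTHLY_USD = 1100.0
API_PER_1M_USD = 0.88

def break_even_million_tokens(gpu_monthly: float, api_per_1m: float) -> float:
    """Monthly token volume (in millions) where GPU rental equals API spend."""
    return gpu_monthly / api_per_1m

volume = break_even_million_tokens(GPU_MONTHLY_USD, API_PER_1M_USD)
print(f"~{volume:.0f}M tokens/month")  # ~1250M tokens/month
```

In other words, self-hosting a 70B model only pays off past roughly a billion tokens per month, and then only if the rented GPU actually sustains that throughput.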

Option 3: Run Locally (Free)

Tools like Ollama, llama.cpp, and LM Studio run quantized Llama models on consumer hardware; an 8B model fits comfortably on a 16GB M-series Mac.

Best for: Development, hobby projects, privacy-sensitive work, experimentation.
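Once Ollama is installed and a model is pulled (`ollama pull llama3`), it serves a local HTTP API on port 11434. The sketch below only builds the request payload for Ollama's `/api/generate` endpoint; actually sending it requires the Ollama server to be running, so that part is left as a comment.

```python
import json

# Build a request for a local Ollama server (http://localhost:11434).
# Payload fields match Ollama's /api/generate endpoint.
payload = {
    "model": "llama3",
    "prompt": "Explain KV caching in one sentence.",
    "stream": False,  # single JSON response instead of a token stream
}
body = json.dumps(payload).encode("utf-8")

# To actually send it (with Ollama running locally):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body, headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["response"])
print(json.loads(body)["model"])  # llama3
```

Because the endpoint is OpenAI-style JSON over HTTP, the same local setup works as a drop-in backend for most LLM client libraries during development.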

Which Option Should You Choose?

Rule of thumb: start with an API (Option 1) to learn what you actually need, since there is no commitment. Move to self-hosting (Option 2) once sustained API spend passes roughly $3K/month, and use a local setup (Option 3) for development and privacy-sensitive work.

Calculate your LLM API costs with KickLLM — free, no sign-up required.