| Model | Hardware Needed | Notes |
|---|---|---|
| Full V3.2 (671B) | 8x NVIDIA H100 80GB | Enterprise grade; multi-GPU with NVLink |
| Quantized V3.2 (4-bit) | 4x NVIDIA A100 80GB | Reduced precision with some quality trade-off |
| R1-Distill-70B | 1 to 2x A100 or 2x RTX 4090 | Feasible for professional workstations |
| R1-Distill-32B | 1x RTX 4090 24GB (quantized) | Good balance of quality and accessibility |
| R1-Distill-14B | 1x RTX 3090 or 4070 Ti | Consumer GPU; workable for personal use |
| R1-Distill-7B/8B | 1x RTX 3060 12GB or Apple M2+ | Laptop-friendly with quantization |
| R1-Distill-1.5B | CPU only; any modern computer | Runs anywhere; prototyping and edge devices |
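The memory figures in the table follow from a simple rule of thumb. As a rough sketch (the ~20% overhead factor is an assumption for KV cache, activations, and runtime buffers, not a published requirement):

```python
# Back-of-envelope memory estimate for running a model locally:
# weights = parameter count x bits per parameter / 8, plus assumed overhead.

def estimate_memory_gb(params_billion: float, bits_per_param: int,
                       overhead: float = 0.20) -> float:
    """Return an approximate memory footprint in GB."""
    weight_gb = params_billion * bits_per_param / 8  # 1B params at 8 bits ~ 1 GB
    return round(weight_gb * (1 + overhead), 1)

# Examples mirroring the table above:
print(estimate_memory_gb(671, 4))  # quantized V3.2 at 4-bit: ~402.6 GB (hence 4-8 80GB GPUs)
print(estimate_memory_gb(7, 4))    # R1-Distill-7B at 4-bit: ~4.2 GB, in line with the ~5GB figure below
print(estimate_memory_gb(1.5, 8))  # R1-Distill-1.5B: ~1.8 GB, small enough for CPU-only use
```

Real footprints vary with context length, batch size, and quantization scheme, so treat the output as a lower bound when sizing hardware.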
## Self-hosting frameworks

- 🚀 **vLLM**: Production API serving with high throughput. The go-to choice for deploying models at scale.
- 🔧 **llama.cpp**: CPU and mixed CPU/GPU inference. Maximum hardware flexibility and quantization support.
- 🎯 **Ollama**: One-command install and run. The simplest path to running DeepSeek locally on your machine.
## MIT license: no restrictions

Unlike Meta's Llama license (which restricts usage above certain user counts and revenue thresholds), DeepSeek's MIT license imposes no limitations. You can use the models commercially, fine-tune them, distill them, and redistribute derivatives without paying royalties. This makes DeepSeek the most permissive frontier AI model family available.
## Start small

For most people wanting to try local AI, start with Ollama and the R1-Distill-7B model (requires ~5GB RAM). It runs on almost any modern laptop and gives you a feel for DeepSeek's reasoning capabilities before investing in larger hardware setups.
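Once the model is running under Ollama, you can query it from any language over its local HTTP API. A minimal sketch, assuming Ollama's default endpoint at `localhost:11434` and its `/api/generate` route; the model tag `deepseek-r1:7b` is illustrative, so check `ollama list` for the exact name on your machine:

```python
import json
from urllib import request

def build_generate_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for a single JSON reply instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(payload: dict,
             url: str = "http://localhost:11434/api/generate") -> str:
    req = request.Request(url,
                          data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_generate_request("deepseek-r1:7b",
                                 "Explain quantization in one sentence.")
# generate(payload)  # uncomment once `ollama run deepseek-r1:7b` is up
```

Using only the standard library keeps the sketch dependency-free; in practice you might prefer an OpenAI-compatible client, since both Ollama and vLLM expose that interface as well.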