| Model | Tiny | Base | Small | Medium | Large-v3 | Turbo |
|---|---|---|---|---|---|---|
| Parameters | 39M | 74M | 244M | 769M | 1.55B | 809M |
| VRAM (approx) | ~1 GB | ~1 GB | ~2 GB | ~5 GB | ~10 GB | ~6 GB |
| Relative speed | ~32x | ~16x | ~6x | ~2x | 1x (baseline) | ~3x |
| English WER | ~8% | ~6% | ~4.5% | ~3.5% | ~2.5% | ~2.7% |
| Translation (X→English) | Yes | Yes | Yes | Yes | Yes | No |
| Best for | Edge, IoT | CPU inference | Good balance | High accuracy | Maximum accuracy | Production default |
The turbo model is the default recommendation
For most production use cases, large-v3-turbo offers the best balance of speed and accuracy: it runs roughly 3x faster than large-v3 with less than a one-point difference in English word error rate. Its main limitation is that it cannot perform translation (speech in another language to English text); if you need translation, use large-v3 instead.
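The trade-off above can be expressed as a small selection rule: among the models that fit your VRAM budget and meet the translation requirement, pick the one with the lowest English WER. A minimal sketch, using the figures from the comparison table (the `pick_model` helper and its signature are hypothetical, not part of any Whisper API):

```python
# Figures taken from the comparison table above:
# (name, approx VRAM in GB, approx English WER %, supports X->English translation)
MODELS = [
    ("tiny",           1,  8.0, True),
    ("base",           1,  6.0, True),
    ("small",          2,  4.5, True),
    ("medium",         5,  3.5, True),
    ("large-v3",      10,  2.5, True),
    ("large-v3-turbo", 6,  2.7, False),
]

def pick_model(vram_gb, need_translation=False):
    """Return the lowest-WER model that fits the VRAM budget and,
    if required, supports translation; None if nothing fits."""
    candidates = [
        (name, vram, wer)
        for name, vram, wer, translates in MODELS
        if vram <= vram_gb and (translates or not need_translation)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m[2])[0]

print(pick_model(8))                         # -> large-v3-turbo
print(pick_model(8, need_translation=True))  # -> medium
print(pick_model(12, need_translation=True)) # -> large-v3
```

With an 8 GB budget the rule lands on turbo, exactly as the recommendation above suggests; requiring translation excludes turbo and falls back to the best translating model that still fits.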