Google Gemma
open-source · by Google
Google DeepMind's open-weight LLM family built from the same research as Gemini, available in sizes from 270M to 27B parameters.
Key features
- Free and open weight: completely free to download and use, with no API fees or usage limits when self-hosted
- Best for: developers who want to run AI models locally on their own hardware, without API dependencies or usage fees
What it does
Open Weights
Download the full model weights and run Gemma anywhere: your laptop, a cloud server, a mobile device, or an edge appliance. No API dependency, no usage fees, complete control.
Multimodal Input (Text + Image)
Gemma 3 models at 4B parameters and above accept both text and image inputs, enabling visual question answering, image captioning, document understanding, and multimodal reasoning.
128K Token Context Window
Process long documents, codebases, and extended conversations with a 128K token context window across all Gemma 3 sizes.
140+ Language Support
Gemma 3 supports over 140 languages out of the box, making it one of the most multilingual open model families available.
Quantization-Aware Training
Gemma 3 checkpoints are trained with quantization awareness, meaning the models maintain high quality even when compressed to lower-bit formats (INT4, INT8) for efficient deployment.
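To make "lower-bit formats" concrete, here is a toy symmetric INT4 round trip: floats are mapped to 16 integer levels via a per-tensor scale, then restored. This is a sketch of the kind of compression QAT prepares a model for, not Gemma's actual quantization scheme.

```python
# Toy symmetric INT4 quantization round trip -- an illustration only,
# not Gemma's actual QAT scheme.

def quantize_int4(weights):
    """Map floats to integers in [-8, 7] using one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Restore approximate float weights from the INT4 codes."""
    return [v * scale for v in q]

weights = [0.42, -1.30, 0.07, 0.88, -0.55]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The round-trip error is bounded by half the scale step, which is why a model trained to tolerate that noise (quantization-aware training) loses so little quality at INT4.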
Function Calling
Gemma 3 supports structured function calling, enabling the model to invoke external tools, APIs, and databases as part of its response workflow.
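The host-side pattern is the same across models: the model emits a structured call (typically JSON), your code parses it, invokes the matching tool, and feeds the result back. A minimal dispatch sketch with a hypothetical `get_weather` tool; the JSON shape here is generic, not Gemma's exact wire format.

```python
import json

# Hypothetical tool registry -- in a real app these would call
# external APIs or databases.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a structured call emitted by the model and invoke the tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "get_weather", "arguments": {"city": "Zurich"}}')
```

In a full loop, `result` would be appended to the conversation so the model can compose its final answer from the tool output.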
Runs on Consumer Hardware
The Gemma 3 4B model fits in just 3.4GB at Q4 quantization, making it feasible to run on laptops, desktops, and even smartphones without a dedicated GPU.
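As a rough sanity check on figures like "3.4GB at Q4": weight memory scales with parameter count times bits per weight. The 4.3B parameter count and ~4.5 effective bits/weight (typical of Q4_K-style formats) below are illustrative assumptions; real GGUF files add metadata and keep some tensors (e.g. embeddings) at higher precision, so on-disk sizes run larger than this floor.

```python
# Back-of-envelope weight-memory estimate for a quantized model.
# Parameter count and bits/weight are assumptions for illustration.

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in GB (decimal), ignoring metadata overhead."""
    return params * bits_per_weight / 8 / 1e9

q4 = weight_gb(4.3e9, 4.5)    # ~4.3B params at ~4.5 bits/weight
fp16 = weight_gb(4.3e9, 16)   # same model unquantized at 16-bit
```

Quantizing from FP16 to a Q4-style format cuts weight memory by roughly 3.5x, which is what moves a 4B model from GPU territory into laptop RAM.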
Gemma 3n for Mobile and On-Device
A specialized variant designed for mobile and edge deployment, offering E2B and E4B configurations with audio input support. Optimized for on-device inference with minimal resource consumption.
Specialized Variants
A growing ecosystem of purpose built models: MedGemma (medical), CodeGemma (code generation), PaliGemma (vision and language), ShieldGemma (safety classification), FunctionGemma (tool use), and TranslateGemma (translation).
Available Everywhere
Download from Hugging Face, run via Ollama, experiment on Kaggle, or use through Google AI Studio and Vertex AI. Gemma integrates into every major ML ecosystem and inference framework.
Fine-Tuning Friendly
All Gemma models support LoRA, QLoRA, and full-parameter fine-tuning. The smaller sizes (270M, 1B, 4B) are particularly well suited for custom fine-tuning on consumer hardware.
Pricing
Open Weight
Gemma is completely free to download and run. All model weights are available on Hugging Face, Kaggle, and other platforms at no cost. You only pay for the compute hardware you choose to run the models on, whether that is your own laptop, a cloud GPU, or a managed inference service.
- All model sizes (270M to 27B) free to download
- Full model weights on Hugging Face and Kaggle
- Run locally via Ollama, llama.cpp, vLLM, or any compatible framework
- Commercial use permitted under Gemma Terms of Use
- Fine-tuning and modification allowed
- No API fees or usage limits when self-hosted
- Community of 60,000+ variants and fine-tunes
Pros & Cons
Pros
- Completely free to download and use, with no API fees or usage limits when self-hosted
- Runs on consumer hardware: the 4B model fits in 3.4GB at Q4, feasible on laptops and smartphones
- 128K token context window across all Gemma 3 sizes, competitive with much larger proprietary models
- Multimodal input (text and image) starting at the 4B parameter size
- Massive community with 100M+ downloads and 60,000+ fine-tuned variants on Hugging Face
- Specialized variants for medical, safety, code, vision, and translation tasks, ready to use out of the box
- Quantization-aware trained checkpoints maintain high quality even at INT4 precision, maximizing hardware efficiency
- Built from the same research as Gemini, delivering frontier quality relative to model size
Cons
- Not Apache 2.0: the Gemma Terms of Use allow commercial use but include restrictions on certain applications, so review the license carefully
- Smaller than proprietary Gemini models (27B max vs Gemini's much larger architectures), so raw capability has a ceiling
- Knowledge cutoff of August 2024, meaning the model has no awareness of events after that date
- No native audio input or output support (except Gemma 3n which supports audio input only)
- Requires technical setup to run locally: downloading models, configuring inference frameworks, and managing hardware resources
- The smallest sizes (270M, 1B) are limited in reasoning capability and best suited for simple tasks or fine tuned applications
How to get started
Choose your model size
Select the Gemma 3 variant that fits your hardware. The 4B model (3.4GB at Q4) works on most modern laptops. The 12B model needs a dedicated GPU with 8GB+ VRAM. The 27B model requires a high-end GPU with 16GB+ VRAM. The 270M and 1B models run on virtually any device, including smartphones.
Run with Ollama (easiest path)
Install Ollama, then run 'ollama run gemma3:4b' to start chatting with Gemma 3 locally. Ollama handles downloading, quantization, and serving automatically. For the 12B model, use 'ollama run gemma3:12b'. For 27B, use 'ollama run gemma3:27b'.
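Beyond the interactive CLI, Ollama also serves a local REST API (default port 11434) that any program can call. A minimal sketch, assuming Ollama is running and has already pulled gemma3:4b:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "gemma3:4b") -> dict:
    """Request body for Ollama's /api/generate endpoint.
    stream=False asks for one complete JSON response."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_gemma(prompt: str, model: str = "gemma3:4b") -> str:
    """POST a prompt to the local Ollama server and return the reply.
    Requires Ollama to be running on localhost:11434."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# ask_gemma("Explain quantization in one sentence.")  # needs Ollama running
```

Swap the `model` argument to `gemma3:12b` or `gemma3:27b` once those are pulled; the endpoint and payload shape stay the same.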
Try in Google AI Studio (no setup)
If you want to experiment without any local setup, use Google AI Studio to try Gemma models directly in the browser. This is useful for testing prompts and capabilities before committing to a local deployment.
Fine tune for your use case
Once you have selected a base model, fine-tune it on your own data using LoRA or QLoRA for maximum customization. Hugging Face PEFT, Unsloth, and Keras all support Gemma fine-tuning. The 4B and smaller models are particularly accessible for fine-tuning on consumer GPUs.
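A quick back-of-envelope calculation shows why LoRA makes fine-tuning so cheap: for each adapted weight matrix, it trains only two thin low-rank factors instead of the full matrix. The 4096x4096 layer size and rank r=16 below are illustrative choices, not Gemma's actual dimensions.

```python
# LoRA replaces updates to a d_out x d_in matrix with two low-rank
# factors: B (d_out x r) and A (r x d_in). Sizes here are illustrative.

def lora_trainable(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds for one weight matrix."""
    return r * d_in + d_out * r

full = 4096 * 4096                         # dense layer: 16,777,216 weights
lora = lora_trainable(4096, 4096, r=16)    # 131,072 trainable weights
ratio = lora / full                        # well under 1% of the full matrix
```

Since only the adapter weights need gradients and optimizer state, the memory cost of fine-tuning drops by a similar factor, which is what makes consumer-GPU fine-tuning of the 4B and smaller models practical.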
Deep dive
Detailed guides with comparisons, tips, and visuals for each feature.
Model Sizes and Hardware Requirements
Comparison of all Gemma 3 sizes from 270M to 27B, including VRAM requirements, benchmarks, and guidance on when to use each.
The Gemma Ecosystem: Specialized Variants and Community
The expanding universe of Gemma variants: MedGemma, CodeGemma, PaliGemma, ShieldGemma, and the thriving community of 60,000+ fine-tunes.
Gemma vs Other Open Models
How Gemma compares to Llama, DeepSeek, Mistral, Qwen, and Phi across licensing, performance, ecosystem, and hardware requirements.
Links
Similar Tools
DeepSeek
chatbot · by DeepSeek
Frontier AI models that are completely free to chat with, the cheapest to build on, and fully open source under the MIT license.
Kimi
chatbot · by Moonshot AI
Powerful AI assistant with Agent Swarm, 256K context, visual coding, and plans from free to $199/mo.
Gemini
chatbot · Google's multimodal AI chatbot with the deepest ecosystem integration and the largest context window (the amount of text an AI can process at once)
Last updated: 2026-02-21