Google Gemma
open-sourceby google
Google DeepMind's open weight LLM family built from the same research as Gemini, now in its fourth generation with sizes from E2B to 31B parameters.
Key features
Free tier available
Developers wanting to run AI models locally on their own hardware without API dependencies or usage fees
Completely free to download and use, with no API fees or usage limits when self hosted
What it does
Open Weights
Download the full model weights and run Gemma anywhere: your laptop, a cloud server, a mobile device, or an edge appliance. No API dependency, no usage fees, complete control.
Learn moreMultimodal Input (Text, Image, Video, Audio)
Gemma 4 models natively process text, images, and video with variable resolution support. The E2B and E4B edge models also feature native audio input for speech recognition and understanding.
Up to 256K Token Context Window
Process long documents, codebases, and extended conversations with a 256K token context window on the 26B and 31B models, and 128K on the E2B and E4B edge models.
140+ Language Support
Gemma 4 supports over 140 languages out of the box, making it one of the most multilingual open model families available.
Quantization Aware Training
Gemma checkpoints are trained with quantization awareness, meaning the models maintain high quality even when compressed to lower bit formats (INT4, INT8) for efficient deployment.
Learn moreFunction Calling and Agentic Workflows
Gemma 4 features native support for function calling, structured JSON output, and system instructions, enabling you to build autonomous agents that interact with tools and APIs and execute multi step workflows reliably.
Runs on Consumer Hardware
Gemma 4 models are sized to run and fine tune efficiently on hardware ranging from billions of Android devices to laptop GPUs and developer workstations. Quantized versions of the 26B and 31B run natively on consumer GPUs.
Learn moreGemma 4 E2B and E4B for Mobile and On Device
Edge optimized models that activate an effective 2 billion and 4 billion parameter footprint during inference to preserve RAM and battery life. They support text, image, video, and audio input, run completely offline with near zero latency on phones, Raspberry Pi, and NVIDIA Jetson Orin Nano.
Specialized Variants
A growing ecosystem of purpose built models: MedGemma (medical), CodeGemma (code generation), PaliGemma (vision and language), ShieldGemma (safety classification), FunctionGemma (on device function calling and tool use), and TranslateGemma (translation).
Learn moreAvailable Everywhere
Download from Hugging Face, run via Ollama, experiment on Kaggle, or use through Google AI Studio and Vertex AI. Gemma integrates into every major ML ecosystem and inference framework.
Fine Tuning Friendly
All Gemma models support LoRA, QLoRA, and full parameter fine tuning. Gemma 4 can be fine tuned on platforms ranging from Google Colab and Vertex AI to consumer gaming GPUs, with day one support from Unsloth, Keras, and Hugging Face TRL.
Learn morePricing
Open Weight
Gemma is completely free to download and run. All model weights are available on Hugging Face, Kaggle, and other platforms at no cost. You only pay for the compute hardware you choose to run the models on, whether that is your own laptop, a cloud GPU, or a managed inference service.
- All model sizes (E2B, E4B, 26B, 31B) free to download
- Full model weights on Hugging Face, Kaggle, and Ollama
- Run locally via Ollama, llama.cpp, vLLM, LM Studio, or any compatible framework
- Apache 2.0 license with full commercial use and open source flexibility
- Fine tuning and modification allowed
- No API fees or usage limits when self hosted
- Community of 100,000+ variants and fine tunes
Pros & Cons
Pros
- Completely free to download and use, with no API fees or usage limits when self hosted
- Runs on consumer hardware: quantized Gemma 4 models run natively on consumer GPUs, and E2B/E4B run on smartphones and edge devices
- Up to 256K token context window on Gemma 4 larger models (128K on edge models), competitive with much larger proprietary models
- Multimodal input (text, image, video) across all Gemma 4 sizes, plus native audio on E2B/E4B edge models
- Massive community with 400M+ downloads and 100,000+ fine tuned variants on Hugging Face
- Specialized variants for medical, safety, code, vision, and translation tasks, ready to use out of the box
- Quantization aware trained checkpoints maintain high quality even at INT4 precision, maximizing hardware efficiency
- Built from the same research as Gemini, delivering frontier quality relative to model size
Cons
- Gemma 4 is Apache 2.0, but older Gemma 3 and earlier models remain under the more restrictive Gemma Terms of Use, so check which generation you are using
- Smaller than proprietary Gemini models (31B max vs Gemini's much larger architectures), so raw capability has a ceiling
- As an open weight model you run locally, there is no live data access, so responses are limited to the model's training data cutoff
- Audio input is only available on the E2B and E4B edge models; the 26B and 31B models do not support native audio
- Requires technical setup to run locally: downloading models, configuring inference frameworks, and managing hardware resources
- The E2B edge model is limited in reasoning capability compared to the larger sizes and is best suited for simple on device tasks or fine tuned applications
How to get started
Choose your model size
Select the Gemma 4 variant that fits your hardware. The E2B and E4B models run on smartphones, Raspberry Pi, and edge devices. The 26B MoE model (4B active parameters) focuses on inference speed and fits on consumer GPUs. The 31B Dense model maximizes quality and fits on a single 80GB GPU in bfloat16, or on consumer GPUs when quantized.
Run with Ollama (easiest path)
Install Ollama, then run 'ollama run gemma4' to start chatting with Gemma 4 locally. Ollama handles downloading, quantization, and serving automatically. Choose the model variant that fits your hardware from the Ollama library.
Try in Google AI Studio (no setup)
If you want to experiment without any local setup, use Google AI Studio to try Gemma 4 31B and 26B directly in the browser. This is useful for testing prompts and capabilities before committing to a local deployment.
Fine tune for your use case
Once you have selected a base model, fine tune it on your own data using LoRA or QLoRA for maximum customization. Hugging Face TRL, Unsloth, and Keras all support Gemma 4 fine tuning. You can fine tune on platforms from Google Colab and Vertex AI to consumer gaming GPUs.
Deep dive
Detailed guides with comparisons, tips, and visuals for each feature.
Model Sizes and Hardware Requirements
Comparison of all Gemma 4 sizes from E2B to 31B, including hardware requirements, benchmarks, and guidance on when to use each.
The Gemma Ecosystem: Specialized Variants and Community
The expanding universe of Gemma variants: MedGemma, CodeGemma, PaliGemma, ShieldGemma, and the thriving community of 100,000+ fine tunes.
Gemma vs Other Open Models
How Gemma compares to Llama, DeepSeek, Mistral, Qwen, and Phi across licensing, performance, ecosystem, and hardware requirements.
Links
Similar Tools
DeepSeek
deepseek
chatbotFrontier AI models (V4 Pro and V4 Flash) that are completely free to chat with, the cheapest to build on, and fully open source under the MIT license.
Kimi
moonshot-ai
chatbotPowerful AI assistant with Agent Swarm, 256K context, multimodal vision, visual coding, and plans from free to $199/mo. Powered by K2.6 (and K2.5), built by the fastest decacorn in AI.
Gemini
Google's multimodal AI chatbot with the deepest ecosystem integration and largest context window (the amount of text AI can process at once)
Get notified about updates
We'll email you when this tool's pricing or features change.
Last updated: 2026-06-01