Google Gemma

Open weights

By Google

Google DeepMind's open-weight LLM family, built from the same research as Gemini and available in sizes from 270M to 27B parameters.

Key features

Open Weights
Multimodal Input (Text + Image)
128K Token Context Window
140+ Language Support
Quantization-Aware Training
Function Calling
Pricing

Free tier available

Best For

Developers wanting to run AI models locally on their own hardware without API dependencies or usage fees

Verdict

Completely free to download and use, with no API fees or usage limits when self-hosted

What it does

Open Weights

Download the full model weights and run Gemma anywhere: your laptop, a cloud server, a mobile device, or an edge appliance. No API dependency, no usage fees, complete control.

Multimodal Input (Text + Image)

Gemma 3 models at 4B parameters and above accept both text and image inputs, enabling visual question answering, image captioning, document understanding, and multimodal reasoning.

128K Token Context Window

Process long documents, codebases, and extended conversations with a 128K token context window across all Gemma 3 sizes.

140+ Language Support

Gemma 3 supports over 140 languages out of the box, making it one of the most multilingual open model families available.

Quantization-Aware Training

Gemma 3 checkpoints are trained with quantization awareness, meaning the models maintain high quality even when compressed to lower bit formats (INT4, INT8) for efficient deployment.
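
To see why lower-bit formats matter, the memory needed for the weights alone scales with bits per parameter. A back-of-the-envelope sketch (the 1.2x overhead factor is an assumption covering quantization scales and runtime buffers, not an official figure):

```python
def weight_memory_gb(n_params: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough estimate of memory for model weights alone.

    `overhead` is an assumed fudge factor for quantization scales,
    embeddings kept at higher precision, and runtime buffers.
    """
    bytes_needed = n_params * bits_per_weight / 8
    return bytes_needed * overhead / 1e9

# A 4B-parameter model at different precisions (approximate):
for label, bits in (("BF16", 16), ("INT8", 8), ("INT4", 4)):
    print(f"{label}: ~{weight_memory_gb(4e9, bits):.1f} GB")
```

These raw-weight estimates undershoot real file sizes (the 3.4GB Q4 figure quoted elsewhere on this page includes format metadata and higher-precision layers), but the scaling is the point: INT4 cuts the footprint to roughly a quarter of BF16.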

Function Calling

Gemma 3 supports structured function calling, enabling the model to invoke external tools, APIs, and databases as part of its response workflow.
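
Function calling typically works by describing your tools to the model, then parsing a structured call out of its reply and executing it. A minimal sketch of the dispatch side, with a hand-written stand-in for the model's output (the exact call format Gemma emits depends on your inference framework and prompt template; the tool name and registry here are hypothetical):

```python
import json

# Hypothetical tool registry: tool name -> callable
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def dispatch(model_reply: str):
    """Parse a JSON function call from the model's reply and execute it."""
    call = json.loads(model_reply)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Stand-in for what the model might emit after seeing the tool schema:
reply = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch(reply))
```

In a real workflow the tool's return value is fed back to the model as context so it can compose a final natural-language answer.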

Runs on Consumer Hardware

The Gemma 3 4B model fits in just 3.4GB at Q4 quantization, making it feasible to run on laptops, desktops, and even smartphones without a dedicated GPU.

Gemma 3n for Mobile and On-Device Use

A specialized variant designed for mobile and edge deployment, offering E2B and E4B configurations with audio input support. Optimized for on-device inference with minimal resource consumption.

Specialized Variants

A growing ecosystem of purpose-built models: MedGemma (medical), CodeGemma (code generation), PaliGemma (vision and language), ShieldGemma (safety classification), FunctionGemma (tool use), and TranslateGemma (translation).

Available Everywhere

Download from Hugging Face, run via Ollama, experiment on Kaggle, or use through Google AI Studio and Vertex AI. Gemma integrates into every major ML ecosystem and inference framework.

Fine Tuning Friendly

All Gemma models support LoRA, QLoRA, and full-parameter fine-tuning. The smaller sizes (270M, 1B, 4B) are particularly well suited for custom fine-tuning on consumer hardware.

Pricing

Open Weight

Free

Gemma is completely free to download and run. All model weights are available on Hugging Face, Kaggle, and other platforms at no cost. You only pay for the compute hardware you choose to run the models on, whether that is your own laptop, a cloud GPU, or a managed inference service.

  • All model sizes (270M to 27B) free to download
  • Full model weights on Hugging Face and Kaggle
  • Run locally via Ollama, llama.cpp, vLLM, or any compatible framework
  • Commercial use permitted under Gemma Terms of Use
  • Fine-tuning and modification allowed
  • No API fees or usage limits when self-hosted
  • Community of 60,000+ variants and fine-tunes

Pros & Cons

Pros

  • Completely free to download and use, with no API fees or usage limits when self-hosted
  • Runs on consumer hardware: the 4B model fits in 3.4GB at Q4, feasible on laptops and smartphones
  • 128K token context window across all Gemma 3 sizes, competitive with much larger proprietary models
  • Multimodal input (text and image) starting at the 4B parameter size
  • Massive community with 100M+ downloads and 60,000+ fine-tuned variants on Hugging Face
  • Specialized variants for medical, safety, code, vision, and translation tasks, ready to use out of the box
  • Quantization-aware-trained checkpoints maintain high quality even at INT4 precision, maximizing hardware efficiency
  • Built from the same research as Gemini, delivering frontier quality relative to model size

Cons

  • Not Apache 2.0: the Gemma Terms of Use allow commercial use but include restrictions on certain applications, so review the license carefully
  • Smaller than proprietary Gemini models (27B max vs Gemini's much larger architectures), so raw capability has a ceiling
  • Knowledge cutoff of August 2024, meaning the model has no awareness of events after that date
  • No native audio input or output support (except Gemma 3n, which supports audio input only)
  • Requires technical setup to run locally: downloading models, configuring inference frameworks, and managing hardware resources
  • The smallest sizes (270M, 1B) are limited in reasoning capability and best suited for simple tasks or fine tuned applications

How to get started

1. Choose your model size

Select the Gemma 3 variant that fits your hardware. The 4B model (3.4GB at Q4) works on most modern laptops. The 12B model needs a dedicated GPU with 8GB+ VRAM. The 27B model requires a high-end GPU with 16GB+ VRAM. The 270M and 1B models run on virtually any device, including smartphones.
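
The sizing guidance above can be condensed into a small helper. The thresholds come from this guide and should be treated as rough rules of thumb; real requirements vary with quantization level and context length:

```python
def pick_gemma_size(vram_gb: float) -> str:
    """Suggest a Gemma 3 size for the available (V)RAM, per the rough
    guidance above: 27B wants 16GB+, 12B wants 8GB+, 4B runs in about
    3.4GB at Q4, and the 1B/270M models run almost anywhere."""
    if vram_gb >= 16:
        return "gemma3:27b"
    if vram_gb >= 8:
        return "gemma3:12b"
    if vram_gb >= 4:
        return "gemma3:4b"
    return "gemma3:1b"

print(pick_gemma_size(24))  # high-end GPU
print(pick_gemma_size(6))   # typical laptop
```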

2. Run with Ollama (easiest path)

Install Ollama, then run 'ollama run gemma3:4b' to start chatting with Gemma 3 locally. Ollama handles downloading, quantization, and serving automatically. For the 12B model, use 'ollama run gemma3:12b'. For 27B, use 'ollama run gemma3:27b'.
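
Beyond the CLI, Ollama exposes a local REST API (by default on port 11434), so you can call Gemma from your own code. A sketch that builds the request; the actual call is left commented out because it needs a running Ollama server:

```python
import json
import urllib.request

payload = {
    "model": "gemma3:4b",
    "prompt": "Explain quantization in one sentence.",
    "stream": False,  # ask for a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once an Ollama server is running locally:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
print(req.full_url)
```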

3. Try in Google AI Studio (no setup)

If you want to experiment without any local setup, use Google AI Studio to try Gemma models directly in the browser. This is useful for testing prompts and capabilities before committing to a local deployment.

4. Fine-tune for your use case

Once you have selected a base model, fine-tune it on your own data using LoRA or QLoRA for maximum customization. Hugging Face PEFT, Unsloth, and Keras all support Gemma fine-tuning. The 4B and smaller models are particularly accessible for fine-tuning on consumer GPUs.
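
To see why LoRA makes fine-tuning tractable on consumer GPUs, compare trainable parameter counts: instead of updating a full d_out x d_in weight matrix, LoRA trains two low-rank factors of shapes d_out x r and r x d_in. A back-of-the-envelope calculation (the hidden size below is illustrative, not Gemma's actual configuration):

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter on one weight matrix."""
    return d_out * r + r * d_in

d = 4096          # hypothetical hidden size
full = d * d      # full fine-tuning trains every entry of the matrix
lora = lora_params(d, d, r=16)

print(f"full: {full:,} params, LoRA r=16: {lora:,} params "
      f"({100 * lora / full:.2f}% of full)")
```

At rank 16 the adapter trains well under 1% of the matrix's parameters, which is why adapter weights and optimizer state fit comfortably in consumer VRAM.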

Deep dive

Detailed guides with comparisons, tips, and visuals for each feature.

Last updated: 2026-02-21