Gemma 3n: Open Source Multimodal AI

Łukasz Grochal

Google has released Gemma 3n, an open source multimodal large language model built to run efficiently on consumer hardware and distributed through Hugging Face. Available in two compact sizes, E2B and E4B, it runs in roughly 2–4 GB of GPU memory while delivering strong performance. Gemma 3n natively handles text, images, audio, and video through an integrated MobileNet vision encoder, a USM-based audio encoder, and a MatFormer (Matryoshka Transformer) backbone. Per-layer embeddings and KV-cache sharing keep memory use low, and the E4B variant scores above 1300 on LMArena, a first for a model under 10 billion parameters.
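The MatFormer idea can be illustrated with a toy sketch (plain Python, not Google's implementation, with made-up sizes): because training optimizes nested prefixes of each feed-forward layer jointly, a smaller sub-model such as E2B can be sliced directly out of the larger E4B weights without retraining.

```python
# Toy illustration of MatFormer-style nested sub-models. All names and
# dimensions here are hypothetical; real Gemma 3n layers are far larger.

def make_ffn(d_model, d_ff):
    """A dense feed-forward weight matrix W1 (d_ff x d_model) as nested
    lists, filled with arbitrary deterministic values."""
    return [[(i * d_model + j) % 7 / 7.0 for j in range(d_model)]
            for i in range(d_ff)]

def extract_submodel(w1, sub_d_ff):
    """MatFormer-style extraction: keep only the first sub_d_ff hidden
    rows. Because nested prefixes are trained jointly, the prefix is
    itself a usable, smaller feed-forward layer."""
    return w1[:sub_d_ff]

full = make_ffn(d_model=8, d_ff=32)   # stand-in for an E4B-sized layer
small = extract_submodel(full, 16)    # stand-in for an E2B-sized layer

assert len(small) == 16
assert small == full[:16]             # the small model is literally nested
```

The same slicing picture also explains why intermediate sizes between E2B and E4B are possible: any prefix width of the jointly trained layer yields a valid sub-model.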

Deeply integrated with Hugging Face’s open ecosystem, it supports straightforward fine-tuning, deployment, and community contributions. This release highlights a move toward high-quality, accessible, open source AI for developers and researchers on everyday devices.

References
1. Hugging Face (huggingface.co)
2. DeepMind (deepmind.google)