Microsoft Maia 200 Rivals Google TPU, xAI Chips

Author: Łukasz Grochal

Microsoft's Maia 200 is the latest in-house AI accelerator from Azure, aimed at powering large-model inference while reducing dependence on Nvidia. Built on TSMC's 3nm process with over 100 billion transistors, it packs 216GB of HBM3E memory at 7 TB/s of bandwidth plus 272MB of on-chip SRAM. On performance, it hits 10,145 teraFLOPS in FP4 (four times Amazon Trainium 3's level), 5,072 teraFLOPS in FP8 (beating Google's TPU v7), and 1,268 teraFLOPS in BF16, all within an 880W TDP. What sets it apart is a memory hierarchy that keeps model weights local to the accelerator, cutting the number of chips needed to serve very large models, plus Ethernet-based networking at 400 Gb/s for straightforward scaling across racks. It's rolling out first in US data centers for synthetic data generation, reinforcement learning, Microsoft Foundry, and 365 Copilot, with Microsoft claiming 30% better performance per dollar than its previous hardware.
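
To put those headline numbers in perspective, here's a quick back-of-envelope sketch using only the figures quoted above: performance per watt at each precision, and the roofline-style arithmetic intensity (FLOPs per byte of HBM traffic) a workload would need to stay compute-bound. The calculation is purely illustrative, not a Microsoft benchmark or methodology.

```python
# Back-of-envelope arithmetic from the Maia 200 figures quoted in this article.
# These are published headline numbers, not measured results.

specs = {
    "fp4_tflops": 10_145,
    "fp8_tflops": 5_072,
    "bf16_tflops": 1_268,
    "hbm_bw_tbs": 7.0,   # HBM3E bandwidth, TB/s
    "hbm_gb": 216,
    "sram_mb": 272,
    "tdp_w": 880,
}

# Performance per watt at each precision (teraFLOPS per watt).
for prec in ("fp4", "fp8", "bf16"):
    tflops = specs[prec + "_tflops"]
    print(f"{prec.upper():>4}: {tflops / specs['tdp_w']:.1f} TFLOPS/W")

# Arithmetic intensity needed to saturate compute rather than HBM bandwidth
# (FLOPs per byte of memory traffic): peak_compute / memory_bandwidth.
intensity = {
    prec: specs[prec + "_tflops"] * 1e12 / (specs["hbm_bw_tbs"] * 1e12)
    for prec in ("fp4", "fp8", "bf16")
}
print(intensity)  # e.g. FP8 needs roughly 725 FLOPs per byte to be compute-bound
```

At ~11.5 FP4 teraFLOPS per watt, the efficiency story is plausible on paper, but real-world inference throughput will depend on how well the 272MB SRAM and weight-local memory layout keep the compute units fed.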

Google's no slouch; their TPU v7 (used for Gemini 3 training) offers solid FP8 at 4,614 teraFLOPS and 192GB HBM3E, though Maia edges it in low-precision inference. Amazon's Trainium 3 lags in FP4/FP8 with lower bandwidth. Elon Musk's xAI is jumping in too, developing custom 3nm chips like X1 for inference while stockpiling Nvidia gear amid shortages.
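
For a rough side-by-side, the sketch below uses only the numbers cited in this article; the Trainium 3 FP4 figure is derived from the "four times" claim rather than an official AWS spec, and bandwidth figures for the competing chips aren't given here.

```python
# Cross-vendor comparison using only figures quoted in this article.
# Trainium 3's FP4 estimate is inferred from the stated 4x ratio, not an AWS spec.

maia_fp8, tpu_v7_fp8 = 5_072, 4_614       # teraFLOPS, as cited above
maia_fp4 = 10_145
trainium3_fp4_est = maia_fp4 / 4          # derived from the article's ratio

print(f"Maia 200 FP8 advantage over TPU v7: {maia_fp8 / tpu_v7_fp8 - 1:.1%}")
print(f"Estimated Trainium 3 FP4: ~{trainium3_fp4_est:,.0f} teraFLOPS")
print(f"HBM capacity, Maia 200 vs TPU v7: {216 / 192:.2f}x (216GB vs 192GB)")
```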

Maia 200 positions Microsoft to compete head-on in the hyperscaler chip race, with a focus on efficiency within its own ecosystem rather than selling chips broadly. Nodes with four Maia 200 chips each are already live, promising lower total cost of ownership (TCO).