Baidu’s Multimodal ERNIE AI: A Real Challenger to GPT-5

Baidu ERNIE-4.5-VL-28B-A3B-Thinking
Author: Łukasz Grochal

Baidu has just rolled out a new open-source multimodal AI model, ERNIE-4.5-VL-28B-A3B-Thinking, and it’s already making waves in the tech world. Designed for businesses and developers, the model handles text, images, diagrams, and even video analysis within a unified framework. Unlike many mainstream AIs, ERNIE is lightweight in operation: as the "28B-A3B" in its name suggests, it activates only about 3 billion of its roughly 28 billion parameters for each token it processes, which makes it notably efficient for its size. On benchmarks reported by Baidu, it beats heavy-hitters like GPT-5 and Gemini 2.5 Pro in areas like visual reasoning, chart and diagram interpretation, and multi-document analysis.
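To put the efficiency claim in perspective, here is a rough back-of-the-envelope calculation. It assumes the round figures implied by the model name (about 28 billion total parameters, about 3 billion active); the exact counts may differ, and the variable names are purely illustrative.

```python
# Rough figures read off the model name "28B-A3B" (assumed, not exact counts).
TOTAL_PARAMS = 28e9   # approximate total parameter count
ACTIVE_PARAMS = 3e9   # approximate parameters activated at a time


def active_fraction(active: float, total: float) -> float:
    """Return the share of the model's parameters that is actually in use."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share: {fraction:.1%}")  # prints "Active share: 10.7%"
```

In other words, only around a tenth of the model does work at any given moment, which is where the compute savings come from.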

Key enterprise applications include extracting structured information from complex visuals, like surveillance footage or technical schematics, and managing tool use for automation tasks. With its Apache 2.0 license, the model is free for commercial use, and Baidu offers deployment kits for customization on proprietary data. While it requires significant hardware, such as an 80GB GPU card, ERNIE is pitched as a production-ready, agentic AI that shifts from just perceiving data to actually acting on it, marking a leap forward for multimodal AI in enterprise settings.
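The 80GB requirement can look surprising for a model that activates only about 3 billion parameters at a time. A plausible explanation is that all of the model's weights still have to sit in GPU memory, even if most are idle at any given moment. The sketch below is an estimate under stated assumptions, not a figure from Baidu: it assumes roughly 28 billion parameters stored as 16-bit values (2 bytes each) and counts only the weights, ignoring activations and caches.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Estimate the memory needed to hold the weights alone, in gigabytes."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 28e9    # assumed: approximate total parameter count
BYTES_PER_PARAM = 2    # assumed: 16-bit weights
GPU_MEMORY_GB = 80     # the card size mentioned in the article

weights_gb = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM)
headroom_gb = GPU_MEMORY_GB - weights_gb

print(f"Weights: ~{weights_gb:.0f} GB, headroom: ~{headroom_gb:.0f} GB")
# prints "Weights: ~56 GB, headroom: ~24 GB"
```

Under these assumptions the weights alone take up most of a single 80GB card, with the remainder left for images, video frames, and long contexts. Lower-precision formats would shrink the footprint, though the article does not say which formats are supported.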