Baidu’s Multimodal ERNIE AI: A Real Challenger to GPT-5

Łukasz Grochal

Baidu has released a new open-source multimodal AI model, ERNIE-4.5-VL-28B-A3B-Thinking, and it is already drawing attention in the tech world. Designed for businesses and developers, the model handles text, images, diagrams, and even video analysis within a unified framework. Unlike many mainstream models, ERNIE is lightweight in operation: as the "28B-A3B" in its name indicates, it activates only about 3 billion of its 28 billion parameters at a time, making it notably efficient for its size. On reported benchmarks, it beats heavy hitters like GPT-5 and Gemini 2.5 Pro in areas such as visual reasoning, chart and diagram interpretation, and multi-document analysis.

Key enterprise applications include extracting structured information from complex visuals, such as surveillance footage or technical schematics, and managing tool use for automation tasks. With its Apache 2.0 license, the model is free for commercial use, and Baidu offers deployment kits for customization on proprietary data. Although it requires significant hardware, such as an 80GB GPU card, ERNIE is pitched as a production-ready, agentic AI that shifts from merely perceiving data to acting on it, marking a leap forward for multimodal AI in enterprise settings.
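
To put the efficiency and hardware figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the weights are stored at 16-bit precision (2 bytes per parameter), which the article does not state, and it ignores activations, the key-value cache, and framework overhead. The helper `weight_memory_gb` is a hypothetical name used only for illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed just to hold the weights, in GB (1 GB = 1e9 bytes).

    Assumption: 2 bytes per parameter (16-bit weights); real deployments may differ.
    """
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 28e9   # the "28B" in the model name
ACTIVE_PARAMS = 3e9   # the "A3B" in the model name

# Share of parameters used at a time versus the full model
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# All weights generally need to be resident in memory, not just the active ones
total_gb = weight_memory_gb(TOTAL_PARAMS)

print(f"Active share of parameters: {active_fraction:.1%}")
print(f"Weights at 2 bytes per parameter: {total_gb:.0f} GB")
```

Under these assumptions, only about a tenth of the parameters are active at once, while the full set of weights alone would occupy roughly 56 GB. That is one plausible reason an 80GB card is mentioned, although the actual requirement depends on precision and runtime settings.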
