Qwen3 Max Thinking: New Contender in Deep Reasoning

Łukasz Grochal

Qwen3 Max Thinking is Alibaba Cloud’s new high end reasoning model that aims to compete directly with US flagships like Gemini 3 Pro and GPT 5.2 on demanding benchmarks such as Humanity’s Last Exam (HLE). Instead of just sampling lots of answers and picking the best, it uses a heavier “test time scaling” mode that reuses previous reasoning steps, detects dead ends early and pushes compute only into the hardest parts of a problem. In practice this means multi round, experience based thinking that delivers strong gains on tough tests like GPQA (from 90.3 to 92.8) and LiveCodeBench v6 (from 88.0 to 91.4). On HLE with web search tools the model reaches 49.8 percent accuracy, ahead of Gemini 3 Pro at 45.8 and GPT 5.2 Thinking at 45.5, which is notable on a “Google proof” graduate level exam designed specifically to measure reasoning rather than memorization.

Qwen3 Max Thinking can also mix deep reasoning with web search, memory and a code interpreter in one flow, which helps reduce hallucinations and makes it well suited to agentic workflows like complex analysis or coding with live data. Technically it plugs into existing OpenAI and Anthropic style APIs, and economically it is positioned as cheaper than many US models, with token prices below Gemini 3 Pro and GPT 5.2 plus separate, usage based fees for search strategies and tools. For enterprises that are open to using a Chinese provider, this combination of benchmark wins, flexible tooling and aggressive pricing makes Qwen3 Max Thinking a serious alternative to the big US systems, although national security and compliance rules will still keep some American customers on domestic models.

References
2 sources
01
qwen.aiQwen
02
venturebeat.comVenture Beat
DeepSeek V4‑Pro 1.6T‑Parameter AI Model Architecture

DeepSeek V4: 1M‑Token Context and Budget Frontier AI Models

Palantir Manifesto Graphic: AI Defense and Culture Clash

Palantir Manifesto Hits at Regressive Cultures and AI Shift

OpenAI ChatGPT Images 2.0 feature overview

OpenAI Updates ChatGPT Images With Better Text

Publishers Are Shutting Out Internet Archive

News Giants Block Wayback Machine Over AI Fears

Claude Design Launch: Brand-Aware AI Prototyping Image

Anthropic Launches Claude Design to Rival Figma Tools

Qwen3.6 Coding Agent Benchmarks Chart Visual

Exploring Qwen3.6: Coding Benchmarks and Speed

Palantier Dilemma Human Rights vs Sercurity

Europe's Palantir Boom Amid Sovereignty and Rights Fears

Project Glasswing: Anthropic Mythos Zero-Day Exploit Finder Art

Claude Mythos Leak Ignites Fears of Unstoppable AI Exploits

OpenRouter LLM Leaderboard April

Chinese AI Models Dominate OpenRouter Top Six in Token Usage

Claude Code’s Big npm Leak

Inside the Claude Code Leak and Anthropic’s Agent Design