Qwen3 Max Thinking: New Contender in Deep Reasoning

Author: Łukasz Grochal

Qwen3 Max Thinking is Alibaba Cloud’s new high end reasoning model that aims to compete directly with US flagships like Gemini 3 Pro and GPT 5.2 on demanding benchmarks such as Humanity’s Last Exam (HLE). Instead of just sampling lots of answers and picking the best, it uses a heavier “test time scaling” mode that reuses previous reasoning steps, detects dead ends early and pushes compute only into the hardest parts of a problem. In practice this means multi round, experience based thinking that delivers strong gains on tough tests like GPQA (from 90.3 to 92.8) and LiveCodeBench v6 (from 88.0 to 91.4). On HLE with web search tools the model reaches 49.8 percent accuracy, ahead of Gemini 3 Pro at 45.8 and GPT 5.2 Thinking at 45.5, which is notable on a “Google proof” graduate level exam designed specifically to measure reasoning rather than memorization.

Qwen3 Max Thinking can also mix deep reasoning with web search, memory and a code interpreter in one flow, which helps reduce hallucinations and makes it well suited to agentic workflows like complex analysis or coding with live data. Technically it plugs into existing OpenAI and Anthropic style APIs, and economically it is positioned as cheaper than many US models, with token prices below Gemini 3 Pro and GPT 5.2 plus separate, usage based fees for search strategies and tools. For enterprises that are open to using a Chinese provider, this combination of benchmark wins, flexible tooling and aggressive pricing makes Qwen3 Max Thinking a serious alternative to the big US systems, although national security and compliance rules will still keep some American customers on domestic models.