Qwen 3.5: Faster, Cheaper And More Multimodal AI

Łukasz Grochal

Qwen 3.5 is Alibaba’s new flagship family of open and hosted models aimed at “native multimodal agents” that can reason over text, images, and long videos while also acting as autonomous agents on phones and PCs. The main open‑weight version, Qwen3.5‑397B‑A17B, uses a sparse mixture‑of‑experts (MoE) architecture with 397B total parameters but only 17B active per token, which keeps inference cost relatively low while pushing performance close to or on par with frontier US models from OpenAI, Anthropic, and Google on many public benchmarks. On knowledge and reasoning benchmarks such as MMLU‑Pro and GPQA, and on instruction‑following tests such as IFBench, Qwen 3.5 matches or beats some current US models on specific tasks, although leadership still varies by metric.
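To put the 397B-total versus 17B-active figures in perspective, here is a small back-of-the-envelope sketch in Python. The two parameter counts come from the article; the "about 2 FLOPs per active weight per token" cost model is a common rule of thumb assumed here for illustration, not something Alibaba has published for this model, and the function names are hypothetical.

```python
# Parameter counts quoted in the article for Qwen3.5-397B-A17B.
TOTAL_PARAMS = 397e9
ACTIVE_PARAMS = 17e9


def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights that take part in processing one token."""
    return active / total


def approx_flops_per_token(active: float) -> float:
    """Rough forward-pass cost, ASSUMING ~2 FLOPs per active weight per token."""
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    sparse_cost = approx_flops_per_token(ACTIVE_PARAMS)
    # Hypothetical dense model of the same total size, for comparison only.
    dense_cost = approx_flops_per_token(TOTAL_PARAMS)
    print(f"active share per token: {frac:.1%}")
    print(f"approx. compute saving vs. same-size dense model: {dense_cost / sparse_cost:.1f}x")
```

Under these assumptions roughly 4% of the weights are exercised per token. All 397B parameters still have to be stored and served, so the saving applies to per-token compute rather than to memory footprint.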

The model is natively multimodal, with early fusion of text and vision, improved spatial reasoning, better OCR and document understanding, and support for videos up to roughly two hours, which enables use cases like GUI agents, video‑to‑code, and complex visual automation workflows. Qwen 3.5 also pushes hard on long‑context and agentic use: the cloud “Plus” variant exposes a 1M‑token context window with built‑in tools for web search and code execution, while reinforcement‑learning‑driven post‑training focuses on multi‑step agents that can plan, call tools and operate in realistic environments.

Compared with the earlier Qwen3 line, throughput has reportedly improved by up to about 8x for large workloads, and the base model’s knowledge and coding scores move into the same band as much larger systems like the 1T‑parameter Qwen3‑Max or leading closed Chinese competitors. On the ecosystem side, Alibaba continues to lean into open‑weights distribution via Hugging Face and ModelScope, and has expanded language coverage from 119 to 201 languages and dialects, which helps Qwen 3.5 function as a global open model standard rather than a China‑only stack. Media coverage highlights that this release tightens the gap with US leaders, especially in open‑source, even though some of the direct comparisons still use slightly earlier US model versions and real‑world deployment remains the key metric rather than download counts alone.

Overall, Qwen 3.5 positions Alibaba as one of the few players outside the US with a broadly competitive, multimodal, agent‑ready stack that is available both as open weights for local deployment and as a managed cloud service.

Key new features

  • Native multimodal input and output across text, images, documents and long videos, with stronger OCR, spatial reasoning and video understanding than the previous generation.
  • Hybrid MoE architecture (Gated DeltaNet plus sparse experts) that activates only about 17B parameters per token while maintaining 397B total, improving efficiency at near‑frontier capability.
  • Up to 1M‑token context window in the hosted Plus version, enabling book‑scale or multi‑hour session reasoning and more persistent agent workflows.
  • Expanded multilingual support to 201 languages and dialects, including many low‑resource tongues, and a larger 250k tokenizer vocabulary for faster encoding and decoding.
  • Strong gains on agent benchmarks like BFCL‑V4, DeepPlanning, Tool‑Decathlon and BrowseComp through large‑scale reinforcement learning in multi‑tool environments.
  • Open‑weight release on Hugging Face and ModelScope, with Alibaba positioning Qwen as a default open model family for developers who want both local and cloud options.
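For readers unfamiliar with how "sparse experts" keep only part of a model active, the following is a generic toy sketch of top-k expert routing in plain Python. It is not Qwen 3.5's actual router: the expert count, the value of k, the scalar "experts" and every function name here are assumptions made for illustration, and the article gives no such details.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts; renormalise their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    weights = route_top_k(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


if __name__ == "__main__":
    # Four toy "experts" acting on a single number (real experts are neural sub-networks).
    experts = [lambda v: v + 1, lambda v: 2 * v, lambda v: -v, lambda v: v * v]
    # Hypothetical router scores for one token: experts 1 and 3 tie for the top score.
    logits = [0.1, 2.0, -1.0, 2.0]
    print(route_top_k(logits, k=2))
    print(moe_layer(3.0, experts, logits, k=2))
```

Only the chosen experts are evaluated for a given token, which is why a model can hold far more parameters than it uses on any single forward step.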

How much it closes the US gap

  • Benchmarks published by Alibaba and secondary outlets suggest Qwen 3.5 now sits in roughly the same performance band as models like Gemini 3 Pro, Claude 4.5 and GPT‑5.2 on several reasoning, multilingual and instruction‑following suites, sometimes leading, sometimes trailing.
  • The biggest advances are in instruction following and agent benchmarks, where Qwen 3.5 reportedly sets or matches state‑of‑the‑art among widely reported systems, which strengthens China’s position in the high‑end open‑source segment.
  • At the same time, coverage stresses that many comparisons use earlier or specific variants of US models and that open‑model download metrics do not directly translate into production usage, so a full convergence with the US ecosystem cannot be assumed yet.
Qwen 3.5: Faster, Cheaper And More Multimodal AI | LucasGraphic