DeepSeek V4: 1M‑Token Context and Budget Frontier AI Models

Łukasz Grochal

DeepSeek V4 is a new family of large‑scale open‑weights AI models from the Chinese lab DeepSeek, released in April 2026 as a “preview” under the MIT license. It consists of two main variants: DeepSeek‑V4‑Pro (around 1.6 trillion total parameters, 49 billion active per token) and DeepSeek‑V4‑Flash (about 284 billion total parameters, 13 billion active). Both models support a 1 million‑token context window, use a Mixture‑of‑Experts (MoE) architecture, and are positioned as highly efficient, especially for long‑context workloads.
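The MoE figures quoted above imply that only a few percent of each model's parameters are active for any given token; a quick back-of-envelope check (numbers taken directly from the paragraph above):

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of an MoE model's parameters activated per token."""
    return active_params / total_params

# DeepSeek-V4-Pro: ~1.6T total, ~49B active per token
pro = active_fraction(1.6e12, 49e9)
# DeepSeek-V4-Flash: ~284B total, ~13B active per token
flash = active_fraction(284e9, 13e9)

print(f"V4-Pro:   {pro:.1%} of parameters active per token")
print(f"V4-Flash: {flash:.1%} of parameters active per token")
```

That roughly 3–5% activation rate is what makes trillion-scale totals compatible with the per-token compute of a far smaller dense model.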

New technical tricks, such as a hybrid “CSA + HCA” attention mechanism and ultra‑sparse KV‑cache handling, let DeepSeek compress computation and memory in million‑token runs, so that even the giant Pro model uses only a fraction of the FLOPs and KV‑cache of the earlier V3.2. This efficiency is why DeepSeek can price V4‑Flash at about 0.14 USD per million input tokens and 0.28 USD per million output tokens, and V4‑Pro at 1.74 / 3.48 USD per million tokens, making both significantly cheaper than comparable frontier models from OpenAI, Anthropic, and Google.
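At these list prices, the cost of a single long‑context call is easy to estimate; a minimal sketch (the request sizes are made up for illustration, the per‑million‑token prices are the ones quoted above):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price: float, out_price: float) -> float:
    """Cost of one request at per-million-token list prices (USD)."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical long-context request: 800k tokens in, 20k tokens out.
flash = request_cost_usd(800_000, 20_000, 0.14, 0.28)  # V4-Flash prices
pro = request_cost_usd(800_000, 20_000, 1.74, 3.48)    # V4-Pro prices

print(f"V4-Flash: ${flash:.4f}   V4-Pro: ${pro:.4f}")
```

Even a near‑full‑context request on Flash comes in at pennies, which is the practical meaning of the price gap the article describes.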

Benchmarks from third‑party sites such as Arena.ai and Vals AI put DeepSeek‑V4‑Pro among the top open‑source models, especially in code and agent‑style tasks. In some code‑arena rankings, V4‑Pro (thinking mode) lands in the top‑3 open‑source models and ahead of several closed‑source systems, while V4‑Flash scores over 10× higher than V3.2 on Vals’ Vibe Code Benchmark. Internal and external evaluations also suggest that V4‑Pro is roughly on par with leading closed‑source models like Gemini‑3.1‑Pro and GPT‑5‑ish systems in reasoning and math, though still a bit behind the absolute cutting‑edge, which DeepSeek itself frames as a 3–6‑month gap.

On the hardware side, DeepSeek emphasizes domestic Chinese compute: V4 is said to be the first trillion‑scale model trained on Huawei Ascend NPUs and other local chips, with fine‑grained expert‑parallelism optimizations that push inference speed up by roughly 1.5–1.7× on Ascend hardware. The Pro model’s raw size (over 1.6 trillion parameters) means it is aimed at data‑center‑scale deployments, while the smaller Flash variant is more suited for local or edge‑like stacks, with some enthusiasts expecting quantized Flash builds to run on high‑end laptops (for example, 128 GB RAM systems).

Controversies and debates around V4 are relatively mild so far: the main talking points are the price–performance jump versus U.S.‑style frontier models, questions about whether the long‑context features are practically useful for most users, and some skepticism from long‑time users who feel the day‑to‑day experience of Flash is not clearly better than V3.2. DeepSeek also openly warns that Pro‑tier throughput is still limited by high‑end compute availability, with hints that prices could drop further once newer Ascend 950 super‑node hardware rolls out in the second half of 2026.

| Model / variant | Approx. size (total params) | Active params per token | Context length | Input price / 1M tokens (USD) | Output price / 1M tokens (USD) | Notable code‑bench remarks |
|---|---|---|---|---|---|---|
| DeepSeek‑V4‑Pro | ~1.6 trillion | ~49B | 1M tokens | 1.74 | 3.48 | Top‑3 open‑source in code arena (thinking mode), ahead of several closed systems |
| DeepSeek‑V4‑Flash | ~284B | ~13B | 1M tokens | 0.14 | 0.28 | Lowest‑priced small‑scale model; 10× leap vs V3.2 on Vals Vibe Code |
| Gemini‑3.1‑Pro (closed) | undisclosed | undisclosed | up to 1M | ~2.00 | ~12.00 | Often used as reference for “top‑tier” closed code‑reasoning |
| GPT‑5.4 (closed, large‑tier) | undisclosed | undisclosed | long‑context | ~2.50 | ~15.00 | Benchmark baseline for advanced reasoning and coding |
| Claude‑Sonnet‑4.6 (closed) | undisclosed | undisclosed | long‑context | ~3.00 | ~15.00 | Frequently compared to V4 in agent‑style coding runs |
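Taking the list prices from the table above (closed‑model figures are the approximate values quoted there), the same workload can be costed across all five models; a rough sketch, with a made‑up 100k‑in / 5k‑out request:

```python
# Per-million-token list prices (USD): (input, output), from the table above.
PRICES = {
    "DeepSeek-V4-Flash": (0.14, 0.28),
    "DeepSeek-V4-Pro": (1.74, 3.48),
    "Gemini-3.1-Pro": (2.00, 12.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude-Sonnet-4.6": (3.00, 15.00),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request for the given model at its list prices."""
    in_p, out_p = PRICES[model]
    return input_tokens / 1e6 * in_p + output_tokens / 1e6 * out_p

# Same hypothetical workload for every model: 100k tokens in, 5k out.
for model in PRICES:
    print(f"{model:<20} ${cost(model, 100_000, 5_000):.4f}")
```

The ordering, not the exact dollar figures, is the point: at these prices both V4 variants undercut the closed models on an identical request.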

References
- DeepSeek API docs (api-docs.deepseek.com)