How DeepSeek Trains Powerful Models On A Budget

Łukasz Grochal

DeepSeek is pushing a new architecture idea called Manifold-Constrained Hyper-Connections, which tries to fix a very practical headache in large models: training becomes unstable and expensive once you start widening residual streams and adding fancy connectivity patterns. Classic residual connections are stable but rigid, while newer hyper-connection-style designs boost accuracy at the cost of instability, memory overhead, and poor scaling, which quickly translates into huge GPU and power bills. DeepSeek's trick is to constrain the residual mixing onto a specific mathematical manifold, using doubly stochastic matrices together with infrastructure-level optimizations, so signals stay well behaved even in deep, wide models.
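To see why doubly stochastic matrices help, note that a matrix whose rows and columns each sum to 1 mixes residual streams as convex combinations, so the total signal mass is preserved at every layer. The sketch below is a minimal illustration of that constraint using the classic Sinkhorn-Knopp normalization; it is not DeepSeek's actual implementation, and the 4-stream setup and function names are hypothetical.

```python
import numpy as np

def sinkhorn_doubly_stochastic(logits, n_iters=50):
    # Exponentiate so every entry is positive, then alternately
    # normalize rows and columns. Sinkhorn-Knopp iteration converges
    # to a doubly stochastic matrix (rows and columns each sum to 1).
    M = np.exp(logits)
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True)  # make rows sum to 1
        M /= M.sum(axis=0, keepdims=True)  # make columns sum to 1
    return M

# Hypothetical example: mix 4 residual streams of width 16.
# Because W is doubly stochastic, the mix is a convex combination,
# so activations neither blow up nor vanish as depth grows.
rng = np.random.default_rng(0)
W = sinkhorn_doubly_stochastic(rng.normal(size=(4, 4)))
streams = rng.normal(size=(4, 16))  # 4 residual streams, width 16
mixed = W @ streams                 # constrained residual mixing
```

In practice a learnable parameter matrix would play the role of `logits`, with the projection applied each forward pass so the mixing weights always stay on the doubly stochastic manifold.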

In tests on models from roughly 3 to 27 billion parameters, the framework showed better scaling and efficiency, hinting that you can get more capability per watt instead of relying only on massive clusters. That fits the broader story around DeepSeek, which already surprised the industry with its low-cost, reasoning-focused R1 family and keeps iterating up to the current 3.2 line while working on the flagship R2 model expected around Chinese New Year, a release many analysts think could again shake up the LLM leaderboard despite US export controls on advanced chips.
