Inside MiniMax M2.5: MoE Design, Speed and Cost

Łukasz Grochal

MiniMax M2.5 is a new frontier‑class large language model from MiniMax aimed at high‑end coding, agent and office workloads while keeping costs relatively low and speed very high. Built on a roughly 229B‑parameter Mixture‑of‑Experts (MoE) architecture with a context window of around 200k tokens, it emphasizes efficient task decomposition and fast tool‑using behavior rather than raw size. On coding benchmarks such as SWE‑Bench Verified it scores about 80%, runs roughly a third faster than the previous M2.1 generation, and lands in the same performance band as top commercial models such as Claude Opus and recent GPT‑series systems for code and agentic workflows.
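The efficiency claim rests on how MoE layers work: each token activates only a small subset of experts, so compute per token tracks the active parameters rather than the full 229B total. The snippet below is a minimal, generic sketch of top‑k expert routing; the hidden size, expert count and top‑k value are illustrative placeholders, not MiniMax's actual configuration.

```python
# Generic top-k Mixture-of-Experts routing sketch (illustrative only; the
# dimensions and expert counts are made up, not MiniMax M2.5's real config).
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64        # hidden size (illustrative)
N_EXPERTS = 8       # total experts in the layer
TOP_K = 2           # experts actually run per token

# Random stand-ins for trained router and expert weights.
router_w = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02
experts_w = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs.

    Only k of the n experts run per token, which is why an MoE model's
    per-token compute is far below what its total parameter count suggests.
    """
    logits = x @ router_w                          # (tokens, n_experts)
    top = np.argsort(-logits, axis=-1)[:, :TOP_K]  # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                       # softmax over chosen experts
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts_w[e])
    return out

tokens = rng.standard_normal((4, D_MODEL))
print(moe_layer(tokens).shape)                     # (4, 64)
```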

The “Lightning” serving variant reaches about 100 tokens per second and is priced at around 0.3 USD per million input tokens, with relatively low output pricing as well, which makes it attractive for always‑on agents, search pipelines and large‑scale automation. Compared with its competitors, M2.5 generally trades some general reasoning and multimodal breadth for strong coding, efficient tool calling and a good price‑to‑performance ratio, so it fits best where execution speed, cost and integration into complex workflows matter more than having the single most capable generalist model.
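Those numbers translate into concrete budgets fairly directly. The sketch below is a back‑of‑the‑envelope calculator assuming the quoted 0.3 USD per million input tokens and ~100 tokens per second; the output price used here is a placeholder assumption, since the article only describes output pricing as relatively low.

```python
# Rough cost and latency estimate for an always-on agent on the "Lightning"
# tier. Input price and throughput come from the article; the output price
# below is an assumed placeholder, not a quoted figure.
INPUT_PRICE_PER_M = 0.30    # USD per million input tokens (quoted)
OUTPUT_PRICE_PER_M = 1.20   # USD per million output tokens (assumption)
TOKENS_PER_SEC = 100        # quoted serving speed

def daily_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one day of traffic at the prices above."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M + \
           (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

def generation_time_s(output_tokens: int) -> float:
    """Rough wall-clock time to stream a completion at the quoted speed."""
    return output_tokens / TOKENS_PER_SEC

# Example: an agent pipeline that reads 50M tokens and writes 5M tokens a day.
print(f"daily cost: ${daily_cost_usd(50_000_000, 5_000_000):.2f}")
print(f"2k-token reply in ~{generation_time_s(2_000):.0f} s")
```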
