Kani TTS 2: Fast Open Voice Cloning on 3 GB VRAM

Łukasz Grochal

Kani TTS 2 is a compact open-source text-to-speech model that aims to bring realistic voice synthesis and cloning to regular consumer hardware rather than datacenter-class machines. With roughly 400 million parameters and 22 kHz output, it runs in about 3 GB of VRAM while still offering zero-shot voice cloning from a short reference clip, with support for English and early multilingual variants. The system is built around a token-based audio representation and a two-stage pipeline that separates semantic/acoustic token generation from final waveform decoding, which helps keep latency low enough for near-real-time conversation on modern GPUs. In practice, users report a real-time factor (synthesis time divided by audio duration) of around 0.2 on high-end cards, so generating 10 seconds of speech takes roughly 2 seconds, and typical VRAM usage stays slightly under the advertised 3 GB.
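To make the real-time factor figure concrete, here is a minimal Python sketch of the arithmetic. It is not part of Kani TTS 2 or any of its tooling: the function names are made up for illustration, and the only number taken from the article is the reported factor of about 0.2.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: time spent generating divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time needed to generate a clip at a given factor."""
    return audio_seconds * rtf


# Figure quoted in the article for high-end GPUs (user-reported, approximate).
REPORTED_RTF = 0.2

for clip_seconds in (5.0, 10.0, 30.0):
    estimate = synthesis_time(clip_seconds, REPORTED_RTF)
    print(f"{clip_seconds:.0f} s of speech: about {estimate:.1f} s to generate")
```

Any factor below 1.0 means audio is produced faster than it plays back, which is what makes conversational use plausible. Actual figures will vary with the GPU, clip length and software stack.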

The model also focuses on capturing accents and local speech styles, offering region-flavored English voices and the option to mimic a custom speaker without per-speaker fine-tuning. That makes it attractive for tools, personal assistants and content workflows that need distinctive but reproducible voices.

At the same time, its relatively small size means it will not match very large proprietary systems in extreme expressiveness or multilingual coverage, and its strong voice cloning raises the usual ethical concerns around consent, impersonation and deepfake-style misuse. It is better seen as a practical, well-engineered option in the open ecosystem than as a magic solution to every TTS problem.
