News Giants Block Wayback Machine Over AI Fears

Łukasz Grochal

Several large news publishers have recently started blocking the Internet Archive's crawlers, the bots that feed the Wayback Machine with snapshots of web pages. The Wayback Machine has saved over a trillion pages since the mid-1990s, helping journalists, researchers, and courts check the original versions of stories that are later edited or pulled. The main worry? AI companies might go through the archive to grab content for training models without permission, even when publishers block direct scrapers. Outlets such as The New York Times (NYT) added rules for bots such as archive.org_bot to their robots.txt files late last year, and at least 23 major sites, including USA Today and The Guardian, now do the same. An analysis of over 1,100 news sites found 241 blocking at least one Archive bot, most of them Gannett-owned.
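
To make the robots.txt mechanism concrete, here is a minimal Python sketch that checks whether a given robots.txt text disallows a crawler. The sample file contents, the `is_blocked` helper, and the example.com URL are assumptions for illustration only, not any publisher's actual configuration; only the `archive.org_bot` user-agent name comes from the reporting above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT copied from any real publisher:
# it shuts out archive.org_bot and allows every other crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/news/story") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("archive.org_bot", "SomeOtherBot"):
        print(bot, "blocked:", is_blocked(SAMPLE_ROBOTS_TXT, bot))
```

Keep in mind that robots.txt is a voluntary convention: it only affects crawlers that choose to honor it.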

Why is this happening? Publishers want to protect their intellectual property and stop AI companies from using their journalism to build competing tools. The NYT says it values human-led reporting and needs to ensure its content is accessed lawfully. The Guardian limited access to its articles after its logs showed the Archive among its top crawlers, fearing it could serve as a backdoor. The concern is not purely theoretical: some evidence shows AI firms have tapped archives before, though this has not always been proven for these sites. On the flip side, critics such as the Electronic Frontier Foundation (EFF) argue that blocking will not stop AI training but will erase web history, leaving gaps where quality news vanishes while junk sites stay archived. Internet Archive founder Brewster Kahle warns that it limits public access to the historical record, hurting efforts to counter information chaos.

The balance is tricky: publishers face real revenue threats from AI summaries that divert clicks, but blocking a nonprofit library risks losing future evidence of events. There is no easy fix yet; talks between the Archive and news outlets continue as the number of blocks grows. The result could be a skewed historical record, with big names missing from the snapshots.
