Anthropic: The Leak, The War, The Weapon

BuBBliK · @k1rallik · Mar 31

What does it look like when a $380 billion company wins a war with the Pentagon, survives the first autonomous AI cyberattack in history, leaks a secret model that terrifies its own creators - and accidentally ships its own source code to the public? It looks exactly like this. And the scariest part hasn't happened yet.

TODAY: ANTHROPIC LEAKED THEIR OWN CODE. AGAIN

March 31, 2026. Security researcher Chaofan Shou from blockchain firm Fuzzland opens the official Claude Code npm package and finds a file called cli.js.map sitting in plain sight. Size - 60 megabytes. Contents - the complete TypeScript source code of the entire product.

From that single file, anyone can reconstruct 1,906 internal source files. Internal API design, telemetry systems, encryption tools, security logic, plugin systems - everything. Downloadable as a zip directly from Anthropic's own R2 storage bucket. The post hit 754K views and nearly 1,000 retweets within hours. GitHub repos with the restored code appeared immediately.

A source map is a basic JavaScript debugging file. It should never ship inside a production package. This is not a sophisticated attack. This is Build Configuration 101 - the kind of thing you learn in week one.
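To see why that matters: a version 3 source map normally embeds every original file verbatim in its sourcesContent array, so "reconstructing 1,906 internal source files" is little more than writing that array back to disk. A minimal sketch of the idea (file names and output paths here are illustrative, not Anthropic's actual layout):

```typescript
// extract-sources.ts - illustrative sketch only; assumes a locally saved cli.js.map.
// A v3 source map lists original file paths in "sources" and (usually) their full
// contents in "sourcesContent". Writing the pairs back out recreates the source tree.
import { promises as fs } from "node:fs";
import * as path from "node:path";

interface SourceMapV3 {
  sources: string[];                  // original file paths
  sourcesContent?: (string | null)[]; // original file contents, if embedded
}

async function extractSources(mapFile: string, outDir: string): Promise<void> {
  const map: SourceMapV3 = JSON.parse(await fs.readFile(mapFile, "utf8"));
  for (let i = 0; i < map.sources.length; i++) {
    const content = map.sourcesContent?.[i];
    if (content == null) continue; // nothing embedded for this entry
    // strip bundler prefixes like "webpack://" and any leading slashes
    const rel = map.sources[i].replace(/^[\w-]+:\/\//, "").replace(/^\/+/, "");
    const dest = path.join(outDir, rel);
    await fs.mkdir(path.dirname(dest), { recursive: true });
    await fs.writeFile(dest, content, "utf8");
  }
}

extractSources("cli.js.map", "recovered-src").catch(console.error);
```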

You can browse the reconstructed code here: https://github.com/instructkr/claude-code

But here is what makes this genuinely insane: this already happened before.

February 2025 - just over a year earlier - the exact same leak, same file, same mistake. Anthropic deleted old versions from npm, removed the map, pushed a new release. Everyone moved on. And then version 2.1.88 shipped the file again.

A $380 billion company building the most powerful vulnerability-detection system on earth made the same elementary mistake twice in just over a year. No hackers. No sophisticated attack. Just a build process that doesn't work.
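The fix is just as unglamorous as the mistake. One common safeguard - and this is a hypothetical sketch, not Anthropic's actual pipeline - is a prepublish guard that asks npm exactly what it is about to ship and refuses to publish if a source map is in the list:

```typescript
// check-no-sourcemaps.ts - hypothetical prepublish guard, not Anthropic's real build.
// "npm pack --dry-run --json" reports the exact file list "npm publish" would ship.
import { execSync } from "node:child_process";

interface PackEntry { path: string }
interface PackReport { files: PackEntry[] }

const report: PackReport[] = JSON.parse(
  execSync("npm pack --dry-run --json", { encoding: "utf8" }),
);

const maps = report[0].files.filter((f) => f.path.endsWith(".map"));
if (maps.length > 0) {
  console.error("Refusing to publish - source maps found in the package:");
  for (const f of maps) console.error("  " + f.path);
  process.exit(1);
}
console.log("OK: no source maps in the publish set.");
```

Wired into a package's prepublishOnly script, this kind of check would stop the file before it ever reached npm.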

The irony is almost poetic. The AI that found 500 zero-day vulnerabilities in a single session. The model used to autonomously attack 30 organizations worldwide. And Anthropic shipped their own source code to anyone who bothered to look inside their npm package.

Two leaks. Five days apart. Both from basic config errors. Neither requiring any skill to exploit.

Anyone who knew where to look got it for free.

5 DAYS AGO: ANTHROPIC LEAKED A SECRET MODEL THAT SCARES ITS OWN CREATORS

March 26, 2026. Security researchers Roy Paz from LayerX Security and Alexandre Pauwels from Cambridge University discover that a CMS misconfiguration on Anthropic's website left roughly 3,000 internal files publicly accessible. Draft blog posts, PDFs, internal documents, presentations. Sitting open on an unsecured, searchable data store. No hacking required.

Inside: two versions of the same draft blog post, identical in every way except one thing - the model's name. "Mythos" in one. "Capybara" in the other. Anthropic was deciding between two names for the same secret project. The company confirmed: training is complete, the model is already being tested with early access customers.

This is not an Opus update. This is a new fourth tier - a model sitting above Opus entirely. Anthropic's own draft describes it as "larger and more intelligent than our Opus models - which were, until now, our most powerful." Dramatically better at coding, academic reasoning, and cybersecurity. A spokesperson called it "a step change" and "the most capable we've built to date."

But here is the thing that actually matters.

In the leaked draft, Anthropic describes their own model like this: it "poses unprecedented cybersecurity risks," is "far ahead of any other AI model in cyber capabilities," and "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."

Anthropic is publicly admitting they are afraid of their own product. In an official blog draft.

The market reacted immediately. CrowdStrike dropped 7%. Palo Alto Networks fell 6%. Zscaler down 4.5%. Okta and SentinelOne tumbled more than 7%. Tenable plummeted 9%. The iShares Cybersecurity ETF lost 4.5% in a single session. CrowdStrike alone lost roughly $15 billion in market cap in one day. Bitcoin slid back to $66,000. Investors read this as a death sentence for the entire cybersecurity industry.

Stifel analyst Adam Borg put it plainly: the model has "the potential to become the ultimate hacking tool, and one that can elevate any ordinary hacker into a nation-state adversary."

Why hasn't it launched publicly? Anthropic acknowledges Mythos is "very expensive to serve" and not ready for general release. The plan: first, a small group of cybersecurity partners get early access to harden their defenses. Then, gradual API expansion. The company is working on efficiency before any broad rollout.

But the model already exists. Is already being tested. And already crashed an entire sector of the stock market - just by accidentally becoming known.

Anthropic built a model it describes as the most dangerous AI for cybersecurity ever created. And lost control of the announcement through the exact kind of infrastructure misconfiguration their own model is designed to find.

MARCH 2026: ANTHROPIC WENT TO WAR WITH THE PENTAGON. AND WON

July 2025. Anthropic signs a $200 million contract with the Department of Defense. Standard deal. But when real negotiations began over deploying Claude on the military's GenAI.mil platform, everything broke down.

The Pentagon wanted unfettered access to Claude for "all lawful purposes" - including fully autonomous weapons and domestic mass surveillance of American citizens. Anthropic drew two hard lines and refused. Talks collapsed in September 2025.

Then the escalation started.

February 27, 2026 - Trump posts on Truth Social ordering all federal agencies to "IMMEDIATELY CEASE" use of Anthropic's technology. Calls the company "Radical Left."

March 5, 2026 - The Pentagon officially designates Anthropic a "supply chain risk." A label previously reserved exclusively for foreign adversaries - Chinese companies, Russian entities. Now applied to an American company from San Francisco. Amazon, Microsoft, and Palantir are all required to certify they don't use Claude in any military work.

The Pentagon's CTO Emil Michael explained the logic: Claude could "contaminate" the supply chain because different "policy preferences are baked into the model." Translation: an AI that refuses to help kill without restrictions is a national security threat.

March 26, 2026 - Federal Judge Rita Lin issues a 43-page ruling blocking the Pentagon entirely. Her words: "Nothing in the governing statute supports the Orwellian notion that an American company may be branded a potential adversary for expressing disagreement with the government. Punishing Anthropic for bringing public scrutiny to the government's position is classic illegal First Amendment retaliation." One amicus brief described the Pentagon's actions as "attempted corporate murder."

The government tried to destroy Anthropic. It made Anthropic famous instead. The Claude app passed ChatGPT in the App Store for the first time. Over one million new signups per day.

An AI company said no to the most powerful military on earth. And a judge agreed.

NOVEMBER 2025: THE FIRST AI-RUN CYBERATTACK IN HISTORY

November 14, 2025. Anthropic publishes a report that changes everything. A Chinese state-sponsored hacking group used Claude Code to autonomously attack 30 organizations - tech giants, banks, government agencies across multiple countries.

The split: humans chose targets and approved key decisions. That's it. 4-6 interventions per entire campaign. The AI handled everything else - reconnaissance, finding vulnerabilities, writing exploits, stealing data, creating backdoors. 80-90% of the attack. Thousands of requests, often several per second. A speed no human team could ever match.

How did they bypass Claude's safety guardrails? They didn't break them. They lied. They split the attack into small innocent tasks and convinced Claude it was a legitimate security firm doing "authorized defensive testing." Social engineering - except the victim was the AI.

Several attacks fully succeeded. Claude autonomously mapped entire network topologies, found databases, and extracted data without a single human instruction.

The only thing that slowed them down? Claude occasionally hallucinated - making up credentials, claiming to steal documents that were already public. For now, that's one of the last real barriers to fully autonomous cyberattacks.

At RSAC 2026, former NSA cybersecurity chief Rob Joyce called it "a Rorschach test" for the security world. Half the room dismissed it. The other half was terrified. Joyce was in the second group. "Something really scary," he said.

This wasn't a prediction. The attack itself took place in September 2025. It already happened.

FEBRUARY 2026: 500 ZERO-DAYS IN ONE SESSION

February 5, 2026. Anthropic releases Claude Opus 4.6. Alongside it - a research paper that breaks the cybersecurity industry.

The setup: Claude placed in an isolated virtual machine with standard tools. Python, debuggers, fuzzers. No special instructions. No custom prompts. Just - "find vulnerabilities."

Result: 500+ previously unknown high-severity zero-days in production code. Some had survived decades of expert review and millions of hours of automated testing.

Then came RSAC 2026. Researcher Nicholas Carlini walks on stage and points Claude at Ghost - a CMS with 50,000 GitHub stars and zero critical vulnerabilities in its entire history. 90 minutes later: blind SQL injection. Full admin takeover by an unauthenticated user. Then he points Claude at the Linux kernel. Same result.

15 days later Anthropic launched Claude Code Security - a product that reasons about code instead of pattern-matching like every scanner before it.

But Anthropic's own spokesperson said the quiet part out loud: "The same reasoning that helps Claude find and fix vulnerabilities could help an attacker exploit them." Same capability. Same model. Different hands.

WHAT THIS ALL MEANS TOGETHER

Each of these stories alone would have been the biggest news of the month. They all happened in six months. At one company.

Anthropic built a model that finds vulnerabilities faster than any human alive. Chinese hackers turned the previous version into an autonomous cyber weapon. The company is now building the next one - even more powerful - and in their own leaked documents admits they're scared of it.

The US government tried to destroy them - not because the technology is dangerous, but because Anthropic refused to hand it over without limits. And through all of this, they leaked their own source code twice through the same file in the same npm package.

A $380 billion company. A $60 billion IPO targeting October 2026. A company that openly says it is building "one of the most transformative and potentially dangerous technologies in human history" - and keeps building it anyway. Because they believe it's better that they do it than someone else.

The source map in the npm package is just the funniest detail in one of the most unsettling stories happening right now.

Mythos hasn't even launched yet.

Sources: Fortune, CNBC, Axios, The Register, CNN, NPR, Anthropic official blog, Anthropic Red Team research, federal court documents, and primary posts on X from researchers and officials involved.
