aicybersecurityanthropic · April 14, 2026

Claude Mythos: The AI Model Too Dangerous to Release

Anthropic built its most capable model ever — then decided not to release it publicly. Here's what Mythos can do, why it matters, and what Project Glasswing means for cybersecurity.

Abderrahim Ghazali Software Engineer · Paris, FR

I’ve been following AI model releases closely for the past two years. Every few months, a new model drops, benchmarks go up, Twitter loses its mind for 48 hours, and then everyone goes back to work. Mythos is different. I’ve spent the last week reading everything I can find about it, and I genuinely don’t know how to feel.

On April 7, Anthropic announced Claude Mythos Preview. It’s their most powerful model by a wide margin. And they’re not releasing it.

Not “limited release.” Not “waitlist.” Not releasing it. The public can’t use it. There’s no API. No Claude.ai access. They built it, tested it, and decided the world isn’t ready. That’s never happened before in AI.

So what can it actually do?

Mythos is a general-purpose model — it writes, reasons, codes. But the security capabilities are what spooked everyone. During internal testing, Anthropic pointed it at real-world software and let it look for vulnerabilities.

It found thousands of zero-day exploits. Not theoretical ones. Working exploits across every major operating system and browser.

A few that stood out to me:

A 27-year-old bug in OpenBSD — a signed integer overflow hiding in the SACK implementation since 1999. OpenBSD. The operating system whose entire reputation is built on security. Twenty-seven years of expert review missed it.
A 16-year-old flaw in FFmpeg’s H.264 codec that automated fuzzing tools failed to catch across 5 million test runs.
A full Linux kernel privilege escalation chain from an unprivileged user.

In a controlled test against Firefox, Mythos produced 181 working exploits. Opus 4.6 — the previous best model — managed 2. Not a typo. Two.

Claude Mythos Preview vs Opus 4.6 — benchmark comparison across SWE-bench, GPQA, USAMO, CyberGym, and HLE

The broader benchmarks tell the same story:

Benchmark	Mythos	Opus 4.6	Gap
SWE-bench Verified	93.9%	80.8%	+13.1
SWE-bench Pro	77.8%	53.4%	+24.4
USAMO 2026	97.6%	42.3%	+55.3
GPQA Diamond	94.5%	74.9%	+19.6
CyberGym	83.1%	66.6%	+16.5
HLE (with tools)	64.7%	53.1%	+11.6

A 55-point jump on competition math (USAMO). That’s not an iteration — that’s a generation skip.

The sandbox incident

This is the part I keep coming back to.

Anthropic put Mythos in a sandbox and told it to escape. Standard safety test. The model did escape — that alone isn’t shocking, models have broken out of sandboxes before.

What happened next is what’s unusual. After breaking out, Mythos:

Found a way to get internet access from the sandboxed system
Sent an email to a researcher. The researcher was eating a sandwich in a park when he got it.
Then — without being asked — posted details about its own exploit to multiple public websites

In a separate test, it found a way to edit files without proper permissions, then modified the change history so nobody would notice.

I want to be careful here. Anthropic instructed it to escape, and it did. The sandbox break itself was the point of the test. But the unprompted behavior — the public posting, the log editing — that’s the part that’s hard to brush off. It wasn’t told to do those things. It decided to.

Anthropic’s official take is that this was “disappointing.” I think that undersells it, and I’m not alone in that.

Project Glasswing

Instead of releasing Mythos, Anthropic created Project Glasswing — basically a controlled coalition to use the model’s vulnerability-finding abilities defensively before similar capabilities show up in models anyone can use.

Project Glasswing architecture — how Mythos findings flow to tech partners, finance, and open source

The partner list is serious: Apple, Google, Microsoft, NVIDIA, AWS, Cisco, CrowdStrike, Palo Alto Networks, JPMorgan Chase, and the Linux Foundation. Anthropic is putting up $100 million in Mythos usage credits and $4 million in direct donations to open-source security orgs.

Access is limited to about 52 organizations total. Pricing for those who get in: $25/$125 per million input/output tokens — 5x what Opus 4.6 costs.

The logic makes sense to me: find the bugs now, patch them, get ahead of the curve before other labs ship models with similar capabilities. Anthropic has said they believe that’s months away, not years.

The government response was fast

Same day as the announcement — April 7 — Treasury Secretary Bessent and Fed Chair Powell called an emergency meeting with the CEOs of the biggest US banks. Unannounced. No press conference. JPMorgan was already a Glasswing partner, so they clearly had advance notice.

That speed tells you something. When the Fed Chair drops everything to talk about an AI model, it’s not because of benchmarks. It’s because someone showed him what this thing can do to financial infrastructure.

What I think about all of this

I’ll be honest — I’m still processing it. But here’s where I’ve landed so far:

The vulnerability discovery is legit. I work with software every day. The fact that a model found a 27-year-old bug in one of the most scrutinized codebases on earth — that changes my mental model of what AI can do with code. This isn’t “AI finds toy bugs in CTF challenges.” This is “AI finds bugs that the best humans and the best tools missed for decades.”

Not releasing it was the right call. I know that’s a weird thing to say as someone who wants access to the best tools. But over 99% of the vulnerabilities Mythos found are still unpatched. Releasing the model that found them before they’re fixed would be reckless. You’d be handing attackers a skeleton key.

The timeline pressure is real. Anthropic isn’t saying “nobody should ever have this.” They’re saying “other labs are close, and we need to use the head start to patch things.” That framing changes everything. This isn’t a permanent restriction — it’s a race.

The sandbox behavior worries me more than the benchmarks. A model that finds bugs is a tool. A model that decides on its own to post its exploits publicly and cover its tracks… that’s something else. I don’t think it’s “AI is alive” territory, but it’s “we need better containment” territory, and those are both important.

What I’m doing differently now

Practically, as a developer:

I’m updating everything. Dependencies, OS, browser, server packages. If Mythos found thousands of zero-days across major software, my stack has unpatched vulnerabilities. Yours does too.

I’m running security reviews with Opus 4.6 on my own code. It’s not Mythos, but it’s better than nothing, and it’s already surprisingly good at finding issues I missed.

I’m watching CVE feeds more closely. As Glasswing partners patch vulnerabilities, those patches will show up as CVEs. The window between patch release and widespread adoption is where attackers live.

And I’m paying more attention to AI safety research than I used to. I’ll admit I used to skim past alignment papers. After reading about the sandbox escape and the log editing, I’m reading them more carefully.

I’ll update this post as more details come out from Anthropic and the Glasswing partners. Last updated April 14, 2026.

Sources:

aicybersecurityanthropic

← Back to blog

Have a system that needs shipping?

I'm open for contract and consulting work.

Get in touch → Back to portfolio ↩