Anthropic Built an AI That Finds Zero-Days Faster Than Any Human. Now It's Trying to Control What Happens Next.
On April 7, 2026, Anthropic announced Project Glasswing, a cybersecurity initiative built around a model the company says is too dangerous to release publicly. Claude Mythos Preview, the unreleased frontier model at the center of the project, has identified thousands of zero-day vulnerabilities — previously unknown flaws — in every major operating system and every major web browser [1]. Some of these bugs have existed for decades. One OpenBSD vulnerability was 27 years old. An FFmpeg flaw had survived 16 years and five million automated fuzzing attempts [2].
The premise is straightforward: if AI models are now capable enough to find and exploit software vulnerabilities faster than elite human hackers, the responsible move is to point those capabilities at defense before comparable models reach attackers. But the execution raises harder questions — about who controls this power, what data it requires, and whether publicizing AI-driven defense accelerates the very arms race it aims to contain.
The Scale of AI-Enabled Attacks
Project Glasswing arrives amid a sharp escalation in AI-augmented cybercrime. Global AI-assisted cyberattacks surpassed 28 million incidents in 2025, a 72% year-over-year increase, with projected damages reaching $30 billion [3]. IBM's 2026 X-Force Threat Intelligence Index documented a 44% increase in attacks exploiting public-facing applications, driven in part by AI-enabled vulnerability discovery [4].
The damage is concentrated in specific sectors. Manufacturing accounted for 27.7% of all incidents observed by IBM X-Force — the fifth consecutive year as the top target — followed by finance at 18.2% and healthcare at 13.6% [4]. Healthcare breach costs remain the highest of any industry at $9.77 million per incident [3]. AI-generated phishing emails now achieve a 54% click-through rate, compared to 12% for conventional phishing [3]. Deepfake incidents increased 680% year-over-year, with 179 separate incidents recorded in Q1 2025 alone [3].
Quantifying the precise share attributable to AI-augmented versus conventional attackers remains difficult. IBM reports that 16% of all incidents now involve AI in some form [3], but this figure likely understates the role of AI in reconnaissance, target selection, and social engineering that precedes the breach itself. The industry's $5.72 million average cost per AI-powered breach — higher than the conventional average — suggests a premium on AI-enabled sophistication [5].
What Mythos Can Actually Do
The technical claims underpinning Project Glasswing are specific and verifiable. On a benchmark involving Firefox 147 JavaScript engine vulnerabilities, Claude Opus 4.6 (Anthropic's previous top model) produced working exploits just twice out of several hundred attempts. Mythos Preview succeeded 181 times [6]. On OSS-Fuzz targets — widely used open-source software subjected to continuous automated testing — Mythos achieved complete control-flow hijack on 10 fully patched systems, versus one for Opus 4.6 [2].
The model operates autonomously: it reads source code, forms hypotheses about potential flaws, runs the target software, uses debuggers, and produces bug reports with proof-of-concept exploits [2]. Of 198 professionally reviewed findings, 89% received the same severity rating from human reviewers as the model assigned, with 98% within one severity level [2]. On a broader cybersecurity vulnerability reproduction benchmark, Mythos scored 83.1%, compared to 66.6% for Opus 4.6 [1].
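The workflow described — generate a hypothesis, exercise the target, confirm the crash, file a finding — is, at its simplest, a fuzzing loop. The following Python sketch is a toy illustration of that loop, not Anthropic's implementation: the "target" is a deliberately planted bug (crashing on any input containing `!`), and the loop records every input that triggers it.

```python
import random
import string

def buggy_parse(data: str) -> int:
    """A deliberately flawed 'target' program: it crashes on inputs
    containing '!' (a stand-in for a real memory-safety bug)."""
    if "!" in data:  # hypothetical planted flaw, for illustration only
        raise MemoryError("unsafe write reached")
    return len(data)

def fuzz(target, attempts: int = 5000, seed: int = 0):
    """Minimal random-fuzzing loop: generate an input, run the target,
    and record any input that triggers a crash as a 'finding'."""
    rng = random.Random(seed)
    findings = []
    for _ in range(attempts):
        candidate = "".join(rng.choice(string.printable) for _ in range(8))
        try:
            target(candidate)
        except Exception as exc:  # a crash becomes a candidate bug report
            findings.append((candidate, repr(exc)))
    return findings

findings = fuzz(buggy_parse)
print(f"{len(findings)} crashing inputs found")
```

What distinguishes a frontier model from this loop is everything the loop lacks: reading the source to form targeted hypotheses rather than guessing randomly, driving a debugger to understand the crash, and escalating a crash into a proof-of-concept exploit. The 16-year FFmpeg flaw that survived five million fuzzing attempts [2] is precisely the kind of bug blind loops like this one miss.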
Concrete discoveries include a 17-year-old FreeBSD NFS remote code execution vulnerability enabling unauthenticated root access, authentication bypasses in cryptographic libraries affecting TLS, AES-GCM, and SSH implementations, and multi-stage browser exploits capable of bypassing both renderer and OS sandboxes [2]. On 40 potentially exploitable Linux kernel CVEs from 2024-2025, Mythos converted more than half into privilege-escalation exploits, one in under a day at a cost under $2,000 [2].
The $100 Million Coalition
Anthropic's financial commitment includes up to $100 million in model usage credits for partner organizations and $4 million in direct donations to open-source security organizations — $2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5 million to the Apache Software Foundation [1][7]. Future pricing for Mythos is set at $25/$125 per million input/output tokens [1].
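At the stated $25/$125 per million input/output tokens, the cost of a large automated audit is straightforward to bound. A back-of-the-envelope calculation in Python (the token counts are hypothetical, chosen only for illustration):

```python
# Published Mythos pricing: $25 per million input tokens,
# $125 per million output tokens.
INPUT_RATE = 25 / 1_000_000    # dollars per input token
OUTPUT_RATE = 125 / 1_000_000  # dollars per output token

def job_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one model run at the published rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical audit: the model reads ~40M tokens of source code and
# emits ~2M tokens of analysis and proof-of-concept material.
cost = job_cost(40_000_000, 2_000_000)
print(f"${cost:,.2f}")  # → $1,250.00
```

Costs of this order are consistent with the reported sub-$2,000 price tag for converting a Linux kernel CVE into a working exploit [2] — a figure that matters as much to attackers' economics as to defenders'.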
Anthropic's announcement names 12 founding partners, among them Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks [1]. Roughly 40 additional organizations maintaining critical software infrastructure have also received access [7].
This dwarfs prior industry efforts. Google, Amazon, Anthropic, Microsoft, and OpenAI previously pledged a combined $12.5 million to the Linux Foundation's Alpha-Omega Project for open-source security [8]. Google DeepMind's Big Sleep agent — which discovered an SQLite vulnerability — operates at a much smaller scale [8]. The Coalition for Secure AI (CoSAI), formed by Google, Amazon, Anthropic, Cisco, IBM, Intel, Microsoft, NVIDIA, OpenAI, and PayPal, has focused on standards rather than direct vulnerability discovery [8]. No competitor has announced a defensive AI deployment of comparable financial scope.
Whether any of Glasswing's funding flows through U.S. government contracts is a loaded question given Anthropic's current standoff with the Pentagon. The Department of Defense terminated a $200 million contract with Anthropic earlier this year after the company refused to allow its AI technology to be used for mass surveillance or fully autonomous weapons systems [9]. Under the government's subsequent "supply-chain risk" designation, Anthropic was effectively barred from national security environments — though a federal judge blocked that action in March, finding it was punitive rather than security-motivated [10]. Anthropic has stated it is in ongoing discussions with U.S. government officials about Mythos's capabilities, framing Glasswing in national security terms [7], but has not disclosed any government funding for the project.
What Glasswing Targets — and What It Doesn't
The project's focus is vulnerability discovery and patch development in foundational software: operating systems, browsers, cryptographic libraries, and open-source infrastructure [1]. This covers several attack vectors: automated vulnerability scanning, exploit chain development, and the identification of flaws that enable privilege escalation, remote code execution, and sandbox escapes [2].
What Glasswing does not target is equally significant. AI-generated phishing, deepfake-based social engineering, and AI-powered reconnaissance — the attack methods showing the fastest growth rates — fall outside the project's scope [1]. This is a deliberate choice. Anthropic's announcement frames Glasswing as addressing the supply-side problem (vulnerable software) rather than the demand-side problem (attackers who use AI to exploit humans). The rationale: patching the underlying vulnerabilities removes the attack surface regardless of the method used to discover it.
Critics might note that this leaves the fastest-growing threat categories — 82.6% of phishing emails now use AI [5], and deepfake incidents are up 680% year-over-year [3] — without an analogous defensive initiative from Anthropic.
The Defense-Offense Asymmetry
The fundamental question for any AI defensive initiative is whether defense can outrun offense when both sides draw from the same technology. Academic research in the field has grown exponentially — over 42,000 papers on AI cybersecurity were published in 2025 alone, more than triple the 2023 figure [11].
Security researchers remain divided. CrowdStrike, a Glasswing partner, has argued that "AI vs AI" dynamics can favor defense because defenders control the environment and can deploy behavioral anomaly detection and predictive threat modeling [12]. Darktrace's 2026 State of AI Cybersecurity report found that AI-driven security tools reduced mean time to detect threats from days to minutes in controlled environments [13].
But the structural asymmetry persists: attackers need to succeed once, while defenders must succeed every time [14]. The lag between new defensive techniques and offensive countermeasures is "shrinking from years to months or even weeks" [14]. Greg Kroah-Hartman, a senior Linux kernel developer, described a recent shift in AI-generated vulnerability reports from "AI slop" to genuine, sophisticated threats [6]. Daniel Stenberg, the maintainer of curl, has warned that the volume of AI-discovered bugs creates an overwhelming workload for maintainers, many of whom are unpaid volunteers [6].
The deeper problem is that Mythos itself demonstrates how thin the line is. The same model that finds defensive vulnerabilities produces working exploits. Anthropic's own benchmarks quantify this dual-use capability precisely — and those benchmarks are now public, providing a roadmap for what future models from any lab should be capable of.
Governance, Liability, and the Appeals Gap
Anthropic's governance framework for Mythos operates under its Responsible Scaling Policy (RSP), which uses AI Safety Level Standards (ASL Standards) — graduated safety and security measures that become more stringent as model capabilities increase [15]. In February 2026, Anthropic updated the RSP, replacing categorical pause triggers with a dual condition requiring both "AI race leadership" and "material catastrophic risk" [15]. Some critics viewed this as softening previous safety commitments [15].
On the specific question of liability for false positives — if Mythos incorrectly flags code as vulnerable, triggering unnecessary patches or disrupting critical systems — Anthropic's public materials are silent. The announcement does not describe a formal appeals process for partner organizations that disagree with the model's findings. Of the 198 professionally reviewed findings, 11% received a different severity rating from human experts [2], suggesting a non-trivial false-positive or misclassification rate in a high-stakes context.
Fewer than 1% of the bugs Mythos has uncovered have been fully patched [16]. The bottleneck is not discovery but remediation. When bugs are discovered faster than any human team can triage them, "the entire coordination infrastructure of responsible disclosure starts to strain," as one analysis noted [15]. Industry-standard 90-day disclosure windows may not hold up against the volume of AI-discovered bugs [15]. Anthropic uses cryptographic hashes with delayed disclosure for unpatched vulnerabilities, sharing full technical details only after vendor remediation [1].
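The hash-based approach is a standard cryptographic commitment: publish a digest of the full report at discovery time, release the report after the patch ships, and anyone can verify that the late disclosure matches the early commitment. A minimal sketch — the report text and workflow here are invented for illustration, not Anthropic's actual format:

```python
import hashlib

def commit(report: bytes) -> str:
    """Publish only this digest at discovery time: it proves the report
    existed, without revealing any exploitable detail."""
    return hashlib.sha256(report).hexdigest()

def verify(report: bytes, published_digest: str) -> bool:
    """After vendor remediation, the full report is released and anyone
    can check it against the earlier commitment."""
    return hashlib.sha256(report).hexdigest() == published_digest

report = b"CVE-XXXX-YYYY: heap overflow in example_lib parser (hypothetical)"
digest = commit(report)        # disclosed immediately, pre-patch
# ... vendor patches; full report released afterward ...
assert verify(report, digest)            # matches the commitment
assert not verify(b"tampered", digest)   # any alteration is detectable
```

Production schemes typically fold a random salt into the hashed material so the digest cannot be brute-forced from guessable report contents; the sketch omits that for brevity.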
The Data Sharing Question
Participating organizations must share their findings with the broader industry [7]. This requirement cuts both ways. On one hand, it ensures that Glasswing's output benefits the wider ecosystem rather than just its corporate partners. On the other, it means Anthropic — a commercial AI company — accumulates a uniquely detailed picture of where the world's most critical software is vulnerable and how quickly those vulnerabilities are being addressed.
Anthropic's announcement does not specify whether partners must share network telemetry or internal security data with the company, or whether Anthropic retains access to the specific vulnerability data generated during partner engagements. The distinction matters: a company with privileged knowledge of unpatched zero-days across AWS, Apple, Microsoft, Google, and Cisco infrastructure would itself become a high-value intelligence target. The concentration of this information in a single entity — particularly one already targeted by the U.S. government's supply-chain risk apparatus [10] — raises questions that Anthropic has not publicly addressed.
Does Glasswing Accelerate the Arms Race?
The steelman case against Glasswing is not that the project is insincere but that it is self-defeating. By publishing detailed benchmarks showing Mythos's exploit-generation success rates (72.4% on kernel CVEs, 181 out of several hundred on Firefox), Anthropic has created a public capability target for every other AI lab and nation-state actor [2][6]. Any adversary now knows precisely what frontier models should be able to do — and can measure their own progress against Anthropic's numbers.
Anthropic itself acknowledges this dynamic. Its announcement states that "it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely" [1]. The company frames Glasswing as a race to patch before proliferation. Critics frame it as normalizing AI autonomy in high-stakes security decisions — training defenders and attackers alike to accept that AI systems should independently discover, assess, and act on software vulnerabilities.
Simon Willison, the developer and AI commentator, noted a significant coordination gap: OpenAI's GPT-5.4 already excels at vulnerability discovery but was not included in Glasswing [6]. If the industry's defensive efforts remain fragmented across competing companies, the advantage may tilt further toward attackers who face no such coordination problems.
Regulatory Landscape
Anthropic has confirmed discussions with U.S. government officials about Glasswing's scope [7]. The project launches as the EU AI Act approaches full applicability on August 2, 2026, when obligations for most high-risk AI systems take effect [17]. AI systems used in critical infrastructure protection — a domain in which Glasswing's partners clearly operate — would likely qualify as high-risk under the Act's framework, requiring conformity assessments, technical documentation, human oversight safeguards, and cybersecurity robustness measures [17].
The U.S. regulatory environment is more fragmented. The Anthropic-Pentagon dispute illustrates how AI governance currently depends more on corporate decisions than legal frameworks. As the Electronic Frontier Foundation argued, "the state of your privacy is being decided by contract negotiations between giant tech companies and the U.S. government" rather than by legislation [9]. Anthropic CEO Dario Amodei has stated that he believes establishing restrictions on AI's use in surveillance "is Congress's job" [9].
Export-control implications remain unaddressed. A model capable of discovering zero-days in every major operating system could fall under dual-use technology restrictions, particularly the Wassenaar Arrangement's controls on intrusion software and cyber surveillance tools. Anthropic has not publicly discussed whether Glasswing's output — vulnerability reports and exploit proofs-of-concept — is subject to export controls, or whether partner organizations in different jurisdictions face different access restrictions.
The Open-Source Maintainer Problem
Perhaps the most grounded concern about Glasswing is operational rather than philosophical. Open-source software forms the foundation of most modern systems, and its maintainers — often unpaid volunteers — are already stretched thin [7]. Burying them under an avalanche of critical bug reports, however accurate, could do more harm than good if remediation capacity does not scale with discovery [15].
Anthropic's $4 million in donations to open-source security organizations, while unprecedented from an AI lab, is modest relative to the scale of the problem. The Linux Foundation's involvement as a founding partner may help coordinate triage, but the fundamental mismatch between AI-speed discovery and human-speed patching remains unresolved.
What Comes Next
Anthropic has stated that Mythos Preview is a starting point and that it plans to develop new safeguards for an upcoming Claude Opus model before broader deployment [1]. The company has not committed to a timeline for general availability, and has explicitly said it does not plan to make Mythos Preview available to the general public [7].
The test of Glasswing's value will not be measured in zero-days discovered — the model has already proven that capability. It will be measured in zero-days patched before they are exploited, in whether the concentration of vulnerability intelligence creates more risk than it resolves, and in whether the coordinated disclosure infrastructure can handle what AI has made possible. Those answers will take months, not days, to emerge.
Sources (17)
- [1] Project Glasswing: Securing critical software for the AI era (anthropic.com)
  Anthropic's official announcement of Project Glasswing, detailing Claude Mythos Preview capabilities, partner organizations, and financial commitments.
- [2] Anthropic Claude Mythos Preview: autonomous zero-day vulnerability identification (helpnetsecurity.com)
  Detailed technical analysis of Mythos Preview's vulnerability discovery capabilities, including exploit success rates and specific zero-day examples.
- [3] AI Cyber Attack Statistics 2025: Trends, Costs, and Global Impact (deepstrike.io)
  28 million AI-assisted cyber incidents in 2025, 72% YoY increase, $30 billion in projected damages, and sector-by-sector breakdown.
- [4] IBM 2026 X-Force Threat Index: AI-Driven Attacks are Escalating (ibm.com)
  44% increase in attacks exploiting public-facing applications; manufacturing tops target list at 27.7% for fifth consecutive year.
- [5] AI Cybersecurity Statistics in 2025: Comprehensive Data on Threats, Detection, and Defense (totalassure.com)
  87% of organizations report AI-driven attacks; average cost of AI-powered breach at $5.72 million; 82.6% of phishing emails use AI.
- [6] Anthropic's Project Glasswing — restricting Claude Mythos to security researchers — sounds necessary to me (simonwillison.net)
  Analysis of Mythos exploit benchmarks, Linux maintainer concerns, GPT-5.4 coordination gap, and the shift from AI slop to genuine vulnerability reports.
- [7] Tech giants launch AI-powered Project Glasswing to address open-source software vulnerabilities (cyberscoop.com)
  Details on $100M commitment, partner requirements to share findings, government engagement, and Pentagon dispute context.
- [8] Our latest investment in open source security for the AI era (blog.google)
  Google's $12.5M pledge with Amazon, Anthropic, Microsoft, and OpenAI to the Alpha-Omega Project; Big Sleep vulnerability discovery agent details.
- [9] The Anthropic-DOD Conflict: Privacy Protections Shouldn't Depend On the Decisions of a Few Powerful People (eff.org)
  EFF analysis of the $200M Pentagon contract termination, Anthropic's refusal of mass surveillance use, and the structural dependence on corporate decisions for AI governance.
- [10] Anthropic sues Trump administration over Pentagon blacklist (cnbc.com)
  Supply-chain risk designation barring Anthropic from national security environments; federal judge blocks government action in March 2026.
- [11] OpenAlex: AI Cybersecurity Research Publications (openalex.org)
  105,775 total papers on AI cybersecurity; 42,368 published in 2025, tripling the 2023 figure of 13,217.
- [12] AI vs AI: The Cybersecurity Arms Race (crowdstrike.com)
  CrowdStrike's analysis of AI-driven defense advantages including behavioral anomaly detection and predictive threat modeling.
- [13] The State of AI Cybersecurity 2026: Insights from 1,500+ Leaders (darktrace.com)
  Survey of 1,500+ security leaders on AI defensive tools reducing mean detection time from days to minutes.
- [14] The AI Arms Race: How Cybersecurity Teams Are Fighting Machine-Speed Threats in 2026 (webpronews.com)
  Lag between defensive and offensive AI countermeasures shrinking from years to weeks; attackers need to succeed once while defenders must succeed every time.
- [15] Anthropic debuts Project Glasswing to reinforce software security (siliconangle.com)
  RSP governance framework details, ASL Standards, and February 2026 policy update replacing categorical pause triggers with dual-condition approach.
- [16] Claude Mythos Preview sparks race to fix critical bugs, some unpatched for decades (tomshardware.com)
  Fewer than 1% of Mythos-discovered bugs have been fully patched; coordinated disclosure strains under AI-speed discovery volume.
- [17] EU AI Act: Regulatory Framework for Artificial Intelligence (ec.europa.eu)
  EU AI Act full applicability August 2, 2026; high-risk AI system obligations including conformity assessments, human oversight, and cybersecurity requirements.