Anthropic's 'Too Dangerous to Release' AI Was Accessed Within Hours — And That's Just the Start of the Problem
On April 7, 2026, Anthropic announced Claude Mythos Preview, an AI model it described as so capable at finding and exploiting software vulnerabilities that it could not be released to the public [1]. Within hours of that announcement, a small group of unauthorized users had already gained access to it [2].
The breach — which Anthropic confirmed it is investigating — did not require sophisticated hacking. Members of a private Discord channel dedicated to tracking unreleased AI models guessed the model's URL based on Anthropic's naming conventions for other products, and gained entry through a third-party vendor portal with help from an individual employed at a contractor working with Anthropic [1][3]. The group provided Bloomberg with screenshots and a live demonstration of their access [4].
For a company that has built its brand on responsible AI development and safety-first principles, the incident is the latest in a string of operational security failures that have unfolded over the course of a single month.
What Mythos Is — and Why Access Matters
Claude Mythos Preview is not a chatbot or a general consumer product. It is Anthropic's most capable frontier model, specifically flagged for its ability to discover and weaponize software vulnerabilities at a scale no previous AI system has demonstrated [5].
According to Anthropic's own disclosures, Mythos identified thousands of zero-day vulnerabilities — flaws unknown to the software's developers — across every major operating system and web browser [5]. In one documented case, the model wrote a browser exploit chaining together four separate vulnerabilities, executing a JIT heap spray that escaped both renderer and operating system sandboxes [5]. It autonomously achieved local privilege escalation on Linux by exploiting race conditions and bypassing KASLR (Kernel Address Space Layout Randomization), a fundamental OS security mechanism [5].
Rather than release Mythos publicly, Anthropic distributes it under Project Glasswing, a controlled distribution program providing access to roughly 50 organizations including Amazon Web Services, Apple, Google, Microsoft, Nvidia, JPMorgan Chase, CrowdStrike, and Cisco [6][7]. The company committed $100 million in model usage credits to fund the program [6]. Participants were restricted to defensive security work and required to share findings with the broader industry [6].
The model does not store customer PII, training datasets, or model weights in its accessible interface. What it does provide is the capability to generate working exploits for previously unknown vulnerabilities — a capability that, in the wrong hands, represents a qualitatively different kind of risk than a traditional data breach.
Timeline: From Announcement to Breach to Public Disclosure
The sequence of events is compressed:
- March 26, 2026: Anthropic's content management system is found to be misconfigured, exposing nearly 3,000 unpublished internal assets, including references to an unreleased model internally codenamed "Capybara" — later revealed as Mythos [8][9].
- March 31, 2026: Anthropic accidentally bundles a 59.8 MB source map file in version 2.1.88 of its Claude Code npm package, exposing 512,000 lines of unobfuscated TypeScript across 1,906 files. The codebase is mirrored to GitHub within hours and forked tens of thousands of times [10][11].
- April 7, 2026: Anthropic officially announces Claude Mythos Preview and Project Glasswing. On the same day, unauthorized users gain access through a third-party vendor environment [1][2].
- April 21, 2026: Bloomberg reports the unauthorized access. Anthropic confirms it is investigating [2][4].
- April 22, 2026: Multiple outlets report the story. Anthropic states it has found "no evidence that the supposedly unauthorized activity has impacted Anthropic's systems" beyond the vendor environment [1][3].
The gap between when unauthorized access occurred (April 7) and when it became public (April 21) is 14 days. It remains unclear when Anthropic first learned of the breach — the company has not disclosed whether it detected the access independently or learned of it from Bloomberg's reporting.
How the Access Controls Failed
Anthropic holds SOC 2 Type II attestation, CSA STAR Level 2, ISO 27001, and ISO 42001 certifications [12]. Its security program incorporates NIST 800-53 standards [12]. These are real, audited certifications — not marketing claims.
But the Mythos breach did not occur within Anthropic's own infrastructure. As The Next Web reported, "the group did not bypass Anthropic's security architecture so much as exploit the gap between Anthropic's controls on its own systems and those of a third-party vendor" [3].
This is a common pattern in enterprise security: an organization hardens its own perimeter while leaving vendor access points comparatively exposed. The specific failure here was compounded by two factors. First, the model's URL followed a predictable naming convention that could be guessed by anyone familiar with Anthropic's other endpoints [1][3]. Second, a contractor employee appears to have played a role in facilitating the group's access [1][4].
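The first factor is a well-understood weakness: when restricted endpoints follow the same naming scheme as public products, anyone familiar with the scheme can enumerate candidates. A minimal defensive sketch, using entirely hypothetical names and assuming nothing about Anthropic's actual routing, binds each restricted deployment to a high-entropy capability token instead of a guessable slug:

```python
import secrets

def preview_endpoint(model_name: str) -> str:
    """Build a non-enumerable URL for a restricted deployment.

    A predictable scheme such as f"https://api.example.com/{model_name}-preview"
    can be derived from a vendor's other product names. Appending a
    high-entropy, per-deployment token makes enumeration infeasible.
    (Hypothetical illustration, not Anthropic's actual endpoint design.)
    """
    token = secrets.token_urlsafe(32)  # 32 random bytes -> 43 URL-safe chars
    return f"https://api.example.com/previews/{model_name}-{token}"

url = preview_endpoint("mythos")
```

An unguessable path is no substitute for authentication, but it removes the cheapest reconnaissance step: the one the Discord group reportedly used.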
Anthropic has not disclosed what vetting process applied to third-party contractor employees who had access to Mythos, whether those contractors were subject to the same access controls as full-time employees, or what audit logs existed for vendor environment activity before the incident.
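Audit logging is the baseline control those undisclosed details turn on. One common design, sketched here with hypothetical record fields and no claim about how Anthropic's or its vendors' systems actually work, is a tamper-evident log in which each entry commits to the hash of its predecessor, so that editing or deleting any record invalidates the rest of the chain:

```python
import hashlib
import json
import time

def _digest(record: dict) -> str:
    """Hash the entry's fields (excluding its own digest) deterministically."""
    payload = {k: record[k] for k in ("actor", "action", "ts", "prev")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def append_entry(log: list, actor: str, action: str) -> dict:
    """Append a record chained to the previous entry's digest."""
    prev = log[-1]["digest"] if log else "0" * 64
    record = {"actor": actor, "action": action, "ts": time.time(), "prev": prev}
    record["digest"] = _digest(record)
    log.append(record)
    return record

def verify(log: list) -> bool:
    """Recompute the chain; any edited or removed entry breaks verification."""
    prev = "0" * 64
    for record in log:
        if record["prev"] != prev or _digest(record) != record["digest"]:
            return False
        prev = record["digest"]
    return True

# Hypothetical vendor-environment activity (illustrative names only).
log = []
append_entry(log, "vendor:contractor-42", "session.start model=preview")
append_entry(log, "vendor:contractor-42", "prompt.submit")
```

Hash chaining does not prevent unauthorized access; it makes after-the-fact questions like "what happened during the sessions" answerable with evidence rather than statements.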
The Pattern: Three Incidents in Five Weeks
Taken individually, each of Anthropic's recent security incidents has a plausible explanation. The CMS leak was attributed to "human error in CMS configuration" [8]. The Claude Code source leak resulted from a packaging error — Bun, the JavaScript runtime, generated a full source map by default, and the .npmignore file failed to exclude it [10][11]. The Mythos access stemmed from a vendor environment gap.
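The packaging failure described above is the kind that an ignore file or allowlist is meant to catch. A generic sketch of the missing exclusion, not Anthropic's actual build configuration, would add source-map artifacts to the package's `.npmignore` (which uses `.gitignore` syntax):

```
# .npmignore — generic sketch; excludes build artifacts from the npm tarball
*.map
*.tsbuildinfo
```

A stricter alternative is an explicit `files` allowlist in `package.json`, so only named paths ship at all; in either case, `npm pack --dry-run` prints exactly what the published tarball would contain before anything is pushed to the registry.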
Taken together, they suggest a systemic issue. As Digital Applied's analysis noted, "safety branding is not security" — Anthropic's positioning as the industry's most safety-conscious AI company did not prevent three distinct operational failures in rapid succession [9]. The pattern points not to a single root cause but to what security professionals call "organizational drift": the gradual erosion of operational discipline across multiple systems simultaneously.
The Claude Code leak is particularly illustrative. The exposed source contained complete agent permission models, bash security validators, 44 unreleased feature flags, and the full Model Context Protocol integration logic [9][10]. No model weights, customer data, or API credentials were included, but the architectural knowledge exposed could inform targeted attacks against Claude Code deployments.
The CISA Gap and National Security Implications
The unauthorized access to Mythos arrives at a fraught moment in Anthropic's relationship with the U.S. government.
Anthropic holds a $200 million Department of Defense agreement for AI capabilities [13]. The company's models are used by Lawrence Livermore National Laboratory for research across nuclear deterrence, energy security, and materials science [13]. The NSA has been using Mythos Preview despite the Pentagon's formal designation of Anthropic as a "supply chain risk" [14][15].
Yet CISA — the Cybersecurity and Infrastructure Security Agency, the federal government's primary cyber defense body — does not have access to Mythos [15]. An Anthropic official told Axios the company briefed CISA and the Commerce Department on the model's capabilities, but CISA was not among the organizations granted access under Project Glasswing [15].
This creates an unusual situation: an AI model capable of generating working exploits for critical infrastructure software is being used by intelligence agencies and private corporations, but the agency specifically tasked with defending civilian infrastructure from cyberattacks has no direct access to it. The gap is exacerbated by the Trump administration's reduction of CISA capacity over the past year, shifting policy influence to the White House's national cyber director [15].
If Mythos contains or generates safety-critical security research — and its demonstrated ability to find thousands of zero-day vulnerabilities across major operating systems suggests it does — the question of who controls access is not merely corporate but geopolitical.
The OpenAI Precedent and Disclosure Obligations
Anthropic is not the first major AI company to face scrutiny over a security incident. In early 2023, a hacker breached OpenAI's internal messaging systems, gaining access to employee discussions about the company's AI technologies [16]. OpenAI informed employees in April 2023 but did not make the breach public. The incident was not disclosed until The New York Times reported on it in July 2024 — a gap of roughly 15 months [16][17].
Leopold Aschenbrenner, a former OpenAI technical program manager who criticized the company's security posture in a memo to the board, was subsequently fired [16].
The legal framework governing AI company disclosure obligations remains fragmented. Under GDPR, organizations must report personal data breaches to supervisory authorities within 72 hours [18]. CCPA requires notification to affected California residents. The FTC has warned that AI companies "must comply with existing federal laws" and uphold privacy commitments to customers, though no AI-specific federal breach notification statute exists [18].
In Anthropic's case, the relevant question is what data, if any, was accessed or generated during the unauthorized Mythos sessions. The company states that the unauthorized group "has not run cybersecurity-related prompts on it" and was "only experimenting with new models" [1]. If that is accurate — and if no personal data was involved — formal breach notification obligations may not apply under current law. But the absence of a legal obligation to disclose is not the same as the absence of a public interest in disclosure, particularly for a model with the capabilities Anthropic itself has described.
'Unauthorized Access' or Something Else?
The framing of this incident deserves scrutiny. Anthropic has characterized the situation as "unauthorized access" through a vendor environment [1]. The individuals involved have been described in media reports as members of a Discord group interested in testing unreleased AI models, not as malicious hackers [1][3].
Bloomberg's reporting indicates the group provided evidence of their access voluntarily, demonstrated the model to reporters, and had "not run cybersecurity-related prompts" [2][4]. This profile — voluntarily disclosing access, cooperating with media, refraining from exploiting the model's offensive capabilities — does not fit the typical pattern of a malicious breach.
Several possible interpretations exist. The access may have been genuinely unauthorized in both the legal and ethical senses — contractor credentials misused for personal curiosity. It may also represent a form of informal security research: demonstrating that a model Anthropic described as carefully restricted was in fact accessible through trivial means. The distinction matters for how regulators, partners, and the public should assess Anthropic's response.
Anthropic has not characterized the individuals as threat actors or referred the matter to law enforcement, at least not publicly. The company's statement — that it is "investigating a report claiming unauthorized access" — uses language that leaves room for multiple conclusions [1].
What Comes Next: Remediation and Accountability
Anthropic has not announced specific security changes in response to the Mythos incident. The company's public statements have been limited to confirming the investigation and asserting that core systems were not impacted [1][3].
The more structural question is whether Anthropic's vendor management framework is adequate for the category of model it is now distributing. Project Glasswing extends Mythos access to roughly 50 organizations [6]. Each of those organizations — and their contractors, subcontractors, and vendor environments — represents a potential access vector of the kind that was just exploited.
Academic research on AI cybersecurity vulnerabilities has grown rapidly, with over 54,000 papers published since 2011 and more than 21,000 in 2025 alone [19]. The research community's attention to these risks is accelerating faster than the governance structures meant to manage them.
For Anthropic, the path forward requires answering several concrete questions: What changes to vendor access architecture will be implemented? Will third-party employees with access to frontier models be subject to the same vetting and monitoring as internal staff? What audit logging existed in the vendor environment before the incident, and what will be required going forward? Against what timeline and benchmarks will improvements be measured?
Until those answers are public and verifiable, the gap between Anthropic's safety reputation and its operational security record remains open.
Limitations of Available Evidence
Several important details remain unconfirmed as of this writing. Anthropic has not disclosed the identity of the third-party vendor involved, the specific access controls that were in place, or the full scope of activity during the unauthorized sessions. The company's investigation is ongoing. The account of events relies heavily on Bloomberg's original reporting and statements from the unauthorized group itself, neither of which has been independently verified by Anthropic beyond the company's confirmation that it is investigating. The question of whether any data was exfiltrated, any exploits were generated, or any other models were accessed during the sessions remains unanswered.
Sources (19)
- [1] Unauthorized group has gained access to Anthropic's exclusive cyber tool Mythos, report claims (techcrunch.com)
  A small group of unauthorized users has accessed Anthropic's new Mythos AI model through a third-party vendor environment, with access facilitated by an individual employed at a contractor.
- [2] Anthropic's Mythos AI Model Is Being Accessed by Unauthorized Users (bloomberg.com)
  Bloomberg first reported that unauthorized users gained access to the restricted Mythos model, with the group providing screenshots and a live demonstration.
- [3] Unauthorized users gained access to Anthropic's restricted Mythos AI model (thenextweb.com)
  The group did not bypass Anthropic's security architecture so much as exploit the gap between Anthropic's controls on its own systems and those of a third-party vendor.
- [4] Discord group accessed Anthropic's Mythos without authorization (cybernews.com)
  Members of a private Discord channel dedicated to gathering intelligence on unreleased AI models made an educated guess about the model's URL based on Anthropic's naming conventions.
- [5] Claude Mythos Preview (red.anthropic.com)
  Claude Mythos Preview identified thousands of zero-day vulnerabilities across every major operating system and web browser, and can chain software bugs into multi-step exploits.
- [6] Anthropic is giving some firms early access to Claude Mythos to bolster cybersecurity defenses (fortune.com)
  Project Glasswing provides access to roughly 50 organizations including AWS, Apple, Google, Microsoft, and Nvidia. Anthropic committed $100M in credits to fund the program.
- [7] Anthropic Claude Mythos Preview - CrowdStrike (crowdstrike.com)
  CrowdStrike is a founding member of Project Glasswing, using Mythos to identify vulnerabilities in critical software infrastructure.
- [8] Anthropic left details of an unreleased model in a public database (fortune.com)
  Nearly 3,000 unpublished assets were publicly accessible due to a CMS misconfiguration, revealing details of an unreleased model and internal data.
- [9] Anthropic Double Breach: Enterprise AI Security 2026 (digitalapplied.com)
  Both incidents stemmed from human error in operational processes, not external attacks. Safety branding is not security.
- [10] Anthropic leaks its own AI coding tool's source code in second major security breach (fortune.com)
  A 59.8 MB JavaScript source map file was bundled in the public npm package, exposing 512,000 lines of unobfuscated TypeScript across 1,906 files.
- [11] Anthropic Claude Code Leak - ThreatLabz (zscaler.com)
  The leaked source contained complete agent permission models, bash security validators, 44 unreleased feature flags, and full MCP integration logic.
- [12] What Certifications has Anthropic obtained? (privacy.claude.com)
  Anthropic has achieved SOC 2 Type 2, CSA STAR Level 2, ISO 27001, and ISO 42001 certifications. The security program incorporates NIST 800-53 standards.
- [13] Anthropic awarded $200M DOD agreement for AI capabilities (anthropic.com)
  The DOD awarded Anthropic a two-year prototype other transaction agreement with a $200 million ceiling for defense AI capabilities.
- [14] NSA is said to use Anthropic's new AI model despite Pentagon dispute (seekingalpha.com)
  The NSA is using Anthropic's Mythos Preview even after the Department of Defense designated the company a supply chain risk.
- [15] CISA doesn't have access to Anthropic's Mythos (axios.com)
  CISA, the government's primary cyber defense agency, is not on the list of organizations with access to Mythos, despite being briefed on its capabilities.
- [16] OpenAI's Data Breach Incident Goes Unreported: What to Know (visotrust.com)
  In early 2023, a hacker infiltrated OpenAI's internal messaging system. The breach was not disclosed until The New York Times reported on it in July 2024.
- [17] A 2023 OpenAI Breach Raises Questions About AI Industry Transparency (informationweek.com)
  OpenAI informed employees in April 2023 but opted not to make the breach public for over a year, raising questions about AI industry disclosure norms.
- [18] AI Companies: Uphold Your Privacy and Confidentiality Commitments (ftc.gov)
  The FTC warned that AI companies failing to abide by privacy commitments may be liable under existing federal laws, though no AI-specific breach notification statute exists.
- [19] OpenAlex: AI Cybersecurity Vulnerability Research Publications (openalex.org)
  Over 54,000 academic papers on AI cybersecurity vulnerabilities have been published since 2011, with 21,084 in 2025 alone.