Unauthorized Access to a "Dangerous" AI
In a significant security lapse, a small group of users operating within a private Discord channel reportedly gained unauthorized access to Anthropic's Mythos AI model, a technology the company itself described as capable of enabling "dangerous cyberattacks." This breach reportedly occurred on the very day Anthropic publicly announced its limited release of Mythos as part of Project Glasswing, an initiative designed to provide select companies and government users with access to the model for defensive cybersecurity purposes. Anthropic has confirmed it is investigating the report, stating that the unauthorized access appears to have been facilitated "through one of our third-party vendor environments."
The Mythos AI model is touted for its ability to identify and exploit vulnerabilities across major operating systems and web browsers; Anthropic claims it can outperform most skilled human researchers at finding and exploiting software flaws. Concerns about the model's potential for misuse led Anthropic to restrict its release, sharing it only with a limited group of major companies including Amazon, Apple, Cisco, JPMorgan Chase, and Nvidia. The incident has ignited fresh concerns regarding the control and security of high-end cybersecurity tools, especially those with dual-use risks.
How the Breach Unfolded
The unauthorized access was not the result of a sophisticated hack, but rather a combination of internet sleuthing and an insider connection. Members of the Discord group, focused on uncovering information about unreleased AI models, reportedly made an "educated guess" about Mythos's online location. The guess drew on the group's familiarity with Anthropic's URL formatting conventions for other models, knowledge that may have been exposed in a recent data breach at Mercor, an AI startup that works with large AI companies.
The breach was further facilitated by an individual employed at a third-party contractor working with Anthropic, who reportedly held privileged access and helped enable the group's entry. This points to a potential weakness in third-party vendor security protocols, a common vector for cyber incidents. While the group's members claim their intentions are not malicious and say they have primarily used Mythos for tasks like building simple websites, the fact that such a powerful and restricted AI model fell into unauthorized hands raises serious questions about Anthropic's security posture and its ability to safeguard sensitive technology.
Implications and Broader Context
This incident underscores the inherent challenges in controlling access to advanced AI models, particularly those with significant cybersecurity capabilities. The UK's AI Security Institute (AISI) previously warned that Mythos was a "step up" from prior models in terms of its cyber-threat potential, noting its ability to conduct multi-action attacks and discover IT system weaknesses without human intervention. The unauthorized access, even if not immediately malicious, highlights the critical need for robust security measures that extend beyond internal systems to encompass third-party vendor environments.
The breach also comes amidst broader concerns about AI security. Just last month, details about Mythos were inadvertently stored in a publicly accessible data cache due to human error. Days later, Anthropic experienced another security lapse, exposing nearly 2,000 source code files and over half a million lines of code associated with Claude Code. These repeated incidents, coupled with reports of Anthropic's Claude models introducing serious security issues into code, paint a concerning picture for a company positioning itself as a leader in responsible AI development.