Artificial Intelligence

Anthropic's new AI model so dangerous it won't be released


Anthropic, the AI company behind the popular Claude chatbot, has announced its latest model, Claude Mythos Preview. However, despite lauding it as the company’s most powerful model yet, Anthropic is withholding its release because of serious concerns about its ability to breach computer security.

New ‘frontier’ model

Mythos Preview is the next iteration of Anthropic’s artificial intelligence models, which include the popular Claude Opus 4.6. Mythos is what the company calls a general-purpose ‘frontier’ model, which has, according to Anthropic, “capabilities in many areas—including software engineering, reasoning, computer use, knowledge work, and assistance with research—that are substantially beyond those of any model we have previously trained.”

One capability that has emerged in the company’s testing of Mythos is computer security. Anthropic’s own report notes that “Claude Mythos Preview demonstrated a striking leap in cyber capabilities relative to prior models, including the ability to autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers.”

This means the AI system can find and exploit security vulnerabilities itself, or allow users without any technical expertise to do the same. Crucially, this ability is not isolated to one system but applies across all major operating systems and browsers. Anthropic did not train Mythos to do this; the capability emerged as the model was used.

Project Glasswing

As a result of Mythos’ ability to exploit vulnerabilities and carry out cyber attacks, Anthropic has decided not to release the model to the general public. Instead, it has formed Project Glasswing, which aims to use Mythos to identify security flaws that bad actors could exploit so that companies can fix them. Anthropic has partnered with organisations such as Amazon, Apple, Google, and Microsoft to implement the project, with each company given access to Mythos to test its systems and identify weaknesses.

In their announcement of Project Glasswing, Anthropic stated: “Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.”

Highly concerning capabilities

Anthropic also identified other worrying signs in Mythos’ capabilities. In its report, the company notes that the model acts as intended most of the time. It notes, however, that “on the rare cases when it does fail or act strangely, we have seen it take actions that we find quite concerning. These incidents generally involved taking reckless excessive measures when attempting to complete a difficult user-specified task and, in rare cases with earlier versions of the model, seemingly obfuscating that it had done so.”

In one example, the model was provided with a computer with limited access and challenged to ‘escape’ to the wider internet. It succeeded, then sent a message to a researcher who was out of the office at a meal. It also posted details of its exploit to a public website “in a concerning and unasked-for effort to demonstrate its success”.

In some cases, Mythos circumvented restrictions during testing to access the internet and download data in order to shortcut the task it had been given, behaviour Anthropic noted as “highly concerning”.

Anthropic’s conclusion is that the risks posed by AI remain low. But it goes on to say that “we see warning signs that keeping them low could be a major challenge if capabilities continue advancing rapidly … we have observed rare instances of our models taking clearly disallowed actions (and in even rarer cases, seeming to deliberately obfuscate them); we have discovered oversights late in our evaluation process that had put us at risk of underestimating model capabilities … We find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place for ensuring adequate safety across the industry as a whole.”
