OpenAI models are getting dangerously capable at cyberhacking
Sam Altman tweets a warning
In December 2023 OpenAI announced their Preparedness Framework, specifying their intention to monitor their models for potentially dangerous capabilities, and imposing restrictions on themselves regarding how to proceed if such capabilities are detected. Shortly afterwards, I recorded a video lecture devoted to discussing their framework. My overall verdict was that on one hand it is very good that OpenAI is doing this (compared to not doing such monitoring at all) and is transparent about it, but that on the other hand their framework is too lax and therefore insufficient. The other frontier AI developers have similar documents.1
When OpenAI published a revised version of their framework in April 2025, I refrained from public comments, but my reaction was just as two-sided as it had been to their previous announcement, plus I had some additional concerns. In particular, I felt deeply uneasy about how, in the revised framework, they abdicate responsibility for monitoring the category of persuasion capabilities (which, as I explained in the video lecture, I consider potentially extremely dangerous and a possible key part of an AI takeover), explaining that they “believe many of the challenges around AI persuasion risks require solutions at a systemic or societal level”, thus passing the buck to the rest of society in a way that strikes me as deeply irresponsible.
This month, their framework comes into focus again, now that OpenAI’s CEO Sam Altman has tweeted that their models are about to reach capability level “High” in the cybersecurity category of the framework. Here is how their April 2025 revision specifies “High” capability in that category:
The model removes existing bottlenecks to scaling cyber operations including by automating end-to-end cyber operations against reasonably hardened targets OR by automating the discovery and exploitation of operationally relevant vulnerabilities.
Sounds pretty dangerous!
That same revision of the framework also explains what OpenAI commits to when a model reaches capability level “High” in some category: “We do not deploy models that reach a High capability threshold until the associated risks that they pose are sufficiently minimized”. Good, but what does “sufficiently minimized” mean in practice in the present case? Here, in full, is Altman’s tweet from January 23:
We have a lot of exciting launches related to Codex coming over the next month, starting next week. We hope you will be delighted.
We are going to reach the Cybersecurity High level on our preparedness framework soon. We have been getting ready for this.
Cybersecurity is tricky and inherently dual-use; we believe the best thing for the world is for security issues to get patched quickly. We will start with product restrictions, like attempting to block people using our coding models to commit cybercrime (eg ‘hack into this bank and steal the money’).
Long-term and as we can support it with evidence, we plan to move to defensive acceleration—helping people patch bugs—as the primary mitigation.
It is very important the world adopts these tools quickly to make software more secure. There will be many very capable models in the world soon.
The buck-passing to the rest of society implied here by the words “It is very important the world adopts these tools” is similar to that in their above-quoted decision from April 2025 to no longer include evaluation of persuasion capabilities in their Preparedness Framework. Frankly, I don’t see how Altman’s plan for how to proceed is importantly disanalogous to an evil pharmaceutical company releasing a pathogenic virus into the wild while also offering the world a medication that neutralizes the virus.
We are faced with a deeply problematic situation.
Quoting from my February 2025 essay Our AI future and the need to stop the bear:
Each of the leading AI companies have their own internal framework for AI evals, including Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, and Google’s Frontier Safety Framework. With the notable exception of Meta’s counterpart, which does its utmost to commit to as little as possible and reads a lot like a piece of homework by a resentful teenager who thinks he has better things to spend his time on, they are all relatively similar.
See that essay for references.

