How humanity could survive the AI crisis
A trichotomy
Apart from advocates of AI successionism — a minority view I wrote about in an earlier post — nearly everyone would prefer humanity to not be wiped out in an AI apocalypse. But there is growing concern that such a catastrophe might nevertheless happen, even on a relatively near time scale.
Due to the ongoing race between a small number of leading AI developers, mainly in northern California, AI capabilities are advancing at a very rapid pace. They can be expected to accelerate even more dramatically if, as many experts say, we are standing on the verge of a new regime of AI self-improvement that turbocharges the growth curves. See, e.g., this lecture from earlier this year for my take on these developments. All in all, the ambition that Anthropic’s CEO Dario Amodei has expressed to build an AI with capabilities corresponding to “a country of geniuses in a data center”, perhaps as soon as within a couple of years, does not seem entirely unrealistic.
Would it be safe to build such a superhumanly intelligent AI, thereby abdicating from our role as the most intelligent species on the planet and handing over this honor to the AI? That is at best unclear, and a lot hinges on whether we will be able to solve the so-called AI alignment problem (that of how to make sure that the first truly powerful AIs have goals that are in line with human values and that sufficiently prioritize human welfare) in time for the creation of the first such machines. At present, AI alignment research lags far behind AI capabilities, and no one seems to have a convincing plan for how to solve the problem. If, due to commercial logic and related incentives, the leading AI companies keep pushing ahead towards superintelligence even in the absence of such a solution, we risk ending up with a misaligned superintelligent AI, in which case the default scenario is roughly what AI alignment pioneer Eliezer Yudkowsky in his classic 2008 paper described as “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else”. The reasoning here can (and should) be expanded to a couple of dozen pages or a full book, but at a basic level, this is why I think there’s a real risk that Homo sapiens is wiped out by the early 2030s or perhaps even sooner.
This view, and my outspokenness about it, have sometimes triggered accusations that I am like the doomsday prophet who shouts from the rooftops: “We are all going to die!”. What this accusation fails to take into account, however, is that “X will happen” and “there is a risk that X happens” are very different statements. Believe it or not, I have more than once encountered STEM professors who are broadly critical of my view on AI risk and who turned out to have grave difficulties appreciating this distinction.[1][2] But there really is a difference: I do think there is substantial risk of an AI apocalypse within the coming decade or so, but I am not claiming it will happen.
There are various ways in which we might, despite the recklessness of the ongoing AI race, not end up in an AI apocalypse. Simplifying somewhat, such scenarios can mostly be categorized according to the following trichotomy.
(1) Some technical obstacle in the development of more powerful AI models might show up that turns out to be so severe that it causes the currently very steep capability curves to flatten out and grind to a halt before the AIs reach a level where they have the ability to overpower humanity.
(2) The AI alignment problem turns out to be much easier to solve than expected, or even something that is solved by default without any specific effort from the AI developers’ side.
(3) We (humanity, or some relevant group of sufficiently powerful decision makers) manage to put a stop to the race towards superintelligent AI, and institute an effective moratorium that is either in force indefinitely or cannot be lifted until guarantees that it is safe to proceed are properly established.
I am not claiming this is a clean partition of futures that avoid an AI apocalypse. One might, for instance, include a fourth category where our civilization is wiped out by an ultra-lethal pandemic or some other (non-AI-related) catastrophe before the AI apocalypse has come to fruition. There are furthermore various possible futures that can be described as combinations or mixes of two or more of the three categories. For instance, if some serious-but-not-unsurmountable technical obstacle to pushing AI capabilities causes not a permanent stop to the capabilities curves, but a delay that postpones the superintelligence breakthrough until the year 2070, and if this delay allows AI alignment researchers to use the intervening decades to find a medium-hard solution to the problem, then that can be seen as a mix of (1) and (2). Or if the frontier AI developers find themselves faced with both a technical obstacle and some heavily bureaucratic safety regulations, such that each on their own would have been possible to overcome, but the combination of both causes the AI developers such pain that they just give up, then that can be seen as a mix of (1) and (3). It’s complicated.
With that said, I think the trichotomy works as a simplified first pass at structuring one’s thoughts about how humanity might survive the current AI crisis. I believe each of (1), (2) and (3) is plausible enough to merit taking seriously. As regards (1), all exponential or superexponential growth curves tend sooner or later to turn into something that looks more like a sigmoid, and while there is in this particular case no strong reason to expect the leveling-out to be imminent, it might just happen anyway. Regarding (2), we may note that Yudkowsky and Soares are quite dismissive about this possibility in their 2025 book, and present good arguments for their skepticism. I give somewhat more credence than they do to (2), based not so much on the one case for why (2) might be true that I’ve thought most carefully about, which involves moral realism,[3] but more on the possibility of unknown unknowns and how we might all simply be confused about this extraordinarily difficult question. As to (3), well, it’s up to us: if we choose to get our act together, then we can make it happen.
There is considerable political disagreement over whether (3) is desirable and whether we should work towards some intervention to halt the AI race that might otherwise bring upon us the ultimate catastrophe. With the framework I am suggesting here, we see that those who are opposed to such action implicitly attach so much credence to (1) and/or (2) that, in their view, we don’t need to bother with taking the actions required to make (3) happen.[4] And here, I think, is the core of our disagreement. While it may well turn out in the end that they were right and that (1) and/or (2) is true, I don’t have enough faith in either of those propositions to be willing to bet the entire future of humanity on them. And I do think (most of) these opponents of (3) would do the AI debate an excellent service by making their credence in (1) and/or (2) explicit, and by stepping up their efforts to explain what motivates their judgement.[5]
[1] This really is very strange, because even people without multiple academic degrees tend to effortlessly understand the logic behind, say, fire insurance: there is a risk that a fire happens which is worth insuring against, but this is not to say that a fire will happen.
[2] I wasn’t planning to mention any names here, but it is just too tragicomic a coincidence that just as I am typing this paragraph, my phone goes bing and directs me to a brand-new professor-authored op-ed that attacks me based on exactly this kind of conflation.
[3] In short, it just might save us if an objectively true morality exists, and if it is knowable by any sufficiently intelligent entity, and if the morality automatically compels such an entity to act on it, and if it favors humans rather than, say, sacrificing us to the hedonium apocalypse; see my 2019 paper on this.
[4] There actually exists one other category of opponents to taking action for making (3) happen, namely those who think it is pointless due to being not just difficult but literally impossible. I have no patience for such fatalistic and self-fulfilling crap.
[5] One final note: From my formulation of (3), some readers may be tempted to infer that as soon as AI alignment is solved and safety guarantees are established, I would happily agree to the AI companies proceeding towards superintelligence. This, however, is not my view. Even with the AI alignment problem solved, there are so many concerns we need to think long and carefully about, in a way that respects democracy and everyone’s right to not have their life suddenly uprooted without their consent, before (possibly) proceeding. One such concern is whether the meaning of life can satisfactorily be preserved when AIs can do everything for us; Nick Bostrom makes a heroic attempt at a yes answer in his 2024 book Deep Utopia, whose proposed worlds are however so weird that the book can equally well be read as a reductio ad absurdum. And there is a host of similarly difficult problems around gradual disempowerment, power concentration, and whatnot.


Olle, it is a gift to be able to write about such serious things in such an entertaining manner.
Here comes my take on this question, again as a pretty long comment - I am sorry for writing so much. As I have expressed many times in comments here, I am not overly worried about AI existential risk, because I believe that other disastrous events have higher probabilities. But to entirely dismiss all research related to this on grounds that AI will not have sufficient abilities in the coming decades sounds wrong.
AI has surpassed many milestones that had been considered impossible. There was a time when it seemed highly questionable whether a computer could ever beat the strongest players in chess or Go. Yet, that was something that one could imagine happening anyway after some more time. But who could have imagined that AI could create poems, paintings, music and videos that are really hard to distinguish from the works of human artists? Some people will claim that they can easily tell the difference, but I think they are fooling themselves - and even if they were right today, this may not be the case tomorrow.
To believe that we are safeguarded from e.g. all advanced AI systems with cyberhacking capabilities also seems wrong. First, there are simply no error-free systems in practice, and there is always a risk that something unexpectedly fails.
Second, people have efficiently used AI agents to build software-based systems for them. In this process, the AI will ask for permission several times to perform various operations. It seems almost certain that users will not make a careful assessment in every single case before providing those permissions. Overly confident people in the leadership of certain strong countries or companies may end up in situations where they could grant AI agents permissions that provide the AI with dangerous resources, and it is hard to judge such situations to be free of risk.
So, yes, I agree that confidently stating that AI existential risk does not exist would require some good and explicitly formulated arguments. The blog post is well written and the trichotomy may provide a good framework for further discussion.
And we should indeed work towards (3). Even those who do not acknowledge AI existential risk could in principle agree to (3), because there are numerous other disadvantages and risks related to AI development, and addressing those would also benefit from a temporary moratorium. Let's reap the benefits of the amazing technology which is already available rather than further developing it before it is clearly safe to do so!