Full speed ahead!
I worry about the attitude displayed in recent papers by Dario Amodei and Nick Bostrom
The influence on me from Dario Amodei is huge, and the same is true for Nick Bostrom. In Bostrom’s case, what I primarily have in mind here is the influence he’s had over the years on my views on AI. Ranking the AI thinkers who have, in this respect, influenced me most is a precarious enterprise that I will not attempt here, other than to say that Bostrom is a serious contender for a place on the podium.
Amodei is also a medalist candidate, albeit in an entirely different subdiscipline of having influence on me. Namely, as the CEO of Anthropic and famous for his visionary ideas about creating ever more capable AI, he has more influence than just about anyone else on the ongoing race towards superintelligence, and thereby also on my present prospects of surviving until age 65 rather than being killed before then in an AI apocalypse. (I am 58.)
There are things to admire about both of them, but each has recently published a text that argues for AI accelerationism, a position that, given the present circumstances, I find abhorrent. Here I’ll discuss both texts in a little more detail.
I’ll begin with Nick Bostrom’s recent Optimal Timing for Superintelligence: Mundane Considerations for Existing People, in which he comes out mostly in favor of rushing ahead towards superintelligence. While he had made hints in this direction in an interview with Jonas von Essen in October last year,1 the paper came as a bit of a shock to those of us who have come to know Bostrom as a leading advocate of existential risk mitigation in general, as well as an influential theoretician of such risk from advanced AI in particular.
So what leads Bostrom to such an accelerationist conclusion? Does he back off from his earlier view that human extinction caused by premature creation of superintelligent AI is both plausible and substantially likely? Or does he now side with those AI successionists who favor the human race being replaced by AIs? In fact, he does neither of those things. Instead, he insists (as one should) on considering both the pros and cons of any given action, and arrives at the conclusion that the potential gains from promptly building superintelligent AI are so large that they outweigh even a fairly large risk of total disaster. A couple of sentences from his paper serve to illustrate what fuels this conclusion:
Developing superintelligence is not like playing Russian roulette; it is more like undergoing risky surgery for a condition that will otherwise prove fatal. […]
170,000 people die every day of disease, aging, and other tragedies.
And in a similarly striking formulation:
Yudkowsky and Soares maintain that if anyone builds AGI, everyone dies. One could equally maintain that if nobody builds it, everyone dies. In fact, most people are already dead. The rest of us are on course to follow within a few short decades.
The recipe under consideration for avoiding this fate is to build superintelligence. Indeed, the prospects for radical life extension given the successful construction of an aligned superintelligent AI do look excellent, and the bulk of the paper then consists of working out the pros-and-cons calculation for how much those of us alive today stand to gain or lose in terms of life expectancy, under various assumptions regarding QALY adjustment, discounting, diminishing marginal returns, P(doom|launch now), decay rates for P(doom|launch at time t) due to alignment progress, and other parameters. The exact conclusions of course vary with the parameter values, but overall they tend to point in favor of an early launch of superintelligence, and in some cases launching immediately is the recommendation even when P(doom|launch now) is as large as 95%.
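To get a feel for how a calculation of this kind can come out in favor of launching even at a very high P(doom), here is a minimal toy model of my own. It is not Bostrom’s actual model: all parameter values and functional forms below (the 95% starting P(doom), the exponential decay, the 1000 post-launch life-years, the constant mortality hazard) are illustrative assumptions of mine, and QALY weighting, discounting and diminishing marginal returns are left out.

import math

# Toy comparison of "launch now" versus "launch later", loosely in the spirit
# of Bostrom's setup. All numbers and functional forms are my own illustrative
# assumptions, not the ones used in his paper.

BASELINE_YEARS = 30            # expected remaining life-years if superintelligence is never built
UTOPIA_YEARS = 1000            # expected remaining life-years given an aligned superintelligence
HAZARD = 1.0 / BASELINE_YEARS  # crude constant mortality hazard while we wait

def p_doom(t, p0=0.95, decay=0.10):
    """P(doom | launch in year t), assumed to decay exponentially as alignment research progresses."""
    return p0 * math.exp(-decay * t)

def expected_years(t):
    """Expected remaining life-years of a person alive today, if superintelligence is launched in year t."""
    survive_wait = math.exp(-HAZARD * t)                 # probability of still being alive at launch
    years_before_launch = (1.0 - survive_wait) / HAZARD  # expected years lived during the waiting period
    years_after_launch = survive_wait * (1.0 - p_doom(t)) * UTOPIA_YEARS
    return years_before_launch + years_after_launch

print(f"never launch: expected years = {BASELINE_YEARS}")
for t in [0, 5, 10, 20, 50]:
    print(f"launch in year {t:2d}: P(doom) = {p_doom(t):.2f}, expected years = {expected_years(t):6.1f}")

With these made-up numbers, launching immediately at P(doom) = 0.95 yields an expected 50 remaining life-years, which already beats the 30 years we expect if superintelligence is never built; whether an immediate launch also beats a modest delay then hinges entirely on how fast one assumes P(doom) decays relative to the background mortality rate. This, as far as I can tell, is the basic structure that drives Bostrom’s conclusions.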
What immediately jumps out at the reader is Bostrom’s single-minded focus on people alive today, which is very uncharacteristic of him, given that his earlier work, through its emphasis on the well-being of quadrillions of people living in the far future, has become part of the foundational literature for the idea of longtermism.2 He is conscious and explicit about this limitation:
We may distinguish between a person-affecting perspective, which focuses on the interests of existing people, and an impersonal perspective, which extends consideration to all possible future generations that may or may not come into existence depending on our choices. […] In what follows, we adopt the person-affecting perspective (leaving an analysis from the impersonal perspective for future work).3
This passage serves as a kind of deniability clause. If confronted by thinkers who (like me) are upset by the recklessness of his policy recommendations, Bostrom can distance himself from them by saying “but those are not my recommendations, they are just the outcome of calculations I made to see where the person-affecting perspective will lead”. While this dialogue is hypothetical, I am nevertheless infuriated by the motte-and-bailey tactic that Bostrom sets himself up for, and I insist that he needs to take responsibility for his research priorities: what he chooses to study and publish now versus what he postpones to a future which may not even come about before misaligned superintelligence blows up and destroys everything we hold dear.
Almost as striking as his restriction to people existing today is the restriction of most of the analysis to our hedonic well-being, rather than our preferences (other than the discounted preference for that very hedonic well-being). Bostrom has a section on preferences for others’ well-being, his reasoning there boiling down to such preferences mostly being reciprocal (if A cares for B then B cares for A), so that it all mostly washes out and can be ignored in his analysis. But his discussion here, as in the rest of the paper, concerns only people alive today, so if I care for future generations (as in fact I do), that falls outside the scope of his analysis.
It seems to me that most people’s preferences deviate significantly from the monomaniacal prudential focus on well-being that Bostrom’s analysis assumes. If you ask the average person how much they would value living until age 1000, and what probability of being killed in an AI apocalypse in 2030 that prospect would outweigh, the most likely answer will not be 95% or 20% or even 1% but something more along the lines of “what the f**k have you been smoking?”. Learning about and then realistically incorporating people’s true preferences in a Bostrom-style optimization model seems like a tall order. A solution to this, which a hard-core advocate of prudential hedonism might favor, is to say that while people’s preferences are indeed important, these people are typically deeply confused about their true preferences, which are more in line with Bostrom’s analysis, so we can in fact go on to push for a rapid launch of superintelligence without disrespecting their true preferences. That is a kind of paternalism, and while I’m not dogmatically against all kinds of paternalism, the kind where one gives people radical life extension at the mere cost of a 95% probability that everyone is killed in an AI apocalypse is not one that I favor.
So how do we go about respecting people’s preferences in this kind of situation? In my 2025 paper Advanced AI and the ethics of risking everything, I insist on the importance of informed consent. Asking that literally everyone on Earth consents to building superintelligence is tantamount to a blanket prohibition, which even I might say is a bit much, but at the very least I think consent from a solid majority of the world’s population is a reasonable ask before taking the most momentous action ever in the history of humanity. But Bostrom says not a word about consent, an omission that suggests a shockingly elitist mindset.
At this point I am tempted to repeat what turned out to become one of the most oft-quoted lines of my 2016 book Here Be Dragons. On p. 242 of that book, I wrote about an older consideration of Bostrom’s that “I feel extremely uneasy about the prospect that it might become recognized among politicians and decision-makers as a guide to policy worth taking literally”.4 At that time, the discussion still seemed abstract and hypothetical, but now that we have arrived at crunch time for humanity, the same sentence applies with even greater force to his Optimal Timing for Superintelligence paper, and the prospect that that paper might fall into the hands of policymakers gives me nightmares.
One powerful such policymaker (in a broad sense of the term) is Dario Amodei. His The Adolescence of Technology comes as less of a surprise than Bostrom’s paper, and is a well-written follow-up to his 2024 essay Machines of Loving Grace. In both essays, Amodei speaks of a “country of geniuses in a data center” to describe the level of AI capability that he expects will arrive quickly once his company gets the recursive AI self-improvement spiral going — perhaps as soon as late 2026, but maybe a few years later. While Machines of Loving Grace is an inspired text that puts the idea of a century’s worth of scientific progress compressed into a decade at the center of his vision for what his country of geniuses will mean to the world, in The Adolescence of Technology Amodei treats the other side of the coin: the risks. Organized in chapters on misalignment, misuse, power concentration, labor market disruption and unknown unknowns, the essay presents an almost overwhelming array of ways in which things can go terribly wrong, and the discussions of possible approaches to mitigation can only do so much to reassure the reader that we are on a benign trajectory, especially as Amodei insists that pushing forward on AI capabilities is the only way to go.
That the CEO of Anthropic has the guts to speak this openly about AI risk is commendable, but there are nevertheless aspects of the essay that are outright bad. While Amodei does not entirely dismiss existential AI risk,5 his strawman portrayal of those of us whom he (following a recent and very unfortunate terminological invention) calls “doomers” clearly belongs in this badness category. He accuses us of “sensationalistic social media accounts” and “off-putting language reminiscent of religion or science fiction”, but how in the world are we supposed to discuss leading AI executives’ ambition to build “a country of geniuses in a datacenter” without sounding like science fiction nerds?
Although the insertion of the word “inevitably” is unwarranted and uncharitable, Amodei is not entirely wrong in describing the theory of instrumental convergence (as developed in early work by Eliezer Yudkowsky, Steve Omohundro and Nick Bostrom) as the “intellectual basis of predictions that AI will inevitably destroy humanity”. But his way of dismissing these important insights is a highly unfortunate mixture of “it’s complicated” and appeal to authority:6
I think people who don’t build AI systems every day are wildly miscalibrated on how easy it is for clean-sounding stories to end up being wrong, and how difficult it is to predict AI behavior from first principles, especially when it involves reasoning about generalization over millions of environments (which has over and over again proved mysterious and unpredictable). Dealing with the messiness of AI systems for over a decade has made me somewhat skeptical of this overly theoretical mode of thinking.
Nobody ever denied that there are plenty of complications, but if Amodei’s experience from the lab floor has given him insights that warrant the conclusion that the dire predictions from instrumental convergence (about what is likely to happen with unaligned superintelligent AI) are likely to fail in ways beneficial to humanity, then he owes it to us to concretely lay out the object-level reasoning leading to that conclusion.
More can be said, but most appalling of all in Amodei’s essay is the bombastic final paragraph, where his view of himself as a kind of superhero leading humanity on the narrow path towards a glorious future is laid bare with unforgiving clarity:
The years in front of us will be impossibly hard, asking more of us than we think we can give. But in my time as a researcher, leader, and citizen, I have seen enough courage and nobility to believe that we can win—that when put in the darkest circumstances, humanity has a way of gathering, seemingly at the last minute, the strength and wisdom needed to prevail. We have no time to lose.
This is the sort of thinking, and these are the personality traits, that may lead a man in his position to launch a Singularity without asking the rest of us for proper informed consent.
Dario Amodei is obviously an unusually intelligent and capable person — surely intelligent enough to see through the deficiencies of Bostrom’s reasoning in Optimal Timing for Superintelligence. Yet, high intelligence is (as long as we’re talking about the human range of intelligence levels) no reliable vaccine against motivated reasoning. He is highly motivated — probably more so by glory and ambition than by strictly commercial incentives — to press full speed ahead in the AI race, and I worry that when he reads Bostrom’s paper it will further fuel this cataclysmically dangerous state of mind.
1. In retrospect, perhaps Bostrom’s 2024 book Deep Utopia can be seen as another such hint. In it, he argues that human life can remain meaningful even in quite extreme post-scarcity and post-work circumstances arising from having created superintelligent AI (although I suspect most readers’ overall verdict on his scenarios will be more along the lines of “yuck”).
2. See, e.g., his Astronomical Waste and Existential Risk Prevention as Global Priority papers.
3. Hence the term Existing People in the title of the paper. The term Mundane is similarly explained:
One distinction that may usefully be made is between what we could term mundane and arcane realms of consideration. By the former we refer to the ordinary kinds of secular considerations that most educated modern people would understand and not regard as outlandish or weird (given the postulated technological advances). The latter refers to all the rest—anthropics, simulation theory, aliens, trade between superintelligences, theology, noncausal decision theories, digital minds with moral status, infinite ethics, and whatnot. The arcane is, in the author’s view, relevant and important; but it is harder to get to grips with, and rolling it in upfront would obscure some simpler points that are worth making. In this paper, we therefore limit our purview to mundane considerations (leaving more exotic issues to possibly be addressed in subsequent work).
4. While it is of at most tangential importance to the present discussion, I mention for the record that the Bostromian claim I was referring to in the book was his observation in the aforementioned 2013 Existential Risk Prevention as Global Priority paper that
the expected value of reducing existential risk by a mere one millionth of one percentage point is at least a hundred times the value of a million human lives.
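The arithmetic behind this sentence is straightforward once one accepts the estimate, which (if I recall the paper correctly) Bostrom arrives at even under his most conservative assumptions, that at least 10^16 future human lives are at stake. One millionth of one percentage point is a factor of 10^-8, so

$10^{16} \times 10^{-6} \times 10^{-2} = 10^{8} = 100 \times 10^{6}$

lives, i.e., a hundred times the value of a million human lives.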
5. A claim by Amodei in a recent podcast episode with Dwarkesh Patel reveals that AI existential risk concerns are not at the top of his mind in the way that the current situation necessitates. He says this:
If we had the country of geniuses in a datacenter, we would know it. We would know it if you had the country of geniuses in a datacenter. Everyone in this room would know it. Everyone in Washington would know it. People in rural parts might not know it, but we would know it.
This claim is unwarranted, the problem being that he forgets that a misaligned superintelligent AI is likely to have the kind of situational awareness that may lead it to hide its misalignment or its vast intelligence from us. A growing body of empirical evidence, including some of Anthropic’s own research, is pointing towards the conclusion that such situational awareness is beginning to emerge already in present-day AI systems.
6. This is stressed in the excellent discussion of Amodei’s essay in a recent episode of the Doom Debates podcast with Liron Shapira and Harlan Stewart.


