
Exaggerating the risks (Part 6: Introducing the Carlsmith report)

Some worry that the development of advanced artificial intelligence will result in existential catastrophe … My current view is that there is a disturbingly substantive chance that a scenario along these lines occurs, and that many people alive today – including myself – live to see humanity permanently disempowered by AI systems we’ve lost control over.

Joe Carlsmith, “Is power-seeking AI an existential risk?”

1. Recap

This is Part 6 of the series Exaggerating the risks. In this series, I look at some places where leading estimates of existential risk look to have been exaggerated.

Part 1 introduced the series. Parts 2-5 looked at climate risk. Part 2 introduced Toby Ord’s claim that humanity faces a 1/1,000 risk of irreversible existential catastrophe due to climate change by 2100. Parts 3, 4, and 5 of the series drew on work by John Halstead to take a look at all plausible mechanisms by which climate change could pose a serious near-term existential risk, and found them unconvincing.

Part 5 also used discussions of climate risk to illustrate what I have called a regression to the inscrutable in leading discussions of existential risk:

We saw throughout this series what I have called a regression to the inscrutable. Effective altruists begin by chronicling risks that are relatively tractable using ordinary scientific methods, such as crop failure, heat stress, or flooding. Finding these risks to be slight, effective altruists increasingly place most of their confidence in esoteric risks such as tipping cascades or runaway greenhouse effects. These risks are distinguished by the fact that they are increasingly inaccessible to our best scientific methods.

This passage suggests that we see a regression to the inscrutable within risk areas: for example, most climate risk is claimed to come from some of the least scrutable sources. However, I also argued that we see a regression to the inscrutable across risk areas: effective altruists take most existential risk to come from threats far less scrutable than climate risk:

Increasingly, effective altruists invest most of their confidence in relatively inscrutable risks such as bioterrorism and risks from misaligned artificial intelligence. Indeed, Ord puts the combined existential risk from these two sources at over 13% by 2100. As the focus of this series shifts to more inscrutable risks, readers would do well to bear in mind the failure of more scrutable risks, such as climate risk, to stand up to scientific scrutiny, as well as the tendency within more scrutable areas, such as climate risk, to concentrate risk estimates within the least scrutable sources of risk.

As promised, the focus of this series will shift to more speculative risks, beginning with existential risks posed by artificial intelligence. However, I hope that lessons from the more scrutable case of climate risk will not be lost on readers of this series. Part 5 suggested three lessons: in addition to a warning about regressions to the inscrutable, we also saw:

  • Risk estimates can be inflated by orders of magnitude, even when offered by leading authors in texts often treated as authoritative. For example, Ord puts climate risk at 1/1,000 by 2100, whereas Halstead struggles to get climate risk above 1/100,000 not just by 2100, but over all time.
  • The evidential basis for existential risk estimates is often very slim, in many cases resting on a few thin sentences in a long discussion. For example, Ord’s positive case for climate risk rests entirely on a few sentences about lessons from the paleoclimate, which we saw in Part 4 to be not only thin, but also misleading.

Bearing these lessons in mind, let us turn our attention to AI safety. I will begin my discussion by looking at a report by Joe Carlsmith on the dangers of power-seeking artificial intelligence. But first, I want to situate this discussion in a broader perspective that is often lost in discussions by effective altruists.

2. AI safety

With each passing year, effective altruists invest ever more confidence in the claim that artificial intelligence poses a significant existential risk to humanity. Will MacAskill estimates existential risk from artificial intelligence in this century at 5%. Joe Carlsmith estimates risk by 2070 at 10% or higher. Eliezer Yudkowsky suggests, quite possibly in earnest, that we are doomed:

It’s obvious at this point that humanity isn’t going to solve the alignment problem, or even try very hard, or even go out with much of a fight.  Since survival is unattainable, we should shift the focus of our efforts to helping humanity die with slightly more dignity.

Eliezer Yudkowsky, “MIRI announces new ‘Death with dignity’ strategy”

Because effective altruists have become more concerned about AI risk, they have begun to devote an increasing fraction of resources to combatting AI risk. New organizations such as the Center for AI Safety have been founded and have begun pouring unprecedented amounts of money into the field. (An example: $100,000 of prizes for conference papers presented at a single symposium).

I am, for my own part, deeply concerned about what future developments in artificial intelligence may bring. I worry that AI may decimate the labor market, with harms falling heaviest among the most vulnerable workers. I worry that decisionmaking will become increasingly less transparent and explainable. But I am not very worried that AI may one day murder us all.

3. The Carlsmith report

One challenge in addressing the issue of AI safety is that there is no single orthodox statement of the concern. However, one of the most prominent recent attempts to articulate the concern is a report by Joe Carlsmith, “Is power-seeking AI an existential risk?”

Joe Carlsmith is a serious guy. Carlsmith is a senior research analyst at Open Philanthropy, and is finishing a PhD in philosophy at the University of Oxford. Before that, Carlsmith graduated with a perfect 4.0 from Yale University, and completed a BPhil in philosophy at Oxford.

Carlsmith argues for the conclusion that existential risk from artificial intelligence by 2070 is at least 5% (now updated to “>10%”).

I was one of a panel of reviewers commissioned to comment on Carlsmith’s report. As part of the review process, I was asked to provide my own credence that AI will lead to existential catastrophe by 2070. I returned an estimate of 0.00002%.

This view did not make me especially popular among effective altruists. For a while, the most upvoted comment on the Less Wrong thread containing the Carlsmith reviews opined:

In this spirit, I am often asked if I genuinely meant to suggest that the existential risk from artificial intelligence (AI risk) is so low. I did mean it, with a couple of caveats.

The estimate I offered was a bit generous, as one tends to be when being compensated by the foundation which commissioned a report. And since offering this estimate, I have had more extended contact with arguments for AI risk, causing me to lower my estimate further. I now think that an estimate of 0.00002% was unduly generous (although, as I will explain shortly, I am also not convinced that we are in a position where estimating AI risk makes good methodological sense). Let me explain why I take AI risk to be lower than Carlsmith does.

Before I begin, we need to talk about methodology. This isn’t an easy conversation to have, but it must be done.

4. Some methodological observations

The study of AI safety raises uncomfortable questions about methodology that are well-known, but rarely discussed in public. I think it is important to be open and honest about these methodological issues because they explain the silence of much of the scholarly community on issues of AI safety as well as the attitude of skepticism with which this work is often viewed.

At the outset of this discussion, I want to make my allegiance plain. I am an academic. I think that scholarly methods are among the richest and most reliable methods on offer for studying the world. I am deeply suspicious of discussions that cannot be brought up to the standards of rigor, clarity and evidence expected of scholarly discussions, even when those discussions are carried out by world-class scholars. Like many academics, I tend to think that such a situation is evidence of a defect in the underlying discussion and a good reason to stay away.

There is some risk that I will be accused of snobbery in making these remarks. I understand and appreciate this risk. That will be particularly true given that a small, but growing number of serious academics have taken some first steps towards putting the field of AI safety on a secure academic footing (a good development). Nevertheless, I think that these words must be said.

One central issue concerns the degree of permissible speculation. Academics are, as a rule, quite hesitant to discuss matters which can only be approached in a speculative manner. We want clear models, good evidence, and promising theories. When we do not find these things, we think it is best to hold off on speculation, because we think that highly speculative theorizing runs a significant risk of being misleading.

For my own part, I spent more than a year refusing to engage with any work in AI safety, precisely on the grounds that I judged it to be too speculative. This is the stance of most scholars today, even scholars interested in artificial intelligence. My attitude has not changed. I engage in these discussions against my better judgment, and conscious of a significant risk that my own remarks will be speculative and misleading. I do this because I think that effective altruists have placed a dangerous degree of confidence in arguments that artificial intelligence poses a significant existential threat in this century, and because that confidence is being used to misdirect billions of dollars of philanthropic funding away from what I view as worthier causes.

A related issue concerns standards of evidence. Standards of evidence are, as a rule, hard to articulate, but in general scholars impose high standards of evidence before they are willing to invest significant confidence in a theory. At many points, the most helpful complaint that I will find myself able to make is that existing arguments for existential risks from artificial intelligence (AI risks) do not meet what many would regard as minimal standards of evidence.

A third issue concerns the notion of a scholarly literature. I am often told by AI safety researchers, even those holding academic positions, that there is a robust and flourishing scholarly literature on AI safety. When I ask them to show me this literature, I am shown a few published papers intermixed with a large volume of blog posts, forum discussions, Discord chats, and other nontraditional media. I was once asked, quite possibly in earnest, to read and cite a 700-page transcript of blog debates bearing the illustrious title ‘AI go FOOM’.

To my mind, a scholarly literature consists primarily of research articles and monographs published by reputable journals and publishers after a process of pre-publication peer review. The authors of scholarly articles typically hold PhDs in their fields, as do their reviewers. I am hesitant to recognize the current assemblage of blog posts, forum discussions and the like, often written by authors without terminal degrees, as constituting a scholarly literature.

The notion of a scholarly literature will come to the foreground at one key point in our discussion (Section 2.2 of the Carlsmith report), where confident predictions of strong progress in artificial intelligence are made almost entirely without argument. Instead, we are asked to defer to the literature. What literature? Here we are offered citations to three unpublished reports, all by the same foundation that commissioned the Carlsmith report (Cotra 2020, Davidson 2021, Roodman 2020), and a selection of mostly unpublished surveys carried out almost entirely by effective altruist organizations.

This is not a scholarly literature, and it is not a literature worthy of our trust. The authors of all three of the cited reports lack terminal degrees in their fields; have no academic affiliation; and have little record of serious scholarship. These reports and surveys have not been through rigorous processes of peer review (a point I return to below), and have largely been self-published by a few philanthropic foundations which are known for pushing very strong views about existential risk from artificial agents. I do not, and will not, defer to such an assemblage of writings, nor am I terribly excited to read them.

(Edit: A reader points out that some authors have some of the marks of authority emphasized above. For example, Roodman has several well-respected and well-cited publications under his belt, and Carlsmith is finishing a PhD at Oxford. It is important to bear in mind that scholarly authority is not an all-or-nothing question, and to be clear about what marks of authority an author or foundation does or does not have).

A fourth issue concerns the nature of peer review. Scholarly articles are typically evaluated through processes of pre-publication peer review, in which leading and independent specialists commission reviews from experts in the field. Reviewers have the power to decide whether and in what form the article is published. Reviews are financially uncompensated, or in special cases poorly compensated, and typically private.

Carlsmith’s report did not go through any comparable process, and would likely not have made it through peer review at a leading journal. Carlsmith’s report was subject only to post-publication review in which staff members of the Open Philanthropy Foundation (which commissioned the report) commissioned reviews from a range of authors. Reviewers had no power to alter or reject the report. Reviews were generously financially compensated, and were posted in public. As we saw, public reviews can have real consequences for reviewers (a public guffaw met my report), so we are typically more reluctant to speak our minds, especially when we are being paid by the foundation which commissioned the report.

Many scholars view articles which have not passed processes of peer review with deep suspicion. A report such as the Carlsmith report would be viewed in much the same light as a post on a good blog – that is to say, a far cry from a contribution to the scholarly literature. (Although I hope that readers will enjoy and learn from this blog, I would never dream of demanding that it be cited in scholarly discussions. If you want to learn about the ethics of artificial intelligence, you can read any of the many excellent papers on this subject written by world-leading scholars).

A final issue concerns explicitness of reasoning. Reviewers are notorious for demanding exact and precise statements of the reasons underlying all claims, even comparatively minor claims. A danger of circumventing ordinary processes of peer review is that reports can be published even when a substantial part of the reasoning in support of key premises is not specified.

We have already seen one instance of this in our discussion of Ord’s views on climate risk, beginning in Part 2 of the series Exaggerating the risks. We saw that Ord asserts, almost entirely without argument, a 1/1,000 risk of irreversible existential catastrophe from climate change by 2100. And we saw the dangers of this strategy: it took us a great deal of time to chase down every possible argument that Ord could have had in mind, since Ord did not tell us. We found that none of these arguments came anywhere close to grounding the conclusions that Ord gave.

At key points in my response to Carlsmith, the complaint will be much the same complaint I have made against Ord. Central premises of the argument are defended by little, if any explicit reasoning. That is, to my mind, good cause for skepticism.

5. Carlsmith’s argument

Carlsmith argues that humanity faces at least a 5% probability of existential catastrophe from power-seeking artificial intelligence by 2070 (updated to 10% in March 2022). Here is how Carlsmith outlines the argument, with probabilities reflecting Carlsmith’s weaker, pre-2022 view (“|” represents conditionalization; descriptive titles for premises are added but otherwise the following is a direct quotation from Carlsmith).

By 2070:

1. (Possibility) 65% It will become possible and financially feasible to build AI systems with the following properties:

  • Advanced capability: They outperform the best humans on some set of tasks which when performed at advanced levels grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering and persuasion/manipulation).
  • Agentic planning: they make and execute plans, in pursuit of objectives, on the basis of models of the world.
  • Strategic awareness: the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining power over humans and the real-world environment.

(Call these “APS” – Advanced, Planning, Strategically aware – systems).

2. (Incentives) 80% There will be strong incentives to build and deploy APS systems | (1).

3. (Alignment difficulty) 40% It will be much harder to build APS systems that would not seek to gain and maintain power in unintended ways (because of problems with their objectives) on any of the inputs they’d encounter if deployed, than to build APS systems that would do this, but which are at least superficially attractive to deploy anyway | (1)-(2).

4. (Damage) 65% Some deployed APS systems will be exposed to inputs where they seek power in unintended and high-impact ways (say, collectively causing >$1 trillion dollars worth of damage) because of problems with their objectives. | (1)-(3).

5. (Disempowerment) 40% Some of this power-seeking will scale (in aggregate) to the point of permanently disempowering ~all of humanity | (1)-(4).

6. (Catastrophe) 95% This disempowerment will constitute an existential catastrophe | (1)-(5).

Aggregate probability: 65% * 80% * 40% * 65% * 40% * 95% ≈ 5%.
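
As a quick check on the arithmetic, the aggregate figure can be reproduced by multiplying the six conditional probabilities quoted above. The following is a minimal sketch: the dictionary keys and the function name are illustrative labels of my own, not names used in Carlsmith's report.

```python
from math import prod

# Conditional probabilities as quoted above (Carlsmith's pre-2022 figures).
# The keys are illustrative labels, not terminology from the report.
premises = {
    "possibility": 0.65,
    "incentives": 0.80,
    "alignment_difficulty": 0.40,
    "damage": 0.65,
    "disempowerment": 0.40,
    "catastrophe": 0.95,
}


def aggregate_probability(conditionals):
    """Multiply a chain of conditional probabilities (the chain rule)."""
    return prod(conditionals.values())


aggregate = aggregate_probability(premises)
print(f"Aggregate probability: {aggregate:.2%}")  # prints roughly 5.14%
```

Because the estimate is a product of six terms, it is sensitive to each of them: halving any single premise halves the aggregate.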

6. Looking ahead

In the next few iterations of this series, I will take a look at Carlsmith’s argument for these six claims. I will focus on questions such as the following:

  • What argument is given for (Alignment Difficulty)? How much evidence do we really have that artificial intelligence is likely to be substantially misaligned?
  • How bad is the misalignment expected to be? Everyone should grant that AI systems will sometimes fail to do what we want them to. They already have. But why think they will take over the world and disempower us all?
  • Why are we so confident in (Possibility)? Optimists have a long history of promising that artificial general intelligence is just around the corner. After so many failed predictions, why should we take them at their word today?

Do let me know in the comments if there are other questions regarding the Carlsmith report that you would like to hear discussed. I am also interested in hearing which other arguments for existential risk from artificial intelligence, beyond the Carlsmith report, readers would like to hear discussed.


10 responses to “Exaggerating the risks (Part 6: Introducing the Carlsmith report)”

  1. Violet Hour

    Happy to see the first post — looking forward to reading this series!

    Here’s one reason I’m happy: while many people working in this field have called for more academic engagement, most of these people (I think) start from the assumption that the nascent ‘literature’ of informal discussion is onto something, even if the core arguments need to be tightened. I don’t necessarily think this is a bad thing — for instance, I wouldn’t be surprised if there were established academic subfields which initially started in more grassroots fashion, before gradually refining and honing a set of key insights. Nonetheless, I think it’s also good to have skeptics who are less convinced by the nascent collection of extant writing who still hold themselves to more scholarly standards.

    I’m really excited to see some of your more specific object-level criticisms of the Carlsmith report. But I’ll end by listing a few meta topics I’d be interested to hear your take on.

    (1) As you know, there exists (peer-reviewed!) literature which argues that the epistemic benefits of pre-publication peer review rest on shaky grounds. [1] If you’re critiquing the AI safety research ecosystem (partially) through the lens of social epistemology, I’d personally like to hear more about your more positive views on the social epistemology of science, and the ways in which you think discussions of x-risk from AI fall short wrt those standards.

    (2) I’m interested to hear more about what you consider to be permissible amounts of speculativeness for a research field. Here’s one thought I had while reading this post: many branches of academic philosophy are more speculative than many branches of natural science. But you (maybe?) think that the degree of speculation licensed by the standard norms of academic philosophy is permissible. If so, what’s the justification for the current boundary you draw around ‘permissible degrees of speculation’?

    (3) Finally, a more idiosyncratic preference on my part: I’d love to hear you discuss your sense of the evidence-base for x-risk from AI in terms of distinctions laid out by Joyce [2] (‘weight’, ‘balance’, ‘specificity’), or whatever upgraded distinctions are offered by contemporary epistemologists when analyzing the nature of our total evidence. I can imagine that you would’ve done something like this without me commenting, so I’ll state my motivation for mentioning these distinctions explicitly.

    I can imagine people complaining about the paucity of our evidence-base when our total evidence is *any one of*: ‘unspecific’, ‘balanced’, or ‘unweighty’. However, I think it’s likely that different practical conclusions could follow if our evidence-base was poor primarily because it was ‘unweighty’ or poor primarily because it was ‘balanced’. So more discussion on the nature of the evidence-base within this field seems like it’d be useful. 🙂


    1. David Thorstad

      Hi Violet Hour,

      Thanks for your engagement and for the kind words! I have enjoyed discussing these issues with you, and I also liked much of your recent work on the epistemology of effective altruism. For example, I see a good deal of overlap between our views about the relatively strong role of speculative reasoning and precise quantification in effective altruism relative to other methodological traditions.

      << 1. On peer review >>

      It is indeed important to emphasize recent scholarly work on peer review, both recent work in epistemology (often driven by the two authors you cite, who are very good philosophers) and earlier empirical work in the scientific literature. This work has been particularly spurred in recent years by the advent of systems such as preprint archives which could be considered competitors to traditional pre-publication peer review. I think that in this regard, I would want to stress four things.

      First, while it is true that some good complaints have been raised about existing systems of peer review, the call for abolition of pre-publication peer review remains by far a minority view. I would certainly not want to diminish the excellent work done by critics of peer review, and I agree with some of their criticisms. However, we should also acknowledge that the overwhelming majority of scholars and academic institutions remain deeply committed to practices of pre-publication peer review, and view new competitors as interesting supplements that should not yet be allowed to replace pre-publication peer review.

      Second, many of the grounds on which pre-publication peer review is questioned concern the fact that it is slow and presents strong obstacles to publication. These features are exactly the features which many scholars take to enable pre-publication peer review to present a strong quality filter on accepted papers. When a body of writing, such as AI safety work, finds itself challenged on grounds of quality and accuracy, it would be a good strategic move for writers in this tradition to show that they can pass the strong quality filter of more traditional methods. Otherwise, they may be viewed as optimizing for quantities such as speed of publication that do not always march in lockstep with reliability, and that is not the impression they should want to convey.

      Third, proposals to replace pre-publication peer review take place against a strong fixed background of epistemic institutions, practices and history. Scholars publishing articles in pre-print archives still, by and large, have degrees from, and positions at leading academic institutions; present their work at leading conferences; are trained in standard disciplinary methods of reasoning and argument; and bear high reputational costs for shoddy research. More immediately, these scholars have proven repeatedly that they can get their work through peer reviewed journals when they want to, and they have the esteem of their peers. Many of these background institutions, practices and histories are not present in other areas, such as internet fora. This makes it much more dangerous to abandon the quality filter of pre-publication peer review in such venues.

      Finally, peer review is a small part of the sociology of scientific deliberation. Much recent work in social epistemology has emphasized that reasoning works well in heterogeneous groups that conduct open arguments based on evidence, and works badly in homogeneous groups or in contexts where evidence is insufficiently strong to counter other group deliberation dynamics. (See, e.g., work by Hugo Mercier and Dan Sperber on the function of reasoning.) I see effective altruism as containing many of the social epistemic features that these theorists are worried about, and as suffering many of the epistemic harms predicted to occur in homogeneous, self-selecting groups deliberating on internet fora. These dynamics concern me much more than the presence or absence of peer review.

      << 2. On permissible speculation >>

      I think you hit the nail on the head with regard to academic philosophy. Philosophers have a relatively high degree of tolerance for speculation relative to other academic fields. For example, we are very willing to theorize about topics such as phenomenal consciousness that some scientists consider to be outside of their epistemic ken. I think this is why effective altruists have been relatively more successful at recruiting academic philosophers than they have been at recruiting academics in many other disciplines, and it is why I am writing a response to Carlsmith whereas most other academics would not go so far as to pen a response.

      This is also why I think effective altruists should be uniquely concerned when the vast majority of philosophers say to effective altruists that they regard the kinds of theorizing done on AI risk as a good deal too speculative to be reliable or worth engaging in. Philosophers are not unreasonably hard-nosed empiricists, STEM-lords, or model-mongers. But we have our limits, and those limits have been exceeded.

      I do genuinely try to meet effective altruists partway here. If I had my way, I would be writing nothing about the Carlsmith report. As I said, I think that this kind of speculation is dangerous. Nevertheless, the Carlsmith report is among the least speculative, most detailed, and best-grounded of existing large reports on AI risk, so I am doing my best to engage with it. This does not mean that I can engage with every claim in the report. But I will stretch my tolerance for speculation as far as it can reasonably be stretched, and probably beyond that.

      << 3. Evidential paucity >>

      I’m glad you like Joyce’s work on evidence. I do too. So that all readers are on the same page, for Joyce the balance of evidence “is a matter of how decisively the data tells in favor of the proposition” and is what point probabilities reflect. The weight of evidence “is a matter of the gross amount of relevant data available”. The specificity of evidence “is a matter of the degree to which the data discriminates the truth of the proposition from that of alternatives”.

      If forced to assess the balance of evidence for AI risk, I’d take it to be low: that’s why I offer low point estimates when forced. However, I complain bitterly about being forced to offer point estimates, precisely because they don’t reflect other factors such as Joyce’s weight and specificity.

      On weight, I’d also take it to be quite low. For example, in response to Richard Chappell, I mentioned that we might report the situation using an evidential interpretation of Dempster-Shafer theory, again returning a low value. Joyce (like many philosophers) likes probability theory better, so he holds that weight “is reflected in the concentration and stability of probabilities in the face of changing information”. That’s not my favorite gloss, but I might be disposed to report a claim of low weight in this framework too if I were made to use this framework.

      On specificity, this is the part of Joyce’s framework that many philosophers have had more difficulty understanding. Joyce does offer a probabilistic gloss of specificity: the spread of probabilities across a credal state over relevant alternatives. I’m a bit hesitant to use this gloss because it is both probabilistic and alternative-dependent. However, across the partition {AI doom by 2070, not-(AI doom by 2070)}, if forced to use probabilistic language, I’d probably report very high specificity: most probability mass falls on not-doom.

      For my own part, I do tend to favor nonprobabilistic approaches to decisionmaking under deep uncertainty. I’ve written a few papers about this, although they are regrettably pretty bad: see my “Tough enough? …” and my “General-purpose institutional decisionmaking heuristics …” on my website.

      If I had to use a taxonomy for describing ways in which our epistemic situation can be deeply uncertain, I’d probably use the Walker et al. taxonomy from the deep uncertainty literature (see “Defining Uncertainty: A Conceptual Basis for Uncertainty Management in Model-Based Decision Support”). An advantage of this taxonomy is that it allows us to talk about things like model uncertainty and parameter uncertainty which scientists use in their talk about uncertainty every day, but which are suppressed or made hard to articulate in most probabilistic frameworks.

      I hope this helps a bit! I’ll try to say more about approaches to uncertainty in my series on epistemics, which I have started writing.

  2. Richard

    Hi David, thanks for writing this. I’d be curious to hear more about your views on “permissible speculation”. This sounds a lot like what I call “epistemic cheems mindset”, and I think it is a significant barrier to progress, especially in time-sensitive emergency-type situations. If you prefer a more scholarly reference, my paper on ‘Pandemic Ethics and Status Quo Risk’ discusses some of these issues specifically in relation to pandemic ethics.

    My general worry is that *ignoring* uncertain possibilities entails systematically neglecting status quo risks (and my paper runs through many concrete examples of real-life harm resulting from this attitude). So I really think it’s incumbent upon academics and policymakers to try to form the most accurate judgments they can, which essentially requires speculation when engaging with inherently speculative topics about which much remains uncertain.

    That is, I argue that if any epistemic stance in this vicinity is impermissible, it is the *refusal* to speculate. (We should, of course, endeavour to speculate accurately.)

    In general, I worry that your methodology involves illicitly stacking the deck in favour of complacency.

    > “in general scholars impose high standards of evidence before they are willing to invest significant confidence in a theory”

    Yet you also said that you are >99.99998% confident in the theory that AI poses no existential risk. That’s… a LOT of confidence! And it sounds like you want to redirect money away from precautionary measures (e.g. AI safety funding) on the basis of this highly speculative, disputable theory.

    Will you be addressing, in future posts, the standards of evidence that justify such an extraordinarily extreme claim on your part? Or are you assuming that it suffices to cast doubt on others’ positive claims, whereas your own (very extreme) views get to count as justified *by default*?

    1. David Thorstad

      Thanks Richard! I think I hear three questions from you here:

      (1) Why isn’t it permissible to speculate about future risks from AI, particularly given that AI could be very dangerous?
      (2) Isn’t it extreme behavior to invest low confidence in AI risk?
      (3) Don’t skeptics of AI risk have a positive burden to argue that AI is not going to kill or permanently disempower us all?

      (On 1): It is important for us as scholars to acknowledge the orthodox scholarly view about permissible speculation, particularly when our words are likely to be read by an audience of impressionable young readers who may get the wrong idea about what types of knowledge our best models and methods can and cannot deliver. The remarks I have made about evidence, speculation, and other scholarly standards in this post are, as you know, reflected in the scholarly mainstream. The mainstream approach reflects a set of views and practices that have served the scholarly community well for many years, and have made us deserving of public trust. When we abandon this approach or encourage others to abandon it, we run a great deal of epistemic risk.

      One way to model the point is in terms of signal detection theory. Arguments present us a noisy reflection of the truth, driven towards the truth by a signal (evidence) and away from the truth by noise (imperfect ability to discern the truth, or inherent unpredictability of the target phenomenon). When arguments become highly speculative, we are left with very little signal and a high degree of noise. This means that speculation is likely, in the best case, to be driven almost entirely by noise and to tell us very little about the truth.

      In the worst case, bad group deliberation dynamics threaten to take over, because even the weakest group deliberation dynamics are stronger than the evidential signal in speculative matters. It is now the mainstream view in cognitive science and social epistemology that groups with largely homogenous views and demographics, self-selecting into internet communities such as the EA Forum and LessWrong, run a very high risk of polarization, groupthink, information cascades, and other myside biases. This view predicts that over time, such communities will converge on increasingly confident and extreme views in speculative matters, as appears to have happened with effective altruists. When effective altruists invest, with each passing year, increasingly strong confidence in the claim that artificial intelligence poses a near-term threat to human existence, we should be disposed to treat this as a symptom of low-signal speculation being swamped by group deliberation dynamics and by the noise inherent to the prediction problem.

      This is not the first time that the world has seen a group of impressionable young people become increasingly confident that the world is soon going to end in a very specific way. It is usually far easier for those outside of the group than for those within it to discern the relative lack of evidence and the strong influence of deliberative dynamics in shaping predictions of impending doom. It may be worth recalling past examples of doomsday predictions that failed to materialize as a reminder of the dangers of speculation about the end of the world in insular communities.

      (On 2): I take it that most scholars are highly skeptical of the claim that artificial intelligence is likely to kill or permanently disempower us all in this century. They are skeptical of this claim in the same way that they are skeptical, for most objects X, of the claim that X is likely to kill or permanently disempower us all. That is to say, there are a lot of things in this world, and while many of them present known harms and benefits, we should be highly skeptical of ascribing to them the most extreme harms or benefits without a great deal of argument. If we don’t have a good argument for the claim that some X is likely to kill or disempower us all, then we shouldn’t be very confident that it will.

      For what it is worth, I am sympathetic to one way of reading your point: that the best model of my attitude towards AI risk should not be a precise credence of some low value p. (I gave a precise credence under protest). A number of scholars have recently emphasized the importance of turning to new ways of representing confidence and epistemic states under conditions of deep uncertainty. For example, it may be appropriate to represent the situation through the lens of Dempster-Shafer theory, on an evidential interpretation on which a low value represents a very low degree of accumulated evidence.

      (On 3): One pattern of behavior that I have noticed lately from effective altruists is trying to flip the argumentative burden: rather than arguing that AI is dangerous, they invite opponents to prove that it is not. This type of burden-shifting is inappropriate for two reasons.

      The first reason why burden-shifting is inappropriate is that, as I have mentioned, there are a great many things in this world and it is appropriate to invest low confidence, for most X, in the claim that X is likely to kill or disempower us all. If we are to invest higher confidence that some particular X is likely to kill or disempower us all, we need a good argument to convince us to do so.

      A second reason why burden-shifting is inappropriate is that it invites opponents to engage in the very kind of speculation that they have good reason, both theoretical and based on recent experience with effective altruists, to think will be misleading. Scholars who think that speculating about matters far beyond the reach of our present evidence is inappropriate and misleading do well to refuse to engage in speculation, and court danger when they nonetheless engage in speculation for the purpose of argument. As it happens, I am courting danger by engaging even in a limited form of rebuttal here, and I have the uncomfortable sensation of standing a bit too close to the edge of an epistemic cliff.

      1. Richard

        re:1: epistemic conservatism entails a corresponding epistemic risk (namely, neglecting important truths, reinforcing complacency, etc.). This can also undermine public trust, as I think we saw during the pandemic when public health authorities said ridiculously false things (e.g. claiming we had “no evidence” about immunity or vaccine safety, when really we just had not yet confirmed standard patterns for Covid in particular), and likely caused a staggering number of deaths by opposing challenge trials. So again, I think you’re stacking the deck in how you’re thinking about “risk” here (and I strongly urge you to read my ‘status quo risk’ paper, which is on exactly this error).

        Of course, we should be *clear* about when reasoning is speculative, and distinguish speculative conclusions from firmly-supported conclusions. But I think it’s just inevitable that we sometimes need to act in the absence of robust evidence, and in such cases it’s irresponsible not to do the best we can to work out what is most likely correct.

        > “bad group deliberation dynamics threaten to take over, because even the weakest group deliberation dynamics are stronger than the evidential signal in speculative matters.”

        Of course, the strongest cognitive bias of all is to asymmetrically attribute cognitive biases to one’s interlocutors, without considering how one’s own view might be similarly biased.

        If we look at the most relevant recent “test case” — the pandemic — it seems pretty clear that EA/rationalist thinkers did vastly better than “mainstream” opinion, which was (predictably) far too slow to update, far too conservative in what sorts of evidence they were willing to consider, etc.

        There’s no question that forming accurate beliefs about complex, uncertain matters is *hard*, and comes with no guarantee of success. Random rubes certainly do better to just defer to mainstream thought than to attempt to reason from first principles together with other random kooks on the internet. But I also think intelligent people, reasoning carefully and responsibly, can do a lot better than mainstream opinion on important issues. After all, the mainstream opinion is also, for the most part, more noise than anything else. The key question is: “Is this worth trying to think about carefully?” And for intelligent, reasonable people, addressing obviously important topics like the pandemic, AI risk, etc., I think it’s worth trying to do the best we epistemically can. (I think philosophers are especially well-trained for this, since we’ve basically dedicated our lives to thinking carefully about highly speculative matters where there are no authorities worth deferring to.)

        > “It may be worth recalling past examples of doomsday predictions that failed to materialize”

        I’m sure AI-risk boosters have their own preferred reference classes to appeal to here (e.g. species superseded by another that is more intelligent and capable, or whatever). But such “reference class tennis” is lazy and rarely persuasive. At the end of the day, there’s no substitute for evaluating the arguments on their merits.

        > “(On 2): I take it that most scholars are highly skeptical of the claim that artificial intelligence is likely to kill or permanently disempower us all in this century.”

        Not sure what you’re considering as the relevant class of “scholars” here, but I’d think that most well-informed and careful thinkers are instead *highly uncertain* of the risk here, given that there are very obvious respects in which AI is a transformative technology, unlike anything we’ve faced before. I’d expect most to be much more comfortable with a wide imprecise credence range, rather than asserting with very high confidence that there is *no* serious risk here.

        If you’re just saying that you haven’t accumulated much evidence on the topic, then you should just say that. You’re not in a position to positively claim that the risk is *definitely extremely low* unless you can support *that* claim. And without it, the debate shifts to how we should respond in situations of radical uncertainty — e.g. whether anything in the vicinity of the precautionary principle holds here.

        > “first reason why burden-shifting is inappropriate…”

        That’s just wrong. In general, the argumentative burden is on whoever is trying to establish a claim. Certainly, AI risk proponents need to (and, of course, DO) present arguments to try to persuade others to rethink their complacency. Equally, if you want to establish that complacency is warranted, then the burden is on you to support this conclusion with persuasive arguments. There’s no asymmetry here, except that you happen to hold one of the two views, and so don’t appreciate the need to support it.

        > “A second reason why burden-shifting is inappropriate is that it invites opponents to engage in the very kind of speculation that they have good reason, both theoretical and based on recent experience with effective altruists, to think will be misleading”

        (What on earth is that “based on recent experience with effective altruists” a reference to? Again, I think the pandemic case is good evidence that EA ideas are worth engaging with, as plausibly an improvement over the epistemically conservative mainstream.)

        Nobody is forcing you to debate speculative matters. I’m just saying that if you do, then you’re epistemically obligated to do so responsibly, rather than begging the question by assuming an epistemic asymmetry between yourself and your interlocutors. I think that’s probably the crux of the disagreement that I have with you here.

        (I don’t, myself, have strong first-order views on AI risk, except to think that it’s sufficiently credible on its merits to be worth taking reasonable precautionary measures.)

  3. bat (@bat1441)

    Despite strongly disagreeing with it, I do think you stating your p(doom) as 0.00002% is kind of badass, and I genuinely hope (and am looking forward to finding) that it is the result of you seeing flaws and holes in the arguments that I have missed, rather than the result of me not underweighting good arguments just because they’re in the form of blog posts rather than academic papers.

    Other than the Carlsmith report, I would be interested in reading your thoughts on which parts of Richard Ngo’s The Alignment Problem from a Deep Learning Perspective you find implausible.

    1. David Thorstad

      Thanks! I appreciate your comments, and I’ll add Ngo to the list of potential next papers to address. (I know Richard and think well of him).

  4. River

    Since you seem to be aware of some of the ways you will come off as an arrogant academic, I will point out a way that you seem to be unaware of. You called 3 reports “unpublished”, and then linked to where they are published. In ordinary use, “unpublished” means “not publicly available”. A blog post, a newspaper, a magazine, a popular level book are all forms of publication. You seem to have used the word as a synonym for “not published in a peer reviewed journal”, and that is a grossly misleading manipulation of the English language. That is not how non-academics understand the word. It shows that you live in a world where people writing outside of academia somehow magically don’t count. And that is very arrogant.

    I should also note that taking academic AI researchers (I’m referring to capabilities here, not safety) as leading experts on AI also comes off as arrogant and misleading in a world where the frontiers of AI research are found in industry, not academia. A typical researcher at Anthropic or DeepMind is a greater AI expert than a typical tenured professor specializing in AI, since they are the ones actually building the thing; the professors just write about the thing. The proof of expertise is in the ability to build the AI, and the professors aren’t demonstrating that. The people in industry, often without terminal degrees, sometimes without any degrees at all, are the ones demonstrating that. Let the professors build an LLM of their own, and then they can have a voice in conversations about them.

  5. Wyman Kwok

    Hi David,

    I appreciate and am thankful for your effort in attempting to inject a healthy dose of skepticism into the AI risk issue in particular and the EA movement in general. Please let me know if there are any serious replies to your reflections or criticisms (in any post of this website, I mean), especially those from your targets. Thanks!

    1. David Thorstad

      Thanks Wyman! Will do.
