Exaggerating the risks (Part 15: Biorisk from LLMs)

I examined all the biorisk-relevant citations from a policy paper arguing that we should ban powerful open source LLMs. None of them provide good evidence for the paper’s conclusion … In all, citations in the paper provide an illusion of evidence (“look at all these citations”) rather than actual evidence (“these experiments are how we know open source LLMs are dangerous and could contribute to biorisk”). A recent further paper on this topic (published after I had started writing this review) continues this pattern of being more advocacy than science.

LessWrong, “Propaganda or science: A look at open source AI and bioterrorism risk”.

1. Introduction

This is Part 15 of my series Exaggerating the risks. In this series, I look at some places where leading estimates of existential risk look to have been exaggerated.

Part 1 introduced the series. Parts 2-5 (sub-series: “Climate risk”) looked at climate risk. Parts 6-8 (sub-series: “AI risk”) looked at the Carlsmith report on power-seeking AI.

Parts 9, 10, and 11 began a new sub-series on biorisk. In Part 9, we saw that many leading effective altruists give estimates between 1.0% and 3.3% for the risk of existential catastrophe from biological causes by 2100. I think these estimates are a bit too high.

Because I have had a hard time getting effective altruists to tell me directly what the threat is supposed to be, my approach was to first survey the reasons why many biosecurity experts, public health experts, and policymakers are skeptical of high levels of near-term existential biorisk. Parts 9, 10, and 11 gave a dozen preliminary reasons for doubt, surveyed at the end of Part 11.

The second half of my approach is to show that initial arguments by effective altruists do not overcome the case for skepticism. Part 12 examined a series of risk estimates by Piers Millett and Andrew Snyder-Beattie. Part 13 looked at Ord’s arguments in The precipice. Part 14 looked at MacAskill’s arguments in What we owe the future.

Today’s post begins a two-part investigation of biorisk from large language models (LLMs). Although LLMs are often cited in support of high risk estimates, I will find that there is not, at present, substantial support for the link between LLMs and extreme biorisk.

2. Biorisk from LLMs

Effective altruists have often claimed that LLMs introduce high levels of catastrophic or even existential biorisk.

The past few months have not been kind to such claims. Last November, a widely publicized LessWrong investigation tracked all of the evidence for biorisk from LLMs in a recent paper by the Center for the Governance of AI (GovAI) and a similar paper by Kevin Esvelt and colleagues. This investigation concluded:

I examined all the biorisk-relevant citations from a policy paper arguing that we should ban powerful open source LLMs. None of them provide good evidence for the paper’s conclusion … In all, citations in the paper provide an illusion of evidence (“look at all these citations”) rather than actual evidence.

The investigation went on to suggest that current allegations of biorisk from LLMs follow a “pattern of being more advocacy than science,” and to question the epistemic integrity of Open Philanthropy in funding papers of this nature.

Could a policy experiment vindicate biorisk from LLMs? On this front, there appears to be more bad news. This January, a red-teaming study by the RAND Corporation found “no statistically significant difference in the viability of [biological weapon attack] plans generated with or without LLM assistance.” This result is especially hard to dismiss given that RAND is hardly biased against effective altruists: the RAND Corporation took at least $16 million from Open Philanthropy this year (see here and here), and is led by an effective altruist CEO.

In this post, I will focus on the LessWrong investigation of the GovAI paper and the followup by Esvelt and colleagues. In the next post, I will focus on the RAND red-teaming study and also draw lessons from this discussion.

I claim very little originality for this discussion. Many of my remarks are near summaries of the LessWrong investigation, which is excellent. If readers wanted to read that investigation and skip this post, I would not blame them. But I will try to add some detail and new arguments when needed.

3. Why care?

Why might we care if arguments for biorisk from LLMs fail? There are at least three reasons to care.

First, the failure of these arguments is cause for reducing estimates of existential biorisk and AI risk. This is important, because biorisk and AI risk are widely estimated by effective altruists to be the leading causes of existential risk in this century. Reductions in existential biorisk and AI risk estimates should therefore exert significant drag on overall risk estimates.

Second, this discussion will reinforce the relevance of reasons for skepticism about existential biorisk developed in Parts 9, 10, and 11 of this series. We will see that many of the reasons for skepticism given in this series mark key places where arguments for biorisk from LLMs founder, so it is important that future arguments address these and other reasons for skepticism.

Finally, this discussion will reinforce epistemic lessons from my series on epistemics. In particular, it will reinforce the suggestions that peer review may be an important antidote to exaggeration by unpublished reports, and that allowing a single foundation to fund a majority of reports in one area may be an epistemically distorting factor. It may also, as the LessWrong investigation suggests, reinforce Brandolini’s Law: it takes far less effort to produce than to refute nonsense, so that the demand for detailed refutations of every last risk claim grows less warranted as increasing fractions of previous claims turn out to be error-prone.

4. Open-sourcing highly capable foundation models

The GovAI paper, “Open-sourcing highly capable foundation models,” argues that “highly capable models have the potential for extreme risks”, and that open-sourcing would unacceptably magnify these risks.

Of particular interest is their discussion of malicious use. Although the GovAI report is widely cited as an example of the case for concern about biorisk from LLMs, it devotes a scant two paragraphs to biorisk. Here is what they say about the use of LLMs in biological and chemical weapons development:

Current foundation models have shown nascent capabilities in aiding and automating scientific research, especially when augmented with external specialized tools and databases. Foundation models may therefore reduce the human expertise required to carry out dual-use scientific research, such as gain-of-function research in virology, or the synthesis of dangerous chemical compounds or biological pathogens. For example, pre-release model evaluation of GPT-4 showed that the model could re-engineer known harmful biochemical compounds, and red-teaming on Anthropic’s Claude 2 identified significant potential for biosecurity risks.

Specialized AI tools used within these domains can also be easily modified for the purpose of designing potent novel toxins. Integrating narrow tools with a foundation model could increase risk further: During pre-deployment evaluation of GPT-4, a red-teamer was able to use the language model to generate the chemical formula for a novel, unpatented molecule and order it to the red-teamer’s house. Law-makers in the United States are beginning to take this biosecurity threat seriously, with bipartisan legislation – the Artificial Intelligence and Biosecurity Risk Assessment Act – being proposed that would monitor and study the potential threats of generative and open-source AI models being used “intentionally or unintentionally to develop novel pathogens, viruses, bioweapons, or chemical weapons.”

That’s a good, succinct statement of the view. But what’s the argument? The GovAI report gives little in the way of novel evidence: instead, readers are given a number of citations to other sources. To determine the strength of the argument being made, we should therefore examine what support these sources provide for the claims of the GovAI report.

The LessWrong investigation looks at all citations offered in this passage, breaking them into three groups: background material, material from Anthropic and OpenAI, and scientific papers. This post will follow the structure of the LessWrong investigation.

5. Background material

The first group of citations provides general material about LLM and non-LLM capacities, but does not speak specifically to the dangers of open-source LLMs.

For example, the GovAI paper cites two investigations into the scientific research capacity of AutoGPT-like agents (Bran et al. 2023, Boiko et al. 2023). As the LessWrong investigation correctly concludes, there isn’t much here to worry about:

Both papers raise concerns about LLMs making it easier to synthesize substances such as THC, meth, or explosives – but they mostly just aren’t about policy. Nor do they contain much specific reasoning about the danger that open source LLMs create over and above each of the other individual subcomponents of an AutoGPT-like-system.

Likewise, the GovAI report cites a statement released after a meeting convened by Helena Biosecurity. However, this statement is even thinner on evidence than the GovAI paper, so it does not add much in the way of independent corroboration.

6. Anthropic/OpenAI material

The largest group of non-background evidence cited by the GovAI report consists of citations to materials from OpenAI and Anthropic.

The claim that GPT-4 could re-engineer “known harmful biochemical compounds” is cited to the GPT-4 system card. However, the LessWrong investigation finds only one mention of engineering biochemical compounds in the system card, namely the claim that GPT-4 “re-engineered some biochemical compounds that were publicly available online”. That isn’t strong evidence for the relevant risk claim: that GPT-4 may re-engineer compounds that are not publicly available online.

Likewise, the GovAI report claims that “a red-teamer was able to use the language model to generate the chemical formula for a novel, unpatented molecule and order it to the red-teamer’s house”. However, the LessWrong analysis suggests that most of the work was done by tools such as WebChem queries and chemical synthesis planners, which are available without the help of GPT-4.

The GovAI report also cites a blog post by Anthropic resulting from a red-teaming study in collaboration with Gryphon Scientific. This is encouraging, because the folks at Gryphon are serious people: for example, I favorably discussed a report by Gryphon Scientific on risks of gain of function research in Part 12 of this series. We saw that this report was detailed and credible, but not supportive of high estimates of existential biorisk.

The GovAI report cites this blog post’s claim that “red-teaming on Anthropic’s Claude 2 identified significant potential for biosecurity risk”. We will see next time that a RAND red-teaming study found precisely the opposite, but for now let us focus on the Anthropic claim. The Anthropic blog post certainly says that “unmitigated LLMs could accelerate a bad actor’s efforts to misuse biology relative to solely having internet access”, but it doesn’t provide much evidence for that claim. They don’t tell the reader anything substantive about the contents of the red-teaming exercise, and crucially, they don’t say how the exercise was taken to ground “significant potential for biosecurity risk”. Here is, for the sake of completeness, everything that the blog post says about its findings:

Over the past six months, we spent more than 150 hours with top biosecurity experts red teaming and evaluating our model’s ability to output harmful biological information, such as designing and acquiring biological weapons. These experts learned to converse with, jailbreak, and assess our model. We developed quantitative evaluations of model capabilities. The experts used a bespoke, secure interface to our model without the trust and safety monitoring and enforcement tools that are active on our public deployments.

We discovered a few key concerns. The first is that current frontier models can sometimes produce sophisticated, accurate, useful, and detailed knowledge at an expert level. In most areas we studied, this does not happen frequently. In other areas, it does. However, we found indications that the models are more capable as they get larger. We also think that models gaining access to tools could advance their capabilities in biology. Taken together, we think that unmitigated LLMs could accelerate a bad actor’s efforts to misuse biology relative to solely having internet access, and enable them to accomplish tasks they could not without an LLM. These two effects are likely small today, but growing relatively fast. If unmitigated, we worry that these kinds of risks are near-term, meaning that they may be actualized in the next two to three years, rather than five or more.

However, the process of researching these risks also enables the discovery and implementation of mitigations for them. We found, for example, that straightforward changes in the training process meaningfully reduce harmful outputs by enabling the model to better distinguish between harmful and harmless uses of biology (see, for example, our work on Constitutional AI). We also found that classifier-based filters can make it harder for a bad actor to get the kind of multiple, chained-together, and expert-level pieces of information needed to do harm. These are now deployed in our public-facing frontier model, and we’ve identified a list of mitigations at every step of the model development and deployment pathway that we will continue to experiment with.

What is the argument here? The post reports that “current frontier models can sometimes produce sophisticated, accurate, useful, and detailed knowledge at an expert level” although “in most areas we studied, this does not happen frequently.” That’s certainly something to worry about, but a good distance from anything that might lead to concern about existential risk.

The post next reports that “gaining access to tools could advance [model] capabilities in biology.” This isn’t surprising: tools advance model capacities in most areas.

The post immediately concludes that “Taken together, we think that unmitigated LLMs could accelerate a bad actor’s efforts to misuse biology relative to solely having internet access, and enable them to accomplish tasks they could not without an LLM.” If the strength of this claim is left unspecified, it is certainly plausible: no doubt unmitigated LLMs provide some benefit to malicious actors relative to solely having internet access. But how large is the effect? The post says directly: “These two effects are likely small today, but growing relatively fast. If unmitigated, we worry that these kinds of risks are near-term, meaning that they may be actualized in the next two to three years, rather than five or more.”

The statement that these effects are “likely small today” is the only claim that the blog post is in a position to make on the basis of its red-teaming evidence, and it suggests that present risks are small. The post certainly goes on to say that risks may grow rapidly in the future, but it is not clear how the results of the red-teaming study support that claim. It is more likely that this claim should be regarded as a statement of opinion, so that citations to it do not add much in the way of evidential value.

The last citation is to testimony by Anthropic CEO Dario Amodei before a US Senate subcommittee. Apparently summarizing the Anthropic blogpost, Amodei admits that current systems pose limited biorisks, but asserts largely without argument that near-future systems will pose significantly more risk:

We found that today’s AI systems can fill in some of these steps – but incompletely and unreliably. They are showing the first, nascent signs of risk. However, a straightforward extrapolation of today’s systems to those we expect to see in two to three years suggests a substantial risk that AI systems will be able to fill in all the missing pieces, if appropriate guardrails and mitigations are not put in place. This could greatly widen the range of actors with the technical capability to conduct a large-scale biological attack.

This citation adds little on top of the Anthropic blogpost on which Amodei’s testimony appears to be based. What is at issue is not whether staff at Anthropic believe that future models may pose significant levels of catastrophic or existential biorisk, but rather what evidence they have to support this claim. That evidence is not provided in this passage, or in the blogpost, both of which suggest that the view may be based more on opinion than on new evidence.

7. Science

The GovAI report does cite two short research papers to support its claims about biorisk from LLMs. This is a positive development. However, it is not clear whether these papers provide adequate support for the GovAI report’s claims.

The first cited paper is by Jonas Sandbrink, an Oxford PhD student and former researcher at FHI. The paper, “Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools,” gives four reasons to be concerned about biorisk from LLMs.

1. Teaching about dual-use topics: Sandbrink claims that LLMs can:

Synthesise knowledge across many different sources, make complex information accessible and tailored to non-experts, and can proactively point out variables that the user did not know to inquire about … For instance, one hypothesized factor for the failed bioweapons efforts of … Aum Shinrikyo is that its lead scientist Seiichi Endo, a PhD virologist, failed to appreciate the difference between the bacterium Clostridium botulinum and the deadly botulinum toxin it produces … ChatGPT readily outlines the importance of “harvesting and separation” of toxin-containing supernatant from cells and further steps for concentration, purification, and formulation.

That is fair enough. But two points deserve note. First and most importantly, it is a far cry from the suggestion we have encountered in previous arguments that technology will allow rogue groups to engineer unprecedentedly contagious and lethal pathogens, releasing them in many places at once and causing an existentially catastrophic pandemic. The suggestion is only that a PhD virologist might have their nefarious efforts improved at the margins by consulting ChatGPT. Second, the paper does not evaluate the comparative availability of this information on the internet. There is no attempt to show that the same information would not be available through other means, particularly, in the chosen example, to a scientist with a doctorate in virology.

2. Identifying specific avenues to biological misuse: This paragraph of Sandbrink’s paper essentially repeats the paper on dual use biotechnology to be discussed later in this section, so I will not discuss it further.

3. Step-by-step instructions and trouble-shooting experiments: One challenge stressed in Part 11 of this series is that bioweapons programs require a tremendous amount of experience and tacit expertise that is difficult to acquire outside of a specialized institutional context. Sandbrink stresses that LLMs may help to convey some of this tacit information:

”Weak” tacit knowledge, such as tweaks to lab protocols that are not well-documented but can be put into words, and “communal” tacit knowledge, which emerges from the confluence of many [types of] knowledge in different areas of expertise, have the potential to be lowered by AI lab assistants that have been optimised to provide tailored laboratory instructions and can draw on knowledge across many different disciplines.

The point is well taken. It is certainly possible that LLMs could go some way towards addressing tacit knowledge barriers, though even Sandbrink thinks there are important limitations here.

4. Autonomous science capability: Finally, Sandbrink suggests that “in the longer term”, LLMs could be used to instruct laboratory robots and serve as a basis for autonomous science agents which could carry out bioweapons research and manufacture pathogens. It’s hard to know what to make of this claim without a well-defined scope for the kinds of capacities that Sandbrink thinks LLMs might develop, an argument that these capacities could generate novel existential risks, and an argument that these capacities are likely to be developed. So far we have not been given much to go on.

That is the extent of Sandbrink’s argument for existential biorisk from LLMs. The second paper cited by the GovAI report goes a bit further: it conducts an experiment to examine the biorisks posed by LLMs. In their paper, “Can large language models democratize access to dual-use biotechnology?”, Emily Soice and colleagues report the results of a classroom exercise in which students asked LLMs to (a) identify pandemic-capable viruses, and asked for advice on (b) planning attacks and (c) acquiring materials.

LLMs certainly provided some information on these topics, though we are given no evidence that this information could not be acquired through other means, and it is highly implausible that anything like the information provided could pose an existential threat. For example, when asking LLMs to (a) identify potential pandemic pathogens, students were given four suggestions: 1918 H1N1 influenza, enhanced-transmission H5N1 influenza, the variola major virus (smallpox), and a strain of the Nipah virus. All of this information is readily available online, and none of these pathogens could plausibly pose an existential threat.

On (b), LLMs described reverse genetics protocols for influenza and Nipah virus and linked to papers describing these protocols. But again, this information was readily available on the internet – the models linked to published scholarly papers. Moreover, neither influenza nor Nipah virus is likely to pose an existential threat.

On (c), LLMs noted that reagents and devices could be purchased from leading suppliers. That’s hardly surprising. LLMs also suggested some means for evading screening of DNA sequences ordered from outside companies. That is, perhaps, a bit concerning, but if this is the worst that can be found we are well short of a strong argument for high levels of existential biorisk.

That’s it for the scientific papers: the GovAI report cites only the two papers discussed above to support its claims about biorisk from LLMs. We have seen that these papers make a few good points, and one even conducts a rudimentary classroom study. This is certainly a move in the right direction, but hardly enough to ground high estimates of existential biorisk.

8. Taking stock

Today’s post looked at a GovAI report widely cited as a good example of the case for concern about biorisk from LLMs. We saw that the report itself provides a good outline of the case for concern, but very little in the way of new evidence. But perhaps the sources cited in this report provide adequate evidence for the report’s claims?

Following the structure of the LessWrong investigation, this post looked at every citation in the GovAI report’s discussion of biorisk from LLMs. We saw that these citations break into three categories: background materials, which are not primarily argumentative; OpenAI and Anthropic materials, which make strong claims but suffer from evidential deficits similar to those of the GovAI report; and two scientific papers which, though they raise a few good points, hardly provide enough evidence to support the GovAI report’s main claims.

We saw at the beginning of this post that a recent RAND red-teaming study found “no statistically significant difference in the viability of [biological weapon attack] plans generated with or without LLM assistance”, contrary to GovAI’s claims. The next post in this series will take an in-depth look at the RAND study.

