
Epistemics: (Part 4: The fall of cost-effectiveness analysis)

Some people seem to think that our procedure for approving grants is roughly “YOLO #sendit.”

Nick Beckstead, “Some clarifications on the Future Fund’s approach to grantmaking”

1. Recap

This is Part 4 in my series on epistemics: practices that shape knowledge, belief and opinion within a community. In this series, I focus on areas where community epistemics could be productively improved.

Part 1 introduced the series and briefly discussed the role of funding, publication practices, expertise and deference within the effective altruist ecosystem.

Part 2 discussed the role of examples within discourse by effective altruists, focusing on the cases of Aum Shinrikyo and the Biological Weapons Convention.

Part 3 looked at the role of peer review within the effective altruism movement.

The next two posts in this series will discuss the role of cost-effectiveness analysis in guiding decisionmaking. Today’s post (Part 4) chronicles the declining importance of cost-effectiveness analysis in many corners of the effective altruism movement. The next post in this series (Part 5) will discuss the importance of cost-effectiveness analysis.

2. Early emphasis on cost-effectiveness analysis

Early effective altruists stressed the importance of conducting rigorous cost-effectiveness analyses to guide giving. Because charities differ dramatically in their cost-effectiveness, and because cost-effectiveness is not always apparent from a surface reading, it is important to use detailed cost-effectiveness analyses to determine which charities are in fact most cost-effective.
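To see why cost-effectiveness is not always apparent from a surface reading, consider a deliberately toy comparison, with every number invented for illustration. A popular surface metric, the overhead ratio, points in exactly the wrong direction here:

```python
# Toy comparison of two hypothetical charities (all numbers invented).
# "overhead" is the share of budget spent on administration; "outcomes"
# is the number of units of good done (say, cases of illness averted).

charities = {
    "Charity A": {"budget": 1_000_000, "overhead": 0.05, "outcomes": 200},
    "Charity B": {"budget": 1_000_000, "overhead": 0.20, "outcomes": 5_000},
}

for name, c in charities.items():
    cost_per_outcome = c["budget"] / c["outcomes"]
    print(f"{name}: {c['overhead']:.0%} overhead, "
          f"${cost_per_outcome:,.0f} per outcome")

# Charity A looks "leaner" (5% overhead vs. 20%), but each of its outcomes
# costs $5,000; Charity B delivers the same outcome for $200, doing 25x
# more good per dollar donated.
```

A donor judging by overhead alone would pick the charity that does twenty-five times less good per dollar; it takes an explicit cost-per-outcome calculation, however simple, to see the difference.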

Early effective altruists did not merely support the use of cost-effectiveness analysis. They were instrumental in the popularization of rigorous cost-effectiveness analysis, and published some of the best available cost-effectiveness analyses. In The Life You Can Save (second edition), Peter Singer writes:

One of the biggest changes in philanthropy and the international development community that has taken place since I first wrote this book is the increased focus of independent organizations on measuring the impact of particular interventions to help people in extreme poverty, and in assessing the effectiveness of the organizations providing the most successful interventions. GiveWell … was the pioneer here, setting new standards for rigorous evaluation of the work of charities … GiveWell’s strict standards mean that if you go to their website and select one of their top-ranked charities, you can be confident that people in extreme poverty will benefit from your donation, and benefit in a manner that is highly cost-effective.

Effective altruists were so wedded to rigorous cost-effectiveness analysis that many early criticisms of effective altruism suggested the group had, if anything, gone too far in its single-minded pursuit of it. Here is Brian Berkey, summarizing one theme from these early critiques:

It is sometimes suggested that effective altruists’ emphasis on relying on evidence about effectiveness and probability of success leads to a bias in favour of efforts the effects of which are easily measurable and quantifiable. To some extent, effective altruists invite this charge, since a significant proportion of their public outreach efforts highlight, often exclusively, their support for organizations whose programmes have been tested for effectiveness in randomized control trials (RCTs), and performed well. … Effective altruists’ efforts here have, however, contributed to the impression that they believe that nearly all of our efforts to, for example, combat global poverty should run through organizations whose programmes have been shown by RCTs to produce significant results.

While effective altruists were not, perhaps, quite so fanatical in their adherence to cost-effectiveness analysis as critics suggested, they did retain substantial allegiance to the practice.

Many short-termist organizations continue to practice rigorous cost-effectiveness analysis, viewing it as an essential guide to effective giving. For example, GiveWell’s website states directly:

Our cost-effectiveness analyses are the single most important input into our charity recommendations. We view cost-effectiveness analyses as valuable for helping us identify large differences in the cost-effectiveness of grants we’re considering for funding and to encourage staff to think through relevant issues related to charities’ work.

This is admirable. However, the growth of the effective altruism movement and the spread of longtermism have led, in some circles, to a dramatic about-face in the emphasis on cost-effectiveness analysis. Let’s look at how cost-effectiveness analysis has fared in other corners of the effective altruism movement.

3. Longtermism and cost-effectiveness analysis

In longtermist circles, cost-effectiveness analysis quickly lost its vaunted place. One reason for this is that conducting reliable cost-effectiveness analyses for longtermist projects is much harder than assessing the cost-effectiveness of short-termist projects.

Another reason for this shift is that many longtermists simply thought there was little need for cost-effectiveness analysis. Because they held that anything at all that could be done to improve the long-term future was, in expectation, orders of magnitude better than competing projects, they thought that almost any reasonable longtermist expenditure was worth funding, if only funds were available. And with available funding in the tens of billions of dollars, funds were plentiful, and they were spent extravagantly. As one EA Forum reader commented last year:

I have noticed people (especially Bay Area longtermists) acting like almost anything that saves time or is at all connected to longtermism is a good use of money. As a result, money gets wasted because cheaper ways of creating the same impact are missed. For example, one time an EA offered to pay $140 of EA money (I think) for me for two long Uber rides so that we could meet up, since there wasn’t a fast public transport link. The conversation turned out to be a 30-minute data-gathering task with set questions that worked fine when we did it on Zoom instead.

Another user said much the same, again blaming wasteful spending on the movement’s reduced emphasis on cost-effectiveness:

I was at an EA party this year where there was definitely an overspend of hundreds of pounds of EA money on food which was mostly wasted. As someone who was there, at the time, this was very clearly avoidable. … I think this happened because the flow of money into EA has made the obligations to optimise cost-efficiency and to think counterfactually seem a lot weaker to many EAs.

Part 6 of my series on billionaire philanthropy spoke at length about some other ways in which easy money drove wasteful spending. But I would now like to emphasize something I forgot to emphasize before: the message that cost-effectiveness had assumed a dramatically reduced importance was communicated even by leading effective altruists. For example, here is what Will MacAskill had to say about cost-effectiveness last year:

Frugality is now comparatively less valuable, and saving time and boosting productivity in order to make more progress on the most pressing problems is comparatively more valuable. Creating projects that are maximally cost-effective is now comparatively less valuable; creating projects that are highly scalable with respect to funding, and can thereby create greater total impact even at lower cost-effectiveness, is comparatively more valuable.

Certainly, not everyone expressed this view. One reply to MacAskill reads:

I think this framing is wrong, or at best unhelpful because we shouldn’t avoid prioritizing cost-effectiveness. When you stop prioritizing cost-effectiveness, it stops being effective altruism. Resources are still finite. The effectiveness of solutions to dire problems still differs dramatically. And we have only scratched the surface of understanding which solutions are gold and which are duds. I think it’s cost-effectiveness all the way down.

But such dissent was by no means universal, and perhaps not even the majority view.

In any case, we should not rest content with anecdotes. Let’s look at a concrete place where there is good evidence for a diminished role of cost-effectiveness analysis within effective altruism, and where longtermists look to be among the chief culprits.

4. Case study: the FTX Future Fund

The FTX Future Fund, headed by leading effective altruists such as Nick Beckstead and Will MacAskill, pledged over a hundred million dollars in grants, almost exclusively to longtermist causes.

Ordinarily, an organization handing out that kind of money proceeds carefully, selecting the most deserving projects on the basis of rigorous vetting designed to ensure that only the best projects are funded. The process at the FTX Future Fund appears to have been rather less rigorous than this.

(Some readers may complain that the FTX Future Fund is a special case, quite unrepresentative of the movement at large. While I agree that there are some ways in which the Future Fund was unusual, leadership and a substantial share of decisionmaking power within the Future Fund rested with leading effective altruists who exerted, and continue to exert, sizable influence over the direction of funding within the effective altruism movement. The fraud perpetrated by FTX was, I hope, exceptional, but that doesn’t get the Future Fund off the hook for explaining its decisionmaking, or those involved off the hook for explaining their actions there.)

A number of commentators have suggested that the fast-paced spending at the Future Fund fell rather short of careful vetting. One asked how a fund with a small staff, promising a review turnaround time of just two weeks (substantially shorter than at almost all foundations offering grants of this scale), could possibly make well-informed decisions:

My two cents about why people may be concerned about the decision-making process without having concrete details: For instance, the initially advertised decision timeline of 2 weeks. While I appreciate the fast pace and the benefits that come with it, a complex system of review and decision-making is almost impossible to achieve at that timeline, especially given the interest in the program.

Another asked what kind of calculations were being made to evaluate the cost-effectiveness of grants. Surely nothing approaching a GiveWell-style cost-effectiveness analysis was being done, but was any form of modeling being performed at all? At a minimum, were grants perhaps evaluated through a quick back-of-the-envelope calculation (BOTEC)? This commentator wrote:

Do you (FTX grantmakers) do or reference a BOTEC for each grant? Would you publish BOTECs you make or comments on the BOTECs you reference? Without this, it seems like EAs would often need to guess and reconstruct your reasoning or make their own models in order to critique a grant, which is much more costly for individuals to do, is much less likely to happen at all, and risks strawmanning or low quality critiques. I think this also gets at the heart of two concerns with “free-spending” EA: we don’t know what impact the EA community is buying with some of our spending, and we don’t have clear arguments to point to to justify particular possibly suspicious expenses to others.

Nick Beckstead clarified that only “a minority of decisions” were made even on the basis of a back-of-the-envelope calculation.
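For readers unfamiliar with the term, a BOTEC is a deliberately rough quantitative model of a grant’s expected impact per dollar. Here is a minimal sketch of what one might look like for a speculative longtermist grant, with every input invented for illustration; nothing below reflects any actual Future Fund grant or its internal reasoning.

```python
# Minimal BOTEC sketch for a hypothetical longtermist grant.
# Every input is invented for illustration; none reflects an actual
# Future Fund grant or the Fund's reasoning.

grant_cost = 500_000          # dollars requested
p_success = 0.10              # chance the project achieves its goal
risk_reduction = 1e-6         # absolute catastrophe-risk reduction if it does
value_of_risk_removal = 1e12  # dollar value placed on eliminating the risk

expected_benefit = p_success * risk_reduction * value_of_risk_removal
print(f"Expected benefit: ${expected_benefit:,.0f}")                # $100,000
print(f"Benefit-cost ratio: {expected_benefit / grant_cost:.2f}")   # 0.20

# On these invented numbers the grant fails a benefit-cost test; move any
# single input up by 10x and the verdict flips. Even a four-line model
# makes the disagreement legible.
```

Even a model this crude forces a grantmaker to write down the inputs that a critic would need to contest. Beckstead’s reply suggests that, for most grants, not even this much was recorded.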

What, precisely, was the decisionmaking process at the Future Fund? To a large extent, we may never know. But here’s what Nick Beckstead said about the process (keep in mind that Beckstead was writing here in public defense of the rigor of the process, so this is if anything a charitable description):

Some people seem to think that our procedure for approving grants is roughly “YOLO #sendit.” This impression isn’t accurate. In reality, before a typical grant goes out it is:

  • Recommended by a staff member or regrantor,
  • Screened for red flags by a staff member, and then when needed reviewed (in specialized Slack channels created for this purpose) to address legal risks, public communications risks, interference with the work of other EA grantmakers, community health risks, and other potential harms, (we usually find a way to make good grants, but this process often improves them and reduces their risks)
  • (If relevant) Reviewed by technical expert(s),
  • Endorsed by another staff member, and
  • Independently reviewed for final signoff

On this basis, it is hard to be confident in the rigor of the process. The first, second and fifth bullet points describe basic preliminary and final steps that are used by nearly any reputable granting organization at any level. The third and fourth bullet points describe the substance of the grant review process.

In these bullet points, we are told that the grant is (sometimes!) “reviewed by technical expert(s)” and then “endorsed by another staff member”. That is certainly good to hear. Sizable grants made by large grantmakers should be, and almost always are, reviewed by experts (Step 3), and those reviews are then used by grant administrators to make a funding recommendation (Step 4). What we would like to hear is this: exactly what kind of review (Step 3), and what kind of decisionmaking on the basis of those reviews (Step 4), was being done?

Most granting organizations publish extensive details about their review and evaluation processes: the composition and selection of review committees (yes, committees); the criteria used to score applications; and the process used to make decisions based on those scores. Each step is typically extensive, conducted by expert evaluators trained in the organization’s own grantmaking procedures.

To my knowledge, nothing like this has ever been published by the Future Fund. I am somewhat pessimistic that the process, if it were publicly described, would meet even minimum international standards, and it would be highly unlikely to exceed them. This contrasts sharply with effective altruists’ early emphasis on methodological rigor.

Note also that all of the above applies to grants submitted directly to the Future Fund. In many cases, grants were instead made through the FTX Regrantor Program, which gave selected individuals six-to-seven-figure pots of money to regrant with comparatively less supervision and procedural oversight. This was not a small program: Beckstead wrote in June 2022 that at least a hundred million dollars had been set aside for regranting, of which thirty-one million had already been allocated.

Here is Beckstead’s description of the structure of the regranting program:

The basic structure is that the regrantors have been given discretionary budgets ranging from $100k to a few million dollars. (A larger number towards the lower end, a smaller number towards the higher end—there is a wide variation in budget sizes.) Regrantors submit grant recommendations to the Future Fund, which we screen primarily for downsides, conflicts of interests, and community effects. We typically review and approve regranting submissions within 72 hours. Grant recommenders have access to a streamlined grant recommendation form from us where we give them some deference, but they don’t have a discretionary budget. (We wanted to try out multiple versions, and in part randomized participation in the different programs.) We compensate regrantors for the volume and quality of their grantmaking, including an element based on whether we fund the projects they seeded ourselves in the future. We also unlock additional discretionary funding when we’re ready to see more of what they’ve been doing.

What, precisely, have we been told about the process? Precious little. If this description is accurate, authority to grant over a hundred million dollars was delegated primarily to regrantors, whose recommendations would be screened within three days, “primarily for downsides, conflicts of interests, and community effects.” We are told that some form of retroactive evaluation would be performed, with regrantors rewarded for the volume and quality of their grantmaking. What kind of evaluation was made? Again, we are not told.

It is hard to overstate how strongly such a process differs from the slow, rigorous, quantitative, model-driven and often RCT-backed cost-effectiveness analysis championed by early effective altruists. The apple has fallen very far from the tree.

De-emphasizing rigorous cost-effectiveness analysis has benefits, such as speed and flexibility of grantmaking. But it also has costs. I’ll discuss those costs in the next post in this series. (My apologies: I had initially intended to discuss those costs today, but I think I’ve gone on for too long already.)

5. Conclusion

Today’s post discussed the declining role of cost-effectiveness analysis within the effective altruism movement. We saw that early effective altruists were so strongly committed to rigorous cost-effectiveness analysis that they not only insisted on using cost-effectiveness analyses to guide their own decisionmaking, but also pioneered the spread of cost-effectiveness analysis throughout many areas of philanthropic giving.

However, we also saw that the passage of time and the rise of longtermism brought a declining role for cost-effectiveness analysis within the effective altruism movement. We saw in general terms how the movement became, in the words of its own adherents, less concerned with cost-effectiveness and more free-spending. And we saw how the declining emphasis on cost-effectiveness analysis was echoed even by leaders of the movement in their public statements.

We also looked at a concrete instance, the FTX Future Fund, in which the declining role of cost-effectiveness analysis can be clearly seen. We saw that the Future Fund donated, by many accounts, over a hundred million dollars on the basis of vetting procedures that were not merely short of field-leading, but quite plausibly beneath minimal international standards. We also saw that the Future Fund set aside at least a hundred million dollars for its regranting program, which incorporated even less systematic vetting than the Future Fund’s standard granting calls did.

The decline of cost-effectiveness analysis is a notable change within the effective altruism movement, and it is worth asking whether this change was for good or for ill. The next post in this series will highlight some costs associated with the declining role of cost-effectiveness analysis within the effective altruism movement.

Comments


  1. Nuño Sempere

    Great post.

    I think that the Future Fund people knew what you are saying, and while moving fast and breaking things, they also invested a bit in the Quantified Uncertainty Research Institute’s work, and did their own work to come up with quantifications for high-level stuff, which could later have become more granular.

    An alternative would have been to stand still while coming up with methods to estimate the value of speculative longtermist interventions. I can see how one would choose not to do that.

    1. David Thorstad

      Thanks Nuño!

      I certainly agree that the Future Fund made some efforts to estimate the cost-effectiveness of longtermist interventions. That’s always good to see.

      There are, of course, reasons in favor of acting only on the basis of rigorous cost-effectiveness analysis, and also reasons in favor of taking the Silicon Valley approach of moving fast and breaking things. The next post in this series will cover some reasons favoring the slower approach.

      1. Nuño Sempere

        Looking forward to the next post, then

  2. Jason

    I strongly agree with this post in a directional sense. I’m dismayed by a significant fraction of longtermist spend and the shaky reasoning employed in many cases.

    I would note, however, that classic-EA style cost-effectiveness analysis fits unusually well into EA’s role in global health & wellbeing. Outcomes and cause/effect relationships are much easier to measure for interventions like bednets. Because EA has never been more than a few percent of funding in that area, global health people have not needed to worry that much about positive effects/outcomes that don’t integrate into a spreadsheet well. It’s fair to assume that a truly excellent intervention would very likely be picked up by a non-EA funder employing different criteria.

    Farmed animal welfare is a middle ground. Analyzing cost-effectiveness is still valuable, but harder. The interactions between EA and non-EA efforts are generally more complex, and net-harmful interventions are more likely. Moreover, EA provides a large fraction of the funding, so relying too heavily on quantifiable outcomes risks warping the entire field in ways that are less likely in global health. In the end, I feel moderate confidence that one can develop algorithms and heuristics to account for these issues. 

    Cost-effectiveness analysis for catastrophic and/or existential risks seems really murky. Even if one could be confident in the magnitude of a specific risk, you’d still need confidence in how much intervention could reduce that risk, and how effective the specific intervention on the table would be at reducing it. Because of the low numbers involved, it’s not implausible to me that the confidence interval for each of those three steps could span roughly an order of magnitude… and the resulting error bars stack. And then there are all the assumptions that go into valuing avoidance of the existential risk.

    It’s going to be messy at best; I could see reasonable evaluators coming up with conclusions that are (say) five orders of magnitude apart from each other. The whole thing feels not really fit for purpose: a bit like trying to do nanogram-scale chemistry experiments with a balance calibrated only to the nearest centigram. The bulk of the variance in evaluations would depend on evaluator effects, not the true value of each proposal. One would need either a way to massively reduce this variance, or grounds to conclude that the range of plausible inputs was so narrow that almost any plausible set of inputs would yield the same result. My crystal ball is cloudy when it comes to the medium to long run, so I am skeptical that either of these options will work well.
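    To put rough numbers on the stacking, here is a quick simulation sketch. Every value is invented; the point is the spread, not the particular numbers:

    ```python
    import numpy as np

    # Three multiplicative factors, each lognormal with a 90% CI spanning
    # one order of magnitude. Central values are invented placeholders.
    rng = np.random.default_rng(0)
    n = 100_000
    sd = 1 / (2 * 1.645)  # log10 sd giving a decade-wide 90% CI

    risk = 10 ** rng.normal(-2.0, sd, n)          # magnitude of the risk
    tractability = 10 ** rng.normal(-1.0, sd, n)  # share of risk reducible at all
    efficacy = 10 ** rng.normal(-2.0, sd, n)      # this intervention's contribution

    product = risk * tractability * efficacy
    lo, hi = np.percentile(product, [5, 95])
    print(f"Each factor spans 1.0 orders of magnitude; "
          f"their product spans {np.log10(hi / lo):.1f}.")  # ~1.7
    ```

    Three decade-wide error bars already combine into nearly two decades of uncertainty, and that is before evaluators start disagreeing about the central values themselves, which is where the multiple-orders-of-magnitude gaps between evaluators would come from.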

    With that in mind, I’d suggest that there is likely no evaluative framework that will do a good job in all possible cause areas and all intervention types. To go back to animal welfare for a moment, classic EA cost-effectiveness analysis is going to struggle with valuing certain “revolutionary” approaches. It is also likely to struggle with existential-risk reduction. As in law, certain “procedural” decisions can foreordain — or at least foreclose — certain substantive outcomes.

    For example, deciding that we’re going to treat potential risks as functionally zero in the absence of rigorous evidence, and potential interventions as having zero efficacy in the absence of rigorous evidence, seems awfully close to a substantive decision about cause prioritization. Same with an approach that would allow something we are 99.999% sure is false to dominate the analysis because the payout is so big.

    So all that is to say that, in my view, part of the “flexibility” of moving away from a heavy focus on the classical EA cost-effectiveness methodology is the ability to meaningfully evaluate a wider range of causes and interventions. I doubt there is any Grand Unified Theory of Effectiveness against which everything can be measured. Again, none of that is to diminish my agreement that longtermist EA has too often gotten poor value for money and has gone too far into the YOLO/spendit model, or my view that much of that money should have gone to neartermist interventions instead.

    1. David Thorstad

      Thanks Jason!

      It sounds like we agree on almost everything here. In particular, I think we both agree that (a) there’s been too much free spending lately, (b) this free spending has been increasingly the fault of longtermists, and (c) free spending is linked to a decline in cost-effectiveness analysis. I think we also agree that (d) global health and development interventions are reasonably tractable using standard cost-effectiveness analysis, (e) farmed animal welfare is a middle ground, and (f) catastrophic, and especially existential, risk mitigation is very hard to treat using standard forms of cost-effectiveness analysis. Partly as a result, I think we also both agree that (g) it would be nice to see more money going to global health and development, and (h) it would be nice to see spending reined in across the movement.

      I think it might be worth emphasizing two further points. First, even when it is very difficult to construct detailed cost-effectiveness analyses, it is not so difficult to take reasonable measures to curtail wasteful spending. Most of us know that large expenditures should have a specific business justification, and that if the same task (say, a thirty-minute data task) can be done without a sizable expenditure (say, a $140 Uber ride), it is a good idea not to make the expenditure. Most of us also know that it is important to establish rigorous procedures for evaluating projects: from the fact that longtermist expenditures are hard to evaluate, for example, it does not follow that we should be approving six- and seven-figure grants in two weeks on the basis of lightweight review processes. While it might be difficult to say exactly how cost-effective these changes would be, it is not so difficult to ground a reasonable degree of confidence in the claim that they are worthwhile.

      Second, I do think it’s worth asking whether the relative opacity of existential risk mitigation to standard forms of cost-effectiveness analysis might be a symptom of a deeper problem. I’ve spoken elsewhere about a regression to the inscrutable in which effective altruists place increasing confidence in the least evidentially- and scientifically-tractable risks. I’ve suggested that this move succeeds in insulating claims about existential risk from criticism because it is very hard to get an evidentially-grounded take on the matter. But precisely for this reason, I’ve suggested, we might suspect that the views about existential risk defended by effective altruists are likely to be incorrect.

      We might see the opacity of existential risk mitigation to standard forms of cost-effectiveness analysis as another way of emphasizing how inscrutable such interventions are, and in particular as a way of emphasizing the movement’s increasing turn toward ever-more-inscrutable risks and interventions.

  3. Jason

    100% agree on “don’t waste money” and “don’t YOLO big grants with little review in two weeks, unless maybe it is early 2020 and it’s a COVID grant.”

    As far as scrutability, I think we probably reach similar outcomes with perhaps somewhat different logic. I’m a bit hesitant to “penalize” very-difficult-to-scrutinize claims too much merely for that status, because I don’t have a strong prior reason to think the most important issues will be non-very-difficult-to-scrutinize. However, I think an adverse inference is generally appropriate where one would expect people with a reasonably well-founded claim to engage more broadly with other knowledge communities (e.g., to secure workers, funding, public-policy wins, etc.) and there is little to no evidence of such an attempt.

    I also think there generally has to be at least a certain modicum of affirmative evidence to support a claim, or we end up chasing a bunch of science fiction merely because it is impossible to definitively disprove. One possible exception: when someone is proposing a major change to the status quo, they bear the initial, affirmative burden to clearly establish an acceptable level of risk. Only once they have made a prima facie case that withstands scrutiny should the burden shift to the detractors to produce some affirmative evidence. For example, as relevant to AI risk, I want to see better affirmative evidence of the AI labs’ ability to understand and control this year’s models before I would vote to allow them to produce next year’s model. One doesn’t need to think AI doom is even remotely possible to conclude that AI poses significant non-existential risks that warrant regulating and studying it like airplanes and nuclear power plants, not like toasters.

    1. David Thorstad

      Thanks Jason!

      We do agree on a lot indeed.

      I want to emphasize something that I think your last comment gets exactly right. *Everyone* should be concerned about the risks posed by feasible near-term AI systems. We already know about, and study, any number of risks posed by AI systems: they can be used as weapons and for surveillance; they significantly alter labor markets, with harms often falling on the poorest among us; they feed addiction and decrease attention; they can fuel online extremism and polarization; and so on.

      Almost all scholars working on AI ethics and related areas are concerned about these and other risks posed by AI systems, and have been writing about them for some time.

      I hope this can give us enough common ground to begin to have shared and productive conversations about the nature of the threat(s) posed by AI systems, and what should be done about them.

