Epistemics: (Part 3: Peer review)

EA shows a pattern of prioritising non-peer-reviewed publications – often shallow-dive blogposts – by prominent EAs with little to no relevant expertise … This is a worrying tendency, given that these works commonly do not engage with major areas of scholarship on the topics that they focus on, ignore work attempting to answer similar questions, nor consult with relevant experts, and in many instances use methods and/or come to conclusions that would be considered fringe within the relevant fields. … The fact remains that these posts are simply no substitute for rigorous studies subject to peer review (or genuinely equivalent processes) by domain-experts external to the EA community.

ConcernedEAs, “Doing EA Better”

1. Introduction

This is Part 3 in my series on epistemics: practices that shape knowledge, belief and opinion within a community. In this series, I focus on areas where community epistemics could be productively improved.

Part 1 introduced the series and briefly discussed the role of funding, publication practices, expertise and deference within the effective altruist ecosystem.

Part 2 discussed the role of examples within discourse by effective altruists, focusing on the cases of Aum Shinrikyo and the Biological Weapons Convention.

Today, I want to ask how research carried out by effective altruists is evaluated. The gold standard of research evaluation is often considered to be the process of anonymous, pre-publication peer review by expert scholars. While some pieces written by effective altruists are peer-reviewed, a good number are not.

Today’s post will discuss what peer review is, and why it is important. Then I will discuss the limited role of peer review within effective altruism and the epistemic costs of failing to expand the role of peer review.

2. What is peer review?

Most serious academic articles and books are published in the following way. First, manuscripts are submitted to a reputable independent scholarly publisher. Articles are sent to journals, whose editors and editorial boards are composed of leading scholars in the field. Journals are typically either run by one of a handful of experienced publishing companies or published ‘in-house’ by universities or academic departments. Similarly, book manuscripts are sent to academic presses, typically run by leading universities, overseen by specialist editors, and advised by committees of scholars.

Second, manuscripts are subject to a process of pre-publication peer review. Nonspecialist readers need to be confident that published academic work reflects rigorous scholarship building on field-specific expertise. This is done by reviewing articles prior to publication for the quality of their scholarship.

Third, reviewers are invited from among leading specialists with significant scholarly track records in the subject covered by the manuscript. Typically, reviewers hold a PhD in their field and have a number of relevant publications in high-quality venues. This ensures that reviewers are qualified to assess the quality of submissions.

Fourth, review is anonymous. The minimum standard is ‘single-anonymous’ peer review: reviewers’ identities are shielded from authors. This allows reviewers to speak their minds without fear of retribution. Increasingly many disciplines have moved to ‘double-anonymous’ peer review, in which authors’ identities are also shielded from reviewers. This prevents evaluations from being biased by the identity of the author. Many fields practice ‘triple-anonymous’ review in which authors’ identities are also shielded from editors, ensuring that editorial decisions are also protected from biases related to the identity of the author.

Fifth, typically at least two reviewers submit reports on the manuscript. These reports advise the editors of the venue on the interest and scholarly merit of the manuscript. Reviewers may make one of three recommendations: that the manuscript be accepted, that it be rejected, or that it be revised and resubmitted. Increasingly, requests for revision are the norm. This allows scholarly publications to be shaped by the knowledge and views of independent experts, rather than reflecting idiosyncratic views of the author or accidents of the paper’s construction.

Sixth, review continues over a number of rounds until editors make a final decision to publish or reject the manuscript. At this point, the review process ends.

3. Benefits of peer review

To outsiders, the process of scholarly peer review can seem impossibly baroque. Yet most scholars are firmly attached to traditional practices of peer review. Why do academics put our work through such an impossibly demanding, cruel, expensive, slow and downright inhumane process of vetting?

The answer is simple: the system works. Fellow academics, as well as members of the public, need to trust that the articles they read are high-quality contributions to cutting-edge scholarly discussions. Articles should build on relevant literature to make a novel contribution to existing discussions. Only a very few readers are in a position to verify that any given article meets this standard, and the rest are loath to take it on trust. As a result, those readers in a position to verify that articles are up to scratch are recruited as reviewers through a system designed to allow them to express their honest and unfiltered judgments, and to use those judgments to either improve the manuscript or decide its fate.

This much is recognized by nearly all academics. A recent study by Mark Ware found:

  • Peer review is widely supported: The overwhelming majority (93%) disagree that peer review is unnecessary. The large majority (85%) agreed that peer review greatly helps scientific communication and most (83%) believe that without peer review there would be no control.
  • Peer review improves the quality of the published paper. Researchers overwhelmingly (90%) said the main area of effectiveness of peer review was in improving the quality of the published paper. In their own experience as authors, 89% said that peer review had improved their last published paper, both in terms of the language or presentation but also in terms of correcting scientific errors.

What evidence do we have that peer review works? First, let’s consider peer-reviewed grant proposals. This is, in many ways, a difficult form of peer review, since reviewers can see only a proposal, not the completed project, and since review is not double-anonymous. Here, nonetheless, the evidence for peer review’s effectiveness is pretty good.

For example, Li and Agha (2015) consider projects funded by the National Institutes of Health between 1980 and 2005. They find a consistent positive correlation between peer-review scores and the number of subsequent publications and citations from a research project. Keep in mind that this difference is only among funded proposals: unfunded proposals would likely produce fewer publications and citations than these, had they been funded.

Correlation between NIH review scores and citations (left) or publication (right) from funded research projects, Li and Agha (2015).

In the case of journal publication, referees have more information to go on. Review is still challenging: in many cases, journal acceptance rates are simply so low that most meritorious submissions will be rejected, and many disciplines have yet to adopt double-anonymous reviewing. Nevertheless, results are often quite good.

For example, Card and DellaVigna (2020) look at the fates of papers submitted to four top economics journals, as a function of the recommendations made by their original referees. They find that the number of subsequent citations a paper receives rises consistently with the strength of referee recommendations:

Weighted asinh of Google Scholar citations versus referee recommendation, Card and DellaVigna (2020).

Again, referees are far from perfect, but they do often do a decent job at spotting high-quality papers.

The system works. Almost everyone who uses the system likes the system, submits to the system, and most tellingly, chooses to read papers that have made it through the system. Selected papers and funded projects have a higher probability of being cited, and this effect tracks not only the eventual verdict but also the strength of referees’ endorsement. That is exactly what we would like to see.

4. Criticism of peer review

There are, from time to time, murmurs of protest against the current system of pre-publication peer review. The system is slow, cruel, conservative, and at times arbitrary. However, these murmurs of protest need to be taken in context.

The vast majority of scholars, including many of those raising critiques, publish almost exclusively in standard scholarly venues through processes of pre-publication peer review. Indeed, many of the best critiques of peer review underwent such a process. Only much more rarely do scholars read or write in nontraditional formats, and then they do so with the understanding that their writing is not likely to be, and probably should not be, treated as a high-quality contribution to a scholarly literature.

For example, I think that the best way for readers to appreciate the importance of peer review would be for them to stop reading this blog and to instead read through the many excellent academic papers written about the system of peer review. My average blog post takes perhaps four hours to write, is written about a subject in which I am not among the foremost academic experts, and goes through no vetting process of any kind. My average paper takes perhaps four months to write, is written about a subject in which I am among the foremost experts, and goes through intensive vetting. That’s a big difference.

This difference is recognized even by the majority of peer review’s critics. For example, a recent study of bias in peer review writes:

Despite concerns about bias, researchers still believe peer review is necessary for the vetting of knowledge claims. One of the most comprehensive surveys of perception of peer review to date found that 93% disagree with the claim that peer review is unnecessary; 85% believe peer review benefits scientific communication; and 83% believe that “without peer review there would be no control” (Ware & Monkman, 2008, p. 1). This suggests that, for researchers, “the most important question with peer review is not whether to abandon it, but how to improve it” (Smith, 2006, p. 180).

Note here that the study centrally draws on the same data from Mark Ware that I used to illustrate the benefits of peer review. This is one of many places where critics and supporters of peer review find substantial common ground.

Almost all scholars recognize that the vast majority of serious scholarship is conducted in peer-reviewed journals and books. Of course, peer review can and should be improved. But most scholars view works published through nontraditional processes with deep suspicion.

5. Peer review in effective altruism

Some effective altruists and aligned academics publish their work through traditional practices of pre-publication peer review. That is a good development, and something I hope to see more of.

Many of the most central writings by effective altruists are produced through a rather different process. Let’s contrast the process by which these writings are produced with the six things said about peer review in Section 2.

First, writings are rarely submitted to reputable, independent scholarly publishers. Most commonly, they are published on blogs, podcasts, and internet fora. These venues have no serious scholarly reputation, giving readers little guarantee that the content they carry is high-quality scholarship. Nor are these venues independent: they are typically dedicated to a single cause or issue, such as effective altruism or AI alignment, and often have a strong ideological bias towards mainstream EA views. This makes it harder for readers to be confident that content is being produced and evaluated on the basis of its independent argumentative merits, rather than its agreement with established views and practices among effective altruists.

Second, review is typically conducted after publication. While comments on forum posts, blogs and podcasts are a useful form of feedback, they nonetheless come after the content has been written and published. This means that review is unable to play the traditional vetting function of publishing high-quality work while excluding low-quality work. This also means that manuscripts cannot be revised on the basis of reviewer feedback. It is true that effective altruists sometimes seek feedback through informal channels before publishing their work. That is a good thing to do. However, academic authors also seek informal feedback before submitting their papers for review, as well as more formal feedback at academic conferences. This feedback is rightly viewed as a necessary precursor to the review process, rather than a substitute for peer review.

Third, reviewers are not selected from among leading specialists. In many online venues, reviewers are not selected at all: anyone can comment. Those who do comment may have some amount of exposure to the topic, but rarely have spent years studying the topic. They typically lack terminal degrees in the field, or indeed in any field. They also often lack a track record of similar publications. To a lesser extent, the reviewers selected by leading foundations often share many of these shortcomings. This means that it is much harder for reviews to be taken as evidence of the scholarly merit and importance of work published in a nontraditional way.

Fourth, review is not anonymous. Even foundations such as Open Philanthropy which commission pre-publication reviews tend to actively share the identities of reviewers with the authors of reports. This makes it difficult for reviewers to do their job. If reviewers want to preserve their relationship, not only with sensitive authors, but also with the foundations which commissioned and generously paid for the review, they know that they had best moderate their criticism and heap on an extra helping of praise.

This is doubly true when reviews will be posted publicly. If anyone at all can view the content of a review, then the reviewer is assured little protection against retaliation. Because reviewers credibly expect that most people reading their review will be committed effective altruists, this gives reviewers an especially strong incentive to toe the party line.

Fifth, reviewers rarely have the authority to deny publication or request revisions. This strips the review practice of its teeth. The purpose of peer review is not to express opinion, but to evaluate and improve manuscripts. If reviewers can neither reject nor require revisions to a manuscript, then they cannot exercise effective pre-publication control or vetting, no matter the timing of their review.

Finally, in most cases publication decisions have effectively already been made prior to review. As we saw, in most cases no form of pre-publication peer review is practiced at all. But even in the rare case, such as an Open Philanthropy report, where reviewers are commissioned, this is done with the understanding that the report is virtually certain to be published. I have never heard of an Open Philanthropy report being scrapped after negative reviews (perhaps this has been done very occasionally?). If the decision to publish is already, in effect, made, then the review process is again deprived of its teeth.

6. What scholars think

What does the average scholar think about this situation? Here is what they see. They see a group of talented, intelligent and motivated individuals producing work that they judge to be significantly sub-par. They think that the work falls beneath standards because it has not been through the system of review which nearly all scholars think is required for work to reliably meet minimal scholarly standards. They think that while some of the work produced by effective altruists is quite good, much of it would never have passed through traditional processes of peer review. As a result, they think that nontraditional forms of publication are being used to pass off work that would not survive rigorous scholarly evaluation.

Scholars typically become more concerned when they see the results of this work. They see an increasingly high degree of confidence in idiosyncratic views, such as the claim that artificial agents have a sizable chance of soon murdering us all. They see that many, though not all, of the people advancing these claims lack traditional scholarly credentials and publication records. They see claims being made about their own areas of expertise which they can quickly judge to be false or based on conceptual misunderstandings.

As a result, most scholars judge that the majority of work written by effective altruists is unlikely to be worth reading or engaging with. Therefore, they neither read nor engage with it.

You don’t have to take my word for these concerns. Almost any faculty member at a leading university will tell you the same. Or you can read recent pushback against the writings produced by EA-sponsored AI-safety organizations such as Conjecture and Redwood Research. For example, here is a recent critique of Conjecture:

We believe most of Conjecture’s publicly available research to date is low-quality … We think the bar of a workshop research paper is appropriate because it has a lower bar for novelty while still having it’s own technical research. We don’t think Conjecture’s research (combined) would meet this bar.

The critique suggests that Conjecture focus on improving the quality of their research outputs by adhering to standard processes, including a limited form of peer review.

We recommend Conjecture focus more on developing empirically testable theories, and also suggest they introduce an internal peer-review process to evaluate the rigor of work prior to publicly disseminating their results.

This is, to be clear, a very minimal demand – much less than the standard form of pre-publication, anonymous and independent review at a leading journal needed to secure significant external credibility. When even such minimal demands are not met, it is hard to place much confidence in the quality of research outputs.

I am not sure if such demands will be met. Conjecture, for its part, complained that “the document is a hit piece” and refused to engage in detail with the substantive concerns raised by this critique. In particular, they said nothing to suggest that the characterization of their publication process is inaccurate or that they plan to implement robust practices of peer review. That does not look good.

7. Why care?

Why should effective altruists care about scholarly standards or scholarly reactions? One reason to care is that scholarly standards work. A great deal of what we know as a species has been produced through standard systems of peer review, and academics and members of the public alike rightly place high confidence in the rigor, quality and informativeness of research published in leading journals. Standard practices of pre-publication peer review are good ways of sorting and improving manuscripts, and they should be adopted because they work.

You would not read a physics paper published by someone with no serious training in physics, and no intention of ever publishing a paper in an academic journal. Why would you subject serious moral issues about altruism to a lesser standard of scrutiny?

A second reason to care is that work published through nontraditional processes may not be taken seriously. Because effective altruists often make strong and controversial claims, it is especially important for them to secure traditional marks of legitimacy in order to secure a fair hearing. Regardless of what effective altruists think about the value of peer review, they should seriously consider the benefits of peer review as a means of convincing serious but skeptical audiences to read and engage with their work.

I hope that effective altruists will come increasingly to care about peer review. Improving the prevalence and nature of peer review within effective altruism will improve the quality of work, as well as the public perception of that work. Some scholars within the movement have written a number of serious and high-quality peer-reviewed articles on topics related to effective altruism. That is a good sign, and something we should hope to see more of in years to come.





11 responses to “Epistemics: (Part 3: Peer review)”

  1. Adam I

    Wow there’s so much that could be deep-dived here–what I’ll say now is that the downside of taking ‘four months’ instead of ‘four hours’ isn’t mentioned. Time presents tradeoffs with regard to impact; surely non-peer-reviewed work should continue, while simultaneously some fraction of EA work should attempt peer review.

    1. David Thorstad

      Thanks Adam!

      Deep dives are good. There is already a substantial scholarly literature on peer review, and I hope that this literature might be the foundation for future conversations.

      It’s definitely important to have conversations about which outputs should be peer-reviewed and which should not. I’m open to having that conversation. I am, after all, writing these words on a non-peer-reviewed blog.

      Within that conversation, a few questions might be helpful.

      The first is whether some groups might be putting rather less stress on peer review than they ought to be. Many research organizations funded to the tune of $10M+ by effective altruists have almost no track record of peer-reviewed publication. For example, this has been a recurring complaint in recent critiques of AI labs such as Redwood Research and Conjecture. Those organizations won’t be able to turn around and make the point that you made (that not everyone has four months to spare), since they certainly do have the personnel and the budget to spare four months making sure their work is up to snuff.

      The second is what status should be given to non-peer-reviewed outputs. Effective altruists often give high weight to nontraditional outputs such as podcasts and forum/blog posts, which are widely considered to be substantially less reliable than peer reviewed publications in leading journals. Though some effective altruists also read journal articles, many rarely, if ever, engage with them. If effective altruists are serious about using reason and evidence to determine how to act, then they should give more weight to the most reliable outputs, and demonstrate appropriate suspicion of unvetted outputs.

      Many would suggest that these considerations become more pressing when the opinions being advanced by effective altruists become more controversial. For example, many effective altruists have suggested that the time of perils hypothesis is true because if AI doesn’t kill us, it will quickly become so powerful that it can foresee and prevent nearly all future risks. This is a very fringe and controversial claim, supported by almost no reliable research and almost certainly denied by most experts in all relevant fields, to the extent that they are willing to speculate about such matters at all. In such a situation, it would seem appropriate to invest substantially more time and money in making sure that the EA view is correct, and to invest in the most rigorous research outputs to make sure that the EA view rests on solid research rather than loose speculation. By contrast, despite the traditional EA view on this matter, I’m not aware of so much as a decent full-length blogpost defending it. That’s not the situation we would like to be in.

      1. Jason

        I’d suggest that the argument for peer review is stronger primarily to the extent that credence in the relevant idea drives financial/intellectual/other resource allocation. In my view, whether something is a “fringe and controversial claim” isn’t very relevant by itself. It’s the intent to act on such a claim that triggers the need for scrutiny.

        I am not sure if the resource allocation picture or intellectual direction changes much based on the credence assigned to this particular idea. (I assign it very little.) I can see how it could potentially become relevant if one assigned a sufficiently low probability to AI doom, which I think is the major crux at present.

  2. Simon Friederich

    I strongly support your call for EA to value peer review more.

    Another aspect of this, which I think you do not mention, has to do with status bias. Every one of us has limited time and energy concerning what to read and engage with. If the most important thoughts and discussions are in blogs and forums, the best rule of thumb concerning what to read is to focus on what those with familiar “big” names (high-status individuals) have been contributing. I am pretty sure that this dramatically increases the influence of a few individuals who, for better or worse, have big names and makes it very hard for others to attract attention, whatever the quality of their contributions.

    In a system centred around peer review, this problem would be mitigated, especially if procedures were double-blind. Then the status of a contribution would depend less on the name of its author and more on the publication venue. And if access to the venue is more equitable, this helps contribute to more balanced attention. Individuals with high prestige will still attract the majority of reader attention, but this effect would be less pronounced.

    Put differently, the current EA standards, which value peer review relatively little, probably give rise to a particularly dramatic Matthew effect. Bright and Heesen actually somewhat concede that a runaway Matthew effect would be likely if, as they suggest, peer review were abolished: (See Section 5.2 for the Matthew effect considerations.) For EA, which to some extent may suffer from hero worship, combating the Matthew effect may be particularly desirable.

    To conclude, I find your plea for more peer review in EA really helpful, also on independent grounds!

    1. David Thorstad

      Thanks Simon!

      Yes, I can’t believe I forgot to mention this. Thank you for bringing it up. As a junior scholar, if I were asked to name the single thing I like most about peer review it’s that (a) my work will be evaluated anonymously, more-or-less on its merits, and (b) if it’s accepted at a great journal through this process, people will read it and take it seriously no matter what my name is. This is literally how I got my job: by publishing papers through a process of anonymous peer review that were judged to be of consistently high quality. Without this system, I might not have a job.

      I’ve honestly been thinking about writing a paper targeting Phil Sci / BJPS stressing some of the things that peer review does really well, and putting an argument like this at the front.

      1. Simon Friederich

        Interesting! I had exactly the same strategy, and I also think that I owe my career to double-blind peer review.

        Darrell Rowbottom makes some arguments against Bright and Heesen along these lines here, in Section 3, but he doesn’t seem to connect it well to their own discussion of the Matthew effect:

        Such a paper would be nice! But I guess one would need robust empirical results to make it really impactful, preferably novel ones.

  3. David Manheim

    Aside from substantive problems I have with your description of the way scientific publishing should work, a key problem with peer review for EA organizations is that it’s nearly impossible for certain types of important applied research to be reviewed in decent journals.

    Consider work like cost-benefit analysis of specific interventions, of the type performed by GiveWell. The work isn’t methodologically novel or interesting, and it often looks at specific examples of programs that have been reviewed in other earlier work. They could submit to a lower-tier journal, but in my experience, the reviews you get from many of those lower-tier journals are cursory at best, and the work is often still challenged on the basis of *scholarly* importance, regardless of substantive importance.

    Similarly, lots of research on substantive questions that isn’t within an established discipline is hard to get reviewed. This was, historically, a particularly critical problem in certain types of AI safety – for quite a long time, AI safety was hard to get published because it was weird. Much of it was, of course, low quality, but of the remainder, much of the work that was legitimate only got published via books, which don’t get peer-reviewed. This does not, of course, excuse continuing to avoid peer review as this type of work gets more attention, but it partly explains the history. (The recent norm in mathematics is very much to publish preprints and have more public discussion of results and validity rather than wait for peer review, but that’s a different issue.)

    1. David Thorstad

      Thanks David!

      I’m always happy to talk about the best ways to structure research communities and research practices, including practices of peer review. There is room for improvement in many areas. It sounds like there were some places where you disagreed with my views about publication practices. Were there any particular disagreements that stood out to you?

      It’s certainly true that some work, such as GiveWell cost-effectiveness analyses, will be hard to publish in good journals. I also share your skepticism of bad journals, and would not want to insist that such work should be submitted to bad journals. Bad journals are bad. I don’t read them, and I don’t respect them.

      While I am not highly knowledgeable about GiveWell’s internal processes, my impression is that GiveWell is relatively good at hiring qualified people, conducting rigorous assessments that draw on accepted methods and techniques, and seeking and responding to feedback from appropriate experts and stakeholders. If that’s right, I don’t want to complain much about GiveWell. I think GiveWell’s process and team are part of the reason that many people trust GiveWell.

      Moving to longtermist work, such as AI safety research, I’m not entirely unsympathetic to the idea that peer review processes make it challenging to introduce new questions and methods, particularly when those questions and methods cut across traditional academic disciplines. This is, to some extent, the challenge faced by many young people in academia, who want to convince the old guard to do things a bit differently than before. I’m facing this not only in my AI safety work, but also in my work in epistemology, which incorporates views traditionally thought of as part of ethics rather than epistemology.

      At the same time, I’m relatively unsympathetic to attempts to circumvent peer review entirely instead of rising to the challenge. It is not impossible to publish papers on these issues. My paper “Against the singularity hypothesis” got an R&R at a very good journal on the first try, and is likely to be published. My x-risk pessimism paper is forthcoming in a good journal. Several other philosophers (for example, Simon Friederich) have published papers on AI risk, and Philosophical Studies (a very good journal) is publishing an entire special issue on the subject.

      I won’t pretend it’s been easy to publish papers on these issues, but I think it is very important to publish in traditional venues nonetheless. When communities decide, in large part (though not entirely! Some people have made good choices here) to circumvent traditional processes of expert review, not because they’re doing something too mundane (say, cost-effectiveness analysis) but rather because they’re doing something which is likely to be judged false, undersupported, or of low quality, it doesn’t leave a good impression. Many people come to suspect that the reason why papers aren’t being submitted for peer review is that the papers aren’t very good, and that they would be rejected if they were submitted.

      And … there’s not nothing to this suspicion. You did say that much of the early work on this subject was low-quality.

      Perhaps that’s unfair. I’d like to see more peer-reviewed papers on AI safety in the coming years. I hope and suspect we will see more such papers in the coming years. I think that this development will do more than almost everything else to open doors for AI safety work within academia, and to improve the actual and perceived rigor of the field.

      This might also be an opportunity to build bridges. For example, many AI ethicists work on topics that they would describe as “AI safety” issues: there are many ways in which AI can be unsafe without killing us all. Those scholars are, in some cases, interested in listening to and working with people from a variety of approaches. But they’re much more likely to engage with peer-reviewed work in good journals than with anything else.

      (As you mention, mathematics has different norms, and again for different reasons. My understanding of mathematics is that the field has very strong norms against publishing untrue claims, and that these norms serve as strong deterrents to publishing false things. My understanding is also that public discussion can be at least as effective at detecting issues in mathematics, due to the difficulty of asking a few reviewers to check every claim. My understanding is that this is particularly the case for significant or complex results, and that some very complex results have been reviewed by entire committees. Is that about right?)

      1. David Manheim

        On GiveWell’s work and similar, I think this is a more important caveat that deserves highlighting, rather than burying in a comment here. It’s a very large part of EA’s non-philosophy / non-AI safety output, and if half the work shouldn’t undergo review, that seems to undercut a decent part of your original thesis.

        Aside from that, some responses:

        “This is, to some extent, the challenge faced by many young people in academia, who want to convince the old guard to do things a bit differently than before…At the same time, I’m relatively unsympathetic to attempts to circumvent peer review entirely instead of rising to the challenge.”

        This seems like a very different claim than your post, which is questioning the epistemics, not the social dynamics! It points to the fact that many of the people doing the work simply don’t want to convince the old guard, they want to substantively solve the problems, and per Kuhn, don’t think that scientific revolutions happen when old people change their minds – so they might as well ignore those people. (Unless they need academic prestige, which they don’t.) Unless peer review has substantive benefits which outweigh the very high costs, they shouldn’t care. (And I think that this is something that was the case for AI safety until somewhat recently, when the rest of the world started actually taking the issues of misaligned AI seriously, and they should be welcoming the outside interest, but can’t do so easily for reasons I mention below.)

        “You did say that much of the early work on this subject was low-quality.”

        Yes – but I’d flag that there are several types of relevant low-quality. The first is just bad, poorly conceptualized, or poorly executed work. Obviously, there is going to be some of that, but it’s not what people on, say, the Alignment forum spend time on. The second is preliminary work, which isn’t fully fleshed out. Much of the work on the alignment forum is of this type, and it shouldn’t be published traditionally because it’s not done. The third, though, is the amalgam of a bunch of preliminary work that actually pans out and has important ideas and results – and it’s low quality because it’s not written in the right form, is meandering, etc. That requires lots of work to clean up, but within the small alignment community, it isn’t necessary. It’s only important if others are trying to understand – and this is where and why I strongly agree that we need to make the work more legible / better quality. But that’s a fact about the writing and presentation, not the substantive content, and it’s only begun to be true in the last year or two, when others did start to try to understand it.

        “Mathematics has different norms, and again for different reasons.”

        But much of the early alignment work, including almost everything by Garrabrant and Demski, a lot of work by Everitt, and much of the work by Critch, is more mathematical than computer science, and those are the relevant norms! The problem is that it was also weird, so it has the problems of needing to convince the old guard of its importance, which is thankless and potentially unnecessary. Instead, academic preprints made it to Arxiv, and the good work gets cited all over the place, got feedback, and is discussed without needing the pre-publication filter. (Which I see as a far better model than much of journal publication, per my comment on twitter about F1000 Research’s model for a journal.)

        1. David Thorstad

          Thanks David!

          Convincing people that submitted work is interesting, true and high-quality is not a merely social problem. The system of peer review is designed to allow leading experts to express their opinions about the merits of submitted work. Convincing experts that submitted work is, in fact, meritorious, is a large part of the work the authors of peer-reviewed papers aim to do, and the fact that they must do this is what allows the system to function as an effective filter for high-quality and important papers.

          I am aware that many effective altruists think they should not bother to convince experts that their work is important and that they can circumvent established practices of scholarly evaluation. This type of behavior contributes to the impression that a good deal of work conducted by effective altruists is low-quality, and that effective altruists are not always interested in meeting the same standards of scholarly rigor as anyone else. That is unfortunate for those effective altruists who do write high-quality papers, and who understand and appreciate the importance of submitting their work to the evaluation of experts in their field. It is also a significant lapse in community epistemics, since experts are in a position to determine (and improve) the quality of submitted papers.

          Arxiv preprints are no substitute for peer review. Arxiv conducts little meaningful evaluation of the quality of papers before accepting them for submission, depriving review of its primary function of evaluating the quality of reviewed work. Further, because Arxiv papers are not anonymous, every matter related to Arxiv papers including their likelihood of being read and the favorableness with which they are read has a great deal to do with the identity of the author, rather than the quality of the paper.

          If effective altruists continue to treat inclusion on an internet archive as a substitute for peer review, their papers will continue to be ignored or treated with deep suspicion, because readers will suspect that the papers were beneath the quality bar required for publication in a good journal. It is their choice, but effective altruists will have no grounds to complain if they are not willing to submit their work for evaluation by leading experts. For my part, I rarely read papers written by effective altruists that have not gone through rigorous processes of peer review, and I make no apologies for that.

          1. JWS

            David (T, for clarification!), this is really below what I’ve come to know and expect from you. Some points here I hope you can consider from someone who respects you and your work, but has some profound disagreements with how you’re approaching this issue:

            1) Your position seems to switch between “the peer review system, despite its flaws, is one of the best ways we have for generating knowledge” on the one hand and “anything not peer reviewed is low quality and not worth considering” on the other. I think there’s a lot of nuance that can be had on the benefits/challenges of peer review, and how this might have different effects on various disciplines. But you really seem to come across as heavily identity-driven on this issue, not just on this blog but on your Twitter too. I think you’ve got a lot of respectful and nuanced pushback on this issue that hasn’t mollified your position at all.

            2) You attack ‘effective altruists’ as a whole on this issue, but I suspect that this is mostly driven by AI Safety research – while this has dominated the headlines recently (and not just in EA!) there’s so much other work being done by EAs. But you don’t provide any evidence for this beyond these assertions. I’ve noted on Twitter that you do focus on individual posts on the EA Forum sometimes, but don’t note (e.g. in the case of the Munk Debate recently) the direct pushback in those posts themselves!

            3) Similarly, you seem to posit that there is a significant academic consensus on EA research being poor and/or EAs being naive and arrogant and/or EA being harmful and wrong. You may have personal experience to this effect, but this is far from being evidence of a consensus. Again, I feel this might be over-indexed on AI work? For example, as far as I know GPI is fairly well regarded at Oxford and not viewed as a “low-quality” institution by the rest of the university.

            4) Your last paragraph really rubs me the wrong way, especially given your previous tweet-storm around Open Phil’s AI work (which, to your credit, you did apologise for). I also think how you reacted to the recent Nature editorial is, rather than a bad look for EA, a really bad look *for you*. I think this goes back to point 1 – you switched into a “Nature has spoken, we must respect and listen” mode, as opposed to “this piece has little to no evidence to support it and is thus a poor piece, regardless of it being a Nature editorial” which was my impression of the reaction of AI-Safety EAs.

            So, with respect, I think on this issue you fall well-below the high standard you’ve set for yourself with the rest of the ‘Reflective Altruism’ program. I’m happy to take this discussion to another higher-bandwidth medium (email/zoom/or in-person if you’d like), as I still think there is a lot of common ground to be had here.
