Peer review and the trust crisis
Trust is a complex psychological and social process (
Lewis and Weigert 1985;
Evans and Krueger 2009). Though it is difficult to define (
Rousseau et al. 1998), one of the more common dictionary definitions of trust is a firm belief in the reliability, truth, ability, or strength of someone or something (see
McKnight and Chervany (2001) for a typology of definitions). Trust has tremendous social benefits. High-trust societies tend to be more tolerant, have higher levels of social cohesion, and have higher subjective levels of well-being (
Welch et al. 2005;
Hudson 2006). Trust in other people (interpersonal trust) and institutions (institutional trust—including trust in experts and authorities) are both important for public discourse and democracy (
Warren 2018). However, trust is fragile and asymmetrical, as it is built slowly but lost quickly (
Lewicki and Brinsfield 2017;
Cvitanovic et al. 2021). Moreover, trust lost in a “trust crisis” is rarely rebuilt by time alone (
Millstone and Van Zwanenberg 2000). Rebuilding trust takes intentional and difficult work, usually involving listening to critics, admitting problems, and having a willingness to enact real changes to demonstrate that the problems have been reduced or eliminated (
Alexandre et al. 2013;
Lewicki and Brinsfield 2017;
Altenmüller et al. 2021).
Over the last few decades, trust and mistrust in science have received much attention (
Wilkie 1996;
Hendriks et al. 2016;
Funk et al. 2019). We use the word “science” in the broadest sense to capture scholarly activities across all domains and not just the natural, physical, health, or social sciences. For a wide range of issues such as climate change (
Hamilton et al. 2015;
Lacey et al. 2018), vaccines (
Black and Rappuoli 2010;
Hamilton et al. 2015), or public health measures during pandemics (
Kreps and Kriner 2020;
Rutjens et al. 2021), trust in science is necessary to persuade the public to support initiatives intended to benefit people and the planet. There is evidence that trust in science is waning, fueled by extreme anti-science views that are shared and amplified by some politicians and celebrities (
Tollefson 2020), and exacerbated by social (
Huber et al. 2019) and conventional (
Ophir and Jamieson 2021) media. The effects of media misinformation are so widespread that new terms have been invented to describe the phenomenon, such as alternative facts, truthiness, and post-truth (
Sismondo 2017;
Vernon 2017;
Wight 2018).
One aspect of science that has been implicated in the erosion of public trust is the peer review process used to assess the quality of articles submitted for publication in refereed journals (
Ahmed 2021;
Kharasch et al. 2021). In a 1999 editorial, the editor-in-chief of BMJ famously described peer review as “slow [authors' note—that is an especially problematic issue for early career researchers when it comes to grant applications or meeting the requirements for a graduate degree], expensive, highly subjective, prone to bias, easily abused, and poor at detecting gross defects.” Recent high-profile cases that involved fraud and associated retractions (
Clark et al. 2016;
Byrne 2019;
Bucci et al. 2020;
Ledford and Van Noorden 2020;
Viglione 2020;
Pennisi 2021), the presence of paper mills (
Else and Van Noorden 2021;
Sabel et al. 2023;
United2Act 2024), spoof articles in predatory journals (
Bohannon 2013;
Grudniewicz et al. 2019), editorial or procedural incompetence and bias (
Weller 2001;
Wang et al. 2016;
Horbach and Halffman 2019), and the replication crisis (
Open Science Collaboration 2015;
Baker 2016a;
Guttinger and Love 2019) have negatively impacted trust in a process that is foundational to science (
Rowland 2002). It is important to note that the erosion of trust is mainly among the general public; researchers continue to show a strong level of trust in peer review as a process to improve the quality of science and scientific publications (
Kratz and Strasser 2015;
Elsevier and Sense about Science 2019). Yet despite researcher confidence in the peer review process, its effectiveness is difficult to assess (
Jefferson et al. 2002;
Bruce et al. 2016;
Rennie 2016;
Helmer et al. 2017;
Lee and Moher 2017;
Squazzoni et al. 2020;
Squazzoni et al. 2021;
Garcia-Costa et al. 2022) and the sheer number of articles has placed the contemporary peer review system under immense strain (
Hanson et al. 2023). This makes it challenging to know what changes would improve the peer review system.
Peer review dates back hundreds of years (
Spier 2002) and has maintained essentially the same form since the 19th century, changing little during most of that time and never requiring mandatory training in peer review or editing. Yet peer review, and the publishing landscape, have seen more change in the last two decades than in the centuries before (
Ware 2005;
Tennant 2018), including the replacement of print with online publishing (eliminating physical constraints on space in journals), new peer review models (aimed at reducing bias), preprints (making non-peer-reviewed science available fast), post-publication peer review (allowing for peer and public discourse), and open access journals (removing financial constraints on readers and placing that burden on authors). While well-intentioned, these initiatives are missing an overall framework to select the most effective and fair peer review models and to implement them. We propose such a framework in the next section.
Peer review reality checks
We present eight pre-publication peer review reality checks (see
Fig. 1) to openly acknowledge the shortcomings of peer review and suggest ways to minimize their effects on trust in science, both amongst practitioners and the general public.
1. Peer reviewers are biased. Humans are biased; therefore, their actions are biased, and any artificial systems trained on human judgements will be as well. This affects the two main functions of peer review, which are to select which papers should be published and to improve them (
Doctor et al. 2001;
Smith 2010). This involves an initial filtering of submissions by journal editors and, for those manuscripts deemed worth pursuing, an external third-party quality assessment by experts. There are three main sources of bias that can occur during this process (
Lee et al. 2013). First, reviews can be biased due to the characteristics and biases of the selected reviewers (
Fox et al. 2017b), including their gender, discipline, and geographic or cultural background (
Lee et al. 2013). Second, reviewers can tend to favour papers that support their preferred hypotheses (confirmation bias) or that report positive outcomes (publication bias) (
Browman 1999;
Lee et al. 2013). To mitigate bias in reviewer selection, reviewers should come from diverse backgrounds; a greater diversity in perspectives should also reduce confirmation bias. Editors are tasked with being aware of publication bias in how they select papers and in assessing reviews, and should be trained to encourage evaluation of sound science over positive results (
Resnik and Elmore 2016).
The third form of bias is the trickiest to address. It occurs because the perceptions and decisions of editors and reviewers can be swayed by their own preconceptions about an author’s prestige/status (
Huber et al. 2022), institutional affiliation, race (
Strauss et al. 2023), nationality, language, or gender (
Lee et al. 2013). Various structural and procedural initiatives have emerged in an attempt to reduce these biases. Under double (reviewer and author) anonymity, journals hide the author’s identity to reduce harm from the unconscious biases of reviewers, which either reduces (
McNutt et al. 1990;
Blank 1991;
Budden et al. 2008;
Tomkins et al. 2017;
Sun et al. 2022;
Fox et al. 2023) or does not affect (
Van Rooyen et al. 1999;
Webb et al. 2008;
Carlsson et al. 2012;
Cassia-Silva et al. 2023) bias compared to having only single (reviewer) anonymity. None of these studies evaluate the level of experience of editors and reviewers, particularly their level of training in dealing with unconscious and conscious bias. Training is important to help overcome biases and reduce harm but should be implemented in conjunction with procedural changes, as training alone does not always change perspectives (
Schroter et al. 2004;
Bruce et al. 2016). Any measure that reduces harm also creates opportunities to cheat; technology (
Bauersfeld et al. 2023), preprints (
Rastogi et al. 2022;
Sun et al. 2024), and even the increased use of code repositories (e.g., GitHub) and study registration or pre-registered reports (which are both otherwise positive developments) now make it easier to circumvent anonymity and discover an author’s identity. Open identities (
Ross-Hellauer 2017), where author and reviewer identities are known but reviews are closed to the public, are meant to improve review quality and accountability and lead to reviews that are more constructive and thoughtfully worded, although evidence for those outcomes remains sparse. Open reports (
Ross-Hellauer 2017) take it one step further, where the reviews and (usually) names of reviewers are publicly released with the published article. This transparency is meant to promote greater accountability of reviewers and authors to the wider community, as conflicts of interest are more obvious; readers can judge whether reviewer criticisms and author responses were appropriate, reviewers should be more conscientious, and quality reviews are incentivized because they can be used for career advancement (
Walsh et al. 2000;
Polka et al. 2018;
Bravo et al. 2019). Open reports tend to have higher quality reviews than doubly anonymous ones (
Bruce et al. 2016;
Haffar et al. 2019), but this is not yet widely adopted (
Tennant 2018) and more research is needed. Compared to single (reviewer) anonymity, harm reduction could possibly be achieved by implementing author anonymity for reviewer and editorial decisions, by implementing signed reviews (
Parmanne et al. 2023), and by making signed reviews public. However, open identities or reports can introduce other forms of harm. For example, if early career researchers decline to review manuscripts authored by established researchers because they are afraid of reprisal (
Rodriguez-Bravo et al. 2017), this may reduce reviews with alternative viewpoints—and those who do reviews are already a subset of the available viewpoints (
Kovanis et al. 2016). The evidence-base for how open review affects bias and quality is still slim (
Ross-Hellauer et al. 2023) and most studies focus on attitudes instead of outcomes (
Ross-Hellauer and Horbach 2024), but at the moment the evidence shows either no change or positive effects for moving away from systems reliant on the known harms of single (reviewer) anonymity; further experimental studies are needed to fully assess this. Importantly, since it is the people doing the reviewing who introduce bias, no structural or procedural solution will completely eliminate it. While some forms of bias may be reduced by any given peer review model, others may be introduced. Thus, it may be premature to fully embrace any of the aforementioned approaches to harm reduction except in experimental contexts (i.e., to learn what works and what doesn’t). The most prudent, immediate opportunity for harm reduction is through training referees and editors in mitigating bias. In addition, journals could consider offering authors the option of choosing which peer review model is applied in the assessment of their work.
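The outcome-focused experiments called for above need not be elaborate. As a minimal sketch (in Python, with entirely fabricated counts from a hypothetical journal trial, not data from any of the studies cited above), a two-proportion z-test could compare acceptance rates for comparable manuscripts handled under single versus double anonymity:

```python
import math

def two_proportion_z(accept_a: int, total_a: int, accept_b: int, total_b: int):
    """Two-sided two-proportion z-test on acceptance counts."""
    p_a, p_b = accept_a / total_a, accept_b / total_b
    pooled = (accept_a + accept_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided p value

# Fabricated counts: 400 manuscripts per arm of a hypothetical experiment.
z, p = two_proportion_z(120, 400,   # accepted under single (reviewer) anonymity
                        150, 400)   # accepted under double anonymity
print(f"z = {z:.2f}, p = {p:.3f}")  # z = -2.24, p = 0.025 on these made-up data
```

A real study would also need randomized assignment of manuscripts to review models and controls for field and author characteristics, which is precisely why the publisher cooperation discussed in the final section matters.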
2. Peer reviewers and editors limit scientific and ideological innovations. Academics face disincentives to pursue and publish research that goes against the dominant ideologies/theories in a field (
Kempner et al. 2011). Peer review contributes to this, as reviewers and editors have the ability (intentionally or not) to enforce conformity with engrained perspectives and ideas (
Hojat et al. 2003). The supposedly free field of academic inquiry is in fact highly patterned and uneven, with islands of intense attention to fashionable topics and perspectives within a sea of broader neglect (
Frickel et al. 2010;
Stephan 2012). Similarly, there is a growing recognition of the resistance to other sources of knowledge, for example Indigenous Knowledge (
Loseto et al. 2020). Harm reduction here can be implemented if reviewers/editors are trained to evaluate papers based on soundness (regardless of whether they conform), to not require authors to change their content, interpretation, or language without compelling and well-supported justification (i.e., not based only on the reviewer’s or editor’s opinion), and to ensure that authors can disagree with reviewers/editors without fear of rejection.
3. Peer reviewers can be ineffective at detecting flaws. No form of peer review will detect all instances of unintentional error (
Park et al. 2021) or overt misconduct, whether questionable research practices (
Fraser et al. 2018) or outright fraud (
Van Noorden 2014a). This is because peer review relies on trusting that the authors presented results that reflect how experiments were conducted and data were collected (
Tennant and Ross-Hellauer 2020). Practices such as preregistration (registering data collection and analysis methods prior to a study) and publicly sharing materials that support reproducibility (open data and code;
Culina et al. 2020) can reduce harm by facilitating the detection of errors or misconduct earlier in the process (but see
Berberi and Roche 2022), though how many reviewers actually check the data or code is unknown. Preregistration is now mandated for clinical trials (e.g.,
Government of Canada 2018;
National Institutes of Health 2021), but adoption is not widespread (
Nosek et al. 2018;
Alayche et al. 2023). In ecology and evolution, 20% of journals now mandate open data as a condition of publication (
Berberi and Roche 2022), yet few of these outlets require authors to share data with reviewers at submission. Harm is likely to be reduced through open data practices and their integration into the peer review process so that reviewers can better assess the quality of data in a study; however, this will increase the time required per review. Efforts to identify which peer review procedures are best at flagging erroneous and fraudulent research are ongoing (
Horbach and Halffman 2019) and training may need to occur in specific areas (
Zheng et al. 2023). Recent innovations in automation, including software (e.g., statcheck;
Baker 2016b, STM n.d.) and artificial intelligence (
Price and Flach 2017), may be better and faster than humans at detecting flaws in data analysis, although such tools may be best suited to routine tasks such as verifying references (as is already done by some publishers;
Schulz et al. 2022) and detection of image manipulation (
Hosseini and Resnik 2024).
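To make the “routine tasks” point concrete, the sketch below shows the kind of consistency check such tools perform; it is written in Python in the spirit of statcheck rather than as its actual implementation, and the regular expression and tolerance are our own simplifying assumptions. It extracts reported t-tests and recomputes the two-tailed p value from the test statistic and degrees of freedom:

```python
import re
from scipy import stats

# Matches reports such as "t(28) = 2.20, p = .036" (a simplified pattern).
PATTERN = re.compile(r"t\((\d+)\)\s*=\s*(-?\d+\.?\d*)\s*,\s*p\s*=\s*(\.\d+)")

def check_t_tests(text: str, tolerance: float = 0.005):
    """Flag reported t-tests whose recomputed two-tailed p value disagrees."""
    flags = []
    for df, t, p_reported in PATTERN.findall(text):
        p_computed = 2 * stats.t.sf(abs(float(t)), int(df))
        if abs(p_computed - float(p_reported)) > tolerance:
            flags.append((f"t({df}) = {t}", p_reported, round(p_computed, 4)))
    return flags

print(check_t_tests("t(28) = 2.20, p = .036"))  # internally consistent: []
print(check_t_tests("t(30) = 1.10, p = .001"))  # inconsistent: flagged
```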
4. Peer reviewers sometimes engage in unprofessional and unethical conduct. Although rare, particularly in well-run journals (
Smith 2006), unprofessional and unethical conduct unfortunately exists within the peer review process. Editors and reviewers can block (reject) or delay articles by competitors involved in similar research to control the narrative or to be the first to publish, and reviewers can submit unprofessional or dishonest reviews. In one study, nearly half of scientists had been asked by reviewers or editors to alter arguments in a way that contradicted their own scientific judgement (
Shibayama and Baba 2015). More distressingly, many authors reported making such changes to ensure publication (see also
Frey 2003;
Tsang and Frey 2007). Similarly, citation coercion can occur when reviewers insist that their own papers are cited (
McLeod 2021), or a journal requests that more articles from that journal are cited—sometimes under the threat of rejection (
Wilhite and Fong 2012). Authors may behave unethically by submitting positive reviews of their own manuscripts using fictitious reviewers or friends (
Ferguson et al. 2014;
Haug 2015;
Skórzewska-Amberg 2022). Unscrupulous editors and reviewers can steal ideas or plagiarize content (
Rennie 2003;
Smith 2006;
Rennie 2016) though open peer review with timestamped submissions could help reduce this (
Gipp et al. 2017). Harm is reduced by educating editors and reviewers about publication ethics, training editors to notice warning signs of fake reviewers (such as very fast turn-around times and generic email domains), and by empowering institutional research integrity offices to penalize bad actors (e.g.,
COPE 2011,
2017;
Basil et al. 2023).
5. Low quality peer reviewing results in everything being published. Peer review is increasingly about determining where work is published rather than whether it is published (
Peres-Neto 2016). A major contributing factor to this is the emergence of journals and publishers that are willing to publish anything for a fee (
Ward 2016;
Beall 2017;
Strielkowski 2018;
Grudniewicz et al. 2019;
Cortegiani et al. 2020;
Mills and Inouye 2021), including open access mega journals with limited scientific stringency or selectivity (
Wakeling et al. 2017). Preprints and draft manuscripts posted on servers (e.g., arxiv.org;
Van Schalkwyk et al. 2020) represent a high-risk approach to publishing because the results of research are made public whenever the authors decide, without any pre-publication review, and they can be difficult to distinguish from peer-reviewed articles, particularly for the public. Preprints can be an opportunity to refine a manuscript, but only if they are peer reviewed. However, currently only a small percentage of preprints are peer reviewed, mainly because there is no mechanism requiring such efforts nor (typically) any arbiters (i.e., editors) of such efforts (but see
https://peercommunityin.org/ and
https://prereview.org for examples of nascent developments in that sphere). While not everything published in “top tier” journals is of high quality, nor everything published elsewhere of low quality (
Moher et al. 2017b), the probability of encountering high quality papers differs among journal tiers (
Happe 2020;
Stephen 2024). Readers need to be taught to assess the quality of papers independent of where they are published (i.e., to apply Mertonian organized skepticism), as there is a wide range of peer review rigour across journals and across articles within a given journal because, as already noted, all editors and reviewers have their own perspectives, biases, and uncertainties (
Barnett et al. 2021). The harm of poor reviewers/editors is reduced when there is more than one editor involved and they engage in stricter pre-screening, when at least three reviewers are used per paper, and when reviewer recommendations to accept must be unanimous (
Neff and Olden 2006).
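The arithmetic behind the three-reviewer, unanimous-acceptance recommendation is worth spelling out. Assuming, purely for illustration, that each reviewer independently detects a fatal flaw with a fixed probability (the independence assumption and the detection rate are ours, not Neff and Olden’s model), a flawed paper is accepted under unanimity only if every reviewer misses the flaw:

```python
# Probability that a flawed paper slips through when acceptance must be
# unanimous: every one of the n reviewers has to miss the flaw.
def p_flawed_paper_accepted(detect_prob: float, n_reviewers: int) -> float:
    return (1 - detect_prob) ** n_reviewers

for n in (1, 2, 3):
    print(n, p_flawed_paper_accepted(0.5, n))
# With an assumed 50% per-reviewer detection rate: 0.5, 0.25, 0.125.
# Moving from one to three unanimous reviewers cuts the risk fourfold,
# at the cost of more reviewer time per paper.
```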
6. Peer reviewers are not incentivized. The scholarly peer review system relies almost entirely on volunteer labour, with billions of dollars’ worth of time donated yearly (
Aczel et al. 2021). While reviewers are typically employed by research institutions, peer review is expected but is usually not explicitly part of their job description. The ask is not small: most reviewers evaluate several papers per year and typically spend 3–6 h per review, while “mega reviewers” assess over 100 papers per year (
Ware and Mabe 2015;
Rice et al. 2022). When this effort is not recognized, participating (or not) in peer review does not affect career advancement (
Bianchi et al. 2018). Given that most individuals in the research community are already over-committed and struggle with work-life balance, declining (or ignoring) peer review requests due to lack of time is common (e.g.,
Willis 2016;
Sipior 2018;
Stafford 2018) and has been increasing (
Albert et al. 2016;
Fox et al. 2017a). Peer review needs to be incentivized to reverse this trend and to increase the diversity of reviewers (
Laxdal and Haugen 2024). Recognizing the contribution of reviewers will reduce harm and can be accomplished publicly (e.g., Publons or ORCID;
Van Noorden 2014b;
Hanson et al. 2016) by having reviewing count towards funding and promotion (
Clausen and Nielsen 2003), offering payment to referees for their services (
Lortie 2011;
Brainard 2021), or by providing other incentives (e.g., discount or waiver for publishing fees in an open access journal or another product from a publisher) though this may decrease review quality (
Squazzoni et al. 2013). In theory, counting high quality peer reviews towards scholarly performance would make it easier to justify the additional time involved in increasing the number of reviewers per article and in having reviewers assess the quality of data (
Ferris and Brumback 2010).
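The scale of this donated labour is easy to appreciate with a back-of-envelope calculation. Every input below is an assumption chosen only for illustration; Aczel et al. (2021) derive their estimates from actual publication and survey data:

```python
# Rough, assumed inputs -- not the figures from Aczel et al. (2021).
submissions_per_year = 3_000_000   # assumed reviewed submissions worldwide
reviews_per_submission = 2.5       # assumed, counting rejections and revisions
hours_per_review = 4.5             # midpoint of the 3-6 h range cited above
value_per_hour_usd = 50            # assumed value of a researcher's time

total_hours = submissions_per_year * reviews_per_submission * hours_per_review
total_value = total_hours * value_per_hour_usd
print(f"{total_hours / 1e6:.0f} million hours, ${total_value / 1e9:.1f} billion per year")
# -> 34 million hours, $1.7 billion per year under these assumptions
```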
7. Peer reviewers and editors are not formally trained. In most professions that involve passing judgement on individuals (e.g., judges, law enforcement officers, professional sports referees), there is legislation or policy that dictates how these judgements take place. Failure to judge in an appropriate manner means loss of responsibilities or termination. While peer review is a central component of the scientific process, formal training or certification for peer reviewers or editors is not mandated by employers or journals, and the consequences of performing poorly are limited, except in the most extreme cases. Instead, training depends on having a willing mentor, but mentorship quality also varies, resulting in considerable variation in how scientists learn to evaluate manuscripts. Training can increase agreement among reviewers on a given manuscript (
Strayhorn Jr et al. 1993), increase the probability of rejection (
Schroter et al. 2004), and does not appear to affect the time that reviewers take to evaluate manuscripts (
Schroter et al. 2004). There are a growing number of resources and training programs available (
COPE 2017;
Galica et al. 2018;
Foster et al. 2021a,
2021b;
Willis et al. 2022;
Buser et al. 2023;
C4DISC n.d.), but these need to be tested for their effectiveness, and further research on the best types of training is necessary, as not all appear to be successful in leading to higher quality reviews (
Callaham and Schriger 2002;
Schroter et al. 2008;
Bruce et al. 2016, but see
Galipeau et al. (2015) and
Lyons-Warren et al. (2024)). Harm caused by lack of training will be reduced by adopting the core competencies for editors and reviewers that are constantly being developed and updated (e.g.,
Moher et al. 2017a;
COPE 2019;
Proctor et al. 2023).
8. Peer reviewers can be unkind. The necessity to be critical when evaluating scientific work can easily express itself in unconstructive and harsh rather than supportive and collegial comments. This becomes a problem when demeaning comments or ad hominem attacks creep into reviews. For example, in a survey of >1100 scientists, more than half said that they had received at least one unprofessional review in their career (
Silbiger and Stubler 2019), and
Gerwing et al. (2020) reported demeaning comments in 10%–35% of peer reviews in ecology and evolution and 43% in behavioural medicine. Harm will be eliminated at the source when reviewers learn to refrain from making demeaning comments directed at authors and instead use neutral or positive language directed at the science (
Clements 2020;
Parsons and Baglini 2021); signed reviews may help as anonymity is one reason reviewers bully authors (
Comer and Schwartz 2014). Well-trained and engaged editors can prevent this harm from being passed on by redacting or editing reviewer text that is unkind (
Gerwing et al. 2021) and calling out such behaviours; clear journal guidelines for these instances will help (e.g.,
COPE 2021).
Post-publication harm reduction approaches to address the imperfections of the peer review system
In addition to the initiatives described above, we advance the following four suggestions for the development of a harm reduction plan for instances when peer review by itself is insufficient (
Fig. 2).
1. Post-publication commentary can support correction of the scientific record. When a paper is accepted by a journal, the formal peer review process has ended. Yet, with the advent of digital communication platforms, there are opportunities for “ad-hoc” peer review (by those who were not involved in the pre-publication peer review process) to continue after publication (
Townsend 2013). The most common platform is PubPeer (
https://pubpeer.com/, which can be added as a browser extension), where anyone can begin discussions (positive or negative) of any published paper, although it has been criticized for allowing anonymity (
Torny 2018). Beyond anonymity, such platforms can introduce other harms, including personal attacks, targeted harassment, and vendettas against competitors. Some journals have also instituted post-publication review, but it has yet to become widely embraced. Post-publication critiques could be better fostered by journals training editors as curators, following guidelines from the Committee on Publication Ethics (COPE). Unfortunately, the willingness of some journals/publishers to enable such discourse is limited or simply not helpful if the journals lack legitimacy (e.g., predatory publishers). If these forums become the norm, some of the harms of pre-publication peer review—especially missing fundamental flaws—may be corrected post-publication: two-thirds of comments on PubPeer (
Ortega 2021) and half on journal websites (
Wakeling et al. 2020) were about the paper’s soundness. The imperfections in published studies can reveal what is needed to increase their quality, impacting future studies so that they are more likely to be reliable (
Cooke et al. 2017). Some of the solutions already presented, such as open data sharing, will facilitate post-publication review, especially as data accessibility should outlive individual careers and lifetimes. The most obvious form of harm reduction is enhanced training of editors, referees, and authors on how to constructively engage in post-publication review such that those efforts are civilized and follow guidelines established by entities such as COPE.
2. Editorial notices should be used more rapidly and consistently. Editorial notices, in the form of notes, corrections, expressions of concern, or retractions, are mechanisms by which formal concerns about a paper can be made public (
Noonan and Parrish 2008;
Teixeira da Silva and Dobránszki 2017). In all instances where a post-publication critique identifies significant issues, such matters should be addressed by the journal editorial team in a timely manner. Editorial notices can concern the validity of data, methods, or data interpretation, or issues such as manipulated or reused images (
Vaught et al. 2017). Depending on the type of concern, it can be resolved by discussion with the author (i.e., determined to not be an issue) or it can result in action such as a correction, expression of concern, or retraction (
Knoepfler 2015;
Didier and Guaspare-Cartron 2018). Such notices can involve complexities and legal action, which may mean that years pass before notices are issued or cases are resolved (
Ortega 2021). Furthermore, unsound studies can continue to be used and cited incorrectly well after they are corrected (i.e., “zombie papers”;
Binning et al. 2018;
Brainard 2022). The potential harm of zombie papers and uncorrected science would be reduced by better procedures and standards for the use of retractions (
https://publicationethics.org/retraction-guidelines;
Teixeira da Silva and Dobránszki 2017) and by making the process more open and transparent (
Teixeira da Silva and Yamada 2021) including better and more prominent linking of rebuttals to the original articles (
Banobi et al. 2011;
https://www.csescienceeditor.org/article/introducing-the-niso-crec-guidelines/).
3. The importance of robust evidence synthesis should be repeatedly and publicly communicated. Research is conducted by people, which means that imperfections will always be present (
Olson 2008), while variation and uncertainty are also inherent properties of systems, making it difficult to achieve the level of certainty often desired by decision makers (
Malnes 2006). Because there is no single standard for how to do research across all fields, it is unlikely that there will be consensus on what constitutes an “adequate” study. Too much reliance on a single or small number of empirical studies can be risky as they all have some form of weakness or limitation that influences their reliability and broader relevance. Evidence synthesis is useful for collating knowledge on a given topic to identify broad patterns and assess treatment effect sizes and uncertainty. There are many forms of evidence synthesis that themselves vary in rigour and robustness (
Donnelly et al. 2018), with systematic reviews (including meta-analysis) following established practices (e.g., The Cochrane Collaboration, the Campbell Collaboration) being the gold standard (
Gough et al. 2017). These have a critical appraisal phase where individual studies are evaluated relative to predetermined criteria (
Burls 2009), and studies that are biased or flawed are either excluded or down-weighted, with justification given in a transparent manner. In other words, not all evidence is considered equal. This is the key harm reduction achieved by using evidence syntheses: readers are protected against the undue influence of a single paper, which reduces the harm of the imperfect screening of the peer review process. These reviews inform policy and practice in several key sectors (e.g., health care, education, the environment) but could be more widely embraced in other knowledge domains. It is unclear how much the general public (and even some researchers) know about such syntheses, so we recommend that their purpose and utility be better communicated.
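A minimal numerical sketch (in Python, with invented effect sizes and variances) makes the “not all evidence is equal” point concrete: in a fixed-effect meta-analysis, each study is weighted by the inverse of its variance, and the critical appraisal stage can exclude, or further down-weight, a flawed study.

```python
# Invented data: (effect size, variance, quality weight from appraisal).
# A quality weight of 0 means the study was excluded as fatally flawed.
studies = [
    (0.80, 0.04, 1.0),  # sound, precise study -> large influence
    (0.10, 0.09, 1.0),  # sound but noisier study -> smaller influence
    (2.50, 0.25, 0.0),  # flawed outlier -> excluded entirely
]

numerator = sum(q * effect / var for effect, var, q in studies)
denominator = sum(q / var for effect, var, q in studies)
print(round(numerator / denominator, 3))
# 0.585: the flawed outlier cannot drag the pooled estimate upward,
# whereas a naive unweighted mean of all three effects would be 1.13.
```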
4. Foster development of critical thinking skills among all knowledge consumers. It takes a great deal of time, knowledge, and experience to read and understand a given study and to consider its strengths and weaknesses by applying Mertonian organized skepticism to it. Because of this, the peer-reviewed literature is put on a pedestal with cautions given against relying on other sources such as grey literature, preprints, blogs, or webpages. It is important that everyone who consumes scientific information (e.g., policy makers, industry professionals, the general public, and researchers) be trained to understand the complexities of knowledge generation so they can develop critical thinking skills to identify and weigh the reliability of any given piece of information irrespective of where it was sourced (
Bailin 2002;
Brewer and Gross 2003;
Levinson 2006;
Durbin 2009;
Subramanyam 2013), and understand the importance of systematic reviews and meta-analyses. Harm would be reduced by building a culture within and beyond academia where we teach that limitations and biases are pervasive and often unconscious in science and elsewhere (
Weaver 1961) and have them openly acknowledged by authors where possible (
Sumpter et al. 2023), while implementing strategies to mitigate bias (through training and structural changes). Related to this recommendation is the opportunity to share positive examples of where peer review has been particularly useful (an idea raised by one of the anonymous referees for this paper) and has identified errors that were addressed prior to publication. Peer review is often framed as inherently “negative” given our human tendency to focus on criticism; yet, constructive criticism is the whole point of peer review and can improve science and protect the scientific record, which is an entirely positive outcome.
Synthesis and a path forward
Peer review is foundational to the scientific enterprise and carries great weight within the scientific community (
De Ridder 2022), policymaking (
Oliver et al. 2014), legal and judicial proceedings (
Chubin et al. 1995), journalism and media (
Young and Dugas 2012), and activism (
Fähnrich 2018). Acknowledging the fallibility of peer review systems and the people involved with them is an essential starting point for implementing a harm reduction approach and driving improvements.
A harm reduction approach recognizes peer review’s strengths while striving to minimize the problems that lead to scandal and mistrust by encouraging targeted interventions that seek to reduce bad outcomes in individual cases. The eight reality checks about peer review that we have presented are an attempt to do this. For each, we outlined steps to reduce the likelihood of harmful outcomes. In addition to the harm reduction that training will provide, we recommend that several institutional and procedural changes continue to be studied to provide more clarity on the extent to which they achieve the desired outcome and actually reduce harm. Those include continuing to evaluate the types of reviewer/author anonymity structures and perhaps moving away from single (reviewer) anonymity to making reviews open (transparent) and signed; having open data practices integrated into the peer review process; having more than one editor where each performs stricter pre-screening; having at least three reviewers per paper with a unanimous criterion for acceptance; selecting reviewers with diverse backgrounds; acknowledging that peer review is like other forms of mentorship in the scientific community and formally recognizing it for career credit; and making sure that bad actors face consequences. Public trust will be increased by encouraging post-publication commentary; making the process of editorial notices streamlined and transparent; making the process of whistleblowing safe and effective; and encouraging the use of evidence synthesis.
Training will reduce specific harms in peer review (
Fig. 2) when reviewers and editors learn not to require authors to change their language without substantial justification (while ensuring authors can disagree without fear of rejection); to recognize and avoid citation coercion; to evaluate papers based on soundness instead of ideological conformity or positive results; and to refrain from making demeaning comments directed at authors. This sort of training will require the development of core competencies for editors and reviewers (e.g.,
https://webofscienceacademy.clarivate.com/learn;
EASE 2024). There may be a need to incentivize training, or even make it mandatory (e.g., being unable to submit or review articles for a collective of publishers without training), because voluntary attendance and motivation depend on socio-demographic factors. We note that specific training is also needed to reduce harm before peer review happens (
Fig. 2). For example, an estimated 20% of authors read the papers they cited (
Simkin and Roychowdhury 2003) and 25% of citations were inappropriate (
Todd et al. 2010), including the continued usage of studies known to be severely flawed (
Binning et al. 2018;
Berenbaum 2021).
Some initiatives will undoubtedly air more dirty laundry about the peer review process; however, the trust crisis literature tells us that instead of hiding in shame, we should be open about these flaws to (re)build public confidence in this process. Collaborative efforts are needed to test novel interventions targeting improvements in peer review, including collaborative peer review (
Mehmani 2019). We should embrace observational analysis and experimentation (which requires the cooperation of publishers) to test procedural changes (
Ross-Hellauer et al. 2023) and methods of training (
Carter et al. 2020) and engage with stakeholders to work towards improved systems that address the flaws discussed throughout this paper (
Lee and Moher 2017). From there, the scientific community, in collaboration with professional associations and publishers, can demonstrate their good faith by widely and consistently implementing the best procedural and training reforms to enhance transparency and reduce bias. This will be seen as a challenging gambit in some quarters, but in our view ignoring the problem by separating publication from peer review (
Tennant et al. 2017) or seeking to “blow up” or go around peer review are riskier options that will result in more scandal. Efforts need to focus on improving peer review rather than abandoning it (
Haffar et al. 2019). The hard and incremental work inspired by harm reduction approaches offers a way forward.
Today, we continue to face many challenges ranging from climate change to public health crises. The fact that peer review is imperfect should not hobble our ability to make decisions and address urgent issues based on science; instead, the urgency should drive us to do better. We need to acknowledge the uncertainty inherent in every study and the imperfections of peer review—a compelling reason for ensuring decision-makers are well-versed in the realities of peer review and understand the value of evidence syntheses—and deal with the overwhelming need for more training. Developing skills in critiquing research and encouraging conversations about the strengths and weaknesses of the system and of particular studies are critical and will require a cultural shift—for the better (
Bastian 2014;
Fig. 2).