Introduction
Gender inequalities are well documented in academia and typically favour male researchers (
Huang et al. 2020;
Ross et al. 2022). For example, female researchers frequently fare worse than male researchers regarding work recognition, compensation, grant funding outcomes, teaching evaluations, hiring, and tenure promotion. Moreover, despite increasing awareness of gender biases in academia and initiatives to address them, the necessary change appears to be slow and substantial inequality remains (
Llorens et al. 2021).
In this context, undercitation of female scholars is a particularly detrimental type of gender difference that has profound negative effects on academic visibility and career advancement (
Andersen et al. 2020;
Dworkin et al. 2020;
Llorens et al. 2021). Several recent studies have demonstrated that reference lists of scientific publications tend to include more papers with men as first and last authors (i.e., prominent authorship positions, frequently indicating the lead author and supervisor of the study) than one would expect based on the gender distribution of authors in specific fields. This pattern has consistently been shown across several fields, including neuroscience (
Dworkin et al. 2020;
Fulvio et al. 2021), physics (
Teich et al. 2022), astronomy (
Caplar et al. 2017), international relations (
Maliniak et al. 2013), and political science (
Sa et al. 2020). These studies suggest that male overcitation is a pervasive and universal pattern in science, with widespread negative implications for female scholars.
However, previous studies on citation inequality in science have mainly investigated fields with author distributions strongly skewed toward male researchers. In these fields, sociological theory predicts that a combination of implicit or explicit favouritism towards similar group members (e.g., individuals of the same gender, particularly by individuals in influential positions;
McPherson et al. 2001) and systemic factors (
Dworkin et al. 2020;
Llorens et al. 2021;
Ross et al. 2022) contributes to overcitation of male scientists.
Therefore, the present study investigated citation inequality in a field that has traditionally been more accessible to female scientists, namely speech–language pathology (SLP). For example, males make up only 2.5% of speech and language pathologists in the United Kingdom (
Litosseliti and Leadbeater 2013), and a recent survey in 13 different countries revealed that on average, only 5% of registered speech–language therapists are male (range: 2%–13%;
Speechguys 2017). Similarly, the percentage of male academics registered with different professional or scientific SLP organizations in 2021 ranged from 4.5% to 22% (these numbers were acquired in personal communications with representatives of Speech Pathology Australia, American Speech–Language-Hearing Association, The Academy of Aphasia, and The German Association for Aphasia Research and Treatment). This field-specific pattern of gender distribution, and the resulting gender author distribution in SLP publications (see below), allows investigating whether (male) citation imbalance is a universal phenomenon, even in fields that are less “dominated” by males, or whether citation patterns are influenced by the author gender distribution in specific fields.
Materials and methods
Data collection
Data analyses were based on the following journals included in the Web of Science category “Audiology and Speech and Language Therapy”: American Journal of Speech and Language Pathology (available from 1998 to 2020), Aphasiology (1988–2020), Augmentative and Alternative Communication (2005–2020), Communication Sciences and Disorders (2015–2020), Folia Phoniatrica et Logopaedica (1994–2020), International Journal of Language & Communication Disorders (1998–2020), International Journal of Speech–Language Pathology (2008–2020), Journal of Communication Disorders (1967–2020), Journal of Fluency Disorders (1977–2020), Journal of Speech–Language and Hearing Research (1997–2020), Journal of Voice (1990–2020), Language and Speech (1958–2020), Language Speech and Hearing Services in Schools (1995–2020), and Seminars in Speech and Language (2012–2020). Journal selection was also guided by a short email survey asking 10 academics (5 male, 5 female) working in speech and language therapy for journals representing their field. Journals from audiology, journals publishing mainly experimental or neuroscientific research (e.g., Brain and Language), and general medical journals were not considered.
We downloaded all available articles published between 1958 and 2021 and included original articles, review articles, and proceedings papers that were labeled with a digital object identifier (DOI). The data downloaded for each paper included author names, reference lists, publication dates, and DOIs. We obtained information on referencing behaviour by matching DOIs contained within a reference list to DOIs of papers included in the dataset.
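The DOI matching step can be sketched as follows. This is a minimal illustration, not the study's code: the record layout (`doi`, `references`), the helper names, and the normalisation of DOI prefixes are assumptions made for the example.

```python
def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes (DOIs are case-insensitive)."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def build_citation_pairs(papers):
    """Return (citing_doi, cited_doi) pairs where both papers are in the dataset."""
    known = {normalize_doi(p["doi"]) for p in papers}
    pairs = []
    for p in papers:
        citing = normalize_doi(p["doi"])
        for ref in p.get("references", []):
            cited = normalize_doi(ref)
            # Keep only references that resolve to another paper in the dataset.
            if cited in known and cited != citing:
                pairs.append((citing, cited))
    return pairs


# Hypothetical toy records: one in-dataset reference and one external reference.
papers = [
    {"doi": "10.1000/a", "references": ["https://doi.org/10.1000/B", "10.9999/external"]},
    {"doi": "10.1000/b", "references": []},
]
pairs = build_citation_pairs(papers)
```

References to papers outside the selected journals are dropped, so this kind of matching only captures citations within the dataset.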
Data analyses
Determination and interpretation of authors’ gender
Even though the terms sex and gender are related, the former refers to biological characteristics of an individual (e.g., genetics influencing the development of internal or external reproductive organs or hormone expression), while gender comprises the social, environmental, cultural, and behavioral factors that influence a person’s self-identity (
Clayton and Tannenbaum 2016). In the present study, the term gender is operationally defined based on previous publications (
Dworkin et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022). Specifically, the term “gender” does not directly refer to the actual sex or gender of the author. Rather, “gender” in the present analysis is a function of the gender assumed to be associated with each author via the probability of gender assigned at birth for each name; the actual sex or gender of the authors is not and cannot be identified with this method. Moreover, we emphasize that this approach leads to a gender binary, which does not reflect more nuanced definitions of gender as a social construct. In addition, to allow for direct comparison with previous publications (
Dworkin et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022), we did not consider additional factors like race or ethnicity in our analyses that might be relevant for this specific field (e.g., only ∼8% of registered speech–language therapists in the United States are from a minority background;
American Speech-Language-Hearing Association 2020) or their intersection with gender.
Authors’ first names were extracted from the published paper. Whenever this was not possible, we searched the names using CrossRef API (
www.crossref.org). If first names were not available on CrossRef, we searched for them on the journal webpage. After first names were determined, gender was assigned to first names using the “gender” package in R with the Social Security Administration baby name data set. For names that were not included in the R package, gender was assigned using
https://genderize.io/, a freely available service that contains roughly 250,000 names. Please note that in the original study (
Dworkin et al. 2020) a different, paid service was used (
https://gender-api.com). As suggested in the literature, we assigned “man” (“woman”) to each author if their name had a probability ≥ 0.70 of belonging to someone labeled as “man” (“woman”;
Dworkin et al. 2020;
Wang et al. 2021;
Teich et al. 2022). Overall, gender could be assigned to both the first and last authors in 89.51% of the papers in our dataset. Of the 10.49% of papers with missing data, the first or last author’s name either had uncertain gender (6.67%) or was not available (3.82%).
Computation of gender citation balance indices
For detailed information, code of the analysis, and explanatory formulas, please refer to the original study (
Dworkin et al. 2020). By assigning gender to authors’ first names of the manuscripts and citations, we created four main categories for our analyses: man as first author, man as last author (MM); man first, woman last (MW); woman first, man last (WM); woman first, woman last (WW). Only first and last author names were considered since in many fields of science, the first and last author positions are the most prominent ones in scientific publications and therefore most important regarding career advancement. Please note that SLP typically follows this pattern at present, but the exact timing of adopting this practice is unclear and likely varied by sub-field or journal.
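Combining the first and last author labels into the four categories can be sketched as below. The function name and the “M”/“W” label encoding are assumptions made for illustration.

```python
def team_category(first_label, last_label):
    """Return "MM", "MW", "WM", or "WW"; None if either label is missing."""
    valid = ("M", "W")
    if first_label not in valid or last_label not in valid:
        return None  # paper excluded: gender not assigned to both positions
    return first_label + last_label  # first author's label comes first


# Hypothetical (first, last) label pairs.
examples = [("M", "M"), ("M", "W"), ("W", "M"), ("W", "W"), ("W", None)]
categories = [team_category(f, l) for f, l in examples]
```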
We then computed two gender citation balance indices for each of the four gender categories (i.e., WW, MM, WM, and MW): (1) gender citation balance indices relative to all literature and (2) gender citation balance indices relative to the conditional citation gap (i.e., accounting for other characteristics that may influence citations). Both indices are computed using the following formula:
Positive values correspond to more frequent citations than expected and negative values correspond to less frequent citations than expected.
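Assuming the definition given in the original study (Dworkin et al. 2020), the index for a gender category g ∈ {MM, MW, WM, WW} can be written as the relative difference between the observed and expected citation proportions. The symbols below are notation introduced here for illustration.

```latex
\[
  \text{index}_g \;=\;
  \frac{p^{\text{obs}}_g - p^{\text{exp}}_g}{p^{\text{exp}}_g},
  \qquad g \in \{\mathrm{MM}, \mathrm{MW}, \mathrm{WM}, \mathrm{WW}\}
\]
```

Here the observed proportion is the share of citations given to papers in category g, and the expected proportion is the share predicted under the respective null model. Under this reading, an index of 0.085 corresponds to 8.5% more citations than expected.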
The difference between the two indices is how the expected proportion is computed. Specifically, for the “gender citation balance indices relative to all literature”, expected proportions are computed as if citations were random draws from the published literature. This analysis treats the probability to cite a paper of a given gender category (WW, MM, WM, and MW) as a function of how many papers are published by authors of that category, i.e., the more papers published by WW authors, the more these papers are expected to be cited.
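A toy calculation of the index relative to all literature is sketched below. It assumes the index is the relative difference between observed and expected proportions, as in Dworkin et al. (2020). The paper counts are invented for illustration, and every category is assumed to have a non-zero expected share.

```python
from collections import Counter

CATEGORIES = ("MM", "MW", "WM", "WW")


def proportions(labels):
    """Share of each category among a list of category labels."""
    counts = Counter(labels)
    total = sum(counts[c] for c in CATEGORIES)
    return {c: counts[c] / total for c in CATEGORIES}


def balance_index(observed, expected):
    """Relative over/undercitation: (observed - expected) / expected."""
    return {c: (observed[c] - expected[c]) / expected[c] for c in CATEGORIES}


# Hypothetical pool of citable papers (random-draw null model) ...
pool = ["MM"] * 20 + ["MW"] * 10 + ["WM"] * 30 + ["WW"] * 40
# ... and hypothetical categories of the papers actually cited.
cited = ["MM"] * 1 + ["MW"] * 1 + ["WM"] * 3 + ["WW"] * 5

expected = proportions(pool)
observed = proportions(cited)
index = balance_index(observed, expected)
```

In this toy case WW papers make up 40% of the pool but 50% of the citations, so their index is positive, while MM papers make up 20% of the pool but only 10% of the citations, so their index is negative.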
The expected proportions in the “gender citation balance indices relative to the conditional citation gap” were calculated to account for various characteristics that may be associated with citation rates. For this analysis, we fit a generalized additive model on the multinomial outcome (MM, MW, WW, and WM) using R package “
mgcv”. Following the methodology reported in the original study (
Dworkin et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022), the model’s features were (1) year of publication by journal, (2) the combined number of publications by the first and last authors (seniority), (3) the number of authors on the paper, (4) the journal in which it was published and (5) the type of publication (e.g., original article or review). This model corresponds to the model used by the original study (
Dworkin et al. 2020) with the exception that publication date (i.e., the first year of publication of individual journals) was fitted separately for each journal. This was done to account for the possibility that less established journals might show a different gender distribution than well-known journals. When this model is then applied to each paper, it yields a set of probabilities that the paper belongs to the MM, WM, MW, and WW categories and citation rates can be predicted as if the citation was independent of the authors’ gender. The expected proportions computed from these predictions are corrected for factors other than gender influencing the citation probability. Replicating the code provided by the original study (
Dworkin et al. 2020) allows for direct comparison of our results to those of other studies in different fields of science that used the same approach (
Dworkin et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022).
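How model-based probabilities turn into expected proportions can be sketched as follows. The generalized additive model itself is not fitted here. The per-paper probabilities are invented stand-ins for its predictions, and the function and variable names are hypothetical.

```python
CATEGORIES = ("MM", "MW", "WM", "WW")


def expected_from_model(cited_ids, predicted):
    """Average the per-paper category probabilities over all cited papers.

    predicted maps a paper id to the model's probabilities that a paper
    with those characteristics (journal, year, seniority, team size,
    article type) belongs to each gender category.
    """
    n = len(cited_ids)
    return {c: sum(predicted[i][c] for i in cited_ids) / n for c in CATEGORIES}


# Hypothetical model output for two cited papers.
predicted = {
    "p1": {"MM": 0.1, "MW": 0.1, "WM": 0.3, "WW": 0.5},
    "p2": {"MM": 0.3, "MW": 0.1, "WM": 0.3, "WW": 0.3},
}
cited_ids = ["p1", "p2"]
conditional_expected = expected_from_model(cited_ids, predicted)
```

The resulting proportions replace the random-draw expectation in the index, so that any remaining gap is not attributable to the modelled paper characteristics.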
The last analysis used the gender citation index based on the conditional cite gaps, but also based on the gender of the citing authors; i.e., we investigated whether the gender of the citing authors influences the citation pattern. This analysis included the aggregation of all publications that had a female first or last author, or both (i.e., creating an additional, combined W or W group), since the groups WW, WM, and MW showed a similar citation pattern. Confidence intervals were created by bootstrapping citing papers to maintain the dependence structures within citing articles.
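Resampling at the level of citing papers can be sketched as below. This is a percentile bootstrap under stated assumptions: the record layout, the number of resamples, the confidence level, and the helper names are illustrative choices rather than values taken from the study.

```python
import random


def gap(papers, category):
    """Index for one category, pooled over a set of citing papers."""
    obs = sum(p["observed"][category] for p in papers)
    exp = sum(p["expected"][category] for p in papers)
    return (obs - exp) / exp


def bootstrap_ci(papers, category, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval obtained by resampling whole citing papers."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Each draw keeps a citing paper's reference list together, which
        # preserves the dependence between citations from the same paper.
        sample = rng.choices(papers, k=len(papers))
        stats.append(gap(sample, category))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical citing papers: observed and expected numbers of WW citations.
citing = [
    {"observed": {"WW": 4}, "expected": {"WW": 3.0}},
    {"observed": {"WW": 2}, "expected": {"WW": 2.5}},
    {"observed": {"WW": 6}, "expected": {"WW": 4.0}},
    {"observed": {"WW": 1}, "expected": {"WW": 1.5}},
]
ci_low, ci_high = bootstrap_ci(citing, "WW")
```

Resampling individual citations instead would treat references from the same paper as independent and would tend to produce intervals that are too narrow.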
Discussion and conclusions
Our results show that the author gender distribution in SLP publications is currently more balanced compared to previously investigated fields of science that are largely dominated by male authors (
Dworkin et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022). Moreover, for the last decade, our results demonstrate a citation pattern in SLP that (a) overall tends to favor female authors, (b) persists after controlling for potential confounding factors, but to a lesser extent, and (c) is particularly strong for publications involving female first and last author teams. While it is common that male scientists receive more citations than expected in many fields of science (
Llorens et al. 2021), our results emphasize the contribution of author gender distributions in specific sub-fields of science.
Previous studies interrogating citation inequity have highlighted a pervasive pattern of undercitation of female scholars (
Maliniak et al. 2013;
Caplar et al. 2017;
Dworkin et al. 2020;
Sa et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022). However, these studies have mainly investigated fields where author distributions are strongly skewed towards men (e.g., neuroscience, physics, or astronomy). Even in recent years, the majority of publications in these fields were contributed by male authors. The most dramatic example comes from contemporary physics, where only 3.6% of articles were contributed by WW author teams and only 33% involved at least one female first or senior author (
Teich et al. 2022). Indeed, similar author team distributions have been reported in all previously investigated fields to date. For example, in neuroscience, 78–90% of publications involved male authors, either as first or senior author, or both (
Dworkin et al. 2020;
Fulvio et al. 2021). In contrast, the present study investigated for the first time citation patterns in a field where the majority of recent publications have women in prominent authorship positions (i.e., ∼75% of publications with a woman as first and/or senior author over the last two decades, including ∼30% WW author teams, compared to ∼65% of publications with a woman as first and/or senior author over the last two decades; see
Fig. 2). For this specific field, we demonstrate that author teams involving at least one female author cite publications authored by female first and last author teams (WW) 8.5% more than expected. This suggests that the degree of undercitation of women (and potential negative downstream effects on visibility and career advancement) is at least partly rooted in authorship distributions of specific fields of science, rather than being a universal pattern.
Nonetheless, the present research only considered one specific field of science that differs substantially from previously investigated fields (e.g., neuroscience, physics) and we did not consider other potentially relevant factors that may contribute to field specific citation practices like field size, subfield structure, or journal impact factors. However, by keeping the analytical approach consistent with previous publications (
Dworkin et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022), we aimed to facilitate direct “historical” comparisons between these specific subfields. To establish a causal link between author gender distributions and gendered citation practices, future research is needed that directly compares fields with varying author gender distributions and also considers potential field-specific idiosyncrasies in citation practices.
Three additional limitations need to be acknowledged: As in previous studies (
Dworkin et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022), first names of authors were identified using a probabilistic and binary gender algorithm, which may not reflect the actual self-identification of authors. However, because we used the same algorithm as in previous publications, the overall results of this study are still comparable to previous work. Moreover, in line with previous publications, we only considered first and last author gender in our study. While SLP currently follows standard authorship conventions (lead and last authors as most prominent positions), senior authorship in earlier publications may have followed a different convention (e.g., indicated by the second author). This potential confound may have affected the historical trends in authorship analysis, especially the earlier years that are reported.
Finally, actual gender self-identification may be unknown to the citing authors and obscured by citing authors’ assumptions of gender identity. It also needs to be noted that relative to the large number of publications/citations that were included in some previous studies (e.g., physics, neuroscience), we investigated a much smaller field of science (SLP), which produces relatively fewer publications/citations. This also prevented investigation of temporal trends in authorship across the past decade, which would have allowed more direct investigation of whether changes in authorship distributions over time result in changes in gendered citation inequality. However, no clear pattern emerged from the temporal analysis (not reported) and wide error bars suggested that no inference can be made from these analyses.
Nonetheless, the overall pattern of the results in a field that is largely dominated by female scientists is consistent with those reported previously in fields dominated by men (with widely varying numbers of analyzed publications,
N = 2,069–1,067,276;
Dworkin et al. 2020;
Fulvio et al. 2021;
Wang et al. 2021;
Teich et al. 2022), even though the direction of the effects (favouring female vs. male scientists, respectively) was different. This suggests that citation differences depend at least to some degree on authorship distribution in specific fields, which also has implications for potential interventions aimed at reducing gender-based inequality in science.
Specifically, while the exact mechanisms underlying gender-based citation inequity are currently unknown and the present study was not designed to pinpoint the many potential individual and systemic factors that mediate the observed citations below expectations of female or male authors (for discussions see
Dworkin et al. 2020;
Llorens et al. 2021), the present study highlights the contribution of a common human factor to citation bias in science (i.e., “homophily”;
McPherson et al. 2001). This concept describes the tendency of individuals to associate, bond, or (implicitly or explicitly) favour other individuals that are similar with regard to age, race, and social status, but also gender. Within this framework, it would be expected that scientific fields dominated (either with regard to numbers or degree of influence on a given field) by men or women alike show a bias towards citing work (or “appreciation”;
Ross et al. 2022) of their own “gender in-group”. In the broader context of the current literature on disadvantages female scholars are faced with (
Llorens et al. 2021), this theory provides a framework to further investigate currently reported patterns of citation differences and to establish causal links between general or field-specific factors and gendered citation disparity. Such analyses will also provide context for studying cumulative bias against individuals perceived as belonging to multiple marginalized groups (e.g., Black women). However, the present analyses focused specifically on gender-based citation inequality and on comparison with previous publications, using methods comparable to those previously applied to fields with author gender distributions skewed towards male scientists.
Indeed, while the current data suggest that male citation imbalance may be reduced in fields with more balanced authorship distributions, other types of disparities and biases may certainly have a negative impact on career advancement of women. For example, a recent review that focused on implicit gender biases in communication science and disorders provided an overview of why more men advance in academic careers than expected in this academic subfield (
Rogus-Pulia et al. 2018). It was suggested that this imbalance is explained by several implicit gender stereotypes (i.e., favouring men in hiring and evaluation processes, backlashes for agentic behaviour in women, and bias in funding), which are also likely applicable to SLP. However, implicit stereotypes are not at odds with our findings, because citations are a necessary but not sufficient condition for academic career advancement. Furthermore, the review emphasizes the importance of the intersection of gender and race, showing specific stereotypes against Black and White women (
Rogus-Pulia et al. 2018). The intersectionality of biases was also emphasized in a recent work discussing the effect of inequality in peer review in communication science and disorders reducing the likelihood of publication, and in turn, citation rates in marginalized groups (
Girolamo et al. 2022). This is compounded by the fact that marginalized groups are more likely to investigate topics dealing with diversity, which are also less likely to be published (
Girolamo et al. 2022). In addition, a recent citation analysis in neuroscience used a deep learning algorithm to generate categories based on race/ethnicity (e.g., White, Hispanic, Asian, and Black) using first name probabilities and showed that White individuals were cited more than expected. Moreover, they also demonstrated an intersection of gender and race/ethnicity regarding citation patterns, with gender as primary explanatory factor (i.e., White men were most overcited and men of color less overcited, but still cited more than expected;
Bertolero et al. 2020). Hence, citation analyses based on naming probabilities are suitable to detect multiple inequities and future studies are needed to further analyse the complex and multifactorial causes underlying disadvantages of marginalized groups in science.
In sum, our results highlight the contribution of field-specific author gender distribution to citation inequity. Implementing effective measures to increase the number and influence of under-represented individuals in specific fields of science (e.g., by connecting them to influential scholars;
Verhoeven et al. 2020) may contribute to reducing downstream disadvantages regarding visibility and career advancement.