We provide here the technical documentation for the forthcoming paper: Alexander, N. N., Stewart, Q., & Basil, G. (under review). Examining Notions of Racism in STEM: A Quantitative Historical Analysis.

1 RESEARCH QUESTIONS

What is the intellectual and conceptual structure of research on racism in science, technology, engineering, and mathematics (STEM)?
How are notions of racism in the research on STEM distributed across different racialized social systems?

2 METHOD

The primary goals of this study were (1) to frame and understand the intellectual and conceptual structure of research on racism in STEM and (2) to analyze the different notions of racism and their geographical distribution. The study builds on the notes from three special issue collections that analyze different conceptual foundations of racism in STEM: (1) Martin, Valoyes-Chávez, and Valero (2024), who describe a set of notions and features of racialized social systems related to the geopolitical contexts of racism in mathematics education research; (2) Vakil and Ayers (2019), who discuss approaches to balancing the various complexes (the STEM industrial complex, militarism, etc.) that have been used to position STEM as a savior for structural problems; and (3) Nxumalo and Gitari (2021), who explore the possibility of STEM for liberatory purposes. These three issues enter the discussion of science, technology, engineering, and mathematics education from different theoretical entry points.

Based on our identification of various national anti-discrimination laws, we put these theoretical entry points and notions of racism into conversation to explore the differential relationship among a set of notions used across geopolitical contexts. Specifically, we draw on the note by Martin, Valoyes-Chávez, and Valero (2024) to confirm different sociopolitical discourses across a set of racialized social systems.

We used the bibliometrix and quanteda R packages to analyze the citation records and address the study's research questions. The bibliometrix package was used for the summary and performance analysis and for science mapping. The science and network maps then informed the final synthesis and the analysis of textual data in the quanteda package. In this second phase of analysis, priority was placed on the use and conceptualization of the various notions by geographical region or, in the terms of Martin, Valoyes-Chávez, and Valero (2024), by the "racialized social systems" that we analyzed in STEM education.

2.1 DATA

We followed a sequence of steps (scoping, review, and reduction) to acquire the data for the study and used the PRISMA guidelines to scope and review the data.

2.1.1 Scoping

The data for the study come from the Web of Science (WoS) Core Collection. Our initial scoping process included a set of iterative steps to make sense of the global research literature on the various notions of racism in STEM. We prioritized the citation indexes listed below in searches covering 2014 to 2024. Our analysis focused on English-language journal articles in Education, Special Education, and related Education, Scientific Disciplines categories.

Science Citation Index Expanded (SCI-EXPANDED), 2002-present
Social Sciences Citation Index (SSCI), 2002-present
Arts & Humanities Citation Index (A&HCI), 2002-present
Emerging Sources Citation Index (ESCI), 2012-present

Timespan: 2014-01-01 to 2024-12-31

Document Types: Article

2.1.1.1 Notions of racism and STEM

Our exploratory search process was conducted over four months and was primarily used to develop the code and software needed to analyze the final data set. The initial search parameters corresponded to the conceptual components of the study. The search strings sometimes varied during this stage as we made sense of the initial structure of the literature and estimated record counts.

2.1.1.1.1 ALL=(racism AND STEM)

https://www.webofscience.com/wos/woscc/summary/d56f4376-da3f-4860-bef4-fc1f3b43dd98-01412dcc90/relevance/1

Returned 207 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.1.2 ALL=(“white supremacy” AND STEM)

https://www.webofscience.com/wos/woscc/summary/ab11e0dc-99e5-48a4-b6ab-a32a5d8dcfaf-01415ff5dd/relevance/1

Returned 15 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.1.3 ALL=(nationalism AND STEM)

https://www.webofscience.com/wos/woscc/summary/605ee34a-413f-4752-9f65-0872f23867b6-0141601cc5/relevance/1

Returned 32 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.1.4 ALL=(xenophobia AND STEM)

https://www.webofscience.com/wos/woscc/summary/3da3d792-2897-42e8-acb2-be528a92b9a0-0141603068/relevance/1

Returned 6 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.1.5 ALL=(colonialism AND STEM)

https://www.webofscience.com/wos/woscc/summary/81abe434-a7c3-431d-a876-d7b12235fdbb-0141604271/relevance/1

Returned 42 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.1.6 ALL=(antiasian AND STEM)

https://www.webofscience.com/wos/woscc/summary/4f4bf54e-fdd6-4e51-b423-7ab4a544e1e0-014160d61d/relevance/1

Returned 1 result from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.1.7 ALL=(anti-Asian AND STEM)

https://www.webofscience.com/wos/woscc/summary/f0d26c1f-993e-47da-8cc7-3ec8f537e2f3-014160dea4/relevance/1

Returned 4 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.1.8 ALL=(antiblack* AND STEM)

https://www.webofscience.com/wos/woscc/summary/fd9fdc18-3a2d-4800-942e-f4c0138117d4-014160e8b1/relevance/1

Returned 3 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.1.9 ALL=(anti-Black* AND STEM)

https://www.webofscience.com/wos/woscc/summary/1d44f59a-bd5c-4924-94ed-b2f8922ca6e6-0141611000/relevance/1

Returned 17 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI)

2.1.1.2 Document search and inclusion

A final search for bibliometric data was conducted based on findings from the scoping process (the series of initial searches). This search was also modified during the review and revision of the manuscript to address considerations raised by the reviewers. The final inclusion criteria applied in the study are outlined in the table below.

Inclusion and exclusion criteria for the study
Code	Criteria
IC1	Article contains STEM and at least one of the notions in the title (TI) or abstract (AB): racism, "white supremacy," colonialism, xenophobia, nationalism, antiasian, anti-Asian, antiblack*, anti-Black*
IC2	Article published between 2014 and 2024
IC3	Article originally written in English
IC4	Article is a journal article
IC5	Article purpose or core questions center on the topical subjects of analysis
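IC1 combines two conditions on the title and abstract: the record mentions STEM, and it mentions at least one notion. The base-R sketch below shows that logic on hypothetical records. It is an approximation under stated assumptions: the regular expressions stand in for WoS search semantics (stemming, phrase handling, and field rules are not reproduced), and substring matching plays the role of a trailing `*` wildcard.

```r
# Hypothetical title/abstract records (not drawn from the study data)
records <- data.frame(
  TI = c("Racism in STEM doctoral programs",
         "Teacher beliefs about algebra",
         "Anti-Black experiences of engineering students"),
  AB = c("We examine racism in STEM graduate education.",
         "A survey of mathematics teachers.",
         "Interviews with Black students in STEM majors."),
  stringsAsFactors = FALSE
)

# Assumed regex stand-ins for the IC1 notions; substring matching means
# "antiblack" also matches longer words such as "antiblackness"
notions <- c("racism", "white supremacy", "colonialism", "xenophobia",
             "nationalism", "antiasian", "anti-asian", "antiblack", "anti-black")
notion_regex <- paste(notions, collapse = "|")

# Search title and abstract together, ignoring case
ti_ab <- tolower(paste(records$TI, records$AB))
has_stem   <- grepl("\\bstem\\b", ti_ab, perl = TRUE)
has_notion <- grepl(notion_regex, ti_ab, perl = TRUE)

# IC1 holds only when both conditions are met
records$IC1 <- has_stem & has_notion
records$IC1
```

The second record fails because it mentions neither STEM nor a notion; a record that mentions STEM alone would also fail.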

2.1.2 Review

Key Columns of Interest:

AU: Authors of the publication
AB: Abstract text
TI: Title of the publication
AU_CO: Countries of the authors
SC: Subject categories (e.g., “Education & Educational Research”)
PY: Publication year
TC: Total citations

2.1.3 Reduction

We then reviewed the data frame to exclude both the columns we did not need and the articles that did not meet the inclusion criteria. The data come from a total of 338 sources. Records were removed based on the inclusion and exclusion criteria: for example, one paper was removed based on publication year (PY), and nine papers were removed because they did not meet the document type (DT) criterion. The final data set contained 420 records.
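The reduction step filters rows (records) and columns (fields) at the same time. The base-R sketch below applies IC2 (publication year), IC3 (language), and IC4 (document type) to hypothetical records that use WoS field tags (`PY`, `DT`, `LA`, `NR`) and counts removals by criterion, mirroring the PY and DT removals described above. IC5 depends on reading each record and is not automated here; the data frame `toy_M` is hypothetical.

```r
# Hypothetical records using Web of Science field tags
toy_M <- data.frame(
  TI = c("Paper A", "Paper B", "Paper C", "Paper D"),
  PY = c(2013, 2019, 2021, 2024),
  DT = c("ARTICLE", "ARTICLE", "EDITORIAL MATERIAL", "ARTICLE; EARLY ACCESS"),
  LA = c("ENGLISH", "ENGLISH", "ENGLISH", "ENGLISH"),
  NR = c(40, 52, 12, 65),   # cited-reference count, a column we do not need
  stringsAsFactors = FALSE
)

in_years   <- toy_M$PY >= 2014 & toy_M$PY <= 2024   # IC2
in_english <- toy_M$LA == "ENGLISH"                 # IC3
is_article <- grepl("^ARTICLE", toy_M$DT)           # IC4, keeps "ARTICLE; EARLY ACCESS"

# Row reduction (records) and column reduction (fields) in one step
toy_reduced <- toy_M[in_years & in_english & is_article, c("TI", "PY", "DT")]

sum(!in_years)      # records removed by publication year
sum(!is_article)    # records removed by document type
nrow(toy_reduced)   # records kept
```

Treating "ARTICLE; EARLY ACCESS" as an article matches the document-type breakdown reported in the main information table, where both types are retained.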

3 ANALYTIC FRAMEWORK

4 FINDINGS

4.1 Descriptive (performance) analysis

Summary of the data set and documents.

4.1.1 Publication-related metrics

4.1.1.1 Main information

Main information about the collection.

S[2] # Main information

## $MainInformationDF
##                           Description   Results
## 1         MAIN INFORMATION ABOUT DATA          
## 2                            Timespan 2014:2024
## 3      Sources (Journals, Books, etc)       337
## 4                           Documents       419
## 5                Annual Growth Rate %     21.16
## 6                Document Average Age      4.06
## 7           Average citations per doc      11.2
## 8  Average citations per year per doc      1.97
## 9                          References     22454
## 10                     DOCUMENT TYPES          
## 11                            article       400
## 12              article; early access        19
## 13                  DOCUMENT CONTENTS          
## 14                 Keywords Plus (ID)       749
## 15             Author's Keywords (DE)      1396
## 16                            AUTHORS          
## 17                            Authors      1098
## 18                 Author Appearances      1136
## 19    Authors of single-authored docs       184
## 20              AUTHORS COLLABORATION          
## 21               Single-authored docs       189
## 22               Documents per Author     0.382
## 23                 Co-Authors per Doc      2.71
## 24     International co-authorships %     8.831
## 25
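The collaboration ratios in this table follow directly from the counts above them. The sketch below assumes, consistent with the reported values, that *Documents per Author* is documents divided by distinct authors and that *Co-Authors per Doc* is author appearances divided by documents.

```r
# Counts copied from the MainInformationDF output above
documents          <- 419
authors            <- 1098   # distinct author names
author_appearances <- 1136   # author slots summed over all documents

# Assumed definitions (they reproduce the reported values)
docs_per_author   <- documents / authors
coauthors_per_doc <- author_appearances / documents

round(docs_per_author, 3)    # 0.382
round(coauthors_per_doc, 2)  # 2.71
```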

4.1.1.2 Publications by year

S[3] # Article count by year

## $AnnualProduction
##    Year    Articles
## 1     2014       11
## 2     2015       14
## 3     2016       17
## 4     2017       17
## 5     2018       21
## 6     2019       25
## 7     2020       34
## 8     2021       67
## 9     2022       67
## 10    2023       71
## 11    2024       75
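The annual counts sum to 419, matching the Documents row of the main information table. The Annual Growth Rate and Document Average Age can also be recovered from them. The sketch below assumes the growth rate is a compound annual rate between the first and last year and that document age is measured from 2025. That reference year is inferred because it reproduces the reported 4.06; the output itself does not state it.

```r
# Annual article counts copied from the AnnualProduction output above
annual <- data.frame(
  Year     = 2014:2024,
  Articles = c(11, 14, 17, 17, 21, 25, 34, 67, 67, 71, 75)
)

# Total documents across all years
total_docs <- sum(annual$Articles)

# Compound annual growth rate between the first and last year, in percent
n_years    <- nrow(annual)
growth_pct <- ((annual$Articles[n_years] / annual$Articles[1])^(1 / (n_years - 1)) - 1) * 100

# Average document age, assuming ages are measured from 2025 (inferred)
ref_year <- 2025
avg_age  <- sum(annual$Articles * (ref_year - annual$Year)) / total_docs

total_docs            # 419
round(growth_pct, 2)  # 21.16
round(avg_age, 2)     # 4.06
```

Because the growth rate uses only the endpoint years (11 and 75 articles), it summarizes the overall rise and does not reflect the plateau between 2021 and 2022.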

Publications by year. Note: This figure shows the publication trend of articles between 2014 and 2024. The data were retrieved from the Web of Science (WoS) database in education-related subject areas using keyword patterns based on the set of notions identified in Martin, Valoyes-Chávez, and Valero (2024).

library(dplyr)
library(ggplot2)

# Count articles per publication year
year_counts <- M4 %>%
  group_by(PY) %>%
  summarise(count = n())

# Bar chart of annual output with count labels and a dotted LOESS trend line
# (linewidth replaces size for lines in ggplot2 >= 3.4.0)
pubs_by_year <- ggplot(year_counts, aes(x = PY, y = count)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count), 
            vjust = -0.5, 
            size = 3) +
  geom_smooth(method = "loess", se = FALSE, color = "blue", linewidth = 0.5, linetype = "dotted") +
  theme_minimal() +
  labs(x = "Year", y = "Number of Publications", 
       title = "") +
  scale_x_continuous(breaks = seq(min(year_counts$PY), max(year_counts$PY), by = 1)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

pubs_by_year

## `geom_smooth()` using formula = 'y ~ x'

# Save the plot as a high-resolution PNG (create the output folder if needed)
dir.create("plots", showWarnings = FALSE)
ggsave("plots/publications_by_year.png", plot = pubs_by_year, width = 10, height = 8, units = "in", dpi = 300, bg = "white")

## `geom_smooth()` using formula = 'y ~ x'

4.1.1.3 Most productive authors

S[5] # Most productive authors

## $MostProdAuthors
##    Authors        Articles Authors        Articles Fractionalized
## 1     MCGEE EO           9   MCGEE EO                        5.28
## 2     LEYVA LA           4   MYRTAJ ML                       2.00
## 3     BROCKMAN AJ        3   SPENCER BM                      2.00
## 4     DANCY M            3   SPARKS DM                       1.50
## 5     MCNEILL RT         3   BROCKMAN AJ                     1.42
## 6     WHITE DT           3   RUSSO-TAIT T                    1.33
## 7     ANTONIO MCK        2   BROOKS E                        1.14
## 8     BENTON A           2   LEYVA LA                        1.03
## 9     BROOKS E           2   ABRICA EJ                       1.00
## 10    KEAULANA S         2   AKINS H                         1.00
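The fractionalized column shares each document's credit equally among its co-authors, so an author's score is the sum of 1 / (number of authors) over their documents. This explains why MCGEE EO has 9 articles but a fractionalized score of 5.28. The sketch assumes this conventional definition is the one bibliometrix applies, and the author lists are hypothetical.

```r
# Hypothetical ";"-separated author lists (WoS AU field)
toy_AU <- c("MCGEE EO", "MCGEE EO;WHITE DT", "MCGEE EO;WHITE DT;LEYVA LA")
author_lists <- strsplit(toy_AU, ";")

# Each document gives every one of its authors 1 / (number of authors) credit
credit <- unlist(lapply(author_lists, function(a) setNames(rep(1 / length(a), length(a)), a)))
fractionalized <- vapply(split(credit, names(credit)), sum, numeric(1))

# Full counts: number of documents per author
articles <- table(unlist(author_lists))

articles
round(fractionalized, 2)
```

Here MCGEE EO has 3 articles but a fractionalized score of 1 + 1/2 + 1/3, about 1.83.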

4.1.1.4 Most cited papers

S[6] # Most cited papers

## $MostCitedPapers
##                            Paper                                      DOI  TC TCperYear   NTC
## 1  MCGEE EO, 2016, AM EDUC RES J          10.3102/0002831216676572        231     23.10  5.37
## 2  MCGEE EO, 2020, EDUC RESEARCHER        10.3102/0013189X20972718        226     37.67 10.04
## 3  CHAVEZ-DUEÑAS NY, 2019, AM PSYCHOL     10.1037/amp0000289              192     27.43  9.06
## 4  WEN J, 2020, ANATOLIA                  10.1080/13032917.2020.1730621   148     24.67  6.58
## 5  TATE SA, 2018, ETHICS EDUC             10.1080/17449642.2018.1428718    95     11.88  4.33
## 6  MCCOY DL, 2015, J DIVERS HIGH EDUC     10.1037/a0038676                 89      8.09  6.32
## 7  KIIK L, 2016, EURASIAN GEOGR ECON      10.1080/15387216.2016.1198265    79      7.90  1.84
## 8  SLAUGHTER-ACEY JC, 2016, ANN EPIDEMIOL 10.1016/j.annepidem.2015.10.005  76      7.60  1.77
## 9  MCGEE EO, 2019, TEACH COLL REC         NA                               71     10.14  3.35
## 10 AYMER SR, 2016, J HUM BEHAV SOC ENVI   10.1080/10911359.2015.1132828    69      6.90  1.60
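`TCperYear` divides total citations (`TC`) by the number of years since publication, counting the publication year itself. The reported values are reproduced if the summary was computed in 2025, which is an inference rather than something the output states. (`NTC`, which bibliometrix reports as normalized total citations, additionally compares each paper with others from the same year and is not reproduced here.)

```r
# Values copied from the MostCitedPapers output above
papers <- data.frame(
  Paper = c("MCGEE EO, 2016", "MCGEE EO, 2020", "CHAVEZ-DUEÑAS NY, 2019"),
  PY    = c(2016, 2020, 2019),
  TC    = c(231, 226, 192)
)

ref_year <- 2025  # inferred year in which the summary was computed
papers$TCperYear <- papers$TC / (ref_year - papers$PY + 1)

round(papers$TCperYear, 2)  # 23.10 37.67 27.43
```

This is why the 2020 McGee paper ranks second by total citations but first by citations per year.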

4.1.1.5 Main sources

S[9] # Main sources (journals)

## $MostRelSources
##                                                       Sources        Articles
## 1  FRONTIERS IN EDUCATION                                                   7
## 2  JOURNAL OF CHEMICAL EDUCATION                                            6
## 3  CULTURAL STUDIES OF SCIENCE EDUCATION                                    5
## 4  JOURNAL OF WOMEN AND MINORITIES IN SCIENCE AND ENGINEERING               5
## 5  RACE ETHNICITY AND EDUCATION                                             5
## 6  ETHNIC AND RACIAL STUDIES                                                4
## 7  INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH        4
## 8  INTERNATIONAL JOURNAL OF STEM EDUCATION                                  4
## 9  JOURNAL OF RACIAL AND ETHNIC HEALTH DISPARITIES                          4
## 10 JOURNAL OF RESEARCH IN SCIENCE TEACHING                                  4

4.1.2 Citation-related metrics

4.1.2.1 Most frequently cited documents

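The 21 most frequently cited references, counted from the cited-reference (CR) field of the documents in the collection. A total of 21 references was reported rather than 15 to account for ties (8 citations).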

# M4$CR[1] # identify separators
# Most frequently cited documents in the collection
CR <- citations(M4, field = "article", sep = ";")
cbind(CR$Cited[1:21])

##                                                                                                                        [,1]
## CRENSHAW K, 1993, STANFORD LAW REVIEW VOL 43, NO 6, JULY 1991, P1241                                                     21
## BRAUN V, 2021, QUAL RES PSYCHOL, V18, P328, DOI 10.1080/14780887.2020.1769238                                            20
## MCGEE EO, 2016, AM EDUC RES J, V53, P1626, DOI 10.3102/0002831216676572                                                  20
## BONILLA-SILVA E., 2010, RACISM WITHOUT RACISTS: COLOR-BLIND RACISM AND THE PERSISTENCE OF RACIAL INEQUALITY IN AMERICA   19
## CARLONE HB, 2007, J RES SCI TEACH, V44, P1187, DOI 10.1002/TEA.20237                                                     18
## ONG M, 2011, HARVARD EDUC REV, V81, P172, DOI 10.17763/HAER.81.2.T022245N7X4752V2                                        17
## MCGEE EO, 2011, AM EDUC RES J, V48, P1347, DOI 10.3102/0002831211423972                                                  16
## SUE DW, 2007, AM PSYCHOL, V62, P271, DOI 10.1037/0003-066X.62.4.271                                                      16
## SOLÓRZANO D, 2000, J NEGRO EDUC, V69, P60                                                                                14
## DELGADO R., 2012, CRITICAL RACE THEORY: AN INTRODUCTION, DOI DOI 10.2307/J.CTT1GGJJN3                                    13
## LADSONBILLINGS G, 1995, TEACH COLL REC, V97, P47                                                                         13
## MCGEE EO, 2017, COGNITION INSTRUCT, V35, P265, DOI 10.1080/07370008.2017.1355211                                         13
## YOSSO TJ, 2017, CRITICAL RACE THEORY IN EDUCATION: ALL GOD'S CHILDREN GOT A SONG, 2ND EDITION, P113                      13
## MARTIN DB, 2009, TEACH COLL REC, V111, P295                                                                              12
## MCGEE EO, 2020, EDUC RESEARCHER, V49, P633, DOI 10.3102/0013189X20972718                                                 12
## MCGEE E.O., 2020, BLACK, BROWN, BRUISED: HOW RACIALIZED STEM EDUCATION STIFLES INNOVATION                                11
## MILNER H. R., 2007, EDUCATIONAL RESEARCHER, V36, P388, DOI 10.3102/0013189X07309471, DOI 10.3102/0013189X07309471        10
## ONG M, 2018, J RES SCI TEACH, V55, P206, DOI 10.1002/TEA.21417                                                           10
## WOLFE P, 2006, J GENOCIDE RES, V8, P387, DOI 10.1080/14623520601056240                                                   10
## IRELAND DT, 2018, REV RES EDUC, V42, P226, DOI 10.3102/0091732X18759072                                                   9
## MCGEE E, 2017, AM J EDUC, V124, P1, DOI 10.1086/693954                                                                    9

4.2 Global Structure

4.2.1 Most productive countries

S[7] # Most productive countries

## $MostProdCountries
##           Country Articles    Freq SCP MCP MCP_Ratio
## 1  USA                 239 0.57869 225  14    0.0586
## 2  UNITED KINGDOM       34 0.08232  30   4    0.1176
## 3  CANADA               30 0.07264  25   5    0.1667
## 4  AUSTRALIA            17 0.04116  13   4    0.2353
## 5  CHINA                 8 0.01937   8   0    0.0000
## 6  SOUTH AFRICA          8 0.01937   7   1    0.1250
## 7  BRAZIL                6 0.01453   4   2    0.3333
## 8  INDIA                 5 0.01211   5   0    0.0000
## 9  IRELAND               5 0.01211   4   1    0.2000
## 10 CROATIA               4 0.00969   4   0    0.0000

4.2.2 Total citations by country

S[8] # Country citations

## $TCperCountries
##      Country      Total Citations Average Article Citations
## 1  USA                       3143                     13.15
## 2  UNITED KINGDOM             400                     11.76
## 3  CANADA                     294                      9.80
## 4  AUSTRALIA                  222                     13.06
## 5  BRAZIL                      82                     13.67
## 6  ESTONIA                     79                     79.00
## 7  ISRAEL                      62                     20.67
## 8  ECUADOR                     58                     58.00
## 9  INDIA                       58                     11.60
## 10 CHINA                       46                      5.75
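The derived columns in the two country tables are simple ratios. `MCP_Ratio` is multiple-country publications divided by articles, and Average Article Citations is total citations divided by articles. This is why Estonia (79.00) and Ecuador (58.00) rank high on average citations: their averages equal their totals, which implies a single article each. `Freq` appears to be articles divided by 413. That total is inferred from the reported values and is not stated in the output; it is presumably smaller than 419 because some records lack a country.

```r
# Values copied from the MostProdCountries and TCperCountries outputs above
countries <- data.frame(
  Country  = c("USA", "UNITED KINGDOM", "CANADA", "AUSTRALIA"),
  Articles = c(239, 34, 30, 17),
  MCP      = c(14, 4, 5, 4),
  TC       = c(3143, 400, 294, 222)
)

countries$MCP_Ratio <- countries$MCP / countries$Articles   # share of international papers
countries$Avg_TC    <- countries$TC / countries$Articles    # average citations per article
countries$Freq      <- countries$Articles / 413             # 413 is inferred, not reported

round(countries$MCP_Ratio, 4)  # 0.0586 0.1176 0.1667 0.2353
round(countries$Avg_TC, 2)     # 13.15 11.76 9.80 13.06
round(countries$Freq, 5)       # 0.57869 0.08232 0.07264 0.04116
```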

# country scientific collaboration
# Create a country collaboration network
M4_tags <- metaTagExtraction(M4, Field = "AU_CO", sep = ";")
NetMatrix2 <- biblioNetwork(M4_tags, analysis = "collaboration", network = "countries", sep = ";")
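A country collaboration network can be read as a co-occurrence matrix. If `A` is the document-by-country incidence matrix (1 when a country appears among a document's affiliations), then `crossprod(A)` gives, off the diagonal, the number of documents two countries share and, on the diagonal, each country's document count. The base-R sketch below shows this on hypothetical `AU_CO` strings. It assumes `biblioNetwork()` follows this co-occurrence pattern with countries counted once per document; it is not bibliometrix's code.

```r
# Hypothetical AU_CO fields (";"-separated author countries per document)
toy_AU_CO <- c("USA;USA;CANADA", "USA", "UNITED KINGDOM;CANADA", "USA;UNITED KINGDOM")

# Document-by-country incidence matrix, counting each country once per document
doc_countries <- lapply(strsplit(toy_AU_CO, ";"), unique)
all_countries <- sort(unique(unlist(doc_countries)))
A <- t(vapply(doc_countries, function(x) as.integer(all_countries %in% x),
              integer(length(all_countries))))
colnames(A) <- all_countries

# Country-by-country matrix: off-diagonal = shared documents, diagonal = documents
collab <- crossprod(A)
collab
```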

4.2.3 Top countries by publication count

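# Country distribution: count each document once per country
# (AU_CO can list the same country several times, once per author)
library(tidyr)
country_distribution <- M4_tags %>%
  mutate(doc_id = row_number()) %>%
  separate_rows(AU_CO, sep = ";") %>%
  mutate(AU_CO = trimws(AU_CO)) %>%
  filter(!is.na(AU_CO), AU_CO != "") %>%
  distinct(doc_id, AU_CO) %>%
  count(AU_CO) %>%
  slice_max(n, n = 10) %>%
  ggplot(aes(x = reorder(AU_CO, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top Countries by Publication Count",
    x = "Country",
    y = "Number of Publications"
  )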

country_distribution
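Counting rows after `separate_rows()` measures publications only if each country appears once per document. If `AU_CO` lists one country per author, which is an assumption worth checking against `M4_tags`, the raw count measures author appearances. The base-R sketch below contrasts the two counts on hypothetical strings.

```r
# Hypothetical AU_CO fields, assuming one country entry per author
toy_AU_CO <- c("USA;USA;USA", "USA;CANADA", "CANADA")
country_lists <- strsplit(toy_AU_CO, ";")

# Counting every entry measures author appearances
per_author <- table(unlist(country_lists))
# Removing duplicates within each document measures publications
per_document <- table(unlist(lapply(country_lists, unique)))

per_author    # CANADA 2, USA 4
per_document  # CANADA 2, USA 2
```

A single three-author US paper is enough to double the US count under the first method.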

4.2.4 Country collaboration network

# Plot the network
net2a = networkPlot(NetMatrix2, 
                  n = 10, 
                  Title = "Country Collaboration", 
                  type = "fruchterman", 
                  size = TRUE, 
                  remove.multiple = FALSE,
                  labelsize = 0.8,
                  cluster = "none")

4.2.5 Keywords by country

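# Count distinct author keywords (DE) per country
country_keywords <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  mutate(AU_CO = trimws(AU_CO), DE = trimws(DE)) %>%
  filter(!is.na(DE), DE != "") %>%
  group_by(AU_CO) %>%
  summarise(unique_keywords = n_distinct(DE), .groups = "drop")

# Bar plot of the 10 countries with the most distinct keywords
ggplot(country_keywords %>% slice_max(unique_keywords, n = 10), 
       aes(x = reorder(AU_CO, unique_keywords), y = unique_keywords)) +
  geom_col(fill = "skyblue") +
  coord_flip() +
  labs(
    title = "Number of Unique Keywords by Country",
    x = "Country",
    y = "Number of Unique Keywords"
  ) +
  geom_text(
    aes(label = unique_keywords), 
    hjust = -0.3, 
    size = 3
  ) +
  theme_minimal()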

# Create a function to extract unique keywords by country
extract_country_keywords <- function(data) {
  # Split countries and keywords
  country_keywords <- data %>%
    separate_rows(AU_CO, sep = ";") %>%
    separate_rows(DE, sep = ";") %>%
    group_by(AU_CO, DE) %>%
    summarise(
      keyword_count = n(),
      .groups = 'drop'
    ) %>%
    group_by(AU_CO) %>%
    mutate(
      country_total_keywords = sum(keyword_count),
      keyword_percentage = (keyword_count / country_total_keywords) * 100
    ) %>%
    arrange(AU_CO, desc(keyword_percentage)) %>%
    filter(keyword_percentage > 5)  # Focus on significant keywords
}

# Apply the function to M4_tags
M4_tags_country <- extract_country_keywords(M4_tags)

4.2.6 Top keywords by country

Top author keywords by country, restricted to keywords that appear four or more times (keyword_count > 3).

# View top keywords for each country
M4_tags_country %>% 
  filter(keyword_count > 3) %>% 
  print()

## # A tibble: 23 × 5
## # Groups:   AU_CO [5]
##    AU_CO     DE                           keyword_count country_total_keywords keyword_percentage
##    <chr>     <chr>                                <int>                  <int>              <dbl>
##  1 AUSTRALIA " WELLBEING"                            14                    176               7.95
##  2 AUSTRALIA " COVID-19"                             10                    176               5.68
##  3 AUSTRALIA " INDIGENOUS HEALTH"                    10                    176               5.68
##  4 AUSTRALIA " PUBLIC HEALTH"                        10                    176               5.68
##  5 AUSTRALIA "ABORIGINAL HEALTH"                     10                    176               5.68
##  6 BRAZIL    " RACE"                                  7                     83               8.43
##  7 BRAZIL    " ILLEGAL FOSSIL TRADE"                  6                     83               7.23
##  8 BRAZIL    " LATIN AMERICA"                         6                     83               7.23
##  9 BRAZIL    " PALAEONTOLOGICAL HERITAGE"             6                     83               7.23
## 10 BRAZIL    " PARACHUTE SCIENCE"                     6                     83               7.23
## # ℹ 13 more rows
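`keyword_percentage` is each keyword's count divided by the country's total keyword count, times 100. The function keeps only keywords above 5%. The sketch below reproduces four rows of the output. It also shows the source of the leading spaces in `DE` (for example " WELLBEING"): splitting "A; B" on ";" alone leaves a space on the second keyword. Trimming would merge any keyword that occurs both with and without a leading space.

```r
# Counts copied from the Australia and Brazil rows in the output above
kw <- data.frame(
  AU_CO = c("AUSTRALIA", "AUSTRALIA", "BRAZIL", "BRAZIL"),
  DE    = c(" WELLBEING", "ABORIGINAL HEALTH", " RACE", " LATIN AMERICA"),
  keyword_count = c(14, 10, 7, 6),
  country_total_keywords = c(176, 176, 83, 83)
)

# Share of a country's keyword occurrences taken by each keyword
kw$keyword_percentage <- kw$keyword_count / kw$country_total_keywords * 100
round(kw$keyword_percentage, 2)  # 7.95 5.68 8.43 7.23

# Splitting on ";" alone keeps the space that follows each separator
strsplit("ABORIGINAL HEALTH; WELLBEING", ";")[[1]]   # "ABORIGINAL HEALTH" " WELLBEING"
trimws(strsplit("ABORIGINAL HEALTH; WELLBEING", ";")[[1]])
```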

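library(dplyr)
library(tidyr)
# Publication summary by country; each document is counted once per country
country_summary <- M4_tags %>%
  mutate(doc_id = row_number()) %>%
  separate_rows(AU_CO, sep = ";") %>%
  mutate(AU_CO = trimws(AU_CO)) %>%
  distinct(doc_id, AU_CO, .keep_all = TRUE) %>%
  group_by(AU_CO) %>%
  summarise(
    article_count = n(),
    # AU holds ";"-separated author lists, so split them before counting names
    # (this counts all authors on the country's papers, including co-authors abroad)
    unique_authors = n_distinct(trimws(unlist(strsplit(AU, ";")))),
    total_citations = sum(TC, na.rm = TRUE)
  ) %>%
  arrange(desc(article_count))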

# Create a kable table for visualization
library(knitr)
country_summary %>%
  kable(
    col.names = c("Country", "Article Count", "Unique Authors", "Total Citations"),
    caption = "Publication Summary by Country"
  )


# Prepare data for top countries
top_countries_keywords <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  # Drop the spaces left by "A; B" separators so keyword variants are merged
  mutate(AU_CO = trimws(AU_CO), DE = trimws(DE)) %>%
  filter(!is.na(DE), DE != "") %>%
  group_by(AU_CO, DE) %>%
  summarise(
    keyword_count = n(),
    .groups = 'drop'
  ) %>%
  # Filter for top 10 countries by article count
  filter(AU_CO %in% c('USA', 'UNITED KINGDOM', 'CANADA', 'AUSTRALIA', 
                      'CHINA', 'BRAZIL', 'GERMANY', 'KOREA', 
                      'SOUTH AFRICA', 'SWEDEN')) %>%
  group_by(AU_CO) %>%
  # Top 3 keywords per country (ties can return more than three rows)
  slice_max(order_by = keyword_count, n = 3) %>%
  arrange(AU_CO, desc(keyword_count))

# Create a table
kable(top_countries_keywords, 
      col.names = c("Country", "Keyword", "Keyword Count"),
      caption = "Top Keywords by Country")

# Create a visualization
ggplot(top_countries_keywords, 
       aes(x = AU_CO, y = keyword_count, fill = DE)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Top Keywords by Country",
    x = "Country",
    y = "Keyword Count",
    fill = "Keywords"
  ) +
  theme_minimal() +
  coord_flip()

4.2.7 Status of publications by countries

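# Combine article counts (S[7]) and citation counts (S[8]) by country
country.counts <- as.data.frame(S[7]) # Article counts by country
country.citations <- as.data.frame(S[8]) # Citations by country
country <- country.counts %>% 
  merge(country.citations, by.x = 1, by.y = 1,
        all = TRUE)
names(country) <- c("Country", "Articles", "Freq", "SCP", "MCP", "MCP_Ratio", "Total_Citations", "Average_Citations")

# na.omit() keeps only countries present in both top-10 tables, which is why
# SOUTH AFRICA (absent from TCperCountries) does not appear below
country %>% 
  arrange(desc(Articles)) %>% 
  na.omit() -> country_summary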

country_summary

##          Country Articles    Freq SCP MCP MCP_Ratio Total_Citations Average_Citations
## 1 USA                 239 0.57869 225  14    0.0586            3143             13.15
## 2 UNITED KINGDOM       34 0.08232  30   4    0.1176             400             11.76
## 3 CANADA               30 0.07264  25   5    0.1667             294              9.80
## 4 AUSTRALIA            17 0.04116  13   4    0.2353             222             13.06
## 5 CHINA                 8 0.01937   8   0    0.0000              46              5.75
## 7 BRAZIL                6 0.01453   4   2    0.3333              82             13.67
## 8 INDIA                 5 0.01211   5   0    0.0000              58             11.60

4.2.8 Keywords and Keywords Plus

4.2.8.1 Author Keywords and Keywords-Plus

S[10] # Author Keywords and Keywords-Plus

## $MostRelKeywords
##    Author Keywords (DE)      Articles Keywords-Plus (ID)     Articles
## 1           RACISM                 45            RACE              54
## 2           RACE                   27            EXPERIENCES       37
## 3           STEM                   24            SCIENCE           34
## 4           NATIONALISM            23            WOMEN             34
## 5           HIGHER EDUCATION       17            EDUCATION         33
## 6           COVID-19               15            IDENTITY          28
## 7           COLONIALISM            14            STUDENTS          28
## 8           DIVERSITY              14            HEALTH            26
## 9           GENDER                 14            COLOR             20
## 10          EQUITY                 13            GENDER            20

4.2.8.2 Keyword Occurrence Network

# Classical keyword co-occurrences network
NetMatrix1 <- biblioNetwork(M4, analysis = "co-occurrences", network = "keywords", sep = ";")

# statistics for the network
netstat1 <- networkStat(NetMatrix1)
summary(netstat1, k=10)

## 
## 
## Main statistics about the network
## 
##  Size                                  749 
##  Density                               0.016 
##  Transitivity                          0.262 
##  Diameter                              6 
##  Degree Centralization                 0.243 
##  Average path length                   2.849 
##
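For an undirected network with n nodes and E links, density is E / (n(n - 1)/2). With the reported size (749) and a density of 0.016 (rounded), this implies roughly 4,300 to 4,600 links among keywords. Transitivity is the share of connected triples that close into triangles. The base-R sketch below computes both on a hypothetical four-keyword network. It assumes `networkStat()` uses these standard global definitions (as igraph does) and ignores link weights.

```r
# Hypothetical undirected keyword network (1 = the two keywords co-occur)
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)

n_nodes <- nrow(adj)
n_edges <- sum(adj[upper.tri(adj)])

# Density: observed links divided by the n(n - 1) / 2 possible links
net_density <- n_edges / (n_nodes * (n_nodes - 1) / 2)

# Transitivity: 3 x triangles / connected triples
triangles <- sum(diag(adj %*% adj %*% adj)) / 6
deg <- rowSums(adj)
triples <- sum(deg * (deg - 1) / 2)
net_transitivity <- 3 * triangles / triples

net_density       # 4 / 6
net_transitivity  # 0.6
```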

# Plot the network
set.seed(3)
net1a = networkPlot(NetMatrix1, 
                   n = 25,  # Limit to top 25 keywords
                   normalize = "association",
                   Title = "Top Keyword Co-Occurrences", 
                   type = "circle", 
                   size = TRUE, 
                   remove.multiple = FALSE,
                   labelsize = 0.7,
                   cluster = "none")

net1b = networkPlot(NetMatrix1, 
                   n = 30,  # Top 30 keywords
                   normalize = "association",
                   Title = "Keyword Network", 
                   type = "kamada", 
                   size = TRUE, 
                   remove.multiple = TRUE,
                   labelsize = 0.5,
                   cluster = "louvain")

net1c = networkPlot(NetMatrix1, 
                   n = 30,  # Top 30 keywords
                   #normalize = "association",
                   #weighted = T,
                   Title = "Keyword Co-Occurrence Network", 
                   type = "fruchterman", 
                   size = TRUE, 
                   remove.multiple = TRUE,
                   labelsize = 0.5,
                   cluster = "louvain")

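# networkPlot() draws with igraph's base graphics rather than returning a ggplot,
# so ggsave() cannot capture it; redraw the network into a PNG device instead
dir.create("plots", showWarnings = FALSE)
png("plots/keyword_co_occurrence_network.png",
    width = 10, height = 8, units = "in", res = 1200, bg = "white")
networkPlot(NetMatrix1, n = 30, Title = "Keyword Co-Occurrence Network",
            type = "fruchterman", size = TRUE, remove.multiple = TRUE,
            labelsize = 0.5, cluster = "louvain")
dev.off()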
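`normalize = "association"` rescales co-occurrence counts before plotting. Association strength divides the co-occurrence of keywords i and j by the product of their occurrence counts. As a result, links between very frequent keywords (such as RACISM and RACE) are not automatically the strongest. The sketch below assumes the diagonal of the co-occurrence matrix holds occurrence counts. It compares association strength with Salton's cosine on hypothetical counts and is not bibliometrix's implementation.

```r
# Hypothetical co-occurrence counts; the diagonal holds each keyword's occurrences
kw_names <- c("racism", "stem", "identity")
co <- matrix(c(10, 4, 2,
                4, 8, 1,
                2, 1, 5), nrow = 3, byrow = TRUE,
             dimnames = list(kw_names, kw_names))
occ <- diag(co)

# Association strength: c_ij / (c_i * c_j)
association <- co / outer(occ, occ)
# Salton's cosine, for comparison: c_ij / sqrt(c_i * c_j)
salton <- co / sqrt(outer(occ, occ))

association["racism", "stem"]  # 4 / (10 * 8) = 0.05
salton["racism", "stem"]       # 4 / sqrt(80), about 0.447
```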

4.2.9 Conceptual Structure Map

4.3 Conceptual Structure

suppressWarnings(CS1 <- conceptualStructure(M4_tags,
                                            method="MCA", 
                                            field="ID", 
                                            minDegree=15, 
                                            clust=5, 
                                            stemming=FALSE, 
                                            labelsize=15,
                                            documents=20)
                 )

# Conceptual Structure using keywords (method="CA")
CS <- conceptualStructure(M4, 
                           field="ID", 
                           method="CA", 
                           minDegree=4, 
                           clust=5, 
                           stemming=FALSE, 
                           labelsize=10,  # Set to 0 to remove labels
                           documents=10)

# Extract coordinates and clusters
coords <- CS[[1]]  # Coordinates
clusters <- CS[[2]]  # Cluster assignments

CS[4]
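With `method = "CA"`, keywords and documents are positioned by correspondence analysis of the document-by-keyword table. MCA, used in the first call, is its multiple-variable extension. The base-R sketch below shows the core computation on a hypothetical table: a singular value decomposition of standardized residuals, followed by principal coordinates for the keywords. bibliometrix relies on its own dependencies for this step and then groups keywords into `clust` clusters, which is not shown. As a check, the sketch verifies that total inertia equals the chi-square statistic divided by the grand total.

```r
# Hypothetical document-by-keyword count table
toy_counts <- matrix(c(5, 1, 0, 2,
                       1, 4, 2, 0,
                       0, 2, 6, 1,
                       3, 0, 1, 4), nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("doc", 1:4),
                                     c("racism", "identity", "health", "equity")))

P <- toy_counts / sum(toy_counts)   # correspondence matrix
row_mass <- rowSums(P)
col_mass <- colSums(P)

# Standardized residuals: D_r^(-1/2) (P - r c') D_c^(-1/2)
resid_std <- diag(1 / sqrt(row_mass)) %*% (P - outer(row_mass, col_mass)) %*% diag(1 / sqrt(col_mass))
dec <- svd(resid_std)

# Principal coordinates of the keywords on the first two dimensions
kw_coords <- diag(1 / sqrt(col_mass)) %*% dec$v[, 1:2] %*% diag(dec$d[1:2])
rownames(kw_coords) <- colnames(toy_counts)
kw_coords

# Total inertia (sum of squared singular values) equals chi-square / n
total_inertia <- sum(dec$d^2)
chisq_stat <- suppressWarnings(chisq.test(toy_counts)$statistic)
all.equal(total_inertia, unname(chisq_stat) / sum(toy_counts))
```

Keywords that sit close together in these coordinates tend to occur in the same documents. The clustering step groups them on that basis.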

# Create a historical citation network
options(width=130)
histResults <- histNetwork(M4, min.citations = 5, sep = ";")

# Plot the historiograph (historical direct citation network)
net <- histPlot(histResults, n=15, size = 8, labelsize=4)
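The historiograph is a direct citation network. A link from A to B means A cites B, and both documents belong to the collection. Our reading of `min.citations = 5` is that it keeps documents cited at least five times within the collection (local citations). The sketch below computes local citation scores on hypothetical records. Matching by DOI is an assumption made for illustration; bibliometrix has its own procedure for linking references to documents.

```r
# Hypothetical collection: each document's DOI and the DOIs it cites
toy_docs <- data.frame(
  id    = c("d1", "d2", "d3"),
  DI    = c("10.1/a", "10.1/b", "10.1/c"),
  cites = c("", "10.1/a", "10.1/a;10.1/b;10.9/external"),
  stringsAsFactors = FALSE
)
cited_dois <- unlist(strsplit(toy_docs$cites, ";"))

# Local citation score: citations received from documents in the same collection
toy_docs$LCS <- vapply(toy_docs$DI, function(d) sum(cited_dois == d), integer(1))

# Keep documents meeting a minimum local citation threshold (here 1)
toy_docs[toy_docs$LCS >= 1, c("id", "LCS")]
```

The citation to "10.9/external" is ignored because that work is not part of the collection.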

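# LEXICAL PATTERNS
library(quanteda)
library(quanteda.textstats)  # provides textstat_frequency() in quanteda >= 3.0

# Corpus of abstracts, tokenized as-is (used for keywords-in-context)
M4_abstract <- corpus(M4$AB)
toks <- tokens(M4_abstract)

# Cleaned tokens: punctuation, numbers, and English stop words removed
toks_clean <- tokens(toks, 
               remove_punct = TRUE, 
               remove_numbers = TRUE) %>%
        tokens_remove(stopwords("english"))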

4.3.1 Top token frequencies

# Top token frequencies across all abstracts
top_tokens <- toks_clean %>%
  dfm() %>%
  textstat_frequency(n = 20)

# Drop generic academic terms from the top 20 list
top_tokens %>%
  filter(!feature %in% c("research", "study", "article", "also", "can"))

##        feature frequency rank docfreq group
## 1         stem       470    1     187   all
## 2       racism       417    2     238   all
## 3        black       394    3     115   all
## 4       health       297    4      76   all
## 5     students       294    5      89   all
## 8       social       240    8     128   all
## 10      racial       216   10     110   all
## 11 experiences       209   11     107   all
## 12       women       200   12      61   all
## 13       white       190   13      90   all
## 14   education       173   14      87   all
## 15     science       172   15      76   all
## 17 nationalism       152   17      82   all
## 18  indigenous       151   18      45   all
## 19        race       144   19      88   all
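The table reports two different counts. `frequency` counts every occurrence of a term, while `docfreq` counts the abstracts that contain it at least once. Comparing the two shows, for example, that "black" (394 occurrences in 115 abstracts) is used repeatedly within fewer abstracts than "racism" (417 occurrences in 238 abstracts). The base-R sketch below computes both counts on hypothetical abstracts, using a simple letters-only tokenizer in place of quanteda's.

```r
# Hypothetical abstracts
toy_abstracts <- c("Racism shapes STEM. Racism persists.",
                   "STEM students describe racism.",
                   "Black women in STEM.")

# Lowercase and split on anything that is not a letter
toy_toks <- lapply(strsplit(tolower(toy_abstracts), "[^a-z]+"), function(x) x[x != ""])

term_freq <- table(unlist(toy_toks))                  # every occurrence
doc_freq  <- table(unlist(lapply(toy_toks, unique)))  # abstracts containing the term

term_freq[c("racism", "stem")]  # 3 3
doc_freq[c("racism", "stem")]   # 2 3
```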

4.3.2 Keywords-in-Context

kw_antiblack <- kwic(toks, pattern =  "anti-Black*")
kw_antiblack <- kw_antiblack %>% 
  rbind(kwic(toks, pattern = "antiblack*"))
head(kw_antiblack, 10)

kw_sup <- kwic(toks, pattern =  phrase("White Supremacy"))
head(kw_sup, 10)

kw_nationalism <- kwic(toks, pattern =  phrase("nationalism"))
head(kw_nationalism, 10)

kw_colonial <- kwic(toks, pattern =  phrase("colonial*"))
head(kw_colonial, 10)

kw_xeno <- kwic(toks, pattern =  phrase("xenophobi*"))
head(kw_xeno, 10)

kw_multiword_antiblack <- kwic(toks, pattern = phrase(c("anti-black*", "antiblack*")))
head(kw_multiword_antiblack, 10)
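`kwic()` matches a glob pattern case-insensitively and returns each matched token with a window of surrounding tokens (five on each side by default). The hypothetical helper `toy_kwic()` below mimics that behavior in base R on a toy token vector. It also shows why the code searches for both `anti-Black*` and `antiblack*`: the hyphenated and unhyphenated spellings are different tokens, so neither glob catches the other.

```r
# toy_kwic() is a hypothetical helper, not part of quanteda
toy_kwic <- function(tokens, pattern, window = 5) {
  # glob to regex: "anti-black*" becomes "^anti-black"
  regex <- utils::glob2rx(tolower(pattern))
  hits <- which(grepl(regex, tolower(tokens)))
  context <- function(from, to) {
    if (from > to) return("")
    paste(tokens[from:to], collapse = " ")
  }
  data.frame(
    from = hits,
    pre = vapply(hits, function(i) context(max(1, i - window), i - 1), character(1)),
    keyword = tokens[hits],
    post = vapply(hits, function(i) context(i + 1, min(length(tokens), i + window)), character(1)),
    stringsAsFactors = FALSE
  )
}

toy_tokens <- c("Scholars", "describe", "anti-Black", "racism", "in", "STEM",
                "and", "antiblackness", "in", "schools")
toy_kwic(toy_tokens, "anti-Black*", window = 2)  # matches only the hyphenated form
toy_kwic(toy_tokens, "antiblack*", window = 2)   # matches only the unhyphenated form
```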

Lexical dispersion plots

4.3.3 Document feature matrix

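Build a document-feature matrix (DFM) from the cleaned tokens. Punctuation, numbers, and stop words were already removed from these tokens.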

dfmat_notions <- dfm(toks_clean)
# print(dfmat_notions)

Remove any remaining English stop words, then convert the DFM to a data frame of token counts per abstract.

dfmat_notions_nostop <- dfm_select(dfmat_notions, pattern = stopwords("en"), selection = "remove")
# print(dfmat_notions_nostop)
notions.df <- convert(dfmat_notions_nostop, to = "data.frame")

4.3.4 Feature co-occurrence matrix (FCM)

toks_abstract <- tokens(M4$AB, remove_punct = TRUE)
dfmat_AB <- dfm(toks_abstract)
dfmat_AB <- dfm_remove(dfmat_AB, pattern = c(stopwords("en"), "*-time", "updated-*", "gmt", "bst"))
dfmat_AB <- dfm_trim(dfmat_AB, min_termfreq = 100)
topfeatures(dfmat_AB)
nfeat(dfmat_AB)

# most frequently co-occurring words
fcmat_AB <- fcm(dfmat_AB)
dim(fcmat_AB)
# topfeatures(fcmat_AB)


feat <- names(topfeatures(dfmat_AB, 50))
fcmat_AB_select <- fcm_select(fcmat_AB, pattern = feat, selection = "keep")
dim(fcmat_AB_select)
size <- log(colSums(dfm_select(dfmat_AB, feat, selection = "keep")))

4.3.4.1 Network of most co-occurring words

set.seed(144)
textplot_network(fcmat_AB_select, min_freq = 0.8, vertex_size = size / max(size) * 3)
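A feature co-occurrence matrix records how often two features appear together. The sketch below is a simplified, binary, document-level analogue in base R: two features co-occur once for every document that contains both. quanteda's `fcm()` has other counting options (its `count` argument), so its numbers can differ from this version. The toy counts are hypothetical. The sketch also reproduces the vertex-size scaling used in the `textplot_network()` call above: take the log of each feature's total count, then rescale so the largest vertex has size 3.

```r
# Hypothetical document-feature counts: rows are abstracts, columns are features
toy_dfm <- matrix(c(2, 1, 0,
                    1, 0, 1,
                    1, 1, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("doc1", "doc2", "doc3"),
                                  c("racism", "stem", "health")))

# Presence/absence, then t(B) %*% B: number of documents in which each pair co-occurs
B <- (toy_dfm > 0) * 1
co <- crossprod(B)
diag(co) <- 0  # drop self co-occurrence

# Vertex sizes scaled as in the textplot_network() call above
toy_size <- log(colSums(toy_dfm))
vertex_size <- toy_size / max(toy_size) * 3
co
vertex_size
```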

4.4 Thematic Map and Frequency Analysis

Map1=thematicMap(M4_tags, field = "ID", n = 250, minfreq = 10,
  stemming = FALSE, size = 0.4, n.labels=5, repel = TRUE)
plot(Map1$map)

4.4.1 Clusters

Clusters1=Map1$words[order(Map1$words$Cluster,-Map1$words$Occurrences),]
CL1 <- Clusters1 %>% group_by(.data$Cluster_Label) %>% 
  arrange(-Cluster_Frequency) %>% 
  top_n(5, .data$Occurrences)
# CL1

4.4.2 Frequency Analysis

4.4.2.1 Rac*

rac* keyword expressions

toks_rac <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "rac*")
dfmat_rac <- dfm(toks_rac)
dfmat_rac
tstat_freq_rac <- textstat_frequency(dfmat_rac, n = 13)
head(tstat_freq_rac, 13)

# frequency plot
r <- dfmat_rac %>% 
  textstat_frequency(n = 13) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "", # Top Rac* Terms by Frequency
    subtitle = "" # Visualization of Most Common Terms
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )

# Save the plot as a high-resolution image
ggsave("plots/frequency_plot_rac.png", plot = r, width = 10, height = 8, dpi = 1200, bg = "white")

4.4.2.2 Nationali*

Nationali* keyword expressions

# Tokenize and prepare data
toks_nat <- tokens(toks, remove_punct = TRUE) %>% 
            tokens_keep(pattern = "nationali*")
dfmat_nat <- dfm(toks_nat)
tstat_freq_nat <- textstat_frequency(dfmat_nat, n = 50)

# Create frequency plot
p <- dfmat_nat %>% 
  textstat_frequency(n = 9) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Nationalism-related Terms by Frequency",
    subtitle = "Frequency of Nationalism Tokens"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )

# Save the plot as a high-resolution image with white background
ggsave("plots/frequency_plot_nation.png", plot = p, width = 10, height = 8, dpi = 1200, bg = "white")

4.4.2.3 Xeno

Xeno* keyword search

toks_xen <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "xen*")
dfmat_xen <- dfm(toks_xen)
dfmat_xen
tstat_freq_xen <- textstat_frequency(dfmat_xen, n = 50)
head(tstat_freq_xen, 50)

# frequency plot for dfmat_xen
x <- dfmat_xen %>% 
  textstat_frequency(n = 2) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Xeno* Features by Frequency",
    subtitle = "Visualization of Most Common Terms"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )

4.4.2.4 Colonial*

Colonial* keyword expressions

toks_col <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "colon*")
dfmat_col <- dfm(toks_col)
dfmat_col
tstat_freq_col <- textstat_frequency(dfmat_col, n = 50)
head(tstat_freq_col, 50)

# frequency plot for dfmat_col
s <- dfmat_col %>% 
  textstat_frequency(n = 11) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Colonial* Features by Frequency",
    subtitle = "Visualization of Most Common Terms"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )

# Save the plot as a high-resolution image with white background
ggsave("plots/frequency_plot_sup.png", plot = s, width = 10, height = 8, dpi = 1200, bg = "white")
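A glob such as `colon*` keeps every token that begins with those letters. The sketch below uses hypothetical tokens and a hypothetical `keep_glob()` helper to show the trade-off. `colon*` captures forms such as colonization and colonizers, which appear in the country-level output, but it would also keep medical terms such as colon or colonoscopy if the health-related abstracts contain them. `colonial*` avoids those terms but also drops colonization. An exclusion list, like the removal of "anticipated" in the Anti* section, is one way to handle known false positives.

```r
# keep_glob() is a hypothetical helper approximating tokens_keep(pattern = ...) on a plain vector
keep_glob <- function(tokens, pattern, exclude = character(0)) {
  kept <- tokens[grepl(utils::glob2rx(pattern), tokens)]
  kept[!kept %in% exclude]  # drop known false positives
}

toy_words <- c("colonialism", "colonial", "colonization", "colon", "colonoscopy",
               "anti-racism", "anticipated", "antibody")

keep_glob(toy_words, "colon*")     # broad: includes colon and colonoscopy
keep_glob(toy_words, "colonial*")  # narrow: misses colonization
keep_glob(toy_words, "anti*", exclude = c("anticipated", "antibody"))
```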

4.4.2.5 Anti*

Anti* keyword expressions

toks_anti <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "anti*")
dfmat_anti <- dfm(toks_anti)
dfmat_anti <- dfmat_anti[, colnames(dfmat_anti) != "anticipated"]
tstat_freq_anti <- textstat_frequency(dfmat_anti, n = 50)
head(tstat_freq_anti, 20) # frequency of at least two occurs at 21


# frequency plot for dfmat_anti
q <- dfmat_anti %>% 
  textstat_frequency(n = 20) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Anti* Features by Frequency",
    subtitle = "Visualization of Most Common Terms"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.4, 
    size = 3
  )

# Save the plot as a high-resolution image with white background
ggsave("plots/frequency_plot_anti.png", plot = q, width = 10, height = 8, dpi = 1200, bg = "white")

4.4.2.6 Discrim*

Based on the ID (Keywords Plus) results, we added a search for discrim* tokens (e.g., discrimination).

toks_dis <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "discrim*")
dfmat_dis <- dfm(toks_dis)
dfmat_dis
tstat_freq_dis <- textstat_frequency(dfmat_dis, n = 50)
head(tstat_freq_dis, 50)

# frequency plot for dfmat_dis
d <- dfmat_dis %>% 
  textstat_frequency(n = 2) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Discrim* Features by Frequency",
    subtitle = "Visualization of Most Common Terms"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )

4.4.2.7 Merging two frequency analyses

Comparative normalized frequencies: Rac* and Anti*

# Combine data
combined_data <- bind_rows(
  mutate(r$data, source = "Rac* Analysis"),
  mutate(q$data, source = "Anti* Analysis")
)

# Global normalization
combined_normalized <- combined_data %>%
  mutate(
    normalized_freq = frequency / max(combined_data$frequency)
  )

# Create plot with globally normalized values
ggplot(combined_normalized, 
       aes(x = reorder(feature, normalized_freq), 
           y = normalized_freq, 
           fill = source)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(
    x = "Features", 
    y = "Normalized Frequency",
    title = "Comparative Normalized Feature Frequencies"
  ) +
  scale_y_continuous(limits = c(0, 1)) +
  theme_minimal() +
  theme(legend.position = "bottom")

ggplot(combined_normalized, 
       aes(x = reorder(feature, normalized_freq), 
           y = normalized_freq, 
           fill = source)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(
    aes(label = sprintf("%.2f", normalized_freq), 
        y = normalized_freq),
    position = position_dodge(width = 0.9),
    hjust = -0.1,
    size = 3
  ) +
  coord_flip() +
  labs(
    x = "Features", 
    y = "Normalized Frequency",
    title = "Comparative Normalized Feature Frequencies"
  ) +
  scale_y_continuous(limits = c(0, 1.1)) +  # Extend y-axis to accommodate labels
  theme_minimal() +
  theme(legend.position = "bottom")
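Global normalization divides every frequency by the single largest count across both analyses. The two notion families therefore stay on a common scale, and the difference in their overall volume is preserved. The worked sketch below uses hypothetical counts to contrast this with within-source normalization, where each family is rescaled to its own maximum. Within-source normalization makes it easier to compare the shape of each distribution, but it hides the difference in volume.

```r
# Hypothetical frequencies for two notion families
freqs <- data.frame(
  feature = c("racism", "racial", "antiracist", "anti-black"),
  frequency = c(400, 200, 40, 20),
  source = c("Rac* Analysis", "Rac* Analysis", "Anti* Analysis", "Anti* Analysis"),
  stringsAsFactors = FALSE
)

# Global normalization, as in the code above: one shared maximum
freqs$global_norm <- freqs$frequency / max(freqs$frequency)

# Alternative: normalize within each source, so each source's top feature equals 1
freqs$within_norm <- freqs$frequency / ave(freqs$frequency, freqs$source, FUN = max)
freqs
```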

4.4.2.8 Merging multiple frequency analyses

Comparative normalized frequencies across multiple notions
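This section does not show how `combined_all_normalized` is constructed. The sketch below is one possible approach, offered as an assumption rather than the authors' code. It stacks several per-notion frequency tables with a source label, then applies the same global normalization used in the two-analysis comparison. In practice, the input tables might be the `data` slots of the ggplot objects created above (`r`, `p`, `x`, `s`, `q`, `d`). Here the tables and counts are hypothetical.

```r
# Hypothetical per-notion frequency tables with the feature/frequency columns
# that textstat_frequency() returns
freq_list <- list(
  "Rac* Analysis" = data.frame(feature = c("racism", "racial"), frequency = c(400, 200)),
  "Nationali* Analysis" = data.frame(feature = "nationalism", frequency = 150),
  "Anti* Analysis" = data.frame(feature = c("anti-black", "antiracist"), frequency = c(20, 10))
)

# Stack the tables, tagging each row with its source analysis
combined_all <- do.call(rbind, Map(function(df, src) {
  df$source <- src
  df
}, freq_list, names(freq_list)))
rownames(combined_all) <- NULL

# Global normalization across all sources
combined_all$normalized_freq <- combined_all$frequency / max(combined_all$frequency)
combined_all
```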

ggplot(combined_all_normalized, 
       aes(x = reorder(feature, normalized_freq), 
           y = normalized_freq, 
           fill = source)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(
    aes(label = sprintf("%.2f", normalized_freq), 
        y = normalized_freq),
    position = position_dodge(width = 0.9),
    hjust = -0.1,
    size = 3
  ) +
  coord_flip() +
  labs(
    x = "Features", 
    y = "Normalized Frequency",
    title = "" # Comparative Normalized Feature Frequencies
  ) +
  scale_y_continuous(limits = c(0, 1.1)) +  # Extend y-axis to accommodate labels
  theme_minimal() +
  theme(legend.position = "bottom")

4.5 Analysis of Notions at Country-level

We compare keyword and notion frequencies by geographical region (the authors' countries, AU_CO) to connect notions of racism to racialized social systems and geopolitics.

# Split countries and keywords
keyword_by_country_all <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  group_by(AU_CO, DE) %>%
  summarize(keyword_count = n()) %>%
  group_by(AU_CO) %>%
  slice_max(keyword_count, n = 5) %>%
  arrange(AU_CO, desc(keyword_count))

## `summarise()` has grouped output by 'AU_CO'. You can override using the `.groups` argument.

# Print results
head(keyword_by_country_all, n=10)

## # A tibble: 10 × 3
## # Groups:   AU_CO [3]
##    AU_CO      DE                           keyword_count
##    <chr>      <chr>                                <int>
##  1 AUSTRALIA  " WELLBEING"                            14
##  2 AUSTRALIA  " COVID-19"                             10
##  3 AUSTRALIA  " INDIGENOUS HEALTH"                    10
##  4 AUSTRALIA  " PUBLIC HEALTH"                        10
##  5 AUSTRALIA  "ABORIGINAL HEALTH"                     10
##  6 AUSTRIA    " INCLUSION"                             2
##  7 AUSTRIA    " PHYSICAL EDUCATION"                    2
##  8 AUSTRIA    " SPECIAL EDUCATIONAL NEEDS"             2
##  9 AUSTRIA    "DIVERSITY"                              2
## 10 BANGLADESH " BANGLADESH"                            1

tail(keyword_by_country_all, n=10)

## # A tibble: 10 × 3
## # Groups:   AU_CO [1]
##    AU_CO DE                               keyword_count
##    <chr> <chr>                                    <int>
##  1 <NA>  " PEDAGOGY"                                  1
##  2 <NA>  " POLONAISE"                                 1
##  3 <NA>  " QUALITATIVE RESEARCH"                      1
##  4 <NA>  " RECOGNITION"                               1
##  5 <NA>  " SOCIAL DETERMINANTS OF HEALTH"             1
##  6 <NA>  " VISHWAGURU"                                1
##  7 <NA>  "ARABIC LANGUAGE"                            1
##  8 <NA>  "HEALTH CARE"                                1
##  9 <NA>  "INDIA"                                      1
## 10 <NA>  "POLISH DANCES"                              1
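`separate_rows(AU_CO, sep = ";")` produces one row per listed country. If a record lists the same country more than once (for example, once per co-author affiliation), each duplicate adds to that country's counts. Whether such duplicates occur depends on how AU_CO was extracted, so this is an assumption to check against the data. The base-R sketch below uses hypothetical AU_CO strings to contrast two approaches: counting every listed country, and counting each record once per country. In the tidyverse pipeline, a `distinct()` call after `separate_rows()` would play the role of `unique()` here.

```r
# Hypothetical AU_CO values for three records
au_co <- c("USA;USA;CANADA", "USA", "BRAZIL;USA")
per_record <- strsplit(au_co, ";", fixed = TRUE)

# Every listed country counts (what counting after separate_rows() does)
listed_counts <- table(unlist(per_record))

# Each record counts once per country
record_counts <- table(unlist(lapply(per_record, unique)))

listed_counts
record_counts
```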

4.5.1 Top keywords by country

# Identify top countries by publication count
top_countries <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  count(AU_CO, sort = TRUE) %>%
  slice_head(n = 5) %>%
  pull(AU_CO)

# Find top keywords for these countries
keyword_by_country_top <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  filter(AU_CO %in% top_countries) %>%
  group_by(AU_CO, DE) %>%
  summarize(keyword_count = n()) %>%
  group_by(AU_CO) %>%
  slice_max(keyword_count, n = 5) %>%
  arrange(AU_CO, desc(keyword_count))

## `summarise()` has grouped output by 'AU_CO'. You can override using the `.groups` argument.

head(keyword_by_country_top, n=17)

## # A tibble: 17 × 3
## # Groups:   AU_CO [3]
##    AU_CO     DE                           keyword_count
##    <chr>     <chr>                                <int>
##  1 AUSTRALIA " WELLBEING"                            14
##  2 AUSTRALIA " COVID-19"                             10
##  3 AUSTRALIA " INDIGENOUS HEALTH"                    10
##  4 AUSTRALIA " PUBLIC HEALTH"                        10
##  5 AUSTRALIA "ABORIGINAL HEALTH"                     10
##  6 BRAZIL    " RACE"                                  7
##  7 BRAZIL    " ILLEGAL FOSSIL TRADE"                  6
##  8 BRAZIL    " LATIN AMERICA"                         6
##  9 BRAZIL    " PALAEONTOLOGICAL HERITAGE"             6
## 10 BRAZIL    " PARACHUTE SCIENCE"                     6
## 11 BRAZIL    " RESEARCH ETHICS"                       6
## 12 BRAZIL    "SCIENTIFIC COLONIALISM"                 6
## 13 CANADA     <NA>                                   15
## 14 CANADA    " INDIGENOUS METHODOLOGIES"              9
## 15 CANADA    " INDIGENOUS PEOPLES"                    9
## 16 CANADA    " DECOLONIZATION"                        8
## 17 CANADA    "INDIGENOUS PEOPLES"                     8

tail(keyword_by_country_top, n=17)

## # A tibble: 17 × 3
## # Groups:   AU_CO [3]
##    AU_CO          DE                              keyword_count
##    <chr>          <chr>                                   <int>
##  1 CANADA         " INDIGENOUS METHODOLOGIES"                 9
##  2 CANADA         " INDIGENOUS PEOPLES"                       9
##  3 CANADA         " DECOLONIZATION"                           8
##  4 CANADA         "INDIGENOUS PEOPLES"                        8
##  5 UNITED KINGDOM  <NA>                                      10
##  6 UNITED KINGDOM " NATIONALISM"                              9
##  7 UNITED KINGDOM " AUTHORITARIANISM"                         7
##  8 UNITED KINGDOM " IMMIGRATION"                              7
##  9 UNITED KINGDOM " PANDEMIC"                                 7
## 10 UNITED KINGDOM " SOCIAL DOMINANCE ORIENTATION"             7
## 11 UNITED KINGDOM " THREAT"                                   7
## 12 UNITED KINGDOM "COVID-19"                                  7
## 13 USA             <NA>                                     127
## 14 USA            " RACISM"                                  54
## 15 USA            " STEM"                                    54
## 16 USA            " RACE"                                    42
## 17 USA            " HIGHER EDUCATION"                        41
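The output above lists " INDIGENOUS PEOPLES" (9) and "INDIGENOUS PEOPLES" (8) as separate keywords for Canada. Splitting on ";" leaves the space that follows each semicolon, so the same keyword is counted under two spellings. The base-R sketch below uses hypothetical records and a hypothetical `split_keywords()` helper to show that trimming whitespace merges the two spellings. In the tidyverse pipeline, `separate_rows(DE, sep = ";\\s*")` or `mutate(DE = str_trim(DE))` would be one way to apply the same fix.

```r
# Hypothetical records with "; "-separated author keywords
kw_records <- data.frame(
  AU_CO = c("CANADA", "CANADA"),
  DE = c("INDIGENOUS PEOPLES; DECOLONIZATION", "DECOLONIZATION; INDIGENOUS PEOPLES"),
  stringsAsFactors = FALSE
)

# split_keywords() is a hypothetical helper: one row per country-keyword pair
split_keywords <- function(df, trim = TRUE) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    keywords <- strsplit(df$DE[i], ";", fixed = TRUE)[[1]]
    if (trim) keywords <- trimws(keywords)  # remove the space left after each ";"
    data.frame(AU_CO = df$AU_CO[i], DE = keywords, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

count_keywords <- function(long) table(paste(long$AU_CO, long$DE, sep = " | "))

count_keywords(split_keywords(kw_records, trim = FALSE))  # four distinct spellings
count_keywords(split_keywords(kw_records, trim = TRUE))   # two keywords, each counted twice
```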

4.5.2 Plots of top keywords by country

# Extract keywords for these countries
keywords_by_country <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  filter(AU_CO %in% top_countries) %>%
  group_by(AU_CO, DE) %>%
  summarize(keyword_count = n(), .groups = 'drop') %>%
  group_by(AU_CO) %>%
  slice_max(keyword_count, n = 10) %>%  # Top 10 keywords per country
  ungroup()

# Remove NA keywords and filter out countries with no valid keywords
keywords_by_country <- keywords_by_country %>%
  filter(!is.na(DE) & DE != "NA" & str_trim(DE) != "")

# Create a separate plot for each country with counts on bars
plot_keywords_by_country <- function(country_data) {
  ggplot(country_data, aes(x = reorder(DE, keyword_count), y = keyword_count)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    geom_text(aes(label = keyword_count), 
              position = position_stack(vjust = 0.5), 
              color = "white", size = 3) +  # Add counts to the bars
    coord_flip() +
    labs(
      title = paste("Top Keywords for", unique(country_data$AU_CO)),
      x = "Keywords",
      y = "Keyword Count"
    ) +
    theme_minimal() +
    theme(
      axis.text.y = element_text(size = 8),
      plot.title = element_text(size = 10, face = "bold")
    )
}

# Split the data by country and create individual plots
country_plots <- keywords_by_country %>%
  group_split(AU_CO) %>%
  lapply(plot_keywords_by_country)

# Print or save the plots as needed
for (plot in country_plots) {
  print(plot)
}

4.5.3 Notions by Country

We first define the patterns, or notions, to be analyzed.

# Define patterns
patterns <- c("rac*", "nationali*", "xeno*", "colon*", "anti*", "white supremacy")

We then clean the data and prepare it for plotting.

## Warning: tolower argument is not used.

## Warning: There was 1 warning in `summarize()`.
## ℹ In argument: `across(everything(), sum, na.rm = TRUE)`.
## ℹ In group 1: `AU_CO = "AUSTRALIA"`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
## 
##   # Previously
##   across(a:b, mean, na.rm = TRUE)
## 
##   # Now
##   across(a:b, \(x) mean(x, na.rm = TRUE))

## Warning: tolower argument is not used.

## # A tibble: 17 × 13
##    AU_CO   colonialism racism  race nationalism racial `anti-white` `anti-immigrant` colonial `anti-asian` racialized nationalisms
##    <chr>         <dbl>  <dbl> <dbl>       <dbl>  <dbl>        <dbl>            <dbl>    <dbl>        <dbl>      <dbl>        <dbl>
##  1 AUSTRA…           0      1     0           0      0            1                0        0            0          0            0
##  2 BANGLA…           0      1     0           0      0            0                0        0            0          0            0
##  3 BRAZIL            0      0     0           0      0            0                0        1            0          0            0
##  4 CANADA            0      0     1           0      0            0                0        1            0          0            0
##  5 CHINA             0      0     0           0      0            0                0        0            0          0            0
##  6 COLOMB…           1      1     0           0      0            0                0        0            0          0            0
##  7 CROATIA           0      0     0           0      0            0                0        0            0          0            0
##  8 ICELAND           0      0     0           0      0            0                0        0            0          0            0
##  9 INDIA             0      0     0           0      0            0                0        0            0          0            1
## 10 ISRAEL            0      0     0           0      0            0                0        0            0          0            0
## 11 MEXICO            0      0     0           0      0            0                0        0            0          0            0
## 12 NORWAY            0      0     0           0      0            0                0        0            0          0            0
## 13 POLAND            0      0     0           0      1            0                0        0            0          0            0
## 14 SOUTH …           0      0     0           0      0            0                0        0            0          0            0
## 15 UNITED…           0      0     0           1      0            0                1        0            0          1            0
## 16 USA               0      5     8           1      6            0                0        0            2          2            0
## 17 VIETNAM           0      0     0           0      0            0                0        0            0          0            0
## # ℹ 1 more variable: `anti-blackness` <dbl>

## # A tibble: 17 × 38
##    AU_CO        `anti-black` racism racialized antiracist colonialism colonization colonizers  race nationalism racial anticipated
##    <chr>               <dbl>  <dbl>      <dbl>      <dbl>       <dbl>        <dbl>      <dbl> <dbl>       <dbl>  <dbl>       <dbl>
##  1 AUSTRALIA               0      8          0          0           0            0          0     0           0      0           0
##  2 BANGLADESH              0     10          0          0           0            0          0     0           0      0           0
##  3 BRAZIL                  0      0          0          0           3            0          0     0           0      0           0
##  4 CANADA                  1     11          1          0          12            1          0     5           0      6           0
##  5 CHINA                   0      1          0          0           0            0          0     0           0      0           0
##  6 COLOMBIA                0      1          1          0           3            1          1     0           0      0           0
##  7 CROATIA                 0      0          0          0           0            0          0     0           1      0           0
##  8 ICELAND                 0      0          0          0           0            0          0     0           1      0           0
##  9 INDIA                   0      0          0          0           0            0          0     0           3      0           0
## 10 ISRAEL                  0      1          0          0           0            0          0     0           0      1           0
## 11 MEXICO                  0      1          0          0           0            0          0     0           0      0           0
## 12 NORWAY                  0      2          0          0           0            0          0     0           0      0           0
## 13 POLAND                  0      0          0          0           1            2          0     0           0      1           0
## 14 SOUTH AFRICA            0      1          0          0           3            0          0     0           0      0           0
## 15 UNITED KING…            0      4          0          0           2            0          0     0           1      3           0
## 16 USA                     9     61         15          1           5            2          0    31           4     39           2
## 17 VIETNAM                 0      0          0          0           1            0          0     0           0      0           0
## # ℹ 26 more variables: racially <dbl>, racialization <dbl>, `anti-asian` <dbl>, racist <dbl>, `anti-racism` <dbl>,
## #   colonial <dbl>, racismin <dbl>, `anti-white` <dbl>, colonisation <dbl>, `anti-immigrant` <dbl>, `anti-access` <dbl>,
## #   racism.intervention <dbl>, colonizing <dbl>, races <dbl>, nationalities <dbl>, colonies <dbl>, `anti-racist` <dbl>,
## #   `race-neutral` <dbl>, `anti-blackness` <dbl>, antiasian <dbl>, `race-ethnicity` <dbl>, nationalist <dbl>, `anti-parsi` <dbl>,
## #   nationalisms <dbl>, `anti-dalhousie` <dbl>, `racism-related` <dbl>

## tibble [17 × 13] (S3: tbl_df/tbl/data.frame)
##  $ AU_CO         : chr [1:17] "AUSTRALIA" "BANGLADESH" "BRAZIL" "CANADA" ...
##  $ colonialism   : num [1:17] 0 0 0 0 0 1 0 0 0 0 ...
##  $ racism        : num [1:17] 1 1 0 0 0 1 0 0 0 0 ...
##  $ race          : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ nationalism   : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racial        : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-white    : num [1:17] 1 0 0 0 0 0 0 0 0 0 ...
##  $ anti-immigrant: num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ colonial      : num [1:17] 0 0 1 1 0 0 0 0 0 0 ...
##  $ anti-asian    : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racialized    : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ nationalisms  : num [1:17] 0 0 0 0 0 0 0 0 1 0 ...
##  $ anti-blackness: num [1:17] 0 0 0 0 0 0 0 0 0 0 ...

## tibble [17 × 38] (S3: tbl_df/tbl/data.frame)
##  $ AU_CO              : chr [1:17] "AUSTRALIA" "BANGLADESH" "BRAZIL" "CANADA" ...
##  $ anti-black         : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ racism             : num [1:17] 8 10 0 11 1 1 0 0 0 1 ...
##  $ racialized         : num [1:17] 0 0 0 1 0 1 0 0 0 0 ...
##  $ antiracist         : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ colonialism        : num [1:17] 0 0 3 12 0 3 0 0 0 0 ...
##  $ colonization       : num [1:17] 0 0 0 1 0 1 0 0 0 0 ...
##  $ colonizers         : num [1:17] 0 0 0 0 0 1 0 0 0 0 ...
##  $ race               : num [1:17] 0 0 0 5 0 0 0 0 0 0 ...
##  $ nationalism        : num [1:17] 0 0 0 0 0 0 1 1 3 0 ...
##  $ racial             : num [1:17] 0 0 0 6 0 0 0 0 0 1 ...
##  $ anticipated        : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racially           : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racialization      : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-asian         : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racist             : num [1:17] 1 0 0 1 0 0 0 0 0 0 ...
##  $ anti-racism        : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ colonial           : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ racismin           : num [1:17] 1 0 0 0 0 0 0 0 0 0 ...
##  $ anti-white         : num [1:17] 2 0 0 0 0 0 0 0 0 0 ...
##  $ colonisation       : num [1:17] 1 0 0 0 0 0 0 0 0 0 ...
##  $ anti-immigrant     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-access        : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racism.intervention: num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ colonizing         : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ races              : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ nationalities      : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ colonies           : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ anti-racist        : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ race-neutral       : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-blackness     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ antiasian          : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ race-ethnicity     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ nationalist        : num [1:17] 0 0 0 0 0 0 0 0 2 0 ...
##  $ anti-parsi         : num [1:17] 0 0 0 0 0 0 0 0 1 0 ...
##  $ nationalisms       : num [1:17] 0 0 0 0 0 0 0 0 1 0 ...
##  $ anti-dalhousie     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racism-related     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
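Only the warnings and two summary tables for this step appear above; the code that produced them does not. The base-R code below is a hedged sketch of the general idea. For each country, it counts the tokens that match each notion pattern and converts the counts to within-country proportions. This is one plausible reading of the `proportion` column used in the plots below, and that definition is an assumption. The abstracts and the `count_notion()` helper are hypothetical. Neither table above contains a white supremacy column. If the hidden step passed that pattern to quanteda without `phrase()`, the pattern would not match, because quanteda treats whitespace inside a pattern literally. The sketch therefore matches multiword notions as token sequences.

```r
# Hypothetical abstracts, each tagged with one author country
toy_docs <- data.frame(
  AU_CO = c("USA", "USA", "CANADA"),
  AB = c("Racism and racial equity in STEM.",
         "White supremacy and anti-Black racism in engineering.",
         "Colonialism, colonization, and Indigenous science."),
  stringsAsFactors = FALSE
)
toy_patterns <- c("rac*", "nationali*", "xeno*", "colon*", "anti*", "white supremacy")

# count_notion() is a hypothetical helper: glob match for one-word patterns,
# exact token-sequence match for multiword patterns
count_notion <- function(text, pattern) {
  toks <- strsplit(gsub("[^[:alnum:][:space:]-]", " ", tolower(text)), "[[:space:]]+")[[1]]
  toks <- toks[toks != ""]
  words <- strsplit(pattern, " ", fixed = TRUE)[[1]]
  if (length(words) == 1) return(sum(grepl(utils::glob2rx(pattern), toks)))
  n <- length(words)
  if (length(toks) < n) return(0L)
  starts <- seq_len(length(toks) - n + 1)
  sum(vapply(starts, function(i) all(toks[i:(i + n - 1)] == words), logical(1)))
}

# Documents x patterns, then summed by country
doc_counts <- sapply(toy_patterns, function(p)
  vapply(toy_docs$AB, count_notion, integer(1), pattern = p))
by_country <- rowsum(doc_counts, toy_docs$AU_CO)

# Within-country proportions (assumed definition)
prop <- by_country / rowSums(by_country)
by_country
round(prop, 2)
```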

4.5.3.1 Abstract patterns (top 10 notions)

Top 10 patterns in the abstracts.

# Create bar plot of top patterns
ggplot(top_patterns, aes(x = reorder(pattern, total_count), y = total_count)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Patterns in Abstracts",
    x = "Pattern",
    y = "Total Count"
  ) +
  theme_minimal()

4.5.3.2 Pattern occurrence by country

The pattern occurrence graphic illustrates the frequency of specific keywords related to race and social dynamics in the abstracts of academic publications, categorized by country. Each bar represents a keyword (such as “racism,” “colonialism,” or “nationalism”) and its corresponding count in the abstracts, allowing for a visual comparison of how often these themes are discussed across different countries. The bars are color-coded to indicate the country of the authors, making it easy to identify which countries are engaging with particular issues.

# Visualization of top patterns by country
ggplot(abstract_summary, aes(x = pattern, y = count, fill = AU_CO)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(
    title = "Pattern Occurrences by Country (AB)",
    x = "Pattern",
    y = "Count"
  ) +
  theme_minimal()

# Visualization of top patterns by country in abstracts with counts on bars (only if count > 3)
ggplot(abstract_summary, aes(x = pattern, y = count, fill = AU_CO)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = ifelse(count > 3, count, "")), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5,  # Adjust vertical position of text
            size = 3) +    # Adjust text size as needed
  coord_flip() +
  labs(
    title = "", # Pattern Occurrences by Country in Abstracts
    x = "Pattern",
    y = "Count"
  ) +
  theme_minimal()

The final graphic shows relative proportions for patterns with a count of one or more; labels appear only for proportions above 0.05.

# Visualization of top patterns by country in abstracts with proportions on bars
y <- ggplot(abstract_summary, aes(x = pattern, y = proportion, fill = AU_CO)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = ifelse(proportion > 0.05, round(proportion, 2), "")), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5,  # Adjust vertical position of text
            size = 3) +    # Adjust text size
  coord_flip() +
  labs(
    title = "", # Relative Proportions of Notions by Country (Abstract)
    x = "Pattern",
    y = "Proportion"
  ) +
  theme_minimal()

y

# Save the plot as a high-resolution image
ggsave("plots/pattern_occurence.png", plot = y, width = 10, height = 8, dpi = 1200, bg = "white")

References

Martin, Danny Bernard, Luz Valoyes-Chávez, and Paola Valero. 2024. “Race, Racism, and Racialization in Mathematics Education: Global Perspectives.” Educational Studies in Mathematics 116 (3): 313–31.

Nxumalo, Fikile, and Wanja Gitari. 2021. “Introduction to the Special Theme on Responding to Anti-Blackness in Science, Mathematics, Technology and STEM Education.” Canadian Journal of Science, Mathematics and Technology Education 21: 226–31.

Vakil, Sepehr, and Rick Ayers. 2019. “The Racial Politics of STEM Education in the USA: Interrogations and Explorations.” Race Ethnicity and Education 22 (4): 449–58.

¹ Department of Curriculum and Instruction. Corresponding author. E-mail: nathan.alexander@howard.edu
² Department of Higher Education Leadership and Policy Studies
³ Department of Psychology

Examining Notions of Racism in STEM: A Quantitative Historical Analysis

Nathan Alexander, Howard University¹

Qyana Stewart, Howard University²

Basil Ghali, Howard University³

1 RESEARCH QUESTIONS

2 METHOD

2.1 DATA

2.1.1 Scoping

2.1.1.1 Notions of racism and STEM

2.1.1.1.1 ALL=(racism AND STEM)

2.1.1.1.2 ALL=(“white supremacy” AND STEM)

2.1.1.1.3 ALL=(nationalism AND STEM)

2.1.1.1.4 ALL=(xenophobia AND STEM)

2.1.1.1.5 ALL=(colonialism AND STEM)

2.1.1.1.6 ALL=(antiasian AND STEM)

2.1.1.1.7 ALL=(anti-Asian AND STEM)

2.1.1.1.8 ALL=(antiblack* AND STEM)

2.1.1.1.9 ALL=(anti-Black* AND STEM)

2.1.1.2 Document search and inclusion

2.1.2 Review

2.1.3 Reduction

3 ANALYTIC FRAMEWORK

4 FINDINGS

4.1 Descriptive (performance) analysis

4.1.1 Publication-related metrics

4.1.1.1 Main information

4.1.1.2 Publications by year

4.1.1.3 Most productive authors

4.1.1.4 Most cited papers

4.1.1.5 Main sources

4.1.2 Citation-related metrics

4.1.2.1 Most frequently cited documents

4.2 Global Structure

4.2.1 Most productive countries

4.2.2 Total citations by country

4.2.3 Top countries by publication count

4.2.4 Country collaboration network

4.2.5 Keywords by country

4.2.6 Top keywords by country

4.2.7 Status of publications by countries

4.2.8 Keywords and Keywords Plus

4.2.8.1 Author Keywords and Keywords-Plus

4.2.8.2 Keyword Occurence Network

4.2.9 Conceptual Structure Map

4.3 Conceptual Structure

4.3.1 Top token frequencies

4.3.2 Keywords-in-Context

4.3.3 Document feature matrix

4.3.4 Feature co-occurrence matrix (FCM)

4.3.4.1 Network of most co-occurring words

4.4 Thematic Map and Frequency Analysis

4.4.1 Clusters

4.4.2 Frequency Analysis

4.4.2.1 Rac*

4.4.2.2 Nationali*

4.4.2.3 Xeno

4.4.2.4 Colonial*

4.4.2.5 Anti*

4.4.2.6 Discrim*

4.4.2.7 Merging two frequency analyses

4.4.2.8 Merging multiple frequency analyses

4.5 Analysis of Notions at Country-level

4.5.1 Top keywords by country

4.5.2 Plots of top keywords by country

4.5.3 Notions by Country

4.5.3.1 Abstract patterns (top 10 notions)

4.5.3.2 Pattern occurrence by country

References
