We provide here the technical documentation for the forthcoming paper: Alexander, N. N., Stewart, Q., & Basil, G. (under review). Examining Notions of Racism in STEM: A Quantitative Historical Analysis.

1 RESEARCH QUESTIONS

  1. What is the intellectual and conceptual structure of research on racism in science, technology, engineering, and mathematics (STEM)?

  2. How are notions of racism in the research on STEM distributed across different racialized social systems?

2 METHOD

The primary goals of this study were (1) to frame and understand the intellectual and conceptual structure of research on racism in STEM and (2) to analyze the different notions of racism and their geographical distribution. This study is based on notes from three special issue collections that analyze different conceptual foundations of racism in STEM:

(1) Martin, Valoyes-Chávez, and Valero (2024), who describe a set of notions and features of racialized social systems related to the geopolitical contexts of racism in mathematics education research.

(2) Vakil and Ayers (2019), who discuss approaches to balancing the various complexes (the STEM industrial complex, militarism, etc.) that have been used to position STEM as a savior for structural problems.

(3) Nxumalo and Gitari (2021), who explore the possibility of STEM for liberatory purposes.

These three issues enter the discussion of science, technology, engineering, and mathematics education from different theoretical entry points.

Drawing on our identification of various national anti-discrimination laws, we put these theoretical entry points and notions of racism into conversation. This lets us explore how a set of notions is used differently across geopolitical contexts. Specifically, we draw on the note by Martin, Valoyes-Chávez, and Valero (2024) to confirm the different sociopolitical discourses across a set of racialized social systems.

We use the bibliometrix and quanteda R packages to analyze the citation records and test the study's hypothesis. The bibliometrix package was used for the summary (performance) analysis and for science mapping. The resulting scientific and network maps informed the final synthesis and the analysis of textual data with the quanteda package. In this second phase of data analysis, priority was placed on how the various notions were used and conceptualized by geographical region. In the terms of Martin, Valoyes-Chávez, and Valero (2024), these regions are the “racialized social systems” that we analyzed in STEM education.

2.1 DATA

We follow the framework steps (scoping, review, and reduction) to acquire the data for the study, and we use the PRISMA guidelines to scope and review the data.

2.1.1 Scoping

The data for the study come from the Web of Science (WoS) Core Collection. Our initial scoping process included a set of iterative steps to make sense of the global research literature on the various notions of racism in STEM. Our searches prioritized three citation indexes (SCI-EXPANDED, SSCI, and A&HCI) and covered the period from 2014 to 2024. Our analysis focused on journal articles written in English in the Education, Special Education, and related Education Scientific Disciplines categories.

  • Science Citation Index Expanded, SCI-EXPANDED (2002-present)
  • Social Sciences Citation Index, SSCI (2002-present)
  • Arts & Humanities Citation Index, A&HCI (2002-present)
  • Emerging Sources Citation Index, ESCI (2012-present)

Timespan: 2014-01-01 to 2024-12-31

Document Types: Article

2.1.1.1 Notions of racism and STEM

Our exploratory search process ran over four months. It was used primarily to develop the code and software needed to analyze the final data set for the study. These initial search parameters were tied to the conceptual components of the analysis. The parameters sometimes varied during this stage as we made sense of the initial structure of the data and estimated record counts.

2.1.1.1.1 ALL=(racism AND STEM)

https://www.webofscience.com/wos/woscc/summary/d56f4376-da3f-4860-bef4-fc1f3b43dd98-01412dcc90/relevance/1

Returned 207 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).

2.1.1.1.2 ALL=(“white supremacy” AND STEM)

https://www.webofscience.com/wos/woscc/summary/ab11e0dc-99e5-48a4-b6ab-a32a5d8dcfaf-01415ff5dd/relevance/1

Returned 15 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).

2.1.1.1.3 ALL=(nationalism AND STEM)

https://www.webofscience.com/wos/woscc/summary/605ee34a-413f-4752-9f65-0872f23867b6-0141601cc5/relevance/1

Returned 32 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).

2.1.1.1.4 ALL=(xenophobia AND STEM)

https://www.webofscience.com/wos/woscc/summary/3da3d792-2897-42e8-acb2-be528a92b9a0-0141603068/relevance/1

Returned 6 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).

2.1.1.1.5 ALL=(colonialism AND STEM)

https://www.webofscience.com/wos/woscc/summary/81abe434-a7c3-431d-a876-d7b12235fdbb-0141604271/relevance/1

Returned 42 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).

2.1.1.1.6 ALL=(antiasian AND STEM)

https://www.webofscience.com/wos/woscc/summary/4f4bf54e-fdd6-4e51-b423-7ab4a544e1e0-014160d61d/relevance/1

Returned 1 result from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).

2.1.1.1.7 ALL=(anti-Asian AND STEM)

https://www.webofscience.com/wos/woscc/summary/f0d26c1f-993e-47da-8cc7-3ec8f537e2f3-014160dea4/relevance/1

Returned 4 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).

2.1.1.1.8 ALL=(antiblack* AND STEM)

https://www.webofscience.com/wos/woscc/summary/fd9fdc18-3a2d-4800-942e-f4c0138117d4-014160e8b1/relevance/1

Returned 3 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).

2.1.1.1.9 ALL=(anti-Black* AND STEM)

https://www.webofscience.com/wos/woscc/summary/1d44f59a-bd5c-4924-94ed-b2f8922ca6e6-0141611000/relevance/1

Returned 17 results from Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI).
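The exploratory counts above can be collected into one table for comparison. This is a minimal sketch; the hit counts are copied from the searches listed above. Pooling the spelling variants (antiasian/anti-Asian, antiblack*/anti-Black*) is our illustrative choice. Because the queries can return overlapping records, pooled values and the overall sum are upper bounds, not unique-record counts.

```r
# Hit counts from the exploratory Web of Science searches reported above
scoping <- data.frame(
  query  = c("racism", "white supremacy", "nationalism", "xenophobia",
             "colonialism", "antiasian", "anti-Asian", "antiblack*", "anti-Black*"),
  notion = c("racism", "white supremacy", "nationalism", "xenophobia",
             "colonialism", "anti-Asian", "anti-Asian", "anti-Black", "anti-Black"),
  hits   = c(207, 15, 32, 6, 42, 1, 4, 3, 17)
)

# Pool spelling variants by notion; records may overlap across queries,
# so pooled values are upper bounds rather than unique-record counts
pooled <- aggregate(hits ~ notion, data = scoping, FUN = sum)
pooled <- pooled[order(-pooled$hits), ]
print(pooled, row.names = FALSE)
```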


2.1.1.2 Document search and inclusion

The researchers conducted a final search for bibliometric data based on findings from the scoping process (the series of initial searches). This search was later modified during the review and revision of the manuscript to address important considerations raised by the reviewers. The study applied a final set of inclusion criteria, outlined in the table below.

Inclusion and exclusion criteria for the study
Code Criteria
IC1 Article contains STEM and one of the notions in the title (TI) or abstract (AB): racism, “white supremacy,” colonialism, xenophobia, nationalism, antiasian, anti-Asian[*], antiblack, Anti-Black[*]
IC2 Article published between 2014 and 2024
IC3 Article originally written in English
IC4 Article is a journal article
IC5 Article purpose or core questions center on the topical subjects of analysis
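IC1 can be checked programmatically as a first pass, before the manual screening for topical focus that IC5 requires. Below is a minimal sketch that assumes a data frame with WoS-style TI and AB columns. The records are invented for illustration. The regular expressions are our approximation of the wildcard terms in IC1, not the Web of Science query syntax. "STEM" is matched case-sensitively so that botanical uses of "stem" are not counted.

```r
# Hypothetical records with the WoS field tags used in this document
records <- data.frame(
  TI = c("Racism in STEM pathways", "Teacher wellbeing", "Colonialism and science"),
  AB = c("We study STEM classrooms.", "A STEM survey on anti-Black bias.",
         "A history of museums."),
  stringsAsFactors = FALSE
)

# Notions from IC1 as regular expressions; "anti-?" covers both spellings,
# and the open ending acts like the trailing wildcard (e.g., antiblackness)
notions <- c("racism", "white supremacy", "colonialism", "xenophobia",
             "nationalism", "anti-?asian", "anti-?black")
notion_re <- paste0("\\b(", paste(notions, collapse = "|"), ")")

# IC1: "STEM" and at least one notion appear in the title or abstract
text <- paste(records$TI, records$AB)
records$IC1 <- grepl("\\bSTEM\\b", text, perl = TRUE) &
  grepl(notion_re, text, ignore.case = TRUE, perl = TRUE)
records$IC1
```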

2.1.2 Review

Key Columns of Interest:

  • AU: Authors of the publication

  • AB: Abstract text

  • TI: Title of the publication

  • AU_CO: Countries of the authors

  • SC: Subject categories (e.g., “Education & Educational Research”)

  • PY: Publication year

  • TC: Total citations

2.1.3 Reduction

We then reviewed the data frame to remove both the columns we did not need and the articles that did not meet the inclusion criteria. The data come from a total of 338 sources. Records were removed according to the inclusion and exclusion criteria. For example, one paper was removed based on publication year (PY), and nine papers were removed because they did not meet the document type (DT) criterion. The final data set contained 420 records.
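The reduction step can be sketched in base R: keep the columns of interest, then apply IC2 (publication year) and IC4 (document type), logging how many records each filter removes. This is a minimal sketch under stated assumptions. The records and the unused `NR` column are invented. Real exports may store combined document types in DT, in which case a pattern match may suit better than the exact comparison shown here.

```r
# Hypothetical bibliographic records; PY and DT follow WoS field tags
recs <- data.frame(
  AU = c("A J", "B K", "C L", "D M"),
  TI = c("T1", "T2", "T3", "T4"),
  PY = c(2013, 2016, 2020, 2024),
  DT = c("ARTICLE", "ARTICLE", "EDITORIAL MATERIAL", "ARTICLE"),
  TC = c(5, 12, 0, 3),
  NR = c(40, 55, 2, 30),   # a column we do not need
  stringsAsFactors = FALSE
)

# Drop columns that are not needed for the analysis
keep_cols <- c("AU", "TI", "PY", "DT", "TC")
recs <- recs[, keep_cols]

# IC2: publication year between 2014 and 2024
n_start <- nrow(recs)
recs <- recs[recs$PY >= 2014 & recs$PY <= 2024, ]
removed_py <- n_start - nrow(recs)

# IC4: journal articles only
n_mid <- nrow(recs)
recs <- recs[recs$DT == "ARTICLE", ]
removed_dt <- n_mid - nrow(recs)

cat("Removed by PY:", removed_py, "| removed by DT:", removed_dt,
    "| remaining:", nrow(recs), "\n")
```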

3 ANALYTIC FRAMEWORK

4 FINDINGS

4.1 Descriptive (performance) analysis

Summary of the data set and documents.

4.2 Global Structure

4.2.1 Most productive countries

S[7] # Most productive countries
## $MostProdCountries
##           Country Articles    Freq SCP MCP MCP_Ratio
## 1  USA                 239 0.57869 225  14    0.0586
## 2  UNITED KINGDOM       34 0.08232  30   4    0.1176
## 3  CANADA               30 0.07264  25   5    0.1667
## 4  AUSTRALIA            17 0.04116  13   4    0.2353
## 5  CHINA                 8 0.01937   8   0    0.0000
## 6  SOUTH AFRICA          8 0.01937   7   1    0.1250
## 7  BRAZIL                6 0.01453   4   2    0.3333
## 8  INDIA                 5 0.01211   5   0    0.0000
## 9  IRELAND               5 0.01211   4   1    0.2000
## 10 CROATIA               4 0.00969   4   0    0.0000
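bibliometrix splits each country's articles into single-country publications (SCP, with all co-authors from that country) and multiple-country publications (MCP, with at least one co-author from another country). As we understand it, each article is attributed to its corresponding author's country. MCP_Ratio is MCP divided by Articles. The check below recomputes the ratio from the table values. It is arithmetic on the reported numbers, not a call into bibliometrix.

```r
# Values copied from the MostProdCountries output above
countries <- c("USA", "UNITED KINGDOM", "CANADA", "AUSTRALIA", "CHINA",
               "SOUTH AFRICA", "BRAZIL", "INDIA", "IRELAND", "CROATIA")
scp <- c(225, 30, 25, 13, 8, 7, 4, 5, 4, 4)
mcp <- c(14, 4, 5, 4, 0, 1, 2, 0, 1, 0)

articles  <- scp + mcp                 # every article is either SCP or MCP
mcp_ratio <- round(mcp / articles, 4)  # share of internationally co-authored articles

data.frame(Country = countries, Articles = articles, MCP_Ratio = mcp_ratio)
```

Read this way, Brazil (0.3333) and Australia (0.2353) have the highest shares of internationally co-authored work among the ten countries, while the USA's share is about 6%.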

4.2.2 Total citations by country

S[8] # Country citations
## $TCperCountries
##      Country      Total Citations Average Article Citations
## 1  USA                       3143                     13.15
## 2  UNITED KINGDOM             400                     11.76
## 3  CANADA                     294                      9.80
## 4  AUSTRALIA                  222                     13.06
## 5  BRAZIL                      82                     13.67
## 6  ESTONIA                     79                     79.00
## 7  ISRAEL                      62                     20.67
## 8  ECUADOR                     58                     58.00
## 9  INDIA                       58                     11.60
## 10 CHINA                       46                      5.75
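Average Article Citations is Total Citations divided by the number of articles attributed to a country. Dividing the two columns back recovers the implied article counts. For the countries that also appear in the most productive countries table, these match the Articles column above. The implied counts also show that the high averages for Estonia (79.00) and Ecuador (58.00) each rest on a single article. The sketch below is arithmetic on the reported values only.

```r
# Values copied from the TCperCountries output above
tc  <- c(USA = 3143, `UNITED KINGDOM` = 400, CANADA = 294, AUSTRALIA = 222,
         BRAZIL = 82, ESTONIA = 79, ISRAEL = 62, ECUADOR = 58, INDIA = 58, CHINA = 46)
avg <- c(13.15, 11.76, 9.80, 13.06, 13.67, 79.00, 20.67, 58.00, 11.60, 5.75)

# Average Article Citations = Total Citations / Articles, so the implied
# article count is the (rounded) ratio of the two columns
implied_articles <- round(tc / avg)
implied_articles
```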
# country scientific collaboration
# Create a country collaboration network
M4_tags <- metaTagExtraction(M4, Field = "AU_CO", sep = ";")
NetMatrix2 <- biblioNetwork(M4_tags, analysis = "collaboration", network = "countries", sep = ";")

4.2.3 Top countries by publication count

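library(dplyr)
library(tidyr)
library(ggplot2)

# Country distribution: AU_CO can repeat a country for each co-author,
# so count each record at most once per country
country_distribution <- M4_tags %>%
  mutate(doc_id = row_number()) %>%
  separate_rows(AU_CO, sep = ";") %>%
  distinct(doc_id, AU_CO) %>%
  count(AU_CO) %>%
  slice_max(n, n = 10) %>%
  ggplot(aes(x = reorder(AU_CO, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top Countries by Publication Count",
    x = "Country",
    y = "Number of Publications"
  )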

country_distribution

4.2.4 Country collaboration network

# Plot the network
net2a = networkPlot(NetMatrix2, 
                  n = 10, 
                  Title = "Country Collaboration", 
                  type = "fruchterman", 
                  size = TRUE, 
                  remove.multiple = FALSE,
                  labelsize = 0.8,
                  cluster = "none")

4.2.5 Keywords by country

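# Count distinct author keywords (DE) per country; DE entries are trimmed
# because splitting on ";" leaves leading spaces
country_keywords <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  mutate(DE = trimws(DE)) %>%
  filter(!is.na(DE), DE != "") %>%
  distinct(AU_CO, DE) %>%
  count(AU_CO, name = "unique_keywords")

# Bar plot of the ten countries with the most unique keywords
ggplot(country_keywords %>% slice_max(unique_keywords, n = 10),
       aes(x = reorder(AU_CO, unique_keywords), y = unique_keywords)) +
  geom_col(fill = "skyblue") +
  coord_flip() +
  labs(
    title = "Countries by Number of Unique Keywords",
    x = "Country",
    y = "Number of Unique Keywords"
  ) +
  geom_text(
    aes(label = unique_keywords),
    hjust = -0.3,
    size = 3
  ) +
  theme_minimal()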

# For each country, keep the author keywords (DE) that account for more
# than 5% of that country's keyword occurrences
extract_country_keywords <- function(data) {
  data %>%
    separate_rows(AU_CO, sep = ";") %>%   # one row per author country
    separate_rows(DE, sep = ";") %>%      # one row per author keyword
    group_by(AU_CO, DE) %>%
    summarise(
      keyword_count = n(),
      .groups = "drop"
    ) %>%
    group_by(AU_CO) %>%
    mutate(
      country_total_keywords = sum(keyword_count),
      keyword_percentage = (keyword_count / country_total_keywords) * 100
    ) %>%
    arrange(AU_CO, desc(keyword_percentage)) %>%
    filter(keyword_percentage > 5)  # keywords above a 5% share
}

# Apply the function to M4_tags
M4_tags_country <- extract_country_keywords(M4_tags)

4.2.6 Top keywords by country

Top keywords for each country, limited to keywords that appear four or more times.

# View top keywords for each country
M4_tags_country %>% 
  filter(keyword_count > 3) %>% 
  print()
## # A tibble: 23 × 5
## # Groups:   AU_CO [5]
##    AU_CO     DE                           keyword_count country_total_keywords keyword_percentage
##    <chr>     <chr>                                <int>                  <int>              <dbl>
##  1 AUSTRALIA " WELLBEING"                            14                    176               7.95
##  2 AUSTRALIA " COVID-19"                             10                    176               5.68
##  3 AUSTRALIA " INDIGENOUS HEALTH"                    10                    176               5.68
##  4 AUSTRALIA " PUBLIC HEALTH"                        10                    176               5.68
##  5 AUSTRALIA "ABORIGINAL HEALTH"                     10                    176               5.68
##  6 BRAZIL    " RACE"                                  7                     83               8.43
##  7 BRAZIL    " ILLEGAL FOSSIL TRADE"                  6                     83               7.23
##  8 BRAZIL    " LATIN AMERICA"                         6                     83               7.23
##  9 BRAZIL    " PALAEONTOLOGICAL HERITAGE"             6                     83               7.23
## 10 BRAZIL    " PARACHUTE SCIENCE"                     6                     83               7.23
## # ℹ 13 more rows
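Most keywords in the output above carry a leading space (" WELLBEING"), but "ABORIGINAL HEALTH" does not. This suggests that DE separates keywords with "; " and that splitting on ";" keeps the space on every keyword except the first in a record. If so, the same keyword could be counted as two features depending on its position. The sketch below uses invented keyword strings to show the effect and the `trimws()` fix. It does not rerun the analysis above.

```r
# Invented author-keyword strings in the "A; B" style seen in DE
kw_raw <- unlist(strsplit(c("ABORIGINAL HEALTH; WELLBEING",
                            "WELLBEING; PUBLIC HEALTH"), ";"))

table(kw_raw)          # " WELLBEING" and "WELLBEING" are counted separately
table(trimws(kw_raw))  # trimming merges them into a single keyword
```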
library(dplyr)
library(tidyr)
library(knitr)

# Summarize publications by country, counting each record once per country
country_summary <- M4_tags %>%
  mutate(doc_id = row_number()) %>%
  separate_rows(AU_CO, sep = ";") %>%
  distinct(doc_id, AU_CO, .keep_all = TRUE) %>%
  group_by(AU_CO) %>%
  summarise(
    article_count = n(),
    # AU lists all authors of a record; split it to count distinct authors
    # across the country's records (not only authors based in that country)
    unique_authors = n_distinct(trimws(unlist(strsplit(AU, ";")))),
    total_citations = sum(TC, na.rm = TRUE)
  ) %>%
  arrange(desc(article_count))

# Create a kable table for visualization
country_summary %>%
  kable(
    col.names = c("Country", "Article Count", "Unique Authors", "Total Citations"),
    caption = "Publication Summary by Country"
  )
# Prepare data for top countries
top_countries_keywords <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  mutate(DE = trimws(DE)) %>%   # remove the leading space left by "; " separators
  filter(!is.na(DE), DE != "") %>%
  group_by(AU_CO, DE) %>%
  summarise(
    keyword_count = n(),
    .groups = "drop"
  ) %>%
  # Keep the ten most productive countries reported in S[7]
  filter(AU_CO %in% c("USA", "UNITED KINGDOM", "CANADA", "AUSTRALIA",
                      "CHINA", "SOUTH AFRICA", "BRAZIL", "INDIA",
                      "IRELAND", "CROATIA")) %>%
  group_by(AU_CO) %>%
  slice_max(order_by = keyword_count, n = 3) %>%
  arrange(AU_CO, desc(keyword_count))

# Create a table
kable(top_countries_keywords, 
      col.names = c("Country", "Keyword", "Keyword Count"),
      caption = "Top Keywords by Country")
# Create a visualization
ggplot(top_countries_keywords, 
       aes(x = AU_CO, y = keyword_count, fill = DE)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Top Keywords by Country",
    x = "Country",
    y = "Keyword Count",
    fill = "Keywords"
  ) +
  theme_minimal() +
  coord_flip()

4.2.7 Status of publications by country

country.counts <- as.data.frame(S[7])    # Article counts by country
country.citations <- as.data.frame(S[8]) # Citations by country

# Full outer join on the first (country) column of each table
country <- merge(country.counts, country.citations,
                 by.x = 1, by.y = 1, all = TRUE)
names(country) <- c("Country", "Articles", "Freq", "SCP", "MCP", "MCP_Ratio", "Total_Citations", "Average_Citations")

# na.omit() keeps only countries present in both top-10 tables
country_summary <- country %>%
  arrange(desc(Articles)) %>%
  na.omit()

country_summary
##          Country Articles    Freq SCP MCP MCP_Ratio Total_Citations Average_Citations
## 1 USA                 239 0.57869 225  14    0.0586            3143             13.15
## 2 UNITED KINGDOM       34 0.08232  30   4    0.1176             400             11.76
## 3 CANADA               30 0.07264  25   5    0.1667             294              9.80
## 4 AUSTRALIA            17 0.04116  13   4    0.2353             222             13.06
## 5 CHINA                 8 0.01937   8   0    0.0000              46              5.75
## 7 BRAZIL                6 0.01453   4   2    0.3333              82             13.67
## 8 INDIA                 5 0.01211   5   0    0.0000              58             11.60

4.2.8 Keywords and Keywords Plus

4.2.8.1 Author Keywords and Keywords-Plus

S[10] # Author Keywords and Keywords-Plus
## $MostRelKeywords
##    Author Keywords (DE)      Articles Keywords-Plus (ID)     Articles
## 1           RACISM                 45            RACE              54
## 2           RACE                   27            EXPERIENCES       37
## 3           STEM                   24            SCIENCE           34
## 4           NATIONALISM            23            WOMEN             34
## 5           HIGHER EDUCATION       17            EDUCATION         33
## 6           COVID-19               15            IDENTITY          28
## 7           COLONIALISM            14            STUDENTS          28
## 8           DIVERSITY              14            HEALTH            26
## 9           GENDER                 14            COLOR             20
## 10          EQUITY                 13            GENDER            20

4.2.8.2 Keyword Co-Occurrence Network

# Classical keyword co-occurrences network
NetMatrix1 <- biblioNetwork(M4, analysis = "co-occurrences", network = "keywords", sep = ";")

# statistics for the network
netstat1 <- networkStat(NetMatrix1)
summary(netstat1, k=10)
## 
## 
## Main statistics about the network
## 
##  Size                                  749 
##  Density                               0.016 
##  Transitivity                          0.262 
##  Diameter                              6 
##  Degree Centralization                 0.243 
##  Average path length                   2.849 
## 
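Density is the share of possible ties that are present. For an undirected network with N nodes and E edges, it is 2E / (N(N - 1)). Read this way, the reported density of 0.016 means that roughly 1.6% of possible keyword pairs co-occur. The toy network below is invented. We assume `networkStat()` follows this standard undirected definition.

```r
# Toy undirected co-occurrence network: 5 keywords, symmetric 0/1 adjacency
kw <- c("race", "racism", "stem", "identity", "women")
adj <- matrix(0, 5, 5, dimnames = list(kw, kw))
edges <- rbind(c("race", "racism"), c("race", "stem"),
               c("racism", "stem"), c("identity", "women"))

# Convert keyword pairs to numeric (row, column) indices and fill both halves
idx <- cbind(match(edges[, 1], kw), match(edges[, 2], kw))
adj[idx] <- 1
adj[idx[, 2:1]] <- 1

n_nodes <- nrow(adj)
n_edges <- sum(adj[upper.tri(adj)])      # each undirected edge counted once
net_density <- 2 * n_edges / (n_nodes * (n_nodes - 1))
net_density
```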
# Plot the network
set.seed(3)
net1a = networkPlot(NetMatrix1, 
                   n = 25,  # Limit to top 25 keywords
                   normalize = "association",
                   Title = "Top Keyword Co-Occurrences", 
                   type = "circle", 
                   size = TRUE, 
                   remove.multiple = FALSE,
                   labelsize = 0.7,
                   cluster = "none")

net1b = networkPlot(NetMatrix1, 
                   n = 30,  # Top 30 keywords
                   normalize = "association",
                   Title = "Keyword Network", 
                   type = "kamada", 
                   size = TRUE, 
                   remove.multiple = TRUE,
                   labelsize = 0.5,
                   cluster = "louvain")

net1c = networkPlot(NetMatrix1, 
                   n = 30,  # Top 30 keywords
                   #normalize = "association",
                   #weighted = T,
                   Title = "Keyword Co-Occurrence Network", 
                   type = "fruchterman", 
                   size = TRUE, 
                   remove.multiple = TRUE,
                   labelsize = 0.5,
                   cluster = "louvain")

# Save the plot as a high-resolution image with white background
ggsave("plots/keyword_co_occurrence_network.png", plot = net1c$plot, 
       width = 10, height = 8, dpi = 1200, bg = "white")

4.2.9 Conceptual Structure Map

4.3 Conceptual Structure

suppressWarnings(CS1 <- conceptualStructure(M4_tags,
                                            method="MCA", 
                                            field="ID", 
                                            minDegree=15, 
                                            clust=5, 
                                            stemming=FALSE, 
                                            labelsize=15,
                                            documents=20)
                 )

# Conceptual Structure using keywords (method="CA")
CS <- conceptualStructure(M4, 
                           field="ID", 
                           method="CA", 
                           minDegree=4, 
                           clust=5, 
                           stemming=FALSE, 
                           labelsize=10,  # Set to 0 to remove labels
                           documents=10)

# Extract coordinates and clusters
coords <- CS[[1]]  # Coordinates
clusters <- CS[[2]]  # Cluster assignments

CS[4]

# Create a historical citation network
options(width=130)
histResults <- histNetwork(M4, min.citations = 5, sep = ";")
# Plot the historical direct citation network (historiograph)
net <- histPlot(histResults, n=15, size = 8, labelsize=4)

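# LEXICAL PATTERNS
library(quanteda)
library(quanteda.textstats)  # textstat_frequency()
library(quanteda.textplots)  # textplot_network()

# Corpus of abstracts, tokenized as-is (used for keywords in context)
M4_abstract <- corpus(M4$AB)
toks <- tokens(M4_abstract)

# Cleaned tokens: punctuation, numbers, and English stop words removed
toks_clean <- tokens(toks, 
                     remove_punct = TRUE, 
                     remove_numbers = TRUE) %>%
  tokens_remove(stopwords("english"))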

4.3.1 Top token frequencies

# Top token frequencies across all abstracts
top_tokens <- toks_clean %>%
  dfm() %>%
  textstat_frequency(n = 20)

# Drop generic academic terms from the top tokens
top_tokens %>%
  filter(!feature %in% c("research", "study", "article", "also", "can"))
##        feature frequency rank docfreq group
## 1         stem       470    1     187   all
## 2       racism       417    2     238   all
## 3        black       394    3     115   all
## 4       health       297    4      76   all
## 5     students       294    5      89   all
## 8       social       240    8     128   all
## 10      racial       216   10     110   all
## 11 experiences       209   11     107   all
## 12       women       200   12      61   all
## 13       white       190   13      90   all
## 14   education       173   14      87   all
## 15     science       172   15      76   all
## 17 nationalism       152   17      82   all
## 18  indigenous       151   18      45   all
## 19        race       144   19      88   all
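`textstat_frequency()` reports two counts for each feature: `frequency` (total occurrences) and `docfreq` (the number of abstracts that contain the feature). The contrast matters when reading the table above. "black" occurs 394 times in 115 abstracts, and "racism" occurs 417 times in 238 abstracts. So "black" recurs more often within the abstracts that use it (about 3.4 times per abstract versus about 1.8). The base R sketch below uses invented abstracts to show how the two counts differ. It is not quanteda's implementation.

```r
# Three toy "abstracts", already tokenized and lowercased
abstracts <- list(
  c("stem", "racism", "stem", "students"),
  c("racism", "black", "women"),
  c("stem", "black", "black")
)

frequency <- table(unlist(abstracts))                # total occurrences
docfreq   <- table(unlist(lapply(abstracts, unique))) # abstracts containing the term

freq_table <- data.frame(
  feature   = names(frequency),
  frequency = as.vector(frequency),
  docfreq   = as.vector(docfreq[names(frequency)])
)
freq_table[order(-freq_table$frequency), ]
```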

4.3.2 Keywords-in-Context

kw_antiblack <- kwic(toks, pattern =  "anti-Black*")
kw_antiblack <- kw_antiblack %>% 
  rbind(kwic(toks, pattern = "antiblack*"))
head(kw_antiblack, 10)

kw_sup <- kwic(toks, pattern =  phrase("White Supremacy"))
head(kw_sup, 10)

kw_nationalism <- kwic(toks, pattern =  phrase("nationalism"))
head(kw_nationalism, 10)

kw_colonial <- kwic(toks, pattern =  phrase("colonial*"))
head(kw_colonial, 10)

kw_xeno <- kwic(toks, pattern =  phrase("xenophobi*"))
head(kw_xeno, 10)
kw_multiword_antiblack <- kwic(toks, pattern = phrase(c("anti-black*", "antiblack*")))
head(kw_multiword_antiblack, 10)

Lexical dispersion plots

4.3.3 Document feature matrix

Build a document-feature matrix from the cleaned tokens (punctuation, numbers, and stop words already removed).

dfmat_notions <- dfm(toks_clean)
# print(dfmat_notions)

Remove any remaining English stop words and convert the matrix to a data frame of token counts per abstract.

dfmat_notions_nostop <- dfm_select(dfmat_notions, pattern = stopwords("en"), selection = "remove")
# print(dfmat_notions_nostop)
notions.df <- convert(dfmat_notions_nostop, to = "data.frame")

4.3.4 Feature co-occurrence matrix (FCM)

toks_abstract <- tokens(M4$AB, remove_punct = TRUE)
dfmat_AB <- dfm(toks_abstract)
dfmat_AB <- dfm_remove(dfmat_AB, pattern = c(stopwords("en"), "*-time", "updated-*", "gmt", "bst"))
dfmat_AB <- dfm_trim(dfmat_AB, min_termfreq = 100)
topfeatures(dfmat_AB)
nfeat(dfmat_AB)

# Feature co-occurrence matrix of the retained (frequent) terms
fcmat_AB <- fcm(dfmat_AB)
dim(fcmat_AB)
# topfeatures(fcmat_AB)


feat <- names(topfeatures(dfmat_AB, 50))
fcmat_AB_select <- fcm_select(fcmat_AB, pattern = feat, selection = "keep")
dim(fcmat_AB_select)
size <- log(colSums(dfm_select(dfmat_AB, feat, selection = "keep")))
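By default, `fcm()` on a document-feature matrix counts co-occurrences within the same document, here the same abstract. The sketch below shows the simplest version of that idea: the number of abstracts in which two terms both appear. This is closer in spirit to `fcm(count = "boolean")` than to the default frequency weighting. It is an illustration on an invented matrix, not quanteda's implementation.

```r
# Toy document-feature matrix: rows are abstracts, columns are terms
dfm_toy <- matrix(c(2, 1, 0,
                    0, 1, 1,
                    1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("doc", 1:3), c("stem", "racism", "black")))

# Boolean co-occurrence: number of documents in which both terms appear
present <- (dfm_toy > 0) * 1
cooc <- crossprod(present)   # t(present) %*% present
diag(cooc) <- 0              # ignore a term's co-occurrence with itself
cooc
```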

4.3.4.1 Network of most co-occurring words

set.seed(144)
textplot_network(fcmat_AB_select, min_freq = 0.8, vertex_size = size / max(size) * 3)

4.4 Thematic Map and Frequency Analysis

Map1=thematicMap(M4_tags, field = "ID", n = 250, minfreq = 10,
  stemming = FALSE, size = 0.4, n.labels=5, repel = TRUE)
plot(Map1$map)

4.4.1 Clusters

Clusters1=Map1$words[order(Map1$words$Cluster,-Map1$words$Occurrences),]
CL1 <- Clusters1 %>% group_by(.data$Cluster_Label) %>% 
  arrange(-Cluster_Frequency) %>% 
  top_n(5, .data$Occurrences)
# CL1

4.4.2 Frequency Analysis

4.4.2.1 Rac*

rac* keyword expressions

toks_rac <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "rac*")
dfmat_rac<- dfm(toks_rac)
dfmat_rac
tstat_freq_rac <- textstat_frequency(dfmat_rac, n = 13)
head(tstat_freq_rac, 15)

# frequency plot
r <- dfmat_rac %>% 
  textstat_frequency(n = 13) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "", # Top Rac* Terms by Frequency
    subtitle = "" # Visualization of Most Common Terms
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )
r

# Save the plot as a high-resolution image
ggsave("plots/frequency_plot_rac.png", plot = r, width = 10, height = 8, dpi = 1200, bg = "white")
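The keyword expressions in this section rely on glob patterns. As we understand quanteda's defaults, `tokens_keep()` matches globs against whole tokens and ignores case. So "rac*" keeps tokens that begin with "rac", while hyphenated forms such as "anti-racist" fall under the anti* search instead. The base R sketch below mimics that behavior with `utils::glob2rx()`, which converts a glob into an anchored regular expression. The tokens are invented.

```r
# Candidate tokens; a glob such as "rac*" matches from the start of a token
toks_demo <- c("racism", "Racial", "race", "anti-racist", "grace", "xenophobia")

# glob2rx() turns the glob into an anchored regular expression
rac_re <- glob2rx("rac*")
rac_hits <- toks_demo[grepl(rac_re, toks_demo, ignore.case = TRUE)]
rac_hits
```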

4.4.2.2 Nationali*

Nationali* keyword expressions

# Tokenize and prepare data
toks_nat <- tokens(toks, remove_punct = TRUE) %>% 
            tokens_keep(pattern = "nationali*")
dfmat_nat <- dfm(toks_nat)
tstat_freq_nat <- textstat_frequency(dfmat_nat, n = 50)

# Create frequency plot
p <- dfmat_nat %>% 
  textstat_frequency(n = 9) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Nationalism-related Terms by Frequency",
    subtitle = "Frequency of Nationalism Tokens"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )
p

# Save the plot as a high-resolution image with white background
ggsave("plots/frequency_plot_nation.png", plot = p, width = 10, height = 8, dpi = 1200, bg = "white")

4.4.2.3 Xeno*

Xeno* keyword search

toks_xen <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "xeno*")
dfmat_xen <- dfm(toks_xen)
dfmat_xen
tstat_freq_xen <- textstat_frequency(dfmat_xen, n = 50)
head(tstat_freq_xen, 50)

# frequency plot for dfmat_xen
x <- dfmat_xen %>% 
  textstat_frequency(n = 2) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Xeno* Features by Frequency",
    subtitle = "Visualization of Most Common Terms"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )
x

4.4.2.4 Colonial*

Colonial* keyword expressions

toks_col <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "colon*")
dfmat_col <- dfm(toks_col)
dfmat_col
tstat_freq_col <- textstat_frequency(dfmat_col, n = 50)
head(tstat_freq_col, 50)

# frequency plot for dfmat_col
s <- dfmat_col %>% 
  textstat_frequency(n = 11) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Colonial* Features by Frequency",
    subtitle = "Visualization of Most Common Terms"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )
s

# Save the plot as a high-resolution image with white background
ggsave("plots/frequency_plot_col.png", plot = s, width = 10, height = 8, dpi = 1200, bg = "white")

4.4.2.5 Anti*

Anti* keyword expressions

toks_anti <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "anti*")
dfmat_anti <- dfm(toks_anti)
dfmat_anti <- dfmat_anti[, colnames(dfmat_anti) != "anticipated"]
tstat_freq_anti <- textstat_frequency(dfmat_anti, n = 50)
head(tstat_freq_anti, 20) # frequency of at least two occurs at 21


# frequency plot for dfmat_anti
q <- dfmat_anti %>% 
  textstat_frequency(n = 20) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Anti* Features by Frequency",
    subtitle = "Visualization of Most Common Terms"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.4, 
    size = 3
  )
q

# Save the plot as a high-resolution image with white background
ggsave("plots/frequency_plot_anti.png", plot = q, width = 10, height = 8, dpi = 1200, bg = "white")

4.4.2.6 Discrim*

Discrim* keyword occurrences. This search was added based on the Keywords Plus (ID) results.

toks_dis <- tokens(toks, remove_punct = TRUE) %>% 
               tokens_keep(pattern = "discrim*")
dfmat_dis <- dfm(toks_dis)
dfmat_dis
tstat_freq_dis <- textstat_frequency(dfmat_dis, n = 50)
head(tstat_freq_dis, 50)

# frequency plot for dfmat_dis
d <- dfmat_dis %>% 
  textstat_frequency(n = 2) %>% 
  ggplot(aes(x = reorder(feature, frequency), y = frequency)) +
  geom_point() +
  coord_flip() +
  labs(
    x = NULL, 
    y = "Frequency", 
    title = "Top Discrim* Features by Frequency",
    subtitle = "Visualization of Most Common Terms"
  ) +
  theme_classic() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  geom_text(
    aes(label = frequency), 
    hjust = -0.3, 
    size = 3
  )
d

4.4.2.7 Merging two frequency analyses

Comparative normalized frequencies: Rac* and Anti*

# Combine data
combined_data <- bind_rows(
  mutate(r$data, source = "Rac* Analysis"),
  mutate(q$data, source = "Anti* Analysis")
)

# Global normalization
combined_normalized <- combined_data %>%
  mutate(
    normalized_freq = frequency / max(combined_data$frequency)
  )

# Create plot with globally normalized values
ggplot(combined_normalized, 
       aes(x = reorder(feature, normalized_freq), 
           y = normalized_freq, 
           fill = source)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(
    x = "Features", 
    y = "Normalized Frequency",
    title = "Comparative Normalized Feature Frequencies"
  ) +
  scale_y_continuous(limits = c(0, 1)) +
  theme_minimal() +
  theme(legend.position = "bottom")

ggplot(combined_normalized, 
       aes(x = reorder(feature, normalized_freq), 
           y = normalized_freq, 
           fill = source)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(
    aes(label = sprintf("%.2f", normalized_freq), 
        y = normalized_freq),
    position = position_dodge(width = 0.9),
    hjust = -0.1,
    size = 3
  ) +
  coord_flip() +
  labs(
    x = "Features", 
    y = "Normalized Frequency",
    title = "Comparative Normalized Feature Frequencies"
  ) +
  scale_y_continuous(limits = c(0, 1.1)) +  # Extend y-axis to accommodate labels
  theme_minimal() +
  theme(legend.position = "bottom")

4.4.2.8 Merging multiple frequency analyses

Comparative normalized frequencies across multiple pattern analyses

ggplot(combined_all_normalized, 
       aes(x = reorder(feature, normalized_freq), 
           y = normalized_freq, 
           fill = source)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(
    aes(label = sprintf("%.2f", normalized_freq), 
        y = normalized_freq),
    position = position_dodge(width = 0.9),
    hjust = -0.1,
    size = 3
  ) +
  coord_flip() +
  labs(
    x = "Features", 
    y = "Normalized Frequency",
    title = "" # Comparative Normalized Feature Frequencies
  ) +
  scale_y_continuous(limits = c(0, 1.1)) +  # Extend y-axis to accommodate labels
  theme_minimal() +
  theme(legend.position = "bottom")
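The object `combined_all_normalized` is created in code that is not shown here. The sketch below shows one way it could be assembled, assuming it stacks several frequency tables and normalizes them globally as in 4.4.2.7. The list `freq_tables`, its source labels, and all values are hypothetical.

```r
# Hypothetical per-pattern frequency tables (toy values). In the document these
# would correspond to the $data of earlier frequency plots such as r and q.
freq_tables <- list(
  "Rac* Analysis"   = data.frame(feature = c("racism", "racial"), frequency = c(40, 10)),
  "Anti* Analysis"  = data.frame(feature = "anti-black", frequency = 4),
  "Colon* Analysis" = data.frame(feature = "colonialism", frequency = 20)
)

# Stack the tables and tag each row with the analysis it came from
combined_all <- do.call(rbind, Map(function(df, src) {
  df$source <- src
  df
}, freq_tables, names(freq_tables)))
rownames(combined_all) <- NULL

# Global normalization across every source at once
combined_all$normalized_freq <- combined_all$frequency / max(combined_all$frequency)
combined_all
```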

4.5 Analysis of Notions at the Country Level

We compare keyword and pattern counts across author countries (AU_CO) to connect notions of racism to racialized social systems and geopolitics.

# Split countries and keywords
keyword_by_country_all <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  group_by(AU_CO, DE) %>%
  summarize(keyword_count = n()) %>%
  group_by(AU_CO) %>%
  slice_max(keyword_count, n = 5) %>%
  arrange(AU_CO, desc(keyword_count))
## `summarise()` has grouped output by 'AU_CO'. You can override using the `.groups` argument.
# Print results
head(keyword_by_country_all, n=10)
## # A tibble: 10 × 3
## # Groups:   AU_CO [3]
##    AU_CO      DE                           keyword_count
##    <chr>      <chr>                                <int>
##  1 AUSTRALIA  " WELLBEING"                            14
##  2 AUSTRALIA  " COVID-19"                             10
##  3 AUSTRALIA  " INDIGENOUS HEALTH"                    10
##  4 AUSTRALIA  " PUBLIC HEALTH"                        10
##  5 AUSTRALIA  "ABORIGINAL HEALTH"                     10
##  6 AUSTRIA    " INCLUSION"                             2
##  7 AUSTRIA    " PHYSICAL EDUCATION"                    2
##  8 AUSTRIA    " SPECIAL EDUCATIONAL NEEDS"             2
##  9 AUSTRIA    "DIVERSITY"                              2
## 10 BANGLADESH " BANGLADESH"                            1
tail(keyword_by_country_all, n=10)
## # A tibble: 10 × 3
## # Groups:   AU_CO [1]
##    AU_CO DE                               keyword_count
##    <chr> <chr>                                    <int>
##  1 <NA>  " PEDAGOGY"                                  1
##  2 <NA>  " POLONAISE"                                 1
##  3 <NA>  " QUALITATIVE RESEARCH"                      1
##  4 <NA>  " RECOGNITION"                               1
##  5 <NA>  " SOCIAL DETERMINANTS OF HEALTH"             1
##  6 <NA>  " VISHWAGURU"                                1
##  7 <NA>  "ARABIC LANGUAGE"                            1
##  8 <NA>  "HEALTH CARE"                                1
##  9 <NA>  "INDIA"                                      1
## 10 <NA>  "POLISH DANCES"                              1
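The code applies `separate_rows()` first to AU_CO and then to DE, so each record contributes one row for every pairing of a country entry with a keyword. The base-R sketch below uses hypothetical toy records to show one consequence. If a country appears more than once in a record's AU_CO field, that record's keywords are counted more than once for that country. Whether this affects the counts above depends on how AU_CO was populated, and that should be checked against the data rather than assumed.

```r
# Hypothetical toy records; AU_CO and DE are ";"-separated as in M4_tags
records <- data.frame(
  AU_CO = c("USA;USA;CANADA", "CANADA"),
  DE    = c("RACISM;STEM", "DECOLONIZATION"),
  stringsAsFactors = FALSE
)

# Expand each record into every (country, keyword) combination, which is what
# chaining separate_rows(AU_CO) and separate_rows(DE) produces
expand_pairs <- function(recs, dedupe_countries = FALSE) {
  rows <- lapply(seq_len(nrow(recs)), function(i) {
    countries <- strsplit(recs$AU_CO[i], ";", fixed = TRUE)[[1]]
    if (dedupe_countries) countries <- unique(countries)  # one entry per country per record
    keywords <- strsplit(recs$DE[i], ";", fixed = TRUE)[[1]]
    expand.grid(AU_CO = countries, DE = keywords, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Cross-tabulate keyword counts by country, with and without deduplication
counts_raw   <- table(expand_pairs(records))
counts_dedup <- table(expand_pairs(records, dedupe_countries = TRUE))

counts_raw["USA", "RACISM"]    # 2: USA is listed twice in record 1
counts_dedup["USA", "RACISM"]  # 1: counted once per record
```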

4.5.1 Top keywords by country

# Identify top countries by publication count
top_countries <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  count(AU_CO, sort = TRUE) %>%
  slice_head(n = 5) %>%
  pull(AU_CO)

# Find top keywords for these countries
keyword_by_country_top <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  filter(AU_CO %in% top_countries) %>%
  group_by(AU_CO, DE) %>%
  summarize(keyword_count = n()) %>%
  group_by(AU_CO) %>%
  slice_max(keyword_count, n = 5) %>%
  arrange(AU_CO, desc(keyword_count))
## `summarise()` has grouped output by 'AU_CO'. You can override using the `.groups` argument.
head(keyword_by_country_top, n=17)
## # A tibble: 17 × 3
## # Groups:   AU_CO [3]
##    AU_CO     DE                           keyword_count
##    <chr>     <chr>                                <int>
##  1 AUSTRALIA " WELLBEING"                            14
##  2 AUSTRALIA " COVID-19"                             10
##  3 AUSTRALIA " INDIGENOUS HEALTH"                    10
##  4 AUSTRALIA " PUBLIC HEALTH"                        10
##  5 AUSTRALIA "ABORIGINAL HEALTH"                     10
##  6 BRAZIL    " RACE"                                  7
##  7 BRAZIL    " ILLEGAL FOSSIL TRADE"                  6
##  8 BRAZIL    " LATIN AMERICA"                         6
##  9 BRAZIL    " PALAEONTOLOGICAL HERITAGE"             6
## 10 BRAZIL    " PARACHUTE SCIENCE"                     6
## 11 BRAZIL    " RESEARCH ETHICS"                       6
## 12 BRAZIL    "SCIENTIFIC COLONIALISM"                 6
## 13 CANADA     <NA>                                   15
## 14 CANADA    " INDIGENOUS METHODOLOGIES"              9
## 15 CANADA    " INDIGENOUS PEOPLES"                    9
## 16 CANADA    " DECOLONIZATION"                        8
## 17 CANADA    "INDIGENOUS PEOPLES"                     8
tail(keyword_by_country_top, n=17)
## # A tibble: 17 × 3
## # Groups:   AU_CO [3]
##    AU_CO          DE                              keyword_count
##    <chr>          <chr>                                   <int>
##  1 CANADA         " INDIGENOUS METHODOLOGIES"                 9
##  2 CANADA         " INDIGENOUS PEOPLES"                       9
##  3 CANADA         " DECOLONIZATION"                           8
##  4 CANADA         "INDIGENOUS PEOPLES"                        8
##  5 UNITED KINGDOM  <NA>                                      10
##  6 UNITED KINGDOM " NATIONALISM"                              9
##  7 UNITED KINGDOM " AUTHORITARIANISM"                         7
##  8 UNITED KINGDOM " IMMIGRATION"                              7
##  9 UNITED KINGDOM " PANDEMIC"                                 7
## 10 UNITED KINGDOM " SOCIAL DOMINANCE ORIENTATION"             7
## 11 UNITED KINGDOM " THREAT"                                   7
## 12 UNITED KINGDOM "COVID-19"                                  7
## 13 USA             <NA>                                     127
## 14 USA            " RACISM"                                  54
## 15 USA            " STEM"                                    54
## 16 USA            " RACE"                                    42
## 17 USA            " HIGHER EDUCATION"                        41
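In the output above, Canada lists " INDIGENOUS PEOPLES" and "INDIGENOUS PEOPLES" as separate keywords. This happens because splitting DE on ";" keeps the space that follows each separator. The minimal base-R sketch below uses hypothetical field values to reproduce the effect and shows that trimming whitespace merges the variants. In the tidyverse pipeline, one option would be `str_trim(DE)` after `separate_rows()`, or a regular-expression separator such as `sep = ";\\s*"`. Either change would alter the counts reported above, which reflect untrimmed keywords.

```r
# Hypothetical DE field values in the style of the output above
de_field <- c("INDIGENOUS PEOPLES; DECOLONIZATION", "DECOLONIZATION; INDIGENOUS PEOPLES")

# Splitting on ";" alone keeps the space after each separator
split_raw <- unlist(strsplit(de_field, ";", fixed = TRUE))
table(split_raw)

# Trimming whitespace merges variants that differ only by a leading space
split_trimmed <- trimws(split_raw)
table(split_trimmed)
```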

4.5.2 Plots of top keywords by country

# Extract keywords for these countries, dropping missing or blank keywords
# before ranking so that each country keeps up to 10 valid keywords
keywords_by_country <- M4_tags %>%
  separate_rows(AU_CO, sep = ";") %>%
  separate_rows(DE, sep = ";") %>%
  filter(AU_CO %in% top_countries) %>%
  filter(!is.na(DE) & DE != "NA" & str_trim(DE) != "") %>%
  group_by(AU_CO, DE) %>%
  summarize(keyword_count = n(), .groups = "drop") %>%
  group_by(AU_CO) %>%
  slice_max(keyword_count, n = 10) %>%  # Top 10 keywords per country (ties are kept)
  ungroup()

# Create a separate plot for each country with counts on bars
plot_keywords_by_country <- function(country_data) {
  ggplot(country_data, aes(x = reorder(DE, keyword_count), y = keyword_count)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    geom_text(aes(label = keyword_count), 
              position = position_stack(vjust = 0.5), 
              color = "white", size = 3) +  # Add counts to the bars
    coord_flip() +
    labs(
      title = paste("Top Keywords for", unique(country_data$AU_CO)),
      x = "Keywords",
      y = "Keyword Count"
    ) +
    theme_minimal() +
    theme(
      axis.text.y = element_text(size = 8),
      plot.title = element_text(size = 10, face = "bold")
    )
}

# Split the data by country and create individual plots
country_plots <- keywords_by_country %>%
  group_split(AU_CO) %>%
  lapply(plot_keywords_by_country)

# Print or save the plots as needed
for (plot in country_plots) {
  print(plot)
}

4.5.3 Notions by Country

We first define the patterns, or notions, to be analyzed.

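# Define patterns (glob wildcards: "*" matches any ending)
# If these are passed to quanteda (e.g., tokens_keep()), the multi-word pattern
# "white supremacy" must be wrapped in phrase() to match the two-token sequence;
# as a plain string it would only match a single token containing a space
patterns <- c("rac*", "nationali*", "xeno*", "colon*", "anti*", "white supremacy")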
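Glob wildcards match every token that begins with the stem. In the country-level tables that follow, anti* therefore also captures tokens such as "anticipated", "anti-access", and "anti-dalhousie", which are not notions of racism. The base-R sketch below illustrates this over-matching with hypothetical tokens and filters the results with a hypothetical stoplist. Here `utils::glob2rx()` stands in for quanteda's glob matching.

```r
# Hypothetical toy tokens; not data from the study
toks_toy <- c("anti-racist", "anticipated", "anti-access", "antiracist", "racism")

# Convert each glob to an anchored regex and list the tokens it keeps
globs <- c("anti*", "rac*")
matches <- lapply(setNames(globs, globs), function(g) {
  toks_toy[grepl(utils::glob2rx(g), toks_toy)]
})
matches

# Drop anti* matches that are not notions of racism (hypothetical stoplist)
anti_stoplist <- c("anticipated", "anti-access")
anti_kept <- setdiff(matches[["anti*"]], anti_stoplist)
anti_kept
```

A stoplist keeps the broad pattern and removes known false positives. The alternative is to replace anti* with a narrower list of stems, which risks missing legitimate variants such as "anti-parsi".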

We then clean and deduplicate the data for plotting.

## Warning: tolower argument is not used.
## Warning: There was 1 warning in `summarize()`.
## ℹ In argument: `across(everything(), sum, na.rm = TRUE)`.
## ℹ In group 1: `AU_CO = "AUSTRALIA"`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
## 
##   # Previously
##   across(a:b, mean, na.rm = TRUE)
## 
##   # Now
##   across(a:b, \(x) mean(x, na.rm = TRUE))
## Warning: tolower argument is not used.
## # A tibble: 17 × 13
##    AU_CO   colonialism racism  race nationalism racial `anti-white` `anti-immigrant` colonial `anti-asian` racialized nationalisms
##    <chr>         <dbl>  <dbl> <dbl>       <dbl>  <dbl>        <dbl>            <dbl>    <dbl>        <dbl>      <dbl>        <dbl>
##  1 AUSTRA…           0      1     0           0      0            1                0        0            0          0            0
##  2 BANGLA…           0      1     0           0      0            0                0        0            0          0            0
##  3 BRAZIL            0      0     0           0      0            0                0        1            0          0            0
##  4 CANADA            0      0     1           0      0            0                0        1            0          0            0
##  5 CHINA             0      0     0           0      0            0                0        0            0          0            0
##  6 COLOMB…           1      1     0           0      0            0                0        0            0          0            0
##  7 CROATIA           0      0     0           0      0            0                0        0            0          0            0
##  8 ICELAND           0      0     0           0      0            0                0        0            0          0            0
##  9 INDIA             0      0     0           0      0            0                0        0            0          0            1
## 10 ISRAEL            0      0     0           0      0            0                0        0            0          0            0
## 11 MEXICO            0      0     0           0      0            0                0        0            0          0            0
## 12 NORWAY            0      0     0           0      0            0                0        0            0          0            0
## 13 POLAND            0      0     0           0      1            0                0        0            0          0            0
## 14 SOUTH …           0      0     0           0      0            0                0        0            0          0            0
## 15 UNITED…           0      0     0           1      0            0                1        0            0          1            0
## 16 USA               0      5     8           1      6            0                0        0            2          2            0
## 17 VIETNAM           0      0     0           0      0            0                0        0            0          0            0
## # ℹ 1 more variable: `anti-blackness` <dbl>
## # A tibble: 17 × 38
##    AU_CO        `anti-black` racism racialized antiracist colonialism colonization colonizers  race nationalism racial anticipated
##    <chr>               <dbl>  <dbl>      <dbl>      <dbl>       <dbl>        <dbl>      <dbl> <dbl>       <dbl>  <dbl>       <dbl>
##  1 AUSTRALIA               0      8          0          0           0            0          0     0           0      0           0
##  2 BANGLADESH              0     10          0          0           0            0          0     0           0      0           0
##  3 BRAZIL                  0      0          0          0           3            0          0     0           0      0           0
##  4 CANADA                  1     11          1          0          12            1          0     5           0      6           0
##  5 CHINA                   0      1          0          0           0            0          0     0           0      0           0
##  6 COLOMBIA                0      1          1          0           3            1          1     0           0      0           0
##  7 CROATIA                 0      0          0          0           0            0          0     0           1      0           0
##  8 ICELAND                 0      0          0          0           0            0          0     0           1      0           0
##  9 INDIA                   0      0          0          0           0            0          0     0           3      0           0
## 10 ISRAEL                  0      1          0          0           0            0          0     0           0      1           0
## 11 MEXICO                  0      1          0          0           0            0          0     0           0      0           0
## 12 NORWAY                  0      2          0          0           0            0          0     0           0      0           0
## 13 POLAND                  0      0          0          0           1            2          0     0           0      1           0
## 14 SOUTH AFRICA            0      1          0          0           3            0          0     0           0      0           0
## 15 UNITED KING…            0      4          0          0           2            0          0     0           1      3           0
## 16 USA                     9     61         15          1           5            2          0    31           4     39           2
## 17 VIETNAM                 0      0          0          0           1            0          0     0           0      0           0
## # ℹ 26 more variables: racially <dbl>, racialization <dbl>, `anti-asian` <dbl>, racist <dbl>, `anti-racism` <dbl>,
## #   colonial <dbl>, racismin <dbl>, `anti-white` <dbl>, colonisation <dbl>, `anti-immigrant` <dbl>, `anti-access` <dbl>,
## #   racism.intervention <dbl>, colonizing <dbl>, races <dbl>, nationalities <dbl>, colonies <dbl>, `anti-racist` <dbl>,
## #   `race-neutral` <dbl>, `anti-blackness` <dbl>, antiasian <dbl>, `race-ethnicity` <dbl>, nationalist <dbl>, `anti-parsi` <dbl>,
## #   nationalisms <dbl>, `anti-dalhousie` <dbl>, `racism-related` <dbl>
## tibble [17 × 13] (S3: tbl_df/tbl/data.frame)
##  $ AU_CO         : chr [1:17] "AUSTRALIA" "BANGLADESH" "BRAZIL" "CANADA" ...
##  $ colonialism   : num [1:17] 0 0 0 0 0 1 0 0 0 0 ...
##  $ racism        : num [1:17] 1 1 0 0 0 1 0 0 0 0 ...
##  $ race          : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ nationalism   : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racial        : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-white    : num [1:17] 1 0 0 0 0 0 0 0 0 0 ...
##  $ anti-immigrant: num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ colonial      : num [1:17] 0 0 1 1 0 0 0 0 0 0 ...
##  $ anti-asian    : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racialized    : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ nationalisms  : num [1:17] 0 0 0 0 0 0 0 0 1 0 ...
##  $ anti-blackness: num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
## tibble [17 × 38] (S3: tbl_df/tbl/data.frame)
##  $ AU_CO              : chr [1:17] "AUSTRALIA" "BANGLADESH" "BRAZIL" "CANADA" ...
##  $ anti-black         : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ racism             : num [1:17] 8 10 0 11 1 1 0 0 0 1 ...
##  $ racialized         : num [1:17] 0 0 0 1 0 1 0 0 0 0 ...
##  $ antiracist         : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ colonialism        : num [1:17] 0 0 3 12 0 3 0 0 0 0 ...
##  $ colonization       : num [1:17] 0 0 0 1 0 1 0 0 0 0 ...
##  $ colonizers         : num [1:17] 0 0 0 0 0 1 0 0 0 0 ...
##  $ race               : num [1:17] 0 0 0 5 0 0 0 0 0 0 ...
##  $ nationalism        : num [1:17] 0 0 0 0 0 0 1 1 3 0 ...
##  $ racial             : num [1:17] 0 0 0 6 0 0 0 0 0 1 ...
##  $ anticipated        : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racially           : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racialization      : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-asian         : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racist             : num [1:17] 1 0 0 1 0 0 0 0 0 0 ...
##  $ anti-racism        : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ colonial           : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ racismin           : num [1:17] 1 0 0 0 0 0 0 0 0 0 ...
##  $ anti-white         : num [1:17] 2 0 0 0 0 0 0 0 0 0 ...
##  $ colonisation       : num [1:17] 1 0 0 0 0 0 0 0 0 0 ...
##  $ anti-immigrant     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-access        : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racism.intervention: num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ colonizing         : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ races              : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ nationalities      : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ colonies           : num [1:17] 0 0 0 1 0 0 0 0 0 0 ...
##  $ anti-racist        : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ race-neutral       : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ anti-blackness     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ antiasian          : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ race-ethnicity     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ nationalist        : num [1:17] 0 0 0 0 0 0 0 0 2 0 ...
##  $ anti-parsi         : num [1:17] 0 0 0 0 0 0 0 0 1 0 ...
##  $ nationalisms       : num [1:17] 0 0 0 0 0 0 0 0 1 0 ...
##  $ anti-dalhousie     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
##  $ racism-related     : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...

4.5.3.1 Abstract patterns (top 10 notions)

Top 10 patterns in the abstracts.

# Create bar plot of top patterns
ggplot(top_patterns, aes(x = reorder(pattern, total_count), y = total_count)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Patterns in Abstracts",
    x = "Pattern",
    y = "Total Count"
  ) +
  theme_minimal()
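`top_patterns` is built in code that is not shown here. The sketch below shows one way to produce a top-10 table with the `pattern` and `total_count` columns the plot expects. It assumes a wide country-by-pattern count table like those printed above, and the country labels and counts are hypothetical.

```r
# Hypothetical wide table of pattern counts by country (toy values)
counts_wide <- data.frame(
  AU_CO       = c("COUNTRY A", "COUNTRY B", "COUNTRY C"),
  racism      = c(3, 9, 0),
  colonialism = c(4, 1, 2),
  nationalism = c(0, 2, 1),
  stringsAsFactors = FALSE
)

# Sum each pattern column across countries
pattern_cols <- setdiff(names(counts_wide), "AU_CO")
top_patterns_toy <- data.frame(
  pattern     = pattern_cols,
  total_count = colSums(counts_wide[pattern_cols]),
  stringsAsFactors = FALSE
)

# Keep the (up to) 10 largest totals, in descending order
top_patterns_toy <- head(top_patterns_toy[order(-top_patterns_toy$total_count), ], 10)
rownames(top_patterns_toy) <- NULL
top_patterns_toy
```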

4.5.3.2 Pattern occurrence by country

The pattern-occurrence graphic shows how often specific keywords related to race and social dynamics (such as “racism,” “colonialism,” or “nationalism”) appear in the abstracts of the publications, broken down by author country. Each bar gives the count for one keyword in one country. Bars are color-coded by author country (AU_CO), which allows a direct visual comparison of which countries engage with which notions.

# Visualization of top patterns by country
ggplot(abstract_summary, aes(x = pattern, y = count, fill = AU_CO)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  labs(
    title = "Pattern Occurrences by Country (AB)",
    x = "Pattern",
    y = "Count"
  ) +
  theme_minimal()

# Visualization of top patterns by country in abstracts, labeling bars only when count > 3
ggplot(abstract_summary, aes(x = pattern, y = count, fill = AU_CO)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = ifelse(count > 3, count, "")), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5,  # Adjust vertical position of text
            size = 3) +    # Adjust text size as needed
  coord_flip() +
  labs(
    title = "", # Pattern Occurrences by Country in Abstracts
    x = "Pattern",
    y = "Count"
  ) +
  theme_minimal()

Final graphic: relative proportions of each pattern by country, restricted to patterns with a count of one or more.

# Visualization of top patterns by country in abstracts with proportions on bars (labels only when proportion > 0.05)
y <- ggplot(abstract_summary, aes(x = pattern, y = proportion, fill = AU_CO)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = ifelse(proportion > 0.05, round(proportion, 2), "")), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5,  # Adjust vertical position of text
            size = 3) +    # Adjust text size
  coord_flip() +
  labs(
    title = "", # Relative Proportions of Notions by Country (Abstract)
    x = "Pattern",
    y = "Proportion"
  ) +
  theme_minimal()

y
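The `proportion` column in `abstract_summary` is computed in code that is not shown here. The sketch below makes an explicit assumption: each proportion is a pattern's share of its country's total pattern count, calculated after keeping rows with a count of one or more. It uses base R and hypothetical toy values. If the document instead normalizes within each pattern across countries, the grouping variable in `ave()` would be `pattern` rather than `AU_CO`.

```r
# Hypothetical long table in the shape of abstract_summary (toy values)
summary_toy <- data.frame(
  AU_CO   = c("COUNTRY A", "COUNTRY A", "COUNTRY B", "COUNTRY B", "COUNTRY C"),
  pattern = c("racism", "colonialism", "racism", "colonialism", "colonialism"),
  count   = c(3, 1, 6, 2, 0),
  stringsAsFactors = FALSE
)

# Keep rows with a count of one or more
summary_toy <- summary_toy[summary_toy$count >= 1, ]

# ASSUMPTION: proportion = a pattern's share of its country's total count
summary_toy$proportion <- summary_toy$count /
  ave(summary_toy$count, summary_toy$AU_CO, FUN = sum)
summary_toy
```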

# Save the plot as a high-resolution image (creating the plots/ folder if it does not exist)
dir.create("plots", showWarnings = FALSE)
ggsave("plots/pattern_occurence.png", plot = y, width = 10, height = 8, dpi = 1200, bg = "white")
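The saved image is large because `ggsave()` measures width and height in inches by default, so pixel dimensions scale with `dpi`. The short calculation below works out the size implied by the settings above. If a lower resolution suffices for the target venue, reducing `dpi` shrinks the pixel count quadratically.

```r
# Pixel dimensions implied by ggsave(width = 10, height = 8, dpi = 1200),
# given that ggsave() uses inches as its default unit
width_in  <- 10
height_in <- 8
dpi       <- 1200

px_width   <- width_in * dpi                 # 12000 pixels
px_height  <- height_in * dpi                # 9600 pixels
megapixels <- px_width * px_height / 1e6     # 115.2 megapixels
```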

References

Martin, Danny Bernard, Luz Valoyes-Chávez, and Paola Valero. 2024. “Race, Racism, and Racialization in Mathematics Education: Global Perspectives.” Educational Studies in Mathematics 116 (3): 313–31.
Nxumalo, Fikile, and Wanja Gitari. 2021. “Introduction to the Special Theme on Responding to Anti-Blackness in Science, Mathematics, Technology and STEM Education.” Canadian Journal of Science, Mathematics and Technology Education 21: 226–31.
Vakil, Sepehr, and Rick Ayers. 2019. “The Racial Politics of STEM Education in the USA: Interrogations and Explorations.” Race Ethnicity and Education 22 (4): 449–58.

  1. Department of Curriculum and Instruction. Corresponding Author. E-mail:

  2. Department of Higher Education Leadership and Policy Studies

  3. Department of Psychology