Iran Citation - Progress Report

source("./progress_report_functions.R")
source("./figures.R")

Introduction

Current Research Questions

The purpose of this report is to give a continuous update on our current work on the Iranian citation analysis.

How do different sanctions regimes affect collaborations with different parties?

Current Progress

API Access Expanded

The OpenAlex API has a default limit of 100,000 calls per day (max 10 per second). I contacted their team and received an increased quota of 1 million calls per day and 100 per second.

If you plan to replicate or extend this work, I strongly recommend requesting the same: OpenAlex API Rate Limits & Authentication
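A raised quota only helps if the client stays under it. The sketch below spaces out requests and retries failed ones with exponential back-off. It is a minimal illustration under stated assumptions, not the code used in pullIran_works_final.R. `call_with_limits()` and its arguments are hypothetical, and `fetch` stands for any function that performs one request and throws an error on failure (for example, on an HTTP 429 "too many requests" response).

```r
# Minimal sketch (hypothetical helper): call `fetch` once per item while
# staying under a per-second request quota and retrying failed calls.
call_with_limits <- function(items, fetch, per_second = 10,
                             max_tries = 3, backoff = 1) {
  min_gap <- 1 / per_second                 # seconds to wait between requests
  results <- vector("list", length(items))
  for (i in seq_along(items)) {
    for (attempt in seq_len(max_tries)) {
      out <- tryCatch(fetch(items[[i]]), error = function(e) e)
      if (!inherits(out, "error")) break    # success: stop retrying
      if (attempt == max_tries) {
        stop("giving up on item ", i, ": ", conditionMessage(out))
      }
      Sys.sleep(backoff * 2^(attempt - 1))  # exponential back-off: 1s, 2s, 4s, ...
    }
    results[i] <- list(out)                 # list() keeps NULL results in place
    Sys.sleep(min_gap)                      # throttle before the next request
  }
  results
}
```

With the raised quota, one might call `call_with_limits(author_ids, function(id) get_author_works(id, email = your_email), per_second = 100)`, where `author_ids` is a hypothetical vector of IDs and `get_author_works()` is the helper used later in this report.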

Data Collection – Iranian Institutions

We have successfully collected works from all 403 Iranian institutions. These datasets are stored at:
nsdpi-storage > people > czj9zj > temp_progress_final/

Files are named:
works_batch_1.rds (fewest works) to works_batch_403.rds (most works)
Key columns included: top_field, top_subfield, etc.
Data collection required careful error handling due to API limitations.
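Because the batch files are numbered 1 to 403, reading them back in order requires a little care. `list.files()` sorts file names as text, so `works_batch_10.rds` would be read before `works_batch_2.rds`. A minimal sketch, assuming all files sit in one folder and follow the `works_batch_<n>.rds` pattern (`read_batches()` is a hypothetical helper):

```r
# Hypothetical helper: read works_batch_<n>.rds files in numeric batch order.
read_batches <- function(dir, pattern = "^works_batch_[0-9]+\\.rds$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  # Extract the batch number from each file name and sort on it numerically
  batch_no <- as.integer(gsub("[^0-9]", "", basename(files)))
  files <- files[order(batch_no)]
  batches <- lapply(files, readRDS)
  do.call(rbind, batches)  # stack all batches into one data frame
}
```

For the real batches, which contain list-columns and may not all share the same columns, `dplyr::bind_rows()` is likely a safer combiner than `rbind()`.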

Current Limitations

Not yet filtered by strategic research fields.
Some works may have incorrect institution name mappings; this is being fixed in pullIran_works_final.R.

Current Scripts

get_authors.R

This script downloads all authors from Iran who have published works.

Updated Script to pull all the works – pullIran_works_final.R

This script re-downloads works with the correct institution names and writes the most up-to-date results to the folder temp_progress_final/. This step is still in progress and needs to be completed.

Analysis Script – Iran_analysis.R

This script is now the main tool for analyzing Iranian works.

Next Steps

Deeper Exploratory Data Analysis

Current EDA needs refinement to better understand key patterns.
I reviewed this resource but have not yet been able to replicate its plots:
R Journal 2023 – Mapping Scientific Production

Pipeline Optimization & Reproducibility

The full workflow will be organized into a clean GitHub repo.
Harmonizing all pieces into one reproducible research package is a priority.
To do this, we need all the data.

Change Point Analysis

We aim to test whether key events (e.g., assassinations, sanctions) led to significant disruptions.
The gam package will be used to detect changes in trend (structural breaks).
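The plan above is to use gam. As a simpler, swapped-in illustration of the underlying question, a Chow-style F-test checks for a break at a *known* event year. It compares a single linear trend against a trend whose level and slope may shift at that year. The helper name, the simulated counts, and the 2011 break year below are all hypothetical.

```r
# Chow-style F-test for a structural break at a KNOWN year (hypothetical helper).
# A small p-value suggests the trend changes at `break_year`.
break_test <- function(counts, years, break_year) {
  post  <- as.numeric(years >= break_year)            # 1 from the event year onward
  t_rel <- years - break_year                         # years since the event
  null_fit  <- lm(counts ~ years)                     # one trend, no break
  break_fit <- lm(counts ~ years + post + post:t_rel) # level and slope shift
  anova(null_fit, break_fit)[2, "Pr(>F)"]             # nested-model F-test
}

# Simulated yearly publication counts with a jump in 2011 (made-up data)
set.seed(42)
years  <- 2000:2020
counts <- 10 + 0.5 * (years - 2000) + 15 * (years >= 2011) + rnorm(length(years))
p_value <- break_test(counts, years, break_year = 2011)
```

When the break date is unknown and must be estimated, `strucchange::breakpoints()` is a common alternative to testing one candidate year at a time.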

Insights from D&PI BI weekly meeting (7/22/2025)

Mapping Science – (Uttan Rao)

Key takeaways for OpenAlex limitations:

No Advisor–Advisee Detection

Incomplete Affiliations

English Language Bias

Author Name Disambiguation Problems

Sparse or Missing Funding Info

These are critical to interpreting patterns accurately.

Research Questions

These are aligned with the NSDPI summer research objectives:

Key Question: How are military-affiliated PRC research institutions working with Iran and Russia on the emerging tech NSDPI is examining?

What research fields are largest/strongest?
Are there correlations with certain countries and who researchers are citing?
What are the values of the papers being cited? (Need to develop and refine measures)
Are new PRC policies driving certain research fields? What tools and methods can be used to better inform policy makers?
For a broader context/later work, compare this with what US/West is researching.

⚠️ Note: These questions require extensive comparative data beyond Iran. This is a broader extension of the citation analysis.

data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

# Read Majid's long works data
majid_works_long <- readRDS(file.path(data_path, "majid_works_long.rds"))

# Read Abbasi's long works data
abbasi_works_long <- readRDS(file.path(data_path, "abbasi_works_long.rds"))

# Read Masoud's long works data
masoud_works_long <- readRDS(file.path(data_path, "masoud_works_long.rds"))

# Save each object as an .RDS file
#saveRDS(majid_works, file = file.path(data_path, "majid_works.rds"))
#saveRDS(abbasi_works, file = file.path(data_path, "abbasi_works.rds"))
#saveRDS(masoud_works, file = file.path(data_path, "masoud_works.rds"))

majid_works <- readRDS(file.path(data_path, "majid_works.rds"))
abbasi_works <- readRDS(file.path(data_path, "abbasi_works.rds"))
masoud_works <- readRDS(file.path(data_path, "masoud_works.rds"))

Take 1: Identify the right subset of Iranian publication data

Steps from Iran meeting 7/23/25 - Margaret’s Notes:

Basic Strategy: snowball sample of topics and, if needed, coauthors

Pull authors who were assassinated;

The complete list of Iranian scientists targeted for assassination is saved as nuclear_scientists in extra_data/nuclear_scientists.rds. The dataset covers 2007 to 2025 and includes every incident associated with assassinations of Iranian scientists, including one in which the victim survived (Fereydoon Abbasi) and one that has not been confirmed as an assassination. The main source for this dataset is this timeline. As of this date, a total of 16 targeted assassinations have been confirmed.

List of Identified Iranian Nuclear Scientists
Date	Victim	Expertise	Method	Location
2007-01-15	Ardeshir Hosseinpour	Authority on electromagnetism	By gas or possibly radiation poisoning	Shiraz
2010-01-12	Masoud Ali-Mohammadi	Quantum field theory and elementary particle physics	By a remote-control bomb attached to a motorcycle	Tehran
2010-11-29	Majid Shahriari	Specialized in neutron transport	By a bomb attached to his car from a motorcycle	Tehran
2010-11-29	Fereydoon Abbasi	Nuclear physicist and administrator	By a bomb attached to his car from a motorcycle	Tehran
2011-07-23	Darioush Rezaeinejad	Expert in neutron transport	Shot by motorcycle gunmen	Tehran
2012-01-11	Mostafa Ahmadi Roshan	Researching polymeric membranes for gaseous diffusion	By a bomb attached to his car from a motorcycle	Tehran
2020-11-27	Mohsen Fakhrizadeh	Nuclear physicist and head of Iran’s nuclear program	Shot by a remote-control machine gun	Damavand
2025-06-13	Fereydoon Abbasi	Expert in nuclear engineering	Killed in simultaneous strikes	Tehran
2025-06-13	Seyyed Amir Hossein Feghhi	Deputy of the Atomic Energy Organization; Expert in physics	Killed in simultaneous strikes	Tehran
2025-06-13	Akbar Motalebizadeh	Nuclear chemical engineering	Killed in simultaneous strikes	Tehran
2025-06-13	Mohammad Mehdi Tehranchi	Physics	Killed in simultaneous strikes	Tehran
2025-06-13	Saeed Borji	Materials engineering	Killed in simultaneous strikes	Tehran
2025-06-13	Mansour Asgari	Physics	Killed in simultaneous strikes	Tehran
2025-06-13	Ahmadreza Zolfaghari Daryani	Nuclear engineering and nuclear physics	Killed in simultaneous strikes	Tehran
2025-06-13	Ali Bakhouei Katirimi	Mechanics	Killed in simultaneous strikes	Tehran

“He said killing scientists may have been intended ‘to scare people so they don’t go work on these programs.’”

“Then the questions are, ‘Where do you stop?’ I mean you start killing, like, students who study physics?” he asked. “This is a very slippery slope.”

“Strikes cannot destroy the knowledge Iran has acquired over several decades, nor any regime ambition to deploy that knowledge to build a nuclear weapon,” U.K. Foreign Secretary David Lammy told lawmakers in the House of Commons.

Attach Author ID

Before we begin these steps, we attach each scholar's OpenAlex author ID so that the scholars of interest are easier to identify later on.

In the article "How do I find my OpenAlex author ID?", OpenAlex recommends three methods for finding author IDs. For our purposes, we go to openalex.org/authors to search for authors in more detail: we enter the name in the search bar and filter by Institution country = Iran.
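The same name search can in principle be scripted against the OpenAlex API instead of the web interface. The sketch below only builds the request URL. The filter name `last_known_institutions.country_code` is an assumption based on the OpenAlex authors documentation and should be checked against the current API reference; `author_search_url()` is a hypothetical helper.

```r
# Hypothetical helper: build an OpenAlex /authors search URL for one name,
# restricted (by assumption) to authors whose last known institution is in Iran.
author_search_url <- function(name, country = "IR", email = NULL) {
  url <- paste0("https://api.openalex.org/authors?search=",
                URLencode(name, reserved = TRUE),          # spaces -> %20
                "&filter=last_known_institutions.country_code:", country)
  if (!is.null(email)) {
    # mailto= identifies the caller for OpenAlex's "polite pool"
    url <- paste0(url, "&mailto=", URLencode(email, reserved = TRUE))
  }
  url
}
```

The JSON result could then be fetched with, for example, `jsonlite::fromJSON(author_search_url("Majid Shahriari", email = your_email))`, and the returned candidates inspected by hand as described below.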

But first we collect the names of all the scientists:

nuclear_scientists$Victim

##  [1] "Ardeshir Hosseinpour"         "Masoud Ali-Mohammadi"        
##  [3] "Majid Shahriari"              "Fereydoon Abbasi"            
##  [5] "Darioush Rezaeinejad"         "Mostafa Ahmadi Roshan"       
##  [7] "Mohsen Fakhrizadeh"           "Fereydoon Abbasi"            
##  [9] "Seyyed Amir Hossein Feghhi"   "Akbar Motalebizadeh"         
## [11] "Mohammad Mehdi Tehranchi"     "Saeed Borji"                 
## [13] "Mansour Asgari"               "Ahmadreza Zolfaghari Daryani"
## [15] "Ali Bakhouei Katirimi"        "Abdolhamid Minouchehr"       
## [17] "Isar Tabatabai-Qamsheh"       "Mohammad Reza Sedighi Saber"

We then use this list to manually collect the data. However, there are a couple of issues with this process:

Disambiguating scholars is difficult: It’s often hard to identify the correct scholar based solely on their name, as OpenAlex may return multiple candidates—even when we apply filters such as institution country = “Iran”.
Incomplete publication data: The number of works listed for a given scholar sometimes appears to be incomplete, which could affect the accuracy of our data collection (example: Masoud Ali-Mohammadi). Locating the missing works can also be difficult. For instance, Majid Shahriari was a top Iranian nuclear scientist and physicist, yet finding his works is challenging.
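One way to make the manual disambiguation step more systematic is to score the candidate records OpenAlex returns for a name. The sketch below is purely illustrative. It combines a normalized edit distance between names (ignoring case, hyphens, and punctuation) with a bonus for matching the expected institution. The column names `display_name` and `institution`, the helper `rank_candidates()`, and the 0.5 weight are hypothetical choices, not an OpenAlex feature, and the ranking would still need manual inspection.

```r
# Hypothetical helper: rank candidate author records for one scholar.
rank_candidates <- function(target_name, expected_inst, candidates) {
  # Normalize names: drop hyphens/punctuation and lower-case them
  norm <- function(x) tolower(gsub("[^A-Za-z ]", "", x))
  target <- norm(target_name)
  cand   <- norm(candidates$display_name)
  d <- adist(target, cand)[1, ]                         # Levenshtein distances
  name_sim <- 1 - d / pmax(nchar(target), nchar(cand))  # 1 = identical names
  inst_match <- candidates$institution == expected_inst
  candidates$score <- name_sim + 0.5 * inst_match       # illustrative weighting
  candidates[order(-candidates$score), ]
}

candidates <- data.frame(
  display_name = c("Masoud Mohammadi", "Masoud Alimohammadi"),
  institution  = c("University of Tehran", "University of Tehran")
)
ranked <- rank_candidates("Masoud Ali-Mohammadi", "University of Tehran", candidates)
```

Here "Masoud Ali-Mohammadi" and "Masoud Alimohammadi" normalize to the same string, so that record ranks first.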

Adding OpenAlex Author Information to Nuclear Scientists Dataset

Load and Prepare Author Information

Because of these issues, collecting author IDs takes more time and requires careful inspection for accuracy.

Ardeshir Hosseinpour

Author ID: a5048006655
Alternate names: A. Hosseinpour, Ardeshir Hosseinpour
Institution: Shiraz University
Past institutions: Shiraz University, Malek Ashtar University of Technology
H-index: 3; i10-index: 1; Works count: 4; Citations count: 37

Majid Shahriari

Author ID: A5112039075
Alternate names: Majid Shahriari, M H Shahriari
Institution: Shahid Beheshti University
Past institutions: Shahid Beheshti University, Amirkabir University of Technology
H-index: 6; i10-index: 1; Works count: 13; Citations count: 80

Fereydoon Abbasi

8/6/25: Maura spotted a new Author ID for Abbasi.

New Author ID: a5037724110
Alternate names: Freydon AbbasiDavani, Fereydoun Abbasi Davani, F. Abbasi‐Davani, Freydoun Abbasi Davani, Fereydoun Abbasi‐Davani, +2 more
Institution: Shahid Beheshti University
Past institutions: Shahid Beheshti University, University of Tehran, Institute for Research in Fundamental Sciences, Islamic Azad University Bandar Abbas, Atomic Energy Organization of Iran, +1 more

Old Author ID: a5103401909*
Alternate name: Fereydoon Abbasi Davani
Institution: Shahid Beheshti University
Past institution: Shahid Beheshti University
H-index: 2; i10-index: NA; Works count: 13; Citations count: 6

8/8/25: Maura suspects that some of Majid's works are listed under other author IDs.

#your_email <- "czj9zj@virginia.edu"  # Replace with your email for better API performance

# Fetch all works
#works_5111646381 <- get_author_works("A5111646381", email = your_email)  # 43 works
#works_5112039075 <- get_author_works("A5112039075", email = your_email)  # 13 works
#works_5028976637 <- get_author_works("A5028976637", email = your_email)  # 2 works
#works_5102177495 <- get_author_works("A5102177495", email = your_email)  # 1 work

#data_path <- "/standard/nsdpi_storage/people/czj9zj/extra_data"
# Save each author's works
#saveRDS(works_5111646381, file = file.path(data_path, "works_5111646381.rds"))
#saveRDS(works_5112039075, file = file.path(data_path, "works_5112039075.rds"))
#saveRDS(works_5028976637, file = file.path(data_path, "works_5028976637.rds"))
#saveRDS(works_5102177495, file = file.path(data_path, "works_5102177495.rds"))
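When one scholar's works are split across several author IDs (as suspected for Majid above), the pulled data frames must be stacked and then de-duplicated by work ID. The following minimal sketch assumes that every data frame has an `id` column, keeps only the columns shared by all inputs, and strips the `https://openalex.org/` prefix so that IDs compare consistently. `combine_author_works()` is a hypothetical helper.

```r
# Hypothetical helper: merge works pulled under several author IDs and keep
# each OpenAlex work once.
combine_author_works <- function(works_list) {
  common <- Reduce(intersect, lapply(works_list, names))   # columns in every input
  works  <- do.call(rbind, lapply(works_list, function(w) w[, common, drop = FALSE]))
  works$id <- sub("^https://openalex.org/", "", works$id)  # normalize work IDs
  works[!duplicated(works$id), , drop = FALSE]             # first copy of each work
}
```

For example, `combine_author_works(list(works_5111646381, works_5112039075))` would merge two of the pulls above into a single de-duplicated set.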

Merge with Nuclear Scientists Dataset

# Assuming your nuclear_scientists dataset is already loaded as 'nuclear_scientists'

# Merge the datasets
#enhanced_nuclear_scientists <- nuclear_scientists %>%left_join(author_info, by = "Victim")

# Display the enhanced dataset structure
#cat("Original columns:", ncol(nuclear_scientists), "\n")
#cat("Enhanced columns:", ncol(enhanced_nuclear_scientists), "\n")
#cat("New columns added:", ncol(enhanced_nuclear_scientists) - ncol(nuclear_scientists), "\n")

# Show column names
#cat("\nNew columns added:\n")
#new_cols <- setdiff(names(enhanced_nuclear_scientists), names(nuclear_scientists))
#cat(paste("-", new_cols, collapse = "\n"))

#saveRDS(enhanced_nuclear_scientists, "/standard/nsdpi_storage/people/czj9zj/extra_data/nuclear_scientists.rds")

View Enhanced Dataset

# Display only the rows with author information

#data_path <- "extra_data"
#nuclear_scientists <- readRDS(file.path(data_path, "nuclear_scientists.rds"))


#kable(enhanced_nuclear_scientists, caption = "Nuclear Scientists with OpenAlex Information")

Missing:

Darioush Rezaeinejad
Mostafa Ahmadi Roshan
Mohsen Fakhrizadeh
Seyyed Amir Hossein Feghhi
Akbar Motalebizadeh
Mohammad Mehdi Tehranchi
Saeed Borji
Mansour Asgari
Ahmadreza Zolfaghari Daryani
Ali Bakhouei Katirimi
Abdolhamid Minouchehr
Isar Tabatabai-Qamsheh
Mohammad Reza Sedighi Saber

Filtering for Pre-2025 Assassinations

To focus our analysis on a meaningful timeline, we filter the dataset to include only assassinations that occurred before 2025. This provides a clearer contrast between the earlier targeted killings and those during the significant escalation in 2025.
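The filter itself is a single date comparison. The following minimal sketch assumes that the `Date` column holds ISO `YYYY-MM-DD` strings or `Date` values, as in the tables in this report. `pre_2025()` is a hypothetical helper, and the two example rows are copied from the table above.

```r
# Hypothetical helper: keep events dated strictly before the cutoff.
pre_2025 <- function(events, cutoff = "2025-01-01") {
  events[as.Date(events$Date) < as.Date(cutoff), , drop = FALSE]
}

events <- data.frame(
  Date   = c("2020-11-27", "2025-06-13"),
  Victim = c("Mohsen Fakhrizadeh", "Saeed Borji")
)
pre_2025(events)  # keeps only the 2020 row
```

The dplyr equivalent would be `filter(nuclear_scientists, as.Date(Date) < as.Date("2025-01-01"))`.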

Identified Iranian Nuclear Scientists Assassinated Prior to 2025
Date	Victim	Expertise	Method	Location	Event	Role	institution_name	author_id	alternate_names	current_institution	past_institutions	h_index	i10_index	works_count	citations_count
2007-01-15	Ardeshir Hosseinpour	Authority on electromagnetism	By gas or possibly radiation poisoning	Shiraz	Died	Professor	Shiraz University	a5048006655	A. Hosseinpour, Ardeshir Hosseinpour	Shiraz University	Shiraz University, Malek Ashtar University of Technology	3	1	4	37
2010-01-12	Masoud Ali-Mohammadi	Quantum field theory and elementary particle physics	By a remote-control bomb attached to a motorcycle	Tehran	Assassination	Professor	University of Tehran	A5111436477	Masoud Alimohammadi	Ilam University	Ilam University, University of Tehran, Institute for Research in Fundamental Sciences, Institute for Cognitive Science Studies	4	4	9	90
2010-11-29	Majid Shahriari	Specialized in neutron transport	By a bomb attached to his car from a motorcycle	Tehran	Assassination	Nuclear Engineer	Shahid Beheshti University	A5112039075	Majid Shahriari, M H Shahriari	Shahid Beheshti University	Shahid Beheshti University, Amirkabir University of Technology	6	1	13	80
2010-11-29	Fereydoon Abbasi	Nuclear physicist and administrator	By a bomb attached to his car from a motorcycle	Tehran	Survived	Professor	Shahid Beheshti University	A5103401909	Fereydoon Abbasi Davani	Shahid Beheshti University	Shahid Beheshti University	2	NA	13	6
2011-07-23	Darioush Rezaeinejad	Expert in neutron transport	Shot by motorcycle gunmen	Tehran	Assassination	Physicist	Shahid Beheshti University	NA	NA	NA	NA	NA	NA	NA	NA
2012-01-11	Mostafa Ahmadi Roshan	Researching polymeric membranes for gaseous diffusion	By a bomb attached to his car from a motorcycle	Tehran	Assassination	Professor	Sharif University of Technology	NA	NA	NA	NA	NA	NA	NA	NA
2020-11-27	Mohsen Fakhrizadeh	Nuclear physicist and head of Iran’s nuclear program	Shot by a remote-control machine gun	Damavand	Assassination	Professor	Imam Hossein University	NA	NA	NA	NA	NA	NA	NA	NA

We now have 7 researchers of interest. From this list, we could remove Ardeshir Hosseinpour (15 January 2007), as there are “conflicting reports on the cause of Hosseinpour’s death”, but we keep him for these purposes because he appears to have been a very prominent figure (see below). We do exclude Fereydoon Abbasi from this subset. Although he was targeted in the 29 November 2010 attack that killed Majid Shahriari, Abbasi survived that assassination attempt. However, he was later killed on 13 June 2025 during the Israeli strikes that targeted Iranian scientists, military officials, and civilians (AP, 2025), so we will keep an eye on him.

These 4 researchers were prominent in Iran’s nuclear program:

1. Ardeshir Hosseinpour, 45: “Hosseinpour was a nuclear physics scientist and a lecturer at Shiraz University and the Malek Ashtar University of Technology in Isfahan. An expert in the field of electromagnetism, he was one of the founders of the ‘Nuclear Technology Center of Isfahan,’ the genesis of Natanz nuclear facility where he continued his research until his mysterious death on January 15, 2007.”
2. Masoud Ali Mohammadi, 50: “Mohammadi was a nuclear scientist and a PhD graduate student of physics from the Sharif University in Tehran. He had over 50 published papers and articles in academic journals and was reportedly named one of the key scientists in the advancements related to particle accelerator machines and atom smashers.”
3. Majid Shahriari, 45, was regarded as a key figure in the advancement of uranium enrichment technologies at Iran’s Atomic Energy Organization. He was assassinated on 29 November 2010 by a magnetic bomb attached to his car by assailants on a motorcycle while he was driving on the Artesh highway in Tehran.

Source for the three entries above: VOA News.

4. Fereydoon Abbasi-Davani, 66, a professor of nuclear physics at Shahid Beheshti University, was reportedly a member of the Islamic Revolutionary Guard Corps (IRGC) since the 1979 Islamic Revolution (NYTimes, 2011).

Therefore, for the initial analysis, we will focus on these 3 scientists:

Identified Iranian Nuclear Scientists
Date	Victim	Expertise	Method	Location	Event	Role	institution_name	author_id	alternate_names	current_institution	past_institutions	h_index	i10_index	works_count	citations_count
2007-01-15	Ardeshir Hosseinpour	Authority on electromagnetism	By gas or possibly radiation poisoning	Shiraz	Died	Professor	Shiraz University	a5048006655	A. Hosseinpour, Ardeshir Hosseinpour	Shiraz University	Shiraz University, Malek Ashtar University of Technology	3	1	4	37
2010-01-12	Masoud Ali-Mohammadi	Quantum field theory and elementary particle physics	By a remote-control bomb attached to a motorcycle	Tehran	Assassination	Professor	University of Tehran	A5111436477	Masoud Alimohammadi	Ilam University	Ilam University, University of Tehran, Institute for Research in Fundamental Sciences, Institute for Cognitive Science Studies	4	4	9	90
2010-11-29	Majid Shahriari	Specialized in neutron transport	By a bomb attached to his car from a motorcycle	Tehran	Assassination	Nuclear Engineer	Shahid Beheshti University	A5112039075	Majid Shahriari, M H Shahriari	Shahid Beheshti University	Shahid Beheshti University, Amirkabir University of Technology	6	1	13	80

Additional Sources:

Pull papers associated with those author IDs

As noted above, the papers associated with each author may be incomplete. We start by focusing on Majid and Abbasi; OpenAlex lists 13 works for each scholar.

# Step 1: Find common columns
#common_cols <- intersect(names(new_abbasi_works), names(abbasi_works))

# Step 2: Subset both data frames to only those columns
#abbasi_subset <- abbasi_works[, common_cols]
#new_abbasi_subset <- new_abbasi_works[, common_cols]

# Step 3: Combine
#new_abbasi_works <- rbind(new_abbasi_subset, abbasi_subset)

#majid_id <- "A5112039075"

#abbasi_id <- "A5103401909"
#masoud_id <- "A5111436477"
#new_abbasi_id <- "a5037724110"

#your_email <- "czj9zj@virginia.edu"  # Replace with your email for better API performance

# Fetch all works
#cat("Fetching works for author ID:", author_id, "\n")
#majid_works <- get_author_works(majid_id, email = your_email)
#abbasi_works <- get_author_works(abbasi_id, email = your_email)


#new_abbasi_works <- get_author_works(new_abbasi_id, email = your_email)

#masoud_works <- get_author_works(masoud_id, email = your_email)

#data_path <- "/standard/nsdpi_storage/people/czj9zj/extra_data"
# Save Majid Shahriari's works
#saveRDS(majid_works, file = file.path(data_path, "majid_works.rds"))

# Save Fereydoon Abbasi's works
#saveRDS(abbasi_works, file = file.path(data_path, "abbasi_works.rds"))

#saveRDS(new_abbasi_works, file = file.path(data_path, "new_abbasi_works.rds"))

# Save Masoud Ali-Mohammadi's works
#masoud_works <- masoud_works %>%mutate(id = sub(".*/", "", id))
#saveRDS(masoud_works, file = file.path(data_path, "masoud_works.rds"))

#new_abbasi_works <- new_abbasi_works[, 1:50]

# Remove 'https://openalex.org/' from the 'id' column
#new_abbasi_works$id <- gsub("https://openalex.org/", "", new_abbasi_works$id)

#abbasi_works <- abbasi_works[, 1:50]

# Remove 'https://openalex.org/' from the 'id' column

#abbasi_works$id <- gsub("https://openalex.org/", "", abbasi_works$id)

# View the cleaned dataset
#head(abbasi_works)

# Step 1: Initialize an empty vector to store topic display names
#all_topic_names <- c()

# Step 2: Loop through the per-work topic data frames in new_abbasi_works$topics
#for (i in seq_along(new_abbasi_works$topics)) {
  # Get the current topic data frame
#  topic_df <- new_abbasi_works$topics[[i]]
  
  # Check that it is a data frame with a 'display_name' column
#  if (is.data.frame(topic_df) && "display_name" %in% names(topic_df)) {
    # Append this work's topic names
#    all_topic_names <- c(all_topic_names, topic_df$display_name)
#  }
#}

# Step 3: Remove duplicates
#topics_name <- unique(all_topic_names)
#new_abbasi_keywords <- topic_summary[topic_summary$Topic %in% topics_name, ]
#saveRDS(new_abbasi_keywords, file = file.path(data_path, "new_abbasi_keywords.rds"))
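The commented loop above collects topic names one work at a time. A loop-free sketch of the same idea is shown below. It assumes that `topics` is a list-column whose entries are data frames with a `display_name` column; entries without topics (such as `NULL` or `NA`) are skipped. `collect_topic_names()` is a hypothetical helper.

```r
# Hypothetical helper: unique topic names from a list-column of topic tables.
collect_topic_names <- function(topics_col) {
  names_per_work <- lapply(topics_col, function(df) {
    if (is.data.frame(df) && "display_name" %in% names(df)) {
      as.character(df$display_name)  # guard against factor columns
    } else {
      character(0)                   # works with no topic table contribute nothing
    }
  })
  unique(unlist(names_per_work))
}
```

Applied to the real data, `collect_topic_names(new_abbasi_works$topics)` would play the role of `topics_name`.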

Running colnames(majid_works) shows 468 columns in total; Abbasi's data has 336. Since we are interested in topics, subtopics, etc., we keep only the columns of interest. That is, we select:

Useful columns (id, title, display_name, etc.)
columns that start with primary_topic, topics, keywords, or concepts
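The selection rule above can be expressed as a name match plus a prefix regex. In this minimal sketch, the `keep` vector contains only the three named columns, since the full list is elided ("etc.") in the text. `select_topic_columns()` is a hypothetical helper.

```r
# Hypothetical helper: keep identifier columns plus any topic-related columns.
select_topic_columns <- function(works,
                                 keep = c("id", "title", "display_name"),
                                 prefixes = c("primary_topic", "topics", "keywords", "concepts")) {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")  # ^(primary_topic|topics|...)
  cols <- names(works)[names(works) %in% keep | grepl(pattern, names(works))]
  works[, cols, drop = FALSE]
}
```

With dplyr, `select(works, any_of(keep), starts_with(prefixes))` expresses the same rule.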

data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic"
majid_works <- readRDS(file.path(data_path, "MShahriari_works.rds"))
abbasi_works <- readRDS(file.path(data_path, "FAbbasiDavani_works.rds"))
shahid_beheshti_university_cleanedLONG <- readRDS(file.path(data_path, "extra_data", "shahid_beheshti_university_cleanedLONG.rds"))

masoud_works <- readRDS(file.path("/standard/nsdpi_storage/people/czj9zj/extra_data", "masoud_works.rds"))

Visualisations

majid_works <- majid_works %>%
  mutate(year = year(ymd(publication_date)))

abbasi_works <- abbasi_works %>%
  mutate(year = year(ymd(publication_date)))

majid_counts <- majid_works %>%
  filter(!is.na(year)) %>%
  count(year)

abbasi_counts <- abbasi_works %>%
  filter(!is.na(year)) %>%
  count(year)

# Create shared range of years
combined_years <- tibble(
  year = min(c(majid_counts$year, abbasi_counts$year)):max(c(majid_counts$year, abbasi_counts$year))
)

# Fill and label: Majid Shahriari
majid_full <- combined_years %>%
  left_join(majid_counts, by = "year") %>%
  replace_na(list(n = 0)) %>%
  mutate(author = "Majid Shahriari")

# Fill and label: Fereydoon Abbasi
abbasi_full <- combined_years %>%
  left_join(abbasi_counts, by = "year") %>%
  replace_na(list(n = 0)) %>%
  mutate(author = "Fereydoon Abbasi")

# Combine
combined_data <- bind_rows(majid_full, abbasi_full)

# Softer color palette
custom_colors <- c(
  "Majid Shahriari" = "#91B3D7",  # muted blue
  "Fereydoon Abbasi" = "#F4A582"  # muted coral
)

# Create plot

publication_plot <- ggplot(combined_data, aes(x = year, y = n, fill = author)) +
  geom_col(position = "dodge", width = 0.7, alpha = 0.85) +
  scale_fill_manual(values = custom_colors) +
  scale_x_continuous(
    breaks = seq(min(combined_years$year), max(combined_years$year), by = 2)
  ) +
  labs(
    title = "Publications Count by Year",
    x = "Publication Year",
    y = "Number of Publications",
    fill = "Author"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank()
  )

publication_plot

# Save to 'plots' folder
#ggsave(
#  filename = "/standard/nsdpi_storage/people/czj9zj/plots/publications_by_year.png",
#  plot = publication_plot,
#  width = 12,
#  height = 8,
#  dpi = 300
#)
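The join-and-`replace_na()` steps above ensure that years with no publications appear as zero bars instead of disappearing. For reference, here is a base-R sketch of the same zero-filling idea. It turns the years into a factor whose levels cover the full range, so that `table()` reports a count of 0 for any missing year. `count_by_year()` is a hypothetical helper.

```r
# Hypothetical helper: publication counts per year, with 0 for empty years.
count_by_year <- function(years, year_range = range(years, na.rm = TRUE)) {
  all_years <- seq(year_range[1], year_range[2])
  tab <- table(factor(years, levels = all_years))  # NA years are not counted
  data.frame(year = all_years, n = as.vector(tab))
}

count_by_year(c(2001, 2001, 2004, NA))
```

To share one range across two authors, pass `year_range = range(c(majid_works$year, abbasi_works$year), na.rm = TRUE)`.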

Make a list of all subfields and topics and call it “subjects” (the exact name does not matter).

# Define your path if needed
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

# Read the keyword files (adjust file names if needed)
#majid_keywords <- readRDS(file.path(data_path, "majid_keywords.rds"))
#abbasi_keywords <- readRDS(file.path(data_path, "abbasi_keywords.rds"))

majid_keywords <- readRDS(file.path(data_path, "majid_keywords_v1.rds"))
new_abbasi_keywords <- readRDS(file.path(data_path, "abbasi_keywords_v1.rds"))

# Step 1: Extract unique topics from the 'Topic' column
#abbasi_topics <- unique(new_abbasi_keywords$Topic)
#majid_topics <- unique(majid_keywords$Topic)

# Step 2: Identify topic categories
#shared_topics <- intersect(abbasi_topics, majid_topics)
#abbasi_only_topics <- setdiff(abbasi_topics, shared_topics)
#majid_only_topics <- setdiff(majid_topics, shared_topics)

# Step 3: Padding helper
#pad_column <- function(x, n) {
#  length(x) <- n
#  x[is.na(x)] <- "---"
#  return(x)
#}

# Step 4: Create the comparison table
#max_len <- max(length(majid_only_topics), length(shared_topics), length(abbasi_only_topics))

#topic_comparison <- tibble(
#  "Majid Shahriari Only" = pad_column(majid_only_topics, max_len),
#  "Shared Topics" = pad_column(shared_topics, max_len),
#  "Fereydoon Abbasi Only" = pad_column(abbasi_only_topics, max_len)
#)

# Step 5: Generate the styled table
#table_html <- kbl(
#  topic_comparison,
#  caption = "Topic Comparison: Majid Shahriari vs. Fereydoon Abbasi"
#) %>%
#  kable_styling(full_width = FALSE)

# Step 6: Save as HTML
#dir.create("plots", showWarnings = FALSE)
#save_kable(table_html, file = "plots/topic_comparison.html")

# Step 7: Save as PNG
#webshot("plots/topic_comparison.html", "plots/topic_comparison.png", zoom = 2)

Keywords Associated with Majid Shahriari
	Domain	Field	Subfield	Topic	Total Works	Total Citations	% of Works	% of Citations
53	Physical Sciences	Physics and Astronomy	Radiation	Nuclear Physics and Applications	207020	1274223	0.09	0.05
83	Health Sciences	Medicine	Radiology, Nuclear Medicine and Imaging	Medical Imaging Techniques and Applications	177439	1662238	0.08	0.06
157	Health Sciences	Medicine	Ophthalmology	Glaucoma and retinal disorders	140274	2078725	0.06	0.08
211	Physical Sciences	Engineering	Safety, Risk, Reliability and Quality	Nuclear and radioactivity studies	130674	285858	0.06	0.01
228	Physical Sciences	Environmental Science	Health, Toxicology and Mutagenesis	Air Quality and Health Impacts	128119	3243177	0.06	0.12
229	Health Sciences	Medicine	Radiology, Nuclear Medicine and Imaging	Advanced MRI Techniques and Applications	127691	2302623	0.06	0.09
244	Physical Sciences	Engineering	Mechanics of Materials	Metal and Thin Film Mechanics	125631	1703914	0.06	0.06
292	Health Sciences	Medicine	Pulmonary and Respiratory Medicine	Cerebrovascular and Carotid Artery Diseases	116893	1283161	0.05	0.05
306	Physical Sciences	Earth and Planetary Sciences	Atmospheric Science	Atmospheric chemistry and aerosols	115277	3807482	0.05	0.14
311	Physical Sciences	Engineering	Civil and Structural Engineering	Structural Health Monitoring Techniques	114437	1283516	0.05	0.05
342	Physical Sciences	Physics and Astronomy	Radiation	Advanced Radiotherapy Techniques	111820	1196375	0.05	0.05
428	Physical Sciences	Engineering	Aerospace Engineering	Nuclear reactor physics and engineering	102519	483794	0.05	0.02
475	Physical Sciences	Engineering	Automotive Engineering	Vehicle emissions and performance	99203	641034	0.04	0.02
489	Physical Sciences	Materials Science	Materials Chemistry	Graphite, nuclear technology, radiation studies	98176	402275	0.04	0.02
568	Health Sciences	Medicine	Radiology, Nuclear Medicine and Imaging	Radiation Dose and Imaging	93510	782436	0.04	0.03

Keywords Associated with Fereydoon Abbasi
	Domain	Field	Subfield	Topic	Total Works	Total Citations	% of Works	% of Citations
17	Physical Sciences	Physics and Astronomy	Astronomy and Astrophysics	Astro and Planetary Science	272845	2696720	0.12	0.10
53	Physical Sciences	Physics and Astronomy	Radiation	Nuclear Physics and Applications	207020	1274223	0.09	0.05
56	Physical Sciences	Engineering	Civil and Structural Engineering	Engineering Applied Research	206697	170096	0.09	0.01
68	Physical Sciences	Engineering	Electrical and Electronic Engineering	Photonic and Optical Devices	192687	1971794	0.09	0.07
82	Health Sciences	Medicine	Radiology, Nuclear Medicine and Imaging	Monoclonal and Polyclonal Antibodies Research	177915	4304991	0.08	0.16
83	Health Sciences	Medicine	Radiology, Nuclear Medicine and Imaging	Medical Imaging Techniques and Applications	177439	1662238	0.08	0.06
103	Physical Sciences	Engineering	Electrical and Electronic Engineering	Semiconductor materials and devices	163693	2223407	0.07	0.08
119	Physical Sciences	Physics and Astronomy	Nuclear and High Energy Physics	Nuclear physics research studies	157237	2553546	0.07	0.10
167	Physical Sciences	Materials Science	Surfaces, Coatings and Films	Electron and X-Ray Spectroscopy Techniques	137173	1773250	0.06	0.07
186	Physical Sciences	Engineering	Control and Systems Engineering	Fault Detection and Control Systems	134886	1422574	0.06	0.05
211	Physical Sciences	Engineering	Safety, Risk, Reliability and Quality	Nuclear and radioactivity studies	130674	285858	0.06	0.01
219	Physical Sciences	Chemistry	Inorganic Chemistry	Radioactive element chemistry and processing	129479	1660269	0.06	0.06
238	Life Sciences	Biochemistry, Genetics and Molecular Biology	Molecular Biology	Advanced biosensing and bioanalysis techniques	126304	3813628	0.06	0.14
244	Physical Sciences	Engineering	Mechanics of Materials	Metal and Thin Film Mechanics	125631	1703914	0.06	0.06
325	Physical Sciences	Physics and Astronomy	Nuclear and High Energy Physics	Magnetic confinement fusion research	112874	1213545	0.05	0.05

(a) Pull all papers associated with Shahid Beheshti University (uni 393 in Luz’s numbering scheme) that have those subfields and topics

Now, we read all the works associated with Shahid Beheshti University, which in our case is works_393. This has 35,305 works in total.

length(unique(works_393$id)) == nrow(works_393)

## [1] TRUE

length(unique(works_403$id)) == nrow(works_403)

## [1] FALSE

duplicated_ids <- works_403$id[duplicated(works_403$id)]

# Show which rows are duplicated
works_403 %>% 
  filter(id %in% duplicated_ids)

## # A tibble: 2 × 51
##   id              title display_name authorships abstract doi   publication_date
##   <chr>           <chr> <chr>        <list>      <chr>    <chr> <date>          
## 1 https://openal… Defe… Defect dete… <tibble>    This st… http… 2024-12-23      
## 2 https://openal… Defe… Defect dete… <tibble>    This st… http… 2024-12-23      
## # ℹ 44 more variables: publication_year <int>, fwci <dbl>,
## #   cited_by_count <int>, counts_by_year <list>, cited_by_api_url <chr>,
## #   ids <list>, type <chr>, is_oa <lgl>, is_oa_anywhere <lgl>, oa_status <chr>,
## #   oa_url <chr>, any_repository_has_fulltext <lgl>, source_display_name <chr>,
## #   source_id <chr>, issn_l <chr>, host_organization <chr>,
## #   host_organization_name <chr>, landing_page_url <chr>, pdf_url <chr>,
## #   license <chr>, version <chr>, referenced_works <list>, …

# Keep one copy of each duplicated work (the duplicate rows shown above are identical)
works_403 <- works_403 %>%
  filter(!duplicated(id))

The id check shows that works_393 (Shahid Beheshti University) contains no duplicate works, whereas works_403 contains one work listed twice, which is handled above.
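The two de-duplication idioms built on `duplicated()` behave differently, and the difference matters when cleaning works_403. `!duplicated(id)` keeps the first copy of each ID. Combining it with `!duplicated(id, fromLast = TRUE)` removes *every* copy of any ID that appears more than once. Here is a toy illustration with made-up IDs:

```r
ids <- c("W1", "W2", "W2", "W3")

# Keeps the first copy of every id (usual de-duplication)
keep_first <- ids[!duplicated(ids)]                                # "W1" "W2" "W3"

# Drops ALL copies of any id that occurs more than once
drop_all <- ids[!duplicated(ids) & !duplicated(ids, fromLast = TRUE)]  # "W1" "W3"
```

In dplyr, `distinct(works_403, id, .keep_all = TRUE)` corresponds to the first form.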

Does Shahid Beheshti have more distinct topics than distinct subfields among its publications?

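Counting the values returned by unique(works_393$top_subfield) gives 234 unique subfields at Shahid Beheshti University; the same check on topics gives 3,129 unique topics. So yes, its publications span far more distinct topics than distinct subfields.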
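One caveat when counting distinct values this way: `unique()` keeps `NA` as a value, so `length(unique(x))` counts missing entries as an extra category. The toy vector below is made up for illustration.

```r
x <- c("Optics", "Nuclear Physics", NA, "Optics")

n_with_na    <- length(unique(x))              # 3: NA counted as a distinct value
n_without_na <- length(unique(x[!is.na(x)]))   # 2
```

`dplyr::n_distinct(x, na.rm = TRUE)` gives the NA-free count directly, which matches the `filter(!is.na(top_field))` step used in the field count below.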

domain_colors <- c(
  "Physical Sciences" = "#1e3a8a",    # Deep blue
  "Social Sciences"   = "#7c2d12",    # Deep brown/orange
  "Life Sciences"     = "#14532d",    # Deep green
  "Health Sciences"   = "#7f1d1d"     # Deep red
)

# Create the plot and assign it to domain_pct_plot
domain_pct_plot <- works_393 %>%
  filter(!is.na(top_domain)) %>%
  count(top_domain, sort = TRUE) %>%
  mutate(pct = n / sum(n)) %>%
  ggplot(aes(x = reorder(top_domain, pct), y = pct, fill = top_domain)) +
  geom_col() +
  geom_text(aes(label = percent(pct, accuracy = 0.1)),
            hjust = 1.3, color = "white", size = 4.2) +
  coord_flip() +
  scale_y_continuous(labels = percent_format(accuracy = 0.1)) +
  scale_fill_manual(values = domain_colors) +
  labs(
    title = "Research Domains at Shahid Beheshti University",
    x = "Domain",
    y = "% of Total Works",
    fill = "Domain"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 13, face = "bold"),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  )

# Now save it
#ggsave("plots/domain_percent_bar.png", plot = domain_pct_plot, width = 10, height = 6, dpi = 320)

topics_data <- works_393 %>%
  filter(!is.na(top_domain), !is.na(top_field)) %>%
  group_by(domain.display_name = top_domain, field.display_name = top_field) %>%
  summarise(works_count = n(), .groups = "drop")

treemap_plot <- create_complete_names_treemap(topics_data)

## Warning in geom_treemap_text(colour = "black", place = "centre", grow = TRUE, :
## Ignoring unknown parameters: `max.size` and `check_overlap`

## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

#ggsave("plots/domain_treemap.png", treemap_plot, width = 12, height = 8, dpi = 320)

# Count unique fields
n_fields <- works_393 %>%
  filter(!is.na(top_field)) %>%
  summarise(n = n_distinct(top_field)) %>%
  pull(n)

# Get field-level counts and domains
field_data <- works_393 %>%
  filter(!is.na(top_field), !is.na(top_domain)) %>%
  count(top_field, top_domain, sort = TRUE)

# Plot all fields
ggplot(field_data, aes(x = reorder(top_field, n), y = n, fill = top_domain)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = domain_colors) +
  labs(
    title = "Distribution of Research Fields at Shahid Beheshti University",
    subtitle = paste("All", n_fields, "fields colored by domain"),
    x = "Field",
    y = "Number of Works",
    fill = "Domain"
  ) +
  theme_minimal() +
  theme(
    axis.text = element_text(size = 11),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5)
  )

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ readr::col_factor()      masks scales::col_factor()
## ✖ scales::discard()        masks purrr::discard()
## ✖ dplyr::filter()          masks stats::filter()
## ✖ jsonlite::flatten()      masks purrr::flatten()
## ✖ kableExtra::group_rows() masks dplyr::group_rows()
## ✖ dplyr::lag()             masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(forcats)

# Identify top 5 fields by publication count
top_5_fields <- works_393 %>%
  filter(!is.na(top_field), !is.na(publication_year)) %>%
  count(publication_year, top_field) %>%
  group_by(top_field) %>%
  mutate(total = sum(n)) %>%
  ungroup() %>%
  group_by(top_field) %>%
  slice_max(order_by = total, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  slice_max(order_by = total, n = 5) %>%
  pull(top_field)

# Filter and plot
plot <- works_393 %>%
  filter(top_field %in% top_5_fields, !is.na(publication_year)) %>%
  count(publication_year, top_field) %>%
  ggplot(aes(x = publication_year, y = n, fill = top_field)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ top_field, scales = "free_y", ncol = 2) +
  scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
  theme_minimal() +
  labs(
    title = "Evolution of Top 5 Research Fields Over Time",
    subtitle = "Shahid Beheshti University – by publication count",
    x = "Publication Year",
    y = "Number of Works"
  )

# Save the plot
#ggsave("plots/top_fields_evolution.png", plot = plot, width = 10, height = 6, dpi = 300)
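The pipeline above first counts works per year and then sums those counts per field, so the ranking reduces to total works per field. The shorter base-R sketch below computes the same ranking on a toy vector of field labels (values are illustrative only). There is one behavioral difference: `slice_max()` keeps ties by default (`with_ties = TRUE`) and can return more than five fields, whereas taking the first k names always returns exactly k.

```r
# Toy vector of primary fields, one entry per work (illustrative values)
fields <- c("Chemistry", "Medicine", "Medicine", "Engineering",
            "Medicine", "Chemistry", "Physics", NA)

# Count works per field (table() drops NA by default), largest first
field_totals <- sort(table(fields), decreasing = TRUE)

# Keep the k most frequent fields
k <- 2
top_k <- names(field_totals)[seq_len(min(k, length(field_totals)))]
print(top_k)
```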

library(tidyverse)
library(forcats)

# Identify top 5 subfields by publication count
top_5_subfields <- works_393 %>%
  filter(!is.na(top_subfield), !is.na(publication_year)) %>%
  count(publication_year, top_subfield) %>%
  group_by(top_subfield) %>%
  mutate(total = sum(n)) %>%
  ungroup() %>%
  group_by(top_subfield) %>%
  slice_max(order_by = total, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  slice_max(order_by = total, n = 5) %>%
  pull(top_subfield)

# Filter and plot
plot <- works_393 %>%
  filter(top_subfield %in% top_5_subfields, !is.na(publication_year)) %>%
  count(publication_year, top_subfield) %>%
  ggplot(aes(x = publication_year, y = n, fill = top_subfield)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ top_subfield, scales = "free_y", ncol = 2) +
  scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
  theme_minimal() +
  labs(
    title = "Evolution of Top 5 Research Subfields Over Time",
    subtitle = "Shahid Beheshti University – by publication count",
    x = "Publication Year",
    y = "Number of Works"
  )

plot

## Warning: Removed 35 rows containing missing values or values outside the scale range
## (`geom_col()`).

# Save the plot
#ggsave("plots/top_subfields_evolution.png", plot = plot, width = 10, height = 6, dpi = 300)

library(tidyverse)
library(broom)  # for tidy regression results

# Step 1: Filter and prepare data
field_growth_slopes <- works_393 %>%
  filter(!is.na(top_field), !is.na(publication_year)) %>%
  count(publication_year, top_field) %>%
  group_by(top_field) %>%
  filter(n() >= 5) %>%  # Ensure enough data points for a slope
  nest() %>%
  mutate(
    model = map(data, ~lm(n ~ publication_year, data = .x)),
    slope = map_dbl(model, ~coef(.x)[["publication_year"]])
  ) %>%
  ungroup() %>%
  arrange(desc(slope)) %>%
  slice_head(n = 5)

# Extract top 5 fastest-growing fields
top_5_growing_fields <- field_growth_slopes$top_field

# Filter and plot the fastest-growing fields
plot <- works_393 %>%
  filter(top_field %in% top_5_growing_fields, !is.na(publication_year)) %>%
  count(publication_year, top_field) %>%
  ggplot(aes(x = publication_year, y = n, fill = top_field)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ top_field, scales = "free_y", ncol = 2) +
  scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
  theme_minimal() +
  labs(
    title = "Fastest-Growing Research Fields Over Time",
    subtitle = "Top 5 fields at Shahid Beheshti University based on publication growth rate",
    x = "Publication Year",
    y = "Number of Works"
  )

plot

## Warning: Removed 54 rows containing missing values or values outside the scale range
## (`geom_col()`).

#ggsave("plots/sbu_fastest_growing_fields.png", plot = plot, width = 10, height = 6, dpi = 300)
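For each field, the growth ranking fits an ordinary least-squares line of yearly work count on publication year and keeps the five largest slopes. The base-R sketch below applies the same idea to toy counts (fields and numbers are made up). Note that `count()` only emits year/field pairs with at least one work, so years with zero output do not enter the regression. Filling them with zeros, for example with `tidyr::complete()`, would be an alternative choice and would change the slopes.

```r
# Toy yearly counts for two hypothetical fields
counts <- data.frame(
  field = rep(c("Field A", "Field B"), each = 5),
  year  = rep(2018:2022, times = 2),
  n     = c(10, 12, 14, 16, 18,   # grows by 2 works per year
            20, 20, 19, 21, 20),  # roughly flat
  stringsAsFactors = FALSE
)

# Fit n ~ year separately for each field and extract the slope coefficient
slopes <- sapply(split(counts, counts$field), function(d) {
  coef(lm(n ~ year, data = d))[["year"]]
})

# Fields ordered from fastest to slowest growth
print(sort(slopes, decreasing = TRUE))
```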

library(tidyverse)
library(scales)
library(lubridate)

# Floor each publication date to the first day of its month
works_393 <- works_393 %>%
  mutate(pub_month = floor_date(publication_date, "month"))

# Filter after year 2000 only
monthly_counts <- works_393 %>%
  filter(top_subfield %in% top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
  count(pub_month, top_subfield)

# Normalize to each subfield's first available month (after 2000)
monthly_growth <- monthly_counts %>%
  group_by(top_subfield) %>%
  arrange(pub_month) %>%
  mutate(
    base_count = first(n),
    growth_index = (n / base_count) * 100
  ) %>%
  ungroup()



dark_colors <- c(
  "Biomedical Engineering" = "#102542",                # Dark navy
  "Electrical and Electronic Engineering" = "#A84400", # Burnt orange
  "Materials Chemistry" = "darkgreen",                 # Dark green
  "Molecular Biology" = "darkred",                     # Dark red
  "Organic Chemistry" = "#4B0082"                      # Indigo
)


ggplot(monthly_growth, aes(x = pub_month, y = growth_index, color = top_subfield)) +
  geom_line(linewidth = 1, alpha = 0.8) +
  scale_color_manual(values = dark_colors) +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y", limits = c(as.Date("2000-01-01"), NA)) +
  labs(
    title = "Monthly Publication Growth in Top Subfields at Shahid Beheshti University",
    subtitle = "Indexed to each subfield's first recorded month after 2000 (Index = 100%)",
    x = "Publication Date",
    y = "Relative Publication Index (%)",
    color = "Subfield"
  ) +
  theme_minimal(base_size = 12)

# Save the plot
ggsave("plots/monthly_subfield_growth.png", width = 10, height = 6, dpi = 300)

#ggsave("plots/subfield_growth_slopechart.png", width = 10, height = 6, dpi = 300)
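Each subfield's index divides every monthly count by that subfield's count in its first month on or after January 2000, then multiplies by 100, so the first month always equals 100. The base-R sketch below works through the arithmetic on toy counts (values are made up). It also shows one caveat: when the base month has very few works, every later value is inflated.

```r
# Toy monthly counts for two hypothetical subfields
monthly <- data.frame(
  subfield = c("S1", "S1", "S1", "S2", "S2", "S2"),
  month    = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01",
                       "2000-02-01", "2000-03-01", "2000-04-01")),
  n        = c(4, 6, 8, 1, 5, 10),
  stringsAsFactors = FALSE
)

# Order by month within subfield so the first row of each group is its base month
monthly <- monthly[order(monthly$subfield, monthly$month), ]

# Index = 100 * count / count in the subfield's first month
base <- ave(monthly$n, monthly$subfield, FUN = function(x) x[1])
monthly$growth_index <- monthly$n / base * 100

# S1 grows 100 -> 200; S2 starts from a single work, so its index jumps to 1000
print(monthly)
```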

monthly_growth_sbu <- monthly_growth %>%
  mutate(university = "Shahid Beheshti University")

works_403 <- works_403 %>%
  mutate(pub_month = floor_date(publication_date, "month"))

# Filter after year 2000 only
monthly_counts_ut <- works_403 %>%
  filter(top_subfield %in% top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
  count(pub_month, top_subfield)

# Normalize to each subfield's first available month (after 2000)
monthly_growth_ut <- monthly_counts_ut %>%  # use the University of Tehran counts, not SBU's monthly_counts
  group_by(top_subfield) %>%
  arrange(pub_month) %>%
  mutate(
    base_count = first(n),
    growth_index = (n / base_count) * 100
  ) %>%
  ungroup()


monthly_growth_ut <- monthly_growth_ut %>%
  mutate(university = "University of Tehran")
shared_colors <- c(
  "Biomedical Engineering" = "#102542",   # Dark navy
  "Electrical and Electronic Engineering" = "#A84400",  # Burnt orange
  "Molecular Biology" = "#8B0000",        # Dark red
  "Materials Chemistry" = "#4B0082"       # Dark violet
)

sbu_only_colors <- c(
  "Organic Chemistry" = "#3E3E6B"         # Dark slate blue
)

ut_only_colors <- c(
  "Mechanical Engineering" = "darkgreen"  # Dark green
)

dark_colors_combined <- c(shared_colors, sbu_only_colors, ut_only_colors)

Next, we filter works_393 to works whose subfields match Majid Shahriari’s research areas, as listed in majid_keywords. We deliberately leave the domain level out of this filter. OpenAlex has only four broad domains, so a domain match is far too coarse to reflect a researcher’s specific expertise. Filtering at finer levels keeps the subset focused on works that align with Shahriari’s research.

Note on OpenAlex Topic Hierarchy: Works in OpenAlex are tagged with Topics using an automated model that evaluates features such as the title, abstract, journal, and citations of the work.
- There are approximately 4,500 Topics in OpenAlex.
- Each Topic is nested within a Subfield, which is nested within a Field, which in turn is nested within a top-level Domain.
- A work is assigned a primary topic (the one with the highest score), and inherits the corresponding subfield, field, and domain.

Source: https://help.openalex.org/hc/en-us/articles/24736129405719-Topics
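The sketch below illustrates the primary-topic rule described above. Among a work's scored topics, it picks the one with the highest score and carries that topic's subfield, field, and domain along with it. The topic rows are made-up toy values, not OpenAlex output.

```r
# Hypothetical scored topics for two works (toy values)
topics <- data.frame(
  work_id  = c("W1", "W1", "W2", "W2", "W2"),
  topic    = c("Topic A", "Topic B", "Topic C", "Topic D", "Topic E"),
  score    = c(0.91, 0.97, 0.55, 0.80, 0.62),
  subfield = c("Sub 1", "Sub 2", "Sub 3", "Sub 3", "Sub 4"),
  field    = c("Field 1", "Field 1", "Field 2", "Field 2", "Field 2"),
  domain   = c("Physical Sciences", "Physical Sciences",
               "Health Sciences", "Health Sciences", "Health Sciences"),
  stringsAsFactors = FALSE
)

# Within each work keep the highest-scoring row (the primary topic);
# its subfield, field and domain come along with it
primary <- do.call(rbind, lapply(split(topics, topics$work_id), function(d) {
  d[which.max(d$score), ]
}))
print(primary[, c("work_id", "topic", "subfield", "domain")])
```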

Subset of Works from Shahid Beheshti University Matching Majid Shahriari’s Topic Areas
id	title	publication_date	topic	score	type	institution
W3193094654	Planck 2018 results	2021-08-01	Radiation Therapy and Dosimetry	0.9249	article	Shahid Beheshti University
W3200934665	FCC-hh: The Hadron Collider	2019-07-01	Particle Accelerators and Free-Electron Lasers	0.9984	article	Shahid Beheshti University
W2903991298	Recent advances in modeling and simulation of nanofluid flows—Part II: Applications	2018-12-05	Fluid Dynamics and Vibration Analysis	0.9947	article	Shahid Beheshti University
W4377695098	Diffusion models in medical imaging: A comprehensive survey	2023-05-23	AI in cancer detection	0.9944	review	Shahid Beheshti University
W4377695098	Diffusion models in medical imaging: A comprehensive survey	2023-05-23	MRI in cancer diagnosis	0.9940	review	Shahid Beheshti University
W2962966855	HE-LHC: The High-Energy Large Hadron Collider	2019-07-01	Particle Accelerators and Free-Electron Lasers	0.9982	article	Shahid Beheshti University
W4387778010	Advances in medical image analysis with vision Transformers: A comprehensive review	2023-10-19	AI in cancer detection	0.9979	review	Shahid Beheshti University
W4221140247	A next-generation liquid xenon observatory for dark matter and neutrino physics	2022-12-21	Atomic and Subatomic Physics Research	0.9955	article	Shahid Beheshti University
W4387430177	DAE-Former: Dual Attention-Guided Efficient Transformer for Medical Image Segmentation	2023-01-01	AI in cancer detection	0.9956	book-chapter	Shahid Beheshti University
W3134197091	Monte Carlo-based estimation of patient absorbed dose in 99mTc-DMSA, -MAG3, and -DTPA SPECT imaging using the University of Florida (UF) phantoms	2025-03-06	Radiopharmaceutical Chemistry and Applications	1.0000	article	Shahid Beheshti University
W3134197091	Monte Carlo-based estimation of patient absorbed dose in 99mTc-DMSA, -MAG3, and -DTPA SPECT imaging using the University of Florida (UF) phantoms	2025-03-06	Medical Imaging Techniques and Applications	1.0000	article	Shahid Beheshti University
W4406828888	Occurrence and transport of per- and polyfluoroalkyl substances (PFAS) in the leachate of a municipal solid waste landfill in Tehran, Iran (a Middle-eastern megacity)	2025-01-01	Atmospheric chemistry and aerosols	0.9603	article	Shahid Beheshti University
W2048628691	Deep Anterior Lamellar Keratoplasty in Patients with Keratoconus: Big-Bubble Technique	2010-01-14	Glaucoma and retinal disorders	0.9988	article	Shahid Beheshti University
W2039681904	Flow regime identification and void fraction prediction in two-phase flows based on gamma ray attenuation	2014-11-20	Nuclear reactor physics and engineering	0.9910	article	Shahid Beheshti University
W2039681904	Flow regime identification and void fraction prediction in two-phase flows based on gamma ray attenuation	2014-11-20	Nuclear Physics and Applications	0.9910	article	Shahid Beheshti University

Works in Majid’s subfields: 2,155. All of the matching topics fall under subfields that are already included, so filtering by topic alone would give only 427 works. Works matching Majid’s topics, subfields, or fields: 15,520.
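The gap between the subfield and topic counts depends on where the filter is applied. When every listed topic sits inside a listed subfield, every topic match is also a subfield match, so the subfield filter returns at least as many works. The toy base-R sketch below demonstrates this; the keyword lists and works are hypothetical.

```r
# Hypothetical keyword lists: "Topic 1" sits inside "Subfield X"
kw_subfields <- c("Subfield X")
kw_topics    <- c("Topic 1")

# Toy works with their topic and subfield
works <- data.frame(
  id       = c("W1", "W2", "W3", "W4"),
  topic    = c("Topic 1", "Topic 2", "Topic 3", "Topic 1"),
  subfield = c("Subfield X", "Subfield X", "Subfield Y", "Subfield X"),
  stringsAsFactors = FALSE
)

in_subfield <- works$subfield %in% kw_subfields
in_topic    <- works$topic %in% kw_topics

n_subfield <- sum(in_subfield)             # 3 works matched by subfield
n_topic    <- sum(in_topic)                # 2 works matched by topic
n_union    <- sum(in_subfield | in_topic)  # 3: topic matches add nothing new
```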

# Using new_abbasi_keywords instead of just abbasi_keywords
# Subset works in Fereydoon Abbasi’s areas
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

abbasi_keywords_v1 <- readRDS(file.path(data_path, "abbasi_keywords_v1.rds"))


sbu_abbasi_works <- shahid_beheshti_university_cleanedLONG %>%
  filter(
    topic %in% abbasi_keywords_v1$Topic
  )

# Works NOT in Abbasi’s areas
sbu_non_abbasi_works <- shahid_beheshti_university_cleanedLONG %>%
  filter(
    !(topic %in% abbasi_keywords_v1$Topic)
  )


sbu_abbasi_works %>%
  head(15) %>%
  kbl(caption = "Subset of Works from Shahid Beheshti University Matching Fereydoon Abbasi’s Research Areas") %>%
  kable_styling(latex_options = c("hold_position", "scale_down"), font_size = 10)

Subset of Works from Shahid Beheshti University Matching Fereydoon Abbasi’s Research Areas
id	title	publication_date	topic	score	type	institution
W2284013896	Smart micro/nanoparticles in stimulus-responsive drug/gene delivery systems	2016-01-01	Nanoparticle-Based Drug Delivery	0.9999	review	Shahid Beheshti University
W3193094654	Planck 2018 results	2021-08-01	Radiation Therapy and Dosimetry	0.9249	article	Shahid Beheshti University
W2908825166	FCC-ee: The Lepton Collider	2019-06-01	Particle Detector Development and Performance	0.9895	article	Shahid Beheshti University
W2953154703	FCC Physics Opportunities	2019-06-01	Particle Detector Development and Performance	0.9982	article	Shahid Beheshti University
W3200934665	FCC-hh: The Hadron Collider	2019-07-01	Particle Accelerators and Free-Electron Lasers	0.9984	article	Shahid Beheshti University
W3200934665	FCC-hh: The Hadron Collider	2019-07-01	Superconducting Materials and Applications	0.9979	article	Shahid Beheshti University
W3119276753	Guanine-Based DNA Biosensor Amplified with Pt/SWCNTs Nanocomposite as Analytical Tool for Nanomolar Determination of Daunorubicin as an Anticancer Drug: A Docking/Experimental Investigation	2021-01-08	Advanced biosensing and bioanalysis techniques	1.0000	article	Shahid Beheshti University
W2108404396	Evidence for a Kaon-Bound State K⁻pp Produced in K⁻ Absorption Reactions at Rest	2005-06-03	Nuclear physics research studies	0.9982	article	Shahid Beheshti University
W3131284135	A novel detection method for organophosphorus insecticide fenamiphos: Molecularly imprinted electrochemical sensor based on core-shell Co3O4@MOF-74 nanocomposite	2021-02-25	Advanced biosensing and bioanalysis techniques	0.9938	article	Shahid Beheshti University
W2096158223	Principal components analysis by the galaxy-based search algorithm: a novel metaheuristic for continuous optimisation	2011-01-01	Spectroscopy and Chemometric Analyses	0.9886	article	Shahid Beheshti University
W2027703730	Coplanar Full Adder in Quantum-Dot Cellular Automata via Clock-Zone-Based Crossover	2015-03-05	Quantum and electron transport phenomena	0.9931	article	Shahid Beheshti University
W3165315707	Liposomal Nanomedicine: Applications for Drug Delivery in Cancer Therapy	2021-05-25	Nanoparticle-Based Drug Delivery	1.0000	review	Shahid Beheshti University
W2899821123	Ultrasonic nano-emulsification – A review	2018-11-09	Ultrasound and Cavitation Phenomena	0.9984	review	Shahid Beheshti University
W2899821123	Ultrasonic nano-emulsification – A review	2018-11-09	Electrohydrodynamics and Fluid Dynamics	0.9955	review	Shahid Beheshti University
W2962966855	HE-LHC: The High-Energy Large Hadron Collider	2019-07-01	Particle Accelerators and Free-Electron Lasers	0.9982	article	Shahid Beheshti University

After creating the new_abbasi_keywords dataset, here is what we have in comparison to the abbasi_keywords:

Overall, OpenAlex contains 4,516 topics and 252 subfields.

We have 35305 unique works in works_393. This has 3129 unique topics and 234 unique subfields.

new_abbasi_keywords has 99 unique topics and 36 unique subfields.
- In subfield only: 15,783
- In topic only: 2,193

abbasi_keywords has 7 unique topics and 4 unique subfields.
- In subfield only: 4,347
- In topic only: 147

library(dplyr)

data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

masoud_keywords_v1 <- readRDS(file.path(data_path, "masoud_keywords_v1.rds"))

base_path <- "/standard/nsdpi_storage/people/czj9zj/temp_progress_new"

# university_of_tehran_long <- readRDS(file.path(base_path, "works_batch_403.rds")) %>%
#   filter(type != "dataset", type != "erratum", type != "retraction") %>%
#   select(id, title, publication_date, topics, type) %>%
#   # keep rows where 'topics' exists and has at least one row
#   filter(purrr::map_lgl(topics, ~ !is.null(.x) && nrow(.x) > 0)) %>%
#   mutate(
#     topics = map(topics, ~ filter(.x, type == "topic")),
#     topics = map(topics, ~ select(.x, display_name, score))
#   ) %>%
#   unnest(cols = topics) %>%
#   rename(topic = display_name) %>%
#   mutate(
#     across(where(is.character), ~ gsub("https://openalex.org/", "", .x, fixed = TRUE)),
#     institution = "University of Tehran"
#   )


#saveRDS(university_of_tehran_long, file.path(base_path, "university_of_tehran_long.rds"))

university_of_tehran_long <- readRDS(file.path(base_path, "university_of_tehran_long.rds"))

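# Works in Masoud's areas (matched on topic)
# Note: despite the sbu_ prefix, these works come from University of Tehran
# (works_batch_403.rds), not Shahid Beheshti University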
sbu_masoud_works <- university_of_tehran_long %>%
  filter(
    topic %in% masoud_keywords_v1$Topic
  )

# Works NOT in Masoud’s areas
sbu_non_masoud_works <- university_of_tehran_long %>%
  filter(
    !(topic %in% masoud_keywords_v1$Topic)
  )

# Save both datasets as RDS
saveRDS(sbu_masoud_works, file = file.path("/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data", "sbu_masoud_works.rds"))
saveRDS(sbu_non_masoud_works, file = file.path("/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data", "sbu_non_masoud_works.rds"))

0.1 Major Foreign Events: Assassinations, Sabotage and Sanctions

Key Events Impacting Iran’s Scientific Output: Assassinations, Sabotage, and Pandemic
event	category	date	year
Stuxnet Cyberattack	Sabotage	2010-06-17	2010
Natanz Explosion	Sabotage	2020-07-02	2020
Natanz Blackout	Sabotage	2021-04-11	2021
Assassination of Massoud Ali-Mohammadi	Assassination	2010-01-12	2010
Assassination of Mohsen Fakhrizadeh	Assassination	2020-11-27	2020
First COVID-19 Case in Iran	Pandemic	2020-02-19	2020
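The events table can be kept as a small data frame and filtered to a plotting window. The base-R sketch below copies the dates from the table and keeps only the events that fall between December 2009 and December 2011.

```r
# Events copied from the table above
events <- data.frame(
  event = c("Stuxnet Cyberattack", "Natanz Explosion", "Natanz Blackout",
            "Assassination of Massoud Ali-Mohammadi",
            "Assassination of Mohsen Fakhrizadeh", "First COVID-19 Case in Iran"),
  category = c("Sabotage", "Sabotage", "Sabotage",
               "Assassination", "Assassination", "Pandemic"),
  date = as.Date(c("2010-06-17", "2020-07-02", "2021-04-11",
                   "2010-01-12", "2020-11-27", "2020-02-19")),
  stringsAsFactors = FALSE
)
events$year <- as.integer(format(events$date, "%Y"))

# Keep only events inside the plotting window
plot_window <- as.Date(c("2009-12-01", "2011-12-01"))
in_window <- events[events$date >= plot_window[1] & events$date <= plot_window[2], ]
print(in_window[, c("event", "date")])
```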

data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

majid_timeline <- prepare_publication_timelines_with_complement(
  sbu_majid_works,
  sbu_non_majid_works,
  author_label = "Majid"
)

plot_author_subfields_with_sabotage(
  timeline_data = majid_timeline,
  author_label = "Majid",
  assassination_date = as.Date("2010-11-29"),
  within_color = "#1f78b4",
  data_path
)

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 190 rows containing non-finite outside the scale range
## (`stat_smooth()`).

#ggsave("plots/majid_topics97threshold_plot.png", width = 10, height = 6, dpi = 300)

# Prepare Abbasi's timeline data
abbasi_timeline <- prepare_publication_timelines_with_complement(
  sbu_abbasi_works,
  sbu_non_abbasi_works,
  author_label = "Abbasi"
)

# Generate the plot for Abbasi
plot_author_subfields_with_sabotage(
  timeline_data = abbasi_timeline,
  author_label = "Abbasi",
  assassination_date = as.Date("2010-11-29"),  # Abbasi survived this attack; the same day Majid Shahriari was killed
  within_color = "coral",  # Use a distinct color for Abbasi
  data_path
)

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 206 rows containing non-finite outside the scale range
## (`stat_smooth()`).

# Save the plot
ggsave("plots/abbasi_topics97threshold_plot.png", width = 10, height = 6, dpi = 300)

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 206 rows containing non-finite outside the scale range
## (`stat_smooth()`).

masoud_timeline <- prepare_publication_timelines_with_complement(
  sbu_masoud_works,
  sbu_non_masoud_works,
  author_label = "Masoud"
)

works_monthly_long <- masoud_timeline$monthly_long

works_monthly_long <- works_monthly_long %>%
  mutate(count_type = recode(count_type,
                             "Masoud Monthly Count" = "Within Masoud's Topics",
                             "Complement Monthly Count" = "Outside Masoud's Topics"))

assassination_date <- as.Date("2010-01-12")

# Load events
major_events_filtered <- readRDS(file.path(data_path, "major_events_filtered.rds"))

# Remove Massoud Ali-Mohammadi's assassination
major_events_filtered <- major_events_filtered %>%
  filter(event != "Assassination of Massoud Ali-Mohammadi")

# Add Majid Shahriari's assassination (if needed again)
majid_event <- tibble::tibble(
  event = "Assassination of Majid Shahriari",
  category = "Assassination",
  date = as.Date("2010-11-29"),
  year = 2010
)

major_events_filtered <- bind_rows(major_events_filtered, majid_event)



# Define sabotage event styles
sabotage_events <- major_events_filtered %>%
  filter(category == "Sabotage")

# Packages
library(tidyverse)
library(ggnewscale)

# --- Styles for sabotage events ---------------------------------------------
sabotage_styles <- tibble::tibble(
  event    = c("Stuxnet Cyberattack", "Natanz Explosion", "Natanz Blackout"),
  color    = c("darkorange", "#008", "darkmagenta"),
  linetype = c("dotted", "dotdash", "dashed")
)

# Merge styles onto your sabotage_events data
# (assumes you already have sabotage_events with a 'date' column)
sabotage_events <- left_join(sabotage_events, sabotage_styles, by = "event")

# Factor for consistent legend order
sabotage_events$event <- factor(sabotage_events$event, levels = sabotage_styles$event)

# --- Plot --------------------------------------------------------------------
# assumes:
#   works_monthly_long with columns: year_month (Date), count (numeric), count_type (factor/chr)
#   assassination_date is a Date

p <- ggplot(works_monthly_long, aes(x = year_month, y = count)) +
  # Publication trends (two series) -------------------------------------------
  geom_smooth(aes(color = count_type),
              method = "loess", span = 0.25, se = FALSE, linewidth = 0.9) +
  scale_color_manual(
    values = c(
      "Within Masoud's Topics"  = "darkolivegreen",
      "Outside Masoud's Topics" = "#333333"
    ),
    name = "Publication Trend"
  ) +

  # Assassination (solid red) -------------------------------------------------
  geom_vline(xintercept = assassination_date, color = "#e31a1c",
             linetype = "solid", linewidth = 1) +

  # Reset color scale for sabotage events ------------------------------------
  new_scale_color() +

  # Sabotage vertical lines (color + linetype mapped to event)
  geom_vline(
    data = sabotage_events,
    aes(xintercept = date, color = event, linetype = event),
    linewidth = 1
  ) +
  scale_color_manual(
    values = setNames(sabotage_styles$color, sabotage_styles$event),
    name = "Sabotage Event"
  ) +
  scale_linetype_manual(
    values = setNames(sabotage_styles$linetype, sabotage_styles$event),
    name  = "Sabotage Event"
  ) +

  # Facet & labels ------------------------------------------------------------
  facet_wrap(~ count_type, scales = "free_y", ncol = 1) +
  labs(
    title = "Publication Output Within and Outside Massoud’s Topics at University of Tehran",
    subtitle = "Solid red = Assassination of Massoud Ali-Mohammadi - January 12, 2010\nColored dashed lines = Sabotage Events",
    x = "Publication Date",
    y = "Monthly Publication Count"
  ) +
  scale_x_date(
    date_breaks = "2 years",
    date_labels = "%Y",
    limits = c(as.Date("2000-01-01"), NA)
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x   = element_text(angle = 60, hjust = 1),
    legend.box    = "vertical",
    legend.position = "bottom"
  )

# Print to screen
print(p)

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 423 rows containing non-finite outside the scale range
## (`new_stat_smooth()`).

# Save the plot
ggsave("plots/masoud_subfield_topics_plot.png", p, width = 10, height = 6, dpi = 300)

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 423 rows containing non-finite outside the scale range
## (`new_stat_smooth()`).

path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data"
works_tagged <- readRDS(file.path(path, "works_tagged.rds"))
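works_tagged is loaded pre-built, and the code that assigns subfield_category is not shown here. Suppose each work carries two logical flags marking membership in Majid's and Abbasi's keyword areas (`in_majid` and `in_abbasi` are hypothetical names). The base-R sketch below shows how the four categories used below could be derived from those flags:

```r
# Hypothetical membership flags for four toy works
in_majid  <- c(TRUE, FALSE, TRUE, FALSE)
in_abbasi <- c(FALSE, TRUE, TRUE, FALSE)

# Map the two flags onto four mutually exclusive categories
subfield_category <- ifelse(in_majid & in_abbasi, "Both",
                     ifelse(in_majid, "Majid Only",
                     ifelse(in_abbasi, "Abbasi Only", "Neither")))

# Same level order as prepare_combined_author_timelines()
subfield_category <- factor(subfield_category,
                            levels = c("Majid Only", "Abbasi Only", "Both", "Neither"))
print(table(subfield_category))
```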


library(tidyverse)
library(lubridate)
library(scales)

# 1. Function to prepare timeline data: monthly work counts per subfield category
prepare_combined_author_timelines <- function(works_tagged) {
  monthly_long <- works_tagged %>%
    mutate(subfield_category = factor(
      subfield_category,
      levels = c("Majid Only", "Abbasi Only", "Both", "Neither")
    )) %>%
    group_by(monthly, subfield_category) %>%
    summarise(count = n(), .groups = "drop") %>%
    rename(
      year_month = monthly,
      count_type = subfield_category
    )

  list(monthly_long = monthly_long)
}

# 2. Prepare data
timeline_data <- prepare_combined_author_timelines(works_tagged)

# 3. Load events
major_events_filtered <- readRDS(file.path(data_path, "major_events_filtered.rds")) %>%
  filter(event != "Assassination of Massoud Ali-Mohammadi")

# Date of the attack that killed Majid Shahriari and that Fereydoon Abbasi survived
assassination_date <- as.Date("2010-11-29")

# 4. Define sabotage styling
sabotage_styles <- tibble::tibble(
  event = c("Stuxnet Cyberattack", "Natanz Explosion", "Natanz Blackout"),
  color = c("darkorange", "#008", "darkmagenta"),
  linetype = c("dotted", "dotdash", "dashed")
)

# Join to sabotage events
sabotage_events <- major_events_filtered %>%
  filter(category == "Sabotage") %>%
  left_join(sabotage_styles, by = "event") %>%
  mutate(event = factor(event, levels = sabotage_styles$event))

# 5. Colors for publication categories
pub_colors <- c(
  "Majid Only" = "#1f78b4",
  "Abbasi Only" = "coral",
  "Both" = "#8f7c82",
  "Neither" = "gray4",
  setNames(sabotage_styles$color, sabotage_styles$event)
)

# 6. Plot
p <- ggplot(timeline_data$monthly_long, aes(x = year_month, y = count, color = count_type)) +
  geom_smooth(method = "loess", span = 0.25, se = TRUE, linewidth = 1) +

  # Assassination attempt
  geom_vline(xintercept = assassination_date, color = "#e31a1c", linetype = "solid", linewidth = 1) +

  # Sabotage
  geom_vline(data = sabotage_events,
             aes(xintercept = date, color = event, linetype = event),
             linewidth = 1, show.legend = TRUE) +

  facet_wrap(~count_type, scales = "free_y", ncol = 1) +

  labs(
    title = "Publication Output at Shahid Beheshti University",
    subtitle = "Solid red line = Assassination attempt on Majid Shahriari (killed) and Fereydoon Abbasi (survived) – Nov 29, 2010\nColored dashed lines = Sabotage Events",
    x = "Publication Date",
    y = "Monthly Publication Count",
    color = "Publication Trend",
    linetype = "Sabotage Event"
  ) +

  scale_color_manual(
    values = pub_colors,
    breaks = c("Majid Only", "Abbasi Only", "Both", "Neither"),
    guide = guide_legend(order = 1)
  ) +

  scale_linetype_manual(
    values = setNames(sabotage_styles$linetype, sabotage_styles$event),
    guide = guide_legend(order = 2, override.aes = list(
      color = sabotage_styles$color
    ))
  ) +
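  # Zoom to Dec 2009 - Dec 2011. Scale limits drop data outside the window
  # before the LOESS fit (hence the "Removed ... rows" warning);
  # coord_cartesian(xlim = ...) would zoom without dropping data.
  scale_x_date(
    date_breaks = "6 months",
    date_labels = "%b %Y",
    limits = c(as.Date("2009-12-01"), as.Date("2011-12-01"))
  ) +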
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 60, hjust = 1),
    legend.position = "bottom",
    legend.box = "vertical"
  )

p

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 930 rows containing non-finite outside the scale range
## (`stat_smooth()`).

# 7. Save
ggsave("plots/combined_subfield_timelines.png", plot = p, width = 10, height = 6, dpi = 300)

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 930 rows containing non-finite outside the scale range
## (`stat_smooth()`).


library(tidyverse)
library(lubridate)

# Aggregate publication count per month
all_university_monthly <- works_tagged %>%
  group_by(monthly = floor_date(publication_date, "month")) %>%
  summarise(publication_count = n(), .groups = "drop")

# Plot
ggplot(all_university_monthly, aes(x = monthly, y = publication_count)) +
  #geom_line(color = "#333366", size = 1) +
  geom_smooth(method = "loess", span = 0.25, se = TRUE, color = "black", linewidth = 0.8) +
  labs(
    title = "Publication Output at Shahid Beheshti University",
    x = "Publication Date",
    y = "Monthly Publication Count",
    caption = "Note: Based on all available works in OpenAlex"
  ) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 60, hjust = 1),
    plot.title = element_text(face = "bold")
  )

## `geom_smooth()` using formula = 'y ~ x'

Another important limitation concerns the accuracy of the publication dates provided by OpenAlex. When a record's publication date is missing or incomplete, OpenAlex assigns either the earliest known electronic release date or the first day of the month. This produces artificial spikes in publication counts on the 1st of each month that do not reflect actual publication behavior. Because OpenAlex does not distinguish genuine first-of-the-month publications from those assigned that date by default, real and artificial dates cannot be disambiguated in these cases. To mitigate this distortion, we aggregated the data to the monthly level and applied LOESS smoothing in the visualizations, which reduces the influence of day-level anomalies and gives a clearer picture of broader publication trends. While these steps lessen the impact of misdated entries, we recommend that future research incorporate additional metadata sources (such as Crossref XML or legacy MAG data) when finer temporal resolution is required.
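One way to gauge how much of the first-of-month spike is artificial is to compare the share of works dated on day 1 with the roughly 3% expected if publications were spread evenly across the days of a month. The sketch below uses a hypothetical helper, `first_of_month_share()` (not part of the project scripts), on made-up dates; it assumes `publication_date` is a `Date` or an ISO `YYYY-MM-DD` string.

```r
# Hypothetical helper: share of non-missing dates that fall on the 1st of a month.
# With evenly spread publishing, roughly 1/30 (about 3%) of works would land on day 1;
# a much larger share suggests imputed dates.
first_of_month_share <- function(dates) {
  dates <- as.Date(dates)
  dates <- dates[!is.na(dates)]
  mean(as.integer(format(dates, "%d")) == 1L)
}

# Toy example with made-up dates (not OpenAlex data)
toy_dates <- as.Date(c("2010-01-01", "2010-02-01", "2010-02-14",
                       "2010-03-01", "2010-03-22", NA))
first_of_month_share(toy_dates)
```

On the real data, something like `first_of_month_share(works_tagged$publication_date)` would give the observed share; a value far above 0.03 would indicate that date imputation is substantial.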

library(tidyverse)
library(scales)
library(lubridate)

# Identify top 5 subfields by total publication count
ut_top_5_subfields <- works_403 %>%
  filter(!is.na(top_subfield), !is.na(publication_year)) %>%
  count(top_subfield, name = "total") %>%
  slice_max(order_by = total, n = 5) %>%
  pull(top_subfield)

# Ensure publication dates are Date objects, then round down to the month
ut_works <- works_403 %>%
  mutate(pub_month = floor_date(as.Date(publication_date), "month"))

# Keep the top-5 subfields from January 2000 onward
monthly_counts_ut <- ut_works %>%
  filter(top_subfield %in% ut_top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
  count(pub_month, top_subfield)

# Normalize to each subfield's first available month (after 2000)
monthly_growth_ut <- monthly_counts_ut %>%
  group_by(top_subfield) %>%
  arrange(pub_month) %>%
  mutate(
    base_count = first(n),
    growth_index = (n / base_count) * 100
  ) %>%
  ungroup()

library(dplyr)
library(ggplot2)
library(scales)

# Step 1: Compute top 5 subfields by average growth for SBU
top_5_sbu <- monthly_growth %>%
  group_by(top_subfield) %>%
  summarise(avg_index = mean(growth_index, na.rm = TRUE)) %>%
  arrange(desc(avg_index)) %>%
  slice_head(n = 5) %>%
  pull(top_subfield)

# Step 2: Compute top 5 subfields by publication count for UT
# (ut_works carries the pub_month column created above)
top_5_ut <- ut_works %>%
  filter(!is.na(top_subfield), pub_month >= as.Date("2000-01-01")) %>%
  group_by(top_subfield) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  slice_head(n = 5) %>%
  pull(top_subfield)

# Step 3: Combine both top 5 lists
top_5_combined <- union(top_5_sbu, top_5_ut)

# Step 4: Prepare SBU data
monthly_growth_sbu <- monthly_growth %>%
  filter(top_subfield %in% top_5_combined) %>%
  mutate(university = "Shahid Beheshti University")

# Step 5: Prepare UT data (ut_works carries the pub_month column)
monthly_counts_ut <- ut_works %>%
  filter(top_subfield %in% top_5_combined, pub_month >= as.Date("2000-01-01")) %>%
  count(pub_month, top_subfield)

monthly_growth_ut <- monthly_counts_ut %>%
  group_by(top_subfield) %>%
  arrange(pub_month) %>%
  mutate(
    base_count = first(n),
    growth_index = (n / base_count) * 100
  ) %>%
  ungroup() %>%
  mutate(university = "University of Tehran")

# Step 6: Combine both datasets
combined_growth <- bind_rows(monthly_growth_sbu, monthly_growth_ut)

# Step 7: Define consistent colors
shared_colors <- c(
  "Biomedical Engineering" = "#102542",   # Dark navy
  "Electrical and Electronic Engineering" = "#A84400",  # Burnt orange
  "Molecular Biology" = "#8B0000",        # Dark red
  "Materials Chemistry" = "#4B0082"       # Dark violet
)

sbu_only_colors <- c(
  "Organic Chemistry" = "darkcyan"         # Dark cyan
)

ut_only_colors <- c(
  "Mechanical Engineering" = "darkgreen"    # Dark green
)

dark_colors_combined <- c(shared_colors, sbu_only_colors, ut_only_colors)

# Step 8: Plot
ggplot(combined_growth, aes(x = pub_month, y = growth_index, color = top_subfield)) +
  geom_line(linewidth = 1, alpha = 0.8) +
  facet_wrap(~university) +
  scale_color_manual(values = dark_colors_combined) +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y", limits = c(as.Date("2000-01-01"), NA)) +
  labs(
    title = "Comparison of Publication Growth in Top Subfields",
    subtitle = "Indexed to each subfield's first recorded month after 2000 (Index = 100%)",
    x = "Publication Date",
    y = "Relative Publication Index (%)",
    color = "Subfield"
  ) +
  theme_minimal(base_size = 12)

# Save the plot
ggsave("plots/ut_sbu_monthly_subfield_growth.png", width = 10, height = 6, dpi = 300)

assassination_date <- as.Date("2010-11-29")
#assassination_date <- as.Date("2010-01-12")


avg_growth_before <- combined_growth %>%
  filter(pub_month < assassination_date) %>%
  group_by(university, top_subfield) %>%
  summarise(avg_growth_index = mean(growth_index, na.rm = TRUE), .groups = "drop") %>%
  mutate(period = "Before 2010-11-29")

avg_growth_after <- combined_growth %>%
  filter(pub_month >= assassination_date) %>%
  group_by(university, top_subfield) %>%
  summarise(avg_growth_index = mean(growth_index, na.rm = TRUE), .groups = "drop") %>%
  mutate(period = "After 2010-11-29")

growth_comparison <- bind_rows(avg_growth_before, avg_growth_after) %>%
  arrange(university, top_subfield, period) %>%
  mutate(avg_growth_index = round(avg_growth_index, 1))

To identify the top five subfields at each institution, we began by filtering the publication dataset by university (Shahid Beheshti University and the University of Tehran). For each institution, we grouped publications by subfield and month and computed the number of publications per subfield per month. To assess growth over time, we calculated a relative growth index for each subfield by normalizing its monthly publication count to its count in its first recorded month from January 2000 onward. Specifically, the growth index was defined as:

\[ \text{Growth Index} = \frac{\text{Monthly Publications}}{\text{Publications in First Month}} \times 100 \]

This measure reflects how much each subfield’s output has grown relative to its own baseline. We then averaged the growth index over time for each subfield and selected the five subfields with the highest mean growth index at each university. On this measure, Shahid Beheshti University and the University of Tehran share four of their top five subfields: Biomedical Engineering, Electrical and Electronic Engineering, Materials Chemistry, and Molecular Biology. The fifth subfield differs: Organic Chemistry is prominent at Shahid Beheshti, whereas Mechanical Engineering stands out at the University of Tehran. Among these, Electrical and Electronic Engineering grew the most at Shahid Beheshti University, while Mechanical Engineering expanded fastest at the University of Tehran. These patterns suggest both shared and institution-specific research priorities since the early 2000s.
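To make the index and the ranking step concrete, the sketch below applies the same calculation to two hypothetical subfields with made-up monthly counts: each month is divided by the subfield's first month and multiplied by 100, the indices are averaged, and subfields are ranked by that mean.

```r
library(dplyr)

# Toy monthly counts for two hypothetical subfields (made-up numbers)
toy_counts <- tibble(
  top_subfield = rep(c("Subfield A", "Subfield B"), each = 3),
  pub_month = rep(as.Date(c("2000-01-01", "2000-02-01", "2000-03-01")), 2),
  n = c(10, 15, 20, 4, 4, 8)
)

toy_growth <- toy_counts %>%
  group_by(top_subfield) %>%
  arrange(pub_month, .by_group = TRUE) %>%
  mutate(growth_index = n / first(n) * 100) %>%   # first month = 100
  summarise(avg_index = mean(growth_index), .groups = "drop") %>%
  arrange(desc(avg_index))                        # rank by mean index
```

Subfield A's indices are 100, 150, 200 (mean 150) and Subfield B's are 100, 100, 200 (mean about 133.3), so A ranks first. Because every index is divided by a single month's count, a subfield with a very small first-month count can produce large indices and dominate a ranking by mean growth.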

Get all author names from Iranian institutions

## Requesting url: https://api.openalex.org/institutions?filter=country_code%3AIR

## Getting 3 pages of results with a total of 403 records...
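The messages above come from the OpenAlex client. As a sanity check, the sketch below rebuilds the logged query URL in base R and shows why 403 records span 3 pages, assuming the client requests 200 records per page (the maximum `per_page` value OpenAlex allows). It makes no network request.

```r
# Rebuild the logged institutions query: filter on country_code:IR,
# with ":" percent-encoded as %3A.
base_url <- "https://api.openalex.org/institutions"
filter_value <- utils::URLencode("country_code:IR", reserved = TRUE)
query_url <- paste0(base_url, "?filter=", filter_value)

# Page count, assuming 200 records per page
total_records <- 403
per_page <- 200
n_pages <- ceiling(total_records / per_page)

query_url
n_pages
```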

Format for DiD (difference-in-differences)

```

(b) If the dataset is too small, then pull all Iranian papers associated with those subfields and topics.

The script where I coded this is called filter_merge.R.

Take 2

Take the subject vector and pull all papers associated with the identified keywords and subtopics.
Pull all author IDs associated with those papers.
Look at the number of people working in these areas over time.
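The last two steps can be sketched as follows, assuming the matched papers sit in a table with one row per paper and a list-column of author IDs. The table, its column names, and the IDs are hypothetical; openalexR stores authorships as nested data frames whose column names can vary by package version, so the extraction step may need adjusting.

```r
library(dplyr)
library(tidyr)

# Hypothetical matched papers: one row per paper, author IDs in a list-column
matched_papers <- tibble(
  work_id = c("W1", "W2", "W3"),
  publication_year = c(2009, 2009, 2011),
  author_ids = list(c("A1", "A2"), c("A2", "A3"), c("A1"))
)

# One row per paper-author pair, then count distinct authors per year
authors_per_year <- matched_papers %>%
  unnest_longer(author_ids) %>%
  group_by(publication_year) %>%
  summarise(n_authors = n_distinct(author_ids), .groups = "drop")
```

Here 2009 has three distinct authors (A1, A2, A3) and 2011 has one. Counting distinct IDs, rather than rows, keeps an author who publishes several papers in a year from being counted more than once.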

If we need more data:
- Run a clustering analysis of topics/keywords/subtopics to expand the aperture
- Run the same snowball sample outlined above for all co-authors of the assassinated scholars

The above addresses the narrowest interpretation of our question: did assassination [and probably other co-occurring sabotage] campaigns reduce scientific output in the university and the research areas associated with the targeted scholars?

and Luz, let us know as you get through the “first pass” checklist. I think there’s a clever way to use dplyr to search and collect those nested keyword/subtopic lists without permanently expanding the dataset. I don’t know it off the top of my head, but I can look for advice if you get stumped.
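One common dplyr pattern for this is to test each row's list-column in place with `purrr::map_lgl()` inside `filter()`, so the table keeps one row per paper and is never unnested. The sketch assumes keywords are stored as a list-column of character vectors; the `subject_vector` values and papers are hypothetical. If the keywords are nested data frames (as openalexR often returns), compare against a column such as `.x$display_name` instead.

```r
library(dplyr)
library(purrr)

# Hypothetical subject vector and papers with a list-column of keywords
subject_vector <- c("neutron transport", "nuclear reactor")

papers <- tibble(
  work_id = c("W1", "W2", "W3"),
  keywords = list(
    c("Neutron Transport", "Monte Carlo"),
    c("organic synthesis"),
    character(0)                 # paper with no keywords
  )
)

# map_lgl() returns one TRUE/FALSE per paper; nothing is permanently unnested
matched <- papers %>%
  filter(map_lgl(keywords, ~ any(tolower(.x) %in% subject_vector)))
```

Only W1 matches (after lower-casing). A paper with no keywords yields `any(logical(0))`, which is `FALSE`, so empty lists drop out without special handling.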

Iranian research notes from today’s check-in meeting (7/24/2025) and things I’ll look into:

Create some plots with a pre/post comparison of work output: create timelines or counts by year to visualize trends, mainly the pre/post volume of works published in areas/keywords of interest at various universities, but also pre/post trends in topics and subfields. Is there a decrease in publication in specific areas of interest? Share in the chat.
Co-authorship over time: Track how a scholar’s direct and indirect collaborators evolve over time. Take Majid Shahriari as an example. He became active in neutron transport research in the early 2000s and began contributing to Iran’s nuclear science community. Between 2005 and 2010, he held a key position at Shahid Beheshti University and worked with the Atomic Energy Organization of Iran. By 2009–2010, he was identified as a top nuclear scientist, reportedly worked on strategic applications of nuclear chain reactions, and drew international surveillance attention. As Shahriari gained power and strategic responsibility within Iran’s nuclear establishment (particularly in sensitive or classified domains), his publication count would likely decrease, but this does not necessarily mean he was less productive, since sensitive work may not appear in the open literature. Create a list of the authors Majid Shahriari collaborated with before his assassination to identify potential trends or patterns in his research network.
Structure the missing work: For missing or unindexed works, cross-check the works in OpenAlex against a known bibliography and flag known missing papers for deeper investigation.
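The collaborator list described above could be sketched as follows, assuming a long-format authorship table with one row per work-author pair. The table, the `TARGET` placeholder ID, and the other IDs are hypothetical, not real OpenAlex identifiers; the cutoff reuses the 2010-11-29 date from the analysis above.

```r
library(dplyr)

# Hypothetical long-format authorship table: one row per work-author pair
authorships <- tibble(
  work_id = c("W1", "W1", "W2", "W2", "W2", "W3", "W3"),
  author_id = c("TARGET", "A1", "TARGET", "A1", "A2", "TARGET", "A3"),
  publication_date = as.Date(c("2008-05-01", "2008-05-01", "2010-03-15",
                               "2010-03-15", "2010-03-15", "2011-02-01",
                               "2011-02-01"))
)

target_id <- "TARGET"              # placeholder, not a real OpenAlex author ID
cutoff <- as.Date("2010-11-29")    # assassination date used in the analysis

# Works by the target author published before the cutoff
pre_works <- authorships %>%
  filter(author_id == target_id, publication_date < cutoff) %>%
  distinct(work_id)

# Co-authors on those works, ranked by number of shared papers
pre_coauthors <- authorships %>%
  semi_join(pre_works, by = "work_id") %>%
  filter(author_id != target_id) %>%
  count(author_id, name = "shared_works", sort = TRUE)
```

A1 shares two pre-cutoff works and A2 shares one; A3 appears only on a 2011 work and is excluded. Running the same steps with `publication_date >= cutoff` would give the post-period network for comparison.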

Talk with Damian 8/4/2025:

His death should not affect subfields/topics he had nothing to do with, so those can serve as a comparison.
But ALSO, we can take one of his subfields/topics (say, nuclear engineering) and look at the same subfield/topic at another university. This holds the exact subfield/topic constant while changing only the university.
Subfield level: Group up to the subfield-month level, then give each subfield 1 for treatment and 0 for control. Time indicator: post = 1 for every month after Majid Shahriari was assassinated and 0 for every month before. The month of November 2010 (the assassination month) should be 0.
Create the same for Abbasi!
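A minimal base-R sketch of the subfield-level indicators described above, using made-up counts. The subfield names, the counts, and the choice of Nuclear Engineering as the treated subfield are illustrative assumptions. November 2010 gets `post = 0` because only months strictly after the event month are coded 1. With a single treated and a single control subfield, the `treated:post` coefficient equals the two-by-two difference-in-differences of cell means.

```r
# Hypothetical subfield-month panel (made-up counts)
panel <- expand.grid(
  top_subfield = c("Nuclear Engineering", "Organic Chemistry"),
  pub_month = seq(as.Date("2010-09-01"), as.Date("2011-01-01"), by = "month"),
  stringsAsFactors = FALSE
)
panel$n <- c(12, 30, 11, 31, 12, 29, 7, 30, 6, 32)

event_month <- as.Date("2010-11-01")        # month of the 2010-11-29 assassination
treated_subfields <- "Nuclear Engineering"  # assumption: subfield tied to the target

# Treatment indicator: 1 for treated subfields, 0 for controls
panel$treated <- as.integer(panel$top_subfield %in% treated_subfields)
# Time indicator: 1 only for months strictly after the event month (November = 0)
panel$post <- as.integer(panel$pub_month > event_month)

# Two-way DiD regression; the interaction term is the DiD estimate
did_fit <- lm(n ~ treated * post, data = panel)
coef(did_fit)[["treated:post"]]
```

In the toy data the treated subfield falls from a pre-period mean of about 11.7 to 6.5 while the control rises from 30 to 31, giving an interaction estimate of about -6.17. With many subfields and months, subfield and month fixed effects are a common extension of this basic specification; the same indicators can be rebuilt for Abbasi by changing the event month and treated subfields.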