Iran Citation - Progress Report

source("./progress_report_functions.R")
source("./figures.R")

Introduction

Current Research questions

The purpose of this report is to give a continuous update on what we are currently working on for the Iranian citation analysis.

How do different sanctions regimes affect collaborations with different parties?

Current Progress

API Access Expanded

The OpenAlex API has a default limit of 100,000 calls per day (max 10 per second). I contacted their team and received an increased quota of 1 million calls per day and 100 per second.

If you plan to replicate or extend this work, I strongly recommend requesting the same: OpenAlex API Rate Limits & Authentication
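To stay under a per-second limit such as the ones quoted above, requests can be spaced out on the client side. The following is a minimal sketch only: `make_throttled()` and `record_time()` are hypothetical helpers written for this illustration (they are not part of the project scripts), and the stand-in function makes no network calls.

```r
# Hypothetical helper (not from the project scripts): wraps a function so that
# successive calls are at least 1 / max_per_second seconds apart.
make_throttled <- function(f, max_per_second) {
  min_gap <- 1 / max_per_second
  last_call <- -Inf
  function(...) {
    now <- as.numeric(Sys.time())
    wait <- min_gap - (now - last_call)
    if (wait > 0) Sys.sleep(wait)          # pause only when calls arrive too fast
    last_call <<- as.numeric(Sys.time())
    f(...)
  }
}

# Stand-in for a real request function; it only records when it was called.
record_time <- function(i) as.numeric(Sys.time())

# Allow at most 10 calls per second (the default per-second limit quoted above).
throttled <- make_throttled(record_time, max_per_second = 10)
call_times <- vapply(1:5, throttled, numeric(1))
gaps <- diff(call_times)                   # seconds between consecutive calls
```

With an increased quota, only `max_per_second` would need to change; the daily limit would still have to be tracked separately.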

Data Collection – Iranian Institutions

We have successfully collected works from all 403 Iranian institutions. These datasets are stored at:
nsdpi-storage > people > czj9zj > temp_progress_final/

  • Files are named:
    works_batch_1.rds (fewest works) to works_batch_403.rds (most works)
  • Key columns included: top_field, top_subfield, etc.
  • Data collection required careful error handling due to API limitations.
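As an illustration of the file layout and the kind of error handling mentioned above, the sketch below reads a range of `works_batch_<i>.rds` files and skips any that cannot be read. `read_batches()` is a hypothetical helper, and the demo folder and its contents are stand-ins created in a temporary directory, not the project data.

```r
# Hypothetical helper: read works_batch_<i>.rds files from a folder, returning
# NULL (with a warning) for any batch that is missing or unreadable.
read_batches <- function(dir, batch_ids) {
  paths <- file.path(dir, sprintf("works_batch_%d.rds", batch_ids))
  out <- lapply(paths, function(p) {
    tryCatch(
      readRDS(p),
      error = function(e) {
        warning("Could not read ", basename(p), ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(out) <- basename(paths)
  out
}

# Toy demonstration in a temporary folder (stand-in data, not project files).
demo_dir <- file.path(tempdir(), "temp_progress_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(data.frame(id = "W1", top_field = "Physics"),
        file.path(demo_dir, "works_batch_1.rds"))
saveRDS(data.frame(id = "W2", top_field = "Medicine"),
        file.path(demo_dir, "works_batch_2.rds"))

batches <- suppressWarnings(read_batches(demo_dir, 1:3))  # batch 3 does not exist
loaded <- Filter(Negate(is.null), batches)                # keep readable batches only
```

Returning `NULL` for a failed batch, rather than stopping, makes it possible to see afterwards which batches need to be collected again.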

Current Limitations

  • Not yet filtered by strategic research fields.
  • Some works may have incorrect institution name mappings; this was fixed in pullIran_works_final.R.

Current Scripts

  • get_authors.R

This script downloads all authors from Iran who have published works.

  • Updated Script to pull all the works – pullIran_works_final.R

This script re-downloads works with the correct institution name. It generates a folder temp_progress_final/ with the most recent results. This is still in progress and needs to be completed.

  • Analysis Script – Iran_analysis.R

This script is now the main tool for analyzing Iranian works.

Next Steps

Deeper Exploratory Data Analysis

Pipeline Optimization & Reproducibility

  • The full workflow will be organized into a clean GitHub repo.
  • Harmonizing all pieces into one reproducible research package is a priority.
  • To do this, we need all the data.

Change Point Analysis

  • We aim to test whether key events (e.g., assassinations, sanctions) led to significant disruptions.
  • The gam package will be used to detect changes in trend (structural breaks).
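Before fitting a GAM, a simpler baseline can make the idea of a structural break concrete. The sketch below is not the gam-based approach planned above: it is an ordinary segmented regression with a known break date, fitted with base R `lm()`. The yearly counts are simulated purely for illustration, and `event_year` is a hypothetical value.

```r
# Simulated yearly counts for illustration only (not project data).
set.seed(42)
toy <- data.frame(year = 2000:2020)
event_year <- 2010                               # hypothetical event year
toy$t <- toy$year - event_year                   # years relative to the event
toy$post <- as.integer(toy$year >= event_year)   # 1 from the event onward
toy$n <- 20 + 0.5 * toy$t - 6 * toy$post + rnorm(nrow(toy), sd = 0.5)

# Model without a break versus a model that allows a level and slope change.
no_break <- lm(n ~ t, data = toy)
with_break <- lm(n ~ t + post + t:post, data = toy)

# F-test: does allowing a break at the event year improve the fit?
comparison <- anova(no_break, with_break)
p_value <- comparison[["Pr(>F)"]][2]
```

A small `p_value` here only says that a break at the chosen year fits the simulated series better than a single trend; on real data the break date itself would also need to be justified or estimated.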

Insights from D&PI BI weekly meeting (7/22/2025)

  1. Mapping Science – (Uttan Rao)

Key takeaways for OpenAlex limitations:

  • No Advisor–Advisee Detection
  • Incomplete Affiliations
  • English Language Bias
  • Author Name Disambiguation Problems
  • Sparse or Missing Funding Info

These are critical to interpreting patterns accurately.

Research Questions

These are aligned with the NSDPI summer research objectives:

Key Question: How are the military-affiliated PRC research institutions working with Iran and Russia on the emerging tech NSDPI is examining?

  1. What research fields are largest/strongest?
  2. Are there correlations between certain countries and whom researchers are citing?
  3. What are the values of the papers being cited? (Need to develop and refine measures)
  4. Are new PRC policies driving certain research fields? What tools and methods can be used to better inform policy makers?
  5. For a broader context/later work, compare this with what US/West is researching.

⚠️ Note: These questions require extensive comparative data beyond Iran. This is just a broader approach to the citation analysis.

data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

majid_works_long <- readRDS(file.path(data_path, "majid_works_long.rds"))

# Read Abbasi's long works data
abbasi_works_long <- readRDS(file.path(data_path, "abbasi_works_long.rds"))

# Read Masoud's long works data
masoud_works_long <- readRDS(file.path(data_path, "masoud_works_long.rds"))

# Save each object as an .RDS file
#saveRDS(majid_works, file = file.path(data_path, "majid_works.rds"))
#saveRDS(abbasi_works, file = file.path(data_path, "abbasi_works.rds"))
#saveRDS(masoud_works, file = file.path(data_path, "masoud_works.rds"))

majid_works <- readRDS(file.path(data_path, "majid_works.rds"))
abbasi_works <- readRDS(file.path(data_path, "abbasi_works.rds"))
masoud_works <- readRDS(file.path(data_path, "masoud_works.rds"))

Take 1: Identify the right subset of Iranian publication data

Steps from Iran meeting 7/23/25 - Margaret’s Notes:

  • Basic Strategy: snowball sample of topics and, if needed, coauthors

Pull authors who were assassinated;

The complete list of authors who were assassinated from Iranian institutions is saved as nuclear_scientist under extra_data/nuclear_scientist.rds. This dataset ranges from 2007 to 2025 and covers all incidents associated with assassinations of Iranian scientists, including one in which the victim survived (Fereydoon Abbasi) and one in which assassination has not been confirmed. The main source for this data set is this timeline. As of this date, a total of 16 targeted assassinations have been confirmed.

List of Identified Iranian Nuclear Scientists

| Date | Victim | Expertise | Method | Location |
|---|---|---|---|---|
| 2007-01-15 | Ardeshir Hosseinpour | Authority on electromagnetism | By gas or possibly radiation poisoning | Shiraz |
| 2010-01-12 | Masoud Ali-Mohammadi | Quantum field theory and elementary particle physics | By a remote-control bomb attached to a motorcycle | Tehran |
| 2010-11-29 | Majid Shahriari | Specialized in neutron transport | By a bomb attached to his car from a motorcycle | Tehran |
| 2010-11-29 | Fereydoon Abbasi | Nuclear physicist and administrator | By a bomb attached to his car from a motorcycle | Tehran |
| 2011-07-23 | Darioush Rezaeinejad | Expert in neutron transport | Shot by motorcycle gunmen | Tehran |
| 2012-01-11 | Mostafa Ahmadi Roshan | Researching polymeric membranes for gaseous diffusion | By a bomb attached to his car from a motorcycle | Tehran |
| 2020-11-27 | Mohsen Fakhrizadeh | Nuclear physicist and head of Iran’s nuclear program | Shot by a remote-control machine gun | Damavand |
| 2025-06-13 | Fereydoon Abbasi | Expert in nuclear engineering | Killed in simultaneous strikes | Tehran |
| 2025-06-13 | Seyyed Amir Hossein Feghhi | Deputy of the Atomic Energy Organization; Expert in physics | Killed in simultaneous strikes | Tehran |
| 2025-06-13 | Akbar Motalebizadeh | Nuclear chemical engineering | Killed in simultaneous strikes | Tehran |
| 2025-06-13 | Mohammad Mehdi Tehranchi | Physics | Killed in simultaneous strikes | Tehran |
| 2025-06-13 | Saeed Borji | Materials engineering | Killed in simultaneous strikes | Tehran |
| 2025-06-13 | Mansour Asgari | Physics | Killed in simultaneous strikes | Tehran |
| 2025-06-13 | Ahmadreza Zolfaghari Daryani | Nuclear engineering and nuclear physics | Killed in simultaneous strikes | Tehran |
| 2025-06-13 | Ali Bakhouei Katirimi | Mechanics | Killed in simultaneous strikes | Tehran |

“He said killing scientists may have been intended ‘to scare people so they don’t go work on these programs.’”

“Then the questions are, ‘Where do you stop?’ I mean you start killing, like, students who study physics?” he asked. “This is a very slippery slope.”

“Strikes cannot destroy the knowledge Iran has acquired over several decades, nor any regime ambition to deploy that knowledge to build a nuclear weapon,” U.K. Foreign Secretary David Lammy told lawmakers in the House of Commons.

Attach Author ID

Before we begin the steps, we attach the author id to make it easier later on to identify the scholars of interest.

In the article How do I find my OpenAlex author ID?, OpenAlex recommends 3 methods for finding author IDs. For these purposes, we go to openalex.org/authors to search for authors in more detail. We enter the name in the search bar and filter by Institution country = Iran.

But first we collect the names of all the scientists:

nuclear_scientists$Victim
##  [1] "Ardeshir Hosseinpour"         "Masoud Ali-Mohammadi"        
##  [3] "Majid Shahriari"              "Fereydoon Abbasi"            
##  [5] "Darioush Rezaeinejad"         "Mostafa Ahmadi Roshan"       
##  [7] "Mohsen Fakhrizadeh"           "Fereydoon Abbasi"            
##  [9] "Seyyed Amir Hossein Feghhi"   "Akbar Motalebizadeh"         
## [11] "Mohammad Mehdi Tehranchi"     "Saeed Borji"                 
## [13] "Mansour Asgari"               "Ahmadreza Zolfaghari Daryani"
## [15] "Ali Bakhouei Katirimi"        "Abdolhamid Minouchehr"       
## [17] "Isar Tabatabai-Qamsheh"       "Mohammad Reza Sedighi Saber"

We then use this list to manually collect the data. However, there are a couple of issues with this process:

  • Disambiguating scholars is difficult: It’s often hard to identify the correct scholar based solely on their name, as OpenAlex may return multiple candidates—even when we apply filters such as institution country = “Iran”.

  • Incomplete publication data: The total number of works listed for a given scholar sometimes appears to be incomplete, which could affect the accuracy of our data collection (example: Masoud Ali-Mohammadi). Finding the missing works can also be hard. For instance, Majid Shahriari was a top Iranian nuclear scientist and physicist, yet his works are difficult to locate.

Adding OpenAlex Author Information to Nuclear Scientists Dataset

Load and Prepare Author Information

Because of these issues, collecting author IDs takes more time and requires careful inspection for accuracy.

  1. Ardeshir Hosseinpour

Author ID: a5048006655
Alternate names: A. Hosseinpour, Ardeshir Hosseinpour
Institution: Shiraz University
Past institutions: Shiraz University, Malek Ashtar University of Technology
H-index: 3
I10-index: 1
Works count: 4
Citations count: 37

  2. Majid Shahriari

Author ID: A5112039075
Alternate names: Majid Shahriari, M H Shahriari
Institution: Shahid Beheshti University
Past institutions: Shahid Beheshti University, Amirkabir University of Technology
H-index: 6
I10-index: 1
Works count: 13
Citations count: 80

  3. Fereydoon Abbasi

8/6/25: Maura spotted a new author ID for Abbasi.

New Author ID: a5037724110
Alternate names: Freydon AbbasiDavani, Fereydoun Abbasi Davani, F. Abbasi‐Davani, Freydoun Abbasi Davani, Fereydoun Abbasi‐Davani +2 more
Institution: Shahid Beheshti University
Past institutions: Shahid Beheshti University, University of Tehran, Institute for Research in Fundamental Sciences, Islamic Azad University Bandar Abbas, Atomic Energy Organization of Iran +1 more

Old Author ID: a5103401909*
Alternate name: Fereydoon Abbasi Davani
Institution: Shahid Beheshti University
Past institution: Shahid Beheshti University
H-index: 2
I10-index:
Works count: 13
Citations count: 6
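The author IDs above appear in mixed forms (lowercase "a", uppercase "A", and elsewhere with the https://openalex.org/ prefix), which can cause joins to miss matches. A minimal sketch of putting them into one form before comparing them is shown below; `normalize_openalex_id()` is a hypothetical helper, and treating the uppercase short form as the canonical one is an assumption.

```r
# Hypothetical helper: strip the OpenAlex URL prefix (if any) and upper-case
# the ID so that "a5048006655" and "A5048006655" compare as equal.
normalize_openalex_id <- function(x) {
  x <- trimws(x)
  x <- sub("^https?://openalex\\.org/", "", x)  # drop the URL prefix if present
  toupper(x)
}

ids <- c("a5048006655", "A5112039075",
         "https://openalex.org/A5103401909", "a5037724110")
clean_ids <- normalize_openalex_id(ids)
clean_ids
```

Applying the same helper to both sides of a join (for example the scientist table and a works table) avoids silent mismatches caused by case or prefix differences.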

8/8/25: Maura suspects Majid’s works to be under other author IDs

#your_email <- "czj9zj@virginia.edu"  # Replace with your email for better API performance

# Fetch all works
#works_5111646381 <- get_author_works("A5111646381", email = your_email)  # 43 works
#works_5112039075 <- get_author_works("A5112039075", email = your_email)  # 13 works
#works_5028976637 <- get_author_works("A5028976637", email = your_email)  # 2 works
#works_5102177495 <- get_author_works("A5102177495", email = your_email)  # 1 work

#data_path <- "/standard/nsdpi_storage/people/czj9zj/extra_data"
# Save each author's works
#saveRDS(works_5111646381, file = file.path(data_path, "works_5111646381.rds"))
#saveRDS(works_5112039075, file = file.path(data_path, "works_5112039075.rds"))
#saveRDS(works_5028976637, file = file.path(data_path, "works_5028976637.rds"))
#saveRDS(works_5102177495, file = file.path(data_path, "works_5102177495.rds"))

Merge with Nuclear Scientists Dataset

# Assuming your nuclear_scientists dataset is already loaded as 'nuclear_scientists'

# Merge the datasets
#enhanced_nuclear_scientists <- nuclear_scientists %>%left_join(author_info, by = "Victim")

# Display the enhanced dataset structure
#cat("Original columns:", ncol(nuclear_scientists), "\n")
#cat("Enhanced columns:", ncol(enhanced_nuclear_scientists), "\n")
#cat("New columns added:", ncol(enhanced_nuclear_scientists) - ncol(nuclear_scientists), "\n")

# Show column names
#cat("\nNew columns added:\n")
#new_cols <- setdiff(names(enhanced_nuclear_scientists), names(nuclear_scientists))
#cat(paste("-", new_cols, collapse = "\n"))

#saveRDS(enhanced_nuclear_scientists, "/standard/nsdpi_storage/people/czj9zj/extra_data/nuclear_scientists.rds")

View Enhanced Dataset

# Display only the rows with author information

#data_path <- "extra_data"
#nuclear_scientists <- readRDS(file.path(data_path, "nuclear_scientists.rds"))


#kable(enhanced_nuclear_scientists, caption = "Nuclear Scientists with OpenAlex Information")

Missing:

  1. Darioush Rezaeinejad
  2. Mostafa Ahmadi Roshan
  3. Mohsen Fakhrizadeh
  4. Seyyed Amir Hossein Feghhi
  5. Akbar Motalebizadeh
  6. Mohammad Mehdi Tehranchi
  7. Saeed Borji
  8. Mansour Asgari
  9. Ahmadreza Zolfaghari Daryani
  10. Ali Bakhouei Katirimi
  11. Abdolhamid Minouchehr
  12. Isar Tabatabai-Qamsheh
  13. Mohammad Reza Sedighi Saber

Filtering for Pre-2025 Assassinations

To focus our analysis on a meaningful timeline, we filter the dataset to include only assassinations that occurred prior to 2025. This provides a clearer contrast between earlier targeted killings and those that occurred during the significant escalation in 2025.

Identified Iranian Nuclear Scientists Assassinated Prior to 2025

| Date | Victim | Expertise | Method | Location | Event | Role | institution_name | author_id | alternate_names | current_institution | past_institutions | h_index | i10_index | works_count | citations_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007-01-15 | Ardeshir Hosseinpour | Authority on electromagnetism | By gas or possibly radiation poisoning | Shiraz | Died | Professor | Shiraz University | a5048006655 | A. Hosseinpour, Ardeshir Hosseinpour | Shiraz University | Shiraz University, Malek Ashtar University of Technology | 3 | 1 | 4 | 37 |
| 2010-01-12 | Masoud Ali-Mohammadi | Quantum field theory and elementary particle physics | By a remote-control bomb attached to a motorcycle | Tehran | Assassination | Professor | University of Tehran | A5111436477 | Masoud Alimohammadi | Ilam University | Ilam University, University of Tehran, Institute for Research in Fundamental Sciences, Institute for Cognitive Science Studies | 4 | 4 | 9 | 90 |
| 2010-11-29 | Majid Shahriari | Specialized in neutron transport | By a bomb attached to his car from a motorcycle | Tehran | Assassination | Nuclear Engineer | Shahid Beheshti University | A5112039075 | Majid Shahriari, M H Shahriari | Shahid Beheshti University | Shahid Beheshti University, Amirkabir University of Technology | 6 | 1 | 13 | 80 |
| 2010-11-29 | Fereydoon Abbasi | Nuclear physicist and administrator | By a bomb attached to his car from a motorcycle | Tehran | Survived | Professor | Shahid Beheshti University | A5103401909 | Fereydoon Abbasi Davani | Shahid Beheshti University | Shahid Beheshti University | 2 | NA | 13 | 6 |
| 2011-07-23 | Darioush Rezaeinejad | Expert in neutron transport | Shot by motorcycle gunmen | Tehran | Assassination | Physicist | Shahid Beheshti University | NA | NA | NA | NA | NA | NA | NA | NA |
| 2012-01-11 | Mostafa Ahmadi Roshan | Researching polymeric membranes for gaseous diffusion | By a bomb attached to his car from a motorcycle | Tehran | Assassination | Professor | Sharif University of Technology | NA | NA | NA | NA | NA | NA | NA | NA |
| 2020-11-27 | Mohsen Fakhrizadeh | Nuclear physicist and head of Iran’s nuclear program | Shot by a remote-control machine gun | Damavand | Assassination | Professor | Imam Hossein University | NA | NA | NA | NA | NA | NA | NA | NA |
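The date filter behind the table above can be sketched in base R as follows. This is an illustration only: the `incidents` data frame holds a few rows copied from the incident list earlier in this report, and the cutoff of 1 January 2025 is our reading of "prior to 2025".

```r
# A few rows copied from the incident list in this report, for illustration.
incidents <- data.frame(
  Date = as.Date(c("2007-01-15", "2010-11-29", "2020-11-27", "2025-06-13")),
  Victim = c("Ardeshir Hosseinpour", "Majid Shahriari",
             "Mohsen Fakhrizadeh", "Saeed Borji"),
  stringsAsFactors = FALSE
)

# Keep only incidents dated before 2025.
cutoff <- as.Date("2025-01-01")
pre_2025 <- incidents[incidents$Date < cutoff, ]
pre_2025$Victim
```

Comparing `Date` objects rather than character strings keeps the filter correct even if the date format in the source file changes.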

Now we have only 7 researchers of interest. From this list, we could consider removing Ardeshir Hosseinpour (15 January 2007), as there are “conflicting reports on the cause of Hosseinpour’s death”, but we keep him for these purposes because he appears to have been a very prominent figure (see below). We do exclude Fereydoon Abbasi from this subset. Although he was targeted in the 29 November 2010 attack that killed Majid Shahriari, Abbasi survived that assassination attempt. However, he was later killed on 13 June 2025 during the American-Israeli strikes that targeted Iranian scientists, military officials, and civilians (AP, 2025), so we will keep an eye on him.

These 4 researchers were prominent in Iran’s nuclear program:

  • 1. Ardeshir Hosseinpour, 45: “Hosseinpour was a nuclear physics scientist and a lecturer at Shiraz University and the Malek Ashtar University of Technology in Isfahan. An expert in the field of electromagnetism, he was one of the founders of the ‘Nuclear Technology Center of Isfahan,’ the genesis of Natanz nuclear facility where he continued his research until his mysterious death on January 15, 2007.”

  • 2. Masoud Ali Mohammadi, 50: “Mohammadi was a nuclear scientist and a PhD graduate student of physics from the Sharif University in Tehran. He had over 50 published papers and articles in academic journals and was reportedly named one of the key scientists in the advancements related to particle accelerator machines and atom smashers.”

  • 3. Majid Shahriari, 45, was regarded as a key figure in the advancement of uranium enrichment technologies at Iran’s Atomic Energy Organization. He was assassinated on 29 November 2010 by a magnetic bomb attached to his car by assailants on a motorcycle while he was driving on the Artesh highway in Tehran.

Source for the 3 above: (VOA News).

  • 4. Fereydoon Abbasi-Davani, 66, a professor of nuclear physics at Shahid Beheshti University, was reportedly a member of the Islamic Revolutionary Guard Corps (IRGC) since the 1979 Islamic Revolution (NYTimes, 2011).

Therefore, for the initial analysis, we will focus on these 3 scientists:

Identified Iranian Nuclear Scientists

| Date | Victim | Expertise | Method | Location | Event | Role | institution_name | author_id | alternate_names | current_institution | past_institutions | h_index | i10_index | works_count | citations_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007-01-15 | Ardeshir Hosseinpour | Authority on electromagnetism | By gas or possibly radiation poisoning | Shiraz | Died | Professor | Shiraz University | a5048006655 | A. Hosseinpour, Ardeshir Hosseinpour | Shiraz University | Shiraz University, Malek Ashtar University of Technology | 3 | 1 | 4 | 37 |
| 2010-01-12 | Masoud Ali-Mohammadi | Quantum field theory and elementary particle physics | By a remote-control bomb attached to a motorcycle | Tehran | Assassination | Professor | University of Tehran | A5111436477 | Masoud Alimohammadi | Ilam University | Ilam University, University of Tehran, Institute for Research in Fundamental Sciences, Institute for Cognitive Science Studies | 4 | 4 | 9 | 90 |
| 2010-11-29 | Majid Shahriari | Specialized in neutron transport | By a bomb attached to his car from a motorcycle | Tehran | Assassination | Nuclear Engineer | Shahid Beheshti University | A5112039075 | Majid Shahriari, M H Shahriari | Shahid Beheshti University | Shahid Beheshti University, Amirkabir University of Technology | 6 | 1 | 13 | 80 |

Additional Sources:

Pull papers associated with those author IDs

As we noticed above, the papers associated with the authors might be incomplete. We start by focusing on the scholars Majid and Abbasi. OpenAlex says there are 13 works associated with each scholar.

# Step 1: Find common columns
#common_cols <- intersect(names(new_abbasi_works), names(abbasi_works))

# Step 2: Subset both data frames to only those columns
#abbasi_subset <- abbasi_works[, common_cols]
#new_abbasi_subset <- new_abbasi_works[, common_cols]

# Step 3: Combine
#new_abbasi_works <- rbind(new_abbasi_subset, abbasi_subset)
#majid_id <- "A5112039075"

#abbasi_id <- "A5103401909"
#masoud_id <- "A5111436477"
#new_abbasi_id <- "a5037724110"

#your_email <- "czj9zj@virginia.edu"  # Replace with your email for better API performance

# Fetch all works
#cat("Fetching works for author ID:", author_id, "\n")
#majid_works <- get_author_works(majid_id, email = your_email)
#abbasi_works <- get_author_works(abbasi_id, email = your_email)


#new_abbasi_works <- get_author_works(new_abbasi_id, email = your_email)

#masoud_works <- get_author_works(masoud_id, email = your_email)

#data_path <- "/standard/nsdpi_storage/people/czj9zj/extra_data"
# Save Majid Shahriari's works
#saveRDS(majid_works, file = file.path(data_path, "majid_works.rds"))

# Save Fereydoon Abbasi's works
#saveRDS(abbasi_works, file = file.path(data_path, "abbasi_works.rds"))

#saveRDS(new_abbasi_works, file = file.path(data_path, "new_abbasi_works.rds"))

# Save Masoud Ali-Mohammadi's works
#masoud_works <- masoud_works %>%mutate(id = sub(".*/", "", id))
#saveRDS(masoud_works, file = file.path(data_path, "masoud_works.rds"))

#new_abbasi_works <- new_abbasi_works[, 1:50]

# Remove 'https://openalex.org/' from the 'id' column
#new_abbasi_works$id <- gsub("https://openalex.org/", "", new_abbasi_works$id)

#abbasi_works <- abbasi_works[, 1:50]

# Remove 'https://openalex.org/' from the 'id' column

#abbasi_works$id <- gsub("https://openalex.org/", "", abbasi_works$id)

# View the cleaned dataset
#head(abbasi_works)
# Step 1: Initialize an empty vector to store cleaned topic IDs
#all_topic_ids <- c()

# Step 2: Loop through all items in new_abbasi_works$topics
#for (i in seq_along(new_abbasi_works$topics)) {
  # Get the current topic dataframe
#  topic_df <- new_abbasi_works$topics[[i]]
  
  # Check if it's a data frame and contains the 'display_name' column
#  if (is.data.frame(topic_df) && "display_name" %in% names(topic_df)) {
    # Collect the topic display names (matched against topic_summary$Topic below)
#    topic_names <- topic_df$display_name
    
    # Append to the running vector of topic names
#    all_topic_ids <- c(all_topic_ids, topic_names)
#  }
#}

# Step 3: Optional - remove duplicates
#topics_name <- unique(all_topic_ids)
#new_abbasi_keywords <- topic_summary[topic_summary$Topic %in% topics_name, ]
#saveRDS(new_abbasi_keywords, file = file.path(data_path, "new_abbasi_keywords.rds"))

Running colnames(majid_works) shows 468 columns in total; abbasi_works has 336. We are interested in topics, subtopics, and similar fields, so we keep only the columns of interest. That is, we select:

  1. Useful columns (id, title, display_name, etc.)

  2. columns that start with primary_topic, topics, keywords, or concepts
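The two selection rules above can be sketched with base R as follows. `select_topic_columns()` is a hypothetical helper, and the column names in `toy_works` are assumed examples of what flattened OpenAlex columns might look like, not the actual names in majid_works.

```r
# Hypothetical helper: keep a fixed set of useful columns plus every column
# whose name starts with primary_topic, topics, keywords, or concepts.
select_topic_columns <- function(df, keep = c("id", "title", "display_name")) {
  prefix_hit <- grepl("^(primary_topic|topics|keywords|concepts)", names(df))
  df[, names(df) %in% keep | prefix_hit, drop = FALSE]
}

# Toy one-row data frame with assumed column names (illustration only).
toy_works <- data.frame(
  id = "W1", title = "Example", display_name = "Example",
  primary_topic.display_name = "Nuclear Physics and Applications",
  topics_1.id = "T1", keywords_1 = "neutron", concepts_1 = "physics",
  cited_by_count = 3, doi = NA_character_,
  stringsAsFactors = FALSE
)

kept <- names(select_topic_columns(toy_works))
kept
```

Matching on prefixes rather than listing every column by name keeps the selection stable when different authors have different numbers of topic or keyword columns.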

data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic"
majid_works <- readRDS(file.path(data_path, "MShahriari_works.rds"))
abbasi_works <- readRDS(file.path(data_path, "FAbbasiDavani_works.rds"))
shahid_beheshti_university_cleanedLONG <- readRDS("/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data/shahid_beheshti_university_cleanedLONG.rds")

masoud_works <- readRDS(file.path("/standard/nsdpi_storage/people/czj9zj/extra_data", "masoud_works.rds"))

Visualisations

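# Packages used by the plotting code below
library(dplyr)      # mutate(), filter(), count(), joins
library(lubridate)  # year(), ymd()
library(tidyr)      # replace_na()
library(tibble)     # tibble()
library(ggplot2)    # plotting

# Derive the publication year from the publication date
majid_works <- majid_works %>%
  mutate(year = year(ymd(publication_date)))

abbasi_works <- abbasi_works %>%
  mutate(year = year(ymd(publication_date)))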

majid_counts <- majid_works %>%
  filter(!is.na(year)) %>%
  count(year)

abbasi_counts <- abbasi_works %>%
  filter(!is.na(year)) %>%
  count(year)

# Create shared range of years
combined_years <- tibble(year = min(c(majid_counts$year, abbasi_counts$year)):
                                           max(c(majid_counts$year, abbasi_counts$year)))

# Fill and label: Majid Shahriari
majid_full <- combined_years %>%
  left_join(majid_counts, by = "year") %>%
  replace_na(list(n = 0)) %>%
  mutate(author = "Majid Shahriari")

# Fill and label: Fereydoon Abbasi
abbasi_full <- combined_years %>%
  left_join(abbasi_counts, by = "year") %>%
  replace_na(list(n = 0)) %>%
  mutate(author = "Fereydoon Abbasi")

# Combine
combined_data <- bind_rows(majid_full, abbasi_full)

# Softer color palette
custom_colors <- c(
  "Majid Shahriari" = "#91B3D7",  # muted blue
  "Fereydoon Abbasi" = "#F4A582"  # muted coral
)

# Create plot

publication_plot <- ggplot(combined_data, aes(x = year, y = n, fill = author)) +
  geom_col(position = "dodge", width = 0.7, alpha = 0.85) +
  scale_fill_manual(values = custom_colors) +
  scale_x_continuous(
    breaks = seq(min(combined_years$year), max(combined_years$year), by = 2)
  ) +
  labs(
    title = "Publications Count by Year",
    x = "Publication Year",
    y = "Number of Publications",
    fill = "Author"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank()
  )

publication_plot

# Save to 'plots' folder
#ggsave(
#  filename = "/standard/nsdpi_storage/people/czj9zj/plots/publications_by_year.png",
#  plot = publication_plot,
#  width = 12,
#  height = 8,
#  dpi = 300
#)

Make a big list of all subfields and topics. Call that “subjects” (or something else; the name doesn’t really matter).
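One way to build such a list is to pool the Subfield and Topic values from each author's keyword table and remove repeats. The sketch below assumes keyword tables with `Subfield` and `Topic` columns; `build_subjects()` is a hypothetical helper and the two small data frames are toy stand-ins for the real keyword files.

```r
# Hypothetical helper: pool Subfield and Topic values from any number of
# keyword tables into one de-duplicated character vector.
build_subjects <- function(...) {
  tabs <- list(...)
  values <- unlist(lapply(tabs, function(d) c(d$Subfield, d$Topic)))
  sort(unique(values[!is.na(values)]))
}

# Toy stand-ins for the two keyword tables (illustration only).
majid_kw <- data.frame(
  Subfield = c("Radiation", "Aerospace Engineering"),
  Topic = c("Nuclear Physics and Applications",
            "Nuclear reactor physics and engineering"),
  stringsAsFactors = FALSE
)
abbasi_kw <- data.frame(
  Subfield = c("Radiation", "Nuclear and High Energy Physics"),
  Topic = c("Nuclear Physics and Applications",
            "Magnetic confinement fusion research"),
  stringsAsFactors = FALSE
)

subjects <- build_subjects(majid_kw, abbasi_kw)
length(subjects)
```

Subfields and topics end up in the same vector here; if the two levels need to be matched separately later, keeping two vectors would be the safer design.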

# Define your path if needed
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

# Read the keyword files (adjust file names if needed)
#majid_keywords <- readRDS(file.path(data_path, "majid_keywords.rds"))
#abbasi_keywords <- readRDS(file.path(data_path, "abbasi_keywords.rds"))

majid_keywords <- readRDS(file.path(data_path, "majid_keywords_v1.rds"))
new_abbasi_keywords <- readRDS(file.path(data_path, "abbasi_keywords_v1.rds"))
# Step 1: Extract unique topics from the 'Topic' column
#abbasi_topics <- unique(new_abbasi_keywords$Topic)
#majid_topics <- unique(majid_keywords$Topic)

# Step 2: Identify topic categories
#shared_topics <- intersect(abbasi_topics, majid_topics)
#abbasi_only_topics <- setdiff(abbasi_topics, shared_topics)
#majid_only_topics <- setdiff(majid_topics, shared_topics)

# Step 3: Padding helper
#pad_column <- function(x, n) {
#  length(x) <- n
#  x[is.na(x)] <- "---"
#  return(x)
#}

# Step 4: Create the comparison table
#max_len <- max(length(majid_only_topics), length(shared_topics), length(abbasi_only_topics))

#topic_comparison <- tibble(
#  "Majid Shahriari Only" = pad_column(majid_only_topics, max_len),
#  "Shared Topics" = pad_column(shared_topics, max_len),
#  "Fereydoon Abbasi Only" = pad_column(abbasi_only_topics, max_len)
#)

# Step 5: Generate the styled table
#table_html <- kbl(
#  topic_comparison,
#  caption = "Topic Comparison: Majid Shahriari vs. Fereydoon Abbasi"
#) %>%
#  kable_styling(full_width = FALSE)

# Step 6: Save as HTML
#dir.create("plots", showWarnings = FALSE)
#save_kable(table_html, file = "plots/topic_comparison.html")

# Step 7: Save as PNG
#webshot("plots/topic_comparison.html", "plots/topic_comparison.png", zoom = 2)

Keywords Associated with Majid Shahriari

| | Domain | Field | Subfield | Topic | Total Works | Total Citations | % of Works | % of Citations |
|---|---|---|---|---|---|---|---|---|
| 53 | Physical Sciences | Physics and Astronomy | Radiation | Nuclear Physics and Applications | 207020 | 1274223 | 0.09 | 0.05 |
| 83 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Medical Imaging Techniques and Applications | 177439 | 1662238 | 0.08 | 0.06 |
| 157 | Health Sciences | Medicine | Ophthalmology | Glaucoma and retinal disorders | 140274 | 2078725 | 0.06 | 0.08 |
| 211 | Physical Sciences | Engineering | Safety, Risk, Reliability and Quality | Nuclear and radioactivity studies | 130674 | 285858 | 0.06 | 0.01 |
| 228 | Physical Sciences | Environmental Science | Health, Toxicology and Mutagenesis | Air Quality and Health Impacts | 128119 | 3243177 | 0.06 | 0.12 |
| 229 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Advanced MRI Techniques and Applications | 127691 | 2302623 | 0.06 | 0.09 |
| 244 | Physical Sciences | Engineering | Mechanics of Materials | Metal and Thin Film Mechanics | 125631 | 1703914 | 0.06 | 0.06 |
| 292 | Health Sciences | Medicine | Pulmonary and Respiratory Medicine | Cerebrovascular and Carotid Artery Diseases | 116893 | 1283161 | 0.05 | 0.05 |
| 306 | Physical Sciences | Earth and Planetary Sciences | Atmospheric Science | Atmospheric chemistry and aerosols | 115277 | 3807482 | 0.05 | 0.14 |
| 311 | Physical Sciences | Engineering | Civil and Structural Engineering | Structural Health Monitoring Techniques | 114437 | 1283516 | 0.05 | 0.05 |
| 342 | Physical Sciences | Physics and Astronomy | Radiation | Advanced Radiotherapy Techniques | 111820 | 1196375 | 0.05 | 0.05 |
| 428 | Physical Sciences | Engineering | Aerospace Engineering | Nuclear reactor physics and engineering | 102519 | 483794 | 0.05 | 0.02 |
| 475 | Physical Sciences | Engineering | Automotive Engineering | Vehicle emissions and performance | 99203 | 641034 | 0.04 | 0.02 |
| 489 | Physical Sciences | Materials Science | Materials Chemistry | Graphite, nuclear technology, radiation studies | 98176 | 402275 | 0.04 | 0.02 |
| 568 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Radiation Dose and Imaging | 93510 | 782436 | 0.04 | 0.03 |

Keywords Associated with Fereydoon Abbasi

| | Domain | Field | Subfield | Topic | Total Works | Total Citations | % of Works | % of Citations |
|---|---|---|---|---|---|---|---|---|
| 17 | Physical Sciences | Physics and Astronomy | Astronomy and Astrophysics | Astro and Planetary Science | 272845 | 2696720 | 0.12 | 0.10 |
| 53 | Physical Sciences | Physics and Astronomy | Radiation | Nuclear Physics and Applications | 207020 | 1274223 | 0.09 | 0.05 |
| 56 | Physical Sciences | Engineering | Civil and Structural Engineering | Engineering Applied Research | 206697 | 170096 | 0.09 | 0.01 |
| 68 | Physical Sciences | Engineering | Electrical and Electronic Engineering | Photonic and Optical Devices | 192687 | 1971794 | 0.09 | 0.07 |
| 82 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Monoclonal and Polyclonal Antibodies Research | 177915 | 4304991 | 0.08 | 0.16 |
| 83 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Medical Imaging Techniques and Applications | 177439 | 1662238 | 0.08 | 0.06 |
| 103 | Physical Sciences | Engineering | Electrical and Electronic Engineering | Semiconductor materials and devices | 163693 | 2223407 | 0.07 | 0.08 |
| 119 | Physical Sciences | Physics and Astronomy | Nuclear and High Energy Physics | Nuclear physics research studies | 157237 | 2553546 | 0.07 | 0.10 |
| 167 | Physical Sciences | Materials Science | Surfaces, Coatings and Films | Electron and X-Ray Spectroscopy Techniques | 137173 | 1773250 | 0.06 | 0.07 |
| 186 | Physical Sciences | Engineering | Control and Systems Engineering | Fault Detection and Control Systems | 134886 | 1422574 | 0.06 | 0.05 |
| 211 | Physical Sciences | Engineering | Safety, Risk, Reliability and Quality | Nuclear and radioactivity studies | 130674 | 285858 | 0.06 | 0.01 |
| 219 | Physical Sciences | Chemistry | Inorganic Chemistry | Radioactive element chemistry and processing | 129479 | 1660269 | 0.06 | 0.06 |
| 238 | Life Sciences | Biochemistry, Genetics and Molecular Biology | Molecular Biology | Advanced biosensing and bioanalysis techniques | 126304 | 3813628 | 0.06 | 0.14 |
| 244 | Physical Sciences | Engineering | Mechanics of Materials | Metal and Thin Film Mechanics | 125631 | 1703914 | 0.06 | 0.06 |
| 325 | Physical Sciences | Physics and Astronomy | Nuclear and High Energy Physics | Magnetic confinement fusion research | 112874 | 1213545 | 0.05 | 0.05 |

(a) Pull all papers that are associated with Shahid Beheshti University (uni 393 in Luz’s numbering scheme) and that have those subfields and topics

Now, we read all the works associated with Shahid Beheshti University, which in our case is works_393. This has 35,305 works in total.
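Once a set of subjects is available, restricting the university's works to them is a simple membership test. The sketch below is an illustration under assumptions: `filter_by_subjects()` is a hypothetical helper, `top_subfield` is a column mentioned in this report, `top_topic` is an assumed name for the corresponding topic column, and `toy_works` is stand-in data rather than works_393.

```r
# Hypothetical helper: keep works whose top subfield or top topic is in the
# subject list. Column names are parameters because top_topic is assumed.
filter_by_subjects <- function(works, subjects,
                               subfield_col = "top_subfield",
                               topic_col = "top_topic") {
  hit <- works[[subfield_col]] %in% subjects | works[[topic_col]] %in% subjects
  works[hit, , drop = FALSE]
}

# Toy stand-in for a works table (illustration only).
toy_works <- data.frame(
  id = c("W1", "W2", "W3"),
  top_subfield = c("Radiation", "Ophthalmology", "Sociology"),
  top_topic = c("Advanced Radiotherapy Techniques",
                "Glaucoma and retinal disorders",
                "Some other topic"),
  stringsAsFactors = FALSE
)

subjects <- c("Radiation", "Glaucoma and retinal disorders")
matched <- filter_by_subjects(toy_works, subjects)
matched$id
```

Here W1 matches on its subfield and W2 on its topic, so a work is kept if either level matches; requiring both would be a stricter alternative.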

length(unique(works_393$id)) == nrow(works_393)
## [1] TRUE
length(unique(works_403$id)) == nrow(works_403)
## [1] FALSE
duplicated_ids <- works_403$id[duplicated(works_403$id)]

# Show which rows are duplicated
works_403 %>% 
  filter(id %in% duplicated_ids)
## # A tibble: 2 × 51
##   id              title display_name authorships abstract doi   publication_date
##   <chr>           <chr> <chr>        <list>      <chr>    <chr> <date>          
## 1 https://openal… Defe… Defect dete… <tibble>    This st… http… 2024-12-23      
## 2 https://openal… Defe… Defect dete… <tibble>    This st… http… 2024-12-23      
## # ℹ 44 more variables: publication_year <int>, fwci <dbl>,
## #   cited_by_count <int>, counts_by_year <list>, cited_by_api_url <chr>,
## #   ids <list>, type <chr>, is_oa <lgl>, is_oa_anywhere <lgl>, oa_status <chr>,
## #   oa_url <chr>, any_repository_has_fulltext <lgl>, source_display_name <chr>,
## #   source_id <chr>, issn_l <chr>, host_organization <chr>,
## #   host_organization_name <chr>, landing_page_url <chr>, pdf_url <chr>,
## #   license <chr>, version <chr>, referenced_works <list>, …
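# Keep exactly one row per work id (the duplicated work shown above appears twice)
works_403 <- works_403 %>%
  distinct(id, .keep_all = TRUE)

  • We check the id column to make sure there are no duplicate works. works_393 is already unique, while works_403 contains one work that appears twice, so we keep a single copy of it.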
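When handling duplicated ids, it matters whether one copy is kept or every copy is dropped. The following base R sketch on toy data (not the project data) shows the difference between the two filters.

```r
# Toy data: work W2 appears twice.
toy <- data.frame(
  id = c("W1", "W2", "W2", "W3"),
  title = c("a", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# Combining duplicated() from both directions removes every copy of W2.
drop_all_copies <- toy[!duplicated(toy$id) &
                         !duplicated(toy$id, fromLast = TRUE), ]

# Using duplicated() once keeps the first copy of each id.
keep_first_copy <- toy[!duplicated(toy$id), ]

drop_all_copies$id   # "W1" "W3"
keep_first_copy$id   # "W1" "W2" "W3"
```

For a citation analysis the second form is usually the one wanted, since dropping every copy silently removes a real work from the dataset.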

Does Shahid Beheshti have more distinct topics than distinct subfields among its publications?

  • Running unique(works_393$top_subfield) shows 234 unique subfields at Shahid Beheshti University, compared with 3129 unique topics.
  domain_colors <- c(
    "Physical Sciences" = "#1e3a8a",    # Deep blue
    "Social Sciences"   = "#7c2d12",    # Deep brown/orange
    "Life Sciences"     = "#14532d",    # Deep green  
    "Health Sciences"   = "#7f1d1d"     # Deep red
  )
  

# Create the plot and assign it to domain_pct_plot
domain_pct_plot <- works_393 %>%
  filter(!is.na(top_domain)) %>%
  count(top_domain, sort = TRUE) %>%
  mutate(pct = n / sum(n)) %>%
  ggplot(aes(x = reorder(top_domain, pct), y = pct, fill = top_domain)) +
  geom_col() +
  geom_text(aes(label = percent(pct, accuracy = 0.1)),
            hjust = 1.3, color = "white", size = 4.2) +
  coord_flip() +
  scale_y_continuous(labels = percent_format(accuracy = 0.1)) +
  scale_fill_manual(values = domain_colors) +
  labs(
    title = "Research Domains at Shahid Beheshti University",
    x = "Domain",
    y = "% of Total Works",
    fill = "Domain"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 13, face = "bold"),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  )

# Now save it
#ggsave("plots/domain_percent_bar.png", plot = domain_pct_plot, width = 10, height = 6, dpi = 320)
topics_data <- works_393 %>%
  filter(!is.na(top_domain), !is.na(top_field)) %>%
  group_by(domain.display_name = top_domain, field.display_name = top_field) %>%
  summarise(works_count = n(), .groups = "drop")

treemap_plot <- create_complete_names_treemap(topics_data)
## Warning in geom_treemap_text(colour = "black", place = "centre", grow = TRUE, :
## Ignoring unknown parameters: `max.size` and `check_overlap`
## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
#ggsave("plots/domain_treemap.png", treemap_plot, width = 12, height = 8, dpi = 320)
# Count unique fields
n_fields <- works_393 %>%
  filter(!is.na(top_field)) %>%
  summarise(n = n_distinct(top_field)) %>%
  pull(n)

# Get field-level counts and domains
field_data <- works_393 %>%
  filter(!is.na(top_field), !is.na(top_domain)) %>%
  count(top_field, top_domain, sort = TRUE)

# Plot all fields
ggplot(field_data, aes(x = reorder(top_field, n), y = n, fill = top_domain)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = domain_colors) +
  labs(
    title = "Distribution of Research Fields at Shahid Beheshti University",
    subtitle = paste("All", n_fields, "fields colored by domain"),
    x = "Field",
    y = "Number of Works",
    fill = "Domain"
  ) +
  theme_minimal() +
  theme(
    axis.text = element_text(size = 11),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5)
  )

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ readr::col_factor()      masks scales::col_factor()
## ✖ scales::discard()        masks purrr::discard()
## ✖ dplyr::filter()          masks stats::filter()
## ✖ jsonlite::flatten()      masks purrr::flatten()
## ✖ kableExtra::group_rows() masks dplyr::group_rows()
## ✖ dplyr::lag()             masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(forcats)

# Identify top 5 fields by total publication count
top_5_fields <- works_393 %>%
  filter(!is.na(top_field), !is.na(publication_year)) %>%
  count(top_field, name = "total") %>%
  slice_max(order_by = total, n = 5) %>%
  pull(top_field)

# Filter and plot
plot <- works_393 %>%
  filter(top_field %in% top_5_fields, !is.na(publication_year)) %>%
  count(publication_year, top_field) %>%
  ggplot(aes(x = publication_year, y = n, fill = top_field)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ top_field, scales = "free_y", ncol = 2) +
  scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
  theme_minimal() +
  labs(
    title = "Evolution of Top 5 Research Fields Over Time",
    subtitle = "Shahid Beheshti University – by publication count",
    x = "Publication Year",
    y = "Number of Works"
  )

# Save the plot
#ggsave("plots/top_fields_evolution.png", plot = plot, width = 10, height = 6, dpi = 300)
library(tidyverse)
library(forcats)

# Identify top 5 subfields by total publication count
top_5_subfields <- works_393 %>%
  filter(!is.na(top_subfield), !is.na(publication_year)) %>%
  count(top_subfield, name = "total") %>%
  slice_max(order_by = total, n = 5) %>%
  pull(top_subfield)

# Filter and plot
plot <- works_393 %>%
  filter(top_subfield %in% top_5_subfields, !is.na(publication_year)) %>%
  count(publication_year, top_subfield) %>%
  ggplot(aes(x = publication_year, y = n, fill = top_subfield)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ top_subfield, scales = "free_y", ncol = 2) +
  scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
  theme_minimal() +
  labs(
    title = "Evolution of Top 5 Research Subfields Over Time",
    subtitle = "Shahid Beheshti University – by publication count",
    x = "Publication Year",
    y = "Number of Works"
  )

plot
## Warning: Removed 35 rows containing missing values or values outside the scale range
## (`geom_col()`).

# Save the plot
#ggsave("plots/top_subfields_evolution.png", plot = plot, width = 10, height = 6, dpi = 300)
library(tidyverse)
library(broom)  # for tidy regression results

# Step 1: Filter and prepare data
field_growth_slopes <- works_393 %>%
  filter(!is.na(top_field), !is.na(publication_year)) %>%
  count(publication_year, top_field) %>%
  group_by(top_field) %>%
  filter(n() >= 5) %>%  # Ensure enough data points for a slope
  nest() %>%
  mutate(
    model = map(data, ~lm(n ~ publication_year, data = .x)),
    slope = map_dbl(model, ~coef(.x)[["publication_year"]])
  ) %>%
  ungroup() %>%
  arrange(desc(slope)) %>%
  slice_head(n = 5)

# Extract top 5 fastest-growing fields
top_5_growing_fields <- field_growth_slopes$top_field

# Filter and plot the fastest-growing fields
plot <- works_393 %>%
  filter(top_field %in% top_5_growing_fields, !is.na(publication_year)) %>%
  count(publication_year, top_field) %>%
  ggplot(aes(x = publication_year, y = n, fill = top_field)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ top_field, scales = "free_y", ncol = 2) +
  scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
  theme_minimal() +
  labs(
    title = "Fastest-Growing Research Fields Over Time",
    subtitle = "Top 5 fields at Shahid Beheshti University based on publication growth rate",
    x = "Publication Year",
    y = "Number of Works"
  )

plot
## Warning: Removed 54 rows containing missing values or values outside the scale range
## (`geom_col()`).

#ggsave("plots/sbu_fastest_growing_fields.png", plot = plot, width = 10, height = 6, dpi = 300)
library(tidyverse)
library(scales)
library(lubridate)

# Floor each publication date to the first day of its month
works_393 <- works_393 %>%
  mutate(pub_month = floor_date(publication_date, "month"))

# Filter after year 2000 only
monthly_counts <- works_393 %>%
  filter(top_subfield %in% top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
  count(pub_month, top_subfield)

# Normalize to each subfield's first available month (after 2000)
monthly_growth <- monthly_counts %>%
  group_by(top_subfield) %>%
  arrange(pub_month) %>%
  mutate(
    base_count = first(n),
    growth_index = (n / base_count) * 100
  ) %>%
  ungroup()



dark_colors <- c(
  "Biomedical Engineering" = "#102542",   # Dark navy
  "Electrical and Electronic Engineering" = "#A84400",  # Burnt orange
  "Materials Chemistry" = "darkgreen",      # Dark green
  "Molecular Biology" = "darkred",        # Dark red
  "Organic Chemistry" = "#4B0082"         # Dark violet/indigo
)


ggplot(monthly_growth, aes(x = pub_month, y = growth_index, color = top_subfield)) +
  geom_line(linewidth = 1, alpha = 0.8) +
  scale_color_manual(values = dark_colors) +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y", limits = c(as.Date("2000-01-01"), NA)) +
  labs(
    title = "Monthly Publication Growth in Top Subfields at Shahid Beheshti University",
    subtitle = "Indexed to each subfield's first recorded month after 2000 (Index = 100%)",
    x = "Publication Date",
    y = "Relative Publication Index (%)",
    color = "Subfield"
  ) +
  theme_minimal(base_size = 12)

# Save the plot
ggsave("plots/monthly_subfield_growth.png", width = 10, height = 6, dpi = 300)

#ggsave("plots/subfield_growth_slopechart.png", width = 10, height = 6, dpi = 300)
monthly_growth_sbu <- monthly_growth %>%
  mutate(university = "Shahid Beheshti University")

works_403 <- works_403 %>%
  mutate(pub_month = floor_date(publication_date, "month"))

# Filter after year 2000 only
monthly_counts_ut <- works_403 %>%
  filter(top_subfield %in% top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
  count(pub_month, top_subfield)

# Normalize to each subfield's first available month (after 2000)
monthly_growth_ut <- monthly_counts_ut %>%
  group_by(top_subfield) %>%
  arrange(pub_month) %>%
  mutate(
    base_count = first(n),
    growth_index = (n / base_count) * 100
  ) %>%
  ungroup()


monthly_growth_ut <- monthly_growth_ut %>%
  mutate(university = "University of Tehran")
shared_colors <- c(
  "Biomedical Engineering" = "#102542",   # Dark navy
  "Electrical and Electronic Engineering" = "#A84400",  # Burnt orange
  "Molecular Biology" = "#8B0000",        # Dark red
  "Materials Chemistry" = "#4B0082"       # Dark violet
)

sbu_only_colors <- c(
  "Organic Chemistry" = "#3E3E6B"         # Deep steel blue
)

ut_only_colors <- c(
  "Mechanical Engineering" = "darkgreen"    # Dark green
)

dark_colors_combined <- c(shared_colors, sbu_only_colors, ut_only_colors)

Next, we filter works_393 to keep only works that fall within the subfields associated with Majid Shahriari’s research, as identified in majid_keywords. We deliberately do not filter at the domain level: OpenAlex has only four broad domains, so a domain-level match is too coarse to reflect a researcher’s specific area of expertise. Filtering at the subfield level instead captures a more focused and relevant set of works aligned with Majid Shahriari’s research contributions.

Note on OpenAlex Topic Hierarchy: Works in OpenAlex are tagged with Topics using an automated model that evaluates features such as the title, abstract, journal, and citations of the work.
- There are approximately 4,500 Topics in OpenAlex.
- Each Topic is nested within a Subfield, which is nested within a Field, which in turn is nested within a top-level Domain.
- A work is assigned a primary topic (the one with the highest score), and inherits the corresponding subfield, field, and domain.

Source: https://help.openalex.org/hc/en-us/articles/24736129405719-Topics
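The hierarchy described in this note can be sketched on toy data. All topic, subfield, field and domain labels and all scores below are placeholders, not real OpenAlex values, and the table layout is an assumption made for illustration. The sketch picks the highest-scoring topic per work as its primary topic and then inherits the parent levels through a lookup table.

```r
library(dplyr)

# Hypothetical lookup: each topic sits in exactly one subfield, field and domain
topic_lookup <- tibble::tribble(
  ~topic,    ~subfield,    ~field,    ~domain,
  "Topic A", "Subfield X", "Field P", "Domain 1",
  "Topic B", "Subfield X", "Field P", "Domain 1",
  "Topic C", "Subfield Y", "Field Q", "Domain 2"
)

# Hypothetical model scores: one work can carry several candidate topics
toy_scores <- tibble::tribble(
  ~work_id, ~topic,    ~score,
  "W1",     "Topic A", 0.99,
  "W1",     "Topic C", 0.95,
  "W2",     "Topic C", 0.98,
  "W2",     "Topic B", 0.91
)

# Primary topic = highest-scoring topic per work;
# subfield, field and domain are inherited from the lookup
primary_topics <- toy_scores %>%
  group_by(work_id) %>%
  slice_max(order_by = score, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  left_join(topic_lookup, by = "topic")
```

Because each topic maps to a single subfield, field and domain, a work's position at every level is fixed once its primary topic is chosen.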

Subset of Works from Shahid Beheshti University Matching Majid Shahriari’s Topic Areas
| id | title | publication_date | topic | score | type | institution |
|---|---|---|---|---|---|---|
| W3193094654 | Planck 2018 results | 2021-08-01 | Radiation Therapy and Dosimetry | 0.9249 | article | Shahid Beheshti University |
| W3200934665 | FCC-hh: The Hadron Collider | 2019-07-01 | Particle Accelerators and Free-Electron Lasers | 0.9984 | article | Shahid Beheshti University |
| W2903991298 | Recent advances in modeling and simulation of nanofluid flows—Part II: Applications | 2018-12-05 | Fluid Dynamics and Vibration Analysis | 0.9947 | article | Shahid Beheshti University |
| W4377695098 | Diffusion models in medical imaging: A comprehensive survey | 2023-05-23 | AI in cancer detection | 0.9944 | review | Shahid Beheshti University |
| W4377695098 | Diffusion models in medical imaging: A comprehensive survey | 2023-05-23 | MRI in cancer diagnosis | 0.9940 | review | Shahid Beheshti University |
| W2962966855 | HE-LHC: The High-Energy Large Hadron Collider | 2019-07-01 | Particle Accelerators and Free-Electron Lasers | 0.9982 | article | Shahid Beheshti University |
| W4387778010 | Advances in medical image analysis with vision Transformers: A comprehensive review | 2023-10-19 | AI in cancer detection | 0.9979 | review | Shahid Beheshti University |
| W4221140247 | A next-generation liquid xenon observatory for dark matter and neutrino physics | 2022-12-21 | Atomic and Subatomic Physics Research | 0.9955 | article | Shahid Beheshti University |
| W4387430177 | DAE-Former: Dual Attention-Guided Efficient Transformer for Medical Image Segmentation | 2023-01-01 | AI in cancer detection | 0.9956 | book-chapter | Shahid Beheshti University |
| W3134197091 | Monte Carlo-based estimation of patient absorbed dose in 99mTc-DMSA, -MAG3, and -DTPA SPECT imaging using the University of Florida (UF) phantoms | 2025-03-06 | Radiopharmaceutical Chemistry and Applications | 1.0000 | article | Shahid Beheshti University |
| W3134197091 | Monte Carlo-based estimation of patient absorbed dose in 99mTc-DMSA, -MAG3, and -DTPA SPECT imaging using the University of Florida (UF) phantoms | 2025-03-06 | Medical Imaging Techniques and Applications | 1.0000 | article | Shahid Beheshti University |
| W4406828888 | Occurrence and transport of per- and polyfluoroalkyl substances (PFAS) in the leachate of a municipal solid waste landfill in Tehran, Iran (a Middle-eastern megacity) | 2025-01-01 | Atmospheric chemistry and aerosols | 0.9603 | article | Shahid Beheshti University |
| W2048628691 | Deep Anterior Lamellar Keratoplasty in Patients with Keratoconus: Big-Bubble Technique | 2010-01-14 | Glaucoma and retinal disorders | 0.9988 | article | Shahid Beheshti University |
| W2039681904 | Flow regime identification and void fraction prediction in two-phase flows based on gamma ray attenuation | 2014-11-20 | Nuclear reactor physics and engineering | 0.9910 | article | Shahid Beheshti University |
| W2039681904 | Flow regime identification and void fraction prediction in two-phase flows based on gamma ray attenuation | 2014-11-20 | Nuclear Physics and Applications | 0.9910 | article | Shahid Beheshti University |

Total works published in Majid’s subfields = 2155. All matching topics fall under subfields that are already included, so filtering by topic alone would yield only 427 works. Total works published in Majid’s topics/subfields/fields = 15520.
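A toy sketch of why the topic-only count is smaller than the subfield count. The works and keyword list below are placeholders; the Topic column name follows the keyword tables used in this report, while the Subfield column name is an assumption. When every keyword topic sits inside a keyword subfield, each work matched by topic is also matched by subfield, so the topic filter returns a subset.

```r
library(dplyr)

# Hypothetical works, each with a primary topic and its parent subfield
toy_works <- tibble::tribble(
  ~id,  ~topic,    ~subfield,
  "W1", "Topic A", "Subfield X",
  "W2", "Topic B", "Subfield X",
  "W3", "Topic C", "Subfield Y",
  "W4", "Topic D", "Subfield Z"
)

# Hypothetical keyword list: selected topics and the subfields they belong to
toy_keywords <- tibble::tibble(
  Topic    = c("Topic A", "Topic C"),
  Subfield = c("Subfield X", "Subfield Y")
)

by_topic    <- toy_works %>% filter(topic %in% toy_keywords$Topic)
by_subfield <- toy_works %>% filter(subfield %in% toy_keywords$Subfield)

n_by_topic    <- n_distinct(by_topic$id)
n_by_subfield <- n_distinct(by_subfield$id)

# TRUE when every topic match is also a subfield match
topic_is_subset <- all(by_topic$id %in% by_subfield$id)
```

Here W2 is picked up only by the subfield filter: its subfield is on the keyword list, but its topic is not.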

# Using new_abbasi_keywords instead of just abbasi_keywords
# Subset works in Fereydoon Abbasi’s areas
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

abbasi_keywords_v1 <- readRDS(file.path(data_path, "abbasi_keywords_v1.rds"))


sbu_abbasi_works <- shahid_beheshti_university_cleanedLONG %>%
  filter(
    topic %in% abbasi_keywords_v1$Topic
  )

# Works NOT in Abbasi’s areas
sbu_non_abbasi_works <- shahid_beheshti_university_cleanedLONG %>%
  filter(
    !(topic %in% abbasi_keywords_v1$Topic)
  )


sbu_abbasi_works %>%
  head(15) %>%
  kbl(caption = "Subset of Works from Shahid Beheshti University Matching Fereydoon Abbasi’s Research Areas") %>%
  kable_styling(latex_options = c("hold_position", "scale_down"), font_size = 10)
Subset of Works from Shahid Beheshti University Matching Fereydoon Abbasi’s Research Areas
| id | title | publication_date | topic | score | type | institution |
|---|---|---|---|---|---|---|
| W2284013896 | Smart micro/nanoparticles in stimulus-responsive drug/gene delivery systems | 2016-01-01 | Nanoparticle-Based Drug Delivery | 0.9999 | review | Shahid Beheshti University |
| W3193094654 | Planck 2018 results | 2021-08-01 | Radiation Therapy and Dosimetry | 0.9249 | article | Shahid Beheshti University |
| W2908825166 | FCC-ee: The Lepton Collider | 2019-06-01 | Particle Detector Development and Performance | 0.9895 | article | Shahid Beheshti University |
| W2953154703 | FCC Physics Opportunities | 2019-06-01 | Particle Detector Development and Performance | 0.9982 | article | Shahid Beheshti University |
| W3200934665 | FCC-hh: The Hadron Collider | 2019-07-01 | Particle Accelerators and Free-Electron Lasers | 0.9984 | article | Shahid Beheshti University |
| W3200934665 | FCC-hh: The Hadron Collider | 2019-07-01 | Superconducting Materials and Applications | 0.9979 | article | Shahid Beheshti University |
| W3119276753 | Guanine-Based DNA Biosensor Amplified with Pt/SWCNTs Nanocomposite as Analytical Tool for Nanomolar Determination of Daunorubicin as an Anticancer Drug: A Docking/Experimental Investigation | 2021-01-08 | Advanced biosensing and bioanalysis techniques | 1.0000 | article | Shahid Beheshti University |
| W2108404396 | Evidence for a Kaon-Bound State K⁻pp Produced in K⁻ Absorption Reactions at Rest | 2005-06-03 | Nuclear physics research studies | 0.9982 | article | Shahid Beheshti University |
| W3131284135 | A novel detection method for organophosphorus insecticide fenamiphos: Molecularly imprinted electrochemical sensor based on core-shell nanocomposite | 2021-02-25 | Advanced biosensing and bioanalysis techniques | 0.9938 | article | Shahid Beheshti University |
| W2096158223 | Principal components analysis by the galaxy-based search algorithm: a novel metaheuristic for continuous optimisation | 2011-01-01 | Spectroscopy and Chemometric Analyses | 0.9886 | article | Shahid Beheshti University |
| W2027703730 | Coplanar Full Adder in Quantum-Dot Cellular Automata via Clock-Zone-Based Crossover | 2015-03-05 | Quantum and electron transport phenomena | 0.9931 | article | Shahid Beheshti University |
| W3165315707 | Liposomal Nanomedicine: Applications for Drug Delivery in Cancer Therapy | 2021-05-25 | Nanoparticle-Based Drug Delivery | 1.0000 | review | Shahid Beheshti University |
| W2899821123 | Ultrasonic nano-emulsification – A review | 2018-11-09 | Ultrasound and Cavitation Phenomena | 0.9984 | review | Shahid Beheshti University |
| W2899821123 | Ultrasonic nano-emulsification – A review | 2018-11-09 | Electrohydrodynamics and Fluid Dynamics | 0.9955 | review | Shahid Beheshti University |
| W2962966855 | HE-LHC: The High-Energy Large Hadron Collider | 2019-07-01 | Particle Accelerators and Free-Electron Lasers | 0.9982 | article | Shahid Beheshti University |

After creating the new_abbasi_keywords dataset, here is what we have in comparison to the abbasi_keywords:

Overall, OpenAlex has 4516 topics and 252 unique subfields.

We have 35305 unique works in works_393. This has 3129 unique topics and 234 unique subfields.

  • new_abbasi_keywords has 99 unique topics and 36 unique subfields.
    • In subfield only: 15783
    • In topic only: 2193
  • abbasi_keywords has 7 unique topics and 4 unique subfields.
    • In subfield only: 4347
    • In topic only: 147
library(dplyr)

data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

masoud_keywords_v1 <- readRDS(file.path(data_path, "masoud_keywords_v1.rds"))

base_path <- "/standard/nsdpi_storage/people/czj9zj/temp_progress_new"

# university_of_tehran_long <- readRDS(file.path(base_path, "works_batch_403.rds")) %>%
#   filter(type != "dataset", type != "erratum", type != "retraction") %>%
#   select(id, title, publication_date, topics, type) %>%
#   # keep rows where 'topics' exists and has at least one row
#   filter(purrr::map_lgl(topics, ~ !is.null(.x) && nrow(.x) > 0)) %>%
#   mutate(
#     topics = map(topics, ~ filter(.x, type == "topic")),
#     topics = map(topics, ~ select(.x, display_name, score))
#   ) %>%
#   unnest(cols = topics) %>%
#   rename(topic = display_name) %>%
#   mutate(
#     across(where(is.character), ~ gsub("https://openalex.org/", "", .x, fixed = TRUE)),
#     institution = "University of Tehran"
#   )


#saveRDS(university_of_tehran_long, file.path(base_path, "university_of_tehran_long.rds"))

university_of_tehran_long <- readRDS(file.path(base_path, "university_of_tehran_long.rds"))

# Works in Masoud’s topic areas
# Note: despite the sbu_ prefix, these objects hold University of Tehran works
sbu_masoud_works <- university_of_tehran_long %>%
  filter(
    topic %in% masoud_keywords_v1$Topic
  )

# Works NOT in Masoud’s areas
sbu_non_masoud_works <- university_of_tehran_long %>%
  filter(
    !(topic %in% masoud_keywords_v1$Topic)
  )

# Save both datasets as RDS
saveRDS(sbu_masoud_works, file.path(data_path, "sbu_masoud_works.rds"))
saveRDS(sbu_non_masoud_works, file.path(data_path, "sbu_non_masoud_works.rds"))

0.1 Major Foreign Events: Assassinations, Sabotage and Sanctions

Key Events Impacting Iran’s Scientific Output: Assassinations, Sabotage, and Pandemic
| event | category | date | year |
|---|---|---|---|
| Stuxnet Cyberattack | Sabotage | 2010-06-17 | 2010 |
| Natanz Explosion | Sabotage | 2020-07-02 | 2020 |
| Natanz Blackout | Sabotage | 2021-04-11 | 2021 |
| Assassination of Massoud Ali-Mohammadi | Assassination | 2010-01-12 | 2010 |
| Assassination of Mohsen Fakhrizadeh | Assassination | 2020-11-27 | 2020 |
| First COVID-19 Case in Iran | Pandemic | 2020-02-19 | 2020 |
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"

majid_timeline <- prepare_publication_timelines_with_complement(
  sbu_majid_works,
  sbu_non_majid_works,
  author_label = "Majid"
)

plot_author_subfields_with_sabotage(
  timeline_data = majid_timeline,
  author_label = "Majid",
  assassination_date = as.Date("2010-11-29"),
  within_color = "#1f78b4",
  data_path
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 190 rows containing non-finite outside the scale range
## (`stat_smooth()`).

#ggsave("plots/majid_topics97threshold_plot.png", width = 10, height = 6, dpi = 300)
# Prepare Abbasi's timeline data
abbasi_timeline <- prepare_publication_timelines_with_complement(
  sbu_abbasi_works,
  sbu_non_abbasi_works,
  author_label = "Abbasi"
)

# Generate the plot for Abbasi
plot_author_subfields_with_sabotage(
  timeline_data = abbasi_timeline,
  author_label = "Abbasi",
  assassination_date = as.Date("2010-11-29"),  # Abbasi was not killed, but include a relevant date if analyzing a key event
  within_color = "coral",  # Use a distinct color for Abbasi
  data_path
)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 206 rows containing non-finite outside the scale range
## (`stat_smooth()`).

# Save the plot
ggsave("plots/abbasi_topics97threshold_plot.png", width = 10, height = 6, dpi = 300)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 206 rows containing non-finite outside the scale range
## (`stat_smooth()`).
masoud_timeline <- prepare_publication_timelines_with_complement(
  sbu_masoud_works,
  sbu_non_masoud_works,
  author_label = "Masoud"
)

works_monthly_long <- masoud_timeline$monthly_long

works_monthly_long <- works_monthly_long %>%
  mutate(count_type = recode(count_type,
                             "Masoud Monthly Count" = "Within Masoud's Topics",
                             "Complement Monthly Count" = "Outside Masoud's Topics"))

assassination_date <- as.Date("2010-01-12")

# Load events
major_events_filtered <- readRDS(file.path(data_path, "major_events_filtered.rds"))

# Remove Massoud Ali-Mohammadi's assassination
major_events_filtered <- major_events_filtered %>%
  filter(event != "Assassination of Massoud Ali-Mohammadi")

# Add Majid Shahriari's assassination (if needed again)
majid_event <- tibble::tibble(
  event = "Assassination of Majid Shahriari",
  category = "Assassination",
  date = as.Date("2010-11-29"),
  year = 2010
)

major_events_filtered <- bind_rows(major_events_filtered, majid_event)



# Define sabotage event styles
sabotage_events <- major_events_filtered %>%
  filter(category == "Sabotage")
# Packages
library(tidyverse)
library(ggnewscale)

# --- Styles for sabotage events ---------------------------------------------
sabotage_styles <- tibble::tibble(
  event    = c("Stuxnet Cyberattack", "Natanz Explosion", "Natanz Blackout"),
  color    = c("darkorange", "#000088", "darkmagenta"),  # full 6-digit hex for older R versions
  linetype = c("dotted", "dotdash", "dashed")
)

# Merge styles onto your sabotage_events data
# (assumes you already have sabotage_events with a 'date' column)
sabotage_events <- left_join(sabotage_events, sabotage_styles, by = "event")

# Factor for consistent legend order
sabotage_events$event <- factor(sabotage_events$event, levels = sabotage_styles$event)

# --- Plot --------------------------------------------------------------------
# assumes:
#   works_monthly_long with columns: year_month (Date), count (numeric), count_type (factor/chr)
#   assassination_date is a Date

p <- ggplot(works_monthly_long, aes(x = year_month, y = count)) +
  # Publication trends (two series) -------------------------------------------
  geom_smooth(aes(color = count_type),
              method = "loess", span = 0.25, se = FALSE, linewidth = 0.9) +
  scale_color_manual(
    values = c(
      "Within Masoud's Topics"  = "darkolivegreen",
      "Outside Masoud's Topics" = "#333333"
    ),
    name = "Publication Trend"
  ) +

  # Assassination (solid red) -------------------------------------------------
  geom_vline(xintercept = assassination_date, color = "#e31a1c",
             linetype = "solid", linewidth = 1) +

  # Reset color scale for sabotage events ------------------------------------
  new_scale_color() +

  # Sabotage vertical lines (color + linetype mapped to event)
  geom_vline(
    data = sabotage_events,
    aes(xintercept = date, color = event, linetype = event),
    linewidth = 1
  ) +
  scale_color_manual(
    values = setNames(sabotage_styles$color, sabotage_styles$event),
    name = "Sabotage Event"
  ) +
  scale_linetype_manual(
    values = setNames(sabotage_styles$linetype, sabotage_styles$event),
    name  = "Sabotage Event"
  ) +

  # Facet & labels ------------------------------------------------------------
  facet_wrap(~ count_type, scales = "free_y", ncol = 1) +
  labs(
    title = "Publication Output Within and Outside Massoud’s Topics at University of Tehran",
    subtitle = "Solid red = Assassination of Massoud Ali-Mohammadi - January 12, 2010\nColored dashed lines = Sabotage Events",
    x = "Publication Date",
    y = "Monthly Publication Count"
  ) +
  scale_x_date(
    date_breaks = "2 years",
    date_labels = "%Y",
    limits = c(as.Date("2000-01-01"), NA)
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x   = element_text(angle = 60, hjust = 1),
    legend.box    = "vertical",
    legend.position = "bottom"
  )

# Print to screen
print(p)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 423 rows containing non-finite outside the scale range
## (`new_stat_smooth()`).

# Save the plot
ggsave("plots/masoud_subfield_topics_plot.png", p, width = 10, height = 6, dpi = 300)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 423 rows containing non-finite outside the scale range
## (`new_stat_smooth()`).
path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data"
works_tagged <- readRDS(file.path(path, "works_tagged.rds"))


library(tidyverse)
library(lubridate)
library(scales)

# 1. Function to prepare timeline data
prepare_combined_author_timelines <- function(works_tagged) {
  works_tagged %>%
    mutate(subfield_category = factor(
      subfield_category,
      levels = c("Majid Only", "Abbasi Only", "Both", "Neither")
    )) %>%
    group_by(monthly, subfield_category) %>%
    summarise(count = n(), .groups = "drop") %>%
    rename(
      year_month = monthly,
      count_type = subfield_category
    ) -> monthly_long

  list(monthly_long = monthly_long)
}

# 2. Prepare data
timeline_data <- prepare_combined_author_timelines(works_tagged)

# 3. Load events
major_events_filtered <- readRDS(file.path(data_path, "major_events_filtered.rds")) %>%
  filter(event != "Assassination of Massoud Ali-Mohammadi")

# Add Majid Shahriari’s assassination attempt (2010-11-29)
assassination_date <- as.Date("2010-11-29")

# 4. Define sabotage styling
sabotage_styles <- tibble::tibble(
  event = c("Stuxnet Cyberattack", "Natanz Explosion", "Natanz Blackout"),
  color = c("darkorange", "#000088", "darkmagenta"),  # full 6-digit hex for older R versions
  linetype = c("dotted", "dotdash", "dashed")
)

# Join to sabotage events
sabotage_events <- major_events_filtered %>%
  filter(category == "Sabotage") %>%
  left_join(sabotage_styles, by = "event") %>%
  mutate(event = factor(event, levels = sabotage_styles$event))

# 5. Colors for publication categories
pub_colors <- c(
  "Majid Only" = "#1f78b4",
  "Abbasi Only" = "coral",
  "Both" = "#8f7c82",
  "Neither" = "gray4",
  setNames(sabotage_styles$color, sabotage_styles$event)
)

# 6. Plot
p <- ggplot(timeline_data$monthly_long, aes(x = year_month, y = count, color = count_type)) +
  geom_smooth(method = "loess", span = 0.25, se = TRUE, linewidth = 1) +

  # Assassination attempt
  geom_vline(xintercept = assassination_date, color = "#e31a1c", linetype = "solid", linewidth = 1) +

  # Sabotage
  geom_vline(data = sabotage_events,
             aes(xintercept = date, color = event, linetype = event),
             linewidth = 1, show.legend = TRUE) +

  facet_wrap(~count_type, scales = "free_y", ncol = 1) +

  labs(
    title = "Publication Output at Shahid Beheshti University",
    subtitle = "Solid red line = Assassination attempt on Majid Shahriari (killed) and Fereydoon Abbasi (survived) – Nov 29, 2010\nColored dashed lines = Sabotage Events",
    x = "Publication Date",
    y = "Monthly Publication Count",
    color = "Publication Trend",
    linetype = "Sabotage Event"
  ) +

  scale_color_manual(
    values = pub_colors,
    breaks = c("Majid Only", "Abbasi Only", "Both", "Neither"),
    guide = guide_legend(order = 1)
  ) +

  scale_linetype_manual(
    values = setNames(sabotage_styles$linetype, sabotage_styles$event),
    guide = guide_legend(order = 2, override.aes = list(
      color = sabotage_styles$color
    ))
  ) +
  scale_x_date(
    date_breaks = "6 months",
    date_labels = "%b %Y",
    limits = c(as.Date("2009-12-01"), as.Date("2011-12-01"))
  ) +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 60, hjust = 1),
    legend.position = "bottom",
    legend.box = "vertical"
  )

p
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 930 rows containing non-finite outside the scale range
## (`stat_smooth()`).

# 7. Save
ggsave("plots/combined_subfield_timelines.png", plot = p, width = 10, height = 6, dpi = 300)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 930 rows containing non-finite outside the scale range
## (`stat_smooth()`).
# Also return it to view in console
library(tidyverse)
library(lubridate)

# Aggregate publication count per month
all_university_monthly <- works_tagged %>%
  group_by(monthly = floor_date(publication_date, "month")) %>%
  summarise(publication_count = n(), .groups = "drop")

# Plot
ggplot(all_university_monthly, aes(x = monthly, y = publication_count)) +
  #geom_line(color = "#333366", size = 1) +
  geom_smooth(method = "loess", span = 0.25, se = TRUE, color = "black", linewidth = 0.8) +
  labs(
    title = "Publication Output at Shahid Beheshti University",
    x = "Publication Date",
    y = "Monthly Publication Count",
    caption = "Note: Based on all available works in OpenAlex"
  ) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 60, hjust = 1),
    plot.title = element_text(face = "bold")
  )
## `geom_smooth()` using formula = 'y ~ x'

Another important limitation relates to the accuracy of the publication dates provided by OpenAlex. Because many publication records have missing or incomplete date information, OpenAlex assigns either the earliest known electronic release date or the first day of the month. This produces artificial spikes in publication counts on the 1st of each month that do not reflect actual publication behavior. OpenAlex does not distinguish between works genuinely published on the first of the month and those assigned that date by default, so the two cases cannot be separated. To mitigate this distortion, we aggregated the data at the monthly level and applied LOESS smoothing during visualization, which reduces the influence of day-level anomalies and gives a clearer picture of broader publication trends. While these steps lessen the impact of misdated entries, we recommend that future research incorporate additional metadata sources (such as Crossref XML or legacy MAG data) when finer temporal resolution is required.
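One rough way to gauge how strongly this default dating affects a dataset is to compare the share of works dated on the 1st of a month with the roughly 1/30 expected if dates were spread evenly. The sketch below uses a small made-up vector of dates (not project data), and `first_of_month_share()` is a hypothetical helper, not part of the existing scripts.

```r
# Hypothetical helper: share of non-missing dates that fall on the 1st of a month.
first_of_month_share <- function(dates) {
  dates <- as.Date(dates)
  dates <- dates[!is.na(dates)]
  mean(format(dates, "%d") == "01")
}

# Toy dates for illustration only (not taken from OpenAlex).
toy_dates <- as.Date(c("2010-01-01", "2010-01-01", "2010-01-17",
                       "2010-02-01", "2010-02-09", "2010-03-01"))

# Share of records dated on the 1st of a month.
toy_share <- first_of_month_share(toy_dates)

# Collapsing to months removes the day-level spike by construction.
toy_monthly <- table(format(toy_dates, "%Y-%m"))
```

On the real data, the same helper could be applied to a date column such as `works_tagged$publication_date`. A share far above 1/30 would suggest that day-level analysis is unreliable and that monthly aggregation is the safer resolution.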

library(tidyverse)
library(scales)
library(lubridate)

# Identify top 5 subfields by total publication count
ut_top_5_subfields <- works_403 %>%
  filter(!is.na(top_subfield), !is.na(publication_year)) %>%
  count(top_subfield, name = "total") %>%
  slice_max(order_by = total, n = 5) %>%
  pull(top_subfield)

# Ensure publication dates are Dates, then derive the publication month
ut_works <- works_403 %>%
  mutate(pub_month = floor_date(as.Date(publication_date), "month"))

# Filter after year 2000 only
monthly_counts_ut <- ut_works %>%
  filter(top_subfield %in% ut_top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
  count(pub_month, top_subfield)

# Normalize to each subfield's first available month (after 2000)
monthly_growth_ut <- monthly_counts_ut %>%
  group_by(top_subfield) %>%
  arrange(pub_month) %>%
  mutate(
    base_count = first(n),
    growth_index = (n / base_count) * 100
  ) %>%
  ungroup()
library(dplyr)
library(ggplot2)
library(scales)

# Step 1: Compute top 5 subfields by average growth for SBU
top_5_sbu <- monthly_growth %>%
  group_by(top_subfield) %>%
  summarise(avg_index = mean(growth_index, na.rm = TRUE)) %>%
  arrange(desc(avg_index)) %>%
  slice_head(n = 5) %>%
  pull(top_subfield)

# Step 2: Compute top 5 subfields by publication count for UT
# (ut_works is works_403 with the pub_month column added earlier)
top_5_ut <- ut_works %>%
  filter(!is.na(top_subfield), pub_month >= as.Date("2000-01-01")) %>%
  count(top_subfield, sort = TRUE) %>%
  slice_head(n = 5) %>%
  pull(top_subfield)

# Step 3: Combine both top 5 lists
top_5_combined <- union(top_5_sbu, top_5_ut)

# Step 4: Prepare SBU data
monthly_growth_sbu <- monthly_growth %>%
  filter(top_subfield %in% top_5_combined) %>%
  mutate(university = "Shahid Beheshti University")

# Step 5: Prepare UT data (ut_works carries the pub_month column)
monthly_counts_ut <- ut_works %>%
  filter(top_subfield %in% top_5_combined, pub_month >= as.Date("2000-01-01")) %>%
  count(pub_month, top_subfield)

monthly_growth_ut <- monthly_counts_ut %>%
  group_by(top_subfield) %>%
  arrange(pub_month) %>%
  mutate(
    base_count = first(n),
    growth_index = (n / base_count) * 100
  ) %>%
  ungroup() %>%
  mutate(university = "University of Tehran")

# Step 6: Combine both datasets
combined_growth <- bind_rows(monthly_growth_sbu, monthly_growth_ut)

# Step 7: Define consistent colors
shared_colors <- c(
  "Biomedical Engineering" = "#102542",   # Dark navy
  "Electrical and Electronic Engineering" = "#A84400",  # Burnt orange
  "Molecular Biology" = "#8B0000",        # Dark red
  "Materials Chemistry" = "#4B0082"       # Dark violet
)

sbu_only_colors <- c(
  "Organic Chemistry" = "darkcyan"         # Dark cyan
)

ut_only_colors <- c(
  "Mechanical Engineering" = "darkgreen"    # Dark green
)

dark_colors_combined <- c(shared_colors, sbu_only_colors, ut_only_colors)

# Step 8: Plot
ggplot(combined_growth, aes(x = pub_month, y = growth_index, color = top_subfield)) +
  geom_line(linewidth = 1, alpha = 0.8) +
  facet_wrap(~university) +
  scale_color_manual(values = dark_colors_combined) +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y", limits = c(as.Date("2000-01-01"), NA)) +
  labs(
    title = "Comparison of Publication Growth in Top Subfields",
    subtitle = "Indexed to each subfield's first recorded month after 2000 (Index = 100%)",
    x = "Publication Date",
    y = "Relative Publication Index (%)",
    color = "Subfield"
  ) +
  theme_minimal(base_size = 12)

# Save the plot
ggsave("plots/ut_sbu_monthly_subfield_growth.png", width = 10, height = 6, dpi = 300)
# Date of Majid Shahriari's assassination
assassination_date <- as.Date("2010-11-29")
#assassination_date <- as.Date("2010-01-12")


# Average growth index before vs. after the assassination date.
# pub_month is the first day of each month, so November 2010 falls in "Before".
avg_growth_before <- combined_growth %>%
  filter(pub_month < assassination_date) %>%
  group_by(university, top_subfield) %>%
  summarise(avg_growth_index = mean(growth_index, na.rm = TRUE), .groups = "drop") %>%
  mutate(period = paste("Before", assassination_date))

avg_growth_after <- combined_growth %>%
  filter(pub_month >= assassination_date) %>%
  group_by(university, top_subfield) %>%
  summarise(avg_growth_index = mean(growth_index, na.rm = TRUE), .groups = "drop") %>%
  mutate(period = paste("After", assassination_date))

growth_comparison <- bind_rows(avg_growth_before, avg_growth_after) %>%
  arrange(university, top_subfield, period) %>%
  mutate(avg_growth_index = round(avg_growth_index, 1))

To identify the top five subfields at each institution, we began by filtering the publication dataset by university—Shahid Beheshti University and the University of Tehran. For each institution, we grouped publications by subfield and by month, and then computed the number of publications per subfield per month. To assess growth over time, we calculated a relative growth index for each subfield by normalizing its monthly publication count to the number of publications in its first recorded month after the year 2000. Specifically, the growth index was defined as:

\[ \text{Growth Index} = \frac{\text{Monthly Publications}}{\text{Publications in First Month}} \times 100 \]

This measure reflects how much each subfield’s output increased relative to its own baseline. For Shahid Beheshti University, we averaged the growth index over time for each subfield and selected the five subfields with the highest mean growth index; for the University of Tehran, the current script instead ranks subfields by total publication count since 2000 (Step 2 above).

The two universities share four of their top five subfields: Biomedical Engineering, Electrical and Electronic Engineering, Materials Chemistry, and Molecular Biology. The fifth subfield differs across institutions: Organic Chemistry is prominent at Shahid Beheshti, whereas Mechanical Engineering stands out at the University of Tehran. Among these, Electrical and Electronic Engineering saw the most pronounced growth at Shahid Beheshti University, while Mechanical Engineering experienced the fastest expansion at the University of Tehran. These patterns indicate both shared and institution-specific research priorities since the early 2000s.
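As a worked example of the growth index defined above, the sketch below applies the formula to made-up counts for two placeholder subfields, "A" and "B" (toy numbers, not project data). Each subfield is indexed to its own first recorded month, so both start at 100.

```r
library(dplyr)

# Toy monthly counts for two placeholder subfields (illustrative values only).
toy_counts <- tibble(
  top_subfield = c("A", "A", "A", "B", "B"),
  pub_month = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01",
                        "2000-02-01", "2000-03-01")),
  n = c(4, 6, 10, 2, 3)
)

# Index each month to the subfield's first recorded month (= 100).
toy_growth <- toy_counts %>%
  arrange(top_subfield, pub_month) %>%
  group_by(top_subfield) %>%
  mutate(growth_index = n / first(n) * 100) %>%
  ungroup()

toy_growth$growth_index
```

Subfield "A" goes from 4 to 10 publications, giving an index of 250, while "B" goes from 2 to 3, giving 150. Because the baseline is a single month, a subfield that happens to start with a very small count will show large index values, which is worth keeping in mind when comparing averages across subfields.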

Get all the author names from Iranian institutions

## Requesting url: https://api.openalex.org/institutions?filter=country_code%3AIR
## Getting 3 pages of results with a total of 403 records...

Format for DiD


(b) If the dataset is too small, then use all Iranian papers associated with those subfields and topics.

The script where I coded this is called filter_merge.R.

Take 2

  • Take the subject vector and pull all papers associated with the identified keywords and subtopics.

  • Pull all author IDs associated with those papers.

  • Look at the number of people working in these areas over time.
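The three steps above can be sketched on a toy table as follows. The column names (`keywords`, `author_ids`) and all values are hypothetical stand-ins; the real OpenAlex works tables nest keywords and authors differently, so the accessors would need adapting.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the works table; column names and values are hypothetical.
toy_works <- tibble(
  work_id = c("W1", "W2", "W3", "W4"),
  publication_year = c(2008, 2008, 2009, 2009),
  keywords = list(c("neutron transport"),
                  c("reactor physics", "dosimetry"),
                  c("polymer chemistry"),
                  c("neutron transport", "dosimetry")),
  author_ids = list(c("A1", "A2"), c("A2", "A3"), c("A9"), c("A1"))
)

subject_vector <- c("neutron transport", "reactor physics")

# Step 1: keep works with at least one keyword in the subject vector.
matched_works <- toy_works %>%
  filter(vapply(keywords, function(k) any(k %in% subject_vector), logical(1)))

# Steps 2-3: one row per work-author pair, then distinct authors per year.
authors_per_year <- matched_works %>%
  select(work_id, publication_year, author_ids) %>%
  unnest(author_ids) %>%
  group_by(publication_year) %>%
  summarise(n_authors = n_distinct(author_ids), .groups = "drop")

authors_per_year
```

Counting distinct author IDs rather than rows avoids double-counting someone who appears on several matched papers in the same year.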

If we need more data:

  • Run a clustering analysis of topics/keywords/subtopics to expand the aperture.

  • Run the same snowball sample outlined above for all co-authors of the assassinated scholars.

The above addresses the narrowest interpretation of our question: did assassination [and probably other co-occurring sabotage] campaigns reduce scientific output in the university and the research areas associated with the targeted scholars?

And Luz, let us know as you get through the “first pass” checklist. I think there’s some clever way to use dplyr to search and collect those nested keyword/subtopic lists without having to permanently expand out the dataset. I don’t know it off the top of my head, but I can look for advice if you get stumped.
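One possible approach to searching the nested lists in place is to test each nested element inside `filter()` and keep the list-column as it is. This is a minimal sketch on toy data: the layout (a `topics` list-column holding one data frame per work, with a `display_name` column) is an assumption about the data, and `has_any_topic()` is a hypothetical helper.

```r
library(dplyr)

# Toy works table; `topics` holds one small data frame per work (assumed layout).
toy_works <- tibble(
  work_id = c("W1", "W2", "W3"),
  topics = list(
    data.frame(display_name = c("Nuclear Physics", "Radiation Detection")),
    data.frame(display_name = "Polymer Chemistry"),
    data.frame(display_name = character(0))
  )
)

target_topics <- c("nuclear physics", "neutron transport")

# Hypothetical helper: TRUE where any nested display_name matches a target
# (case-insensitive). Works with no topics return FALSE.
has_any_topic <- function(nested, targets) {
  vapply(nested,
         function(df) any(tolower(df$display_name) %in% tolower(targets)),
         logical(1))
}

# The dataset stays one row per work; `topics` is never unnested.
matched <- toy_works %>%
  filter(has_any_topic(topics, target_topics))

matched$work_id
```

Since nothing is unnested, the result keeps one row per work and the nested column remains available for later steps.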

Iranian research notes from today’s check-in meeting (7/24/2025) / things I’ll look into:

  1. Create some plots with the Pre/Post Comparison of Work Output: That is, create timelines or counts by year to visualize trends, mainly of pre/post volume of works being published in areas/keywords of interest for various universities, but also look at trends of topics and subfields pre vs. post. So, is there a decrease in publication in specific areas of interest? Share in the chat.

  2. Co-authorship over time: Track how scholars’ direct and indirect collaborators evolve over time. Take Majid Shahriari as an example. He became active in neutron transport research in the early 2000s and began contributing to Iran’s nuclear science community. Between 2005 and 2010, he held a key position at Shahid Beheshti University and worked with the Atomic Energy Organization of Iran. By 2009–2010, he was identified as a top nuclear scientist, reportedly worked on strategic applications of nuclear chain reactions, and gained international surveillance attention. As Shahriari gained more power and strategic responsibility within Iran’s nuclear establishment (particularly in sensitive or classified domains), his publication count would likely decrease, but this would not necessarily mean he was less active. Create a list of the authors Shahriari collaborated with prior to his assassination to identify potential trends or patterns in his research network.

  3. Structure the missing work: For missing or unindexed works, cross-check works in OpenAlex vs. known bibliography and flag known missing papers for deeper investigation.
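For item 3, one simple way to cross-check is to match on DOI after normalizing both sides, since OpenAlex reports DOIs as full `https://doi.org/...` URLs while a hand-compiled bibliography often stores the bare form. The sketch below uses made-up DOIs and titles; `normalize_doi()` and the `known_bibliography` table are hypothetical.

```r
# Hypothetical helper: reduce a DOI to a bare lowercase form for matching.
normalize_doi <- function(doi) {
  doi <- tolower(trimws(doi))
  sub("^https?://(dx\\.)?doi\\.org/", "", doi)
}

# Toy inputs (made-up DOIs, not real records).
openalex_dois <- c("https://doi.org/10.1000/abc123",
                   "https://doi.org/10.1000/XYZ789")

known_bibliography <- data.frame(
  title = c("Paper A", "Paper B", "Paper C"),
  doi = c("10.1000/ABC123", "10.1000/xyz789", "10.1000/missing1"),
  stringsAsFactors = FALSE
)

# Flag bibliography entries whose DOI does not appear in the OpenAlex pull.
known_bibliography$in_openalex <-
  normalize_doi(known_bibliography$doi) %in% normalize_doi(openalex_dois)

flagged_missing <- known_bibliography[!known_bibliography$in_openalex, ]
flagged_missing$title
```

Works without a DOI would need a fallback, for example matching on a cleaned title plus publication year, and those matches should probably be reviewed by hand.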

Talk with Damian 8/4/2025:

  • Subfields/topics that the targeted scholar had nothing to do with should not be affected by his death, so they can serve as a comparison group.

  • But also, if we look at one of his subfields/topics (say, nuclear engineering), the same subfield/topic at another university could serve as a comparison (we hold the exact subfield/topic constant and vary the university).

  • Subfield level: aggregate up to subfield/month, then give each subfield a treatment indicator (1 for treatment, 0 for control) and a time indicator (post = 1 for every month after Majid was assassinated and 0 for everything before then). The month of November should be zero.

  • Create the same for Abbasi!
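The panel format described above could look like the following sketch. The subfield names, the treated set, and the constant placeholder counts are illustrative assumptions; only the indicator logic follows the notes (treated = 1 for affected subfields, post = 1 for months after November 2010, with November itself set to 0).

```r
library(dplyr)

# Toy subfield-by-month panel; counts are placeholders, not real data.
toy_panel <- expand.grid(
  top_subfield = c("Nuclear Engineering", "Organic Chemistry"),
  pub_month = seq(as.Date("2010-09-01"), as.Date("2011-02-01"), by = "month"),
  stringsAsFactors = FALSE
) %>%
  mutate(publication_count = 10)

treated_subfields <- c("Nuclear Engineering")  # assumed treated set
event_month <- as.Date("2010-11-01")           # month of the 2010-11-29 assassination

did_panel <- toy_panel %>%
  mutate(
    treat = as.integer(top_subfield %in% treated_subfields),
    post = as.integer(pub_month > event_month),  # November itself stays 0
    treat_post = treat * post                    # DiD interaction term
  )

did_panel %>% filter(pub_month >= as.Date("2010-11-01"), treat == 1)
```

With real counts in place of the placeholder, a typical next step would be a regression along the lines of `lm(publication_count ~ treat * post, data = did_panel)`, where the interaction coefficient is the difference-in-differences estimate. The same construction can be repeated for Abbasi by swapping in his subfields.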