Iran Citation - Progress Report
Introduction
Current Research questions
The purpose of this report is to give a continuous update on our current work on the Iranian citation analysis.
How do different sanctions regimes affect collaborations with different parties?
Current Progress
API Access Expanded
The OpenAlex API has a default limit of 100,000 calls per day (max 10 per second). I contacted their team and received an increased quota of 1 million calls per day and 100 per second.
If you plan to replicate or extend this work, I strongly recommend requesting the same: OpenAlex API Rate Limits & Authentication
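Whatever quota applies, requests still have to be paced on the client side. The sketch below is one minimal way to do that in base R. `throttled_map()` and `fake_request()` are hypothetical names introduced here for illustration; they are not part of any OpenAlex client, and the real request function used in this project may look different.

```r
# Hypothetical helper: call `f` on each element of `xs`, never starting
# calls faster than `per_second` per second.
throttled_map <- function(xs, f, per_second = 10) {
  min_gap <- 1 / per_second            # minimum seconds between call starts
  last_start <- -Inf
  out <- vector("list", length(xs))
  for (i in seq_along(xs)) {
    wait <- min_gap - (as.numeric(Sys.time()) - last_start)
    if (wait > 0) Sys.sleep(wait)
    last_start <- as.numeric(Sys.time())
    out[[i]] <- f(xs[[i]])
  }
  out
}

# Stand-in for a real request function (assumption: it takes one ID).
fake_request <- function(id) paste0("fetched:", id)

started <- Sys.time()
res <- throttled_map(c("W1", "W2", "W3"), fake_request, per_second = 20)
elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
```

With the expanded quota, `per_second` could be raised toward 100, leaving some headroom below the limit.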
Data Collection – Iranian Institutions
We have successfully collected works from all 403 Iranian
institutions. These datasets are stored at:
nsdpi-storage > people > czj9zj > temp_progress_final/
- Files are named works_batch_1.rds (fewest works) to works_batch_403.rds (most works).
- Key columns included: top_field, top_subfield, etc.
- Data collection required careful error handling due to API limitations.
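The error handling mentioned above can be sketched as a retry loop with exponential backoff. This is an illustration under assumptions, not the project's actual code: `with_retry()` and `flaky()` are hypothetical, and the failure is simulated rather than coming from a real API call.

```r
# Hypothetical wrapper: retry `f()` up to `max_tries` times, doubling the
# pause after each failure. `f` stands in for one API request.
with_retry <- function(f, max_tries = 4, base_wait = 0.01) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_tries) Sys.sleep(base_wait * 2^(attempt - 1))
  }
  stop("all ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Simulated flaky request: fails twice, then succeeds (toy behaviour only).
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("HTTP 429 (simulated)")
  "batch downloaded"
}

out <- with_retry(flaky)
```

In a real run, `base_wait` would be closer to a second or more, and each successful batch would be written to disk before the next request so that a crash does not lose progress.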
Current Limitations
- Not yet filtered by strategic research fields.
- Some works may have incorrect institution name mappings; this was fixed in pullIran_works_final.R.
Current Scripts
- get_authors.R
This script downloads all authors from Iran who have published works.
- Updated script to pull all the works: pullIran_works_final.R
This script re-downloads works with the correct institution name. It generates a folder temp_progress_final/ with the most up-to-date results. This is still in progress and needs to be completed.
- Analysis script: Iran_analysis.R
This script is now the main tool for analyzing Iranian works.
Next Steps
Deeper Exploratory Data Analysis
- Current EDA needs refinement to better understand key patterns.
- I reviewed this resource (but couldn’t yet replicate plots):
R Journal 2023 – Mapping Scientific Production
Pipeline Optimization & Reproducibility
- The full workflow will be organized into a clean GitHub repo.
- Harmonizing all pieces into one reproducible research package is a priority.
- To do this, we need all the data.
Change Point Analysis
- We aim to test whether key events (e.g., assassinations, sanctions) led to significant disruptions.
- The gam package will be used to detect changes in trend (structural breaks).
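To make the idea of a structural break concrete, here is a minimal sketch that deliberately uses base R `lm()` instead of the gam package, so it tests only for a level shift around a known event year, not for smooth changes in trend. The data are simulated, and the event year 2011 is a placeholder, not a result from this project.

```r
set.seed(1)
# Simulated yearly publication counts (toy data, not from the project):
# a steady upward trend with a level drop starting in 2011.
year  <- 2001:2020
post  <- as.integer(year >= 2011)            # 1 from the hypothetical event on
works <- 10 + 2 * (year - 2000) - 15 * post + rnorm(length(year), sd = 1)

m_trend <- lm(works ~ year)                  # trend only, no break
m_break <- lm(works ~ year + post)           # trend plus level shift

break_size <- unname(coef(m_break)["post"])  # estimated jump at the event
p_value    <- anova(m_trend, m_break)[["Pr(>F)"]][2]  # F-test for the break
```

A small `p_value` says the model with a break fits better than the plain trend. It does not by itself show that the event caused the break.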
Insights from D&PI BI weekly meeting (7/22/2025)
- Mapping Science – (Uttan Rao)
Key takeaways for OpenAlex limitations:
- No Advisor–Advisee Detection
- Incomplete Affiliations
- English Language Bias
- Author Name Disambiguation Problems
- Sparse or Missing Funding Info
These are critical to interpreting patterns accurately.
Research Questions
These are aligned with the NSDPI summer research objectives:
Key Question: How are the military-affiliated PRC research institutions working with Iran and Russia on the emerging tech NSDPI is examining?
- What research fields are largest/strongest?
- Are there correlations between certain countries and whom researchers are citing?
- What are the values of the papers being cited? (Need to develop and refine measures)
- Are new PRC policies driving certain research fields? What tools and methods can be used to better inform policy makers?
- For a broader context/later work, compare this with what US/West is researching.
⚠️ Note: These questions require extensive comparative data beyond Iran. This is just a broader approach to the citation analysis.
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"
majid_works_long <- readRDS(file.path(data_path, "majid_works_long.rds"))
# Read Abbasi's long works data
abbasi_works_long <- readRDS(file.path(data_path, "abbasi_works_long.rds"))
# Read Masoud's long works data
masoud_works_long <- readRDS(file.path(data_path, "masoud_works_long.rds"))
# Save each object as an .RDS file
#saveRDS(majid_works, file = file.path(data_path, "majid_works.rds"))
#saveRDS(abbasi_works, file = file.path(data_path, "abbasi_works.rds"))
#saveRDS(masoud_works, file = file.path(data_path, "masoud_works.rds"))
majid_works <- readRDS(file.path(data_path, "majid_works.rds"))
abbasi_works <- readRDS(file.path(data_path, "abbasi_works.rds"))
masoud_works <- readRDS(file.path(data_path, "masoud_works.rds"))
Take 1: Identify the right subset of Iranian publication data
Steps from Iran meeting 7/23/25 - Margaret’s Notes:
- Basic Strategy: snowball sample of topics and, if needed, coauthors
Missing:
- Darioush Rezaeinejad
- Mostafa Ahmadi Roshan
- Mohsen Fakhrizadeh
- Seyyed Amir Hossein Feghhi
- Akbar Motalebizadeh
- Mohammad Mehdi Tehranchi
- Saeed Borji
- Mansour Asgari
- Ahmadreza Zolfaghari Daryani
- Ali Bakhouei Katirimi
- Abdolhamid Minouchehr
- Isar Tabatabai-Qamsheh
- Mohammad Reza Sedighi Saber
Filtering for Pre-2025 Assassinations
To focus our analysis on a meaningful timeline, we filter the dataset to include only assassinations that occurred prior to 2025. This provides a clearer contrast between earlier targeted killings and those that occurred during the significant escalation in 2025.
| Date | Victim | Expertise | Method | Location | Event | Role | institution_name | author_id | alternate_names | current_institution | past_institutions | h_index | i10_index | works_count | citations_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007-01-15 | Ardeshir Hosseinpour | Authority on electromagnetism | By gas or possibly radiation poisoning | Shiraz | Died | Professor | Shiraz University | a5048006655 | A. Hosseinpour, Ardeshir Hosseinpour | Shiraz University | Shiraz University, Malek Ashtar University of Technology | 3 | 1 | 4 | 37 |
| 2010-01-12 | Masoud Ali-Mohammadi | Quantum field theory and elementary particle physics | By a remote-control bomb attached to a motorcycle | Tehran | Assassination | Professor | University of Tehran | A5111436477 | Masoud Alimohammadi | Ilam University | Ilam University, University of Tehran, Institute for Research in Fundamental Sciences, Institute for Cognitive Science Studies | 4 | 4 | 9 | 90 |
| 2010-11-29 | Majid Shahriari | Specialized in neutron transport | By a bomb attached to his car from a motorcycle | Tehran | Assassination | Nuclear Engineer | Shahid Beheshti University | A5112039075 | Majid Shahriari, M H Shahriari | Shahid Beheshti University | Shahid Beheshti University, Amirkabir University of Technology | 6 | 1 | 13 | 80 |
| 2010-11-29 | Fereydoon Abbasi | Nuclear physicist and administrator | By a bomb attached to his car from a motorcycle | Tehran | Survived | Professor | Shahid Beheshti University | A5103401909 | Fereydoon Abbasi Davani | Shahid Beheshti University | Shahid Beheshti University | 2 | NA | 13 | 6 |
| 2011-07-23 | Darioush Rezaeinejad | Expert in neutron transport | Shot by motorcycle gunmen | Tehran | Assassination | Physicist | Shahid Beheshti University | NA | NA | NA | NA | NA | NA | NA | NA |
| 2012-01-11 | Mostafa Ahmadi Roshan | Researching polymeric membranes for gaseous diffusion | By a bomb attached to his car from a motorcycle | Tehran | Assassination | Professor | Sharif University of Technology | NA | NA | NA | NA | NA | NA | NA | NA |
| 2020-11-27 | Mohsen Fakhrizadeh | Nuclear physicist and head of Iran’s nuclear program | Shot by a remote-control machine gun | Damavand | Assassination | Professor | Imam Hossein University | NA | NA | NA | NA | NA | NA | NA | NA |
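The date filter that produces the table above can be sketched as follows. Three rows are copied from the table; the fourth is a hypothetical 2025 placeholder added only to show what the filter drops. The data frame name `attacks` is an assumption, not the object used in the project scripts.

```r
# Three rows from the table above, plus one placeholder 2025 row.
attacks <- data.frame(
  Date   = as.Date(c("2010-01-12", "2010-11-29", "2020-11-27", "2025-06-13")),
  Victim = c("Masoud Ali-Mohammadi", "Majid Shahriari",
             "Mohsen Fakhrizadeh", "(hypothetical 2025 entry)"),
  stringsAsFactors = FALSE
)

# Keep only events strictly before 1 January 2025.
pre_2025 <- attacks[attacks$Date < as.Date("2025-01-01"), ]
```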
This leaves 7 researchers of interest. From this list, we could consider removing Ardeshir Hosseinpour (15 January 2007), as there are “conflicting reports on the cause of Hosseinpour’s death”, but we decided to keep him for these purposes because he seems to have been a very prominent figure (read below). We do exclude Fereydoon Abbasi from this subset. Although he was targeted in the 29 November 2010 attack that killed Majid Shahriari, Abbasi survived that assassination attempt. He was later killed on 13 June 2025 during the American-Israeli strikes that targeted Iranian scientists, military officials, and civilians (AP, 2025), so we will keep an eye on him.
These 4 researchers were prominent in Iran’s nuclear program:
1. Ardeshir Hosseinpour, 45: “Hosseinpour was a nuclear physics scientist and a lecturer at Shiraz University and the Malek Ashtar University of Technology in Isfahan. An expert in the field of electromagnetism, he was one of the founders of the ‘Nuclear Technology Center of Isfahan,’ the genesis of Natanz nuclear facility where he continued his research until his mysterious death on January 15, 2007.”
2. Masoud Ali Mohammadi, 50: “Mohammadi was a nuclear scientist and a PhD graduate student of physics from the Sharif University in Tehran. He had over 50 published papers and articles in academic journals and was reportedly named one of the key scientists in the advancements related to particle accelerator machines and atom smashers.”
3. Majid Shahriari, 45, was regarded as a key figure in the advancement of uranium enrichment technologies at Iran’s Atomic Energy Organization. He was assassinated on 29 November 2010 by a magnetic bomb attached to his car by assailants on a motorcycle, while driving on the Artesh highway in Tehran.
Source for the 3 above: (VOA News).
4. Fereydoon Abbasi-Davani, 66, a professor of nuclear physics at Shahid Beheshti University, was reportedly a member of the Islamic Revolutionary Guard Corps (IRGC) since the 1979 Islamic Revolution (NYTimes, 2011).
Therefore, for the initial analysis, we will focus on these 3 scientists:
| Date | Victim | Expertise | Method | Location | Event | Role | institution_name | author_id | alternate_names | current_institution | past_institutions | h_index | i10_index | works_count | citations_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007-01-15 | Ardeshir Hosseinpour | Authority on electromagnetism | By gas or possibly radiation poisoning | Shiraz | Died | Professor | Shiraz University | a5048006655 | A. Hosseinpour, Ardeshir Hosseinpour | Shiraz University | Shiraz University, Malek Ashtar University of Technology | 3 | 1 | 4 | 37 |
| 2010-01-12 | Masoud Ali-Mohammadi | Quantum field theory and elementary particle physics | By a remote-control bomb attached to a motorcycle | Tehran | Assassination | Professor | University of Tehran | A5111436477 | Masoud Alimohammadi | Ilam University | Ilam University, University of Tehran, Institute for Research in Fundamental Sciences, Institute for Cognitive Science Studies | 4 | 4 | 9 | 90 |
| 2010-11-29 | Majid Shahriari | Specialized in neutron transport | By a bomb attached to his car from a motorcycle | Tehran | Assassination | Nuclear Engineer | Shahid Beheshti University | A5112039075 | Majid Shahriari, M H Shahriari | Shahid Beheshti University | Shahid Beheshti University, Amirkabir University of Technology | 6 | 1 | 13 | 80 |
Additional Sources:
Make a big list of all subfields and topics. Call that “subjects” (or something else; the name doesn’t really matter).
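A minimal sketch of building that combined list in base R. The toy table below assumes the keyword files have `Subfield` and `Topic` columns, as the topic tables in this report suggest; the object names `keywords` and `subjects` are illustrative.

```r
# Toy keyword table; rows are copied from this report's topic tables.
keywords <- data.frame(
  Subfield = c("Radiation", "Radiation", "Aerospace Engineering"),
  Topic    = c("Nuclear Physics and Applications",
               "Advanced Radiotherapy Techniques",
               "Nuclear reactor physics and engineering"),
  stringsAsFactors = FALSE
)

# One deduplicated vector holding every subfield and topic name.
subjects <- unique(c(keywords$Subfield, keywords$Topic))
```

Keeping subfields and topics in one vector makes later filters a single `%in%` test, at the cost of losing track of which level each name came from.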
# Define your path if needed
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"
# Read the keyword files (adjust file names if needed)
#majid_keywords <- readRDS(file.path(data_path, "majid_keywords.rds"))
#abbasi_keywords <- readRDS(file.path(data_path, "abbasi_keywords.rds"))
majid_keywords <- readRDS(file.path(data_path, "majid_keywords_v1.rds"))
new_abbasi_keywords <- readRDS(file.path(data_path, "abbasi_keywords_v1.rds"))
# Step 1: Extract unique topics from the 'Topic' column
#abbasi_topics <- unique(new_abbasi_keywords$Topic)
#majid_topics <- unique(new_majid_keywords$Topic)
# Step 2: Identify topic categories
#shared_topics <- intersect(abbasi_topics, majid_topics)
#abbasi_only_topics <- setdiff(abbasi_topics, shared_topics)
#majid_only_topics <- setdiff(majid_topics, shared_topics)
# Step 3: Padding helper
#pad_column <- function(x, n) {
# length(x) <- n
# x[is.na(x)] <- "---"
# return(x)
#}
# Step 4: Create the comparison table
#max_len <- max(length(majid_only_topics), length(shared_topics), length(abbasi_only_topics))
#topic_comparison <- tibble(
# "Majid Shahriari Only" = pad_column(majid_only_topics, max_len),
# "Shared Topics" = pad_column(shared_topics, max_len),
# "Fereydoon Abbasi Only" = pad_column(abbasi_only_topics, max_len)
#)
# Step 5: Generate the styled table
#table_html <- kbl(
# topic_comparison,
# caption = "Topic Comparison: Majid Shahriari vs. Fereydoon Abbasi"
#) %>%
# kable_styling(full_width = FALSE)
# Step 6: Save as HTML
#dir.create("plots", showWarnings = FALSE)
#save_kable(table_html, file = "plots/topic_comparison.html")
# Step 7: Save as PNG
#webshot("plots/topic_comparison.html", "plots/topic_comparison.png", zoom = 2)
| | Domain | Field | Subfield | Topic | Total Works | Total Citations | % of Works | % of Citations |
|---|---|---|---|---|---|---|---|---|
| 53 | Physical Sciences | Physics and Astronomy | Radiation | Nuclear Physics and Applications | 207020 | 1274223 | 0.09 | 0.05 |
| 83 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Medical Imaging Techniques and Applications | 177439 | 1662238 | 0.08 | 0.06 |
| 157 | Health Sciences | Medicine | Ophthalmology | Glaucoma and retinal disorders | 140274 | 2078725 | 0.06 | 0.08 |
| 211 | Physical Sciences | Engineering | Safety, Risk, Reliability and Quality | Nuclear and radioactivity studies | 130674 | 285858 | 0.06 | 0.01 |
| 228 | Physical Sciences | Environmental Science | Health, Toxicology and Mutagenesis | Air Quality and Health Impacts | 128119 | 3243177 | 0.06 | 0.12 |
| 229 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Advanced MRI Techniques and Applications | 127691 | 2302623 | 0.06 | 0.09 |
| 244 | Physical Sciences | Engineering | Mechanics of Materials | Metal and Thin Film Mechanics | 125631 | 1703914 | 0.06 | 0.06 |
| 292 | Health Sciences | Medicine | Pulmonary and Respiratory Medicine | Cerebrovascular and Carotid Artery Diseases | 116893 | 1283161 | 0.05 | 0.05 |
| 306 | Physical Sciences | Earth and Planetary Sciences | Atmospheric Science | Atmospheric chemistry and aerosols | 115277 | 3807482 | 0.05 | 0.14 |
| 311 | Physical Sciences | Engineering | Civil and Structural Engineering | Structural Health Monitoring Techniques | 114437 | 1283516 | 0.05 | 0.05 |
| 342 | Physical Sciences | Physics and Astronomy | Radiation | Advanced Radiotherapy Techniques | 111820 | 1196375 | 0.05 | 0.05 |
| 428 | Physical Sciences | Engineering | Aerospace Engineering | Nuclear reactor physics and engineering | 102519 | 483794 | 0.05 | 0.02 |
| 475 | Physical Sciences | Engineering | Automotive Engineering | Vehicle emissions and performance | 99203 | 641034 | 0.04 | 0.02 |
| 489 | Physical Sciences | Materials Science | Materials Chemistry | Graphite, nuclear technology, radiation studies | 98176 | 402275 | 0.04 | 0.02 |
| 568 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Radiation Dose and Imaging | 93510 | 782436 | 0.04 | 0.03 |
| | Domain | Field | Subfield | Topic | Total Works | Total Citations | % of Works | % of Citations |
|---|---|---|---|---|---|---|---|---|
| 17 | Physical Sciences | Physics and Astronomy | Astronomy and Astrophysics | Astro and Planetary Science | 272845 | 2696720 | 0.12 | 0.10 |
| 53 | Physical Sciences | Physics and Astronomy | Radiation | Nuclear Physics and Applications | 207020 | 1274223 | 0.09 | 0.05 |
| 56 | Physical Sciences | Engineering | Civil and Structural Engineering | Engineering Applied Research | 206697 | 170096 | 0.09 | 0.01 |
| 68 | Physical Sciences | Engineering | Electrical and Electronic Engineering | Photonic and Optical Devices | 192687 | 1971794 | 0.09 | 0.07 |
| 82 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Monoclonal and Polyclonal Antibodies Research | 177915 | 4304991 | 0.08 | 0.16 |
| 83 | Health Sciences | Medicine | Radiology, Nuclear Medicine and Imaging | Medical Imaging Techniques and Applications | 177439 | 1662238 | 0.08 | 0.06 |
| 103 | Physical Sciences | Engineering | Electrical and Electronic Engineering | Semiconductor materials and devices | 163693 | 2223407 | 0.07 | 0.08 |
| 119 | Physical Sciences | Physics and Astronomy | Nuclear and High Energy Physics | Nuclear physics research studies | 157237 | 2553546 | 0.07 | 0.10 |
| 167 | Physical Sciences | Materials Science | Surfaces, Coatings and Films | Electron and X-Ray Spectroscopy Techniques | 137173 | 1773250 | 0.06 | 0.07 |
| 186 | Physical Sciences | Engineering | Control and Systems Engineering | Fault Detection and Control Systems | 134886 | 1422574 | 0.06 | 0.05 |
| 211 | Physical Sciences | Engineering | Safety, Risk, Reliability and Quality | Nuclear and radioactivity studies | 130674 | 285858 | 0.06 | 0.01 |
| 219 | Physical Sciences | Chemistry | Inorganic Chemistry | Radioactive element chemistry and processing | 129479 | 1660269 | 0.06 | 0.06 |
| 238 | Life Sciences | Biochemistry, Genetics and Molecular Biology | Molecular Biology | Advanced biosensing and bioanalysis techniques | 126304 | 3813628 | 0.06 | 0.14 |
| 244 | Physical Sciences | Engineering | Mechanics of Materials | Metal and Thin Film Mechanics | 125631 | 1703914 | 0.06 | 0.06 |
| 325 | Physical Sciences | Physics and Astronomy | Nuclear and High Energy Physics | Magnetic confinement fusion research | 112874 | 1213545 | 0.05 | 0.05 |
(a) Pull all papers associated with Shahid Beheshti University (uni 393 in Luz’s numbering scheme) that have those subfields and topics
Now, we read all the works associated with Shahid Beheshti
University, which in our case is works_393. This has 35,305
works in total.
## [1] TRUE
## [1] FALSE
duplicated_ids <- works_403$id[duplicated(works_403$id)]
# Show which rows are duplicated
works_403 %>%
  filter(id %in% duplicated_ids)
## # A tibble: 2 × 51
## id title display_name authorships abstract doi publication_date
## <chr> <chr> <chr> <list> <chr> <chr> <date>
## 1 https://openal… Defe… Defect dete… <tibble> This st… http… 2024-12-23
## 2 https://openal… Defe… Defect dete… <tibble> This st… http… 2024-12-23
## # ℹ 44 more variables: publication_year <int>, fwci <dbl>,
## # cited_by_count <int>, counts_by_year <list>, cited_by_api_url <chr>,
## # ids <list>, type <chr>, is_oa <lgl>, is_oa_anywhere <lgl>, oa_status <chr>,
## # oa_url <chr>, any_repository_has_fulltext <lgl>, source_display_name <chr>,
## # source_id <chr>, issn_l <chr>, host_organization <chr>,
## # host_organization_name <chr>, landing_page_url <chr>, pdf_url <chr>,
## # license <chr>, version <chr>, referenced_works <list>, …
- We then check the id column to make sure there are no duplicate works. We then see that the works are indeed unique.
Does Shahid Beheshti have more distinct topics than distinct subfields among its publications?
- We ran unique(works_393$top_subfield) and found 234 unique subfields and 3129 unique topics at Shahid Beheshti University.
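The counting step can be sketched as below. One detail worth noting is that `unique()` treats `NA` as a value, so missing classifications should be dropped before counting. The toy data and the column name `top_topic` are assumptions; only `top_subfield` appears in this report.

```r
# Toy works table with one unclassified row.
works <- data.frame(
  top_subfield = c("Radiation", "Radiation", "Ophthalmology", NA),
  top_topic    = c("Nuclear Physics and Applications",
                   "Advanced Radiotherapy Techniques",
                   "Glaucoma and retinal disorders", NA),
  stringsAsFactors = FALSE
)

# Count distinct non-missing values at each level.
n_subfields <- length(unique(na.omit(works$top_subfield)))
n_topics    <- length(unique(na.omit(works$top_topic)))
```

Because every topic sits inside exactly one subfield, the number of distinct topics can never be smaller than the number of distinct subfields in the same set of works.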
domain_colors <- c(
"Physical Sciences" = "#1e3a8a", # Deep blue
"Social Sciences" = "#7c2d12", # Deep brown/orange
"Life Sciences" = "#14532d", # Deep green
"Health Sciences" = "#7f1d1d" # Deep red
)
# Create the plot and assign it to domain_pct_plot
domain_pct_plot <- works_393 %>%
filter(!is.na(top_domain)) %>%
count(top_domain, sort = TRUE) %>%
mutate(pct = n / sum(n)) %>%
ggplot(aes(x = reorder(top_domain, pct), y = pct, fill = top_domain)) +
geom_col() +
geom_text(aes(label = percent(pct, accuracy = 0.1)),
hjust = 1.3, color = "white", size = 4.2) +
coord_flip() +
scale_y_continuous(labels = percent_format(accuracy = 0.1)) +
scale_fill_manual(values = domain_colors) +
labs(
title = "Research Domains at Shahid Beheshti University",
x = "Domain",
y = "% of Total Works",
fill = "Domain"
) +
theme_minimal() +
theme(
legend.position = "none",
axis.text = element_text(size = 12),
axis.title = element_text(size = 13, face = "bold"),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
)
# Now save it
#ggsave("plots/domain_percent_bar.png", plot = domain_pct_plot, width = 10, height = 6, dpi = 320)
topics_data <- works_393 %>%
filter(!is.na(top_domain), !is.na(top_field)) %>%
group_by(domain.display_name = top_domain, field.display_name = top_field) %>%
summarise(works_count = n(), .groups = "drop")
treemap_plot <- create_complete_names_treemap(topics_data)
## Warning in geom_treemap_text(colour = "black", place = "centre", grow = TRUE, :
## Ignoring unknown parameters: `max.size` and `check_overlap`
## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Count unique fields
n_fields <- works_393 %>%
filter(!is.na(top_field)) %>%
summarise(n = n_distinct(top_field)) %>%
pull(n)
# Get field-level counts and domains
field_data <- works_393 %>%
filter(!is.na(top_field), !is.na(top_domain)) %>%
count(top_field, top_domain, sort = TRUE)
# Plot all fields
ggplot(field_data, aes(x = reorder(top_field, n), y = n, fill = top_domain)) +
geom_col() +
coord_flip() +
scale_fill_manual(values = domain_colors) +
labs(
title = "Distribution of Research Fields at Shahid Beheshti University",
subtitle = paste("All", n_fields, "fields colored by domain"),
x = "Field",
y = "Number of Works",
fill = "Domain"
) +
theme_minimal() +
theme(
axis.text = element_text(size = 11),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5)
)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ readr::col_factor() masks scales::col_factor()
## ✖ scales::discard() masks purrr::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ jsonlite::flatten() masks purrr::flatten()
## ✖ kableExtra::group_rows() masks dplyr::group_rows()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(forcats)
# Identify top 5 fields by publication count
top_5_fields <- works_393 %>%
  filter(!is.na(top_field), !is.na(publication_year)) %>%
  count(top_field, name = "total") %>%
  slice_max(order_by = total, n = 5) %>%
  pull(top_field)
# Filter and plot
plot <- works_393 %>%
filter(top_field %in% top_5_fields, !is.na(publication_year)) %>%
count(publication_year, top_field) %>%
ggplot(aes(x = publication_year, y = n, fill = top_field)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ top_field, scales = "free_y", ncol = 2) +
scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
theme_minimal() +
labs(
title = "Evolution of Top 5 Research Fields Over Time",
subtitle = "Shahid Beheshti University – by publication count",
x = "Publication Year",
y = "Number of Works"
)
# Save the plot
#ggsave("plots/top_fields_evolution.png", plot = plot, width = 10, height = 6, dpi = 300)
library(tidyverse)
library(forcats)
# Identify top 5 subfields by publication count
top_5_subfields <- works_393 %>%
  filter(!is.na(top_subfield), !is.na(publication_year)) %>%
  count(top_subfield, name = "total") %>%
  slice_max(order_by = total, n = 5) %>%
  pull(top_subfield)
# Filter and plot
plot <- works_393 %>%
filter(top_subfield %in% top_5_subfields, !is.na(publication_year)) %>%
count(publication_year, top_subfield) %>%
ggplot(aes(x = publication_year, y = n, fill = top_subfield)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ top_subfield, scales = "free_y", ncol = 2) +
scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
theme_minimal() +
labs(
title = "Evolution of Top 5 Research Subfields Over Time",
subtitle = "Shahid Beheshti University – by publication count",
x = "Publication Year",
y = "Number of Works"
)
plot
## Warning: Removed 35 rows containing missing values or values outside the scale range
## (`geom_col()`).
# Save the plot
#ggsave("plots/top_subfields_evolution.png", plot = plot, width = 10, height = 6, dpi = 300)
library(tidyverse)
library(broom) # for tidy regression results
# Step 1: Filter and prepare data
field_growth_slopes <- works_393 %>%
filter(!is.na(top_field), !is.na(publication_year)) %>%
count(publication_year, top_field) %>%
group_by(top_field) %>%
filter(n() >= 5) %>% # Ensure enough data points for a slope
nest() %>%
mutate(
model = map(data, ~lm(n ~ publication_year, data = .x)),
slope = map_dbl(model, ~coef(.x)[["publication_year"]])
) %>%
ungroup() %>%
arrange(desc(slope)) %>%
slice_head(n = 5)
# Extract top 5 fastest-growing fields
top_5_growing_fields <- field_growth_slopes$top_field
# Filter and plot the fastest-growing fields
plot <- works_393 %>%
filter(top_field %in% top_5_growing_fields, !is.na(publication_year)) %>%
count(publication_year, top_field) %>%
ggplot(aes(x = publication_year, y = n, fill = top_field)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ top_field, scales = "free_y", ncol = 2) +
scale_x_continuous(limits = c(2000, 2025), breaks = seq(2000, 2025, 2)) +
theme_minimal() +
labs(
title = "Fastest-Growing Research Fields Over Time",
subtitle = "Top 5 fields at Shahid Beheshti University based on publication growth rate",
x = "Publication Year",
y = "Number of Works"
)
plot
## Warning: Removed 54 rows containing missing values or values outside the scale range
## (`geom_col()`).
library(tidyverse)
library(scales)
library(lubridate)
# Ensure publication dates are parsed
works_393 <- works_393 %>%
mutate(pub_month = floor_date(publication_date, "month"))
# Filter after year 2000 only
monthly_counts <- works_393 %>%
filter(top_subfield %in% top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
count(pub_month, top_subfield)
# Normalize to each subfield's first available month (after 2000)
monthly_growth <- monthly_counts %>%
group_by(top_subfield) %>%
arrange(pub_month) %>%
mutate(
base_count = first(n),
growth_index = (n / base_count) * 100
) %>%
ungroup()
dark_colors <- c(
  "Biomedical Engineering" = "#102542", # Dark navy
  "Electrical and Electronic Engineering" = "#A84400", # Burnt orange
  "Materials Chemistry" = "darkgreen", # Dark green
  "Molecular Biology" = "darkred", # Dark red
  "Organic Chemistry" = "#4B0082" # Dark violet/indigo
)
ggplot(monthly_growth, aes(x = pub_month, y = growth_index, color = top_subfield)) +
geom_line(linewidth = 1, alpha = 0.8) +
scale_color_manual(values = dark_colors) +
scale_y_continuous(labels = percent_format(scale = 1)) +
scale_x_date(date_breaks = "2 years", date_labels = "%Y", limits = c(as.Date("2000-01-01"), NA)) +
labs(
title = "Monthly Publication Growth in Top Subfields at Shahid Beheshti University",
subtitle = "Indexed to each subfield's first recorded month after 2000 (Index = 100%)",
x = "Publication Date",
y = "Relative Publication Index (%)",
color = "Subfield"
) +
theme_minimal(base_size = 12)
# Save the plot
ggsave("plots/monthly_subfield_growth.png", width = 10, height = 6, dpi = 300)
#ggsave("plots/subfield_growth_slopechart.png", width = 10, height = 6, dpi = 300)
monthly_growth_sbu <- monthly_growth %>%
mutate(university = "Shahid Beheshti University")
works_403 <- works_403 %>%
mutate(pub_month = floor_date(publication_date, "month"))
# Filter after year 2000 only
monthly_counts_ut <- works_403 %>%
filter(top_subfield %in% top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
count(pub_month, top_subfield)
# Normalize to each subfield's first available month (after 2000)
monthly_growth_ut <- monthly_counts_ut %>%
group_by(top_subfield) %>%
arrange(pub_month) %>%
mutate(
base_count = first(n),
growth_index = (n / base_count) * 100
) %>%
ungroup()
monthly_growth_ut <- monthly_growth_ut %>%
mutate(university = "University of Tehran")
shared_colors <- c(
"Biomedical Engineering" = "#102542", # Dark navy
"Electrical and Electronic Engineering" = "#A84400", # Burnt orange
"Molecular Biology" = "#8B0000", # Dark red
"Materials Chemistry" = "#4B0082" # Dark violet
)
sbu_only_colors <- c(
"Organic Chemistry" = "#3E3E6B" # Deep steel blue
)
ut_only_colors <- c(
  "Mechanical Engineering" = "darkgreen" # Dark green
)
dark_colors_combined <- c(shared_colors, sbu_only_colors, ut_only_colors)
Next, we filter works_393 to include only works that fall within the subfield areas associated with Majid Shahriari’s research, as identified in majid_keywords. We deliberately exclude the domain level from this filtering step: OpenAlex has only four broad domains, so domain-level classification is too coarse to reflect a researcher’s specific area of expertise. Filtering at the subfield level instead captures a more focused and relevant set of works aligned with Majid Shahriari’s research contributions.
Note on OpenAlex Topic Hierarchy: Works in OpenAlex are tagged with Topics using an automated model that evaluates features such as the title, abstract, journal, and citations of the work.
- There are approximately 4,500 Topics in OpenAlex.
- Each Topic is nested within a Subfield, which is nested within a Field, which in turn is nested within a top-level Domain.
- A work is assigned a primary topic (the one with the highest score), and inherits the corresponding subfield, field, and domain.
Source: https://help.openalex.org/hc/en-us/articles/24736129405719-Topics
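The "primary topic" rule described above (keep the highest-scoring topic per work) can be sketched on a long table with one row per work-topic pair. The ids, topic names, and scores below are made up; the column names `id`, `topic`, and `score` follow the tables in this report.

```r
# Toy long table: one row per (work, topic) pair.
works_long <- data.frame(
  id    = c("W1", "W1", "W2", "W2", "W3"),
  topic = c("Topic A", "Topic B", "Topic C", "Topic A", "Topic B"),
  score = c(0.91, 0.99, 0.75, 0.60, 0.88),
  stringsAsFactors = FALSE
)

# Sort by score (descending) and keep the first row seen for each work.
ranked  <- works_long[order(-works_long$score), ]
primary <- ranked[!duplicated(ranked$id), ]
primary <- primary[order(primary$id), ]
```

When two topics of the same work share the same score, as happens in some rows of the tables here, this sketch keeps whichever of the tied rows comes first in the input, so a more deliberate tie-break may be needed.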
| id | title | publication_date | topic | score | type | institution |
|---|---|---|---|---|---|---|
| W3193094654 | <i>Planck</i> 2018 results | 2021-08-01 | Radiation Therapy and Dosimetry | 0.9249 | article | Shahid Beheshti University |
| W3200934665 | FCC-hh: The Hadron Collider | 2019-07-01 | Particle Accelerators and Free-Electron Lasers | 0.9984 | article | Shahid Beheshti University |
| W2903991298 | Recent advances in modeling and simulation of nanofluid flows—Part II: Applications | 2018-12-05 | Fluid Dynamics and Vibration Analysis | 0.9947 | article | Shahid Beheshti University |
| W4377695098 | Diffusion models in medical imaging: A comprehensive survey | 2023-05-23 | AI in cancer detection | 0.9944 | review | Shahid Beheshti University |
| W4377695098 | Diffusion models in medical imaging: A comprehensive survey | 2023-05-23 | MRI in cancer diagnosis | 0.9940 | review | Shahid Beheshti University |
| W2962966855 | HE-LHC: The High-Energy Large Hadron Collider | 2019-07-01 | Particle Accelerators and Free-Electron Lasers | 0.9982 | article | Shahid Beheshti University |
| W4387778010 | Advances in medical image analysis with vision Transformers: A comprehensive review | 2023-10-19 | AI in cancer detection | 0.9979 | review | Shahid Beheshti University |
| W4221140247 | A next-generation liquid xenon observatory for dark matter and neutrino physics | 2022-12-21 | Atomic and Subatomic Physics Research | 0.9955 | article | Shahid Beheshti University |
| W4387430177 | DAE-Former: Dual Attention-Guided Efficient Transformer for Medical Image Segmentation | 2023-01-01 | AI in cancer detection | 0.9956 | book-chapter | Shahid Beheshti University |
| W3134197091 | Monte Carlo-based estimation of patient absorbed dose in 99mTc-DMSA, -MAG3, and -DTPA SPECT imaging using the University of Florida (UF) phantoms | 2025-03-06 | Radiopharmaceutical Chemistry and Applications | 1.0000 | article | Shahid Beheshti University |
| W3134197091 | Monte Carlo-based estimation of patient absorbed dose in 99mTc-DMSA, -MAG3, and -DTPA SPECT imaging using the University of Florida (UF) phantoms | 2025-03-06 | Medical Imaging Techniques and Applications | 1.0000 | article | Shahid Beheshti University |
| W4406828888 | Occurrence and transport of per- and polyfluoroalkyl substances (PFAS) in the leachate of a municipal solid waste landfill in Tehran, Iran (a Middle-eastern megacity) | 2025-01-01 | Atmospheric chemistry and aerosols | 0.9603 | article | Shahid Beheshti University |
| W2048628691 | Deep Anterior Lamellar Keratoplasty in Patients with Keratoconus: Big-Bubble Technique | 2010-01-14 | Glaucoma and retinal disorders | 0.9988 | article | Shahid Beheshti University |
| W2039681904 | Flow regime identification and void fraction prediction in two-phase flows based on gamma ray attenuation | 2014-11-20 | Nuclear reactor physics and engineering | 0.9910 | article | Shahid Beheshti University |
| W2039681904 | Flow regime identification and void fraction prediction in two-phase flows based on gamma ray attenuation | 2014-11-20 | Nuclear Physics and Applications | 0.9910 | article | Shahid Beheshti University |
Total works published in Majid subfields = 2155. (Notice that all matching topics fall under already-included subfields, so filtering by topics only would give 427 works.)
Total works published in Majid topics/subfields/fields = 15520
# Using new_abbasi_keywords instead of just abbasi_keywords
# Subset works in Fereydoon Abbasi’s areas
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"
abbasi_keywords_v1 <- readRDS(file.path(data_path, "abbasi_keywords_v1.rds"))
sbu_abbasi_works <- shahid_beheshti_university_cleanedLONG %>%
filter(
topic %in% abbasi_keywords_v1$Topic
)
# Works NOT in Abbasi’s areas
sbu_non_abbasi_works <- shahid_beheshti_university_cleanedLONG %>%
filter(
!(topic %in% abbasi_keywords_v1$Topic)
)
sbu_abbasi_works %>%
head(15) %>%
kbl(caption = "Subset of Works from Shahid Beheshti University Matching Fereydoon Abbasi’s Research Areas") %>%
kable_styling(latex_options = c("hold_position", "scale_down"), font_size = 10)
| id | title | publication_date | topic | score | type | institution |
|---|---|---|---|---|---|---|
| W2284013896 | Smart micro/nanoparticles in stimulus-responsive drug/gene delivery systems | 2016-01-01 | Nanoparticle-Based Drug Delivery | 0.9999 | review | Shahid Beheshti University |
| W3193094654 | Planck 2018 results | 2021-08-01 | Radiation Therapy and Dosimetry | 0.9249 | article | Shahid Beheshti University |
| W2908825166 | FCC-ee: The Lepton Collider | 2019-06-01 | Particle Detector Development and Performance | 0.9895 | article | Shahid Beheshti University |
| W2953154703 | FCC Physics Opportunities | 2019-06-01 | Particle Detector Development and Performance | 0.9982 | article | Shahid Beheshti University |
| W3200934665 | FCC-hh: The Hadron Collider | 2019-07-01 | Particle Accelerators and Free-Electron Lasers | 0.9984 | article | Shahid Beheshti University |
| W3200934665 | FCC-hh: The Hadron Collider | 2019-07-01 | Superconducting Materials and Applications | 0.9979 | article | Shahid Beheshti University |
| W3119276753 | Guanine-Based DNA Biosensor Amplified with Pt/SWCNTs Nanocomposite as Analytical Tool for Nanomolar Determination of Daunorubicin as an Anticancer Drug: A Docking/Experimental Investigation | 2021-01-08 | Advanced biosensing and bioanalysis techniques | 1.0000 | article | Shahid Beheshti University |
| W2108404396 | Evidence for a Kaon-Bound State K⁻pp Produced in K⁻ Absorption Reactions at Rest | 2005-06-03 | Nuclear physics research studies | 0.9982 | article | Shahid Beheshti University |
| W3131284135 | A novel detection method for organophosphorus insecticide fenamiphos: Molecularly imprinted electrochemical sensor based on core-shell Co3O4@MOF-74 nanocomposite | 2021-02-25 | Advanced biosensing and bioanalysis techniques | 0.9938 | article | Shahid Beheshti University |
| W2096158223 | Principal components analysis by the galaxy-based search algorithm: a novel metaheuristic for continuous optimisation | 2011-01-01 | Spectroscopy and Chemometric Analyses | 0.9886 | article | Shahid Beheshti University |
| W2027703730 | Coplanar Full Adder in Quantum-Dot Cellular Automata via Clock-Zone-Based Crossover | 2015-03-05 | Quantum and electron transport phenomena | 0.9931 | article | Shahid Beheshti University |
| W3165315707 | Liposomal Nanomedicine: Applications for Drug Delivery in Cancer Therapy | 2021-05-25 | Nanoparticle-Based Drug Delivery | 1.0000 | review | Shahid Beheshti University |
| W2899821123 | Ultrasonic nano-emulsification – A review | 2018-11-09 | Ultrasound and Cavitation Phenomena | 0.9984 | review | Shahid Beheshti University |
| W2899821123 | Ultrasonic nano-emulsification – A review | 2018-11-09 | Electrohydrodynamics and Fluid Dynamics | 0.9955 | review | Shahid Beheshti University |
| W2962966855 | HE-LHC: The High-Energy Large Hadron Collider | 2019-07-01 | Particle Accelerators and Free-Electron Lasers | 0.9982 | article | Shahid Beheshti University |
After creating the new_abbasi_keywords dataset, here is how it compares with abbasi_keywords:
In general, OpenAlex has 4516 topics and 252 unique subfields.
works_393 contains 35305 unique works, covering 3129 unique topics and 234 unique subfields.
- new_abbasi_keywords has 99 unique topics and 36 unique subfields.
  - In subfield only: 15783
  - In topic only: 2193
- abbasi_keywords has 7 unique topics and 4 unique subfields.
  - In subfield only: 4347
  - In topic only: 147
library(dplyr)
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"
masoud_keywords_v1 <- readRDS(file.path(data_path, "masoud_keywords_v1.rds"))
base_path <- "/standard/nsdpi_storage/people/czj9zj/temp_progress_new"
# university_of_tehran_long <- readRDS(file.path(base_path, "works_batch_403.rds")) %>%
# filter(type != "dataset", type != "erratum", type != "retraction") %>%
# select(id, title, publication_date, topics, type) %>%
# # keep rows where 'topics' exists and has at least one row
# filter(purrr::map_lgl(topics, ~ !is.null(.x) && nrow(.x) > 0)) %>%
# mutate(
# topics = map(topics, ~ filter(.x, type == "topic")),
# topics = map(topics, ~ select(.x, display_name, score))
# ) %>%
# unnest(cols = topics) %>%
# rename(topic = display_name) %>%
# mutate(
# across(where(is.character), ~ gsub("https://openalex.org/", "", .x, fixed = TRUE)),
# institution = "University of Tehran"
# )
#saveRDS(university_of_tehran_long, file.path(base_path, "university_of_tehran_long.rds"))
university_of_tehran_long <- readRDS(file.path(base_path, "university_of_tehran_long.rds"))
# Works in Masoud's areas
# (the sbu_ prefix is kept so later chunks still run; these are University of Tehran works)
sbu_masoud_works <- university_of_tehran_long %>%
filter(
topic %in% masoud_keywords_v1$Topic
)
# Works NOT in Masoud’s areas
sbu_non_masoud_works <- university_of_tehran_long %>%
filter(
!(topic %in% masoud_keywords_v1$Topic)
)
# Save both datasets as RDS
saveRDS(sbu_masoud_works, file = file.path("/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data", "sbu_masoud_works.rds"))
saveRDS(sbu_non_masoud_works, file = file.path("/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data", "sbu_non_masoud_works.rds"))
0.1 Major Foreign Events: Assassinations, Sabotage and Sanctions
| event | category | date | year |
|---|---|---|---|
| Stuxnet Cyberattack | Sabotage | 2010-06-17 | 2010 |
| Natanz Explosion | Sabotage | 2020-07-02 | 2020 |
| Natanz Blackout | Sabotage | 2021-04-11 | 2021 |
| Assassination of Massoud Ali-Mohammadi | Assassination | 2010-01-12 | 2010 |
| Assassination of Mohsen Fakhrizadeh | Assassination | 2020-11-27 | 2020 |
| First COVID-19 Case in Iran | Pandemic | 2020-02-19 | 2020 |
data_path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data_3topic/extra_data"
majid_timeline <- prepare_publication_timelines_with_complement(
sbu_majid_works,
sbu_non_majid_works,
author_label = "Majid"
)
plot_author_subfields_with_sabotage(
timeline_data = majid_timeline,
author_label = "Majid",
assassination_date = as.Date("2010-11-29"),
within_color = "#1f78b4",
data_path
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 190 rows containing non-finite outside the scale range
## (`stat_smooth()`).
# Prepare Abbasi's timeline data
abbasi_timeline <- prepare_publication_timelines_with_complement(
sbu_abbasi_works,
sbu_non_abbasi_works,
author_label = "Abbasi"
)
# Generate the plot for Abbasi
plot_author_subfields_with_sabotage(
timeline_data = abbasi_timeline,
author_label = "Abbasi",
assassination_date = as.Date("2010-11-29"), # Abbasi was not killed, but include a relevant date if analyzing a key event
within_color = "coral", # Use a distinct color for Abbasi
data_path
)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 206 rows containing non-finite outside the scale range
## (`stat_smooth()`).
# Save the plot
ggsave("plots/abbasi_topics97threshold_plot.png", width = 10, height = 6, dpi = 300)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 206 rows containing non-finite outside the scale range
## (`stat_smooth()`).
masoud_timeline <- prepare_publication_timelines_with_complement(
sbu_masoud_works,
sbu_non_masoud_works,
author_label = "Masoud"
)
works_monthly_long <- masoud_timeline$monthly_long
works_monthly_long <- works_monthly_long %>%
mutate(count_type = recode(count_type,
"Masoud Monthly Count" = "Within Masoud's Topics",
"Complement Monthly Count" = "Outside Masoud's Topics"))
assassination_date <- as.Date("2010-01-12")
# Load events
major_events_filtered <- readRDS(file.path(data_path, "major_events_filtered.rds"))
# Remove Massoud Ali-Mohammadi's assassination
major_events_filtered <- major_events_filtered %>%
filter(event != "Assassination of Massoud Ali-Mohammadi")
# Add Majid Shahriari's assassination (if needed again)
majid_event <- tibble::tibble(
event = "Assassination of Majid Shahriari",
category = "Assassination",
date = as.Date("2010-11-29"),
year = 2010
)
major_events_filtered <- bind_rows(major_events_filtered, majid_event)
# Define sabotage event styles
sabotage_events <- major_events_filtered %>%
filter(category == "Sabotage")
# Packages
library(tidyverse)
library(ggnewscale)
# --- Styles for sabotage events ---------------------------------------------
sabotage_styles <- tibble::tibble(
event = c("Stuxnet Cyberattack", "Natanz Explosion", "Natanz Blackout"),
color = c("darkorange", "#008", "darkmagenta"),
linetype = c("dotted", "dotdash", "dashed")
)
# Merge styles onto your sabotage_events data
# (assumes you already have sabotage_events with a 'date' column)
sabotage_events <- left_join(sabotage_events, sabotage_styles, by = "event")
# Factor for consistent legend order
sabotage_events$event <- factor(sabotage_events$event, levels = sabotage_styles$event)
# --- Plot --------------------------------------------------------------------
# assumes:
# works_monthly_long with columns: year_month (Date), count (numeric), count_type (factor/chr)
# assassination_date is a Date
p <- ggplot(works_monthly_long, aes(x = year_month, y = count)) +
# Publication trends (two series) -------------------------------------------
geom_smooth(aes(color = count_type),
method = "loess", span = 0.25, se = FALSE, linewidth = 0.9) +
scale_color_manual(
values = c(
"Within Masoud's Topics" = "darkolivegreen",
"Outside Masoud's Topics" = "#333333"
),
name = "Publication Trend"
) +
# Assassination (solid red) -------------------------------------------------
geom_vline(xintercept = assassination_date, color = "#e31a1c",
linetype = "solid", linewidth = 1) +
# Reset color scale for sabotage events ------------------------------------
new_scale_color() +
# Sabotage vertical lines (color + linetype mapped to event)
geom_vline(
data = sabotage_events,
aes(xintercept = date, color = event, linetype = event),
linewidth = 1
) +
scale_color_manual(
values = setNames(sabotage_styles$color, sabotage_styles$event),
name = "Sabotage Event"
) +
scale_linetype_manual(
values = setNames(sabotage_styles$linetype, sabotage_styles$event),
name = "Sabotage Event"
) +
# Facet & labels ------------------------------------------------------------
facet_wrap(~ count_type, scales = "free_y", ncol = 1) +
labs(
title = "Publication Output Within and Outside Massoud’s Topics at University of Tehran",
subtitle = "Solid red = Assassination of Massoud Ali-Mohammadi - January 12, 2010\nColored dashed lines = Sabotage Events",
x = "Publication Date",
y = "Monthly Publication Count"
) +
scale_x_date(
date_breaks = "2 years",
date_labels = "%Y",
limits = c(as.Date("2000-01-01"), NA)
) +
theme_minimal(base_size = 12) +
theme(
axis.text.x = element_text(angle = 60, hjust = 1),
legend.box = "vertical",
legend.position = "bottom"
)
# Print to screen
print(p)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 423 rows containing non-finite outside the scale range
## (`new_stat_smooth()`).
# Save the plot
ggsave("plots/masoud_subfield_topics_plot.png", p, width = 10, height = 6, dpi = 300)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 423 rows containing non-finite outside the scale range
## (`new_stat_smooth()`).
path <- "/standard/nsdpi_storage/people/czj9zj/DiD_data"
works_tagged <- readRDS(file.path(path, "works_tagged.rds"))
library(tidyverse)
library(lubridate)
library(scales)
# 1. Function to prepare timeline data
prepare_combined_author_timelines <- function(works_tagged) {
works_tagged %>%
mutate(subfield_category = factor(
subfield_category,
levels = c("Majid Only", "Abbasi Only", "Both", "Neither")
)) %>%
group_by(monthly, subfield_category) %>%
summarise(count = n(), .groups = "drop") %>%
rename(
year_month = monthly,
count_type = subfield_category
) -> monthly_long
list(monthly_long = monthly_long)
}
# 2. Prepare data
timeline_data <- prepare_combined_author_timelines(works_tagged)
# 3. Load events
major_events_filtered <- readRDS(file.path(data_path, "major_events_filtered.rds")) %>%
filter(event != "Assassination of Massoud Ali-Mohammadi")
# Add Majid Shahriari’s assassination attempt (2010-11-29)
assassination_date <- as.Date("2010-11-29")
# 4. Define sabotage styling
sabotage_styles <- tibble::tibble(
event = c("Stuxnet Cyberattack", "Natanz Explosion", "Natanz Blackout"),
color = c("darkorange", "#008", "darkmagenta"),
linetype = c("dotted", "dotdash", "dashed")
)
# Join to sabotage events
sabotage_events <- major_events_filtered %>%
filter(category == "Sabotage") %>%
left_join(sabotage_styles, by = "event") %>%
mutate(event = factor(event, levels = sabotage_styles$event))
# 5. Colors for publication categories
pub_colors <- c(
"Majid Only" = "#1f78b4",
"Abbasi Only" = "coral",
"Both" = "#8f7c82",
"Neither" = "gray4",
setNames(sabotage_styles$color, sabotage_styles$event)
)
# 6. Plot
p <- ggplot(timeline_data$monthly_long, aes(x = year_month, y = count, color = count_type)) +
geom_smooth(method = "loess", span = 0.25, se = TRUE, linewidth = 1) +
# Assassination attempt
geom_vline(xintercept = assassination_date, color = "#e31a1c", linetype = "solid", linewidth = 1) +
# Sabotage
geom_vline(data = sabotage_events,
aes(xintercept = date, color = event, linetype = event),
linewidth = 1, show.legend = TRUE) +
facet_wrap(~count_type, scales = "free_y", ncol = 1) +
labs(
title = "Publication Output at Shahid Beheshti University",
subtitle = "Solid red line = Assassination attempt on Majid Shahriari (killed) and Fereydoon Abbasi (survived) – Nov 29, 2010\nColored dashed lines = Sabotage Events",
x = "Publication Date",
y = "Monthly Publication Count",
color = "Publication Trend",
linetype = "Sabotage Event"
) +
scale_color_manual(
values = pub_colors,
breaks = c("Majid Only", "Abbasi Only", "Both", "Neither"),
guide = guide_legend(order = 1)
) +
scale_linetype_manual(
values = setNames(sabotage_styles$linetype, sabotage_styles$event),
guide = guide_legend(order = 2, override.aes = list(
color = sabotage_styles$color
))
) +
scale_x_date(
date_breaks = "6 months",
date_labels = "%b %Y",
limits = c(as.Date("2009-12-01"), as.Date("2011-12-01"))
)+
theme_minimal(base_size = 13) +
theme(
axis.text.x = element_text(angle = 60, hjust = 1),
legend.position = "bottom",
legend.box = "vertical"
)
p
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 930 rows containing non-finite outside the scale range
## (`stat_smooth()`).
# 7. Save
ggsave("plots/combined_subfield_timelines.png", plot = p, width = 10, height = 6, dpi = 300)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 930 rows containing non-finite outside the scale range
## (`stat_smooth()`).
library(tidyverse)
library(lubridate)
# Aggregate publication count per month
all_university_monthly <- works_tagged %>%
group_by(monthly = floor_date(publication_date, "month")) %>%
summarise(publication_count = n(), .groups = "drop")
# Plot
ggplot(all_university_monthly, aes(x = monthly, y = publication_count)) +
#geom_line(color = "#333366", size = 1) +
geom_smooth(method = "loess", span = 0.25, se = TRUE, color = "black", linewidth = 0.8) +
labs(
title = "Publication Output at Shahid Beheshti University",
x = "Publication Date",
y = "Monthly Publication Count",
caption = "Note: Based on all available works in OpenAlex"
) +
scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
theme_minimal(base_size = 13) +
theme(
axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(face = "bold")
)
## `geom_smooth()` using formula = 'y ~ x'
Another important limitation concerns the accuracy of the publication dates provided by OpenAlex. Because many records have missing or incomplete publication dates, OpenAlex assigns either the earliest known electronic release date or the first day of the month. This produces artificial spikes in publication counts on the 1st of each month that do not reflect actual publication behavior. OpenAlex does not distinguish genuine first-of-the-month publications from records that were assigned that date retroactively, so the two cannot be told apart.
To mitigate this distortion, we aggregated data at the monthly level and applied LOESS smoothing in the visualizations. Smoothing reduces the influence of day-level anomalies and gives a clearer picture of broader publication trends. These steps lessen the impact of misdated entries but do not remove it, so we recommend that future research incorporate additional metadata sources (such as Crossref XML or legacy MAG data) when finer temporal resolution is required.
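As a rough check on the first-of-the-month problem described above, the share of works dated on day 1 can be compared with the roughly 1-in-30 share expected if dates were spread evenly. The sketch below uses a small made-up table; `toy_works` and its values are hypothetical stand-ins for a works table with a `publication_date` column.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for a works table
toy_works <- tibble::tibble(
  id = paste0("W", 1:6),
  publication_date = as.Date(c(
    "2019-06-01", "2019-06-01", "2019-06-17",
    "2019-07-01", "2019-07-09", "2019-07-23"
  ))
)

# Share of works whose recorded date is the 1st of a month
first_day_share <- toy_works %>%
  summarise(share_first = mean(day(publication_date) == 1)) %>%
  pull(share_first)

# Monthly aggregation, which absorbs the day-level spike
monthly_counts <- toy_works %>%
  count(pub_month = floor_date(publication_date, "month"), name = "n_works")
```

A share far above about 3% would suggest that many dates were assigned by default rather than observed.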
library(tidyverse)
library(scales)
library(lubridate)
# Identify top 5 subfields by publication count
ut_top_5_subfields <- works_403 %>%
filter(!is.na(top_subfield), !is.na(publication_year)) %>%
# Total works per subfield across all years
count(top_subfield, name = "total") %>%
# Keep the five largest subfields (ties kept, as before)
slice_max(order_by = total, n = 5) %>%
pull(top_subfield)
# Ensure publication dates are parsed
ut_works <- works_403 %>%
mutate(pub_month = floor_date(publication_date, "month"))
# Filter after year 2000 only
monthly_counts_ut <- ut_works %>%
filter(top_subfield %in% ut_top_5_subfields, pub_month >= as.Date("2000-01-01")) %>%
count(pub_month, top_subfield)
# Normalize to each subfield's first available month (after 2000)
monthly_growth_ut <- monthly_counts_ut %>%
group_by(top_subfield) %>%
arrange(pub_month) %>%
mutate(
base_count = first(n),
growth_index = (n / base_count) * 100
) %>%
ungroup()
library(dplyr)
library(ggplot2)
library(scales)
# Step 1: Compute top 5 subfields by average growth for SBU
top_5_sbu <- monthly_growth %>%
group_by(top_subfield) %>%
summarise(avg_index = mean(growth_index, na.rm = TRUE)) %>%
arrange(desc(avg_index)) %>%
slice_head(n = 5) %>%
pull(top_subfield)
# Step 2: Compute top 5 subfields by publication count for UT
# (ut_works is works_403 with the pub_month column added above)
top_5_ut <- ut_works %>%
filter(pub_month >= as.Date("2000-01-01")) %>%
group_by(top_subfield) %>%
summarise(n = n()) %>%
arrange(desc(n)) %>%
slice_head(n = 5) %>%
pull(top_subfield)
# Step 3: Combine both top 5 lists
top_5_combined <- union(top_5_sbu, top_5_ut)
# Step 4: Prepare SBU data
monthly_growth_sbu <- monthly_growth %>%
filter(top_subfield %in% top_5_combined) %>%
mutate(university = "Shahid Beheshti University")
# Step 5: Prepare UT data
# (ut_works is works_403 with the pub_month column added above)
monthly_counts_ut <- ut_works %>%
filter(top_subfield %in% top_5_combined, pub_month >= as.Date("2000-01-01")) %>%
count(pub_month, top_subfield)
monthly_growth_ut <- monthly_counts_ut %>%
group_by(top_subfield) %>%
arrange(pub_month) %>%
mutate(
base_count = first(n),
growth_index = (n / base_count) * 100
) %>%
ungroup() %>%
mutate(university = "University of Tehran")
# Step 6: Combine both datasets
combined_growth <- bind_rows(monthly_growth_sbu, monthly_growth_ut)
# Step 7: Define consistent colors
shared_colors <- c(
"Biomedical Engineering" = "#102542", # Dark navy
"Electrical and Electronic Engineering" = "#A84400", # Burnt orange
"Molecular Biology" = "#8B0000", # Dark red
"Materials Chemistry" = "#4B0082" # Dark violet
)
sbu_only_colors <- c(
"Organic Chemistry" = "darkcyan" # Dark cyan
)
ut_only_colors <- c(
"Mechanical Engineering" = "darkgreen" # Dark green
)
dark_colors_combined <- c(shared_colors, sbu_only_colors, ut_only_colors)
# Step 8: Plot
ggplot(combined_growth, aes(x = pub_month, y = growth_index, color = top_subfield)) +
geom_line(linewidth = 1, alpha = 0.8) +
facet_wrap(~university) +
scale_color_manual(values = dark_colors_combined) +
scale_y_continuous(labels = percent_format(scale = 1)) +
scale_x_date(date_breaks = "2 years", date_labels = "%Y", limits = c(as.Date("2000-01-01"), NA)) +
labs(
title = "Comparison of Publication Growth in Top Subfields",
subtitle = "Indexed to each subfield's first recorded month after 2000 (Index = 100%)",
x = "Publication Date",
y = "Relative Publication Index (%)",
color = "Subfield"
) +
theme_minimal(base_size = 12)
# Save the plot
ggsave("plots/ut_sbu_monthly_subfield_growth.png", width = 10, height = 6, dpi = 300)
assassination_date <- as.Date("2010-11-29")
#assassination_date <- as.Date("2010-01-12")
avg_growth_before <- combined_growth %>%
filter(pub_month < assassination_date) %>%
group_by(university, top_subfield) %>%
summarise(avg_growth_index = mean(growth_index, na.rm = TRUE)) %>%
mutate(period = "Before 2010-11-29")
## `summarise()` has grouped output by 'university'. You can override using the
## `.groups` argument.
avg_growth_after <- combined_growth %>%
filter(pub_month >= assassination_date) %>%
group_by(university, top_subfield) %>%
summarise(avg_growth_index = mean(growth_index, na.rm = TRUE)) %>%
mutate(period = "After 2010-11-29")
## `summarise()` has grouped output by 'university'. You can override using the
## `.groups` argument.
growth_comparison <- bind_rows(avg_growth_before, avg_growth_after) %>%
arrange(university, top_subfield, period) %>%
mutate(avg_growth_index = round(avg_growth_index, 1))
To identify the top five subfields at each institution, we first filtered the publication dataset by university (Shahid Beheshti University and the University of Tehran). For each institution, we grouped publications by subfield and month and computed the number of publications per subfield per month. To assess growth over time, we calculated a relative growth index for each subfield by normalizing its monthly publication count to the number of publications in its first recorded month after the year 2000. Specifically, the growth index was defined as:
\[ \text{Growth Index} = \frac{\text{Monthly Publications}}{\text{Publications in First Month}} \times 100 \]
This measure reflects how much each subfield’s output increased relative to its own baseline. We then averaged the growth index over time for each subfield and selected the five subfields with the highest mean growth index at each university.
On this measure, Shahid Beheshti University and the University of Tehran share four of their top five subfields: Biomedical Engineering, Electrical and Electronic Engineering, Materials Chemistry, and Molecular Biology. The fifth subfield differs: Organic Chemistry is prominent at Shahid Beheshti University, whereas Mechanical Engineering stands out at the University of Tehran. Among these, Electrical and Electronic Engineering saw the most pronounced growth at Shahid Beheshti University, while Mechanical Engineering experienced the fastest expansion at the University of Tehran. These patterns indicate both shared and institution-specific research priorities since the early 2000s.
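To make the growth index concrete, here is a small worked example on made-up counts. The table `toy_monthly` and its subfield labels "A" and "B" are hypothetical. Each subfield's monthly count is divided by its own first-month count and multiplied by 100.

```r
library(dplyr)

# Hypothetical monthly counts for two subfields
toy_monthly <- tibble::tibble(
  top_subfield = c("A", "A", "A", "B", "B", "B"),
  pub_month = as.Date(rep(c("2000-01-01", "2000-02-01", "2000-03-01"), times = 2)),
  n = c(4, 6, 10, 1, 3, 2)
)

# Index each subfield to its own first recorded month (= 100)
toy_growth <- toy_monthly %>%
  group_by(top_subfield) %>%
  arrange(pub_month, .by_group = TRUE) %>%
  mutate(growth_index = n / first(n) * 100) %>%
  ungroup()
```

Subfield A gets indices 100, 150, 250 and subfield B gets 100, 300, 200. B starts from a single publication, so its index swings more for the same absolute change. This is worth keeping in mind when comparing average indices across subfields with small baselines.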
Take 2
Take the subject vector and pull all papers associated with the identified keywords and subtopics.
Pull all author IDs associated with those papers.
Look at the number of people working in these areas over time.
If we need more data:
- Run a clustering analysis of topics/keywords/subtopics to expand the aperture.
- Run the same snowball sample outlined above for all co-authors of the assassinated scholars.
The above addresses the narrowest interpretation of our question: did assassination [and probably other co-occurring sabotage] campaigns reduce scientific output in the university and the research areas associated with the targeted scholars?
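The "number of people working over time" step could be sketched as a count of distinct authors per year. This is a minimal sketch that assumes a long authorship table with one row per work and author. The table `toy_authorships` and its column names are hypothetical, not the project's actual schema.

```r
library(dplyr)

# Hypothetical long table: one row per (work, author) pair
toy_authorships <- tibble::tibble(
  work_id = c("W1", "W1", "W2", "W3", "W3", "W4"),
  author_id = c("A1", "A2", "A1", "A2", "A3", "A3"),
  publication_year = c(2008, 2008, 2009, 2009, 2009, 2011)
)

# Distinct authors active in each year
authors_per_year <- toy_authorships %>%
  distinct(publication_year, author_id) %>%
  count(publication_year, name = "n_authors")
```

Deduplicating on `(publication_year, author_id)` before counting means that an author with several papers in one year is counted once.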
and Luz, let us know as you get through the “first pass” checklist. I think there’s some clever way to use dplyr to search and collect those nested keyword/subtopic lists without having to permanently expand the dataset. I don’t know it off the top of my head, but I can look for advice if you get stumped.
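One possible way to search nested topic lists without unnesting them is to test each list element inside `filter()` with `purrr::map_lgl()`. The sketch below assumes a `topics` list-column of data frames with a `display_name` column, as in the works tables above. The table `toy_works` and the vector `keywords` are hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical works table with a nested topics list-column
toy_works <- tibble::tibble(
  id = c("W1", "W2", "W3"),
  topics = list(
    tibble::tibble(display_name = c("Nuclear Physics and Applications",
                                    "Radiation Therapy and Dosimetry")),
    tibble::tibble(display_name = "Some Unrelated Topic"),
    NULL  # works with no topics at all
  )
)

keywords <- c("Nuclear Physics and Applications")

# Keep works where any nested topic matches; the list-column stays nested
matched <- toy_works %>%
  filter(map_lgl(topics, ~ !is.null(.x) && any(.x$display_name %in% keywords)))
```

The result keeps the original one-row-per-work shape, so nothing needs to be collapsed back afterwards.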
Iranian research notes from today’s check-in meeting (7/24/2025) / things I’ll look into:
Pre/post comparison of work output: Create timelines or counts by year to visualize trends, mainly the pre/post volume of works published in areas/keywords of interest for various universities, but also trends in topics and subfields pre vs. post. Is there a decrease in publications in specific areas of interest? Share in the chat.
Co-authorship over time: Track how a scholar’s direct and indirect collaborators evolve over time. Take Majid Shahriari as an example. He became active in neutron transport research in the early 2000s and began contributing to Iran’s nuclear science community. Between 2005 and 2010, he held a key position at Shahid Beheshti University and worked with the Atomic Energy Organization of Iran. By 2009–2010, he had been identified as a top nuclear scientist, reportedly worked on strategic applications of nuclear chain reactions, and had drawn international surveillance attention. As Majid Shahriari gained more power and strategic responsibility within Iran’s nuclear establishment (particularly in sensitive or classified domains), his publication count would likely decrease, but this does not necessarily mean he was doing less research. Create a list of the authors Majid Shahriari collaborated with before his assassination to identify potential trends or patterns in his research network.
Structure the missing work: For missing or unindexed works, cross-check works in OpenAlex vs. known bibliography and flag known missing papers for deeper investigation.
Talk with Damian 8/4/2025:
His death should not affect subfields/topics that he had nothing to do with, so those can serve as a comparison group.
We can ALSO take one of his subfields/topics (say, nuclear engineering) and compare it with the same subfield/topic at another university, holding the subfield/topic constant.
Subfield level: Group by subfield and month, then give each subfield a treatment indicator (1 for treatment, 0 for control) and a time indicator (post = 1 for every month after Majid was assassinated and 0 for every month before). The month of November should be zero.
Create the same for Abbasi!
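The treatment and time indicators described in these notes could be built roughly as follows. This is a sketch on made-up data. The table `toy_counts`, the vector `treated_subfields`, and the column names are hypothetical, and which subfields count as treated is an assumption for illustration only.

```r
library(dplyr)

assassination_date <- as.Date("2010-11-29")
# First day of the event month (2010-11-01)
event_month <- as.Date(format(assassination_date, "%Y-%m-01"))

# Hypothetical subfield-by-month publication counts
toy_counts <- tibble::tibble(
  subfield = rep(c("Nuclear Engineering", "Organic Chemistry"), each = 4),
  pub_month = rep(as.Date(c("2010-09-01", "2010-10-01",
                            "2010-11-01", "2010-12-01")), times = 2),
  n_works = c(5, 6, 4, 3, 8, 9, 8, 9)
)

treated_subfields <- c("Nuclear Engineering")  # assumed for illustration

did_panel <- toy_counts %>%
  mutate(
    treated = as.integer(subfield %in% treated_subfields),
    # Strictly after the event month, so November 2010 itself is 0
    post = as.integer(pub_month > event_month),
    treated_post = treated * post
  )
```

The same construction would apply to Abbasi by swapping in his subfields for `treated_subfields`.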