The Nobel Prize represents one of the highest forms of international recognition for intellectual and humanitarian achievement, celebrating extraordinary contributions across physics, chemistry, medicine, literature, peace, and economic sciences. Since their inception in 1901, these prestigious awards have not only honored individual brilliance but have also reflected broader historical, geographical, and institutional patterns in human accomplishment.
This assignment employs data science techniques to analyze the rich dataset provided by the Nobel Prize API, investigating four compelling questions that reveal the hidden narratives behind these celebrated awards. Through systematic analysis of laureate information spanning over a century, we uncover fascinating patterns in global talent migration, institutional excellence, demographic trends, and geographical shifts in scientific and cultural leadership.
Data Acquisition & API Integration
# Load required libraries
library(httr) # For making HTTP requests to the API
library(jsonlite) # For parsing JSON data from the API response
library(dplyr) # For data manipulation and analysis
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(purrr) # For functional programming and working with lists
##
## Attaching package: 'purrr'
## The following object is masked from 'package:jsonlite':
##
## flatten
library(stringr) # For string manipulation and pattern matching
# Base API URL
BASE_URL <- "https://api.nobelprize.org/2.1/"
# Stores the root URL of the Nobel Prize API that we'll be calling
# Function to get ALL Nobel laureates (handle pagination)
get_all_nobel_laureates <- function() {
# Defines a function that will retrieve ALL Nobel laureates from the API
all_laureates <- list()
# Creates an empty list to store all the laureate data we'll fetch
offset <- 0
# Sets starting point - tells the API to start from the first record
limit <- 100
# Sets batch size - tells the API to return 100 records per request
while (TRUE) {
# Starts an infinite loop that will continue until we break out of it
url <- paste0(BASE_URL, "laureates?offset=", offset, "&limit=", limit)
# Constructs the complete API URL with pagination parameters
# Example: "https://api.nobelprize.org/2.1/laureates?offset=0&limit=100"
response <- GET(url)
# Sends an HTTP GET request to the API URL we just built
data <- content(response, "parsed")
# Extracts and parses the JSON content from the API response into R format
if (length(data$laureates) == 0) break
# Checks if the current batch contains any laureates
# If empty (length == 0), we've reached the end of the data and break the loop
all_laureates <- c(all_laureates, data$laureates)
# Appends the new batch of laureates to our growing list of all laureates
offset <- offset + limit
# Increases the offset by the limit to get the next batch of records
# Example: After first batch, offset becomes 100 to get records 101-200
# Safety break to avoid infinite loops
if (offset > 1000) break
# Emergency break - if we've requested more than 1000 records, stop
# This prevents infinite loops if something goes wrong with the API
}
return(all_laureates)
# Returns the complete list containing all Nobel laureates we fetched
}
# Get all data
cat("Fetching all Nobel laureates...\n")
## Fetching all Nobel laureates...
# Prints a message to the console to show the user what's happening
all_laureates <- get_all_nobel_laureates()
# Calls our function and stores the result in a variable called 'all_laureates'
cat("Total laureates fetched:", length(all_laureates), "\n\n")
## Total laureates fetched: 1018
# Prints the total number of laureates retrieved, followed by blank lines
Brain Drain Analysis
# π BRAIN DRAIN ANALYSIS: Nobel Laureate Migration Patterns
# ==========================================================
cat("Research Question 1: Which countries experienced the greatest 'brain drain' of Nobel talent?\n")
## Research Question 1: Which countries experienced the greatest 'brain drain' of Nobel talent?
cat("Definition: Laureates born in one country but awarded as citizens of another\n\n")
## Definition: Laureates born in one country but awarded as citizens of another
# Initialize data structures
lost_laureates <- list()
migration_stats <- list(
total_processed = 0,
skipped_no_birth_data = 0,
skipped_no_prize_data = 0,
skipped_no_country_data = 0,
valid_cases = 0
)
cat("π Data Processing Pipeline:\n")
## π Data Processing Pipeline:
for (laureate in all_laureates) {
migration_stats$total_processed <- migration_stats$total_processed + 1
# Extract birth country with robust error handling
birth_country <- tryCatch({
laureate$birth$place$country$en
}, error = function(e) {
migration_stats$skipped_no_birth_data <- migration_stats$skipped_no_birth_data + 1
return(NULL)
})
# Validate birth country data
if (is.null(birth_country) || birth_country == "") next
# Check for prize data
if (is.null(laureate$nobelPrizes) || length(laureate$nobelPrizes) == 0) {
migration_stats$skipped_no_prize_data <- migration_stats$skipped_no_prize_data + 1
next
}
prize <- laureate$nobelPrizes[[1]]
citizen_country <- NULL
# Extract citizenship country from affiliations
if (!is.null(prize$affiliations) && length(prize$affiliations) > 0) {
for (affil in prize$affiliations) {
if (!is.null(affil$country) && !is.null(affil$country$en)) {
citizen_country <- affil$country$en
break
}
}
}
# Validate we have both countries and they're different
if (is.null(citizen_country) || citizen_country == "" || birth_country == citizen_country) {
migration_stats$skipped_no_country_data <- migration_stats$skipped_no_country_data + 1
next
}
migration_stats$valid_cases <- migration_stats$valid_cases + 1
# Store the migration data
laureate_name <- ifelse(!is.null(laureate$fullName$en), laureate$fullName$en, "Unknown Name")
if (!birth_country %in% names(lost_laureates)) {
lost_laureates[[birth_country]] <- list()
}
lost_laureates[[birth_country]] <- c(lost_laureates[[birth_country]], list(list(
name = laureate_name,
born_in = birth_country,
citizen_of = citizen_country,
category = ifelse(!is.null(prize$category$en), prize$category$en, "Unknown"),
year = ifelse(!is.null(prize$awardYear), prize$awardYear, "Unknown")
)))
}
cat("\nπ― MIGRATION ANALYSIS RESULTS\n")
##
## π― MIGRATION ANALYSIS RESULTS
if (length(lost_laureates) > 0) {
# Calculate statistics
lost_counts <- map_int(lost_laureates, length)
sorted_losses <- sort(lost_counts, decreasing = TRUE)
total_migrations <- sum(lost_counts)
# Create beautiful output table
cat("π TOP 10 COUNTRIES BY TALENT LOSS\n")
cat("βββββββββββββββββββββββββββββββββ\n")
cat("| Rank | Country | Lost | Examples |\n")
cat("|------|---------|------|----------|\n")
for (i in 1:min(10, length(sorted_losses))) {
country <- names(sorted_losses)[i]
count <- sorted_losses[i]
# Get flag emoji
flag <- case_when(
country == "Germany" ~ "π©πͺ",
country == "United Kingdom" ~ "π¬π§",
country == "Canada" ~ "π¨π¦",
country == "France" ~ "π«π·",
country == "Russia" ~ "π·πΊ",
country == "Netherlands" ~ "π³π±",
TRUE ~ "π"
)
examples <- map_chr(lost_laureates[[country]][1:min(2, count)], "name")
examples_str <- paste(examples, collapse = ", ")
if (i == 1) {
cat("| **", i, "** | ", flag, " **", country, "** | **", count, "** | ", examples_str, " |\n", sep = "")
} else {
cat("| ", i, " | ", flag, " ", country, " | ", count, " | ", examples_str, " |\n", sep = "")
}
}
cat("\n")
# Key insights
top_loser <- names(sorted_losses)[1]
top_count <- sorted_losses[1]
cat("π‘ KEY FINDINGS:\n")
cat("β’ **Primary Source**: ", top_loser, " lost the most talent (", top_count, " laureates)\n", sep = "")
cat("β’ **Migration Rate**: ", round(total_migrations / migration_stats$valid_cases * 100, 1), "% of analyzable laureates changed countries\n", sep = "")
cat("β’ **Global Impact**: ", length(lost_laureates), " countries experienced Nobel talent drain\n\n", sep = "")
# Additional insights
if (length(sorted_losses) >= 3) {
cat("π ADDITIONAL INSIGHTS:\n")
cat("β’ Top 3 countries account for ", round(sum(sorted_losses[1:3]) / total_migrations * 100, 1), "% of all talent loss\n", sep = "")
cat("β’ Average loss per country: ", round(mean(lost_counts), 1), " laureates\n", sep = "")
cat("β’ Median loss: ", round(median(lost_counts), 1), " laureates\n\n", sep = "")
}
} else {
cat("β No international migrations detected in the dataset.\n")
cat("This could indicate data quality issues or methodological limitations.\n\n")
}
## π TOP 10 COUNTRIES BY TALENT LOSS
## βββββββββββββββββββββββββββββββββ
## | Rank | Country | Lost | Examples |
## |------|---------|------|----------|
## | **1** | π©πͺ **Germany** | **27** | Alfred Kastler, Arno Allan Penzias |
## | 2 | π¬π§ United Kingdom | 24 | Anthony J. Leggett, Christian de Duve |
## | 3 | π¨π¦ Canada | 15 | Charles Brenton Huggins, David Card |
## | 4 | π«π· France | 12 | Alexis Carrel, Alfred Werner |
## | 5 | π Prussia | 11 | Johann Friedrich Wilhelm Adolf von Baeyer, Albert Abraham Michelson |
## | 6 | π Austria-Hungary | 11 | Albert von Szent-GyΓΆrgyi NagyrΓ‘polt, Carl Ferdinand Cori |
## | 7 | π·πΊ Russia | 10 | Andre Geim, Igor Yevgenyevich Tamm |
## | 8 | π Scotland | 9 | Angus Deaton, Charles Thomson Rees Wilson |
## | 9 | π Russian Empire | 9 | Artturi Ilmari Virtanen, Ilya Ilyich Mechnikov |
## | 10 | π the Netherlands | 9 | Guido W. Imbens, Jacobus Henricus van 't Hoff |
##
## π‘ KEY FINDINGS:
## β’ **Primary Source**: Germany lost the most talent (27 laureates)
## β’ **Migration Rate**: 100% of analyzable laureates changed countries
## β’ **Global Impact**: 60 countries experienced Nobel talent drain
##
## π ADDITIONAL INSIGHTS:
## β’ Top 3 countries account for 25.4% of all talent loss
## β’ Average loss per country: 4.3 laureates
## β’ Median loss: 2 laureates
# π DATA QUALITY REPORT
# ======================
cat("β’ Total laureates processed: ", migration_stats$total_processed, "\n", sep = "")
## β’ Total laureates processed: 1018
cat("β’ Skipped (incomplete country data): ", migration_stats$skipped_no_country_data, "\n", sep = "")
## β’ Skipped (incomplete country data): 726
cat("β’ Final migration count: ", sum(unlist(lapply(lost_laureates, length))), "\n", sep = "")
## β’ Final migration count: 260
cat("β’ Overall data quality: ", round(migration_stats$valid_cases / migration_stats$total_processed * 100, 1), "% usable\n\n", sep = "")
## β’ Overall data quality: 25.5% usable
cat("\nπ Question 2: TOP INSTITUTIONS BY NOBEL LAUREATES PRODUCED\n")
##
## π Question 2: TOP INSTITUTIONS BY NOBEL LAUREATES PRODUCED
# First, create the institution_df dataframe
institution_counts <- list()
for (laureate in all_laureates) {
if (!is.null(laureate$nobelPrizes)) {
for (prize in laureate$nobelPrizes) {
if (!is.null(prize$affiliations) && length(prize$affiliations) > 0) {
for (affiliation in prize$affiliations) {
if (!is.null(affiliation$name) && !is.null(affiliation$name$en)) {
institution_name <- affiliation$name$en
if (grepl("University|Institute|College|School", institution_name, ignore.case = TRUE)) {
# Standardize institution names
if (grepl("Harvard", institution_name)) institution_name <- "Harvard University"
if (grepl("University of California", institution_name)) institution_name <- "University of California"
if (grepl("University of Cambridge", institution_name)) institution_name <- "University of Cambridge"
if (grepl("Massachusetts Institute", institution_name)) institution_name <- "Massachusetts Institute of Technology (MIT)"
if (grepl("Stanford", institution_name)) institution_name <- "Stanford University"
if (grepl("University of Chicago", institution_name)) institution_name <- "University of Chicago"
if (grepl("Princeton", institution_name)) institution_name <- "Princeton University"
if (grepl("Columbia", institution_name)) institution_name <- "Columbia University"
if (grepl("Yale", institution_name)) institution_name <- "Yale University"
if (grepl("Caltech", institution_name)) institution_name <- "California Institute of Technology (Caltech)"
if (is.null(institution_counts[[institution_name]])) {
institution_counts[[institution_name]] <- 0
}
institution_counts[[institution_name]] <- institution_counts[[institution_name]] + 1
}
}
}
}
}
}
}
# Create the dataframe
institution_df <- data.frame(
institution = names(institution_counts),
laureate_count = unlist(institution_counts),
stringsAsFactors = FALSE
) %>%
filter(laureate_count >= 5) %>%
arrange(desc(laureate_count))
for (i in 1:min(15, nrow(institution_df))) {
institution <- institution_df$institution[i]
count <- institution_df$laureate_count[i]
# Determine country and flag
country_flag <- case_when(
grepl("University of California|Harvard|MIT|Stanford|University of Chicago|Caltech|Columbia|Princeton|Yale|Cornell|Rockefeller|Howard Hughes", institution) ~ "πΊπΈ USA",
grepl("University of Cambridge|University of Oxford", institution) ~ "π¬π§ UK",
grepl("Berlin University", institution) ~ "π©πͺ Germany",
TRUE ~ "π International"
)
# Clean institution name for display
display_name <- case_when(
institution == "Massachusetts Institute of Technology (MIT)" ~ "MIT",
institution == "California Institute of Technology (Caltech)" ~ "Caltech",
institution == "Howard Hughes Medical Institute" ~ "Howard Hughes Med Institute",
TRUE ~ institution
)
if (i == 1) {
cat("| π₯ **", i, "** | **", display_name, "** | ", country_flag, " | **", count, "** |\n", sep = "")
} else if (i == 2) {
cat("| π₯ **", i, "** | **", display_name, "** | ", country_flag, " | **", count, "** |\n", sep = "")
} else if (i == 3) {
cat("| π₯ **", i, "** | **", display_name, "** | ", country_flag, " | **", count, "** |\n", sep = "")
} else {
cat("| ", i, " | ", display_name, " | ", country_flag, " | ", count, " |\n", sep = "")
}
}
## | π₯ **1** | **University of California** | πΊπΈ USA | **48** |
## | π₯ **2** | **Harvard University** | πΊπΈ USA | **40** |
## | π₯ **3** | **MIT** | πΊπΈ USA | **26** |
## | 4 | Stanford University | πΊπΈ USA | 25 |
## | 5 | University of Chicago | πΊπΈ USA | 22 |
## | 6 | Caltech | πΊπΈ USA | 20 |
## | 7 | Columbia University | πΊπΈ USA | 20 |
## | 8 | University of Cambridge | π¬π§ UK | 18 |
## | 9 | Princeton University | πΊπΈ USA | 18 |
## | 10 | Howard Hughes Med Institute | πΊπΈ USA | 16 |
## | 11 | Rockefeller University | πΊπΈ USA | 13 |
## | 12 | University of Oxford | π¬π§ UK | 10 |
## | 13 | Yale University | πΊπΈ USA | 10 |
## | 14 | Cornell University | πΊπΈ USA | 8 |
## | 15 | Berlin University | π©πͺ Germany | 7 |
cat("\n")
# Key insights section
if (nrow(institution_df) >= 2) {
top_institution <- institution_df$institution[1]
top_count <- institution_df$laureate_count[1]
second_institution <- institution_df$institution[2]
second_count <- institution_df$laureate_count[2]
cat("π‘ KEY INSIGHTS:\n")
cat("β’ **Leading Institution**: ", top_institution, " leads with ", top_count, " Nobel laureates\n", sep = "")
cat("β’ **Close Competitor**: ", second_institution, " follows closely with ", second_count, " laureates\n", sep = "")
cat("β’ **Elite Group**: Only ", nrow(institution_df), " institutions have produced 5+ Nobel laureates\n\n", sep = "")
}
## π‘ KEY INSIGHTS:
## β’ **Leading Institution**: University of California leads with 48 Nobel laureates
## β’ **Close Competitor**: Harvard University follows closely with 40 laureates
## β’ **Elite Group**: Only 29 institutions have produced 5+ Nobel laureates
# Geographic distribution
if (nrow(institution_df) > 0) {
cat("π GEOGRAPHIC DISTRIBUTION\n")
cat("| Region | Institutions in Top 15 | Total Laureates | Market Share |\n")
cat("|--------|------------------------|-----------------|-------------|\n")
top_15 <- head(institution_df, 15)
us_institutions <- top_15 %>% filter(grepl("University of California|Harvard|MIT|Stanford|University of Chicago|Caltech|Columbia|Princeton|Yale|Cornell|Rockefeller|Howard Hughes", institution))
uk_institutions <- top_15 %>% filter(grepl("University of Cambridge|University of Oxford", institution))
us_total <- sum(us_institutions$laureate_count)
uk_total <- sum(uk_institutions$laureate_count)
other_total <- sum(top_15$laureate_count) - us_total - uk_total
total_laureates <- sum(top_15$laureate_count)
cat("| πΊπΈ United States | ", nrow(us_institutions), " | ", us_total, " | ", round(us_total/total_laureates*100, 1), "% |\n", sep = "")
cat("| π¬π§ United Kingdom | ", nrow(uk_institutions), " | ", uk_total, " | ", round(uk_total/total_laureates*100, 1), "% |\n", sep = "")
cat("| π Other | ", nrow(top_15) - nrow(us_institutions) - nrow(uk_institutions), " | ", other_total, " | ", round(other_total/total_laureates*100, 1), "% |\n", sep = "")
cat("\n")
# Institutional tiers
cat("π
INSTITUTIONAL TIERS\n")
cat("| Tier | Laureate Range | Institutions |\n")
cat("|------|----------------|-------------|\n")
tier_40_plus <- sum(institution_df$laureate_count >= 40)
tier_20_39 <- sum(institution_df$laureate_count >= 20 & institution_df$laureate_count < 40)
tier_10_19 <- sum(institution_df$laureate_count >= 10 & institution_df$laureate_count < 20)
tier_5_9 <- sum(institution_df$laureate_count >= 5 & institution_df$laureate_count < 10)
cat("| Elite (40+) | 40+ laureates | ", tier_40_plus, " institutions |\n", sep = "")
cat("| Premier (20-39) | 20-39 laureates | ", tier_20_39, " institutions |\n", sep = "")
cat("| Distinguished (10-19) | 10-19 laureates | ", tier_10_19, " institutions |\n", sep = "")
cat("| Notable (5-9) | 5-9 laureates | ", tier_5_9, " institutions |\n", sep = "")
cat("\n")
} else {
cat("β No institution data found with 5+ laureates.\n")
}
## π GEOGRAPHIC DISTRIBUTION
## | Region | Institutions in Top 15 | Total Laureates | Market Share |
## |--------|------------------------|-----------------|-------------|
## | πΊπΈ United States | 12 | 266 | 88.4% |
## | π¬π§ United Kingdom | 2 | 28 | 9.3% |
## | π Other | 1 | 7 | 2.3% |
##
## π
INSTITUTIONAL TIERS
## | Tier | Laureate Range | Institutions |
## |------|----------------|-------------|
## | Elite (40+) | 40+ laureates | 2 institutions |
## | Premier (20-39) | 20-39 laureates | 5 institutions |
## | Distinguished (10-19) | 10-19 laureates | 6 institutions |
## | Notable (5-9) | 5-9 laureates | 16 institutions |
Age Distribution Analysis
cat("\n === QUESTION 3: What's the average age of Nobel laureates by category? === \n")
##
## === QUESTION 3: What's the average age of Nobel laureates by category? ===
category_ages <- list()
# Create empty list to store ages organized by category
category_data <- data.frame()
# Create empty data frame for detailed analysis
for (laureate in all_laureates) {
# Loop through each Nobel laureate
if (!is.null(laureate$birth) && !is.null(laureate$birth$date) &&
!is.null(laureate$nobelPrizes)) {
# Only process laureates with complete data: birth date AND prize info
birth_date <- laureate$birth$date
# Extract birth date (e.g., "1901-05-15")
birth_year <- as.numeric(str_split(birth_date, "-")[[1]][1])
# Split date by "-" and take first part (year), convert to number
if (is.na(birth_year) || birth_year < 1700 || birth_year > 2020) next
# Skip if birth year is invalid: not a number, too old, or future date
for (prize in laureate$nobelPrizes) {
# Some laureates have multiple prizes (e.g., Marie Curie)
prize_year <- as.numeric(prize$awardYear)
# Extract prize year and convert to number
category <- prize$category$en
# Extract category name in English
if (is.na(prize_year) || prize_year < 1901) next
# Skip if prize year is invalid (Nobel started in 1901)
age <- prize_year - birth_year
# Calculate age when award was received
if (age >= 20 && age <= 100) {
# Only include reasonable ages (exclude outliers)
if (is.null(category_ages[[category]])) {
category_ages[[category]] <- numeric()
}
# Create empty numeric vector for this category if it doesn't exist
category_ages[[category]] <- c(category_ages[[category]], age)
# Add this age to the category's age list
# Also add to data frame for additional analysis
category_data <- rbind(category_data, data.frame(
category = category,
age = age,
year = prize_year,
name = ifelse(!is.null(laureate$fullName$en), laureate$fullName$en, "Unknown"),
stringsAsFactors = FALSE
))
# Store detailed info for later analysis: category, age, year, name
}
}
}
}
# Calculate summary statistics by category
age_summary <- map_dfr(names(category_ages), function(cat) {
# For each category, calculate statistics
ages <- category_ages[[cat]]
# Get all ages for this category
data.frame(
category = cat,
avg_age = round(mean(ages), 1), # Average age
median_age = round(median(ages), 1), # Middle value
min_age = min(ages), # Youngest
max_age = max(ages), # Oldest
count = length(ages), # Sample size
stringsAsFactors = FALSE
)
}) %>% arrange(avg_age)
# Sort categories by average age (youngest first)
cat("AVERAGE AGE OF NOBEL LAUREATES BY CATEGORY: \n")
## AVERAGE AGE OF NOBEL LAUREATES BY CATEGORY:
for (i in 1:nrow(age_summary)) {
# Print formatted results for each category
cat(i, ". ", age_summary$category[i], ": ", age_summary$avg_age[i],
" years (median: ", age_summary$median_age[i], ")\n", sep = "")
cat(" Range: ", age_summary$min_age[i], "-", age_summary$max_age[i],
" years | Based on ", age_summary$count[i], " laureates\n\n", sep = "")
}
## 1. Physics: 57.7 years (median: 56)
## Range: 25-96 years | Based on 230 laureates
##
## 2. Physiology or Medicine: 58.9 years (median: 58)
## Range: 32-87 years | Based on 232 laureates
##
## 3. Chemistry: 59.2 years (median: 58)
## Range: 35-97 years | Based on 200 laureates
##
## 4. Peace: 61.2 years (median: 62)
## Range: 25-87 years | Based on 111 laureates
##
## 5. Literature: 65 years (median: 67)
## Range: 42-88 years | Based on 122 laureates
##
## 6. Economic Sciences: 67 years (median: 67)
## Range: 47-90 years | Based on 99 laureates
# Find youngest and oldest laureates overall
youngest <- category_data %>% arrange(age) %>% head(5)
# Sort by age ascending, take top 5
oldest <- category_data %>% arrange(desc(age)) %>% head(5)
# Sort by age descending, take top 5
cat("YOUNGEST NOBEL LAUREATES: \n")
## YOUNGEST NOBEL LAUREATES:
for (i in 1:nrow(youngest)) {
cat(i, ". ", youngest$name[i], " (", youngest$category[i], "): ",
youngest$age[i], " years in ", youngest$year[i], "\n", sep = "")
}
## 1. William Lawrence Bragg (Physics): 25 years in 1915
## 2. Nadia Murad Basee Taha (Peace): 25 years in 2018
## 3. Carl David Anderson (Physics): 31 years in 1936
## 4. Paul Adrien Maurice Dirac (Physics): 31 years in 1933
## 5. Tsung-Dao (T.D.) Lee (Physics): 31 years in 1957
cat("\nOLDEST NOBEL LAUREATES: \n")
##
## OLDEST NOBEL LAUREATES:
for (i in 1:nrow(oldest)) {
cat(i, ". ", oldest$name[i], " (", oldest$category[i], "): ",
oldest$age[i], " years in ", oldest$year[i], "\n", sep = "")
}
## 1. John B. Goodenough (Chemistry): 97 years in 2019
## 2. Arthur Ashkin (Physics): 96 years in 2018
## 3. John J. Hopfield (Physics): 91 years in 2024
## 4. Klaus Hasselmann (Physics): 90 years in 2021
## 5. Leonid Hurwicz (Economic Sciences): 90 years in 2007
# Additional analysis: Age trends over time
cat("\n--- Age Trends Over Time ---\n")
##
## --- Age Trends Over Time ---
recent_data <- category_data %>% filter(year >= 2000)
# Data from year 2000 onwards
older_data <- category_data %>% filter(year < 2000)
# Data before year 2000
if (nrow(recent_data) > 0 && nrow(older_data) > 0) {
cat("Average age before 2000:", round(mean(older_data$age), 1), "years\n")
cat("Average age since 2000:", round(mean(recent_data$age), 1), "years\n")
cat("Change:", round(mean(recent_data$age) - mean(older_data$age), 1), "years\n")
# Calculate how much older laureates have become over time
}
## Average age before 2000: 57.6 years
## Average age since 2000: 67 years
## Change: 9.4 years
Geographic Distribution & Historical Trends Analysis
cat("\n === QUESTION 4: How has the geographic distribution of Nobel prizes changed over time? === \n\n")
##
## === QUESTION 4: How has the geographic distribution of Nobel prizes changed over time? ===
# Analyze by decades
decade_data <- list() # Store counts by decade and country
country_totals <- list() # Store overall counts by country
for (laureate in all_laureates) {
if (!is.null(laureate$nobelPrizes)) {
for (prize in laureate$nobelPrizes) {
if (!is.null(prize$awardYear)) {
year <- as.numeric(prize$awardYear)
decade <- floor(year / 10) * 10
# Convert year to decade: 1995 β 1990, 2003 β 2000
# Get country from affiliation
country <- NULL
if (!is.null(prize$prizeAffiliation) && !is.null(prize$prizeAffiliation$country)) {
country <- prize$prizeAffiliation$country$en
} else if (!is.null(prize$affiliations) && length(prize$affiliations) > 0) {
for (affil in prize$affiliations) {
if (!is.null(affil$country)) {
country <- affil$country$en
break
}
}
}
# Extract country using same method as Question 1
if (!is.null(country) && country != "") {
# Initialize decade if needed
if (is.null(decade_data[[as.character(decade)]])) {
decade_data[[as.character(decade)]] <- list()
}
# Create empty list for this decade if it doesn't exist
# Initialize country count for this decade
if (is.null(decade_data[[as.character(decade)]][[country]])) {
decade_data[[as.character(decade)]][[country]] <- 0
}
# Create counter for this country in this decade if it doesn't exist
decade_data[[as.character(decade)]][[country]] <-
decade_data[[as.character(decade)]][[country]] + 1
# Increment count for this country in this decade
# Also track overall totals
if (is.null(country_totals[[country]])) {
country_totals[[country]] <- 0
}
country_totals[[country]] <- country_totals[[country]] + 1
# Increment overall count for this country
}
}
}
}
}
cat("GEOGRAPHIC DISTRIBUTION OF NOBEL PRIZES BY DECADE: \n")
## GEOGRAPHIC DISTRIBUTION OF NOBEL PRIZES BY DECADE:
# Analyze each decade
decades <- sort(as.numeric(names(decade_data)))
# Get all decades with data and sort chronologically
for (decade in decades) {
if (decade >= 1900) { # Only show from 1900 onwards (Nobel started 1901)
decade_counts <- unlist(decade_data[[as.character(decade)]])
# Convert nested list to simple named vector: c("USA" = 25, "UK" = 10)
top_countries <- names(sort(decade_counts, decreasing = TRUE))[1:5]
# Get top 5 country names sorted by count (highest first)
cat(decade, "s:\n", sep = "")
for (i in 1:length(top_countries)) {
country <- top_countries[i]
count <- decade_counts[country]
# Get count for this country
cat(" ", i, ". ", country, ": ", count, " prizes", sep = "")
# Show percentage if significant
total_decade_prizes <- sum(decade_counts)
percentage <- round((count / total_decade_prizes) * 100, 1)
if (percentage >= 10) {
cat(" (", percentage, "%)", sep = "")
}
# Only show percentage if country has β₯10% of decade's prizes
cat("\n")
}
cat(" Total prizes in decade: ", total_decade_prizes, "\n\n", sep = "")
}
}
## 1900s:
## 1. Germany: 11 prizes (33.3%)
## 2. France: 7 prizes (21.2%)
## 3. United Kingdom: 6 prizes (18.2%)
## 4. the Netherlands: 2 prizes
## 5. USA: 1 prizes
## Total prizes in decade: 33
##
## 1910s:
## 1. Germany: 8 prizes (33.3%)
## 2. France: 4 prizes (16.7%)
## 3. United Kingdom: 3 prizes (12.5%)
## 4. USA: 2 prizes
## 5. Sweden: 2 prizes
## Total prizes in decade: 24
##
## 1920s:
## 1. Germany: 8 prizes (24.2%)
## 2. United Kingdom: 7 prizes (21.2%)
## 3. Denmark: 3 prizes
## 4. France: 3 prizes
## 5. Sweden: 3 prizes
## Total prizes in decade: 33
##
## 1930s:
## 1. Germany: 11 prizes (28.2%)
## 2. USA: 11 prizes (28.2%)
## 3. United Kingdom: 7 prizes (17.9%)
## 4. France: 2 prizes
## 5. Switzerland: 2 prizes
## Total prizes in decade: 39
##
## 1940s:
## 1. USA: 15 prizes (50%)
## 2. United Kingdom: 6 prizes (20%)
## 3. Sweden: 2 prizes
## 4. Switzerland: 2 prizes
## 5. Finland: 1 prizes
## Total prizes in decade: 30
##
## 1950s:
## 1. USA: 31 prizes (56.4%)
## 2. United Kingdom: 9 prizes (16.4%)
## 3. Germany: 5 prizes
## 4. USSR: 4 prizes
## 5. Italy: 1 prizes
## Total prizes in decade: 55
##
## 1960s:
## 1. USA: 29 prizes (48.3%)
## 2. United Kingdom: 11 prizes (18.3%)
## 3. Germany: 5 prizes
## 4. France: 4 prizes
## 5. USSR: 3 prizes
## Total prizes in decade: 60
##
## 1970s:
## 1. USA: 44 prizes (56.4%)
## 2. United Kingdom: 15 prizes (19.2%)
## 3. Sweden: 3 prizes
## 4. Germany: 3 prizes
## 5. Denmark: 2 prizes
## Total prizes in decade: 78
##
## 1980s:
## 1. USA: 46 prizes (60.5%)
## 2. Switzerland: 9 prizes (11.8%)
## 3. United Kingdom: 6 prizes
## 4. Germany: 5 prizes
## 5. Sweden: 3 prizes
## Total prizes in decade: 76
##
## 1990s:
## 1. USA: 57 prizes (76%)
## 2. Germany: 5 prizes
## 3. United Kingdom: 4 prizes
## 4. France: 3 prizes
## 5. Canada: 2 prizes
## Total prizes in decade: 75
##
## 2000s:
## 1. USA: 67 prizes (69.1%)
## 2. United Kingdom: 7 prizes
## 3. Japan: 6 prizes
## 4. Israel: 4 prizes
## 5. France: 4 prizes
## Total prizes in decade: 97
##
## 2010s:
## 1. USA: 58 prizes (59.8%)
## 2. United Kingdom: 11 prizes (11.3%)
## 3. Japan: 9 prizes
## 4. France: 6 prizes
## 5. Switzerland: 3 prizes
## Total prizes in decade: 97
##
## 2020s:
## 1. USA: 40 prizes (65.6%)
## 2. Germany: 6 prizes
## 3. United Kingdom: 3 prizes
## 4. France: 2 prizes
## 5. Canada: 2 prizes
## Total prizes in decade: 61
# Overall top countries
cat("OVERALL TOP COUNTRIES (ALL TIME): \n")
## OVERALL TOP COUNTRIES (ALL TIME):
overall_counts <- sort(unlist(country_totals), decreasing = TRUE)
# Sort all countries by total prize count
for (i in 1:min(10, length(overall_counts))) {
country <- names(overall_counts)[i]
count <- overall_counts[i]
percentage <- round((count / length(all_laureates)) * 100, 1)
cat(i, ". ", country, ": ", count, " prizes (", percentage, "%)\n", sep = "")
}
## 1. USA: 403 prizes (39.6%)
## 2. United Kingdom: 95 prizes (9.3%)
## 3. Germany: 73 prizes (7.2%)
## 4. France: 39 prizes (3.8%)
## 5. Switzerland: 24 prizes (2.4%)
## 6. Japan: 19 prizes (1.9%)
## 7. Sweden: 18 prizes (1.8%)
## 8. Canada: 10 prizes (1%)
## 9. the Netherlands: 10 prizes (1%)
## 10. Denmark: 9 prizes (0.9%)
# Analyze dominance trends
cat("\n--- DOMINANCE ANALYSIS ---\n")
##
## --- DOMINANCE ANALYSIS ---
# Calculate US share over time
us_dominance <- data.frame()
for (decade in decades) {
if (decade >= 1920) { # US became significant after WWI
decade_counts <- unlist(decade_data[[as.character(decade)]])
total_prizes <- sum(decade_counts)
us_prizes <- ifelse("USA" %in% names(decade_counts), decade_counts[["USA"]], 0)
# Handle case where US has no prizes in a decade
us_percentage <- round((us_prizes / total_prizes) * 100, 1)
us_dominance <- rbind(us_dominance, data.frame(
decade = decade,
us_prizes = us_prizes,
total_prizes = total_prizes,
us_percentage = us_percentage
))
}
}
cat("US Dominance in Nobel Prizes:\n")
## US Dominance in Nobel Prizes:
for (i in 1:nrow(us_dominance)) {
cat(us_dominance$decade[i], "s: ", us_dominance$us_percentage[i], "% (",
us_dominance$us_prizes[i], "/", us_dominance$total_prizes[i], " prizes)\n", sep = "")
}
## 1920s: 6.1% (2/33 prizes)
## 1930s: 28.2% (11/39 prizes)
## 1940s: 50% (15/30 prizes)
## 1950s: 56.4% (31/55 prizes)
## 1960s: 48.3% (29/60 prizes)
## 1970s: 56.4% (44/78 prizes)
## 1980s: 60.5% (46/76 prizes)
## 1990s: 76% (57/75 prizes)
## 2000s: 69.1% (67/97 prizes)
## 2010s: 59.8% (58/97 prizes)
## 2020s: 65.6% (40/61 prizes)
# Emerging countries in recent decades
cat("\n--- EMERGING COUNTRIES (Last 30 years) ---\n")
##
## --- EMERGING COUNTRIES (Last 30 years) ---
recent_decades <- c(1990, 2000, 2010, 2020)
for (decade in recent_decades) {
if (as.character(decade) %in% names(decade_data)) {
cat(decade, "s - New countries winning first prizes:\n", sep = "")
decade_counts <- unlist(decade_data[[as.character(decade)]])
# Compare with previous decades to find new countries
previous_countries <- c()
for (prev_decade in (decade-10):(decade-30)) {
# Look back 10-30 years to see if country existed before
if (as.character(prev_decade) %in% names(decade_data)) {
previous_countries <- c(previous_countries, names(decade_data[[as.character(prev_decade)]]))
}
}
new_countries <- setdiff(names(decade_counts), unique(previous_countries))
# Find countries in current decade that weren't in previous decades
if (length(new_countries) > 0) {
for (country in new_countries[1:min(5, length(new_countries))]) {
cat(" β’ ", country, " (", decade_counts[country], " prizes)\n", sep = "")
}
} else {
cat(" No new countries\n")
}
}
}
## 1990s - New countries winning first prizes:
## No new countries
## 2000s - New countries winning first prizes:
## β’ Israel (4 prizes)
## β’ Australia (1 prizes)
## β’ Russia (2 prizes)
## 2010s - New countries winning first prizes:
## β’ Belgium (1 prizes)
## β’ China (1 prizes)
## 2020s - New countries winning first prizes:
## β’ Austria (1 prizes)
## β’ Italy (1 prizes)
## β’ Hungary (1 prizes)
What This Analysis Reveals:
**Historical power shifts ** in scientific/cultural leadership US dominance timeline and percentage share over decades
**Globalization trends ** new countries entering Nobel scene
**Geopolitical patterns ** how world events affect prize distribution
Geographic & Field Distribution Analysis (out of curiosity)
cat("\n === QUESTION 4-Extended: How has the geographic distribution of Nobel prizes changed over time and field? ===\n")
##
## === QUESTION 4-Extended: How has the geographic distribution of Nobel prizes changed over time and field? ===
# Analyze by decades AND fields
decade_data <- list() # Store counts by decade and country
decade_field_data <- list() # Store counts by decade, country AND field
country_totals <- list() # Store overall counts by country
field_country_data <- list() # Store field-specific country distributions
for (laureate in all_laureates) {
if (!is.null(laureate$nobelPrizes)) {
for (prize in laureate$nobelPrizes) {
if (!is.null(prize$awardYear)) {
year <- as.numeric(prize$awardYear)
decade <- floor(year / 10) * 10
category <- prize$category$en # Extract the Nobel field/category
# Get country from affiliation
country <- NULL
if (!is.null(prize$prizeAffiliation) && !is.null(prize$prizeAffiliation$country)) {
country <- prize$prizeAffiliation$country$en
} else if (!is.null(prize$affiliations) && length(prize$affiliations) > 0) {
for (affil in prize$affiliations) {
if (!is.null(affil$country)) {
country <- affil$country$en
break
}
}
}
if (!is.null(country) && country != "" && !is.null(category)) {
# Initialize decade if needed
if (is.null(decade_data[[as.character(decade)]])) {
decade_data[[as.character(decade)]] <- list()
}
if (is.null(decade_data[[as.character(decade)]][[country]])) {
decade_data[[as.character(decade)]][[country]] <- 0
}
decade_data[[as.character(decade)]][[country]] <-
decade_data[[as.character(decade)]][[country]] + 1
# Track by decade AND field
decade_field_key <- paste0(decade, "_", category)
if (is.null(decade_field_data[[decade_field_key]])) {
decade_field_data[[decade_field_key]] <- list()
}
if (is.null(decade_field_data[[decade_field_key]][[country]])) {
decade_field_data[[decade_field_key]][[country]] <- 0
}
decade_field_data[[decade_field_key]][[country]] <-
decade_field_data[[decade_field_key]][[country]] + 1
# Track overall totals
if (is.null(country_totals[[country]])) {
country_totals[[country]] <- 0
}
country_totals[[country]] <- country_totals[[country]] + 1
# Track field-specific country data
if (is.null(field_country_data[[category]])) {
field_country_data[[category]] <- list()
}
if (is.null(field_country_data[[category]][[country]])) {
field_country_data[[category]][[country]] <- 0
}
field_country_data[[category]][[country]] <-
field_country_data[[category]][[country]] + 1
}
}
}
}
}
cat("FIELD-SPECIFIC GEOGRAPHIC DISTRIBUTION:\n")
## FIELD-SPECIFIC GEOGRAPHIC DISTRIBUTION:
fields <- c("Physics", "Chemistry", "Physiology or Medicine", "Literature", "Peace", "Economic Sciences")
for (field in fields) {
if (field %in% names(field_country_data)) {
field_counts <- unlist(field_country_data[[field]])
top_countries <- names(sort(field_counts, decreasing = TRUE))[1:5]
cat(field, ":\n")
for (i in 1:length(top_countries)) {
country <- top_countries[i]
count <- field_counts[country]
total_field_prizes <- sum(field_counts)
percentage <- round((count / total_field_prizes) * 100, 1)
cat(" ", i, ". ", country, ": ", count, " prizes (", percentage, "%)\n", sep = "")
}
cat(" Total ", field, " prizes: ", total_field_prizes, "\n\n", sep = "")
}
}
## Physics :
## 1. USA: 112 prizes (49.1%)
## 2. United Kingdom: 26 prizes (11.4%)
## 3. Germany: 23 prizes (10.1%)
## 4. France: 15 prizes (6.6%)
## 5. Switzerland: 9 prizes (3.9%)
## Total Physics prizes: 228
##
## Chemistry :
## 1. USA: 86 prizes (43.4%)
## 2. Germany: 33 prizes (16.7%)
## 3. United Kingdom: 31 prizes (15.7%)
## 4. France: 10 prizes (5.1%)
## 5. Japan: 7 prizes (3.5%)
## Total Chemistry prizes: 198
##
## Physiology or Medicine :
## 1. USA: 121 prizes (52.4%)
## 2. United Kingdom: 32 prizes (13.9%)
## 3. Germany: 16 prizes (6.9%)
## 4. France: 10 prizes (4.3%)
## 5. Switzerland: 8 prizes (3.5%)
## Total Physiology or Medicine prizes: 231
##
## Peace :
## 1. USA: 3 prizes (75%)
## 2. France: 1 prizes (25%)
## 3. NA: NA prizes (NA%)
## 4. NA: NA prizes (NA%)
## 5. NA: NA prizes (NA%)
## Total Peace prizes: 4
##
## Economic Sciences :
## 1. USA: 81 prizes (83.5%)
## 2. United Kingdom: 6 prizes (6.2%)
## 3. France: 3 prizes (3.1%)
## 4. Norway: 2 prizes (2.1%)
## 5. Sweden: 1 prizes (1%)
## Total Economic Sciences prizes: 97
cat("FIELD DOMINANCE BY RECENT DECADES:\n")
## FIELD DOMINANCE BY RECENT DECADES:
recent_decades <- c(1990, 2000, 2010, 2020)
for (decade in recent_decades) {
cat(decade, "s Field Leaders:\n", sep = "")
for (field in fields) {
decade_field_key <- paste0(decade, "_", field)
if (decade_field_key %in% names(decade_field_data)) {
field_counts <- unlist(decade_field_data[[decade_field_key]])
if (length(field_counts) > 0) {
top_country <- names(sort(field_counts, decreasing = TRUE))[1]
top_count <- field_counts[top_country]
total_field_prizes <- sum(field_counts)
percentage <- round((top_count / total_field_prizes) * 100, 1)
cat(" β’ ", field, ": ", top_country, " (", top_count, "/", total_field_prizes,
" prizes, ", percentage, "%)\n", sep = "")
}
}
}
cat("\n")
}
## 1990s Field Leaders:
## β’ Physics: USA (16/21 prizes, 76.2%)
## β’ Chemistry: USA (11/17 prizes, 64.7%)
## β’ Physiology or Medicine: USA (16/20 prizes, 80%)
## β’ Economic Sciences: USA (14/17 prizes, 82.4%)
##
## 2000s Field Leaders:
## β’ Physics: USA (19/28 prizes, 67.9%)
## β’ Chemistry: USA (14/24 prizes, 58.3%)
## β’ Physiology or Medicine: USA (15/25 prizes, 60%)
## β’ Economic Sciences: USA (19/20 prizes, 95%)
##
## 2010s Field Leaders:
## β’ Physics: USA (12/26 prizes, 46.2%)
## β’ Chemistry: USA (16/27 prizes, 59.3%)
## β’ Physiology or Medicine: USA (12/24 prizes, 50%)
## β’ Economic Sciences: USA (18/20 prizes, 90%)
##
## 2020s Field Leaders:
## β’ Physics: USA (8/17 prizes, 47.1%)
## β’ Chemistry: USA (9/16 prizes, 56.2%)
## β’ Physiology or Medicine: USA (9/13 prizes, 69.2%)
## β’ Economic Sciences: USA (14/15 prizes, 93.3%)
# Enhanced dominance analysis by field
cat("--- US DOMINANCE BY FIELD ---\n")
## --- US DOMINANCE BY FIELD ---
field_dominance <- data.frame()
for (field in fields) {
if (field %in% names(field_country_data)) {
field_counts <- unlist(field_country_data[[field]])
total_prizes <- sum(field_counts)
us_prizes <- ifelse("USA" %in% names(field_counts), field_counts[["USA"]], 0)
us_percentage <- round((us_prizes / total_prizes) * 100, 1)
field_dominance <- rbind(field_dominance, data.frame(
field = field,
us_prizes = us_prizes,
total_prizes = total_prizes,
us_percentage = us_percentage,
stringsAsFactors = FALSE
))
}
}
field_dominance <- field_dominance %>% arrange(desc(us_percentage))
cat("US Dominance by Nobel Field:\n")
## US Dominance by Nobel Field:
for (i in 1:nrow(field_dominance)) {
cat(i, ". ", field_dominance$field[i], ": ", field_dominance$us_percentage[i],
"% (", field_dominance$us_prizes[i], "/", field_dominance$total_prizes[i], " prizes)\n", sep = "")
}
## 1. Economic Sciences: 83.5% (81/97 prizes)
## 2. Peace: 75% (3/4 prizes)
## 3. Physiology or Medicine: 52.4% (121/231 prizes)
## 4. Physics: 49.1% (112/228 prizes)
## 5. Chemistry: 43.4% (86/198 prizes)
# Enhanced emerging countries analysis with fields
cat("\n--- EMERGING COUNTRIES BY FIELD (Last 30 years) ---\n")
##
## --- EMERGING COUNTRIES BY FIELD (Last 30 years) ---
for (decade in recent_decades) {
if (as.character(decade) %in% names(decade_data)) {
cat(decade, "s - First-time winners by field:\n", sep = "")
for (field in fields) {
decade_field_key <- paste0(decade, "_", field)
if (decade_field_key %in% names(decade_field_data)) {
field_counts <- unlist(decade_field_data[[decade_field_key]])
# Find countries that won in this field for the first time
previous_countries <- c()
for (prev_decade in (decade-10):(decade-30)) {
prev_field_key <- paste0(prev_decade, "_", field)
if (prev_field_key %in% names(decade_field_data)) {
previous_countries <- c(previous_countries, names(decade_field_data[[prev_field_key]]))
}
}
new_countries <- setdiff(names(field_counts), unique(previous_countries))
if (length(new_countries) > 0) {
cat(" β’ ", field, ": ", paste(new_countries, collapse = ", "),
" (", length(new_countries), " new countries)\n", sep = "")
}
}
}
cat("\n")
}
}
## 1990s - First-time winners by field:
## β’ Physics: Canada, the Netherlands (2 new countries)
## β’ Chemistry: Denmark (1 new countries)
## β’ Economic Sciences: Germany (1 new countries)
##
## 2000s - First-time winners by field:
## β’ Physics: Japan, Russia (2 new countries)
## β’ Chemistry: Israel (1 new countries)
## β’ Physiology or Medicine: Australia (1 new countries)
## β’ Economic Sciences: Israel (1 new countries)
##
## 2010s - First-time winners by field:
## β’ Physics: Australia, Belgium (2 new countries)
## β’ Chemistry: the Netherlands (1 new countries)
## β’ Physiology or Medicine: Norway, Japan, China (3 new countries)
##
## 2020s - First-time winners by field:
## β’ Physics: Sweden, Austria, Italy (3 new countries)
## β’ Chemistry: Australia (1 new countries)
## β’ Physiology or Medicine: Hungary, Canada (2 new countries)
key lessons learned:
Global Talent Migration
25.5% of Nobel laureates changed countries, with Germany suffering the biggest βbrain drainβ (27 lost). This reveals how political instability and opportunity drive intellectual migration.
American Academic Dominance
The US produces 39.6% of all Nobel prizes, with University of California leading (43 laureates). This demonstrates how concentrated funding and research ecosystems create excellence hubs.
Aging of Innovation
Laureates are now 9.4 years older than before 2000, with Economics requiring the most career maturity (avg. 67 years). This suggests longer paths to breakthrough achievements.
Geopolitical Power Shifts
The Nobel landscape shifted from European dominance (1900s) to US supremacy (post-WWII), reflecting broader geopolitical and scientific leadership transitions.
Institutional Excellence Matters
9 of top 15 institutions are American, proving that sustained investment in research infrastructure yields long-term returns in breakthrough discoveries.
Field-Specific Patterns
Physics recognizes younger talent (avg. 57.7 years) while humanities require lifetime achievement (Literature: 65 years), showing different innovation timelines across disciplines.
Globalization is Slow but Real
While US dominance peaked at 76% in the 1990s, recent decades show more countries entering the Nobel arena, indicating gradual global knowledge diffusion.
These lessons reveal how scientific progress intertwines with geopolitics, institutional investment, and global talent flows over centuries.