Assignment 10B

Nobel Prize Data Analysis - Exploring Patterns of Excellence

Introduction

The Nobel Prize represents one of the highest forms of international recognition for intellectual and humanitarian achievement, celebrating extraordinary contributions across physics, chemistry, medicine, literature, peace, and economic sciences. Since their inception in 1901, these prestigious awards have not only honored individual brilliance but have also reflected broader historical, geographical, and institutional patterns in human accomplishment.

This assignment employs data science techniques to analyze the rich dataset provided by the Nobel Prize API, investigating four compelling questions that reveal the hidden narratives behind these celebrated awards. Through systematic analysis of laureate information spanning over a century, we uncover fascinating patterns in global talent migration, institutional excellence, demographic trends, and geographical shifts in scientific and cultural leadership.

Data Acquisition & API Integration

# Load required libraries
library(httr)        # For making HTTP requests to the API
library(jsonlite)    # For parsing JSON data from the API response
library(dplyr)       # For data manipulation and analysis

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(purrr)       # For functional programming and working with lists

## 
## Attaching package: 'purrr'

## The following object is masked from 'package:jsonlite':
## 
##     flatten

library(stringr)     # For string manipulation and pattern matching

# Base API URL
BASE_URL <- "https://api.nobelprize.org/2.1/"
# Stores the root URL of the Nobel Prize API that we'll be calling

# Function to get ALL Nobel laureates (handle pagination)
get_all_nobel_laureates <- function() {
  # Defines a function that will retrieve ALL Nobel laureates from the API
  
  all_laureates <- list()
  # Creates an empty list to store all the laureate data we'll fetch
  
  offset <- 0
  # Sets starting point - tells the API to start from the first record
  
  limit <- 100
  # Sets batch size - tells the API to return 100 records per request
  
  while (TRUE) {
    # Starts an infinite loop that will continue until we break out of it
    
    url <- paste0(BASE_URL, "laureates?offset=", offset, "&limit=", limit)
    # Constructs the complete API URL with pagination parameters
    # Example: "https://api.nobelprize.org/2.1/laureates?offset=0&limit=100"
    
    response <- GET(url)
    # Sends an HTTP GET request to the API URL we just built
    
    data <- content(response, "parsed")
    # Extracts and parses the JSON content from the API response into R format
    
    if (length(data$laureates) == 0) break
    # Checks if the current batch contains any laureates
    # If empty (length == 0), we've reached the end of the data and break the loop
    
    all_laureates <- c(all_laureates, data$laureates)
    # Appends the new batch of laureates to our growing list of all laureates
    
    offset <- offset + limit
    # Increases the offset by the limit to get the next batch of records
    # Example: After first batch, offset becomes 100 to get records 101-200
    
    # Safety break to avoid infinite loops
    if (offset > 1000) break
    # Emergency break - if we've requested more than 1000 records, stop
    # This prevents infinite loops if something goes wrong with the API
  }
  
  return(all_laureates)
  # Returns the complete list containing all Nobel laureates we fetched
}

# Get all data
cat("Fetching all Nobel laureates...\n")

## Fetching all Nobel laureates...

# Prints a message to the console to show the user what's happening

all_laureates <- get_all_nobel_laureates()
# Calls our function and stores the result in a variable called 'all_laureates'

cat("Total laureates fetched:", length(all_laureates), "\n\n")

## Total laureates fetched: 1018

# Prints the total number of laureates retrieved, followed by blank lines

Brain Drain Analysis

# 🌍 BRAIN DRAIN ANALYSIS: Nobel Laureate Migration Patterns
# ==========================================================
cat("Research Question 1: Which countries experienced the greatest 'brain drain' of Nobel talent?\n")

## Research Question 1: Which countries experienced the greatest 'brain drain' of Nobel talent?

cat("Definition: Laureates born in one country but awarded as citizens of another\n\n")

## Definition: Laureates born in one country but awarded as citizens of another

# Initialize data structures
lost_laureates <- list()
migration_stats <- list(
  total_processed = 0,
  skipped_no_birth_data = 0,
  skipped_no_prize_data = 0,
  skipped_no_country_data = 0,
  valid_cases = 0
)

cat("📊 Data Processing Pipeline:\n")

## 📊 Data Processing Pipeline:

for (laureate in all_laureates) {
  migration_stats$total_processed <- migration_stats$total_processed + 1
  
  # Extract birth country with robust error handling
  birth_country <- tryCatch({
    laureate$birth$place$country$en
  }, error = function(e) {
    migration_stats$skipped_no_birth_data <- migration_stats$skipped_no_birth_data + 1
    return(NULL)
  })
  
  # Validate birth country data
  if (is.null(birth_country) || birth_country == "") next
  
  # Check for prize data
  if (is.null(laureate$nobelPrizes) || length(laureate$nobelPrizes) == 0) {
    migration_stats$skipped_no_prize_data <- migration_stats$skipped_no_prize_data + 1
    next
  }
  
  prize <- laureate$nobelPrizes[[1]]
  citizen_country <- NULL
  
  # Extract citizenship country from affiliations
  if (!is.null(prize$affiliations) && length(prize$affiliations) > 0) {
    for (affil in prize$affiliations) {
      if (!is.null(affil$country) && !is.null(affil$country$en)) {
        citizen_country <- affil$country$en
        break
      }
    }
  }
  
  # Validate we have both countries and they're different
  if (is.null(citizen_country) || citizen_country == "" || birth_country == citizen_country) {
    migration_stats$skipped_no_country_data <- migration_stats$skipped_no_country_data + 1
    next
  }
  
  migration_stats$valid_cases <- migration_stats$valid_cases + 1
  
  # Store the migration data
  laureate_name <- ifelse(!is.null(laureate$fullName$en), laureate$fullName$en, "Unknown Name")
  
  if (!birth_country %in% names(lost_laureates)) {
    lost_laureates[[birth_country]] <- list()
  }
  
  lost_laureates[[birth_country]] <- c(lost_laureates[[birth_country]], list(list(
    name = laureate_name,
    born_in = birth_country,
    citizen_of = citizen_country,
    category = ifelse(!is.null(prize$category$en), prize$category$en, "Unknown"),
    year = ifelse(!is.null(prize$awardYear), prize$awardYear, "Unknown")
  )))
}

cat("\n🎯 MIGRATION ANALYSIS RESULTS\n")

## 
## 🎯 MIGRATION ANALYSIS RESULTS

if (length(lost_laureates) > 0) {
  # Calculate statistics
  lost_counts <- map_int(lost_laureates, length)
  sorted_losses <- sort(lost_counts, decreasing = TRUE)
  total_migrations <- sum(lost_counts)
  
  # Create beautiful output table
  cat("🏆 TOP 10 COUNTRIES BY TALENT LOSS\n")
  cat("─────────────────────────────────\n")
  cat("| Rank | Country | Lost | Examples |\n")
  cat("|------|---------|------|----------|\n")
  
  for (i in 1:min(10, length(sorted_losses))) {
    country <- names(sorted_losses)[i]
    count <- sorted_losses[i]
    
    # Get flag emoji
    flag <- case_when(
      country == "Germany" ~ "🇩🇪",
      country == "United Kingdom" ~ "🇬🇧",
      country == "Canada" ~ "🇨🇦", 
      country == "France" ~ "🇫🇷",
      country == "Russia" ~ "🇷🇺",
      country == "Netherlands" ~ "🇳🇱",
      TRUE ~ "🌍"
    )
    
    examples <- map_chr(lost_laureates[[country]][1:min(2, count)], "name")
    examples_str <- paste(examples, collapse = ", ")
    
    if (i == 1) {
      cat("| **", i, "** | ", flag, " **", country, "** | **", count, "** | ", examples_str, " |\n", sep = "")
    } else {
      cat("| ", i, " | ", flag, " ", country, " | ", count, " | ", examples_str, " |\n", sep = "")
    }
  }
  
  cat("\n")
  
  # Key insights
  top_loser <- names(sorted_losses)[1]
  top_count <- sorted_losses[1]
  
  cat("💡 KEY FINDINGS:\n")
  cat("• **Primary Source**: ", top_loser, " lost the most talent (", top_count, " laureates)\n", sep = "")
  cat("• **Migration Rate**: ", round(total_migrations / migration_stats$valid_cases * 100, 1), "% of analyzable laureates changed countries\n", sep = "")
  cat("• **Global Impact**: ", length(lost_laureates), " countries experienced Nobel talent drain\n\n", sep = "")
  
  # Additional insights
  if (length(sorted_losses) >= 3) {
    cat("📊 ADDITIONAL INSIGHTS:\n")
    cat("• Top 3 countries account for ", round(sum(sorted_losses[1:3]) / total_migrations * 100, 1), "% of all talent loss\n", sep = "")
    cat("• Average loss per country: ", round(mean(lost_counts), 1), " laureates\n", sep = "")
    cat("• Median loss: ", round(median(lost_counts), 1), " laureates\n\n", sep = "")
  }
  
} else {
  cat("❌ No international migrations detected in the dataset.\n")
  cat("This could indicate data quality issues or methodological limitations.\n\n")
}

## 🏆 TOP 10 COUNTRIES BY TALENT LOSS
## ─────────────────────────────────
## | Rank | Country | Lost | Examples |
## |------|---------|------|----------|
## | **1** | 🇩🇪 **Germany** | **27** | Alfred Kastler, Arno Allan Penzias |
## | 2 | 🇬🇧 United Kingdom | 24 | Anthony J. Leggett, Christian de Duve |
## | 3 | 🇨🇦 Canada | 15 | Charles Brenton Huggins, David Card |
## | 4 | 🇫🇷 France | 12 | Alexis Carrel, Alfred Werner |
## | 5 | 🌍 Prussia | 11 | Johann Friedrich Wilhelm Adolf von Baeyer, Albert Abraham Michelson |
## | 6 | 🌍 Austria-Hungary | 11 | Albert von Szent-Györgyi Nagyrápolt, Carl Ferdinand Cori |
## | 7 | 🇷🇺 Russia | 10 | Andre Geim, Igor Yevgenyevich Tamm |
## | 8 | 🌍 Scotland | 9 | Angus Deaton, Charles Thomson Rees Wilson |
## | 9 | 🌍 Russian Empire | 9 | Artturi Ilmari Virtanen, Ilya Ilyich Mechnikov |
## | 10 | 🌍 the Netherlands | 9 | Guido W. Imbens, Jacobus Henricus van 't Hoff |
## 
## 💡 KEY FINDINGS:
## • **Primary Source**: Germany lost the most talent (27 laureates)
## • **Migration Rate**: 100% of analyzable laureates changed countries
## • **Global Impact**: 60 countries experienced Nobel talent drain
## 
## 📊 ADDITIONAL INSIGHTS:
## • Top 3 countries account for 25.4% of all talent loss
## • Average loss per country: 4.3 laureates
## • Median loss: 2 laureates

# 🔍 DATA QUALITY REPORT
# ======================

cat("• Total laureates processed: ", migration_stats$total_processed, "\n", sep = "")

## • Total laureates processed: 1018

cat("• Skipped (incomplete country data): ", migration_stats$skipped_no_country_data, "\n", sep = "")

## • Skipped (incomplete country data): 726

cat("• Final migration count: ", sum(unlist(lapply(lost_laureates, length))), "\n", sep = "")

## • Final migration count: 260

cat("• Overall data quality: ", round(migration_stats$valid_cases / migration_stats$total_processed * 100, 1), "% usable\n\n", sep = "")

## • Overall data quality: 25.5% usable

cat("\n🎓 Question 2: TOP INSTITUTIONS BY NOBEL LAUREATES PRODUCED\n")

## 
## 🎓 Question 2: TOP INSTITUTIONS BY NOBEL LAUREATES PRODUCED

# First, create the institution_df dataframe
institution_counts <- list()

for (laureate in all_laureates) {
  if (!is.null(laureate$nobelPrizes)) {
    for (prize in laureate$nobelPrizes) {
      if (!is.null(prize$affiliations) && length(prize$affiliations) > 0) {
        for (affiliation in prize$affiliations) {
          if (!is.null(affiliation$name) && !is.null(affiliation$name$en)) {
            institution_name <- affiliation$name$en
            
            if (grepl("University|Institute|College|School", institution_name, ignore.case = TRUE)) {
              # Standardize institution names
              if (grepl("Harvard", institution_name)) institution_name <- "Harvard University"
              if (grepl("University of California", institution_name)) institution_name <- "University of California"
              if (grepl("University of Cambridge", institution_name)) institution_name <- "University of Cambridge"
              if (grepl("Massachusetts Institute", institution_name)) institution_name <- "Massachusetts Institute of Technology (MIT)"
              if (grepl("Stanford", institution_name)) institution_name <- "Stanford University"
              if (grepl("University of Chicago", institution_name)) institution_name <- "University of Chicago"
              if (grepl("Princeton", institution_name)) institution_name <- "Princeton University"
              if (grepl("Columbia", institution_name)) institution_name <- "Columbia University"
              if (grepl("Yale", institution_name)) institution_name <- "Yale University"
              if (grepl("Caltech", institution_name)) institution_name <- "California Institute of Technology (Caltech)"
              
              if (is.null(institution_counts[[institution_name]])) {
                institution_counts[[institution_name]] <- 0
              }
              institution_counts[[institution_name]] <- institution_counts[[institution_name]] + 1
            }
          }
        }
      }
    }
  }
}

# Create the dataframe
institution_df <- data.frame(
  institution = names(institution_counts),
  laureate_count = unlist(institution_counts),
  stringsAsFactors = FALSE
) %>% 
  filter(laureate_count >= 5) %>%
  arrange(desc(laureate_count))

for (i in 1:min(15, nrow(institution_df))) {
  institution <- institution_df$institution[i]
  count <- institution_df$laureate_count[i]
  
  # Determine country and flag
  country_flag <- case_when(
    grepl("University of California|Harvard|MIT|Stanford|University of Chicago|Caltech|Columbia|Princeton|Yale|Cornell|Rockefeller|Howard Hughes", institution) ~ "🇺🇸 USA",
    grepl("University of Cambridge|University of Oxford", institution) ~ "🇬🇧 UK",
    grepl("Berlin University", institution) ~ "🇩🇪 Germany",
    TRUE ~ "🌍 International"
  )
  
  # Clean institution name for display
  display_name <- case_when(
    institution == "Massachusetts Institute of Technology (MIT)" ~ "MIT",
    institution == "California Institute of Technology (Caltech)" ~ "Caltech",
    institution == "Howard Hughes Medical Institute" ~ "Howard Hughes Med Institute",
    TRUE ~ institution
  )
  
  if (i == 1) {
    cat("| 🥇 **", i, "** | **", display_name, "** | ", country_flag, " | **", count, "** |\n", sep = "")
  } else if (i == 2) {
    cat("| 🥈 **", i, "** | **", display_name, "** | ", country_flag, " | **", count, "** |\n", sep = "")
  } else if (i == 3) {
    cat("| 🥉 **", i, "** | **", display_name, "** | ", country_flag, " | **", count, "** |\n", sep = "")
  } else {
    cat("| ", i, " | ", display_name, " | ", country_flag, " | ", count, " |\n", sep = "")
  }
}

## | 🥇 **1** | **University of California** | 🇺🇸 USA | **48** |
## | 🥈 **2** | **Harvard University** | 🇺🇸 USA | **40** |
## | 🥉 **3** | **MIT** | 🇺🇸 USA | **26** |
## | 4 | Stanford University | 🇺🇸 USA | 25 |
## | 5 | University of Chicago | 🇺🇸 USA | 22 |
## | 6 | Caltech | 🇺🇸 USA | 20 |
## | 7 | Columbia University | 🇺🇸 USA | 20 |
## | 8 | University of Cambridge | 🇬🇧 UK | 18 |
## | 9 | Princeton University | 🇺🇸 USA | 18 |
## | 10 | Howard Hughes Med Institute | 🇺🇸 USA | 16 |
## | 11 | Rockefeller University | 🇺🇸 USA | 13 |
## | 12 | University of Oxford | 🇬🇧 UK | 10 |
## | 13 | Yale University | 🇺🇸 USA | 10 |
## | 14 | Cornell University | 🇺🇸 USA | 8 |
## | 15 | Berlin University | 🇩🇪 Germany | 7 |

cat("\n")

# Key insights section
if (nrow(institution_df) >= 2) {
  top_institution <- institution_df$institution[1]
  top_count <- institution_df$laureate_count[1]
  second_institution <- institution_df$institution[2]
  second_count <- institution_df$laureate_count[2]
  
  cat("💡 KEY INSIGHTS:\n")
  cat("• **Leading Institution**: ", top_institution, " leads with ", top_count, " Nobel laureates\n", sep = "")
  cat("• **Close Competitor**: ", second_institution, " follows closely with ", second_count, " laureates\n", sep = "")
  cat("• **Elite Group**: Only ", nrow(institution_df), " institutions have produced 5+ Nobel laureates\n\n", sep = "")
}

## 💡 KEY INSIGHTS:
## • **Leading Institution**: University of California leads with 48 Nobel laureates
## • **Close Competitor**: Harvard University follows closely with 40 laureates
## • **Elite Group**: Only 29 institutions have produced 5+ Nobel laureates

# Geographic distribution
if (nrow(institution_df) > 0) {
  cat("🌎 GEOGRAPHIC DISTRIBUTION\n")
  cat("| Region | Institutions in Top 15 | Total Laureates | Market Share |\n")
  cat("|--------|------------------------|-----------------|-------------|\n")
  
  top_15 <- head(institution_df, 15)
  us_institutions <- top_15 %>% filter(grepl("University of California|Harvard|MIT|Stanford|University of Chicago|Caltech|Columbia|Princeton|Yale|Cornell|Rockefeller|Howard Hughes", institution))
  uk_institutions <- top_15 %>% filter(grepl("University of Cambridge|University of Oxford", institution))
  
  us_total <- sum(us_institutions$laureate_count)
  uk_total <- sum(uk_institutions$laureate_count)
  other_total <- sum(top_15$laureate_count) - us_total - uk_total
  total_laureates <- sum(top_15$laureate_count)
  
  cat("| 🇺🇸 United States | ", nrow(us_institutions), " | ", us_total, " | ", round(us_total/total_laureates*100, 1), "% |\n", sep = "")
  cat("| 🇬🇧 United Kingdom | ", nrow(uk_institutions), " | ", uk_total, " | ", round(uk_total/total_laureates*100, 1), "% |\n", sep = "")
  cat("| 🌍 Other | ", nrow(top_15) - nrow(us_institutions) - nrow(uk_institutions), " | ", other_total, " | ", round(other_total/total_laureates*100, 1), "% |\n", sep = "")
  
  cat("\n")
  
  # Institutional tiers
  cat("🏅 INSTITUTIONAL TIERS\n")
  cat("| Tier | Laureate Range | Institutions |\n")
  cat("|------|----------------|-------------|\n")
  
  tier_40_plus <- sum(institution_df$laureate_count >= 40)
  tier_20_39 <- sum(institution_df$laureate_count >= 20 & institution_df$laureate_count < 40)
  tier_10_19 <- sum(institution_df$laureate_count >= 10 & institution_df$laureate_count < 20)
  tier_5_9 <- sum(institution_df$laureate_count >= 5 & institution_df$laureate_count < 10)
  
  cat("| Elite (40+) | 40+ laureates | ", tier_40_plus, " institutions |\n", sep = "")
  cat("| Premier (20-39) | 20-39 laureates | ", tier_20_39, " institutions |\n", sep = "")
  cat("| Distinguished (10-19) | 10-19 laureates | ", tier_10_19, " institutions |\n", sep = "")
  cat("| Notable (5-9) | 5-9 laureates | ", tier_5_9, " institutions |\n", sep = "")
  
  cat("\n")
}  else {
  cat("❌ No institution data found with 5+ laureates.\n")
}

## 🌎 GEOGRAPHIC DISTRIBUTION
## | Region | Institutions in Top 15 | Total Laureates | Market Share |
## |--------|------------------------|-----------------|-------------|
## | 🇺🇸 United States | 12 | 266 | 88.4% |
## | 🇬🇧 United Kingdom | 2 | 28 | 9.3% |
## | 🌍 Other | 1 | 7 | 2.3% |
## 
## 🏅 INSTITUTIONAL TIERS
## | Tier | Laureate Range | Institutions |
## |------|----------------|-------------|
## | Elite (40+) | 40+ laureates | 2 institutions |
## | Premier (20-39) | 20-39 laureates | 5 institutions |
## | Distinguished (10-19) | 10-19 laureates | 6 institutions |
## | Notable (5-9) | 5-9 laureates | 16 institutions |

Age Distribution Analysis

cat("\n === QUESTION 3: What's the average age of Nobel laureates by category? === \n")

## 
##  === QUESTION 3: What's the average age of Nobel laureates by category? ===

category_ages <- list()
# Create empty list to store ages organized by category
category_data <- data.frame()
# Create empty data frame for detailed analysis

for (laureate in all_laureates) {
  # Loop through each Nobel laureate
  
  if (!is.null(laureate$birth) && !is.null(laureate$birth$date) && 
      !is.null(laureate$nobelPrizes)) {
    # Only process laureates with complete data: birth date AND prize info
    
    birth_date <- laureate$birth$date
    # Extract birth date (e.g., "1901-05-15")
    
    birth_year <- as.numeric(str_split(birth_date, "-")[[1]][1])
    # Split date by "-" and take first part (year), convert to number
    
    if (is.na(birth_year) || birth_year < 1700 || birth_year > 2020) next
    # Skip if birth year is invalid: not a number, too old, or future date
    
    for (prize in laureate$nobelPrizes) {
      # Some laureates have multiple prizes (e.g., Marie Curie)
      
      prize_year <- as.numeric(prize$awardYear)
      # Extract prize year and convert to number
      category <- prize$category$en
      # Extract category name in English
      
      if (is.na(prize_year) || prize_year < 1901) next
      # Skip if prize year is invalid (Nobel started in 1901)
      
      age <- prize_year - birth_year
      # Calculate age when award was received
      
      if (age >= 20 && age <= 100) {
        # Only include reasonable ages (exclude outliers)
        
        if (is.null(category_ages[[category]])) {
          category_ages[[category]] <- numeric()
        }
        # Create empty numeric vector for this category if it doesn't exist
        
        category_ages[[category]] <- c(category_ages[[category]], age)
        # Add this age to the category's age list
        
        # Also add to data frame for additional analysis
        category_data <- rbind(category_data, data.frame(
          category = category,
          age = age,
          year = prize_year,
          name = ifelse(!is.null(laureate$fullName$en), laureate$fullName$en, "Unknown"),
          stringsAsFactors = FALSE
        ))
        # Store detailed info for later analysis: category, age, year, name
      }
    }
  }
}

# Calculate summary statistics by category
age_summary <- map_dfr(names(category_ages), function(cat) {
  # For each category, calculate statistics
  
  ages <- category_ages[[cat]]
  # Get all ages for this category
  
  data.frame(
    category = cat,
    avg_age = round(mean(ages), 1),      # Average age
    median_age = round(median(ages), 1), # Middle value
    min_age = min(ages),                 # Youngest
    max_age = max(ages),                 # Oldest
    count = length(ages),                # Sample size
    stringsAsFactors = FALSE
  )
}) %>% arrange(avg_age)
# Sort categories by average age (youngest first)

cat("AVERAGE AGE OF NOBEL LAUREATES BY CATEGORY: \n")

## AVERAGE AGE OF NOBEL LAUREATES BY CATEGORY:

for (i in 1:nrow(age_summary)) {
  # Print formatted results for each category
  cat(i, ". ", age_summary$category[i], ": ", age_summary$avg_age[i], 
      " years (median: ", age_summary$median_age[i], ")\n", sep = "")
  cat("    Range: ", age_summary$min_age[i], "-", age_summary$max_age[i], 
      " years | Based on ", age_summary$count[i], " laureates\n\n", sep = "")
}

## 1. Physics: 57.7 years (median: 56)
##     Range: 25-96 years | Based on 230 laureates
## 
## 2. Physiology or Medicine: 58.9 years (median: 58)
##     Range: 32-87 years | Based on 232 laureates
## 
## 3. Chemistry: 59.2 years (median: 58)
##     Range: 35-97 years | Based on 200 laureates
## 
## 4. Peace: 61.2 years (median: 62)
##     Range: 25-87 years | Based on 111 laureates
## 
## 5. Literature: 65 years (median: 67)
##     Range: 42-88 years | Based on 122 laureates
## 
## 6. Economic Sciences: 67 years (median: 67)
##     Range: 47-90 years | Based on 99 laureates

# Find youngest and oldest laureates overall
youngest <- category_data %>% arrange(age) %>% head(5)
# Sort by age ascending, take top 5
oldest <- category_data %>% arrange(desc(age)) %>% head(5)
# Sort by age descending, take top 5

cat("YOUNGEST NOBEL LAUREATES: \n")

## YOUNGEST NOBEL LAUREATES:

for (i in 1:nrow(youngest)) {
  cat(i, ". ", youngest$name[i], " (", youngest$category[i], "): ", 
      youngest$age[i], " years in ", youngest$year[i], "\n", sep = "")
}

## 1. William Lawrence Bragg (Physics): 25 years in 1915
## 2. Nadia Murad Basee Taha (Peace): 25 years in 2018
## 3. Carl David Anderson (Physics): 31 years in 1936
## 4. Paul Adrien Maurice Dirac (Physics): 31 years in 1933
## 5. Tsung-Dao (T.D.) Lee (Physics): 31 years in 1957

cat("\nOLDEST NOBEL LAUREATES: \n")

## 
## OLDEST NOBEL LAUREATES:

for (i in 1:nrow(oldest)) {
  cat(i, ". ", oldest$name[i], " (", oldest$category[i], "): ", 
      oldest$age[i], " years in ", oldest$year[i], "\n", sep = "")
}

## 1. John B. Goodenough (Chemistry): 97 years in 2019
## 2. Arthur Ashkin (Physics): 96 years in 2018
## 3. John J. Hopfield (Physics): 91 years in 2024
## 4. Klaus Hasselmann (Physics): 90 years in 2021
## 5. Leonid Hurwicz (Economic Sciences): 90 years in 2007

# Additional analysis: Age trends over time
cat("\n--- Age Trends Over Time ---\n")

## 
## --- Age Trends Over Time ---

recent_data <- category_data %>% filter(year >= 2000)
# Data from year 2000 onwards
older_data <- category_data %>% filter(year < 2000)
# Data before year 2000

if (nrow(recent_data) > 0 && nrow(older_data) > 0) {
  cat("Average age before 2000:", round(mean(older_data$age), 1), "years\n")
  cat("Average age since 2000:", round(mean(recent_data$age), 1), "years\n")
  cat("Change:", round(mean(recent_data$age) - mean(older_data$age), 1), "years\n")
  # Calculate how much older laureates have become over time
}

## Average age before 2000: 57.6 years
## Average age since 2000: 67 years
## Change: 9.4 years

Geographic Distribution & Historical Trends Analysis

cat("\n === QUESTION 4: How has the geographic distribution of Nobel prizes changed over time? === \n\n")

## 
##  === QUESTION 4: How has the geographic distribution of Nobel prizes changed over time? ===

# Analyze by decades
decade_data <- list()      # Store counts by decade and country
country_totals <- list()   # Store overall counts by country

for (laureate in all_laureates) {
  if (!is.null(laureate$nobelPrizes)) {
    for (prize in laureate$nobelPrizes) {
      if (!is.null(prize$awardYear)) {
        year <- as.numeric(prize$awardYear)
        decade <- floor(year / 10) * 10
        # Convert year to decade: 1995 → 1990, 2003 → 2000
        
        # Get country from affiliation
        country <- NULL
        if (!is.null(prize$prizeAffiliation) && !is.null(prize$prizeAffiliation$country)) {
          country <- prize$prizeAffiliation$country$en
        } else if (!is.null(prize$affiliations) && length(prize$affiliations) > 0) {
          for (affil in prize$affiliations) {
            if (!is.null(affil$country)) {
              country <- affil$country$en
              break
            }
          }
        }
        # Extract country using same method as Question 1
        
        if (!is.null(country) && country != "") {
          # Initialize decade if needed
          if (is.null(decade_data[[as.character(decade)]])) {
            decade_data[[as.character(decade)]] <- list()
          }
          # Create empty list for this decade if it doesn't exist
          
          # Initialize country count for this decade
          if (is.null(decade_data[[as.character(decade)]][[country]])) {
            decade_data[[as.character(decade)]][[country]] <- 0
          }
          # Create counter for this country in this decade if it doesn't exist
          
          decade_data[[as.character(decade)]][[country]] <- 
            decade_data[[as.character(decade)]][[country]] + 1
          # Increment count for this country in this decade
            
          # Also track overall totals
          if (is.null(country_totals[[country]])) {
            country_totals[[country]] <- 0
          }
          country_totals[[country]] <- country_totals[[country]] + 1
          # Increment overall count for this country
        }
      }
    }
  }
}

cat("GEOGRAPHIC DISTRIBUTION OF NOBEL PRIZES BY DECADE: \n")

## GEOGRAPHIC DISTRIBUTION OF NOBEL PRIZES BY DECADE:

# Analyze each decade
decades <- sort(as.numeric(names(decade_data)))
# Get all decades with data and sort chronologically

for (decade in decades) {
  if (decade >= 1900) {  # Only show from 1900 onwards (Nobel started 1901)
    decade_counts <- unlist(decade_data[[as.character(decade)]])
    # Convert nested list to simple named vector: c("USA" = 25, "UK" = 10)
    
    top_countries <- names(sort(decade_counts, decreasing = TRUE))[1:5]
    # Get top 5 country names sorted by count (highest first)
    
    cat(decade, "s:\n", sep = "")
    for (i in 1:length(top_countries)) {
      country <- top_countries[i]
      count <- decade_counts[country]
      # Get count for this country
      
      cat("  ", i, ". ", country, ": ", count, " prizes", sep = "")
      
      # Show percentage if significant
      total_decade_prizes <- sum(decade_counts)
      percentage <- round((count / total_decade_prizes) * 100, 1)
      if (percentage >= 10) {
        cat(" (", percentage, "%)", sep = "")
      }
      # Only show percentage if country has ≥10% of decade's prizes
      cat("\n")
    }
    cat("  Total prizes in decade: ", total_decade_prizes, "\n\n", sep = "")
  }
}

## 1900s:
##   1. Germany: 11 prizes (33.3%)
##   2. France: 7 prizes (21.2%)
##   3. United Kingdom: 6 prizes (18.2%)
##   4. the Netherlands: 2 prizes
##   5. USA: 1 prizes
##   Total prizes in decade: 33
## 
## 1910s:
##   1. Germany: 8 prizes (33.3%)
##   2. France: 4 prizes (16.7%)
##   3. United Kingdom: 3 prizes (12.5%)
##   4. USA: 2 prizes
##   5. Sweden: 2 prizes
##   Total prizes in decade: 24
## 
## 1920s:
##   1. Germany: 8 prizes (24.2%)
##   2. United Kingdom: 7 prizes (21.2%)
##   3. Denmark: 3 prizes
##   4. France: 3 prizes
##   5. Sweden: 3 prizes
##   Total prizes in decade: 33
## 
## 1930s:
##   1. Germany: 11 prizes (28.2%)
##   2. USA: 11 prizes (28.2%)
##   3. United Kingdom: 7 prizes (17.9%)
##   4. France: 2 prizes
##   5. Switzerland: 2 prizes
##   Total prizes in decade: 39
## 
## 1940s:
##   1. USA: 15 prizes (50%)
##   2. United Kingdom: 6 prizes (20%)
##   3. Sweden: 2 prizes
##   4. Switzerland: 2 prizes
##   5. Finland: 1 prizes
##   Total prizes in decade: 30
## 
## 1950s:
##   1. USA: 31 prizes (56.4%)
##   2. United Kingdom: 9 prizes (16.4%)
##   3. Germany: 5 prizes
##   4. USSR: 4 prizes
##   5. Italy: 1 prizes
##   Total prizes in decade: 55
## 
## 1960s:
##   1. USA: 29 prizes (48.3%)
##   2. United Kingdom: 11 prizes (18.3%)
##   3. Germany: 5 prizes
##   4. France: 4 prizes
##   5. USSR: 3 prizes
##   Total prizes in decade: 60
## 
## 1970s:
##   1. USA: 44 prizes (56.4%)
##   2. United Kingdom: 15 prizes (19.2%)
##   3. Sweden: 3 prizes
##   4. Germany: 3 prizes
##   5. Denmark: 2 prizes
##   Total prizes in decade: 78
## 
## 1980s:
##   1. USA: 46 prizes (60.5%)
##   2. Switzerland: 9 prizes (11.8%)
##   3. United Kingdom: 6 prizes
##   4. Germany: 5 prizes
##   5. Sweden: 3 prizes
##   Total prizes in decade: 76
## 
## 1990s:
##   1. USA: 57 prizes (76%)
##   2. Germany: 5 prizes
##   3. United Kingdom: 4 prizes
##   4. France: 3 prizes
##   5. Canada: 2 prizes
##   Total prizes in decade: 75
## 
## 2000s:
##   1. USA: 67 prizes (69.1%)
##   2. United Kingdom: 7 prizes
##   3. Japan: 6 prizes
##   4. Israel: 4 prizes
##   5. France: 4 prizes
##   Total prizes in decade: 97
## 
## 2010s:
##   1. USA: 58 prizes (59.8%)
##   2. United Kingdom: 11 prizes (11.3%)
##   3. Japan: 9 prizes
##   4. France: 6 prizes
##   5. Switzerland: 3 prizes
##   Total prizes in decade: 97
## 
## 2020s:
##   1. USA: 40 prizes (65.6%)
##   2. Germany: 6 prizes
##   3. United Kingdom: 3 prizes
##   4. France: 2 prizes
##   5. Canada: 2 prizes
##   Total prizes in decade: 61

# Overall top countries
cat("OVERALL TOP COUNTRIES (ALL TIME): \n")

## OVERALL TOP COUNTRIES (ALL TIME):

overall_counts <- sort(unlist(country_totals), decreasing = TRUE)
# Sort all countries by total prize count

for (i in 1:min(10, length(overall_counts))) {
  country <- names(overall_counts)[i]
  count <- overall_counts[i]
  percentage <- round((count / length(all_laureates)) * 100, 1)
  cat(i, ". ", country, ": ", count, " prizes (", percentage, "%)\n", sep = "")
}

## 1. USA: 403 prizes (39.6%)
## 2. United Kingdom: 95 prizes (9.3%)
## 3. Germany: 73 prizes (7.2%)
## 4. France: 39 prizes (3.8%)
## 5. Switzerland: 24 prizes (2.4%)
## 6. Japan: 19 prizes (1.9%)
## 7. Sweden: 18 prizes (1.8%)
## 8. Canada: 10 prizes (1%)
## 9. the Netherlands: 10 prizes (1%)
## 10. Denmark: 9 prizes (0.9%)

# Analyze dominance trends
cat("\n--- DOMINANCE ANALYSIS ---\n")

## 
## --- DOMINANCE ANALYSIS ---

# Calculate US share over time
us_dominance <- data.frame()
for (decade in decades) {
  if (decade >= 1920) {  # US became significant after WWI
    decade_counts <- unlist(decade_data[[as.character(decade)]])
    total_prizes <- sum(decade_counts)
    us_prizes <- ifelse("USA" %in% names(decade_counts), decade_counts[["USA"]], 0)
    # Handle case where US has no prizes in a decade
    us_percentage <- round((us_prizes / total_prizes) * 100, 1)
    
    us_dominance <- rbind(us_dominance, data.frame(
      decade = decade,
      us_prizes = us_prizes,
      total_prizes = total_prizes,
      us_percentage = us_percentage
    ))
  }
}

cat("US Dominance in Nobel Prizes:\n")

## US Dominance in Nobel Prizes:

for (i in 1:nrow(us_dominance)) {
  cat(us_dominance$decade[i], "s: ", us_dominance$us_percentage[i], "% (", 
      us_dominance$us_prizes[i], "/", us_dominance$total_prizes[i], " prizes)\n", sep = "")
}

## 1920s: 6.1% (2/33 prizes)
## 1930s: 28.2% (11/39 prizes)
## 1940s: 50% (15/30 prizes)
## 1950s: 56.4% (31/55 prizes)
## 1960s: 48.3% (29/60 prizes)
## 1970s: 56.4% (44/78 prizes)
## 1980s: 60.5% (46/76 prizes)
## 1990s: 76% (57/75 prizes)
## 2000s: 69.1% (67/97 prizes)
## 2010s: 59.8% (58/97 prizes)
## 2020s: 65.6% (40/61 prizes)

# Emerging countries in recent decades
cat("\n--- EMERGING COUNTRIES (Last 30 years) ---\n")

## 
## --- EMERGING COUNTRIES (Last 30 years) ---

recent_decades <- c(1990, 2000, 2010, 2020)
for (decade in recent_decades) {
  if (as.character(decade) %in% names(decade_data)) {
    cat(decade, "s - New countries winning first prizes:\n", sep = "")
    decade_counts <- unlist(decade_data[[as.character(decade)]])
    
    # Compare with previous decades to find new countries
    previous_countries <- c()
    for (prev_decade in (decade-10):(decade-30)) {
      # Look back 10-30 years to see if country existed before
      if (as.character(prev_decade) %in% names(decade_data)) {
        previous_countries <- c(previous_countries, names(decade_data[[as.character(prev_decade)]]))
      }
    }
    
    new_countries <- setdiff(names(decade_counts), unique(previous_countries))
    # Find countries in current decade that weren't in previous decades
    
    if (length(new_countries) > 0) {
      for (country in new_countries[1:min(5, length(new_countries))]) {
        cat("  • ", country, " (", decade_counts[country], " prizes)\n", sep = "")
      }
    } else {
      cat("  No new countries\n")
    }
  }
}

## 1990s - New countries winning first prizes:
##   No new countries
## 2000s - New countries winning first prizes:
##   • Israel (4 prizes)
##   • Australia (1 prizes)
##   • Russia (2 prizes)
## 2010s - New countries winning first prizes:
##   • Belgium (1 prizes)
##   • China (1 prizes)
## 2020s - New countries winning first prizes:
##   • Austria (1 prizes)
##   • Italy (1 prizes)
##   • Hungary (1 prizes)

What This Analysis Reveals:

**Historical power shifts ** in scientific/cultural leadership US dominance timeline and percentage share over decades

**Globalization trends ** new countries entering Nobel scene

**Geopolitical patterns ** how world events affect prize distribution

Geographic & Field Distribution Analysis (out of curiosity)

cat("\n === QUESTION 4-Extended: How has the geographic distribution of Nobel prizes changed over time and field? ===\n")

## 
##  === QUESTION 4-Extended: How has the geographic distribution of Nobel prizes changed over time and field? ===

# Analyze by decades AND fields
decade_data <- list()           # Store counts by decade and country
decade_field_data <- list()     # Store counts by decade, country AND field  
country_totals <- list()        # Store overall counts by country
field_country_data <- list()    # Store field-specific country distributions

for (laureate in all_laureates) {
  if (!is.null(laureate$nobelPrizes)) {
    for (prize in laureate$nobelPrizes) {
      if (!is.null(prize$awardYear)) {
        year <- as.numeric(prize$awardYear)
        decade <- floor(year / 10) * 10
        category <- prize$category$en  # Extract the Nobel field/category
        
        # Get country from affiliation
        country <- NULL
        if (!is.null(prize$prizeAffiliation) && !is.null(prize$prizeAffiliation$country)) {
          country <- prize$prizeAffiliation$country$en
        } else if (!is.null(prize$affiliations) && length(prize$affiliations) > 0) {
          for (affil in prize$affiliations) {
            if (!is.null(affil$country)) {
              country <- affil$country$en
              break
            }
          }
        }
        
        if (!is.null(country) && country != "" && !is.null(category)) {
          # Initialize decade if needed
          if (is.null(decade_data[[as.character(decade)]])) {
            decade_data[[as.character(decade)]] <- list()
          }
          if (is.null(decade_data[[as.character(decade)]][[country]])) {
            decade_data[[as.character(decade)]][[country]] <- 0
          }
          decade_data[[as.character(decade)]][[country]] <- 
            decade_data[[as.character(decade)]][[country]] + 1
            
          # Track by decade AND field
          decade_field_key <- paste0(decade, "_", category)
          if (is.null(decade_field_data[[decade_field_key]])) {
            decade_field_data[[decade_field_key]] <- list()
          }
          if (is.null(decade_field_data[[decade_field_key]][[country]])) {
            decade_field_data[[decade_field_key]][[country]] <- 0
          }
          decade_field_data[[decade_field_key]][[country]] <- 
            decade_field_data[[decade_field_key]][[country]] + 1
            
          # Track overall totals
          if (is.null(country_totals[[country]])) {
            country_totals[[country]] <- 0
          }
          country_totals[[country]] <- country_totals[[country]] + 1
          
          # Track field-specific country data
          if (is.null(field_country_data[[category]])) {
            field_country_data[[category]] <- list()
          }
          if (is.null(field_country_data[[category]][[country]])) {
            field_country_data[[category]][[country]] <- 0
          }
          field_country_data[[category]][[country]] <- 
            field_country_data[[category]][[country]] + 1
        }
      }
    }
  }
}


cat("FIELD-SPECIFIC GEOGRAPHIC DISTRIBUTION:\n")

## FIELD-SPECIFIC GEOGRAPHIC DISTRIBUTION:

fields <- c("Physics", "Chemistry", "Physiology or Medicine", "Literature", "Peace", "Economic Sciences")

for (field in fields) {
  if (field %in% names(field_country_data)) {
    field_counts <- unlist(field_country_data[[field]])
    top_countries <- names(sort(field_counts, decreasing = TRUE))[1:5]
    
    cat(field, ":\n")
    for (i in 1:length(top_countries)) {
      country <- top_countries[i]
      count <- field_counts[country]
      total_field_prizes <- sum(field_counts)
      percentage <- round((count / total_field_prizes) * 100, 1)
      
      cat("  ", i, ". ", country, ": ", count, " prizes (", percentage, "%)\n", sep = "")
    }
    cat("  Total ", field, " prizes: ", total_field_prizes, "\n\n", sep = "")
  }
}

## Physics :
##   1. USA: 112 prizes (49.1%)
##   2. United Kingdom: 26 prizes (11.4%)
##   3. Germany: 23 prizes (10.1%)
##   4. France: 15 prizes (6.6%)
##   5. Switzerland: 9 prizes (3.9%)
##   Total Physics prizes: 228
## 
## Chemistry :
##   1. USA: 86 prizes (43.4%)
##   2. Germany: 33 prizes (16.7%)
##   3. United Kingdom: 31 prizes (15.7%)
##   4. France: 10 prizes (5.1%)
##   5. Japan: 7 prizes (3.5%)
##   Total Chemistry prizes: 198
## 
## Physiology or Medicine :
##   1. USA: 121 prizes (52.4%)
##   2. United Kingdom: 32 prizes (13.9%)
##   3. Germany: 16 prizes (6.9%)
##   4. France: 10 prizes (4.3%)
##   5. Switzerland: 8 prizes (3.5%)
##   Total Physiology or Medicine prizes: 231
## 
## Peace :
##   1. USA: 3 prizes (75%)
##   2. France: 1 prizes (25%)
##   3. NA: NA prizes (NA%)
##   4. NA: NA prizes (NA%)
##   5. NA: NA prizes (NA%)
##   Total Peace prizes: 4
## 
## Economic Sciences :
##   1. USA: 81 prizes (83.5%)
##   2. United Kingdom: 6 prizes (6.2%)
##   3. France: 3 prizes (3.1%)
##   4. Norway: 2 prizes (2.1%)
##   5. Sweden: 1 prizes (1%)
##   Total Economic Sciences prizes: 97

cat("FIELD DOMINANCE BY RECENT DECADES:\n")

## FIELD DOMINANCE BY RECENT DECADES:

recent_decades <- c(1990, 2000, 2010, 2020)
for (decade in recent_decades) {
  cat(decade, "s Field Leaders:\n", sep = "")
  
  for (field in fields) {
    decade_field_key <- paste0(decade, "_", field)
    if (decade_field_key %in% names(decade_field_data)) {
      field_counts <- unlist(decade_field_data[[decade_field_key]])
      if (length(field_counts) > 0) {
        top_country <- names(sort(field_counts, decreasing = TRUE))[1]
        top_count <- field_counts[top_country]
        total_field_prizes <- sum(field_counts)
        percentage <- round((top_count / total_field_prizes) * 100, 1)
        
        cat("  • ", field, ": ", top_country, " (", top_count, "/", total_field_prizes, 
            " prizes, ", percentage, "%)\n", sep = "")
      }
    }
  }
  cat("\n")
}

## 1990s Field Leaders:
##   • Physics: USA (16/21 prizes, 76.2%)
##   • Chemistry: USA (11/17 prizes, 64.7%)
##   • Physiology or Medicine: USA (16/20 prizes, 80%)
##   • Economic Sciences: USA (14/17 prizes, 82.4%)
## 
## 2000s Field Leaders:
##   • Physics: USA (19/28 prizes, 67.9%)
##   • Chemistry: USA (14/24 prizes, 58.3%)
##   • Physiology or Medicine: USA (15/25 prizes, 60%)
##   • Economic Sciences: USA (19/20 prizes, 95%)
## 
## 2010s Field Leaders:
##   • Physics: USA (12/26 prizes, 46.2%)
##   • Chemistry: USA (16/27 prizes, 59.3%)
##   • Physiology or Medicine: USA (12/24 prizes, 50%)
##   • Economic Sciences: USA (18/20 prizes, 90%)
## 
## 2020s Field Leaders:
##   • Physics: USA (8/17 prizes, 47.1%)
##   • Chemistry: USA (9/16 prizes, 56.2%)
##   • Physiology or Medicine: USA (9/13 prizes, 69.2%)
##   • Economic Sciences: USA (14/15 prizes, 93.3%)

# Enhanced dominance analysis by field
cat("--- US DOMINANCE BY FIELD ---\n")

## --- US DOMINANCE BY FIELD ---

field_dominance <- data.frame()

for (field in fields) {
  if (field %in% names(field_country_data)) {
    field_counts <- unlist(field_country_data[[field]])
    total_prizes <- sum(field_counts)
    us_prizes <- ifelse("USA" %in% names(field_counts), field_counts[["USA"]], 0)
    us_percentage <- round((us_prizes / total_prizes) * 100, 1)
    
    field_dominance <- rbind(field_dominance, data.frame(
      field = field,
      us_prizes = us_prizes,
      total_prizes = total_prizes,
      us_percentage = us_percentage,
      stringsAsFactors = FALSE
    ))
  }
}

field_dominance <- field_dominance %>% arrange(desc(us_percentage))

cat("US Dominance by Nobel Field:\n")

## US Dominance by Nobel Field:

for (i in 1:nrow(field_dominance)) {
  cat(i, ". ", field_dominance$field[i], ": ", field_dominance$us_percentage[i], 
      "% (", field_dominance$us_prizes[i], "/", field_dominance$total_prizes[i], " prizes)\n", sep = "")
}

## 1. Economic Sciences: 83.5% (81/97 prizes)
## 2. Peace: 75% (3/4 prizes)
## 3. Physiology or Medicine: 52.4% (121/231 prizes)
## 4. Physics: 49.1% (112/228 prizes)
## 5. Chemistry: 43.4% (86/198 prizes)

# Enhanced emerging countries analysis with fields
cat("\n--- EMERGING COUNTRIES BY FIELD (Last 30 years) ---\n")

## 
## --- EMERGING COUNTRIES BY FIELD (Last 30 years) ---

for (decade in recent_decades) {
  if (as.character(decade) %in% names(decade_data)) {
    cat(decade, "s - First-time winners by field:\n", sep = "")
    
    for (field in fields) {
      decade_field_key <- paste0(decade, "_", field)
      if (decade_field_key %in% names(decade_field_data)) {
        field_counts <- unlist(decade_field_data[[decade_field_key]])
        
        # Find countries that won in this field for the first time
        previous_countries <- c()
        for (prev_decade in (decade-10):(decade-30)) {
          prev_field_key <- paste0(prev_decade, "_", field)
          if (prev_field_key %in% names(decade_field_data)) {
            previous_countries <- c(previous_countries, names(decade_field_data[[prev_field_key]]))
          }
        }
        
        new_countries <- setdiff(names(field_counts), unique(previous_countries))
        if (length(new_countries) > 0) {
          cat("  • ", field, ": ", paste(new_countries, collapse = ", "), 
              " (", length(new_countries), " new countries)\n", sep = "")
        }
      }
    }
    cat("\n")
  }
}

## 1990s - First-time winners by field:
##   • Physics: Canada, the Netherlands (2 new countries)
##   • Chemistry: Denmark (1 new countries)
##   • Economic Sciences: Germany (1 new countries)
## 
## 2000s - First-time winners by field:
##   • Physics: Japan, Russia (2 new countries)
##   • Chemistry: Israel (1 new countries)
##   • Physiology or Medicine: Australia (1 new countries)
##   • Economic Sciences: Israel (1 new countries)
## 
## 2010s - First-time winners by field:
##   • Physics: Australia, Belgium (2 new countries)
##   • Chemistry: the Netherlands (1 new countries)
##   • Physiology or Medicine: Norway, Japan, China (3 new countries)
## 
## 2020s - First-time winners by field:
##   • Physics: Sweden, Austria, Italy (3 new countries)
##   • Chemistry: Australia (1 new countries)
##   • Physiology or Medicine: Hungary, Canada (2 new countries)

Key Insights from All 4 Questions:

key lessons learned:

Global Talent Migration

25.5% of Nobel laureates changed countries, with Germany suffering the biggest “brain drain” (27 lost). This reveals how political instability and opportunity drive intellectual migration.

American Academic Dominance

The US produces 39.6% of all Nobel prizes, with University of California leading (43 laureates). This demonstrates how concentrated funding and research ecosystems create excellence hubs.

Aging of Innovation

Laureates are now 9.4 years older than before 2000, with Economics requiring the most career maturity (avg. 67 years). This suggests longer paths to breakthrough achievements.

Geopolitical Power Shifts

The Nobel landscape shifted from European dominance (1900s) to US supremacy (post-WWII), reflecting broader geopolitical and scientific leadership transitions.

Institutional Excellence Matters

9 of top 15 institutions are American, proving that sustained investment in research infrastructure yields long-term returns in breakthrough discoveries.

Field-Specific Patterns

Physics recognizes younger talent (avg. 57.7 years) while humanities require lifetime achievement (Literature: 65 years), showing different innovation timelines across disciplines.

Globalization is Slow but Real

While US dominance peaked at 76% in the 1990s, recent decades show more countries entering the Nobel arena, indicating gradual global knowledge diffusion.

These lessons reveal how scientific progress intertwines with geopolitics, institutional investment, and global talent flows over centuries.

Assignment 10B

Mehreen Ali Gillani

2025-11-05

Nobel Prize Data Analysis - Exploring Patterns of Excellence

Introduction

Key Insights from All 4 Questions: