Assignment 10b

Author

Mark Hamer

Introduction

This report uses the Nobel Prize public API to investigate patterns in Nobel Prize awards since 1901. We pull structured JSON data from the laureate endpoint, transform it into tidy data frames in R, and explore four questions about gender representation, institutional dominance, emigration patterns, and geographic concentration by prize category. No API key is required.

Setup & Data

 [1] "nobelPrizes_awardYear"           "nobelPrizes_sortOrder"          
 [3] "nobelPrizes_portion"             "nobelPrizes_dateAwarded"        
 [5] "nobelPrizes_prizeStatus"         "nobelPrizes_prizeAmount"        
 [7] "nobelPrizes_prizeAmountAdjusted" "nobelPrizes_affiliations"       
 [9] "nobelPrizes_links"               "nobelPrizes_category.en"        
[11] "nobelPrizes_category.no"         "nobelPrizes_category.se"        
[13] "nobelPrizes_categoryFullName.en" "nobelPrizes_categoryFullName.no"
[15] "nobelPrizes_categoryFullName.se" "nobelPrizes_motivation.en"      
[17] "nobelPrizes_motivation.se"       "nobelPrizes_motivation.no"      
[19] "nobelPrizes_residences"          "nobelPrizes_topMotivation.en"   
[21] "nobelPrizes_topMotivation.se"   

How Has Female Representation Changed Over Time?

gender_df <- laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  select(
    gender,
    year     = nobelPrizes_awardYear,
    category = nobelPrizes_category.en
  ) %>%
  filter(!is.na(gender), gender %in% c("male", "female")) %>%
  mutate(year = as.integer(year),
         decade = floor(year / 10) * 10)

gender_df %>%
  count(decade, gender) %>%
  ggplot(aes(x = decade, y = n, fill = gender)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Nobel Prize Gender Breakdown by Decade",
       x = "Decade", y = "Share of Laureates", fill = "Gender") +
  theme_minimal()

gender_trend <- laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  select(gender, year = nobelPrizes_awardYear) %>%
  filter(!is.na(gender), gender %in% c("male", "female")) %>%
  mutate(year = as.integer(year)) %>%
  count(year, gender) %>%
  complete(year, gender, fill = list(n = 0)) %>%
  group_by(year) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup() %>%
  filter(gender == "female")

ggplot(gender_trend, aes(x = year, y = pct)) +
  geom_smooth(method = "loess", color = "darkblue", se = TRUE) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Female Share of Nobel Prizes Over Time",
       x = "Year", y = "% of Prizes Won by Women") +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Note: This forecast uses a simple linear model fit on post-1970 data and assumes the upward trend in female representation continues at a constant rate. In reality, this may not hold, there is likely a ceiling effect as representation approaches parity, and the trend could accelerate or slow depending on changes in nomination practices, institutional culture, and the pipeline of women entering research careers. The shaded region shows the 95% prediction interval, reflecting meaningful uncertainty in any single future year. This model is best interpreted as a directional projection rather than a precise prediction.

Which Countries Lost the Most Laureates to Emigration?

migration_df <- laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  unnest(nobelPrizes_residences, names_sep = "_") %>%
  select(
    name          = knownName.en,
    birth_country = birth.place.countryNow.en,
    award_country = nobelPrizes_residences_country.en
  ) %>%
  filter(
    !is.na(birth_country),
    !is.na(award_country),
    birth_country != award_country
  )

losses <- migration_df %>%
  count(birth_country, sort = TRUE) %>%
  slice_head(n = 15)

losses %>%
  ggplot(aes(x = reorder(birth_country, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Countries That Lost the Most Laureates to Emigration",
       x = "Birth Country", y = "Number of Laureates") +
  theme_minimal()

Which Institutions Have Produced the Most Laureates?

institutions_df <- laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  unnest(nobelPrizes_affiliations, names_sep = "_") %>%
  select(institution = nobelPrizes_affiliations_name.en) %>%
  filter(!is.na(institution)) %>%
  count(institution, sort = TRUE) %>%
  slice_head(n = 15)

institutions_df %>%
  ggplot(aes(x = reorder(institution, n), y = n)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Top 15 Institutions by Nobel Laureate Count",
       x = "Institution", y = "Count") +
  theme_minimal()

Which Countries Dominate Specific Prize Categories?

country_cat_df <- laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  select(
    country  = birth.place.countryNow.en,
    category = nobelPrizes_category.en
  ) %>%
  filter(!is.na(country), !is.na(category))

# Get top 10 countries overall to keep the heatmap readable
top_countries <- country_cat_df %>%
  count(country, sort = TRUE) %>%
  slice_head(n = 10) %>%
  pull(country)

country_cat_df %>%
  filter(country %in% top_countries) %>%
  count(country, category) %>%
  ggplot(aes(x = category, y = country, fill = n)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightyellow", high = "darkred") +
  labs(title = "Nobel Prizes by Country and Category (Top 10 Countries)",
       x = "Category", y = "Country", fill = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))