Assignment 10B Nobel Prize

Author

Khandker Qaiduzzaman

Objective

The goal of this assignment is to use the Nobel Prize API to retrieve JSON data, transform it into tidy data frames in R, and explore the dataset to answer four data-driven questions.


Approach

This analysis focuses on retrieving structured JSON data from the Nobel Prize API, transforming it into tidy data using tidyverse principles, and preparing it for exploratory data analysis.


Step 1: Data Collection Using Nobel Prize API

The dataset is obtained from the Nobel Prize Developer Zone, which provides structured JSON data on Nobel laureates and prize awards.

Two endpoints are used:

  • Laureates Endpoint: contains personal information such as name, gender, birth date, and country of birth
  • Prizes Endpoint: contains award-level information such as year, category, and prize details

The data is retrieved in JSON format and converted into R data frames for further processing.


Step 2: JSON Structure and Data Preparation

The JSON data is parsed using the jsonlite package with flatten = TRUE to partially simplify nested structures.

However, the datasets still contain complex nested components such as:

- List-columns (e.g., nobelPrizes, laureates, links)
- Nested identifiers inside sub-structures
- Hierarchical fields generated from JSON paths (e.g., birth.place.country.en)
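As a minimal sketch of this behavior, the snippet below parses a small hand-written JSON string (a hypothetical stand-in for an API response, not real data) with flatten = TRUE. It is meant to show that nested objects become dot-separated column names while arrays of objects remain list-columns.

```r
library(jsonlite)

# Hypothetical miniature payload shaped like a laureates response (not real API data)
json_txt <- '{
  "laureates": [
    {"id": "1",
     "birth": {"place": {"country": {"en": "Country A"}}},
     "nobelPrizes": [{"awardYear": "1950"}]},
    {"id": "2",
     "birth": {"place": {"country": {"en": "Country B"}}},
     "nobelPrizes": [{"awardYear": "1960"}, {"awardYear": "1970"}]}
  ]
}'

toy <- fromJSON(json_txt, flatten = TRUE)$laureates

# Nested objects are flattened into a dot-separated column name
names(toy)

# The array of prize objects stays a list-column with one data frame per laureate
class(toy$nobelPrizes)
nrow(toy$nobelPrizes[[2]])
```

The list-column is what later requires unnesting before prize-level fields can be used.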

Initial inspection using head() and glimpse() is used to understand:

- Variable structure
- Key identifiers (especially id)
- Fields required for analysis (country, year, category)

Further data wrangling such as unnesting and restructuring will be required in later stages before analysis.


Data Analysis Workflow

The workflow begins by retrieving JSON data from the Nobel Prize API and converting it into data frames.

Next, both datasets are explored to understand their structure and nesting behavior. The laureates dataset contains individual-level data, while the prizes dataset contains award-level data with embedded laureate information.

Since the data is not structured in a traditional relational format, joins require extracting nested identifiers (such as id) from list-columns rather than relying on a simple shared key.
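The idea can be sketched on toy data. The tibbles below are hypothetical (names such as prizes_toy and laureates_toy are illustrative only), and tidyr::unnest() is used here as one possible way to expose the nested id; it is an assumption that the real list-column behaves the same way.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data: each prize row holds a list-column of laureate records
prizes_toy <- tibble(
  awardYear = c("1950", "1960"),
  category.en = c("Physics", "Chemistry"),
  laureates = list(
    data.frame(id = c("1", "2")),
    data.frame(id = "3")
  )
)

laureates_toy <- tibble(
  id = c("1", "2", "3"),
  name = c("Laureate A", "Laureate B", "Laureate C")
)

# Pull the nested id out of the list-column so it can act as a join key
prize_keys <- prizes_toy %>%
  unnest(laureates) %>%
  select(id, awardYear, category = category.en)

joined <- laureates_toy %>%
  left_join(prize_keys, by = "id")

joined
```

Once the identifier is an ordinary column, the join itself is a standard left_join().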

After restructuring, the datasets will be combined to enable comparative and longitudinal analysis.

Finally, the cleaned dataset will be used to answer four research questions involving grouping, filtering, time trends, and cross-variable comparisons.


Research Questions

The following four questions will guide the analysis. The questions are designed to balance interpretability and analytical depth while remaining feasible using the Nobel Prize API structure.


1. How has the number of Nobel Prizes awarded changed over time across different categories?

This question examines historical trends in Nobel Prize distribution and compares how awards have evolved across major categories such as Physics, Chemistry, Medicine, Literature, and Peace. It helps identify whether certain fields have become more prominent over time.


2. What is the distribution of age at the time of receiving a Nobel Prize across different categories?

This question investigates whether laureates in different fields tend to receive the Nobel Prize at different stages of life. Age will be derived using birth year and award year, and summarized by category.


3. Which countries have produced the most Nobel laureates, and how does this compare between birth country and award affiliation country?

This question explores geographic distribution of Nobel laureates using two perspectives: the country of birth and the country associated with their affiliation at the time of the award. This allows for a comparison of national contribution versus institutional recognition.


4. Which Nobel laureates have won prizes in multiple categories?

This question identifies individuals who have received Nobel Prizes in more than one category. It requires grouping and filtering across the joined dataset and highlights rare cases of cross-disciplinary recognition.


Anticipated Challenges

Several challenges are expected when working with this dataset:

  • JSON data contains deeply nested structures requiring flattening and unnesting using tidyverse tools such as unnest_longer() and unnest_wider()
  • The dataset does not follow a traditional relational database structure; joins must be constructed using nested identifiers (e.g., extracting id from embedded lists)
  • Some fields contain missing or inconsistent values (e.g., unknown countries or incomplete birth information)
  • Column names are generated from nested JSON paths, resulting in long and complex variable names
  • List-columns such as links and laureates require additional processing before analysis
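For the long column names in particular, one possible approach is to rename columns programmatically. The sketch below uses a hypothetical helper, to_snake(), on a toy tibble; it assumes only the English (.en) columns have been kept, since stripping the suffix would otherwise produce duplicate names.

```r
library(dplyr)
library(tibble)

# Toy tibble with column names in the style produced by flattening the JSON
toy <- tibble(
  id = "1",
  birth.place.country.en = "Country A",
  knownName.en = "Laureate A"
)

# Hypothetical helper: drop the ".en" suffix, then turn dots and camelCase into snake_case
to_snake <- function(x) {
  x <- sub("\\.en$", "", x)
  x <- gsub(".", "_", x, fixed = TRUE)
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)
  tolower(x)
}

toy_renamed <- toy %>%
  rename_with(to_snake)

names(toy_renamed)
```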

Example API Implementation (Nobel Prize API)

# Load libraries
library(jsonlite)
library(tibble)
library(dplyr)

library(gt)

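# Fetch one page of results; on failure, warn and return NULL so the caller can stop
get_page <- function(url) {
  tryCatch({
    fromJSON(url, flatten = TRUE)
  }, error = function(e) {
    warning("Request failed for ", url, ": ", conditionMessage(e))
    NULL
  })
}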

laureates_list <- list()
offset <- 0
limit <- 100

repeat {
  url <- paste0(
    "https://api.nobelprize.org/2.1/laureates",
    "?limit=", limit,
    "&offset=", offset
  )
  res <- get_page(url)
  if (is.null(res) || length(res$laureates) == 0) break
  laureates_list[[length(laureates_list) + 1]] <- res$laureates
  offset <- offset + limit
}

laureates_df <- bind_rows(laureates_list)


prizes_list <- list()
offset <- 0
limit <- 100

repeat {
  url <- paste0(
    "https://api.nobelprize.org/2.1/nobelPrizes",
    "?limit=", limit,
    "&offset=", offset
  )
  res <- get_page(url)
  if (is.null(res) || length(res$nobelPrizes) == 0) break
  prizes_list[[length(prizes_list) + 1]] <- res$nobelPrizes
  offset <- offset + limit
}
prizes_df <- bind_rows(prizes_list)
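The two loops above share the same offset-based pagination pattern. As a sketch, that pattern could be wrapped in a single function. fetch_all() and fake_fetch() below are hypothetical names, and the example runs offline against a fake fetcher instead of the real API; in practice get_page() could be passed as the fetch argument.

```r
library(dplyr)

# Hypothetical helper: page through an endpoint until an empty page or a failed request.
# `fetch` is any function that takes a URL and returns a parsed list, or NULL on failure.
fetch_all <- function(base_url, field, fetch, limit = 100) {
  pages <- list()
  offset <- 0
  repeat {
    url <- paste0(base_url, "?limit=", limit, "&offset=", offset)
    res <- fetch(url)
    # NROW() is 0 for NULL, an empty list, and a zero-row data frame
    if (is.null(res) || NROW(res[[field]]) == 0) break
    pages[[length(pages) + 1]] <- res[[field]]
    offset <- offset + limit
  }
  bind_rows(pages)
}

# Offline stand-in for a real request: serves 5 fake rows, `limit` rows per page
fake_fetch <- function(url) {
  limit  <- as.integer(sub(".*limit=([0-9]+).*", "\\1", url))
  offset <- as.integer(sub(".*offset=([0-9]+).*", "\\1", url))
  ids <- seq_len(5)
  idx <- ids[ids > offset & ids <= offset + limit]
  list(items = data.frame(id = as.character(idx)))
}

toy_all <- fetch_all("https://example.invalid/items", "items", fake_fetch, limit = 2)
toy_all
```

Passing the fetcher as an argument keeps the pagination logic testable without network access.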

# Preview data
laureates_df |> 
  head(n = 2) |> 
  gt()
(gt table output: first two rows of laureates_df. The full table is very wide, so only selected columns are shown.)

id    knownName.en        gender   birth.date   birth.place.country.en   death.date
745   A. Michael Spence   male     1943-00-00   USA                      NA
102   Aage N. Bohr        male     1922-06-19   Denmark                  2009-09-08

The links and nobelPrizes columns are list-columns. Each nobelPrizes entry holds the award year, category, motivation, and affiliations for one prize (2001 Economic Sciences for id 745; 1975 Physics for id 102).
prizes_df |>
  head(n = 2) |> 
  gt()
(gt table output: first two rows of prizes_df, selected columns shown.)

awardYear   dateAwarded   prizeAmount   prizeAmountAdjusted   category.en   laureates (id, name)
1901        1901-11-12    150782        10833458              Chemistry     160, Jacobus H. van 't Hoff
1901        1901-11-14    150782        10833458              Literature    569, Sully Prudhomme

The links and laureates columns are list-columns; topMotivation.en and topMotivation.se are NA for both rows.

Data Cleaning & Preparation (Core Step for All Questions)

Goal

  • Create a tidy, analysis-ready dataset in which each row is one laureate-prize pair
  • The dataset includes:
    • id
    • year
    • category
    • birth_year
    • birth_country
    • affiliation_country

Step 1: Clean laureates_df

nobelPrizes is a list-column and must be unnested before its fields can be used.

library(tidyverse)
library(tidyr)

laureates_clean <- laureates_df %>%
  select(
    id,
    given_name = givenName.en,
    family_name = familyName.en,
    known_name = knownName.en,
    birth_year = birth.year,
    birth_country = birth.place.country.en,
    nobelPrizes
  ) %>%
  unnest_longer(nobelPrizes) %>%
  unnest_wider(nobelPrizes) %>%
  mutate(
    awardYear = as.numeric(awardYear),
    category = category.en,
    
    # SAFE full name creation
    fullName = coalesce(
      known_name,
      paste(given_name, family_name)
    )
  ) %>%
  
  select(
    id,
    fullName,
    awardYear,
    category,
    birth_year,
    birth_country
  )

Step 2: Extract Affiliation Country

Affiliations are deeply nested inside nobelPrizes and need careful extraction.

library(tidyverse)

laureates_affiliation <- laureates_df %>%
  select(id, nobelPrizes) %>%
  unnest_longer(nobelPrizes) %>%
  unnest_wider(nobelPrizes) %>%
  # Expand affiliations
  unnest_longer(affiliations, keep_empty = TRUE) %>%
  unnest_wider(affiliations) %>%
  mutate(
    awardYear = as.numeric(awardYear)
  ) %>%
  # KEY: expand country list properly
  unnest_longer(country.en, keep_empty = TRUE) %>%
  rename(affiliation_country = country.en) %>%
  select(id, awardYear, affiliation_country)

laureates_affiliation_clean <- laureates_affiliation %>%
  # Clean whitespace
  mutate(affiliation_country = str_trim(affiliation_country)) %>%
  # Remove duplicate country per laureate-year
  distinct(id, awardYear, affiliation_country)

Step 3: Merge Affiliation with Laureates

laureates_full <- laureates_clean %>%
  left_join(laureates_affiliation_clean,
            by = c("id", "awardYear"))
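Because a laureate can have more than one affiliation country for the same award, this join can return more rows than laureates_clean contains. The toy example below (hypothetical data and names) sketches how such duplicated keys could be detected after the join.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: laureate "2" has two affiliation countries for the same award
clean_toy <- tibble(
  id = c("1", "2"),
  awardYear = c(1950, 1960)
)

affil_toy <- tibble(
  id = c("1", "2", "2"),
  awardYear = c(1950, 1960, 1960),
  affiliation_country = c("Country A", "Country B", "Country C")
)

full_toy <- clean_toy %>%
  left_join(affil_toy, by = c("id", "awardYear"))

# Keys that occur more than once after the join
dup_keys <- full_toy %>%
  count(id, awardYear) %>%
  filter(n > 1)

nrow(clean_toy)
nrow(full_toy)
dup_keys
```

Analyses that count laureates or compute per-laureate statistics may need distinct() on the relevant keys to avoid counting these rows twice.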

Step 4: Clean prizes_df

We only need structured fields here:

library(tidyverse)
library(tidyr)

prize_laureates <- prizes_df %>%
  mutate(
    awardYear = as.numeric(awardYear),
    category = category.en
  ) %>%
  select(awardYear, category, laureates) %>%
  unnest_longer(laureates) %>%
  unnest_wider(laureates)

prize_laureates_clean <- prize_laureates %>%
  select(id, awardYear, category)

Step 5: Join laureates and prizes datasets

final_joined <- laureates_full %>%
  left_join(prize_laureates_clean,
            by = c("id", "awardYear", "category"))
head(final_joined)
# A tibble: 6 × 7
  id    fullName awardYear category birth_year birth_country affiliation_country
  <chr> <chr>        <dbl> <chr>    <chr>      <chr>         <chr>              
1 745   A. Mich…      2001 Economi… 1943       USA           USA                
2 102   Aage N.…      1975 Physics  1922       Denmark       Denmark            
3 779   Aaron C…      2004 Chemist… 1947       British Prot… Israel             
4 259   Aaron K…      1982 Chemist… 1926       Lithuania     United Kingdom     
5 1004  Abdulra…      2021 Literat… 1948       <NA>          <NA>               
6 114   Abdus S…      1979 Physics  1926       India         Italy              
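Since prize_laureates_clean holds only the three key columns, the join above adds no new columns; it mainly acts as a consistency step. One way to make that check explicit is an anti_join(), which lists laureate-side rows with no match on the prize side. The sketch below uses hypothetical toy tibbles.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: the third laureate-prize row has no counterpart on the prize side
laureate_side <- tibble(
  id = c("1", "2", "3"),
  awardYear = c(1950, 1960, 1970),
  category = c("Physics", "Chemistry", "Peace")
)

prize_side <- tibble(
  id = c("1", "2"),
  awardYear = c(1950, 1960),
  category = c("Physics", "Chemistry")
)

# Rows present on the laureate side but missing from the prize side
unmatched <- laureate_side %>%
  anti_join(prize_side, by = c("id", "awardYear", "category"))

unmatched
```

An empty result would indicate that both endpoints agree on every laureate-prize pair.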

Research Question 1: How has the number of Nobel Prizes awarded changed over time across different categories?

The visualization shows clear differences in how Nobel Prize awards evolve across categories over time. The number of laureates per year increases in later decades, which may reflect the growth of global research activity and more frequent shared awards. Physics and Physiology or Medicine show the most variation and the highest counts in recent years, suggesting a shift toward collaborative, team-based discoveries. Chemistry follows a more stable pattern, with steady growth and fewer extreme fluctuations than Physics and Medicine.

Economic Sciences appears only in the late 20th century (the prize was first awarded in 1969) and shows a gradual upward trend with relatively consistent awarding patterns, reflecting its newer establishment as a prize category. In contrast, Literature and Peace remain relatively stable over time, typically awarding one laureate or a small group each year with limited variation. Overall, the plot highlights a divide between scientific fields, which are becoming more collaborative and variable, and non-scientific fields, which remain comparatively steady and individual-focused.

library(ggplot2)

prize_trends <- prize_laureates_clean %>%
  group_by(awardYear, category) %>%
  summarise(n_laureates = n_distinct(id), .groups = "drop")

ggplot(prize_trends, aes(x = awardYear, y = n_laureates, fill = category)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~category, scales = "free_y") +
  labs(
    title = "Nobel Laureates Over Time by Category",
    x = "Year",
    y = "Number of Laureates"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold")
  )
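Year-by-year bars can be noisy, so the trend described above could also be checked numerically by aggregating to decades. The sketch below uses a hypothetical toy table with the same columns as prize_laureates_clean (id, awardYear, category) and computes the average number of laureates per award year within each decade.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the columns id, awardYear, category
trend_toy <- tibble(
  awardYear = c(1901, 1905, 1912, 2001, 2001, 2009),
  category = "Physics",
  id = c("1", "2", "3", "4", "5", "6")
)

by_decade <- trend_toy %>%
  mutate(decade = floor(awardYear / 10) * 10) %>%
  group_by(category, decade) %>%
  summarise(
    n_laureates = n_distinct(id),       # laureates awarded in the decade
    n_years = n_distinct(awardYear),    # years in which a prize was awarded
    .groups = "drop"
  ) %>%
  mutate(laureates_per_year = n_laureates / n_years)

by_decade
```

Years without an award are not counted in n_years, so the ratio describes awarded years only.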

Research Question 2: What is the distribution of age at the time of receiving a Nobel Prize across different categories?

The age distribution shows that Physics has the lowest median age at award, followed by Chemistry. Physiology or Medicine also has a median age close to 55. Chemistry and Peace show several outliers, which is an interesting finding.

library(tidyverse)

laureates_age <- laureates_full %>%
  mutate(
    birth_year = as.numeric(birth_year),
    awardYear = as.numeric(awardYear),
    age = awardYear - birth_year
  ) %>%
  filter(!is.na(age), age > 0, age < 120)

ggplot(laureates_age, aes(x = category, y = age, fill = category)) +
  geom_boxplot(alpha = 0.7, outlier.color = "red") +
  geom_jitter(width = 0.2, alpha = 0.4, size = 1) +
  stat_summary(fun = median, geom = "point", size = 3, color = "black") +
  labs(
    title = "Distribution of Age at Nobel Prize Award by Category",
    x = "Category",
    y = "Age at Award (years)"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(angle = 30, hjust = 1),
    plot.title = element_text(face = "bold")
  )
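Medians read from a boxplot are approximate, so a summary table can back up the comparison. The sketch below runs on hypothetical toy data (not the real results) with the same column names as laureates_full and computes the median age and interquartile range per category.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; birth_year is character, as in the API output
age_toy <- tibble(
  category = c("Physics", "Physics", "Physics", "Peace", "Peace"),
  birth_year = c("1900", "1910", "1920", "1890", NA),
  awardYear = c(1950, 1955, 1990, 1960, 1970)
)

age_summary <- age_toy %>%
  mutate(age = awardYear - as.numeric(birth_year)) %>%
  filter(!is.na(age)) %>%                 # drops rows without a birth year
  group_by(category) %>%
  summarise(
    n = n(),
    median_age = median(age),
    iqr_age = IQR(age),
    .groups = "drop"
  ) %>%
  arrange(median_age)

age_summary
```

Rows without a birth year, such as organizations, drop out at the filter() step.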

Research Question 3: Which countries have produced the most Nobel laureates, and how does this compare between birth country and affiliation country?

The USA has the highest number of Nobel Prize winners both as birth country and as affiliation country, followed by the UK and Germany. The USA, the UK, Switzerland, and Denmark are listed more often as affiliation country than as birth country.

library(tidyverse)

birth_country <- laureates_full %>%
  filter(!is.na(birth_country)) %>%
  group_by(birth_country) %>%
  summarise(n_birth = n_distinct(id), .groups = "drop") %>%
  arrange(desc(n_birth)) %>%
  slice_head(n = 10)

affiliation_country <- laureates_full %>%
  filter(!is.na(affiliation_country)) %>%
  separate_rows(affiliation_country, sep = ",\\s*") %>%
  group_by(affiliation_country) %>%
  summarise(n_affiliation = n_distinct(id), .groups = "drop") %>%
  arrange(desc(n_affiliation)) %>%
  slice_head(n = 10)

birth_country2 <- birth_country %>%
  rename(country = birth_country, count = n_birth) %>%
  mutate(type = "Birth Country")

affiliation_country2 <- affiliation_country %>%
  rename(country = affiliation_country, count = n_affiliation) %>%
  mutate(type = "Affiliation Country")

country_compare <- bind_rows(birth_country2, affiliation_country2)

ggplot(country_compare, aes(x = reorder(country, count), y = count, fill = type)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(
    title = "Top Countries Producing Nobel Laureates: Birth vs Affiliation",
    x = "Country",
    y = "Number of Laureates",
    fill = "Country Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold")
  )
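Because each top-10 list is sliced separately, a country can appear in one list and not the other. A possible complement is to join the two count tables and compute the difference directly. The sketch below uses hypothetical counts and country names; net_gain is an illustrative column name.

```r
library(dplyr)
library(tibble)

# Hypothetical counts per country (not real results)
birth_toy <- tibble(
  country = c("Country A", "Country B"),
  n_birth = c(10, 4)
)

affil_toy <- tibble(
  country = c("Country A", "Country C"),
  n_affiliation = c(15, 3)
)

compare_wide <- full_join(birth_toy, affil_toy, by = "country") %>%
  mutate(
    # a country missing from one table counts as zero there
    across(c(n_birth, n_affiliation), ~ coalesce(.x, 0)),
    net_gain = n_affiliation - n_birth
  ) %>%
  arrange(desc(net_gain))

compare_wide
```

A positive net_gain marks a country listed more often as affiliation country than as birth country.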

Research Question 4: Which Nobel laureates have won prizes in multiple categories?

Only two Nobel laureates, Marie Curie and Linus Pauling, have won prizes in more than one category; each won in two categories.

library(tidyverse)

multi_category_laureates <- prize_laureates_clean %>%
  group_by(id) %>%
  summarise(
    n_categories = n_distinct(category),
    categories = paste(sort(unique(category)), collapse = ", "),
    n_prizes = n(),
    .groups = "drop"
  ) %>%
  filter(n_categories > 1)

# laureates_clean has one row per laureate-prize, so keep one name per id before joining
multi_category_named <- multi_category_laureates %>%
  left_join(
    laureates_clean %>% distinct(id, fullName),
    by = "id"
  )

ggplot(multi_category_named,
       aes(x = fullName, y = n_categories)) +
  geom_point(size = 4, color = "darkorange") +
  geom_segment(aes(x = fullName, xend = fullName,
                   y = 1, yend = n_categories),
               color = "gray70") +
  geom_text(
    aes(label = categories),
    vjust = -0.5,
    size = 3
  ) +
  labs(
    title = "Nobel Laureates with Prizes in Multiple Categories",
    x = "Laureate",
    y = "Number of Categories Awarded"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 20, hjust = 1),
    plot.title = element_text(face = "bold")
  )
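Winning in multiple categories is different from winning more than once in the same category. The sketch below separates the two cases using a small hand-entered table of well-known repeat laureates (typed in for illustration, not pulled from the API); the column names are assumptions.

```r
library(dplyr)
library(tibble)

# Hand-entered illustration rows, not API output
prizes_by_person <- tibble(
  name = c("Marie Curie", "Marie Curie",
           "Linus Pauling", "Linus Pauling",
           "John Bardeen", "John Bardeen",
           "Frederick Sanger", "Frederick Sanger"),
  awardYear = c(1903, 1911, 1954, 1962, 1956, 1972, 1958, 1980),
  category = c("Physics", "Chemistry",
               "Chemistry", "Peace",
               "Physics", "Physics",
               "Chemistry", "Chemistry")
)

repeat_summary <- prizes_by_person %>%
  group_by(name) %>%
  summarise(
    n_prizes = n(),
    n_categories = n_distinct(category),
    .groups = "drop"
  ) %>%
  mutate(
    type = if_else(n_categories > 1, "multiple categories", "same category twice")
  )

repeat_summary
```

Filtering on n_categories > 1, as in the analysis above, keeps only the first group.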

Conclusion

This analysis used Nobel Prize API data to explore patterns in laureates’ demographics, award distribution, and geographic mobility. The results show clear temporal trends in prize distribution, meaningful variation in age across categories, and a strong difference between birth country and affiliation country, highlighting global mobility of researchers toward major research hubs. Finally, the analysis confirms that Nobel laureates receiving awards in multiple categories are extremely rare, emphasizing the highly specialized nature of Nobel Prize recognition.

References

  1. OpenAI. (2026). ChatGPT (GPT-5.3) responses on Nobel Prize JSON data analysis, data cleaning, visualization, and R programming support [Large language model]. https://chat.openai.com/