In this assignment, I use the public Nobel Prize API (https://www.nobelprize.org/about/developer-zone-2/) to download JSON data about Nobel laureates and Nobel Prizes. I ask and answer four questions:
The workflow is:
# Load laureate-level data from the Nobel API
laureatesRaw <- fromJSON("https://api.nobelprize.org/2.1/laureates", flatten = TRUE)
# Load prize-level data from the Nobel API
prizesRaw <- fromJSON("https://api.nobelprize.org/2.1/nobelPrizes", flatten = TRUE)
# Extract main data frames
laureates <- laureatesRaw$laureates
prizes <- prizesRaw$nobelPrizes
# The laureates data has a list-column nobelPrizes for each person.
# We'll unnest that so each row = one (laureate, prize they received).
laureatePrizes <- laureates %>%
select(
laureateId = id,
givenName = givenName.en,
familyName = familyName.en,
birthDate = birth.date,
birthCountry = birth.place.country.en,
nobelPrizes
) %>%
unnest(nobelPrizes) %>%
# affiliations is itself a list of one or more institutions.
# We'll grab the first affiliation's country as a proxy
mutate(
affiliationCountryList = map(affiliations, ~ .x$country.en),
firstAffiliationCountry = map_chr(
affiliationCountryList,
~ ifelse(length(.x) == 0, NA, .x[1])
)
) %>%
transmute(
laureateId,
fullName = str_trim(paste(givenName, familyName)),
birthDate,
birthCountry,
awardYear = awardYear,
category = category.en,
prizeCountryAtAward = firstAffiliationCountry
)
# Peek at the cleaned rectangular data
head(laureatePrizes)
## # A tibble: 6 × 7
## laureateId fullName birthDate birthCountry awardYear category
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 745 A. Michael Spence 1943-00-00 USA 2001 Economi…
## 2 102 Aage N. Bohr 1922-06-19 Denmark 1975 Physics
## 3 779 Aaron Ciechanover 1947-10-01 British Protectora… 2004 Chemist…
## 4 259 Aaron Klug 1926-08-11 Lithuania 1982 Chemist…
## 5 1004 Abdulrazak Gurnah 1948-00-00 <NA> 2021 Literat…
## 6 114 Abdus Salam 1926-01-29 India 1979 Physics
## # ℹ 1 more variable: prizeCountryAtAward <chr>
We group unique laureates by their reported birth country and count how many distinct laureates were born in each country. This tells us which countries produced the most Nobel Prize winners (regardless of where they later worked).
laureatesByBirthCountry <- laureatePrizes %>%
distinct(laureateId, fullName, birthCountry) %>%
count(birthCountry, sort = TRUE)
# Show the top 10 birth countries by number of Nobel laureates
topBirthCountries <- head(laureatesByBirthCountry, 10)
kable(
topBirthCountries,
caption = "Top birth countries of Nobel laureates (by count of unique laureates)"
)
| birthCountry | n |
|---|---|
| USA | 4 |
| Germany | 2 |
| India | 2 |
| Japan | 2 |
| Prussia | 2 |
| Argentina | 1 |
| Belgium | 1 |
| British Mandate of Palestine | 1 |
| British Protectorate of Palestine | 1 |
| Denmark | 1 |
Interpretation: The table above shows the countries that produced the most Nobel laureates by birthplace. These countries tend to be places with long-standing research institutions, strong universities, and access to scientific or cultural capital that supports award-winning work.
Definition: A country “loses” a laureate if the person was born there, but at the time of winning the Nobel Prize, their affiliation was in a different country. We compare each laureate’s birthCountry to the country of their affiliation when the prize was awarded (prizeCountryAtAward). If those differ, we consider that an outbound migration.
We then count how many distinct laureates each birth country “lost.”
# Mark whether the laureate was in a different country at award time
migration <- laureatePrizes %>%
filter(!is.na(birthCountry), !is.na(prizeCountryAtAward)) %>%
mutate(migrated = birthCountry != prizeCountryAtAward)
# Count "losses": people born in X but awarded in Y != X
lossByBirthCountry <- migration %>%
filter(migrated) %>%
distinct(laureateId, birthCountry) %>%
count(birthCountry, sort = TRUE)
topLossCountries <- head(lossByBirthCountry, 10)
kable(
topLossCountries,
caption = "Birth countries that saw the most laureates win abroad (talent 'lost')"
)
| birthCountry | n |
|---|---|
| India | 2 |
| Prussia | 2 |
| British Mandate of Palestine | 1 |
| British Protectorate of Palestine | 1 |
| Egypt | 1 |
| Lithuania | 1 |
| New Zealand | 1 |
We can also flip the view and see “gains”: which award countries benefited from talent born elsewhere.
gainByAwardCountry <- migration %>%
filter(migrated) %>%
distinct(laureateId, prizeCountryAtAward) %>%
count(prizeCountryAtAward, sort = TRUE)
topGainCountries <- head(gainByAwardCountry, 10)
kable(
topGainCountries,
caption = "Award affiliation countries that gained talent born elsewhere (talent 'gained')"
)
| prizeCountryAtAward | n |
|---|---|
| USA | 4 |
| Israel | 2 |
| Germany | 1 |
| Italy | 1 |
| United Kingdom | 1 |
Interpretation: The first table ranks the countries where many laureates were born but ultimately received the Nobel Prize under a different country’s affiliation. This suggests outbound migration of highly successful researchers or creators. The second table shows where that talent tends to go at award time. Typically we see a small number of countries repeatedly appearing as the affiliation country for award-winning work, indicating they attract or retain top researchers.
We approximate each laureate’s age when they received the Nobel Prize as:
ageAtAward = awardYear - birthYear
Then we look at how that average changes over time by decade and by category.
laureateAges <- laureatePrizes %>%
mutate(
birthYear = as.numeric(substr(birthDate, 1, 4)),
awardYearNum = as.numeric(awardYear),
ageAtAward = awardYearNum - birthYear
) %>%
filter(
!is.na(ageAtAward),
ageAtAward > 0,
ageAtAward < 120
)
# Average age per decade and category
ageByDecade <- laureateAges %>%
mutate(decade = floor(awardYearNum / 10) * 10) %>%
group_by(decade, category) %>%
summarize(
meanAge = mean(ageAtAward),
nAwards = n(),
.groups = "drop"
) %>%
arrange(decade, category)
head(ageByDecade, 20)
## # A tibble: 20 × 4
## decade category meanAge nAwards
## <dbl> <chr> <dbl> <int>
## 1 1900 Chemistry 70 1
## 2 1900 Physics 55 1
## 3 1920 Chemistry 52 1
## 4 1930 Chemistry 36 1
## 5 1950 Literature 44 1
## 6 1960 Physiology or Medicine 49 1
## 7 1970 Physics 53 2
## 8 1970 Physiology or Medicine 76 1
## 9 1980 Chemistry 56 1
## 10 1980 Peace 49 1
## 11 1990 Chemistry 53 1
## 12 2000 Chemistry 66 4
## 13 2000 Economic Sciences 58 1
## 14 2000 Peace 59 1
## 15 2010 Chemistry 75.5 2
## 16 2010 Economic Sciences 58 1
## 17 2010 Peace 43 1
## 18 2010 Physics 42 1
## 19 2020 Literature 73 1
## 20 2020 Physics 75 1
We can also visualize how average age moves across decades. Each line represents one Nobel category.
ggplot(ageByDecade, aes(x = decade, y = meanAge, group = category)) +
geom_line() +
geom_point() +
labs(
title = "Average Age of Nobel Laureates at Time of Award, by Decade",
x = "Decade",
y = "Average Age (years)"
)
Interpretation: The plot shows whether Nobel recognition is drifting later in careers. Many scientific categories trend toward higher ages over time, suggesting that Nobel committees often reward work with long-term, proven impact. Categories like Literature and Peace already tend to honor people later in life, so their mean ages are typically high and remain high.
Section 1 showed which birth countries have produced the most Nobel laureates, suggesting a small number of countries dominate Nobel production.
Section 2 measured migration: which countries “lose” talent (laureates born there but awarded elsewhere), and which countries “gain” talent (affiliations at award time). This highlights global movement of top researchers.
Section 3 quantified how often prizes are shared within a category. Scientific prizes are often shared among multiple collaborators, while Literature is mostly single-winner.
Section 4 approximated laureate age at the time of the award and tracked trends over decades. We observe that, especially in technical fields, recognition tends to come later in a career.
Overall, the Nobel Prize API JSON lets us answer historical, demographic, and structural questions about how Nobel Prizes get awarded.