Nobel Prize API Analysis: JSON Transformation

Author

Ciara Bonnett

Introduction

The Nobel Prize organization provides a public API that delivers data in JSON format regarding laureates and the prizes they have won. For this assignment, I will use R to interact with this API, retrieve structured data, and transform it into a tidy format for analysis. My goal is to investigate patterns in the backgrounds of winners and the distribution of prizes across different categories and time periods.

Approach

I will use the jsonlite or httr2 packages to “call” the Nobel Prize API. This lets me pull the data directly into R instead of downloading a static file.

Because JSON data is “nested”, I will use the fromJSON() function and tidyr::unnest() to flatten the data into a rectangular data frame.
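The flattening step can be illustrated on a small, hypothetical JSON string. The names and values are made up; only the shape imitates the laureates endpoint, with `knownName` as a nested object and `nobelPrizes` as an array of prize objects. In this sketch, `fromJSON()` turns the prize array into a list-column, `unnest()` gives each prize its own row, and `jsonlite::flatten()` promotes nested data-frame columns to top-level columns such as `knownName.en`.

```r
library(jsonlite)
library(dplyr)
library(tidyr)

# Hypothetical JSON shaped like the laureates endpoint (values are made up)
json_txt <- '[
  {"id": "1", "knownName": {"en": "Laureate A"},
   "nobelPrizes": [{"awardYear": "1901", "category": {"en": "Physics"}}]},
  {"id": "2", "knownName": {"en": "Laureate B"},
   "nobelPrizes": [{"awardYear": "1911", "category": {"en": "Chemistry"}},
                   {"awardYear": "1921", "category": {"en": "Physics"}}]}
]'

# knownName becomes a data-frame column; nobelPrizes becomes a list-column
nested <- fromJSON(json_txt)

tidy <- nested %>%
  unnest(nobelPrizes, names_sep = "_") %>%  # one row per laureate-prize pair
  jsonlite::flatten()                       # nested df-columns -> knownName.en, etc.

tidy[, c("id", "knownName.en", "nobelPrizes_awardYear", "nobelPrizes_category.en")]
```

Laureate B has two prizes, so it contributes two rows. This is the same row-multiplication the real pipeline relies on.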

Once the data is tidy, I will use dplyr to filter and join the “Laureate” data with the “Prize” data.

I have come up with four questions to guide my exploration, ranging from simple demographic counts to complex comparisons of birth country versus affiliation country.

Challenges

The Nobel Prize API often has multiple “affiliations” or “prizes” for a single person. I anticipate that un-nesting these lists without creating duplicate rows will be the most difficult part of the cleaning process.
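The duplication problem can be shown with hypothetical data. Each laureate has a list-column of prizes, and each prize has its own list-column of affiliations. After both unnests, rows represent laureate-prize-affiliation combinations. Counting people or prizes therefore needs `n_distinct()` rather than `n()`. The ids, years, and university names are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical laureates: a list-column of prizes, and inside each prize a
# list-column of affiliations (structure mirrors the API; values are made up)
toy <- tibble(
  id = c("1", "2"),
  prizes = list(
    tibble(year = "1950", affiliations = list(c("Univ X", "Univ Y"))),
    tibble(year = c("1960", "1970"), affiliations = list("Univ Z", "Univ Z"))
  )
)

long <- toy %>%
  unnest(prizes) %>%       # 3 rows: one per laureate-prize pair
  unnest(affiliations)     # 4 rows: one per laureate-prize-affiliation

# Count at the level the question asks about, not raw rows
long %>%
  summarize(
    rows = n(),                            # laureate-prize-affiliation rows
    laureates = n_distinct(id),            # distinct people
    laureate_prizes = n_distinct(id, year) # distinct laureate-prize pairs
  )
```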

Some early Nobel winners may have missing data fields, such as “death date” or “organization city.” I will need to handle these NA values carefully so they don’t break my calculations.
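A common alternative to dropping incomplete rows is to ignore missing values inside the summary and report how many were skipped. The sketch below uses hypothetical ages.

```r
library(dplyr)

# Hypothetical ages; one laureate is missing a birth date
toy <- tibble(
  category = c("Physics", "Physics", "Peace"),
  age_at_award = c(50, NA, 60)
)

age_check <- toy %>%
  group_by(category) %>%
  summarize(
    avg_age = mean(age_at_award, na.rm = TRUE),  # mean over known ages only
    n_missing = sum(is.na(age_at_award)),        # how many rows were skipped
    .groups = "drop"
  )

age_check
```

If every value in a group is missing, `mean(..., na.rm = TRUE)` returns `NaN`. The `n_missing` column makes that case easy to spot.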

While the Nobel API is public, I need to ensure my code doesn’t request the data too many times in a row, which could lead to a temporary block. I will save a local “cached” version of the data during the development phase.
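One way to implement the local cache is to save the parsed result with `saveRDS()` and reuse it on later runs. `cached_fetch()` and the file name `nobel_laureates.rds` are hypothetical names used only for this sketch. The `fetch` argument is any function that performs the network call.

```r
# Hypothetical helper: return the cached copy if it exists; otherwise call
# `fetch` once and save its result for the next run
cached_fetch <- function(cache_path, fetch) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))
  }
  result <- fetch()
  saveRDS(result, cache_path)
  result
}

# Usage (the API is only contacted the first time):
# raw_data <- cached_fetch(
#   "nobel_laureates.rds",
#   function() jsonlite::fromJSON("https://api.nobelprize.org/2.1/laureates")
# )
```

Deleting the `.rds` file forces a fresh download.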

Data Questions

  1. Which Nobel category has the highest average age for winners at the time of their award?

  2. What is the ratio of female to male winners in the “Hard Sciences” vs. “Peace/Literature”?

  3. How has the average number of laureates per prize changed over the decades?

  4. Which countries have the highest number of laureates who were born there but won their prize while affiliated with an institution in another country?

Code

library(jsonlite)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
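# The URL for the Laureate API
# Note: without `offset`/`limit` query parameters this endpoint returns only the
# first page of results (25 laureates by default), so all results below are
# based on that first page.
url <- "https://api.nobelprize.org/2.1/laureates"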

# This pulls the data and converts the JSON into an R list
raw_data <- fromJSON(url)

# Let's see what the "boxes" look like
names(raw_data)
[1] "laureates" "meta"      "links"    
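The request above retrieves a single page of laureates, and the `meta` and `links` elements describe the paging. The sketch below retrieves every page. It assumes the v2.1 endpoint's `offset` and `limit` query parameters. `page_url()`, `fetch_page_json()` and `fetch_all_laureates()` are hypothetical helper names, and the `fetch` argument lets the pager be tested without network access.

```r
library(jsonlite)

# Build the URL for one page (assumes the `offset` and `limit` query parameters)
page_url <- function(offset, limit) {
  sprintf("https://api.nobelprize.org/2.1/laureates?offset=%d&limit=%d", offset, limit)
}

# Default fetcher: parse without simplifying, so pages can be concatenated as lists
fetch_page_json <- function(url) fromJSON(url, simplifyVector = FALSE)

# Request pages until one comes back shorter than `limit`
fetch_all_laureates <- function(limit = 100, fetch = fetch_page_json, pause = 1) {
  records <- list()
  offset <- 0
  repeat {
    page <- fetch(page_url(offset, limit))
    records <- c(records, page$laureates)
    if (length(page$laureates) < limit) break
    offset <- offset + limit
    Sys.sleep(pause)  # space out requests to avoid a temporary block
  }
  # Re-encode the combined records and parse them once, so jsonlite builds a
  # single data frame with nested columns like fromJSON(url)$laureates
  fromJSON(toJSON(records, auto_unbox = TRUE, null = "null", digits = NA))
}
```

The result could stand in for `raw_data$laureates` before `as_tibble()`. Wrapping the call in a local cache keeps it to one download per session.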
# Extract the main data frame
laureates_df <- raw_data$laureates %>% as_tibble()

# Look at the 'nobelPrizes' column
# Notice it looks like <df[,13]> -- that's a "list-column"
head(laureates_df$nobelPrizes)
[[1]]
  awardYear       category.en category.no category.se
1      2001 Economic Sciences     Økonomi     Ekonomi
                                                         categoryFullName.en
1 The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel
                                                      categoryFullName.no
1 Sveriges Riksbanks pris i økonomisk vitenskap til minne om Alfred Nobel
                                                     categoryFullName.se
1 Sveriges Riksbanks pris i ekonomisk vetenskap till Alfred Nobels minne
  sortOrder portion dateAwarded prizeStatus
1         2     1/3  2001-10-10    received
                                              motivation.en
1 for their analyses of markets with asymmetric information
                                               motivation.se prizeAmount
1 för deras analys av marknader med assymetrisk informations    10000000
  prizeAmountAdjusted
1            15547541
                                                                                                                                                                                                                                                                                                                                                                                                                                                affiliations
1 Stanford University, Stanford University, Stanford University, Stanford University, Stanford, CA, Stanford, CA, Stanford, CA, USA, USA, USA, Stanford, CA, Stanford, CA, Stanford, CA, https://www.wikidata.org/wiki/Q173813, https://www.wikipedia.org/wiki/Stanford,_California, 37.424734, -122.163858, USA, USA, USA, https://www.wikidata.org/wiki/Q30, 39.828175, -98.579500, North America, Stanford, CA, USA, Stanford, CA, USA, Stanford, CA, USA
                                                                                                                                                                                                                                                                                                                                                                                                                              links
1 nobelPrize, external, external, https://api.nobelprize.org/2/nobelPrize/eco/2001, https://www.nobelprize.org/prizes/economic-sciences/2001/spence/facts/, https://www.nobelprize.org/prizes/economic-sciences/2001/summary/, GET, GET, GET, application/json, text/html, text/html, NA, A. Michael Spence - Facts, The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2001, laureate facts, prize summary

[[2]]
  awardYear category.en category.no category.se        categoryFullName.en
1      1975     Physics      Fysikk       Fysik The Nobel Prize in Physics
   categoryFullName.no categoryFullName.se sortOrder portion dateAwarded
1 Nobelprisen i fysikk Nobelpriset i fysik         1     1/3  1975-10-17
  prizeStatus
1    received
                                                                                                                                                                                         motivation.en
1 for the discovery of the connection between collective motion and particle motion in atomic nuclei and the development of the theory of the structure of the atomic nucleus based on this connection
                                                                                                                                                       motivation.se
1 för upptäckten av sambandet mellan kollektiva rörelser och partikelrörelser i atomkärnor, samt den därpå baserade utvecklingen av teorien för atomkärnans struktur
  prizeAmount prizeAmountAdjusted
1      630000             4304697
                                                                                                                                                                                                                                                                                                                                                                                                                                          affiliations
1 Niels Bohr Institute, Niels Bohr Institute, Niels Bohr Institute, Niels Bohr Institute, Copenhagen, København, Köpenhamn, Denmark, Danmark, Danmark, Copenhagen, København, Köpenhamn, https://www.wikidata.org/wiki/Q1748, https://www.wikipedia.org/wiki/Copenhagen, 55.678127, 12.572532, Denmark, Danmark, Danmark, https://www.wikidata.org/wiki/Q35, 56.000000, 10.000000, Europe, Copenhagen, Denmark, København, Danmark, Köpenhamn, Danmark
                                                                                                                                                                                                                                                                                                                                                   links
1 nobelPrize, external, external, https://api.nobelprize.org/2/nobelPrize/phy/1975, https://www.nobelprize.org/prizes/physics/1975/bohr/facts/, https://www.nobelprize.org/prizes/physics/1975/summary/, GET, GET, GET, application/json, text/html, text/html, NA, Aage N. Bohr - Facts, The Nobel Prize in Physics 1975, laureate facts, prize summary

[[3]]
  awardYear category.en category.no category.se          categoryFullName.en
1      2004   Chemistry       Kjemi        Kemi The Nobel Prize in Chemistry
  categoryFullName.no categoryFullName.se sortOrder portion dateAwarded
1 Nobelprisen i kjemi  Nobelpriset i kemi         1     1/3  2004-10-06
  prizeStatus                                               motivation.en
1    received for the discovery of ubiquitin-mediated protein degradation
                                           motivation.se prizeAmount
1 för upptäckten av ubiquitinmedierad proteinnedbrytning    10000000
  prizeAmountAdjusted
1            14874529
                                                                                                                                                                                                                                                                                                                                                                                                                                                                         affiliations
1 Technion - Israel Institute of Technology, Technion - Israel Institute of Technology, Technion - Israel Institute of Technology, Technion - Israel Institute of Technology, Haifa, Haifa, Haifa, Israel, Israel, Israel, Haifa, Haifa, Haifa, https://www.wikidata.org/wiki/Q41621, https://www.wikipedia.org/wiki/Haifa, 32.794421, 34.990340, Israel, Israel, Israel, https://www.wikidata.org/wiki/Q801, 31.000000, 35.000000, Asia, Haifa, Israel, Haifa, Israel, Haifa, Israel
                                                                                                                                                                                                                                                                                                                                                                     links
1 nobelPrize, external, external, https://api.nobelprize.org/2/nobelPrize/che/2004, https://www.nobelprize.org/prizes/chemistry/2004/ciechanover/facts/, https://www.nobelprize.org/prizes/chemistry/2004/summary/, GET, GET, GET, application/json, text/html, text/html, NA, Aaron Ciechanover - Facts, The Nobel Prize in Chemistry 2004, laureate facts, prize summary

[[4]]
  awardYear category.en category.no category.se          categoryFullName.en
1      1982   Chemistry       Kjemi        Kemi The Nobel Prize in Chemistry
  categoryFullName.no categoryFullName.se sortOrder portion dateAwarded
1 Nobelprisen i kjemi  Nobelpriset i kemi         1       1  1982-10-18
  prizeStatus
1    received
                                                                                                                                        motivation.en
1 for his development of crystallographic electron microscopy and his structural elucidation of biologically important nucleic acid-protein complexes
                                                                                                                                     motivation.se
1 för hans utveckling av kristallografisk elektronmikroskopi och hans klarläggande av strukturen hos biologiskt viktiga nukleinsyra-proteinkomplex
  prizeAmount prizeAmountAdjusted
1     1150000             3923237
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             affiliations
1 MRC Laboratory of Molecular Biology, MRC Laboratory of Molecular Biology, MRC Laboratory of Molecular Biology, MRC Laboratory of Molecular Biology, Cambridge, Cambridge, Cambridge, United Kingdom, Storbritannia, Storbritannien, Cambridge, Cambridge, Cambridge, https://www.wikidata.org/wiki/Q350, https://www.wikipedia.org/wiki/Cambridge, 52.194605, 0.135092, United Kingdom, Storbritannia, Storbritannien, https://www.wikidata.org/wiki/Q145, 54.600000, -2.000000, Europe, Cambridge, United Kingdom, Cambridge, Storbritannia, Cambridge, Storbritannien
                                                                                                                                                                                                                                                                                                                                                       links
1 nobelPrize, external, external, https://api.nobelprize.org/2/nobelPrize/che/1982, https://www.nobelprize.org/prizes/chemistry/1982/klug/facts/, https://www.nobelprize.org/prizes/chemistry/1982/summary/, GET, GET, GET, application/json, text/html, text/html, NA, Aaron Klug - Facts, The Nobel Prize in Chemistry 1982, laureate facts, prize summary

[[5]]
  awardYear category.en category.no category.se           categoryFullName.en
1      2021  Literature  Litteratur  Litteratur The Nobel Prize in Literature
       categoryFullName.no      categoryFullName.se sortOrder portion
1 Nobelprisen i litteratur Nobelpriset i litteratur         1       1
  dateAwarded prizeStatus
1  2021-10-07    received
                                                                                                                                                          en
1 for his uncompromising and compassionate penetration of the effects of colonialism and the fate of the refugee in the gulf between cultures and continents
  prizeAmount prizeAmountAdjusted
1    10000000            12096939
                                                                                                                                                                                                                                                                                                                                                                   links
1 nobelPrize, external, external, https://api.nobelprize.org/2/nobelPrize/lit/2021, https://www.nobelprize.org/prizes/literature/2021/gurnah/facts/, https://www.nobelprize.org/prizes/literature/2021/summary/, GET, GET, GET, application/json, text/html, text/html, NA, Abdulrazak Gurnah - Facts, The Nobel Prize in Literature 2021, laureate facts, prize summary

[[6]]
  awardYear category.en category.no category.se        categoryFullName.en
1      1979     Physics      Fysikk       Fysik The Nobel Prize in Physics
   categoryFullName.no categoryFullName.se sortOrder portion dateAwarded
1 Nobelprisen i fysikk Nobelpriset i fysik         2     1/3  1979-10-15
  prizeStatus
1    received
                                                                                                                                                                              motivation.en
1 for their contributions to the theory of the unified weak and electromagnetic interaction between elementary particles, including, inter alia, the prediction of the weak neutral current
                                                                                                                                                                 motivation.se
1 för deras insatser inom teorin för förenad svag och elektromagnetisk växelverkan mellan elementar partiklar, innefattande bl.a. förutsägelsen av den svaga neutrala strömmen
  prizeAmount prizeAmountAdjusted
1      800000             3778486
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       affiliations
1 International Centre for Theoretical Physics, Imperial College, International Centre for Theoretical Physics, Imperial College, International Centre for Theoretical Physics, Imperial College, International Centre for Theoretical Physics, Imperial College, Trieste, London, Trieste, London, Trieste, London, Italy, United Kingdom, Italia, Storbritannia, Italien, Storbritannien, Trieste, London, Trieste, London, Trieste, London, https://www.wikidata.org/wiki/Q546, https://www.wikipedia.org/wiki/Trieste, https://www.wikidata.org/wiki/Q84, https://www.wikipedia.org/wiki/London, 45.649433, 51.510235, 13.776623, -0.120852, Italy, United Kingdom, Italia, Storbritannia, Italien, Storbritannien, https://www.wikidata.org/wiki/Q38, https://www.wikidata.org/wiki/Q145, 42.500000, 54.600000, 12.500000, -2.000000, Europe, Europe, Trieste, Italy, London, United Kingdom, Trieste, Italia, London, Storbritannia, Trieste, Italien, London, Storbritannien
                                                                                                                                                                                                                                                                                                                                                   links
1 nobelPrize, external, external, https://api.nobelprize.org/2/nobelPrize/phy/1979, https://www.nobelprize.org/prizes/physics/1979/salam/facts/, https://www.nobelprize.org/prizes/physics/1979/summary/, GET, GET, GET, application/json, text/html, text/html, NA, Abdus Salam - Facts, The Nobel Prize in Physics 1979, laureate facts, prize summary
# Explode the prizes so each prize gets its own row
laureates_unnested <- laureates_df %>%
  unnest(nobelPrizes, names_sep = "_")

# "Flatten" nested data-frame columns (e.g. knownName, birth) into top-level columns such as knownName.en
laureates_final <- jsonlite::flatten(laureates_unnested)

# Inspect the resulting column names
colnames(laureates_final)
 [1] "id"                               "fileName"                        
 [3] "gender"                           "sameAs"                          
 [5] "links"                            "nobelPrizes_awardYear"           
 [7] "nobelPrizes_sortOrder"            "nobelPrizes_portion"             
 [9] "nobelPrizes_dateAwarded"          "nobelPrizes_prizeStatus"         
[11] "nobelPrizes_prizeAmount"          "nobelPrizes_prizeAmountAdjusted" 
[13] "nobelPrizes_affiliations"         "nobelPrizes_links"               
[15] "nobelPrizes_residences"           "knownName.en"                    
[17] "knownName.se"                     "givenName.en"                    
[19] "givenName.se"                     "familyName.en"                   
[21] "familyName.se"                    "fullName.en"                     
[23] "fullName.se"                      "birth.date"                      
[25] "birth.year"                       "birth.place.city.en"             
[27] "birth.place.city.no"              "birth.place.city.se"             
[29] "birth.place.country.en"           "birth.place.country.no"          
[31] "birth.place.country.se"           "birth.place.cityNow.en"          
[33] "birth.place.cityNow.no"           "birth.place.cityNow.se"          
[35] "birth.place.cityNow.sameAs"       "birth.place.cityNow.latitude"    
[37] "birth.place.cityNow.longitude"    "birth.place.countryNow.en"       
[39] "birth.place.countryNow.no"        "birth.place.countryNow.se"       
[41] "birth.place.countryNow.sameAs"    "birth.place.countryNow.latitude" 
[43] "birth.place.countryNow.longitude" "birth.place.continent.en"        
[45] "birth.place.continent.no"         "birth.place.continent.se"        
[47] "birth.place.locationString.en"    "birth.place.locationString.no"   
[49] "birth.place.locationString.se"    "wikipedia.slug"                  
[51] "wikipedia.english"                "wikidata.id"                     
[53] "wikidata.url"                     "nobelPrizes_category.en"         
[55] "nobelPrizes_category.no"          "nobelPrizes_category.se"         
[57] "nobelPrizes_categoryFullName.en"  "nobelPrizes_categoryFullName.no" 
[59] "nobelPrizes_categoryFullName.se"  "nobelPrizes_motivation.en"       
[61] "nobelPrizes_motivation.se"        "nobelPrizes_motivation.no"       
[63] "death.date"                       "death.place.city.en"             
[65] "death.place.city.no"              "death.place.city.se"             
[67] "death.place.country.en"           "death.place.country.no"          
[69] "death.place.country.se"           "death.place.country.sameAs"      
[71] "death.place.cityNow.en"           "death.place.cityNow.no"          
[73] "death.place.cityNow.se"           "death.place.cityNow.sameAs"      
[75] "death.place.cityNow.latitude"     "death.place.cityNow.longitude"   
[77] "death.place.countryNow.en"        "death.place.countryNow.no"       
[79] "death.place.countryNow.se"        "death.place.countryNow.sameAs"   
[81] "death.place.countryNow.latitude"  "death.place.countryNow.longitude"
[83] "death.place.continent.en"         "death.place.continent.no"        
[85] "death.place.continent.se"         "death.place.locationString.en"   
[87] "death.place.locationString.no"    "death.place.locationString.se"   

Answer 1

I calculated the age of each laureate at the time of the award by subtracting their birth year from the award year.
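Subtracting years can overstate age by one when the award date falls before the laureate's birthday in that year. The sketch below computes exact age with lubridate, using the `birth.date` and `nobelPrizes_dateAwarded` columns listed above. It falls back to the year difference when the full birth date cannot be parsed. The rows are hypothetical, and the `00` month/day format for partly unknown dates is an assumption.

```r
library(dplyr)
library(lubridate)

# Hypothetical rows using the column names of laureates_final (values are made up);
# the second birth date has an unknown month and day
toy <- tibble(
  birth.date = c("1950-11-07", "1900-00-00"),
  nobelPrizes_dateAwarded = c("2001-10-10", "1950-10-10"),
  nobelPrizes_awardYear = c("2001", "1950")
)

ages <- toy %>%
  mutate(
    birth = ymd(birth.date, quiet = TRUE),              # NA when month/day unknown
    awarded = ymd(nobelPrizes_dateAwarded, quiet = TRUE),
    exact_age = floor(time_length(interval(birth, awarded), "years")),
    year_diff = as.numeric(nobelPrizes_awardYear) - as.numeric(substr(birth.date, 1, 4)),
    age_at_award = coalesce(exact_age, year_diff)       # fall back to the year difference
  )

ages %>% select(birth.date, exact_age, year_diff, age_at_award)
```

In the first row the award date comes before the birthday, so the exact age (50) is one less than the year difference (51).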

# Create the age analysis data
age_results <- laureates_final %>%
  # Select the columns we need
  select(
    name = knownName.en,
    category = nobelPrizes_category.en,
    birth.date = birth.date,
    award_year = nobelPrizes_awardYear
  ) %>%
  
  # Extract the year from the birth date
  mutate(
    birth_year = as.numeric(str_extract(birth.date, "\\d{4}")),
    award_year = as.numeric(award_year),
    age_at_award = award_year - birth_year
  ) %>%
  
  # Drop rows where we don't have a birth year
  filter(!is.na(age_at_award))
# Summarize the average age for each category
age_summary <- age_results %>%
  group_by(category) %>%
  summarize(avg_age = mean(age_at_award)) %>%
  arrange(desc(avg_age))

# Display the table
age_summary
# A tibble: 6 × 2
  category               avg_age
  <chr>                    <dbl>
1 Physiology or Medicine    62.5
2 Chemistry                 62  
3 Literature                58.5
4 Economic Sciences         58  
5 Physics                   55.6
6 Peace                     50.3

Answer 2

# Create a count of gender per category
gender_analysis <- laureates_final %>%
  # Remove organizations 
  filter(gender %in% c("male", "female")) %>%
  group_by(nobelPrizes_category.en, gender) %>%
  summarize(count = n(), .groups = 'drop')

# Pivot the data so male and female are side-by-side
gender_table <- gender_analysis %>%
  pivot_wider(names_from = gender, values_from = count, values_fill = 0) %>%
  mutate(total = female + male, 
         percent_female = (female / total) * 100)

# Display the result
gender_table
# A tibble: 6 × 5
  nobelPrizes_category.en female  male total percent_female
  <chr>                    <int> <int> <int>          <dbl>
1 Chemistry                    1    10    11           9.09
2 Economic Sciences            0     2     2           0   
3 Literature                   0     2     2           0   
4 Peace                        0     3     3           0   
5 Physics                      0     5     5           0   
6 Physiology or Medicine       0     2     2           0   
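Question 2 compares groups of categories, but the table above is broken down by individual category. The grouping step is sketched below on hypothetical rows that use the same column names. Treating Physics, Chemistry, and Physiology or Medicine as the “Hard Sciences” is an assumption; Economic Sciences is left out.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows with the columns used in gender_analysis (values are made up)
toy <- tibble(
  nobelPrizes_category.en = c("Chemistry", "Physics", "Physics",
                              "Peace", "Literature", "Economic Sciences"),
  gender = c("female", "male", "male", "male", "female", "male")
)

group_ratio <- toy %>%
  filter(gender %in% c("male", "female")) %>%
  mutate(group = case_when(
    nobelPrizes_category.en %in% c("Physics", "Chemistry", "Physiology or Medicine") ~ "Hard Sciences",
    nobelPrizes_category.en %in% c("Peace", "Literature") ~ "Peace/Literature",
    TRUE ~ NA_character_   # categories outside both groups are dropped
  )) %>%
  filter(!is.na(group)) %>%
  count(group, gender) %>%
  pivot_wider(names_from = gender, values_from = n, values_fill = 0) %>%
  mutate(female_to_male = female / male)

group_ratio
```

On the real data, `toy` would be replaced by `laureates_final`. If a group has no male laureates, the ratio becomes `Inf`.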

Answer 3

# Count how many people shared a prize in a given year/category
sharing_analysis <- laureates_final %>%
  group_by(nobelPrizes_awardYear, nobelPrizes_category.en) %>%
  summarize(laureates_per_prize = n(), .groups = 'drop')

# See the average number of winners per prize over time
sharing_trends <- sharing_analysis %>%
  mutate(decade = as.numeric(nobelPrizes_awardYear) %/% 10 * 10) %>%
  group_by(decade) %>%
  summarize(avg_sharing = mean(laureates_per_prize))

# Create a quick plot to see the trend
# (linewidth replaces the deprecated `size` aesthetic for lines in ggplot2 >= 3.4.0)
ggplot(sharing_trends, aes(x = decade, y = avg_sharing)) +
  geom_line(color = "firebrick", linewidth = 1) + 
  geom_point() +
  labs(title = "Is the Nobel Prize Becoming More 'Shared'?", 
       subtitle = "Average number of laureates per prize by decade",
       x = "Decade",
       y = "Avg. Number of Winners")

Answer 4

# 1. Unnest and Flatten
brain_drain_data <- laureates_unnested %>%
  unnest(nobelPrizes_affiliations, names_sep = "_", keep_empty = TRUE) %>%
  jsonlite::flatten()

# 2. Use a flexible select
brain_drain_clean <- brain_drain_data %>%
  select(
    name = knownName.en,
    birth_country = birth.place.country.en,
    # This searches for the column instead of guessing the exact name
    work_country = matches("affiliations.*country.en") 
  ) %>%
  # 3. Filter for migration
  filter(!is.na(birth_country), !is.na(work_country)) %>%
  mutate(is_migration = ifelse(birth_country != work_country, "Migrated", "Stayed"))

# 4. Final Summary
brain_drain_summary <- brain_drain_clean %>%
  filter(is_migration == "Migrated") %>%
  group_by(birth_country) %>%
  summarize(laureates_exported = n()) %>%
  arrange(desc(laureates_exported))

head(brain_drain_summary)
# A tibble: 6 × 2
  birth_country                     laureates_exported
  <chr>                                          <int>
1 India                                              3
2 Prussia                                            2
3 British Mandate of Palestine                       1
4 British Protectorate of Palestine                  1
5 Egypt                                              1
6 Lithuania                                          1
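The counts above compare a historical birth-country name with the affiliation country, and they count one row per affiliation. A laureate with two affiliations, such as Abdus Salam with both Trieste and London in the output above, therefore contributes two rows. The sketch below shows one alternative. It compares present-day country names, collapses to one row per laureate, and counts a laureate as migrated if any affiliation lies outside the birth country. The rows are hypothetical. On the real data, the inputs would be `birth.place.countryNow.en` and the flattened affiliation `countryNow` column, assumed here to be named `nobelPrizes_affiliations_countryNow.en`.

```r
library(dplyr)

# Hypothetical laureate-affiliation rows (values are made up); laureate "1"
# has two affiliations, like a real laureate with two institutions
toy <- tibble(
  id = c("1", "1", "2", "3"),
  birth_country_now = c("Country A", "Country A", "Country D", "Country E"),
  work_country_now = c("Country B", "Country C", "Country D", "Country B")
)

migration_summary <- toy %>%
  filter(!is.na(birth_country_now), !is.na(work_country_now)) %>%
  # Collapse to one row per laureate: migrated if any affiliation is abroad
  group_by(id, birth_country_now) %>%
  summarize(migrated = any(birth_country_now != work_country_now), .groups = "drop") %>%
  filter(migrated) %>%
  count(birth_country_now, name = "laureates_exported") %>%
  arrange(desc(laureates_exported))

migration_summary
```

Laureate "1" now counts once even though they have two foreign affiliations. Laureate "2" is not counted because their birth and affiliation countries match.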

Conclusions

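After transforming the nested JSON from the Nobel Prize API into a tidy format, I found that Physiology or Medicine had the oldest winners on average (62.5 years at the time of the award), followed closely by Chemistry (62.0), while Peace had the youngest (50.3). This may reflect how long scientific discoveries take to be verified, but the analysis does not test that directly. All of these results come from the first page of the API response (25 laureates), so they are illustrative rather than representative.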

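The gender data shows a strong imbalance: only 1 of the 25 laureates in this sample is female (in Chemistry). Because Peace and Literature also have no female winners here, the sample is too small to compare the “Hard Sciences” with Peace or Literature.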

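There is no clear upward trend in prize sharing in this sample, so it does not show that scientific breakthroughs are becoming more collaborative over time. The count is also limited by the sample. For a prize split among several laureates (such as the 2001 Economic Sciences prize shown above, with a portion of 1/3), only co-winners who appear in the retrieved page are counted.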

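The brain drain analysis suggests that laureates often move from their birth countries to international research hubs. However, the counts should be read cautiously. Birth countries are recorded under historical names (for example, Prussia or British Mandate of Palestine) that may not match the name recorded for the affiliation country. In addition, a laureate with several affiliations is counted once per affiliation.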

My biggest challenge was un-nesting. Handling list-columns required a deliberate strategy, such as using keep_empty = TRUE when unnesting affiliations, to avoid losing rows while still accounting for every record.

AI Usage

For this assignment, I used an AI collaborator (Gemini) to walk me through the foundational logic of JSON APIs. Instead of just generating code, I focused on understanding the “why” behind the workflows:

Nobel Prize JSON: We focused on the concept of “flattening” nested lists. I learned that JSON data is structured like nested boxes, and my job in R is to “un-nest” them into a tidy data frame without losing data.