Assignment 10B Nobel Prize

Author

Khandker Qaiduzzaman

Objective

The goal of this assignment is to use the Nobel Prize API to retrieve JSON data, transform it into tidy data frames in R, and explore the dataset to answer four data-driven questions.


Approach

This analysis focuses on retrieving structured JSON data from the Nobel Prize API, transforming it into tidy data using tidyverse principles, and preparing it for exploratory data analysis.


Step 1: Data Collection Using Nobel Prize API

The dataset is obtained from the Nobel Prize Developer Zone, which provides structured JSON data on Nobel laureates and prize awards.

Two endpoints are used:

  • Laureates Endpoint: contains personal information such as name, gender, birth date, and country of birth
  • Prizes Endpoint: contains award-level information such as year, category, and prize details

The data is retrieved in JSON format and converted into R data frames for further processing.
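
If the endpoints page their results, a single request may return only part of the data. The sketch below builds a request URL with paging parameters. The helper name nobel_url and the limit and offset query parameters are assumptions, not something confirmed in this report, and should be checked against the API documentation.

```r
# Hypothetical helper: build a paged request URL for one endpoint.
# `limit` and `offset` are ASSUMED query parameters; verify them in the API docs.
nobel_url <- function(endpoint, limit = 25, offset = 0) {
  base <- "https://api.nobelprize.org/2.1"
  sprintf(
    "%s/%s?limit=%d&offset=%d",
    base, endpoint, as.integer(limit), as.integer(offset)
  )
}

# Example: the third page of 100 laureates
page_url <- nobel_url("laureates", limit = 100, offset = 200)
page_url
```

The resulting string could be passed to fromJSON() in place of the bare endpoint URL.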


Step 2: JSON Structure and Data Preparation

The JSON data is parsed using the jsonlite package with flatten = TRUE, which expands nested objects into columns with dotted names but leaves arrays of objects as list-columns.

However, the datasets still contain complex nested components such as:

- List-columns (e.g., nobelPrizes, laureates, links)
- Nested identifiers inside sub-structures
- Hierarchical fields generated from JSON paths (e.g., birth.place.country.en)
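
The behavior described above can be reproduced offline with a small inline JSON string. This is a minimal sketch: the record is a toy shaped like a laureate entry, and its values are illustrative rather than taken from the API.

```r
library(jsonlite)

# Toy record shaped like a laureate entry (illustrative values only)
toy_json <- '[
  {
    "id": "1",
    "birth": {"place": {"country": {"en": "USA"}}},
    "nobelPrizes": [{"awardYear": "2001"}, {"awardYear": "2010"}]
  }
]'

toy_df <- fromJSON(toy_json, flatten = TRUE)

names(toy_df)              # nested objects become dotted column names
class(toy_df$nobelPrizes)  # the array of objects stays a list-column
toy_df$nobelPrizes[[1]]    # each element is itself a small data frame
```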

Initial inspection using head() and glimpse() is used to understand:

- Variable structure
- Key identifiers (especially id)
- Fields required for analysis (country, year, category)

Further data wrangling such as unnesting and restructuring will be required in later stages before analysis.
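
As a rough sketch of the unnesting step, a list-column of prize records can be expanded so that each laureate-prize pair gets its own row. The toy table below stands in for laureates_df; its ids and values are illustrative, and the real nobelPrizes entries carry many more fields.

```r
library(tibble)
library(tidyr)

# Toy stand-in for laureates_df (illustrative ids and values)
toy_laureates <- tibble(
  id = c("1", "2"),
  nobelPrizes = list(
    tibble(awardYear = "2001", category.en = "Economic Sciences"),
    tibble(awardYear = c("1956", "1972"), category.en = c("Physics", "Physics"))
  )
)

# Expand the list-column: one row per laureate-prize pair
laureate_prizes <- unnest(toy_laureates, nobelPrizes)
laureate_prizes
```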


Data Analysis Workflow

The workflow begins by retrieving JSON data from the Nobel Prize API and converting it into data frames.

Next, both datasets are explored to understand their structure and nesting behavior. The laureates dataset contains individual-level data, while the prizes dataset contains award-level data with embedded laureate information.

Since the data is not structured in a traditional relational format, joins require extracting nested identifiers (such as id) from list-columns rather than relying on a simple shared key.
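
One possible way to build such a join is to unnest the laureates list-column in the prizes table, which exposes the embedded id, and then join on it. The sketch below uses toy tables with only a few columns; the ids 160 and 569 match the previewed prize rows, and the birth years are filled in by hand for illustration.

```r
library(dplyr)
library(tibble)
library(tidyr)

# Toy stand-in for prizes_df: laureate ids are nested inside a list-column
toy_prizes <- tibble(
  awardYear = c("1901", "1901"),
  category.en = c("Chemistry", "Literature"),
  laureates = list(tibble(id = "160"), tibble(id = "569"))
)

# Toy stand-in for laureates_df (hand-entered birth years)
toy_people <- tibble(
  id = c("160", "569"),
  birth.year = c("1852", "1839")
)

# Expose the nested id, then join person-level fields onto each award row
prize_laureates <- toy_prizes |>
  unnest(laureates) |>
  left_join(toy_people, by = "id")

prize_laureates
```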

After restructuring, the datasets will be combined to enable comparative and longitudinal analysis.

Finally, the cleaned dataset will be used to answer four research questions involving grouping, filtering, time trends, and cross-variable comparisons.


Research Questions

The following four questions will guide the analysis. The questions are designed to balance interpretability and analytical depth while remaining feasible using the Nobel Prize API structure.


1. How has the number of Nobel Prizes awarded changed over time across different categories?

This question examines historical trends in Nobel Prize distribution and compares how awards have evolved across categories such as Physics, Chemistry, Physiology or Medicine, Literature, Peace, and Economic Sciences. It helps identify whether certain fields have become more prominent over time.
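
A minimal sketch of how this could be computed, assuming a table with one row per award and the awardYear and category.en columns seen in the preview. The toy rows and the choice to group by decade are illustrative.

```r
library(dplyr)
library(tibble)

# Toy award rows (illustrative subset)
toy_awards <- tibble(
  awardYear = c("1901", "1901", "1903", "1975", "2001"),
  category.en = c("Chemistry", "Literature", "Chemistry", "Physics", "Economic Sciences")
)

# Count awards per decade and category
awards_by_decade <- toy_awards |>
  mutate(decade = (as.integer(awardYear) %/% 10L) * 10L) |>
  count(decade, category.en, name = "n_prizes")

awards_by_decade
```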


2. What is the distribution of age at the time of receiving a Nobel Prize across different categories?

This question investigates whether laureates in different fields tend to receive the Nobel Prize at different stages of life. Age will be approximated as award year minus birth year, because some birth dates in the data are incomplete (e.g., 1943-00-00), and then summarized by category.
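
The age calculation might look like the sketch below. It assumes the joined table has birth.year, awardYear, and category.en columns, and it converts both years to integers in case they arrive as strings. The two rows reuse values from the previewed laureates.

```r
library(dplyr)
library(tibble)

# Two rows based on the previewed laureates
toy_joined <- tibble(
  id = c("745", "102"),
  birth.year = c("1943", "1922"),
  awardYear = c("2001", "1975"),
  category.en = c("Economic Sciences", "Physics")
)

# Approximate age as award year minus birth year
ages <- toy_joined |>
  mutate(age_at_award = as.integer(awardYear) - as.integer(birth.year))

# Summarize by category
age_summary <- ages |>
  group_by(category.en) |>
  summarise(median_age = median(age_at_award), n = n())

age_summary
```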


3. Which countries have produced the most Nobel laureates, and how does this compare between birth country and award affiliation country?

This question explores the geographic distribution of Nobel laureates from two perspectives: the country of birth and the country associated with their affiliation at the time of the award. This allows for a comparison of national contribution versus institutional recognition.
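
As a sketch of the comparison, the two country fields can be stacked into one column and counted side by side. The column names birth_country and affiliation_country are hypothetical simplified names, and the row with id 999 is invented for illustration. In the real data the affiliation country sits inside the nested nobelPrizes entries and must be unnested first.

```r
library(dplyr)
library(tibble)
library(tidyr)

# Toy table with hypothetical, already-simplified column names
toy_countries <- tibble(
  id = c("745", "102", "999"),
  birth_country = c("USA", "Denmark", "Germany"),
  affiliation_country = c("USA", "Denmark", "USA")
)

# Stack both perspectives, count, then spread back for comparison
country_counts <- toy_countries |>
  pivot_longer(
    c(birth_country, affiliation_country),
    names_to = "perspective",
    values_to = "country"
  ) |>
  count(perspective, country) |>
  pivot_wider(names_from = perspective, values_from = n, values_fill = 0L)

country_counts
```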


4. Which Nobel laureates have won prizes in multiple categories?

This question identifies individuals who have received Nobel Prizes in more than one category. It requires grouping and filtering across the joined dataset and highlights rare cases of cross-disciplinary recognition.
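
The grouping and filtering step could be sketched as follows. The toy rows use well-known cases: Marie Curie won in two categories, while John Bardeen won twice in the same category and should therefore be excluded. Grouping by name is a simplification; on the real data the laureate id would be the safer key.

```r
library(dplyr)
library(tibble)

# Toy rows: one per laureate-prize pair
toy_wins <- tibble(
  name = c("Marie Curie", "Marie Curie", "John Bardeen", "John Bardeen", "Aage N. Bohr"),
  awardYear = c("1903", "1911", "1956", "1972", "1975"),
  category.en = c("Physics", "Chemistry", "Physics", "Physics", "Physics")
)

# Keep laureates whose prizes span more than one distinct category
multi_category <- toy_wins |>
  group_by(name) |>
  summarise(
    n_categories = n_distinct(category.en),
    categories = paste(sort(unique(category.en)), collapse = ", ")
  ) |>
  filter(n_categories > 1)

multi_category
```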


Anticipated Challenges

Several challenges are expected when working with this dataset:

  • JSON data contains deeply nested structures requiring flattening and unnesting using tidyverse tools such as unnest_longer() and unnest_wider()
  • The dataset does not follow a traditional relational database structure; joins must be constructed using nested identifiers (e.g., extracting id from embedded lists)
  • Some fields contain missing or inconsistent values (e.g., unknown countries or incomplete birth information)
  • Column names are generated from nested JSON paths, resulting in long and complex variable names
  • List-columns such as links and laureates require additional processing before analysis
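
Two of these challenges, long generated names and incomplete values, can be illustrated with a small cleaning sketch. The row with id 999 and the "Unknown" label are illustrative choices, not part of the dataset. An explicit date format is passed so that a partial date such as 1943-00-00 becomes NA rather than raising an error.

```r
library(dplyr)
library(tibble)

# Toy rows; id 999 is invented to show missing values
toy_raw <- tibble(
  id = c("745", "102", "999"),
  birth.place.country.en = c("USA", "Denmark", NA),
  birth.date = c("1943-00-00", "1922-06-19", NA)
)

toy_clean <- toy_raw |>
  # Shorten a long path-derived column name
  rename(birth_country = birth.place.country.en) |>
  mutate(
    # Make missing countries explicit
    birth_country = coalesce(birth_country, "Unknown"),
    # Partial dates (month or day "00") cannot be parsed and become NA
    birth_date = as.Date(birth.date, format = "%Y-%m-%d")
  )

toy_clean
```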

Example API Implementation (Nobel Prize API)

# Load libraries
library(jsonlite)
library(tibble)
library(gt)

# Retrieve Laureates data
laureates_df <- fromJSON(
  "https://api.nobelprize.org/2.1/laureates",
  flatten = TRUE
)$laureates

# Retrieve Prizes data
prizes_df <- fromJSON(
  "https://api.nobelprize.org/2.1/nobelPrizes",
  flatten = TRUE
)$nobelPrizes

# Convert to tibble
laureates_df <- as_tibble(laureates_df)
prizes_df <- as_tibble(prizes_df)

# Preview data
laureates_df |> 
  head(n = 2) |> 
  gt()

Output: first two rows of laureates_df, one field per line (language variants of the same field share a line)

Row 1
id: 745
fileName: spence
gender: male
sameAs: https://www.wikidata.org/wiki/Q157245, https://en.wikipedia.org/wiki/Michael_Spence
links: c("laureate", "external"), c("https://api.nobelprize.org/2/laureate/745", "https://www.nobelprize.org/laureate/745"), c("GET", "GET"), c("application/json", "text/html"), c(NA, "A. Michael Spence - Facts"), list(NULL, "laureate facts")
nobelPrizes: 2001, 2, 1/3, 2001-10-10, received, 10000000, 15547541, list(list(name.en = "Stanford University", name.no = "Stanford University", name.se = "Stanford University", nameNow.en = "Stanford University", city.en = "Stanford, CA", city.no = "Stanford, CA", city.se = "Stanford, CA", country.en = "USA", country.no = "USA", country.se = "USA", cityNow.en = "Stanford, CA", cityNow.no = "Stanford, CA", cityNow.se = "Stanford, CA", cityNow.sameAs = list(c("https://www.wikidata.org/wiki/Q173813", "https://www.wikipedia.org/wiki/Stanford,_California")), cityNow.latitude = "37.424734", cityNow.longitude = "-122.163858", countryNow.en = "USA", countryNow.no = "USA", countryNow.se = "USA", countryNow.sameAs = list("https://www.wikidata.org/wiki/Q30"), countryNow.latitude = "39.828175", countryNow.longitude = "-98.579500", continent.en = "North America", locationString.en = "Stanford, CA, USA", locationString.no = "Stanford, CA, USA", locationString.se = "Stanford, CA, USA")), list(list(rel = c("nobelPrize", "external", "external"), href = c("https://api.nobelprize.org/2/nobelPrize/eco/2001", "https://www.nobelprize.org/prizes/economic-sciences/2001/spence/facts/", "https://www.nobelprize.org/prizes/economic-sciences/2001/summary/"), action = c("GET", "GET", "GET"), types = c("application/json", "text/html", "text/html"), title = c(NA, "A. Michael Spence - Facts", "The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2001"), class = list(NULL, "laureate facts", "prize summary"))), Economic Sciences, Økonomi, Ekonomi, The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, Sveriges Riksbanks pris i økonomisk vitenskap til minne om Alfred Nobel, Sveriges Riksbanks pris i ekonomisk vetenskap till Alfred Nobels minne, for their analyses of markets with asymmetric information, för deras analys av marknader med assymetrisk informations
knownName.en / .se: A. Michael Spence / A. Michael Spence
givenName.en / .se: A. Michael / A. Michael
familyName.en / .se: Spence / Spence
fullName.en / .se: A. Michael Spence / A. Michael Spence
birth.date: 1943-00-00
birth.year: 1943
birth.place.city.en / .no / .se: Montclair, NJ / Montclair, NJ / Montclair, NJ
birth.place.country.en / .no / .se: USA / USA / USA
birth.place.cityNow.en / .no / .se: Montclair, NJ / Montclair, NJ / Montclair, NJ
birth.place.cityNow.sameAs: https://www.wikidata.org/wiki/Q678437, https://www.wikipedia.org/wiki/Montclair,_New_Jersey
birth.place.cityNow.latitude: 40.825930
birth.place.cityNow.longitude: -74.209030
birth.place.countryNow.en / .no / .se: USA / USA / USA
birth.place.countryNow.sameAs: https://www.wikidata.org/wiki/Q30
birth.place.countryNow.latitude: 39.828175
birth.place.countryNow.longitude: -98.579500
birth.place.continent.en / .no / .se: North America / Nord-Amerika / Nordamerika
birth.place.locationString.en / .no / .se: Montclair, NJ, USA / Montclair, NJ, USA / Montclair, NJ, USA
wikipedia.slug: Michael_Spence
wikipedia.english: https://en.wikipedia.org/wiki/Michael_Spence
wikidata.id: Q157245
wikidata.url: https://www.wikidata.org/wiki/Q157245
death.date through death.place.locationString.se (26 columns): NA or empty

Row 2
id: 102
fileName: bohr
gender: male
sameAs: https://www.wikidata.org/wiki/Q103854, https://en.wikipedia.org/wiki/Aage_Bohr
links: c("laureate", "external"), c("https://api.nobelprize.org/2/laureate/102", "https://www.nobelprize.org/laureate/102"), c("GET", "GET"), c("application/json", "text/html"), c(NA, "Aage N. Bohr - Facts"), list(NULL, "laureate facts")
nobelPrizes: 1975, 1, 1/3, 1975-10-17, received, 630000, 4304697, list(list(name.en = "Niels Bohr Institute", name.no = "Niels Bohr Institute", name.se = "Niels Bohr Institute", nameNow.en = "Niels Bohr Institute", city.en = "Copenhagen", city.no = "København", city.se = "Köpenhamn", country.en = "Denmark", country.no = "Danmark", country.se = "Danmark", cityNow.en = "Copenhagen", cityNow.no = "København", cityNow.se = "Köpenhamn", cityNow.sameAs = list(c("https://www.wikidata.org/wiki/Q1748", "https://www.wikipedia.org/wiki/Copenhagen")), cityNow.latitude = "55.678127", cityNow.longitude = "12.572532", countryNow.en = "Denmark", countryNow.no = "Danmark", countryNow.se = "Danmark", countryNow.sameAs = list("https://www.wikidata.org/wiki/Q35"), countryNow.latitude = "56.000000", countryNow.longitude = "10.000000", continent.en = "Europe", locationString.en = "Copenhagen, Denmark", locationString.no = "København, Danmark", locationString.se = "Köpenhamn, Danmark")), list(list(rel = c("nobelPrize", "external", "external"), href = c("https://api.nobelprize.org/2/nobelPrize/phy/1975", "https://www.nobelprize.org/prizes/physics/1975/bohr/facts/", "https://www.nobelprize.org/prizes/physics/1975/summary/"), action = c("GET", "GET", "GET"), types = c("application/json", "text/html", "text/html"), title = c(NA, "Aage N. Bohr - Facts", "The Nobel Prize in Physics 1975"), class = list(NULL, "laureate facts", "prize summary"))), Physics, Fysikk, Fysik, The Nobel Prize in Physics, Nobelprisen i fysikk, Nobelpriset i fysik, for the discovery of the connection between collective motion and particle motion in atomic nuclei and the development of the theory of the structure of the atomic nucleus based on this connection, för upptäckten av sambandet mellan kollektiva rörelser och partikelrörelser i atomkärnor, samt den därpå baserade utvecklingen av teorien för atomkärnans struktur
knownName.en / .se: Aage N. Bohr / Aage N. Bohr
givenName.en / .se: Aage N. / Aage N.
familyName.en / .se: Bohr / Bohr
fullName.en / .se: Aage Niels Bohr / Aage Niels Bohr
birth.date: 1922-06-19
birth.year: 1922
birth.place.city.en / .no / .se: Copenhagen / København / Köpenhamn
birth.place.country.en / .no / .se: Denmark / Danmark / Danmark
birth.place.cityNow.en / .no / .se: Copenhagen / København / Köpenhamn
birth.place.cityNow.sameAs: https://www.wikidata.org/wiki/Q1748, https://www.wikipedia.org/wiki/Copenhagen
birth.place.cityNow.latitude: 55.678127
birth.place.cityNow.longitude: 12.572532
birth.place.countryNow.en / .no / .se: Denmark / Danmark / Danmark
birth.place.countryNow.sameAs: https://www.wikidata.org/wiki/Q35
birth.place.countryNow.latitude: 56.000000
birth.place.countryNow.longitude: 10.000000
birth.place.continent.en / .no / .se: Europe / Europa / Europa
birth.place.locationString.en / .no / .se: Copenhagen, Denmark / København, Danmark / Köpenhamn, Danmark
wikipedia.slug: Aage_Bohr
wikipedia.english: https://en.wikipedia.org/wiki/Aage_Bohr
wikidata.id: Q103854
wikidata.url: https://www.wikidata.org/wiki/Q103854
death.date: 2009-09-08
death.place.city.en / .no / .se: Copenhagen / København / Köpenhamn
death.place.country.en / .no / .se: Denmark / Danmark / Danmark
death.place.country.sameAs: https://www.wikidata.org/wiki/Q35
death.place.cityNow.en / .no / .se: Copenhagen / København / Köpenhamn
death.place.cityNow.sameAs: https://www.wikidata.org/wiki/Q1748, https://www.wikipedia.org/wiki/Copenhagen
death.place.cityNow.latitude: 55.678127
death.place.cityNow.longitude: 12.572532
death.place.countryNow.en / .no / .se: Denmark / Danmark / Danmark
death.place.countryNow.sameAs: https://www.wikidata.org/wiki/Q35
death.place.countryNow.latitude: 56.000000
death.place.countryNow.longitude: 10.000000
death.place.continent.en / .no / .se: Europe / Europa / Europa
death.place.locationString.en / .no / .se: Copenhagen, Denmark / København, Danmark / Köpenhamn, Danmark

prizes_df |>
  head(n = 2) |> 
  gt()

Output: first two rows of prizes_df, one field per line (language variants of the same field share a line)

Row 1
awardYear: 1901
dateAwarded: 1901-11-12
prizeAmount: 150782
prizeAmountAdjusted: 10833458
links: nobelPrize, https://api.nobelprize.org/2/nobelPrize/che/1901, GET, application/json
laureates: 160, 1, 1, list(list(rel = "laureate", href = "https://api.nobelprize.org/2/laureate/160", action = "GET", types = "application/json")), Jacobus H. van 't Hoff, Jacobus Henricus van 't Hoff, in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions, såsom ett erkännande av den utomordentliga förtjänst han inlagt genom upptäckten av lagarna för den kemiska dynamiken och för det osmotiska trycket i lösningar
category.en / .no / .se: Chemistry / Kjemi / Kemi
categoryFullName.en / .no / .se: The Nobel Prize in Chemistry / Nobelprisen i kjemi / Nobelpriset i kemi

Row 2
awardYear: 1901
dateAwarded: 1901-11-14
prizeAmount: 150782
prizeAmountAdjusted: 10833458
links: nobelPrize, https://api.nobelprize.org/2/nobelPrize/lit/1901, GET, application/json
laureates: 569, 1, 1, list(list(rel = "laureate", href = "https://api.nobelprize.org/2/laureate/569", action = "GET", types = "application/json")), Sully Prudhomme, Sully Prudhomme, in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect, såsom ett erkännande av hans utmärkta, jämväl under senare år ådagalagda förtjänster som författare och särskilt av hans om hög idealitet, konstnärlig fulländning samt sällspord förening av hjärtats och snillets egenskaper vittnande diktning
category.en / .no / .se: Literature / Litteratur / Litteratur
categoryFullName.en / .no / .se: The Nobel Prize in Literature / Nobelprisen i litteratur / Nobelpriset i litteratur