This assignment uses the Nobel Prize API to explore patterns in Nobel Prize awards using structured JSON data. The goal is to move beyond simple summaries and investigate how laureates, prize categories, and countries are connected.
Approach
The analysis begins by retrieving JSON data from the Nobel Prize API, specifically from the laureates and nobelPrizes endpoints. These endpoints provide detailed, nested data on individuals and prize records.
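The API returns results in pages, so a single request may cover only part of an endpoint. Below is a minimal sketch of paging through an endpoint. It assumes the endpoint accepts `offset` and `limit` query parameters and that the base URL is `https://api.nobelprize.org/2.1/`; check both against the API documentation. The helper names `build_url()` and `fetch_all()` are hypothetical. The page fetcher is passed in as a function, so the loop can be tried without network access.

```r
# Hypothetical helpers for paging through a Nobel Prize API endpoint.
# Assumes offset/limit query parameters (verify against the API docs).
build_url <- function(endpoint, offset, limit,
                      base = "https://api.nobelprize.org/2.1/") {
  sprintf("%s%s?offset=%d&limit=%d", base, endpoint, offset, limit)
}

# fetch_page(url) must return the records of one page as a list.
# Requests pages until one comes back empty or shorter than `limit`.
fetch_all <- function(endpoint, fetch_page, limit = 25, max_pages = 100) {
  pages <- list()
  for (i in seq_len(max_pages)) {
    offset <- (i - 1) * limit
    records <- fetch_page(build_url(endpoint, offset, limit))
    if (length(records) == 0) break      # nothing left to read
    pages[[i]] <- records
    if (length(records) < limit) break   # last (partial) page
  }
  do.call(c, pages)                      # one flat list of records
}
```

With live data, `fetch_page` could be something like `function(url) jsonlite::fromJSON(url, simplifyVector = FALSE)$laureates`. The `laureates` field name is also an assumption. Because the loop stops at the first empty or short page, the number of requests depends on the size of the endpoint and is not fixed in advance.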
The JSON data is loaded into R using the jsonlite package and then transformed into tidy data frames using dplyr and tidyr. Because the API data is nested (e.g., multiple prizes per laureate and multiple affiliations per prize), functions such as unnest() are used to flatten the structure into a tabular format suitable for analysis.
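The flattening step can be illustrated on a small hypothetical JSON string in which each laureate carries an array of prizes. Real API records are more deeply nested. For example, names and categories are objects keyed by language. The field names `id`, `name`, `prizes`, `year`, and `category` used here are therefore simplified assumptions.

```r
library(jsonlite)
library(tidyr)

# Hypothetical, simplified JSON: each laureate has an array of prizes
txt <- '[
  {"id": "1", "name": "Laureate A",
   "prizes": [{"year": "1901", "category": "Physics"}]},
  {"id": "2", "name": "Laureate B",
   "prizes": [{"year": "1903", "category": "Physics"},
              {"year": "1911", "category": "Chemistry"}]}
]'

# fromJSON() turns the outer array into a data frame; "prizes" becomes a
# list-column holding one small data frame per laureate
laureates <- fromJSON(txt)

# unnest() expands the list-column: one row per laureate-prize pair
laureate_prizes <- unnest(laureates, cols = prizes)
laureate_prizes
```

Laureate B has two prizes and becomes two rows. The same pattern applies one level further down when each prize carries its own array of affiliations.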
Several related data frames are created, including:
a laureates table with demographic information,
a prize-level table linking individuals to awards,
and an affiliations table capturing institutional and country information.
These tables are then used to answer four questions, using grouping, filtering, and joins across data frames. Question 4 compares each laureate’s birth country with the country of their affiliation at the time of the award, which requires joining the laureates table with the affiliations table.
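The birth-versus-affiliation comparison can be sketched as a join on a shared laureate identifier, followed by a filter that keeps rows where the two country names differ. The tables `laureates_demo` and `affiliations_demo`, the key `laureate_id`, and all values below are hypothetical. Only the column names `birth_country`, `affiliation_name`, and `affiliation_country` follow the ones used later in the analysis.

```r
library(dplyr)

# Hypothetical mini-tables playing the roles of the laureates and
# affiliations tables (key name and values are assumptions)
laureates_demo <- tibble(
  laureate_id   = c("1", "2", "3"),
  birth_country = c("Country A", "Country B", "Country C")
)
affiliations_demo <- tibble(
  laureate_id         = c("1", "2", "3", "3"),
  affiliation_name    = c("Univ W", "Univ X", "Univ Y", "Univ Z"),
  affiliation_country = c("Country B", "Country B", "Country C", "Country B")
)

# Join on the laureate key, then keep rows where the two countries differ
mismatch_demo <- laureates_demo %>%
  inner_join(affiliations_demo, by = "laureate_id") %>%
  filter(!is.na(birth_country), !is.na(affiliation_country),
         birth_country != affiliation_country)

mismatch_demo
```

A laureate can have several affiliations, so the result has one row per mismatched affiliation record, not one row per laureate. Here, laureate "3" keeps only its affiliation outside Country C.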
The results are presented using tables and visualizations to clearly communicate the findings.
library(jsonlite)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(ggplot2)
library(knitr)
Helper function
# Return y when x is NULL or has length zero (e.g., a field missing from the API response)
`%||%` <- function(x, y) {
  if (is.null(x) || length(x) == 0) y else x
}
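A typical use of the helper is to pull a deeply nested field out of records parsed with `jsonlite::fromJSON(..., simplifyVector = FALSE)` and substitute `NA` when the field is absent. The record structure and the accessor `get_birth_country()` below are hypothetical.

```r
# Same helper as above: fall back to y when x is NULL or empty
`%||%` <- function(x, y) {
  if (is.null(x) || length(x) == 0) y else x
}

# Hypothetical nested records in list form
rec_full    <- list(birth = list(place = list(country = list(en = "Japan"))))
rec_missing <- list(orgName = list(en = "Example Org"))  # no birth data

# Hypothetical accessor: nested lookup with an NA fallback
get_birth_country <- function(rec) {
  rec$birth$place$country$en %||% NA_character_
}

birth_countries <- sapply(list(rec_full, rec_missing), get_birth_country)
birth_countries
```

In R, `$` on `NULL` returns `NULL`, so the whole chain yields `NULL` when any level is missing, and `%||%` replaces it with `NA_character_`. Base R (from version 4.4.0) and rlang also provide a `%||%`, but those versions only test for `NULL`. This version also treats zero-length values such as `list()` as missing.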
# A tibble: 5 × 6
award_year category category_full date_awarded prize_amount
<chr> <chr> <chr> <chr> <int>
1 1901 Chemistry The Nobel Prize i… 1901-11-12 150782
2 1901 Literature The Nobel Prize i… 1901-11-14 150782
3 1901 Peace The Nobel Peace P… 1901-12-10 150782
4 1901 Physics The Nobel Prize i… 1901-11-12 150782
5 1901 Physiology or Medicine The Nobel Prize i… 1901-10-30 150782
# ℹ 1 more variable: prize_amount_adjusted <int>
Question 1
Which Nobel Prize categories have been awarded most often?
q1 <- prize_tbl %>%
  count(category, sort = TRUE)

q1 %>%
  kable(caption = "Number of laureate records by Nobel Prize category")
Number of laureate records by Nobel Prize category

| category | n |
|---|---|
| Chemistry | 11 |
| Physics | 5 |
| Peace | 3 |
| Economic Sciences | 2 |
| Literature | 2 |
| Physiology or Medicine | 2 |
q1 %>%
  ggplot(aes(x = reorder(category, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Nobel Prize Categories by Number of Laureate Records",
    x = "Category",
    y = "Count"
  )
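Answer: Chemistry appears most frequently in this dataset, with 11 of the 25 laureate records, followed by Physics (5) and Peace (3). Economic Sciences, Literature, and Physiology or Medicine appear twice each. Because the table counts laureate records, a prize shared by several laureates is counted once per laureate. These figures therefore measure how many laureates each category has in the sample, not how many distinct prizes were awarded. Within the sampled data, scientific fields, particularly Chemistry, are the most prominently represented.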
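Shared prizes are counted once per laureate, so counting distinct (award year, category) pairs instead gives the number of prizes awarded. The sketch below compares the two measures on hypothetical rows. The table `prize_demo` and the columns `laureate_id` and `award_year` are assumed names.

```r
library(dplyr)

# Hypothetical laureate-level rows: a prize shared by several laureates
# appears once per laureate
prize_demo <- tibble(
  laureate_id = c("1", "2", "3", "4"),
  award_year  = c("2000", "2000", "2000", "2001"),
  category    = c("Chemistry", "Chemistry", "Chemistry", "Physics")
)

# Laureate records per category
records_by_category <- prize_demo %>%
  count(category, name = "laureate_records")

# Distinct prizes per category: one row per (year, category) pair
prizes_by_category <- prize_demo %>%
  distinct(award_year, category) %>%
  count(category, name = "prizes")

comparison <- left_join(records_by_category, prizes_by_category, by = "category")
comparison
```

In this example, Chemistry has three laureate records but only one prize. Shared prizes open exactly this kind of gap between the two measures.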
Question 2
Which birth countries have produced the most Nobel laureates?
q2 <- laureates_tbl %>%
  filter(!is.na(birth_country), birth_country != "") %>%
  count(birth_country, sort = TRUE)

q2 %>%
  slice_head(n = 15) %>%
  kable(caption = "Top 15 birth countries by number of laureates")
Top 15 birth countries by number of laureates

| birth_country | n |
|---|---|
| USA | 4 |
| Germany | 2 |
| India | 2 |
| Japan | 2 |
| Prussia | 2 |
| Argentina | 1 |
| Belgium | 1 |
| British Mandate of Palestine | 1 |
| British Protectorate of Palestine | 1 |
| Denmark | 1 |
| Egypt | 1 |
| Ethiopia | 1 |
| France | 1 |
| French Algeria | 1 |
| Lithuania | 1 |
q2 %>%
  slice_head(n = 15) %>%
  ggplot(aes(x = reorder(birth_country, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 15 Birth Countries of Nobel Laureates",
    x = "Birth Country",
    y = "Number of Laureates"
  )
Answer: The United States appears most frequently in this dataset (4 laureates), followed by Germany, India, Japan, and Prussia with 2 each. Most other countries appear only once. Laureates in this sample are therefore concentrated in a few birth countries, with a long tail of countries represented by a single laureate.
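Note: Historical country names such as Prussia and French Algeria reflect changes in geopolitical boundaries over time, and they affect how laureates are grouped by birthplace. For example, "British Mandate of Palestine" and "British Protectorate of Palestine" are counted as separate birth countries.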
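One way to reduce this effect is to group by a present-day country name where one is available. The sketch below assumes each laureate also carries such a name. The Nobel Prize API documents a `countryNow` field alongside `country` for birthplaces, but this should be verified before relying on it. The table `births_demo`, the column `birth_country_now`, and all values are hypothetical.

```r
library(dplyr)

# Hypothetical rows: historical birth country plus an assumed present-day name
births_demo <- tibble(
  laureate_id       = c("1", "2", "3", "4"),
  birth_country     = c("Prussia", "Prussia", "Germany", "USA"),
  birth_country_now = c("Poland", "Germany", "Germany", NA)
)

# Prefer the present-day name; fall back to the historical label when missing
births_harmonized <- births_demo %>%
  mutate(country_for_grouping = coalesce(birth_country_now, birth_country)) %>%
  count(country_for_grouping, sort = TRUE)

births_harmonized
```

`coalesce()` keeps the present-day name when it exists and falls back to the historical label otherwise. As a result, the two Prussia-born laureates in this example end up in different present-day groups.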
Question 3
Which institutions appear most often as Nobel Prize affiliations?
q3 <- affiliations_tbl %>%
  filter(!is.na(affiliation_name), affiliation_name != "") %>%
  count(affiliation_name, affiliation_country, sort = TRUE)

q3 %>%
  slice_head(n = 15) %>%
  kable(caption = "Top 15 affiliations in Nobel Prize records")
Top 15 affiliations in Nobel Prize records

| affiliation_name | affiliation_country | n |
|---|---|---|
| Asahi Kasei Corporation | Japan | 1 |
| Berlin University | Germany | 1 |
| California Institute of Technology (Caltech) | USA | 1 |
| Goettingen University | Germany | 1 |
| Hokkaido University | Japan | 1 |
| Imperial College | United Kingdom | 1 |
| Institut d’Optique Graduate School – Université Paris-Saclay | France | 1 |
| International Centre for Theoretical Physics | Italy | 1 |
| Johns Hopkins University | USA | 1 |
| Kaiser-Wilhelm-Institut (now Max-Planck-Institut) für Biochemie | Germany | 1 |
| MRC Laboratory of Molecular Biology | United Kingdom | 1 |
| Massachusetts Institute of Technology (MIT) | USA | 1 |
| Meijo University | Japan | 1 |
| Munich University | Germany | 1 |
| Niels Bohr Institute | Denmark | 1 |
Answer: Each institution in this dataset appears only once, meaning no single affiliation clearly dominates. This suggests that, within this sample, Nobel Prize winners are distributed across a wide range of institutions rather than concentrated in a few. However, the lack of repeated affiliations is likely due to the limited size of the dataset rather than reflecting the full distribution of Nobel Prize institutions.
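When every institution appears once, aggregating the same rows by country can still show where affiliations concentrate. The sketch below uses hypothetical rows in a table named `affiliations_demo`, with the same column names as the Question 3 code.

```r
library(dplyr)

# Hypothetical affiliation rows: every institution is distinct
affiliations_demo <- tibble(
  affiliation_name    = c("Univ A", "Univ B", "Institute C", "Univ D"),
  affiliation_country = c("Country X", "Country X", "Country Y", "Country Z")
)

# Institution level: every count is 1
by_institution <- affiliations_demo %>%
  count(affiliation_name, affiliation_country, sort = TRUE)

# Country level: the same rows can show concentration
by_country <- affiliations_demo %>%
  count(affiliation_country, sort = TRUE)

by_country
```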
Question 4
For which birth countries does a laureate’s affiliation country at the time of the award most often differ from their country of birth?
q4_birth_loss <- birth_vs_affiliation %>%
  count(birth_country, sort = TRUE)

q4_birth_loss %>%
  slice_head(n = 15) %>%
  ggplot(aes(x = reorder(birth_country, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Birth Countries Most Often Different from Award Affiliation Country",
    x = "Birth Country",
    y = "Number of Mismatch Records"
  )
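A pair-level table, with one row per combination of birth country and affiliation country, can be produced by counting both columns. The sketch below uses hypothetical mismatch rows in a table named `mismatch_demo`, whose columns are assumed to match those of `birth_vs_affiliation`. The result can be passed to `kable()` like the other tables.

```r
library(dplyr)

# Hypothetical mismatch rows (birth country differs from affiliation country)
mismatch_demo <- tibble(
  birth_country       = c("Country A", "Country A", "Country A", "Country C"),
  affiliation_country = c("Country B", "Country B", "Country D", "Country B")
)

# One row per (birth country, affiliation country) pair, most frequent first
mismatch_pairs <- mismatch_demo %>%
  count(birth_country, affiliation_country, sort = TRUE, name = "records")

mismatch_pairs
```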
Answer: The results show that India has the highest number of mismatch records in this dataset, followed by Prussia, while each remaining birth country appears only once. The detailed table shows that these mismatches involve affiliations in countries such as the United States, the United Kingdom, Italy, Germany, and Israel. Together, the table and graph suggest that Nobel laureates in this sample often received their awards while affiliated with institutions outside their country of birth. Some mismatches may reflect naming rather than movement, however. For example, a laureate born in "Prussia" and affiliated with a German institution counts as a mismatch simply because the two country labels differ.
Conclusion
This analysis used JSON data from the Nobel Prize API to examine relationships between laureates, prize categories, and countries. After transforming the nested API responses into tidy tables, four questions were explored using filtering, grouping, and joins.
The results showed that Chemistry was the most frequent category, the United States was the most common birth country, and no institution appeared more than once among the affiliations. The comparison between birth country and affiliation country indicated that many laureates received their awards while affiliated with institutions outside their country of birth, highlighting the international nature of Nobel recognition.
Because the analysis is based on a limited sample of API data, the findings should be interpreted as illustrative rather than definitive. Overall, this assignment demonstrates how nested JSON data can be converted into tidy data frames in R and how combining multiple tables enables more meaningful analysis beyond simple counts.