For this assignment, I will use the Nobel Prize API to work with JSON data and answer four questions using Nobel Prize information.
The main goal of this assignment is to practice working with JSON from an API, then turning that data into tidy data frames that can be explored in R.
Retrieve the JSON data from the Nobel Prize API
First, I will connect to the Nobel Prize API directly from R instead of downloading a file. I plan to use one or both of these endpoints:
Nobel Prize data
Laureate data
Load the JSON data into R
Next, I will use R packages for JSON and tidy data work. My plan is to load the API response into R, inspect the structure and identify which parts of the JSON need to be extracted.
Because JSON from an API is usually nested, I expect that some fields will need extra cleaning or unnesting before analysis.
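The inspection step can be sketched on a small inline sample. This is only an illustration: the JSON below is made up, and its field names (`knownName`, `birth`, `nobelPrizes`) are assumptions about the response shape rather than a copy of the real API output.

```r
library(jsonlite)

# Toy JSON with an assumed, laureate-like shape (not real API output).
sample_json <- '{
  "laureates": [
    {
      "id": "1",
      "knownName": {"en": "Person A"},
      "birth": {"place": {"country": {"en": "Country X"}}},
      "nobelPrizes": [
        {"awardYear": "1950", "category": {"en": "Physics"}}
      ]
    }
  ]
}'

parsed <- fromJSON(sample_json)

names(parsed)                          # top-level keys
str(parsed$laureates, max.level = 2)   # nested objects show up as nested data frames
class(parsed$laureates$nobelPrizes)    # arrays of objects become list columns
```

Nested objects such as `knownName` arrive as data frame columns, while arrays such as `nobelPrizes` arrive as list columns, which is why unnesting is needed later.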
Transform the JSON into tidy data frames
After loading the JSON data, I will convert the important parts into tidy data frames, so that each variable has its own column and each observation has its own row.
For example:
select useful fields
unnest nested columns
clean text fields
separate prize-level and laureate-level information
join data frames when needed
The assignment covers not only retrieving JSON but also transforming it into tidy data frames.
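A minimal sketch of the select, unnest and clean steps listed above, run on made-up data. The field names (`knownName`, `nobelPrizes`, `awardYear`, `category`) and the output column names are assumptions chosen for illustration.

```r
library(jsonlite)
library(dplyr)
library(tidyr)

# Toy JSON with an assumed shape: two laureates, three prizes in total.
sample_json <- '{"laureates": [
  {"id": "1", "knownName": {"en": "Person A"},
   "nobelPrizes": [{"awardYear": "1950", "category": {"en": "Physics"}},
                   {"awardYear": "1960", "category": {"en": "Chemistry"}}]},
  {"id": "2", "knownName": {"en": "Person B"},
   "nobelPrizes": [{"awardYear": "1970", "category": {"en": "Peace"}}]}
]}'

laureates_raw <- fromJSON(sample_json)$laureates

laureate_prizes <- laureates_raw %>%
  # Keep only the useful fields and pull the English name out of the nested object.
  transmute(id, known_name = knownName$en, prizes = nobelPrizes) %>%
  # One row per laureate per prize.
  unnest(prizes) %>%
  # Clean types and flatten the nested category object.
  transmute(
    id,
    known_name,
    award_year = as.integer(awardYear),
    category = category$en
  )

laureate_prizes
```

The result has one row per laureate-prize pair, which is the shape the later counting and joining steps expect.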
Create four data-driven questions
After the data is cleaned, I will create four questions that can be answered from the Nobel Prize data.
One question will go beyond a basic count and will require filtering, joining, or comparing multiple fields.
Answer each question with code and results
For each of the four questions, I will include:
the question itself
the R code used to answer it
the result, shown as a table, summary, or visualization
Data Source
Nobel Prize API: https://api.nobelprize.org/2.1/nobelPrizes https://api.nobelprize.org/2.1/laureates
Code Base
In this section, I retrieve JSON data directly from the Nobel Prize API, turn it into tidy data frames, and answer four questions from the data. The Nobel Prize Developer Zone says the API returns JSON and provides endpoints such as nobelPrizes and laureates.
library(jsonlite)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
    filter, lag
The following objects are masked from 'package:base':
    intersect, setdiff, setequal, union
library(tidyr)
library(purrr)
Attaching package: 'purrr'
The following object is masked from 'package:jsonlite':
    flatten
library(stringr)
library(ggplot2)
I define the two API links and read the JSON data into R.
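A minimal sketch of this step, using the two endpoints listed under Data Source. The helper names `build_url()` and `read_nobel()` are hypothetical, and the `limit` query parameter is an assumption about how the API pages its results, so it should be checked against the API documentation.

```r
library(jsonlite)

# Endpoints from the Data Source section.
prizes_url    <- "https://api.nobelprize.org/2.1/nobelPrizes"
laureates_url <- "https://api.nobelprize.org/2.1/laureates"

# Hypothetical helper: append an assumed `limit` query parameter so that
# more than the first page of records is requested.
build_url <- function(base_url, limit = 2000) {
  paste0(base_url, "?limit=", limit)
}

# Hypothetical helper: download and parse the JSON in one call.
read_nobel <- function(base_url, limit = 2000) {
  fromJSON(build_url(base_url, limit))
}

# Example calls (these need network access):
# prizes_raw    <- read_nobel(prizes_url)
# laureates_raw <- read_nobel(laureates_url)
```

Keeping the URL construction separate from the download makes it easy to confirm the request string before anything is sent over the network.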
     birth_country   n
 1             USA 295
 2  United Kingdom  93
 3         Germany  78
 4          France  60
 5          Sweden  30
 6           Japan  27
 7          Canada  22
 8 the Netherlands  20
 9     Switzerland  19
10           Italy  18
Question 3: Which laureates have won more than one Nobel Prize?
This question goes beyond counting categories because it groups by laureate and compares how many prizes each person or organization received. The table lists laureates who have received more than one Nobel Prize.
# A tibble: 7 × 2
  known_name                                                      n
  <chr>                                                       <int>
1 International Committee of the Red Cross                        3
2 Frederick Sanger                                                2
3 John Bardeen                                                    2
4 K. Barry Sharpless                                              2
5 Linus Carl Pauling                                              2
6 Marie Curie, née Skłodowska                                     2
7 Office of the United Nations High Commissioner for Refugees     2
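The grouping behind a table like the one above can be sketched as follows. The rows are made up, and the table name `laureate_prizes` and its one-row-per-prize layout are assumptions for illustration.

```r
library(dplyr)

# Toy table: one row per laureate per prize (made-up rows).
laureate_prizes <- tibble(
  known_name = c("Person A", "Person A", "Person B", "Org C", "Org C", "Org C"),
  award_year = c(1950L, 1960L, 1970L, 1917L, 1944L, 1963L)
)

multi_winners <- laureate_prizes %>%
  count(known_name, sort = TRUE) %>%   # prizes per laureate, largest first
  filter(n > 1)                        # keep repeat winners only

multi_winners
```

Counting by a unique laureate identifier instead of the display name would be safer if two different laureates could share the same name.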
Question 4: Which country lost the most Nobel laureates, meaning they were born there but awarded while affiliated with an organization in another country?
This question is the more advanced one for the assignment because it requires joining, unnesting, filtering, and comparing different fields across the data. I expand the affiliation information from the laureates' prize records.
# A tibble: 10 × 2
   birth_country       n
   <chr>           <int>
 1 Germany            33
 2 United Kingdom     27
 3 France             16
 4 Canada             15
 5 Russia             14
 6 Austria-Hungary    12
 7 Prussia            12
 8 the Netherlands    11
 9 Russian Empire     10
10 Hungary             9
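One way to express the comparison behind the table above, sketched on made-up rows. The column names and the rule itself (a laureate counts as "lost" if any affiliation country differs from the birth country) are assumptions; the analysis above may have used a different rule.

```r
library(dplyr)
library(tidyr)

# Toy table: one row per laureate, with a list column of affiliation countries.
prize_affiliations <- tibble(
  id = c("1", "2", "3", "4"),
  birth_country = c("Country X", "Country X", "Country Y", "Country Z"),
  affiliation_country = list(
    "Country Y",
    c("Country X", "Country Y"),
    "Country Y",
    character(0)                 # no recorded affiliation
  )
)

lost_laureates <- prize_affiliations %>%
  unnest(affiliation_country) %>%                  # one row per affiliation; empty lists drop out
  filter(!is.na(birth_country), !is.na(affiliation_country)) %>%
  filter(birth_country != affiliation_country) %>% # affiliated outside the birth country
  distinct(id, birth_country) %>%                  # count each laureate once
  count(birth_country, sort = TRUE)

lost_laureates
```

Because the comparison is on country names as strings, historical birth countries such as Prussia or the Russian Empire never match a present-day affiliation country, so counts for those rows should be read with that in mind.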
A plot for this comparison.
ggplot(q4_table, aes(x = reorder(birth_country, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Countries Losing Laureates to Other Award Affiliations",
    x = "Birth Country",
    y = "Number of Laureates Awarded Elsewhere"
  )
Conclusion
I used the Nobel Prize API JSON data directly from the public endpoints, then transformed the nested JSON into tidy data frames using unnest(), transmute(), count() and left_join(). The Nobel Prize Developer Zone states that the data is available through the laureates and nobelPrizes endpoints and is updated regularly.