##Introduction
Working with the two JSON files available through the API at nobelprize.org, ask and answer 4 interesting questions, e.g. “Which country ‘lost’ the most Nobel laureates (who were born there but received their Nobel prize as a citizen of a different country)?”
#Preparing the dataset
The code starts by loading the R packages needed for working with web APIs and manipulating data: httr for making HTTP requests, jsonlite for parsing JSON, and dplyr for data wrangling. Later sections also load tidyr, whose unnest function flattens nested columns. Loading dplyr masks some functions that share a name with functions in other packages (e.g., filter and lag from stats), as the startup messages show. You can use the conflicted package to identify and address these conflicts if needed.
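One lightweight way to sidestep a masking conflict, without any extra package, is to call the function through its namespace. The sketch below is only an illustration: the data frame `toy` and its values are made up and are not part of the Nobel data.

```r
# Toy data frame standing in for a laureate table (values are made up)
toy <- data.frame(
  surname  = c("A", "B", "C"),
  category = c("chemistry", "physics", "physics"),
  stringsAsFactors = FALSE
)

# Writing dplyr::filter makes it explicit that the dplyr version is wanted,
# regardless of which packages are attached or in what order
physics_only <- dplyr::filter(toy, category == "physics")
print(physics_only)
```

The same prefix style works for any masked function, for example stats::filter when the time-series version is the one intended.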
The code defines the URL of the Nobel Prize API endpoint that provides information on Nobel laureates. It then uses the fromJSON function from the jsonlite package to retrieve the data in JSON format. Once retrieved, the code explores the structure of the JSON data by displaying the names of the main elements and sub-elements within the list returned by fromJSON. This helps understand how the data is organized within the API response.
#load the packages
library(httr)
library(jsonlite)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#fetch JSON data from the Nobel Prize API
nobel_df <- fromJSON("http://api.nobelprize.org/v1/laureate.json")
#displays the names that correspond to the keys or attributes in the JSON data retrieved from the API.
names(nobel_df)
## [1] "laureates"
#Sub-element called laureates
names(nobel_df$laureates)
## [1] "id" "firstname" "surname" "born"
## [5] "died" "bornCountry" "bornCountryCode" "bornCity"
## [9] "diedCountry" "diedCountryCode" "diedCity" "gender"
## [13] "prizes"
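The same kind of exploration can be tried offline on a small hand-written JSON string. The sketch below is illustrative only: `toy_json` and its values are made up, although its shape mimics the laureates/prizes nesting shown above.

```r
library(jsonlite)

# Hypothetical miniature response; ids, names, and years are made up
toy_json <- '{"laureates":[
  {"id":"1","firstname":"A","prizes":[{"year":"1901","category":"physics"}]},
  {"id":"2","firstname":"B","prizes":[{"year":"1903","category":"physics"},
                                       {"year":"1911","category":"chemistry"}]}
]}'

toy <- fromJSON(toy_json)

names(toy)                   # top-level element
names(toy$laureates)         # columns of the laureates data frame
class(toy$laureates$prizes)  # nested records arrive as a list column

# Each list element is itself a data frame, one row per prize
prize_counts <- sapply(toy$laureates$prizes, nrow)
print(prize_counts)

# Compact overview of the nesting
str(toy, max.level = 2)
```

Seeing that prizes is a list column explains why the later sections call unnest before filtering or counting.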
#Question 1:
Who are the Nobel laureates in Chemistry, Economics, and Physics from 2000 to 2024?
This code fetches Nobel Prize data from the API, processes it with jsonlite, dplyr, and tidyr, and filters the data to include only laureates in specific categories (Chemistry, Economics, Physics) from 2000 to 2024.
# Load necessary libraries
library(httr)
library(jsonlite)
library(dplyr)
library(tidyr)
# Fetch JSON data from the Nobel Prize API
response <- fromJSON("http://api.nobelprize.org/v1/prize.json")
# Convert the list to a data frame and filter for specific categories and years
nobel_df <- response$prizes %>%
unnest(laureates) %>%
mutate(year = as.numeric(year)) %>%
filter(category %in% c("chemistry", "economics", "physics") & year >= 2000 & year <= 2024) %>%
arrange(year)
# Display the filtered data
print(nobel_df %>% select(year, category, firstname, surname))
## # A tibble: 185 × 4
## year category firstname surname
## <dbl> <chr> <chr> <chr>
## 1 2000 chemistry Alan Heeger
## 2 2000 chemistry Alan MacDiarmid
## 3 2000 chemistry Hideki Shirakawa
## 4 2000 economics James J. Heckman
## 5 2000 economics Daniel L. McFadden
## 6 2000 physics Zhores Alferov
## 7 2000 physics Herbert Kroemer
## 8 2000 physics Jack Kilby
## 9 2001 chemistry William Knowles
## 10 2001 chemistry Ryoji Noyori
## # ℹ 175 more rows
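A natural follow-up is to summarise the filtered rows instead of listing them. The sketch below shows the same filter written with between() and followed by a per-category count. It runs on a made-up data frame (`toy_prizes`) so that it works offline; the names and values are assumptions for illustration, not API output.

```r
library(dplyr)

# Made-up rows with the same columns used in the question above
toy_prizes <- data.frame(
  year     = c("1999", "2000", "2000", "2010", "2024"),
  category = c("physics", "physics", "peace", "chemistry", "economics"),
  surname  = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

selected <- toy_prizes %>%
  mutate(year = as.numeric(year)) %>%               # years arrive as text
  filter(category %in% c("chemistry", "economics", "physics"),
         between(year, 2000, 2024))                 # inclusive on both ends

# Number of laureate rows per category in the selected period
per_category <- selected %>% count(category)
print(per_category)
```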
#Question 2:
Which country “lost” the most Nobel laureates (who were born there but received their Nobel prize as a citizen of a different country)?
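This code analyzes Nobel Prize data to identify the countries that have “lost” the most laureates. The top-level laureate fields listed earlier include no citizenship column, so the code uses the country of death (diedCountry) as a proxy: it fetches data from the API, transforms it to create a separate row for each laureate’s prize, and then keeps the laureates whose country of birth differs from their country of death. Laureates with a missing birth or death country are excluded by the filter, so the result is an approximation of the question as posed.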
# Load necessary libraries
library(httr)
library(jsonlite)
library(dplyr)
library(tidyr)
# Fetch JSON data from the Nobel Prize API
nobel <- fromJSON("http://api.nobelprize.org/v1/laureate.json")
# Transform the nested list structure within laureates and create multiple rows for each laureate-prize combination
nobel_df <- nobel$laureates %>%
unnest(prizes) %>%
distinct(id, firstname, surname, bornCountry, diedCountry, category, year)
# Identify laureates whose country of death differs from their country of birth (used here as a proxy for a change of citizenship)
lost_laureates <- nobel_df %>%
filter(!is.na(bornCountry) & !is.na(diedCountry) & bornCountry != diedCountry) %>%
count(bornCountry) %>%
arrange(desc(n))
# Display the country that "lost" the most Nobel laureates
cat("Country that 'lost' the most Nobel laureates:\n", lost_laureates$bornCountry[1])
## Country that 'lost' the most Nobel laureates:
## Germany
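Country of death is only one possible proxy. Another is the country of the institution a laureate was affiliated with at the time of the award. The sketch below assumes such a value has already been extracted into a column named `affiliationCountry`; that column name and all values in `toy` are hypothetical, and the extraction step itself is not shown.

```r
library(dplyr)

# Made-up laureates; affiliationCountry is a hypothetical, pre-extracted column
toy <- data.frame(
  id                 = c("1", "2", "3", "4"),
  bornCountry        = c("Germany", "Germany", "France", "Poland"),
  affiliationCountry = c("USA", "Germany", "France", NA),
  stringsAsFactors   = FALSE
)

lost_by_affiliation <- toy %>%
  filter(!is.na(bornCountry),
         !is.na(affiliationCountry),
         bornCountry != affiliationCountry) %>%  # born in one country, working in another
  count(bornCountry, sort = TRUE)                # most "lost" first

print(lost_by_affiliation)
```

Whichever proxy is used, the answer should be reported together with the proxy, since neither country of death nor affiliation is the same thing as citizenship.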
#Question 3: Year with the fewest Nobel winners
This code retrieves data on Nobel Prizes from the API. It then unnests the laureates column to create a data frame with a separate row for each laureate, converts the year to a number, and counts the laureates in each year. Finally, it identifies and displays the year with the fewest Nobel laureates. A year with no laureates at all contributes no rows after unnesting, so the result is the minimum among years with at least one laureate.
# Fetch JSON data from the Nobel Prize API
nobel <- fromJSON("http://api.nobelprize.org/v1/prize.json")
# Convert the list to a data frame and count laureates per year
nobel_df <- nobel$prizes %>%
unnest(laureates) %>%
mutate(year = as.numeric(year)) %>%
count(year) %>%
arrange(n)
# Display the year with the least Nobel winners
cat("Year with the least Nobel winners:\n", nobel_df$year[1])
## Year with the least Nobel winners:
## 1916
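Taking the first row after arrange(n) reports a single year even when several years share the minimum. A possible variation, sketched below, uses slice_min from dplyr to keep every tied year. The counts in `toy_counts` are made up for illustration and are not taken from the API.

```r
library(dplyr)

# Made-up laureate counts per year
toy_counts <- data.frame(
  year = c(2001, 2002, 2003, 2004),
  n    = c(3L, 1L, 1L, 5L)
)

# Keep all rows that share the smallest count, not just the first one
fewest <- toy_counts %>%
  slice_min(n, n = 1, with_ties = TRUE)

print(fewest)
```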
#Question 4: The ten countries with the most winners and the ten with the fewest
This chunk analyzes Nobel laureate data from the API. It first retrieves the data and transforms it to create a separate row for each laureate’s prize. It then counts the rows by country of birth and displays the ten countries with the most Nobel laureates and the ten with the fewest. Note that laureates with no recorded country of birth are counted under NA, and that many countries tie at one laureate, so the bottom ten shown are simply the last of those ties in sort order.
# Fetch JSON data from the Nobel Prize API
nobel <- fromJSON("http://api.nobelprize.org/v1/laureate.json")
# Transform the nested list structure within laureates and create multiple rows for each laureate-prize combination
nobel_df <- nobel$laureates %>%
unnest(prizes) %>%
distinct(id, firstname, surname, bornCountry, category, year)
# Count laureate-prize rows by country of birth (a laureate with two prizes is counted twice; missing countries appear as NA)
country_counts <- nobel_df %>%
count(bornCountry) %>%
arrange(desc(n))
# Display the top ten countries with more winners
cat("Top ten countries with more Nobel winners:\n")
## Top ten countries with more Nobel winners:
print(head(country_counts, 10))
## # A tibble: 10 × 2
## bornCountry n
## <chr> <int>
## 1 USA 297
## 2 United Kingdom 93
## 3 Germany 67
## 4 France 58
## 5 <NA> 33
## 6 Sweden 30
## 7 Japan 28
## 8 Canada 21
## 9 Switzerland 19
## 10 the Netherlands 19
# Display the top ten countries with fewer winners
cat("\nTop ten countries with fewer Nobel winners:\n")
##
## Top ten countries with fewer Nobel winners:
print(tail(country_counts, 10))
## # A tibble: 10 × 2
## bornCountry n
## <chr> <int>
## 1 Taiwan 1
## 2 Tibet (now China) 1
## 3 Trinidad and Tobago 1
## 4 Tuscany (now Italy) 1
## 5 USSR (now Belarus) 1
## 6 Ukraine 1
## 7 Venezuela 1
## 8 Vietnam 1
## 9 Württemberg (now Germany) 1
## 10 Yemen 1
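The output above shows two things that can distort the ranking: an NA group, and historical names such as "Württemberg (now Germany)" that are counted separately from the present-day country. One possible clean-up, sketched below on made-up rows, drops missing values and maps "X (now Y)" to "Y" with a regular expression. Whether historical birthplaces should be merged this way is a judgment call, so treat it as an assumption, not a correction.

```r
library(dplyr)

# Made-up rows using name patterns that appear in the output above
toy <- data.frame(
  bornCountry = c("Germany", "Württemberg (now Germany)",
                  "Tuscany (now Italy)", NA, "USA", "USA"),
  stringsAsFactors = FALSE
)

normalised <- toy %>%
  filter(!is.na(bornCountry)) %>%                    # drop unknown birthplaces
  mutate(country = sub("^.*\\(now (.*)\\)$", "\\1",  # "X (now Y)" becomes "Y"
                       bornCountry)) %>%
  count(country, sort = TRUE)

print(normalised)
```

Names without a "(now ...)" suffix pass through unchanged, because sub returns its input when the pattern does not match.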
###Conclusions
APIs are vital tools in data science. They give access to up-to-date data, automate data collection, and make it possible to integrate diverse data sources. They also allow customization: a data scientist can request only the information needed, as this analysis did by querying the laureate and prize endpoints separately.
In this report, a few calls to the Nobel Prize API, combined with jsonlite, dplyr, and tidyr, were enough to answer four questions about the laureates. The same approach of fetching, unnesting, filtering, and counting scales to other APIs and supports more accurate, data-driven conclusions.