The first step is choosing which public API from the Nobel Prize organization to use. Once the API is chosen, I decide which questions I want to answer and then come up with meaningful JSON parsing to answer them, using different methods across the data: join, filter, and compare.
Code Base
library(httr)
library(jsonlite)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
# Load data from the Nobel Prize API
url <- "https://api.nobelprize.org/2.1/laureates?limit=1000"
response <- GET(url)
data <- fromJSON(content(response, "text", encoding = "UTF-8"), flatten = TRUE)
laureates <- data$laureates

# Question 1: Which Nobel Prize category has the most laureates?
laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  count(nobelPrizes_category.en, sort = TRUE)
# A tibble: 6 × 2
nobelPrizes_category.en n
<chr> <int>
1 Physiology or Medicine 231
2 Physics 225
3 Chemistry 198
4 Peace 138
5 Literature 117
6 Economic Sciences 99
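The column name `nobelPrizes_category.en` comes from how `unnest()` flattens the nested prize records. A minimal sketch on toy data (the laureate names and rows below are made up for illustration and are not taken from the API) shows the mechanism: each laureate row holds a small data frame of prizes, and `names_sep = "_"` prefixes every inner column with the outer column name.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the flattened API result (hypothetical values):
# one row per laureate, with prizes held in a list column of data frames.
toy <- tibble(
  name = c("A", "B"),
  nobelPrizes = list(
    data.frame(awardYear = c("1903", "1911"),
               category.en = c("Physics", "Chemistry"),
               stringsAsFactors = FALSE),
    data.frame(awardYear = "1950",
               category.en = "Literature",
               stringsAsFactors = FALSE)
  )
)

# One output row per prize; inner columns get the "nobelPrizes_" prefix.
flat <- toy %>%
  unnest(nobelPrizes, names_sep = "_")

names(flat)
nrow(flat)
```

Because a laureate with two prizes becomes two rows, the counts in Question 1 are counts of prizes awarded to laureates rather than counts of distinct people.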
# Question 2: How has the number of female laureates changed over time?
laureates %>%
  filter(gender == "female") %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  mutate(decade = floor(as.integer(nobelPrizes_awardYear) / 10) * 10) %>%
  count(decade) %>%
  ggplot(aes(x = decade, y = n)) +
  geom_col(fill = "steelblue") +
  labs(title = "Female Nobel Laureates by Decade", x = "Decade", y = "Count")
# Question 3: What is the average age at award by category?
laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  mutate(
    birth_year = as.integer(substr(birth.date, 1, 4)),
    award_year = as.integer(nobelPrizes_awardYear),
    age = award_year - birth_year,
    category = `nobelPrizes_category.en`
  ) %>%
  filter(!is.na(age), age > 0, age < 100) %>%
  group_by(category) %>%
  summarise(avg_age = round(mean(age), 1)) %>%
  arrange(desc(avg_age))
# A tibble: 6 × 2
category avg_age
<chr> <dbl>
1 Economic Sciences 67
2 Literature 64.9
3 Peace 60.6
4 Chemistry 59.2
5 Physiology or Medicine 58.9
6 Physics 57.6
# Question 4 (Join/Compare): Which birth country has the most laureates
# who received the prize while living in a different country?
laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  mutate(birth_country = birth.place.country.en) %>%
  unnest(nobelPrizes_residences, names_sep = "_") %>%
  mutate(residence_country = nobelPrizes_residences_country.en) %>%
  filter(!is.na(birth_country), !is.na(residence_country)) %>%
  filter(birth_country != residence_country) %>%
  count(birth_country, sort = TRUE) %>%
  slice_max(n, n = 10)
# A tibble: 13 × 2
birth_country n
<chr> <int>
1 Russian Empire 6
2 Northern Ireland 5
3 Russia 5
4 Germany 4
5 Austria-Hungary 2
6 Austrian Empire 2
7 British India 2
8 France 2
9 Ottoman Empire 2
10 Prussia 2
11 Romania 2
12 Scotland 2
13 USSR 2
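One caveat with this comparison is that it matches country names as plain strings, so a historical birth-country name such as "Russian Empire" never equals a present-day residence name such as "Russia". The sketch below uses made-up rows to show how much that can change the count. The columns `birth_country_now` and `residence_country_now` are hypothetical. If the API response carries present-day country fields, they could fill that role, but that is an assumption to check against the actual response.

```r
library(dplyr)

# Hypothetical rows: the same places under historical and present-day names.
toy <- tibble(
  laureate = c("A", "B", "C"),
  birth_country = c("Russian Empire", "Germany", "Prussia"),
  residence_country = c("Russia", "USA", "Germany"),
  birth_country_now = c("Russia", "Germany", "Germany"),        # assumed column
  residence_country_now = c("Russia", "USA", "Germany")         # assumed column
)

moved <- toy %>%
  mutate(
    moved_historical = birth_country != residence_country,
    moved_now = birth_country_now != residence_country_now
  )

sum(moved$moved_historical)  # 3: every row differs by name
sum(moved$moved_now)         # 1: only the Germany-to-USA row differs
```

On real data, comparing present-day names would likely shrink entries such as "Russian Empire" and "Prussia" in the table above, leaving mostly laureates who actually lived in another country.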
Conclusion
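We explored four questions using data retrieved from the Nobel Prize API. Physiology or Medicine and Physics have produced the most laureates (231 and 225). Economic Sciences laureates are the oldest on average at the time of the award (67 years), and Physics laureates are the youngest (57.6). The birth-country comparison suggests displacement: laureates born in the Russian Empire, Russia, Germany, and the United Kingdom (Northern Ireland and Scotland) were often living in a different country when they received the award.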