Assignment 10B

Approach

The first step is to choose which public API from the Nobel Prize organization to use. Once an API is chosen, I decide which questions I want to answer with it. The analysis then parses the JSON response and applies join, filter, and compare operations across the data to answer those questions in a meaningful way.
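As a minimal sketch of the parse, filter, and count steps described above, the example below works on a small inline JSON string instead of the live API. The two records are made up for illustration and only mimic the shape of the API's `laureates` array; the names `sample_json`, `sample_laureates`, `prizes`, `physics`, and `category_counts` are hypothetical.

```r
library(jsonlite)
library(dplyr)
library(tidyr)

# Hypothetical sample shaped like the API response (not real API output)
sample_json <- '{
  "laureates": [
    {"id": "1", "gender": "female",
     "nobelPrizes": [{"awardYear": "1903", "category": {"en": "Physics"}},
                     {"awardYear": "1911", "category": {"en": "Chemistry"}}]},
    {"id": "2", "gender": "male",
     "nobelPrizes": [{"awardYear": "1921", "category": {"en": "Physics"}}]}
  ]
}'

# flatten = TRUE turns nested objects into dotted column names (category.en)
sample_laureates <- fromJSON(sample_json, flatten = TRUE)$laureates

# One row per prize instead of one row per laureate
prizes <- sample_laureates %>%
  unnest(nobelPrizes, names_sep = "_")

# Filter: keep only Physics prizes
physics <- prizes %>%
  filter(nobelPrizes_category.en == "Physics")

# Compare: number of prizes per category, largest first
category_counts <- prizes %>%
  count(nobelPrizes_category.en, sort = TRUE)
```

Unnesting first matters because a laureate can hold more than one prize, so counting rows before `unnest()` would count people rather than awards.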

Code Base

library(httr)
library(jsonlite)
library(dplyr)

library(tidyr)
library(ggplot2)
# Load Data from Nobel Prize API


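# Request up to 1000 laureates in a single page
url <- "https://api.nobelprize.org/2.1/laureates?limit=1000"
response <- GET(url)
stop_for_status(response)  # stop early if the request failed

# Parse the JSON body; flatten = TRUE turns nested objects into dotted column names
data <- fromJSON(content(response, "text", encoding = "UTF-8"), flatten = TRUE)
laureates <- data$laureates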
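
The request above asks for a single page of up to 1000 records, so it is worth confirming that nothing was cut off. The sketch below compares the number of rows returned with the total the response reports. It assumes the parsed response carries that total in `meta$count`; that field name and the helper `is_complete_page` are assumptions for illustration, not something taken from the code above.

```r
# Hypothetical helper: TRUE when the page holds every record the API reports,
# FALSE when it was truncated, NA when no total is available.
is_complete_page <- function(parsed, records = "laureates") {
  total <- parsed$meta$count          # assumed location of the total count
  if (is.null(total)) return(NA)      # total not reported, cannot tell
  nrow(parsed[[records]]) >= total
}

# Usage on a made-up parsed response (3 rows returned, 5 reported)
fake_short <- list(laureates = data.frame(id = 1:3), meta = list(count = 5))
is_complete_page(fake_short)
```

With the real response this would be called as `is_complete_page(data)`; a `FALSE` result would mean the `limit` needs to be raised or the request paged.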


# Question 1: Which Nobel Prize category has the most laureates?


laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  count(nobelPrizes_category.en, sort = TRUE)
# A tibble: 6 × 2
  nobelPrizes_category.en     n
  <chr>                   <int>
1 Physiology or Medicine    231
2 Physics                   225
3 Chemistry                 198
4 Peace                     138
5 Literature                117
6 Economic Sciences          99
# Question 2: How has the number of female laureates changed over time?

laureates %>%
  filter(gender == "female") %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  mutate(decade = floor(as.integer(nobelPrizes_awardYear) / 10) * 10) %>%
  count(decade) %>%
  ggplot(aes(x = decade, y = n)) +
  geom_col(fill = "steelblue") +
  labs(title = "Female Nobel Laureates by Decade", x = "Decade", y = "Count")

# Question 3: What is the average age at award by category?


laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  mutate(
    birth_year = as.integer(substr(birth.date, 1, 4)),
    award_year = as.integer(nobelPrizes_awardYear),
    age = award_year - birth_year,
    category = `nobelPrizes_category.en`
  ) %>%
  filter(!is.na(age), age > 0, age < 100) %>%
  group_by(category) %>%
  summarise(avg_age = round(mean(age), 1)) %>%
  arrange(desc(avg_age))
# A tibble: 6 × 2
  category               avg_age
  <chr>                    <dbl>
1 Economic Sciences         67  
2 Literature                64.9
3 Peace                     60.6
4 Chemistry                 59.2
5 Physiology or Medicine    58.9
6 Physics                   57.6
# Question 4 (Join/Compare): Which birth country has the most laureates
# who received the prize while living in a different country?


laureates %>%
  unnest(nobelPrizes, names_sep = "_") %>%
  mutate(birth_country = birth.place.country.en) %>%
  unnest(nobelPrizes_residences, names_sep = "_") %>%
  mutate(residence_country = nobelPrizes_residences_country.en) %>%
  filter(!is.na(birth_country), !is.na(residence_country)) %>%
  filter(birth_country != residence_country) %>%
  count(birth_country, sort = TRUE) %>%
  slice_max(n, n = 10)
# A tibble: 13 × 2
   birth_country        n
   <chr>            <int>
 1 Russian Empire       6
 2 Northern Ireland     5
 3 Russia               5
 4 Germany              4
 5 Austria-Hungary      2
 6 Austrian Empire      2
 7 British India        2
 8 France               2
 9 Ottoman Empire       2
10 Prussia              2
11 Romania              2
12 Scotland             2
13 USSR                 2
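
The result above has 13 rows even though `slice_max()` was called with `n = 10`. That is because `slice_max()` keeps ties by default (`with_ties = TRUE`), and several countries share the count of 2. The toy data below is made up to show the difference; `toy`, `top2_ties`, and `top2_strict` are hypothetical names.

```r
library(dplyr)

# Made-up counts with a three-way tie for second place
toy <- tibble(country = c("A", "B", "C", "D"),
              n = c(5L, 2L, 2L, 2L))

# Default: every row tied with the 2nd-largest value is kept
top2_ties <- toy %>% slice_max(n, n = 2)

# with_ties = FALSE returns exactly 2 rows
top2_strict <- toy %>% slice_max(n, n = 2, with_ties = FALSE)

nrow(top2_ties)    # 4
nrow(top2_strict)  # 2
```

Keeping ties is the safer default for a ranking like this one, since cutting at exactly 10 rows would drop some of the tied countries arbitrarily.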

Conclusion

We explored four questions using the Nobel Prize API to retrieve and explore the data. Physiology or Medicine (231) and Physics (225) have produced the most laureates. The birth-country comparison points to displacement during the 20th century: laureates born in the Russian Empire, Russia, Germany, and the United Kingdom (Northern Ireland and Scotland) were the most likely to be living in a different country at the time of receiving the award.