Extra Credit: More JSON Practice

Introduction

Working with the two JSON files available through the API at nobelprize.org, ask and answer four interesting questions, e.g., “Which country ‘lost’ the most Nobel laureates (those who were born there but received their Nobel Prize as a citizen of a different country)?”

#load the packages
library(httr)
library(jsonlite)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.0
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()  masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag()     masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Fetch the Nobel laureate data by specifying the API’s URL. The code below retrieves the JSON, parses it into an R list (whose laureates element is a data frame), and then inspects the names of the elements and sub-elements within that list.

#fetch JSON data from the Nobel Prize API
nobel <- fromJSON("http://api.nobelprize.org/v1/laureate.json")
# Show the top-level names (keys) of the parsed JSON
names(nobel)

## [1] "laureates"

#looking at the sub-element called laureates
names(nobel$laureates)

##  [1] "id"              "firstname"       "surname"         "born"           
##  [5] "died"            "bornCountry"     "bornCountryCode" "bornCity"       
##  [9] "diedCountry"     "diedCountryCode" "diedCity"        "gender"         
## [13] "prizes"

# prizes is a list-column of data frames; show the fields of the first laureate's prizes
names(nobel$laureates$prizes[[1]])

## [1] "year"         "category"     "share"        "motivation"   "affiliations"
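To see why `names()` behaves this way at each level, here is a minimal sketch of how `fromJSON()` simplifies nested JSON. It assumes a tiny hand-written JSON string (`toy_json`, hypothetical and not real API data) with the same shape as the laureate response: the top-level object becomes a list, the array of laureate objects becomes a data frame, and each laureate's nested `prizes` array becomes one element of a list-column.

```r
library(jsonlite)

# Hypothetical JSON mimicking the shape of the API response;
# all ids, names, years and categories are made up for illustration.
toy_json <- '{
  "laureates": [
    {"id": "1", "firstname": "Ada", "gender": "female",
     "prizes": [{"year": "2001", "category": "physics", "share": "2"},
                {"year": "2009", "category": "chemistry", "share": "1"}]},
    {"id": "2", "firstname": "Ben", "gender": "male",
     "prizes": [{"year": "2001", "category": "physics", "share": "2"}]}
  ]
}'

# With the default simplification, arrays of objects become data frames
toy <- fromJSON(toy_json)

names(toy)                    # "laureates"
names(toy$laureates)          # "id" "firstname" "gender" "prizes"
class(toy$laureates$prizes)   # "list": one data frame per laureate
toy$laureates$prizes[[1]]     # two rows, one per prize of the first laureate
```

The real response works the same way, which is why `nobel$laureates$prizes[[1]]` returns a data frame with its own `year`, `category`, `share`, `motivation`, and `affiliations` columns.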

Question 1

What is the gender of a typical Nobel Prize winner?

Unnest the prizes list-column inside laureates, so that each laureate gets one row per prize (one laureate-prize combination per row).
Remove duplicate rows based on the combination of id (laureate ID), gender, category, and year.
Count the laureates in each combination of year, category, and gender, dropping organizations (gender "org").
Within each category and gender, take the cumulative sum over the years and plot its logarithm, faceted by category, to show how the distribution of laureates by gender has evolved.

nobel$laureates %>% 
    unnest(cols = prizes) %>% 
    distinct(id, gender, category, year) %>% 
    count(year, category, gender) %>%
    filter(gender != "org") %>%
    group_by(category, gender) %>% 
    mutate(cs = cumsum(n)) %>% 
    ungroup() %>% 
    mutate(year = as.numeric(year)) %>% 
    ggplot(aes(year, log(cs))) + 
    geom_point(aes(color = gender)) +
    facet_wrap(~category) +
    labs(x = "year", 
         y = "log(cumulative sum) of laureates", 
         title = "Cumulative Sum of Nobel Laureates by Gender and Category over Time") + 
    scale_color_manual(values = c("darkorange", "skyblue3"),
                       name = NULL) +
    scale_x_continuous(breaks = seq(1900, 2030, 10))
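The steps listed above can be checked on a small example before running them on the full API data. This sketch assumes a hypothetical `laureates` tibble (made-up ids, years, and categories) with the same `prizes` list-column layout, and applies the same unnest, distinct, count, filter, and cumulative-sum steps, leaving out the plot.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data shaped like nobel$laureates: one row per laureate,
# with a list-column holding each laureate's prizes (all values made up)
laureates <- tibble(
  id     = c("1", "2", "3", "4"),
  gender = c("female", "male", "female", "org"),
  prizes = list(
    tibble(year = c("2001", "2005"), category = c("physics", "chemistry")),
    tibble(year = "2001", category = "physics"),
    tibble(year = "2010", category = "physics"),
    tibble(year = "2003", category = "peace")
  )
)

running <- laureates %>%
  unnest(cols = prizes) %>%                 # one row per laureate-prize pair
  distinct(id, gender, category, year) %>%  # drop duplicate combinations
  count(year, category, gender) %>%         # laureates per year/category/gender
  filter(gender != "org") %>%               # keep individuals only
  group_by(category, gender) %>%
  mutate(cs = cumsum(n)) %>%                # running total within each group
  ungroup()

running
```

Because `count()` returns rows ordered by year, the `cumsum()` inside each category-gender group accumulates in chronological order; here the second female physics laureate (2010) brings that group's running total to 2.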

The breakdown of Nobel Prize wins by gender and category shows women best represented in literature and peace and far less so in physics and chemistry, a pattern consistent with the view that women have been deemed better suited to “soft” and “humane” endeavors, while men were considered more gifted for “hard”, “no-nonsense” scientific work.

Question 2

Is there any indication of an increase in female laureates over time?

nobel$laureates %>% 
  unnest(cols = prizes) %>%          # one row per laureate-prize combination
  distinct(id, gender, year) %>%     # drop duplicate laureate-year rows
  count(year, gender) %>%            # laureates per year and gender
  filter(gender != "org") %>%        # exclude organizations
  group_by(gender) %>% 
  mutate(cs = cumsum(n)) %>%         # running total per gender
  ungroup() %>% 
  mutate(year = as.numeric(year)) %>% 
  ggplot(aes(year, log(cs))) + 
  geom_point(aes(color = gender)) + 
  labs(x = "year", 
       y = "log(cumulative sum) of laureates", 
       title = "Cumulative Sum of Nobel Laureates by Gender over Time") + 
  scale_color_manual(values = c("darkorange", "skyblue3"),
                     name = NULL) +
  scale_x_continuous(breaks = seq(1900, 2030, 10))

The Nobel Prizes remain very much a man’s world, especially in the sciences, but women are slowly making their mark: the number of female laureates has been increasing steadily over the past 120 years. The 2020 Nobel Prizes in physics, chemistry, and medicine (PCM) recognized a total of 8 scientists, including 3 women.
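The log cumulative plot shows growth for both genders, so a share can make the trend easier to read. As one alternative (a swapped-in approach, not the original analysis), the sketch below computes the fraction of female laureates per decade. It assumes a hypothetical table `prize_rows` with one row per laureate-prize pair, like the output of `unnest(cols = prizes)`; all values are made up.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: one row per laureate-prize pair (all values made up)
prize_rows <- tibble(
  id     = c("1", "2", "3", "4", "5", "6", "7", "8"),
  gender = c("male", "male", "female", "male", "female", "male", "female", "org"),
  year   = c("1905", "1912", "1968", "1975", "2011", "2015", "2019", "2012")
)

female_share <- prize_rows %>%
  filter(gender != "org") %>%                              # people only
  mutate(decade = floor(as.numeric(year) / 10) * 10) %>%   # e.g. 2011 -> 2010
  group_by(decade) %>%
  summarise(n = n(),                                       # prizes in the decade
            share_female = mean(gender == "female"),       # fraction won by women
            .groups = "drop")

female_share
```

On the real data, a rising `share_female` across decades would be more direct evidence of an increase than a rising cumulative count, which grows for men as well.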

Question 3

What is the average age of Nobel Prize winners when they receive the prize?

nobel$laureates %>% 
  unnest(cols = prizes) %>% 
  select(year, category, born) %>% 
  # approximate each laureate's age on 31 December of the award year
  mutate(year = ymd(paste(year, "12", "31", sep = "-")), 
         born = ymd(born), 
         age = as.numeric(year - born) / 365) %>% 
  # drop rows whose birth date is missing or could not be parsed
  filter(!is.na(age)) %>% 
  ggplot(aes(category, age)) + 
  geom_violin(fill = "skyblue3") + 
  # mark the median age of each category
  stat_summary(fun = "median", 
               geom = "point") + 
  labs(x = "Category", 
       y = "Age (years)", 
       title = "Age Distribution of Nobel Laureates by Category")

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `born = ymd(born)`.
## Caused by warning:
## !  43 failed to parse.

The 2016 Nobel laureates in physics, medicine, and chemistry were all men, all at least 65 years old and most of them over 70. Go back to the first half of the 20th century, however, and the average laureate was “only” 56. Physics laureates, now typically men in their late sixties, used to have an average age of 47.
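The age calculation in the code above treats 31 December of the award year as the award date and divides a day count by 365. A minimal sketch of the same idea on hypothetical toy rows (made-up dates) follows; it swaps in lubridate's `interval()` and `time_length()`, which measure elapsed years without the fixed 365-day divisor, and reports both the mean and the median per category, since the question asks about an average while the plot marks medians. The string `"0000-00-00"` stands in for an unparseable birth date; that the 43 parse failures in the warning above are missing or incomplete dates of this kind is an assumption.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical toy rows (all dates made up)
toy <- tibble(
  category = c("physics", "physics", "peace", "peace"),
  year     = c("1950", "2000", "1990", "2010"),
  born     = c("1900-06-30", "1930-12-31", "0000-00-00", "1950-01-01")
)

ages <- toy %>%
  mutate(award_date = ymd(paste(year, "12", "31", sep = "-")),
         # "0000-00-00" cannot be parsed, so it becomes NA
         born_date  = suppressWarnings(ymd(born)),
         # elapsed years between birth and the assumed award date
         age = time_length(interval(born_date, award_date), "year")) %>%
  filter(!is.na(age)) %>%
  group_by(category) %>%
  summarise(n = n(),
            mean_age = mean(age),
            median_age = median(age),
            .groups = "drop")

ages
```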

Question 4

Which country has the most Nobel Prize winners?

nobel$laureates %>% 
  unnest(cols = prizes) %>% 
  count(bornCountryCode, category) %>% 
  ggplot(aes(bornCountryCode, n)) + 
  geom_col(aes(fill = category), 
           position = "stack") + 
  theme(axis.text.x = element_text(angle = 90, 
                                   size = rel(0.50))) + 
  labs(x = "Country of birth (code)", 
       y = "Count", 
       title = "All Nobel Prizes by Country and Category") + 
  scale_fill_brewer(palette = "Spectral", 
                    name = "Category")

The country with the most Nobel Prizes is the United States, with a remarkable 300, followed by Germany with 110. This dominance by the US, Germany, and France reflects their long-standing contributions to scientific research, literature, and peace efforts. Note that the chart above counts prizes by the laureate's country of birth (bornCountryCode), not by citizenship or affiliation.
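The stacked bars are unsorted, so reading off the leaders takes some effort. A minimal sketch of ranking birth countries by their number of prizes, assuming a hypothetical table `prize_rows` (made-up codes and categories) shaped like the unnested laureate data; like the plot, it counts prizes rather than distinct people, so a laureate with two prizes is counted twice.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: one row per laureate-prize pair (all values made up)
prize_rows <- tibble(
  bornCountryCode = c("US", "US", "DE", "FR", "US", "DE", NA),
  category        = c("physics", "medicine", "chemistry", "literature",
                      "economics", "physics", "peace")
)

country_totals <- prize_rows %>%
  filter(!is.na(bornCountryCode)) %>%                    # no recorded birth country
  count(bornCountryCode, sort = TRUE, name = "prizes")   # largest totals first

country_totals
```

The resulting order could also be used to sort the bars in the plot, for example with `forcats::fct_reorder()` on `bornCountryCode`.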