Working with the two JSON files available through the API at nobelprize.org, ask and answer 4 interesting questions, e.g. “Which country ‘lost’ the most Nobel laureates (who were born there but received their Nobel Prize as a citizen of a different country)?”
#load the packages
library(httr)
library(jsonlite)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.4 ✔ stringr 1.5.0
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The code below retrieves JSON data from the Nobel Prize API, parses it into an R list, and then inspects the names of various elements and sub-elements within that list.
#fetch JSON data from the Nobel Prize API
nobel <- fromJSON("http://api.nobelprize.org/v1/laureate.json")
#displays the names that correspond to the keys or attributes in the JSON data retrieved from the API.
names(nobel)
## [1] "laureates"
#looking at the sub-element called laureates
names(nobel$laureates)
## [1] "id" "firstname" "surname" "born"
## [5] "died" "bornCountry" "bornCountryCode" "bornCity"
## [9] "diedCountry" "diedCountryCode" "diedCity" "gender"
## [13] "prizes"
#looking at the columns of the first laureate's prize records by specifying "prizes[[1]]"
names(nobel$laureates$prizes[[1]])
## [1] "year" "category" "share" "motivation" "affiliations"
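The nested shape seen above (a `laureates` data frame whose `prizes` column holds one small data frame per laureate) can be reproduced offline. The following is a minimal sketch, assuming a made-up two-laureate payload (`toy_json` is hypothetical and is not real API output) that only mimics the layout of laureate.json:

```r
library(jsonlite)

# Hypothetical payload with the same nesting as laureate.json; values are invented
toy_json <- '{
  "laureates": [
    {"id": "1", "firstname": "A", "gender": "male",
     "prizes": [{"year": "1901", "category": "physics"}]},
    {"id": "2", "firstname": "B", "gender": "female",
     "prizes": [{"year": "1903", "category": "physics"},
                {"year": "1911", "category": "chemistry"}]}
  ]
}'

# fromJSON() simplifies the array of laureate objects into a data frame
toy <- fromJSON(toy_json)

names(toy)                        # top-level keys
names(toy$laureates)              # one column per laureate attribute
class(toy$laureates$prizes)       # the nested prizes are kept as a list column
nrow(toy$laureates$prizes[[2]])   # the second laureate has two prize records
```

Because `prizes` is a list column, it has to be expanded (for example with `tidyr::unnest()`) before prizes can be counted row by row.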
#cumulative number of laureates per category and gender, plotted on a log scale
nobel$laureates %>%
  unnest(cols = prizes) %>%
  distinct(id, gender, category, year) %>%
  count(year, category, gender) %>%
  #organisations have no gender, so drop them
  filter(gender != "org") %>%
  group_by(category, gender) %>%
  mutate(cs = cumsum(n)) %>%
  ungroup() %>%
  mutate(year = as.numeric(year)) %>%
  ggplot(aes(year, log(cs))) +
  geom_point(aes(color = gender)) +
  facet_wrap(~category) +
  labs(x = "year",
       y = "log(cumulative sum) of laureates",
       title = "Cumulative Sum of Nobel Laureates by Gender and Category over Time") +
  scale_color_manual(values = c("darkorange", "skyblue3"),
                     name = NULL) +
  scale_x_continuous(breaks = seq(1900, 2030, 10))
The breakdown of Nobel Prize wins by gender and category suggests that women have been deemed better suited to “soft”, “humane” endeavors such as literature and peace, while men have been considered more gifted for “hard”, “no-nonsense” scientific work such as physics or chemistry.
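Totals are hard to read off a log-scaled plot, so a plain table of counts per category and gender can help check this reading. The following is a sketch only: `count_by_category_gender()` is a hypothetical helper, and `toy_laureates` is invented data standing in for `nobel$laureates`.

```r
library(dplyr)
library(tidyr)

# Invented stand-in for nobel$laureates (same id / gender / prizes layout)
toy_laureates <- tibble(
  id = c("1", "2", "3"),
  gender = c("male", "female", "org"),
  prizes = list(
    tibble(year = "1901", category = "physics"),
    tibble(year = c("1903", "1911"), category = c("physics", "chemistry")),
    tibble(year = "1917", category = "peace")
  )
)

# Hypothetical helper: one row per category, one count column per gender
count_by_category_gender <- function(laureates) {
  laureates %>%
    unnest(cols = prizes) %>%
    filter(gender != "org") %>%
    distinct(id, gender, category, year) %>%
    count(category, gender) %>%
    pivot_wider(names_from = gender, values_from = n, values_fill = 0)
}

toy_counts <- count_by_category_gender(toy_laureates)
toy_counts
```

Calling `count_by_category_gender(nobel$laureates)` on the real data should give the per-category totals behind the plot, assuming the column names shown earlier.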
#cumulative number of laureates per gender across all categories
nobel$laureates %>%
  unnest(cols = prizes) %>%
  distinct(id, gender, year) %>%
  count(year, gender) %>%
  filter(gender != "org") %>%
  group_by(gender) %>%
  mutate(cs = cumsum(n)) %>%
  ungroup() %>%
  mutate(year = as.numeric(year)) %>%
  ggplot(aes(year, log(cs))) +
  geom_point(aes(color = gender)) +
  labs(x = "year",
       y = "log(cumulative sum) of laureates",
       title = "Cumulative Sum of Nobel Laureates by Gender over Time") +
  scale_color_manual(values = c("darkorange", "skyblue3"),
                     name = NULL) +
  scale_x_continuous(breaks = seq(1900, 2030, 10))
The Nobel Prizes remain very much a man’s world, especially in science, but women are slowly making their mark: the number of women laureates has been steadily increasing over the past 120 years. The 2020 Nobel Prizes in physics, chemistry and medicine (PCM) honored a total of 8 scientists, including 3 women.
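One way to check the trend is to count female laureates per decade instead of reading it off the cumulative curve. The sketch below assumes the same `id` / `gender` / `prizes` layout as `nobel$laureates`; `women_per_decade()` is a hypothetical helper and the toy rows are invented.

```r
library(dplyr)
library(tidyr)

# Invented stand-in for nobel$laureates
toy_laureates <- tibble(
  id = c("1", "2", "3", "4"),
  gender = c("female", "male", "female", "female"),
  prizes = list(
    tibble(year = "1903", category = "physics"),
    tibble(year = "1909", category = "physics"),
    tibble(year = "2018", category = "physics"),
    tibble(year = "2020", category = "chemistry")
  )
)

# Hypothetical helper: number of prizes awarded to women in each decade
women_per_decade <- function(laureates) {
  laureates %>%
    unnest(cols = prizes) %>%
    filter(gender == "female") %>%
    distinct(id, year) %>%
    mutate(decade = floor(as.numeric(year) / 10) * 10) %>%
    count(decade, name = "women")
}

toy_decades <- women_per_decade(toy_laureates)
toy_decades
```

Adding `filter(category %in% c("physics", "chemistry", "medicine"))` before `distinct()` would restrict the count to the science prizes, assuming those are the category labels used in the data.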
#age of each laureate at the end of the award year, by category
nobel$laureates %>%
  unnest(cols = prizes) %>%
  select(year, category, born) %>%
  mutate(year = ymd(paste(year, "12", "31", sep = "-")),
         born = ymd(born),
         #days between birth and award date, converted to years
         age = as.numeric(year - born) / 365.25) %>%
  ggplot(aes(category, age)) +
  geom_violin(fill = "skyblue3") +
  #`fun` replaces the `fun.y` argument deprecated in ggplot2 3.3.0
  stat_summary(fun = "median",
               geom = "point") +
  labs(x = "Category",
       y = "Age (years)",
       title = "Age Distribution of Nobel Laureates by Category")
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `born = ymd(born)`.
## Caused by warning:
## ! 43 failed to parse.
## Warning: Removed 44 rows containing non-finite values (`stat_ydensity()`).
## Warning: Removed 44 rows containing non-finite values (`stat_summary()`).
The 2016 Nobel laureates in physics, medicine and chemistry were all men, at least 65 years old and mostly over 72. Go back to the first half of the 20th century, however, and the average laureate was “only” 56. Physics laureates, now typically a group of men in their late sixties, used to have an average age of 47.
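Average ages per period, like those quoted above, can be computed directly rather than estimated from the violin plot. The following is a sketch under stated assumptions: the rows in `toy_prizes` are invented, the 1950 cut-off is one possible way to split the periods, and unparseable birth dates (such as an all-zero placeholder) are dropped explicitly instead of being left to trigger warnings.

```r
library(dplyr)
library(lubridate)

# Invented rows in the shape produced by unnest(cols = prizes)
toy_prizes <- tibble(
  year = c("1915", "1950", "2016", "1917"),
  born = c("1890-03-31", "1900-01-01", "1940-06-15", "0000-00-00")
)

age_by_period <- toy_prizes %>%
  mutate(
    award_date = ymd(paste(year, "12", "31", sep = "-")),
    born = ymd(born, quiet = TRUE)   # unparseable dates become NA without a warning
  ) %>%
  filter(!is.na(born)) %>%
  mutate(
    # exact age in years at the end of the award year
    age = time_length(interval(born, award_date), "years"),
    period = if_else(as.numeric(year) <= 1950, "1901-1950", "1951-present")
  ) %>%
  group_by(period) %>%
  summarise(mean_age = mean(age), n = n())

age_by_period
```

Adding `category` to the `group_by()` call would give per-field averages, for example for physics alone.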
#number of prizes per country of birth, stacked by category
nobel$laureates %>%
  unnest(cols = prizes) %>%
  count(bornCountryCode, category) %>%
  ggplot(aes(bornCountryCode, n)) +
  geom_col(aes(fill = category),
           position = "stack") +
  theme(axis.text.x = element_text(angle = 90,
                                   size = rel(0.50))) +
  labs(x = "bornCountryCode",
       y = "Count",
       title = "All Nobel Prizes by Country and Category") +
  scale_fill_brewer(palette = "Spectral",
                    name = "category")
The country with the most Nobel Prizes is the United States, with a remarkable 300, followed by Germany with 110 Nobel Prize winners. This dominance by the US, Germany and France reflects their longstanding contributions to scientific research, literature, and peace efforts.
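With so many country codes on the x-axis, a sorted table is an easier way to read off the leading countries. The sketch below assumes the `bornCountryCode` and `prizes` columns shown earlier; `top_birth_countries()` is a hypothetical helper and the toy rows are invented.

```r
library(dplyr)
library(tidyr)

# Invented stand-in for nobel$laureates
toy_laureates <- tibble(
  id = c("1", "2", "3", "4"),
  bornCountryCode = c("US", "US", "DE", NA),
  prizes = list(
    tibble(year = "1901", category = "physics"),
    tibble(year = c("1954", "1962"), category = c("chemistry", "peace")),
    tibble(year = "1905", category = "medicine"),
    tibble(year = "1917", category = "peace")
  )
)

# Hypothetical helper: prizes per country of birth, largest first
top_birth_countries <- function(laureates, n_top = 10) {
  laureates %>%
    unnest(cols = prizes) %>%
    filter(!is.na(bornCountryCode)) %>%   # laureates without a birth country are skipped
    count(bornCountryCode, sort = TRUE, name = "n_prizes") %>%
    slice_head(n = n_top)
}

toy_top <- top_birth_countries(toy_laureates)
toy_top
```

Note that this counts prizes by country of birth, so a laureate who emigrated before the award is still credited to the birth country.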