In this extra credit assignment, we load the Nobel Prize data through the nobelprize.org API. We then ask four questions and answer them using the data.
library(jsonlite)
library(tidyverse)
library(dplyr)
library(ggplot2)
library(httr)
library(gt)
prize_url <- "http://api.nobelprize.org/v1/prize.json"
laureate_url <- "http://api.nobelprize.org/v1/laureate.json"
nobelPrize <- fromJSON(prize_url)
nobelLaureate <- fromJSON(laureate_url)
# expand the nested laureates column so that each row is one laureate per prize
nobelPrize <- nobelPrize$prizes %>%
  unnest_wider(laureates) %>%
  unnest(cols = c(id, firstname, surname, motivation, share))
nobelPrize_df <- bind_rows(nobelPrize)
nobelLaureate_df <- bind_rows(nobelLaureate)
# join the prize and laureate data frames by id
final_df <- inner_join(nobelPrize_df, nobelLaureate_df, by = "id") %>%
  select(-c("firstname.y", "surname.y", "prizes", "overallMotivation")) %>%
  as.data.frame()
head(final_df)
## year category id firstname.x surname.x
## 1 2023 chemistry 1029 Moungi Bawendi
## 2 2023 chemistry 1030 Louis Brus
## 3 2023 chemistry 1031 Aleksey Yekimov
## 4 2023 economics 1034 Claudia Goldin
## 5 2023 literature 1032 Jon Fosse
## 6 2023 peace 1033 Narges Mohammadi
## motivation
## 1 "for the discovery and synthesis of quantum dots"
## 2 "for the discovery and synthesis of quantum dots"
## 3 "for the discovery and synthesis of quantum dots"
## 4 "for having advanced our understanding of women’s labour market outcomes"
## 5 "for his innovative plays and prose which give voice to the unsayable"
## 6 "for her fight against the oppression of women in Iran and her fight to promote human rights and freedom for all"
## share born died bornCountry bornCountryCode bornCity
## 1 3 1961-00-00 0000-00-00 France FR Paris
## 2 3 1943-00-00 0000-00-00 USA US Cleveland, OH
## 3 3 1945-00-00 0000-00-00 USSR (now Russia) RU <NA>
## 4 1 1946-00-00 0000-00-00 USA US New York, NY
## 5 1 1959-09-29 0000-00-00 Norway NO Haugesund
## 6 1 1972-04-21 0000-00-00 Iran IR Zanjan
## diedCountry diedCountryCode diedCity gender
## 1 <NA> <NA> <NA> male
## 2 <NA> <NA> <NA> male
## 3 <NA> <NA> <NA> male
## 4 <NA> <NA> <NA> female
## 5 <NA> <NA> <NA> male
## 6 <NA> <NA> <NA> female
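To make the flattening step in the loading code easier to follow, here is a minimal sketch on toy data. The `toy_prizes` tibble is hypothetical: it only imitates the shape of the nested `laureates` column, using a few names from the output above, and is not the real API response.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: one row per prize, with a nested data frame of laureates
toy_prizes <- tibble(
  year = c("2023", "2023"),
  category = c("chemistry", "literature"),
  laureates = list(
    data.frame(id = c("1029", "1030"), surname = c("Bawendi", "Brus")),
    data.frame(id = "1032", surname = "Fosse")
  )
)

# unnest_wider() turns each nested field into its own list-column;
# unnest() then expands those list-columns into one row per laureate
flat <- toy_prizes %>%
  unnest_wider(laureates) %>%
  unnest(cols = c(id, surname))

print(flat)
```

After both steps the shared chemistry prize occupies one row per laureate, which is what makes the later join on `id` possible.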
final_df %>%
  group_by(bornCountry) %>%
  filter(year %in% 2018:2023) %>%
  filter(n() > 1) %>%
  filter(!is.na(bornCountry)) %>%
  ggplot() +
  geom_bar(aes(x = bornCountry, fill = bornCountry)) +
  ggtitle("Birth Countries of Nobel Prize Winners",
          "From 2018 to 2023") +
  ylab("Count") +
  xlab("Birth Country") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.title = element_blank(),
        legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1))
From the above graph, we can see that the most common country of birth for Nobel Prize winners from 2018 to 2023 is the United States. I thought this was interesting, so I did some outside research. One theory as to why so many Nobel Prizes go to Americans points to funding and academic freedom. An article on InsideScience.org says, “Since the mid-20th century, the United States has spent a tremendous amount on fundamental or “basic” research, not forcing scientists to work on projects with an immediate application as the goal.”
final_df %>%
  group_by(year, category) %>%
  filter(year %in% 2018:2023) %>%
  ggplot() +
  geom_bar(aes(x = category, fill = year), position = "dodge") +
  ggtitle("Nobel Prize Categories",
          "From 2018 to 2023") +
  ylab("Count") +
  xlab("Prize Category") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(fill = "Year")
From the above graph, we can see that the physics prize was shared among three winners in every year from 2018 to 2023. Chemistry and economics are the next most commonly shared prize categories. It is interesting to see that the literature prize was not shared at all in this period.
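The bar heights can also be checked numerically by counting laureates per year and category. This is a minimal sketch, assuming a data frame shaped like `final_df`; the hypothetical `toy_df` below contains only the six 2023 rows shown in the `head(final_df)` output and stands in for the full data.

```r
library(dplyr)

# Toy stand-in for final_df: the six 2023 rows shown in head(final_df)
toy_df <- data.frame(
  year = rep("2023", 6),
  category = c("chemistry", "chemistry", "chemistry",
               "economics", "literature", "peace"),
  surname.x = c("Bawendi", "Brus", "Yekimov", "Goldin", "Fosse", "Mohammadi")
)

# One row per year and category; a count above 1 means the prize was shared
laureates_per_prize <- toy_df %>%
  count(year, category, name = "n_laureates")

print(laureates_per_prize)
```

Running the same `count()` call on `final_df` after filtering to 2018:2023 should give the numbers behind each bar in the plot.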
final_df %>%
  mutate(yearBorn = substr(born, 1, 4)) %>%
  filter(yearBorn %in% 1990:1999) %>%
  # filter out organizations
  filter(!is.na(surname.x)) %>%
  as.data.frame()
## year category id firstname.x surname.x
## 1 2018 peace 967 Nadia Murad
## 2 2014 peace 914 Malala Yousafzai
## motivation
## 1 "for their efforts to end the use of sexual violence as a weapon of war and armed conflict"
## 2 "for their struggle against the suppression of children and young people and for the right of all children to education"
## share born died bornCountry bornCountryCode bornCity diedCountry
## 1 2 1993-00-00 0000-00-00 Iraq IQ Kojo <NA>
## 2 2 1997-07-12 0000-00-00 Pakistan PK Mingora <NA>
## diedCountryCode diedCity gender yearBorn
## 1 <NA> <NA> female 1993
## 2 <NA> <NA> female 1997
From the above data frame, we can see that there have been 2 Nobel Prize winners who were born in the 1990s. Both won the Nobel Peace Prize and both are female.
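Some `born` values, such as `1993-00-00`, carry only a year and are not valid calendar dates, so the filter works on the first four characters of the string instead of parsing a date. A minimal sketch of that step in base R, using three `born` values copied from the outputs above (the vector name `born` is just for illustration):

```r
# Three born values taken from the outputs above
born <- c("1993-00-00", "1997-07-12", "1961-00-00")

# Take the year from the first four characters and convert it to a number
year_born <- as.integer(substr(born, 1, 4))

# TRUE for laureates born in the 1990s
born_1990s <- year_born >= 1990 & year_born <= 1999

print(data.frame(born, year_born, born_1990s))
```

Converting to an integer makes the range comparison explicit; the original filter reaches the same result by matching the year string against `1990:1999`.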
final_df %>%
  mutate(decade = paste0(substr(year, 1, 3), "0s")) %>%
  group_by(decade, gender) %>%
  ggplot() +
  geom_bar(aes(x = decade, fill = gender), position = "dodge") +
  ggtitle("Nobel Prizes by Gender and Decade") +
  xlab("Decade") +
  ylab("Count") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(fill = "Gender")
final_df %>%
  mutate(decade = paste0(substr(year, 1, 3), "0s")) %>%
  group_by(decade) %>%
  mutate(percent_male = sum(gender == "male") / n(),
         percent_female = sum(gender == "female") / n(),
         total = n()) %>%
  select(decade, percent_male, percent_female) %>%
  distinct(decade, .keep_all = TRUE) %>%
  mutate_at(vars(matches("percent_")), ~round(., 2)) %>%
  gt() %>%
  tab_header(title = "Nobel Prizes by Gender") %>%
  cols_label(percent_male = "Percent Male", percent_female = "Percent Female")
Nobel Prizes by Gender

Decade | Percent Male | Percent Female
---|---|---
2020s | 0.72 | 0.22
2010s | 0.86 | 0.11
2000s | 0.88 | 0.09
1990s | 0.90 | 0.07
1980s | 0.93 | 0.04
1970s | 0.95 | 0.04
1960s | 0.91 | 0.04
1950s | 0.99 | 0.00
1940s | 0.86 | 0.07
1930s | 0.93 | 0.05
1920s | 0.96 | 0.04
1910s | 0.92 | 0.03
1900s | 0.93 | 0.05
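From the above graph, we can see that men have won the majority of Nobel Prizes in every decade since the prizes began. However, from the table we can see that the share of female prize winners has increased each decade since the 1990s. The table values are proportions of all awards in a decade rather than percentages, and the male and female columns do not always sum to 1 because awards to organizations are included in the denominator.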
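The decade label and the female share can be reproduced on a small scale. The sketch below assumes a data frame with `year` and `gender` columns like `final_df`; the hypothetical `toy_df` holds only the eight laureates shown in the outputs above, so its shares are not representative of the real table.

```r
library(dplyr)

# Toy stand-in for final_df: the 2023 rows from head(final_df)
# plus the two laureates born in the 1990s (awarded in 2018 and 2014)
toy_df <- data.frame(
  year = c("2023", "2023", "2023", "2023", "2023", "2023", "2018", "2014"),
  gender = c("male", "male", "male", "female", "male", "female",
             "female", "female")
)

share_by_decade <- toy_df %>%
  # "2023" -> "202" -> "2020s"
  mutate(decade = paste0(substr(year, 1, 3), "0s")) %>%
  group_by(decade) %>%
  # mean() of a logical vector is the proportion of TRUE values
  summarise(share_female = mean(gender == "female"),
            n_awards = n())

print(share_by_decade)
```

Using `summarise()` returns one row per decade directly, as an alternative to the `mutate()` plus `distinct()` combination in the table code; note that it sorts the decades in ascending order.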
https://nobelprize.readme.io/reference/laureate
https://www.insidescience.org/news/why-do-so-many-americans-win-nobel-prize#:~:text=According%20to%20experts%2C%20it%27s%20strong,and%20patience%20to%20see%20results.