Introduction

In this extra-credit assignment, we load Nobel Prize data from the nobelprize.org API, then pose four questions and answer each one using the data.

Load Packages

library(jsonlite)
library(tidyverse)
library(dplyr)
library(ggplot2)
library(httr)
library(gt)

Load and Clean Data

prize_url <- "http://api.nobelprize.org/v1/prize.json"
laureate_url <- "http://api.nobelprize.org/v1/laureate.json"

nobelPrize <- fromJSON(prize_url)

nobelLaureate <- fromJSON(laureate_url)

nobelPrize <- nobelPrize$prizes %>%
  unnest_wider(laureates) %>%
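  # list the columns inside cols = c(...); passing them as bare arguments is deprecated in tidyr
  unnest(cols = c(id, firstname, surname, motivation, share))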

nobelPrize_df <- bind_rows(nobelPrize)
nobelLaureate_df <- bind_rows(nobelLaureate)

# join the prize and laureate data frames by id
final_df <- inner_join(nobelPrize_df, nobelLaureate_df, by = "id") %>%
  select(-c("firstname.y", "surname.y", "prizes", "overallMotivation")) %>%
  as.data.frame()

head(final_df)
##   year   category   id firstname.x surname.x
## 1 2023  chemistry 1029      Moungi   Bawendi
## 2 2023  chemistry 1030       Louis      Brus
## 3 2023  chemistry 1031     Aleksey   Yekimov
## 4 2023  economics 1034     Claudia    Goldin
## 5 2023 literature 1032         Jon     Fosse
## 6 2023      peace 1033      Narges Mohammadi
##                                                                                                          motivation
## 1                                                                 "for the discovery and synthesis of quantum dots"
## 2                                                                 "for the discovery and synthesis of quantum dots"
## 3                                                                 "for the discovery and synthesis of quantum dots"
## 4                                         "for having advanced our understanding of women’s labour market outcomes"
## 5                                            "for his innovative plays and prose which give voice to the unsayable"
## 6 "for her fight against the oppression of women in Iran and her fight to promote human rights and freedom for all"
##   share       born       died       bornCountry bornCountryCode      bornCity
## 1     3 1961-00-00 0000-00-00            France              FR         Paris
## 2     3 1943-00-00 0000-00-00               USA              US Cleveland, OH
## 3     3 1945-00-00 0000-00-00 USSR (now Russia)              RU          <NA>
## 4     1 1946-00-00 0000-00-00               USA              US  New York, NY
## 5     1 1959-09-29 0000-00-00            Norway              NO     Haugesund
## 6     1 1972-04-21 0000-00-00              Iran              IR        Zanjan
##   diedCountry diedCountryCode diedCity gender
## 1        <NA>            <NA>     <NA>   male
## 2        <NA>            <NA>     <NA>   male
## 3        <NA>            <NA>     <NA>   male
## 4        <NA>            <NA>     <NA> female
## 5        <NA>            <NA>     <NA>   male
## 6        <NA>            <NA>     <NA> female
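
The reshaping and join above can be illustrated on a small in-memory example. This is only a sketch: the `prizes` and `laureates` tibbles are hand-built stand-ins for the two API responses, populated with a few values from the output above, and it assumes that the laureates arrive as a list-column of data frames. It uses a single `unnest()` call in place of the `unnest_wider()` / `unnest()` pair.

```r
library(dplyr)
library(tidyr)

# Stand-in for nobelPrize$prizes: one row per prize, with the laureates
# stored as a list-column of data frames (assumed structure)
prizes <- tibble(
  year = c("2023", "2023"),
  category = c("chemistry", "literature"),
  laureates = list(
    data.frame(id = c("1029", "1030", "1031"),
               firstname = c("Moungi", "Louis", "Aleksey"),
               share = c("3", "3", "3")),
    data.frame(id = "1032", firstname = "Jon", share = "1")
  )
)

# Expand the list-column so that each laureate gets a row
prize_rows <- prizes %>%
  unnest(cols = laureates)

# Stand-in for the laureate endpoint, keyed by the same id
laureates <- tibble(
  id = c("1029", "1030", "1031", "1032"),
  bornCountry = c("France", "USA", "USSR (now Russia)", "Norway")
)

# inner_join() keeps only ids that appear in both tables
joined <- inner_join(prize_rows, laureates, by = "id")
print(joined)
```

Because the join is an inner join, any prize row whose `id` is missing from the laureate table would be dropped silently; comparing `nrow()` before and after the join is a quick way to check for that.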

Questions

What is the most common country of birth for Nobel Prize winners from 2018 to 2023?

final_df %>%
  group_by(bornCountry) %>%
  filter(year %in% 2018:2023) %>%
  filter(n() > 1) %>% 
  filter(!is.na(bornCountry)) %>%
  ggplot() +
  geom_bar(aes(x = bornCountry, fill = bornCountry)) +
  ggtitle("Birth Countries of Nobel Prize Winners",
          "From 2018 to 2023") +
  ylab("Count") +
  xlab("Birth Country") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.title = element_blank(),
        legend.position = "none", 
        axis.text.x = element_text(angle = 45, hjust = 1))

From the above graph, we can see that the most common country of birth for Nobel Prize winners from 2018 to 2023 is the United States. I thought this was interesting, so I did some outside research. One theory as to why so many Nobel Prizes go to Americans points to funding and academic freedom. An article on InsideScience.org says, “Since the mid-20th century, the United States has spent a tremendous amount on fundamental or “basic” research, not forcing scientists to work on projects with an immediate application as the goal.”
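
The same question can also be answered with a sorted count, which gives exact numbers rather than bar heights. The sketch below runs on a made-up data frame named `toy` (an assumption for illustration, not the real data); in the report, `final_df` would take its place.

```r
library(dplyr)

# Made-up rows standing in for final_df (illustrative values only)
toy <- tibble(
  year = c("2023", "2023", "2022", "2017", "2021"),
  bornCountry = c("USA", "France", "USA", "USA", NA)
)

birth_counts <- toy %>%
  # year is stored as text, so compare against character years
  filter(year %in% as.character(2018:2023), !is.na(bornCountry)) %>%
  # count() tallies rows per country; sort = TRUE puts the largest first
  count(bornCountry, sort = TRUE)

print(birth_counts)
```

The first row of `birth_counts` is the most common birth country in the filtered window.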

A Nobel Prize can be shared by up to three people. Which prize category was shared most often from 2018 to 2023?

final_df %>%
  group_by(year, category) %>%
  filter(year %in% 2018:2023) %>%
  ggplot() +
  geom_bar(aes(x = category, fill = year), position = "dodge") +
  ggtitle("Nobel Prize Categories",
          "From 2018 to 2023") +
  ylab("Count") +
  xlab("Prize Category") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(fill = "Year")

From the above graph, we can see that the physics prize was shared by three winners in every year from 2018 to 2023. Chemistry and economics are the next most commonly shared categories. It is interesting that the literature prize was not shared at all during this period.
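
One way to put numbers on how often each category is shared is to count laureates per prize first and then summarise by category. This is a minimal sketch on a made-up data frame (`toy` and its rows are assumptions for illustration); it treats each year and category pair as one prize.

```r
library(dplyr)

# Made-up rows: one row per laureate, as in final_df
toy <- tibble(
  year = c("2023", "2023", "2023", "2023", "2022", "2022", "2022", "2022"),
  category = c("physics", "physics", "physics", "literature",
               "physics", "physics", "physics", "literature")
)

split_summary <- toy %>%
  # laureates per individual prize (one prize = one year/category pair)
  count(year, category, name = "laureates") %>%
  group_by(category) %>%
  summarise(
    prizes = n(),                     # prizes awarded in the category
    shared = sum(laureates > 1),      # prizes with more than one laureate
    mean_laureates = mean(laureates)  # average laureates per prize
  ) %>%
  arrange(desc(shared))

print(split_summary)
```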

How many Nobel Prize winners were born in the 1990s?

final_df %>%
  mutate(yearBorn = substr(born, 1, 4)) %>%
  filter(yearBorn %in% 1990:1999) %>%
  # filter out organizations
  filter(!is.na(surname.x)) %>%
  as.data.frame()
##   year category  id firstname.x surname.x
## 1 2018    peace 967       Nadia     Murad
## 2 2014    peace 914      Malala Yousafzai
##                                                                                                                 motivation
## 1                              "for their efforts to end the use of sexual violence as a weapon of war and armed conflict"
## 2 "for their struggle against the suppression of children and young people and for the right of all children to education"
##   share       born       died bornCountry bornCountryCode bornCity diedCountry
## 1     2 1993-00-00 0000-00-00        Iraq              IQ     Kojo        <NA>
## 2     2 1997-07-12 0000-00-00    Pakistan              PK  Mingora        <NA>
##   diedCountryCode diedCity gender yearBorn
## 1            <NA>     <NA> female     1993
## 2            <NA>     <NA> female     1997

From the above data frame, we can see that two Nobel Prize winners were born in the 1990s. Both won the Nobel Peace Prize, and both are female.
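
The birth dates in this data set sometimes use `00` for an unknown month or day (for example `1993-00-00`), which is why the year is taken from the first four characters instead of parsing a full date. A small sketch of that step, using birth strings copied from the output above and a hypothetical column name `birthYear`:

```r
library(dplyr)

# Birth strings copied from the printed output above
toy <- tibble(
  surname = c("Murad", "Yousafzai", "Bawendi"),
  born = c("1993-00-00", "1997-07-12", "1961-00-00")
)

born_1990s <- toy %>%
  # keep only the year part and convert it to a number
  mutate(birthYear = as.integer(substr(born, 1, 4))) %>%
  # between() is inclusive on both ends
  filter(between(birthYear, 1990, 1999))

print(born_1990s)
```

Converting the year to an integer makes the range comparison explicit instead of relying on implicit type coercion inside `%in%`.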

What is the gender split of Nobel Prize winners over the past century?

final_df %>%
  mutate(decade = paste0(substr(year, 1, 3), "0s")) %>%
  group_by(decade, gender) %>%
  ggplot() +
  geom_bar(aes(x = decade, fill = gender), position = "dodge") +
  ggtitle("Nobel Prizes by Gender and Decade") +
  xlab("Decade") +
  ylab("Count") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(fill = "Gender")

final_df %>%
  mutate(decade = paste0(substr(year, 1, 3), "0s")) %>%
  group_by(decade) %>%
  mutate(percent_male = sum(gender == "male") / n(),
         percent_female = sum(gender == "female") / n(),
         total = n()) %>%
  select(decade, percent_male, percent_female) %>%
  distinct(decade, .keep_all = TRUE) %>%
  mutate_at(vars(matches("percent_")), ~round(., 2)) %>%
  gt() %>%
  tab_header(title = "Nobel Prizes by Gender") %>%
  cols_label(percent_male = "Percent Male", percent_female = "Percent Female")
Nobel Prizes by Gender

Decade   Percent Male   Percent Female
2020s    0.72           0.22
2010s    0.86           0.11
2000s    0.88           0.09
1990s    0.90           0.07
1980s    0.93           0.04
1970s    0.95           0.04
1960s    0.91           0.04
1950s    0.99           0.00
1940s    0.86           0.07
1930s    0.93           0.05
1920s    0.96           0.04
1910s    0.92           0.03
1900s    0.93           0.05

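From the above graph, we can see that men have won the majority of Nobel Prizes in every decade since the prizes were first awarded. However, the table shows that the share of female winners has risen in every decade since the 1980s, from 0.04 to 0.22 in the 2020s. The values are proportions rather than percentages, and the two columns do not always sum to 1, most likely because some prizes went to organizations.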
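
The per-decade shares can also be computed with `summarise()`, which returns one row per decade directly and removes the need for `distinct()`. The sketch below uses a made-up data frame; the column names `share_male` and `share_female` are illustrative, and the `"org"` gender value for organizations is an assumption about how the API codes them.

```r
library(dplyr)

# Made-up rows standing in for final_df (illustrative values only)
toy <- tibble(
  year = c("1995", "1998", "2004", "2009", "2009", "2021"),
  gender = c("male", "female", "male", "male", "org", "female")
)

by_decade <- toy %>%
  # "1995" -> "199" -> "1990s"
  mutate(decade = paste0(substr(year, 1, 3), "0s")) %>%
  group_by(decade) %>%
  summarise(
    share_male = mean(gender == "male"),     # proportion of rows that are male
    share_female = mean(gender == "female"), # proportion of rows that are female
    total = n()                              # winners counted in the decade
  )

print(by_decade)
```

In the toy 2000s row the two shares sum to less than 1 because one of the three winners is an organization, which mirrors the gaps visible in the table above.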

Sources

https://nobelprize.readme.io/reference/laureate

https://www.insidescience.org/news/why-do-so-many-americans-win-nobel-prize