Lab 03 - Nobel laureates

Brady May 5/18/26

Load packages and data

library(tidyverse) 
nobel <- read_csv("nobel.csv")

Exercises

  1. Number of observations and variables
dim(nobel)
## [1] 935  26

There are 935 observations and 26 variables. Each row represents one Nobel Prize winner (a person or an organization) in a given year and category.

  2. Create a new data frame called nobel_living
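nobel_living <- nobel %>%
  filter(
    !is.na(country),      # keep laureates with a known affiliation country
    gender != "org",      # drop organizations, keep individual people
    is.na(died_date)      # keep laureates who are still alive
  )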
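filter() keeps only the rows where every condition evaluates to TRUE, so a row with a missing gender is also dropped by gender != "org", because comparing NA returns NA. The sketch below runs the same three conditions on a small hypothetical tibble (made-up rows, not drawn from nobel.csv) to show which row each condition removes.

```r
library(dplyr)

# Hypothetical toy rows, NOT taken from nobel.csv
toy <- tibble(
  name      = c("A", "B", "C", "D", "E"),
  country   = c("USA", NA, "France", "Japan", "Germany"),
  gender    = c("male", "female", "org", "female", NA),
  died_date = as.Date(c(NA, NA, NA, "2001-01-01", NA))
)

toy_living <- toy %>%
  filter(
    !is.na(country),   # drops B: no affiliation country recorded
    gender != "org",   # drops C (an organization) and E (NA gender gives NA)
    is.na(died_date)   # drops D: has a death date
  )

toy_living$name
```

Only row A survives, which is the behavior the real filter relies on.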

  3. Create a faceted bar plot
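# Label each laureate's affiliation country as "USA" or "Other"
nobel_living <- nobel_living %>%
  mutate(
    country_us = if_else(country == "USA", "USA", "Other")
  )
# Keep only the four science-related prize categories
nobel_living_science <- nobel_living %>%
  filter(category %in% c("Physics", "Medicine", "Chemistry", "Economics"))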
ggplot(nobel_living_science,
       aes(x = country_us)) +
  geom_bar() +
  facet_wrap(~ category) +
  coord_flip() +
  labs(
    title = "Nobel laureates US vs Other by Category",
    x = "Where laureate was working when awarded",
    y = "Count"
  )

Based on the chart, most living Nobel laureates in these four categories were working in the US when they were awarded. Proportionally, the US and Other bars are closest in Chemistry and furthest apart in Economics. Overall, the data supports the BuzzFeed headline, because the chart shows more winners based in the US than elsewhere.
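The percentage comparison between categories can be made exact by computing each group's share within its category instead of reading it off the bars. The sketch below uses a hypothetical toy tibble (made-up counts, named toy_science here as a stand-in for nobel_living_science) with count(), group_by() and mutate().

```r
library(dplyr)

# Hypothetical toy data standing in for nobel_living_science
toy_science <- tibble(
  category   = c("Physics", "Physics", "Physics",
                 "Economics", "Economics", "Economics", "Economics"),
  country_us = c("USA", "USA", "Other",
                 "USA", "USA", "USA", "Other")
)

us_share <- toy_science %>%
  count(category, country_us) %>%   # laureates per category and location
  group_by(category) %>%
  mutate(prop = n / sum(n)) %>%     # share of each location within a category
  ungroup() %>%
  filter(country_us == "USA")       # keep the US share only

us_share
```

Swapping toy_science for nobel_living_science would give the actual US share in each category.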

  4. How many of the winners were born in the US?
nobel_living_science <- nobel_living_science %>%
  mutate(
    born_country_us = if_else(born_country == "USA", "USA", "Other")
  )
nobel_living_science %>%
  count(born_country_us)
## # A tibble: 2 × 2
##   born_country_us     n
##   <chr>           <int>
## 1 Other             123
## 2 USA               105

105 of the 228 living laureates in these four categories were born in the US; the other 123 were born elsewhere.
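Expressing these counts as percentages makes the comparison easier to read. This small base-R sketch uses only the two counts printed by count(born_country_us) above.

```r
# Counts copied from the count(born_country_us) output above
born_counts <- c(Other = 123, USA = 105)

total    <- sum(born_counts)                        # 228 laureates in total
born_pct <- round(100 * born_counts / total, 1)     # percent of the total

born_pct
```

This gives about 46.1% born in the US and 53.9% born elsewhere. The same share could also be added directly in the pipeline with mutate(pct = n / sum(n)) after count().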

  5. Add a second variable
ggplot(nobel_living_science,
       aes(x = country_us, fill = born_country_us)) +
  geom_bar() +
  facet_wrap(~ category) +
  coord_flip() +
  labs(
    title = "Nobel laureates US affiliation and birth country",
    x = "Where laureate was based when awarded",
    y = "# of laureates",
    fill = "Born in"
  )

These data show that although most Nobel laureates in these categories were working in the US when they received the award, a large share of them were not born in the US. This supports BuzzFeed's claim.
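The stacked bars can also be summarized as a table of counts, with one row per work location and one column per birth location. A sketch using count() and tidyr::pivot_wider() on hypothetical toy rows (made up, not the real data):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows standing in for nobel_living_science
toy <- tibble(
  country_us      = c("USA", "USA", "USA", "Other", "Other"),
  born_country_us = c("USA", "Other", "Other", "Other", "USA")
)

crosstab <- toy %>%
  count(country_us, born_country_us) %>%      # one row per combination
  pivot_wider(
    names_from  = born_country_us,            # one column per birth location
    values_from = n,
    values_fill = 0L                          # combinations that never occur
  )

crosstab
```

In this toy table, two of the three US-based laureates were born elsewhere; running the same pipeline on nobel_living_science would give the real counts behind the chart.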

  6. In a single pipeline, filter for laureates who won their prize in the US, but were born outside of the US…
nobel_living_science %>%
  filter(born_country_us == "Other") %>%
  count(born_country) %>%
  arrange(desc(n))
## # A tibble: 33 × 2
##    born_country       n
##    <chr>          <int>
##  1 Germany           20
##  2 Japan             17
##  3 United Kingdom    16
##  4 France             8
##  5 Canada             6
##  6 China              6
##  7 Switzerland        6
##  8 Israel             5
##  9 Norway             4
## 10 Australia          3
## # ℹ 23 more rows

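Among laureates born outside the US (regardless of where they were working when awarded), the most common birth country is Germany (20 laureates), followed by Japan (17) and the United Kingdom (16).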
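The exercise asks only for laureates who were working in the US when they won, but the pipeline above filters on birth country alone, so it also counts foreign-born laureates working elsewhere. A minimal sketch of a single pipeline that applies both conditions, run here on hypothetical toy rows rather than the real data:

```r
library(dplyr)

# Hypothetical toy rows standing in for nobel_living_science
toy <- tibble(
  country_us      = c("USA", "USA", "USA", "Other", "USA"),
  born_country_us = c("Other", "Other", "Other", "Other", "USA"),
  born_country    = c("Germany", "Germany", "Japan", "Japan", "USA")
)

us_based_foreign_born <- toy %>%
  filter(
    country_us == "USA",          # working in the US when awarded
    born_country_us == "Other"    # born outside the US
  ) %>%
  count(born_country) %>%         # laureates per birth country
  arrange(desc(n))                # most common birth country first

us_based_foreign_born
```

The Japan-born laureate working outside the US is excluded, so the toy result lists Germany (2) ahead of Japan (1). Replacing toy with nobel_living_science would answer the exercise as stated.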