Load Packages & Data

Run the block of code below only once per session; running it a second time would add another set of "Other" rows to babynames. Notes:

  • In R, any text after a hash sign (#) is a comment that R ignores when running code. Comments are a great way to document your code
    • For others, but more importantly
    • For future you! Hadley Wickham quote: “In every project you have at least one other collaborator; future-you. You don’t want future-you to curse past-you”
  • You do not need to completely understand the code below for now; we’ll be learning these packages in the upcoming weeks.
  • In case you are curious, however: names that occurred fewer than 5 times for a given sex in a given year were excluded from the data set. After loading the packages, we adjust for this by creating a new category, Other, that holds the excluded counts and proportions.
# Load packages
library(ggplot2)
library(dplyr)
library(babynames)

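# Compute counts/proportions for "Other" names
other <- babynames %>% 
  group_by(year, sex) %>% 
  # Total count and proportion of the listed names per year and sex
  summarise(sum_n=sum(n), sum_prop=sum(prop)) %>% 
  mutate(
    # Implied total for that year and sex, since sum_prop = sum_n/total
    total = sum_n/sum_prop,
    name = "Other",
    # The remaining proportion and count belong to the excluded names
    prop = 1-sum_prop,
    n = prop * total
    ) %>% 
  select(year, sex, name, n, prop)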

# Add "Other" names to babynames
babynames <- babynames %>% 
  bind_rows(other) %>% 
  arrange(year, sex, desc(n))
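
To see why dividing the summed count by the summed proportion recovers the total, here is a minimal sketch in base R of the same arithmetic for one year and sex. The numbers and the variable names (listed_n, listed_prop, other_prop, other_n) are made up for illustration and are not taken from babynames.

```r
# Toy example: two listed names in one year/sex group (made-up values)
listed_n    <- c(50, 25)      # counts of the listed names
listed_prop <- c(0.50, 0.25)  # their proportions of the total

# Same steps as in the mutate() call above
total      <- sum(listed_n) / sum(listed_prop)  # implied total: 100
other_prop <- 1 - sum(listed_prop)              # share not listed: 0.25
other_n    <- other_prop * total                # count not listed: 25

c(total = total, other_prop = other_prop, other_n = other_n)
```

With real data the proportions are not round numbers, so the computed n for "Other" may not be a whole number.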

Explore Data Set

Play around with the babynames dataset in the RStudio viewer by running the following in the console.

View(babynames)

In particular, click on the Filter button, then click in the white boxes under the variable names to

  • Play with any sliders for numerical variables: year, n, prop
  • Enter values to view subsets of rows for the text variables: sex, name
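
The same kind of subsetting can also be done in code. The sketch below assumes the dplyr and babynames packages are installed and uses dplyr's filter() function; the object name mary_recent and the particular filter values are only illustrative choices.

```r
library(dplyr)
library(babynames)

# Keep rows for girls named Mary from the year 2000 onward,
# mirroring what you would type into the viewer's filter boxes
mary_recent <- babynames %>%
  filter(sex == "F", name == "Mary", year >= 2000)

mary_recent
```

Unlike the viewer's filters, this keeps a record of exactly which subset you looked at, which future you may appreciate.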