library(babynames)
mydata <- babynames
str(mydata)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1858689 obs. of 5 variables:
## $ year: num 1880 1880 1880 1880 1880 1880 1880 1880 1880 1880 ...
## $ sex : chr "F" "F" "F" "F" ...
## $ name: chr "Mary" "Anna" "Emma" "Elizabeth" ...
## $ n : int 7065 2604 2003 1939 1746 1578 1472 1414 1320 1288 ...
## $ prop: num 0.0724 0.0267 0.0205 0.0199 0.0179 ...
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.8
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
OK, let’s filter the data down to the name Elleana.
elleana <- mydata %>%
  filter(name == "Elleana")
tail(elleana)
Let’s compare it to Emma, keeping only the female records.
emma <- mydata %>%
  filter(name == "Emma" & sex == "F")
tail(emma)
In 2015, there were 20,355 female Emmas. Compare that with 15 to 20 Elleanas. That works out to roughly one Elleana for every 1,000 to 1,400 Emmas.
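The ratio can be checked with a quick bit of arithmetic. This is only a sketch that plugs in the two figures quoted above (20,355 Emmas and a range of 15 to 20 Elleanas); the variable names are made up for illustration and nothing here reads from the babynames data itself.

```r
# Figures taken from the text above, not recomputed from the data
emma_2015     <- 20355        # female Emmas in 2015
elleana_range <- c(15, 20)    # low and high end of the Elleana counts

# How many Emmas there are for each Elleana, at both ends of the range
emmas_per_elleana <- emma_2015 / elleana_range
emmas_per_elleana
## [1] 1357.00 1017.75
```

So "one in a thousand" is the generous end of the estimate; at 15 Elleanas the ratio is closer to one in 1,400.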