Babynames Project: MEA 329

This report explores the first letters of baby names. Which first letters are most popular for females and for males? Are they the same for both genders? Which letters are more common now, and which were more common in the past? Is a letter's popularity tied to specific time periods, and if so, why?

I will use several dplyr functions, such as filter(), arrange(), mutate(), and summarize(), to explore these questions. I will also use ggplot to visualize my findings so that the trends are easy to see.

Process

First, I need to load the required packages.

library(babynames)
library(tidyverse)
library(ggthemes)

Now, I will use the mutate() function to add a column to the babynames dataset that indicates the first letter of each name. Inside mutate(), the substr() function extracts the first character of each name and stores it in a new variable, ‘firstLetter’. I will save the result as ‘babynames_firstLetter’ so that I can reference it later in my exploration.

babynames %>% 
  mutate(firstLetter = substr(name, 1,1)) -> babynames_firstLetter
head(babynames_firstLetter)

## # A tibble: 6 x 6
##    year sex   name          n   prop firstLetter
##   <dbl> <chr> <chr>     <int>  <dbl> <chr>      
## 1  1880 F     Mary       7065 0.0724 M          
## 2  1880 F     Anna       2604 0.0267 A          
## 3  1880 F     Emma       2003 0.0205 E          
## 4  1880 F     Elizabeth  1939 0.0199 E          
## 5  1880 F     Minnie     1746 0.0179 M          
## 6  1880 F     Margaret   1578 0.0162 M

Which first letters are most popular? To explore this, I will use group_by(), count(), and arrange(). Note that count() here tallies rows of the dataset (one row per name, sex, and year), not the number of babies born.

babynames_firstLetter %>% 
  group_by(firstLetter) %>%
  count() %>% 
  arrange(desc(n)) -> count_firstLetter
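
Because the dataset also has a column n holding the number of babies per row, the same question can be asked in terms of babies rather than rows. The sketch below is only an illustration of that distinction, not part of the original analysis: it re-enters by hand the six rows printed by head() above, and the names first_rows, rows_per_letter, and babies_per_letter are hypothetical. It relies on the wt argument of dplyr's count(), which sums a column instead of counting rows.

```r
library(dplyr)

# The six rows shown in the head() output above, re-entered by hand
# (hypothetical helper object, used only for this illustration)
first_rows <- tibble::tibble(
  name = c("Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret"),
  n    = c(7065L, 2604L, 2003L, 1939L, 1746L, 1578L)
) %>%
  mutate(firstLetter = substr(name, 1, 1))

# Unweighted: how many rows (name entries) start with each letter
rows_per_letter <- first_rows %>%
  count(firstLetter, name = "entries", sort = TRUE)

# Weighted by n: how many babies received a name starting with each letter
babies_per_letter <- first_rows %>%
  count(firstLetter, wt = n, name = "babies", sort = TRUE)

rows_per_letter
babies_per_letter
```

On these six rows, M has 3 entries but 10389 babies, so the two measures can rank letters differently once the full dataset is used.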

Now let’s plot it.

ggplot(count_firstLetter, aes(reorder(firstLetter, -n), n, fill = firstLetter)) +
  geom_col() +
  ggtitle("Count of Names with Each Letter") +
  xlab("First Letter") +
  ylab("Count of Occurrences")
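
One of the questions posed at the start is whether the popular letters are the same for females and males. A minimal sketch of how the same counting approach might be split by sex is given below; it is an assumption about one possible next step, not the analysis the report itself goes on to perform, and the names letters_by_sex and p_by_sex are hypothetical.

```r
library(babynames)
library(dplyr)
library(ggplot2)

# Count name entries per first letter separately for each sex
# (rows are counted, as in the overall tally above, not babies)
letters_by_sex <- babynames %>%
  mutate(firstLetter = substr(name, 1, 1)) %>%
  count(sex, firstLetter, name = "entries")

# One panel per sex so the two distributions can be compared side by side
p_by_sex <- ggplot(letters_by_sex, aes(firstLetter, entries, fill = sex)) +
  geom_col() +
  facet_wrap(~ sex, ncol = 1) +
  labs(
    title = "Count of Names with Each Letter, by Sex",
    x = "First Letter",
    y = "Count of Occurrences"
  )

p_by_sex
```

Faceting keeps the letters in alphabetical order in both panels, which makes it easier to compare the same letter across sexes than reordering each panel by count would.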


Paige Minsky

9/11/2020
