## Introduction
This report explores the first letters of the names in the babynames dataset. Which first letters are most popular for females and for males? Are they the same for both genders? Which letters are more common now, and which were more common in the past? Is a letter's popularity tied to specific time periods and, if so, why?
I will use functions such as filter(), arrange(), mutate(), and summarize() to explore these questions. I will also use ggplot to visualize my findings so that the trends are easy to see.
First, I need to load the required packages.
library(babynames)
library(tidyverse)
library(ggthemes)
Now, I will use the mutate() function to add a column to the babynames dataset that indicates the first letter of each name. I will use the substr() function to extract the first letter of each name and store it in the new column, ‘firstLetter’. I will assign the resulting data frame to the variable ‘babynames_firstLetter’ so that I can reference it later in my exploration.
babynames %>%
  mutate(firstLetter = substr(name, 1, 1)) -> babynames_firstLetter
head(babynames_firstLetter)
## # A tibble: 6 x 6
## year sex name n prop firstLetter
## <dbl> <chr> <chr> <int> <dbl> <chr>
## 1 1880 F Mary 7065 0.0724 M
## 2 1880 F Anna 2604 0.0267 A
## 3 1880 F Emma 2003 0.0205 E
## 4 1880 F Elizabeth 1939 0.0199 E
## 5 1880 F Minnie 1746 0.0179 M
## 6 1880 F Margaret 1578 0.0162 M
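To make the mutate() and substr() step easier to check by hand, here is a minimal sketch on a tiny made-up data frame. The names `toy_names` and `toy_firstLetter` and the counts in it are hypothetical stand-ins, not values from babynames.

```r
library(dplyr)

# Hypothetical toy data standing in for babynames (values are made up)
toy_names <- data.frame(
  name = c("Mary", "Anna", "Emma", "Minnie"),
  n    = c(10L, 4L, 3L, 2L),
  stringsAsFactors = FALSE
)

# substr(name, 1, 1) keeps characters 1 through 1, i.e. the first letter
toy_firstLetter <- toy_names %>%
  mutate(firstLetter = substr(name, 1, 1))

toy_firstLetter$firstLetter
```

Printing the new column for these four names should show M, A, E, M, which matches the pattern in the real output above.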
What are the most popular first letters? To explore this, I will use group_by(), count(), and arrange().
babynames_firstLetter %>%
  group_by(firstLetter) %>%
  count() %>%
  arrange(desc(n)) -> count_firstLetter
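It may help to be explicit about what is being counted here. Without a `wt` argument, count() tallies rows, and each row of babynames is one name/sex/year record rather than one baby. The sketch below contrasts that with a weighted count on made-up data. The data frame `toy` and the output column names `records` and `babies` are my own assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy rows: one row per name/year, n = babies given that name
toy <- data.frame(
  name        = c("Mary", "Mary", "Anna", "Minnie"),
  year        = c(1880, 1881, 1880, 1880),
  n           = c(100L, 90L, 30L, 20L),
  firstLetter = c("M", "M", "A", "M"),
  stringsAsFactors = FALSE
)

# Unweighted: how many name/year records start with each letter
rows_per_letter <- toy %>%
  count(firstLetter, name = "records") %>%
  arrange(desc(records))

# Weighted by n: how many babies received a name starting with each letter
babies_per_letter <- toy %>%
  count(firstLetter, wt = n, name = "babies") %>%
  arrange(desc(babies))

rows_per_letter
babies_per_letter
```

Both versions answer a reasonable question. The unweighted count reflects how many distinct records use a letter, while the weighted count reflects how many children carried such a name, so the two rankings can differ on the full dataset.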
Now let’s plot it.
ggplot(count_firstLetter, aes(reorder(firstLetter, -n), n, fill = firstLetter)) +
  geom_col() +
  ggtitle("Count of Names with Each Letter") +
  xlab("First Letter") +
  ylab("Count of Occurrences")
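The plot uses reorder(firstLetter, -n) so that the bars run from the most common letter to the least common. A small base R sketch of what reorder() does, using made-up letters and counts (`letters_toy` and `counts_toy` are hypothetical names):

```r
# Hypothetical letters and counts, chosen only to show the ordering
letters_toy <- c("A", "B", "C")
counts_toy  <- c(5, 20, 10)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating the counts puts the letter with the largest count first
ordered_letters <- reorder(letters_toy, -counts_toy)

levels(ordered_letters)
```

Here the levels come out as B, C, A, so ggplot would draw B (count 20) first, then C (10), then A (5).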