The Most Popular First Letter of Baby Names in NYC

In this project I will first determine the most popular first letter of baby names in New York City for each year from 2011 to 2014. Second, I will find the most popular first letter for each sex in each of those years.

I will use the NYC baby names data.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(readr)
Popular_Baby_Names <- read_csv("Popular_Baby_Names.csv")
## Rows: 47545 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Gender, Ethnicity, Child's First Name
## dbl (3): Year of Birth, Count, Rank
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

First, I added a column holding the first letter of each name.

Popular_Baby_Names <- Popular_Baby_Names %>%
  # Take the first character of each name (R strings are indexed from 1)
  mutate(first_letter = substr(`Child's First Name`, 1, 1)) %>%
  arrange(desc(first_letter))
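
As a quick check of the first-letter step, the sketch below applies `substr()` to a made-up vector of names (the names are illustrative and not taken from the dataset). `substr()` treats a start position of 0 the same as 1, so either value returns the first character. The `toupper()` call is an optional extra, assuming one wants mixed-case names to share a letter.

```r
# Hypothetical names, used only to illustrate the first-letter step
example_names <- c("Olivia", "JAYDEN", "ava")

# substr() is vectorised: it returns one first character per name
first_letters <- substr(example_names, 1, 1)

# Optional: upper-case the result so "ava" and "Ava" count under the same letter
first_letters_upper <- toupper(first_letters)

print(first_letters_upper)
```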

#knitr::kable(Popular_Baby_Names)

The file loaded above is the ‘Popular Baby Names’ data from NYC Open Data.

Second, I summed the counts for each first letter and sorted the letters from most to least popular in a data frame.

Popular_Baby_Names %>%
  group_by(first_letter) %>%
  summarize(letter_count = sum(Count)) %>%
  arrange(desc(letter_count))  -> Popular_First_Letter
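
To make the grouping step concrete, here is a minimal sketch of the same `group_by()` / `summarize()` / `arrange()` pattern on a tiny made-up table. The `toy` data and its values are assumptions for illustration only; the column names mirror the ones used above.

```r
library(dplyr)

# Made-up rows in the same shape as the columns used above
toy <- tibble(
  first_letter = c("A", "A", "J", "M"),
  Count        = c(10, 5, 8, 3)
)

toy_totals <- toy %>%
  group_by(first_letter) %>%
  summarize(letter_count = sum(Count)) %>%  # one row per letter
  arrange(desc(letter_count))               # largest total first

print(toy_totals)
```

Here the two "A" rows collapse into a single row with a total of 15, which then sorts to the top.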

Third, I totaled the counts by letter and year, then made a data frame of just the most popular letter in each year through 2014.

Popular_Baby_Names %>%
  group_by(first_letter, `Year of Birth`) %>%
  summarize(letter_count = sum(Count)) %>%
  arrange(desc(letter_count)) -> Popular_First_Letter_Year
## `summarise()` has grouped output by 'first_letter'. You can override using the
## `.groups` argument.
Most_Popular_First_Letter_Year <- Popular_First_Letter_Year %>%
  group_by(`Year of Birth`) %>%
  top_n(1, letter_count) %>%
  filter(`Year of Birth` < 2015)
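
The "top letter per year" step can also be sketched with `slice_max()`, which dplyr has offered as the replacement for the superseded `top_n()` since version 1.0.0. The `toy_year` table below is made up for illustration; this is an alternative under that assumption, not the code that produced the results in this report.

```r
library(dplyr)

# Made-up yearly totals; the values are illustrative only
toy_year <- tibble(
  first_letter    = c("A", "J", "A", "J"),
  `Year of Birth` = c(2011, 2011, 2012, 2012),
  letter_count    = c(30, 20, 25, 40)
)

toy_top <- toy_year %>%
  group_by(`Year of Birth`) %>%
  slice_max(letter_count, n = 1) %>%  # keep the largest count within each year
  ungroup()

print(toy_top)
```

Like `top_n()`, `slice_max()` keeps ties by default, so a year with two equally popular letters would return two rows.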

“A” is the most popular first letter in every year from 2011 to 2014.

Most_Popular_First_Letter_Year
## # A tibble: 4 × 3
## # Groups:   Year of Birth [4]
##   first_letter `Year of Birth` letter_count
##   <chr>                  <dbl>        <dbl>
## 1 A                       2014        52285
## 2 A                       2012        51934
## 3 A                       2011        51655
## 4 A                       2013        50786

Then I made a line graph of the yearly count for each first letter from 2011 to 2014.

Popular_First_Letter_Year %>%
  filter(`Year of Birth` < 2015) %>%
  ggplot(aes(`Year of Birth`, letter_count, color = first_letter)) +
  geom_line()

Next, I will repeat the analysis by gender, starting with females: I total the counts per letter and per year, then find the most popular letter in each year.

Popular_Baby_Names_Females <- Popular_Baby_Names %>%
  filter(Gender == "FEMALE")

# Total per letter across all years, stored under its own name
# so that the per-year table below does not overwrite it
Popular_Baby_Names_Females %>%
  group_by(first_letter) %>%
  summarize(letter_count = sum(Count)) %>%
  arrange(desc(letter_count)) -> Popular_First_Letter_Females_Overall

# Total per letter and year
Popular_Baby_Names_Females %>%
  group_by(first_letter, `Year of Birth`) %>%
  summarize(letter_count = sum(Count)) %>%
  arrange(desc(letter_count)) -> Popular_First_Letter_Females
## `summarise()` has grouped output by 'first_letter'. You can override using the
## `.groups` argument.
Most_Popular_First_Letter_Year_Females <- Popular_First_Letter_Females %>%
  group_by(`Year of Birth`) %>%
  top_n(1, letter_count) %>%
  filter(`Year of Birth` < 2015)

For females, “A” is again the most common first letter in every year from 2011 to 2014.

Most_Popular_First_Letter_Year_Females
## # A tibble: 4 × 3
## # Groups:   Year of Birth [4]
##   first_letter `Year of Birth` letter_count
##   <chr>                  <dbl>        <dbl>
## 1 A                       2014        25977
## 2 A                       2012        24487
## 3 A                       2013        24270
## 4 A                       2011        23130
Popular_First_Letter_Females %>%
  filter(`Year of Birth` < 2015) %>%
  ggplot(aes(`Year of Birth`, letter_count, color = first_letter)) +
  geom_line()

I will now repeat the same steps for males.

Popular_Baby_Names_Males <- Popular_Baby_Names %>%
  filter(Gender == "MALE")

# Total per letter across all years, stored under its own name
# so that the per-year table below does not overwrite it
Popular_Baby_Names_Males %>%
  group_by(first_letter) %>%
  summarize(letter_count = sum(Count)) %>%
  arrange(desc(letter_count)) -> Popular_First_Letter_Males_Overall

# Total per letter and year, restricted to 2011-2014
Popular_Baby_Names_Males %>%
  filter(`Year of Birth` < 2015) %>%
  group_by(first_letter, `Year of Birth`) %>%
  summarize(letter_count = sum(Count)) %>%
  arrange(desc(letter_count)) -> Popular_First_Letter_Males
## `summarise()` has grouped output by 'first_letter'. You can override using the
## `.groups` argument.
Most_Popular_First_Letter_Year_Males <- Popular_First_Letter_Males %>%
  group_by(`Year of Birth`) %>%
  # Years after 2014 were already filtered out when building the input table
  top_n(1, letter_count)

For males, the most popular first letter was “J” in every year from 2011 to 2014, although its count fell each year.

Most_Popular_First_Letter_Year_Males
## # A tibble: 4 × 3
## # Groups:   Year of Birth [4]
##   first_letter `Year of Birth` letter_count
##   <chr>                  <dbl>        <dbl>
## 1 J                       2011        36853
## 2 J                       2012        34918
## 3 J                       2013        32274
## 4 J                       2014        29975
Popular_First_Letter_Males %>%
  ggplot(aes(`Year of Birth`, letter_count, color = first_letter)) +
  geom_line()
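
The female and male steps above repeat the same pipeline, so one possible simplification is to add `Gender` to the grouping and handle both in a single pass. The sketch below uses a made-up `toy_gender` table with the dataset's column names; it is an illustration of the idea rather than the code behind the tables in this report.

```r
library(dplyr)

# Made-up rows with the same column names as the dataset
toy_gender <- tibble(
  Gender          = c("FEMALE", "FEMALE", "MALE", "MALE", "MALE"),
  `Year of Birth` = c(2011, 2011, 2011, 2011, 2011),
  first_letter    = c("A", "S", "J", "J", "A"),
  Count           = c(12, 7, 9, 6, 10)
)

top_by_gender <- toy_gender %>%
  group_by(Gender, `Year of Birth`, first_letter) %>%
  # .groups = "drop" returns an ungrouped table and silences the summarise() message
  summarize(letter_count = sum(Count), .groups = "drop") %>%
  group_by(Gender, `Year of Birth`) %>%
  slice_max(letter_count, n = 1) %>%  # top letter for each gender and year
  ungroup()

print(top_by_gender)
```

With the real data, the same pipeline would start from `Popular_Baby_Names` after filtering to the years of interest.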

In conclusion, the most popular first letter of baby names in New York City was “A” in every year from 2011 to 2014. For females, “A” was also the most common first letter, while for males it was “J”. If I were to expand on this in the future, I would like to compare New York City with New York State to see whether the most popular first letters differ.