The frequency of names starting with vowels vs. consonants over time

The babynames data set records the names given to US babies from 1880 to 2017. Using this resource, I want to determine how the frequency of names starting with vowels compares with the frequency of names starting with consonants over time.

I expect the number of babies whose names start with consonants to far exceed the number whose names start with vowels.

The first step is to load all of my packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(babynames)
library(ggthemes)

The next step is to extract the first letter of every name.

baby_first_letter <- babynames %>%
  mutate(first_letter = substr(name, 1,1)) 
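
As a quick check of what substr() returns, here is a minimal sketch on a small made-up vector of names (example_names is hypothetical and not part of babynames). It assumes every name starts with a capital letter, which is how the names in this analysis are matched later on.

```r
# Hypothetical example names, not taken from the data set
example_names <- c("Anna", "Ben", "Olivia", "Zoe")

# substr(x, start, stop) keeps characters start..stop of each string
first_letters <- substr(example_names, 1, 1)
first_letters

# toupper() guards against lower-case input when matching against capital letters
toupper(substr("emma", 1, 1))
```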

I then filtered the rows whose first letter is A, E, I, O, or U and stored them in a data frame called vowels.

baby_first_letter %>% 
  filter(first_letter %in% c("A", "E", "I", "O", "U")) -> vowels


vowels %>% 
  group_by(year, first_letter) %>% 
  summarize(total = sum(n)) %>% 
  ggplot(aes(year, total, color = first_letter)) + geom_line()
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

This code totals the number of babies per year for each of the five vowels and plots one line per letter.
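
The message printed by summarise() is informational only. A minimal sketch of how it could be silenced, using a small made-up tibble (toy_vowels and toy_totals are hypothetical names) and the .groups argument that the message itself points to:

```r
library(dplyr)

# Hypothetical stand-in for the vowels data frame
toy_vowels <- tibble(
  year = c(1880, 1880, 1880, 1881),
  first_letter = c("A", "A", "E", "A"),
  n = c(10, 5, 7, 3)
)

# .groups = "drop" returns an ungrouped result and suppresses the message
toy_totals <- toy_vowels %>%
  group_by(year, first_letter) %>%
  summarize(total = sum(n), .groups = "drop")

toy_totals
```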

baby_first_letter %>% 
  filter(first_letter %in% c("B","C","D", "F", "G", "H", "J","K", "L", "M", "N","P", "Q", "R", "S", "T", "V", "W", "X", "Y", "Z")) -> constants

I then filtered the rows whose first letter is one of the 21 consonants and stored them in a data frame named constants.
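
Typing out 21 letters by hand invites typos. A possible alternative, sketched here with base R only, derives the consonants from the built-in LETTERS constant. The names vowel_letters and consonant_letters are my own for illustration and do not appear in the analysis above. Note that Y is treated as a consonant, as in the hand-typed list.

```r
# The five vowels used in this analysis
vowel_letters <- c("A", "E", "I", "O", "U")

# LETTERS is R's built-in vector of the 26 capital letters;
# setdiff() keeps the letters that are not vowels
consonant_letters <- setdiff(LETTERS, vowel_letters)

length(consonant_letters)
```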

constants %>% 
  group_by(year, first_letter) %>% 
  summarize(total = sum(n)) %>% 
  ggplot(aes(year, total, color = first_letter)) + geom_line()
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

This code totals the number of babies per year for each of the 21 consonants and plots the individual letters separately.

Next, I will plot the vowels and consonants together.

vowels %>% 
  mutate(type = "vowel") -> vowels

constants %>% 
  mutate(type = "constants") -> constants

vowels %>% 
  full_join(constants) -> grouped
## Joining, by = c("year", "sex", "name", "n", "prop", "first_letter", "type")

I added a new column, type, to the consonant rows with the value constants so that the 21 separate letters would be grouped together. I did the same for the 5 vowels with the value vowel. I then joined the two tables into one, called grouped, so they can be plotted together.
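
The same labelled table can also be built in one pass, without filtering twice and joining. The following is only a sketch on a made-up tibble (toy_names and toy_grouped are hypothetical). It keeps the labels vowel and constants used in the code above, and it assumes every name starts with a capital letter, since anything that is not one of the five vowels gets the constants label.

```r
library(dplyr)

# Hypothetical stand-in for babynames
toy_names <- tibble(
  year = c(1880, 1880, 1880),
  name = c("Anna", "Ben", "Ida"),
  n = c(10, 20, 5)
)

toy_grouped <- toy_names %>%
  mutate(
    first_letter = substr(name, 1, 1),
    # Label each row by whether its first letter is one of the five vowels
    type = if_else(first_letter %in% c("A", "E", "I", "O", "U"), "vowel", "constants")
  )

# Yearly totals per group, as in the per_year step
toy_grouped %>%
  group_by(year, type) %>%
  summarize(total = sum(n), .groups = "drop")
```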

vowels %>% 
  group_by(year, type) %>% 
  summarize(total = sum(n)) -> vowels_per_year
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
grouped %>% 
  group_by(year, type) %>% 
  summarize(total = sum(n)) -> per_year
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

I then divided each group's yearly total by the number of letters in the group (5 vowels, 21 consonants) so that the two groups are on the same per-letter scale, and plotted them together.

per_year <- per_year %>% 
  mutate(normal = case_when(type %in% "vowel" ~ total / 5,
                            type %in% "constants" ~ total / 21))

ggplot(per_year, aes(year, normal, color = type)) + geom_line()
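
To make the per-letter scaling concrete, here is a small worked example with made-up totals (toy_per_year, letters_per_type, and toy_normal are hypothetical names). Dividing by 5 and 21 turns each total into an average count per starting letter, so a group is not ahead simply because it contains more letters.

```r
library(dplyr)

# Hypothetical totals for a single year
toy_per_year <- tibble(
  year = c(1880, 1880),
  type = c("vowel", "constants"),
  total = c(10, 42)
)

# Number of starting letters in each group
letters_per_type <- c(vowel = 5, constants = 21)

toy_normal <- toy_per_year %>%
  # Look up the group size by name and divide the total by it
  mutate(normal = total / unname(letters_per_type[type]))

toy_normal$normal
```

In this toy case both groups average 2 babies per letter, even though the raw totals differ by more than a factor of four.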

The number of babies with names starting with consonants is much higher than the number with names starting with vowels. Even when both groups are put on the same per-letter scale, consonants are well ahead of vowels in the mid to late 1900s. Recently, there has been a jump in names starting with vowels.