The babynames data set lists US baby names from 1880 to 2017 as recorded by the Social Security Administration (every name given to at least five babies of one sex in a given year). Using this resource, I want to compare how often names start with a vowel versus a consonant over time.
I expect that names starting with consonants will far outnumber names starting with vowels.
The first step is to load the packages I need.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(babynames)
library(ggthemes)
Next, I extract the first letter of every name.
baby_first_letter <- babynames %>%
mutate(first_letter = substr(name, 1,1))
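As a quick illustration of what substr(name, 1, 1) does inside mutate(), here is a minimal base R sketch on a few hypothetical names (not drawn from babynames). It returns the first character of each string, which is all the first_letter column holds.

```r
# Hypothetical example names (not from babynames) to show what
# substr(name, 1, 1) returns: the first character of each string.
example_names <- c("Emma", "Liam", "Olivia", "Noah")
first_letter <- substr(example_names, 1, 1)
print(first_letter)
```

Because babynames stores names capitalized, the extracted letters are uppercase, which is why the filters below compare against "A", "E", and so on.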
I then kept only the names whose first letter is a vowel (A, E, I, O, or U) and saved them as vowels.
baby_first_letter %>%
filter(first_letter %in% c("A", "E", "I", "O", "U")) -> vowels
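The filter above relies on %in%, which returns TRUE for each element found in the given set. The sketch below shows that test on a few illustrative letters, plus an alternative (my own suggestion, not the approach used in this analysis) that labels every row as "vowel" or "consonant" in one step with ifelse() instead of filtering twice.

```r
# A sketch of the vowel test used in filter(): %in% returns TRUE when
# an element appears in the set. Letters here are illustrative only.
vowel_letters <- c("A", "E", "I", "O", "U")
first_letter <- c("E", "L", "O", "N", "Y")
is_vowel <- first_letter %in% vowel_letters
print(is_vowel)

# ifelse() turns the logical test into a label in a single step,
# an alternative to filtering vowels and consonants separately.
# Note that "Y" is treated as a consonant, as in this analysis.
letter_type <- ifelse(is_vowel, "vowel", "consonant")
print(letter_type)
```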
vowels %>%
group_by(year, first_letter) %>%
summarize(total = sum(n), .groups = "drop") %>%
ggplot(aes(year, total, color = first_letter)) + geom_line()
This plot shows, for each year, the total number of babies whose names
start with each of the five vowels.
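The group_by() and summarize() step adds up n (the number of babies) for each combination of year and first letter. Here is a minimal base R sketch of the same aggregation using aggregate(). The rows are hypothetical and only shaped like babynames; the counts are made up for illustration.

```r
# Hypothetical rows shaped like babynames (year, name, n); the counts
# are made up for illustration and are not real data.
toy <- data.frame(
  year = c(1900, 1900, 1900, 1901, 1901),
  name = c("Anna", "Alice", "Emma", "Anna", "Emma"),
  n    = c(10, 5, 7, 12, 3)
)
toy$first_letter <- substr(toy$name, 1, 1)

# Base R equivalent of group_by(year, first_letter) %>% summarize(total = sum(n))
totals <- aggregate(n ~ year + first_letter, data = toy, FUN = sum)
names(totals)[names(totals) == "n"] <- "total"
print(totals)
```

Anna and Alice both start with "A", so in 1900 their counts are added together into a single total for that letter.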
baby_first_letter %>%
filter(first_letter %in% c("B","C","D", "F", "G", "H", "J","K", "L", "M", "N","P", "Q", "R", "S", "T", "V", "W", "X", "Y", "Z")) -> constants
I then kept only the names starting with one of the 21 consonants and saved them as constants.
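Typing out 21 letters by hand makes it easy to drop or duplicate one by accident. A small base R check, assuming the same five vowels, confirms that the typed list is exactly the uppercase alphabet (the built-in LETTERS vector) minus the vowels.

```r
# The 21 consonants typed out in the filter() call should be exactly the
# uppercase alphabet minus the five vowels. LETTERS is built into base R.
vowel_letters <- c("A", "E", "I", "O", "U")
typed_consonants <- c("B", "C", "D", "F", "G", "H", "J", "K", "L", "M", "N",
                      "P", "Q", "R", "S", "T", "V", "W", "X", "Y", "Z")
computed_consonants <- setdiff(LETTERS, vowel_letters)
print(length(computed_consonants))
print(identical(typed_consonants, computed_consonants))
```

If the check passes, setdiff(LETTERS, c("A", "E", "I", "O", "U")) could be used directly inside filter() instead of the hand-typed vector.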
constants %>%
group_by(year, first_letter) %>%
summarize(total = sum(n), .groups = "drop") %>%
ggplot(aes(year, total, color = first_letter)) + geom_line()
This plot shows the yearly totals for each of the 21 consonants, with one
separate line per letter.
Next, I plot the vowels and consonants together.
vowels %>%
mutate(type = "vowel") -> vowels
constants %>%
mutate(type = "constants") -> constants
# The two subsets share the same columns and have no rows in common,
# so stacking them is equivalent to a full join on every column.
bind_rows(vowels, constants) -> grouped
I added a type column to each subset, labeled "vowel" for the five vowels and "constants" for the 21 consonants, so that the individual letters would group together. I then combined the two subsets into a single data set, grouped, so they can be summarized and plotted together.
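Because a name starts with either a vowel or a consonant, never both, the two subsets cannot share a row, and combining them amounts to stacking one on top of the other. The base R sketch below shows this with two small hypothetical subsets (the counts are made up). It uses rbind(); dplyr::bind_rows() does the same job in the tidyverse.

```r
# Hypothetical vowel and consonant subsets with identical columns and
# no overlapping rows (counts made up for illustration).
vowel_rows <- data.frame(year = 1900, name = c("Anna", "Emma"), n = c(10, 7))
consonant_rows <- data.frame(year = 1900, name = c("Mary", "Ruth", "Lily"), n = c(20, 4, 6))

vowel_rows$type <- "vowel"
consonant_rows$type <- "constants"  # same label the analysis uses

# Because no row can be in both subsets, stacking them (rbind, or
# dplyr::bind_rows) gives the same rows as a full join on all columns.
grouped <- rbind(vowel_rows, consonant_rows)
print(table(grouped$type))
```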
vowels %>%
group_by(year, type) %>%
summarize(total = sum(n), .groups = "drop") -> vowels_per_year
grouped %>%
group_by(year, type) %>%
summarize(total = sum(n), .groups = "drop") -> per_year
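To compare the two groups on an equal footing, I divided each yearly total by the number of letters in its group (5 for vowels, 21 for consonants). This gives the average number of babies per starting letter, and I then plotted both series together.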
per_year <- per_year %>%
# Average per letter: vowel totals over 5 letters, consonant totals over 21
mutate(normal = case_when(type == "vowel" ~ total / 5,
type == "constants" ~ total / 21))
ggplot(per_year, aes(year, normal, color = type)) + geom_line()
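The normalization is simple arithmetic: normal = total / 5 for vowels and total / 21 for consonants. The sketch below wraps it in a hypothetical helper, per_letter_average(), and applies it to made-up totals. A vowel total of 1000 and a consonant total of 4200 both work out to 200 babies per letter, so the two groups would look equal after adjustment even though the raw consonant total is over four times larger.

```r
# Per-letter average: divide each group's yearly total by the number of
# letters in that group (5 vowels, 21 consonants). Totals are hypothetical.
per_letter_average <- function(total, type) {
  letters_in_group <- ifelse(type == "vowel", 5, 21)
  total / letters_in_group
}
example_totals <- c(1000, 4200)
example_types  <- c("vowel", "constants")
print(per_letter_average(example_totals, example_types))
```

Note that this measures the average popularity of a single starting letter within each group, not each group's share of all births; the two can tell different stories.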
The number of babies whose names start with a consonant is much higher
than the number whose names start with a vowel. Even after adjusting for
the number of letters in each group, names beginning with consonants are
much more common than names beginning with vowels from the mid to late
1900s. More recently, the number of babies whose names start with a
vowel has risen sharply.