For this project I look at how often names start with a vowel, then find the most popular name for each vowel and compare their frequencies in 2017. The data come from the babynames package.

I expect that, of all names starting with a vowel, those starting with ‘A’ will be the most popular in 2017.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(babynames)
library(ggthemes)
# Tag each record with the first letter of its name (R strings are 1-indexed)
baby_first_letter <- babynames %>%
  group_by(first_letter = substr(name, 1, 1)) %>%
  arrange(desc(first_letter))

baby_first_letter %>%
  filter(first_letter %in% c('A', 'E', 'I', 'O', 'U'))
## # A tibble: 339,238 × 6
## # Groups:   first_letter [5]
##     year sex   name        n      prop first_letter
##    <dbl> <chr> <chr>   <int>     <dbl> <chr>       
##  1  1880 F     Una        10 0.000102  U           
##  2  1880 F     Ula         5 0.0000512 U           
##  3  1880 M     Ulysses    29 0.000245  U           
##  4  1880 M     Urban      10 0.0000845 U           
##  5  1880 M     Uriah      10 0.0000845 U           
##  6  1880 M     Unknown     5 0.0000422 U           
##  7  1881 F     Una        14 0.000142  U           
##  8  1881 F     Ursula      6 0.0000607 U           
##  9  1881 M     Ulysses    18 0.000166  U           
## 10  1881 M     Unknown     8 0.0000739 U           
## # … with 339,228 more rows
baby_first_letter %>%
  group_by(first_letter) %>%
  summarize(total = n_distinct(name)) %>%
  filter(first_letter %in% c('A', 'E', 'I', 'O', 'U')) 
## # A tibble: 5 × 2
##   first_letter total
##   <chr>        <int>
## 1 A            10292
## 2 E             3679
## 3 I             1534
## 4 O             1271
## 5 U              295

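From this we can see that ‘A’ has by far the most distinct names of any vowel (10,292), nearly three times as many as ‘E’. Note that this counts distinct names, not the number of babies who were given them.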
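
Counting distinct names measures variety rather than how many babies received those names. The following is a minimal sketch of a complementary summary that reports both figures side by side. It assumes a data frame with the same `name` and `n` columns as babynames; the helper name `vowel_totals` and the toy data are hypothetical and only illustrate the shape of the result.

```r
library(dplyr)

# Hypothetical helper: total babies and distinct names per vowel initial.
vowel_totals <- function(df) {
  df %>%
    mutate(first_letter = substr(name, 1, 1)) %>%
    filter(first_letter %in% c("A", "E", "I", "O", "U")) %>%
    group_by(first_letter) %>%
    summarize(
      babies = sum(n),                    # how many babies got these names
      distinct_names = n_distinct(name),  # how many different names exist
      .groups = "drop"
    ) %>%
    arrange(desc(babies))
}

# Toy data with made-up counts (an assumption, not real babynames values).
toy <- tibble(
  year = c(2016, 2017, 2017, 2017, 2017),
  sex  = c("F", "F", "F", "M", "F"),
  name = c("Ava", "Ava", "Emma", "Uriel", "Mia"),
  n    = c(10L, 12L, 30L, 3L, 50L)
)

toy_totals <- vowel_totals(toy)
toy_totals
```

On the real data the call would be `vowel_totals(babynames)`; the two columns need not rank the vowels in the same order.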

baby_first_letter %>%
  filter(first_letter %in% c('A', 'E', 'I', 'O', 'U')) %>%
  arrange(desc(prop)) %>%
  ggplot(aes(year, prop, color = first_letter)) +
  geom_line() +
  facet_wrap(~sex)

We can see that female names starting with “A” are the most popular from 1880 to 1920; after a dip, they are again the most popular vowel-initial names from 1960 to 2000.
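
The plot above draws one point per name per year, so each coloured line jumps between individual names within the same year. A possible alternative, sketched here under the assumption that a single share per vowel is wanted, is to sum `prop` within each year, sex, and first letter before plotting. The helper name `vowel_share` and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: one row per year, sex, and vowel with the summed share.
vowel_share <- function(df) {
  df %>%
    mutate(first_letter = substr(name, 1, 1)) %>%
    filter(first_letter %in% c("A", "E", "I", "O", "U")) %>%
    group_by(year, sex, first_letter) %>%
    summarize(prop = sum(prop), .groups = "drop")
}

# Toy data with made-up proportions (an assumption, not real babynames values).
toy <- tibble(
  year = c(2017, 2017, 2017, 2017),
  sex  = c("F", "F", "F", "M"),
  name = c("Ava", "Amelia", "Emma", "Uriel"),
  prop = c(0.25, 0.125, 0.5, 0.0625)
)

toy_share <- vowel_share(toy)
toy_share

# With ggplot2 loaded, the aggregated data could be plotted the same way:
# vowel_share(babynames) %>%
#   ggplot(aes(year, prop, color = first_letter)) +
#   geom_line() +
#   facet_wrap(~sex)
```

Each line then shows the combined share of all names with that initial, which makes the vowels directly comparable within a year.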

Next, we look at the most popular male and female ‘A’ names and how popular they are. By total count across all years, Anthony is the most popular ‘A’ name. In 2017 alone, the most frequently used ‘A’ name is Ava.

baby_first_letter %>%
  filter(first_letter %in% c("A")) %>%
  arrange(desc(prop)) 
## # A tibble: 187,950 × 6
## # Groups:   first_letter [1]
##     year sex   name       n   prop first_letter
##    <dbl> <chr> <chr>  <int>  <dbl> <chr>       
##  1  1987 F     Ashley 54851 0.0293 A           
##  2  1885 F     Anna    3994 0.0281 A           
##  3  1884 F     Anna    3860 0.0281 A           
##  4  1886 F     Anna    4283 0.0279 A           
##  5  1883 F     Anna    3306 0.0275 A           
##  6  1881 F     Anna    2698 0.0273 A           
##  7  1887 F     Anna    4227 0.0272 A           
##  8  1882 F     Anna    3143 0.0272 A           
##  9  1986 F     Ashley 49675 0.0269 A           
## 10  1889 F     Anna    5062 0.0268 A           
## # … with 187,940 more rows
baby_first_letter %>% 
  filter(first_letter %in% c("A")) %>%
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With A")

baby_first_letter %>% 
  filter(first_letter %in% c("A")) %>%
  filter(year == 2017) %>% 
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With A in 2017")

Here are the most popular male and female ‘E’ names and how popular they are. By total count across all years, Elizabeth is the most popular ‘E’ name. In 2017 alone, the most frequently used ‘E’ name is Emma.

baby_first_letter %>%
  filter(first_letter == "E") %>%
  arrange(desc(prop)) 
## # A tibble: 88,224 × 6
## # Groups:   first_letter [1]
##     year sex   name          n   prop first_letter
##    <dbl> <chr> <chr>     <int>  <dbl> <chr>       
##  1  1881 F     Emma       2034 0.0206 E           
##  2  1880 F     Emma       2003 0.0205 E           
##  3  1882 M     Edward     2477 0.0203 E           
##  4  1881 M     Edward     2177 0.0201 E           
##  5  1883 M     Edward     2250 0.0200 E           
##  6  1880 M     Edward     2364 0.0200 E           
##  7  1882 F     Emma       2303 0.0199 E           
##  8  1884 M     Edward     2439 0.0199 E           
##  9  1880 F     Elizabeth  1939 0.0199 E           
## 10  1883 F     Emma       2367 0.0197 E           
## # … with 88,214 more rows
baby_first_letter %>%
  filter(first_letter %in% c("E")) %>%
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With E")

baby_first_letter %>% 
  filter(first_letter %in% c("E")) %>%
  filter(year == 2017) %>% 
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With E in 2017")

Here are the most popular male and female ‘I’ names and how popular they are. Surprisingly, Irene is the most popular ‘I’ name by total count across all years. In 2017 alone, the most frequently used ‘I’ name is Isabella.

baby_first_letter %>%
  filter(first_letter == "I") %>%
  arrange(desc(prop))
## # A tibble: 29,613 × 6
## # Groups:   first_letter [1]
##     year sex   name         n   prop first_letter
##    <dbl> <chr> <chr>    <int>  <dbl> <chr>       
##  1  1880 F     Ida       1472 0.0151 I           
##  2  1881 F     Ida       1439 0.0146 I           
##  3  1882 F     Ida       1673 0.0145 I           
##  4  1884 F     Ida       1882 0.0137 I           
##  5  1883 F     Ida       1634 0.0136 I           
##  6  1886 F     Ida       2049 0.0133 I           
##  7  1885 F     Ida       1854 0.0131 I           
##  8  1887 F     Ida       1929 0.0124 I           
##  9  1888 F     Ida       2229 0.0118 I           
## 10  2010 F     Isabella 22905 0.0117 I           
## # … with 29,603 more rows
baby_first_letter %>% 
  filter(first_letter %in% c("I")) %>%
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With I")

baby_first_letter %>% 
  filter(first_letter %in% c("I")) %>%
  filter(year == 2017) %>% 
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With I in 2017")

Here are the most popular male and female ‘O’ names and how popular they are. Olivia is the most popular ‘O’ name both across all years and in 2017 specifically.

baby_first_letter %>%
  filter(first_letter == "O") %>%
  arrange(desc(prop))
## # A tibble: 28,849 × 6
## # Groups:   first_letter [1]
##     year sex   name       n    prop first_letter
##    <dbl> <chr> <chr>  <int>   <dbl> <chr>       
##  1  2014 F     Olivia 19791 0.0101  O           
##  2  2015 F     Olivia 19669 0.0101  O           
##  3  2016 F     Olivia 19327 0.0100  O           
##  4  2017 F     Olivia 18632 0.00994 O           
##  5  2013 F     Olivia 18414 0.00957 O           
##  6  2011 F     Olivia 17321 0.00895 O           
##  7  2012 F     Olivia 17310 0.00894 O           
##  8  2010 F     Olivia 17022 0.00869 O           
##  9  2009 F     Olivia 17433 0.00862 O           
## 10  2008 F     Olivia 17078 0.00821 O           
## # … with 28,839 more rows
baby_first_letter %>% 
  filter(first_letter %in% c("O")) %>%
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With O")

baby_first_letter %>% 
  filter(first_letter %in% c("O")) %>%
  filter(year == 2017) %>% 
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With O in 2017")

Here are the most popular male and female ‘U’ names and how popular they are. Unknown is the most popular ‘U’ name across all years, which is interesting because most of us would consider it a very uncommon name. The most popular ‘U’ name in 2017 is Uriel, and Unknown is the eighth most popular that year.

baby_first_letter %>%
  filter(first_letter == "U") %>%
  arrange(desc(prop))
## # A tibble: 4,602 × 6
## # Groups:   first_letter [1]
##     year sex   name        n     prop first_letter
##    <dbl> <chr> <chr>   <int>    <dbl> <chr>       
##  1  2008 M     Uriel     788 0.000362 U           
##  2  2009 M     Uriel     733 0.000346 U           
##  3  2006 M     Uriel     751 0.000343 U           
##  4  1895 M     Ulysses    43 0.000340 U           
##  5  2007 M     Uriel     741 0.000335 U           
##  6  2005 M     Uriel     706 0.000332 U           
##  7  1952 F     Unknown   622 0.000327 U           
##  8  1954 F     Unknown   646 0.000324 U           
##  9  1886 M     Ulysses    38 0.000319 U           
## 10  1891 M     Ulysses    34 0.000311 U           
## # … with 4,592 more rows
baby_first_letter %>% 
  filter(first_letter %in% c("U")) %>%
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With U")

baby_first_letter %>% 
  filter(first_letter %in% c("U")) %>%
  filter(year == 2017) %>% 
  group_by(name) %>% 
  summarize(total = sum(n)) %>% 
  arrange(desc(total)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
  coord_flip() + ggtitle("Names Starting With U in 2017")

Across all years, the leading name for each vowel is Anthony (A), Elizabeth (E), Irene (I), Olivia (O), and Unknown (U). The most popular vowel-initial name in 2017 was Emma. Names starting with “U” are used far less often than names starting with any other vowel.
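
The per-vowel comparison for a single year can also be produced in one table rather than read off five separate charts. This is a minimal sketch, assuming a data frame with the same `year`, `name`, and `n` columns as babynames; the helper name `top_name_per_vowel` and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: the single most-used name per vowel in a given year.
top_name_per_vowel <- function(df, target_year) {
  df %>%
    filter(year == target_year) %>%
    mutate(first_letter = substr(name, 1, 1)) %>%
    filter(first_letter %in% c("A", "E", "I", "O", "U")) %>%
    group_by(first_letter, name) %>%
    summarize(total = sum(n), .groups = "drop") %>%  # combine both sexes
    group_by(first_letter) %>%
    slice_max(total, n = 1, with_ties = FALSE) %>%   # keep the top name only
    ungroup() %>%
    arrange(desc(total))
}

# Toy data with made-up counts (an assumption, not real babynames values).
toy <- tibble(
  year = c(2017, 2017, 2017, 2017, 2016),
  sex  = c("F", "M", "F", "F", "F"),
  name = c("Ava", "Ava", "Amelia", "Emma", "Ella"),
  n    = c(10L, 2L, 11L, 30L, 99L)
)

toy_top <- top_name_per_vowel(toy, 2017)
toy_top
```

On the real data the call would be `top_name_per_vowel(babynames, 2017)`. Like the charts above, it sums counts across both sexes for each name.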