For this project I look at how often names start with a vowel, then find the most popular name for each vowel and compare their frequencies in 2017. The data come from the babynames package.
I believe that, of all the names starting with vowels, names starting with "A" will be the most popular in 2017.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(babynames)
library(ggthemes)
# Tag every row with the first letter of its name (substr() is 1-indexed in R)
babynames %>%
  group_by(first_letter = substr(name, 1, 1)) %>%
  arrange(desc(first_letter)) -> baby_first_letter
baby_first_letter %>%
filter(first_letter %in% c('A', 'E', 'I', 'O', 'U'))
## # A tibble: 339,238 × 6
## # Groups: first_letter [5]
## year sex name n prop first_letter
## <dbl> <chr> <chr> <int> <dbl> <chr>
## 1 1880 F Una 10 0.000102 U
## 2 1880 F Ula 5 0.0000512 U
## 3 1880 M Ulysses 29 0.000245 U
## 4 1880 M Urban 10 0.0000845 U
## 5 1880 M Uriah 10 0.0000845 U
## 6 1880 M Unknown 5 0.0000422 U
## 7 1881 F Una 14 0.000142 U
## 8 1881 F Ursula 6 0.0000607 U
## 9 1881 M Ulysses 18 0.000166 U
## 10 1881 M Unknown 8 0.0000739 U
## # … with 339,228 more rows
baby_first_letter %>%
group_by(first_letter) %>%
summarize(total = n_distinct(name)) %>%
filter(first_letter %in% c('A', 'E', 'I', 'O', 'U'))
## # A tibble: 5 × 2
## first_letter total
## <chr> <int>
## 1 A 10292
## 2 E 3679
## 3 I 1534
## 4 O 1271
## 5 U 295
From this we can see that "A" has by far the most distinct names of any vowel, with 10,292, nearly three times as many as "E". Note that this counts distinct names, not births.
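To measure usage rather than variety, births can be summed per starting vowel instead. A minimal sketch on a small made-up tibble; the `toy` data and its counts are illustrative assumptions, not values from babynames:

```r
library(dplyr)

# Toy data with the same columns as babynames (all values are made up)
toy <- tibble(
  year = c(2017, 2017, 2017, 2017, 2017),
  sex  = c("F", "F", "F", "M", "M"),
  name = c("Ava", "Amelia", "Emma", "Elijah", "Bob"),
  n    = c(10L, 5L, 30L, 20L, 7L)
)

vowel_totals <- toy %>%
  mutate(first_letter = substr(name, 1, 1)) %>%
  filter(first_letter %in% c("A", "E", "I", "O", "U")) %>%
  group_by(first_letter) %>%
  summarize(
    distinct_names = n_distinct(name),  # variety: how many different names
    births         = sum(n)             # usage: how many babies received them
  ) %>%
  arrange(desc(births))

vowel_totals
```

In this toy example both vowels have two distinct names, but "E" accounts for more births, so the two measures can rank letters differently.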
baby_first_letter %>%
filter(first_letter %in% c('A', 'E', 'I', 'O', 'U')) %>%
arrange(desc(prop)) %>%
ggplot(aes(year, prop, color = first_letter)) +
geom_line() +
facet_wrap(~sex)
We can see that female names starting with "A" are the most popular from roughly 1880 to 1920, lose the lead for a few decades, and become the most popular starting vowel again from about 1960 to 2000.
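The pipeline above passes one row per name to `geom_line()`, so the line for each vowel runs through every individual name's `prop` in a given year. A sketch of one alternative, assuming the goal is the combined share of births per vowel: sum `prop` within each year, sex, and first letter before plotting. The object name `vowel_share` is introduced here for illustration only.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# One row per year, sex and starting vowel, holding the summed proportion
vowel_share <- babynames %>%
  mutate(first_letter = substr(name, 1, 1)) %>%
  filter(first_letter %in% c("A", "E", "I", "O", "U")) %>%
  group_by(year, sex, first_letter) %>%
  summarize(share = sum(prop), .groups = "drop")

# One line per vowel, one panel per sex
ggplot(vowel_share, aes(year, share, color = first_letter)) +
  geom_line() +
  facet_wrap(~sex)
```

Each line then has a single value per year, which makes the vowels easier to compare.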
Next, we look at the most popular male and female "A" names and how popular they are. Anthony is the most popular "A" name overall, summed across all years. In 2017 alone, the most frequently used name starting with "A" is Ava.
baby_first_letter %>%
filter(first_letter %in% c("A")) %>%
arrange(desc(prop))
## # A tibble: 187,950 × 6
## # Groups: first_letter [1]
## year sex name n prop first_letter
## <dbl> <chr> <chr> <int> <dbl> <chr>
## 1 1987 F Ashley 54851 0.0293 A
## 2 1885 F Anna 3994 0.0281 A
## 3 1884 F Anna 3860 0.0281 A
## 4 1886 F Anna 4283 0.0279 A
## 5 1883 F Anna 3306 0.0275 A
## 6 1881 F Anna 2698 0.0273 A
## 7 1887 F Anna 4227 0.0272 A
## 8 1882 F Anna 3143 0.0272 A
## 9 1986 F Ashley 49675 0.0269 A
## 10 1889 F Anna 5062 0.0268 A
## # … with 187,940 more rows
baby_first_letter %>%
filter(first_letter %in% c("A")) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With A")
baby_first_letter %>%
filter(first_letter %in% c("A")) %>%
filter(year == 2017) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With A in 2017")
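The two summaries above are repeated for every vowel below, with only the letter changing. A minimal sketch of a helper that wraps the shared steps; `top_names()` and its arguments `letter`, `yr`, and `k` are hypothetical names introduced here, not part of babynames or the tidyverse:

```r
library(dplyr)
library(babynames)

# Hypothetical helper: the top `k` names starting with `letter`,
# over all years (yr = NULL) or restricted to a single year `yr`.
top_names <- function(letter, yr = NULL, k = 10) {
  out <- babynames %>%
    filter(substr(name, 1, 1) == letter)
  if (!is.null(yr)) {
    out <- out %>% filter(year == yr)
  }
  out %>%
    group_by(name) %>%
    summarize(total = sum(n)) %>%               # sums over both sexes
    slice_max(total, n = k, with_ties = FALSE)  # keeps exactly k rows
}

top_names("A")             # all years
top_names("A", yr = 2017)  # 2017 only
```

The result can be piped into the same `ggplot()` call used above, for example `top_names("E") %>% ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() + coord_flip()`.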
We can see the most popular male and female "E" names and how popular they are. Elizabeth is the most popular "E" name overall, summed across all years. In 2017 alone, the most frequently used name starting with "E" is Emma.
baby_first_letter %>%
filter(first_letter == "E") %>%
arrange(desc(prop))
## # A tibble: 88,224 × 6
## # Groups: first_letter [1]
## year sex name n prop first_letter
## <dbl> <chr> <chr> <int> <dbl> <chr>
## 1 1881 F Emma 2034 0.0206 E
## 2 1880 F Emma 2003 0.0205 E
## 3 1882 M Edward 2477 0.0203 E
## 4 1881 M Edward 2177 0.0201 E
## 5 1883 M Edward 2250 0.0200 E
## 6 1880 M Edward 2364 0.0200 E
## 7 1882 F Emma 2303 0.0199 E
## 8 1884 M Edward 2439 0.0199 E
## 9 1880 F Elizabeth 1939 0.0199 E
## 10 1883 F Emma 2367 0.0197 E
## # … with 88,214 more rows
baby_first_letter %>%
filter(first_letter %in% c("E")) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With E")
baby_first_letter %>%
filter(first_letter %in% c("E")) %>%
filter(year == 2017) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With E in 2017")
We can see the most popular male and female "I" names and how popular they are. Irene is, surprisingly, the most popular "I" name overall, summed across all years. In 2017 alone, the most frequently used name starting with "I" is Isabella.
baby_first_letter %>%
filter(first_letter == "I") %>%
arrange(desc(prop))
## # A tibble: 29,613 × 6
## # Groups: first_letter [1]
## year sex name n prop first_letter
## <dbl> <chr> <chr> <int> <dbl> <chr>
## 1 1880 F Ida 1472 0.0151 I
## 2 1881 F Ida 1439 0.0146 I
## 3 1882 F Ida 1673 0.0145 I
## 4 1884 F Ida 1882 0.0137 I
## 5 1883 F Ida 1634 0.0136 I
## 6 1886 F Ida 2049 0.0133 I
## 7 1885 F Ida 1854 0.0131 I
## 8 1887 F Ida 1929 0.0124 I
## 9 1888 F Ida 2229 0.0118 I
## 10 2010 F Isabella 22905 0.0117 I
## # … with 29,603 more rows
baby_first_letter %>%
filter(first_letter %in% c("I")) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With I")
baby_first_letter %>%
filter(first_letter %in% c("I")) %>%
filter(year == 2017) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With I in 2017")
We can see the most popular male and female "O" names and how popular they are. Olivia is the most popular "O" name both overall and in 2017 specifically.
baby_first_letter %>%
filter(first_letter == "O") %>%
arrange(desc(prop))
## # A tibble: 28,849 × 6
## # Groups: first_letter [1]
## year sex name n prop first_letter
## <dbl> <chr> <chr> <int> <dbl> <chr>
## 1 2014 F Olivia 19791 0.0101 O
## 2 2015 F Olivia 19669 0.0101 O
## 3 2016 F Olivia 19327 0.0100 O
## 4 2017 F Olivia 18632 0.00994 O
## 5 2013 F Olivia 18414 0.00957 O
## 6 2011 F Olivia 17321 0.00895 O
## 7 2012 F Olivia 17310 0.00894 O
## 8 2010 F Olivia 17022 0.00869 O
## 9 2009 F Olivia 17433 0.00862 O
## 10 2008 F Olivia 17078 0.00821 O
## # … with 28,839 more rows
baby_first_letter %>%
filter(first_letter %in% c("O")) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With O")
baby_first_letter %>%
filter(first_letter %in% c("O")) %>%
filter(year == 2017) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With O in 2017")
We can see the most popular male and female "U" names and how popular they are. Unknown is the most popular name starting with "U" across all years. This is interesting, because most of us would not think of it as a name at all; it may be a placeholder for records with no first name rather than a name parents chose. In 2017 the most popular name starting with "U" is Uriel, and Unknown ranks eighth.
baby_first_letter %>%
filter(first_letter == "U") %>%
arrange(desc(prop))
## # A tibble: 4,602 × 6
## # Groups: first_letter [1]
## year sex name n prop first_letter
## <dbl> <chr> <chr> <int> <dbl> <chr>
## 1 2008 M Uriel 788 0.000362 U
## 2 2009 M Uriel 733 0.000346 U
## 3 2006 M Uriel 751 0.000343 U
## 4 1895 M Ulysses 43 0.000340 U
## 5 2007 M Uriel 741 0.000335 U
## 6 2005 M Uriel 706 0.000332 U
## 7 1952 F Unknown 622 0.000327 U
## 8 1954 F Unknown 646 0.000324 U
## 9 1886 M Ulysses 38 0.000319 U
## 10 1891 M Ulysses 34 0.000311 U
## # … with 4,592 more rows
baby_first_letter %>%
filter(first_letter %in% c("U")) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total), total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With U")
baby_first_letter %>%
filter(first_letter %in% c("U")) %>%
filter(year == 2017) %>%
group_by(name) %>%
summarize(total = sum(n)) %>%
arrange(desc(total)) %>%
head(10) %>%
ggplot(aes(reorder(name, total),total, fill = name)) + geom_col() +
coord_flip() + ggtitle("Names Starting With U in 2017")
Across all years, the most popular name starting with a vowel is Elizabeth, the top "E" name, closely followed by Anthony. The most popular vowel-initial name in 2017 was Emma. Names starting with "U" are used far less often than names starting with any other vowel.
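The per-vowel comparison in this summary can also be computed as one table rather than read off five separate charts. A sketch, assuming totals are summed over both sexes as in the charts above; `vowels` and `top_per_vowel` are illustrative names introduced here:

```r
library(dplyr)
library(babynames)

vowels <- c("A", "E", "I", "O", "U")

# The single most used name for each starting vowel in 2017
top_per_vowel <- babynames %>%
  mutate(first_letter = substr(name, 1, 1)) %>%
  filter(first_letter %in% vowels, year == 2017) %>%
  group_by(first_letter, name) %>%
  summarize(total = sum(n), .groups = "drop") %>%
  group_by(first_letter) %>%
  slice_max(total, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(total))

top_per_vowel
```

Dropping `year == 2017` from the `filter()` call gives the same comparison summed over all years.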