library(tidyverse)
library(babynames)
library(scales)
library(ggplot2)
My hypothesis was that alternative spellings of the most popular female names have become more common over time. In my own experience, I feel like I have been seeing more and more unusual spellings of traditional names recently. I used the babynames data set to check whether variant spellings of common names have actually increased in recent decades.
In order to test this hypothesis, I first generated a list of the most popular female baby names from the data set. I chose to begin with the top 20 female names. Starting from 20 left enough room to end up with a sample of at least five names after setting aside the ones that clearly have only one possible spelling. I worked with the most popular names because they have enough data in the data set to analyze.
# Total count per female name across all years, top 20 as a bar chart
babynames %>%
  filter(sex == "F") %>%
  group_by(name, sex) %>%
  summarise(total = sum(n)) %>%
  arrange(desc(total)) %>%
  head(20) %>%
  ggplot(aes(reorder(name, -total), total)) +
  geom_col() +
  labs(title = "Top 20 Female Names", x = "Name", y = "Count") +
  coord_flip() +
  scale_y_continuous(labels = comma)
From this chart, I chose 5 names that have possible alternative spellings. Not every name has multiple spellings. For example, Mary, the top name in the entire data set, cannot possibly be spelt any other way and still be the same name. To pick the 5, I did some brief research into possible variations of all 20 names listed. I only included names with multiple variations that change the spelling but not the form of the name. These 5 were Elizabeth, Jennifer, Karen, Ashley and Emily.
| Original Name | Variations |
|---|---|
| Elizabeth | Elisabeth, Ellisabeth, Ellizabeth, Ellyzabeth |
| Jennifer | Jenifer, Jennafer, Jeniffer |
| Karen | Caren, Caryn, Karyn, Karin |
| Ashley | Ashleigh, Ashlee, Ashlie |
| Emily | Emilee, Emileigh, Emilie |
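The table above can also be kept as a lookup list in code, so that one set of spellings drives every plot instead of repeating the name vectors by hand. This is only a minimal sketch and not part of the original analysis: `variants` and `plot_variants()` are hypothetical names introduced here, and each vector lists the parent name first.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# Hypothetical lookup list mirroring the table above (parent name listed first)
variants <- list(
  Elizabeth = c("Elizabeth", "Elisabeth", "Ellisabeth", "Ellizabeth", "Ellyzabeth"),
  Jennifer  = c("Jennifer", "Jenifer", "Jennafer", "Jeniffer"),
  Karen     = c("Karen", "Caren", "Caryn", "Karyn", "Karin"),
  Ashley    = c("Ashley", "Ashleigh", "Ashlee", "Ashlie"),
  Emily     = c("Emily", "Emilee", "Emileigh", "Emilie")
)

# Hypothetical helper: line chart of yearly proportion for one parent name
# and its spellings, optionally restricted to years >= from_year
plot_variants <- function(parent, from_year = 1880) {
  babynames %>%
    filter(name %in% variants[[parent]], sex == "F", year >= from_year) %>%
    ggplot(aes(year, prop, color = name)) +
    geom_line() +
    labs(title = paste(parent, "and its spelling variations"),
         x = "Year", y = "Proportion (prop)")
}

# Example: the Karen family, starting in 1971
karen_plot <- plot_variants("Karen", from_year = 1971)
```

Printing `karen_plot` draws the chart. Calling the helper once per name in `names(variants)` would produce the five graphs with a consistent title and axis labels.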
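From here I created graphs tracking the popularity of each of these individual parent names and their variations over time. The data set covers 1880-2017; the Elizabeth and Jennifer graphs use the full range, while the Karen, Ashley and Emily graphs are limited to years after 1970.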
babynames %>%
filter(name %in% c("Elizabeth", "Elisabeth", "Ellisabeth", "Ellizabeth", "Ellyzabeth") & sex== "F" ) %>%
ggplot (aes(year, prop, color= name)) + geom_line()
This graph shows that the most popular version of the name is the original spelling from the top 20 list (Elizabeth). The only other variations that appear in the data set at all (that is, have at least 5 uses) are Elisabeth and Ellizabeth. The spelling “Elisabeth” has fairly steady use, with a small bump around 1980-2000. That period is also the only time the variation “Ellizabeth” appears in the data.
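Which spellings actually occur in the data can be checked directly rather than read off the graph. The following is a hedged sketch, not part of the original analysis; `candidates`, `found` and `missing` are names made up for this example.

```r
library(dplyr)
library(babynames)

# Spellings of Elizabeth considered in this analysis
candidates <- c("Elizabeth", "Elisabeth", "Ellisabeth", "Ellizabeth", "Ellyzabeth")

# For each spelling present in the data: first and last year it appears
# for girls, and its total count over those years
found <- babynames %>%
  filter(sex == "F", name %in% candidates) %>%
  group_by(name) %>%
  summarise(first_year = min(year), last_year = max(year), total = sum(n))

# Spellings that never appear for girls in the data set
missing <- setdiff(candidates, found$name)
```

The `first_year` and `last_year` columns give the window in which each variation appears, which is a quick way to confirm when a rare spelling enters and leaves the data.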
babynames %>%
filter(name %in% c("Jennifer", "Jenifer", "Jennafer", "Jeniffer") & sex== "F") %>%
ggplot (aes(year, prop, color= name)) + geom_line()
This graph tracks the popularity of the name Jennifer and its variations: Jenifer, Jeniffer and Jennafer. The variant spellings first appear around 1945 and then continue with very small usage. There is a slight bump in their popularity around 1970, which is also when the main spelling (Jennifer) is at its peak.
babynames %>%
filter(name %in% c("Karen", "Caren", "Caryn", "Karyn", "Karin") & sex== "F" & year> 1970) %>%
ggplot (aes(year, prop, color= name)) + geom_line()
This graph deals with the parent name Karen and its variations Caren, Caryn, Karin and Karyn. The results show that all spellings were at their most popular at the start of the plotted period (the early 1970s) and have all decreased in popularity since.
babynames %>%
filter(name %in% c("Ashley", "Ashleigh", "Ashlee", "Ashlie") & sex== "F" & year> 1970) %>%
ggplot (aes(year, prop, color= name)) + geom_line()
As shown in the graph above, the parent name Ashley was by far the most popular spelling and had a significant spike around 1975. The variations of Ashley (Ashlee, Ashleigh, Ashlie) all had some minimal popularity over the entire time span. These variations also increased slightly, at different rates, between 1980 and 2010.
babynames %>%
filter(name %in% c("Emily", "Emilee", "Emileigh", "Emilie") & sex== "F" & year> 1970) %>%
ggplot (aes(year, prop, color= name)) + geom_line()
The graph tracking the name Emily and its variations is shown above. It shows that the parent name Emily was clearly the most popular spelling throughout the entire time period but has been declining since around 2000. The variations tracked all had limited popularity, with an increase in use from about 1990 to 2010.
This analysis did not support my original hypothesis. There is no clear increase in variant spellings in recent years compared to the historical data. In every name group tested, the parent name was the most popular spelling by a large margin, and the variations were all relatively unpopular with some spikes. These spikes, however, were not concentrated in recent years, which goes against the original hypothesis. There was one trend that did seem to show up in these visualizations: whenever the parent name was at its peak popularity, the variations also seemed to have a slight increase.
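The conclusion above rests on reading the graphs by eye. One way to put a number on it is to compute, for each year, what share of all babies given any spelling of a name received a variant spelling. The following is a sketch under that assumption and was not part of the original analysis; `ashley_names` and `ashley_share` are hypothetical names, and Ashley is used only as an example.

```r
library(dplyr)
library(babynames)

ashley_names <- c("Ashley", "Ashleigh", "Ashlee", "Ashlie")

# Per year: babies given the parent spelling, babies given any variant,
# and the variants' share of the whole name group
ashley_share <- babynames %>%
  filter(sex == "F", name %in% ashley_names, year > 1970) %>%
  group_by(year) %>%
  summarise(
    parent_n      = sum(n[name == "Ashley"]),
    variant_n     = sum(n[name != "Ashley"]),
    variant_share = variant_n / (parent_n + variant_n)
  )
```

If `variant_share` rose steadily toward recent years, that would favor the original hypothesis. If it stayed flat, or moved mostly in step with the parent name, that would be more in line with the trend noted above.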