1 Introduction

Using a Kaggle dataset of names given to babies in the United States between 1880 and 2014, this report examines how trends in female baby naming changed over this period. To provide demographic context, I first investigate the number of male and female babies born in each year represented in the dataset. I then walk through my thought process in examining how the frequency of the most common female baby names has changed over time.

2 Loading packages and dataset

2.1 Loading packages

library(tidyverse)
library(dplyr)
library(readr)
library(knitr)

2.2 Loading dataset

natbabynames <- read_csv("/Users/lorigerstenfeld/Downloads/national_baby_names.csv")

2.3 Checking dataset for NAs

sum(is.na(natbabynames))
## [1] 0

The result is 0, so there are no NAs in this dataset that need to be removed.
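
A single total does not say where missing values would sit if there were any. As a minimal sketch, assuming a small made-up table (the `toy` data below is hypothetical and not taken from the dataset), the same check can be broken down by column:

```r
# Hypothetical toy data with two deliberately missing values
toy <- tibble::tibble(
  Name  = c("Mary", "Anna", NA),
  Year  = c(1880, 1880, 1881),
  Count = c(7065, NA, 6919)
)

# Total number of missing values across the whole table
total_na <- sum(is.na(toy))

# Missing values counted separately for each column
na_by_column <- colSums(is.na(toy))

print(total_na)
print(na_by_column)
```

A per-column count makes it easier to decide whether to drop rows or handle one column specially.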

3 Number of babies over time

In this section, I will attempt to answer the question: how has the trend in the number of male and female babies born each year changed between 1880 and 2014?

3.1 Select relevant data from dataset

# Total babies (in thousands) for each gender in each year
natbabynames2 <- natbabynames %>%
  dplyr::select(Year, Count, Gender) %>%
  dplyr::group_by(Gender, Year) %>%
  dplyr::summarize(total_babies = sum(Count) / 1000) %>%
  dplyr::ungroup()
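
To illustrate what this aggregation produces, here is a minimal sketch on a made-up table (the names and counts in `toy` are hypothetical). It uses the `.groups = "drop"` argument of `summarize()` as an alternative to a separate `ungroup()` call, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical toy data: one row per name, year and gender
toy <- tibble::tibble(
  Name   = c("Mary", "Anna", "John", "Mary", "John"),
  Year   = c(1880, 1880, 1880, 1881, 1881),
  Gender = c("F", "F", "M", "F", "M"),
  Count  = c(7000, 2600, 9600, 6900, 8700)
)

# One row per gender and year, with the total expressed in thousands
toy_totals <- toy %>%
  group_by(Gender, Year) %>%
  summarize(total_babies = sum(Count) / 1000, .groups = "drop")

print(toy_totals)
```

The two female rows for 1880 collapse into a single total, which is the shape the line plot needs.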

3.2 Plot total number of babies over time

ggplot(natbabynames2, aes(Year, total_babies))+
  geom_line(aes(color=Gender))+
  theme_classic()+
  labs(
    title="Number of Male and Female Babies Born 1880-2014",
    x="Year",
    y="Total Babies Born (Thousands)") +
  scale_color_discrete(breaks=c("F","M"),
                      labels=c("Female", "Male"))

Figure 1: Number of male and female babies born each year 1880-2014

3.3 Concluding remarks on trend of babies over time

Figure 1 shows that the number of babies born increased rapidly between the early 1900s and the early 1960s, aside from a slight dip during the 1920s and 1930s (possibly associated with the Great Depression and World War II). Since the 1960s, the yearly totals have fluctuated but remained relatively stable.

4 Frequency of most common female name over time

In this and the next section, I attempt to answer the question: how has the frequency of common female baby names changed between 1880 and 2014?

4.1 Select relevant data from dataset

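# Keep the highest-count female name in each year; slice_max() does not depend on row order
natbabynames3 <- natbabynames %>%
  dplyr::filter(Gender=="F") %>%
  dplyr::group_by(Year) %>%
  dplyr::slice_max(Count, n=1, with_ties=FALSE) %>%
  dplyr::ungroup()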
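
Which row is picked within each year determines the result of this step. As a minimal sketch on a made-up table (the rows in `toy` are hypothetical and deliberately unsorted for 1880), compare `slice(1)`, which returns the first row in file order, with `slice_max()`, which picks the row with the largest `Count`:

```r
library(dplyr)

# Hypothetical toy data; the 1880 rows are NOT sorted by Count
toy <- tibble::tibble(
  Name   = c("Anna", "Mary", "Mary", "Anna"),
  Year   = c(1880, 1880, 1881, 1881),
  Gender = "F",
  Count  = c(2604, 7065, 6919, 2698)
)

# First row of each year: only correct if the rows are pre-sorted by Count
first_row <- toy %>%
  group_by(Year) %>%
  slice(1) %>%
  ungroup()

# Row with the largest Count in each year, regardless of row order
top_row <- toy %>%
  group_by(Year) %>%
  slice_max(Count, n = 1, with_ties = FALSE) %>%
  ungroup()

print(first_row$Name)
print(top_row$Name)
```

The two approaches agree only when each year's rows are already ordered from most to least common, so it is worth confirming that ordering before relying on row position.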

4.2 Plot most common baby name for each year vs. number of babies with name

ggplot(natbabynames3, aes(Year, Count))+
  geom_point(aes(color=Name))+
  theme_bw()+
  labs(
    title="Number of Female Babies Born 1880-2014 with \n Most Frequently Used Baby Names of the Year",
    x="Year",
    y="Number of Babies with Most Frequently Used Female Name",
    color="Most Frequently\nUsed Baby Name")

Figure 2: Number of US-born female babies with most common name of year

While Figure 2 shows how the number of female babies given the most common name changed over time, it does not account for the total number of female babies born each year. To represent the trend accurately, another graph is needed that plots the year against the proportion of female babies given the most frequently used name in that year.

5 Proportion of female babies with most frequently used name

In this section, the analysis from the previous section is altered to include the total number of female babies in each year. This allows for an analysis of how frequently the most common baby name is used over time.

5.1 Select only the female totals from dataset with total number of female/male babies in each year

natbabynames2_onlytotal <- natbabynames2 %>%
  dplyr::filter(Gender=="F") %>%
  dplyr::select(-"Gender")

5.2 Join dataset with total number of female babies each year with dataset with frequency of most common baby name in each year

natbabynames_joined <- left_join(natbabynames3, natbabynames2_onlytotal, by="Year") %>%
  dplyr::mutate(proportion=Count/(total_babies*1000))
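
The join and the unit conversion can be checked on a small example. This is a minimal sketch with made-up numbers (`top_names` and `totals` are hypothetical stand-ins for the two tables above); the totals are stored in thousands, so they are multiplied by 1000 before dividing:

```r
library(dplyr)

# Hypothetical most-common-name rows, one per year
top_names <- tibble::tibble(
  Year  = c(1880, 1881),
  Name  = c("Mary", "Mary"),
  Count = c(7000, 6900)
)

# Hypothetical yearly female totals, expressed in thousands
totals <- tibble::tibble(
  Year         = c(1880, 1881),
  total_babies = c(90, 92)
)

# Attach each year's total, then convert thousands back to babies
joined <- left_join(top_names, totals, by = "Year") %>%
  mutate(proportion = Count / (total_babies * 1000))

print(joined)
```

Forgetting the factor of 1000 would inflate every proportion a thousandfold, so a quick check that all values fall between 0 and 1 is a useful safeguard.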

5.3 Create new plot with linear regression!

ggplot(natbabynames_joined, aes(Year, proportion))+
  geom_point(aes(color=Name))+
  geom_smooth(method="lm", se=FALSE) +
  theme_bw()+
  labs(
    title="Proportion of Female Babies Born 1880-2014 \nwith Most Frequently Used Baby Names of the Year",
    x="Year",
    y="Proportion of Babies with Most Frequently Used Female Name",
    color="Most Frequently\nUsed Baby Name")

Figure 3: Proportion of US-born female babies each year with most common female name

Figure 3 demonstrates that the proportion of female babies with the most common baby name has steadily decreased over time. This trend is much more clearly visible with the standardization of data to the number of babies being born each year.

5.4 Linear Regression Results

model <- lm(proportion ~ Year, data=natbabynames_joined)
statsummary <- coef(summary(model))
# The second coefficient is the slope for Year, so label it accordingly
row.names(statsummary) <- c("Intercept", "Year")
knitr::kable(statsummary, align="ccccc",
             caption="Table 1. Linear Model for Figure 3 (year vs. proportion of female babies with most common name)",
             col.names=c("Estimate", "Std. Error", "t value", "p value"))

Table 1. Linear Model for Figure 3 (year vs. proportion of female babies with most common name)

            Estimate     Std. Error   t value     p value
Intercept   0.8958616    0.0261097    34.31149    0
Year        -0.0004383   0.0000134    -32.68914   0

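The large magnitude of the t value and the very low p value (displayed as 0 because of rounding) indicate that the negative slope is statistically significant: the proportion of female babies given the most common name decreased by about 0.00044 (roughly 0.04 percentage points) per year. These statistics describe the reliability of the slope estimate rather than how closely the line fits the data, so they support a significant downward trend but do not by themselves show that the relationship is exactly linear.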
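
A t value and p value describe the slope estimate; how closely a line tracks the points is usually summarized with R-squared, which `summary()` also reports for an `lm` fit. As a minimal sketch on made-up data (the `toy` values are hypothetical and not results from this report), both quantities can be pulled from the same model:

```r
# Hypothetical toy data: a proportion that drifts downward over ten years
toy <- data.frame(
  Year       = 1880:1889,
  proportion = c(0.078, 0.076, 0.077, 0.074, 0.073,
                 0.073, 0.071, 0.070, 0.069, 0.068)
)

toy_model   <- lm(proportion ~ Year, data = toy)
toy_summary <- summary(toy_model)

# Estimated change in proportion per year
slope <- coef(toy_model)[["Year"]]

# Share of the variance in proportion explained by Year
r_squared <- toy_summary$r.squared

print(slope)
print(r_squared)
```

Reporting R-squared next to the coefficient table, and looking at the residuals, would show how well a straight line describes the trend in Figure 3.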

6 Conclusion

As shown by Figure 3, the relative popularity of the most common female name decreased approximately linearly between 1880 and 2014. It was important to understand the trends in the number of babies born (Figure 1) and the number of times the most frequent baby name was used (Figure 2) in order to reach this third graph. By dividing the number of female babies with the most common name by the total number of female babies born each year, a conclusion could be reached about the relative popularity of the most common female baby names over time.

This result and the figures produced throughout the report demonstrate that trends in female baby naming changed significantly over time. The most popular name changed several times, often after remaining the most common name for several years, and the share of female babies given the most common name decreased significantly between 1880 and 2014.