Frequency of seasonal names through generations

The babynames dataset records US baby names from 1880 to 2017, listing each name given to at least five babies of one sex in a year. Using this resource, I want to determine how often season-related baby names are used.

Thesis statement: I hypothesize that seasonal names become more popular with each generation closer to the present day. Let’s start by loading our packages in R.

library(tidyverse)
library(babynames)
library(ggthemes)

The first step is to decide which names count as ‘seasonal names’ in my analysis.

I will filter for the names Winter, Spring, Summer, and Autumn. Since I am looking at each sex separately first, I will store the female rows in seasons_f and the male rows in seasons_m.

babynames %>% 
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn"), sex=="F")-> seasons_f

babynames %>% 
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn"), sex=="M")-> seasons_m

Now that I have created these two variables, I will use them to represent the names I am analyzing: Winter, Spring, Summer, and Autumn.

Next, let’s arrange these names in descending order of prop, the proportion of children of that sex born in a given year who were given that name.
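Before ranking anything, it can help to check how much data sits behind each name. The following is a minimal sketch, not part of the original analysis: `season_names` and `season_coverage` are names I made up for illustration, and the code assumes the same babynames columns (year, sex, name, n, prop) used above.

```r
library(dplyr)
library(babynames)

# Hypothetical helper vector holding the four seasonal names
season_names <- c("Winter", "Spring", "Summer", "Autumn")

# For each sex and name: how many years the name appears in,
# and the first and last year it was recorded
season_coverage <- babynames %>%
  filter(name %in% season_names) %>%
  group_by(sex, name) %>%
  summarise(
    years_recorded = n(),
    first_year = min(year),
    last_year = max(year),
    .groups = "drop"
  )

season_coverage
```

A name that is missing from this summary for one sex has no rows at all for that sex, which is worth knowing before reading the plots.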

seasons_f %>% 
  arrange(desc(prop))

## # A tibble: 289 × 5
##     year sex   name       n    prop
##    <dbl> <chr> <chr>  <int>   <dbl>
##  1  1998 F     Autumn  4208 0.00217
##  2  1999 F     Autumn  4127 0.00212
##  3  2001 F     Autumn  4191 0.00212
##  4  2015 F     Autumn  4112 0.00211
##  5  2016 F     Autumn  4022 0.00209
##  6  2014 F     Autumn  4062 0.00208
##  7  2002 F     Autumn  4103 0.00208
##  8  2013 F     Autumn  3950 0.00205
##  9  2003 F     Autumn  4055 0.00202
## 10  2000 F     Autumn  4027 0.00202
## # … with 279 more rows

From this table we can see that females named Autumn fill the entire top 10 by prop. Now let’s do the same with males.

seasons_m %>% 
  arrange(desc(prop))

## # A tibble: 99 × 5
##     year sex   name       n      prop
##    <dbl> <chr> <chr>  <int>     <dbl>
##  1  2016 M     Winter    46 0.0000228
##  2  2017 M     Winter    42 0.0000214
##  3  2014 M     Winter    35 0.0000171
##  4  2012 M     Winter    34 0.0000168
##  5  2015 M     Winter    33 0.0000162
##  6  2004 M     Autumn    28 0.0000133
##  7  2004 M     Summer    28 0.0000133
##  8  1977 M     Summer    20 0.0000117
##  9  2002 M     Winter    24 0.0000116
## 10  2005 M     Winter    23 0.0000108
## # … with 89 more rows

From this table we can see that Winter is the most popular of these names for males: it takes the top five spots by prop, in 2016, 2017, 2014, 2012, and 2015.

Now I will plot the names using a different color for each name. I am creating a plot for both n and prop to see if there is a major difference.
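The top-10 tables are dominated by a single name, so they hide when the other names peaked. As a rough sketch, the peak year of every seasonal name can be pulled out with `slice_max()`. The object name `peak_years` is my own, not something from the analysis above.

```r
library(dplyr)
library(babynames)

# One row per sex and name: the year in which prop was highest
peak_years <- babynames %>%
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn")) %>%
  group_by(sex, name) %>%
  slice_max(order_by = prop, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(sex, desc(prop))

peak_years
```

Setting `with_ties = FALSE` keeps exactly one row per name even if two years share the same prop.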

ggplot(seasons_f, aes(x=year, y=n, color=name)) +
  geom_line()

ggplot(seasons_f, aes(x=year, y=prop, color=name)) +
  geom_line() -> prop_time_female

We can see that the n and prop graphs are very similar.

This graph shows that Autumn is the most popular of these names for females, and has been consistently since about 1975.

Now I am going to do the same thing, except for males.

ggplot(seasons_m, aes(x=year, y=n, color=name))+
  geom_line()

ggplot(seasons_m, aes(x=year, y=prop, color=name))+
  geom_line()

We see that the graphs are again very similar.

The male names clearly vary more than the female names did, but Winter has been consistently the most popular since the 2000s.

Next I will plot all the names together, totaled across all years, for females in a bar graph.

ggplot(seasons_f, aes(x=name, y=n)) +
  geom_col()

ggplot(seasons_f, aes(x=name, y=prop)) +
  geom_col()

Again, we see that there is not much difference between the n and prop graphs.

This bar graph shows that, across all years combined, Autumn is the most popular female name, followed by Summer, Winter, then Spring.

Now I will do the same for males. This will represent all the names together over time for males.

ggplot(seasons_m, aes(x=name, y=n)) +
  geom_col()

ggplot(seasons_m, aes(x=name, y=prop)) +
  geom_col()

We see here that Spring has never been used enough as a male name to show up in our data set or bar graph. Winter is the most popular male name, followed by Summer and Autumn, which are not too far apart.

Now I will create variables that divide the four seasonal names by gender and by generation. This will help me with my analysis.
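Because geom_col() stacks one segment per row, each bar above is really a sum over every year in the data. The numbers behind the bars can be computed directly, as in this sketch. The names `name_totals`, `total_n`, and `mean_prop` are my own labels. I also report the mean of prop rather than its sum, since a sum of yearly proportions is harder to interpret.

```r
library(dplyr)
library(babynames)

# Totals behind the bar charts: babies per name and sex across all years,
# plus the average yearly proportion over the years the name appears
name_totals <- babynames %>%
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn")) %>%
  group_by(sex, name) %>%
  summarise(
    total_n = sum(n),
    mean_prop = mean(prop),
    .groups = "drop"
  ) %>%
  arrange(sex, desc(total_n))

name_totals
```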

seasons_f %>% 
  filter( year >= 1901 & year <=1927) -> greatest_gen_f

seasons_m %>% 
  filter( year >= 1901 & year <=1927) -> greatest_gen_m

seasons_f %>% 
  filter( year >= 1928 & year <=1945) -> silent_gen_f

seasons_m %>% 
  filter( year >= 1928 & year <=1945) -> silent_gen_m

seasons_f %>% 
  filter( year >= 1946 & year <=1964) -> boomer_gen_f

seasons_m %>% 
  filter( year >= 1946 & year <=1964) -> boomer_gen_m

seasons_f %>% 
  filter( year >= 1965 & year <=1980) -> x_gen_f

seasons_m %>% 
  filter( year >= 1965 & year <=1980) -> x_gen_m

seasons_f %>% 
  filter( year >= 1981 & year <=1995) -> millenials_gen_f

seasons_m %>% 
  filter( year >= 1981 & year <=1995) -> millenials_gen_m

seasons_f %>% 
  filter( year >= 1996 & year <=2010) -> z_gen_f

seasons_m %>% 
  filter( year >= 1996 & year <=2010) -> z_gen_m

seasons_f %>% 
  filter( year >= 2011 & year <=2025) -> alpha_gen_f

seasons_m %>% 
  filter( year >= 2011 & year <=2025) -> alpha_gen_m
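The fourteen filters above could also be collapsed into a single generation column, which keeps everything in one table. This is only a sketch of that alternative: `generation_of()` and `seasons_by_gen` are hypothetical names, and the year boundaries are copied from the filters above.

```r
library(dplyr)
library(babynames)

# Hypothetical helper: map a birth year to a generation label,
# using the same boundaries as the filters above
generation_of <- function(year) {
  case_when(
    year >= 1901 & year <= 1927 ~ "Greatest",
    year >= 1928 & year <= 1945 ~ "Silent",
    year >= 1946 & year <= 1964 ~ "Boomer",
    year >= 1965 & year <= 1980 ~ "Gen X",
    year >= 1981 & year <= 1995 ~ "Millennials",
    year >= 1996 & year <= 2010 ~ "Gen Z",
    year >= 2011 & year <= 2025 ~ "Alpha",
    TRUE ~ NA_character_  # years before 1901 get no label
  )
}

# Both sexes in one table, tagged by generation
seasons_by_gen <- babynames %>%
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn")) %>%
  mutate(generation = generation_of(year))
```

With this column in place, one generation for one sex is a single filter away, for example `filter(seasons_by_gen, generation == "Boomer", sex == "F")`.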

Now I am taking my line graph of female seasonal names and adding annotations so that it is clearly divided by generation. I also used the theme ‘fivethirtyeight’ here. The annotations shade every other generation so that readers can follow the lines within each one. I also labeled the relevant generations, the ones that show data: the Boomers and beyond.

prop_time_female +
  annotate("rect", xmin=2011, xmax=2018, ymin= 0, ymax = 0.003, alpha=.2) +
  annotate("rect", xmin=1981, xmax=1995, ymin= 0, ymax = 0.003, alpha=.2)+ 
  annotate("rect", xmin=1946, xmax=1964, ymin= 0, ymax = 0.003, alpha=.2) +
  annotate("text", x=1956, y=.0031, label= "Boomer") +
  annotate("text", x=1974, y=.0031, label= "Gen X") +
  annotate("text", x=1988, y=.0031, label= "Millennials") +
  annotate("text", x=2004, y=.0031, label= "Gen Z") +
  annotate("text", x=2015, y=.0031, label= "Alpha") +
  theme_fivethirtyeight()
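The same shading and labels can be driven from a small data frame instead of one annotate() call per band, which makes the boundaries easier to reuse across plots. This is a sketch under my own assumptions: `gen_bands` and `banded_plot` are made-up names, the labels are centered on each band rather than placed at the hand-picked x values above, and the theme is left out to keep the example short.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# Hypothetical lookup table of generation bands (Boomer onward)
gen_bands <- data.frame(
  label  = c("Boomer", "Gen X", "Millennials", "Gen Z", "Alpha"),
  xmin   = c(1946, 1965, 1981, 1996, 2011),
  xmax   = c(1964, 1980, 1995, 2010, 2018),
  shaded = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)

seasons_f <- babynames %>%
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn"), sex == "F")

banded_plot <- ggplot(seasons_f, aes(x = year, y = prop, color = name)) +
  # Shade every other generation
  geom_rect(
    data = subset(gen_bands, shaded),
    aes(xmin = xmin, xmax = xmax),
    ymin = 0, ymax = 0.003,
    inherit.aes = FALSE, alpha = 0.2
  ) +
  geom_line() +
  # Label each generation at the middle of its band
  geom_text(
    data = gen_bands,
    aes(x = (xmin + xmax) / 2, y = 0.0031, label = label),
    inherit.aes = FALSE
  )

banded_plot
```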

Now I will create a variable to show the overall popularity of these names in total by gender. This will help us see whether seasonal names were used more for males or for females.

# Total babies per year and sex across all four seasonal names
babynames %>%
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn")) %>% 
  group_by(year, sex) %>% 
  summarise(n=sum(n), .groups="drop") %>% 
  arrange(year) -> seasons_mf

Now I will graph it using a line plot!

seasons_mf %>% 
  ggplot(aes(x=year, y=n, color=sex )) +
  geom_line() +
  scale_color_fivethirtyeight() +
  theme_fivethirtyeight() -> compare_gender

Now I will add the same annotations: shaded areas to differentiate the generations, and labels to name them.

compare_gender +
  annotate("rect", xmin=2011, xmax=2018, ymin= 0, ymax =7000, alpha=.2) +
  annotate("rect", xmin=1981, xmax=1995, ymin= 0, ymax = 7000, alpha=.2)+ 
  annotate("rect", xmin=1946, xmax=1964, ymin= 0, ymax = 7000, alpha=.2) +
  annotate("text", x=1956, y=7100, label= "Boomer") +
  annotate("text", x=1974, y=7100, label= "Gen X") +
  annotate("text", x=1988, y=7100, label= "Millennials") +
  annotate("text", x=2004, y=7100, label= "Gen Z") +
  annotate("text", x=2015, y=7100, label= "Alpha") +
  theme_fivethirtyeight()

Here we see that there are far more females than males with these names. When comparing the two, the male line sits so close to zero that it seems there are no males at all, but we know this is false since we graphed them earlier.

To me, the spike around the 1970s is extremely interesting, as is the one around the 1990s. It seems that seasonal names were at their highest right before the 2000s and have been decreasing ever since.
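One way to make both lines readable on the same plot is a log-scaled y axis. This is a sketch of that idea, not a replacement for the plot above. It rebuilds `seasons_mf` so that it runs on its own, and `log_compare` is a name I chose for illustration.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# Total seasonal-name babies per year and sex
seasons_mf <- babynames %>%
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn")) %>%
  group_by(year, sex) %>%
  summarise(n = sum(n), .groups = "drop")

# A log10 axis spreads out small counts, so the male line no longer
# collapses onto zero next to the much larger female counts
log_compare <- ggplot(seasons_mf, aes(x = year, y = n, color = sex)) +
  geom_line() +
  scale_y_log10() +
  labs(y = "babies per year (log scale)")

log_compare
```

The log scale is safe here because every yearly total is positive. Years with no seasonal names for a sex simply have no row.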
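To check the hypothesis against numbers rather than by eye, the yearly share of seasonal names can be averaged within each generation. The sketch below is one possible way to do that and rests on my own assumptions: `gen_summary` and `mean_yearly_prop` are invented names, the breaks follow the generation boundaries used earlier, and years in which a sex has no seasonal names at all are left out of the average rather than counted as zero.

```r
library(dplyr)
library(babynames)

gen_labels <- c("Greatest", "Silent", "Boomer", "Gen X",
                "Millennials", "Gen Z", "Alpha")

gen_summary <- babynames %>%
  filter(name %in% c("Winter", "Spring", "Summer", "Autumn"), year >= 1901) %>%
  # cut() uses right-closed intervals, e.g. (1945, 1964] is Boomer
  mutate(generation = cut(
    year,
    breaks = c(1900, 1927, 1945, 1964, 1980, 1995, 2010, 2025),
    labels = gen_labels
  )) %>%
  # Share of babies with any seasonal name, per year and sex
  group_by(generation, sex, year) %>%
  summarise(prop = sum(prop), .groups = "drop") %>%
  # Average that yearly share within each generation
  group_by(generation, sex) %>%
  summarise(mean_yearly_prop = mean(prop), .groups = "drop")

gen_summary
```

If the hypothesis holds, `mean_yearly_prop` should rise from one generation to the next within each sex.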

Christina Alescio

2022-09-25