PUBLIC DISCLOSURE: My name is Kier, and I am biased.
I got my hands on the US Baby Names database from the Kaggle website. It covers babies born in the US between 1880 and 2014. Initially I was interested to see how the popularity of my siblings’ and parents’ names changed over this period. I probably shouldn’t have been surprised, but I was, that the popularity of their names peaked right around their birth years. For instance, Ryan and Erin saw their names’ greatest popularity in the years around their births, 1970 and 1971, respectively. My name, Kier, shows an anomaly: there is a blip of popularity around my birth year, 1968, and then, as we Kiers reached adulthood, the popularity of the name kept growing.
A reasonable inference is that, as the late-1960s Kiers reached adulthood and their greatness came to be recognized, more women were inspired to name their babies after them.
rm(list=ls())
#setwd("~/Analytics Course/Kaggle/US Baby Names/output")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
The baby names database was acquired from https://www.kaggle.com/kaggle/us-baby-names on 2016-08-18.
# This analysis uses the National Names dataset
national_names_raw <- read.csv("C:/Users/Kier/Documents/Analytics Course/Kaggle/US Baby Names/output/NationalNames.csv")
Let’s take a look at my name and the names of my wife, parents, and siblings.
n1 <- national_names_raw %>%
filter(Name %in% c("Kier", "Keir", "Kieran", "Kiernan")) %>%
group_by(Year) %>%
summarise(sum_of_names = sum(Count)) %>%
mutate(Name = "Kier")
n2 <- national_names_raw %>%
filter(Name == "Eva") %>%
group_by(Year) %>%
summarise(sum_of_names = sum(Count)) %>%
mutate(Name = "Eva")
n3 <- national_names_raw %>%
filter(Name %in% c("Margot", "Margeaux")) %>%
group_by(Year) %>%
summarise(sum_of_names = sum(Count)) %>%
mutate(Name = "Margot")
n4 <- national_names_raw %>%
filter(Name %in% c("Robert", "Rob", "Bob")) %>%
group_by(Year) %>%
summarise(sum_of_names = sum(Count)) %>%
mutate(Name = "Robert")
n5 <- national_names_raw %>%
filter(Name == "Ryan") %>%
group_by(Year) %>%
summarise(sum_of_names = sum(Count)) %>%
mutate(Name = "Ryan")
n6 <- national_names_raw %>%
filter(Name == "Erin") %>%
group_by(Year) %>%
summarise(sum_of_names = sum(Count)) %>%
mutate(Name = "Erin")
name_counts <- rbind(n1,n2,n3,n4,n5,n6)
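The six blocks above repeat the same filter, group, and sum steps. As a rough sketch (not the code used for the plots here), they could be folded into one helper. The function name `count_name_group` and the toy data frame are hypothetical; they only mimic the columns of the NationalNames data used above (`Name`, `Year`, `Count`).

```r
library(dplyr)

# Hypothetical helper: sum the yearly counts for a set of spellings
# and label the result with a single display name.
count_name_group <- function(data, spellings, label) {
  data %>%
    filter(Name %in% spellings) %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count)) %>%
    mutate(Name = label)
}

# Toy stand-in for national_names_raw (made-up numbers, for illustration only)
toy_names <- data.frame(
  Name  = c("Kier", "Keir", "Kier", "Eva", "Eva"),
  Year  = c(1968, 1968, 1969, 1968, 1969),
  Count = c(5, 7, 6, 100, 90)
)

# One call per family member instead of one copied block per family member
toy_counts <- bind_rows(
  count_name_group(toy_names, c("Kier", "Keir"), "Kier"),
  count_name_group(toy_names, "Eva", "Eva")
)
toy_counts
```

With the real data, the first argument would be `national_names_raw` and the spellings would be the same vectors passed to `filter()` above.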
ggplot(name_counts, aes(Year, sum_of_names, color = Name)) + geom_line() +
facet_wrap(~Name, ncol = 3, scales = "free_y") +
theme(legend.position = "none")
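To check the claim that each name peaks near its bearer's birth year, one can pull out the year with the highest count for each name. This is a minimal sketch on a small made-up data frame with the same columns as `name_counts`; the name `toy_yearly` is hypothetical, and `slice_max()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up counts with the same columns as name_counts (illustration only)
toy_yearly <- data.frame(
  Year         = c(1969, 1970, 1971, 1969, 1970, 1971),
  sum_of_names = c(10, 30, 20, 5, 8, 12),
  Name         = c("Ryan", "Ryan", "Ryan", "Erin", "Erin", "Erin")
)

# For each name, keep the single row with the largest yearly count
peak_years <- toy_yearly %>%
  group_by(Name) %>%
  slice_max(sum_of_names, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Name, peak_year = Year, peak_count = sum_of_names)

peak_years
```

Passing `name_counts` in place of `toy_yearly` would give the peak year for each of the six names plotted above.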
Notice that Margot has a smooth pattern except for an anomaly in the middle.
Let’s take a closer look…
caption <- paste(strwrap("Birth year of my mother Margot", 20), collapse = "\n")
caption2 <- paste(strwrap("Period of rapid decline and recovery", 20), collapse = "\n")
ggplot(n3, aes(Year, sum_of_names)) +
geom_vline(aes(xintercept=1944)) +
ggtitle("Babies named Margot") +
ylab("Babies named Margot") +
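# annotate() draws the band once; geom_rect(aes(...)) would redraw it for every row and stack the alpha
annotate("rect", xmin=1962, xmax=1982,
ymin=-Inf, ymax=Inf, fill="yellow", alpha = 0.5) +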
geom_line(color="blue") +
annotate("text", x=1918, y=300, label=caption, hjust=0, vjust=1, size=4) +
annotate("text", x=1962, y=300, label=caption2, hjust=0, vjust=1, size=4) +
theme(legend.position = "none")
Usage is very consistent from 1940 to 2005, except for a minor spike followed by a major drop between 1962 and 1982. By 1982 it was back to its previous level. Note also the recent surge in usage.
The variable n1 contains the subset of data for Kier and its variant spellings (Keir, Kieran, Kiernan). Let’s plot its usage.
caption <- "Blip of popularity\nbetween\n1968 and 1972..."
caption2 <- paste(strwrap("...then a generation later a +300% increase in usage developed over the next 20 years.", 20), collapse = "\n")
p <- ggplot(n1, aes(Year, sum_of_names)) +
labs(title="Number of US babies named Kier\nfrom 1920 to 2014",
x = "Year",
y = "Babies Named Kier") +
geom_vline(aes(xintercept = 1988)) +
annotate("text", x=1940, y=275, label=caption, hjust=0, vjust=1, size=4) +
# annotate() draws each band once; geom_rect(aes(...)) would redraw it for every row and stack the alpha
annotate("rect", xmin=1968, xmax=1972,
ymin=-Inf, ymax=Inf, fill="green", alpha=0.25) +
annotate("rect", xmin=1988, xmax=Inf,
ymin=-Inf, ymax=Inf, fill="green", alpha=0.5) +
annotate("text", x=1995, y=300, label=caption2, hjust=0, vjust=1, size=4) +
geom_smooth() +
geom_line()
p
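The "+300%" figure in the caption is a percent change between two yearly counts. A small sketch of that arithmetic follows; the helper `percent_change` and the numbers in `toy_kier` are hypothetical and are not taken from the dataset.

```r
# Hypothetical helper: percent change in yearly count between two years
percent_change <- function(counts, from_year, to_year) {
  start <- counts$sum_of_names[counts$Year == from_year]
  end   <- counts$sum_of_names[counts$Year == to_year]
  (end - start) / start * 100
}

# Made-up numbers, for illustration only (not taken from the dataset)
toy_kier <- data.frame(
  Year         = c(1988, 2008),
  sum_of_names = c(50, 200)
)

# A fourfold rise (50 to 200) is a +300% increase
percent_change(toy_kier, 1988, 2008)
```

Calling `percent_change(n1, 1988, 2008)` would apply the same calculation to the real Kier counts, assuming both years are present in `n1`.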
The data indicate that as the Kiers born in the late 1960s and early 1970s reached adulthood, they must have inspired a generation of women to name their children after them. Which specific attributes did the inspiring is not clear.
Note: Not a serious analysis