PUBLIC DISCLOSURE: My name is Kier, and I am biased.

Synopsis

I got my hands on the US Baby Names database from the Kaggle website. It covers babies born in the US between 1880 and 2014. Initially I was interested to see how the popularity of my siblings’ and parents’ names changed over this period. I probably shouldn’t have been surprised, but I was, that the popularity of each name peaked right around its owner’s birth year. For instance, Ryan and Erin saw their names’ greatest popularity in the years around their births, 1970 and 1971, respectively. My name, Kier, is an anomaly: there is a blip of popularity around my birth year, 1968, and then, as we Kiers reached adulthood, the popularity of the name kept growing.
A reasonable conclusion is that as the greatness of the late-1960s Kiers came to be recognized in adulthood, more women were inspired to name their babies after them.

First clear the environment, set working directory, and load libraries.

rm(list=ls())
#setwd("~/Analytics Course/Kaggle/US Baby Names/output")

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

The Data

The baby names database was acquired from https://www.kaggle.com/kaggle/us-baby-names on 2016-08-18.

# This analysis uses the National Names dataset
national_names_raw <- read.csv("C:/Users/Kier/Documents/Analytics Course/Kaggle/US Baby Names/output/NationalNames.csv")

The O’Neil Family Names

Let’s take a look at my name and the names of my wife, parents, and siblings.

n1 <- national_names_raw %>%
    filter(Name %in% c("Kier", "Keir", "Kieran", "Kiernan")) %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count)) %>%
    mutate(Name = "Kier")
n2 <- national_names_raw %>%
    filter(Name == "Eva") %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count)) %>%
    mutate(Name = "Eva")
n3 <- national_names_raw %>%
    filter(Name %in% c("Margot", "Margeaux")) %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count)) %>%
    mutate(Name = "Margot")
n4 <- national_names_raw %>%
    filter(Name %in% c("Robert", "Rob", "Bob")) %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count)) %>%
    mutate(Name = "Robert")
n5 <- national_names_raw %>%
    filter(Name == "Ryan") %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count)) %>%
    mutate(Name = "Ryan")
n6 <- national_names_raw %>%
    filter(Name == "Erin") %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count)) %>%
    mutate(Name = "Erin")
name_counts <- rbind(n1,n2,n3,n4,n5,n6)

ggplot(name_counts, aes(Year, sum_of_names, color = Name)) + geom_line() +
    facet_wrap(~Name, ncol = 3, scales = "free_y") +
    theme(legend.position = "none")  
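
The six blocks above repeat the same filter, group, and summarise steps, so they could be folded into one helper. The following is only a sketch of that idea: `count_by_year`, the `family` list, and the tiny `toy` data frame are hypothetical names and made-up values, not part of the original analysis. The only assumption carried over is that the data has `Name`, `Year`, and `Count` columns.

```r
library(dplyr)

# Hypothetical helper: total yearly count for a set of spellings,
# labelled with a single display name.
count_by_year <- function(data, spellings, label) {
  data %>%
    filter(Name %in% spellings) %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count)) %>%
    mutate(Name = label)
}

# Made-up rows with the same columns as the national names data
toy <- data.frame(
  Name  = c("Kier", "Keir", "Kier", "Eva", "Eva"),
  Year  = c(1968, 1968, 1969, 1968, 1969),
  Count = c(5, 7, 6, 100, 110),
  stringsAsFactors = FALSE
)

# Display name -> spellings that should be pooled under it
family <- list(
  Kier = c("Kier", "Keir", "Kieran", "Kiernan"),
  Eva  = "Eva"
)

# One summarised data frame per display name, stacked into one table
name_counts_toy <- bind_rows(lapply(names(family), function(label) {
  count_by_year(toy, family[[label]], label)
}))
```

With the real data, `toy` would be swapped for `national_names_raw` and `family` extended with the remaining names.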

Notice that Margot has a smooth pattern except for an anomaly in the middle.

Let’s take a closer look…

caption <- paste(strwrap("Birth year of my mother Margot", 20), collapse = "\n")
caption2 <- paste(strwrap("Period of rapid decline and recovery", 20), collapse = "\n")

ggplot(n3, aes(Year, sum_of_names)) +  
    # Shade the window once; geom_rect(aes(...)) would redraw it for every row,
    # stacking the transparency until the band is nearly opaque
    annotate("rect", xmin=1962, xmax=1982, 
             ymin=-Inf, ymax=Inf, fill="yellow", alpha=0.5) +
    # Vertical line marks the birth year named in the first caption
    geom_vline(xintercept=1944) + 
    geom_line(color="blue") +
    ggtitle("Babies named Margot") +
    ylab("Babies named Margot") +
    annotate("text", x=1918, y=300, label=caption, hjust=0, vjust=1, size=4) + 
    annotate("text", x=1962, y=300, label=caption2, hjust=0, vjust=1, size=4) +
    theme(legend.position = "none")

Usage is very consistent from 1940 to 2005, except for a minor spike and then a major drop between 1962 and 1982. By 1982 it was back to its previous level. Note also the recent surge in usage.
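
To put numbers on a spike and drop like this, one option is to pull out the highest and lowest years inside the shaded window. The sketch below does so on made-up counts; `toy_margot` and its values are invented for illustration and only mimic the two columns of `n3`.

```r
# Made-up yearly counts shaped like n3 (Year, sum_of_names); not real data
toy_margot <- data.frame(
  Year         = c(1960, 1962, 1964, 1970, 1976, 1982, 1984),
  sum_of_names = c(200, 210, 260, 60, 120, 200, 205)
)

# Keep only the years inside the highlighted 1962 to 1982 window
window <- subset(toy_margot, Year >= 1962 & Year <= 1982)

# Rows holding the highest and lowest counts within the window
peak   <- window[which.max(window$sum_of_names), ]
trough <- window[which.min(window$sum_of_names), ]
```

Running the same three steps on `n3` would give the actual spike and trough years for Margot.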

Usage of the name Kier

The variable n1 contains the subset of data for Kier, pooled with the related names Keir, Kieran, and Kiernan. Let’s plot its usage.

caption <- "Blip of popularity\nbetween\n1968 and 1972..."
caption2 <- paste(strwrap("...then a generation later a +300% increase in usage developed over the next 20 years.", 20), collapse = "\n")
p <- ggplot(n1, aes(Year, sum_of_names)) +  
    labs(title="Number of US babies named Kier\nbetween 1920 and 2014",
         x = "Year",
         y = "Babies Named Kier") +
    # Draw each shaded band once with annotate() rather than once per data row
    annotate("rect", xmin=1968, xmax=1972, 
             ymin=-Inf, ymax=Inf, fill="green", alpha=0.25) +
    annotate("rect", xmin=1988, xmax=Inf, 
             ymin=-Inf, ymax=Inf, fill="green", alpha=0.5) +
    geom_vline(xintercept = 1988) + 
    annotate("text", x=1940, y=275, label=caption, hjust=0, vjust=1, size=4) + 
    annotate("text", x=1995, y=300, label=caption2, hjust=0, vjust=1, size=4) + 
    geom_smooth() +
    # A single geom_line() is enough; the original added it twice
    geom_line()
p
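
The percentage quoted in the caption can be checked with a simple percent-change calculation between two years. This is a minimal sketch: `pct_change` is a hypothetical helper, and `toy_kier` holds made-up counts chosen only to show the arithmetic, not the real figures in `n1`.

```r
# Hypothetical helper: percent change in yearly count between two years.
# Assumes each year appears exactly once in df.
pct_change <- function(df, from_year, to_year) {
  start <- df$sum_of_names[df$Year == from_year]
  end   <- df$sum_of_names[df$Year == to_year]
  100 * (end - start) / start
}

# Made-up counts shaped like n1 (Year, sum_of_names); not real data
toy_kier <- data.frame(
  Year         = c(1988, 1998, 2008),
  sum_of_names = c(50, 120, 200)
)

# Going from 50 to 200 is a gain of 150, i.e. 300% of the starting value
growth <- pct_change(toy_kier, 1988, 2008)
```

Calling `pct_change(n1, 1988, 2008)` would apply the same calculation to the real counts, with the two years taken from the plot annotations.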

Conclusion

The data indicate that as the Kiers born in the late 1960s and early 1970s reached adulthood, they must have inspired a generation of women to name their children after them. Which specific attributes provided the inspiration is not clear.
Note: Not a serious analysis