Synopsis

You probably know a Kaitlyn; of course you do. The name Kaitlyn and its derivations became extremely popular from the late 1980s to 2000. This analysis identifies three distinct periods: (1) a steady rise in usage between 1980 and 1990; (2) consistently high usage through the 1990s, peaking in 2000, when (counting all derivations together) it was the most popular girls' name; and (3) a rapid, steady decline after 2000 that continues to this day.
This dataset was retrieved from The US Baby Names database at https://www.kaggle.com/kaggle/us-baby-names on 2016-08-18.

Data Processing

rm(list=ls())  # start from a clean workspace
# Point this at the folder that holds NationalNames.csv
setwd("C:/Users/Kier/Documents/Analytics Course/Kaggle/US Baby Names/Katelyns")

library(dplyr)
library(ggplot2)
library(tibble)

# Data acquired from https://www.kaggle.com/kaggle/us-baby-names 
# on 2016-08-18.

# This analysis uses the National Names dataset
national_names_raw <- read.csv("NationalNames.csv")
names_full <- national_names_raw

All the derivations of Kaitlyn used in this analysis (57 in total):

kaitlyn_derivations <- c("Katelyn", "Katelin", "Katelynn", 
                         "Katelynne", "Katelan", "Catelyn", 
                         "Catelin", "Katelynd", "Katelen", 
                         "Katelind", "Katelyne", "Catelynn", 
                         "Katelinn", "Katelund", "Kateline", 
                         "Catelynne", "Katelon", "Katelina", 
                         "Caitlin", "Kaitlyn", "Caitlyn", 
                         "Kaitlin", "Caitilin", "Kaitlan", 
                         "Kaitlynn", "Caitlan", "Caitlynn", 
                         "Kaitlen", "Caitlen", "Kaithlyn", 
                         "Caitland", "Kaitland", "Caitlain", 
                         "Kaitlynne", "Caitlynne", "Caithlin", 
                         "Caitlinn", "Kaityln", "Kaithlin", 
                         "Kaitlinn", "Kaitlyne", "Kaitlynd", 
                         "Kaitlain", "Kaitlind", "Caitlynd", 
                         "Caitlyne", "Kaitlon", "Kaitylyn", 
                         "Caitline", "Kaitelyn", "Caitylyn",
                         "Caityln", "Kaitleen", "Kaitelynn", 
                         "Caitelyn", "Caitilyn", "Kaithlynn")
length(kaitlyn_derivations)
## [1] 57

There are 57 derivations of the name Kaitlyn in the list.

Create a data set containing only the records of girls named Kaitlyn or one of its derivations.

kaitlyns <- names_full %>%
    filter(Gender == "F" & Name %in% kaitlyn_derivations)

# Combined count of all Kaitlyn derivations for each year
k2 <- kaitlyns %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count))
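For readers without dplyr, the same filter-then-sum step can be sketched in base R. The tiny data frame below is made up purely for illustration (it is not taken from NationalNames.csv); it only assumes the same `Name`, `Year`, `Gender`, and `Count` columns the analysis uses.

```r
# Hypothetical toy rows with the same columns as NationalNames.csv
toy_names <- data.frame(
  Name   = c("Kaitlyn", "Caitlin", "Kaitlyn", "Emily", "Katelyn", "Kaitlyn"),
  Year   = c(1990, 1990, 1991, 1991, 1991, 1991),
  Gender = c("F", "F", "F", "F", "F", "M"),
  Count  = c(10, 5, 12, 40, 8, 3),
  stringsAsFactors = FALSE
)
toy_derivations <- c("Kaitlyn", "Caitlin", "Katelyn")

# Keep girls whose name is one of the listed spellings
# (%in% is an exact, case-sensitive match)
toy_kaitlyns <- subset(toy_names, Gender == "F" & Name %in% toy_derivations)

# One row per year with the combined count across all spellings
toy_k2 <- aggregate(Count ~ Year, data = toy_kaitlyns, FUN = sum)
names(toy_k2)[names(toy_k2) == "Count"] <- "sum_of_names"
print(toy_k2)
```

Emily and the male Kaitlyn row are dropped, so 1990 sums to 15 and 1991 to 20.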

Kaitlyn usage through US History

ggplot(k2, aes(Year, sum_of_names)) + geom_line() +
    labs(title="Number of baby girls named Kaitlyn through history",
         y="Number of Babies Named") + geom_smooth()

Note:
The rapid growth from 1980 to 1990;
Sustained high usage from 1990 to 2000;
The rapid decline after 2000.

Here are the top 10 derivations of Kaitlyn through history.

k1 <- kaitlyns %>%
    group_by(Name) %>%
    summarise(sub_total = sum(Count)) %>%
    arrange(-sub_total)
head(k1,10)
## # A tibble: 10 x 2
##        Name sub_total
##      <fctr>     <int>
## 1   Kaitlyn    158601
## 2   Katelyn    127809
## 3   Caitlin    110413
## 4   Kaitlin     56678
## 5   Caitlyn     50566
## 6  Katelynn     28857
## 7  Kaitlynn     15248
## 8   Katelin      8808
## 9  Caitlynn      4644
## 10  Catelyn      2112

1980 to 1990 - The Growth Years

Get data only for the growth years

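# Full set for 1980 to 1990 (all names; unlike `kaitlyns`, this is not
# filtered to Gender == "F", so boys' counts are included)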
names_80_90 <- names_full %>%
    filter(Year >= 1980 & Year <= 1990) %>%
    group_by(Year, Name) %>%
    summarise(sum_of_names = sum(Count)) %>%
    arrange(Year, -sum_of_names)

# Kaitlyn data for 1980 to 1990
kaitlyn_80_90 <- names_80_90 %>%
    filter(Name %in% kaitlyn_derivations) %>%
    arrange(-sum_of_names)
head(kaitlyn_80_90, 10)
## Source: local data frame [10 x 3]
## Groups: Year [6]
## 
##     Year    Name sum_of_names
##    <int>  <fctr>        <int>
## 1   1988 Caitlin         7269
## 2   1989 Caitlin         7072
## 3   1990 Caitlin         7045
## 4   1987 Caitlin         5016
## 5   1990 Katelyn         4495
## 6   1990 Kaitlyn         4318
## 7   1989 Katelyn         4044
## 8   1986 Caitlin         3897
## 9   1985 Caitlin         3619
## 10  1988 Katelyn         3520

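Caitlin was the leading spelling in every year that appears in this table (1985 to 1990), but Katelyn and Kaitlyn were gaining quickly: by 1990 each was given to more than 4,000 babies.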

Chart the growth from 1980 to 1990

k89_1 <- kaitlyn_80_90 %>%
    group_by(Year) %>%
    summarise(total = sum(sum_of_names)) %>%
    arrange(Year)

ggplot(k89_1, aes(Year, total)) + geom_line() + geom_smooth() +
    labs(title="Usage of Kaitlyn from 1980 to 1990",
         x="Year", y="Usage")

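# Ratio of the highest to the lowest yearly total in the period; with steady
# growth these should be the 1990 and 1980 totals respectively
print(paste0("Usage in 1990 was ", 
             round(max(k89_1$total)/min(k89_1$total)*100, 2), 
             "% of usage in 1980"))
## [1] "Usage in 1990 was 2180.97% of usage in 1980"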

This plot shows strong, steady growth, with little variation over this period.
From 1980 to 1990, yearly usage of Kaitlyn and its derivations grew roughly 21.8-fold from a low base, reaching 22,985 girls in 1990.
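Because "X% of" and "an increase of X%" are easy to conflate, here is a small sketch of both calculations. The helper names `percent_of` and `percent_increase` are hypothetical and not part of the original analysis.

```r
# Level of `new` expressed as a percentage of `old` (what max/min * 100 measures)
percent_of <- function(old, new) 100 * new / old

# Relative change from `old` to `new`, in percent
percent_increase <- function(old, new) 100 * (new - old) / old

percent_of(100, 250)        # 250: the new value is 250% of the old one
percent_increase(100, 250)  # 150: a 150% increase
```

Applied to the 1980 to 1990 figure of 2180.97% above (computed as max/min * 100), the 1990 level is about 21.8 times the 1980 level, which corresponds to an increase of about 2080.97%.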

1990 to 2000 - The Sustained-Usage Period

This decade saw high, sustained usage peaking in 2000.

kaitlyn_90_00 <- kaitlyns %>%
    filter(Year >= 1990 & Year <= 2000) %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count))

# Plot out usage from 1990 to 2000
ggplot(kaitlyn_90_00, aes(Year, sum_of_names)) + geom_line() + geom_smooth() +
    labs(title="Usage of Kaitlyn from 1990 to 2000",
         x="Year", y="Usage")

Usage was fairly consistent over the decade, with modest growth. The gap between the lowest and highest years is only about 4,200 (22,626 to 26,789), and the average was about 25,320 per year.

range(kaitlyn_90_00$sum_of_names)
## [1] 22626 26789
mean(kaitlyn_90_00$sum_of_names)
## [1] 25319.73
sd(kaitlyn_90_00$sum_of_names)
## [1] 1463.496
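One way to make "fairly consistent" concrete is the coefficient of variation (standard deviation divided by mean). The sketch below simply reuses the summary statistics printed above; the CV is an addition for illustration, not part of the original analysis.

```r
# Summary statistics printed above for the 1990-2000 yearly totals
yearly_mean  <- 25319.73
yearly_sd    <- 1463.496
yearly_range <- c(22626, 26789)

# Coefficient of variation: standard deviation relative to the mean
cv <- yearly_sd / yearly_mean
round(100 * cv, 1)     # about 5.8 (percent)

# Width of the range, low to high
diff(yearly_range)     # 4163
```

A year-to-year spread of under 6% of the mean supports describing the decade as a plateau.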

2000 to present - The Decline of Kaitlyn

Now let’s look at the period from 2000 to 2014 (the most recent year in the dataset).

kaitlyn_00_14 <- kaitlyns %>%
    filter(Year >= 2000) %>%
    group_by(Year) %>%
    summarise(sum_of_names = sum(Count))

ggplot(kaitlyn_00_14, aes(Year, sum_of_names)) + geom_line() + geom_smooth() +
    labs(title="Usage of Kaitlyn from 2000 to 2014",
         x="Year", y="Usage")

A long, sustained decline with little year-to-year variation.
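To put a number on the decline, one option (not used in the original analysis) is the compound annual rate of change between the first and last years of the period. This is a minimal sketch; the function name `annual_change_rate` is hypothetical.

```r
# Hypothetical helper: compound annual rate of change between two yearly
# totals that are `years` years apart (negative values mean decline)
annual_change_rate <- function(first, last, years) {
  (last / first)^(1 / years) - 1
}

annual_change_rate(100, 25, 2)   # -0.5: the total halves each year
```

With the data above, one could pass the 2000 and 2014 totals, for example `annual_change_rate(kaitlyn_00_14$sum_of_names[1], tail(kaitlyn_00_14$sum_of_names, 1), 14)`, assuming `kaitlyn_00_14` is ordered by year (which `summarise()` after `group_by(Year)` produces).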

A look at specific years during the sustained period: 1990, 1995, and 2000

1990

# Get all the female names for 1990, order from highest usage to lowest
year_1990 <- national_names_raw %>%
    filter(Gender == "F" & (Year == 1990)) %>%
    group_by(Year, Name) %>%
    summarise(sum_of_name = sum(Count)) %>%
    arrange(-sum_of_name, Name)

How many Kaitlyns were born in 1990?

(kaitlyns_1990 <- year_1990 %>%
    filter(Name %in% kaitlyn_derivations) %>%
    summarise(total_kaitlyns_1990 = sum(sum_of_name)) %>%
    select(-Year) %>%
    unlist())
## total_kaitlyns_1990 
##               22985

How many unique derivations were used?

dispersion_1990 <- year_1990 %>%
    filter(Name %in% kaitlyn_derivations) %>%
    arrange(-sum_of_name)
length(unique(dispersion_1990$Name))
## [1] 38

Top 20 Names of 1990

ranked_names_1990 <- tibble::rownames_to_column(year_1990, var = "rank")
rn_1990 <- ranked_names_1990 %>%
    mutate(rank = as.integer(rank))
(top_twenty_1990 <- rn_1990 %>%
    filter(rank <= 20))
## Source: local data frame [20 x 4]
## Groups: Year [1]
## 
##     rank  Year      Name sum_of_name
##    <int> <int>    <fctr>       <int>
## 1      1  1990   Jessica       46466
## 2      2  1990    Ashley       45549
## 3      3  1990  Brittany       36535
## 4      4  1990    Amanda       34406
## 5      5  1990  Samantha       25864
## 6      6  1990     Sarah       25808
## 7      7  1990 Stephanie       24856
## 8      8  1990  Jennifer       22221
## 9      9  1990 Elizabeth       20742
## 10    10  1990    Lauren       20498
## 11    11  1990     Megan       20255
## 12    12  1990     Emily       19358
## 13    13  1990    Nicole       17950
## 14    14  1990     Kayla       17536
## 15    15  1990     Amber       15863
## 16    16  1990    Rachel       15703
## 17    17  1990  Courtney       15377
## 18    18  1990  Danielle       14330
## 19    19  1990   Heather       14217
## 20    20  1990   Melissa       13996
top_twenty_1990$Name <- ordered(x=top_twenty_1990$Name, levels = top_twenty_1990$Name)

ggplot(top_twenty_1990,aes(x=Name, y=sum_of_name))+
    geom_bar(stat = "identity") +
    labs(title="Top 20 girl names of 1990", 
         y="Number of girls with name") + 
    geom_hline(yintercept = kaitlyns_1990, color = "red") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    annotate("text", x="Nicole", y=27000, 
             label = "Red line represents number of Kaitlyns born in 1990", 
             color = "red")
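The `ordered(...)` call before the plot matters because ggplot2 draws a discrete x axis in factor-level order. A plain `factor()` on character data uses alphabetical levels, which would scramble the bar order; setting the levels to the already-sorted names keeps the bars running from highest to lowest. A base R illustration using the first four names of 1990:

```r
# Names already sorted from most to least popular (first four from the 1990 table)
top_names <- c("Jessica", "Ashley", "Brittany", "Amanda")

# Default factor levels are sorted alphabetically
levels(factor(top_names))
# returns "Amanda" "Ashley" "Brittany" "Jessica"

# Supplying the levels explicitly preserves the popularity order
levels(factor(top_names, levels = top_names))
# returns "Jessica" "Ashley" "Brittany" "Amanda"
```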

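In 1990, 22,985 girls were named Kaitlyn or one of its derivations. Counted as a single name, that total would have ranked 8th, between Stephanie (24,856) and Jennifer (22,221). A very popular name.
The top ten averaged about 30,295 girls per name, but that average is pulled up by the top four: there is a gap of roughly 8,500 between #4 (Amanda, 34,406) and #5 (Samantha, 25,864).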
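The rank and the top-ten comparison discussed above can be checked directly from the printed table. The `would_rank` helper is a hypothetical illustration: it counts how many names have a strictly larger total (so ties go in Kaitlyn's favor) and adds one.

```r
# Top-20 counts for 1990, copied from the table above
top20_1990 <- c(46466, 45549, 36535, 34406, 25864, 25808, 24856, 22221,
                20742, 20498, 20255, 19358, 17950, 17536, 15863, 15703,
                15377, 14330, 14217, 13996)
kaitlyns_1990_total <- 22985

# Rank a combined total would take: one more than the number of names above it
would_rank <- function(total, counts) sum(counts > total) + 1

would_rank(kaitlyns_1990_total, top20_1990)   # 8
mean(top20_1990[1:10])                        # 30294.5
top20_1990[4] - top20_1990[5]                 # 8542 (Amanda minus Samantha)
```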

1995

This is in the middle of the sustained usage period.

Get all the female names for 1995, order from highest usage to lowest

year_1995 <- national_names_raw %>%
    filter(Gender == "F" & (Year == 1995)) %>%
    group_by(Year, Name) %>%
    summarise(sum_of_name = sum(Count)) %>%
    arrange(-sum_of_name, Name)

How many Kaitlyns were born in 1995?

(kaitlyns_1995 <- year_1995 %>%
    filter(Name %in% kaitlyn_derivations) %>%
    summarise(total_kaitlyns_1995 = sum(sum_of_name)) %>%
    select(-Year) %>%
    unlist())
## total_kaitlyns_1995 
##               26450

How many unique derivations were used?

dispersion_1995 <- year_1995 %>%
    filter(Name %in% kaitlyn_derivations) %>%
    arrange(-sum_of_name)
length(unique(dispersion_1995$Name))
## [1] 39

Get the Top 20 Names in 1995

ranked_names_1995 <- tibble::rownames_to_column(year_1995, var = "rank")
rn_1995 <- ranked_names_1995 %>%
    mutate(rank = as.integer(rank))
(top_twenty_1995 <- rn_1995 %>%
    filter(rank <= 20))
## Source: local data frame [20 x 4]
## Groups: Year [1]
## 
##     rank  Year      Name sum_of_name
##    <int> <int>    <fctr>       <int>
## 1      1  1995   Jessica       27938
## 2      2  1995    Ashley       26603
## 3      3  1995     Emily       24377
## 4      4  1995  Samantha       21646
## 5      5  1995     Sarah       21365
## 6      6  1995    Taylor       20424
## 7      7  1995    Hannah       17012
## 8      8  1995  Brittany       16477
## 9      9  1995    Amanda       16344
## 10    10  1995 Elizabeth       16183
## 11    11  1995     Kayla       16083
## 12    12  1995    Rachel       16041
## 13    13  1995     Megan       15529
## 14    14  1995    Alexis       14330
## 15    15  1995    Lauren       13444
## 16    16  1995 Stephanie       12979
## 17    17  1995  Courtney       12771
## 18    18  1995  Jennifer       12685
## 19    19  1995    Nicole       12276
## 20    20  1995  Victoria       12250
top_twenty_1995$Name <- ordered(x=top_twenty_1995$Name, levels = top_twenty_1995$Name)

Plot them from highest to lowest, with a line showing where Kaitlyn would have ranked.

ggplot(top_twenty_1995,aes(x=Name, y=sum_of_name))+
    geom_bar(stat = "identity") +
    labs(title="Top 20 girl names of 1995", 
         y="Number of girls with name") + 
    geom_hline(yintercept = kaitlyns_1995, color = "red") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    annotate("text", x="Kayla", y=24000, 
             label = "Red line represents number of Kaitlyns born in 1995", 
             color = "red")

In 1995, 26,450 girls were named Kaitlyn or one of its derivations. Counted as a single name, that total would have ranked 3rd, just behind Ashley (26,603) and ahead of Emily (24,377). A very popular name.
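Another way to see where the combined name lands is to insert its total into the ranked counts and re-sort. This base R sketch uses the 1995 total of 26,450 printed earlier and the first six names of the 1995 table; like the red line in the chart, it treats all Kaitlyn spellings as a single entry.

```r
# First six counts for 1995, copied from the table above
top_1995 <- c(Jessica = 27938, Ashley = 26603, Emily = 24377,
              Samantha = 21646, Sarah = 21365, Taylor = 20424)

# Add the combined Kaitlyn total (all derivations) and re-sort
with_kaitlyn <- sort(c(top_1995, Kaitlyn = 26450), decreasing = TRUE)
names(with_kaitlyn)
# returns "Jessica" "Ashley" "Kaitlyn" "Emily" "Samantha" "Sarah" "Taylor"
which(names(with_kaitlyn) == "Kaitlyn")   # 3
```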

2000

This is the end of the sustained usage period.

Get all the female names for 2000, order from highest usage to lowest

year_2000 <- national_names_raw %>%
    filter(Gender == "F" & (Year == 2000)) %>%
    group_by(Year, Name) %>%
    summarise(sum_of_name = sum(Count)) %>%
    arrange(-sum_of_name, Name)

How many Kaitlyns were born in 2000?

(kaitlyns_2000 <- year_2000 %>%
    filter(Name %in% kaitlyn_derivations) %>%
    summarise(total_kaitlyns_2000 = sum(sum_of_name)) %>%
    select(-Year) %>%
    unlist())
## total_kaitlyns_2000 
##               26789

How many unique derivations were used?

dispersion_2000 <- year_2000 %>%
    filter(Name %in% kaitlyn_derivations) %>%
    arrange(-sum_of_name)

length(unique(dispersion_2000$Name))
## [1] 37

Conclusion

Based on the three years examined, it seems reasonable to assume that Kaitlyn (counting all derivations) was among the ten most used girls' names throughout the decade from 1990 to 2000. It was the most popular name in 2000 and the 3rd most popular in 1995. If you meet a Kaitlyn, chances are she was born between 1985 (the second half of the growth period) and 2000 (the end of the sustained period). As of this writing (2016), she would be between 16 and 31 years old.
The parents of an older Kaitlyn would be considered early adopters of the name, and the parents of a younger Kaitlyn late adopters.

The End