Introduction

Hollywood celebrities influence American taste in countless ways, including baby names (first names, that is). The Social Security Administration’s (SSA) official list of popular baby names in the U.S. is a very useful source for studying how parents’ choice of names is influenced by celebrities.

Similar Analysis

My analysis was inspired by [Bhaskar V. Karambelkar: Popularity of baby names and Disney Movies](https://rpubs.com/bhaskarvk/disney), which provides a great demonstration of how Disney movies influenced parents’ preferences when naming their babies. Full credit and acknowledgment go to Bhaskar V. Karambelkar.

Method

To evaluate how parents’ choice of names was influenced by celebrities, I did the following:
* Step 1: Define 9 female ‘star’ names from 3 domains: (i) Singer, (ii) Princess and (iii) Movie (each domain has 3 names to compare)
* Step 2: For each name defined in Step 1, extract the popularity of the name (i.e. number of hits) from Google Trends for the period 2004 - 2018
* Step 3: For each name defined in Step 1, extract the count per year from the SSA data for the period 2004 - 2017
* Step 4: Study the trends

Step 1: Define Names of Interest

Well, the selection of names is purely driven by my own personal taste. For each domain I selected the following first names:
* Singer: Adele, Avril (from Avril Lavigne), Taylor (from Taylor Swift)
* Princess: Charlotte (Princess Charlotte from the Royal Family - a real princess), Elsa (from the Disney movie Frozen), Belle (from the Disney movie Beauty and the Beast)
* Movie: Gwyneth (from Gwyneth Paltrow of the Iron Man films - a real actress's name), Katniss (a character from The Hunger Games), Rey (a character from Star Wars: Episode VII - The Force Awakens)

[Images: Adele, Avril Lavigne, Taylor Swift; Charlotte, Elsa, Emma; Gwyneth, Katniss, Rey]

library("ggplot2") # R's most famous graphics package
library("ggthemes") # An add-on package for ggplot2
library("dplyr") # My favourite data transformation tool
library("lubridate") # Provides the 'year' function I need
library("gtrendsR") # Extracts Google Trends data

Step 2: Extract Google Trend Data

Using the gtrendsR package to extract Google trend data is pretty simple.

# Define terms to search
t1 <- c("Adele", "Avril", "Taylor")
t2 <- c("Charlotte", "Elsa", "Belle")
t3 <- c("Gwyneth", "Katniss", "Rey")
name <- c(t1,t2,t3)

# Group by domains
group <- rep(c("Singer", "Princess", "Movie"), each=3)
gp <- data.frame(name, group)
gp$name <- as.character(gp$name)

# Extract data from Google Trends (This might take a while)
gt1 = gtrends(t1, gprop = "web", time = "all")[[1]]
gt2 = gtrends(t2, gprop = "web", time = "all")[[1]]
gt3 = gtrends(t3, gprop = "web", time = "all")[[1]]
gt <- rbind(gt1, gt2, gt3)

# Data cleaning
dt <- gt %>%
  mutate (year = year(date)) %>% # create year variable
  mutate (year = as.integer(year)) %>%
  mutate(hits = replace(hits, hits=="<1", 0)) %>% # replace hits <1 by zero
  mutate (hits = as.integer(hits)) %>%
  group_by(keyword, year) %>%
  summarise(hits = sum(hits)) %>% # calculate total count of hits per year
  rename(name = keyword) %>%
  inner_join(gp, by = "name") # extract hits for the 9 selected names only
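
To make the cleaning step easier to follow, here is a minimal sketch of the same idea on a tiny hand-made table. The data frame `toy_gt` and its values are invented for illustration and only mimic the `date`, `hits` and `keyword` columns used above; they are not real Google Trends output.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the gtrends() result (made-up values, for illustration only)
toy_gt <- data.frame(
  date    = as.Date(c("2004-01-01", "2004-02-01", "2005-01-01", "2005-02-01")),
  hits    = c("<1", "3", "10", "<1"),  # "<1" forces the column to be character
  keyword = "Katniss",
  stringsAsFactors = FALSE
)

toy_dt <- toy_gt %>%
  mutate(year = as.integer(year(date))) %>%         # monthly date -> calendar year
  mutate(hits = replace(hits, hits == "<1", "0")) %>% # treat "<1" as zero
  mutate(hits = as.integer(hits)) %>%               # text -> number
  group_by(keyword, year) %>%
  summarise(hits = sum(hits)) %>%                   # one row per keyword and year
  ungroup()

toy_dt
```

The monthly rows collapse into one total per keyword and year, which is the shape the later steps rely on.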

# Data Transformation

## Find max
mx <- dt %>%
  group_by(name) %>%
  summarise(mx = max(hits)) # create a separate dataset to record the maximum hits for the defined period

## Normalise the hits based on the max value. 
dt <- dt %>%
  inner_join(mx, by = "name") %>%
  mutate (phits = hits/mx) # phits value should be between zero and 1 now
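
As a small worked example of the normalisation, the sketch below divides each name's hits by that name's own maximum. It uses a grouped `mutate()` instead of the separate max table and join above, which should give the same result; the `toy` data and its numbers are made up for illustration.

```r
library(dplyr)

# Made-up yearly hits for two names (illustrative values only)
toy <- data.frame(
  name = rep(c("Elsa", "Rey"), each = 3),
  year = rep(2013:2015, times = 2),
  hits = c(20, 80, 40, 5, 10, 50)
)

toy_norm <- toy %>%
  group_by(name) %>%
  mutate(phits = hits / max(hits)) %>%  # scale by each name's own peak
  ungroup()

toy_norm
```

Each name's peak year gets a `phits` of 1, so names with very different absolute popularity can share the same y-axis.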

Now we can do a test plot of the trends. Note that the names have not been grouped yet (I will reorder them later).

ggplot(dt, aes(x=year, y=phits)) +
  geom_step() +
  scale_y_continuous(breaks=seq(0,1,0.1)) +
  scale_x_continuous(breaks=seq(2004,2018,1)) +
  facet_wrap(~name) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Step 3: Extract SSA Data

The SSA data is provided by the babynames package. We just need to make some minor modifications.

library("babynames")

# Again, create a dataset recording the maximum yearly count for each name
bb_mx <- babynames %>%
  filter(year >= 2004) %>%
  filter(sex == "F") %>%
  group_by(name) %>%
  summarise(bb_mx = max(n))

# Normalisation: Again, normalise the counts to a value between zero and one
bb <- babynames %>%
  filter(year >= 2004) %>%
  filter(sex == "F") %>%
  inner_join(bb_mx, by = "name") %>%
  mutate (pn = n/bb_mx) %>%
  select(year, name, n, pn)

# Merge: Filter the list with the 9 selected names only
dt <- dt %>%
  inner_join(bb, by = c("year", "name"))

# Sort the names in preferred order
dt$name_f <- factor(dt$name,levels = c("Adele", "Avril", "Taylor", "Charlotte", "Elsa", "Belle", "Gwyneth", "Katniss", "Rey")) 
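
One side effect of the merge is worth spelling out: `inner_join()` keeps only the name and year combinations present in both tables, so any year covered by Google Trends but not by the SSA data drops out. The sketch below shows this on two invented tables (`trends` and `ssa` are hypothetical names with made-up values).

```r
library(dplyr)

# Hypothetical normalised tables: trends run to 2018, SSA counts stop at 2017
trends <- data.frame(name = "Rey", year = 2016:2018, phits = c(1, 0.6, 0.4),
                     stringsAsFactors = FALSE)
ssa    <- data.frame(name = "Rey", year = 2016:2017, pn = c(1, 0.9),
                     stringsAsFactors = FALSE)

# Only rows whose (year, name) pair exists in both tables survive
merged <- inner_join(trends, ssa, by = c("year", "name"))
merged
```

If the unmatched years should be kept (with `NA` for the missing counts), `left_join()` would be the alternative.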

Analysis

Voilà! For some names, there were very strong associations between the Google search hits (the blue line) and the actual number of babies registered with that name in the SSA data (the yellow line).
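
The code for this chart is not included in the post. The following is only a rough sketch of how a chart with a blue Google line and a yellow SSA line per name could be drawn with ggplot2. The object name `p1_sketch`, the toy values, the colour choices and the labels are all my assumptions, not the original plotting code.

```r
library(ggplot2)

# Made-up normalised series for two names (illustrative values only)
toy <- data.frame(
  name_f = factor(rep(c("Elsa", "Rey"), each = 4), levels = c("Elsa", "Rey")),
  year   = rep(2013:2016, times = 2),
  phits  = c(0.2, 1.0, 0.6, 0.4, 0.1, 0.2, 0.8, 1.0),
  pn     = c(0.3, 1.0, 0.5, 0.4, 0.2, 0.3, 0.7, 1.0)
)

p1_sketch <- ggplot(toy, aes(x = year)) +
  geom_line(aes(y = phits, colour = "Google hits")) +  # search interest
  geom_line(aes(y = pn, colour = "SSA count")) +       # registered babies
  scale_colour_manual(values = c("Google hits" = "blue", "SSA count" = "gold")) +
  facet_wrap(~name_f) +                                # one panel per name
  labs(x = "Year", y = "Share of each series' maximum", colour = NULL)

p1_sketch
```

Because both series are scaled to their own maximum, the two lines can be compared by shape even though the underlying units differ.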

Singer’s Group

For singers, there was a clear trend: more parents named their babies ‘Adele’ or ‘Avril’ as Adele and Avril Lavigne gained popularity. However, no similar association was found for the name ‘Taylor’ (perhaps because ‘Taylor’ is a much more common name than ‘Adele’ or ‘Avril’?).

Princess’s Group

Little Princess Charlotte saw a small jump in Google hits when she was born in 2015, but no significant influence was observed on parents’ naming preferences. The name ‘Elsa’, however, is a legend! It remained at low popularity until 2014, right after the Disney movie Frozen was released in late 2013 (do Disney’s princesses have more power than real princesses?). No obvious association can be observed for the name ‘Belle’ (Emma Watson’s version of Belle from the movie Beauty and the Beast).

Movie Group

The name ‘Gwyneth’ rose in both Google popularity and SSA count when the Iron Man movie series was released. Katniss and Rey are two typical examples of names that influenced parents’ choices. Prior to the release of the Hunger Games films and Star Wars: The Force Awakens, both names were extremely rare. Once the films were released, both names saw huge jumps in popularity!
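
The associations in this section were judged by eye. One possible way to put a number on them, which is my own addition and not part of the original analysis, is a per-name Pearson correlation between the normalised Google hits and the normalised SSA counts. The sketch below uses invented values; with the real data, the merged `dt` table would take the place of `toy`.

```r
library(dplyr)

# Made-up normalised series: one name that tracks its search interest, one that does not
toy <- data.frame(
  name  = rep(c("Katniss", "Taylor"), each = 4),
  phits = c(0.1, 1.0, 0.6, 0.3, 0.2, 0.5, 0.9, 1.0),
  pn    = c(0.1, 0.9, 1.0, 0.5, 1.0, 0.8, 0.6, 0.5)
)

assoc <- toy %>%
  group_by(name) %>%
  summarise(r = cor(phits, pn))  # Pearson correlation within each name

assoc
```

A same-year correlation is a crude measure: it ignores any lag between a film's release and the naming response, and with so few yearly points it should be read as a rough indication only.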

Final version: Animated Graph

Finally, I created an animated graph, which gives a better visualisation of the changes over time.

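library(gganimate)
# p1 is the static ggplot of the trends discussed in the Analysis section
# (the code that builds it is not shown in this post)
p2 <- p1 +
  transition_reveal(year) # reveal the lines gradually along the year axis
p2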

Conclusion

The popularity of Hollywood celebrities can influence parents’ preferences in choosing baby names. The degree of influence, however, varies from name to name.