Hollywood celebrities influence American taste in countless ways, including the first names parents choose for their babies. The Social Security Administration’s (SSA) official list of popular baby names in the U.S. is a very useful source for studying how celebrities influence parents’ choice of names.
My analysis was inspired by [Bhaskar V. Karambelkar: Popularity of baby names and Disney Movies](https://rpubs.com/bhaskarvk/disney), which gives a great demonstration of how Disney movies influenced parents’ choice of baby names. Credit and acknowledgment go to Bhaskar V. Karambelkar.
To evaluate how celebrities influence parents’ choice of names, I did the following:
* Step 1: Define 9 female ‘star’ names from 3 domains: (i) Singer, (ii) Princess and (iii) Movie (3 names per domain, so names within a domain can be compared)
* Step 2: For each name defined in Step 1, extract its popularity from Google Trends (the `hits` value, Google’s relative search-interest index rather than a raw search count) for the period 2004 - 2018
* Step 3: For each name defined in Step 1, extract the number of baby girls given that name per year from the SSA data for the period 2004 - 2017
* Step 4: Study the trends
Well, the selection of names is driven purely by my own taste. For each domain I selected the following first names:
* Singer: Adele, Avril (from Avril Lavigne), Taylor (from Taylor Swift)
* Princess: Charlotte (Princess Charlotte of the British Royal Family, a real princess), Elsa (from the Disney movie Frozen), Belle (from the Disney movie Beauty and the Beast)
* Movie: Gwyneth (from Gwyneth Paltrow, a real actress, in the Iron Man films), Katniss (a character from The Hunger Games), Rey (a character from Star Wars: Episode VII - The Force Awakens)
library("ggplot2") # R's most popular plotting package
library("ggthemes") # Extra themes for ggplot2
library("dplyr") # My favourite data transformation tool
library("lubridate") # Provides the year() function used below
library("gtrendsR") # Extracts Google Trends data
Extracting Google Trends data with the gtrendsR package is pretty simple.
# Define terms to search
t1 <- c("Adele", "Avril", "Taylor")
t2 <- c("Charlotte", "Elsa", "Belle")
t3 <- c("Gwyneth", "Katniss", "Rey")
name <- c(t1,t2,t3)
# Group by domains
group <- rep(c("Singer", "Princess", "Movie"), each=3)
gp <- data.frame(name, group)
gp$name <- as.character(gp$name)
# Extract data from Google Trends (this might take a while);
# [[1]] keeps the interest_over_time table returned by gtrends()
gt1 <- gtrends(t1, gprop = "web", time = "all")[[1]]
gt2 <- gtrends(t2, gprop = "web", time = "all")[[1]]
gt3 <- gtrends(t3, gprop = "web", time = "all")[[1]]
gt <- rbind(gt1, gt2, gt3)
# Data cleaning
dt <- gt %>%
mutate (year = year(date)) %>% # create year variable
mutate (year = as.integer(year)) %>%
mutate(hits = replace(hits, hits=="<1", 0)) %>% # replace hits <1 by zero
mutate (hits = as.integer(hits)) %>%
group_by(keyword, year) %>%
summarise(hits = sum(hits)) %>% # calculate total count of hits per year
rename(name = keyword) %>%
inner_join(gp, by = "name") # extract hits for the 9 selected names only
# Data Transformation
## Find max
mx <- dt %>%
group_by(name) %>%
summarise(mx = max(hits)) # create a separate dataset to record the maximum hits for the defined period
## Normalise the hits based on the max value.
dt <- dt %>%
inner_join(mx, by = "name") %>%
mutate (phits = hits/mx) # phits value should be between zero and 1 now
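To make the cleaning and normalisation steps easier to follow, here is a minimal offline sketch that applies the same logic to a small hand-made data frame shaped like the `interest_over_time` table returned by `gtrendsR::gtrends()` (columns `date`, `hits`, `keyword`). The values and the names `toy_gt` / `toy_dt` are hypothetical, and no Google Trends request is made.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy input mimicking gtrends()'s interest_over_time table:
# monthly rows, with 'hits' stored as character because Google Trends
# reports very small values as "<1".
toy_gt <- data.frame(
  date    = as.Date(c("2012-01-01", "2012-02-01", "2013-01-01", "2013-02-01",
                      "2012-01-01", "2012-02-01", "2013-01-01", "2013-02-01")),
  hits    = c("<1", "2", "10", "30", "5", "5", "<1", "1"),
  keyword = rep(c("Katniss", "Rey"), each = 4),
  stringsAsFactors = FALSE
)

toy_dt <- toy_gt %>%
  mutate(year = as.integer(year(date)),                            # calendar year
         hits = as.integer(replace(hits, hits == "<1", "0"))) %>%  # "<1" -> 0
  group_by(keyword, year) %>%
  summarise(hits = sum(hits), .groups = "drop") %>%                # yearly totals
  rename(name = keyword) %>%
  group_by(name) %>%
  mutate(phits = hits / max(hits)) %>%                             # scale to [0, 1] within each name
  ungroup()

print(toy_dt)
```

Computing `max(hits)` inside a grouped `mutate()` folds the separate `mx` table and its `inner_join()` into one step; both approaches should give the same `phits`.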
Now we can do a quick test plot of the trends. Note that the names have not been grouped by domain yet (I will reorder them later).
ggplot(dt, aes(x=year, y=phits)) +
geom_step() +
scale_y_continuous(breaks=seq(0,1,0.1)) +
scale_x_continuous(breaks=seq(2004,2018,1)) +
facet_wrap(~name) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
The SSA data is available in the babynames package; it just needs some minor modifications.
library("babynames")
# Again, create a dataset recording the maximum count of each name
bb_mx <- babynames %>%
filter(year >= 2004) %>%
filter(sex == "F") %>%
group_by(name) %>%
summarise(bb_mx = max(n))
# Normalisation: Again, normalise the counts to a value between zero and one
bb <- babynames %>%
filter(year >= 2004) %>%
filter(sex == "F") %>%
inner_join(bb_mx, by = "name") %>%
mutate (pn = n/bb_mx) %>%
select(year, name, n, pn)
# Merge: keep only the 9 selected names (and the years present in both datasets)
dt <- dt %>%
inner_join(bb, by = c("year", "name"))
# Sort the names in preferred order
dt$name_f <- factor(dt$name,levels = c("Adele", "Avril", "Taylor", "Charlotte", "Elsa", "Belle", "Gwyneth", "Katniss", "Rey"))
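Two details of this merge are easy to miss: `inner_join()` keeps only the (year, name) pairs present in both tables, so Google Trends rows for 2018 drop out because the SSA data used here ends in 2017, and the factor levels of `name_f` (not alphabetical order) decide the facet order. A small sketch with made-up values; `toy_google`, `toy_ssa` and `merged` are hypothetical names.

```r
library(dplyr)

# Hypothetical yearly Google Trends values (includes 2018)
toy_google <- data.frame(
  name  = c("Rey", "Rey", "Rey", "Elsa", "Elsa", "Elsa"),
  year  = c(2016L, 2017L, 2018L, 2016L, 2017L, 2018L),
  phits = c(1.00, 0.60, 0.40, 0.50, 0.30, 0.20),
  stringsAsFactors = FALSE
)

# Hypothetical yearly SSA values (ends in 2017)
toy_ssa <- data.frame(
  name = c("Rey", "Rey", "Elsa", "Elsa"),
  year = c(2016L, 2017L, 2016L, 2017L),
  pn   = c(0.90, 1.00, 0.70, 0.50),
  stringsAsFactors = FALSE
)

# 2018 rows have no SSA match, so inner_join() drops them
merged <- toy_google %>%
  inner_join(toy_ssa, by = c("year", "name"))

# Facets follow the factor levels, here deliberately non-alphabetical
merged$name_f <- factor(merged$name, levels = c("Rey", "Elsa"))

print(merged)
```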
Now I plot the Google Trends data and the SSA data together.
p1 <- ggplot(dt, aes(x=year)) +
geom_step(aes(y = pn, colour = "pn"), size=1.1) +
geom_step(aes(y = phits, colour = "phits")) +
scale_y_continuous(breaks=seq(0,1,0.1)) +
scale_x_continuous(breaks=seq(2004,2018,1)) +
facet_wrap(~name_f) +
labs(title = "Unique Baby Girl Names Inspired by Popular Celebrities",
subtitle = '',
caption = "Data Source: SSA and Google Trends",
x = "Year", y = "Normalised value (share of maximum)") +
theme_bw() +
scale_color_manual(labels = c("Google Hits", "SSA Name Register"), values = c("#56B4E9","#E69F00")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot(p1)
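`scale_color_manual()` matches unnamed `values` and `labels` to the colour keys in sorted order, so the plot above relies on "phits" sorting before "pn". The sketch below, on hypothetical toy data (`toy` and `p_toy` are illustrative names), shows a slightly more defensive variant that names `values` and sets `breaks` explicitly, so the legend should not silently swap if the keys are ever renamed.

```r
library(ggplot2)

# Hypothetical toy data in the same wide shape as dt (one row per year)
toy <- data.frame(
  year  = 2010:2014,
  phits = c(0.1, 0.3, 1.0, 0.6, 0.4),
  pn    = c(0.2, 0.2, 0.8, 1.0, 0.7)
)

p_toy <- ggplot(toy, aes(x = year)) +
  geom_step(aes(y = pn, colour = "pn")) +        # layer 1: SSA counts
  geom_step(aes(y = phits, colour = "phits")) +  # layer 2: Google hits
  scale_color_manual(
    name   = NULL,                                   # no legend title
    breaks = c("phits", "pn"),                       # legend order
    labels = c("Google Hits", "SSA Name Register"),  # matched to breaks by position
    values = c(phits = "#56B4E9", pn = "#E69F00")    # matched to keys by name
  ) +
  theme_bw()

print(p_toy)
```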
The article "From Arya to Zayn: The Rise and Fall of 100 Hollywood-Inspired Baby Names", edited by Gary Baum, provides some explanations for these jumps in popularity. Based on Gary’s analysis (plus a little Googling of my own), I defined the ‘boosting spots’ for each name as follows:
# Key events ('boosting spots') used to annotate each name
boost <- read.csv(textConnection('
name,year,txt
"Adele",2011,"Adele released her second album, 21, in January 2011 (she won six Grammy Awards for it in 2012)"
"Adele",2015,"Adele released her third studio album, 25, in 2015. It became the best-selling album of the year and broke first-week sales records in the UK and US"
"Avril",2007,"The Best Damn Thing (2007), her third studio album, reached number one in seven countries worldwide"
"Avril",2011,"Goodbye Lullaby was released in March 2011"
"Taylor",2008,"Her second album, Fearless, was released in 2008 and became the best-selling album of 2009 in the US"
"Taylor",2014,"Taylor released her fifth studio album, 1989, in 2014"
"Charlotte",2015,"The little princess was born on 2 May 2015"
"Elsa",2013,"Frozen premiered at the El Capitan Theatre in Hollywood, California, on November 19, 2013"
"Belle",2017,"Disney released a live-action Beauty and the Beast, with Emma Watson as Belle, in 2017"
"Gwyneth",2008,"Iron Man released"
"Gwyneth",2010,"Iron Man 2 released"
"Gwyneth",2013,"Iron Man 3 released"
"Katniss",2012,"The Hunger Games film was released in 2012"
"Rey",2016,"The Force Awakens was released in December 2015 and went on to break multiple box office records in various markets"
'), stringsAsFactors = FALSE,
colClasses = c("character", "numeric", "character"))
# Merge comments
dt <- dt %>%
left_join(boost, by = c("name", "year")) %>%
mutate(boost = ifelse(!is.na(txt), phits, NA))
# Plot again
p2 <- p1 +
geom_point(data = dt, aes(x=year, y=boost), color="#FF0000", na.rm = TRUE)
plot(p2)
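How the annotation points are placed may be easier to see on a tiny example: `left_join()` keeps every row and leaves `txt` as `NA` where no event was recorded, and `boost` copies `phits` only on annotated rows, so `geom_point(..., na.rm = TRUE)` draws a red dot only at those years. The data frames `toy_ann` and `toy_boost` below are hypothetical.

```r
library(dplyr)

# Hypothetical yearly values for one name
toy_ann <- data.frame(
  name  = "Katniss",
  year  = 2010:2013,
  phits = c(0.00, 0.05, 1.00, 0.60),
  stringsAsFactors = FALSE
)

# One hypothetical annotation, in the same shape as the boost table
toy_boost <- data.frame(
  name = "Katniss",
  year = 2012L,
  txt  = "The Hunger Games film was released in 2012",
  stringsAsFactors = FALSE
)

toy_ann <- toy_ann %>%
  left_join(toy_boost, by = c("name", "year")) %>%  # unmatched years get txt = NA
  mutate(boost = ifelse(!is.na(txt), phits, NA))    # keep phits only where annotated

print(toy_ann)
```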
Voilà! For some names, there is a strong visual association between the Google search hits (the blue line) and the number of babies registered with that name in the SSA data (the orange line).
For singers, there are clear signs that more parents named their babies ‘Adele’ or ‘Avril’ as Adele and Avril Lavigne gained popularity. However, no similar association was found for the name ‘Taylor’ (perhaps because ‘Taylor’ is a far more common name than ‘Adele’ or ‘Avril’?).
Little Princess Charlotte brought a small jump in Google hits when she was born in 2015, but no significant influence on parents’ naming preferences was observed. The name ‘Elsa’, however, is a legend! It stayed at low popularity until Disney’s Frozen was released at the end of 2013, and then jumped (do Disney princesses have more pull than real princesses?). No obvious association can be observed for the name ‘Belle’ (Emma Watson’s version of Belle in the 2017 Beauty and the Beast).
The name ‘Gwyneth’ rose both in Google search popularity and in SSA counts when the Iron Man films were released. Katniss and Rey are textbook examples of characters influencing parents’ choices: before The Hunger Games film and Star Wars: The Force Awakens were released, both names were extremely rare, and once the films came out both names jumped sharply in popularity!
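The associations above are judged by eye. One simple way to attach a number to them, offered as an optional extra check rather than part of the analysis above, is the Pearson correlation between `phits` and `pn` for each name (on the real data, something like `dt %>% group_by(name) %>% summarise(r = cor(phits, pn))`). The sketch runs that calculation on made-up values for two hypothetical names, so its output says nothing about the real names.

```r
library(dplyr)

# Made-up yearly values for two hypothetical names (not real data)
toy <- data.frame(
  name  = rep(c("NameA", "NameB"), each = 5),
  year  = rep(2010:2014, times = 2),
  phits = c(0.1, 0.2, 1.0, 0.7, 0.5,   0.9, 0.2, 1.0, 0.1, 0.5),
  pn    = c(0.1, 0.3, 0.9, 1.0, 0.6,   0.5, 0.5, 0.5, 0.6, 0.4),
  stringsAsFactors = FALSE
)

# Pearson correlation between search interest and SSA counts, per name
assoc <- toy %>%
  group_by(name) %>%
  summarise(r = cor(phits, pn), .groups = "drop")

print(assoc)
```

With only 14 yearly points (2004 - 2017) and both series scaled to their own maximum, a high r is a rough signal that the two lines move together, not proof that one drives the other.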
Finally, I created an animated graph that shows the changes over time more clearly.
library(gganimate)
p2 <- p1 +
transition_reveal(year)
p2
The popularity of Hollywood celebrities can influence parents’ choice of baby names. The degree of influence, however, varies from name to name.