12/08/2017
Popularity can't be determined only by how many babies were given a name.
Find the percentage of babies given a certain name out of all babies born in a particular year.
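As a small illustration of why raw counts can mislead, the sketch below computes a name's share of all births on made-up data. It assumes a table with the columns `name`, `year`, and `count` (the columns the analysis uses from BabyNames). The counts are invented for the example, and base R is used only so the snippet runs without extra packages.

```r
# Toy data with the columns the analysis relies on (name, year, count).
# All counts are made up for illustration only.
toy <- data.frame(
  name  = c("Wendy", "Mary", "Wendy", "Mary"),
  year  = c(1950, 1950, 1960, 1960),
  count = c(10, 990, 12, 1988),
  stringsAsFactors = FALSE
)

# Total babies born in each year
totals <- aggregate(count ~ year, data = toy, FUN = sum)
names(totals)[names(totals) == "count"] <- "total"

# Attach the yearly total to every row, then compute babies per 1,000 births
shares <- merge(toy, totals, by = "year")
shares$perThousand <- shares$count / shares$total * 1000

wendy <- shares[shares$name == "Wendy", c("year", "count", "perThousand")]
print(wendy)
```

In this toy data the count for "Wendy" rises from 10 to 12, yet its share falls from about 10 to about 6 per 1,000 births, because the total number of births doubled.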
50's - Peter Pan: "Wendy"
60's - Breakfast at Tiffany's: "Tiffany"
70's - Logan's Run: "Logan"
80's - 16 Candles: "Samantha"
90's - Pulp Fiction: "Mia"
This pairs each name with the year its movie was released, so that the release year can be drawn as a vertical line (an x-intercept) on that name's plot.
movieName <- c("Wendy", "Tiffany", "Logan", "Samantha", "Mia")
ym <- data.frame(year_movie = c(1953, 1961, 1976, 1984, 1994),
                 name = movieName,
                 stringsAsFactors = FALSE)  # keep names as character for the later join
To show on the graph which movie each name came from, we have to relabel the facets.
# Lookup table from each name to its facet label
movie_names <- c(
  'Wendy' = "Peter Pan: Wendy",
  'Tiffany' = "Breakfast at Tiffany's: Tiffany",
  'Logan' = "Logan's Run: Logan",
  'Samantha' = "16 Candles: Samantha",
  'Mia' = "Pulp Fiction: Mia"
)
# as_labeller() turns the lookup table into a labeller that facet_wrap() accepts
movie_labeller <- ggplot2::as_labeller(movie_names)
To find the percentage for the names of interest we will have to create some new tables that include the variables we need.
totalBabies: a table with the total number of babies born in each year (column total).
totalName: a table with the number of babies given each name within each year (column nameTotal).
PopNames: a new data table that contains name, year, year_movie, nameTotal, and total.
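library(dplyr)    # group_by, summarise, inner_join, %>%
library(ggplot2)  # used for the plot

# Total number of babies born in each year
totalBabies <- BabyNames %>%
  group_by(year) %>%
  summarise(total = sum(count))

# Number of babies given each name within each year
totalName <- BabyNames %>%
  group_by(name, year) %>%
  summarise(nameTotal = sum(count))

# Keep only the movie names, with yearly totals and release years attached
PopNames <- totalName %>%
  inner_join(totalBabies, by = "year") %>%
  inner_join(ym, by = "name")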
PopNames %>%
  filter(name %in% movieName) %>%
  group_by(name, year, year_movie, nameTotal, total) %>%
  # Babies given the name per 1,000 babies born that year
  summarise(namePerThousand = (nameTotal / total) * 1000) %>%
  ggplot(aes(x = year, y = namePerThousand)) +
  ggtitle("Share of Babies Given Names Popularized by Movies") +
  theme(text = element_text(size = 11.5), plot.title = element_text(hjust = 0.5)) +
  geom_line() +
  # Vertical line at the year each movie was released
  geom_vline(aes(xintercept = year_movie)) +
  labs(x = "Year", y = "Babies per 1,000 Births") +
  facet_wrap(~ name, labeller = movie_labeller) +
  scale_x_continuous(limits = c(1940, 2010), breaks = seq(1940, 2010, 10))
In some cases the film industry greatly impacts the names given to babies.
Because the names were selected manually, there is room for a more in-depth study.
Let R find the popular movies and popular names instead of doing it manually.
Scrape the Internet Movie Database (IMDb) to collect popular movies from each decade and then collect the names of important characters in each movie.
Join those character names to the BabyNames data table.
Create something that analyzes each of those names from each movie in each decade and decides how many of them became "popular."
Because "popular" is not a numerical variable, decide what you will count as popular.
Compare each name's popularity before and after the movie's release against that definition of "popular."
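One possible way to make "popular" numerical is sketched below. It is only an assumption for illustration, not a definition from this analysis. The hypothetical helper `became_popular` calls a name popular if its average rate per 1,000 births in the years after the release is at least `min_ratio` times its average in the years before. The helper name, the `window` and `min_ratio` defaults, and the example rates are all made up.

```r
# Hypothetical helper: compare a name's average rate before and after a release.
# `rates` is a data frame with columns `year` and `perThousand` for ONE name.
# `window` and `min_ratio` are illustrative assumptions, not values from the analysis.
became_popular <- function(rates, release_year, window = 5, min_ratio = 2) {
  before <- rates$perThousand[rates$year >= release_year - window &
                              rates$year <  release_year]
  after  <- rates$perThousand[rates$year >  release_year &
                              rates$year <= release_year + window]
  # Without data on both sides of the release there is nothing to compare
  if (length(before) == 0 || length(after) == 0) return(NA)
  mean(after) / mean(before) >= min_ratio
}

# Made-up rates for a single name around a 1984 release
example <- data.frame(
  year        = 1980:1988,
  perThousand = c(1.0, 1.1, 0.9, 1.0, 1.5, 2.5, 3.0, 3.2, 3.1)
)
result <- became_popular(example, release_year = 1984, window = 4)
print(result)
```

A ratio of averages is only one choice. A rank-based cutoff (for example, entering the top 100 names) or a test on the slope before and after the release would answer the same question differently, so the threshold should be fixed before looking at the results.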