12/08/2017

Project Details

Task

  • Create a graph showing the ups and downs in the popularity of names of interest.

How?

  • Popularity can't be measured by raw counts alone, because the total number of babies born changes from year to year.

  • Instead, find the share of babies given a certain name out of all babies born in that year (reported here per 1,000 births).
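A small worked sketch of why the share matters. The numbers are made up for illustration and are not taken from BabyNames: the raw count for one name more than doubles, but total births also double, so the share per 1,000 rises only modestly.

```r
# Hypothetical numbers for one name in two years (made up for illustration)
year        <- c(1990, 2000)
name_count  <- c(50, 120)        # babies given the name
total_count <- c(10000, 20000)   # all babies born that year

# Share of births, expressed per 1,000 babies
per1000 <- 1000 * name_count / total_count     # 5 and 6

# Growth measured two ways
growth_count <- name_count[2] / name_count[1]  # 2.4: raw count more than doubles
growth_share <- per1000[2] / per1000[1]        # 1.2: share rises by only 20%
```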

Elaborate

  • The name "Prince" became more popular after Prince's album Purple Rain.
  • To what extent does pop culture influence how people name their children?
  • The film industry has a huge influence.
  • New task: for each decade, choose a movie, graph the popularity of a character's name from it, and draw a vertical line at the movie's release year.

Analysis

Names/Movies:

1950s - Peter Pan (1953): "Wendy"

1960s - Breakfast at Tiffany's (1961): "Tiffany"

1970s - Logan's Run (1976): "Logan"

1980s - 16 Candles (1984): "Samantha"

1990s - Pulp Fiction (1994): "Mia"

Analysis

This table pairs each name with the release year of its movie; that year later supplies the x-intercept of the vertical line in each name's panel.

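library(dplyr)
library(ggplot2)
# Assumes the BabyNames table (columns name, sex, count, year) is already loaded,
# e.g. from the DataComputing package.

movieName <- c("Wendy", "Tiffany", "Logan", "Samantha", "Mia")

# Release year of each movie, paired with the character name taken from it
ym <- data.frame(year_movie = c(1953, 1961, 1976, 1984, 1994),
                 name = movieName,
                 stringsAsFactors = FALSE)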

Analysis

To show on the graph which movie each name came from, we relabel each facet panel with the movie title and the character's name.

# Lookup table from panel value (the name) to the panel label
movie_names <- c(
  Wendy    = "Peter Pan: Wendy",
  Tiffany  = "Breakfast at Tiffany's: Tiffany",
  Logan    = "Logan's Run: Logan",
  Samantha = "16 Candles: Samantha",
  Mia      = "Pulp Fiction: Mia"
)

# as_labeller() turns the named vector into a labeller that facet_wrap() accepts
movie_labeller <- as_labeller(movie_names)
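Facet relabelling is a lookup from each panel's value (a name) to its label. The base-R sketch below uses a hypothetical vector `lookup` with the same entries to show that lookup, plus a pitfall worth guarding against when writing a labeller by hand: indexing with a factor uses the factor's integer codes rather than its text.

```r
# Hypothetical lookup vector with the same entries as the facet labels
lookup <- c(
  Wendy    = "Peter Pan: Wendy",
  Tiffany  = "Breakfast at Tiffany's: Tiffany",
  Logan    = "Logan's Run: Logan",
  Samantha = "16 Candles: Samantha",
  Mia      = "Pulp Fiction: Mia"
)

# Indexing by a character vector looks entries up by name
by_name <- unname(lookup[c("Logan", "Mia")])
# "Logan's Run: Logan" "Pulp Fiction: Mia"

# Indexing by a factor uses its integer codes (1, 2), not its labels
f <- factor(c("Logan", "Mia"))
by_code <- unname(lookup[f])
# "Peter Pan: Wendy" "Breakfast at Tiffany's: Tiffany"  (wrong labels)

# Converting the factor to character restores the lookup by name
fixed <- unname(lookup[as.character(f)])
```

In practice, this means a hand-written labeller should convert its input with as.character() before indexing.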

Analysis

To find the share of births for the names of interest, we create a few new tables that hold the variables we need.

  • totalBabies: the total number of babies born in each year (column total).

  • totalName: the number of babies given each name within each year (column nameTotal).

  • PopNames: a new data table that joins these two with ym and contains name, year, year_movie, nameTotal, and total.
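The three tables can be sketched in base R on a tiny hypothetical data frame `toy` laid out like BabyNames (name, sex, count, year); the counts are made up. The sketch shows why totalName sums over sex, and how the two inner joins keep only the names listed in a hypothetical `ym_toy` table.

```r
# Hypothetical rows in the BabyNames layout (counts are made up)
toy <- data.frame(
  name  = c("Mia", "Mia", "Other", "Mia", "Other"),
  sex   = c("F", "M", "F", "F", "M"),
  count = c(45, 5, 9950, 120, 19880),
  year  = c(1990, 1990, 1990, 2000, 2000),
  stringsAsFactors = FALSE
)
ym_toy <- data.frame(year_movie = 1994, name = "Mia", stringsAsFactors = FALSE)

# totalBabies: all births in each year
totalBabies <- aggregate(count ~ year, data = toy, FUN = sum)
names(totalBabies)[names(totalBabies) == "count"] <- "total"

# totalName: births per name per year, summed over both sexes
totalName <- aggregate(count ~ name + year, data = toy, FUN = sum)
names(totalName)[names(totalName) == "count"] <- "nameTotal"

# PopNames: join on year, then on name; merge() keeps only matching rows
PopNames <- merge(totalName, totalBabies, by = "year")
PopNames <- merge(PopNames, ym_toy, by = "name")
PopNames <- PopNames[order(PopNames$year), ]

# Share of births per 1,000 babies
PopNames$per1000 <- 1000 * PopNames$nameTotal / PopNames$total
print(PopNames)
```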

Analysis

totalBabies <-
  BabyNames %>%
  group_by(year) %>%
  summarise(total = sum(count))              # all babies born in each year

totalName <-
  BabyNames %>%
  group_by(name, year) %>%
  summarise(nameTotal = sum(count))          # babies per name per year, both sexes combined

PopNames <-
  totalName %>%
  inner_join(totalBabies, by = "year") %>%   # attach each year's total
  inner_join(ym, by = "name")                # keep only the movie names; add release year

# inner_join(ym) already restricts PopNames to the names in movieName
PopNames %>%
  mutate(per1000 = nameTotal / total * 1000) %>%   # babies per 1,000 births
  ggplot(aes(x = year, y = per1000)) +
  geom_line() +
  # one vertical line per panel at that movie's release year
  geom_vline(data = ym, aes(xintercept = year_movie)) +
  facet_wrap(~ name, labeller = movie_labeller) +
  scale_x_continuous(limits = c(1940, 2010), breaks = seq(1940, 2010, 10)) +
  labs(title = "Babies Given Names Popularized by Movies",
       x = "Year", y = "Babies per 1,000 Births") +
  theme(text = element_text(size = 11.5),
        plot.title = element_text(hjust = 0.5))

Discussion

  • In some cases, the film industry greatly influences the names given to babies.

  • Because the names were selected manually, there is room for a more systematic, in-depth study.

Future Study

  • Let R find the popular movies and popular names instead of doing it manually.

  • Scrape the Internet Movie Database (IMDb) to collect popular movies from each decade and then collect the names of important characters in each movie.

  • Join those character names to the BabyNames data table.

  • Write code that analyzes each of those names from each movie in each decade and determines how many of them became "popular."

  • Because "popular" is not a numerical variable, decide on a measurable definition of it (for example, a minimum number of births per 1,000).

  • Compare each name's popularity before and after the movie's release, using that definition.
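One way to make "popular" measurable is a threshold on births per 1,000. The sketch below assumes a hypothetical helper `became_popular()`, a made-up yearly series, and an arbitrary threshold of 2 per 1,000. It compares the average share before and after the release year (the release year itself is excluded) and flags a name only if it crossed the threshold after release but never before.

```r
# Hypothetical helper: compare a name's share before vs. after a movie release
became_popular <- function(years, per1000, release, threshold) {
  before <- per1000[years < release]   # release year itself is excluded
  after  <- per1000[years > release]
  list(
    mean_before = mean(before),
    mean_after  = mean(after),
    # TRUE only if the threshold was never reached before but was reached after
    became_popular = !any(before >= threshold) && any(after >= threshold)
  )
}

# Made-up yearly series for one name (births per 1,000)
years   <- 1990:1999
per1000 <- c(0.8, 0.9, 0.9, 1.0, 1.1, 1.6, 2.2, 2.5, 2.7, 2.9)

res <- became_popular(years, per1000, release = 1994, threshold = 2)
str(res)
```

The threshold and the before/after windows are design choices. A real study would need to justify them, for example by comparing against names with no associated movie.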

Questions?