607 Final Project

Brian Weinfeld

May 7th, 2018

What is the relationship between the Top Grossing Blockbusters of the last decade and the gender of the movie’s main stars?

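# Assumes library(tidyverse), library(RCurl) and library(XML) are loaded.
# Scrape the yearly Box Office Mojo chart and keep the top `rank` movies of each year.
Top.Movie.Query <- function(years, rank){
  years %>%
    map_df(~getURL(paste0('http://www.boxofficemojo.com/yearly/chart/?yr=', .x, '&p=.htm')) %>% 
              htmlParse() %>%
              xpathSApply('//*[@id="body"]/table[3]//tr//td', xmlValue) %>%
              .[15:914] %>%   # 900 cells: 100 movies with 9 cells each
              matrix(ncol=9, byrow=TRUE) %>%
              as.data.frame(stringsAsFactors=FALSE) %>%
              filter(row_number() <= rank) %>%
              # Strip a trailing " (year)" suffix from the title
              mutate(Movie=str_replace(V2, paste0('(.*?)( \\(', .x, '\\))$'), '\\1'),
                     Year = .x) %>%
              select(Movie, Year)
    )
}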
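
The title clean-up inside `Top.Movie.Query` can be checked offline. The sketch below applies the same regular expression with base R's `sub()`. The helper name `strip_year` and the sample titles are illustrative assumptions, not part of the original project.

```r
# Hypothetical helper: same pattern as in Top.Movie.Query, using base R.
strip_year <- function(title, year) {
  pattern <- paste0('(.*?)( \\(', year, '\\))$')
  sub(pattern, '\\1', title, perl = TRUE)
}

strip_year('Beauty and the Beast (2017)', 2017)  # trailing " (2017)" is removed
strip_year('Coco', 2017)                         # no suffix, so the title is unchanged
```

Only a suffix that matches the chart's own year is removed, so a title that legitimately ends in a different parenthesised number is left alone.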

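top.movies <- Top.Movie.Query(2017:2008, 50)

# Run this only after Movie.API.Query and its helpers (defined below) have been sourced.
all.movies <- map2_df(top.movies$Movie, top.movies$Year, ~Movie.API.Query(.x, .y))

# Assumes library(httr) and library(jsonlite) are loaded and `apikey` holds an OMDb API key.
# Look up one movie on OMDb and return a one-row tibble of its details.
Movie.API.Query <- function(movie, year){
  print(movie)
  initial.query <- GET('http://www.omdbapi.com/', 
      add_headers('Content-Type'='application/json', 'Accept-Encoding'='gzip'),
      query=list('t'=movie, 'apikey'=apikey, 'y'=year, 'plot'='full')
  ) %>%
  content(as='text') %>%
  fromJSON(flatten=FALSE) %>%
  .[-15] %>%   # drop the 15th element (assumed to be the nested Ratings list)
  as_tibble()
  # A failed lookup returns only two fields
  if(ncol(initial.query) == 2){
    print('Movie Not Found!')
    tibble(Title=movie, Year=as.character(year))
  }else{
    initial.query %>%
      select(c(1, 2, 3, 5, 6, 9, 10, 14, 15, 18, 21)) %>%
      # Keep the first genre and the numeric runtime; IMDB.Star.Query is defined elsewhere (not shown)
      mutate(Genre = str_extract(Genre, '([^,]+)'),
             Runtime = str_extract(Runtime, '(\\d+)'),
             Actors = IMDB.Star.Query(imdbID),
             BoxOffice = parse_number(BoxOffice)
      ) %>%
      separate(Actors, c('Lead_1', 'Lead_2'), sep=', ') %>%
      mutate(Lead_1_Male = Wikipedia.Gender.Query(Lead_1),
             Lead_2_Male = Wikipedia.Gender.Query(Lead_2)
      ) %>%
      # Place each gender flag next to its lead
      select(c(1:6, 13, 7, 14), everything())
  }
}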

# Guess a lead's gender from the first paragraphs of their Wikipedia article:
# TRUE = male ("actor" only), FALSE = female ("actress" only), NA = undetermined.
Wikipedia.Gender.Query <- function(lead){
  Sys.sleep(0.5)   # pause between requests to avoid hammering Wikipedia
  lead <- str_replace_all(lead, ' ', '_')
  initial.query <- getURL(paste0('https://en.wikipedia.org/wiki/', lead)) %>% 
            htmlParse() %>%
            xpathSApply('//*[@id="mw-content-text"]/div/p[position()<3]', xmlValue) %>%
            unlist()  %>%
            paste(collapse='')
  # Disambiguation page: retry with the "_(actor)" suffix
  if(str_detect(initial.query, 'may refer to:')){
    initial.query <- getURL(paste0('https://en.wikipedia.org/wiki/', lead, '_(actor)')) %>% 
      htmlParse() %>%
      xpathSApply('//*[@id="mw-content-text"]/div/p[position()<3]', xmlValue) %>%
      unlist() %>%
      paste(collapse='')
  }
  if(str_detect(initial.query, 'actor') && !str_detect(initial.query, 'actress')){
    return(TRUE)
  }else if(str_detect(initial.query, 'actress') && !str_detect(initial.query, 'actor')){
    return(FALSE)
  }else{
    return(NA)
  }
}
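
The decision rule at the end of `Wikipedia.Gender.Query` can be exercised without any network access. The following is a minimal sketch that isolates the same actor/actress check in a hypothetical helper, `classify.intro`, and runs it on made-up snippets of introductory text.

```r
# Hypothetical helper mirroring the if/else chain in Wikipedia.Gender.Query.
classify.intro <- function(text) {
  has.actor <- grepl('actor', text, fixed = TRUE)
  has.actress <- grepl('actress', text, fixed = TRUE)
  if (has.actor && !has.actress) {
    TRUE
  } else if (has.actress && !has.actor) {
    FALSE
  } else {
    NA
  }
}

classify.intro('is an English actor')        # only "actor" appears
classify.intro('is an American actress')     # only "actress" appears
classify.intro('an actor and an actress')    # both words appear
classify.intro('is a singer and comedian')   # neither word appears
```

The rule returns `NA` whenever the text is ambiguous, which is why some leads need manual review.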

| Title | Year | Rated | Runtime | Genre | Lead_1 | Lead_1_Male | Lead_2 | Lead_2_Male | BoxOffice | Type |
|---|---|---|---|---|---|---|---|---|---|---|
| The Dark Knight | 2008 | PG-13 | 152 | Action | Christian Bale | TRUE | Heath Ledger | TRUE | 533316061 | Male/Male |
| Avatar | 2009 | PG-13 | 162 | Action | Sam Worthington | TRUE | Zoe Saldana | FALSE | 749700000 | Male/Female |
| Marvel’s The Avengers | 2012 | PG-13 | 143 | Action | Robert Downey Jr. | TRUE | Chris Evans | TRUE | 623357910 | Male/Male |
| The Dark Knight Rises | 2012 | PG-13 | 164 | Action | Christian Bale | TRUE | Tom Hardy | TRUE | 448130642 | Male/Male |
| Star Wars: The Force Awakens | 2015 | PG-13 | 136 | Action | Daisy Ridley | FALSE | John Boyega | TRUE | 936658640 | Female/Male |
| Jurassic World | 2015 | PG-13 | 124 | Action | Chris Pratt | TRUE | Bryce Dallas Howard | FALSE | 528757749 | Male/Female |
| Rogue One: A Star Wars Story | 2016 | PG-13 | 133 | Action | Felicity Jones | FALSE | Diego Luna | TRUE | 532171696 | Female/Male |
| Finding Dory | 2016 | PG | 97 | Animation | Ellen DeGeneres | FALSE | Albert Brooks | TRUE | 486292984 | Female/Male |
| Star Wars: The Last Jedi | 2017 | PG-13 | 152 | Action | Daisy Ridley | FALSE | John Boyega | TRUE | 619117636 | Female/Male |
| Beauty and the Beast | 2017 | PG | 129 | Family | Emma Watson | FALSE | Dan Stevens | TRUE | 503974884 | Female/Male |

| Type | n |
|---|---|
| Female/Female | 32 |
| Female/Male | 79 |
| Male/Female | 169 |
| Male/Male | 220 |

## 
##  Chi-squared test for given probabilities
## 
## data:  movie.count$n
## X-squared = 173.81, df = 3, p-value < 2.2e-16
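
The goodness-of-fit result above can be reproduced from the four counts alone. When given a plain vector, `chisq.test()` tests against equal probabilities, so each lead pairing is expected 125 times out of 500. The variable name `counts` is an assumption for this sketch; the original code passed `movie.count$n`.

```r
# Observed number of movies for each lead pairing (from the Type/n counts above).
counts <- c('Female/Female' = 32, 'Female/Male' = 79,
            'Male/Female' = 169, 'Male/Male' = 220)

# With no `p` argument, the null hypothesis is that all four pairings are equally likely.
result <- chisq.test(counts)
result$expected    # expected count for each pairing under the null
result$statistic   # sum((observed - expected)^2 / expected)
result$p.value
```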

## 
##  Pearson's Chi-squared test
## 
## data:  gender.genre.compare
## X-squared = 94.273, df = 36, p-value = 4.09e-07

| Title | Year | BoxOffice | Type |
|---|---|---|---|
| Frozen | 2013 | $400,736,600 | Female/Female |
| Maleficent | 2014 | $190,871,149 | Female/Female |
| Cinderella | 2015 | $183,327,144 | Female/Female |
| The Help | 2011 | $169,705,587 | Female/Female |
| Hidden Figures | 2016 | $169,385,416 | Female/Female |

| Title | Year | BoxOffice | Type |
|---|---|---|---|
| Star Wars: The Force Awakens | 2015 | $936,658,640 | Female/Male |
| Star Wars: The Last Jedi | 2017 | $619,117,636 | Female/Male |
| Rogue One: A Star Wars Story | 2016 | $532,171,696 | Female/Male |
| Beauty and the Beast | 2017 | $503,974,884 | Female/Male |
| Finding Dory | 2016 | $486,292,984 | Female/Male |

| Title | Year | BoxOffice | Type |
|---|---|---|---|
| Avatar | 2009 | $749,700,000 | Male/Female |
| Jurassic World | 2015 | $528,757,749 | Male/Female |
| Transformers: Revenge of the Fallen | 2009 | $402,076,689 | Male/Female |
| Jumanji: Welcome to the Jungle | 2017 | $393,201,353 | Male/Female |
| Guardians of the Galaxy Vol. 2 | 2017 | $389,804,217 | Male/Female |

| Title | Year | BoxOffice | Type |
|---|---|---|---|
| Marvel’s The Avengers | 2012 | $623,357,910 | Male/Male |
| The Dark Knight | 2008 | $533,316,061 | Male/Male |
| The Dark Knight Rises | 2012 | $448,130,642 | Male/Male |
| Avengers: Age of Ultron | 2015 | $429,113,729 | Male/Male |
| Toy Story 3 | 2010 | $414,984,497 | Male/Male |

Shiny App (interactive; not rendered in this static document)

MATCH (a:Actor)-->(m:Movie)
WITH a, COUNT(m) AS c
ORDER BY c DESC
LIMIT 50

MATCH (a)-->(m)
RETURN a, m

| Type | word | n | tf | idf | tf_idf |
|---|---|---|---|---|---|
| Male/Male | frustration | 1 | 0.0003729 | 1.3862944 | 0.0005169 |
| Male/Male | mystery | 3 | 0.0011186 | 0.2876821 | 0.0003218 |
| Male/Male | prolong | 1 | 0.0003729 | 1.3862944 | 0.0005169 |
| Male/Male | colonel | 4 | 0.0014914 | 0.6931472 | 0.0010338 |
| Male/Male | major | 3 | 0.0011186 | 0.6931472 | 0.0007753 |
| Male/Male | drone | 1 | 0.0003729 | 1.3862944 | 0.0005169 |
| Male/Male | survive | 11 | 0.0041014 | 0.6931472 | 0.0028429 |
| Male/Male | specialist | 2 | 0.0007457 | 1.3862944 | 0.0010338 |
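
The tf-idf columns above follow the usual definitions: tf is a word's count divided by the total words in its group, idf is the natural log of the number of groups divided by the number of groups containing the word, and tf_idf is their product. The base R sketch below recomputes four of the rows. Two inputs are assumptions inferred from the table rather than taken from the original code: the Male/Male word total (2682, from n / tf) and the number of groups containing each word (from the idf values, with four groups in all).

```r
# Counts for the Male/Male group, copied from the table above.
words <- data.frame(
  word = c('frustration', 'mystery', 'colonel', 'survive'),
  n = c(1, 3, 4, 11),
  # Assumption: number of lead-pairing groups whose plots contain the word,
  # inferred from the idf column.
  groups.with.word = c(1, 3, 2, 2)
)
total.words <- 2682   # assumption: inferred from n / tf
n.groups <- 4         # Female/Female, Female/Male, Male/Female, Male/Male

words$tf <- words$n / total.words
words$idf <- log(n.groups / words$groups.with.word)
words$tf_idf <- words$tf * words$idf
print(words)
```

A word such as "mystery" that appears in three of the four groups gets a low idf, so it scores lower than rarer words even though it occurs more often.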

Amy Adams and Cameron Diaz star in “Action Blockbuster”! Frustrated by her commanding officer’s unwillingness to address an ongoing civil war on a foreign island nation, Major Jennifer Slater (Cameron Diaz) enlists the help of survival specialist Annie (Amy Adams). Together they journey to the secretive island in an effort to end the prolonged conflict. But what they discover there will shake the world to its very core. Can they solve the mystery of the island before Jennifer’s renegade Colonel can nuke the island via drone? You won’t want to miss a moment of “Action Blockbuster!”

Shiny App (interactive; not rendered in this static document)

Women are severely underrepresented in blockbuster movies. This is especially evident in movies that have two female leads: of the 500 movies examined, only 32 had two female leads, compared with 220 that had two male leads. When movies do star women, they are overwhelmingly likely to be either comedies aimed at women or fairy tales aimed at families. There is a severe lack of representation of women in action movies.