Brian Weinfeld
May 7th, 2018
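library(tidyverse) # purrr, dplyr, stringr, tidyr, readr, tibble
library(RCurl)     # getURL
library(XML)       # htmlParse, xpathSApply, xmlValue
library(httr)      # GET, add_headers, content
library(jsonlite)  # fromJSON

# Scrape the yearly Box Office Mojo chart and keep the top `rank` titles per year.
Top.Movie.Query <- function(years, rank){
  years %>%
    map_df(~getURL(paste0('http://www.boxofficemojo.com/yearly/chart/?yr=', .x, '&p=.htm')) %>%
      htmlParse() %>%
      xpathSApply('//*[@id="body"]/table[3]//tr//td', xmlValue) %>%
      .[15:914] %>%                       # 900 cells = 100 rows of 9 columns
      matrix(ncol=9, byrow=TRUE) %>%
      as.data.frame() %>%
      filter(row_number() <= rank) %>%
      # Strip a trailing ' (year)' suffix from the title, if present
      mutate(Movie=str_replace(V2, paste0('(.*?)( \\(', .x, '\\))$'), '\\1'),
             Year = .x) %>%
      select(Movie, Year)
    )
}
top.movies <- Top.Movie.Query(2017:2008, 50)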
all.movies <- map2_df(top.movies$Movie, top.movies$Year, ~Movie.API.Query(.x, .y))

Top.Movie.Query scrapes boxofficemojo.com for the names of the top 50 domestic-grossing movies of each year from 2008 through 2017.

Movie.API.Query <- function(movie, year){
  print(movie)
  # `apikey` must hold an OMDb API key defined elsewhere in the session
  initial.query <- GET('http://www.omdbapi.com/',
                       add_headers('Content-Type'='application/json', 'Accept-Encoding'='gzip'),
                       query=list('t'=movie, 'apikey'=apikey, 'y'=year, 'plot'='full')
    ) %>%
    content(as='text') %>%
    fromJSON(flatten=FALSE) %>%
    .[-15] %>%        # drop the 15th field before converting to a tibble
    as_tibble()
  # A response with only two fields is treated as a failed lookup
  if(ncol(initial.query) == 2){
    print('Movie Not Found!')
    tibble(Title=movie, Year=as.character(year))
  }else{
    initial.query %>%
      select(c(1, 2, 3, 5, 6, 9, 10, 14, 15, 18, 21)) %>%
      mutate(Genre = str_extract(Genre, '([^,]+)'),       # keep the first listed genre
             Runtime = str_extract(Runtime, '(\\d+)'),    # keep the number of minutes
             Actors = IMDB.Star.Query(imdbID),
             BoxOffice = parse_number(BoxOffice)
      ) %>%
      separate(Actors, c('Lead_1', 'Lead_2'), sep=', ') %>%
      mutate(Lead_1_Male = Wikipedia.Gender.Query(Lead_1),
             Lead_2_Male = Wikipedia.Gender.Query(Lead_2)
      ) %>%
      select(c(1:6, 13, 7, 14), everything())
  }
}

Movie.API.Query requests each movie from the OMDb API. It calls two other functions to fill in missing information: the stars of the movie and the genders of those stars.

Wikipedia.Gender.Query <- function(lead){
  Sys.sleep(0.5)  # pause between requests
  lead <- str_replace_all(lead, ' ', '_')
  # Read the first two paragraphs of the article as a single string
  initial.query <- getURL(paste0('https://en.wikipedia.org/wiki/', lead)) %>%
    htmlParse() %>%
    xpathSApply('//*[@id="mw-content-text"]/div/p[position()<3]', xmlValue) %>%
    unlist() %>%
    paste(collapse='')
  # On a disambiguation page, retry with the '_(actor)' suffix
  if(str_detect(initial.query, 'may refer to:')){
    initial.query <- getURL(paste0('https://en.wikipedia.org/wiki/', lead, '_(actor)')) %>%
      htmlParse() %>%
      xpathSApply('//*[@id="mw-content-text"]/div/p[position()<3]', xmlValue) %>%
      unlist() %>%
      paste(collapse='')
  }
  # TRUE = male, FALSE = female, NA = both or neither keyword found
  if(str_detect(initial.query, 'actor') && !str_detect(initial.query, 'actress')){
    return(TRUE)
  }else if(str_detect(initial.query, 'actress') && !str_detect(initial.query, 'actor')){
    return(FALSE)
  }else{
    return(NA)
  }
}

Wikipedia.Gender.Query scrapes Wikipedia in an effort to determine the gender of each star by looking for the words ‘actor’ or ‘actress’.

| Title | Year | Rated | Runtime | Genre | Lead_1 | Lead_1_Male | Lead_2 | Lead_2_Male | BoxOffice | Type |
|---|---|---|---|---|---|---|---|---|---|---|
| The Dark Knight | 2008 | PG-13 | 152 | Action | Christian Bale | TRUE | Heath Ledger | TRUE | 533316061 | Male/Male |
| Avatar | 2009 | PG-13 | 162 | Action | Sam Worthington | TRUE | Zoe Saldana | FALSE | 749700000 | Male/Female |
| Marvel’s The Avengers | 2012 | PG-13 | 143 | Action | Robert Downey Jr. | TRUE | Chris Evans | TRUE | 623357910 | Male/Male |
| The Dark Knight Rises | 2012 | PG-13 | 164 | Action | Christian Bale | TRUE | Tom Hardy | TRUE | 448130642 | Male/Male |
| Star Wars: The Force Awakens | 2015 | PG-13 | 136 | Action | Daisy Ridley | FALSE | John Boyega | TRUE | 936658640 | Female/Male |
| Jurassic World | 2015 | PG-13 | 124 | Action | Chris Pratt | TRUE | Bryce Dallas Howard | FALSE | 528757749 | Male/Female |
| Rogue One: A Star Wars Story | 2016 | PG-13 | 133 | Action | Felicity Jones | FALSE | Diego Luna | TRUE | 532171696 | Female/Male |
| Finding Dory | 2016 | PG | 97 | Animation | Ellen DeGeneres | FALSE | Albert Brooks | TRUE | 486292984 | Female/Male |
| Star Wars: The Last Jedi | 2017 | PG-13 | 152 | Action | Daisy Ridley | FALSE | John Boyega | TRUE | 619117636 | Female/Male |
| Beauty and the Beast | 2017 | PG | 129 | Family | Emma Watson | FALSE | Dan Stevens | TRUE | 503974884 | Female/Male |
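
The keyword rule behind the Lead_1_Male and Lead_2_Male flags can be tried without any network access. The sketch below is a minimal illustration in base R, assuming the article text has already been downloaded. `classify.lead.text` and `pair.type` are hypothetical helper names that do not appear in the original analysis, and the sample sentences are made up. The source does not show how the Type column was built, so the second helper is only one plausible construction.

```r
# Hypothetical helper: apply the actor/actress keyword rule to text that
# has already been fetched (no scraping happens here).
classify.lead.text <- function(text){
  has.actor   <- grepl('actor', text, fixed = TRUE)
  has.actress <- grepl('actress', text, fixed = TRUE)
  if(has.actor && !has.actress){
    TRUE          # only 'actor' appears: treated as male
  }else if(has.actress && !has.actor){
    FALSE         # only 'actress' appears: treated as female
  }else{
    NA            # both or neither keyword: undetermined
  }
}

# Hypothetical helper: combine two flags into a label such as 'Male/Female'.
# Assumption: this mirrors the Type column; the source does not show its code.
pair.type <- function(lead.1.male, lead.2.male){
  label <- function(is.male) ifelse(is.male, 'Male', 'Female')
  ifelse(is.na(lead.1.male) | is.na(lead.2.male),
         NA_character_,
         paste(label(lead.1.male), label(lead.2.male), sep = '/'))
}

# Made-up sample sentences standing in for scraped article text.
first  <- classify.lead.text('Jane Doe is an American actress and producer.')
second <- classify.lead.text('John Doe is an English actor.')
print(pair.type(first, second))
```

A page that mentions both words, or neither, yields NA, so any movie with an undetermined lead gets no Type label under this construction.
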
What is the relationship between the Top Grossing Blockbusters of the last decade and the gender of the movie’s main stars?
| Type | n |
|---|---|
| Female/Female | 32 |
| Female/Male | 79 |
| Male/Female | 169 |
| Male/Male | 220 |
##
## Chi-squared test for given probabilities
##
## data: movie.count$n
## X-squared = 173.81, df = 3, p-value < 2.2e-16
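
The statistic above can be reproduced from the four counts in the table. The sketch below rebuilds `movie.count` by hand rather than from the original pipeline and compares a manual calculation with `chisq.test`, which tests against equal probabilities when given a single vector of counts. That null hypothesis is an assumption here, although it does reproduce the reported value.

```r
movie.count <- data.frame(
  Type = c('Female/Female', 'Female/Male', 'Male/Female', 'Male/Male'),
  n    = c(32, 79, 169, 220)
)

# Under the null, each of the 4 pairings expects 500 / 4 = 125 movies.
expected <- sum(movie.count$n) / nrow(movie.count)

# Test statistic by hand: sum of (observed - expected)^2 / expected.
manual.statistic <- sum((movie.count$n - expected)^2 / expected)

result <- chisq.test(movie.count$n)

print(manual.statistic)            # 173.808
print(unname(result$statistic))    # same value, reported as 173.81 above
print(unname(result$parameter))    # 3 degrees of freedom
```
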
##
## Pearson's Chi-squared test
##
## data: gender.genre.compare
## X-squared = 94.273, df = 36, p-value = 4.09e-07
| Title | Year | BoxOffice | Type |
|---|---|---|---|
| Frozen | 2013 | $400,736,600 | Female/Female |
| Maleficent | 2014 | $190,871,149 | Female/Female |
| Cinderella | 2015 | $183,327,144 | Female/Female |
| The Help | 2011 | $169,705,587 | Female/Female |
| Hidden Figures | 2016 | $169,385,416 | Female/Female |

| Title | Year | BoxOffice | Type |
|---|---|---|---|
| Star Wars: The Force Awakens | 2015 | $936,658,640 | Female/Male |
| Star Wars: The Last Jedi | 2017 | $619,117,636 | Female/Male |
| Rogue One: A Star Wars Story | 2016 | $532,171,696 | Female/Male |
| Beauty and the Beast | 2017 | $503,974,884 | Female/Male |
| Finding Dory | 2016 | $486,292,984 | Female/Male |

| Title | Year | BoxOffice | Type |
|---|---|---|---|
| Avatar | 2009 | $749,700,000 | Male/Female |
| Jurassic World | 2015 | $528,757,749 | Male/Female |
| Transformers: Revenge of the Fallen | 2009 | $402,076,689 | Male/Female |
| Jumanji: Welcome to the Jungle | 2017 | $393,201,353 | Male/Female |
| Guardians of the Galaxy Vol. 2 | 2017 | $389,804,217 | Male/Female |

| Title | Year | BoxOffice | Type |
|---|---|---|---|
| Marvel’s The Avengers | 2012 | $623,357,910 | Male/Male |
| The Dark Knight | 2008 | $533,316,061 | Male/Male |
| The Dark Knight Rises | 2012 | $448,130,642 | Male/Male |
| Avengers: Age of Ultron | 2015 | $429,113,729 | Male/Male |
| Toy Story 3 | 2010 | $414,984,497 | Male/Male |
MATCH (a:Actor)-->(m:Movie)
WITH a, COUNT(m) AS c
ORDER BY c DESC
LIMIT 50
MATCH (a)-->(m)
RETURN a, m
| Type | word | n | tf | idf | tf_idf |
|---|---|---|---|---|---|
| Male/Male | frustration | 1 | 0.0003729 | 1.3862944 | 0.0005169 |
| Male/Male | mystery | 3 | 0.0011186 | 0.2876821 | 0.0003218 |
| Male/Male | prolong | 1 | 0.0003729 | 1.3862944 | 0.0005169 |
| Male/Male | colonel | 4 | 0.0014914 | 0.6931472 | 0.0010338 |
| Male/Male | major | 3 | 0.0011186 | 0.6931472 | 0.0007753 |
| Male/Male | drone | 1 | 0.0003729 | 1.3862944 | 0.0005169 |
| Male/Male | survive | 11 | 0.0041014 | 0.6931472 | 0.0028429 |
| Male/Male | specialist | 2 | 0.0007457 | 1.3862944 | 0.0010338 |
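
The columns in the table are related by tf_idf = tf × idf. The sketch below re-derives a few rows in base R. Two inputs are inferred from the table rather than stated in the source, so treat them as assumptions: a total of 2,682 counted words for the Male/Male group (the value for which 1 / total ≈ 0.0003729) and four documents in the corpus, presumably one per Type, so that idf = ln(4 / number of documents containing the word).

```r
# Assumptions inferred from the table above, not stated in the source.
total.words   <- 2682   # words counted in the Male/Male group
num.documents <- 4      # one document per Type

words <- data.frame(
  word         = c('frustration', 'mystery', 'colonel', 'survive'),
  n            = c(1, 3, 4, 11),   # occurrences within Male/Male
  docs.with.it = c(1, 3, 2, 2)     # inferred from the idf column
)

words$tf     <- words$n / total.words                      # term frequency
words$idf    <- log(num.documents / words$docs.with.it)    # natural log
words$tf_idf <- words$tf * words$idf

print(words)
```

A word such as 'mystery' that shows up in three of the four groups gets a low idf, which pulls its tf_idf down even though it is used more often than 'frustration'.
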
Amy Adams and Cameron Diaz star in “Action Blockbuster”! Frustrated by her commanding officer’s unwillingness to address an ongoing civil war on a foreign island nation, Major Jennifer Slater (Cameron Diaz) enlists the help of survival specialist Annie (Amy Adams). Together they journey to the secretive island in an effort to end the prolonged conflict. But what they discover there will shake the world to its very core. Can they solve the mystery of the island before Jennifer’s renegade Colonel can nuke the island via drone? You won’t want to miss a moment of “Action Blockbuster”!
Women are severely underrepresented in blockbuster movies. This is especially evident in movies that have two female leads. When movies do star women, they are overwhelmingly likely to be either comedies aimed at women or fairy tales aimed at families. There is a severe lack of representation of women in action movies.