Question 1: The budget for Abraham Lincoln: Vampire Hunter was 67.5 million dollars. Its domestic gross was 37.5 million dollars, and its international gross was 115.1 million dollars. The movie fails the Bechdel test: either it has fewer than two women, or its women don’t talk to each other or talk only about men. The result “dubious” means that the contributors were skeptical about whether the movie passed the test.
Question 2: The Bechdel test, invented by Alison Bechdel, checks whether a movie has two women who have a meaningful conversation that does not involve a man. Movies that fail this test either do not feature a conversation between two women that isn’t about a man or don’t have women in the movie at all. The author of the article found that the budgets of movies that pass the Bechdel test are much lower than those of movies that don’t. However, movies that pass the test may have a higher return on investment than those that don’t.
In this mini analysis we work with the data used in the
FiveThirtyEight story titled “The
Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”. Your
task is to fill in the blanks denoted by ___.
We start with loading the packages we’ll use.
library(fivethirtyeight)
library(tidyverse)
The dataset contains information on 1794 movies released between 1970 and 2013. However, we’ll focus our analysis on movies released between 1990 and 2013.
bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))
print(bechdel90_13)
## # A tibble: 1,615 × 15
## year imdb title test clean…¹ binary budget domgr…² intgr…³ code budge…⁴
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr> <int>
## 1 2013 tt1711… 21 &… nota… notalk FAIL 1.3 e7 2.57e7 4.22e7 2013… 1.3 e7
## 2 2012 tt1343… Dred… ok-d… ok PASS 4.5 e7 1.34e7 4.09e7 2012… 4.57e7
## 3 2013 tt2024… 12 Y… nota… notalk FAIL 2 e7 5.31e7 1.59e8 2013… 2 e7
## 4 2013 tt1272… 2 Gu… nota… notalk FAIL 6.1 e7 7.56e7 1.32e8 2013… 6.1 e7
## 5 2013 tt0453… 42 men men FAIL 4 e7 9.50e7 9.50e7 2013… 4 e7
## 6 2013 tt1335… 47 R… men men FAIL 2.25e8 3.84e7 1.46e8 2013… 2.25e8
## 7 2013 tt1606… A Go… nota… notalk FAIL 9.2 e7 6.73e7 3.04e8 2013… 9.2 e7
## 8 2013 tt2194… Abou… ok-d… ok PASS 1.2 e7 1.53e7 8.73e7 2013… 1.2 e7
## 9 2013 tt1814… Admi… ok ok PASS 1.3 e7 1.80e7 1.80e7 2013… 1.3 e7
## 10 2013 tt1815… Afte… nota… notalk FAIL 1.3 e8 6.05e7 2.44e8 2013… 1.3 e8
## # … with 1,605 more rows, 4 more variables: domgross_2013 <dbl>,
## # intgross_2013 <dbl>, period_code <int>, decade_code <int>, and abbreviated
## # variable names ¹clean_test, ²domgross, ³intgross, ⁴budget_2013
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
There are 1615 such movies.
The financial variables we’ll focus on are the following:
# Movies released in 2005 only
bechdel2005 <- bechdel %>%
  filter(year == 2005)
print(bechdel2005)
## # A tibble: 100 × 15
## year imdb title test clean…¹ binary budget domgr…² intgr…³ code budge…⁴
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr> <int>
## 1 2005 tt0402… AEon… ok ok PASS 5.5 e7 2.59e7 4.80e7 2005… 6.56e7
## 2 2005 tt0398… Assa… ok ok PASS 3 e7 2.00e7 3.60e7 2005… 3.58e7
## 3 2005 tt0372… Batm… nota… notalk FAIL 1.5 e8 2.05e8 3.73e8 2005… 1.79e8
## 4 2005 tt0388… Beau… ok ok PASS 2.5 e7 3.64e7 3.84e7 2005… 2.98e7
## 5 2005 tt0374… Bewi… ok ok PASS 8 e7 6.33e7 1.31e8 2005… 9.54e7
## 6 2005 tt0383… Bloo… dubi… dubious FAIL 2.5 e7 2.41e6 3.61e6 2005… 2.98e7
## 7 2005 tt0357… Boog… men men FAIL 2 e7 4.68e7 6.72e7 2005… 2.39e7
## 8 2005 tt0439… Boyn… ok ok PASS 2.9 e6 3.13e6 3.13e6 2005… 3.46e6
## 9 2005 tt0393… Brick nota… notalk FAIL 4.5 e5 2.08e6 4.09e6 2005… 5.37e5
## 10 2005 tt0388… Brok… nota… notalk FAIL 1.39e7 8.30e7 1.74e8 2005… 1.66e7
## # … with 90 more rows, 4 more variables: domgross_2013 <dbl>,
## # intgross_2013 <dbl>, period_code <int>, decade_code <int>, and abbreviated
## # variable names ¹clean_test, ²domgross, ³intgross, ⁴budget_2013
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
budget_2013: Budget in 2013 inflation adjusted dollars
domgross_2013: Domestic gross (US) in 2013 inflation adjusted dollars
intgross_2013: Total International (i.e., worldwide) gross in 2013 inflation adjusted dollars
And we’ll also use the binary and clean_test variables for grouping.
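Before grouping by these two variables, it can help to see how they relate to each other. The chunk below is a minimal sketch, not part of the original lab: it rebuilds the 1990 to 2013 subset so it runs on its own and counts movies per combination of binary and clean_test. The name test_counts is an assumption introduced here for illustration.

```r
library(fivethirtyeight)
library(dplyr)

# Rebuild the 1990-2013 subset so this chunk is self-contained
bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

# One row per (binary, clean_test) combination; n is the number of movies
# test_counts is a hypothetical name used only in this sketch
test_counts <- bechdel90_13 %>%
  count(binary, clean_test)

print(test_counts)
```

Each row of the result shows which detailed result falls under which PASS/FAIL label, and how many movies it covers.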
Let’s take a look at how median budget and gross vary by whether the
movie passed the Bechdel test, which is stored in the
binary variable.
bechdel90_13 %>%
  group_by(binary) %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
## # A tibble: 2 × 4
## binary med_budget med_domgross med_intgross
## <chr> <dbl> <dbl> <dbl>
## 1 FAIL 48385984. 57318606. 104475669
## 2 PASS 31070724 45330446. 80124349
Next, let’s take a look at how median budget and gross vary by a more
detailed indicator of the Bechdel test result. This information is
stored in the clean_test variable, which takes on the
following values:
ok = passes test
dubious
men = women only talk about men
notalk = women don’t talk to each other
nowomen = fewer than two women
bechdel90_13 %>%
  #group_by(clean_test) %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
## # A tibble: 1 × 3
## med_budget med_domgross med_intgross
## <int> <dbl> <dbl>
## 1 37878971 52270207 93523336
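The summary above returns a single row because the group_by(clean_test) line is commented out, so the medians are computed over all 1615 movies rather than per test result. A sketch of the grouped version, assuming the same data and column names as above, could look like this. The name by_clean_test is an assumption introduced here, and no output is shown because the chunk was not run for this report.

```r
library(fivethirtyeight)
library(dplyr)

# Rebuild the 1990-2013 subset so this chunk is self-contained
bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

# Medians per detailed Bechdel result (one row per clean_test value)
# by_clean_test is a hypothetical name used only in this sketch
by_clean_test <- bechdel90_13 %>%
  group_by(clean_test) %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )

print(by_clean_test)
```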
In order to evaluate how return on investment varies among movies
that pass and fail the Bechdel test, we’ll first create a new variable
called roi, the ratio of gross (international plus domestic) to budget.
bechdel90_13 <- bechdel90_13 %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)
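To make the arithmetic concrete, here is a small worked sketch of the same formula on two made-up movies. The tibble toy_movies and all of its values are hypothetical and are not taken from the Bechdel data.

```r
library(dplyr)

# Hypothetical movies; all amounts are made up, in millions of dollars
toy_movies <- tibble(
  title = c("Movie A", "Movie B"),
  budget_2013 = c(10, 20),
  domgross_2013 = c(15, 10),
  intgross_2013 = c(25, 20)
)

# Same definition as in the analysis: combined gross divided by budget
toy_movies <- toy_movies %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)

print(toy_movies$roi)  # (25 + 15) / 10 = 4 and (20 + 10) / 20 = 1.5
```

An roi of 4 means the movie grossed four times its budget under this definition. If intgross_2013 is a worldwide total, as its description suggests, the numerator counts domestic gross twice, so the values are best read as a relative measure for comparing groups.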
Let’s see which movies have the highest return on investment.
bechdel90_13 %>%
  arrange(desc(roi)) %>%
  select(title, roi, year)
## # A tibble: 1,615 × 3
## title roi year
## <chr> <dbl> <int>
## 1 Paranormal Activity 671. 2007
## 2 The Blair Witch Project 648. 1999
## 3 El Mariachi 583. 1992
## 4 Clerks. 258. 1994
## 5 In the Company of Men 231. 1997
## 6 Napoleon Dynamite 227. 2004
## 7 Once 190. 2006
## 8 The Devil Inside 155. 2012
## 9 Primer 142. 2004
## 10 Fireproof 134. 2008
## # … with 1,605 more rows
## # ℹ Use `print(n = ...)` to see more rows
Below is a visualization of the return on investment by test result. However, it’s difficult to see the distributions due to a few extreme observations.
ggplot(data = bechdel90_13,
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  )
What are those movies with very high returns on investment?
bechdel90_13 %>%
  filter(roi > 400) %>%
  select(title, budget_2013, domgross_2013, year)
## # A tibble: 3 × 4
## title budget_2013 domgross_2013 year
## <chr> <int> <dbl> <int>
## 1 Paranormal Activity 505595 121251476 2007
## 2 The Blair Witch Project 839077 196538593 1999
## 3 El Mariachi 11622 3388636 1992
Zooming in on the movies with roi < 15 provides a
better view of how the medians across the categories compare:
ggplot(data = bechdel90_13,
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    subtitle = "Zoomed in to ROI values between 0 and 15",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  ) +
  # coord_cartesian() only changes the visible window; no rows are dropped
  coord_cartesian(ylim = c(0, 15))
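The plot above zooms with coord_cartesian(), which changes only the visible range, so the boxplot medians are still computed from every movie, including the extreme ones. Filtering out the high-roi rows first would give different medians. The base R sketch below illustrates the difference on a made-up vector; the values in roi_toy are hypothetical and are not from the data.

```r
# Hypothetical roi values with one extreme observation
roi_toy <- c(1, 2, 3, 4, 500)

# Median over all values: what a boxplot shows when zoomed with coord_cartesian()
median_all <- median(roi_toy)

# Median after dropping values of 15 or more: what filtering first would show
median_trimmed <- median(roi_toy[roi_toy < 15])

print(median_all)      # 3
print(median_trimmed)  # 2.5
```

Zooming keeps the summary statistics honest about the full data, at the cost of hiding the outliers from view.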