In this mini analysis we work with the data used in the
FiveThirtyEight story titled “The
Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”. Your
task is to fill in the blanks denoted by ___.
We start with loading the packages we’ll use.
library(fivethirtyeight)
library(tidyverse)
The dataset contains information on 1794 movies released between 1970 and 2013. However, we’ll focus our analysis on movies released between 1990 and 2013.
bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))
print(bechdel90_13)
## # A tibble: 1,615 × 15
## year imdb title test clean…¹ binary budget domgr…² intgr…³ code budge…⁴
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr> <int>
## 1 2013 tt1711… 21 &… nota… notalk FAIL 1.3 e7 2.57e7 4.22e7 2013… 1.3 e7
## 2 2012 tt1343… Dred… ok-d… ok PASS 4.5 e7 1.34e7 4.09e7 2012… 4.57e7
## 3 2013 tt2024… 12 Y… nota… notalk FAIL 2 e7 5.31e7 1.59e8 2013… 2 e7
## 4 2013 tt1272… 2 Gu… nota… notalk FAIL 6.1 e7 7.56e7 1.32e8 2013… 6.1 e7
## 5 2013 tt0453… 42 men men FAIL 4 e7 9.50e7 9.50e7 2013… 4 e7
## 6 2013 tt1335… 47 R… men men FAIL 2.25e8 3.84e7 1.46e8 2013… 2.25e8
## 7 2013 tt1606… A Go… nota… notalk FAIL 9.2 e7 6.73e7 3.04e8 2013… 9.2 e7
## 8 2013 tt2194… Abou… ok-d… ok PASS 1.2 e7 1.53e7 8.73e7 2013… 1.2 e7
## 9 2013 tt1814… Admi… ok ok PASS 1.3 e7 1.80e7 1.80e7 2013… 1.3 e7
## 10 2013 tt1815… Afte… nota… notalk FAIL 1.3 e8 6.05e7 2.44e8 2013… 1.3 e8
## # … with 1,605 more rows, 4 more variables: domgross_2013 <dbl>,
## # intgross_2013 <dbl>, period_code <int>, decade_code <int>, and abbreviated
## # variable names ¹clean_test, ²domgross, ³intgross, ⁴budget_2013
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
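As a quick check on the filter above, the sketch below confirms that `between()` includes both endpoints, so it should select the same rows as an explicit pair of comparisons. The object names `with_between` and `with_compare` are illustrative only and do not appear in the original analysis.

```r
library(fivethirtyeight)
library(dplyr)

# between(x, left, right) is shorthand for x >= left & x <= right
with_between <- bechdel %>%
  filter(between(year, 1990, 2013))

with_compare <- bechdel %>%
  filter(year >= 1990, year <= 2013)

# Both filters should keep the same movies (compared by IMDb id)
same_rows <- identical(with_between$imdb, with_compare$imdb)
same_rows
nrow(with_between)
```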
Abraham Lincoln: Vampire Hunter is an action, fantasy, and horror film released in 2012 and evaluated on BechdelTest.com. The movie had a budget of $67,500,000, a domestic (U.S. and Canada) gross of $37,519,139, and an international gross of $115,119,139. According to BechdelTest.com, the result for Abraham Lincoln: Vampire Hunter is dubious, meaning some contributors were skeptical about whether it passed the Bechdel test, so it is counted as not passing. These figures are consistent with FiveThirtyEight’s return on investment analysis, which suggests a lower return on investment for movies that do not pass the Bechdel test, although a single film cannot prove that pattern.
The Bechdel test originated in a 1985 comic strip by cartoonist Alison Bechdel and is used as a measure of gender bias in Hollywood films. To pass, a film must have at least two named women, those women must talk to each other, and their conversation must be about something other than a man. A film that passes the Bechdel test gives its female characters only the bare minimum of depth. Unfortunately, a substantial share of U.S. films fail the Bechdel test: no more than 50% of films each year have passed. There are also misconceptions about female characters in film, and much of the film industry is said to believe that films with strong female characters hinder audience enjoyment. However, the data provided by BechdelTest.com suggest a better return on investment for films that pass the Bechdel test.
bechdel90_13 <- bechdel %>%
  filter(between(year, 2006, 2006))
print(bechdel90_13)
## # A tibble: 90 × 15
## year imdb title test clean…¹ binary budget domgr…² intgr…³ code budge…⁴
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr> <int>
## 1 2006 tt0416… 300 nowo… nowomen FAIL 6 e7 2.11e8 4.54e8 2006… 6.93e7
## 2 2006 tt0405… A Sc… nota… notalk FAIL 2 e7 5.50e6 7.41e6 2006… 2.31e7
## 3 2006 tt0437… Akee… ok ok PASS 8 e6 1.88e7 1.90e7 2006… 9.25e6
## 4 2006 tt0429… Aqua… ok ok PASS 1.2e7 1.86e7 2.30e7 2006… 1.39e7
## 5 2006 tt0416… Band… ok ok PASS 3.5e7 NA 1.84e7 2006… 4.05e7
## 6 2006 tt0454… Blac… ok ok PASS 9 e6 1.62e7 1.62e7 2006… 1.04e7
## 7 2006 tt0450… Bloo… nota… notalk FAIL 1 e8 5.74e7 1.71e8 2006… 1.16e8
## 8 2006 tt0479… Bon … nota… notalk FAIL 8 e6 1.27e7 1.27e7 2006… 9.25e6
## 9 2006 tt0443… Bora… nota… notalk FAIL 1.8e7 1.29e8 2.62e8 2006… 2.08e7
## 10 2006 tt0470… Bug ok ok PASS 4 e6 7.01e6 7.01e6 2006… 4.62e6
## # … with 80 more rows, 4 more variables: domgross_2013 <dbl>,
## # intgross_2013 <dbl>, period_code <int>, decade_code <int>, and abbreviated
## # variable names ¹clean_test, ²domgross, ³intgross, ⁴budget_2013
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
There are 1615 movies released between 1990 and 2013; the 2006 subset printed above contains 90 of them.
The financial variables we’ll focus on are the following:
budget_2013: Budget in 2013 inflation adjusted dollars
domgross_2013: Domestic gross (US) in 2013 inflation adjusted dollars
intgross_2013: Total International (i.e., worldwide) gross in 2013 inflation adjusted dollars
And we’ll also use the binary and clean_test variables for grouping.
Let’s take a look at how median budget and gross vary by whether the
movie passed the Bechdel test, which is stored in the
binary variable.
bechdel90_13 %>%
  group_by(binary) %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
## # A tibble: 2 × 4
## binary med_budget med_domgross med_intgross
## <chr> <int> <dbl> <dbl>
## 1 FAIL 36985483 50329573 96846559
## 2 PASS 23115927 20238130 41482984
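Medians are easier to interpret alongside group sizes. The sketch below counts how many movies fall into each `binary` group and what share of the total each group represents. It rebuilds the 1990 to 2013 subset under the illustrative name `bechdel_subset` so that the block runs on its own; the names `n_movies`, `share`, and `group_sizes` are likewise made up for this example.

```r
library(fivethirtyeight)
library(dplyr)

# Rebuild the subset so this block is self-contained
bechdel_subset <- bechdel %>%
  filter(between(year, 1990, 2013))

# Number of movies per PASS/FAIL group and each group's share of the total
group_sizes <- bechdel_subset %>%
  count(binary, name = "n_movies") %>%
  mutate(share = n_movies / sum(n_movies))

group_sizes
```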
Next, let’s take a look at how median budget and gross vary by a more
detailed indicator of the Bechdel test result. This information is
stored in the clean_test variable, which takes on the
following values:
ok = passes test
dubious = some BechdelTest.com contributors were skeptical about whether the films in question passed the test
men = women only talk about men
notalk = women don’t talk to each other
nowomen = fewer than two women
bechdel90_13 %>%
  #group_by(med_budget) %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
## # A tibble: 1 × 3
## med_budget med_domgross med_intgross
## <dbl> <dbl> <dbl>
## 1 30050705 39245050. 66016570.
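The summary above is computed over all movies at once, because the `group_by()` line is commented out, so it returns a single row. A sketch of the grouped version that the text describes follows, assuming `clean_test` is the intended grouping variable. The data frame is rebuilt here as `bechdel_subset` (an illustrative name) for the 1990 to 2013 range so that the block runs on its own, which means its numbers will not match output computed on a different subset.

```r
library(fivethirtyeight)
library(dplyr)

# Rebuild the subset so this block is self-contained
bechdel_subset <- bechdel %>%
  filter(between(year, 1990, 2013))

# One row of medians per detailed Bechdel result
by_clean_test <- bechdel_subset %>%
  group_by(clean_test) %>%
  summarise(
    med_budget   = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )

by_clean_test
```

Grouping by `clean_test` rather than by a summary column such as `med_budget` matters because `group_by()` needs a variable that already exists in the data before `summarise()` runs.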
In order to evaluate how return on investment varies among movies
that pass and fail the Bechdel test, we’ll first create a new variable
called roi as the ratio of gross (international plus domestic) to budget.
bechdel90_13 <- bechdel90_13 %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)
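To make the definition concrete, here is a small worked example on made-up numbers. The titles and dollar amounts in `toy_movies` are hypothetical and are not taken from the dataset. The sketch applies the same formula as above: international gross plus domestic gross, divided by budget.

```r
library(dplyr)
library(tibble)

# Hypothetical movies; amounts are in dollars and invented for illustration
toy_movies <- tibble(
  title         = c("Film A", "Film B"),
  budget_2013   = c(10e6, 50e6),
  domgross_2013 = c(20e6, 25e6),
  intgross_2013 = c(50e6, 75e6)
)

# Same roi formula as in the analysis
toy_roi <- toy_movies %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)

# Film A: (50 + 20) / 10 = 7; Film B: (75 + 25) / 50 = 2
toy_roi
```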
Let’s see which movies have the highest return on investment.
bechdel90_13 %>%
  arrange(desc(roi)) %>%
  select(title, roi, year)
## # A tibble: 90 × 3
## title roi year
## <chr> <dbl> <int>
## 1 Once 190. 2006
## 2 Das Leben Der Anderen 46.2 2006
## 3 Borat: Cultural Learnings of America for Make Benefit Glorious N… 21.7 2006
## 4 Little Miss Sunshine 20.1 2006
## 5 Jackass Number Two 14.4 2006
## 6 The Devil Wears Prada 12.9 2006
## 7 The Queen 12.4 2006
## 8 Ice Age: The Meltdown 11.3 2006
## 9 300 11.1 2006
## 10 Quinceanera 10.5 2006
## # … with 80 more rows
## # ℹ Use `print(n = ...)` to see more rows
Below is a visualization of the return on investment by test result. However, it’s difficult to see the distributions due to a few extreme observations.
ggplot(data = bechdel90_13,
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  )
What are those movies with very high returns on investment?
bechdel90_13 %>%
  filter(roi > 400) %>%
  select(title, budget_2013, domgross_2013, year)
## # A tibble: 0 × 4
## # … with 4 variables: title <chr>, budget_2013 <int>, domgross_2013 <dbl>,
## # year <int>
## # ℹ Use `colnames()` to see all variable names
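The filter above returns no rows because no movie in this subset has an roi above 400; the ranking shown earlier tops out at about 190. Rather than guessing a cutoff, one option is to pull the top few rows directly. The sketch below uses `slice_max()` from dplyr on made-up data; the `toy_movies` titles and values are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical roi values, invented for illustration
toy_movies <- tibble(
  title = c("Film A", "Film B", "Film C", "Film D"),
  roi   = c(7, 2, 190, 46.2)
)

# Keep the two movies with the highest roi, without choosing a threshold
top_two <- toy_movies %>%
  slice_max(roi, n = 2)

top_two
```

The same call on the real data frame would look like `slice_max(roi, n = 2)` applied after the `mutate()` step that creates roi.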
print(bechdel90_13)
## # A tibble: 90 × 16
## year imdb title test clean…¹ binary budget domgr…² intgr…³ code budge…⁴
## <int> <chr> <chr> <chr> <ord> <chr> <int> <dbl> <dbl> <chr> <int>
## 1 2006 tt0416… 300 nowo… nowomen FAIL 6 e7 2.11e8 4.54e8 2006… 6.93e7
## 2 2006 tt0405… A Sc… nota… notalk FAIL 2 e7 5.50e6 7.41e6 2006… 2.31e7
## 3 2006 tt0437… Akee… ok ok PASS 8 e6 1.88e7 1.90e7 2006… 9.25e6
## 4 2006 tt0429… Aqua… ok ok PASS 1.2e7 1.86e7 2.30e7 2006… 1.39e7
## 5 2006 tt0416… Band… ok ok PASS 3.5e7 NA 1.84e7 2006… 4.05e7
## 6 2006 tt0454… Blac… ok ok PASS 9 e6 1.62e7 1.62e7 2006… 1.04e7
## 7 2006 tt0450… Bloo… nota… notalk FAIL 1 e8 5.74e7 1.71e8 2006… 1.16e8
## 8 2006 tt0479… Bon … nota… notalk FAIL 8 e6 1.27e7 1.27e7 2006… 9.25e6
## 9 2006 tt0443… Bora… nota… notalk FAIL 1.8e7 1.29e8 2.62e8 2006… 2.08e7
## 10 2006 tt0470… Bug ok ok PASS 4 e6 7.01e6 7.01e6 2006… 4.62e6
## # … with 80 more rows, 5 more variables: domgross_2013 <dbl>,
## # intgross_2013 <dbl>, period_code <int>, decade_code <int>, roi <dbl>, and
## # abbreviated variable names ¹clean_test, ²domgross, ³intgross, ⁴budget_2013
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Zooming in on the movies with roi < ___ provides a
better view of how the medians across the categories compare:
ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    subtitle = "Zoomed in to ROI of 15 or less", # matches the ylim upper bound below
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  ) +
  coord_cartesian(ylim = c(0, 15))
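`coord_cartesian()` zooms the view without discarding data, so the boxplot medians are still computed from every movie. Setting limits on the scale instead (for example with `ylim()`) drops out-of-range values before the statistics are computed. The sketch below compares the two on made-up values; the `toy` data frame and the object names are hypothetical.

```r
library(ggplot2)

# Hypothetical roi values with one extreme observation
toy <- data.frame(group = "a", roi = c(1, 2, 3, 4, 100))

p <- ggplot(toy, aes(x = group, y = roi)) +
  geom_boxplot()

# Zoom only: all five values still feed the boxplot statistics
zoomed <- p + coord_cartesian(ylim = c(0, 15))

# Scale limits: the value 100 is removed before the median is computed
clipped <- p + ylim(0, 15)

# layer_data() returns the statistics ggplot2 computed for the layer;
# the `middle` column holds the boxplot median
median_zoomed  <- layer_data(zoomed)$middle
median_clipped <- suppressWarnings(layer_data(clipped)$middle)

c(zoomed = median_zoomed, clipped = median_clipped)
```

Because the goal here is to compare medians across categories, zooming with `coord_cartesian()` is the safer choice: the medians on screen stay the medians of the full data.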