In this mini analysis we work with the data used in the FiveThirtyEight story titled “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”. Your task is to fill in the blanks denoted by ___.

Data and packages

We start by loading the packages we’ll use.

library(fivethirtyeight)
library(tidyverse)

The dataset contains information on 1794 movies released between 1970 and 2013. However, we’ll focus our analysis on movies released between 1990 and 2013.

bechdel90_13 <- bechdel %>% 
  filter(between(year, 1990, 2013))
print(bechdel90_13)
## # A tibble: 1,615 × 15
##     year imdb    title test  clean…¹ binary budget domgr…² intgr…³ code  budge…⁴
##    <int> <chr>   <chr> <chr> <ord>   <chr>   <int>   <dbl>   <dbl> <chr>   <int>
##  1  2013 tt1711… 21 &… nota… notalk  FAIL   1.3 e7  2.57e7  4.22e7 2013…  1.3 e7
##  2  2012 tt1343… Dred… ok-d… ok      PASS   4.5 e7  1.34e7  4.09e7 2012…  4.57e7
##  3  2013 tt2024… 12 Y… nota… notalk  FAIL   2   e7  5.31e7  1.59e8 2013…  2   e7
##  4  2013 tt1272… 2 Gu… nota… notalk  FAIL   6.1 e7  7.56e7  1.32e8 2013…  6.1 e7
##  5  2013 tt0453… 42    men   men     FAIL   4   e7  9.50e7  9.50e7 2013…  4   e7
##  6  2013 tt1335… 47 R… men   men     FAIL   2.25e8  3.84e7  1.46e8 2013…  2.25e8
##  7  2013 tt1606… A Go… nota… notalk  FAIL   9.2 e7  6.73e7  3.04e8 2013…  9.2 e7
##  8  2013 tt2194… Abou… ok-d… ok      PASS   1.2 e7  1.53e7  8.73e7 2013…  1.2 e7
##  9  2013 tt1814… Admi… ok    ok      PASS   1.3 e7  1.80e7  1.80e7 2013…  1.3 e7
## 10  2013 tt1815… Afte… nota… notalk  FAIL   1.3 e8  6.05e7  2.44e8 2013…  1.3 e8
## # … with 1,605 more rows, 4 more variables: domgross_2013 <dbl>,
## #   intgross_2013 <dbl>, period_code <int>, decade_code <int>, and abbreviated
## #   variable names ¹​clean_test, ²​domgross, ³​intgross, ⁴​budget_2013
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Summary of Abraham Lincoln: Vampire Hunter (2012)

Abraham Lincoln: Vampire Hunter is an action, fantasy, and horror film released in 2012 and evaluated on BechdelTest.com. The movie had a budget of $67,500,000, a domestic (U.S. and Canada) gross of $37,519,139, and an international gross of $115,119,139. BechdelTest.com rates it “dubious”, meaning some contributors were skeptical that it passes the Bechdel test, so FiveThirtyEight’s binary classification counts it as a failure. It is one example of the kind of movie that, according to FiveThirtyEight’s return on investment figures, tends to earn a lower return on investment, although a single film cannot by itself confirm that pattern.
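As a quick worked example, here is the same formula this analysis later uses for roi, (international gross + domestic gross) / budget, applied to the nominal figures quoted above. Because all three figures come from the same year, converting them to 2013 dollars would scale each by the same factor and leave the ratio unchanged (assuming the adjustment is a single year-based inflation factor). The variable names are introduced here for illustration.

```r
# Figures for Abraham Lincoln: Vampire Hunter as quoted above (nominal dollars)
budget   <- 67500000
domgross <- 37519139
intgross <- 115119139

# Same formula as the roi variable defined later: (intgross + domgross) / budget
roi_lincoln <- (intgross + domgross) / budget
round(roi_lincoln, 2)  # about 2.26
```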

Summary of Bechdel Article

The Bechdel test, which originated in a 1985 comic strip by cartoonist Alison Bechdel, is a measure of gender bias in films. To pass, a film must have at least two named women who talk to each other about something other than a man. Passing indicates only that a film’s female characters have a bare minimum of depth. Even so, many U.S. films fail the test: no more than 50% of films in any given year have passed. There are also misconceptions about female characters, and much of the film industry believes that films with strong female characters hinder audience enjoyment. However, FiveThirtyEight’s analysis, which combines BechdelTest.com ratings with box-office data, found a better return on investment for films that pass the Bechdel test.
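In the fivethirtyeight data, the three criteria line up with the clean_test categories: nowomen fails the first, notalk the second, and men the third, while dubious marks contested ratings and ok marks a clean pass. The sketch below assumes that binary is PASS exactly when clean_test is "ok" (so dubious counts as FAIL), derives that label, and checks it against the dataset’s own binary column. The names binary_derived, derived, and mismatches are introduced here for illustration.

```r
library(fivethirtyeight)
library(tidyverse)

# Assumption: PASS only if all three criteria are met, i.e. clean_test is "ok";
# "dubious" ratings are treated as FAIL
derived <- bechdel %>%
  mutate(binary_derived = if_else(clean_test == "ok", "PASS", "FAIL"))

# Movies where the derived label disagrees with the dataset's binary column
mismatches <- derived %>%
  filter(binary_derived != binary)

nrow(mismatches)
```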

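To take a closer look at a single year, we restrict the data to movies released in 2006.

# Note: this overwrites bechdel90_13, so every result below is based on the
# 90 movies released in 2006 rather than the full 1990-2013 subset
bechdel90_13 <- bechdel %>% 
  filter(year == 2006)
print(bechdel90_13)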
## # A tibble: 90 × 15
##     year imdb    title test  clean…¹ binary budget domgr…² intgr…³ code  budge…⁴
##    <int> <chr>   <chr> <chr> <ord>   <chr>   <int>   <dbl>   <dbl> <chr>   <int>
##  1  2006 tt0416… 300   nowo… nowomen FAIL    6  e7  2.11e8  4.54e8 2006…  6.93e7
##  2  2006 tt0405… A Sc… nota… notalk  FAIL    2  e7  5.50e6  7.41e6 2006…  2.31e7
##  3  2006 tt0437… Akee… ok    ok      PASS    8  e6  1.88e7  1.90e7 2006…  9.25e6
##  4  2006 tt0429… Aqua… ok    ok      PASS    1.2e7  1.86e7  2.30e7 2006…  1.39e7
##  5  2006 tt0416… Band… ok    ok      PASS    3.5e7 NA       1.84e7 2006…  4.05e7
##  6  2006 tt0454… Blac… ok    ok      PASS    9  e6  1.62e7  1.62e7 2006…  1.04e7
##  7  2006 tt0450… Bloo… nota… notalk  FAIL    1  e8  5.74e7  1.71e8 2006…  1.16e8
##  8  2006 tt0479… Bon … nota… notalk  FAIL    8  e6  1.27e7  1.27e7 2006…  9.25e6
##  9  2006 tt0443… Bora… nota… notalk  FAIL    1.8e7  1.29e8  2.62e8 2006…  2.08e7
## 10  2006 tt0470… Bug   ok    ok      PASS    4  e6  7.01e6  7.01e6 2006…  4.62e6
## # … with 80 more rows, 4 more variables: domgross_2013 <dbl>,
## #   intgross_2013 <dbl>, period_code <int>, decade_code <int>, and abbreviated
## #   variable names ¹​clean_test, ²​domgross, ³​intgross, ⁴​budget_2013
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

The 1990 to 2013 subset contains 1615 movies, and the 2006 subset printed above contains 90.
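Both counts can be confirmed directly from the row counts of the two filters. A minimal sketch; n_90_13 and n_2006 are names introduced here.

```r
library(fivethirtyeight)
library(tidyverse)

# Number of movies in the 1990-2013 window and in 2006 alone
n_90_13 <- bechdel %>% filter(between(year, 1990, 2013)) %>% nrow()
n_2006  <- bechdel %>% filter(year == 2006) %>% nrow()

c(n_90_13, n_2006)  # 1615 and 90, matching the tibble headers printed above
```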

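The financial variables we’ll focus on are the following:

- budget_2013: budget, in 2013 inflation-adjusted dollars
- domgross_2013: domestic (U.S. and Canada) gross, in 2013 inflation-adjusted dollars
- intgross_2013: total international (i.e., worldwide) gross, in 2013 inflation-adjusted dollars

We’ll also use the binary and clean_test variables for grouping.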
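These columns can be pulled out and checked for missing values in one step. The sketch below works on the full 1990 to 2013 subset; bechdel_fin and na_counts are names introduced here. Missing gross values are the reason the gross medians in the next section pass na.rm = TRUE.

```r
library(fivethirtyeight)
library(tidyverse)

# Keep identifiers, the two grouping variables, and the inflation-adjusted financials
bechdel_fin <- bechdel %>%
  filter(between(year, 1990, 2013)) %>%
  select(title, year, binary, clean_test,
         budget_2013, domgross_2013, intgross_2013)

# Number of missing values in each financial column
na_counts <- bechdel_fin %>%
  summarise(across(ends_with("_2013"), ~ sum(is.na(.x))))

na_counts
```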

Analysis

Let’s take a look at how median budget and gross vary by whether the movie passed the Bechdel test, which is stored in the binary variable.

bechdel90_13 %>%
  group_by(binary) %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
    )
## # A tibble: 2 × 4
##   binary med_budget med_domgross med_intgross
##   <chr>       <int>        <dbl>        <dbl>
## 1 FAIL     36985483     50329573     96846559
## 2 PASS     23115927     20238130     41482984

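Next, let’s look at how median budget and gross vary by a more detailed indicator of the Bechdel test result. This information is stored in the clean_test variable, which takes on the following values:

- ok: passes the test
- dubious: contributors disagreed about whether it passes
- men: women talk only about men
- notalk: women don’t talk to each other
- nowomen: fewer than two women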
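Tabulating clean_test against binary shows how many movies fall into each detailed category and which binary label each category receives. A minimal sketch on the same 2006 subset used in the rest of this analysis; bechdel06 and test_counts are names introduced here.

```r
library(fivethirtyeight)
library(tidyverse)

# The 2006 subset, under a separate name so nothing else is overwritten
bechdel06 <- bechdel %>% filter(year == 2006)

# Movies per detailed result, next to the binary label each result maps to
test_counts <- bechdel06 %>% count(clean_test, binary)

test_counts
```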

bechdel90_13 %>%
  # group_by(clean_test) %>%  # commented out, so the medians below are for all movies combined
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
    )
## # A tibble: 1 × 3
##   med_budget med_domgross med_intgross
##        <dbl>        <dbl>        <dbl>
## 1   30050705    39245050.    66016570.
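The code above returns a single row of overall medians because its group_by() line is commented out. To get the per-category medians the text describes, group by clean_test instead. A minimal sketch, run on the same 2006 subset the surrounding outputs come from; bechdel06 and med_by_test are names introduced here.

```r
library(fivethirtyeight)
library(tidyverse)

# Same 2006 subset the surrounding outputs were computed on
bechdel06 <- bechdel %>% filter(year == 2006)

# One row per clean_test category that appears in the data
med_by_test <- bechdel06 %>%
  group_by(clean_test) %>%
  summarise(
    med_budget   = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )

med_by_test
```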

To evaluate how return on investment varies between movies that pass and fail the Bechdel test, we’ll first create a new variable called roi: the ratio of gross (international plus domestic, in 2013 dollars) to budget.

bechdel90_13 <- bechdel90_13 %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)
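To compare return on investment between passing and failing movies directly, the median roi can be computed per binary group. roi is NA whenever either gross is missing (Bandidas, for example, has no domestic gross), so the sketch counts those rows before dropping them. It uses the same 2006 subset and roi formula as above; bechdel06 and roi_by_binary are names introduced here.

```r
library(fivethirtyeight)
library(tidyverse)

bechdel06 <- bechdel %>%
  filter(year == 2006) %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)

# Median roi per binary result, ignoring movies with a missing gross
roi_by_binary <- bechdel06 %>%
  group_by(binary) %>%
  summarise(
    n_missing_roi = sum(is.na(roi)),
    med_roi       = median(roi, na.rm = TRUE)
  )

roi_by_binary
```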

Let’s see which movies have the highest return on investment.

bechdel90_13 %>%
  arrange(desc(roi)) %>% 
  select(title, roi, year)
## # A tibble: 90 × 3
##    title                                                               roi  year
##    <chr>                                                             <dbl> <int>
##  1 Once                                                              190.   2006
##  2 Das Leben Der Anderen                                              46.2  2006
##  3 Borat: Cultural Learnings of America for Make Benefit Glorious N…  21.7  2006
##  4 Little Miss Sunshine                                               20.1  2006
##  5 Jackass Number Two                                                 14.4  2006
##  6 The Devil Wears Prada                                              12.9  2006
##  7 The Queen                                                          12.4  2006
##  8 Ice Age: The Meltdown                                              11.3  2006
##  9 300                                                                11.1  2006
## 10 Quinceanera                                                        10.5  2006
## # … with 80 more rows
## # ℹ Use `print(n = ...)` to see more rows

Below is a visualization of the return on investment by test result; however, it’s difficult to see the distributions because of a few extreme observations.

ggplot(data = bechdel90_13, 
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
    )
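One common alternative to zooming in, sketched here as a swapped-in technique rather than the approach this analysis uses, is a log10 y-axis via scale_y_log10(). It spreads out the bulk of the distribution while keeping the extreme movies on the plot. Note that ggplot2 applies scale transformations before computing the boxplot statistics, so quartiles and whiskers are computed on log10(roi), and movies with a missing roi are dropped with a warning. bechdel06 and p_log are names introduced here.

```r
library(fivethirtyeight)
library(tidyverse)

bechdel06 <- bechdel %>%
  filter(year == 2006) %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)

# Log10 y-axis: compresses the extreme values instead of cutting them off
p_log <- ggplot(data = bechdel06,
                mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    subtitle = "Log10 scale",
    x = "Detailed Bechdel result",
    y = "Return on investment (log scale)",
    color = "Binary Bechdel result"
  )

p_log
```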

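What are those movies with very high returns on investment? In the 2006 subset used here, no movie has an ROI above 400, so the filter below returns an empty tibble; the highest ROI, for Once, is about 190.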

bechdel90_13 %>%
  filter(roi > 400) %>%
  select(title, budget_2013, domgross_2013, year)
## # A tibble: 0 × 4
## # … with 4 variables: title <chr>, budget_2013 <int>, domgross_2013 <dbl>,
## #   year <int>
## # ℹ Use `colnames()` to see all variable names
print(bechdel90_13)
## # A tibble: 90 × 16
##     year imdb    title test  clean…¹ binary budget domgr…² intgr…³ code  budge…⁴
##    <int> <chr>   <chr> <chr> <ord>   <chr>   <int>   <dbl>   <dbl> <chr>   <int>
##  1  2006 tt0416… 300   nowo… nowomen FAIL    6  e7  2.11e8  4.54e8 2006…  6.93e7
##  2  2006 tt0405… A Sc… nota… notalk  FAIL    2  e7  5.50e6  7.41e6 2006…  2.31e7
##  3  2006 tt0437… Akee… ok    ok      PASS    8  e6  1.88e7  1.90e7 2006…  9.25e6
##  4  2006 tt0429… Aqua… ok    ok      PASS    1.2e7  1.86e7  2.30e7 2006…  1.39e7
##  5  2006 tt0416… Band… ok    ok      PASS    3.5e7 NA       1.84e7 2006…  4.05e7
##  6  2006 tt0454… Blac… ok    ok      PASS    9  e6  1.62e7  1.62e7 2006…  1.04e7
##  7  2006 tt0450… Bloo… nota… notalk  FAIL    1  e8  5.74e7  1.71e8 2006…  1.16e8
##  8  2006 tt0479… Bon … nota… notalk  FAIL    8  e6  1.27e7  1.27e7 2006…  9.25e6
##  9  2006 tt0443… Bora… nota… notalk  FAIL    1.8e7  1.29e8  2.62e8 2006…  2.08e7
## 10  2006 tt0470… Bug   ok    ok      PASS    4  e6  7.01e6  7.01e6 2006…  4.62e6
## # … with 80 more rows, 5 more variables: domgross_2013 <dbl>,
## #   intgross_2013 <dbl>, period_code <int>, decade_code <int>, roi <dbl>, and
## #   abbreviated variable names ¹​clean_test, ²​domgross, ³​intgross, ⁴​budget_2013
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Zooming in on movies with an roi of 15 or less provides a better view of how the medians across the categories compare:

ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    subtitle = "Zoomed in to ROI of 15 or less",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
    ) +
  coord_cartesian(ylim = c(0, 15))
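coord_cartesian() zooms the view without removing any data, so the boxes and medians in the plot above are still computed from every movie. Filtering to roi <= 15 first (or setting limits on the y scale) would drop the extreme movies before the statistics are computed and could shift the medians. The sketch below compares the two sets of medians on the same 2006 subset; bechdel06, med_all, med_trimmed, and comparison are names introduced here.

```r
library(fivethirtyeight)
library(tidyverse)

bechdel06 <- bechdel %>%
  filter(year == 2006) %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)

# Medians from all movies: what the coord_cartesian() zoom displays
med_all <- bechdel06 %>%
  group_by(clean_test) %>%
  summarise(med_roi = median(roi, na.rm = TRUE))

# Medians after dropping movies with roi > 15 (filter() also drops NA roi)
med_trimmed <- bechdel06 %>%
  filter(roi <= 15) %>%
  group_by(clean_test) %>%
  summarise(med_roi = median(roi))

comparison <- left_join(med_all, med_trimmed, by = "clean_test",
                        suffix = c("_all", "_trimmed"))
comparison
```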