This report is ETC5521 Assignment 2 by Team Sawfish, comprising Zoljargal Batsaikhan (30392756), Prajyot Nagrale (31132324), Yin Shan Ho (31201474), and Pranali Angne (32355068).

Introduction and motivation

The Bechdel test is a simple technique for evaluating how female characters are represented in films. It asks whether a film's female characters have a meaningful presence on screen that is independent of its male characters, and in doing so it reveals gender bias that films would otherwise conceal.

Test rules: a film passes if it (1) has at least two named women in it, (2) who talk to each other, (3) about something besides a man.

This analysis examines how well women are represented in films. The test is named after the cartoonist Alison Bechdel, who introduced it almost by accident in 1985 in her popular comic strip Dykes to Watch Out For, and it became well known from there. It was meant as a basic check of whether a film gives its female characters any meaningful presence. A film can still fail when it has two named female characters if they speak to each other only about men. The test has drawn criticism when it is used as the only criterion for judging a film's merit: Twilight passes, whereas Gravity, a widely praised film, fails, an outcome that some considered ridiculously regressive. This tension piqued our curiosity, and after some preliminary study and discussion our team chose the topic for further investigation.

Data description

Data Reading

Two data files, movies.csv and raw_bechdel.csv, were downloaded from the Bechdel test dataset in the Tidy Tuesday repository.

  • raw_bechdel.csv contains 8839 rows and 5 columns. It was originally downloaded from a website on which moviegoers analyze movies and record whether each one passes or fails the test.

  • movies.csv contains 1794 rows and 34 columns. This dataset comes from the https://fivethirtyeight.com article The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women. The authors combined the Bechdel test dataset mentioned above with financial information from The-Numbers.com, a leading site for box office and budget data that inventories financial information for roughly 4,500 films. They used the intersection of The-Numbers and BechdelTest, a set of 1,615 films released between 1990 and 2013, and adjusted all financial figures for inflation to 2013 dollars. As described in the article, they assessed the movies in the sample against Bechdel’s three criteria.

The dataset is not tidy and contains a large number of N/A values. Some variables consist entirely of N/A values, and others are not useful for this research. Data wrangling is therefore required before further analysis.

Data collection method

  • Ratings in the raw_bechdel.csv dataset were entered by users of the website.
  • Financial data in movies.csv were collected from The-Numbers.com.
  • Based on the article, it is assumed that the Bechdel test results in the columns test, clean_test, and binary were assessed by the authors’ team.

Data wrangling and tidying

The raw data summary of the Bechdel test dataset is shown below.

Summary of raw_bechdel data


Description of the variables

In the data wrangling and cleaning step, data types were changed and variables that are unnecessary for the analysis were removed. Because movies.csv has more depth, such as financial figures and ratings, only movies.csv is used for further analysis. In addition, the genre, director, language, and country variables were separated into individual columns, and these columns, along with binary, were converted to factor type. The following plot summarizes the tidy data.

Summary of movies_clean data after data wrangling
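
The wrangling steps described above can be illustrated with a minimal sketch. It is not the code used for this report: it is a standard-library Python illustration run on a tiny made-up sample, and the column name errors, the helper names, and the assumption that multi-valued fields such as genre are comma-separated are all hypothetical.

```python
import csv
import io

# Tiny inline sample standing in for movies.csv (made-up rows, not real data).
# The "errors" column is a hypothetical example of an all-N/A variable.
SAMPLE_CSV = """title,genre,binary,errors
Movie A,"Action, Drama",PASS,NA
Movie B,Comedy,FAIL,NA
Movie C,NA,FAIL,NA
"""


def is_missing(value):
    """Treat None, empty strings, and the literals 'NA' / 'N/A' as missing."""
    return value is None or value.strip() in {"", "NA", "N/A"}


def drop_all_missing_columns(rows):
    """Remove every column whose value is missing in all rows."""
    if not rows:
        return rows
    keep = [col for col in rows[0] if not all(is_missing(r[col]) for r in rows)]
    return [{col: r[col] for col in keep} for r in rows]


def split_column(rows, column, n=3, sep=","):
    """Split a delimited column into column1..columnN, padding with None."""
    out = []
    for r in rows:
        parts = [] if is_missing(r[column]) else [p.strip() for p in r[column].split(sep)]
        parts = (parts + [None] * n)[:n]
        new_row = {k: v for k, v in r.items() if k != column}
        for i, part in enumerate(parts, start=1):
            new_row[f"{column}{i}"] = part
        out.append(new_row)
    return out


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
movies_clean = split_column(drop_all_missing_columns(rows), "genre")
```

In this sketch the all-N/A column is dropped first, and genre is then split into genre1, genre2, and genre3, mirroring the column layout listed in the table below.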

The description and types of each column are provided below:

Variable | Description | Type
year | The year the movie was released | Integer
title | The title of the movie | Character
clean_test | Result of the Bechdel test with 5 levels | Factor
binary | Binary result of the Bechdel test | Factor
budget_2013 | Inflation-adjusted budget of the movie in 2013 dollars | Numeric
domgross_2013 | Inflation-adjusted domestic gross of the movie in 2013 dollars | Numeric
intgross_2013 | Inflation-adjusted international gross of the movie in 2013 dollars | Numeric
imdb_id | Unique ID for each movie | Character
language1 | The first language of the movie | Factor
language2 | The second language of the movie | Factor
language3 | The third language of the movie | Factor
country1 | The first country of the movie | Factor
country2 | The second country of the movie | Factor
country3 | The third country of the movie | Factor
metascore | Metascore of the movie | Numeric
imdb_rating | IMDb rating of the movie | Numeric
director1 | The first director of the movie | Factor
director2 | The second director of the movie | Factor
director3 | The third director of the movie | Factor
genre1 | The first genre of the movie | Factor
genre2 | The second genre of the movie | Factor
genre3 | The third genre of the movie | Factor
runtime | The runtime of the movie in minutes | Numeric
poster | The http address of the poster of the movie | Character
imdb_votes | The number of IMDb votes of the movie | Numeric

“clean_test” is the result of the Bechdel test with 5 levels, explained as follows:

  • ok - passed the test
  • nowomen - there were fewer than 2 named women in the picture
  • notalk - the women in the picture did not talk to each other
  • men - the women in the picture only talked about men
  • dubious - the topic of the conversation between the women in the picture was dubious
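
The relationship between the three test rules and the five levels can be written as a small decision function. This is a sketch for illustration only: the function and parameter names are hypothetical, and treating dubious as a failure in the binary result is an assumption about how the dataset was coded, not something stated above.

```python
def clean_test(named_women, talk_to_each_other, talk_about_non_men, dubious=False):
    """Map the three Bechdel criteria to one of the five clean_test levels.

    The criteria are checked in order, so a film is labelled with the first
    rule it fails. All parameter names are illustrative.
    """
    if named_women < 2:
        return "nowomen"
    if not talk_to_each_other:
        return "notalk"
    if not talk_about_non_men:
        return "men"
    if dubious:
        return "dubious"
    return "ok"


def binary(level):
    """Collapse the five levels to PASS/FAIL (assumption: only 'ok' passes)."""
    return "PASS" if level == "ok" else "FAIL"
```

For example, clean_test(3, True, False) gives "men": the film has enough named women and they talk to each other, but only about men.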

List of questions to be answered

  1. What is the trend of movies passing or failing the Bechdel test from 1970 to 2013?
  2. How do budget and return on investment compare when movies are grouped by Bechdel test result?
    • Are imdb_rating and return on investment correlated?
  3. What are the top regions and genres with the most gender bias based on the Bechdel test?
  4. Does a higher rating mean a higher passing rate in the Bechdel test?
  5. Does the duration of a movie have any effect on the Bechdel test result?

Analysis and findings

Question 1

What is the trend of the movies passing or failing the Bechdel test from 1970 to 2013?

The Bechdel test makes it easier to assess the distribution of gender bias in films from 1970 to 2013. Based on the test, it is found that 56% of the 1,794 films released between 1970 and 2013 passed. A film passes if it features at least two named women who talk to each other about something other than men.

The graph above shows the percentage of movies that passed and failed the Bechdel test in each five-year period. The passing rate was below 50% in most periods. It has increased over the past 3 decades, peaking around the year 2000 at slightly above 50%, but it has fallen in the most recent decade. One possible reason for the fall is that the dataset only includes movies up to 2013. The picture could be different with a broader dataset extending to 2021.
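
The five-year grouping behind this kind of chart can be sketched as follows. This is an illustrative standard-library Python version, not the code used for the report, and the sample records are made up.

```python
from collections import defaultdict


def pass_rate_by_period(records, width=5):
    """Return {period_start: share of PASS results} for fixed-width year bins.

    `records` is an iterable of (year, result) pairs where result is
    "PASS" or "FAIL". A period is labelled by its first year, e.g. 1995
    covers 1995-1999 when width is 5.
    """
    counts = defaultdict(lambda: [0, 0])  # period start -> [passes, total]
    for year, result in records:
        start = year - year % width
        counts[start][1] += 1
        if result == "PASS":
            counts[start][0] += 1
    return {start: passes / total for start, (passes, total) in sorted(counts.items())}


# Made-up records for illustration only.
sample = [
    (1971, "FAIL"), (1973, "PASS"),
    (1996, "FAIL"), (1998, "PASS"), (1999, "PASS"),
    (2012, "FAIL"),
]
rates = pass_rate_by_period(sample)
```

Binning by period rather than by single year smooths out years with very few films, which would otherwise produce noisy rates.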

To examine the reason for failing the test, the results of Bechdel test were further decomposed into the following five groups:

  • Passed the Bechdel test.
  • The topic of the conversation between the women in the movie was dubious.
  • There were fewer than 2 named female characters in the movie.
  • Women in the movie don’t talk to each other.
  • Women in the movie only talk about men.

[NOTE: Passes are shown in green, whereas failures are shown in shades of red.]

Takeaway 1: Although female representation in the movie industry is improving over time, only about 50% of movies pass the simple Bechdel test.

Question 2

How do budget and return on investment compare when movies are grouped by Bechdel test result, and is there any correlation between IMDB rating and ROI?

The reason for asking this question is that money can have a significant impact on how well a film does. The graphs below were therefore constructed to see whether the Bechdel test result is associated with the budget or return on investment of the films. The budget and income figures have been adjusted for inflation using 2013 dollars. Return on investment is then calculated by dividing gross income by budget.

The bar chart above shows the average budget in USD by Bechdel test result. The average budget for films that passed the Bechdel test is almost $20,000,000 less than for films that failed. This suggests that investors put more money into movies in which women are less represented.

Since more money was invested in the movies that failed the test, it is worth checking whether the larger investment translates into a higher return on investment.

In Figure 6, the return on investment is plotted by Bechdel test result over the last 3 decades. The return has been log transformed because the original data were skewed. Interestingly, the movies that passed the test slightly outperformed the movies that failed, although the difference is small.
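
The return on investment calculation and its log transformation can be sketched as below. The function names are hypothetical, and the choice of base 10 and the handling of missing or non-positive values are assumptions, since the report does not state them.

```python
import math


def roi(gross, budget):
    """Gross income divided by budget; None when either value is unusable."""
    if gross is None or budget is None or budget <= 0:
        return None
    return gross / budget


def log_roi(gross, budget):
    """Base-10 log of ROI (assumed base); None when ROI is missing or not positive."""
    value = roi(gross, budget)
    if value is None or value <= 0:
        return None
    return math.log10(value)
```

On the log scale a film that returns 10 times its budget and a film that returns one tenth of it sit at +1 and -1, so a few extreme successes no longer dominate the plot.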

Correlation between IMDB rating and ROI

The table above shows the association between IMDB rating and Metascore. A correlation of 0.73 indicates that IMDB rating and Metascore have a fairly strong positive relationship.
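
A correlation of this kind can be computed as a Pearson coefficient. The sketch below assumes Pearson correlation and pairwise deletion of missing values; the report does not say which method or missing-value rule was used, so both are assumptions, and the function name is illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Pairs in which either value is None are skipped (pairwise deletion).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

The same function could be applied to any pair of numeric columns, for example imdb_rating against return on investment.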

Takeaway 2: Investors put more money into movies that are male dominant, yet the movies that passed the test slightly outperformed the movies that failed.

Question 3

What are the top regions and genres with the most gender bias based on Bechdel test?

From the original dataset, the genre and country variables were selected to further investigate the Bechdel test result. Where a movie lists several genres or countries, only the first of each was used.

First, the movies were counted by genre and by region. Then the top 10 genres and regions were kept, and the passing rate of their movies was plotted.
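
The count-then-filter step can be sketched as follows. It assumes that "top 10" means the ten groups with the most movies, which the text implies but does not state outright; the function name and sample records are made up for illustration.

```python
from collections import Counter


def pass_rate_for_top_groups(records, top_n=10):
    """Pass rate for the `top_n` most frequent groups, highest rate first.

    `records` is an iterable of (group, result) pairs, where group is a
    genre or region (None if missing) and result is "PASS" or "FAIL".
    """
    records = [(g, r) for g, r in records if g is not None]
    totals = Counter(g for g, _ in records)
    passes = Counter(g for g, r in records if r == "PASS")
    top = totals.most_common(top_n)
    rates = [(group, passes[group] / total) for group, total in top]
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Made-up records for illustration only.
sample = [
    ("Horror", "PASS"), ("Horror", "PASS"), ("Horror", "FAIL"),
    ("Action", "FAIL"), ("Action", "FAIL"), ("Action", "PASS"), ("Action", "FAIL"),
    ("Western", "PASS"),
]
top_two = pass_rate_for_top_groups(sample, top_n=2)
```

Filtering by count before computing rates keeps groups with only a handful of movies, such as Western in the sample, from appearing with misleading rates of 0% or 100%.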

The bar chart above shows the percentage of movies that passed the Bechdel test in each genre. Horror, comedy, and drama were the three genres with a passing rate of more than 50%. Horror had the highest passing rate, at 63.4 percent.

The bar graph above shows the Bechdel test passing rate in different regions. Spain had the highest passing rate, whereas Hong Kong SAR and China had relatively low percentages of movies that passed the Bechdel test.

Takeaway 3: Horror, comedy, and drama are the genres with relatively higher Bechdel test passing rates. Moreover, women have a relatively stronger presence in Spanish and German movies.

Question 4

Does a higher rating mean a higher passing rate in the Bechdel test?

The bubble chart above plots the movies in the top 30 and bottom 30 by IMDB rating. The size of each bubble represents the budget, while the color indicates the Bechdel test result: red means failure and green means passing. Surprisingly, highly rated movies with larger budgets, such as The Dark Knight and Inception, failed the Bechdel test, whereas low-rated movies with smaller budgets, such as The Fog and Crossroads, passed.

Takeaway 4: Movies with higher ratings and budgets are more likely to be male dominant, whereas women play more important roles in movies with lower ratings and budgets.

Question 5

Does the duration of a movie have any effect on the Bechdel test result?

Bechdel test results for movies by runtime

Figure 11 shows the 10 longest and the 10 shortest movies in the data. It indicates that the runtime of a movie is unlikely to influence the result of the Bechdel test: as the plot above shows, movies with longer and shorter runtimes have almost the same results.

Takeaway 5: The passing rate of the Bechdel test is unlikely to be affected by the runtime of the movies.

Conclusion and limitations

Based on the analysis, the following conclusions are drawn:

Firstly, there is an increasing trend of women playing more important roles in movies. However, only about half of the movies had passed the Bechdel test by 2013, which indicates that women still have a lower status in the movie industry.

Moreover, investors tend to put more money into movies that are male dominant, even though the movies that passed the test slightly outperformed those that failed, with a comparatively higher return on investment.

On the other hand, some genres, such as horror, comedy, and drama, have higher passing rates, and movies from some regions, such as Spain and Germany, provide comparatively more opportunities to actresses.

Surprisingly, most of the top-rated movies with larger budgets failed the test, while the movies with the lowest ratings and smaller budgets tended to have a higher passing rate. This suggests that the movie industry and audience expectations are more favorable to male-dominant movies.

Lastly, the length of a movie is unlikely to affect the result of the Bechdel test, which means that women do not get more opportunities even when movies are longer.

Taken together, these conclusions point to gender inequality in the movie industry, with women having lower status and playing less important roles in movies.

However, all of the conclusions above are based on data up to 2013, which was around 8 years ago. The picture could be different with data updated to more recent years.

References