Cameron Faulks s3240003
RPubs link: https://rpubs.com/s3240003/bechdel
Disproportionate gender representation in the film industry is a widely discussed issue.
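The Bechdel test is a tool to measure the representation of women in film. A film is said to pass the test when the following three criteria are met: (1) it features at least two women, (2) who talk to each other, (3) about something other than a man.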
This study will investigate the association of the gender of a film’s writers, and the film’s score on the Bechdel test.
Is a film’s result on the Bechdel test associated with the gender of the film’s writers?
A Chi-Squared test of association will be used to test this association.
In order to address the question posed by this study, a number of data sources were utilised. Data was sourced from the following:
Gender data for writers and directors is not included within publicly available IMDb datasets. Because of the number of writers credited in the assessed films, it was not possible to individually check and assign a gender to each person. Instead, genderize.io was used to predict the gender of each writer from their first name, as extracted from IMDb (adapted from FiveThirtyEight, 2017, The Next Bechdel Test).
Each film in the Bechdel dataset has a score of between 0 and 3, depending on the number of Bechdel test criteria that it met. A film was only considered to Pass the Bechdel test if it passed each of the 3 rules set out by the test. All films scoring less than 3 were deemed to Fail.
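The scoring rule above can be sketched in R as follows. This is a minimal illustration only: the data frame `films` and its columns `title` and `score` are hypothetical stand-ins, not the objects used in the study.

```r
# Hypothetical example data: Bechdel scores range from 0 to 3
films <- data.frame(
  title = c("Film A", "Film B", "Film C", "Film D"),
  score = c(0, 1, 3, 2)
)

# A film passes only when all three criteria are met (a score of 3);
# every lower score is recorded as a fail
films$binary <- factor(ifelse(films$score == 3, "PASS", "FAIL"),
                       levels = c("FAIL", "PASS"))

table(films$binary)
```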
First names of writers were extracted and processed through genderize.io. The predicted gender and the probability associated with that prediction were then joined with the Bechdel dataset.
If the gender of a writer could not be predicted, or a gender was predicted with a probability of less than 90% (as output by genderize.io), the observation was filtered out of the dataset.
A film was considered to have female representation if at least one writer who worked on that film was female. All remaining films that did not meet this criterion were considered to have only Male representation.
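The filtering and classification steps described above might look roughly like the sketch below. It assumes one row per film and writer, and that the 90% filter is applied at the writer level; the data frame `writers` and its columns `film_id`, `gender` and `probability` are hypothetical names chosen for illustration, not the study's actual data structure.

```r
# Hypothetical writer-level data: one row per (film, writer) pair,
# holding the predicted gender and the probability of that prediction
writers <- data.frame(
  film_id     = c(1, 1, 2, 3, 3, 4),
  gender      = c("male", "female", "male", "male", "female", NA),
  probability = c(0.99, 0.95, 0.97, 0.92, 0.60, NA),
  stringsAsFactors = FALSE
)

# Drop writers whose gender is missing or predicted with probability < 0.90
confident <- writers[!is.na(writers$gender) & writers$probability >= 0.90, ]

# For each film, check whether any remaining writer was predicted female
any_female <- tapply(confident$gender == "female", confident$film_id, any)

# One row per film: "Female" if at least one female writer, otherwise "Male"
representation <- data.frame(
  film_id        = as.numeric(names(any_female)),
  representation = ifelse(as.vector(any_female), "Female", "Male")
)
representation
```

In this toy data, film 3 is classified as Male because its only female prediction falls below the 90% threshold, and film 4 drops out entirely because no gender could be predicted.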
The resulting data set contained 7835 unique films, each with a Bechdel test result (Pass/Fail) and an assessment of the writer gender representation (Male/Female).
The following variables are of importance to this study:
In this investigation, each observation represents a unique film with a determined gender representation (Male/Female) and a binary result on the Bechdel test (PASS/FAIL).
Observations with questionable values, such as those with genders predicted with a probability of correctness being less than 90%, were filtered out of the data set.
It can be observed that female representation among writers appears to be more common in films that pass the Bechdel test (35%) than in films that fail it (14%). See the next slide for this visualised on a bar chart.
library(magrittr)  # provides the %>% pipe used below
bechdel_complete <- read.csv("Data/bechdel_complete.csv", stringsAsFactors = TRUE)
# Column proportions: share of Female/Male representation within each Bechdel result
rep_by_result <- table(bechdel_complete$representation, bechdel_complete$binary) %>%
  prop.table(margin = 2)
barplot(rep_by_result,
        beside = TRUE,
        ylim = c(0, 1),
        ylab = "Proportion Within Group",
        xlab = "Bechdel Test Result",  # bar groups are the FAIL/PASS columns
        legend.text = rownames(rep_by_result),
        args.legend = list(x = "top", horiz = TRUE, title = "Representation"))

A summary of the data is presented in the tables below. These tables include a raw count of all observations:
t1 <- table(bechdel_complete$representation,bechdel_complete$binary)
t2 <- table(bechdel_complete$representation,bechdel_complete$binary)%>%prop.table(margin = 2)
knitr::kable(t1)

| | FAIL | PASS |
|---|---|---|
| Female | 484 | 1589 |
| Male | 2855 | 2907 |
and also the proportion of films with Male and Female representation among writers, by Bechdel score:
| | FAIL | PASS |
|---|---|---|
| Female | 0.14 | 0.35 |
| Male | 0.86 | 0.65 |
A Chi-squared test of association was used to test for a statistically significant association between the gender of a film’s writers and the film’s result on the Bechdel test.
It was assumed that if a film had at least one female writer, that constituted female representation.
This statistical test assumes that no more than 25% of the cells have expected counts below 5. This assumption was met.
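As a rough check of this assumption, the expected counts can be inspected directly. The sketch below rebuilds the 2 x 2 table from the counts reported in this document rather than from the original CSV; the object names `obs`, `expected` and `share_below_5` are illustrative only.

```r
# Observed counts as reported in this document (rows: Bechdel result,
# columns: writer gender representation)
obs <- matrix(c(484, 1589, 2855, 2907), nrow = 2,
              dimnames = list(binary = c("FAIL", "PASS"),
                              representation = c("Female", "Male")))

# Expected counts under no association, as computed by chisq.test()
expected <- chisq.test(obs)$expected

# Share of cells with an expected count below 5; the rule of thumb
# requires this to be no more than 25%
share_below_5 <- mean(expected < 5)
share_below_5 <= 0.25
```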
\(H_0\): There is no association in the population between the gender of a film’s writers and the film’s result on the Bechdel test.
\(H_A\): There is an association in the population between the gender of a film’s writers and the film’s result on the Bechdel test.
To test \(H_0\), the Chi-squared statistic (\(\chi^2\)) is calculated, where \(O_{ij}\) is the observed count in the \(i^{th}\) row of the \(j^{th}\) column and \(E_{ij}\) is the expected count assuming no association.
\(r_{i}\) refers to the total count of the \(i^{th}\) row and \(c_{j}\) is the total count of the \(j^{th}\) column. \(n\) is the total number of observations.
\[\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]
\[E_{ij} = n\left(\frac{r_{i}}{n}\right)\left(\frac{c_{j}}{n}\right) = \frac{r_{i}c_{j}}{n}\]
These values can be computed by R…
Observed Values:
chi <- chisq.test(table(bechdel_complete$binary, bechdel_complete$representation))
knitr::kable(chi$observed, digits = 2)

| | Female | Male |
|---|---|---|
| FAIL | 484 | 2855 |
| PASS | 1589 | 2907 |
Expected Values:
| | Female | Male |
|---|---|---|
| FAIL | 883.44 | 2455.56 |
| PASS | 1189.56 | 3306.44 |
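The expected counts can also be reproduced by hand from the row and column totals, following \(E_{ij} = r_{i}c_{j}/n\). The sketch below works from the observed counts reported in this document; `obs`, `n` and `expected` are illustrative names.

```r
# Observed counts as reported in this document
obs <- matrix(c(484, 1589, 2855, 2907), nrow = 2,
              dimnames = list(binary = c("FAIL", "PASS"),
                              representation = c("Female", "Male")))

n <- sum(obs)  # total number of films

# E_ij = (row total i) * (column total j) / n for every cell at once
expected <- outer(rowSums(obs), colSums(obs)) / n

round(expected, 2)
```

For example, the FAIL/Female cell is \(3339 \times 2073 / 7835 \approx 883.44\).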
These expected values can now be used to calculate the test statistic \(\chi^2\)…
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(bechdel_complete$binary, bechdel_complete$representation)
## X-squared = 426.89, df = 1, p-value < 2.2e-16
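The statistic can be reproduced by hand from the observed and expected counts. Note that for a 2 x 2 table `chisq.test()` applies Yates' continuity correction by default, which subtracts 0.5 from each absolute difference before squaring. The sketch below computes both versions from the counts reported in this document; the object names are illustrative.

```r
# Observed counts as reported in this document
obs <- matrix(c(484, 1589, 2855, 2907), nrow = 2,
              dimnames = list(binary = c("FAIL", "PASS"),
                              representation = c("Female", "Male")))

# Expected counts under no association
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Uncorrected Pearson statistic: sum of (O - E)^2 / E over all cells
chi_sq_plain <- sum((obs - expected)^2 / expected)

# Yates-corrected statistic: sum of (|O - E| - 0.5)^2 / E over all cells
chi_sq_yates <- sum((abs(obs - expected) - 0.5)^2 / expected)

round(c(plain = chi_sq_plain, yates = chi_sq_yates), 2)
```

The Yates-corrected value is the one that rounds to the 426.89 shown in the output above; the uncorrected formula gives a slightly larger value, and the conclusion is the same either way.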
This \(\chi^2\) value of 426.89, which R computes with Yates' continuity correction as noted in the output above, is compared against the critical value, \(\chi^2_{crit}\):
## [1] 3.841459
\(p\) value:
## [1] 7.725228e-95
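The critical value and \(p\) value shown above could be obtained along the following lines. This is a sketch under stated assumptions: a 0.05 significance level, 1 degree of freedom for a 2 x 2 table, and the rounded statistic 426.89 as input, so the \(p\) value may differ from the printed one in its later digits.

```r
alpha  <- 0.05    # significance level
df     <- 1       # (2 - 1) * (2 - 1) for a 2 x 2 table
chi_sq <- 426.89  # rounded test statistic reported above

# Critical value: the 95th percentile of the chi-squared distribution
chi_crit <- qchisq(1 - alpha, df = df)

# p value: upper-tail probability of a statistic at least this large
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_crit
p_value

# Decision rule: reject H0 when the statistic exceeds the critical value
chi_sq > chi_crit && p_value < alpha
```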
\(H_0\) can be rejected as \(\chi^2 >\chi^2_{crit}\) (426.89 > 3.84) and the \(p\) value of <0.001 is below the 0.05 level of significance.
A Chi-squared test of association was used to test for a statistically significant association between the gender of a film’s writers and the film’s result on the Bechdel test. The test found a statistically significant association, \(\chi^2 = 426.89, p < .001\). The results of this study suggest that films with female representation among their writers were more likely to pass the Bechdel test than films without female representation.
This suggests that if at least one female is on a film’s writing team, that film is more likely to achieve a basic standard of female representation in film as set out by the rules of the Bechdel test, compared to those films that have no female writers.
There are two major limitations with this study:
Future studies should consider: