Select an interesting binary column of data, or one which can be reasonably converted into a binary variable
bechdel_data_movies$binary_num <- ifelse(bechdel_data_movies$binary == 'PASS', 1, 0)
binary_model <- glm(binary_num ~ budget_2013, data = bechdel_data_movies, family = binomial(link = 'logit'))
binary_model$coefficients
##   (Intercept)   budget_2013
##  1.113148e-01 -5.972374e-09
Interpret the coefficients, and explain what they mean in your notebook:
The intercept is on the log-odds scale: when the budget is zero, the model predicts that the probability of the movie passing the Bechdel test is about \(52.78\%\). This is the model's baseline at a budget of zero, not the overall pass rate in the data set. For each additional dollar of budget, the log-odds of passing decrease by about \(5.97 \cdot 10^{-9}\), which means the odds of passing shrink by about \(5.97 \cdot 10^{-7}\) percent per dollar. This is significant because it means the lowest-budget movies are the most likely to pass the Bechdel test, and the probability of passing decreases as the budget increases. This matters because it suggests that movies that pass the Bechdel test are not funded (via budgets) to the same extent as movies that fail it, which may in turn affect profitability and gross income. I'd still like to know how genre affects these decisions and whether genre and budget are correlated.
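To make the conversion explicit, the fitted model and the quantities used in the interpretation can be written out as follows. This is a worked sketch using the coefficient values printed above; the symbols \(p\), \(\beta_0\), and \(\beta_1\) are notation introduced here for illustration and do not appear in the notebook's code.

\[
\log\frac{p}{1-p} = \beta_0 + \beta_1 \cdot \text{budget\_2013},
\qquad \hat\beta_0 \approx 0.1113,
\quad \hat\beta_1 \approx -5.972 \cdot 10^{-9}
\]

\[
p(\text{budget} = 0) = \frac{1}{1 + e^{-\hat\beta_0}} = \frac{1}{1 + e^{-0.1113}} \approx 0.528
\]

\[
\text{odds ratio per dollar} = e^{\hat\beta_1} \approx 1 - 5.972 \cdot 10^{-9},
\qquad
\text{odds ratio per million dollars} = e^{10^{6}\,\hat\beta_1} = e^{-0.005972} \approx 0.994
\]

Read this way, each additional million dollars of budget multiplies the odds of passing by roughly 0.994, a decrease of about 0.6 percent.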
Using the Standard Error for at least one coefficient, build a C.I. for that coefficient, and translate its meaning
summary(binary_model)
##
## Call:
## glm(formula = binary_num ~ budget_2013, family = binomial(link = "logit"),
## data = bechdel_data_movies)
##
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.113e-01  6.897e-02   1.614    0.107
## budget_2013 -5.972e-09  9.556e-10  -6.250 4.11e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2467.3 on 1793 degrees of freedom
## Residual deviance: 2424.7 on 1792 degrees of freedom
## AIC: 2428.7
##
## Number of Fisher Scoring iterations: 4
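The standard error of the intercept is 0.0689660917706.
# Pull the intercept's standard error from the model summary instead of hard-coding it
standard_error_binary <- summary(binary_model)$coefficients["(Intercept)", "Std. Error"]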
confint.default(binary_model)
##                     2.5 %        97.5 %
## (Intercept) -2.385625e-02  2.464859e-01
## budget_2013 -7.845359e-09 -4.099389e-09
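The interval for the intercept can also be built by hand from its standard error, as estimate plus or minus the normal critical value times the standard error. The sketch below assumes this Wald construction and uses values copied from the output above rather than the fitted model object, so it runs on its own; the names `estimate`, `std_error`, `z_crit`, and `wald_ci` are illustrative and not part of the notebook.

```r
# Values copied from the model output above (assumed, not recomputed from the data)
estimate  <- 1.113148e-01      # intercept estimate, log-odds scale
std_error <- 0.0689660917706   # standard error of the intercept

# Two-sided 95% critical value of the standard normal distribution (about 1.96)
z_crit <- qnorm(0.975)

# Wald interval: estimate -/+ z * SE
wald_ci <- estimate + c(-1, 1) * z_crit * std_error
print(wald_ci)
```

The two endpoints should agree with the `(Intercept)` row printed by `confint.default(binary_model)` up to rounding.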
The confidence interval states that, with 95% confidence, the intercept of the true relationship between the budget and the movie passing the Bechdel test lies between -0.024 and 0.246 on the log-odds scale. The interval for the slope, from about \(-7.85 \cdot 10^{-9}\) to \(-4.10 \cdot 10^{-9}\), lies entirely below zero, so the log-odds of passing decrease by a small amount per dollar.
Because the per-dollar amount is so small, we can also look at the change per million dollars, which is a decrease of about 0.4 to 0.8 percent in the odds of passing.
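The per-million-dollar figures can be checked by rescaling the slope's interval and exponentiating it. This is a minimal sketch that assumes the interval endpoints printed by `confint.default()` above; `slope_ci`, `odds_ratio_per_million`, and `pct_change_in_odds` are names made up for this example.

```r
# 95% interval for the budget_2013 slope, copied from the confint.default() output
slope_ci <- c(lower = -7.845359e-09, upper = -4.099389e-09)

# Scale from "per dollar" to "per million dollars", then move from
# the log-odds scale to an odds ratio by exponentiating
odds_ratio_per_million <- exp(slope_ci * 1e6)

# Express each odds ratio as a percent change in the odds of passing
pct_change_in_odds <- (odds_ratio_per_million - 1) * 100
print(round(pct_change_in_odds, 2))
```

Both endpoints come out negative, so the whole interval describes a decrease in the odds of passing.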
Because the intercept's interval includes zero, and a log-odds of zero corresponds to a probability of 50%, the baseline probability of passing the Bechdel test at a budget of zero cannot be distinguished from 50%, which is consistent with the model's intercept. The change in passing as the budget increases is reliably small and negative, which also matches the model. The model and the CI agree, so I have no further questions.
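To see the intercept's interval as probabilities rather than log-odds, its endpoints can be passed through the inverse logit. The sketch below assumes the endpoints printed by `confint.default()` above and uses base R's `plogis()`; `intercept_ci` and `prob_ci` are illustrative names.

```r
# 95% interval for the intercept on the log-odds scale, copied from the output above
intercept_ci <- c(lower = -2.385625e-02, upper = 2.464859e-01)

# plogis() is the inverse logit: 1 / (1 + exp(-x))
prob_ci <- plogis(intercept_ci)
print(round(prob_ci, 3))
```

The resulting range straddles 0.5, which is the probability-scale version of the interval including zero.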
For each of the above tasks, you must explain to the reader what insight was gathered, its significance, and any further questions you have which might need to be further investigated.