Datadive10.Rmd

data <- read.csv("C:\\Users\\varsh\\OneDrive\\Desktop\\Gitstuff\\age_gaps.CSV")

library(ggplot2)
library(ggthemes)
library(ggrepel)
library(boot)
library(broom)
library(lindia)

The binary column I’m selecting is encoded_gender, which is 0 when character_1_gender is "man" and 1 otherwise (woman).

data$encoded_gender <- ifelse(data$character_1_gender == "man", 0, 1)
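The encoding rule can be checked on a small toy vector. This is only a sketch: the values below are made up and stand in for data$character_1_gender. It shows that every value other than "man" is mapped to 1, and that a missing value stays missing.

```r
# Hypothetical stand-in for data$character_1_gender (not taken from the data set)
gender <- c("man", "woman", NA)

# Same rule as above: "man" becomes 0, anything else becomes 1, NA stays NA
encoded <- ifelse(gender == "man", 0, 1)

# Cross-tabulate the original and encoded values, keeping NA visible
table(gender, encoded, useNA = "ifany")
```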

The explanatory variables that I chose are age_difference, release_year, actor_1_age, and couple_number.

model <- glm(encoded_gender ~ age_difference + release_year + actor_1_age + couple_number, data = data, family = binomial(link = 'logit'))

summary(model)

## 
## Call:
## glm(formula = encoded_gender ~ age_difference + release_year + 
##     actor_1_age + couple_number, family = binomial(link = "logit"), 
##     data = data)
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -30.999206  13.093538  -2.368   0.0179 *  
## age_difference  -0.135770   0.019006  -7.143  9.1e-13 ***
## release_year     0.015353   0.006555   2.342   0.0192 *  
## actor_1_age     -0.005824   0.010585  -0.550   0.5821    
## couple_number    0.035809   0.097700   0.367   0.7140    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1107.20  on 1154  degrees of freedom
## Residual deviance:  972.38  on 1150  degrees of freedom
## AIC: 982.38
## 
## Number of Fisher Scoring iterations: 6

Intercept (-30.999206):

The intercept is the log-odds that encoded_gender is 1 (woman) when all predictor variables are zero. That point lies far outside the data (release_year = 0, for example), which is why the value is so strongly negative and hard to interpret in its raw form.
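One way to make the intercept easier to read is to evaluate the fitted linear predictor at a realistic point rather than at zero. The sketch below does this by hand with the coefficients printed in the summary. The film it describes (age gap of 10, released in 2000, actor 1 aged 40, couple number 1) is a made-up example, not a row from the data.

```r
# Coefficients copied from the summary(model) output above
b <- c(intercept      = -30.999206,
       age_difference = -0.135770,
       release_year   =  0.015353,
       actor_1_age    = -0.005824,
       couple_number  =  0.035809)

# Hypothetical film (illustrative values only); the leading 1 multiplies the intercept
x <- c(1, 10, 2000, 40, 1)

log_odds <- sum(b * x)    # linear predictor on the log-odds scale
prob <- plogis(log_odds)  # inverse logit: exp(eta) / (1 + exp(eta))

# Probability at the all-zero point that the raw intercept describes
prob_at_zero <- plogis(b[["intercept"]])

round(c(log_odds = log_odds, prob = prob), 3)
```

With the fitted model object available, predict(model, newdata = ..., type = "response") should give the same kind of probability without copying coefficients by hand.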

age_difference (-0.135770):

When all other variables are held constant, every one-unit increase in age_difference decreases the log-odds of encoded_gender being 1 (woman) by about 0.136.
The effect is statistically significant (p = 9.1e-13): as the age gap between characters grows, the probability of encoded_gender being 1 (woman) decreases.

release_year (0.015353):

When all other variables are held constant, every one-year increase in release_year increases the log-odds of encoded_gender being 1 (woman) by about 0.015.
The effect is significant at the 5% level (p = 0.0192), so more recent movies have a somewhat higher probability of having encoded_gender = 1 (woman).

actor_1_age (-0.005824):

When all other variables are held constant, every one-unit increase in actor_1_age decreases the log-odds of encoded_gender being 1 (woman) by about 0.006.
However, this coefficient is not statistically significant (p = 0.5821), so the model gives no clear evidence that actor 1’s age is related to encoded_gender.

couple_number (0.035809):

When all other variables are held constant, every one-unit increase in couple_number increases the log-odds of encoded_gender being 1 (woman) by about 0.036.
The sign suggests that movies with a greater couple_number have a slightly higher probability of having encoded_gender = 1 (woman), but the coefficient is not statistically significant (p = 0.7140), so this should not be read as a real effect.
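Log-odds are hard to picture, so it can help to exponentiate the coefficients into odds ratios. The sketch below uses the estimates and p-values copied from the summary output. The 0.05 cutoff and the name or_table are choices made for this illustration.

```r
# Estimates and p-values copied from the summary(model) output above
est <- c(age_difference = -0.135770,
         release_year   =  0.015353,
         actor_1_age    = -0.005824,
         couple_number  =  0.035809)
p_value <- c(9.1e-13, 0.0192, 0.5821, 0.7140)

# exp(coefficient) is the multiplicative change in the odds per one-unit increase
odds_ratio <- exp(est)
pct_change <- 100 * (odds_ratio - 1)

# Illustrative summary table; "significant" uses an assumed 0.05 threshold
or_table <- data.frame(
  estimate    = est,
  odds_ratio  = round(odds_ratio, 3),
  pct_change  = round(pct_change, 1),
  significant = p_value < 0.05
)
or_table
```

An odds ratio below 1 means the odds of encoded_gender = 1 shrink as the predictor increases, and a ratio above 1 means they grow. With the fitted model, exp(coef(model)) gives the same ratios directly.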

Confidence Interval for the coefficient of the “age_difference”:

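# Pull the estimate and standard error from the fitted model instead of retyping them
coef_age_difference <- coef(summary(model))["age_difference", "Estimate"]
se_age_difference <- coef(summary(model))["age_difference", "Std. Error"]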

lower_bound <- coef_age_difference - 1.96 * se_age_difference
upper_bound <- coef_age_difference + 1.96 * se_age_difference

cat("95% Confidence Interval for the coefficient of age_difference: (", round(lower_bound, 3), ",", round(upper_bound, 3), ")\n")

## 95% Confidence Interval for the coefficient of age_difference: ( -0.173 , -0.099 )

As the age difference between the characters increases by one unit, the log-odds of the encoded gender being 1 (woman) decrease by an estimated 0.136 units.
We are 95% confident that the true decrease in log-odds per one-unit increase in age difference lies between 0.099 and 0.173 units.
Because the whole interval lies below zero, a larger age gap between characters is associated with a lower probability of the character being encoded as a woman (1).
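The same interval can be moved to the odds-ratio scale by exponentiating its endpoints. The sketch below repeats the Wald calculation with the printed estimate and standard error, using qnorm(0.975) in place of the rounded 1.96. The variable names are chosen for this illustration.

```r
# Estimate and standard error copied from the summary(model) output above
est <- -0.135770
se  <- 0.019006

# Normal critical value for a two-sided 95% interval (about 1.96)
z <- qnorm(0.975)

# Wald interval on the log-odds scale, then on the odds-ratio scale
ci_log_odds   <- est + c(-1, 1) * z * se
ci_odds_ratio <- exp(ci_log_odds)

round(ci_log_odds, 3)
round(ci_odds_ratio, 3)

# With the fitted model available, the equivalent Wald interval would be:
# exp(confint.default(model)["age_difference", ])
```

Both endpoints of the odds-ratio interval fall below 1, which matches the reading above: each extra unit of age difference lowers the odds of encoded_gender = 1.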

Plot for Confidence Interval for the coefficient of age_difference:

confidence_interval <- data.frame(
  coefficient = "age_difference",
  estimate = coef_age_difference,
  lower = lower_bound,
  upper = upper_bound
)

ggplot(confidence_interval, aes(x = coefficient, y = estimate)) +
  geom_bar(stat = "identity", fill = "grey", width = 0.5) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, color = "black") +
  labs(title = "95% Confidence Interval for Coefficient of age_difference",
       x = "Coefficient",
       y = "Estimate") +
  coord_flip() +
  theme_minimal()

2024-04-06
