data <- read.csv("C:\\Users\\varsh\\OneDrive\\Desktop\\Gitstuff\\age_gaps.CSV")
The “age_difference” column represents the age difference between the two actors playing the couple in each movie. Age difference could be an interesting aspect to explore in terms of the dynamics and portrayal of relationships in film. As a result, I will use the “age_difference” column as my response variable.
The “character_1_gender” column is the categorical explanatory variable: it records the gender of the first character and may be associated with the age difference between the couples in movies.
Null hypothesis: the mean age difference between the main couples in movies does not differ based on the gender of the first character.
character_1_gender_col <- data$character_1_gender
age_difference_col <- data$age_difference
unique_values <- unique(character_1_gender_col)
cat("Unique Values in character_1_gender: \n")
## Unique Values in character_1_gender:
cat(unique_values)
## woman man
data$age_difference <- as.numeric(data$age_difference)
anova_result <- aov(age_difference ~ character_1_gender, data=data)
cat("Summary of Results: \n")
## Summary of Results:
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## character_1_gender 1 6849 6849 102.9 <2e-16 ***
## Residuals 1153 76745 67
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA output reports an F-statistic and a p-value. The F-statistic is 102.9, meaning that the variation in age_difference between the categories of character_1_gender is much larger than the variation within each category. This suggests that the mean age gap between couples differs according to the gender of the first character (character_1_gender).
We reject the null hypothesis because the p-value for the F-statistic is extremely small (p < 2e-16), far below the standard 0.05 significance level. This indicates a statistically significant difference in the mean age gap between couples depending on the first character’s gender.
In conclusion, the ANOVA results show that the age difference between couples portrayed in movies is significantly associated with the gender of the first character. The large F-statistic and very small p-value provide strong evidence for this conclusion.
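Because character_1_gender has only two levels, a one-way ANOVA on it is equivalent to a pooled-variance two-sample t-test, with F equal to t squared. The sketch below illustrates this on simulated data; the data frame `sim`, its columns, and the group means are made up for illustration and are not the age_gaps values.

```r
# Simulated data only; these are NOT the age_gaps values.
set.seed(42)
sim <- data.frame(
  group = rep(c("man", "woman"), times = c(60, 40)),
  gap   = c(rnorm(60, mean = 11, sd = 8), rnorm(40, mean = 6, sd = 8))
)

# One-way ANOVA with a two-level factor, as in the analysis above
f_value <- summary(aov(gap ~ group, data = sim))[[1]][["F value"]][1]

# Pooled-variance (Student) two-sample t-test on the same data
t_value <- unname(t.test(gap ~ group, data = sim, var.equal = TRUE)$statistic)

# With exactly two groups, the ANOVA F-statistic equals the squared t-statistic
all.equal(f_value, t_value^2)
```

The equivalence holds only when the t-test assumes equal variances (`var.equal = TRUE`); R's default Welch test would generally give a slightly different statistic.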
The continuous explanatory variable I selected is actor_1_age_col, which represents the age of the first actor in the film. The response variable is age_difference_col, which represents the age difference between the couple portrayed in the movie.
age_difference_col <- data$age_difference
actor_1_age_col <- data$actor_1_age
correlation_coefficient <- cor(age_difference_col, actor_1_age_col)
print(correlation_coefficient)
## [1] 0.7039631
The correlation coefficient between actor_1_age_col and age_difference_col is 0.704, indicating a strong positive linear relationship between the first actor’s age and the age difference between couples.
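cor() returns only the coefficient. If a p-value and confidence interval for the correlation are also wanted, base R's cor.test() provides them. The following is a minimal sketch on simulated data; the vectors `age` and `gap` are hypothetical stand-ins for actor_1_age_col and age_difference_col.

```r
# Simulated stand-ins for the two columns; NOT the age_gaps values.
set.seed(1)
age <- runif(200, min = 18, max = 70)
gap <- 0.3 * age + rnorm(200, sd = 5)

# Pearson correlation test: estimate, p-value, and 95% confidence interval
ct <- cor.test(age, gap)
ct$estimate   # same value that cor(age, gap) returns
ct$p.value    # tests the null hypothesis of zero correlation
ct$conf.int   # 95% confidence interval for the correlation
```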
model <- lm(actor_1_age_col ~ age_difference_col, data=data)
summary(model)
##
## Call:
## lm(formula = actor_1_age_col ~ age_difference_col, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.372 -5.047 -0.959 3.766 36.628
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.64775 0.34469 91.81 <2e-16 ***
## age_difference_col 0.86220 0.02562 33.66 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.407 on 1153 degrees of freedom
## Multiple R-squared: 0.4956, Adjusted R-squared: 0.4951
## F-statistic: 1133 on 1 and 1153 DF, p-value: < 2.2e-16
The intercept is the expected value of actor_1_age_col when age_difference_col is zero, that is, the predicted age of the first actor when the couple has no age gap. Note that the model as fitted treats actor_1_age_col as the outcome and age_difference_col as the predictor. The intercept may have limited practical relevance because a zero age difference is unusual.
The coefficient estimate of 0.862 indicates that for every one-year increase in the age gap between couples, the predicted age of the first actor rises by about 0.862 years. The coefficient is statistically significant (p < 0.001), indicating a clear positive linear relationship between age difference and the first actor’s age.
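As a worked example of how the intercept and slope combine, the fitted line can be evaluated by hand. The coefficients below are copied from the summary output; the age gap of 10 is a hypothetical input chosen only for illustration.

```r
# Coefficients copied from the regression summary above
intercept <- 31.64775
slope     <- 0.86220

# Hypothetical age gap, chosen for illustration
age_gap <- 10

# Predicted age of the first actor: intercept + slope * age gap
predicted_age <- intercept + slope * age_gap
predicted_age   # 40.26975
```

With the fitted model object available, predict(model, ...) would return the same kind of value without manual arithmetic.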
The R-squared value of 0.496 means that the age difference between couples (age_difference_col) explains approximately 49.6% of the variability in the first actor’s age (actor_1_age_col).
The F-statistic of 1133 and its very small p-value (< 2.2e-16) indicate that the model as a whole is statistically significant, so the link between age_difference_col and actor_1_age_col is unlikely to be due to chance alone.
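In a regression with a single predictor, these summary numbers are tied together: R-squared is the square of the correlation coefficient, and the overall F-statistic is the square of the slope's t value. A quick check using the values reported in the output above:

```r
# Values copied from the output above
r       <- 0.7039631   # correlation coefficient from cor()
t_slope <- 33.66       # t value of the slope in the regression summary

# R-squared equals the squared correlation in simple linear regression
r_squared <- r^2
round(r_squared, 4)    # 0.4956, matching Multiple R-squared

# The overall F-statistic equals the squared t value of the single slope
t_squared <- t_slope^2
round(t_squared)       # 1133, matching the F-statistic
```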
Overall, the linear regression model provides insight into the
relationship between the age gap between couples and the age of the
first actor in films, which could help inform casting decisions and
the representation of couples on screen.