data <- read.csv("C:\\Users\\91630\\OneDrive\\Desktop\\statistics\\age_gaps.CSV")

Null Hypothesis (H0):

The null hypothesis for the ANOVA test is that the mean release year of the films is the same for every category of character_1_gender. The alternative is that at least one gender group has a different mean release year.

ANOVA Test:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
age_gaps_df <- read.csv("age_gaps.CSV")

# Pool gender categories with fewer than 10 rows into a single "Other" level
age_gaps_df <- age_gaps_df %>%
  mutate(character_1_gender = ifelse(
    character_1_gender %in% names(table(character_1_gender))[table(character_1_gender) < 10],
    "Other",
    character_1_gender
  ))

anova_model <- aov(release_year ~ character_1_gender, data = age_gaps_df)
summary(anova_model)
##                      Df Sum Sq Mean Sq F value   Pr(>F)    
## character_1_gender    1   4095    4095   15.48 8.83e-05 ***
## Residuals          1153 304992     265                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
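
The `mutate()` step before the model pools rare gender categories into "Other". A minimal sketch of the same idiom on a toy vector follows; the vector `gender` and its category counts are invented for illustration and are not taken from age_gaps.CSV.

```r
# Toy data (hypothetical): two common categories and one rare one
gender <- c(rep("man", 12), rep("woman", 11), rep("nonbinary", 2))

# Count each category, then find the names of those seen fewer than 10 times
counts <- table(gender)
rare <- names(counts)[counts < 10]

# Replace rare categories with "Other"; leave the rest unchanged
lumped <- ifelse(gender %in% rare, "Other", gender)

table(lumped)
```

Pooling small groups this way keeps them from producing ANOVA levels with too few observations to estimate a stable mean.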

Thoughts:

  1. The ANOVA finds a statistically significant difference in mean release year between the character 1 gender groups (F(1, 1153) = 15.48, p = 8.83e-05), so the null hypothesis is rejected at the 0.05 level.

  2. The effect is small: gender accounts for 4095 of the 309087 total sum of squares, about 1.3% of the variance in release year. The result is an association and does not show that character gender influences when a film is released.

  3. A plausible reading is that the gender recorded for character 1 has shifted over the decades covered by the dataset. Comparing the group means, or plotting release year by gender, would show the direction and size of that shift.
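
As a rough check on the size of the effect, the F statistic and the share of variance explained (eta-squared) can be recomputed from the sums of squares printed in the ANOVA table above. This is only a sketch: the inputs are the rounded values from the table, so the results agree with the printed output only to rounding.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_gender   <- 4095     # Sum Sq for character_1_gender
ss_residual <- 304992   # Sum Sq for Residuals
df_gender   <- 1
df_residual <- 1153

# F is the ratio of the two mean squares
f_stat <- (ss_gender / df_gender) / (ss_residual / df_residual)

# Eta-squared: proportion of total variation attributed to gender
eta_sq <- ss_gender / (ss_gender + ss_residual)

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_gender, df_residual, lower.tail = FALSE)

round(c(F = f_stat, eta_sq = eta_sq), 4)
p_value
```

An eta-squared near 0.013 means the difference is statistically detectable with 1155 films but explains little of the spread in release years.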

Continuous data column that could have an impact on the response variable:

The continuous explanatory variable I chose is the age of the first actor in the movie (actor_1_age, stored in actor_1_age_col). The response variable is the age difference between the couple portrayed in the film (age_difference, stored in age_diff_column).

age_diff_column <- age_gaps_df$age_difference
actor_1_age_col <- age_gaps_df$actor_1_age

correlation_coefficient <- cor(age_diff_column, actor_1_age_col)

# Printing the correlation coefficient
print(correlation_coefficient)
## [1] 0.7039631

The correlation coefficient of 0.7039631 between actor_1_age_col and age_diff_column indicates a strong positive linear relationship: films with an older first actor tend to have a larger age gap between the couple. cor() reports only the strength of the relationship and does not test its significance.
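
Since cor() returns no p-value, a significance test can be sketched from the coefficient alone. The code below assumes n = 1155 observations, inferred from the 1153 residual degrees of freedom in the ANOVA table and assuming no rows were dropped for missing values. On the actual data, cor.test(age_diff_column, actor_1_age_col) would give the same kind of result directly.

```r
r <- 0.7039631   # correlation reported above
n <- 1155        # assumption: 1153 residual df + 2, no missing values

# t statistic for H0: true correlation is zero, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# Approximate 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

t_stat
p_value
ci
```

The t statistic obtained this way matches the t value of the slope in a simple linear regression of one variable on the other, which is why it agrees with the regression output that follows.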

Regression Model:

lm_model <- lm(actor_1_age_col ~ age_difference, data = age_gaps_df)
summary(lm_model)
## 
## Call:
## lm(formula = actor_1_age_col ~ age_difference, data = age_gaps_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.372  -5.047  -0.959   3.766  36.628 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    31.64775    0.34469   91.81   <2e-16 ***
## age_difference  0.86220    0.02562   33.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.407 on 1153 degrees of freedom
## Multiple R-squared:  0.4956, Adjusted R-squared:  0.4951 
## F-statistic:  1133 on 1 and 1153 DF,  p-value: < 2.2e-16

Thoughts:

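  • The model is fitted with actor_1_age_col as the response and age_difference as the predictor, which is the reverse of the roles described above. The age_difference coefficient (0.862) therefore says that each additional year of age difference is associated with actor 1 being about 0.86 years older on average. Release year is not part of this model.

  • The intercept (31.65) is the predicted age of actor 1 when the age difference is zero. The slope is highly significant (t = 33.66, p < 2e-16), and the model explains about half of the variance in actor 1's age (R-squared = 0.4956), with a residual standard error of 7.4 years.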
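
To make the fitted line concrete, the coefficients printed in the lm_model summary above can be turned into a small prediction function. The helper name predict_actor_1_age is hypothetical and introduced here only for illustration. With the fitted model available, predict(lm_model, newdata = ...) would do the same job.

```r
# Coefficients copied from the lm_model summary above
intercept <- 31.64775
slope     <- 0.86220   # coefficient on age_difference

# Hypothetical helper: predicted age of actor 1 for a given age difference
predict_actor_1_age <- function(age_difference) {
  intercept + slope * age_difference
}

# Predicted actor 1 age for couples 0, 10, and 20 years apart
predict_actor_1_age(c(0, 10, 20))
```

For a 10-year gap the fitted line predicts an actor 1 age of about 40.3 years. The residual standard error of 7.4 years indicates how far individual films typically fall from such predictions.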

library(ggplot2)

ggplot(age_gaps_df, aes(x = actor_1_age, y = age_difference)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Relationship between Actor 1's Age and Age Difference Between Characters",
       x = "Actor 1's Age",
       y = "Age Difference Between Characters") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

This visualization is a scatter plot with the age of the first actor on the horizontal axis (x-axis) and the age difference between the couple in each film on the vertical axis (y-axis). The blue line is the least-squares regression line of age difference on actor 1's age. Note that this is the opposite direction from lm_model above, which regresses actor 1's age on age difference.

The upward slope shows that the age difference tends to grow as actor 1's age increases. The plot shows an association and does not by itself establish that actor 1's age causes the gap.
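
Because the plotted line and lm_model regress in opposite directions, their slopes are not the same number. The sketch below uses simulated data (not age_gaps.CSV; the generating formula is an arbitrary assumption) to show that the two slopes differ and that their product equals the squared correlation.

```r
set.seed(42)

# Simulated stand-in data, for illustration only
n <- 200
actor_1_age <- runif(n, min = 20, max = 70)
age_difference <- pmax(0, 0.5 * (actor_1_age - 25) + rnorm(n, sd = 6))
toy <- data.frame(actor_1_age, age_difference)

# Slope when age_difference is the response (direction used in the plot)
b_yx <- coef(lm(age_difference ~ actor_1_age, data = toy))[["actor_1_age"]]

# Slope when actor_1_age is the response (direction used in lm_model)
b_xy <- coef(lm(actor_1_age ~ age_difference, data = toy))[["age_difference"]]

r <- cor(toy$actor_1_age, toy$age_difference)

# The product of the two slopes equals r squared
c(b_yx = b_yx, b_xy = b_xy, product = b_yx * b_xy, r_squared = r^2)
```

Applying the same identity to the reported figures, the slope of the plotted line should be roughly 0.4956 / 0.8622, or about 0.57 years of age difference per year of actor 1's age.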

Conclusion:

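This analysis finds that mean release year differs significantly between character 1 gender groups, although gender explains only a small share of the variance, and that actor 1's age is strongly positively correlated with the age difference between the couple (r ≈ 0.70). Neither result establishes causation. Exploring further factors such as director or genre could deepen the understanding of how age gaps in films arise.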