data <- read.csv("C:\\Users\\varsh\\OneDrive\\Desktop\\Gitstuff\\age_gaps.CSV")
library(ggplot2)
library(ggthemes)
library(ggrepel)
library(boot)
library(broom)
library(lindia)
model <- lm(actor_1_age ~ age_difference + release_year + couple_number, data = data)
summary(model)
## 
## Call:
## lm(formula = actor_1_age ~ age_difference + release_year + couple_number, 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.723  -4.943  -0.566   3.712  36.430 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -141.64135   26.70573  -5.304 1.36e-07 ***
## age_difference    0.92008    0.02640  34.857  < 2e-16 ***
## release_year      0.08553    0.01331   6.425 1.93e-10 ***
## couple_number     1.11478    0.29163   3.823 0.000139 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.243 on 1151 degrees of freedom
## Multiple R-squared:  0.5185, Adjusted R-squared:  0.5172 
## F-statistic: 413.1 on 3 and 1151 DF,  p-value: < 2.2e-16

Conclusion-

  1. Age Difference:

    The age difference within a couple has a significant positive relationship with actor_1’s age.

    Holding release year and couple number constant, actor_1’s age increases by about 0.92 years for every one-year increase in age difference (p < 2e-16).

  2. Movie Release Year:

    Newer movies tend to feature an older actor_1: holding the other predictors constant, actor_1’s age rises by about 0.0855 years for each one-year increase in release year (p = 1.93e-10).

  3. Number of couples:

    A higher couple_number is associated with an older actor_1: holding the other predictors constant, actor_1’s age increases by about 1.11 years for each one-unit increase in couple_number (p = 0.000139).

  4. Model Fit:

    The regression model explains approximately 51.72% of the variation in actor_1 age, as shown by the adjusted R-squared value (multiple R-squared: 51.85%).

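    The highly significant F-statistic (413.1 on 3 and 1151 DF, p < 2.2e-16) shows that the three predictors together fit actor_1 age better than an intercept-only model. It does not by itself show that predictions are accurate; roughly 48% of the variation remains unexplained.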
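
As a rough check on the interpretation above, the reported numbers can be plugged in by hand. This is a minimal sketch that uses only the rounded values printed by summary(model); the helper name predict_actor_1_age and the example inputs (10, 2000, 1) are hypothetical and chosen purely for illustration.

```r
# Coefficients copied (rounded) from the summary(model) output above.
b <- c(intercept = -141.64135, age_difference = 0.92008,
       release_year = 0.08553, couple_number = 1.11478)

# Hypothetical helper: fitted actor_1_age for one set of predictor values.
predict_actor_1_age <- function(age_difference, release_year, couple_number) {
  unname(b["intercept"] +
    b["age_difference"] * age_difference +
    b["release_year"] * release_year +
    b["couple_number"] * couple_number)
}

# Raising age_difference by one year, with the other predictors fixed,
# shifts the fitted age by exactly the age_difference coefficient.
base <- predict_actor_1_age(10, 2000, 1)
plus_one <- predict_actor_1_age(11, 2000, 1)
plus_one - base

# Adjusted R-squared recomputed from the multiple R-squared.
r2 <- 0.5185
n <- 1151 + 4   # residual degrees of freedom + 4 estimated coefficients
p <- 3          # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

Because the inputs are rounded, the results agree with the summary output only to the printed precision.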

Residual Analysis:

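The residuals are centered near zero (median -0.566) and appear roughly normally distributed, but their range is asymmetric (minimum -17.723, maximum 36.430), which suggests a longer right tail. The histogram and QQ-plot below should be used to check the normality assumption. The residual standard error is 7.243 years on 1151 degrees of freedom.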
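
The residual standard error and a formal normality check can both be computed from a fitted lm object. The sketch below uses the built-in mtcars data as a stand-in, because the age-gap CSV lives at a local path; the names fit, res, rse_manual, and sw are assumptions, and shapiro.test is one common choice of normality test, not a step taken in the analysis above.

```r
# Stand-in model on built-in data (assumption: this is NOT the age-gap model).
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Residual standard error: sqrt(residual sum of squares / residual df).
rse_manual <- sqrt(sum(res^2) / df.residual(fit))
rse_manual
sigma(fit)  # the same quantity, as reported by summary(fit)

# Shapiro-Wilk test: a small p-value is evidence against normal residuals.
sw <- shapiro.test(res)
sw$p.value
```

Replacing fit with model would apply the same checks to the regression reported here.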

Visualizations-

1. Residuals vs. Fitted Values

gg_resfitted(model) +
  geom_smooth(se=FALSE)
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

2. Residuals vs. X Values

residual_plots <- gg_resX(model)

3. Residuals Histogram

gg_reshist(model)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

4. QQ-Plots

gg_qqplot(model)

5. Cook’s Distance Plot

gg_cooksd(model, threshold = 'matlab')
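
To list the flagged observations rather than only plot them, Cook’s distance can be computed directly in base R. This is a sketch on the built-in mtcars data as a stand-in, and it assumes, following the lindia documentation, that threshold = 'matlab' places the cutoff at three times the mean Cook’s distance; the names fit, cd, cutoff, and flagged are hypothetical.

```r
# Stand-in model on built-in data (assumption: this is NOT the age-gap model).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance for every observation.
cd <- cooks.distance(fit)

# Assumed 'matlab'-style cutoff: three times the mean Cook's distance.
cutoff <- 3 * mean(cd)

# Observations whose influence exceeds the cutoff.
flagged <- cd > cutoff
names(cd)[flagged]
```

Flagged rows are candidates for closer inspection, not automatic removal.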