2026-03-07

Simple Linear Regression

Predicting Movie Rating from Length

Randy Mattoka
March 7, 2026

Motivation

  • Can movie length predict audience ratings?
  • We use the movies dataset from ggplot2movies.
  • Goal: Fit and interpret a linear regression model.

The Linear Regression Model

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

Where:

  • \(Y\) = movie rating
  • \(X\) = movie length
  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\varepsilon\) = random error, with \(\varepsilon \sim N(0,\sigma^2)\)

Least Squares Estimators

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

These minimize the residual sum of squares, where \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\): \[ \sum (y_i - \hat{y}_i)^2 \]
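As a quick check on the formulas above, the sketch below computes both estimators by hand and compares them with what lm() returns. It uses simulated stand-in data rather than the movies dataset, so the variable names and the numbers used to generate x and y are assumptions made purely for illustration.

```r
# Simulated stand-in data (an assumption; NOT the movies dataset)
set.seed(1)
x <- runif(200, min = 60, max = 180)        # hypothetical lengths in minutes
y <- 6 - 0.001 * x + rnorm(200, sd = 1.5)   # hypothetical ratings

# Closed-form least squares estimates, following the formulas above
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# The same estimates obtained from lm()
fit <- lm(y ~ x)

# Print both sets of estimates side by side for comparison
print(c(b0_hat = b0_hat, b1_hat = b1_hat))
print(coef(fit))
```

The two printed sets of estimates should agree up to floating-point error, since lm() solves the same least squares problem.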

Data + Model Fit (Show R Code)

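library(ggplot2movies)  # provides the movies data frame

# Fit rating as a linear function of length (in minutes)
lm1 = lm(rating ~ length, data = movies)

b0 = coef(lm1)[1]                     # intercept estimate
b1 = coef(lm1)[2]                     # slope estimate
r2 = summary(lm1)$r.squared           # proportion of variance explained
p = summary(lm1)$coefficients[2, 4]   # p-value for the slope

round(c(Intercept=b0, Slope=b1, R2=r2, p_value=p), 6)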
Intercept.(Intercept)          Slope.length                    R2 
             6.021471             -0.001076              0.000945 
              p_value 
             0.000000 
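To make the size of the slope concrete, the sketch below plugs the rounded estimates from the output above into the fitted line. The helper predict_rating and the example lengths of 90 and 120 minutes are hypothetical choices for illustration, not part of the original analysis.

```r
# Rounded estimates copied from the model output above
b0 <- 6.021471    # intercept
b1 <- -0.001076   # slope (rating points per minute)

# Hypothetical helper: fitted rating for a given length in minutes
predict_rating <- function(length_min) b0 + b1 * length_min

pred_90  <- predict_rating(90)    # fitted rating for a 90-minute movie
pred_120 <- predict_rating(120)   # fitted rating for a 120-minute movie
diff_30  <- pred_120 - pred_90    # change for 30 extra minutes (equals 30 * b1)

print(c(pred_90 = pred_90, pred_120 = pred_120, diff_30 = diff_30))
```

Under these rounded estimates, the fitted ratings are roughly 5.92 and 5.89, so an extra 30 minutes of length corresponds to a drop of only about 0.03 rating points.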

ggplot 1 (Scatter + Regression Line)

`geom_smooth()` using formula = 'y ~ x'
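The code behind this slide is not shown, so the following is only a sketch of how a scatter plot with a fitted line could be produced. It assumes the ggplot2 and ggplot2movies packages are installed; the transparency level, axis labels, and the name p_scatter are illustrative choices rather than the author's settings.

```r
library(ggplot2)
library(ggplot2movies)  # provides the movies data frame

# Scatter of rating against length with the least squares line overlaid
p_scatter <- ggplot(movies, aes(x = length, y = rating)) +
  geom_point(alpha = 0.1) +                                   # faint points to reduce overplotting
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # linear fit, no confidence band
  labs(x = "Length (minutes)", y = "Rating")

p_scatter
```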

ggplot 2 (Residual Plot)
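The residual plot itself is not reproduced here; a minimal sketch of one way to draw it, assuming the ggplot2 and ggplot2movies packages are installed, is given below. The data frame resid_df, the object name p_resid, and the styling are hypothetical choices for illustration.

```r
library(ggplot2)
library(ggplot2movies)  # provides the movies data frame

# Refit the model so this block is self-contained
lm1 <- lm(rating ~ length, data = movies)

# Collect fitted values and residuals in one data frame
resid_df <- data.frame(fitted = fitted(lm1), residual = resid(lm1))

# Residuals against fitted values, with a reference line at zero
p_resid <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point(alpha = 0.1) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted rating", y = "Residual")

p_resid
```

A residual plot with no visible pattern around the zero line would be consistent with the linear model's assumptions; curvature or a funnel shape would suggest otherwise.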

Plotly Interactive Plot

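Conclusion: Length has a statistically significant but practically negligible association with rating. The fitted slope is about -0.0011 rating points per minute (p < 0.001), and length explains less than 0.1% of the variance in rating (R² ≈ 0.0009).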