What is Simple Linear Regression?

  • Models the relationship between a response variable \(Y\) and a predictor \(X\)
  • Used for explanation and prediction

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

Model Interpretation (Math)

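\[ E(Y \mid X) = \beta_0 + \beta_1 X \]

- \(\beta_0\): intercept, the expected value of \(Y\) when \(X = 0\)
- \(\beta_1\): slope, the change in the expected value of \(Y\) per one-unit increase in \(X\)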
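
To see why \(\beta_1\) reads as a per-unit change, compare the mean response at \(x + 1\) with the mean response at \(x\):

\[ E(Y \mid X = x + 1) - E(Y \mid X = x) = \bigl(\beta_0 + \beta_1 (x + 1)\bigr) - \bigl(\beta_0 + \beta_1 x\bigr) = \beta_1 \]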

Fit the Model in R (Call)
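
The code on these slides uses objects named `df`, `fit`, and `s` without showing how they were created. A minimal setup sketch, assuming `df` is R's built-in `mtcars` data set (the car names and the `mpg ~ wt` formula in the output are consistent with it) and `s` is the model summary:

```r
# Assumption: df is the built-in mtcars data set
df <- mtcars

# Regress mpg (miles per gallon) on wt (weight in 1000 lbs)
fit <- lm(mpg ~ wt, data = df)

# Summary object; its components are inspected on the slides that follow
s <- summary(fit)
```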

s$call
## lm(formula = mpg ~ wt, data = df)

Fit the Model in R (Residuals)

s$residuals
##           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
##          -2.2826106          -0.9197704          -2.0859521           1.2973499 
##   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
##          -0.2001440          -0.6932545          -3.9053627           4.1637381 
##            Merc 230            Merc 280           Merc 280C          Merc 450SE 
##           2.3499593           0.2998560          -1.1001440           0.8668731 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
##          -0.0502472          -1.8830236           1.1733496           2.1032876 
##   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
##           5.9810744           6.8727113           1.7461954           6.4219792 
##       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
##          -2.6110037          -2.9725862          -3.7268663          -3.4623553 
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
##           2.4643670           0.3564263           0.1520430           1.2010593 
##      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
##          -4.5431513          -2.7809399          -3.2053627          -1.0274952
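
Each residual is the observed value minus the fitted value, \(e_i = y_i - \hat{y}_i\). A quick check of that identity, as a sketch that assumes `df` is `mtcars` and refits the model so the block runs on its own:

```r
df <- mtcars                      # assumption: built-in mtcars data
fit <- lm(mpg ~ wt, data = df)

# Residuals recomputed by hand: observed minus fitted
manual_resid <- df$mpg - fitted(fit)

# Should agree with the residuals stored in the fit
all.equal(manual_resid, residuals(fit))

# With an intercept in the model, least-squares residuals sum to
# zero up to floating-point error
sum(residuals(fit))
```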

Fit the Model in R (Coefficients)

s$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
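
The estimates and standard errors above can be turned into confidence intervals. A sketch using `confint()` from base R, assuming the same `mtcars` fit, with the slope interval also rebuilt by hand:

```r
df <- mtcars                      # assumption: built-in mtcars data
fit <- lm(mpg ~ wt, data = df)

# 95% confidence intervals for both coefficients
ci <- confint(fit, level = 0.95)
ci

# Slope interval by hand: estimate +/- t quantile * standard error
coefs <- coef(summary(fit))
est <- coefs["wt", "Estimate"]
se <- coefs["wt", "Std. Error"]
manual_ci <- est + c(-1, 1) * qt(0.975, df = df.residual(fit)) * se
manual_ci
```

In `mtcars`, `wt` is recorded in units of 1000 lbs, so the slope estimate says that each additional 1000 lbs is associated with roughly 5.3 fewer miles per gallon on average.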

Fit the Model in R (Model Statistics)

c(
  r_squared = s$r.squared,
  adj_r_squared = s$adj.r.squared,
  residual_se = s$sigma,
  f_statistic = s$fstatistic[1]
)
##         r_squared     adj_r_squared       residual_se f_statistic.value 
##         0.7528328         0.7445939         3.0458821        91.3753250
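
These statistics follow from the residual and total sums of squares. A sketch that recomputes \(R^2\) and the residual standard error by hand, assuming the same `mtcars` fit:

```r
df <- mtcars                      # assumption: built-in mtcars data
fit <- lm(mpg ~ wt, data = df)
s <- summary(fit)

sse <- sum(residuals(fit)^2)             # residual sum of squares
sst <- sum((df$mpg - mean(df$mpg))^2)    # total sum of squares
n <- nrow(df)

r_squared_manual <- 1 - sse / sst
residual_se_manual <- sqrt(sse / (n - 2))  # n - 2: two estimated parameters

c(r_squared = r_squared_manual, residual_se = residual_se_manual)
```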

ggplot: Data + Regression Line
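
A plot of this kind might be produced along the following lines. The styling of the original figure is not known, so the labels and options here are assumptions, as is the use of `mtcars` for `df`:

```r
library(ggplot2)

df <- mtcars                      # assumption: built-in mtcars data

# Scatter plot of the data with the least-squares line overlaid;
# se = TRUE adds a confidence band around the line
p_fit <- ggplot(df, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "mpg vs. wt with fitted regression line"
  )

p_fit
```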

Least Squares Estimation (Math)

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
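
The slope formula can be evaluated directly and compared with `lm()`. A sketch assuming `df` is `mtcars`; the intercept uses the standard least-squares identity \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\):

```r
df <- mtcars                      # assumption: built-in mtcars data
x <- df$wt
y <- df$mpg

# Slope: sum of cross-deviations over sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Reference values from lm() for comparison
coef(lm(mpg ~ wt, data = df))
```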

ggplot: Residuals vs Fitted
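
A residuals-vs-fitted plot can be sketched as below. The original figure's styling is not known, so the labels are assumptions, as is the use of `mtcars` for `df`. Points scattered evenly around the zero line, with no curve or funnel shape, are what a well-behaved linear fit would show:

```r
library(ggplot2)

df <- mtcars                      # assumption: built-in mtcars data
fit <- lm(mpg ~ wt, data = df)

# Collect fitted values and residuals in one data frame for plotting
diag_df <- data.frame(
  fitted = fitted(fit),
  resid = residuals(fit)
)

p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted values", y = "Residuals", title = "Residuals vs fitted")

p_resid
```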

Hypothesis Test for the Slope

\[ H_0: \beta_1 = 0 \quad\text{vs}\quad H_A: \beta_1 \ne 0 \]

s$coefficients["wt", ]
##      Estimate    Std. Error       t value      Pr(>|t|) 
## -5.344472e+00  5.591010e-01 -9.559044e+00  1.293959e-10
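
The t value and p-value in this row can be rebuilt from the estimate and its standard error. A sketch assuming the same `mtcars` fit; under \(H_0\) the statistic follows a t distribution with \(n - 2\) degrees of freedom:

```r
df <- mtcars                      # assumption: built-in mtcars data
fit <- lm(mpg ~ wt, data = df)
coefs <- coef(summary(fit))

# Test statistic: estimate divided by its standard error
t_stat <- coefs["wt", "Estimate"] / coefs["wt", "Std. Error"]

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

c(t_value = t_stat, p_value = p_value)
```

A p-value this small is strong evidence against \(H_0: \beta_1 = 0\).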

Plotly 3D Visualization
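
The variables shown in the original 3D figure are not known. As a purely illustrative sketch, the block below assumes `df` is `mtcars` and picks `hp` as a hypothetical third axis alongside `wt` and `mpg`:

```r
library(plotly)

df <- mtcars                      # assumption: built-in mtcars data

# Interactive 3D scatter plot; hp is a hypothetical choice of third
# variable and is not taken from the slides
p3d <- plot_ly(
  df,
  x = ~wt, y = ~hp, z = ~mpg,
  type = "scatter3d",
  mode = "markers"
)

p3d
```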

Prediction Example

new_car <- data.frame(wt = 3)
predict(fit, new_car, interval = "confidence")
##        fit      lwr      upr
## 1 21.25171 20.12444 22.37899
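
`interval = "confidence"` gives an interval for the mean `mpg` of all cars with `wt = 3`. For a single new car, `interval = "prediction"` is the relevant choice, and it is wider because it also accounts for the error term. A sketch comparing the two, assuming the same `mtcars` fit:

```r
df <- mtcars                      # assumption: built-in mtcars data
fit <- lm(mpg ~ wt, data = df)
new_car <- data.frame(wt = 3)

# Interval for the mean response at wt = 3
conf_int <- predict(fit, new_car, interval = "confidence")

# Interval for one new observation at wt = 3
pred_int <- predict(fit, new_car, interval = "prediction")

conf_int
pred_int
```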

Final Takeaways

  • Simple linear regression models a linear relationship between a response and one predictor
  • Parameters are estimated by least squares
  • ggplot provides visual diagnostics
  • Plotly enables interactive 3D visualization
  • Fully reproducible statistical analysis in R