2026-03-06

Simple Linear Regression Definition:

Simple linear regression is a statistical method that models the linear relationship between one predictor variable (x) and one response variable (y).

The formula is:

\[y = \beta_0 + \beta_1 x + \varepsilon\]

  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\varepsilon\) = error term
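
As a minimal sketch of how the intercept and slope are estimated by ordinary least squares, the closed-form estimates can be computed by hand and compared with `lm()`. The vectors `x` and `y` below are made-up illustration values, not data from this document.

```r
# Closed-form OLS estimates for y = b0 + b1 * x + error.
# x and y are small made-up vectors used only for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

b1 <- cov(x, y) / var(x)              # slope estimate
b0 <- mean(y) - b1 * mean(x)          # intercept estimate
residuals_hand <- y - (b0 + b1 * x)   # estimated error terms

# lm() should return the same two estimates
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))
```

The residuals of a least-squares fit with an intercept sum to zero (up to rounding), which is a quick sanity check on the hand computation.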

Dataset: mtcars

The data set covers 32 automobiles (1973-74 models); the data were extracted from the 1974 Motor Trend US magazine.

  • Response variable (y): mpg (miles per gallon)
  • Predictor variable (x): wt (weight in 1000 lbs)
  • Third variable: cyl (number of cylinders)
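
A quick way to inspect these three variables, sketched with base R only (the helper name `vars` is an illustrative choice):

```r
# mtcars ships with base R (package "datasets"), so no download is needed
data(mtcars)

dim(mtcars)                              # rows (cars) and columns (variables)
vars <- mtcars[, c("mpg", "wt", "cyl")]  # the three variables used here
summary(vars)                            # five-number summaries and means
table(mtcars$cyl)                        # number of cars per cylinder count
```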

Scatter Plot with Regression Line
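
The plot itself is not reproduced here. One possible way to draw it, assuming base graphics and assuming the points are coloured by `cyl` (the colours and labels are illustrative choices, not taken from the original figure):

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Colour points by number of cylinders
cyl_levels <- sort(unique(mtcars$cyl))
palette_cyl <- c("steelblue", "darkorange", "firebrick")
point_cols <- palette_cyl[match(mtcars$cyl, cyl_levels)]

plot(mtcars$wt, mtcars$mpg, col = point_cols, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs. wt")
abline(fit, lwd = 2)   # fitted regression line
legend("topright", legend = paste(cyl_levels, "cyl"),
       col = palette_cyl, pch = 19)
```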

Residuals Plot
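
The plot itself is not reproduced here. A minimal sketch of a residuals-versus-fitted plot for the simple model, assuming base graphics:

```r
fit <- lm(mpg ~ wt, data = mtcars)

plot(fitted(fit), resid(fit), pch = 19,
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)   # reference line at zero
```

If the linear model is adequate, the residuals should scatter around the zero line without an obvious curve or funnel shape.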

Plotly Plot
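
The interactive plot is not reproduced here. A possible sketch, assuming the `plotly` package is installed and assuming the plot shows `mpg` against `wt` coloured by `cyl` (the original plot may have differed):

```r
# plotly is a CRAN package, not part of base R; the guard skips the plot if absent
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(
    data = mtcars,
    x = ~wt, y = ~mpg,
    color = ~factor(cyl),
    type = "scatter", mode = "markers"
  )
  p <- plotly::layout(
    p,
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Miles per gallon")
  )
  # Typing p at the console, or in an R Markdown chunk, displays the widget
}
```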

Fitting the Simple and Multiple Models

# Simple model: weight only
simple_model <- lm(mpg ~ wt, data = mtcars)

# Multiple model: weight + horsepower
multi_model <- lm(mpg ~ wt + hp, data = mtcars)

Comparing Models: Simple vs Multiple

summary(simple_model)$r.squared
## [1] 0.7528328
summary(multi_model)$r.squared
## [1] 0.8267855
anova(simple_model, multi_model)
## Analysis of Variance Table
## 
## Model 1: mpg ~ wt
## Model 2: mpg ~ wt + hp
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1     30 278.32                                
## 2     29 195.05  1    83.274 12.381 0.001451 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
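
The F statistic in this table can be reproduced by hand from the two residual sums of squares. The sketch below refits both models so that it runs on its own; the variable names `rss_simple`, `rss_multi`, `df_diff`, and `df_multi` are illustrative.

```r
simple_model <- lm(mpg ~ wt, data = mtcars)
multi_model  <- lm(mpg ~ wt + hp, data = mtcars)

rss_simple <- sum(resid(simple_model)^2)   # RSS of Model 1
rss_multi  <- sum(resid(multi_model)^2)    # RSS of Model 2
df_diff    <- df.residual(simple_model) - df.residual(multi_model)  # extra parameters
df_multi   <- df.residual(multi_model)     # residual df of the larger model

# F = (drop in RSS per extra parameter) / (residual variance of the larger model)
f_stat  <- ((rss_simple - rss_multi) / df_diff) / (rss_multi / df_multi)
p_value <- pf(f_stat, df_diff, df_multi, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```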

Interpretation

The simple model uses only weight:

\[\hat{y} = 37.29 - 5.34 \cdot wt\]

The multiple model adds horsepower:

\[\hat{y} = 37.23 - 3.88 \cdot wt - 0.032 \cdot hp\]
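
The coefficients in both equations can be read off the fitted models, and the models can then be used for prediction. In the sketch below, the car described by `new_car` (3,000 lbs, 110 hp) is a hypothetical example, not a row from the data.

```r
simple_model <- lm(mpg ~ wt, data = mtcars)
multi_model  <- lm(mpg ~ wt + hp, data = mtcars)

round(coef(simple_model), 2)   # intercept and slope of the simple model
round(coef(multi_model), 3)    # intercept and both slopes of the multiple model

# Hypothetical car: wt = 3 (i.e. 3,000 lbs) and 110 hp
new_car <- data.frame(wt = 3, hp = 110)
pred_simple <- predict(simple_model, newdata = new_car)
pred_multi  <- predict(multi_model,  newdata = new_car)
c(simple = unname(pred_simple), multiple = unname(pred_multi))
```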

  • Adding horsepower improves \(R^2\) from 0.753 to 0.827
  • The ANOVA F-test confirms this improvement is statistically significant (F = 12.38, p ≈ 0.0015, so p < 0.01)
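
Because \(R^2\) never decreases when a predictor is added, adjusted \(R^2\) is a common complementary check; it penalizes the extra parameter. A minimal sketch (the names `adj_simple` and `adj_multi` are illustrative):

```r
simple_model <- lm(mpg ~ wt, data = mtcars)
multi_model  <- lm(mpg ~ wt + hp, data = mtcars)

adj_simple <- summary(simple_model)$adj.r.squared
adj_multi  <- summary(multi_model)$adj.r.squared

# Adjusted R-squared for each model, side by side
round(c(simple = adj_simple, multiple = adj_multi), 3)
```

If the adjusted value also rises for the multiple model, the gain is not merely an artifact of adding a parameter, which is consistent with the F-test result.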