Linear Regression: From Simple to Multiple

Overview

Goal: Understand simple and multiple linear regression through a small, real dataset.
Dataset: mtcars (built into R; 32 cars, 11 variables)
Tools:
- ggplot2 (2+ plots)
- plotly (1 interactive 3D plot)
- LaTeX math for formulas (2+ slides)
- R code included

Data at a Glance

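library(ggplot2)   # static plots
library(dplyr)     # data manipulation (also re-exports the %>% pipe)
library(plotly)    # interactive 3D plot
library(magrittr)  # pipe operator; already available via dplyr

data(mtcars)       # load the built-in dataset
head(mtcars)       # preview the first six rows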

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Response: mpg
Predictors we’ll use: wt (weight, 1000 lbs), hp (horsepower)

Simple Linear Regression (SLR)

We model a linear relationship between a response \(y\) and one predictor \(x\): \[ y_i \;=\; \beta_0 + \beta_1 x_i + \varepsilon_i,\quad \varepsilon_i \sim \text{i.i.d. } (0,\sigma^2) \]

Interpretation:
- \(\beta_0\) is the intercept: the expected value of \(y\) when \(x=0\)
- \(\beta_1\) is the slope: the average change in \(y\) per one-unit increase in \(x\)

ggplot #1: mpg vs wt with fitted line (green theme)

Visual: Heavier cars tend to have lower MPG.
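
The plot for this slide is not reproduced in the text, so here is a minimal sketch of how it could be drawn. The object name `p1`, the colour choices standing in for the "green theme", and the axis labels are assumptions, not the original slide code.

```r
library(ggplot2)

# Scatter plot of mpg against wt with the least-squares line overlaid.
# Colours are illustrative stand-ins for the deck's green theme.
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "darkgreen", size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE,
              colour = "forestgreen", fill = "palegreen") +
  labs(title = "MPG vs. weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon") +
  theme_minimal()

p1
```

The shaded band drawn by `se = TRUE` is the pointwise confidence band for the fitted mean, not a prediction interval for individual cars.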

Model Fit & R Code (SLR)

slr <- lm(mpg ~ wt, data = mtcars)
summary(slr)

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Slope (\(\hat\beta_1 = -5.34\)): each additional 1000 lbs of weight is associated with an expected drop of about 5.34 MPG
\(R^2 = 0.753\): weight explains about 75% of the variance in MPG

Residual Diagnostics (ggplot #2, green theme)

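Look for residuals scattered randomly around 0 with no visible pattern → supports linearity & constant variance. Curvature would suggest a nonlinear relationship; a funnel shape would suggest nonconstant variance.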
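
A residuals-versus-fitted plot along these lines could be produced as sketched below. The names `diag_df` and `p2` and the colours are assumptions for illustration; the model is refit here so the snippet runs on its own.

```r
library(ggplot2)

# Refit the simple regression so this block is self-contained.
slr <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals from the model.
diag_df <- data.frame(fitted = fitted(slr), resid = resid(slr))

# Residuals vs. fitted values, with a dashed reference line at zero.
p2 <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "darkgreen") +
  geom_point(colour = "forestgreen", size = 2) +
  labs(title = "Residuals vs. fitted (mpg ~ wt)",
       x = "Fitted MPG",
       y = "Residual") +
  theme_minimal()

p2
```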

OLS Formulas (Math Slide)

Closed-form OLS estimates: \[ \hat{\beta}_1 \;=\; \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 \;=\; \bar{y} - \hat{\beta}_1 \bar{x}. \]

Residual variance estimate: \[ \hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat{y}_i)^2. \]
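
As a check on the formulas above, the estimates can be computed directly and compared with `lm()`. This is a sketch; the variable names (`b0`, `b1`, `sigma2_hat`, `fit`) are illustrative.

```r
x <- mtcars$wt
y <- mtcars$mpg
n <- length(y)

# Closed-form slope and intercept.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Residual variance estimate with n - 2 degrees of freedom.
y_hat <- b0 + b1 * x
sigma2_hat <- sum((y - y_hat)^2) / (n - 2)

# Compare with the values reported by lm().
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(c(b0, b1), unname(coef(fit)))
all.equal(sqrt(sigma2_hat), sigma(fit))
```

The square root of `sigma2_hat` corresponds to the "Residual standard error" line in the `summary()` output.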

Multiple Linear Regression (MLR)

Extend SLR to two predictors: \[ \text{mpg} = \beta_0 + \beta_1\,\text{wt} + \beta_2\,\text{hp} + \varepsilon. \]

\(\beta_1\): expected change in mpg per 1000 lbs increase in wt, holding hp fixed
\(\beta_2\): expected change in mpg per one-unit increase in hp, holding wt fixed

mlr <- lm(mpg ~ wt + hp, data = mtcars)
summary(mlr)

## 
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

Plotly 3D: mpg vs wt & hp with fitted plane (interactive, green theme)

(Click and drag to rotate the 3D view.)
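
The interactive figure itself is not included in the text. One way to build such a plot is sketched below; the grid resolution, the colours, and the names `wt_seq`, `hp_seq`, `z_plane`, and `fig` are assumptions rather than the original slide code.

```r
library(plotly)

# Refit the two-predictor model so this block is self-contained.
mlr <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of predictor values covering the observed ranges.
wt_seq <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 30)
hp_seq <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 30)
grid <- expand.grid(wt = wt_seq, hp = hp_seq)  # wt varies fastest

# Surface traces expect z[row, col] to match y[row] and x[col], so fill
# by row: each row holds one hp value, each column one wt value.
z_plane <- matrix(predict(mlr, newdata = grid),
                  nrow = length(hp_seq), ncol = length(wt_seq),
                  byrow = TRUE)

fig <- plot_ly() %>%
  add_markers(data = mtcars, x = ~wt, y = ~hp, z = ~mpg,
              marker = list(color = "darkgreen", size = 4),
              name = "Cars") %>%
  add_surface(x = wt_seq, y = hp_seq, z = z_plane,
              opacity = 0.5, showscale = FALSE,
              colorscale = "Greens", name = "Fitted plane") %>%
  layout(scene = list(xaxis = list(title = "wt (1000 lbs)"),
                      yaxis = list(title = "hp"),
                      zaxis = list(title = "mpg")))

fig
```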

Inference for Coefficients (Math Slide)

For each coefficient \(\beta_j\), test \(H_0:\beta_j=0\) vs \(H_A:\beta_j\neq 0\) using \[ t = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}, \quad \text{with } \text{df} = n - p, \] where \(p\) is the number of parameters (including intercept).
p-value: probability, under \(H_0\), of a \(t\)-statistic at least as extreme as the one observed.
Small p-value → evidence that the predictor helps explain MPG beyond the other predictors in the model.
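
To connect the formula to the regression table, the t-statistics and two-sided p-values can be recomputed by hand. This is a sketch with illustrative variable names (`est`, `se`, `t_stat`, `df_resid`, `p_val`); the model is refit so the block runs on its own.

```r
mlr <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(mlr)               # coefficient estimates
se  <- sqrt(diag(vcov(mlr)))   # standard errors

# t-statistic for H0: beta_j = 0.
t_stat <- est / se

# Degrees of freedom: n - p, with p counting the intercept (32 - 3 = 29).
df_resid <- nrow(mtcars) - length(est)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

round(cbind(estimate = est, se = se, t = t_stat, p = p_val), 5)
```

These values should agree with the `Coefficients` block that `summary(mlr)` prints.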

Takeaways (Green Theme)

Weight (wt) alone explains about 75% of MPG variability (\(R^2 = 0.753\) in SLR).
Adding horsepower (hp) refines the model: \(R^2\) rises from 0.753 to 0.827, and both wt and hp remain significant in their t-tests.
Always inspect residuals for patterns to validate model assumptions.
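
A side-by-side summary of the two fits can make the comparison concrete. The sketch below refits both models and collects their fit statistics in a data frame; the name `comparison` and its column names are illustrative.

```r
slr <- lm(mpg ~ wt, data = mtcars)
mlr <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared, adjusted R-squared, and residual standard error per model.
comparison <- data.frame(
  model  = c("mpg ~ wt", "mpg ~ wt + hp"),
  r2     = c(summary(slr)$r.squared, summary(mlr)$r.squared),
  adj_r2 = c(summary(slr)$adj.r.squared, summary(mlr)$adj.r.squared),
  rse    = c(sigma(slr), sigma(mlr))
)

comparison
```

Adjusted \(R^2\) is the fairer column to compare here, since plain \(R^2\) never decreases when a predictor is added.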