- Goal: Understand simple and multiple linear regression through a small, real dataset.
- Dataset: `mtcars` (built-in)
- Tools:
  - ggplot2 (2+ plots)
  - plotly (1 interactive 3D plot)
  - LaTeX math for formulas (2+ slides)
  - R code included
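Data: `mtcars` (built-in)

```r
library(ggplot2)   # static plots
library(dplyr)     # data manipulation
library(plotly)    # interactive 3D plot
library(magrittr)  # %>% pipe

data(mtcars)       # built-in dataset: 32 cars, 11 variables
head(mtcars)
```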
```
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```
- Response: `mpg` (miles per US gallon)
- Predictors: `wt` (weight, 1000 lbs), `hp` (horsepower)

We model a linear relationship between a response \(y\) and one predictor \(x\): \[ y_i \;=\; \beta_0 + \beta_1 x_i + \varepsilon_i,\quad \varepsilon_i \sim \text{i.i.d. } (0,\sigma^2) \]
Interpretation:
- \(\beta_0\) is the intercept: the expected value of \(y\) when \(x = 0\)
- \(\beta_1\) is the slope: the average change in \(y\) per one-unit increase in \(x\)
Visual: Heavier cars tend to have lower MPG.
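The original figure is not reproduced in this copy. A minimal ggplot2 sketch of such a visual, assuming a scatter of `mpg` against `wt` with the least-squares line overlaid; the object name `p_wt`, the labels, and the confidence band are illustrative choices, not necessarily those of the original slide:

```r
library(ggplot2)

# Scatter of fuel economy against weight, with the least-squares line
# (geom_smooth with method = "lm") and its 95% confidence band.
p_wt <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "mpg vs. wt with fitted SLR line")
p_wt
```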
```r
# Simple linear regression: mpg explained by weight alone
slr <- lm(mpg ~ wt, data = mtcars)
summary(slr)
```
```
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
```
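To make the slope interpretation concrete, a small sketch that predicts `mpg` for two hypothetical cars one unit (1000 lbs) apart; the weights 3 and 4 and the name `new_cars` are arbitrary illustrative choices. The difference between the two predictions equals \(\hat\beta_1\):

```r
slr <- lm(mpg ~ wt, data = mtcars)

# Predicted mpg for two hypothetical cars weighing 3000 and 4000 lbs.
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(slr, newdata = new_cars)

# A one-unit (1000 lb) increase in wt shifts the prediction by beta_1_hat.
diff(pred)          # about -5.34 mpg
coef(slr)[["wt"]]   # the slope estimate
```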
Closed-form OLS estimates: \[ \hat{\beta}_1 \;=\; \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 \;=\; \bar{y} - \hat{\beta}_1 \bar{x}. \]
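The closed-form formulas can be checked directly in R. This sketch, assuming `x = wt` and `y = mpg` as in the SLR above, computes \(\hat\beta_0\) and \(\hat\beta_1\) by hand and compares them with `coef()`; `beta0_hat` and `beta1_hat` are hypothetical names:

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: cross-deviation sum divided by the sum of squared deviations of x.
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: the fitted line passes through (x_bar, y_bar).
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0_hat, beta1_hat)
coef(lm(mpg ~ wt, data = mtcars))   # same values from lm()
```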
Residual variance estimate: \[ \hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat{y}_i)^2. \]
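A sketch of the same computation for \(\hat\sigma^2\), using the residuals of the SLR fit; its square root should match the residual standard error of 3.046 reported by `summary(slr)` and the value returned by `stats::sigma()`:

```r
slr <- lm(mpg ~ wt, data = mtcars)
n <- nrow(mtcars)

# Sum of squared residuals divided by n - 2 (two estimated coefficients).
sigma2_hat <- sum(residuals(slr)^2) / (n - 2)

sqrt(sigma2_hat)  # residual standard error, about 3.046
sigma(slr)        # built-in accessor for the same quantity
```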
Extend SLR to two predictors: \[ \text{mpg} = \beta_0 + \beta_1\,\text{wt} + \beta_2\,\text{hp} + \varepsilon. \]
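With more than one predictor the closed-form estimates are usually written in matrix form, \(\hat{\boldsymbol\beta} = (X^\top X)^{-1} X^\top \mathbf{y}\), where \(X\) has a column of ones plus one column per predictor. This is the standard OLS solution rather than something stated on these slides; the sketch below solves the normal equations with base R and compares the result with `lm()`:

```r
# Design matrix: a column of 1s for the intercept, then wt and hp.
X <- model.matrix(~ wt + hp, data = mtcars)
y <- mtcars$mpg

# Solve the normal equations (X'X) beta = X'y without forming the inverse.
beta_hat <- solve(crossprod(X), crossprod(X, y))

beta_hat                                 # about 37.23, -3.88, -0.032
coef(lm(mpg ~ wt + hp, data = mtcars))   # lm() agrees (it uses a QR decomposition)
```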
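Interpretation:
- \(\beta_1\): average change in mpg per 1000 lbs of `wt`, holding `hp` fixed
- \(\beta_2\): average change in mpg per unit of `hp`, holding `wt` fixed

```r
# Multiple linear regression: weight and horsepower together
mlr <- lm(mpg ~ wt + hp, data = mtcars)
summary(mlr)
```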
```
## 
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
```
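The interactive 3D plot is not reproduced in this copy. A minimal plotly sketch, assuming the figure shows the observed cars as points and the fitted `mlr` plane as a surface; the 25 × 25 grid size, the opacity, and the object names are illustrative choices:

```r
library(plotly)

mlr <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of (wt, hp) values spanning the observed data, for the fitted plane.
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 25)
hp_grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 25)
grid <- expand.grid(wt = wt_grid, hp = hp_grid)  # wt varies fastest

# Reshape predictions so z[i, j] is the fit at hp_grid[i], wt_grid[j]
# (plotly surfaces map matrix rows to y and columns to x).
z <- matrix(predict(mlr, newdata = grid),
            nrow = length(hp_grid), ncol = length(wt_grid), byrow = TRUE)

fig <- plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg,
               type = "scatter3d", mode = "markers",
               marker = list(size = 4), name = "cars") %>%
  add_trace(x = wt_grid, y = hp_grid, z = z, type = "surface",
            opacity = 0.6, showscale = FALSE, name = "fitted plane",
            inherit = FALSE) %>%
  layout(scene = list(xaxis = list(title = "wt (1000 lbs)"),
                      yaxis = list(title = "hp"),
                      zaxis = list(title = "mpg")))
fig
```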
(Click and drag to rotate the 3D view.)
For each coefficient \(\beta_j\), test \(H_0:\beta_j=0\) vs \(H_A:\beta_j\neq 0\) using \[
t = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)},
\quad \text{with } \text{df} = n - p,
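\] where \(n\) is the number of observations and \(p\) is the number of estimated parameters (including the intercept). For `mlr`, \(n = 32\) and \(p = 3\), giving 29 df.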
p-value: the probability, under \(H_0\), of a \(t\)-statistic at least as extreme as the observed one, i.e. \(2\,P(T_{n-p} \ge |t|)\) for this two-sided test.
Small p-value → evidence that the predictor contributes to explaining MPG, given the other predictors in the model.
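The t-statistics and p-values in `summary(mlr)` can be reproduced from the formula above. A sketch using `vcov()` for the standard errors and `pt()` for the two-sided tail probability with \(n - p = 29\) degrees of freedom; `df_resid`, `t_stat`, and `p_value` are hypothetical names:

```r
mlr <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)            # 32 observations
p <- length(coef(mlr))       # 3 parameters: intercept, wt, hp
df_resid <- n - p            # 29 residual degrees of freedom

est <- coef(mlr)
se  <- sqrt(diag(vcov(mlr))) # standard errors from the estimated covariance matrix

t_stat  <- est / se
# Two-sided p-value: P(|T| >= |t|) for T ~ t(df_resid).
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

cbind(t_stat, p_value)
summary(mlr)$coefficients[, c("t value", "Pr(>|t|)")]  # same numbers
```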
- Weight (`wt`) alone explains a large portion of MPG variability (\(R^2 \approx 0.75\) in the SLR).
- Adding horsepower (`hp`) refines the model (adjusted \(R^2\) rises from 0.745 to 0.815); check each coefficient's t-test and the overall \(R^2\).
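A sketch computing \(R^2 = 1 - \text{SSE}/\text{SST}\) and adjusted \(R^2 = 1 - (1 - R^2)\frac{n-1}{n-p}\) for both fits; the helper functions are hypothetical, and their results should match `summary()` (0.7528 / 0.7446 for the SLR, 0.8268 / 0.8148 for the MLR):

```r
slr <- lm(mpg ~ wt, data = mtcars)
mlr <- lm(mpg ~ wt + hp, data = mtcars)

# R^2: share of the total variation in y explained by the fit.
r_squared <- function(fit, y) {
  sse <- sum(residuals(fit)^2)
  sst <- sum((y - mean(y))^2)
  1 - sse / sst
}

# Adjusted R^2 penalizes extra parameters (p includes the intercept).
adj_r_squared <- function(fit, y) {
  n <- length(y)
  p <- length(coef(fit))
  1 - (1 - r_squared(fit, y)) * (n - 1) / (n - p)
}

c(r_squared(slr, mtcars$mpg), adj_r_squared(slr, mtcars$mpg))
c(r_squared(mlr, mtcars$mpg), adj_r_squared(mlr, mtcars$mpg))
```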