- Goal: model how car weight (wt) predicts fuel efficiency (mpg)
- We’ll cover:
  - the linear regression model and its assumptions
  - two ggplot visualizations
  - a 3D Plotly view of the loss function (SSE)
  - inference: slope, p-value, confidence interval
We’ll use the built-in mtcars dataset (32 cars). The first six rows:
| car | mpg | wt (1000 lbs) | hp |
|---|---|---|---|
| Mazda RX4 | 21.0 | 2.62 | 110 |
| Mazda RX4 Wag | 21.0 | 2.88 | 110 |
| Datsun 710 | 22.8 | 2.32 | 93 |
| Hornet 4 Drive | 21.4 | 3.21 | 110 |
| Hornet Sportabout | 18.7 | 3.44 | 175 |
| Valiant | 18.1 | 3.46 | 105 |
Variables:
- mpg = miles per gallon (response)
- wt = weight in 1000 lbs (predictor)
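A minimal sketch of pulling these columns in R. mtcars ships with base R, so nothing needs downloading; the object name `dat` is an illustrative choice (it avoids masking the built-in `cars` dataset).

```r
# mtcars ships with base R (datasets package), so no download is needed.
# 'dat' is an illustrative name; it avoids masking the built-in 'cars' dataset.
dat <- mtcars[, c("mpg", "wt", "hp")]

nrow(dat)    # number of cars
head(dat)    # first six rows

# Quick numeric summary of the response (mpg) and the predictor (wt, 1000 lbs)
summary(dat[, c("mpg", "wt")])
```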
The simple linear regression model, for car \(i\) with response \(y_i\) (mpg) and predictor \(x_i\) (wt), is:
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
Assumptions (typical):
- \(E(\varepsilon_i)=0\), i.e. the mean of \(y\) is linear in \(x\)
- constant variance: \(\mathrm{Var}(\varepsilon_i)=\sigma^2\)
- independent errors across cars
- normal errors (needed mainly for exact t-based inference: p-values and CIs)
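To make the model and its assumptions concrete, here is a sketch that simulates data satisfying them (linear mean, independent normal errors with constant variance) and refits by OLS. The "true" parameter values are hypothetical, picked only to resemble the mtcars fit.

```r
set.seed(42)

# Hypothetical "true" parameters, chosen only to resemble the mtcars fit
beta0      <- 37
beta1      <- -5.3
sigma_true <- 3

n   <- 32
x   <- runif(n, min = 1.5, max = 5.5)        # predictor values on a wt-like scale
eps <- rnorm(n, mean = 0, sd = sigma_true)   # independent N(0, sigma^2) errors
y   <- beta0 + beta1 * x + eps               # y_i = beta0 + beta1 * x_i + eps_i

sim_fit <- lm(y ~ x)
coef(sim_fit)    # estimates should land near beta0 and beta1
```

Rerunning with different seeds shows the sampling variability that the standard errors later in this section quantify.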
What to look for in a plot of residuals vs. fitted values:
- random scatter around 0 is good
- curvature suggests the linear form is wrong; a funnel shape suggests non-constant variance
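The residual checks above can be drawn with ggplot2. This is one possible version, assuming a residuals-vs-fitted layout with a dashed zero line and a loess smoother to flag curvature; it is not necessarily the exact figure from the original slides.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in a data frame for ggplot
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +              # reference line at 0
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +   # flags curvature
  labs(x = "Fitted mpg", y = "Residual")

p_resid
```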
The loss we minimize in OLS is:
\[ \mathrm{SSE}(\beta_0,\beta_1)=\sum_{i=1}^n (y_i-(\beta_0+\beta_1 x_i))^2 \]
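A sketch connecting SSE to the fitted line: write SSE as an R function, minimize it numerically with `optim()`, and compare with `lm()` and the closed-form OLS solution \(\hat\beta_1 = \sum (x_i-\bar x)(y_i-\bar y) / \sum (x_i-\bar x)^2\), \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The helper name `sse()` and the starting point are illustrative assumptions.

```r
fit <- lm(mpg ~ wt, data = mtcars)
x <- mtcars$wt
y <- mtcars$mpg

# SSE for a candidate coefficient vector beta = c(beta0, beta1)
sse <- function(beta) sum((y - (beta[1] + beta[2] * x))^2)

# Numerical minimization of SSE, starting from (0, 0)
opt <- optim(par = c(0, 0), fn = sse, method = "BFGS",
             control = list(reltol = 1e-12))

# Closed-form OLS solution
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# All three should agree (up to optimizer tolerance)
rbind(optim = opt$par, closed_form = c(b0, b1), lm = unname(coef(fit)))

# The minimized SSE is the model's residual sum of squares
all.equal(sse(unname(coef(fit))), deviance(fit))
```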
Testing whether weight matters:
\[ H_0:\beta_1=0 \quad\text{vs}\quad H_a:\beta_1\neq 0 \]
Test statistic, which follows a t distribution with \(n-2 = 30\) degrees of freedom under \(H_0\):
\[ t=\frac{\hat\beta_1 - 0}{SE(\hat\beta_1)} \]
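The test statistic can be rebuilt by hand from `lm()` output. This sketch also recomputes \(SE(\hat\beta_1) = \hat\sigma / \sqrt{\sum (x_i - \bar x)^2}\) and the two-sided p-value from the t distribution with \(n-2\) degrees of freedom; the variable names are illustrative.

```r
fit   <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)

b1_hat <- coefs["wt", "Estimate"]
se_b1  <- coefs["wt", "Std. Error"]
rdf    <- df.residual(fit)           # n - 2 = 30

# SE(beta1_hat) = sigma_hat / sqrt(Sxx), recomputed by hand
se_manual <- sigma(fit) / sqrt(sum((mtcars$wt - mean(mtcars$wt))^2))

t_stat <- (b1_hat - 0) / se_b1                                # H0: beta1 = 0
p_val  <- 2 * pt(abs(t_stat), df = rdf, lower.tail = FALSE)   # two-sided p-value

c(se_manual = se_manual, t = t_stat, p = p_val)
```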
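A 95% confidence interval for the slope, where \(t^*\) is the 0.975 quantile of the t distribution with \(n-2 = 30\) degrees of freedom: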
\[ \hat\beta_1 \pm t^* SE(\hat\beta_1) \]
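A short sketch of the interval formula in R, checked against the built-in `confint()`; only the variable names are illustrative.

```r
fit    <- lm(mpg ~ wt, data = mtcars)
b1_hat <- coef(fit)[["wt"]]
se_b1  <- summary(fit)$coefficients["wt", "Std. Error"]

t_star    <- qt(0.975, df = df.residual(fit))   # about 2.042 for 30 df
ci_manual <- b1_hat + c(-1, 1) * t_star * se_b1

ci_manual
confint(fit, "wt", level = 0.95)                # built-in equivalent
```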
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 37.2851 | 1.8776 | 19.8576 | < 0.0001 | 33.4505 | 41.1198 |
| wt | -5.3445 | 0.5591 | -9.5590 | < 0.0001 | -6.4863 | -4.2026 |
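The column names in this table (term, estimate, std.error, statistic, p.value, conf.low, conf.high) match the output of broom's `tidy()` for `lm` fits, so a table like it can likely be produced as below; treat the use of broom as an assumption. Both p-values are far below 0.0001, which is why a four-decimal display rounds them to 0.

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)

# One row per coefficient; conf.int = TRUE adds conf.low / conf.high columns
tab <- tidy(fit, conf.int = TRUE, conf.level = 0.95)
tab
```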
```r
library(ggplot2)
library(plotly)

fit <- lm(mpg ~ wt, data = mtcars)

# ggplot: scatter of mpg vs. wt plus the OLS line with its 95% confidence band
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# 3D Plotly SSE surface (outline)
# (See full grid + outer() code in the Rmd)
```
Figure: wt and mpg
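The code comment above defers the SSE surface to the Rmd. A possible version is sketched here, assuming a grid of candidate \((\beta_0, \beta_1)\) values centred on the OLS estimates; the grid ranges, resolution, and helper name `sse_one()` are arbitrary illustrative choices. One detail worth getting right: for a plotly surface, rows of `z` follow the `y` values and columns follow the `x` values, so `outer()` is called with \(\beta_1\) first.

```r
library(plotly)

fit <- lm(mpg ~ wt, data = mtcars)
x <- mtcars$wt
y <- mtcars$mpg

# Hypothetical grid ranges centred on the OLS estimates
b0_grid <- seq(coef(fit)[[1]] - 10, coef(fit)[[1]] + 10, length.out = 60)
b1_grid <- seq(coef(fit)[[2]] - 3,  coef(fit)[[2]] + 3,  length.out = 60)

# SSE for a single (beta0, beta1) pair
sse_one <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# outer() passes its first vector as the first argument, so rows index beta1
# and columns index beta0, matching plotly's z layout (rows = y, cols = x)
sse_grid <- outer(b1_grid, b0_grid, Vectorize(function(b1, b0) sse_one(b0, b1)))

p_sse <- plot_ly(x = b0_grid, y = b1_grid, z = sse_grid, type = "surface")
p_sse <- layout(p_sse, scene = list(xaxis = list(title = "beta0"),
                                    yaxis = list(title = "beta1"),
                                    zaxis = list(title = "SSE")))

if (interactive()) print(p_sse)   # or let knitr render it in the Rmd
```

The lowest point of the bowl sits at the `lm()` estimates. The valley is long and narrow because, with wt uncentred, the intercept and slope estimates are strongly negatively correlated.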