- What is linear regression?
- Estimating the line/plane
- Interpreting coefficients
- Diagnostics
- Inference (t-tests & CIs)
- 3D interactive view (plotly)
- Code & reproducibility
- Prediction example
For simple regression: \[ y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0,\sigma^2). \] For two predictors: \[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i. \]
ggplot(dat, aes(x = x1, y = y)) + geom_point(alpha = 0.7) + geom_smooth(method = "lm", se = TRUE) + labs(x = "x1", y = "y", title = "Simple Linear Regression: y ~ x1")
aug <- augment(m1) ggplot(aug, aes(.fitted, .resid)) + geom_hline(yintercept = 0, linewidth = 0.6) + geom_point(alpha = 0.7) + labs(x = "Fitted", y = "Residual", title = "Residuals vs Fitted (y ~ x1)")
OLS estimates: \[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}, \quad \widehat{\sigma}^2 = \frac{\text{SSE}}{n - p}. \] For \(H_0:\beta_j=0\): \[ t = \frac{\hat\beta_j}{SE(\hat\beta_j)} \sim t_{n-p},\quad \hat\beta_j \pm t_{1-\alpha/2,\,n-p}\,SE(\hat\beta_j). \]
# Fit models m1 <- lm(y ~ x1, data = dat) m2 <- lm(y ~ x1 + x2, data = dat) # Grid for plane rng1 <- range(dat$x1); rng2 <- range(dat$x2) g1 <- seq(rng1[1], rng1[2], length.out = 30) g2 <- seq(rng2[1], rng2[2], length.out = 30) surf <- expand.grid(x1 = g1, x2 = g2) b <- coef(m2) surf$yhat <- b[1] + b[2]*surf$x1 + b[3]*surf$x2
plot_ly(height = 480, width = 740) |>
add_markers(data = dat, x = ~x1, y = ~x2, z = ~y,
name = "data", opacity = 0.6, marker = list(size = 3)) |>
add_surface(x = ~g1, y = ~g2,
z = ~matrix(surf$yhat, nrow = length(g1), byrow = FALSE),
showscale = FALSE, name = "Least-Squares Plane", opacity = 0.85) |>
layout(scene = list(xaxis = list(title = "x1"),
yaxis = list(title = "x2"),
zaxis = list(title = "y")),
margin = list(l = 0, r = 0, b = 0, t = 0))
new <- data.frame(x1 = 6, x2 = 1)
pred <- predict(m2, newdata = new, interval = "prediction", level = 0.95)
knitr::kable(as.data.frame(pred), digits = 3,
caption = "Predicted response with 95% prediction interval")
| fit | lwr | upr |
|---|---|---|
| 9.862 | 5.99 | 13.735 |