Agenda

  • What is linear regression?
  • Estimating the line/plane
  • Interpreting coefficients
  • Diagnostics
  • Inference (t-tests & CIs)
  • 3D interactive view (plotly)
  • Code & reproducibility
  • Prediction example

Model

For simple regression: \[ y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0,\sigma^2). \] For two predictors: \[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i. \]

Scatter with fitted line (ggplot 1)

ggplot(dat, aes(x = x1, y = y)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "x1", y = "y", title = "Simple Linear Regression: y ~ x1")

Residual diagnostics (ggplot 2)

aug <- augment(m1)
ggplot(aug, aes(.fitted, .resid)) +
  geom_hline(yintercept = 0, linewidth = 0.6) +
  geom_point(alpha = 0.7) +
  labs(x = "Fitted", y = "Residual", title = "Residuals vs Fitted (y ~ x1)")

Inference formulas (math 2)

OLS estimates: \[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}, \quad \widehat{\sigma}^2 = \frac{\text{SSE}}{n - p}. \] For \(H_0:\beta_j=0\): \[ t = \frac{\hat\beta_j}{SE(\hat\beta_j)} \sim t_{n-p},\quad \hat\beta_j \pm t_{1-\alpha/2,\,n-p}\,SE(\hat\beta_j). \]

Plotly 3D plane — interactive

How this 3D plot is constructed

  • Fit multiple regression: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2\).
  • Create a grid over \((x_1, x_2)\) and compute \(\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2\).
  • Plot data as 3D points and the fitted plane as a surface (Plotly).

Code: fit models & make grid

# Fit models
m1 <- lm(y ~ x1, data = dat)
m2 <- lm(y ~ x1 + x2, data = dat)

# Grid for plane
rng1 <- range(dat$x1); rng2 <- range(dat$x2)
g1   <- seq(rng1[1], rng1[2], length.out = 30)
g2   <- seq(rng2[1], rng2[2], length.out = 30)
surf <- expand.grid(x1 = g1, x2 = g2)
b    <- coef(m2)
surf$yhat <- b[1] + b[2]*surf$x1 + b[3]*surf$x2

Code: plotly call

plot_ly(height = 480, width = 740) |>
  add_markers(data = dat, x = ~x1, y = ~x2, z = ~y,
              name = "data", opacity = 0.6, marker = list(size = 3)) |>
  add_surface(x = ~g1, y = ~g2,
              z = ~matrix(surf$yhat, nrow = length(g1), byrow = FALSE),
              showscale = FALSE, name = "Least-Squares Plane", opacity = 0.85) |>
  layout(scene = list(xaxis = list(title = "x1"),
                      yaxis = list(title = "x2"),
                      zaxis = list(title = "y")),
         margin = list(l = 0, r = 0, b = 0, t = 0))

Prediction example

new <- data.frame(x1 = 6, x2 = 1)
pred <- predict(m2, newdata = new, interval = "prediction", level = 0.95)
knitr::kable(as.data.frame(pred), digits = 3,
             caption = "Predicted response with 95% prediction interval")
Predicted response with 95% prediction interval
fit lwr upr
9.862 5.99 13.735