2025-11-07

Overview

Goal: Demonstrate and visualize simple linear regression using the built-in mtcars dataset with mpg (fuel efficiency) as the response variable and wt (vehicle weight in 1000 lbs) as the predictor.

What you’ll see: Model definition & assumptions, three ggplot2 graphics, one interactive plotly 3D plot, the R code used to fit and visualize the model, and math slides typeset in LaTeX.

The Linear Regression Model

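We model each observed \(Y_i\) (mpg) as a linear function of \(X_i\) (weight) plus a mean-zero random error \(\varepsilon_i\), so that the mean of \(Y\) is linear in \(X\):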

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,\qquad i=1,\dots,n. \]

The least-squares estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the values of \(\beta_0\) and \(\beta_1\) that minimize the residual sum of squares

\[ \sum_{i=1}^{n} \left(Y_i - (\beta_0 + \beta_1 X_i)\right)^2 . \]
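
For a single predictor this minimization has a well-known closed form: the slope is the sample covariance of \(X\) and \(Y\) divided by the sample variance of \(X\), and the intercept makes the line pass through \((\bar{X}, \bar{Y})\). The sketch below computes both by hand on mtcars and compares them with `lm()`. The names `b0_hat`, `b1_hat`, and `manual` are illustrative and not part of the original deck.

```r
# Closed-form least-squares estimates for mpg ~ wt (illustrative names)
x <- mtcars$wt
y <- mtcars$mpg

b1_hat <- cov(x, y) / var(x)          # slope: sample covariance over sample variance
b0_hat <- mean(y) - b1_hat * mean(x)  # intercept: line passes through (x-bar, y-bar)

manual <- c("(Intercept)" = b0_hat, wt = b1_hat)
fitted_by_lm <- coef(lm(mpg ~ wt, data = mtcars))

# The two sets of estimates should agree up to floating-point error
all.equal(manual, fitted_by_lm)
```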

Model Assumptions

  1. Linearity: \(\mathbb{E}[Y|X] = \beta_0 + \beta_1X\)
  2. Independence of the errors \(\varepsilon_i\)
  3. Constant variance: \(\mathrm{Var}(Y|X) = \sigma^2\)
  4. Normality of the errors (needed for exact \(t\)- and \(F\)-based inference)

Fitting the Model (R Output)

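# Keep the response (mpg), the predictor (wt), and hp
dat <- mtcars |> dplyr::select(mpg, wt, hp)
# Simple linear regression of mpg on wt
fit <- lm(mpg ~ wt, data = dat)

# Tidy coefficient table and one-row model summary
coef_tbl  <- broom::tidy(fit)
glance_tbl <- broom::glance(fit)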

coef_tbl
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    37.3      1.88      19.9  8.24e-19
## 2 wt             -5.34     0.559     -9.56 1.29e-10
glance_tbl |> dplyr::select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs)
## # A tibble: 1 × 7
##   r.squared adj.r.squared sigma statistic  p.value    df  nobs
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl> <int>
## 1     0.753         0.745  3.05      91.4 1.29e-10     1    32
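
The two tables are linked. With a single predictor, the overall \(F\) statistic equals the square of the slope's \(t\) statistic (\((-9.56)^2 \approx 91.4\)), which is why both report the same p-value. The sketch below recomputes these quantities from the fitted model. The names `t_wt`, `f_stat`, and `r2` are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)

est <- s$coefficients["wt", "Estimate"]
se  <- s$coefficients["wt", "Std. Error"]

t_wt   <- est / se                        # t statistic for the slope
f_stat <- unname(s$fstatistic["value"])   # overall F statistic

# R^2 as 1 - RSS / TSS
rss <- sum(resid(fit)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2  <- 1 - rss / tss

c(t = t_wt, t_squared = t_wt^2, F = f_stat, r.squared = r2)
```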

ggplot #1 — Scatterplot with Regression Line

ggplot #2 — Residuals vs Fitted
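
A minimal sketch of how a residuals-vs-fitted plot could be drawn with ggplot2. The deck's own plotting code is not shown here, so the data frame `diag_df`, the object name `p_resid`, and the labels are assumptions. Points scattered evenly around the dashed zero line, with no curve or funnel shape, are consistent with the linearity and constant-variance assumptions.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in a plain data frame (illustrative names)
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
  labs(x = "Fitted mpg", y = "Residual", title = "Residuals vs Fitted")

p_resid
```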

ggplot #3 — Normal Q–Q Plot
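
A minimal sketch of a normal Q-Q plot of the residuals using ggplot2's `stat_qq()` and `stat_qq_line()`. Whether the deck plotted raw or standardized residuals is not stated. Standardized residuals (`rstandard()`) are assumed here, and `qq_df` and `p_qq` are illustrative names. Points lying close to the reference line are consistent with approximately normal errors.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Standardized residuals (an assumption; raw residuals would also work)
qq_df <- data.frame(std_resid = rstandard(fit))

p_qq <- ggplot(qq_df, aes(sample = std_resid)) +
  stat_qq() +        # sample quantiles against theoretical normal quantiles
  stat_qq_line() +   # reference line through the first and third quartiles
  labs(x = "Theoretical quantiles", y = "Standardized residuals",
       title = "Normal Q-Q Plot")

p_qq
```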

plotly — 3D Interactive Plot
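
A minimal sketch of an interactive 3D scatterplot with plotly. The deck's own plotting code is not shown, so using hp as the third axis is an assumption, based only on hp being kept in `dat`. The object name `p_3d` and the axis titles are illustrative.

```r
library(plotly)

# Same three columns the deck keeps in `dat`
dat <- mtcars[, c("mpg", "wt", "hp")]

# 3D scatter: weight and horsepower on the floor, mpg on the vertical axis
p_3d <- plot_ly(
  data = dat,
  x = ~wt, y = ~hp, z = ~mpg,
  type = "scatter3d", mode = "markers",
  marker = list(size = 4)
) |>
  plotly::layout(scene = list(
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Horsepower"),
    zaxis = list(title = "Miles per gallon")
  ))

p_3d
```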

Interpretation & Key Results

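  • The slope \(\hat{\beta}_1 \approx -5.34\) is negative: each additional 1000 lbs of weight is associated with about 5.3 fewer miles per gallon on average.
  • \(R^2 \approx 0.75\): weight explains about 75% of the variation in mpg.
  • The residual plots (residuals vs fitted, normal Q–Q) are used to check the model assumptions.

summary(fit)$coefficients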
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
summary(fit)$r.squared
## [1] 0.7528328
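
To make the slope concrete, the sketch below predicts mean mpg for two hypothetical cars weighing 3000 and 4000 lbs. The weights and the names `new_cars` and `pred` are chosen only for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical weights in 1000 lbs, chosen only for illustration
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(fit, newdata = new_cars)

pred        # predicted mean mpg at each weight
diff(pred)  # change in predicted mpg per extra 1000 lbs: the slope
```

From the coefficients printed above, the predictions are about 21.3 and 15.9 mpg, and their difference is the slope, about -5.34.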

R Code Summary Slide

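library(tidyverse)  # loads ggplot2 (and dplyr)

# Fit the simple linear regression
fit <- lm(mpg ~ wt, data = mtcars)

# Scatterplot with the fitted line and its 95% confidence band
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)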

Conclusion

  • Weight has a strong negative linear relationship with fuel efficiency in mtcars (slope \(\approx -5.34\) mpg per 1000 lbs, \(R^2 \approx 0.75\)).
  • Regression + visuals (ggplot & plotly) make results easy to interpret.
  • This workflow can be reused for other datasets and predictors.