Linear Regression: MPG Prediction

2025-11-10

Overview

This deck fits a simple linear regression that predicts miles per gallon (mpg) from horsepower (hp) using the built-in mtcars dataset. It includes two ggplot2 figures, one 3D plotly figure, two math slides, and the R code behind the fit.

Data

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

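library(dplyr)  # provides %>%, select(), and mutate()
# Keep mpg (response), hp (predictor), and wt (used only in the 3D plot).
cars <- mtcars %>% dplyr::select(mpg, hp, wt)
summary(cars)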

##       mpg              hp              wt       
##  Min.   :10.40   Min.   : 52.0   Min.   :1.513  
##  1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:2.581  
##  Median :19.20   Median :123.0   Median :3.325  
##  Mean   :20.09   Mean   :146.7   Mean   :3.217  
##  3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:3.610  
##  Max.   :33.90   Max.   :335.0   Max.   :5.424

Model (Math)

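We assume a simple linear model with independent, normally distributed errors: \[ \text{mpg}_i=\beta_0+\beta_1\,\text{hp}_i+\varepsilon_i,\qquad \varepsilon_i\overset{\text{iid}}{\sim}\mathcal{N}(0,\sigma^2). \] Here \(\beta_0\) is the expected mpg at zero horsepower and \(\beta_1\) is the expected change in mpg per additional unit of horsepower.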
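
For a single predictor, the least-squares estimates of this model have a closed form, so the fit can be reproduced without `lm()`. The following is a minimal sketch in base R; the variable names (`b0`, `b1`, `sigma_hat`) are illustrative and do not come from the deck.

```r
# Closed-form least-squares estimates for mpg ~ hp, using only base R.
x <- mtcars$hp
y <- mtcars$mpg

# Slope: sample covariance of (hp, mpg) divided by the sample variance of hp.
b1 <- cov(x, y) / var(x)

# Intercept: chosen so the line passes through the point of means.
b0 <- mean(y) - b1 * mean(x)

# Residual standard error: uses n - 2 degrees of freedom (two estimated coefficients).
res       <- y - (b0 + b1 * x)
sigma_hat <- sqrt(sum(res^2) / (length(y) - 2))

c(intercept = b0, slope = b1, sigma = sigma_hat)
```

These three numbers should agree with the intercept, the hp coefficient, and the residual standard error that `summary(lm(mpg ~ hp))` reports.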

Fit in R (Code)

mod <- lm(mpg ~ hp, data = cars)
summary(mod)

## 
## Call:
## lm(formula = mpg ~ hp, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

# Attach fitted values and residuals to the data for the diagnostic plots.
cars <- cars %>%
  mutate(.fitted = fitted(mod),
         .resid  = resid(mod))
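
Since the deck is about predicting mpg, the fitted model can also be applied to new horsepower values. The sketch below assumes three hypothetical hp values (100, 150, 200) that are not taken from the deck, and refits the model so the block runs on its own.

```r
mod <- lm(mpg ~ hp, data = mtcars)

# Hypothetical horsepower values, chosen inside the observed range (52 to 335).
new_cars <- data.frame(hp = c(100, 150, 200))

# Interval for the mean mpg of cars at each hp value.
pred_mean <- predict(mod, newdata = new_cars,
                     interval = "confidence", level = 0.95)

# Interval for the mpg of a single new car at each hp value (always wider).
pred_new <- predict(mod, newdata = new_cars,
                    interval = "prediction", level = 0.95)

pred_mean
pred_new
```

The confidence interval describes uncertainty in the fitted line, while the prediction interval also includes the residual scatter around it, which is why it is the wider of the two.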

ggplot: MPG vs HP (with fitted line)
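
The plotting code for this slide is not shown in the deck. A figure of this kind could be produced roughly as follows, assuming ggplot2 is installed; the labels and the object name `p_fit` are illustrative choices.

```r
library(ggplot2)

# Scatter of mpg against hp with the least-squares line and its 95% confidence band.
p_fit <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Horsepower (hp)",
       y = "Miles per gallon (mpg)",
       title = "MPG vs HP with fitted line")

p_fit
```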

ggplot: Residuals vs Fitted
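
The code for this diagnostic plot is likewise not shown. One way to draw it, as a sketch that refits the model so it runs on its own (the names `diag_df` and `p_resid` are illustrative):

```r
library(ggplot2)

mod <- lm(mpg ~ hp, data = mtcars)

# One row per car: fitted value and residual from the simple regression.
diag_df <- data.frame(fitted = fitted(mod), resid = resid(mod))

# Residuals should scatter evenly around the dashed zero line if the
# linear, constant-variance assumptions are reasonable.
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted mpg", y = "Residual", title = "Residuals vs Fitted")

p_resid
```

Curvature or a funnel shape in this plot would suggest that a straight line in hp is too simple or that the error variance is not constant.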

plotly: 3D Context (MPG ~ HP, Weight)
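
The 3D figure itself is not reproduced here. A comparable interactive scatter can be sketched with plotly, assuming the package is installed; placing hp, wt, and mpg on the x, y, and z axes is an assumption, since the deck does not show the original call.

```r
library(plotly)

# 3D scatter: horsepower and weight on the horizontal axes, mpg on the vertical axis.
p3d <- plot_ly(mtcars, x = ~hp, y = ~wt, z = ~mpg,
               type = "scatter3d", mode = "markers",
               marker = list(size = 4))

# Axis titles; wt in mtcars is recorded in units of 1000 lbs.
p3d <- layout(p3d, scene = list(
  xaxis = list(title = "hp"),
  yaxis = list(title = "wt (1000 lbs)"),
  zaxis = list(title = "mpg")
))

p3d
```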

Inference (Math)

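We test whether the slope differs from zero and compute a confidence interval: \[ H_0:\ \beta_1=0 \quad \text{vs} \quad H_1:\ \beta_1\neq 0,\qquad t=\frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} \sim t_{n-2}\ \text{under } H_0. \] A \((1-\alpha)100\%\) CI for \(\beta_1\) is \[ \hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\,\mathrm{SE}(\hat{\beta}_1), \] where \(t_{\alpha/2,\,n-2}\) is the upper \(\alpha/2\) critical value of the \(t_{n-2}\) distribution, i.e. its \(1-\alpha/2\) quantile.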
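
As a worked instance, plugging in the values from the fit summary (\(\hat{\beta}_1=-0.06823\), \(\mathrm{SE}(\hat{\beta}_1)=0.01012\), \(n=32\)) together with the tabulated critical value \(t_{0.025,\,30}\approx 2.042\) gives, up to rounding: \[ t=\frac{-0.06823}{0.01012}\approx -6.74,\qquad -0.06823 \pm 2.042\times 0.01012 \approx (-0.0889,\ -0.0476). \]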

Confidence Interval (Code)

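coefs <- coef(summary(mod))            # coefficient table from the fitted model
beta1 <- coefs["hp", "Estimate"]       # slope estimate
se1   <- coefs["hp", "Std. Error"]     # standard error of the slope
df    <- df.residual(mod)              # residual degrees of freedom (n - 2)
alpha <- 0.05
crit  <- qt(1 - alpha/2, df)           # upper alpha/2 critical value of t
ci    <- c(beta1 - crit * se1, beta1 + crit * se1)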

list(
  slope_estimate = beta1,
  slope_se       = se1,
  df             = df,
  conf_level     = 1 - alpha,
  ci_for_beta1   = ci
)

## $slope_estimate
## [1] -0.06822828
## 
## $slope_se
## [1] 0.0101193
## 
## $df
## [1] 30
## 
## $conf_level
## [1] 0.95
## 
## $ci_for_beta1
## [1] -0.08889465 -0.04756190
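
As a cross-check, base R's `confint()` computes the same t-based interval directly from the model object. The sketch below refits the model so it runs on its own; `ci_builtin` and `ci_manual` are illustrative names.

```r
mod <- lm(mpg ~ hp, data = mtcars)

# Built-in t-based 95% interval for the hp slope.
ci_builtin <- confint(mod, parm = "hp", level = 0.95)

# Manual interval: estimate plus or minus critical value times standard error.
est  <- coef(summary(mod))["hp", "Estimate"]
se   <- coef(summary(mod))["hp", "Std. Error"]
crit <- qt(0.975, df.residual(mod))
ci_manual <- est + c(-1, 1) * crit * se

# TRUE when the two intervals agree up to numerical tolerance.
all.equal(as.numeric(ci_builtin), ci_manual)
```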

Takeaways

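As horsepower increases, mpg decreases: each additional horsepower is associated with about 0.068 fewer miles per gallon on average.
The t-test (p = 1.79e-07) and the 95% CI (-0.089 to -0.048), which excludes zero, support a significant negative slope; the residual plot serves as the check on the model assumptions.
Horsepower alone explains about 60% of the variance in mpg (R-squared = 0.6024); the 3D plot adds context by bringing in weight.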