This deck shows a simple linear regression predicting miles per gallon (mpg) from horsepower (hp) using the built-in mtcars dataset. It includes two ggplot2 plots, one 3D plotly plot, two math slides, and R code.
2025-11-10
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
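library(dplyr)  # provides %>%, select() and mutate()

# Keep only the response and the two candidate predictors
cars <- mtcars %>% dplyr::select(mpg, hp, wt)
summary(cars)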
##       mpg              hp              wt
##  Min.   :10.40   Min.   : 52.0   Min.   :1.513
##  1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:2.581
##  Median :19.20   Median :123.0   Median :3.325
##  Mean   :20.09   Mean   :146.7   Mean   :3.217
##  3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:3.610
##  Max.   :33.90   Max.   :335.0   Max.   :5.424
We assume a simple linear model: \[ \text{mpg}_i=\beta_0+\beta_1\,\text{hp}_i+\varepsilon_i,\qquad \varepsilon_i\sim\mathcal{N}(0,\sigma^2). \]
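Under this model the least-squares estimates have a closed form: \(\hat{\beta}_1 = S_{xy}/S_{xx}\) and \(\hat{\beta}_0 = \overline{\text{mpg}} - \hat{\beta}_1\,\overline{\text{hp}}\), where \(S_{xy}=\sum_i(\text{hp}_i-\overline{\text{hp}})(\text{mpg}_i-\overline{\text{mpg}})\) and \(S_{xx}=\sum_i(\text{hp}_i-\overline{\text{hp}})^2\). The sketch below computes them by hand in base R and compares them with `lm()`; it assumes only the built-in mtcars data, and the names `b0_hat`, `b1_hat` and `fit` are illustrative.

```r
# Minimal sketch: closed-form least-squares estimates for a single predictor.
# Uses only base R and the built-in mtcars data; b0_hat/b1_hat are illustrative names.
x <- mtcars$hp
y <- mtcars$mpg

# Slope = S_xy / S_xx; the (n - 1) denominators in cov() and var() cancel
b1_hat <- cov(x, y) / var(x)
# Intercept: the fitted line passes through (mean(x), mean(y))
b0_hat <- mean(y) - b1_hat * mean(x)

fit <- lm(mpg ~ hp, data = mtcars)
rbind(by_hand = c(b0_hat, b1_hat), lm = unname(coef(fit)))
all.equal(unname(coef(fit)), c(b0_hat, b1_hat))  # should be TRUE
```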
mod <- lm(mpg ~ hp, data = cars)
summary(mod)
##
## Call:
## lm(formula = mpg ~ hp, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.7121 -2.1122 -0.8854  1.5819  8.2360
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
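The fit statistics at the bottom of the summary come from two sums of squares: \(R^2 = 1 - \text{SSE}/\text{SST}\), and the residual standard error is \(\hat{\sigma} = \sqrt{\text{SSE}/(n-2)}\). A minimal base-R sketch, assuming the same model refit on mtcars (the variable names are illustrative), reproduces them by hand.

```r
# Minimal sketch: reproduce the fit statistics of summary.lm from sums of squares.
# Assumes the same single-predictor model refit on the built-in mtcars data.
fit <- lm(mpg ~ hp, data = mtcars)
y   <- mtcars$mpg
n   <- nrow(mtcars)
p   <- 1  # number of predictors (excluding the intercept)

sse <- sum(residuals(fit)^2)   # residual sum of squares
sst <- sum((y - mean(y))^2)    # total sum of squares

r2        <- 1 - sse / sst                          # Multiple R-squared
adj_r2    <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # Adjusted R-squared
sigma_hat <- sqrt(sse / (n - p - 1))                # Residual standard error

round(c(r2 = r2, adj_r2 = adj_r2, sigma = sigma_hat), 4)

# In simple regression, R-squared equals the squared Pearson correlation
all.equal(r2, cor(mtcars$hp, y)^2)  # should be TRUE
```

The rounded values should agree with the Multiple R-squared, Adjusted R-squared and Residual standard error lines of `summary(mod)`.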
# Store fitted values and residuals alongside the data
cars <- cars %>%
  mutate(.fitted = fitted(mod),
         .resid  = resid(mod))
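The `.fitted` column is the regression line evaluated at each observed horsepower, \(\hat{\beta}_0+\hat{\beta}_1\,\text{hp}_i\), and `.resid` is \(\text{mpg}_i\) minus that value. A base-R sketch, assuming the same model refit on mtcars (`fitted_by_hand` and `resid_by_hand` are illustrative names), rebuilds both columns from the coefficients and checks two properties of least squares with an intercept.

```r
# Minimal sketch: rebuild fitted values and residuals from the coefficients.
# Assumes the same model refit on the built-in mtcars data.
fit <- lm(mpg ~ hp, data = mtcars)
b   <- coef(fit)

fitted_by_hand <- b[["(Intercept)"]] + b[["hp"]] * mtcars$hp
resid_by_hand  <- mtcars$mpg - fitted_by_hand

all.equal(unname(fitted(fit)), fitted_by_hand)  # should be TRUE
all.equal(unname(resid(fit)),  resid_by_hand)   # should be TRUE

# With an intercept, the normal equations force the residuals to sum to zero
# and to be orthogonal to the predictor (both zero up to floating-point error)
c(sum_resid = sum(resid_by_hand), sum_resid_x_hp = sum(resid_by_hand * mtcars$hp))
```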
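We test whether the slope differs from zero and compute a confidence interval (CI) for it: \[ H_0:\ \beta_1=0 \quad \text{vs} \quad H_1:\ \beta_1\neq 0,\qquad t=\frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} \sim t_{n-2}\ \text{under } H_0. \] A \((1-\alpha)100\%\) CI for \(\beta_1\) is \[ \hat{\beta}_1 \pm t_{1-\alpha/2,\,n-2}\,\mathrm{SE}(\hat{\beta}_1), \] where \(t_{1-\alpha/2,\,n-2}\) is the \(1-\alpha/2\) quantile of the \(t_{n-2}\) distribution.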
# Slope estimate and its standard error from the coefficient table
coefs <- coef(summary(mod))
beta1 <- coefs["hp", "Estimate"]
se1   <- coefs["hp", "Std. Error"]

# Critical value from the t distribution with n - 2 residual degrees of freedom
df    <- df.residual(mod)
alpha <- 0.05
crit  <- qt(1 - alpha/2, df)

ci <- c(beta1 - crit * se1, beta1 + crit * se1)
list(
  slope_estimate = beta1,
  slope_se       = se1,
  df             = df,
  conf_level     = 1 - alpha,
  ci_for_beta1   = ci
)
## $slope_estimate
## [1] -0.06822828
##
## $slope_se
## [1] 0.0101193
##
## $df
## [1] 30
##
## $conf_level
## [1] 0.95
##
## $ci_for_beta1
## [1] -0.08889465 -0.04756190
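The interval is computed by hand, but the test of \(H_0:\ \beta_1=0\) stops at the formula. A minimal base-R sketch, assuming the same `lm(mpg ~ hp)` fit on mtcars (`t_stat`, `p_two_sided` and `ci_builtin` are illustrative names), finishes the two-sided test with `pt()` and cross-checks the interval against the built-in `confint()`, which uses the same t-based formula for `lm` fits.

```r
# Minimal sketch: two-sided t test for the slope plus a confint() cross-check.
# Assumes the same model refit on the built-in mtcars data.
fit <- lm(mpg ~ hp, data = mtcars)
est <- coef(summary(fit))["hp", ]  # named vector: Estimate, Std. Error, t value, Pr(>|t|)

t_stat <- est[["Estimate"]] / est[["Std. Error"]]
df_res <- df.residual(fit)

# Two-sided p-value: probability of |T| >= |t| under H0, with T ~ t_{n-2}
p_two_sided <- 2 * pt(-abs(t_stat), df = df_res)

# Built-in 95% interval for the slope; should match the manual CI
ci_builtin <- confint(fit, parm = "hp", level = 0.95)

list(t = t_stat, p_value = p_two_sided, ci = ci_builtin)
```

The `t` and `p_value` entries should reproduce the `hp` row of `summary(mod)`, and `ci` the `ci_for_beta1` output.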