Welcome: a compact overview of simple linear regression, visual examples, and code.
The simple linear regression model is:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where:
\[ \varepsilon_i \sim \text{iid } (0, \sigma^2) \]
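To make the model concrete, here is a minimal simulation sketch in R. The sample size, the true coefficients, the error standard deviation, and the name `sim_fit` are all illustrative assumptions rather than values from these slides. Normal errors are used as one convenient choice that satisfies the iid \((0, \sigma^2)\) condition.

```r
# Simulate from Y_i = beta_0 + beta_1 * X_i + eps_i, then recover the
# coefficients with lm(). All numeric values are illustrative assumptions.
set.seed(1)
n     <- 200
beta0 <- 2
beta1 <- 0.5
sigma <- 1

x   <- runif(n, min = 0, max = 10)
eps <- rnorm(n, mean = 0, sd = sigma)  # one choice of iid errors with mean 0, variance sigma^2
y   <- beta0 + beta1 * x + eps

sim_fit <- lm(y ~ x)
coef(sim_fit)  # estimates should land near beta0 and beta1
```

With a sample of this size, the fitted intercept and slope should sit close to the assumed true values, and rerunning with a larger `sigma` shows how noise widens the gap.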
We’ll use the built-in mtcars dataset.
The goal is to predict miles per gallon (mpg) from weight (wt, recorded in units of 1,000 lb) and, optionally, horsepower (hp).
# This chunk prints the first rows on the slide
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
A \((1-\alpha)\times 100\%\) confidence interval for the slope \(\beta_1\) is:
\[ \hat{\beta}_1 \pm t_{n-2,\,1-\alpha/2} \cdot \text{SE}(\hat{\beta}_1) \]
where
\[ \text{SE}(\hat{\beta}_1) = \sqrt{ \frac{\hat{\sigma}^2}{\sum (X_i - \bar{X})^2} } \]
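The interval formula can be checked by hand in R. The sketch below assumes the usual unbiased estimate \(\hat{\sigma}^2 = \text{SSE}/(n-2)\), which the slides do not spell out, and compares the manual result with R's built-in `confint()`. The variable names are illustrative.

```r
# Manual 95% confidence interval for the slope of mpg ~ wt,
# compared against confint(). Assumes sigma2_hat = SSE / (n - 2).
fit   <- lm(mpg ~ wt, data = mtcars)
x     <- mtcars$wt
n     <- nrow(mtcars)
alpha <- 0.05

sigma2_hat <- sum(residuals(fit)^2) / (n - 2)           # residual variance estimate
se_b1      <- sqrt(sigma2_hat / sum((x - mean(x))^2))   # SE of the slope
t_crit     <- qt(1 - alpha / 2, df = n - 2)             # t quantile with n - 2 df
b1         <- coef(fit)[["wt"]]

ci_manual  <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)
ci_builtin <- confint(fit, "wt", level = 1 - alpha)

ci_manual
ci_builtin
```

Both approaches should agree up to floating-point error, since `confint()` applies the same t-based formula to an `lm` fit.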
Rotate and zoom the plot to explore the relationship between MPG, weight, and horsepower.
The following R code creates the 3D interactive plot shown earlier. The code is displayed here for transparency and reproducibility.
library(plotly)
plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~hp,
  z = ~mpg,
  type = "scatter3d",
  mode = "markers",
  marker = list(size = 4)
) %>%
  layout(
    title = "MPG vs Weight and Horsepower",
    scene = list(
      xaxis = list(title = "Weight"),
      yaxis = list(title = "Horsepower"),
      zaxis = list(title = "MPG")
    )
  )
The following R output summarizes the fitted regression model.
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
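The summary above can be reproduced with a call along the following lines. The formula and data come from the `Call:` line of the output. The object names `fit` and `fit_summary` are assumptions, since the original chunk is not shown.

```r
# Fit mpg on wt and pull out the headline quantities from the summary.
# Object names are illustrative; the formula matches the Call line above.
fit         <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(fit)
print(fit_summary)

slope     <- coef(fit)[["wt"]]      # estimated change in mpg per unit of wt
r_squared <- fit_summary$r.squared  # proportion of variance explained
resid_se  <- fit_summary$sigma      # residual standard error
```

Extracting values from the summary object, rather than reading them off the printed table, makes it easier to reuse them in later chunks or inline text.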
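The estimated slope is negative (about -5.34 mpg per additional 1,000 lb of weight): heavier cars tend to have lower MPG.
Residuals show no strong patterns, supporting the model assumptions.
Confidence intervals quantify uncertainty in the estimate: the 95% interval for the slope is roughly (-6.49, -4.20), which excludes zero.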
##                 2.5 %    97.5 %
## (Intercept) 33.450500 41.119753
## wt          -6.486308 -4.202635
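The statement that the residuals show no strong patterns can be inspected with a residuals-versus-fitted plot. This is a minimal base-graphics sketch; the variable names and the lowess smoother are illustrative choices, not necessarily the plot used in the original deck.

```r
# Residuals vs fitted values for mpg ~ wt, with a smoother as a visual aid.
fit         <- lm(mpg ~ wt, data = mtcars)
res         <- residuals(fit)
fitted_vals <- fitted(fit)

plot(fitted_vals, res,
     xlab = "Fitted MPG", ylab = "Residual",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)                            # reference line at zero
lines(lowess(fitted_vals, res), col = "red")      # smoother to reveal any trend
```

If the linear model is adequate, the points should scatter around the dashed line with no clear curve or funnel shape. A visibly bent smoother would suggest trying a transformation or an additional predictor such as hp.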
Simple linear regression models linear relationships
Visualization helps validate assumptions
ggplot2 and plotly produce high-quality graphics
R Markdown enables fully reproducible presentations