Simple Linear Regression: Speed → Stopping Distance

Overview

Goal: Model the relationship between car speed (mph) and stopping distance (ft)
Tool: Simple Linear Regression (SLR)
Dataset: built-in cars (n = 50)
We’ll fit the model, check assumptions, do inference, and make predictions

Model (Mathematical Form)

\[\text{dist}_i = \beta_0 + \beta_1\,\text{speed}_i + \varepsilon_i,\quad \varepsilon_i \sim \mathcal{N}(0,\sigma^2)\]

\(\beta_0\): intercept — expected distance when speed = 0
\(\beta_1\): slope — expected change in distance for a 1 mph increase in speed
Assumptions on errors: independence, constant variance, normality, mean zero
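
In SLR the least-squares estimates of \(\beta_0\) and \(\beta_1\) have a closed form: the slope is the sample covariance of speed and dist divided by the sample variance of speed, and the intercept makes the line pass through the point of means. The sketch below assumes only base R and the built-in cars data; the names x, y, b0, and b1 are illustrative and not part of the original slides.

```r
# Closed-form least-squares estimates for dist ~ speed (built-in cars data)
x <- cars$speed
y <- cars$dist

# Slope: sum of cross-deviations over sum of squared deviations of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the fitted line through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Cross-check against lm(); TRUE if the two approaches agree
all.equal(unname(c(b0, b1)), unname(coef(lm(dist ~ speed, data = cars))))
```

These values should match the Estimate column reported by summary(fit) later in the deck.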

Assumptions (Diagnostics We Will Check)

Linearity: \(\mathbb{E}[\text{dist}\mid\text{speed}] = \beta_0 + \beta_1\,\text{speed}\)
Independence of errors
Homoscedasticity: \(\operatorname{Var}(\varepsilon_i) = \sigma^2\) is constant
Normality of errors: \(\varepsilon_i \sim \mathcal{N}(0,\sigma^2)\)

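Linearity, homoscedasticity, and normality will be assessed via a residuals-vs-fitted plot and a normal Q–Q plot; independence depends on how the data were collected and cannot be read off these plots.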

Data Peek

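library(knitr); library(magrittr); library(ggplot2); library(broom); library(plotly)  # assumed setup for the chunks in this deck
head(cars) %>% kable(caption = "First 6 rows of the cars data")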

First 6 rows of the cars data
speed	dist
4	2
4	10
7	4
7	22
8	16
9	10

Fit the Model

fit <- lm(dist ~ speed, data = cars)
summary(fit)

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
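
The key numbers in this summary can be rebuilt from their definitions, which may help when reading the output. The following is a minimal sketch using base R only; the variable names (t_stat, p_val, r2, sigma_hat) are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

est <- coef(s)["speed", "Estimate"]
se  <- coef(s)["speed", "Std. Error"]
df  <- fit$df.residual                      # n - 2 = 48

t_stat <- est / se                          # t value for H0: beta1 = 0
p_val  <- 2 * pt(-abs(t_stat), df)          # two-sided p-value

rss <- sum(resid(fit)^2)                    # residual sum of squares
tss <- sum((cars$dist - mean(cars$dist))^2) # total sum of squares

r2        <- 1 - rss / tss                  # Multiple R-squared
sigma_hat <- sqrt(rss / df)                 # residual standard error

round(c(t = t_stat, r2 = r2, sigma = sigma_hat), 4)
```

With a single predictor, the F-statistic is the square of the slope's t value, which is why both tests report the same p-value.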

ggplot: Scatter + Fitted Line

base_scatter <- ggplot(cars, aes(speed, dist)) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  labs(title = "Stopping Distance vs Speed",
       x = "Speed (mph)", y = "Stopping Distance (ft)") +
  theme_minimal()
base_scatter

## `geom_smooth()` using formula = 'y ~ x'

ggplot: Residuals vs Fitted

aug <- augment(fit)
rvf <- ggplot(aug, aes(.fitted, .resid)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point(alpha = 0.8) +
  labs(title = "Residuals vs Fitted", x = "Fitted values", y = "Residuals") +
  theme_minimal()
rvf
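
When reading this plot, note that least-squares residuals average to zero and are uncorrelated with the fitted values by construction, so the overall level carries no information; what matters is curvature (a linearity problem) or a funnel shape (non-constant variance). The sketch below checks the two built-in properties and adds a rough spread comparison. The median split is an informal device assumed here for illustration, not a formal test and not part of the original slides.

```r
fit <- lm(dist ~ speed, data = cars)
r <- resid(fit)
f <- fitted(fit)

mean_resid <- mean(r)    # zero up to floating-point error, by construction
cor_rf     <- cor(r, f)  # zero up to floating-point error, by construction

# Informal check: residual SD in the lower vs upper half of the fitted values.
# Clearly different spreads would hint at non-constant variance.
half <- ifelse(f <= median(f), "lower", "upper")
spread_by_half <- tapply(r, half, sd)
spread_by_half
```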

ggplot: Normal Q-Q of Residuals

qq <- ggplot(aug, aes(sample = .std.resid)) +
  stat_qq() + stat_qq_line() +
  labs(title = "Normal Q–Q Plot", x = "Theoretical Quantiles", y = "Standardized Residuals") +
  theme_minimal()
qq
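
The Q–Q plot uses standardized residuals rather than raw ones. A standardized residual divides each raw residual by its estimated standard deviation, \(\hat{\sigma}\sqrt{1 - h_{ii}}\), where \(h_{ii}\) is the leverage of observation \(i\). The sketch below computes this in base R and compares it with rstandard(); to the best of my knowledge the .std.resid column returned by augment() is the same quantity, but treat that as an assumption.

```r
fit <- lm(dist ~ speed, data = cars)

h         <- hatvalues(fit)          # leverage h_ii of each observation
sigma_hat <- summary(fit)$sigma      # residual standard error

# Raw residual scaled by its estimated standard deviation
std_resid <- resid(fit) / (sigma_hat * sqrt(1 - h))

# TRUE if the manual version agrees with base R's rstandard()
all.equal(std_resid, rstandard(fit))
```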

plotly: Interactive Scatter

plt <- ggplot(cars, aes(speed, dist)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Interactive: Distance vs Speed",
       x = "Speed (mph)", y = "Stopping Distance (ft)") +
  theme_minimal()

ggplotly(plt)

## `geom_smooth()` using formula = 'y ~ x'

Inference: Coefficients, p-values, CIs

coef_tab <- tidy(fit, conf.int = TRUE)
kable(coef_tab, digits = 3, caption = "SLR coefficients with 95% CIs")

SLR coefficients with 95% CIs
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	-17.579	6.758	-2.601	0.012	-31.168	-3.990
speed	3.932	0.416	9.464	0.000	3.097	4.768

Hypotheses for slope: \(H_0: \beta_1 = 0\) vs \(H_a: \beta_1 \ne 0\)
The slope's t-test gives p = 1.49e-12 (summary above; it prints as 0.000 in the table only because of rounding to 3 digits), which is strong evidence of an association
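
The 95% confidence interval for the slope is the estimate plus or minus a t critical value times the standard error, with 48 residual degrees of freedom. The sketch below rebuilds it in base R and compares it with confint(); the names est, se, tcrit, and ci_manual are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

est   <- unname(coef(fit)["speed"])
se    <- unname(sqrt(diag(vcov(fit)))["speed"])  # standard error of the slope
tcrit <- qt(0.975, df = fit$df.residual)         # two-sided 95% critical value

ci_manual <- c(lower = est - tcrit * se, upper = est + tcrit * se)
ci_manual

# Same interval from the built-in helper
confint(fit, "speed", level = 0.95)
```

Because the interval excludes 0, it leads to the same conclusion as the two-sided t-test at the 5% level.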

Interpreting the Slope & Intercept

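Slope (\(\hat{\beta}_1 \approx 3.93\)): each additional 1 mph of speed is associated with an expected increase of about 3.93 ft in stopping distance (95% CI 3.10 to 4.77); for example, a car going 10 mph faster is expected to need about 39.3 ft more
Intercept (\(\hat{\beta}_0 \approx -17.58\)): nominally the expected stopping distance at 0 mph, but a negative distance is impossible and 0 mph lies outside the observed speeds (4 to 25 mph), so it should not be interpreted literally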

Prediction

new_speeds <- data.frame(speed = c(10, 15, 20, 25))
preds <- predict(fit, newdata = new_speeds, interval = "prediction", level = 0.95)
results <- cbind(new_speeds, round(preds, 1))
kable(results, caption = "Predicted stopping distance with 95% prediction intervals")

Predicted stopping distance with 95% prediction intervals
speed	fit	lwr	upr
10	21.7	-9.8	53.3
15	41.4	10.2	72.6
20	61.1	29.6	92.5
25	80.7	48.5	113.0
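
For comparison, predict() can also return confidence intervals for the mean response at the same speeds. The sketch below puts the two interval widths side by side; the object names ci_mean, pi_new, and widths are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)
new_speeds <- data.frame(speed = c(10, 15, 20, 25))

# Interval for the mean distance at each speed
ci_mean <- predict(fit, newdata = new_speeds, interval = "confidence", level = 0.95)

# Interval for a single new observation at each speed
pi_new <- predict(fit, newdata = new_speeds, interval = "prediction", level = 0.95)

widths <- data.frame(
  speed    = new_speeds$speed,
  ci_width = ci_mean[, "upr"] - ci_mean[, "lwr"],
  pi_width = pi_new[, "upr"] - pi_new[, "lwr"]
)
widths
```

Both intervals share the same center (the fitted value); the prediction interval is wider at every speed because it also accounts for the error of an individual car.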

Code Chunk (Reproducible Example)

# 1) Fit SLR
fit <- lm(dist ~ speed, data = cars)

# 2) Quick plot
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping Distance (ft)",
     main = "cars: dist vs speed")
abline(fit, col = "red", lwd = 2)

# 3) Predict
predict(fit, newdata = data.frame(speed = 20), interval = "prediction")

##        fit      lwr      upr
## 1 61.06908 29.60309 92.53507
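
The interval above can be reproduced from the textbook SLR formula: the standard error for a new observation at \(x_0\) is \(\hat{\sigma}\sqrt{1 + 1/n + (x_0 - \bar{x})^2 / S_{xx}}\), and dropping the leading 1 gives the standard error of the mean response. This is a sketch in base R; all variable names are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

x0        <- 20
n         <- nrow(cars)
xbar      <- mean(cars$speed)
sxx       <- sum((cars$speed - xbar)^2)
sigma_hat <- summary(fit)$sigma
tcrit     <- qt(0.975, df = fit$df.residual)

# Point prediction at x0
y_hat <- unname(coef(fit)[1] + coef(fit)[2] * x0)

# Standard errors: mean response vs. a single new observation
se_mean <- sigma_hat * sqrt(1 / n + (x0 - xbar)^2 / sxx)
se_pred <- sigma_hat * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)

pred_int <- c(fit = y_hat, lwr = y_hat - tcrit * se_pred, upr = y_hat + tcrit * se_pred)
pred_int
```

The extra 1 under the square root is the variance contribution of the new observation's own error term, which is why se_pred always exceeds se_mean.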

Takeaways (Summary)

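SLR captures a positive linear trend: about 3.9 ft of extra stopping distance per additional mph, with speed explaining about 65% of the variance in distance (\(R^2 = 0.651\))
Diagnostics (residuals vs fitted, Q–Q plot) help check linearity, equal variance, and normality
The slope's t-test and its p-value quantify the evidence for an association
Prediction intervals reflect the variability of a single future observation, so they are wider than confidence intervals for the mean response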