Modeling the Relationship Between Car Speed and Stopping Distance
Author: Abdullah Al Shamim
Published: February 27, 2026
Introduction
Simple Linear Regression is a statistical method used to model the relationship between a single independent variable (\(X\)) and a dependent variable (\(Y\)). In this lesson, we analyze the cars dataset to see how a car’s speed influences its stopping distance.
1. Data Visualization: The Regression Plot
Before diving into the numbers, we visualize the data points and the line of best fit.
```r
library(tidyverse)

# Scatter plot of speed vs. stopping distance with the least-squares line
cars %>%
  ggplot(aes(speed, dist)) +
  geom_point(size = 3, color = "#cc00cc") +
  # method = "lm" draws the fitted regression line; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "#5f008f") +
  theme_test() +
  labs(
    title = "Speed of Car vs. Stopping Distance",
    x = "Speed of Car (mph)",
    y = "Distance taken to Stop (ft)"
  ) +
  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 12, face = "bold")
  ) +
  # Statistics taken from the lm() summary of dist ~ speed
  annotate("text", x = 10, y = 100,
           label = "Intercept = -17.58 \n Slope = 3.93 \n p-value < 0.05 \n R-squared = 0.65",
           color = "black", fontface = "bold", size = 4)
```
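The annotation hard-codes the intercept and slope that lm() reports. As a cross-check, here is a minimal base-R sketch of the closed-form least-squares estimates for a single predictor, \(\hat\beta_1 = \mathrm{cov}(X, Y) / \mathrm{var}(X)\) and \(\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}\). The variable names x, y, slope, and intercept are illustrative.

```r
# Closed-form least-squares estimates for simple linear regression
x <- cars$speed  # predictor: speed (mph)
y <- cars$dist   # outcome: stopping distance (ft)

slope <- cov(x, y) / var(x)              # beta_1 = Sxy / Sxx (the n - 1 factors cancel)
intercept <- mean(y) - slope * mean(x)   # beta_0 = ybar - beta_1 * xbar

round(c(intercept = intercept, slope = slope), 2)

# Compare with the coefficients estimated by lm()
coef(lm(dist ~ speed, data = cars))
```

Both approaches should agree, which is why the plotted line matches the model summary.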
2. Data Exploration & Model Summary
We use the built-in cars dataset. It contains 50 observations of speed (mph) and stopping distance (ft).
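Before fitting, a quick look at the data confirms its size and the range of each variable. This sketch uses only base R and the stats package; cor() returns the Pearson correlation, whose square equals the R-squared of a simple linear regression with an intercept.

```r
# Structure: 50 rows, two numeric columns (speed, dist)
str(cars)

# Five-number summary and mean for each column
summary(cars)

# Observed ranges of the predictor and the outcome
range(cars$speed)  # mph
range(cars$dist)   # ft

# Strength of the linear association between speed and distance
r <- cor(cars$speed, cars$dist)
r
```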
```r
# Generate summary statistics using the linear model function
cars %>%
  lm(dist ~ speed, data = .) %>%
  summary()
```
```
Call:
lm(formula = dist ~ speed, data = .)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared:  0.6511,  Adjusted R-squared:  0.6438
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
```
Interpretation of the Summary:
R-squared (0.6511): Approximately 65% of the variation in stopping distance can be explained by the car’s speed.
p-value: The extremely small p-value (\(1.49 \times 10^{-12}\)) indicates that speed is a statistically significant predictor of distance.
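Rather than reading these values off the printed summary, they can be extracted from the summary object. A minimal sketch (the object names model_summary and slope_p are illustrative) that also confirms that, with one predictor, R-squared equals the squared correlation between speed and distance:

```r
# Fit the model and keep its summary object
linear_model <- lm(dist ~ speed, data = cars)
model_summary <- summary(linear_model)

# R-squared and adjusted R-squared
model_summary$r.squared
model_summary$adj.r.squared

# Coefficient table: estimates, standard errors, t values, p-values
coef(model_summary)

# p-value for the slope on speed, compared with the 0.05 threshold
slope_p <- coef(model_summary)["speed", "Pr(>|t|)"]
slope_p < 0.05

# With a single predictor, R-squared equals the squared Pearson correlation
all.equal(model_summary$r.squared, cor(cars$speed, cars$dist)^2)
```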
3. Building the Linear Model
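We define our linear equation as \(Y = \beta_0 + \beta_1X + \epsilon\), where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the random error term. In R, we fit this model with lm() and store the result in an object called linear_model.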
```r
# Building the linear model
linear_model <- lm(dist ~ speed, data = cars)
summary(linear_model)
```
```
Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared:  0.6511,  Adjusted R-squared:  0.6438
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
```
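In this model, speed is the independent variable (predictor) and stopping distance is the dependent variable (outcome). The slope tells us that each additional 1 mph of speed is associated with an increase of about 3.93 ft in the predicted stopping distance. The intercept (-17.58 ft) is the predicted distance at 0 mph; since a negative stopping distance is impossible and the observed speeds only range from 4 to 25 mph, the intercept is best read as a fitting constant rather than a meaningful prediction.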
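Two quick checks make the slope interpretation concrete: predicting at two speeds 1 mph apart recovers the slope exactly, and confint() gives a 95% confidence interval for each coefficient, which conveys the uncertainty the point estimate hides. A sketch assuming the same model; the 10 and 11 mph pair is an arbitrary choice.

```r
linear_model <- lm(dist ~ speed, data = cars)

# Predicted stopping distances at 10 and 11 mph
pred <- predict(linear_model, newdata = data.frame(speed = c(10, 11)))

# Their difference equals the estimated slope (about 3.93 ft per mph)
unname(pred[2] - pred[1])
coef(linear_model)[["speed"]]

# 95% confidence intervals for the intercept and slope
confint(linear_model, level = 0.95)
```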
4. Examining Model Residuals
Residuals are the differences between the actual observed values and the values predicted by the model. Analyzing them is crucial for validating our model’s assumptions.
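To see the definition in action, the sketch below computes residuals by hand as observed minus fitted distance, compares them with residuals(), and uses them to reproduce the residual standard error reported in the summary (15.38 on 48 degrees of freedom), which estimates the standard deviation of the error term \(\epsilon\).

```r
linear_model <- lm(dist ~ speed, data = cars)

# Residual = observed value - fitted (predicted) value
manual_resid <- cars$dist - fitted(linear_model)
all.equal(unname(manual_resid), unname(residuals(linear_model)))

# With an intercept in the model, least-squares residuals sum to (numerically) zero
sum(residuals(linear_model))

# Residual standard error: sqrt(SSE / (n - 2)); here n - 2 = 48 residual df
sse <- sum(residuals(linear_model)^2)
rse <- sqrt(sse / df.residual(linear_model))
c(manual = rse, from_lm = sigma(linear_model))
```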
```r
# View residuals
# hist(linear_model$residuals, main = "Histogram of Residuals", xlab = "Residual Value", col = "#cc00cc")

# Plotting the distribution of residuals
ggplot(data.frame(residuals = linear_model$residuals), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "#5f008f", color = "white", alpha = 0.7) +
  labs(title = "Distribution of Residuals", x = "Residuals", y = "Frequency") +
  theme_minimal()
```
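Why check residuals? Linear regression assumes the errors are independent, approximately normally distributed, and have constant variance. If the residuals look roughly normal, are centered on zero, and show no systematic pattern, these assumptions are plausible, which makes the model's p-values and predictions more trustworthy.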
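A histogram only shows the shape of the residual distribution, so it helps to pair it with two standard base-R plots: residuals against fitted values (curvature or a funnel shape would suggest non-linearity or non-constant variance) and a normal Q-Q plot. The sketch also runs shapiro.test(), a formal normality test; treat its p-value as a supplement to the plots rather than a verdict, since such tests are sensitive to sample size.

```r
linear_model <- lm(dist ~ speed, data = cars)

# Two diagnostic plots side by side
op <- par(mfrow = c(1, 2))

# 1) Residuals vs fitted: points should scatter evenly around zero
plot(fitted(linear_model), residuals(linear_model),
     xlab = "Fitted distance (ft)", ylab = "Residual (ft)",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)

# 2) Normal Q-Q plot: points should lie close to the reference line
qqnorm(residuals(linear_model), main = "Normal Q-Q")
qqline(residuals(linear_model))

par(op)  # restore the previous plotting layout

# Shapiro-Wilk test (null hypothesis: the residuals are normally distributed)
sw <- shapiro.test(residuals(linear_model))
sw
```

Calling plot(linear_model) produces R's built-in set of four diagnostic plots, which includes both of these views.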
5. Making Predictions
One of the primary goals of regression is prediction. Let’s estimate the stopping distance for cars traveling at 10, 15, and 20 mph.
```r
# Prepare new data for predictions
new_speed <- data.frame(speed = c(10, 15, 20))

# Predictive model output
predictions <- predict(linear_model, new_speed)
round(predictions, 1)
```
```
   1    2    3
21.7 41.4 61.1
```
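A point prediction hides its uncertainty. predict() can also return intervals: interval = "confidence" bounds the mean stopping distance at each speed, while interval = "prediction" bounds the distance for a single new car and is therefore wider. Both assume roughly normal errors with constant variance, so they inherit any problems found in the residual checks. The sketch also reproduces the point predictions by hand from the coefficients.

```r
linear_model <- lm(dist ~ speed, data = cars)
new_speed <- data.frame(speed = c(10, 15, 20))

# Point prediction by hand: intercept + slope * speed
b <- coef(linear_model)
manual <- b[["(Intercept)"]] + b[["speed"]] * new_speed$speed
round(manual, 1)

# 95% interval for the average stopping distance at each speed
conf_int <- predict(linear_model, new_speed, interval = "confidence")

# 95% interval for an individual car at each speed (wider)
pred_int <- predict(linear_model, new_speed, interval = "prediction")

round(conf_int, 1)
round(pred_int, 1)
```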
Quick Predictive Model (One-Step)
You can also perform the entire modeling and prediction process in a single piped command.
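A minimal sketch of such a pipeline follows. It uses base R's native pipe |> (R 4.1 or later) so it runs without extra packages; the tidyverse %>% pipe loaded earlier would work the same way here. The name quick_predictions is illustrative.

```r
# Fit the model, predict at 10, 15, and 20 mph, and round, all in one pipeline
quick_predictions <- lm(dist ~ speed, data = cars) |>
  predict(newdata = data.frame(speed = c(10, 15, 20))) |>
  round(1)

quick_predictions
```

It returns the same three predictions as the step-by-step version (21.7, 41.4, and 61.1 ft).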
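Model Function: lm(dependent ~ independent, data = df)
Key Metric: R-squared (the proportion of variation in the outcome explained by the model).
Significance: Check whether the p-value is below 0.05.
Coefficients: Intercept (predicted outcome when the predictor is 0) and Slope (change in the outcome per one-unit increase in the predictor).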
Congratulations! You now know the basics of simple linear regression: you can build a model, evaluate its strength, and use it to predict outcomes for new values of the predictor.