Simple Linear Regression with the Cars Dataset

Introduction

Simple linear regression is one of the most fundamental statistical techniques in data science. This presentation analyzes the relationship between a vehicle’s speed and the distance it takes to stop.

The Cars Dataset

The cars dataset is built into R. It records the stopping distances of 50 cars traveling at different speeds.

Variables:

- speed: Speed of the car (miles per hour)

- dist: Stopping distance (feet)

Goal:

Determine whether speed can be used to predict stopping distance using linear regression.
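
Before fitting anything, it can help to look at the two variables directly. The following is a minimal sketch using only base R functions; the choice of summaries is illustrative and not taken from the original slides.

```r
# cars ships with base R (package "datasets"), so no import is needed
str(cars)       # structure: 50 observations of 2 numeric variables
head(cars)      # first six rows
summary(cars)   # minimum, quartiles, mean and maximum of each column

# Sample correlation between speed and stopping distance;
# a value well above 0 suggests a linear model is worth trying
cor(cars$speed, cars$dist)
```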

Linear Regression Equation

The simple linear regression model is:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

Where:

$y$ = dependent variable (stopping distance)
$x$ = independent variable (speed)
$\beta_0$ = intercept
$\beta_1$ = slope
$\epsilon$ = random error

The slope $\beta_1$ gives the expected change in stopping distance (in feet) for each 1 mph increase in speed.
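
As a minimal sketch of how these symbols map onto R, the model can be fitted with lm() and the estimated intercept and slope read off with coef(). The object names model, estimates, b0 and b1 are illustrative choices, not taken from the original slides.

```r
# Fit dist = beta_0 + beta_1 * speed + error by least squares
model <- lm(dist ~ speed, data = cars)

# Named vector of estimates: "(Intercept)" is beta_0, "speed" is beta_1
estimates <- coef(model)
b0 <- estimates[["(Intercept)"]]
b1 <- estimates[["speed"]]

# For this dataset the estimates are roughly -17.58 and 3.93
print(round(c(intercept = b0, slope = b1), 2))
```

A slope of about 3.93 means that each additional 1 mph of speed is associated with roughly 3.9 more feet of stopping distance on average. The negative intercept has no physical meaning here, since a speed of 0 mph lies outside the observed range of 4 to 25 mph.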

Least Squares Method

The regression line is found using Ordinary Least Squares (OLS), which minimizes the Residual Sum of Squares (RSS):

\[ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

Where:

$y_i$ = observed value
$\hat{y}_i$ = predicted value from the regression line

The best regression line minimizes the total squared difference between observed and predicted values.
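
For simple linear regression, minimizing the RSS has a standard closed-form solution. Writing $\bar{x}$ and $\bar{y}$ for the sample means (notation introduced here, not in the original slides), the estimates are:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

The sketch below computes these by hand, checks them against lm(), and confirms that a deliberately perturbed line has a larger RSS. The variable names and the perturbation of 0.5 are illustrative assumptions.

```r
x <- cars$speed
y <- cars$dist

# Closed-form OLS estimates for one predictor
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Residual sum of squares for the fitted line
y_hat <- b0 + b1 * x
rss <- sum((y - y_hat)^2)

# The hand-computed values should agree with lm()
model <- lm(dist ~ speed, data = cars)
print(all.equal(unname(coef(model)), c(b0, b1)))
print(all.equal(rss, sum(residuals(model)^2)))

# Any other line, here with the slope shifted by 0.5, gives a larger RSS
rss_other <- sum((y - (b0 + (b1 + 0.5) * x))^2)
print(rss_other > rss)
```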

Scatter Plot

Residual Representation

Plot with Plotly
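
The interactive figure for this slide is not reproduced here. As a rough sketch of how such a plot could be built, assuming the plotly package is installed, the following draws the observed points and the fitted line. The trace names, title and the object name fig are illustrative assumptions, not the original slide code.

```r
library(plotly)

# Fit the model so the fitted values can be drawn as a line
model <- lm(dist ~ speed, data = cars)

# Observed points as a scatter trace
fig <- plot_ly(cars, x = ~speed, y = ~dist,
               type = "scatter", mode = "markers", name = "Observed")

# Fitted regression line (cars is already ordered by speed)
fig <- add_lines(fig, x = ~speed, y = fitted(model), name = "Fitted line")

# Axis labels matching the units of the dataset
fig <- layout(fig,
              title = "Stopping Distance vs. Speed",
              xaxis = list(title = "Speed (mph)"),
              yaxis = list(title = "Stopping Distance (ft)"))

fig
```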

Scatter plot code

library(ggplot2)

# Fit the model and store the fitted values alongside the data
model <- lm(dist ~ speed, data = cars)
cars$fitted <- fitted(model)

# Scatter plot with the fitted line and dashed residual segments
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(color = "steelblue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  geom_segment(aes(x = speed, y = dist, xend = speed, yend = fitted),
               color = "orange", linetype = "dashed") +
  labs(
    title = "Regression Line with Residuals",
    x = "Speed (mph)",
    y = "Stopping Distance (ft)"
  ) +
  theme_minimal()

Conclusion

Key takeaways from analyzing the cars dataset:

- There is a clear positive relationship between car speed and stopping distance.
- This pattern is consistent with physics: the faster a car travels, the farther it needs to stop.
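
To put numbers on these takeaways, the fitted model can be used to predict stopping distance at a few speeds and to check that the slope is reliably positive. This is a sketch only; the speeds of 10, 15 and 20 mph are hypothetical values chosen inside the observed range, and the object names are illustrative.

```r
model <- lm(dist ~ speed, data = cars)

# Hypothetical speeds (mph) inside the observed range of 4 to 25 mph
new_speeds <- data.frame(speed = c(10, 15, 20))

# Point predictions of stopping distance (ft) from the fitted line
predicted <- predict(model, newdata = new_speeds)
print(cbind(new_speeds, predicted_dist = round(predicted, 1)))

# 95% confidence interval for the slope; an interval entirely above 0
# supports a positive relationship between speed and stopping distance
slope_ci <- confint(model, "speed")
print(slope_ci)
```

Predictions far outside the observed speeds should be treated with caution, since the straight-line fit is only supported by the data it was estimated from.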