Simple linear regression is one of the most fundamental statistical techniques in data science. This presentation analyzes the relationship between a vehicle’s speed and the distance it takes to stop.
The cars dataset is a built-in dataset in R that measures stopping distances of cars traveling at different speeds.
Variables:
- speed: Speed of the car (miles per hour)
- dist: Stopping distance (feet)
Goal:
Determine whether speed can be used to predict stopping distance using linear regression.
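Before fitting anything, a quick look at the data helps confirm that a straight-line model is a reasonable starting point. The sketch below uses only base R. The variable name `speed_dist_cor` is introduced here for illustration and is not part of the original presentation.

```r
# The cars dataset ships with base R (datasets package), so no download is needed.
data(cars)

str(cars)      # structure: two numeric columns, speed and dist
summary(cars)  # five-number summary and mean for each variable

# Pearson correlation between speed and stopping distance, as a first
# indication of how strong the linear association is.
speed_dist_cor <- cor(cars$speed, cars$dist)
speed_dist_cor
```

A correlation close to 1 suggests that a linear model is worth trying, although it does not by itself show that the relationship is linear.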
The simple linear regression model is:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
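Where:
- \(y\): stopping distance (dist, in feet)
- \(x\): speed (speed, in miles per hour)
- \(\beta_0\): intercept
- \(\beta_1\): slope
- \(\epsilon\): random error term
The slope \(\beta_1\) is the expected change in stopping distance, in feet, for each additional mile per hour of speed.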
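One way to make the meaning of the slope concrete is to compare predictions at two speeds that are one mile per hour apart. This is a minimal sketch. The names `fit`, `beta0`, `beta1`, `pred`, and `slope_check` are illustrative, and the speeds 10 and 11 mph are arbitrary choices.

```r
# Fit dist = beta0 + beta1 * speed + error by ordinary least squares.
fit <- lm(dist ~ speed, data = cars)

beta  <- coef(fit)                 # named vector: "(Intercept)" and "speed"
beta0 <- beta[["(Intercept)"]]
beta1 <- beta[["speed"]]

# Predicted stopping distance at two speeds one mph apart.
pred <- predict(fit, newdata = data.frame(speed = c(10, 11)))

# The difference between the two predictions equals the slope.
slope_check <- unname(pred[2] - pred[1])
c(intercept = beta0, slope = beta1, difference = slope_check)
```

Because the model is a straight line, the same difference would appear for any pair of speeds one unit apart.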
The regression line is found using Ordinary Least Squares (OLS), which minimizes the Residual Sum of Squares (RSS):
\[ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
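Where:
- \(y_i\): observed stopping distance for observation \(i\)
- \(\hat{y}_i\): stopping distance predicted by the regression line at speed \(x_i\)
- \(n\): number of observations
The best regression line is the one that minimizes the sum of squared differences between observed and predicted values.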
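The RSS formula can be checked directly in R. The sketch below computes RSS for the fitted line and for two lines with a slightly different slope. The helper `rss()` and the 0.5 perturbation are assumptions made for illustration, not part of the original analysis.

```r
fit <- lm(dist ~ speed, data = cars)

# RSS for an arbitrary line with intercept b0 and slope b1.
rss <- function(b0, b1) {
  predicted <- b0 + b1 * cars$speed
  sum((cars$dist - predicted)^2)
}

b <- coef(fit)
rss_ols <- rss(b[["(Intercept)"]], b[["speed"]])

# The same quantity computed from the model's residuals.
rss_resid <- sum(residuals(fit)^2)

# Nudging the slope away from the OLS estimate should increase RSS.
rss_steeper <- rss(b[["(Intercept)"]], b[["speed"]] + 0.5)
rss_flatter <- rss(b[["(Intercept)"]], b[["speed"]] - 0.5)

c(ols = rss_ols, steeper = rss_steeper, flatter = rss_flatter)
```

Both perturbed lines give a larger RSS than the OLS line, which is what "least squares" means in practice.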
library(ggplot2)

# Fit the model and store the fitted values so each residual can be drawn
model <- lm(dist ~ speed, data = cars)
cars$fitted <- fitted(model)

# Scatter plot with the regression line (red) and residuals (dashed orange)
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(color = "steelblue", size = 3) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  geom_segment(aes(x = speed, y = dist, xend = speed, yend = fitted),
               color = "orange", linetype = "dashed") +
  labs(
    title = "Regression Line with Residuals",
    x = "Speed (mph)",
    y = "Stopping Distance (ft)"
  ) +
  theme_minimal()
Key takeaways from analyzing the cars dataset: