2026-04-12

What is Simple Linear Regression?

Simple linear regression (SLR) is a statistical method that models the relationship between two quantitative variables, a predictor and a response, by fitting a straight line to the data (Penn State, 2018). It is one of the most widely used tools in statistics, data science, and machine learning.

The Goal: to find the straight line through the data that minimizes prediction error, measured as the sum of squared differences between observed and fitted values.

The Equation

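The model is expressed as:

\[Y = \beta_0 + \beta_1 X + \varepsilon\]

where:

  • \(Y\) = response variable (the observed outcome)
  • \(X\) = predictor variable
  • \(\beta_0\) = intercept (expected value of \(Y\) when \(X = 0\))
  • \(\beta_1\) = slope (expected change in \(Y\) for a one-unit increase in \(X\))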
  • \(\varepsilon\) = error term (random noise)
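
To make the roles of the terms concrete, the sketch below simulates data from this model and then fits a line to it. The parameter values, sample size, noise level, and seed are hypothetical choices made only for this illustration; they do not come from the original slides.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical "true" parameters chosen for this illustration
beta0 <- 2
beta1 <- 0.5
n     <- 200

x   <- runif(n, min = 0, max = 10)   # predictor values
eps <- rnorm(n, mean = 0, sd = 1)    # error term (random noise)
y   <- beta0 + beta1 * x + eps       # response generated by the model

# Fitting a line to the simulated data should give estimates near beta0 and beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

The estimates will not match the chosen parameters exactly, because the error term adds noise to every observation.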

Estimating the Coefficients

The coefficients \(\beta_0\) and \(\beta_1\) are estimated using the Ordinary Least Squares (OLS) method, which minimizes the sum of squared residuals:

\[\text{Minimize} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2\]

The resulting estimates are:

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\]

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\]
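
As a check on these formulas, they can be evaluated directly in R and compared with `lm()`. This is a minimal sketch using R's built-in `cars` dataset (the same data as the example in this deck); the variable names `x`, `y`, `b0`, and `b1` are introduced here for illustration.

```r
x <- cars$speed   # predictor: speed (mph)
y <- cars$dist    # response: stopping distance (ft)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: makes the fitted line pass through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# lm() should report the same two numbers
coef(lm(dist ~ speed, data = cars))
```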

Example: Vehicle Speed vs. Stopping Distance

This example uses R's built-in cars dataset, which records the speed of cars (mph) and the distance required to stop (ft).

The data show a positive linear relationship: faster cars need more distance to stop.

Fitting the Regression Line

The blue line represents the fitted regression line, and the shaded area is the 95% confidence interval for the mean stopping distance at each speed.
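
The plotting code is not shown in the slides. One way to draw a comparable figure is sketched below in base R, under the assumption that the shaded band is the pointwise 95% confidence interval for the fitted mean; the names `grid` and `band` are introduced here for illustration.

```r
# Fit the same model as in the slides
model <- lm(dist ~ speed, data = cars)

# Grid of speeds spanning the observed range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))

# Fitted mean plus lower and upper 95% confidence limits at each grid speed
band <- predict(model, newdata = grid, interval = "confidence", level = 0.95)

# Scatter plot, shaded band, then the fitted line on top
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
polygon(c(grid$speed, rev(grid$speed)),
        c(band[, "lwr"], rev(band[, "upr"])),
        col = "grey85", border = NA)
points(cars$speed, cars$dist)
lines(grid$speed, band[, "fit"], col = "blue", lwd = 2)
```

The band is narrowest near the mean speed and widens toward the ends of the range, where the line is estimated less precisely.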

The R Code

# Fit the linear regression model
model <- lm(dist ~ speed, data = cars)

# View the summary
summary(model)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -17.579095  6.7584402 -2.601058 1.231882e-02
## speed         3.932409  0.4155128  9.463990 1.489836e-12

The estimated equation:

\[\hat{Y} = -17.58 + 3.93 \times X\]

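For every 1 mph increase in speed, the predicted stopping distance increases by about 3.93 ft. The negative intercept has no physical meaning: a speed of 0 mph lies outside the observed range of 4 to 25 mph.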
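
The fitted equation can also be used to predict stopping distance at a given speed. The sketch below uses `predict()` for two speeds chosen purely for illustration (10 and 20 mph); the names `new_speeds` and `pred` are not from the original slides.

```r
model <- lm(dist ~ speed, data = cars)

# Hypothetical new speeds (mph) chosen for illustration
new_speeds <- data.frame(speed = c(10, 20))

# Predicted stopping distances (ft): about 21.74 and 61.07
pred <- predict(model, newdata = new_speeds)
pred
```

The two predictions differ by about 39.3 ft, which is ten times the slope, as expected for a 10 mph difference in speed.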

Model Fit: R-Squared

The \(R^2\) value for this model is 0.651, meaning the model explains about 65% of the variation in stopping distance.
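
For reference, \(R^2\) can be computed from its definition, one minus the ratio of residual to total sum of squares, and compared with the value that `summary()` reports. A minimal sketch, with `sse`, `sst`, and `r2_manual` as names introduced here:

```r
model <- lm(dist ~ speed, data = cars)

sse <- sum(residuals(model)^2)                # variation left unexplained by the line
sst <- sum((cars$dist - mean(cars$dist))^2)   # total variation in stopping distance
r2_manual <- 1 - sse / sst

r2_manual                      # about 0.651
summary(model)$r.squared       # same value, as reported by summary()
cor(cars$speed, cars$dist)^2   # in SLR, R^2 equals the squared correlation
```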

3D Interactive Plot

Conclusion

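Simple linear regression is a powerful yet interpretable tool for modeling the relationship between two variables: in the cars example, one intercept and one slope summarize how stopping distance changes with speed and account for about 65% of its variation.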