June 2026

What is Simple Linear Regression?

Simple linear regression models the relationship between one predictor \(x\) and a continuous response \(y\) by fitting a straight line.

The population model is:

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \]

  • \(\beta_0\): the intercept (expected value of \(y\) when \(x = 0\))
  • \(\beta_1\): the slope (expected change in \(y\) per one-unit increase in \(x\))
  • \(\varepsilon_i\): random error, assumed independent and normally distributed with mean 0 and constant variance \(\sigma^2\)
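
To make the model concrete, here is a minimal simulation sketch. The parameter values (intercept 2, slope 0.5, error SD 1), the sample size, and the range of \(x\) are made up for illustration and do not come from any dataset.

```r
# Simulate from y = beta0 + beta1 * x + eps with hypothetical parameter values
set.seed(1)                      # for reproducibility
n     <- 200
beta0 <- 2                       # hypothetical intercept
beta1 <- 0.5                     # hypothetical slope
sigma <- 1                       # hypothetical error standard deviation

x   <- runif(n, min = 0, max = 10)
eps <- rnorm(n, mean = 0, sd = sigma)   # independent N(0, sigma^2) errors
y   <- beta0 + beta1 * x + eps

sim_fit <- lm(y ~ x)
coef(sim_fit)                    # estimates should land near beta0 and beta1
```

With a sample this large, the fitted coefficients should be close to, but not exactly equal to, the values used to generate the data.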

Estimating the Line: Least Squares

We pick the estimates \(\hat\beta_0, \hat\beta_1\) that minimize the sum of squared residuals, the vertical gaps between the points and the line:

\[ \min_{\beta_0,\, \beta_1} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \]
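
As a rough check on this objective, the sum of squared residuals can be minimized numerically and compared with what lm() returns. This is only a sketch: the helper name rss, the starting point c(0, 0), and the choice of the BFGS method are assumptions made for illustration, and the numerical answer agrees only approximately.

```r
# Residual sum of squares as a function of b = c(intercept, slope)
rss <- function(b, x, y) sum((y - b[1] - b[2] * x)^2)

# Numerical minimization; the starting point c(0, 0) is an arbitrary choice
opt <- optim(par = c(0, 0), fn = rss, x = mtcars$wt, y = mtcars$mpg,
             method = "BFGS")

opt$par                                # numerical minimizer
coef(lm(mpg ~ wt, data = mtcars))      # least-squares fit for comparison
```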

Solving gives the closed-form estimates:

\[ \hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \]
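
The closed-form estimates can be computed directly from these formulas. A short sketch on mtcars (the variable names b0 and b1 are illustrative):

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared x deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: forces the line through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)
coef(lm(mpg ~ wt, data = mtcars))   # should match the hand computation
```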

The quality of the fit is summarized by the coefficient of determination \(R^2 \in [0, 1]\), the fraction of the variance in \(y\) explained by the line.
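
One standard way to write this quantity is \(R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}\). The sketch below computes it by hand for the mtcars fit and compares it with the value from summary(); the names ss_res, ss_tot, and r2 are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)

ss_res <- sum(residuals(fit)^2)                    # variation left unexplained by the line
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variation around the mean
r2     <- 1 - ss_res / ss_tot

r2
summary(fit)$r.squared           # value reported by lm's summary
cor(mtcars$wt, mtcars$mpg)^2     # with one predictor, R^2 equals the squared correlation
```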

The Data: mtcars

We use R’s built-in mtcars dataset (32 cars, 1974 Motor Trend). Our question: how does a car’s weight predict its fuel economy (MPG)?

  • Response \(y\) = mpg (miles per gallon)
  • Predictor \(x\) = wt (weight in 1000s of lbs)

First 3 cars (mpg, weight in 1000 lbs, horsepower):

                 mpg     wt   hp
Mazda RX4       21.0  2.620  110
Mazda RX4 Wag   21.0  2.875  110
Datsun 710      22.8  2.320   93

ggplot #1: The Fitted Regression Line

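Each additional 1000 lb of weight is associated with a decrease of about 5.34 MPG, and the fit explains about 75% of the variance in MPG (\(R^2 = 0.753\)). The shaded band is the 95% confidence interval for the mean MPG at each weight, not a prediction interval for individual cars.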
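
The difference between an interval for the mean and an interval for an individual car can be seen with predict(). This is a sketch for a hypothetical car weighing 3,000 lb; that weight is chosen only for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new car weighing 3,000 lb (wt is measured in 1000s of lbs)
new_car <- data.frame(wt = 3)

# 95% interval for the mean MPG of all cars at this weight
conf_int <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)
# 95% interval for the MPG of one individual car at this weight
pred_int <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

conf_int
pred_int
```

Both intervals share the same point estimate, but the prediction interval is wider because it also accounts for the scatter of individual cars around the line.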

ggplot #2: Checking Residuals

A good fit has residuals scattered randomly around zero (no pattern).

The residuals show no strong systematic pattern, consistent with the linearity assumption; the faint LOESS curve is within the noise expected at \(n = 32\).
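
A residual plot of this kind can be sketched as follows. Plotting residuals against fitted values is an assumption here (the original figure may use a different x-axis), and the data frame name resid_df is illustrative.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)
resid_df <- data.frame(fitted = fitted(fit), resid = residuals(fit))

ggplot(resid_df, aes(x = fitted, y = resid)) +
  geom_hline(yintercept = 0, linetype = "dashed") +              # reference line at zero
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +   # LOESS trend in the residuals
  labs(x = "Fitted MPG", y = "Residual")
```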

Interactive 3D: Adding a Second Predictor

Now predict mpg from weight and horsepower, and the fit becomes a plane. Drag to rotate the plot below.
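
A minimal sketch of the two-predictor fit is shown here. It uses a static base-R persp() view rather than an interactive widget, and the 25-point grid resolution and viewing angles are arbitrary choices.

```r
fit1 <- lm(mpg ~ wt, data = mtcars)        # the line: weight only
fit2 <- lm(mpg ~ wt + hp, data = mtcars)   # the plane: weight and horsepower

coef(fit2)
c(line = summary(fit1)$r.squared, plane = summary(fit2)$r.squared)

# Evaluate the fitted plane on a grid; the 25-point resolution is arbitrary
wt_seq <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 25)
hp_seq <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 25)
grid   <- expand.grid(wt = wt_seq, hp = hp_seq)   # wt varies fastest
z      <- matrix(predict(fit2, newdata = grid), nrow = length(wt_seq))

# Static view of the plane: rows of z index wt, columns index hp
persp(wt_seq, hp_seq, z, theta = 40, phi = 20,
      xlab = "wt", ylab = "hp", zlab = "mpg", ticktype = "detailed")
```

Because the one-predictor model is nested in the two-predictor model, \(R^2\) cannot decrease when horsepower is added.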

The R Code Behind the Fit

Everything above is generated programmatically. Here is the core code:

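library(ggplot2)

# Fit mpg as a linear function of weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)

# Scatterplot with the fitted line and its 95% confidence band
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

# Estimated intercept and slope
coef(fit)
## (Intercept)          wt 
##   37.285126   -5.344472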

Summary

  • Simple linear regression fits \(\hat{y} = \hat\beta_0 + \hat\beta_1 x\) by minimizing squared residuals.
  • For mtcars, heavier cars get worse mileage: \(\widehat{\text{mpg}} = 37.29 - 5.34 \cdot \text{wt}\), with \(R^2 = 0.753\).
  • Residual plots reveal whether the linear assumption holds.
  • Adding predictors (e.g. horsepower) extends the line to a plane, the start of multiple regression.