What Is Simple Linear Regression?

Simple linear regression is used to understand and model the relationship between a quantitative response variable and a single quantitative predictor.

It helps answer questions such as:

  • How does one variable change as another changes?
  • Can we predict one variable using another?

The Simple Linear Regression Model

We assume the following model:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

where:

  • \(Y\) is the response variable
  • \(X\) is the predictor variable
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\varepsilon\) is the random error term
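To make the roles of the terms concrete, here is a minimal sketch that simulates data from this model and then recovers the parameters. The parameter values (`beta0 = 2`, `beta1 = 0.5`), the sample size, and the choice of normally distributed errors with constant standard deviation `sigma` are illustrative assumptions, not part of the model statement above.

```r
# Simulate from Y = beta0 + beta1 * X + epsilon (all values below are assumed)
set.seed(1)                    # make the simulation reproducible
n     <- 100                   # assumed sample size
beta0 <- 2                     # assumed true intercept
beta1 <- 0.5                   # assumed true slope
sigma <- 1                     # assumed error standard deviation

x   <- runif(n, min = 0, max = 10)     # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)  # random error term
y   <- beta0 + beta1 * x + eps         # response generated by the model

# Fit a line to the simulated data; the estimates should be near beta0, beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

Because the errors are random, the estimates will be close to, but not exactly equal to, the assumed true values.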

Estimation by Least Squares

The parameters \(\beta_0\) and \(\beta_1\) are estimated by minimizing the sum of squared residuals:

\[ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

where:

\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
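For simple linear regression this minimization has a well-known closed-form solution:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

The sketch below computes these two quantities by hand and checks them against R's `lm()`. It uses the built-in mtcars data (mpg as response, wt as predictor); the variable names `b0_hat` and `b1_hat` are chosen here for illustration.

```r
# Least-squares estimates computed directly from the closed-form formulas
x <- mtcars$wt    # predictor
y <- mtcars$mpg   # response

b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0_hat <- mean(y) - b1_hat * mean(x)                                 # intercept

# Same estimates from lm(), for comparison
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(coef(fit)), c(b0_hat, b1_hat))
```

Both routes should agree up to floating-point error. In practice `lm()` is preferable, since it also provides standard errors and diagnostics.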

Example Dataset

We use the built-in mtcars dataset in R.

  • Response variable: mpg (miles per gallon)
  • Predictor variable: wt (weight of the car, in 1000 lbs)

This dataset contains information on 32 cars.

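# mtcars has no "fitted" or "residuals" columns by default, so fit the
# model first and store its fitted values and residuals as new columns
# (assumed setup, consistent with the values printed below)
fit <- lm(mpg ~ wt, data = mtcars)
mtcars$fitted <- fitted(fit)
mtcars$residuals <- resid(fit)
head(mtcars[, c("mpg", "wt", "hp", "fitted", "residuals")])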
##                    mpg    wt  hp   fitted  residuals
## Mazda RX4         21.0 2.620 110 23.28261 -2.2826106
## Mazda RX4 Wag     21.0 2.875 110 21.91977 -0.9197704
## Datsun 710        22.8 2.320  93 24.88595 -2.0859521
## Hornet 4 Drive    21.4 3.215 110 20.10265  1.2973499
## Hornet Sportabout 18.7 3.440 175 18.90014 -0.2001440
## Valiant           18.1 3.460 105 18.79325 -0.6932545

Scatter Plot with Regression Line

[Figure: scatter plot of mpg against wt with the fitted least-squares line]

R Code Used to Create the Previous Plot

library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth(), labs()

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon",
    title = "Fuel Efficiency vs Car Weight"
  )

Residual Analysis
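
As a minimal sketch of what a residual check could look like for the mpg ~ wt model, the code below plots residuals against fitted values and draws a normal Q-Q plot. The choice of these two diagnostics, and of base R graphics rather than ggplot2, is an assumption made here for illustration.

```r
# Refit the model so this example is self-contained
fit <- lm(mpg ~ wt, data = mtcars)
res <- resid(fit)           # residuals: observed mpg minus fitted mpg
fitted_vals <- fitted(fit)  # fitted mpg values

# Residuals vs fitted values: look for curvature or changing spread
plot(fitted_vals, res,
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs Fitted Values")
abline(h = 0, lty = 2)      # reference line at zero

# Normal Q-Q plot: points near the line suggest roughly normal residuals
qqnorm(res)
qqline(res)
```

A patternless band of points around zero in the first plot supports the linear model. A visible curve or funnel shape would suggest that a straight line with constant error variance is not adequate.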

A 3D Visualization