Overview: Simple Linear Regression

Simple linear regression models how the average value of one variable changes as another variable changes.

  • We have a response variable \(Y\) - what we’re trying to predict
  • We have a predictor variable \(X\) - what we’re using to predict it

For example: does a heavier car get worse gas mileage?

Model Structure

To describe this relationship mathematically, we use the following model:

\[Y = \beta_0 + \beta_1 X + \varepsilon\]

Where:

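  • \(\beta_0\) = intercept (expected value of \(Y\) when \(X = 0\))
  • \(\beta_1\) = slope (change in the expected value of \(Y\) per unit increase in \(X\))
  • \(\varepsilon \sim N(0, \sigma^2)\) = random error term, assumed independent across observations with constant variance \(\sigma^2\)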

The fitted model uses estimated coefficients:

\[\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X\]
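
To make the distinction between the true and the estimated coefficients concrete, here is a minimal simulation sketch. The parameter values, sample size, and seed are all made up for illustration and do not come from the car example.

```r
# Simulate data from Y = beta_0 + beta_1 * X + epsilon with made-up parameters.
set.seed(42)     # arbitrary seed, for reproducibility only
n      <- 500    # hypothetical sample size
beta_0 <- 2      # hypothetical true intercept
beta_1 <- 0.5    # hypothetical true slope
sigma  <- 1      # hypothetical error standard deviation

x       <- runif(n, min = 0, max = 10)
epsilon <- rnorm(n, mean = 0, sd = sigma)
y       <- beta_0 + beta_1 * x + epsilon

# Fit the model; the estimates should land near the true values.
sim_fit   <- lm(y ~ x)
estimates <- coef(sim_fit)
print(estimates)

# Fitted values follow Y-hat = beta_0-hat + beta_1-hat * X.
y_hat <- estimates[1] + estimates[2] * x
```

The estimates will not equal 2 and 0.5 exactly, because they are computed from one noisy sample.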

Estimating the Coefficients

The coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are chosen to minimize the Residual Sum of Squares (RSS):

\[RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\]
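
As a sketch of the missing step, the minimum is found by setting the partial derivatives of RSS with respect to each coefficient to zero:

\[\frac{\partial RSS}{\partial \hat{\beta}_0} = -2\sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0, \quad \frac{\partial RSS}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n}x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0\]

The first equation gives \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\). Substituting this into the second, and using \(\sum(x_i - \bar{x}) = 0\) and \(\sum(y_i - \bar{y}) = 0\), leaves:

\[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1\sum_{i=1}^{n}(x_i - \bar{x})^2\]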

The closed-form solutions are:

\[\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\]
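
These formulas can be checked numerically. The sketch below applies them to the wt and mpg columns of R's built-in mtcars data, which the car example in this document uses. The helper `rss()` and the variable names are illustrative choices, not part of any library.

```r
# Closed-form least-squares estimates for mpg ~ wt, computed by hand.
x <- mtcars$wt
y <- mtcars$mpg

beta_1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta_0_hat <- mean(y) - beta_1_hat * mean(x)

# Residual sum of squares for any candidate intercept b0 and slope b1.
rss <- function(b0, b1) sum((y - b0 - b1 * x)^2)

c(intercept = beta_0_hat, slope = beta_1_hat)
rss(beta_0_hat, beta_1_hat)        # RSS at the least-squares solution
rss(beta_0_hat, beta_1_hat + 0.5)  # a nudged slope gives a larger RSS
```

The hand-computed values should agree with what `lm(mpg ~ wt, data = mtcars)` reports, up to floating-point rounding.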

Car Weight and Fuel Efficiency

The mtcars dataset in R records 11 measurements for each of 32 cars, taken from the 1974 Motor Trend US magazine.

##                      wt  mpg
## Mazda RX4         2.620 21.0
## Mazda RX4 Wag     2.875 21.0
## Datsun 710        2.320 22.8
## Hornet 4 Drive    3.215 21.4
## Hornet Sportabout 3.440 18.7

  • wt: weight of the car, in units of 1000 lbs
  • mpg: fuel efficiency (miles per gallon)
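
A preview like the one above can be produced with a call along these lines. The exact call used for the original output is not shown, so this is one plausible way to get it. The variable names `n_cars` and `wt_mpg_cor` are illustrative.

```r
# First five rows of the two columns used in this example.
head(mtcars[, c("wt", "mpg")], 5)

# Size of the dataset and the sample correlation between weight and mpg.
n_cars     <- nrow(mtcars)
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
n_cars
wt_mpg_cor
```

A negative correlation here would be a first numeric hint that heavier cars get fewer miles per gallon.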

Heavier cars likely burn more fuel - let’s see if the data supports that.

Scatter Plot

We can already see a negative trend - as weight increases, fuel efficiency tends to drop.
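
The figure itself is not reproduced here. A minimal sketch of a comparable plot using base R graphics follows; the original may have used a different plotting package, and the labels and the lowess smoother are illustrative choices.

```r
# Scatter plot of fuel efficiency against weight (base R graphics).
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Fuel efficiency (mpg)",
     main = "Car weight vs. fuel efficiency",
     pch = 19)

# Overlay a lowess smoother to make the overall trend easier to see.
lines(lowess(mtcars$wt, mtcars$mpg), lty = 2)
```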

Running It in R

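# Fit mpg as a linear function of weight by least squares
model <- lm(mpg ~ wt, data = mtcars)
# Coefficient table: estimates, standard errors, t statistics, p-values
coef(summary(model))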
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
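
Beyond the point estimates, the fitted model can quantify uncertainty and make predictions. The sketch below uses the standard `confint()` and `predict()` functions. The 3,000 lb car is a hypothetical input chosen for illustration.

```r
model <- lm(mpg ~ wt, data = mtcars)

# 95% confidence intervals for the intercept and slope.
confint(model, level = 0.95)

# Predicted mpg for a hypothetical car weighing 3,000 lbs (wt = 3),
# with a 95% prediction interval (columns: fit, lwr, upr).
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car, interval = "prediction")
pred
```

The point prediction is just the fitted line evaluated at wt = 3: 37.285 - 5.344 × 3 ≈ 21.25 mpg.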

Checking the Residuals
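
The residual plots are not reproduced here. A minimal sketch of two common diagnostics in base R follows; the original slides may have shown different plots. Both check the assumption that the errors are roughly normal with constant variance.

```r
model <- lm(mpg ~ wt, data = mtcars)
res   <- residuals(model)

# Residuals vs. fitted values: look for curvature or changing spread.
plot(fitted(model), res,
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points near the line are consistent with normal errors.
qqnorm(res)
qqline(res)
```

A patternless cloud around zero in the first plot supports the linear form. A systematic curve would suggest the straight line is missing some structure.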

Interactive Plot
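
The interactive figure is not reproduced here, and the package behind it is not stated. One possible way to build a similar plot is sketched below. It assumes the plotly package is installed and is skipped otherwise. The data frame `plot_data` and its column names are illustrative.

```r
# Assumes the plotly package is available: install.packages("plotly")
if (requireNamespace("plotly", quietly = TRUE)) {
  model <- lm(mpg ~ wt, data = mtcars)

  # Collect observed and fitted values, ordered by weight for the line trace.
  plot_data <- data.frame(
    car        = rownames(mtcars),
    wt         = mtcars$wt,
    mpg        = mtcars$mpg,
    fitted_mpg = fitted(model)
  )
  plot_data <- plot_data[order(plot_data$wt), ]

  # Points for the observed cars; hovering shows the car name.
  fig <- plotly::plot_ly(plot_data, x = ~wt, y = ~mpg, text = ~car,
                         type = "scatter", mode = "markers",
                         name = "Observed")
  # Add the fitted regression line and axis titles.
  fig <- plotly::add_lines(fig, y = ~fitted_mpg, name = "Fitted line")
  fig <- plotly::layout(fig,
                        xaxis = list(title = "Weight (1000 lbs)"),
                        yaxis = list(title = "Fuel efficiency (mpg)"))
}
```

Evaluating `fig` at the console or in an R Markdown chunk displays the widget.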

Conclusions

The model estimates that each additional 1000 lbs of weight is associated with a drop of about 5.34 mpg in fuel efficiency, on average - which makes intuitive sense.

  • \(\hat{\beta}_0 \approx 37.29\): the predicted mpg at zero weight; no car weighs nothing, so this is an extrapolation that anchors the line rather than a meaningful prediction
  • \(R^2 \approx 0.75\): weight explains about 75% of the variation in mpg in this sample - a strong relationship
  • The p-value for weight is about \(1.3 \times 10^{-10}\), strong evidence that the slope differs from zero
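
The coefficient table shown earlier does not include \(R^2\). As a sketch, both it and the slope's p-value can be read from the full model summary. The variable names are illustrative.

```r
model       <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(model)

r_squared <- fit_summary$r.squared
slope_p   <- fit_summary$coefficients["wt", "Pr(>|t|)"]

r_squared   # proportion of variance in mpg explained by wt
slope_p     # p-value for the test that the slope is zero

# In simple linear regression, R^2 is the squared sample correlation.
all.equal(r_squared, cor(mtcars$wt, mtcars$mpg)^2)
```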