Overview: Simple Linear Regression

Simple linear regression models how one variable changes, on average, as another variable changes, using a straight line.

  • We have a response variable \(Y\) - what we’re trying to predict
  • We have a predictor variable \(X\) - what we’re using to predict it

For example: does a heavier car get worse gas mileage?

Model Structure

To describe this relationship mathematically, we use the following model:

\[Y = \beta_0 + \beta_1 X + \varepsilon\]

Where:

  • \(\beta_0\) = intercept (expected value of \(Y\) when \(X = 0\))
  • \(\beta_1\) = slope (expected change in \(Y\) per one-unit increase in \(X\))
  • \(\varepsilon \sim N(0, \sigma^2)\) = random error term, assumed normal with mean 0 and constant variance \(\sigma^2\)

The fitted model uses estimated coefficients:

\[\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X\]
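
To make the roles of the parameters concrete, here is a minimal sketch that simulates data from the model and then fits it. The "true" values (\(\beta_0 = 2\), \(\beta_1 = 0.5\), \(\sigma = 1\)), the sample size, and all variable names are assumptions chosen only for illustration.

```r
set.seed(42)                            # make the simulation reproducible

# Hypothetical "true" parameters, chosen only for illustration
beta0 <- 2
beta1 <- 0.5
sigma <- 1

n   <- 200
x   <- runif(n, min = 0, max = 10)      # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)   # random error, N(0, sigma^2)
y   <- beta0 + beta1 * x + eps          # response generated by the model

sim_fit <- lm(y ~ x)                    # estimate the coefficients
coef(sim_fit)                           # estimates should land near 2 and 0.5

# The fitted value at a new x uses only the estimated coefficients (no error term)
y_hat_at_4 <- predict(sim_fit, newdata = data.frame(x = 4))
y_hat_at_4
```

The estimates will not match the true values exactly, because each sample carries its own random error.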

Estimating the Coefficients

The coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are chosen to minimize the Residual Sum of Squares (RSS):

\[RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\]
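
As a rough illustration of what "minimize the RSS" means, the sketch below evaluates the RSS at the least-squares estimates and at two nudged alternatives. It uses R's built-in mtcars data (weight as \(x\), mpg as \(y\)); the helper `rss()` and the size of the nudges are hypothetical, picked just for this example.

```r
# RSS for a candidate intercept b0 and slope b1 (hypothetical helper)
rss <- function(b0, b1, x, y) {
  sum((y - b0 - b1 * x)^2)
}

x <- mtcars$wt
y <- mtcars$mpg

ols <- coef(lm(y ~ x))                  # least-squares intercept and slope

rss_ols     <- rss(ols[[1]],     ols[[2]],       x, y)
rss_shifted <- rss(ols[[1]] + 1, ols[[2]],       x, y)  # nudge the intercept
rss_tilted  <- rss(ols[[1]],     ols[[2]] + 0.5, x, y)  # nudge the slope

c(ols = rss_ols, shifted = rss_shifted, tilted = rss_tilted)
```

Any move away from the least-squares estimates increases the RSS.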

The closed-form solutions are:

\[\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\]
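
The closed-form expressions can be checked directly. The following sketch applies them to R's built-in mtcars data, with weight as \(x\) and mpg as \(y\); the variable names are illustrative only.

```r
x <- mtcars$wt
y <- mtcars$mpg

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the fitted line through the point (x_bar, y_bar)
b0_hat <- y_bar - b1_hat * x_bar

c(intercept = b0_hat, slope = b1_hat)

# The slope is equivalently the sample covariance over the sample variance of x
all.equal(b1_hat, cov(x, y) / var(x))
```

These values should agree with the estimates that `lm()` reports for the same data.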

Car Weight and Fuel Efficiency

The mtcars dataset in R records fuel consumption and 10 other design and performance measurements for 32 cars (1973-74 models), taken from the 1974 Motor Trend US magazine.

##                      wt  mpg
## Mazda RX4         2.620 21.0
## Mazda RX4 Wag     2.875 21.0
## Datsun 710        2.320 22.8
## Hornet 4 Drive    3.215 21.4
## Hornet Sportabout 3.440 18.7

  • wt: weight of the car (1000 lbs)
  • mpg: fuel efficiency (miles per gallon)
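
A preview like the one above can be reproduced with something along these lines (the name `cars_preview` is just a placeholder):

```r
# First five rows of the two columns used in this analysis
cars_preview <- head(mtcars[, c("wt", "mpg")], 5)
cars_preview

dim(mtcars)   # 32 rows (cars) and 11 columns (variables)
```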

Heavier cars likely burn more fuel - let’s see if the data supports that.

Scatter Plot

A scatter plot of mpg against wt already shows a negative trend - as weight increases, fuel efficiency tends to drop.
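
One way to draw such a plot is with base R graphics, as sketched here. The labels and plotting symbol are arbitrary choices, and the sample correlation is added only to put a number on the trend.

```r
# Scatter plot of fuel efficiency against weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Fuel efficiency (mpg)",
     main = "mpg vs. weight",
     pch  = 19)

# Sample correlation: sign gives the direction, magnitude the strength
r <- cor(mtcars$wt, mtcars$mpg)
r
```

A correlation close to -1 indicates a strong negative linear association.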

Running It in R

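# Fit mpg as a linear function of wt by least squares
model <- lm(mpg ~ wt, data = mtcars)
# Coefficient table: estimates, standard errors, t statistics, p-values
coef(summary(model))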
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10
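
To show how the columns of this table relate to each other, the sketch below recomputes the t value and p-value for wt from the estimate and standard error, then uses the fitted model for a prediction. The 3000 lb car is a hypothetical example, and the variable names are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(summary(model))

# t value = Estimate / Std. Error
t_wt <- coefs["wt", "Estimate"] / coefs["wt", "Std. Error"]

# Two-sided p-value from a t distribution with n - 2 = 30 degrees of freedom
p_wt <- 2 * pt(abs(t_wt), df = df.residual(model), lower.tail = FALSE)

c(t = t_wt, p = p_wt)

# 95% confidence intervals for the intercept and slope
confint(model, level = 0.95)

# Predicted mpg for a hypothetical car weighing 3000 lbs (wt = 3)
pred_3000 <- predict(model, newdata = data.frame(wt = 3))
pred_3000   # 37.285126 - 5.344472 * 3, about 21.25 mpg
```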

Checking the Residuals

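In a plot of residuals against fitted values, the residuals appear roughly randomly scattered around zero with no strong pattern, which is consistent with the linearity and constant-variance assumptions of the model.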
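
A residual plot of this kind can be sketched in base R as follows. The plot styling is an arbitrary choice, and `sigma_hat` is an illustrative name for the estimated error standard deviation.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Residuals against fitted values, with a reference line at zero
plot(fitted(model), resid(model),
     xlab = "Fitted mpg",
     ylab = "Residual",
     main = "Residuals vs. fitted values",
     pch  = 19)
abline(h = 0, lty = 2)

# With an intercept in the model, least-squares residuals sum to zero (up to rounding)
sum(resid(model))

# Estimate of sigma: square root of RSS divided by the residual degrees of freedom
sigma_hat <- sqrt(sum(resid(model)^2) / df.residual(model))
sigma_hat
```

A curve or funnel shape in this plot would suggest that a straight line or constant variance is not adequate.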

Conclusions

The model estimates that for every extra 1000 lbs of weight, fuel efficiency drops by about 5.34 mpg on average - which makes intuitive sense.

  • \(\hat{\beta}_0 \approx 37.29\): a car with zero weight would get ~37 mpg (not meaningful, just the math)
  • \(R^2 \approx 0.75\): weight explains about 75% of the variation in mpg - a strong linear relationship
  • The p-value for the weight coefficient is about \(1.3 \times 10^{-10}\), far below conventional thresholds such as 0.05, so weight is a statistically significant predictor
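
For reference, \(R^2\) can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this for the mtcars fit; the variable names are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)

rss <- sum(resid(model)^2)                       # variation left unexplained
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)    # total variation in mpg

r_squared <- 1 - rss / tss
r_squared

# Matches the value reported by summary()
all.equal(r_squared, summary(model)$r.squared)

# In simple linear regression, R^2 equals the squared sample correlation
all.equal(r_squared, cor(mtcars$wt, mtcars$mpg)^2)
```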