2025-11-14

Overview

  • Topic: simple linear regression

  • Goal: predict miles per gallon (mpg) from car weight (wt)

  • Dataset: built-in mtcars

  • This presentation covers:

    • Regression equation
    • Estimating the model
    • Fitting the model in R
    • ggplot and plotly visuals
    • Model diagnostics

Regression Equation

  • The simple linear regression model is: \[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,\quad i = 1,2,\ldots,n \]

Where:

  • \(\beta_0\) is the intercept

  • \(\beta_1\) is the slope (change in \(Y\) per 1-unit increase in \(X\))

  • \(\varepsilon_i\) is the random error term

Estimating the Line

  • We estimate \(\beta_0\) and \(\beta_1\) by least squares, which gives the fitted line: \[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \]

  • The fitted line minimizes the sum of squared residuals: \[ \text{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

  • The residual for each point is: \[ e_i = Y_i - \hat{Y}_i \]
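
The least squares estimates have a standard closed form that the slides do not show: \(\hat{\beta}_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). The sketch below computes them by hand for mpg and wt and cross-checks against lm(); the variable names (x, y, b0, b1, e, sse, fit) are illustrative only.

```r
# Closed-form least squares estimates for mpg ~ wt, computed by hand.
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations over sum of squared deviations of x.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: forces the line through the point (mean(x), mean(y)).
b0 <- mean(y) - b1 * mean(x)

# Residuals e_i and the sum of squared residuals (SSE) for this line.
e   <- y - (b0 + b1 * x)
sse <- sum(e^2)

# Cross-check against R's built-in fit; the estimates should agree.
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(c(b0, b1), unname(coef(fit)))
```

Any other choice of intercept and slope would give a larger SSE than the value computed here, which is what "least squares" means.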

Regression Model Interpretation

  • Model form: \[ \widehat{\text{mpg}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{wt} \]

  • For mtcars, the estimated slope \(\hat{\beta}_1\) of the relationship between weight and mpg is negative.

  • As weight increases, fuel efficiency (mpg) decreases.

Example Dataset: mtcars

  • Variables used:

    • mpg: miles per gallon (response)
    • wt: weight in 1000 lbs (predictor)
    • hp: horsepower (for 3D plot)

head(mtcars[, c("mpg", "wt", "hp")])
##                    mpg    wt  hp
## Mazda RX4         21.0 2.620 110
## Mazda RX4 Wag     21.0 2.875 110
## Datsun 710        22.8 2.320  93
## Hornet 4 Drive    21.4 3.215 110
## Hornet Sportabout 18.7 3.440 175
## Valiant           18.1 3.460 105

Scatterplot: MPG vs Weight

  • Heavier cars tend to have lower fuel efficiency (mpg).
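
A scatterplot like the one on this slide can be produced along these lines. This is a minimal sketch assuming ggplot2 is installed; the axis labels and the object name p_scatter are choices made here, not taken from the original plot.

```r
library(ggplot2)

# Scatterplot of mpg against weight, one point per car.
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
p_scatter

# Sample correlation between weight and mpg (negative: heavier, lower mpg).
r <- cor(mtcars$wt, mtcars$mpg)
r
```

The correlation is about -0.87, which summarizes the downward trend visible in the plot.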

Fitting the Regression Model

  • We fit the model using: \[ \widehat{\text{mpg}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{wt} \]

  • The estimated slope \(\hat{\beta}_1\) is the expected change in mpg for a one-unit increase in wt, that is, for each additional 1000 lbs of weight.

Regression Output in R

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
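
To show how the output above can be used, the following sketch pulls out the coefficients, computes confidence intervals, and predicts mpg for one car. The 3,000 lb car (new_car) is a hypothetical example chosen here, not a row from the slides.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope, matching the Coefficients table.
coef(model)

# 95% confidence intervals for both coefficients.
confint(model)

# Predicted mpg for a hypothetical 3,000 lb car (wt = 3),
# i.e. 37.2851 - 5.3445 * 3, roughly 21.25 mpg.
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car)
pred
```

Predictions are most trustworthy inside the range of weights observed in mtcars; extrapolating far beyond it relies on the line continuing to hold.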

Regression Line on Scatterplot

  • The fitted line shows the predicted mpg for a given weight.

3D View: MPG vs Weight and Horsepower

  • A 3D plot helps visualize how mpg varies with two predictors at once: weight and horsepower.
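
One way to draw such a 3D view is with plotly's plot_ly(). This is a sketch assuming the plotly package is installed; the axis titles and the object name fig are illustrative and may differ from the original figure.

```r
library(plotly)

# 3D scatter: weight and horsepower on the horizontal axes, mpg vertical.
fig <- plot_ly(
  data = mtcars,
  x = ~wt, y = ~hp, z = ~mpg,
  type = "scatter3d", mode = "markers"
)

# Label the three axes of the 3D scene.
fig <- layout(fig, scene = list(
  xaxis = list(title = "Weight (1000 lbs)"),
  yaxis = list(title = "Horsepower"),
  zaxis = list(title = "Miles per gallon")
))
fig
```

The resulting widget can be rotated interactively, which makes it easier to see that mpg falls as either weight or horsepower rises.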

Residual Plot

  • For a well-specified model, the residuals scatter randomly around zero with no clear pattern, such as curvature or a funnel shape.
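
A residuals-versus-fitted plot of this kind can be built as follows. This is a minimal sketch assuming ggplot2 is installed; diag_df and p_resid are names introduced here for illustration.

```r
library(ggplot2)

model <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in one data frame for plotting.
diag_df <- data.frame(
  fitted   = fitted(model),
  residual = resid(model)
)

# Residuals against fitted values, with a reference line at zero.
p_resid <- ggplot(diag_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted mpg", y = "Residual")
p_resid
```

Base R offers a comparable view with plot(model, which = 1), which also overlays a smoothed trend line.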

Example R Code

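library(ggplot2)

# Fit the simple linear regression of mpg on weight
model <- lm(mpg ~ wt, data = mtcars)
summary(model)

# Scatterplot with the fitted least squares line
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

  • These are the basic steps used to build the model: fit it, inspect the summary, and plot the fitted line.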

Summary

  • Simple linear regression models a linear relationship between two variables.

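  • In mtcars, weight is a strong predictor of mpg: it explains about 75% of the variance (\(R^2 = 0.7528\)), and each additional 1000 lbs is associated with about 5.3 fewer mpg.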

  • Regression gives both an equation and a visual fit.

  • Diagnostics help assess whether the model is appropriate.