2026-04-13

Introduction

Simple linear regression is a statistical method used to study the relationship between:

- one predictor variable \(x\)
- one response variable \(y\)

In this presentation, we use the built-in mtcars dataset:

- predictor: wt = car weight (in 1000 lbs)
- response: mpg = miles per gallon

Why It Matters

Simple linear regression is useful for:

- predicting values
- understanding relationships between variables
- identifying trends in data

Question for this example:

Do heavier cars tend to get lower gas mileage?

The Data

We will use a few variables from mtcars.

##                    mpg    wt  hp cyl
## Mazda RX4         21.0 2.620 110   6
## Mazda RX4 Wag     21.0 2.875 110   6
## Datsun 710        22.8 2.320  93   4
## Hornet 4 Drive    21.4 3.215 110   6
## Hornet Sportabout 18.7 3.440 175   8
## Valiant           18.1 3.460 105   6

This dataset contains measurements for 32 cars, including fuel efficiency (mpg), weight (wt), horsepower (hp), and number of cylinders (cyl).
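A preview like the one above can be produced with a short sketch such as the following. The column selection is inferred from the printed output, and the name `cars_subset` is only illustrative.

```r
# mtcars ships with base R (datasets package), so no import is needed.
# Keep only the four columns shown in the preview.
cars_subset <- mtcars[, c("mpg", "wt", "hp", "cyl")]

# Print the first six rows.
head(cars_subset)
```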

Regression Model

The simple linear regression model is:

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

where:

- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\varepsilon_i\) is the error term

For this example:

\[ \text{mpg}_i = \beta_0 + \beta_1(\text{wt}_i) + \varepsilon_i \]

Least Squares

The fitted regression line minimizes the sum of squared residuals:

\[ \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]

where each fitted value uses the estimated intercept \(b_0\) and the estimated slope \(b_1\):

\[ \hat{y}_i = b_0 + b_1x_i \]

A residual is the difference between the observed and predicted value:

\[ e_i = y_i - \hat{y}_i \]
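With a single predictor, the least squares estimates have a closed form: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below computes both by hand for mtcars and compares them with lm(). The variable names (`b0`, `b1`, `y_hat`, `e`, `sse`) are chosen here to mirror the notation on this slide.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared deviations in x.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the line through the point (mean(x), mean(y)).
b0 <- mean(y) - b1 * mean(x)

# Fitted values, residuals, and the sum of squared residuals being minimized.
y_hat <- b0 + b1 * x
e <- y - y_hat
sse <- sum(e^2)

# Compare the hand-computed estimates with lm().
c(b0 = b0, b1 = b1)
coef(lm(mpg ~ wt, data = mtcars))
```

The two sets of coefficients should agree up to floating-point error.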

Fit the Model

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

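The estimated slope is -5.34: each additional 1000 lbs of weight is associated with about 5.3 fewer miles per gallon, on average. Weight alone explains about 75% of the variation in mpg (R-squared = 0.7528).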
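The summary above comes from fitting the model with lm(). As a minimal sketch, the individual pieces of the fit can also be pulled out by name; the object names `estimates`, `r_squared`, and `slope_ci` are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Full regression table, as printed above.
summary(model)

# Individual pieces of the fit.
estimates <- coef(model)                          # intercept and slope
r_squared <- summary(model)$r.squared             # share of mpg variance explained by wt
slope_ci  <- confint(model, "wt", level = 0.95)   # 95% confidence interval for the slope

estimates
r_squared
slope_ci
```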

GGPlot: Scatterplot with Regression Line

This plot shows a negative relationship between weight and fuel efficiency.

GGPlot: Residual Plot

Residuals help us check whether a linear model is reasonable: if it is, the residuals should scatter randomly around zero with no clear pattern.
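One way to draw such a plot is sketched below. It assumes the ggplot2 package is installed and plots residuals against fitted values; the original slide may have used a different x-axis, and the names `diag_df` and `residual_plot` are illustrative.

```r
library(ggplot2)

model <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in one data frame for plotting.
diag_df <- data.frame(
  fitted   = fitted(model),
  residual = resid(model)
)

# Residuals vs fitted values, with a dashed reference line at zero.
residual_plot <- ggplot(diag_df, aes(x = fitted, y = residual)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted MPG",
    y = "Residual"
  )

residual_plot
```

A curved band or a funnel shape in this plot would suggest that a straight line, or a constant error variance, is not a good description of the data.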

Plotly: Interactive 3D Plot

This interactive plot shows that MPG is related to more than one variable.
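The slide does not say which variables the 3D plot uses. The sketch below assumes weight, horsepower, and MPG on the three axes and assumes the plotly package is installed; both the axis choice and the name `plot_3d` are illustrative, not taken from the original.

```r
library(plotly)

# Assumed axes: weight, horsepower, and MPG. The original slide's
# variables are not stated, so this choice is illustrative only.
plot_3d <- plot_ly(
  data = mtcars,
  x = ~wt, y = ~hp, z = ~mpg,
  type = "scatter3d", mode = "markers"
)

# Label the three axes of the 3D scene.
plot_3d <- plotly::layout(
  plot_3d,
  scene = list(
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Horsepower"),
    zaxis = list(title = "Miles per Gallon")
  )
)

plot_3d
```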

Example Prediction

Suppose a car weighs 3,000 pounds (wt = 3.0).

##        1 
## 21.25171

The model predicts about 21.3 miles per gallon for a car of that weight.
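A prediction like this can be obtained with predict(). The sketch below also recomputes it from the fitted equation as a check; `new_car`, `predicted_mpg`, and `by_hand` are illustrative names.

```r
model <- lm(mpg ~ wt, data = mtcars)

# The new data must use the same column name as the predictor in the formula.
new_car <- data.frame(wt = 3.0)

predicted_mpg <- predict(model, newdata = new_car)
predicted_mpg

# Same value from the fitted equation b0 + b1 * wt.
by_hand <- unname(coef(model)[1] + coef(model)[2] * 3.0)
by_hand
```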

R Code Example

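# Load ggplot2 for plotting
library(ggplot2)

# Fit the simple linear regression of mpg on wt
model <- lm(mpg ~ wt, data = mtcars)

# Scatterplot with the least squares line and its confidence band
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    title = "MPG vs Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon"
  )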

This slide shows the R code used to fit the regression model and create the scatterplot with the regression line.

Conclusion

Simple linear regression is a useful tool for:

- describing relationships between two variables
- making predictions
- summarizing trends in data

Using mtcars, we found that:

- heavier cars tend to have lower MPG
- a regression line can summarize that pattern
- plots and formulas help explain the idea clearly