2026-04-12

What is Simple Linear Regression?

Simple linear regression is a method for predicting one variable using another.

  • \(X\) is the predictor (input)
  • \(Y\) is the response (output)

The idea is to find the straight line through the scatter plot that best describes the relationship between the two variables.

For this presentation, I am using the built-in mtcars dataset to see if car weight can predict fuel efficiency (mpg).

The Model

The simple linear regression model is written as:

\[Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\]

Where:

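  • \(\beta_0\) is the intercept (the expected value of \(Y\) when \(X = 0\))
  • \(\beta_1\) is the slope (the expected change in \(Y\) for a one-unit increase in \(X\))
  • \(\varepsilon_i\) is the random error for observation \(i\), the part of \(Y_i\) that the line does not explain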
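To make the roles of \(\beta_0\), \(\beta_1\), and \(\varepsilon_i\) concrete, here is a minimal sketch that simulates data from this model and checks that lm() recovers estimates close to the true values. The values \(\beta_0 = 40\), \(\beta_1 = -5\), the error standard deviation of 3, and the range of \(X\) are hypothetical numbers chosen only for illustration; they are not taken from mtcars.

```r
# Simulate data from Y_i = beta0 + beta1 * X_i + eps_i
# (beta0, beta1, and the error SD are hypothetical illustration values)
set.seed(1)
n     <- 100
beta0 <- 40
beta1 <- -5
x     <- runif(n, min = 1, max = 5)    # predictor values
eps   <- rnorm(n, mean = 0, sd = 3)    # random error term, mean 0
y     <- beta0 + beta1 * x + eps       # response generated by the model

# Fit the model and compare the estimates with the true values
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

The estimates will not match 40 and -5 exactly, because the random errors pull the fitted line slightly away from the true one; with more data they tend to get closer.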

Estimating the Coefficients

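The slope and intercept are estimated by least squares, which chooses the values that minimize the sum of squared residuals \(\sum(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2\). The resulting formulas are: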

\[\hat{\beta}_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}\]

\[\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\]
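As a check on these formulas, the sketch below computes \(\hat{\beta}_1\) and \(\hat{\beta}_0\) directly for the mtcars weight and mpg columns and compares them with the coefficients from lm(). The two sets of numbers should agree (about 37.29 for the intercept and -5.34 for the slope).

```r
# Least squares estimates computed directly from the formulas above
x <- mtcars$wt    # predictor X: weight (1000 lbs)
y <- mtcars$mpg   # response Y: miles per gallon

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                      # intercept

c(intercept = b0, slope = b1)

# Cross-check against R's built-in least squares fit
coef(lm(mpg ~ wt, data = mtcars))
```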

The Dataset

The mtcars dataset comes with base R and contains measurements on 32 car models. The two variables used here are:

  • wt (weight in 1000 lbs) is the predictor \(X\)
  • mpg (miles per gallon) is the response \(Y\)

A Look at the Data

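First six rows of mtcars (mpg = miles per gallon, wt = weight in 1000 lbs, hp = gross horsepower, cyl = number of cylinders):

                   mpg     wt   hp  cyl
Mazda RX4         21.0  2.620  110    6
Mazda RX4 Wag     21.0  2.875  110    6
Datsun 710        22.8  2.320   93    4
Hornet 4 Drive    21.4  3.215  110    6
Hornet Sportabout 18.7  3.440  175    8
Valiant           18.1  3.460  105    6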
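A table like this can be reproduced with head(). This is a sketch of one way to do it; the original slide may have formatted the table differently.

```r
# First six rows of the four columns shown above
first_rows <- head(mtcars[, c("mpg", "wt", "hp", "cyl")])
first_rows
```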

Scatter Plot
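The original scatter plot image is not included in this text. The sketch below uses base R graphics to draw a comparable plot, with weight on the x-axis, mpg on the y-axis, and the least squares line added; the original slide may have used a different plotting package, colors, or labels.

```r
# Fit the model so its line can be drawn on the plot
model <- lm(mpg ~ wt, data = mtcars)

# Scatter plot of fuel efficiency against weight
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs. weight")

# Overlay the fitted least squares line
abline(model, col = "red", lwd = 2)
```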

Residual Plot

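If the residuals are scattered randomly around 0, with no clear pattern and roughly constant spread, that suggests a straight line is a reasonable fit. A curved pattern or a funnel shape would suggest the linear model is missing something.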
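A residual plot like the one described can be drawn by plotting the residuals (observed minus fitted mpg) against the fitted values. This is a base R sketch; R's built-in plot(model, which = 1) produces a similar "Residuals vs Fitted" diagnostic.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Residuals against fitted values, with a reference line at 0
plot(fitted(model), resid(model),
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# With an intercept in the model, least squares residuals average to zero
mean(resid(model))
```

Because the residuals always average to zero here, the useful information in the plot is their pattern and spread, not their center.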

3D Plot

R Code

# fit the linear regression model
model <- lm(mpg ~ wt, data = mtcars)

# view the results
summary(model)

lm() fits the model by least squares; the formula mpg ~ wt means "model mpg as a linear function of wt." summary() then reports the coefficient estimates with their standard errors and p-values, along with R-squared.
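The numbers printed by summary() can also be pulled out individually, which is handy when reporting them in text. A minimal sketch:

```r
model <- lm(mpg ~ wt, data = mtcars)
s <- summary(model)

s$coefficients                     # estimates, std. errors, t values, p-values
s$coefficients["wt", "Pr(>|t|)"]   # p-value for the slope
s$r.squared                        # multiple R-squared
```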

Results

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

The fitted model is:

\[\widehat{\text{mpg}} = 37.29 - 5.34 \times \text{wt}\]

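For every additional 1,000 lbs of weight, predicted mpg decreases by about 5.34. The slope's p-value (1.29e-10) is far below 0.05, so there is strong evidence of a nonzero slope, and the R-squared of 0.7528 means weight explains about 75% of the variation in mpg.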
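As a worked example, a hypothetical car weighing 3,000 lbs (wt = 3) gives \(37.2851 - 5.3445 \times 3 \approx 21.25\) mpg. The sketch below uses predict() for two hypothetical weights, 3,000 and 4,000 lbs, and confirms that their predictions differ by exactly the slope.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Predicted mpg for hypothetical cars weighing 3,000 and 4,000 lbs
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(model, newdata = new_cars)
pred

# One extra unit of wt (1,000 lbs) changes the prediction by the slope
diff(pred)
```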

Conclusion

  • Simple linear regression finds the best-fitting (least squares) line between two variables
  • We used car weight to predict fuel efficiency in the mtcars dataset
  • Heavier cars tend to get lower mpg
  • The residual plot helps us check whether a straight line is a good fit
  • R-squared tells us how much of the variation in \(Y\) is explained by \(X\) (about 75% for mpg and weight)
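R-squared is defined as \(R^2 = 1 - \text{SSE}/\text{SST}\), where SSE is the sum of squared residuals and SST is the total sum of squared deviations of \(Y\) from its mean. The sketch below computes it by hand for the mpg ~ wt model and compares the result with the value reported by summary().

```r
model <- lm(mpg ~ wt, data = mtcars)
y <- mtcars$mpg

sse <- sum(resid(model)^2)     # variation left unexplained by the line
sst <- sum((y - mean(y))^2)    # total variation in mpg
r_squared <- 1 - sse / sst

r_squared
summary(model)$r.squared       # should match the hand calculation
```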