What is Simple Linear Regression?

Simple linear regression is a statistical method used to study the relationship between one explanatory variable and one response variable.

In this example:

  • Response variable: mpg = miles per gallon
  • Explanatory variable: wt = car weight in units of 1000 pounds (wt = 2.62 means 2620 lb)
  • Goal: predict fuel efficiency from car weight

Regression Model

The simple linear regression model is:

\[ y_i = \beta_0 + \beta_1x_i + \epsilon_i \]

Where:

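  • \(y_i\) is the response value for observation \(i\)
  • \(x_i\) is the explanatory value for observation \(i\)
  • \(\beta_0\) is the intercept, the expected response when \(x = 0\)
  • \(\beta_1\) is the slope, the expected change in the response for a one-unit increase in \(x\)
  • \(\epsilon_i\) is the random error term, the part of \(y_i\) that the line does not explain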
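
One way to see what the model describes is to simulate data from it and check that a fitted line recovers the parameters. The sketch below is illustrative only: the values of beta0, beta1, the error standard deviation, and the sample size are made-up assumptions, not estimates from mtcars.

```r
# Simulate data from y_i = beta0 + beta1 * x_i + eps_i
# (all parameter values below are assumptions chosen for illustration)
set.seed(1)
n     <- 200
beta0 <- 37
beta1 <- -5
x     <- runif(n, min = 1.5, max = 5.5)  # explanatory values
eps   <- rnorm(n, mean = 0, sd = 3)      # random error term
y     <- beta0 + beta1 * x + eps         # response values

# Fitting a line to the simulated data should give estimates near beta0 and beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

The estimates will not match beta0 and beta1 exactly, because the error term adds noise to every observation.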

Example Dataset

We use the built-in R dataset mtcars.

##                                 car  mpg    wt
## Mazda RX4                 Mazda RX4 21.0 2.620
## Mazda RX4 Wag         Mazda RX4 Wag 21.0 2.875
## Datsun 710               Datsun 710 22.8 2.320
## Hornet 4 Drive       Hornet 4 Drive 21.4 3.215
## Hornet Sportabout Hornet Sportabout 18.7 3.440
## Valiant                     Valiant 18.1 3.460

The dataset describes 32 car models. The preview above shows the first six rows, with each car's fuel efficiency (mpg) and weight (wt).
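
A preview like the one above could be produced roughly as follows. This is a minimal sketch; the name preview is illustrative, and the car column is built from the row names of mtcars.

```r
# Copy the built-in data and keep the car names (row names) as a column
cars <- mtcars
cars$car <- rownames(mtcars)

# Show the first six rows of the three columns used in this example
preview <- head(cars[, c("car", "mpg", "wt")])
print(preview)
```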

Fitting the Line

The least squares method chooses the line that minimizes the sum of squared residuals:

\[ SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]

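The predicted (fitted) value is:

\[ \hat{y}_i = b_0 + b_1x_i \]

Here \(b_0\) and \(b_1\) are the estimates of \(\beta_0\) and \(\beta_1\) computed from the data. The residual for observation \(i\) is \(y_i - \hat{y}_i\); the smaller its absolute value, the closer the prediction is to the observed value.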
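
With a single explanatory variable, the line that minimizes SSE has a standard closed form: \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1\bar{x}\). As a sketch, these can be computed directly and compared with what lm() returns; the variable names are illustrative.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Sum of squared residuals for the fitted line
y_hat <- b0 + b1 * x
sse   <- sum((y - y_hat)^2)

c(b0 = b0, b1 = b1, SSE = sse)

# The same coefficients from lm(), for comparison
coef(lm(mpg ~ wt, data = mtcars))
```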

Scatterplot with Regression Line
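
A plot of this kind could be drawn with ggplot2 along the following lines. This is a sketch, not necessarily the code behind the original figure; the title and axis labels are assumptions.

```r
library(ggplot2)

# Scatterplot of mpg against wt with the least squares line overlaid
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Fuel Efficiency vs. Car Weight",  # assumed title
    x = "Weight (1000 lb)",
    y = "Miles per gallon"
  )

print(scatter)
```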

R Code Used to Create the Model

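library(ggplot2)  # plotting
library(dplyr)    # %>% and mutate()

# Copy mtcars and store the car names (row names) in a regular column
cars <- mtcars %>%
  mutate(car = rownames(mtcars))

# Fit mpg as a linear function of wt by least squares
model <- lm(mpg ~ wt, data = cars)
summary(model)

# Save the fitted values and residuals alongside the data
cars$fitted_mpg <- fitted(model)
cars$residuals <- residuals(model)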

This code fits the regression model with lm() and saves the fitted values and residuals as new columns of cars.

Model Output

## 
## Call:
## lm(formula = mpg ~ wt, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

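The estimated slope is -5.34, so each additional 1000 pounds of weight is associated with a predicted decrease of about 5.3 miles per gallon. The intercept, 37.29, is the predicted mpg at a weight of zero, which lies outside the range of the data and has no practical meaning on its own.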
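
The fitted line can also be used to predict mpg for a given weight. The sketch below uses a hypothetical car weighing 3000 pounds (wt = 3); new_car and pred are illustrative names.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
coef(model)

# Predicted mpg for a hypothetical car weighing 3000 lb (wt = 3)
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car)
pred  # about 21.25, i.e. 37.2851 - 5.3445 * 3
```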

Residual Plot

If a straight line describes the data well, the residuals should be scattered randomly around zero, with no curve or funnel shape.
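
A residual plot of this kind could be sketched as follows, plotting residuals against fitted values with a reference line at zero. The data frame name diag_data and the axis labels are assumptions.

```r
library(ggplot2)

model <- lm(mpg ~ wt, data = mtcars)

# Pair each fitted value with its residual
diag_data <- data.frame(
  fitted_mpg = fitted(model),
  residuals  = residuals(model)
)

# Residuals against fitted values, with a dashed reference line at zero
resid_plot <- ggplot(diag_data, aes(x = fitted_mpg, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted mpg", y = "Residual")

print(resid_plot)
```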

Interactive Plotly Plot

This interactive plot allows the viewer to rotate the graph and compare actual and predicted values.
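
One possible way to build a rotatable plot of this kind is a 3D scatter with the plotly package. The sketch below is an assumption about the layout, not the original figure's code: it places weight, actual mpg, and predicted mpg on the three axes, and plot_data is an illustrative name.

```r
library(plotly)

model <- lm(mpg ~ wt, data = mtcars)
plot_data <- data.frame(
  car        = rownames(mtcars),
  wt         = mtcars$wt,
  mpg        = mtcars$mpg,
  fitted_mpg = unname(fitted(model))
)

# 3D scatter: weight, actual mpg, and predicted mpg on the three axes
# (this axis assignment is an assumption, not the original plot's layout)
fig <- plot_ly(
  plot_data,
  x = ~wt, y = ~mpg, z = ~fitted_mpg,
  text = ~car,
  type = "scatter3d", mode = "markers"
)

fig
```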

Interpretation

For this model, the relationship between weight and miles per gallon is negative.

This means:

  • As car weight increases, predicted MPG decreases.
  • Weight is useful for predicting fuel efficiency: it explains about 75% of the variation in mpg (R-squared = 0.7528).
  • The model is simple, but it still gives useful information.
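
How much of the variation in mpg the line accounts for can be measured with R-squared, defined as 1 minus SSE divided by the total sum of squares. A minimal sketch of that calculation, with illustrative variable names:

```r
model <- lm(mpg ~ wt, data = mtcars)

# R-squared = 1 - SSE / SST: the share of variation in mpg explained by the line
sse <- sum(residuals(model)^2)
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r_squared <- 1 - sse / sst

r_squared                     # about 0.7528, as in the model output
cor(mtcars$wt, mtcars$mpg)^2  # same value: the squared correlation
```

In simple linear regression, R-squared equals the squared correlation between the two variables, which is why the last two lines agree.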

Conclusion

Simple linear regression helps us understand and predict relationships between two quantitative variables.

In this example, we used car weight to predict fuel efficiency. The plots and model output show that heavier cars usually have lower MPG.