2026-06-09

What is Simple Linear Regression

Simple linear regression is a statistical method.

It studies the relationship between two variables.

One variable is the predictor variable.

The other variable is the response variable.

In this example, I use car weight to predict miles per gallon.

Data

I used the mtcars data set in R.

This data set has information about different cars.

##                            car_name  mpg    wt  hp
## Mazda RX4                 Mazda RX4 21.0 2.620 110
## Mazda RX4 Wag         Mazda RX4 Wag 21.0 2.875 110
## Datsun 710               Datsun 710 22.8 2.320  93
## Hornet 4 Drive       Hornet 4 Drive 21.4 3.215 110
## Hornet Sportabout Hornet Sportabout 18.7 3.440 175
## Valiant                     Valiant 18.1 3.460 105

For this presentation, I mainly use:

  • mpg: fuel economy in miles per (US) gallon
  • wt: weight of the car, in units of 1,000 lbs
  • hp: gross horsepower
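
A preview like the table above can be produced with a few lines of R. This is a minimal sketch; the column selection and the name preview are assumptions about how the table was built, not the original code.

```r
# Copy mtcars and store the row names as a regular column
cars <- mtcars
cars$car_name <- rownames(mtcars)

# Keep the columns used in this presentation and show the first six rows
preview <- head(cars[, c("car_name", "mpg", "wt", "hp")])
print(preview)
```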

Linear Regression Model

The simple linear regression model is:

\[ Y_i = \beta_0 + \beta_1X_i + \varepsilon_i \]

For this example, the model is:

\[ mpg_i = \beta_0 + \beta_1wt_i + \varepsilon_i \]

Here, car weight is used to predict MPG.

Meaning of the Model

In the formula:

  • \(Y_i\) is the response value
  • \(X_i\) is the predictor value
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\varepsilon_i\) is the error term

The slope \(\beta_1\) tells us how much \(Y\) changes on average when \(X\) increases by one unit.
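
To see why, compare the mean response at \(x + 1\) and at \(x\). This short worked step assumes the error term has mean zero:

\[ (\beta_0 + \beta_1(x + 1)) - (\beta_0 + \beta_1 x) = \beta_1 \]

So a one-unit increase in \(X\) shifts the mean of \(Y\) by exactly \(\beta_1\).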

Least Squares

The residual is the difference between the observed value and the predicted value.

\[ e_i = y_i - \hat{y}_i \]

The least squares method chooses the intercept and slope that make the sum of squared residuals as small as possible.

\[ SSE = \sum_{i=1}^{n} e_i^2 \]

This is how the regression line is chosen.
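
As a rough check of this idea, the sketch below computes the SSE for the least squares line and for a line with a slightly different slope. The helper name sse and the shift of 0.5 are illustrative choices, not part of the original analysis.

```r
# Fit the least squares line on the built-in mtcars data
model <- lm(mpg ~ wt, data = mtcars)
b <- coef(model)

# Hypothetical helper: sum of squared residuals for the line b0 + b1 * wt
sse <- function(b0, b1) {
  sum((mtcars$mpg - (b0 + b1 * mtcars$wt))^2)
}

sse_fit <- sse(b[["(Intercept)"]], b[["wt"]])        # least squares line
sse_alt <- sse(b[["(Intercept)"]], b[["wt"]] + 0.5)  # same intercept, shifted slope

print(c(fitted = sse_fit, shifted = sse_alt))
```

The fitted line should give the smaller of the two values, because no other line has a smaller SSE on this data.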

Formulas for the Slope and Intercept

The estimated slope is:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^{n}(x_i - \bar{x})^2} \]

The estimated intercept is:

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \]

These formulas give the fitted regression line.
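
The two formulas can be applied by hand and compared with lm(). This is a small sketch on the mtcars data; the variable names x, y, b0, and b1 are chosen here for illustration.

```r
x <- mtcars$wt   # predictor: weight
y <- mtcars$mpg  # response: miles per gallon

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: makes the line pass through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

print(c(intercept = b0, slope = b1))

# The same estimates from lm(), for comparison
fit_coef <- coef(lm(mpg ~ wt, data = mtcars))
print(fit_coef)
```

Both approaches should agree up to floating-point rounding.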

R Code

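# Plotting packages (the model itself only needs base R)
library(ggplot2)
library(plotly)

data(mtcars)

# Copy the data and keep the car names (row names) as a column
cars <- mtcars
cars$car_name <- rownames(mtcars)

# Fit MPG as a linear function of weight
model <- lm(mpg ~ wt, data = cars)

# Print coefficients, standard errors, and R-squared
summary(model)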

This code fits a simple linear regression model.

The formula mpg ~ wt means that mpg is the response and wt is the predictor.

Model Output

## 
## Call:
## lm(formula = mpg ~ wt, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

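The slope of wt is negative, about -5.34.

Because wt is measured in units of 1,000 lbs, each additional 1,000 lbs of car weight lowers the predicted MPG by about 5.3.

The R-squared of 0.75 means that weight explains about 75% of the variation in MPG.

This makes sense because heavier cars usually use more fuel.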
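
The numbers in the output can also be pulled out of the summary object directly. The sketch below is one way to do it; the object names s, coef_table, and r2 are illustrative. It also checks a known property of simple linear regression: R-squared equals the squared correlation between the two variables.

```r
model <- lm(mpg ~ wt, data = mtcars)
s <- summary(model)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- coef(s)
print(coef_table)

# R-squared, and the squared correlation between wt and mpg
r2 <- s$r.squared
r2_from_cor <- cor(mtcars$wt, mtcars$mpg)^2
print(c(r_squared = r2, squared_correlation = r2_from_cor))
```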

Fitted Equation

The fitted model is:

## mpg = 37.29 + -5.34 * wt

The coefficient of wt is negative, so the relationship between weight and MPG is negative in this data set.

For example, a car with wt = 3 has a predicted MPG of about 37.29 - 5.34 × 3 ≈ 21.3.
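
The fitted equation can be built from the model coefficients, and predict() applies it to a new weight. This is a sketch under assumptions: the sign handling (which avoids printing "+ -") and the example weight of 3 are choices made here, not taken from the original code.

```r
model <- lm(mpg ~ wt, data = mtcars)
b <- coef(model)

# Print the sign separately so a negative slope reads "- 5.34", not "+ -5.34"
sign_chr <- if (b[["wt"]] < 0) "-" else "+"
eq <- sprintf("mpg = %.2f %s %.2f * wt", b[["(Intercept)"]], sign_chr, abs(b[["wt"]]))
cat(eq, "\n")

# Predicted MPG for a hypothetical car weighing 3,000 lbs (wt = 3)
pred <- predict(model, newdata = data.frame(wt = 3))
print(pred)
```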

Scatter Plot with Regression Line

The plot shows a downward trend.

When weight goes up, MPG usually goes down.
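
A plot of this kind can be drawn with ggplot2. The code below is a sketch, not the original plotting code; the axis labels and the use of geom_smooth() for the line are assumptions.

```r
library(ggplot2)

# Scatter plot of MPG against weight with the least squares line on top
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1,000 lbs)", y = "Miles per gallon")

print(p)
```

Setting se = FALSE hides the confidence band so that only the fitted line is shown.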

Residual Plot

The residual plot shows the model errors, one point per car.

Most points are scattered around zero.
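
One common version of this plot puts the fitted values on the horizontal axis and the residuals on the vertical axis. The sketch below assumes that layout; the original plot may have used a different axis, and the names res_df and p_res are illustrative.

```r
library(ggplot2)

model <- lm(mpg ~ wt, data = mtcars)

# One row per car: fitted value and residual (observed minus fitted)
res_df <- data.frame(fitted = fitted(model), residual = resid(model))

p_res <- ggplot(res_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
  labs(x = "Fitted MPG", y = "Residual")

print(p_res)
```

If the points show no clear curve or funnel shape around the dashed line, the straight-line model is a reasonable first description of the data.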

3D Plotly Plot

The 3D plot shows three variables at the same time.

The three variables are weight, horsepower, and MPG.
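
A 3D scatter plot of these three variables can be sketched with plot_ly(). The axis assignment (weight on x, horsepower on y, MPG on z) and the axis titles are assumptions, not the original plotting code.

```r
library(plotly)

# 3D scatter plot: weight and horsepower on the base, MPG as the height
fig <- plot_ly(
  data = mtcars,
  x = ~wt, y = ~hp, z = ~mpg,
  type = "scatter3d", mode = "markers"
)

# Label the three axes
fig <- layout(fig, scene = list(
  xaxis = list(title = "Weight (1,000 lbs)"),
  yaxis = list(title = "Horsepower"),
  zaxis = list(title = "MPG")
))

fig
```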

Summary

Simple linear regression is useful for studying the relationship between two quantitative variables.

In this example, car weight is related to MPG.

The result shows a negative relationship.

Heavier cars usually have lower MPG in this data set.

The model is simple, but it is useful for a first analysis.