Simple linear regression is a statistical method.
It studies the relationship between two variables.
One variable is the predictor variable.
The other variable is the response variable.
In this example, I use car weight to predict miles per gallon.
2026-06-09
I used the mtcars data set in R.
This data set has information about different cars.
##                            car_name  mpg    wt  hp
## Mazda RX4                 Mazda RX4 21.0 2.620 110
## Mazda RX4 Wag         Mazda RX4 Wag 21.0 2.875 110
## Datsun 710               Datsun 710 22.8 2.320  93
## Hornet 4 Drive       Hornet 4 Drive 21.4 3.215 110
## Hornet Sportabout Hornet Sportabout 18.7 3.440 175
## Valiant                     Valiant 18.1 3.460 105
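A preview like the one above could be produced with the following sketch. The `car_name` column is built from the row names, as in the model-fitting code later in this presentation. The use of `head()` on these four columns is an assumption, since the original call is not shown.

```r
# Assumption: the preview was made with head() on four selected columns.
data(mtcars)
cars <- mtcars
cars$car_name <- rownames(mtcars)

# First six rows of the columns used in this presentation
preview <- head(cars[, c("car_name", "mpg", "wt", "hp")])
print(preview)
```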
For this presentation, I mainly use:
mpg: miles per gallon
wt: weight of the car (in 1000 lbs)
hp: horsepower

The simple linear regression model is:
\[ Y_i = \beta_0 + \beta_1X_i + \varepsilon_i \]
For this example, the model is:
\[ mpg_i = \beta_0 + \beta_1wt_i + \varepsilon_i \]
Here, car weight is used to predict MPG.
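In the formula, \(Y_i\) is the response, \(X_i\) is the predictor, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon_i\) is the random error.
The slope tells us how much \(Y\) changes on average when \(X\) increases by one unit.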
The residual is the difference between the real value and the predicted value.
\[ e_i = y_i - \hat{y}_i \]
The least squares method chooses the line that makes the sum of squared residuals as small as possible.
\[ SSE = \sum_{i=1}^{n} e_i^2 \]
This is how the regression line is chosen.
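As a rough illustration of the idea above, the sketch below computes the residuals and the SSE by hand for the least squares line. It then compares that SSE with the SSE of another line. The alternative line (intercept 37, slope -5) is a made-up example, chosen only for comparison.

```r
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals by hand: observed value minus predicted value
y_hat <- fitted(fit)
e <- mtcars$mpg - y_hat

# Sum of squared residuals for the least squares line
sse <- sum(e^2)

# Hypothetical alternative line (intercept 37, slope -5), for comparison only
e_alt <- mtcars$mpg - (37 + -5 * mtcars$wt)
sse_alt <- sum(e_alt^2)

print(c(least_squares = sse, alternative = sse_alt))
```

The least squares line should always give the smaller of the two values, whichever alternative line is tried.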
The estimated slope is:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^{n}(x_i - \bar{x})^2} \]
The estimated intercept is:
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \]
These formulas give the fitted regression line.
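To check the two formulas, the sketch below applies them directly to the mtcars data and compares the result with `lm()`. The variable names `x`, `y`, `b0`, and `b1` are chosen here for illustration.

```r
data(mtcars)
x <- mtcars$wt   # predictor: car weight
y <- mtcars$mpg  # response: miles per gallon

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

print(c(intercept = b0, slope = b1))

# The same estimates from lm(), for comparison
print(coef(lm(mpg ~ wt, data = mtcars)))
```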
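library(ggplot2)
library(plotly)

# Load the data and keep the car names as a regular column
data(mtcars)
cars <- mtcars
cars$car_name <- rownames(mtcars)

# Fit the simple linear regression of MPG on weight
model <- lm(mpg ~ wt, data = cars)
summary(model)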
This code fits a simple linear regression model.
The formula mpg ~ wt means using weight to predict MPG.
##
## Call:
## lm(formula = mpg ~ wt, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
The slope of wt is negative.
This means that when car weight increases by 1000 lbs (one unit of wt), the predicted MPG decreases by about 5.34.
This makes sense because heavier cars usually use more fuel.
The fitted model is:
## mpg = 37.29 + -5.34 * wt
The coefficient of wt is negative.
So the relationship between weight and MPG is negative in this data set.
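The fitted equation can be rebuilt from the model coefficients and then used for prediction. The sketch below is one possible way to do this. The `paste()` call and the example car with `wt = 3` (3000 lbs) are assumptions for illustration, not the original code.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Round the estimated intercept and slope to two decimals
b <- round(coef(model), 2)

# Build the fitted equation as text
eq <- paste("mpg =", b[1], "+", b[2], "* wt")
print(eq)

# Predicted MPG for a hypothetical car weighing 3000 lbs (wt = 3)
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car)
print(pred)
```

With the coefficients from the summary output, the prediction is about 37.29 - 5.34 * 3, which is roughly 21.25 MPG.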
The plot shows a downward trend.
When weight goes up, MPG usually goes down.
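A scatter plot with the fitted line could be drawn as in the sketch below. The axis labels and the choice of `geom_smooth(method = "lm")` are assumptions, since the original plotting code is not shown.

```r
library(ggplot2)
data(mtcars)

# Scatter plot of MPG against weight with the least squares line
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```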
The residual plot checks the model errors.
Most points are around zero.
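One common form of residual plot puts the fitted values on the horizontal axis and the residuals on the vertical axis. The sketch below assumes that layout. The data frame name `res_df` is chosen here for illustration.

```r
library(ggplot2)
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in one data frame
res_df <- data.frame(fitted = fitted(model), residual = resid(model))

# Residuals against fitted values, with a reference line at zero
p_res <- ggplot(res_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted MPG", y = "Residual")

print(p_res)
```

If the model fits well, the points should scatter around the dashed line without a clear pattern.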
The interactive plot shows three variables at the same time.
The three variables are weight, horsepower, and MPG.
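A plot of these three variables could be built with plotly as in the sketch below. The 3D scatter layout and the hover text are assumptions, since the original plotting code is not shown.

```r
library(plotly)
data(mtcars)
cars <- mtcars
cars$car_name <- rownames(mtcars)

# Assumption: a 3D scatter plot with weight, horsepower, and MPG on the axes
fig <- plot_ly(
  data = cars,
  x = ~wt, y = ~hp, z = ~mpg,
  text = ~car_name,          # car name shown on hover
  type = "scatter3d",
  mode = "markers"
)

fig
```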
Simple linear regression is useful for studying two quantitative variables.
In this example, car weight is related to MPG.
The result shows a negative relationship.
Heavier cars usually have lower MPG in this data set.
The model is simple, but it is useful for a first analysis.