Topic: Simple Linear Regression
We will use simple linear regression to model the relationship between two variables. In this case, we will use the mtcars dataset to predict a car’s miles per gallon (mpg) from its weight (wt).
2025-10-19
Simple linear regression finds the best-fitting straight line that describes how a dependent variable \(y\) changes as an independent variable \(x\) changes.
The model assumes a linear relationship between \(x\) (e.g. weight of car) and \(y\) (e.g. miles per gallon):
\[ y = \beta_0 + \beta_1x + \epsilon \]
where:
- \(y\): response variable
- \(x\): predictor variable
- \(\beta_0\): intercept
- \(\beta_1\): slope
- \(\epsilon\): random error (captures variation not explained by \(x\))
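"Best-fitting" is usually made precise with ordinary least squares, which is also what R's `lm()` uses. As a short sketch of that criterion (the hats, the bars for sample means, and the sample size \(n\) are notation introduced here, not taken from the slides), the estimates minimize the sum of squared vertical distances between the observed points and the line:

\[ (\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \]

For a single predictor, this minimization has a closed-form solution:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]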
To demonstrate our simple regression model, we will use the mtcars dataset, which is built into R.
##                    mpg    wt  hp
## Mazda RX4         21.0 2.620 110
## Mazda RX4 Wag     21.0 2.875 110
## Datsun 710        22.8 2.320  93
## Hornet 4 Drive    21.4 3.215 110
## Hornet Sportabout 18.7 3.440 175
## Valiant           18.1 3.460 105
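A preview like the one above can be produced with `head()`. The slides do not show the exact call, so the column selection below is an assumption based on the three columns that appear in the output.

```r
# Assumed call: select the three columns shown in the preview,
# then keep the first six rows (the default for head()).
preview <- head(mtcars[, c("mpg", "wt", "hp")])
print(preview)
```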
As mentioned in the previous slides, we will be analyzing how a car’s weight (wt) affects its fuel efficiency (mpg).
Before fitting a model, we create a scatter plot to examine the relationship between the independent and dependent variables. It helps us check whether a linear pattern exists between them.
As the plot shows, as a car’s weight increases, its fuel efficiency generally decreases.
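One way to put a number on the pattern seen in the plot is the Pearson correlation between the two variables. This is a small supplementary check, not something computed in the original slides; a value close to -1 would indicate a strong negative linear relationship.

```r
# Pearson correlation between weight and fuel efficiency.
# A negative value means mpg tends to fall as weight rises.
r <- cor(mtcars$wt, mtcars$mpg)
print(r)
```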
Below is the code used to generate the scatter plot shown in the previous slide.
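library(ggplot2)  # provides ggplot(), geom_point(), labs(), theme_minimal()

# Scatter plot of fuel efficiency (mpg) against weight (wt)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "#8C1D40", size = 3) +
  labs(
    title = "MPG vs Car Weight",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon"
  ) +
  theme_minimal()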
Based on the scatter plot, we can fit a regression line that summarizes the trend we observed.
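As a minimal sketch of what such a line looks like on the plot, `geom_smooth(method = "lm")` from ggplot2 can overlay the least-squares line on the scatter plot. The slides may have drawn the line differently; the black line color and `se = FALSE` (which hides the confidence band) are choices made here for illustration.

```r
library(ggplot2)

# Scatter plot with the fitted least-squares line overlaid.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "#8C1D40", size = 3) +
  # method = "lm" fits mpg ~ wt; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  labs(
    title = "MPG vs Car Weight",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon"
  ) +
  theme_minimal()

print(p)
```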
Let’s start with the general regression equation.
\[ y = \beta_0 + \beta_1x + \epsilon \]
To find the slope and intercept of our regression model, we call:
lm(mpg ~ wt, data = mtcars)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Coefficients:
## (Intercept)           wt
##      37.285       -5.344
The lm(mpg ~ wt, data = mtcars) call fits the linear regression model, estimating how fuel efficiency changes with car weight. It returns the estimated intercept and slope used in the regression equation. With these estimates, our equation becomes:
\[ MPG = 37.285 - 5.344 \times WT \]
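Because wt is measured in units of 1000 lbs, the slope says that each additional 1000 lbs is associated with roughly 5.3 fewer miles per gallon. For a hypothetical car weighing 3000 lbs (wt = 3), the equation gives about 37.285 - 5.344 × 3 = 21.253 mpg. The sketch below reproduces the coefficients by hand, using the fact that the least-squares slope equals the sample covariance of wt and mpg divided by the sample variance of wt, and then uses `predict()` for the same hypothetical car. The variable names `b0`, `b1`, and `pred` are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Closed-form least-squares estimates for one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
b1 <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
b0 <- mean(mtcars$mpg) - b1 * mean(mtcars$wt)

# The hand-computed values should match what lm() reports.
print(c(intercept = b0, slope = b1))
print(coef(fit))

# Predicted mpg for a hypothetical 3000 lb car (wt is in 1000 lbs).
pred <- predict(fit, newdata = data.frame(wt = 3))
print(pred)
```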
Now we will see how weight and horsepower together affect fuel efficiency. This is important as real-world variables like fuel efficiency are influenced by more than one factor.
The 3D plot shows that lighter cars tend to have higher fuel efficiency and also generally have less horsepower. This illustrates the trade-off between horsepower and fuel efficiency.
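The same `lm()` interface extends to two predictors by adding a term to the formula, which corresponds to the model \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\). The slides do not show the code behind the 3D plot, so the following is only a sketch of how the underlying two-predictor fit might be obtained; `fit2` and `r_wt_hp` are names chosen here for illustration.

```r
# Fit mpg on weight and horsepower together.
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit2))

# Correlation between the two predictors: a positive value means
# heavier cars in this dataset also tend to have more horsepower.
r_wt_hp <- cor(mtcars$wt, mtcars$hp)
print(r_wt_hp)
```

Each coefficient in this model describes the expected change in mpg for a one-unit change in that predictor while the other predictor is held fixed.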