2025-10-19

Introduction

Topic: Simple Linear Regression

We will use simple linear regression to model the relationship between two variables. Here we use the mtcars dataset to predict a car’s miles per gallon (mpg) from its weight (wt).

The Regression Model

Simple linear regression finds the best-fitting straight line that describes how a dependent variable \(y\) changes as an independent variable \(x\) changes.

The model assumes a linear relationship between \(x\) (e.g. weight of car) and \(y\) (e.g. miles per gallon):

\[ y = \beta_0 + \beta_1x + \epsilon \]

where:
- \(y\): response variable
- \(x\): predictor variable
- \(\beta_0\): intercept
- \(\beta_1\): slope
- \(\epsilon\): random error (captures variation not explained by \(x\))
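
As a rough illustration of how the intercept and slope can be estimated, the sketch below applies the standard least-squares formulas (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)) to the built-in mtcars data. The variable names `x`, `y`, `b0`, and `b1` are illustrative choices, not names used elsewhere in these slides.

```r
# Closed-form least-squares estimates for y = b0 + b1 * x.
# x, y, b0, b1 are illustrative names chosen for this sketch.
x <- mtcars$wt   # predictor: weight (1000 lbs)
y <- mtcars$mpg  # response: miles per gallon

b1 <- cov(x, y) / var(x)       # estimated slope
b0 <- mean(y) - b1 * mean(x)   # estimated intercept

c(intercept = b0, slope = b1)
```

These values should agree with the coefficients that R's `lm()` function reports for the same model.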

Our Dataset

To demonstrate simple linear regression, we will use the mtcars dataset, which is built into R.

##                    mpg    wt  hp
## Mazda RX4         21.0 2.620 110
## Mazda RX4 Wag     21.0 2.875 110
## Datsun 710        22.8 2.320  93
## Hornet 4 Drive    21.4 3.215 110
## Hornet Sportabout 18.7 3.440 175
## Valiant           18.1 3.460 105
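
A preview like the one above can be produced along these lines. This is a minimal sketch; the name `preview` is an illustrative choice, and the code that actually produced the output is not shown in the slides.

```r
# Keep only the three columns of interest and show the first six rows.
preview <- head(mtcars[, c("mpg", "wt", "hp")])
preview
```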

As mentioned in the previous slides, we will be analyzing how a car’s weight (wt) affects its fuel efficiency (mpg).

First Scatter Plot

A scatter plot lets us see the relationship between the independent and dependent variables before fitting a model. It helps us check whether a linear pattern exists between them.

As the plot shows, as a car’s weight increases, its fuel efficiency generally decreases.
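
One way to put a number on this visual impression is the Pearson correlation between the two variables. This is an optional check that is not part of the original slides, and `r` is an illustrative name. A value close to -1 would indicate a strong negative linear relationship.

```r
# Pearson correlation between weight and fuel efficiency.
r <- cor(mtcars$wt, mtcars$mpg)
r
```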

R Code for the Scatter Plot

Below is the code used to generate the scatter plot on the previous slide.

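library(ggplot2)  # provides ggplot(), geom_point(), labs(), theme_minimal()

# Scatter plot of fuel efficiency against weight
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "#8C1D40", size = 3) +
  labs(
    title = "MPG vs Car Weight",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon"
  ) +
  theme_minimal()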

Plotting the Regression Model

Based on the scatter plot, we can fit a regression line that summarizes the trend we observed.
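
A regression line can be overlaid on the scatter plot roughly as follows. This is a sketch that assumes ggplot2 and uses `geom_smooth(method = "lm")`; the code behind the original figure is not shown in the slides, so the styling and the name `p` are assumptions.

```r
library(ggplot2)

# Scatter plot with a least-squares line added on top.
# se = FALSE hides the confidence band so only the line is drawn.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "#8C1D40", size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  labs(
    title = "MPG vs Car Weight with Regression Line",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon"
  ) +
  theme_minimal()

p
```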

Estimated Regression Model

Let’s start with the general form of the regression equation.

\[ y = \beta_0 + \beta_1x + \epsilon \]

To find the slope and intercept of our regression model we call:

lm(mpg ~ wt, data = mtcars)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Coefficients:
## (Intercept)           wt  
##      37.285       -5.344

Estimated Regression Model Cont.

The lm(mpg ~ wt, data = mtcars) call fits the linear regression model, estimating how a car’s weight affects its fuel efficiency. It returns the estimated intercept and slope used in the regression equation. With these estimates, our equation becomes:

\[ MPG = 37.285 - 5.344 \cdot WT \]
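
To see how the fitted equation is used, the sketch below predicts the fuel efficiency of a hypothetical car weighing 3,000 lbs (wt = 3). Plugging into the rounded equation gives 37.285 - 5.344 × 3 ≈ 21.25 mpg. The names `fit`, `new_car`, `pred`, and `by_hand` are illustrative.

```r
# Fit the model and predict mpg for a hypothetical 3,000 lb car.
fit <- lm(mpg ~ wt, data = mtcars)
new_car <- data.frame(wt = 3)  # hypothetical car, weight in 1000 lbs

pred <- predict(fit, newdata = new_car)

# The same prediction computed directly from the coefficients.
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["wt"]] * 3

pred
by_hand
```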

3D Plot: MPG vs Weight and Horsepower

Now we will see how weight and horsepower together affect fuel efficiency. This is important as real-world variables like fuel efficiency are influenced by more than one factor.

The 3D plot shows that while lighter cars have higher fuel efficiency, they also generally have less horsepower. This illustrates the trade-off between horsepower and fuel efficiency.
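
A 3D scatter plot of this kind could be built as sketched below. The slides do not show the plotting code, so the use of the plotly package, the axis titles, and the name `fig` are all assumptions made for illustration.

```r
library(plotly)  # assumed package; the original plotting code is not shown

# 3D scatter: weight and horsepower on the horizontal axes, mpg on the vertical.
fig <- plot_ly(
  data = mtcars,
  x = ~wt, y = ~hp, z = ~mpg,
  type = "scatter3d", mode = "markers",
  marker = list(size = 4, color = "#8C1D40")
)

# Label the three axes.
fig <- layout(fig, scene = list(
  xaxis = list(title = "Weight (1000 lbs)"),
  yaxis = list(title = "Horsepower"),
  zaxis = list(title = "Miles Per Gallon")
))

fig
```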