What is Simple Linear Regression?

Simple linear regression is a statistical method that models the linear relationship between two quantitative variables and uses that relationship to predict one variable from the other.

One variable is called the explanatory variable and is used to predict the response variable.

In this presentation, vehicle weight will be used to predict fuel efficiency.

Why Is It Useful?

Simple linear regression helps answer questions such as:

  • How strongly are two variables related?
  • Can one variable predict another?
  • Is the relationship positive or negative?
  • How much does the response variable change when the explanatory variable changes?

Research Question

Can a vehicle’s weight be used to predict its fuel efficiency?

For this analysis, we will use the built-in R dataset:

mtcars

Variables used:

  • mpg = fuel efficiency, in miles per (US) gallon
  • wt = vehicle weight, in units of 1000 lbs
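
A minimal sketch of loading the data and keeping only these two variables. `mtcars` ships with base R; the name `car_data` is a hypothetical variable introduced here for illustration.

```r
# mtcars is part of base R's datasets package, so nothing needs installing
data(mtcars)

# Keep only the response (mpg) and the explanatory variable (wt)
car_data <- mtcars[, c("mpg", "wt")]

# Rows (vehicles) and columns (variables) in the subset
dim(car_data)
```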

Regression Formula

The general form of a simple linear regression model is:

\[y = \beta_0 + \beta_1x + \epsilon\]

Where:

  • \(y\) = response variable
  • \(x\) = explanatory variable
  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\epsilon\) = error term
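
The formula does not say how the intercept and slope are estimated. Assuming ordinary least squares (the default method of R's `lm()`), a sketch of the calculation done by hand looks like this. The names `b0`, `b1`, and `residuals_hand` are illustrative.

```r
# Explanatory (x) and response (y) variables from the built-in mtcars data
x <- mtcars$wt
y <- mtcars$mpg

# Least-squares slope: covariation of x and y divided by the variation in x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Least-squares intercept: the fitted line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Residuals are the sample counterpart of the error term
residuals_hand <- y - (b0 + b1 * x)

round(c(intercept = b0, slope = b1), 2)
```

These hand-computed values should agree with the coefficients reported under Linear Regression Results.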

Model Used For This Analysis

For this dataset, the model becomes:

\[\hat{mpg} = \beta_0 + \beta_1(wt)\]

This equation predicts fuel efficiency using vehicle weight.

The slope \(\beta_1\) is the expected change in MPG for each one-unit (1000 lb) increase in weight.
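
One way to fit this model in R is with `lm()`. This is a sketch; the exact call behind the presentation is not shown, and `fit` is an illustrative name.

```r
# Fit mpg as a linear function of wt using the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope, rounded to two decimals
round(coef(fit), 2)
```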

Sample Data

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
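
The listing above has the shape of `head()` output. A sketch that should reproduce it, assuming that is the call that was used:

```r
# First six rows of the built-in mtcars data frame
first_rows <- head(mtcars)
print(first_rows)
```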

Summary Statistics

##       mpg              wt       
##  Min.   :10.40   Min.   :1.513  
##  1st Qu.:15.43   1st Qu.:2.581  
##  Median :19.20   Median :3.325  
##  Mean   :20.09   Mean   :3.217  
##  3rd Qu.:22.80   3rd Qu.:3.610  
##  Max.   :33.90   Max.   :5.424
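
A summary like the one above can be produced with `summary()` on the two columns of interest. This is a sketch; `two_vars` is an illustrative name.

```r
# Minimum, quartiles, median, mean, and maximum for mpg and wt
two_vars <- summary(mtcars[, c("mpg", "wt")])
print(two_vars)
```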

Scatterplot of Weight and MPG
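
The figure for this slide is not reproduced here. A sketch of how such a scatterplot could be drawn, assuming base R graphics (the original may have used a different plotting package):

```r
# Scatterplot of fuel efficiency against weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Scatterplot of Weight and MPG",
     pch = 19)

# Correlation coefficient: direction and strength of the linear trend
r <- cor(mtcars$wt, mtcars$mpg)
r
```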

Regression Line
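
A sketch of overlaying the fitted line on the scatterplot, again assuming base R graphics. The colour and line width are arbitrary choices.

```r
# Fit the model, then draw the data and the fitted line
fit <- lm(mpg ~ wt, data = mtcars)

plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Regression Line",
     pch = 19)

# abline() accepts an lm object and draws its intercept and slope
abline(fit, col = "blue", lwd = 2)
```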

Distribution of Miles Per Gallon
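
A sketch of a histogram of the response variable, assuming base R's `hist()` with its default bins:

```r
# Histogram of fuel efficiency; hist() also returns the bin counts
mpg_hist <- hist(mtcars$mpg,
                 xlab = "Miles per gallon",
                 main = "Distribution of Miles Per Gallon",
                 col = "grey80")

# Number of vehicles falling in each bin
mpg_hist$counts
```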

Linear Regression Results

## (Intercept)          wt 
##       37.29       -5.34

Regression equation:

\[\hat{mpg}=37.29-5.34(wt)\]
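
To illustrate how the equation is used for prediction, the sketch below predicts MPG for three hypothetical vehicles. The weights are made up for illustration, and `new_cars`, `pred`, and `by_hand` are illustrative names.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical vehicles weighing 2500, 3000, and 3500 lbs (wt is in 1000 lbs)
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# Predicted mpg from the fitted model (unrounded coefficients)
pred <- predict(fit, newdata = new_cars)

# Same prediction from the rounded equation, for comparison
by_hand <- 37.29 - 5.34 * new_cars$wt

cbind(new_cars, predicted = round(pred, 2), by_hand = round(by_hand, 2))
```

Small differences between the two columns come from rounding the coefficients to two decimals.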

Interpretation of Results

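The regression output shows a negative relationship between vehicle weight and fuel efficiency.

As vehicle weight increases, fuel efficiency tends to decrease.

The slope is about -5.34, meaning predicted MPG decreases by about 5.34 miles per gallon, on average, for each additional 1000 lbs of vehicle weight.

The intercept of 37.29 is the predicted MPG at a weight of zero. No vehicle weighs zero, so it anchors the line rather than describing a real car.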
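
The slope gives the direction and size of the relationship but not its strength or uncertainty. A sketch of how those could be checked in R, using standard `lm` helpers (`r_squared`, `ci`, and `slope_p` are illustrative names):

```r
fit <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(fit)

# Proportion of the variance in mpg explained by weight
r_squared <- fit_summary$r.squared

# 95% confidence intervals for the intercept and slope
ci <- confint(fit, level = 0.95)

# p-value for the test that the slope equals zero
slope_p <- coef(fit_summary)["wt", "Pr(>|t|)"]

round(r_squared, 3)
ci
slope_p
```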

Advantages of Linear Regression

  • Easy to understand
  • Easy to visualize
  • Useful for prediction
  • Commonly used in data science
  • Helps identify trends in data

Conclusion

Simple linear regression is one of the most widely used statistical techniques.

Using the mtcars dataset, we found a clear negative relationship between vehicle weight and fuel efficiency.

Heavier vehicles generally achieve lower miles per gallon.
