2025-06-05

Introduction

Simple linear regression is a statistical technique for modeling the relationship between two continuous variables. It describes how the expected value of one variable (the outcome) changes as the other (the predictor) changes.

In this project, we use the built-in mtcars dataset in R. This dataset contains specifications and performance metrics for 32 car models, including their weight (wt, in units of 1,000 lbs) and fuel efficiency (mpg, miles per gallon). We aim to use simple linear regression to determine how a car’s weight relates to its MPG.

In other words, can we predict a car’s fuel efficiency based on how heavy it is?

Concept

We’re trying to fit a line: MPG = intercept + slope × Weight + error. This helps us understand how car weight affects fuel efficiency. In the general notation below, Y is MPG and X is Weight.

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

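\(\beta_0\): intercept (the predicted MPG when weight is zero)
\(\beta_1\): slope (the change in predicted MPG for each additional 1,000 lbs of weight)
\(\epsilon\): error (the variation in MPG that weight does not explain)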

Example

We’ll use the built-in mtcars dataset in R and predict mpg (fuel efficiency) from wt (car weight in 1,000 lbs).
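A minimal sketch of this fit, using base R's lm() function. The object name fit is our own choice for illustration and is not taken from the original slides.

```r
# Fit a simple linear regression of mpg on wt using the built-in mtcars data.
# `fit` is an illustrative name.
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
coef(fit)
```

The formula mpg ~ wt reads as "mpg modeled by wt"; lm() adds the intercept automatically.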

MPG vs Weight

Each point is a car. The red line is the least-squares best-fit line from the regression. As weight increases, MPG tends to decrease.
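A plot like this one can be approximated with base R graphics. This is only a sketch: the original figure may have been drawn with a different package (for example ggplot2), and the axis labels and point style here are our own assumptions.

```r
# Refit the model so this block runs on its own.
fit <- lm(mpg ~ wt, data = mtcars)

# Scatter plot: one point per car.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "MPG vs Weight",
     pch = 19)

# Overlay the fitted regression line in red.
abline(fit, col = "red", lwd = 2)
```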

Residual Plot

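This shows the residuals: the difference between actual MPG and predicted MPG for each car. If the points are scattered randomly around zero with no clear pattern, a straight line is a reasonable description of the relationship. A curve or funnel shape would suggest the model is missing something.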
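One way to draw such a residual plot in base R is sketched below. We assume the residuals are plotted against the predicted values; the original figure may have used weight on the horizontal axis instead.

```r
# Refit the model so this block runs on its own.
fit <- lm(mpg ~ wt, data = mtcars)

# Residual = actual mpg minus predicted mpg.
res <- resid(fit)

# Plot residuals against predicted values, with a reference line at zero.
plot(fitted(fit), res,
     xlab = "Predicted MPG",
     ylab = "Residual (actual - predicted)",
     main = "Residual Plot",
     pch = 19)
abline(h = 0, lty = 2)
```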

3D View: MPG vs Weight vs HP

This adds Horsepower as a third variable. We’re exploring whether there’s more going on beyond just weight affecting MPG.

Code for 3D Plot

library(plotly)

# Interactive 3D scatter plot: weight (x), MPG (y), and horsepower (z)
plot_ly(mtcars,
        x = ~wt,
        y = ~mpg,
        z = ~hp,
        type = 'scatter3d',
        mode = 'markers')

Math

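These formulas give the slope and intercept that minimize the sum of squared prediction errors (the least squares method). The hats mark estimates computed from the data, and \(\bar{x}\) and \(\bar{y}\) are the sample means.

\[ \hat{\beta}_1 = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sum{(x_i - \bar{x})^2}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]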
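To see that these formulas match what R computes, we can evaluate them by hand and compare with lm(). This is a small check, not part of the original analysis; the names x, y, b0 and b1 are our own.

```r
x <- mtcars$wt   # predictor: weight in 1000 lbs
y <- mtcars$mpg  # outcome: miles per gallon

# Slope: sum of cross-deviations divided by sum of squared deviations of x.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the line through the point (mean(x), mean(y)).
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Compare with the coefficients from lm().
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(coef(fit)), c(b0, b1))
```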

Model Summary

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
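The fitted line can be used to predict MPG for a given weight. As a sketch, take a hypothetical car weighing 3,000 lbs (wt = 3), which is our own example and not a specific car from the slides. By hand, the estimates above give 37.2851 - 5.3445 × 3 ≈ 21.25 mpg; predict() does the same calculation.

```r
# Refit the model so this block runs on its own.
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical car weighing 3,000 lbs (wt is measured in 1000 lbs).
new_car <- data.frame(wt = 3)
pred <- predict(fit, newdata = new_car)
pred  # about 21.25 mpg, matching the hand calculation

# 95% confidence intervals for the intercept and slope.
confint(fit, level = 0.95)

# Share of the variation in mpg explained by wt (Multiple R-squared).
r2 <- summary(fit)$r.squared
r2
```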

Conclusion

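  • As car weight increases, fuel efficiency drops: each additional 1,000 lbs is associated with about 5.3 fewer miles per gallon on average (slope = -5.34, p < 0.001).

  • Weight alone explains about 75% of the variation in MPG (R-squared = 0.75), so simple linear regression captures this relationship well.

  • Future work could include multiple predictors such as horsepower (hp) and number of cylinders (cyl).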
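As a starting point for that future work, a model with a second predictor could be fit and compared with the weight-only model. This is only a sketch of one possible extension; the names fit_wt and fit_wt_hp are our own, and the choice of hp as the added predictor is an assumption.

```r
# Weight-only model (as in this project) and a model that also includes horsepower.
fit_wt    <- lm(mpg ~ wt, data = mtcars)
fit_wt_hp <- lm(mpg ~ wt + hp, data = mtcars)

# Adjusted R-squared penalizes extra predictors, so it is a fairer comparison.
summary(fit_wt)$adj.r.squared
summary(fit_wt_hp)$adj.r.squared

# F-test for whether adding hp significantly reduces the residual error.
anova(fit_wt, fit_wt_hp)
```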