2026-06-05

Simple Linear Regression: Vehicle Weight vs Fuel Efficiency

Introduction

We study the relationship between:

  • Car weight (wt)
  • Fuel efficiency (mpg)

We want to see whether heavier cars get fewer miles per gallon, that is, whether they use more fuel.

Dataset: mtcars

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Linear Regression Model

\[ y = \beta_0 + \beta_1 x + \epsilon \]

Where:

  • y = fuel efficiency (mpg)
  • x = weight (wt, in 1000 lbs)

Estimating the Slope

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

Scatterplot: Weight vs MPG

library(ggplot2)

# Each `+` must end the line so the next layer is attached to the plot
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue") +
  labs(title = "Scatterplot: MPG vs Weight",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon")

Regression Line

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Linear Regression Fit",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon")
## `geom_smooth()` using formula = 'y ~ x'

Interactive Plot

library(plotly)

plot_ly(
  mtcars,
  x = ~wt,
  y = ~mpg,
  type = "scatter",
  mode = "markers",
  width = 800,
  height = 400
)

Linear Model

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
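
As a check on the summary, the slope formula from "Estimating the Slope" can be evaluated by hand. The sketch below is illustrative rather than part of the original analysis: the names b0, b1 and r2 are introduced here, and the intercept formula (the fitted line passes through the point of means) and the identity R-squared = squared correlation are standard results for simple linear regression that the text above does not state.

```r
# Refit the model so this block runs on its own (mtcars ships with base R)
model <- lm(mpg ~ wt, data = mtcars)

x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept (assumed standard result): line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# In simple linear regression, R-squared equals the squared correlation
r2 <- cor(x, y)^2

round(c(intercept = b0, slope = b1, r_squared = r2), 4)

# Compare with lm(); all.equal() tolerates floating-point noise
all.equal(unname(coef(model)), c(b0, b1))
```

The rounded values should agree with the Estimate column and the Multiple R-squared line in the summary output.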

Conclusion

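  • Heavier cars tend to have lower mpg: each additional 1000 lbs of weight is associated with about 5.3 fewer miles per gallon (slope = -5.34, p < 0.001)
  • The negative relationship is clear: weight alone explains about 75% of the variation in mpg (R-squared = 0.75)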
  • Linear regression helps quantify this relationship
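
One way to put the fitted line to use is to predict mpg for given weights. This is a minimal sketch, not part of the original analysis: the data frame new_cars and its three weights are hypothetical values picked from inside the observed range of wt.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical weights (in 1000 lbs), chosen only for illustration
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# Point predictions with 95% confidence intervals for the mean mpg
pred <- predict(model, newdata = new_cars, interval = "confidence")
cbind(new_cars, round(pred, 2))
```

For a 3000 lb car the fitted value is roughly 21.3 mpg (37.29 - 5.34 × 3). Predictions for weights far outside the range seen in mtcars would be extrapolation and should be treated with caution.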