Simple Linear Regression: Predicting Fuel Efficiency

Topic: Simple linear regression
Question: How does car weight affect miles per gallon (mpg)?

We will use the built-in mtcars dataset and fit the model:

\[\text{mpg} = \beta_0 + \beta_1(\text{wt}) + \varepsilon\]

What this presentation includes:

2 ggplot visualizations
1 interactive plotly visualization
2 slides with LaTeX math
1 slide showing R code
1 worked example prediction

The mtcars dataset, extracted from the 1974 Motor Trend US magazine, contains measurements on 32 cars (1973-74 models).

Variables used here:

mpg: miles per gallon
wt: car weight in units of 1,000 pounds (wt = 3.0 means 3,000 lb)
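
A quick look at the two columns can help before fitting anything. The snippet below is a minimal sketch using base R only; the name `cars` is just an illustrative helper and is not part of the original slides.

```r
# Keep only the response (mpg) and the predictor (wt)
cars <- mtcars[, c("mpg", "wt")]

dim(cars)      # number of rows (cars) and columns (variables)
summary(cars)  # five-number summary and mean of each column
head(cars)     # first six cars, labelled by model name
```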

Practical question:

Do heavier cars tend to get worse gas mileage?

If yes, a simple linear regression model can quantify that relationship.

Simple linear regression assumes:

\[Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\]

For this example:

\[\text{mpg}_i = \beta_0 + \beta_1 \text{wt}_i + \varepsilon_i\]

Where:

\(\beta_0\) = intercept
\(\beta_1\) = slope
\(\varepsilon_i\) = random error term

The least-squares estimates \(b_0\) and \(b_1\) are the values that minimize the sum of squared residuals, where \(\hat{y}_i = b_0 + b_1 x_i\):

\[\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\]
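
For simple linear regression this minimization has a well-known closed-form solution: \(b_1 = s_{xy} / s_x^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The sketch below computes both by hand and compares them with `lm()`; the variable names `x`, `y`, `b0`, and `b1` are chosen here for illustration.

```r
# Closed-form least-squares estimates for mpg ~ wt (base R only)
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sample covariance of x and y divided by sample variance of x
b1 <- cov(x, y) / var(x)

# Intercept: forces the line through the point of means (x-bar, y-bar)
b0 <- mean(y) - b1 * mean(x)

# Compare with lm(); the two sets of estimates should agree
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(coef(fit)), c(b0, b1))
```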

fit <- lm(mpg ~ wt, data = mtcars)

summary(fit)

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

library(ggplot2)

# Scatterplot of mpg against weight with the least-squares line and its confidence band
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 3, color = "steelblue") +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "firebrick") +
  labs(
    title = "Linear Fit for mpg vs weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  ) +
  theme_minimal(base_size = 16)

From the fitted model:

\[\widehat{\text{mpg}} = 37.29 - 5.34(\text{wt})\]

Interpretation of the slope:

For each additional 1000 pounds of weight,
the predicted fuel efficiency decreases by about 5.34 mpg.

Because the slope is negative, heavier cars are predicted to get lower gas mileage.

Worked example: for a car weighing 3,000 pounds (\(wt = 3.0\)),

\[\widehat{\text{mpg}} = 37.2851 - 5.3445(3.0) \approx 21.25\]
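
The same prediction can be obtained in R with `predict()`. This is a minimal sketch; `new_car` and `by_hand` are illustrative names, and the prediction interval is an optional extra that the slides do not discuss.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# A hypothetical new car weighing 3,000 lb (wt is in units of 1,000 lb)
new_car <- data.frame(wt = 3.0)

# Point prediction from the fitted line (about 21.25 mpg)
pred <- predict(fit, newdata = new_car)
pred

# Same value by hand from the unrounded coefficients
by_hand <- unname(coef(fit)[1] + coef(fit)[2] * 3.0)
by_hand

# Optional: a 95% prediction interval for a single new car of this weight
predict(fit, newdata = new_car, interval = "prediction", level = 0.95)
```

Plugging in the rounded coefficients 37.29 and -5.34 gives 21.27 instead, so the unrounded estimates are used for the worked value.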

A good residual plot should show points scattered around 0 with no strong pattern.
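
One way such a plot might be drawn is sketched below. It assumes ggplot2 is installed; the names `diag_df` and `resid_plot` and the styling choices are illustrative, not taken from the original slides.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in one data frame for plotting
diag_df <- data.frame(
  fitted   = fitted(fit),
  residual = resid(fit)
)

# Residuals vs fitted values, with a dashed reference line at zero
resid_plot <- ggplot(diag_df, aes(fitted, residual)) +
  geom_point(size = 3, color = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "firebrick") +
  labs(
    title = "Residuals vs fitted values",
    x = "Fitted mpg",
    y = "Residual"
  ) +
  theme_minimal(base_size = 16)

resid_plot
```

Curvature or a funnel shape in this plot would suggest that the straight-line or constant-variance assumption is questionable.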

To test whether weight is linearly related to mpg, we use:

\[H_0: \beta_1 = 0 \qquad \text{vs} \qquad H_a: \beta_1 \ne 0\]

The test statistic is:

\[t = \frac{b_1 - 0}{SE(b_1)}\]
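
As a check, the statistic and its two-sided p-value can be recomputed from the coefficient table. This is a sketch using base R; `coefs`, `t_stat`, `df_resid`, and `p_value` are illustrative names.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

b1    <- coefs["wt", "Estimate"]
se_b1 <- coefs["wt", "Std. Error"]

# Test statistic for H0: beta_1 = 0
t_stat <- (b1 - 0) / se_b1

# Residual degrees of freedom: n - 2 for simple linear regression
df_resid <- df.residual(fit)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

c(t = t_stat, df = df_resid, p = p_value)
```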

From the model summary:

Estimated slope: -5.344
\(R^2\): 0.753
p-value for slope: \(1.29 \times 10^{-10}\)

Since the p-value is far below any conventional significance level (such as 0.05), we reject \(H_0\) and conclude that weight is a statistically significant linear predictor of mpg.

Interactive plotly example

This interactive plot helps show that mpg is also related to other variables, even though our main fitted model is simple linear regression using weight only.
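
The plot itself is not reproduced in this text, so the following is only a sketch of what such a plot might look like, assuming the plotly package is installed. Coloring by number of cylinders and showing horsepower on hover are assumptions made here, not necessarily the choices on the original slide.

```r
library(plotly)

# Assumed design: color by cylinders, hover text with model name and horsepower
p <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  color = ~factor(cyl),
  text = ~paste0(rownames(mtcars), "<br>hp: ", hp),
  type = "scatter",
  mode = "markers",
  marker = list(size = 10)
)

# Axis labels and title
p <- plotly::layout(
  p,
  title = "mpg vs weight, colored by cylinders",
  xaxis = list(title = "Weight (1000 lbs)"),
  yaxis = list(title = "Miles per gallon")
)

p
```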

What we learned:

Simple linear regression models the relationship between one predictor and one response.
In mtcars, heavier cars tend to have lower mpg.
The fitted line gives both prediction and interpretation.
The slope was statistically significant, and weight alone explains about 75% of the variation in mpg (\(R^2 = 0.753\)).

Possible extension:

Move from simple regression to multiple regression by adding horsepower, cylinders, or displacement.
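
As a rough sketch of that extension, the code below adds horsepower as a second predictor and compares the two nested models. Choosing `hp` rather than cylinders or displacement is arbitrary here, and `fit_simple` and `fit_multi` are illustrative names.

```r
# Simple model from this presentation
fit_simple <- lm(mpg ~ wt, data = mtcars)

# One possible multiple regression: weight plus horsepower
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)

# Adjusted R-squared penalizes the extra predictor, so it is a fairer comparison
summary(fit_simple)$adj.r.squared
summary(fit_multi)$adj.r.squared

# F-test for whether hp improves the fit beyond wt alone
anova(fit_simple, fit_multi)
```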

References / data source

Built-in R dataset: mtcars
Main packages used: ggplot2, plotly

To publish: Knit this file to HTML in RStudio, then upload the generated HTML to RPubs.