Understanding Simple Linear Regression

Why does simple linear regression matter?

Simple linear regression helps us understand how one numerical variable changes as another changes.

In this presentation, we study the relationship between:

vehicle weight (wt)
fuel efficiency in miles per gallon (mpg)

Using the mtcars dataset, we will see how regression helps describe patterns, make predictions, and test whether a relationship is statistically meaningful.

The main question

Our guiding question is:

Does a car’s weight help explain its fuel efficiency?

This is a useful regression example because both variables are quantitative, and the relationship appears approximately linear.

If heavier cars tend to have lower gas mileage, a regression model should help quantify that pattern.

The regression model

Simple linear regression models the relationship between a response variable and one explanatory variable as

\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

where:

\(Y_i\) is the response
\(x_i\) is the explanatory variable
\(\beta_0\) is the intercept
\(\beta_1\) is the slope
\(\varepsilon_i\) is the random error term

Interpreting the model

In our example:

\(Y_i = mpg\)
\(x_i = wt\)

A common assumption is that the errors are independent and normally distributed with mean zero and constant variance:

\[ \varepsilon_i \sim N(0,\sigma^2) \]
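
To make the model concrete, here is a minimal sketch that simulates data from it and then refits the line. The parameter values, sample size, range of x, and seed are illustrative assumptions chosen to resemble the mtcars example; they are not estimates from the data.

```r
# Simulate data from Y_i = beta0 + beta1 * x_i + eps_i with normal errors.
# All values below are illustrative assumptions, not estimates.
set.seed(1)
n     <- 200
beta0 <- 37
beta1 <- -5
sigma <- 3

x   <- runif(n, min = 1.5, max = 5.5)   # explanatory variable
eps <- rnorm(n, mean = 0, sd = sigma)   # errors drawn from N(0, sigma^2)
y   <- beta0 + beta1 * x + eps          # response generated by the model

sim_fit <- lm(y ~ x)
coef(sim_fit)  # estimates should land near beta0 and beta1
```

Because the errors are random, the fitted coefficients will not match the assumed values exactly, but they should be close for a sample of this size.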

How are the coefficients estimated?

The least-squares method chooses the line that minimizes the sum of squared residuals, \(\sum (y_i - \hat{y}_i)^2\).

The slope estimate is

\[ \hat{\beta}_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} \]
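
The slope formula can be checked by hand against R's built-in fit. The sketch below also computes the intercept using the standard least-squares result \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\), which is not stated on this slide; the variable names `b0` and `b1` are only for illustration.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by the sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the least-squares line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)
coef(lm(mpg ~ wt, data = mtcars))  # should agree with the hand computation
```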

Predicted values are computed as

\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
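
As a small illustration of the prediction formula, the sketch below computes fitted values directly from the estimated coefficients and compares them with `fitted()`. The 3,000 lb car passed to `predict()` is a hypothetical example, not an observation from the dataset.

```r
model <- lm(mpg ~ wt, data = mtcars)
b <- coef(model)

# Fitted values computed directly from the estimated line
y_hat <- b[["(Intercept)"]] + b[["wt"]] * mtcars$wt
all.equal(unname(fitted(model)), y_hat)

# Predicted mpg for a hypothetical car weighing 3,000 lbs (wt = 3)
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car)
pred
```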

To test whether weight has a linear association with fuel efficiency, we test whether the slope differs from zero:

\[ H_0:\beta_1 = 0 \qquad \text{vs.} \qquad H_a:\beta_1 \ne 0 \]
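
One way to see what this test involves is to compute the t statistic by hand. The sketch below assumes the usual t test for a regression slope with \(n - 2\) degrees of freedom, which is what `summary()` reports for an `lm` fit; the intermediate names (`s`, `se_b1`, `t_stat`) are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)
n <- nrow(mtcars)
x <- mtcars$wt

b1    <- coef(model)[["wt"]]
s     <- sqrt(sum(resid(model)^2) / (n - 2))  # residual standard error
se_b1 <- s / sqrt(sum((x - mean(x))^2))       # standard error of the slope

t_stat  <- b1 / se_b1
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)  # two-sided

c(t = t_stat, p = p_value)
summary(model)$coefficients["wt", ]  # compare with lm's reported values
```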

Fitting the model to the data

We fit the model

\[ mpg = \beta_0 + \beta_1(wt) + \varepsilon \]

using the mtcars dataset.

Important results from the fitted model:

Estimated intercept: 37.29
Estimated slope: -5.34
\(R^2\): 0.753
p-value for the slope: \(1.29 \times 10^{-10}\)
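
These values can be pulled from the fitted object rather than read off the printed summary. This is a minimal sketch; the rounding choices are only for display.

```r
model <- lm(mpg ~ wt, data = mtcars)
fit   <- summary(model)

est   <- round(coef(model), 2)                            # intercept and slope
r2    <- round(fit$r.squared, 3)                          # R^2
p_val <- signif(fit$coefficients["wt", "Pr(>|t|)"], 3)    # p-value for the slope

est
r2
p_val
```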

Interpretation:

For every additional 1000 pounds of vehicle weight, the model predicts that fuel efficiency decreases by about 5.34 miles per gallon on average.

Visualizing the relationship

This scatterplot shows a clear negative trend.

As vehicle weight increases, fuel efficiency tends to decline.

The R code behind the analysis

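library(ggplot2)  # static scatterplot
library(plotly)   # interactive version of the plot

# Fit mpg as a linear function of weight and print the coefficient table
model <- lm(mpg ~ wt, data = mtcars)
summary(model)

# Scatterplot with the least-squares line and its confidence band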
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Fuel Efficiency vs. Car Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon"
  ) +
  theme_minimal()

Checking model fit with residuals

Residuals are scattered around zero, which suggests the linear model is a reasonable first step.
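
The code for the residual plot is not shown in the deck, so the following is only one possible sketch. It uses base R graphics rather than ggplot2 to avoid extra dependencies, and the axis labels are assumptions.

```r
model <- lm(mpg ~ wt, data = mtcars)

res        <- resid(model)    # observed mpg minus fitted mpg
fitted_mpg <- fitted(model)

# Residuals against fitted values, with a reference line at zero
plot(fitted_mpg, res,
     xlab = "Fitted mpg",
     ylab = "Residual",
     main = "Residuals vs. Fitted Values")
abline(h = 0, lty = 2)
```

When reading such a plot, look for curvature or a funnel shape; either would suggest that the straight-line or constant-variance assumption needs a closer look.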

Interactive view of the data

Hovering over the points allows us to inspect individual observations more closely.
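
The interactive plot's code is not included in the deck. A minimal sketch, assuming it was built by passing a ggplot object to plotly's `ggplotly()`, might look like this; the `car` column and the tooltip contents are illustrative choices.

```r
library(ggplot2)
library(plotly)

# Keep the car names as a column so they can appear in the hover tooltip
cars <- transform(mtcars, car = rownames(mtcars))

p <- ggplot(cars, aes(x = wt, y = mpg, text = car)) +
  geom_point(size = 3) +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon") +
  theme_minimal()

# Convert the static plot to an interactive widget
p_interactive <- ggplotly(p, tooltip = c("text", "x", "y"))
# Evaluate `p_interactive` at the console or in a slide chunk to render it.
```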

What did we learn?

This analysis shows that simple linear regression can:

describe the direction and strength of a relationship
provide predictions
test whether the explanatory variable's slope differs significantly from zero

In the mtcars data, weight is a strong predictor of fuel efficiency: the slope is negative, and weight alone explains about 75% of the variation in mpg.
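
Direction and strength can also be summarized with the sample correlation. In simple linear regression, the squared correlation equals \(R^2\), which the sketch below checks; the variable names are illustrative.

```r
# Sign of r gives the direction; its magnitude gives the strength
r <- cor(mtcars$wt, mtcars$mpg)

r_squared <- summary(lm(mpg ~ wt, data = mtcars))$r.squared

c(r = r, r_squared_from_r = r^2, r_squared_from_lm = r_squared)
```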

Final takeaway

Simple linear regression gives a useful first look at how two quantitative variables are related.

In this case, the conclusion is clear:

Heavier cars tend to get lower gas mileage, and the regression model captures that pattern effectively.

A natural next step would be multiple regression using additional variables such as horsepower or cylinders.
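
As a rough sketch of that next step, the model formula can be extended with more predictors. Treating `cyl` as a numeric variable here is a simplifying assumption; it could instead be entered as a factor.

```r
simple   <- lm(mpg ~ wt, data = mtcars)
multiple <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# Coefficient table for the larger model
summary(multiple)$coefficients

# Adjusted R^2 penalizes extra predictors, so it is a fairer comparison
c(simple   = summary(simple)$adj.r.squared,
  multiple = summary(multiple)$adj.r.squared)

# F test of whether hp and cyl jointly improve on the weight-only model
anova(simple, multiple)
```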