Why Simple Linear Regression?

We often want to explain or predict a numeric outcome using one predictor.

Example application:
Predicting fuel efficiency (mpg) using vehicle weight (wt) from the built-in mtcars dataset.

  • Response variable: mpg
  • Predictor variable: wt (in 1000 lbs)

The Model (LaTeX)

Simple linear regression assumes:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

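where:

  • \(Y_i\) is the response for observation \(i\)
  • \(X_i\) is the predictor for observation \(i\)
  • \(\beta_0\) and \(\beta_1\) are the intercept and slope
  • \(\varepsilon_i \sim N(0,\sigma^2)\) are independent random errors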

Parameter Interpretation (LaTeX)

  • \(\beta_0\): expected mpg when weight is 0 (an extrapolation, since no car in the data has a weight near 0)
  • \(\beta_1\): expected change in mpg for a 1-unit (1000 lb) increase in weight

Estimated regression line:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]
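
As a minimal sketch of where the estimates come from, the least-squares slope is \(\hat{\beta}_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2\) and the intercept is \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). The variable names `x`, `y`, `b0`, and `b1` below are illustrative and not taken from the original slides.

```r
# Closed-form least-squares estimates for mpg ~ wt (built-in mtcars data)
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared deviations of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the fitted line through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Cross-check against lm(), which should give the same two numbers
coef(lm(mpg ~ wt, data = mtcars))
```

Computing the estimates by hand once makes it easier to read the `lm()` output as the solution to a specific minimization problem rather than a black box.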

ggplot #1: Scatterplot with Regression Line

ggplot #2: Residual Diagnostics
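
The text does not say which diagnostics this slide displayed. One common choice, sketched here as an assumption, is a residuals-vs-fitted plot plus a normal Q-Q plot of the residuals. The object names `diag_df`, `p_resid`, and `p_qq` are illustrative.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in one data frame for plotting
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Residuals vs fitted: curvature or a funnel shape would suggest
# a violated linearity or constant-variance assumption
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs fitted", x = "Fitted mpg", y = "Residual")

# Normal Q-Q plot: points close to the line suggest roughly normal errors
p_qq <- ggplot(diag_df, aes(sample = resid)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Normal Q-Q plot of residuals",
       x = "Theoretical quantiles", y = "Sample quantiles")

print(p_resid)
print(p_qq)
```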

Hypothesis Test and p-value (LaTeX)

We test:

\[ H_0: \beta_1 = 0 \quad \text{vs} \quad H_A: \beta_1 \neq 0 \]

Test statistic:

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}, \quad df = n - 2 \]

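The p-value is the probability, assuming \(H_0\) is true, of observing a test statistic at least as extreme as the one computed. A small p-value is evidence against \(H_0\).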
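
To make the test concrete, the following sketch computes the t statistic and two-sided p-value for the slope by hand from the `lm()` coefficient table. The names `t_stat`, `se_b1`, and `p_value` are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Slope estimate and its standard error from the coefficient table
coefs <- summary(fit)$coefficients
b1 <- coefs["wt", "Estimate"]
se_b1 <- coefs["wt", "Std. Error"]

# t statistic and residual degrees of freedom (n - 2)
t_stat <- b1 / se_b1
df <- nrow(mtcars) - 2

# Two-sided p-value: probability of |T| >= |t_stat| under H0
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

c(t = t_stat, df = df, p = p_value)
```

The result should agree with the `t value` and `Pr(>|t|)` columns that `summary(fit)` reports for `wt`.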

Numerical Results

Quantity      Value
Intercept     37.285
Slope (wt)    -5.344
R-squared     0.753
p-value       1.29e-10
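
One way these quantities might be pulled out of the fitted model is sketched below. The `results` data frame is a hypothetical container used only for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)

# Gather the headline numbers from the model and its summary
results <- data.frame(
  Quantity = c("Intercept", "Slope (wt)", "R-squared", "p-value"),
  Value = c(
    coef(fit)[["(Intercept)"]],        # estimated intercept
    coef(fit)[["wt"]],                 # estimated slope for wt
    s$r.squared,                       # proportion of variance explained
    s$coefficients["wt", "Pr(>|t|)"]   # two-sided p-value for the slope
  )
)

results
```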

Confidence vs Prediction Intervals

  • Confidence interval: uncertainty about the mean mpg of all cars at a given weight
  • Prediction interval: uncertainty about the mpg of a single new car at that weight (wider, because it also includes the error variance \(\sigma^2\))

wt    CI_low   CI_mean   CI_high   PI_low   PI_pred   PI_high
2.5   22.55    23.92     25.30     17.55    23.92     30.29
3.0   20.12    21.25     22.38     14.93    21.25     27.57
3.5   17.43    18.58     19.73     12.25    18.58     24.90
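
The intervals in the table above can be computed with `predict()`. This is a sketch that assumes the default 95% level, which is consistent with the tabulated values. The names `new_cars`, `ci_mat`, `pi_mat`, and `intervals` are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Weights (in 1000 lbs) at which to compute intervals
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# Confidence interval for the mean response at each weight
ci_mat <- predict(fit, newdata = new_cars, interval = "confidence", level = 0.95)

# Prediction interval for a single new observation at each weight
pi_mat <- predict(fit, newdata = new_cars, interval = "prediction", level = 0.95)

intervals <- data.frame(
  wt      = new_cars$wt,
  CI_low  = ci_mat[, "lwr"], CI_mean = ci_mat[, "fit"], CI_high = ci_mat[, "upr"],
  PI_low  = pi_mat[, "lwr"], PI_pred = pi_mat[, "fit"], PI_high = pi_mat[, "upr"]
)

round(intervals, 2)
```

Both calls return the same point estimate in the `fit` column. Only the `lwr` and `upr` bounds differ.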

Plotly 3D Visualization (Interactive)
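
The text does not record which three variables the interactive figure used. As a hedged sketch, the example below adds horsepower (`hp`) as a third axis purely for illustration. That choice is an assumption, not the original slide's content.

```r
library(plotly)

# Assumption: hp (horsepower) is an illustrative third variable,
# not necessarily the one used in the original figure
fig <- plot_ly(
  data = mtcars,
  x = ~wt, y = ~hp, z = ~mpg,
  type = "scatter3d", mode = "markers",
  marker = list(size = 4)
)

# Label the three axes of the 3D scene
fig <- layout(fig, scene = list(
  xaxis = list(title = "Weight (1000 lbs)"),
  yaxis = list(title = "Horsepower"),
  zaxis = list(title = "Miles per gallon")
))

fig
```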

R Code Example (Displayed as code)

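library(ggplot2)

# Fit mpg as a linear function of weight (wt, in 1000 lbs)
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)  # coefficients, R-squared, and p-value

# Scatterplot with the fitted line and its confidence band
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "mpg vs wt with linear fit")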

Summary

  • Simple linear regression models a linear relationship between a numeric response and one predictor
  • ggplot helps visualize the relationship and the residual diagnostics
  • The p-value for the slope assesses whether the relationship is statistically significant
  • Prediction intervals are wider than confidence intervals
  • plotly enables interactive 3D exploration