Simple Linear Regression

Why Simple Linear Regression?

We often want to explain or predict a numeric outcome using one predictor.

Example application:
Predicting fuel efficiency (mpg) from vehicle weight (wt) in R's built-in mtcars dataset (n = 32 cars).

Response variable: mpg (miles per US gallon)
Predictor variable: wt (weight, in 1000 lbs)

The Model (LaTeX)

Simple linear regression assumes:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

where, for cars \(i = 1, \dots, n\):
- \(Y_i\) is the response (mpg)
- \(X_i\) is the predictor (wt)
- \(\varepsilon_i \sim N(0,\sigma^2)\) are independent random errors
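To make the model concrete, the sketch below simulates data that satisfy these assumptions exactly and checks that lm() recovers the parameters. The "true" values (intercept 37, slope -5, error SD 3), the sample size, and the uniform range for the predictor are arbitrary choices for illustration, not estimates from mtcars.

```r
# Hypothetical "true" parameters, chosen only for illustration
beta0_true <- 37
beta1_true <- -5
sigma_true <- 3
n <- 1000

set.seed(1)
x   <- runif(n, min = 1.5, max = 5.5)        # predictor values
eps <- rnorm(n, mean = 0, sd = sigma_true)   # independent N(0, sigma^2) errors
y   <- beta0_true + beta1_true * x + eps     # Y_i = beta0 + beta1 * X_i + eps_i

sim_fit <- lm(y ~ x)
coef(sim_fit)    # estimates should be close to 37 and -5
sigma(sim_fit)   # residual standard error, an estimate of sigma
```

With n this large, the estimates typically land within a few tenths of the true values; rerunning with a smaller n shows how sampling variability grows.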

Parameter Interpretation (LaTeX)

\(\beta_0\): expected mpg when weight is 0; no car weighs 0 lbs, so this is an extrapolation that mainly anchors the line
\(\beta_1\): expected change in mpg for a 1-unit (1000 lb) increase in weight
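A quick check of the slope interpretation, assuming the lm(mpg ~ wt) fit on mtcars: predictions for two weights one unit apart differ by exactly \(\hat{\beta}_1\). The weights 3 and 4 are arbitrary choices.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Predicted mpg for two cars that differ by 1 unit (1000 lbs) in weight
pred <- predict(fit, newdata = data.frame(wt = c(3, 4)))
diff_mpg <- unname(pred[2] - pred[1])

diff_mpg            # equals the estimated slope, about -5.34
coef(fit)[["wt"]]

# The intercept is the prediction at wt = 0, far outside the observed weights
range(mtcars$wt)    # 1.513 to 5.424
```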

Estimated regression line:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]
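The estimates come from least squares, which has the closed form \(\hat{\beta}_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). A minimal R sketch computing these by hand and comparing them with lm():

```r
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)    # about 37.285 and -5.344

# The same numbers from lm()
coef(lm(mpg ~ wt, data = mtcars))
```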

ggplot #1: Scatterplot with Regression Line

ggplot #2: Residual Diagnostics
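The original residual plots are not reproduced here; the following is a sketch of two common ggplot2 diagnostics, residuals vs fitted values and a normal Q-Q plot of the residuals. It assumes ggplot2 3.0.0 or later (for stat_qq_line()); the object names diag_df, p_resid, and p_qq are illustrative.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in one data frame
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Residuals vs fitted: look for curvature or a funnel shape
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted mpg", y = "Residual", title = "Residuals vs fitted")

# Normal Q-Q plot: checks the N(0, sigma^2) error assumption
p_qq <- ggplot(diag_df, aes(sample = resid)) +
  stat_qq() +
  stat_qq_line() +
  labs(x = "Theoretical quantiles", y = "Sample quantiles",
       title = "Normal Q-Q plot of residuals")

print(p_resid)
print(p_qq)
```

Points scattered evenly around zero in the first plot, and close to the line in the second, are consistent with the model assumptions.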

Hypothesis Test and p-value (LaTeX)

We test:

\[ H_0: \beta_1 = 0 \quad \text{vs} \quad H_A: \beta_1 \neq 0 \]

Test statistic:

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}, \quad df = n - 2 \]

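The p-value is the probability, assuming \(H_0\) is true, of a t statistic at least as extreme as the one observed: \(p = 2\,P(T_{n-2} \ge |t|)\). A small p-value is evidence against \(H_0\), i.e. evidence of a linear association between wt and mpg.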
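The test statistic and two-sided p-value can be computed directly from the coefficient table returned by summary(); this sketch checks the hand computation against the values R reports.

```r
fit   <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(summary(fit))    # estimates, SEs, t values, p-values

b1    <- coefs["wt", "Estimate"]
se_b1 <- coefs["wt", "Std. Error"]
n     <- nrow(mtcars)

t_stat <- b1 / se_b1                 # t = beta1_hat / SE(beta1_hat)
dof    <- n - 2                      # 30 degrees of freedom
p_val  <- 2 * pt(-abs(t_stat), dof)  # two-sided p-value

c(t = t_stat, df = dof, p = p_val)   # t is about -9.56, p about 1.29e-10
```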

Numerical Results

Quantity	Value
Intercept	37.285
Slope (wt)	-5.344
R-squared	0.753
p-value	1.29e-10
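The values in this table can be pulled from the fitted model as follows (a sketch; signif() is used only for display). The last lines recompute \(R^2 = 1 - SSE/SST\) by hand, where SSE is the residual sum of squares and SST the total sum of squares of mpg.

```r
fit <- lm(mpg ~ wt, data = mtcars)
s   <- summary(fit)

results <- c(
  intercept = unname(coef(fit)[1]),
  slope     = unname(coef(fit)[2]),
  r_squared = s$r.squared,
  p_value   = coef(s)["wt", "Pr(>|t|)"]
)
signif(results, 4)

# R-squared by hand: 1 - SSE / SST
sse <- sum(resid(fit)^2)
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
1 - sse / sst
```

An \(R^2\) of about 0.753 means weight alone accounts for roughly 75% of the variation in mpg across these 32 cars.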

Confidence vs Prediction Intervals

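Confidence interval: uncertainty about the mean mpg of all cars with a given weight
Prediction interval: uncertainty about the mpg of one new car with that weight (wider, because it also includes that car's own random error \(\varepsilon\))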
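For a new weight \(x_0\), both 95% intervals are centered at \(\hat{Y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0\). The standard formulas, with \(\hat{\sigma}\) the residual standard error and \(S_{xx} = \sum_i (X_i - \bar{X})^2\), show where the extra width comes from: the prediction interval carries an additional 1 under the square root for the variance of a single new error. Both intervals also widen as \(x_0\) moves away from \(\bar{X}\).

\[ \text{CI: } \hat{Y}_0 \pm t_{0.975,\,n-2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{X})^2}{S_{xx}}} \]

\[ \text{PI: } \hat{Y}_0 \pm t_{0.975,\,n-2}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{S_{xx}}} \]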

wt (1000 lbs)	Fitted mpg	95% CI low	95% CI high	95% PI low	95% PI high
2.5	23.92	22.55	25.30	17.55	30.29
3.0	21.25	20.12	22.38	14.93	27.57
3.5	18.58	17.43	19.73	12.25	24.90
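The interval table can be reproduced with predict() and its interval argument (a sketch; new_cars, ci, pred_int, and table_out are illustrative names).

```r
fit <- lm(mpg ~ wt, data = mtcars)
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# 95% confidence interval for the mean mpg at each weight
ci <- predict(fit, newdata = new_cars, interval = "confidence", level = 0.95)

# 95% prediction interval for a single new car at each weight
pred_int <- predict(fit, newdata = new_cars, interval = "prediction", level = 0.95)

table_out <- data.frame(
  wt      = new_cars$wt,
  fit     = ci[, "fit"],
  CI_low  = ci[, "lwr"],
  CI_high = ci[, "upr"],
  PI_low  = pred_int[, "lwr"],
  PI_high = pred_int[, "upr"]
)
round(table_out, 2)
```

Both calls return the same fitted values; only the lwr and upr columns differ, which is why the table needs a single fitted-mpg column.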

Plotly 3D Visualization (Interactive)

R Code Example (Displayed as code)

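library(ggplot2)

# Fit the simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)

# Scatterplot with the least-squares line and its 95% confidence band
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "mpg vs wt with linear fit",
       x = "Weight (1000 lbs)", y = "Miles per gallon")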

Summary

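Simple linear regression models a linear relationship between a numeric response and a single predictor
ggplot helps visualize the relationship and check model assumptions through residual diagnostics
The t-test p-value for the slope assesses whether the predictor is linearly associated with the response
Prediction intervals are wider than confidence intervals because they also include the random error of an individual observation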
plotly enables interactive 3D exploration