Simple Linear Regression (HW3)

Slide 1 — Title

Simple Linear Regression
Modeling a response \(y\) using one predictor \(x\) with a straight line.

Slide 2 — What is SLR?

We model \( y = \beta_0 + \beta_1 x + \varepsilon \).
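Estimate \(\beta_0,\beta_1\) by least squares: choose the values that minimize the sum of squared residuals \(\sum_i (y_i - \beta_0 - \beta_1 x_i)^2\).
Running example: predict fuel economy (mpg) from car weight (wt, in 1000 lbs) using R's built-in mtcars data.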
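To make "minimize the sum of squared residuals" concrete, here is a small R sketch that minimizes the SSR numerically with `optim()` and compares the result with `lm()`. The helper name `ssr` is hypothetical, and the iterative optimizer is only an illustration: `lm()` solves the same least-squares problem directly (via a QR decomposition) rather than by search.

```r
# Least squares written as an explicit optimisation problem (illustration only)
x <- mtcars$wt
y <- mtcars$mpg

# Hypothetical helper: sum of squared residuals for a candidate (beta0, beta1)
ssr <- function(beta) sum((y - beta[1] - beta[2] * x)^2)

# Numerically minimise the SSR, starting from (0, 0)
opt <- optim(par = c(0, 0), fn = ssr, method = "BFGS")

# Compare with the exact least-squares solution from lm()
fit <- lm(mpg ~ wt, data = mtcars)
rbind(optim = opt$par, lm = unname(coef(fit)))
```

Both rows should agree to a few decimal places; any remaining difference comes from the optimizer's stopping tolerance.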

Slide 3 — Data peek

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Slide 4 — Fit the model (code)

We fit: \( \text{mpg} = \beta_0 + \beta_1\,\text{wt} + \varepsilon \)

fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
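The degrees of freedom and residual standard error in this output can be recomputed by hand. A minimal base-R sketch: with n = 32 cars and two estimated coefficients, the residual degrees of freedom are n - 2 = 30, and the residual standard error is the square root of SSR / (n - 2).

```r
fit <- lm(mpg ~ wt, data = mtcars)

n <- nrow(mtcars)                          # 32 cars
df_resid <- df.residual(fit)               # n - 2 = 30 (two coefficients estimated)
rse <- sqrt(sum(residuals(fit)^2) / df_resid)  # residual standard error by hand

# sigma() returns the same quantity that summary(fit) prints (3.046)
c(n = n, df = df_resid, rse = rse, sigma = sigma(fit))
```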

Slide 5 — Scatter + regression line (ggplot)

Heavier cars tend to have lower mpg (negative slope).
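The plotting code for this slide is not shown in the text. The following is a plausible ggplot2 reconstruction, an assumption rather than the original slide code: a scatter of mpg against wt with the least-squares line and its default 95% confidence band.

```r
# Sketch: scatter plot with fitted regression line (assumes ggplot2 is installed)
library(ggplot2)

p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" fits y ~ x (here mpg ~ wt); se = TRUE draws a 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "mpg vs weight with fitted regression line")

print(p_scatter)
```

The downward-sloping line matches the negative slope estimate in the model summary.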

Slide 6 — Residuals vs Fitted (ggplot)

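For a well-specified model, the residuals should show no systematic pattern (such as curvature) and a roughly constant spread around 0 across the range of fitted values.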
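As with the previous slide, the original plotting code is not shown; this is a sketch of a residuals-vs-fitted plot in ggplot2 under that assumption. The dashed line marks 0 and a loess smoother helps reveal any curved trend.

```r
# Sketch: residuals vs fitted values (assumes ggplot2 is installed)
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)
diag_df <- data.frame(fitted = fitted(fit), resid = residuals(fit))

p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +   # reference line at 0
  # loess smoother: a clearly curved trend would suggest a non-linear relationship
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Fitted mpg", y = "Residual", title = "Residuals vs fitted")

print(p_resid)
```

Base R's `plot(fit, which = 1)` draws a similar diagnostic without ggplot2.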

Slide 7 — 3D (plotly)

Slide 8 — Math: the model

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2). \]
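One way to see what the model equation claims is to simulate data from it and check that `lm()` recovers the parameters. In this sketch the "true" values `beta0 = 37`, `beta1 = -5`, `sigma_true = 3` and the range of x are hypothetical choices loosely inspired by the mtcars fit, not estimates from the data.

```r
# Simulate y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)
set.seed(42)                 # reproducible draws
n <- 1000
beta0 <- 37                  # hypothetical true intercept
beta1 <- -5                  # hypothetical true slope
sigma_true <- 3              # hypothetical error standard deviation

x <- runif(n, min = 1.5, max = 5.5)          # predictor values
eps <- rnorm(n, mean = 0, sd = sigma_true)   # independent normal errors
y <- beta0 + beta1 * x + eps

sim_fit <- lm(y ~ x)
coef(sim_fit)    # should be close to 37 and -5
sigma(sim_fit)   # estimate of sigma, should be close to 3
```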

The fitted line is \(\hat{y}_i = b_0 + b_1 x_i\), where \(b_0, b_1\) are the least-squares estimates of \(\beta_0, \beta_1\).
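A short sketch showing that the fitted values are just \(b_0 + b_1 x_i\) computed from the estimated coefficients, plus a prediction for a hypothetical car weighing 3000 lbs (wt = 3), chosen only for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)
b <- coef(fit)                                  # b0 = "(Intercept)", b1 = "wt"

# Fitted values by hand: y-hat_i = b0 + b1 * x_i
yhat_manual <- b[["(Intercept)"]] + b[["wt"]] * mtcars$wt
all.equal(unname(yhat_manual), unname(fitted(fit)))   # TRUE

# Prediction for a hypothetical 3000-lb car (wt = 3): about 21.25 mpg
pred <- unname(predict(fit, newdata = data.frame(wt = 3)))
pred
```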

Slide 9 — Math: slope estimator & CI

Slope estimator:

\[ b_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} \]
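The slope formula can be checked directly against `lm()`. This sketch also computes the intercept as \(b_0 = \bar{y} - b_1 \bar{x}\), the standard least-squares result that the fitted line passes through \((\bar{x}, \bar{y})\).

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations over sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: the least-squares line passes through (x-bar, y-bar)
b0 <- mean(y) - b1 * mean(x)

c(b0 = b0, b1 = b1)               # about 37.285 and -5.344
coef(lm(mpg ~ wt, data = mtcars)) # same values from lm()
```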

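95% CI for the slope (\(\alpha = 0.05\), \(n-2\) residual degrees of freedom):

\[ b_1 \pm t_{\alpha/2,\,n-2}\,\mathrm{SE}(b_1), \qquad \mathrm{SE}(b_1) = \frac{s}{\sqrt{\sum (x_i-\bar{x})^2}}, \]

where \(s\) is the residual standard error.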
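A sketch that builds the 95% interval for the slope from its pieces (residual standard error, \(\mathrm{SE}(b_1)\), and the t critical value with \(n - 2 = 30\) degrees of freedom) and compares it with R's `confint()`.

```r
fit <- lm(mpg ~ wt, data = mtcars)
x <- mtcars$wt
n <- length(x)
alpha <- 0.05

s <- sigma(fit)                            # residual standard error (3.046)
se_b1 <- s / sqrt(sum((x - mean(x))^2))    # SE(b1), 0.5591 in summary(fit)
t_crit <- qt(1 - alpha / 2, df = n - 2)    # t_{alpha/2, n-2}, about 2.04
ci_manual <- coef(fit)[["wt"]] + c(-1, 1) * t_crit * se_b1

ci_manual                           # roughly -6.49 to -4.20
confint(fit, "wt", level = 0.95)    # same interval from the built-in function
```

Because the whole interval lies below 0, it agrees with the significant negative slope in the summary.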

Slide 10 — Interpretation

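Slope: estimated change in mean mpg for each additional 1000 lbs of weight (about -5.34 mpg here).
\(R^2\): proportion of the variability in mpg explained by the linear fit on weight (0.7528, about 75% here).
p-value for slope: tests \(H_0: \beta_1 = 0\); a very small value (1.29e-10 here) is strong evidence of a linear association.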
Always check residuals before concluding.
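In simple linear regression these summary numbers are linked: \(R^2\) equals the squared Pearson correlation between weight and mpg, and the slope's t statistic squared equals the overall F statistic, so the slope t-test and the F-test report the same p-value. A minimal sketch checking both identities on this fit:

```r
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)

r2 <- s$r.squared                        # 0.7528 in the output
r <- cor(mtcars$wt, mtcars$mpg)          # Pearson correlation (negative here)
t_slope <- coef(s)["wt", "t value"]      # -9.559
f_stat <- s$fstatistic[["value"]]        # 91.38

c(r2 = r2, r_squared = r^2, t_squared = t_slope^2, F = f_stat)
```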