- What linear regression is
- The mathematical model
- An example using real data
- Diagnostics and prediction
- Visualization with ggplot and Plotly
Linear regression is used to explain and predict a numeric outcome using one or more predictors.
Examples:
- Business: sales vs advertising
- Engineering: output vs system inputs
- Biology: growth vs dosage
In this presentation:
- Data: mtcars (built into R, 32 cars)
- Response variable: mpg (miles per gallon)
- Predictor: wt (car weight, in units of 1000 lbs)
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
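Where:
- \(Y_i\) is the response and \(X_i\) the predictor for observation \(i\)
- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\varepsilon_i\) is the random error, usually assumed independent with mean 0 and constant variance \(\sigma^2\)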
\[ \min_{\beta_0,\beta_1}\sum_{i=1}^{n}(Y_i-\beta_0-\beta_1X_i)^2 \]
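The least-squares estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) minimize this sum of squared residuals, which defines the “best-fitting” line.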
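For a single predictor, the minimization has a closed-form solution: \(\hat{\beta}_1 = \widehat{\mathrm{Cov}}(X, Y) / \widehat{\mathrm{Var}}(X)\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). The sketch below computes both directly in base R and compares them with `lm()`. It assumes the mtcars variables used in this presentation; the names `x`, `y`, `b0` and `b1` are illustrative only.

```r
# Closed-form ordinary least-squares estimates for simple linear regression,
# using the built-in mtcars data (response mpg, predictor wt).
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sample covariance of x and y divided by sample variance of x
b1 <- cov(x, y) / var(x)

# Intercept: the least-squares line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Compare with the estimates returned by lm()
fit <- lm(mpg ~ wt, data = mtcars)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

The two sets of estimates agree to floating-point precision. `lm()` uses a QR decomposition rather than this formula, but both solve the same minimization.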
Estimated coefficients for mpg ~ wt (mtcars, n = 32):

| Term | Estimate | 95% CI (lower) | 95% CI (upper) | p-value |
|---|---|---|---|---|
| (Intercept) | 37.2851 | 33.4505 | 41.1198 | < 0.001 |
| wt | -5.3445 | -6.4863 | -4.2026 | < 0.001 |
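The coefficient table can be reproduced with base R alone. This is a sketch: the column names follow the convention of `broom::tidy()`, but the source does not say how the original table was produced, so `summary()` and `confint()` are used here to avoid an extra package. The p-values print as very small positive numbers, not exact zeros.

```r
# Fit the simple linear regression of mpg on wt (mtcars is built into R)
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# 95% confidence intervals for each coefficient
ci <- confint(fit, level = 0.95)

# Assemble a table with the same columns as the slide
tab <- data.frame(
  term      = rownames(coefs),
  estimate  = coefs[, "Estimate"],
  conf.low  = ci[, 1],
  conf.high = ci[, 2],
  p.value   = coefs[, "Pr(>|t|)"],
  row.names = NULL
)
print(tab, digits = 6)
```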
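Interpretation:
- Negative slope: each additional 1,000 lbs of weight is associated with an average decrease of about 5.34 mpg
- The p-value tests \(H_0: \beta_1 = 0\) (no linear relationship); for wt it is far below 0.05, so weight is a statistically significant predictor of mpg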
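The p-value for the slope comes from a t-test: the statistic is the estimate divided by its standard error, compared against a t distribution with \(n - 2\) degrees of freedom. A minimal sketch that recomputes it by hand and checks it against `summary()`; the variable names are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients

# t statistic for H0: beta_1 = 0 is estimate / standard error
est <- coefs["wt", "Estimate"]
se  <- coefs["wt", "Std. Error"]
t_stat <- est / se

# Two-sided p-value from a t distribution with n - 2 residual degrees of freedom
df <- df.residual(fit)
p_val <- 2 * pt(-abs(t_stat), df)

cat("t =", round(t_stat, 3), " df =", df, " p =", signif(p_val, 3), "\n")
```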
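In a residuals-vs-fitted plot, a well-specified model shows residuals (observed minus fitted values) scattered randomly around zero, with roughly constant spread and no systematic pattern such as curvature or a funnel shape.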
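A minimal residuals-vs-fitted plot, sketched with base graphics only to keep it dependency-free (the presentation's own plots use ggplot and Plotly, which are not shown here). Note that with an intercept in the model, least-squares residuals always average to zero, so the diagnostic is the *pattern* of the scatter, not its mean.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals (observed - fitted) and fitted values
res  <- resid(fit)
fits <- fitted(fit)

# Residuals vs fitted: look for random scatter around the dashed zero line
plot(fits, res,
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Mean of the residuals is zero up to rounding (a property of OLS with an intercept)
print(mean(res))
```

`plot(fit, which = 1)` draws the same diagnostic with a smoothed trend line added.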
For a new value \(x_0\):
\[ \hat{Y}(x_0)=\hat{\beta}_0+\hat{\beta}_1 x_0 \]
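As a worked instance, plugging the rounded estimates from the coefficient table into this formula for a car weighing 3,000 lbs (\(x_0 = 3.0\)):

\[ \hat{Y}(3.0) = 37.2851 + (-5.3445)(3.0) = 37.2851 - 16.0335 = 21.2516 \]

This agrees with the fitted value reported by `predict()` up to rounding of the coefficients.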
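Two intervals:
- Confidence interval: for the mean response \(E[Y \mid X = x_0]\)
- Prediction interval: for a single new observation at \(x_0\); wider, because it also accounts for the random error \(\varepsilon\)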
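For simple linear regression the two 95% intervals have a standard closed form. Writing \(s\) for the residual standard error, \(n\) for the number of observations, \(\bar{X}\) for the mean of the predictor and \(S_{xx} = \sum_{i=1}^{n}(X_i - \bar{X})^2\):

\[ \text{CI: } \hat{Y}(x_0) \pm t_{n-2,\,0.975}\; s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{X})^2}{S_{xx}}} \]

\[ \text{PI: } \hat{Y}(x_0) \pm t_{n-2,\,0.975}\; s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{X})^2}{S_{xx}}} \]

The extra \(1\) under the square root is the variance of the new observation's own error term, which is why the prediction interval is always wider. Both intervals are narrowest at \(x_0 = \bar{X}\).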
```r
# Fit the simple linear regression of mpg on wt
fit1 <- lm(mpg ~ wt, data = mtcars)

# 95% prediction interval for a car weighing 3,000 lbs (wt is in 1000 lbs)
predict(fit1, newdata = data.frame(wt = 3.0), interval = "prediction")
```
```
##        fit      lwr      upr
## 1 21.25171 14.92987 27.57355
```
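The interval from `predict()` can be checked against the closed-form formulas for simple regression. This sketch computes both the confidence and prediction intervals by hand at the same weight and compares them with `predict()`; the helper names (`x0`, `sxx`, `se_mean`, `se_pred`, `manual`) are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)
x0 <- 3.0  # new weight, in units of 1000 lbs

x <- mtcars$wt
n <- nrow(mtcars)
s <- summary(fit)$sigma               # residual standard error
sxx <- sum((x - mean(x))^2)           # spread of the predictor
tcrit <- qt(0.975, df = n - 2)        # 95% two-sided critical value

# Point prediction and the two standard errors
yhat <- unname(coef(fit)[1] + coef(fit)[2] * x0)
se_mean <- s * sqrt(1 / n + (x0 - mean(x))^2 / sxx)      # for the mean response
se_pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sxx)  # for a new observation

manual <- rbind(
  confidence = c(fit = yhat, lwr = yhat - tcrit * se_mean, upr = yhat + tcrit * se_mean),
  prediction = c(fit = yhat, lwr = yhat - tcrit * se_pred, upr = yhat + tcrit * se_pred)
)
print(manual)

# Built-in equivalents for comparison
nd <- data.frame(wt = x0)
print(predict(fit, newdata = nd, interval = "confidence"))
print(predict(fit, newdata = nd, interval = "prediction"))
```

Both intervals share the same centre; only the width differs, matching the two formulas for the confidence and prediction intervals.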