What this presentation covers

  • What linear regression is
  • The mathematical model
  • An example using real data
  • Diagnostics and prediction
  • Visualization with ggplot and Plotly

Why linear regression?

Linear regression is used to explain and predict a numeric outcome using one or more predictors.

Examples:
  • Business: sales vs advertising
  • Engineering: output vs system inputs
  • Biology: growth vs dosage

In this presentation (R's built-in mtcars data set):
  • Response variable: mpg (miles per gallon)
  • Predictor: wt (car weight, in units of 1000 lbs)

The linear regression model

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

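Where:
  • \(Y_i\) is the response and \(X_i\) the predictor for observation \(i = 1, \dots, n\)
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\varepsilon_i\) is a random error term, assumed to have mean zero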
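One way to make the model concrete is to simulate data from it and check that lm() recovers the parameters. The sketch below uses hypothetical "true" values \(\beta_0 = 2\), \(\beta_1 = 0.5\) and normally distributed errors; these choices are assumptions for illustration, not part of the mtcars example.

```r
# Simulate data from Y_i = beta_0 + beta_1 * X_i + eps_i, then refit with lm().
# The "true" parameter values below are hypothetical, chosen for illustration.
set.seed(42)
n      <- 100
beta_0 <- 2
beta_1 <- 0.5
x   <- runif(n, min = 0, max = 10)   # predictor values
eps <- rnorm(n, mean = 0, sd = 1)    # random error with mean zero
y   <- beta_0 + beta_1 * x + eps     # response generated by the model

sim_fit <- lm(y ~ x)
coef(sim_fit)  # estimates should land close to 2 and 0.5
```

Rerunning with a larger n should pull the estimates closer to the true values.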

Least squares estimation (Math)

\[ \min_{\beta_0,\beta_1}\sum_{i=1}^{n}(Y_i-\beta_0-\beta_1X_i)^2 \]

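The minimizing values \(\hat{\beta}_0\) and \(\hat{\beta}_1\) define the “best-fitting” line: the one with the smallest sum of squared vertical distances between the observed \(Y_i\) and the line.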
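For one predictor, this minimization has a closed-form solution: \(\hat{\beta}_1 = \sum_{i}(X_i-\bar{X})(Y_i-\bar{Y}) / \sum_{i}(X_i-\bar{X})^2\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\). A minimal sketch computing these by hand on mtcars and comparing them with lm():

```r
# Closed-form least squares estimates for simple linear regression
x <- mtcars$wt
y <- mtcars$mpg

beta_1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta_0_hat <- mean(y) - beta_1_hat * mean(x)
c(intercept = beta_0_hat, slope = beta_1_hat)

# lm() solves the same minimization; results agree up to floating-point error
coef(lm(mpg ~ wt, data = mtcars))
```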

mpg vs weight (ggplot)
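The slide's figure is not reproduced in this text. A plot of this kind can be sketched as follows, assuming the ggplot2 package is installed; the labels and the shaded band are illustrative choices, not necessarily those of the original figure.

```r
library(ggplot2)

# Scatter plot of mpg against weight with the least squares line overlaid
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +  # fitted line with confidence band
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "mpg vs weight")
print(p_scatter)
```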

Model fit and coefficient table

Linear regression results: mpg ~ wt (95% confidence intervals)

term          estimate   conf.low   conf.high   p.value
(Intercept)    37.2851    33.4505     41.1198   < 0.001
wt             -5.3445    -6.4863     -4.2026   < 0.001
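The table above can be reproduced in base R. Its column names follow the convention of broom's tidy(), which the original may have used (an assumption); this sketch avoids that dependency and uses confint()'s default 95% level.

```r
fit1 <- lm(mpg ~ wt, data = mtcars)

coef_tab <- data.frame(
  term      = names(coef(fit1)),
  estimate  = unname(coef(fit1)),
  conf.low  = unname(confint(fit1)[, 1]),  # lower 95% bound (default level)
  conf.high = unname(confint(fit1)[, 2]),  # upper 95% bound
  p.value   = unname(summary(fit1)$coefficients[, "Pr(>|t|)"])
)
print(coef_tab, digits = 4)
```

The p-values print in scientific notation here; rounding them to four decimals is what makes them display as exactly 0.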

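Interpretation:
  • Negative slope (-5.34): each additional 1000 lbs of weight is associated with about 5.3 fewer mpg, on average
  • The p-value tests \(H_0: \beta_1 = 0\); a value this small (< 0.001) indicates that weight is a statistically significant predictor of mpg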
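The slope's p-value comes from a t-test: \(t = \hat{\beta}_1 / \mathrm{SE}(\hat{\beta}_1)\), compared with a t distribution on \(n - 2\) degrees of freedom. A short sketch recomputing it from lm()'s standard error:

```r
fit1 <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit1)$coefficients

est    <- coefs["wt", "Estimate"]
se     <- coefs["wt", "Std. Error"]
t_stat <- est / se                            # t statistic for H0: beta_1 = 0
df_res <- df.residual(fit1)                   # n - 2 residual degrees of freedom
p_val  <- 2 * pt(-abs(t_stat), df = df_res)   # two-sided p-value
c(t = t_stat, df = df_res, p = p_val)
```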

Diagnostics: residuals vs fitted (ggplot)

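If the model is adequate, residuals scatter randomly around zero with roughly constant spread. A curved pattern suggests a nonlinear relationship; a funnel shape suggests non-constant variance.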
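A residuals-vs-fitted plot like the one described can be sketched with ggplot2 (assumed installed); the dashed reference line marks zero. Base R's plot(fit1, which = 1) produces a similar diagnostic without ggplot2.

```r
library(ggplot2)

fit1 <- lm(mpg ~ wt, data = mtcars)
diag_df <- data.frame(fitted = fitted(fit1), residual = resid(fit1))

p_resid <- ggplot(diag_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # residuals should straddle this line
  labs(x = "Fitted mpg", y = "Residual", title = "Residuals vs fitted")
print(p_resid)
```

With an intercept in the model, least squares residuals always average to zero, so the plot is judged by pattern and spread, not by the overall level.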

Prediction and uncertainty

For a new value \(x_0\):

\[ \hat{Y}(x_0)=\hat{\beta}_0+\hat{\beta}_1 x_0 \]
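As a worked example, plugging the rounded estimates from the coefficient table into this formula for a car weighing 3,000 lbs (\(x_0 = 3.0\)):

\[ \hat{Y}(3.0) = 37.2851 + (-5.3445)(3.0) = 37.2851 - 16.0335 = 21.2516 \]

This agrees with the fit value of 21.25171 that predict() reports on the Code slide; the small difference comes from rounding the coefficients to four decimals.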

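Two intervals:
  • Confidence interval: a range for the mean mpg of all cars with weight \(x_0\)
  • Prediction interval: a range for the mpg of a single new car with weight \(x_0\); it is wider because it also accounts for the random error \(\varepsilon\)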
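The difference shows up in the standard errors. With \(S_{xx} = \sum_i (X_i - \bar{X})^2\) and residual standard error \(\hat{\sigma}\), the mean response uses \(\hat{\sigma}\sqrt{1/n + (x_0-\bar{X})^2/S_{xx}}\), while a new observation uses \(\hat{\sigma}\sqrt{1 + 1/n + (x_0-\bar{X})^2/S_{xx}}\); both are scaled by \(t_{0.975,\,n-2}\) for 95% intervals. A sketch checking these formulas against predict():

```r
fit1 <- lm(mpg ~ wt, data = mtcars)
x0  <- 3.0
new <- data.frame(wt = x0)

ci_int <- predict(fit1, newdata = new, interval = "confidence")  # mean response
pi_int <- predict(fit1, newdata = new, interval = "prediction")  # single new car

# Manual computation from the standard-error formulas
x         <- mtcars$wt
n         <- length(x)
s_xx      <- sum((x - mean(x))^2)
sigma_hat <- summary(fit1)$sigma          # residual standard error
t_q       <- qt(0.975, df = n - 2)
y0        <- unname(predict(fit1, newdata = new))

se_mean <- sigma_hat * sqrt(1 / n + (x0 - mean(x))^2 / s_xx)
se_pred <- sigma_hat * sqrt(1 + 1 / n + (x0 - mean(x))^2 / s_xx)  # extra "1" = error variance

manual <- rbind(confidence = y0 + c(-1, 1) * t_q * se_mean,
                prediction = y0 + c(-1, 1) * t_q * se_pred)
colnames(manual) <- c("lwr", "upr")
manual
```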

Code

# Fit the model
fit1 <- lm(mpg ~ wt, data = mtcars)

# Predict mpg, with a 95% prediction interval, for a car weighing 3,000 lbs (wt is in 1000 lbs)
predict(fit1, newdata = data.frame(wt = 3.0), interval = "prediction")
##        fit      lwr      upr
## 1 21.25171 14.92987 27.57355

3D Plotly visualization
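The 3D figure is not reproduced in this text, and the slides do not say which second predictor it used. The sketch below assumes hp (horsepower) as a hypothetical second predictor, fits mpg ~ wt + hp, and draws the observed points together with the fitted least squares plane using the plotly package (assumed installed).

```r
library(plotly)

# Hypothetical second predictor: hp is an assumption, not confirmed by the slides
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Grid over the observed range of both predictors
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 30)
hp_grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 30)
grid    <- expand.grid(wt = wt_grid, hp = hp_grid)  # wt varies fastest

# Surface matrix: rows index hp (y axis), columns index wt (x axis)
z_plane <- matrix(predict(fit2, newdata = grid),
                  nrow = length(hp_grid), ncol = length(wt_grid), byrow = TRUE)

p3d <- plot_ly()
p3d <- add_trace(p3d, data = mtcars, x = ~wt, y = ~hp, z = ~mpg,
                 type = "scatter3d", mode = "markers", name = "observed")
p3d <- add_trace(p3d, x = wt_grid, y = hp_grid, z = z_plane,
                 type = "surface", opacity = 0.6, showscale = FALSE,
                 name = "fitted plane")
p3d <- layout(p3d, scene = list(xaxis = list(title = "wt (1000 lbs)"),
                                yaxis = list(title = "hp"),
                                zaxis = list(title = "mpg")))
```

Evaluating p3d in an interactive session or an R Markdown chunk renders the rotatable plot.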

Key takeaways

  • Linear regression models relationships between variables
  • Least squares finds the best-fitting line (one predictor) or plane (two predictors)
  • ggplot and Plotly improve interpretation and communication