- Goal: Predict a car’s fuel efficiency (mpg) from weight (wt) and horsepower (hp).
- We’ll fit a multiple linear regression model, assess fit, and interpret results.
- Data: built-in mtcars dataset in R.
We model the response \(y\) as a linear function of predictors \(\mathbf{x}\):
\[ \mathbf{y} = \mathbf{X}oldsymbol{eta} + oldsymbol{arepsilon}, \quad oldsymbol{arepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}). \]
For two predictors \(x_1 = ext{wt}\) and \(x_2 = ext{hp}\):
\[ ext{mpg}_i = eta_0 + eta_1 \, ext{wt}_i + eta_2 \, ext{hp}_i + arepsilon_i. \]
Ordinary Least Squares (OLS) estimate:
\[ \hat{oldsymbol{eta}} = (\mathbf{X}^ op \mathbf{X})^{-1}\mathbf{X}^ op \mathbf{y} \]
Inference uses the sampling distribution of \(\hat{oldsymbol{eta}}\):
\[ \hat{oldsymbol{eta}} \sim \mathcal{N}\!\left(oldsymbol{eta}, \; \sigma^2 (\mathbf{X}^ op \mathbf{X})^{-1} ight). \]
# Packages library(tidyverse) library(broom) library(plotly) # Data data(mtcars) df <- mtcars %>% mutate(wt = wt, hp = hp, mpg = mpg) # Fit multiple linear regression fit <- lm(mpg ~ wt + hp, data = df) summary(fit) tidy(fit)
Notes: - Coefficients show mpg decreases as wt or hp increases. - We’ll visualize the fitted plane and diagnostics next.
library(ggplot2) aug <- augment(fit) ggplot(aug, aes(.fitted, mpg)) + geom_point(alpha = 0.8) + geom_abline(slope = 1, intercept = 0, linetype = "dashed") + labs(title = "Observed vs Fitted MPG", x = "Fitted values", y = "Observed mpg")
ggplot(aug, aes(.fitted, .resid)) + geom_hline(yintercept = 0, linetype = "dashed") + geom_point(alpha = 0.8) + labs(title = "Residuals vs Fitted", x = "Fitted values", y = "Residuals")
# Create grid for surface wt_seq <- seq(min(df$wt), max(df$wt), length.out = 30) hp_seq <- seq(min(df$hp), max(df$hp), length.out = 30) grid <- expand.grid(wt = wt_seq, hp = hp_seq) grid$mpg_hat <- predict(fit, newdata = grid) # Matrix for surface z_mat <- matrix(grid$mpg_hat, nrow = length(wt_seq), ncol = length(hp_seq)) # 3D interactive plot plot_ly() |> add_markers(data = df, x = ~wt, y = ~hp, z = ~mpg, name = "Cars", marker = list(size = 4)) |> add_surface(x = wt_seq, y = hp_seq, z = z_mat, opacity = 0.6, name = "Fitted plane") |> layout(scene = list(xaxis = list(title = "Weight (1000 lbs)"), yaxis = list(title = "Horsepower"), zaxis = list(title = "MPG")))
We test the significance of each coefficient with a t test:
In R:
tidy(fit, conf.int = TRUE)
newcars <- tibble( wt = c(2.5, 3.5), hp = c(110, 180) ) predict(fit, newdata = newcars, interval = "prediction")