Multiple Linear Regression (MLR) models the relationship between one continuous response variable \(Y\) and two or more predictor variables \(X_1, X_2, \ldots, X_p\):
\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon\]
where:
mtcarsWe use the built-in mtcars dataset
(Motor Trend Car Road Tests, 1974).
mpg — Miles per
gallon (fuel efficiency)disp), horsepower (hp), weight
(wt), and 1/4-mile time (qsec)# Load required libraries
library(tidyverse)
library(GGally)
library(car)
library(corrplot)
library(lmtest)
library(MASS)
library(leaps)
# Load dataset
data(mtcars)
# Preview the data
head(mtcars)## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## [1] 32 11
# Summary statistics of the variables used
mtcars_sub <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
summary(mtcars_sub)## mpg disp hp wt
## Min. :10.40 Min. : 71.1 Min. : 52.0 Min. :1.513
## 1st Qu.:15.43 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:2.581
## Median :19.20 Median :196.3 Median :123.0 Median :3.325
## Mean :20.09 Mean :230.7 Mean :146.7 Mean :3.217
## 3rd Qu.:22.80 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.610
## Max. :33.90 Max. :472.0 Max. :335.0 Max. :5.424
## qsec
## Min. :14.50
## 1st Qu.:16.89
## Median :17.71
## Mean :17.85
## 3rd Qu.:18.90
## Max. :22.90
ggpairs(mtcars_sub,
title = "Pairwise Relationships: mpg and Predictors",
lower = list(continuous = wrap("smooth", method = "lm", color = "steelblue")),
diag = list(continuous = wrap("barDiag", fill = "steelblue", alpha = 0.6)),
upper = list(continuous = wrap("cor", size = 4)))Observations:
mpg has a strong negative correlation
with disp (r = -0.85), hp (r = -0.78), and
wt (r = -0.87).qsec has a mild positive correlation
with mpg (r = 0.42).# Fit MLR: mpg ~ disp + hp + wt + qsec
model_full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
summary(model_full)##
## Call:
## lm(formula = mpg ~ disp + hp + wt + qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8664 -1.5819 -0.3788 1.1712 5.6468
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.329638 8.639032 3.164 0.00383 **
## disp 0.002666 0.010738 0.248 0.80576
## hp -0.018666 0.015613 -1.196 0.24227
## wt -4.609123 1.265851 -3.641 0.00113 **
## qsec 0.544160 0.466493 1.166 0.25362
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.622 on 27 degrees of freedom
## Multiple R-squared: 0.8351, Adjusted R-squared: 0.8107
## F-statistic: 34.19 on 4 and 27 DF, p-value: 3.311e-10
| Term | Estimate | Std. Error | t-value | p-value | Significance |
|---|---|---|---|---|---|
| (Intercept) | 27.3299 | 10.1419 | 2.694 | 0.01183 * | * |
| disp | 0.0004 | 0.0103 | 0.041 | 0.96748 | |
| hp | -0.0310 | 0.0145 | -2.143 | 0.04144 * | * |
| wt | -4.0787 | 1.2193 | -3.346 | 0.00252 ** | ** |
| qsec | 1.1003 | 0.4543 | 2.422 | 0.02275 * | * |
# Visualize coefficients with confidence intervals
coef_df <- as.data.frame(confint(model_full))
coef_df$estimate <- coef(model_full)
coef_df$term <- rownames(coef_df)
coef_df <- coef_df[-1, ] # remove intercept for clarity
names(coef_df)[1:2] <- c("lower", "upper")
ggplot(coef_df, aes(x = term, y = estimate)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red", alpha = 0.5) +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, color = "steelblue", size = 1) +
geom_point(size = 3, color = "navy") +
labs(title = "Coefficient Estimates with 95% Confidence Intervals",
x = "Predictor", y = "Coefficient Estimate") +
theme_bw()How to interpret each coefficient (partial/ceteris paribus):
wt (β̂ = −4.08): Holding
disp, hp, and qsec constant, each
additional 1,000 lbs of car weight is associated with a decrease
of 4.08 mpg on average. This is highly significant (p =
0.003).
hp (β̂ = −0.031): Holding other
variables constant, each additional 1 horsepower is associated with a
decrease of 0.031 mpg on average (p = 0.04).
qsec (β̂ = 1.10): Holding other
variables constant, each 1-second increase in quarter-mile time (i.e., a
slower car) is associated with a 1.10 mpg
increase on average (p = 0.02).
disp (β̂ ≈ 0): Engine displacement
is not significant when controlling for the other
variables (p = 0.97), likely because its effect is largely captured by
wt and hp.
## R-squared: 0.8351
## Adjusted R-squared: 0.8107
## F-statistic: 34.19
cat("p-value (F-test): ", pf(summary(model_full)$fstatistic[1],
summary(model_full)$fstatistic[2],
summary(model_full)$fstatistic[3],
lower.tail = FALSE), "\n")## p-value (F-test): 3.310598e-10
mpg.mpg.##
## Shapiro-Wilk normality test
##
## data: residuals(model_full)
## W = 0.93661, p-value = 0.06004
##
## studentized Breusch-Pagan test
##
## data: model_full
## BP = 2.0394, df = 4, p-value = 0.7285
## disp hp wt qsec
## 7.985439 5.166758 6.916942 3.133119
Diagnostic Summary:
| Assumption | Test | Result | Conclusion |
|---|---|---|---|
| Normality | Shapiro-Wilk | p > 0.05 | ✅ Residuals are normal |
| Homoscedasticity | Breusch-Pagan | p > 0.05 | ✅ Constant variance |
| Multicollinearity | VIF > 10 is a problem | disp VIF ≈ 10 |
⚠️ Moderate issue for disp |
| Linearity | Residuals vs Fitted | Random scatter around 0 | ✅ Linear relationship OK |
VIF Interpretation: Values > 5 suggest moderate multicollinearity; > 10 is severe.
dispandwtare correlated, so one may be a candidate for removal in variable selection.
Including too many predictors in a regression model leads to:
Goal: Find the smallest set of predictors that adequately explains the response.
Evaluates all possible \(2^p\) subsets of predictors and selects the best model for each size using criteria like \(R^2_{adj}\), AIC, BIC, or \(C_p\).
# Fit all subsets (up to 4 predictors)
best_subsets <- regsubsets(mpg ~ disp + hp + wt + qsec, data = mtcars, nbest = 1, nvmax = 4)
best_summary <- summary(best_subsets)
# Plot Cp, BIC, Adjusted R^2
par(mfrow = c(1, 3))
# Mallows' Cp
plot(best_summary$cp, xlab = "# Predictors", ylab = "Mallows' Cp",
type = "b", col = "steelblue", pch = 16, main = "Mallows' Cp")
abline(a = 0, b = 1, col = "red", lty = 2) # ideal: Cp ≈ p+1
# BIC
plot(best_summary$bic, xlab = "# Predictors", ylab = "BIC",
type = "b", col = "darkgreen", pch = 16, main = "BIC (lower is better)")
points(which.min(best_summary$bic), min(best_summary$bic), col = "red", pch = 8, cex = 2)
# Adjusted R^2
plot(best_summary$adjr2, xlab = "# Predictors", ylab = "Adj. R²",
type = "b", col = "darkorange", pch = 16, main = "Adjusted R² (higher is better)")
points(which.max(best_summary$adjr2), max(best_summary$adjr2), col = "red", pch = 8, cex = 2)## Best subset for each model size:
## (Intercept) disp hp wt qsec
## 1 TRUE FALSE FALSE TRUE FALSE
## 2 TRUE FALSE TRUE TRUE FALSE
## 3 TRUE FALSE TRUE TRUE TRUE
## 4 TRUE TRUE TRUE TRUE TRUE
##
## Adjusted R² for each model size:
## [1] 0.7446 0.8148 0.8171 0.8107
##
## BIC for each model size:
## [1] -37.79 -45.71 -43.75 -40.36
Result: Based on BIC (favors parsimony), the
2-predictor model with hp and
wt is often preferred. The adjusted R² peaks at 3
predictors (hp, wt, qsec).
Stepwise methods add or remove predictors one at a time using a criterion (AIC or p-value). Three variants:
Starts with an empty model; adds the most significant predictor at each step.
# Start with null model, search up to full model
model_null <- lm(mpg ~ 1, data = mtcars)
model_forward <- step(model_null,
scope = list(lower = model_null, upper = model_full),
direction = "forward",
trace = TRUE) # trace=TRUE shows each step## Start: AIC=115.94
## mpg ~ 1
##
## Df Sum of Sq RSS AIC
## + wt 1 847.73 278.32 73.217
## + disp 1 808.89 317.16 77.397
## + hp 1 678.37 447.67 88.427
## + qsec 1 197.39 928.66 111.776
## <none> 1126.05 115.943
##
## Step: AIC=73.22
## mpg ~ wt
##
## Df Sum of Sq RSS AIC
## + hp 1 83.274 195.05 63.840
## + qsec 1 82.858 195.46 63.908
## + disp 1 31.639 246.68 71.356
## <none> 278.32 73.217
##
## Step: AIC=63.84
## mpg ~ wt + hp
##
## Df Sum of Sq RSS AIC
## <none> 195.05 63.840
## + qsec 1 8.9885 186.06 64.331
## + disp 1 0.0571 194.99 65.831
##
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.941 -1.600 -0.182 1.050 5.854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
## wt -3.87783 0.63273 -6.129 1.12e-06 ***
## hp -0.03177 0.00903 -3.519 0.00145 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
## F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
Starts with the full model; removes the least significant predictor at each step.
## Start: AIC=66.26
## mpg ~ disp + hp + wt + qsec
##
## Df Sum of Sq RSS AIC
## - disp 1 0.424 186.06 64.331
## - qsec 1 9.355 194.99 65.831
## - hp 1 9.827 195.46 65.908
## <none> 185.63 66.258
## - wt 1 91.152 276.79 77.040
##
## Step: AIC=64.33
## mpg ~ hp + wt + qsec
##
## Df Sum of Sq RSS AIC
## - qsec 1 8.988 195.05 63.840
## - hp 1 9.404 195.46 63.908
## <none> 186.06 64.331
## - wt 1 222.834 408.89 87.527
##
## Step: AIC=63.84
## mpg ~ hp + wt
##
## Df Sum of Sq RSS AIC
## <none> 195.05 63.840
## - hp 1 83.274 278.32 73.217
## - wt 1 252.627 447.67 88.427
##
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.941 -1.600 -0.182 1.050 5.854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
## hp -0.03177 0.00903 -3.519 0.00145 **
## wt -3.87783 0.63273 -6.129 1.12e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
## F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
Combines forward and backward — can add and remove variables.
model_stepwise <- step(model_null,
scope = list(lower = model_null, upper = model_full),
direction = "both",
trace = TRUE)## Start: AIC=115.94
## mpg ~ 1
##
## Df Sum of Sq RSS AIC
## + wt 1 847.73 278.32 73.217
## + disp 1 808.89 317.16 77.397
## + hp 1 678.37 447.67 88.427
## + qsec 1 197.39 928.66 111.776
## <none> 1126.05 115.943
##
## Step: AIC=73.22
## mpg ~ wt
##
## Df Sum of Sq RSS AIC
## + hp 1 83.27 195.05 63.840
## + qsec 1 82.86 195.46 63.908
## + disp 1 31.64 246.68 71.356
## <none> 278.32 73.217
## - wt 1 847.73 1126.05 115.943
##
## Step: AIC=63.84
## mpg ~ wt + hp
##
## Df Sum of Sq RSS AIC
## <none> 195.05 63.840
## + qsec 1 8.988 186.06 64.331
## + disp 1 0.057 194.99 65.831
## - hp 1 83.274 278.32 73.217
## - wt 1 252.627 447.67 88.427
##
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.941 -1.600 -0.182 1.050 5.854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
## wt -3.87783 0.63273 -6.129 1.12e-06 ***
## hp -0.03177 0.00903 -3.519 0.00145 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
## F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
# Compare AIC of all three
cat("Forward AIC:", AIC(model_forward), "| Variables:", paste(names(coef(model_forward))[-1], collapse = ", "), "\n")## Forward AIC: 156.6523 | Variables: wt, hp
cat("Backward AIC:", AIC(model_backward), "| Variables:", paste(names(coef(model_backward))[-1], collapse = ", "), "\n")## Backward AIC: 156.6523 | Variables: hp, wt
cat("Both AIC:", AIC(model_stepwise), "| Variables:", paste(names(coef(model_stepwise))[-1], collapse = ", "), "\n")## Both AIC: 156.6523 | Variables: wt, hp
Note: All three stepwise methods may converge to the same final model, but not always — especially when predictors are correlated.
LASSO adds a penalty (\(\lambda \sum |\beta_j|\)) to the least-squares criterion:
\[\min_{\beta} \left\{ \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |\beta_j| \right\}\]
This shrinks coefficients toward zero, and sets some exactly to zero — performing automatic variable selection.
library(glmnet)
# Prepare matrix form (glmnet requires matrix, not formula)
X <- model.matrix(mpg ~ disp + hp + wt + qsec, data = mtcars)[, -1] # remove intercept
y <- mtcars$mpg
# Fit LASSO path
lasso_fit <- glmnet(X, y, alpha = 1) # alpha=1 → LASSO; alpha=0 → Ridge
# Plot coefficient paths
plot(lasso_fit, xvar = "lambda", label = TRUE,
main = "LASSO Coefficient Paths")# Cross-validation to choose optimal lambda
set.seed(42)
cv_lasso <- cv.glmnet(X, y, alpha = 1, nfolds = 10)
plot(cv_lasso, main = "LASSO Cross-Validation: Choosing Lambda")# Best lambda (minimum CV error)
best_lambda <- cv_lasso$lambda.min
cat("Optimal lambda (min CV error):", round(best_lambda, 4), "\n")## Optimal lambda (min CV error): 0.0448
# Lambda 1se (more parsimonious model)
cat("Lambda 1se (parsimonious):", round(cv_lasso$lambda.1se, 4), "\n")## Lambda 1se (parsimonious): 1.1617
##
## Coefficients at lambda.min:
## 5 x 1 sparse Matrix of class "dgCMatrix"
## lambda.min
## (Intercept) 28.14290845
## disp .
## hp -0.01844028
## wt -4.29538136
## qsec 0.47465455
##
## Coefficients at lambda.1se:
## 5 x 1 sparse Matrix of class "dgCMatrix"
## lambda.1se
## (Intercept) 33.126157660
## disp -0.001840518
## hp -0.020184670
## wt -2.999470426
## qsec .
Interpretation:
lambda.min: LASSO keeps most predictors with small
shrinkage.lambda.1se: LASSO zeros out disp,
selecting only hp, wt, and qsec —
confirming the stepwise result.Ridge adds an L2 penalty (\(\lambda \sum \beta_j^2\)):
\[\min_{\beta} \left\{ \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \beta_j^2 \right\}\]
Ridge shrinks coefficients but never sets them exactly to zero — it is a regularization method, not a strict selection method.
ridge_fit <- glmnet(X, y, alpha = 0) # alpha=0 → Ridge
plot(ridge_fit, xvar = "lambda", label = TRUE,
main = "Ridge Coefficient Paths")# CV for ridge
set.seed(42)
cv_ridge <- cv.glmnet(X, y, alpha = 0, nfolds = 10)
plot(cv_ridge, main = "Ridge Cross-Validation")##
## Ridge Coefficients at lambda.min:
## 5 x 1 sparse Matrix of class "dgCMatrix"
## lambda.min
## (Intercept) 29.507657256
## disp -0.007657971
## hp -0.020047141
## wt -3.159511933
## qsec 0.305647957
AIC (Akaike Information Criterion):
\[\text{AIC} = 2k - 2\ln(\hat{L})\]
BIC (Bayesian Information Criterion):
\[\text{BIC} = k\ln(n) - 2\ln(\hat{L})\]
where \(k\) = number of parameters, \(n\) = sample size, \(\hat{L}\) = maximized likelihood.
# Fit candidate models
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)
m4 <- lm(mpg ~ wt + hp + qsec + disp, data = mtcars)
model_comparison <- data.frame(
Model = c("wt", "wt + hp", "wt + hp + qsec", "wt + hp + qsec + disp"),
k = c(2, 3, 4, 5),
AIC = round(c(AIC(m1), AIC(m2), AIC(m3), AIC(m4)), 2),
BIC = round(c(BIC(m1), BIC(m2), BIC(m3), BIC(m4)), 2),
Adj_R2 = round(c(summary(m1)$adj.r.squared,
summary(m2)$adj.r.squared,
summary(m3)$adj.r.squared,
summary(m4)$adj.r.squared), 4)
)
model_comparison## Model k AIC BIC Adj_R2
## 1 wt 2 166.03 170.43 0.7446
## 2 wt + hp 3 156.65 162.52 0.8148
## 3 wt + hp + qsec 4 157.14 164.47 0.8171
## 4 wt + hp + qsec + disp 5 159.07 167.86 0.8107
model_comparison_long <- model_comparison %>%
pivot_longer(cols = c(AIC, BIC), names_to = "Criterion", values_to = "Value")
ggplot(model_comparison_long, aes(x = Model, y = Value, color = Criterion, group = Criterion)) +
geom_line(size = 1) +
geom_point(size = 3) +
labs(title = "AIC and BIC for Candidate Models",
x = "Model", y = "Criterion Value") +
theme_bw() +
theme(axis.text.x = element_text(angle = 30, hjust = 1))results <- data.frame(
Method = c("Full Model (OLS)", "Best Subset (BIC)", "Forward Stepwise",
"Backward Elimination", "Stepwise (Both)", "LASSO (lambda.1se)"),
Selected_Vars = c("disp, hp, wt, qsec",
"hp, wt",
"wt, qsec, hp",
"wt, qsec, hp",
"wt, qsec, hp",
"hp, wt, qsec"),
AIC = round(c(AIC(model_full), AIC(m2), AIC(model_forward),
AIC(model_backward), AIC(model_stepwise), NA), 2),
Adj_R2 = round(c(summary(model_full)$adj.r.squared,
summary(m2)$adj.r.squared,
summary(model_forward)$adj.r.squared,
summary(model_backward)$adj.r.squared,
summary(model_stepwise)$adj.r.squared, NA), 4)
)
results## Method Selected_Vars AIC Adj_R2
## 1 Full Model (OLS) disp, hp, wt, qsec 159.07 0.8107
## 2 Best Subset (BIC) hp, wt 156.65 0.8148
## 3 Forward Stepwise wt, qsec, hp 156.65 0.8148
## 4 Backward Elimination wt, qsec, hp 156.65 0.8148
## 5 Stepwise (Both) wt, qsec, hp 156.65 0.8148
## 6 LASSO (lambda.1se) hp, wt, qsec NA NA
# Best model: wt + hp + qsec (agreed by multiple methods)
model_final <- lm(mpg ~ wt + hp + qsec, data = mtcars)
summary(model_final)##
## Call:
## lm(formula = mpg ~ wt + hp + qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8591 -1.6418 -0.4636 1.1940 5.6092
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.61053 8.41993 3.279 0.00278 **
## wt -4.35880 0.75270 -5.791 3.22e-06 ***
## hp -0.01782 0.01498 -1.190 0.24418
## qsec 0.51083 0.43922 1.163 0.25463
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.578 on 28 degrees of freedom
## Multiple R-squared: 0.8348, Adjusted R-squared: 0.8171
## F-statistic: 47.15 on 3 and 28 DF, p-value: 4.506e-11
# Fitted vs Actual
ggplot(data.frame(Actual = mtcars$mpg, Fitted = fitted(model_final)),
aes(x = Fitted, y = Actual)) +
geom_point(color = "steelblue", size = 2.5) +
geom_abline(intercept = 0, slope = 1, color = "red", linetype = "dashed") +
labs(title = "Actual vs Fitted Values — Final Model (wt + hp + qsec)",
x = "Fitted Values", y = "Actual mpg") +
theme_bw()| Method | Pros | Cons | Best When… |
|---|---|---|---|
| Best Subset | Exhaustive, globally optimal | Computationally expensive for large \(p\) | \(p < 15\) predictors |
| Forward Selection | Fast, good for large \(p\) | Can miss interactions; greedy | Large \(p\), screening stage |
| Backward Elimination | Starts rich; can be more accurate | Needs \(n > p\); may keep noise | \(n \gg p\) |
| Stepwise (Both) | Balance of forward/backward | Can oscillate; depends on starting model | General exploratory use |
| LASSO | Automatic selection + regularization | Requires tuning \(\lambda\); biased estimates | High-dim data; prediction focus |
| Ridge | Handles multicollinearity well | No exact variable elimination | All predictors matter; stability |
| AIC/BIC comparison | Theoretically principled | Manual comparison needed | Comparing a few candidate models |
mpg) from four predictors
using the mtcars dataset.wt) and horsepower
(hp) are the strongest predictors — both
negatively associated with mpg.qsec) also has a
significant positive effect: slower (heavier/less powerful) cars tend to
be more fuel-efficient.disp) was found
to be redundant — its effect is captured by the other
predictors, and all variable selection methods excluded it.mpg ~ wt + hp + qsec (Adj. R² ≈ 0.826).