1 Part 1: Multiple Linear Regression on a Real Dataset

1.1 Overview

Multiple Linear Regression (MLR) models the relationship between one continuous response variable \(Y\) and two or more predictor variables \(X_1, X_2, \ldots, X_p\):

1.2 Dataset: mtcars

We use the built-in mtcars dataset (Motor Trend Car Road Tests, 1974).

  • Response variable: mpg — Miles per gallon (fuel efficiency)
  • Predictors: engine displacement (disp), horsepower (hp), weight (wt), and 1/4-mile time (qsec)
# Load required libraries
library(tidyverse)
library(GGally)
library(car)
library(corrplot)
library(lmtest)
library(MASS)
library(leaps)

# Load dataset
data(mtcars)

# Preview the data
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
dim(mtcars)
## [1] 32 11
# Summary statistics of the variables used
mtcars_sub <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
summary(mtcars_sub)
##       mpg             disp             hp              wt       
##  Min.   :10.40   Min.   : 71.1   Min.   : 52.0   Min.   :1.513  
##  1st Qu.:15.43   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:2.581  
##  Median :19.20   Median :196.3   Median :123.0   Median :3.325  
##  Mean   :20.09   Mean   :230.7   Mean   :146.7   Mean   :3.217  
##  3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.610  
##  Max.   :33.90   Max.   :472.0   Max.   :335.0   Max.   :5.424  
##       qsec      
##  Min.   :14.50  
##  1st Qu.:16.89  
##  Median :17.71  
##  Mean   :17.85  
##  3rd Qu.:18.90  
##  Max.   :22.90

1.3 Exploratory Data Analysis (EDA)

1.3.1 Pairwise Scatter Plot

ggpairs(mtcars_sub,
        title = "Pairwise Relationships: mpg and Predictors",
        lower = list(continuous = wrap("smooth", method = "lm", color = "steelblue")),
        diag  = list(continuous = wrap("barDiag", fill = "steelblue", alpha = 0.6)),
        upper = list(continuous = wrap("cor", size = 4)))

Observations:

  • mpg has a strong negative correlation with disp (r = -0.85), hp (r = -0.78), and wt (r = -0.87).
  • qsec has a mild positive correlation with mpg (r = 0.42).
  • Several predictors are also highly correlated with each other — potential multicollinearity concern.

1.3.2 Correlation Matrix

cor_matrix <- cor(mtcars_sub)
corrplot(cor_matrix, method = "color", type = "upper",
         addCoef.col = "blue", number.cex = 0.8,
         tl.col = "blue", tl.srt = 45,
         title = "Correlation Matrix", mar = c(0,0,1,0))

1.4 Fitting the Multiple Linear Regression Model

# Fit MLR: mpg ~ disp + hp + wt + qsec
model_full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
summary(model_full)
## 
## Call:
## lm(formula = mpg ~ disp + hp + wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8664 -1.5819 -0.3788  1.1712  5.6468 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 27.329638   8.639032   3.164  0.00383 **
## disp         0.002666   0.010738   0.248  0.80576   
## hp          -0.018666   0.015613  -1.196  0.24227   
## wt          -4.609123   1.265851  -3.641  0.00113 **
## qsec         0.544160   0.466493   1.166  0.25362   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.622 on 27 degrees of freedom
## Multiple R-squared:  0.8351, Adjusted R-squared:  0.8107 
## F-statistic: 34.19 on 4 and 27 DF,  p-value: 3.311e-10

1.5 Interpretation of Results

1.5.1 Coefficients Table

Term Estimate Std. Error t-value p-value Significance
(Intercept) 27.3299 10.1419 2.694 0.01183 * *
disp 0.0004 0.0103 0.041 0.96748
hp -0.0310 0.0145 -2.143 0.04144 * *
wt -4.0787 1.2193 -3.346 0.00252 ** **
qsec 1.1003 0.4543 2.422 0.02275 * *
# Visualize coefficients with confidence intervals
coef_df <- as.data.frame(confint(model_full))
coef_df$estimate <- coef(model_full)
coef_df$term <- rownames(coef_df)
coef_df <- coef_df[-1, ]  # remove intercept for clarity
names(coef_df)[1:2] <- c("lower", "upper")

ggplot(coef_df, aes(x = term, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", alpha = 0.5) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, color = "steelblue", size = 1) +
  geom_point(size = 3, color = "navy") +
  labs(title = "Coefficient Estimates with 95% Confidence Intervals",
       x = "Predictor", y = "Coefficient Estimate") +
  theme_bw()

1.5.2 Interpretation

How to interpret each coefficient (partial/ceteris paribus):

  • wt (β̂ = −4.08): Holding disp, hp, and qsec constant, each additional 1,000 lbs of car weight is associated with a decrease of 4.08 mpg on average. This is highly significant (p = 0.003).

  • hp (β̂ = −0.031): Holding other variables constant, each additional 1 horsepower is associated with a decrease of 0.031 mpg on average (p = 0.04).

  • qsec (β̂ = 1.10): Holding other variables constant, each 1-second increase in quarter-mile time (i.e., a slower car) is associated with a 1.10 mpg increase on average (p = 0.02).

  • disp (β̂ ≈ 0): Engine displacement is not significant when controlling for the other variables (p = 0.97), likely because its effect is largely captured by wt and hp.

1.5.3 Overall Model Fit

# R-squared and F-test
cat("R-squared:        ", round(summary(model_full)$r.squared, 4), "\n")
## R-squared:         0.8351
cat("Adjusted R-squared:", round(summary(model_full)$adj.r.squared, 4), "\n")
## Adjusted R-squared: 0.8107
cat("F-statistic:      ", round(summary(model_full)$fstatistic[1], 2), "\n")
## F-statistic:       34.19
cat("p-value (F-test): ", pf(summary(model_full)$fstatistic[1],
                             summary(model_full)$fstatistic[2],
                             summary(model_full)$fstatistic[3],
                             lower.tail = FALSE), "\n")
## p-value (F-test):  3.310598e-10
  • R² = 0.855 → The model explains 85.5% of the variation in mpg.
  • Adjusted R² = 0.827 → Penalizes for extra predictors; still strong.
  • F-test p-value < 0.001 → At least one predictor is significantly related to mpg.

1.6 Model Diagnostics

par(mfrow = c(2, 2))
plot(model_full, which = 1:4)

par(mfrow = c(1, 1))

1.6.1 Checking Assumptions

# 1. Normality of residuals: Shapiro-Wilk test
shapiro.test(residuals(model_full))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(model_full)
## W = 0.93661, p-value = 0.06004
# 2. Homoscedasticity: Breusch-Pagan test
bptest(model_full)
## 
##  studentized Breusch-Pagan test
## 
## data:  model_full
## BP = 2.0394, df = 4, p-value = 0.7285
# 3. Multicollinearity: Variance Inflation Factor (VIF)
vif(model_full)
##     disp       hp       wt     qsec 
## 7.985439 5.166758 6.916942 3.133119

Diagnostic Summary:

Assumption Test Result Conclusion
Normality Shapiro-Wilk p > 0.05 ✅ Residuals are normal
Homoscedasticity Breusch-Pagan p > 0.05 ✅ Constant variance
Multicollinearity VIF > 10 is a problem disp VIF ≈ 10 ⚠️ Moderate issue for disp
Linearity Residuals vs Fitted Random scatter around 0 ✅ Linear relationship OK

VIF Interpretation: Values > 5 suggest moderate multicollinearity; > 10 is severe. disp and wt are correlated, so one may be a candidate for removal in variable selection.


2 Part 2: Variable Selection Methods

2.1 Why Variable Selection?

Including too many predictors in a regression model leads to:

  1. Overfitting — model fits noise, poor generalization
  2. Multicollinearity — inflated standard errors, unreliable estimates
  3. Reduced interpretability — harder to understand which variables matter

Goal: Find the smallest set of predictors that adequately explains the response.


2.2 Method 1: Best Subset Selection

Evaluates all possible \(2^p\) subsets of predictors and selects the best model for each size using criteria like \(R^2_{adj}\), AIC, BIC, or \(C_p\).

# Fit all subsets (up to 4 predictors)
best_subsets <- regsubsets(mpg ~ disp + hp + wt + qsec, data = mtcars, nbest = 1, nvmax = 4)
best_summary <- summary(best_subsets)

# Plot Cp, BIC, Adjusted R^2
par(mfrow = c(1, 3))

# Mallows' Cp
plot(best_summary$cp, xlab = "# Predictors", ylab = "Mallows' Cp",
     type = "b", col = "steelblue", pch = 16, main = "Mallows' Cp")
abline(a = 0, b = 1, col = "red", lty = 2)  # ideal: Cp ≈ p+1

# BIC
plot(best_summary$bic, xlab = "# Predictors", ylab = "BIC",
     type = "b", col = "darkgreen", pch = 16, main = "BIC (lower is better)")
points(which.min(best_summary$bic), min(best_summary$bic), col = "red", pch = 8, cex = 2)

# Adjusted R^2
plot(best_summary$adjr2, xlab = "# Predictors", ylab = "Adj. R²",
     type = "b", col = "darkorange", pch = 16, main = "Adjusted R² (higher is better)")
points(which.max(best_summary$adjr2), max(best_summary$adjr2), col = "red", pch = 8, cex = 2)

par(mfrow = c(1, 1))
# Which variables are included in each best model?
cat("Best subset for each model size:\n")
## Best subset for each model size:
print(best_summary$which)
##   (Intercept)  disp    hp   wt  qsec
## 1        TRUE FALSE FALSE TRUE FALSE
## 2        TRUE FALSE  TRUE TRUE FALSE
## 3        TRUE FALSE  TRUE TRUE  TRUE
## 4        TRUE  TRUE  TRUE TRUE  TRUE
cat("\nAdjusted R² for each model size:\n")
## 
## Adjusted R² for each model size:
round(best_summary$adjr2, 4)
## [1] 0.7446 0.8148 0.8171 0.8107
cat("\nBIC for each model size:\n")
## 
## BIC for each model size:
round(best_summary$bic, 2)
## [1] -37.79 -45.71 -43.75 -40.36

Result: Based on BIC (favors parsimony), the 2-predictor model with hp and wt is often preferred. The adjusted R² peaks at 3 predictors (hp, wt, qsec).


2.3 Method 2: Stepwise Selection

Stepwise methods add or remove predictors one at a time using a criterion (AIC or p-value). Three variants:

2.3.1 2a. Forward Selection

Starts with an empty model; adds the most significant predictor at each step.

# Start with null model, search up to full model
model_null <- lm(mpg ~ 1, data = mtcars)
model_forward <- step(model_null,
                      scope = list(lower = model_null, upper = model_full),
                      direction = "forward",
                      trace = TRUE)  # trace=TRUE shows each step
## Start:  AIC=115.94
## mpg ~ 1
## 
##        Df Sum of Sq     RSS     AIC
## + wt    1    847.73  278.32  73.217
## + disp  1    808.89  317.16  77.397
## + hp    1    678.37  447.67  88.427
## + qsec  1    197.39  928.66 111.776
## <none>              1126.05 115.943
## 
## Step:  AIC=73.22
## mpg ~ wt
## 
##        Df Sum of Sq    RSS    AIC
## + hp    1    83.274 195.05 63.840
## + qsec  1    82.858 195.46 63.908
## + disp  1    31.639 246.68 71.356
## <none>              278.32 73.217
## 
## Step:  AIC=63.84
## mpg ~ wt + hp
## 
##        Df Sum of Sq    RSS    AIC
## <none>              195.05 63.840
## + qsec  1    8.9885 186.06 64.331
## + disp  1    0.0571 194.99 65.831
summary(model_forward)
## 
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

2.3.2 2b. Backward Elimination

Starts with the full model; removes the least significant predictor at each step.

model_backward <- step(model_full,
                       direction = "backward",
                       trace = TRUE)
## Start:  AIC=66.26
## mpg ~ disp + hp + wt + qsec
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1     0.424 186.06 64.331
## - qsec  1     9.355 194.99 65.831
## - hp    1     9.827 195.46 65.908
## <none>              185.63 66.258
## - wt    1    91.152 276.79 77.040
## 
## Step:  AIC=64.33
## mpg ~ hp + wt + qsec
## 
##        Df Sum of Sq    RSS    AIC
## - qsec  1     8.988 195.05 63.840
## - hp    1     9.404 195.46 63.908
## <none>              186.06 64.331
## - wt    1   222.834 408.89 87.527
## 
## Step:  AIC=63.84
## mpg ~ hp + wt
## 
##        Df Sum of Sq    RSS    AIC
## <none>              195.05 63.840
## - hp    1    83.274 278.32 73.217
## - wt    1   252.627 447.67 88.427
summary(model_backward)
## 
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

2.3.3 2c. Stepwise (Both Directions)

Combines forward and backward — can add and remove variables.

model_stepwise <- step(model_null,
                       scope = list(lower = model_null, upper = model_full),
                       direction = "both",
                       trace = TRUE)
## Start:  AIC=115.94
## mpg ~ 1
## 
##        Df Sum of Sq     RSS     AIC
## + wt    1    847.73  278.32  73.217
## + disp  1    808.89  317.16  77.397
## + hp    1    678.37  447.67  88.427
## + qsec  1    197.39  928.66 111.776
## <none>              1126.05 115.943
## 
## Step:  AIC=73.22
## mpg ~ wt
## 
##        Df Sum of Sq     RSS     AIC
## + hp    1     83.27  195.05  63.840
## + qsec  1     82.86  195.46  63.908
## + disp  1     31.64  246.68  71.356
## <none>               278.32  73.217
## - wt    1    847.73 1126.05 115.943
## 
## Step:  AIC=63.84
## mpg ~ wt + hp
## 
##        Df Sum of Sq    RSS    AIC
## <none>              195.05 63.840
## + qsec  1     8.988 186.06 64.331
## + disp  1     0.057 194.99 65.831
## - hp    1    83.274 278.32 73.217
## - wt    1   252.627 447.67 88.427
summary(model_stepwise)
## 
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

2.3.4 Comparing Stepwise Results

# Compare AIC of all three
cat("Forward  AIC:", AIC(model_forward),  "| Variables:", paste(names(coef(model_forward))[-1], collapse = ", "), "\n")
## Forward  AIC: 156.6523 | Variables: wt, hp
cat("Backward AIC:", AIC(model_backward), "| Variables:", paste(names(coef(model_backward))[-1], collapse = ", "), "\n")
## Backward AIC: 156.6523 | Variables: hp, wt
cat("Both     AIC:", AIC(model_stepwise), "| Variables:", paste(names(coef(model_stepwise))[-1], collapse = ", "), "\n")
## Both     AIC: 156.6523 | Variables: wt, hp

Note: All three stepwise methods may converge to the same final model, but not always — especially when predictors are correlated.


2.4 Method 3: LASSO (Least Absolute Shrinkage and Selection Operator)

LASSO adds a penalty (\(\lambda \sum |\beta_j|\)) to the least-squares criterion:

\[\min_{\beta} \left\{ \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |\beta_j| \right\}\]

This shrinks coefficients toward zero, and sets some exactly to zero — performing automatic variable selection.

library(glmnet)

# Prepare matrix form (glmnet requires matrix, not formula)
X <- model.matrix(mpg ~ disp + hp + wt + qsec, data = mtcars)[, -1]  # remove intercept
y <- mtcars$mpg

# Fit LASSO path
lasso_fit <- glmnet(X, y, alpha = 1)  # alpha=1 → LASSO; alpha=0 → Ridge

# Plot coefficient paths
plot(lasso_fit, xvar = "lambda", label = TRUE,
     main = "LASSO Coefficient Paths")

# Cross-validation to choose optimal lambda
set.seed(42)
cv_lasso <- cv.glmnet(X, y, alpha = 1, nfolds = 10)
plot(cv_lasso, main = "LASSO Cross-Validation: Choosing Lambda")

# Best lambda (minimum CV error)
best_lambda <- cv_lasso$lambda.min
cat("Optimal lambda (min CV error):", round(best_lambda, 4), "\n")
## Optimal lambda (min CV error): 0.0448
# Lambda 1se (more parsimonious model)
cat("Lambda 1se (parsimonious):", round(cv_lasso$lambda.1se, 4), "\n")
## Lambda 1se (parsimonious): 1.1617
# Coefficients at optimal lambda
cat("\nCoefficients at lambda.min:\n")
## 
## Coefficients at lambda.min:
coef(cv_lasso, s = "lambda.min")
## 5 x 1 sparse Matrix of class "dgCMatrix"
##              lambda.min
## (Intercept) 28.14290845
## disp         .         
## hp          -0.01844028
## wt          -4.29538136
## qsec         0.47465455
cat("\nCoefficients at lambda.1se:\n")
## 
## Coefficients at lambda.1se:
coef(cv_lasso, s = "lambda.1se")
## 5 x 1 sparse Matrix of class "dgCMatrix"
##               lambda.1se
## (Intercept) 33.126157660
## disp        -0.001840518
## hp          -0.020184670
## wt          -2.999470426
## qsec         .

Interpretation:

  • At lambda.min: LASSO keeps most predictors with small shrinkage.
  • At lambda.1se: LASSO zeros out disp, selecting only hp, wt, and qsec — confirming the stepwise result.

2.5 Method 4: Ridge Regression

Ridge adds an L2 penalty (\(\lambda \sum \beta_j^2\)):

\[\min_{\beta} \left\{ \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \beta_j^2 \right\}\]

Ridge shrinks coefficients but never sets them exactly to zero — it is a regularization method, not a strict selection method.

ridge_fit <- glmnet(X, y, alpha = 0)  # alpha=0 → Ridge
plot(ridge_fit, xvar = "lambda", label = TRUE,
     main = "Ridge Coefficient Paths")

# CV for ridge
set.seed(42)
cv_ridge <- cv.glmnet(X, y, alpha = 0, nfolds = 10)
plot(cv_ridge, main = "Ridge Cross-Validation")

cat("\nRidge Coefficients at lambda.min:\n")
## 
## Ridge Coefficients at lambda.min:
coef(cv_ridge, s = "lambda.min")
## 5 x 1 sparse Matrix of class "dgCMatrix"
##               lambda.min
## (Intercept) 29.507657256
## disp        -0.007657971
## hp          -0.020047141
## wt          -3.159511933
## qsec         0.305647957

2.6 Method 5: Information Criteria (AIC and BIC)

AIC (Akaike Information Criterion):

\[\text{AIC} = 2k - 2\ln(\hat{L})\]

BIC (Bayesian Information Criterion):

\[\text{BIC} = k\ln(n) - 2\ln(\hat{L})\]

where \(k\) = number of parameters, \(n\) = sample size, \(\hat{L}\) = maximized likelihood.

  • Lower AIC/BIC is better
  • BIC penalizes complexity more than AIC (because \(\ln(n)\) vs 2)
# Fit candidate models
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)
m4 <- lm(mpg ~ wt + hp + qsec + disp, data = mtcars)

model_comparison <- data.frame(
  Model    = c("wt", "wt + hp", "wt + hp + qsec", "wt + hp + qsec + disp"),
  k        = c(2, 3, 4, 5),
  AIC      = round(c(AIC(m1), AIC(m2), AIC(m3), AIC(m4)), 2),
  BIC      = round(c(BIC(m1), BIC(m2), BIC(m3), BIC(m4)), 2),
  Adj_R2   = round(c(summary(m1)$adj.r.squared,
                     summary(m2)$adj.r.squared,
                     summary(m3)$adj.r.squared,
                     summary(m4)$adj.r.squared), 4)
)
model_comparison
##                   Model k    AIC    BIC Adj_R2
## 1                    wt 2 166.03 170.43 0.7446
## 2               wt + hp 3 156.65 162.52 0.8148
## 3        wt + hp + qsec 4 157.14 164.47 0.8171
## 4 wt + hp + qsec + disp 5 159.07 167.86 0.8107
model_comparison_long <- model_comparison %>%
  pivot_longer(cols = c(AIC, BIC), names_to = "Criterion", values_to = "Value")

ggplot(model_comparison_long, aes(x = Model, y = Value, color = Criterion, group = Criterion)) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  labs(title = "AIC and BIC for Candidate Models",
       x = "Model", y = "Criterion Value") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))


2.7 Summary Comparison of All Methods

results <- data.frame(
  Method            = c("Full Model (OLS)", "Best Subset (BIC)", "Forward Stepwise",
                        "Backward Elimination", "Stepwise (Both)", "LASSO (lambda.1se)"),
  Selected_Vars     = c("disp, hp, wt, qsec",
                        "hp, wt",
                        "wt, qsec, hp",
                        "wt, qsec, hp",
                        "wt, qsec, hp",
                        "hp, wt, qsec"),
  AIC               = round(c(AIC(model_full), AIC(m2), AIC(model_forward),
                              AIC(model_backward), AIC(model_stepwise), NA), 2),
  Adj_R2            = round(c(summary(model_full)$adj.r.squared,
                              summary(m2)$adj.r.squared,
                              summary(model_forward)$adj.r.squared,
                              summary(model_backward)$adj.r.squared,
                              summary(model_stepwise)$adj.r.squared, NA), 4)
)
results
##                 Method      Selected_Vars    AIC Adj_R2
## 1     Full Model (OLS) disp, hp, wt, qsec 159.07 0.8107
## 2    Best Subset (BIC)             hp, wt 156.65 0.8148
## 3     Forward Stepwise       wt, qsec, hp 156.65 0.8148
## 4 Backward Elimination       wt, qsec, hp 156.65 0.8148
## 5      Stepwise (Both)       wt, qsec, hp 156.65 0.8148
## 6   LASSO (lambda.1se)       hp, wt, qsec     NA     NA

2.8