1 Part 1: Multiple Linear Regression on a Real Dataset

1.1 Overview

Multiple Linear Regression (MLR) models the relationship between one continuous response variable \(Y\) and two or more predictor variables \(X_1, X_2, \ldots, X_p\):

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon\]

where:

\(\beta_0\) = intercept
\(\beta_1, \ldots, \beta_p\) = partial regression coefficients
\(\varepsilon \sim N(0, \sigma^2)\) = random error

1.2 Dataset: `mtcars`

We use the built-in mtcars dataset (Motor Trend Car Road Tests, 1974).

Response variable: mpg — Miles per gallon (fuel efficiency)
Predictors: engine displacement (disp), horsepower (hp), weight (wt), and 1/4-mile time (qsec)

# Load required libraries
library(tidyverse)
library(GGally)
library(car)
library(corrplot)
library(lmtest)
library(MASS)
library(leaps)

# Load dataset
data(mtcars)

# Preview the data
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

dim(mtcars)

## [1] 32 11

# Summary statistics of the variables used
mtcars_sub <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
summary(mtcars_sub)

##       mpg             disp             hp              wt       
##  Min.   :10.40   Min.   : 71.1   Min.   : 52.0   Min.   :1.513  
##  1st Qu.:15.43   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:2.581  
##  Median :19.20   Median :196.3   Median :123.0   Median :3.325  
##  Mean   :20.09   Mean   :230.7   Mean   :146.7   Mean   :3.217  
##  3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.610  
##  Max.   :33.90   Max.   :472.0   Max.   :335.0   Max.   :5.424  
##       qsec      
##  Min.   :14.50  
##  1st Qu.:16.89  
##  Median :17.71  
##  Mean   :17.85  
##  3rd Qu.:18.90  
##  Max.   :22.90

1.3 Exploratory Data Analysis (EDA)

1.3.1 Pairwise Scatter Plot

ggpairs(mtcars_sub,
        title = "Pairwise Relationships: mpg and Predictors",
        lower = list(continuous = wrap("smooth", method = "lm", color = "steelblue")),
        diag  = list(continuous = wrap("barDiag", fill = "steelblue", alpha = 0.6)),
        upper = list(continuous = wrap("cor", size = 4)))

Observations:

mpg has a strong negative correlation with disp (r = -0.85), hp (r = -0.78), and wt (r = -0.87).
qsec has a mild positive correlation with mpg (r = 0.42).
Several predictors are also highly correlated with each other — potential multicollinearity concern.

1.3.2 Correlation Matrix

cor_matrix <- cor(mtcars_sub)
corrplot(cor_matrix, method = "color", type = "upper",
         addCoef.col = "black", number.cex = 0.8,
         tl.col = "black", tl.srt = 45,
         title = "Correlation Matrix", mar = c(0,0,1,0))

1.4 Fitting the Multiple Linear Regression Model

# Fit MLR: mpg ~ disp + hp + wt + qsec
model_full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
summary(model_full)

## 
## Call:
## lm(formula = mpg ~ disp + hp + wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8664 -1.5819 -0.3788  1.1712  5.6468 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 27.329638   8.639032   3.164  0.00383 **
## disp         0.002666   0.010738   0.248  0.80576   
## hp          -0.018666   0.015613  -1.196  0.24227   
## wt          -4.609123   1.265851  -3.641  0.00113 **
## qsec         0.544160   0.466493   1.166  0.25362   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.622 on 27 degrees of freedom
## Multiple R-squared:  0.8351, Adjusted R-squared:  0.8107 
## F-statistic: 34.19 on 4 and 27 DF,  p-value: 3.311e-10

1.5 Interpretation of Results

1.5.1 Coefficients Table

Term	Estimate	Std. Error	t-value	p-value	Significance
(Intercept)	27.3299	10.1419	2.694	0.01183 *	*
disp	0.0004	0.0103	0.041	0.96748
hp	-0.0310	0.0145	-2.143	0.04144 *	*
wt	-4.0787	1.2193	-3.346	0.00252 **	**
qsec	1.1003	0.4543	2.422	0.02275 *	*

# Visualize coefficients with confidence intervals
coef_df <- as.data.frame(confint(model_full))
coef_df$estimate <- coef(model_full)
coef_df$term <- rownames(coef_df)
coef_df <- coef_df[-1, ]  # remove intercept for clarity
names(coef_df)[1:2] <- c("lower", "upper")

ggplot(coef_df, aes(x = term, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", alpha = 0.5) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, color = "steelblue", size = 1) +
  geom_point(size = 3, color = "navy") +
  labs(title = "Coefficient Estimates with 95% Confidence Intervals",
       x = "Predictor", y = "Coefficient Estimate") +
  theme_bw()

1.5.2 Interpretation

How to interpret each coefficient (partial/ceteris paribus):

wt (β̂ = −4.08): Holding disp, hp, and qsec constant, each additional 1,000 lbs of car weight is associated with a decrease of 4.08 mpg on average. This is highly significant (p = 0.003).
hp (β̂ = −0.031): Holding other variables constant, each additional 1 horsepower is associated with a decrease of 0.031 mpg on average (p = 0.04).
qsec (β̂ = 1.10): Holding other variables constant, each 1-second increase in quarter-mile time (i.e., a slower car) is associated with a 1.10 mpg increase on average (p = 0.02).
disp (β̂ ≈ 0): Engine displacement is not significant when controlling for the other variables (p = 0.97), likely because its effect is largely captured by wt and hp.

1.5.3 Overall Model Fit

# R-squared and F-test
cat("R-squared:        ", round(summary(model_full)$r.squared, 4), "\n")

## R-squared:         0.8351

cat("Adjusted R-squared:", round(summary(model_full)$adj.r.squared, 4), "\n")

## Adjusted R-squared: 0.8107

cat("F-statistic:      ", round(summary(model_full)$fstatistic[1], 2), "\n")

## F-statistic:       34.19

cat("p-value (F-test): ", pf(summary(model_full)$fstatistic[1],
                             summary(model_full)$fstatistic[2],
                             summary(model_full)$fstatistic[3],
                             lower.tail = FALSE), "\n")

## p-value (F-test):  3.310598e-10

R² = 0.855 → The model explains 85.5% of the variation in mpg.
Adjusted R² = 0.827 → Penalizes for extra predictors; still strong.
F-test p-value < 0.001 → At least one predictor is significantly related to mpg.

1.6 Model Diagnostics

par(mfrow = c(2, 2))
plot(model_full, which = 1:4)

par(mfrow = c(1, 1))

1.6.1 Checking Assumptions

# 1. Normality of residuals: Shapiro-Wilk test
shapiro.test(residuals(model_full))

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(model_full)
## W = 0.93661, p-value = 0.06004

# 2. Homoscedasticity: Breusch-Pagan test
bptest(model_full)

## 
##  studentized Breusch-Pagan test
## 
## data:  model_full
## BP = 2.0394, df = 4, p-value = 0.7285

# 3. Multicollinearity: Variance Inflation Factor (VIF)
vif(model_full)

##     disp       hp       wt     qsec 
## 7.985439 5.166758 6.916942 3.133119

Diagnostic Summary:

Assumption	Test	Result	Conclusion
Normality	Shapiro-Wilk	p > 0.05	✅ Residuals are normal
Homoscedasticity	Breusch-Pagan	p > 0.05	✅ Constant variance
Multicollinearity	VIF > 10 is a problem	`disp` VIF ≈ 10	⚠️ Moderate issue for `disp`
Linearity	Residuals vs Fitted	Random scatter around 0	✅ Linear relationship OK

VIF Interpretation: Values > 5 suggest moderate multicollinearity; > 10 is severe. disp and wt are correlated, so one may be a candidate for removal in variable selection.

2 Part 2: Variable Selection Methods

2.1 Why Variable Selection?

Including too many predictors in a regression model leads to:

Overfitting — model fits noise, poor generalization
Multicollinearity — inflated standard errors, unreliable estimates
Reduced interpretability — harder to understand which variables matter

Goal: Find the smallest set of predictors that adequately explains the response.

2.2 Method 1: Best Subset Selection

Evaluates all possible \(2^p\) subsets of predictors and selects the best model for each size using criteria like \(R^2_{adj}\), AIC, BIC, or \(C_p\).

# Fit all subsets (up to 4 predictors)
best_subsets <- regsubsets(mpg ~ disp + hp + wt + qsec, data = mtcars, nbest = 1, nvmax = 4)
best_summary <- summary(best_subsets)

# Plot Cp, BIC, Adjusted R^2
par(mfrow = c(1, 3))

# Mallows' Cp
plot(best_summary$cp, xlab = "# Predictors", ylab = "Mallows' Cp",
     type = "b", col = "steelblue", pch = 16, main = "Mallows' Cp")
abline(a = 0, b = 1, col = "red", lty = 2)  # ideal: Cp ≈ p+1

# BIC
plot(best_summary$bic, xlab = "# Predictors", ylab = "BIC",
     type = "b", col = "darkgreen", pch = 16, main = "BIC (lower is better)")
points(which.min(best_summary$bic), min(best_summary$bic), col = "red", pch = 8, cex = 2)

# Adjusted R^2
plot(best_summary$adjr2, xlab = "# Predictors", ylab = "Adj. R²",
     type = "b", col = "darkorange", pch = 16, main = "Adjusted R² (higher is better)")
points(which.max(best_summary$adjr2), max(best_summary$adjr2), col = "red", pch = 8, cex = 2)

par(mfrow = c(1, 1))

# Which variables are included in each best model?
cat("Best subset for each model size:\n")

## Best subset for each model size:

print(best_summary$which)

##   (Intercept)  disp    hp   wt  qsec
## 1        TRUE FALSE FALSE TRUE FALSE
## 2        TRUE FALSE  TRUE TRUE FALSE
## 3        TRUE FALSE  TRUE TRUE  TRUE
## 4        TRUE  TRUE  TRUE TRUE  TRUE

cat("\nAdjusted R² for each model size:\n")

## 
## Adjusted R² for each model size:

round(best_summary$adjr2, 4)

## [1] 0.7446 0.8148 0.8171 0.8107

cat("\nBIC for each model size:\n")

## 
## BIC for each model size:

round(best_summary$bic, 2)

## [1] -37.79 -45.71 -43.75 -40.36

Result: Based on BIC (favors parsimony), the 2-predictor model with hp and wt is often preferred. The adjusted R² peaks at 3 predictors (hp, wt, qsec).

2.3 Method 2: Stepwise Selection

Stepwise methods add or remove predictors one at a time using a criterion (AIC or p-value). Three variants:

2.3.1 2a. Forward Selection

Starts with an empty model; adds the most significant predictor at each step.

# Start with null model, search up to full model
model_null <- lm(mpg ~ 1, data = mtcars)
model_forward <- step(model_null,
                      scope = list(lower = model_null, upper = model_full),
                      direction = "forward",
                      trace = TRUE)  # trace=TRUE shows each step

## Start:  AIC=115.94
## mpg ~ 1
## 
##        Df Sum of Sq     RSS     AIC
## + wt    1    847.73  278.32  73.217
## + disp  1    808.89  317.16  77.397
## + hp    1    678.37  447.67  88.427
## + qsec  1    197.39  928.66 111.776
## <none>              1126.05 115.943
## 
## Step:  AIC=73.22
## mpg ~ wt
## 
##        Df Sum of Sq    RSS    AIC
## + hp    1    83.274 195.05 63.840
## + qsec  1    82.858 195.46 63.908
## + disp  1    31.639 246.68 71.356
## <none>              278.32 73.217
## 
## Step:  AIC=63.84
## mpg ~ wt + hp
## 
##        Df Sum of Sq    RSS    AIC
## <none>              195.05 63.840
## + qsec  1    8.9885 186.06 64.331
## + disp  1    0.0571 194.99 65.831

summary(model_forward)

## 
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

2.3.2 2b. Backward Elimination

Starts with the full model; removes the least significant predictor at each step.

model_backward <- step(model_full,
                       direction = "backward",
                       trace = TRUE)

## Start:  AIC=66.26
## mpg ~ disp + hp + wt + qsec
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1     0.424 186.06 64.331
## - qsec  1     9.355 194.99 65.831
## - hp    1     9.827 195.46 65.908
## <none>              185.63 66.258
## - wt    1    91.152 276.79 77.040
## 
## Step:  AIC=64.33
## mpg ~ hp + wt + qsec
## 
##        Df Sum of Sq    RSS    AIC
## - qsec  1     8.988 195.05 63.840
## - hp    1     9.404 195.46 63.908
## <none>              186.06 64.331
## - wt    1   222.834 408.89 87.527
## 
## Step:  AIC=63.84
## mpg ~ hp + wt
## 
##        Df Sum of Sq    RSS    AIC
## <none>              195.05 63.840
## - hp    1    83.274 278.32 73.217
## - wt    1   252.627 447.67 88.427

summary(model_backward)

## 
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

2.3.3 2c. Stepwise (Both Directions)

Combines forward and backward — can add and remove variables.

model_stepwise <- step(model_null,
                       scope = list(lower = model_null, upper = model_full),
                       direction = "both",
                       trace = TRUE)

## Start:  AIC=115.94
## mpg ~ 1
## 
##        Df Sum of Sq     RSS     AIC
## + wt    1    847.73  278.32  73.217
## + disp  1    808.89  317.16  77.397
## + hp    1    678.37  447.67  88.427
## + qsec  1    197.39  928.66 111.776
## <none>              1126.05 115.943
## 
## Step:  AIC=73.22
## mpg ~ wt
## 
##        Df Sum of Sq     RSS     AIC
## + hp    1     83.27  195.05  63.840
## + qsec  1     82.86  195.46  63.908
## + disp  1     31.64  246.68  71.356
## <none>               278.32  73.217
## - wt    1    847.73 1126.05 115.943
## 
## Step:  AIC=63.84
## mpg ~ wt + hp
## 
##        Df Sum of Sq    RSS    AIC
## <none>              195.05 63.840
## + qsec  1     8.988 186.06 64.331
## + disp  1     0.057 194.99 65.831
## - hp    1    83.274 278.32 73.217
## - wt    1   252.627 447.67 88.427

summary(model_stepwise)

## 
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

2.3.4 Comparing Stepwise Results

# Compare AIC of all three
cat("Forward  AIC:", AIC(model_forward),  "| Variables:", paste(names(coef(model_forward))[-1], collapse = ", "), "\n")

## Forward  AIC: 156.6523 | Variables: wt, hp

cat("Backward AIC:", AIC(model_backward), "| Variables:", paste(names(coef(model_backward))[-1], collapse = ", "), "\n")

## Backward AIC: 156.6523 | Variables: hp, wt

cat("Both     AIC:", AIC(model_stepwise), "| Variables:", paste(names(coef(model_stepwise))[-1], collapse = ", "), "\n")

## Both     AIC: 156.6523 | Variables: wt, hp

Note: All three stepwise methods may converge to the same final model, but not always — especially when predictors are correlated.

2.4 Method 3: LASSO (Least Absolute Shrinkage and Selection Operator)

LASSO adds a penalty (\(\lambda \sum |\beta_j|\)) to the least-squares criterion:

\[\min_{\beta} \left\{ \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |\beta_j| \right\}\]

This shrinks coefficients toward zero, and sets some exactly to zero — performing automatic variable selection.

library(glmnet)

# Prepare matrix form (glmnet requires matrix, not formula)
X <- model.matrix(mpg ~ disp + hp + wt + qsec, data = mtcars)[, -1]  # remove intercept
y <- mtcars$mpg

# Fit LASSO path
lasso_fit <- glmnet(X, y, alpha = 1)  # alpha=1 → LASSO; alpha=0 → Ridge

# Plot coefficient paths
plot(lasso_fit, xvar = "lambda", label = TRUE,
     main = "LASSO Coefficient Paths")

# Cross-validation to choose optimal lambda
set.seed(42)
cv_lasso <- cv.glmnet(X, y, alpha = 1, nfolds = 10)
plot(cv_lasso, main = "LASSO Cross-Validation: Choosing Lambda")

# Best lambda (minimum CV error)
best_lambda <- cv_lasso$lambda.min
cat("Optimal lambda (min CV error):", round(best_lambda, 4), "\n")

## Optimal lambda (min CV error): 0.0448

# Lambda 1se (more parsimonious model)
cat("Lambda 1se (parsimonious):", round(cv_lasso$lambda.1se, 4), "\n")

## Lambda 1se (parsimonious): 1.1617

# Coefficients at optimal lambda
cat("\nCoefficients at lambda.min:\n")

## 
## Coefficients at lambda.min:

coef(cv_lasso, s = "lambda.min")

## 5 x 1 sparse Matrix of class "dgCMatrix"
##              lambda.min
## (Intercept) 28.14290845
## disp         .         
## hp          -0.01844028
## wt          -4.29538136
## qsec         0.47465455

cat("\nCoefficients at lambda.1se:\n")

## 
## Coefficients at lambda.1se:

coef(cv_lasso, s = "lambda.1se")

## 5 x 1 sparse Matrix of class "dgCMatrix"
##               lambda.1se
## (Intercept) 33.126157660
## disp        -0.001840518
## hp          -0.020184670
## wt          -2.999470426
## qsec         .

Interpretation:

At lambda.min: LASSO keeps most predictors with small shrinkage.
At lambda.1se: LASSO zeros out disp, selecting only hp, wt, and qsec — confirming the stepwise result.

2.5 Method 4: Ridge Regression

Ridge adds an L2 penalty (\(\lambda \sum \beta_j^2\)):

\[\min_{\beta} \left\{ \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \beta_j^2 \right\}\]

Ridge shrinks coefficients but never sets them exactly to zero — it is a regularization method, not a strict selection method.

ridge_fit <- glmnet(X, y, alpha = 0)  # alpha=0 → Ridge
plot(ridge_fit, xvar = "lambda", label = TRUE,
     main = "Ridge Coefficient Paths")

# CV for ridge
set.seed(42)
cv_ridge <- cv.glmnet(X, y, alpha = 0, nfolds = 10)
plot(cv_ridge, main = "Ridge Cross-Validation")

cat("\nRidge Coefficients at lambda.min:\n")

## 
## Ridge Coefficients at lambda.min:

coef(cv_ridge, s = "lambda.min")

## 5 x 1 sparse Matrix of class "dgCMatrix"
##               lambda.min
## (Intercept) 29.507657256
## disp        -0.007657971
## hp          -0.020047141
## wt          -3.159511933
## qsec         0.305647957

2.6 Method 5: Information Criteria (AIC and BIC)

AIC (Akaike Information Criterion):

\[\text{AIC} = 2k - 2\ln(\hat{L})\]

BIC (Bayesian Information Criterion):

\[\text{BIC} = k\ln(n) - 2\ln(\hat{L})\]

where \(k\) = number of parameters, \(n\) = sample size, \(\hat{L}\) = maximized likelihood.

Lower AIC/BIC is better
BIC penalizes complexity more than AIC (because \(\ln(n)\) vs 2)

# Fit candidate models
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)
m4 <- lm(mpg ~ wt + hp + qsec + disp, data = mtcars)

model_comparison <- data.frame(
  Model    = c("wt", "wt + hp", "wt + hp + qsec", "wt + hp + qsec + disp"),
  k        = c(2, 3, 4, 5),
  AIC      = round(c(AIC(m1), AIC(m2), AIC(m3), AIC(m4)), 2),
  BIC      = round(c(BIC(m1), BIC(m2), BIC(m3), BIC(m4)), 2),
  Adj_R2   = round(c(summary(m1)$adj.r.squared,
                     summary(m2)$adj.r.squared,
                     summary(m3)$adj.r.squared,
                     summary(m4)$adj.r.squared), 4)
)
model_comparison

##                   Model k    AIC    BIC Adj_R2
## 1                    wt 2 166.03 170.43 0.7446
## 2               wt + hp 3 156.65 162.52 0.8148
## 3        wt + hp + qsec 4 157.14 164.47 0.8171
## 4 wt + hp + qsec + disp 5 159.07 167.86 0.8107

model_comparison_long <- model_comparison %>%
  pivot_longer(cols = c(AIC, BIC), names_to = "Criterion", values_to = "Value")

ggplot(model_comparison_long, aes(x = Model, y = Value, color = Criterion, group = Criterion)) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  labs(title = "AIC and BIC for Candidate Models",
       x = "Model", y = "Criterion Value") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

2.7 Summary Comparison of All Methods

results <- data.frame(
  Method            = c("Full Model (OLS)", "Best Subset (BIC)", "Forward Stepwise",
                        "Backward Elimination", "Stepwise (Both)", "LASSO (lambda.1se)"),
  Selected_Vars     = c("disp, hp, wt, qsec",
                        "hp, wt",
                        "wt, qsec, hp",
                        "wt, qsec, hp",
                        "wt, qsec, hp",
                        "hp, wt, qsec"),
  AIC               = round(c(AIC(model_full), AIC(m2), AIC(model_forward),
                              AIC(model_backward), AIC(model_stepwise), NA), 2),
  Adj_R2            = round(c(summary(model_full)$adj.r.squared,
                              summary(m2)$adj.r.squared,
                              summary(model_forward)$adj.r.squared,
                              summary(model_backward)$adj.r.squared,
                              summary(model_stepwise)$adj.r.squared, NA), 4)
)
results

##                 Method      Selected_Vars    AIC Adj_R2
## 1     Full Model (OLS) disp, hp, wt, qsec 159.07 0.8107
## 2    Best Subset (BIC)             hp, wt 156.65 0.8148
## 3     Forward Stepwise       wt, qsec, hp 156.65 0.8148
## 4 Backward Elimination       wt, qsec, hp 156.65 0.8148
## 5      Stepwise (Both)       wt, qsec, hp 156.65 0.8148
## 6   LASSO (lambda.1se)       hp, wt, qsec     NA     NA

2.7.1 Recommended Final Model

# Best model: wt + hp + qsec (agreed by multiple methods)
model_final <- lm(mpg ~ wt + hp + qsec, data = mtcars)
summary(model_final)

## 
## Call:
## lm(formula = mpg ~ wt + hp + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8591 -1.6418 -0.4636  1.1940  5.6092 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 27.61053    8.41993   3.279  0.00278 ** 
## wt          -4.35880    0.75270  -5.791 3.22e-06 ***
## hp          -0.01782    0.01498  -1.190  0.24418    
## qsec         0.51083    0.43922   1.163  0.25463    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.578 on 28 degrees of freedom
## Multiple R-squared:  0.8348, Adjusted R-squared:  0.8171 
## F-statistic: 47.15 on 3 and 28 DF,  p-value: 4.506e-11

# Fitted vs Actual
ggplot(data.frame(Actual = mtcars$mpg, Fitted = fitted(model_final)),
       aes(x = Fitted, y = Actual)) +
  geom_point(color = "steelblue", size = 2.5) +
  geom_abline(intercept = 0, slope = 1, color = "red", linetype = "dashed") +
  labs(title = "Actual vs Fitted Values — Final Model (wt + hp + qsec)",
       x = "Fitted Values", y = "Actual mpg") +
  theme_bw()

2.8 Quick Reference: When to Use Each Method

Method	Pros	Cons	Best When…
Best Subset	Exhaustive, globally optimal	Computationally expensive for large \(p\)	\(p < 15\) predictors
Forward Selection	Fast, good for large \(p\)	Can miss interactions; greedy	Large \(p\), screening stage
Backward Elimination	Starts rich; can be more accurate	Needs \(n > p\); may keep noise	\(n \gg p\)
Stepwise (Both)	Balance of forward/backward	Can oscillate; depends on starting model	General exploratory use
LASSO	Automatic selection + regularization	Requires tuning \(\lambda\); biased estimates	High-dim data; prediction focus
Ridge	Handles multicollinearity well	No exact variable elimination	All predictors matter; stability
AIC/BIC comparison	Theoretically principled	Manual comparison needed	Comparing a few candidate models

3 Conclusion

We fit a multiple linear regression model predicting car fuel efficiency (mpg) from four predictors using the mtcars dataset.
Weight (wt) and horsepower (hp) are the strongest predictors — both negatively associated with mpg.
Quarter-mile time (qsec) also has a significant positive effect: slower (heavier/less powerful) cars tend to be more fuel-efficient.
Engine displacement (disp) was found to be redundant — its effect is captured by the other predictors, and all variable selection methods excluded it.
All methods agreed on a 3-predictor model: mpg ~ wt + hp + qsec (Adj. R² ≈ 0.826).

ADVENTIST UNIVERSITY OF CENTRAL AFRICA

MASTER OF IT IN BIGDATA ANALYTICS

SHEMA DELPHIN

20251MBI056

ASSIGNMENT OF R PROGRAMMING

Multiple Linear Regression & Variable Selection Methods

Delphin

2026-06-04

1 Part 1: Multiple Linear Regression on a Real Dataset

1.1 Overview

1.2 Dataset: `mtcars`

1.3 Exploratory Data Analysis (EDA)

1.3.1 Pairwise Scatter Plot

1.3.2 Correlation Matrix

1.4 Fitting the Multiple Linear Regression Model

1.5 Interpretation of Results

1.5.1 Coefficients Table

1.5.2 Interpretation

1.5.3 Overall Model Fit

1.6 Model Diagnostics

1.6.1 Checking Assumptions

2 Part 2: Variable Selection Methods

2.1 Why Variable Selection?

2.2 Method 1: Best Subset Selection

2.3 Method 2: Stepwise Selection

2.3.1 2a. Forward Selection

2.3.2 2b. Backward Elimination

2.3.3 2c. Stepwise (Both Directions)

2.3.4 Comparing Stepwise Results

2.4 Method 3: LASSO (Least Absolute Shrinkage and Selection Operator)

2.5 Method 4: Ridge Regression

2.6 Method 5: Information Criteria (AIC and BIC)

2.7 Summary Comparison of All Methods

2.7.1 Recommended Final Model

2.8 Quick Reference: When to Use Each Method

3 Conclusion

ADVENTIST UNIVERSITY OF CENTRAL AFRICA

MASTER OF IT IN BIGDATA ANALYTICS

SHEMA DELPHIN

20251MBI056

ASSIGNMENT OF R PROGRAMMING

Multiple Linear Regression & Variable Selection Methods

Delphin

2026-06-04

1 Part 1: Multiple Linear Regression on a Real Dataset

1.1 Overview

1.2 Dataset: mtcars

1.3 Exploratory Data Analysis (EDA)

1.3.1 Pairwise Scatter Plot

1.3.2 Correlation Matrix

1.4 Fitting the Multiple Linear Regression Model

1.5 Interpretation of Results

1.5.1 Coefficients Table

1.5.2 Interpretation

1.5.3 Overall Model Fit

1.6 Model Diagnostics

1.6.1 Checking Assumptions

2 Part 2: Variable Selection Methods

2.1 Why Variable Selection?

2.2 Method 1: Best Subset Selection

2.3 Method 2: Stepwise Selection

2.3.1 2a. Forward Selection

2.3.2 2b. Backward Elimination

2.3.3 2c. Stepwise (Both Directions)

2.3.4 Comparing Stepwise Results

2.4 Method 3: LASSO (Least Absolute Shrinkage and Selection Operator)

2.5 Method 4: Ridge Regression

2.6 Method 5: Information Criteria (AIC and BIC)

2.7 Summary Comparison of All Methods

2.7.1 Recommended Final Model

2.8 Quick Reference: When to Use Each Method

3 Conclusion

1.2 Dataset: `mtcars`