1. Multiple Linear Regression

1.1 Dataset

I use the built-in mtcars dataset (32 cars, 11 variables). The goal is to predict fuel efficiency (mpg) from multiple predictors: weight (wt), horsepower (hp), number of cylinders (cyl), and displacement (disp).

data("mtcars")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

1.2 Exploratory Look

pairs(mtcars[c("mpg","wt","hp","cyl","disp")], main="Scatterplot Matrix")

All four predictors show a negative relationship with mpg. Weight and displacement are strongly correlated with each other, which hints at potential multicollinearity.

1.3 Fitting the Full Model

fit.full <- lm(mpg ~ wt + hp + cyl + disp, data = mtcars)
summary(fit.full)
## 
## Call:
## lm(formula = mpg ~ wt + hp + cyl + disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0562 -1.4636 -0.4281  1.2854  5.8269 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 40.82854    2.75747  14.807 1.76e-14 ***
## wt          -3.85390    1.01547  -3.795 0.000759 ***
## hp          -0.02054    0.01215  -1.691 0.102379    
## cyl         -1.29332    0.65588  -1.972 0.058947 .  
## disp         0.01160    0.01173   0.989 0.331386    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.513 on 27 degrees of freedom
## Multiple R-squared:  0.8486, Adjusted R-squared:  0.8262 
## F-statistic: 37.84 on 4 and 27 DF,  p-value: 1.061e-10

Interpretation:

  • The model explains 84.9% of the variance in mpg (Adjusted R-squared = 0.8262).
  • wt is the only clearly significant predictor (p < 0.001): holding other variables constant, each 1,000 lb increase in weight reduces fuel efficiency by 3.85 mpg.
  • cyl is marginally non-significant (p = 0.059) and hp approaches significance (p = 0.102), but neither crosses the 0.05 threshold – likely because all three overlap in what they measure (engine size and power).
  • disp contributes nothing beyond what the other three already capture (p = 0.331).
  • The overall F-test (p = 1.06e-10) confirms the model as a whole is highly significant.

1.4 Regression Diagnostics

par(mfrow = c(2,2))
plot(fit.full)

  • Residuals vs Fitted: Slight curve in the smoothing line, but no strong pattern. Linearity assumption roughly met.
  • Normal Q-Q: Points track the diagonal well; Chrysler Imperial, Toyota Corolla, and Fiat 128 deviate slightly at the tails but not severely. Normality is acceptable.
  • Scale-Location: Smoothing line is relatively flat – homoscedasticity holds.
  • Residuals vs Leverage: Maserati Bora sits near the Cook’s distance boundary but does not exceed it. No observation is clearly influential enough to require removal.

2. Variable Selection

Variable selection identifies the simplest model with the best predictive accuracy. Three main approaches are demonstrated below.


2.1 All-Subsets Selection

All-subsets evaluates every possible combination of predictors and ranks them by a criterion. Here we use adjusted R-squared via the leaps package.

library(leaps)
reg.sub <- regsubsets(mpg ~ wt + hp + cyl + disp, data = mtcars, nvmax = 4)
reg.sum <- summary(reg.sub)
reg.sum
## Subset selection object
## Call: regsubsets.formula(mpg ~ wt + hp + cyl + disp, data = mtcars, 
##     nvmax = 4)
## 4 Variables  (and intercept)
##      Forced in Forced out
## wt       FALSE      FALSE
## hp       FALSE      FALSE
## cyl      FALSE      FALSE
## disp     FALSE      FALSE
## 1 subsets of each size up to 4
## Selection Algorithm: exhaustive
##          wt  hp  cyl disp
## 1  ( 1 ) "*" " " " " " " 
## 2  ( 1 ) "*" " " "*" " " 
## 3  ( 1 ) "*" "*" "*" " " 
## 4  ( 1 ) "*" "*" "*" "*"
par(mfrow = c(1,2))
plot(reg.sum$adjr2, xlab = "Number of predictors", ylab = "Adjusted R-squared", type = "b")
points(which.max(reg.sum$adjr2), reg.sum$adjr2[which.max(reg.sum$adjr2)], col = "red", pch = 19)
plot(reg.sub, scale = "adjr2", main = "Best subsets by Adjusted R-squared")

cat("Best number of predictors:", which.max(reg.sum$adjr2), "\n")
## Best number of predictors: 3
cat("Variables included:\n")
## Variables included:
print(reg.sum$which[which.max(reg.sum$adjr2), ])
## (Intercept)          wt          hp         cyl        disp 
##        TRUE        TRUE        TRUE        TRUE       FALSE

The highest adjusted R-squared is achieved with 3 predictors. The best 3-variable model includes wt, hp, and cyldisp is excluded. Adding it as a fourth predictor does not improve adjusted R-squared, confirming it adds noise rather than signal.


2.2 Stepwise Selection

Stepwise selection adds or removes one predictor at a time using AIC. Three directions are shown.

Backward: start with all predictors, remove one at a time

fit.step.back <- step(fit.full, direction = "backward")
## Start:  AIC=63.53
## mpg ~ wt + hp + cyl + disp
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1     6.176 176.62 62.665
## <none>              170.44 63.526
## - hp    1    18.048 188.49 64.746
## - cyl   1    24.546 194.99 65.831
## - wt    1    90.925 261.37 75.206
## 
## Step:  AIC=62.66
## mpg ~ wt + hp + cyl
## 
##        Df Sum of Sq    RSS    AIC
## <none>              176.62 62.665
## - hp    1    14.551 191.17 63.198
## - cyl   1    18.427 195.05 63.840
## - wt    1   115.354 291.98 76.750
summary(fit.step.back)
## 
## Call:
## lm(formula = mpg ~ wt + hp + cyl, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9290 -1.5598 -0.5311  1.1850  5.8986 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.75179    1.78686  21.687  < 2e-16 ***
## wt          -3.16697    0.74058  -4.276 0.000199 ***
## hp          -0.01804    0.01188  -1.519 0.140015    
## cyl         -0.94162    0.55092  -1.709 0.098480 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared:  0.8431, Adjusted R-squared:  0.8263 
## F-statistic: 50.17 on 3 and 28 DF,  p-value: 2.184e-11

Backward stepwise drops disp (AIC drops from 63.53 to 62.66) and stops, yielding mpg ~ wt + hp + cyl.

Forward: start with no predictors, add one at a time

fit.null <- lm(mpg ~ 1, data = mtcars)
fit.step.fwd <- step(fit.null, scope = list(lower = fit.null, upper = fit.full), direction = "forward")
## Start:  AIC=115.94
## mpg ~ 1
## 
##        Df Sum of Sq     RSS     AIC
## + wt    1    847.73  278.32  73.217
## + cyl   1    817.71  308.33  76.494
## + disp  1    808.89  317.16  77.397
## + hp    1    678.37  447.67  88.427
## <none>              1126.05 115.943
## 
## Step:  AIC=73.22
## mpg ~ wt
## 
##        Df Sum of Sq    RSS    AIC
## + cyl   1    87.150 191.17 63.198
## + hp    1    83.274 195.05 63.840
## + disp  1    31.639 246.68 71.356
## <none>              278.32 73.217
## 
## Step:  AIC=63.2
## mpg ~ wt + cyl
## 
##        Df Sum of Sq    RSS    AIC
## + hp    1   14.5514 176.62 62.665
## <none>              191.17 63.198
## + disp  1    2.6796 188.49 64.746
## 
## Step:  AIC=62.66
## mpg ~ wt + cyl + hp
## 
##        Df Sum of Sq    RSS    AIC
## <none>              176.62 62.665
## + disp  1    6.1762 170.44 63.526
summary(fit.step.fwd)
## 
## Call:
## lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9290 -1.5598 -0.5311  1.1850  5.8986 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.75179    1.78686  21.687  < 2e-16 ***
## wt          -3.16697    0.74058  -4.276 0.000199 ***
## cyl         -0.94162    0.55092  -1.709 0.098480 .  
## hp          -0.01804    0.01188  -1.519 0.140015    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared:  0.8431, Adjusted R-squared:  0.8263 
## F-statistic: 50.17 on 3 and 28 DF,  p-value: 2.184e-11

Forward stepwise adds wt first (biggest AIC drop), then cyl, then hp, and stops before adding disp. Same final model.

Both directions: combine forward and backward at each step

fit.step.both <- step(fit.null, scope = list(lower = fit.null, upper = fit.full), direction = "both")
## Start:  AIC=115.94
## mpg ~ 1
## 
##        Df Sum of Sq     RSS     AIC
## + wt    1    847.73  278.32  73.217
## + cyl   1    817.71  308.33  76.494
## + disp  1    808.89  317.16  77.397
## + hp    1    678.37  447.67  88.427
## <none>              1126.05 115.943
## 
## Step:  AIC=73.22
## mpg ~ wt
## 
##        Df Sum of Sq     RSS     AIC
## + cyl   1     87.15  191.17  63.198
## + hp    1     83.27  195.05  63.840
## + disp  1     31.64  246.68  71.356
## <none>               278.32  73.217
## - wt    1    847.73 1126.05 115.943
## 
## Step:  AIC=63.2
## mpg ~ wt + cyl
## 
##        Df Sum of Sq    RSS    AIC
## + hp    1    14.551 176.62 62.665
## <none>              191.17 63.198
## + disp  1     2.680 188.49 64.746
## - cyl   1    87.150 278.32 73.217
## - wt    1   117.162 308.33 76.494
## 
## Step:  AIC=62.66
## mpg ~ wt + cyl + hp
## 
##        Df Sum of Sq    RSS    AIC
## <none>              176.62 62.665
## - hp    1    14.551 191.17 63.198
## + disp  1     6.176 170.44 63.526
## - cyl   1    18.427 195.05 63.840
## - wt    1   115.354 291.98 76.750
summary(fit.step.both)
## 
## Call:
## lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9290 -1.5598 -0.5311  1.1850  5.8986 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.75179    1.78686  21.687  < 2e-16 ***
## wt          -3.16697    0.74058  -4.276 0.000199 ***
## cyl         -0.94162    0.55092  -1.709 0.098480 .  
## hp          -0.01804    0.01188  -1.519 0.140015    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared:  0.8431, Adjusted R-squared:  0.8263 
## F-statistic: 50.17 on 3 and 28 DF,  p-value: 2.184e-11

The bidirectional search also converges on mpg ~ wt + hp + cyl. All three stepwise variants agree, which strengthens confidence in this model.

AIC comparison across stepwise models:

AIC(fit.full, fit.step.back, fit.step.fwd, fit.step.both)
##               df      AIC
## fit.full       6 156.3376
## fit.step.back  5 155.4766
## fit.step.fwd   5 155.4766
## fit.step.both  5 155.4766

The three stepwise models share the same AIC (155.48), all lower than the full 4-predictor model (156.34). wt is the only individually significant predictor (p < 0.001): each extra 1,000 lb cuts mpg by 3.17. hp (p = 0.140) and cyl (p = 0.098) stay in by AIC even without reaching 0.05 – their combined presence still improves fit. Adjusted R-squared holds at 0.8263, identical to the full model, confirming disp contributed nothing.


2.3 LASSO (Regularization)

LASSO adds a penalty to the regression that shrinks small coefficients toward zero and can set them exactly to zero, effectively removing variables. The penalty strength is chosen by cross-validation.

library(glmnet)
x <- model.matrix(mpg ~ wt + hp + cyl + disp, data = mtcars)[, -1]
y <- mtcars$mpg
set.seed(1)
cv.lasso <- cv.glmnet(x, y, alpha = 1)
plot(cv.lasso)

cat("Optimal lambda (min CV error):", cv.lasso$lambda.min, "\n")
## Optimal lambda (min CV error): 0.1034148
coef(cv.lasso, s = "lambda.min")
## 5 x 1 sparse Matrix of class "dgCMatrix"
##              lambda.min
## (Intercept) 38.39711409
## wt          -3.10670322
## hp          -0.01718842
## cyl         -0.93577732
## disp         .
cat("Lambda 1se (more regularized):", cv.lasso$lambda.1se, "\n")
## Lambda 1se (more regularized): 1.161684
coef(cv.lasso, s = "lambda.1se")
## 5 x 1 sparse Matrix of class "dgCMatrix"
##               lambda.1se
## (Intercept) 34.755554740
## wt          -2.499948724
## hp          -0.008841051
## cyl         -0.860621916
## disp         .

At lambda.min, LASSO retains all four predictors but shrinks disp close to zero. At the more conservative lambda.1se, disp is shrunk to exactly zero, leaving wt, hp, and cyl – consistent with what all-subsets and stepwise methods also found. LASSO confirms wt as the dominant predictor with the largest retained coefficient in magnitude.


2.4 Regression Diagnostics

All three variable selection methods point to removing disp. The backward stepwise model is used here to check whether regression assumptions hold after variable reduction.

par(mfrow = c(2,2))
plot(fit.step.back)

No major violations of linearity, normality, or homoscedasticity. Toyota Corolla and Chrysler Imperial remain the highest leverage points but stay within Cook’s distance bounds.