Stepwise regression assumes that the predictor variables are not highly correlated. At each step, a variable is considered for addition to or removal from the set of predictors based on a pre-specified criterion (e.g. adjusted R-squared or AIC). The two main approaches are forward selection, which starts with no variables in the model, and backward elimination, which starts with all candidate predictors.
Lasso (least absolute shrinkage and selection operator) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces.
The elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. The elastic net penalty is controlled by alpha, and bridges the gap between lasso (alpha=1) and ridge (alpha=0). Note that the ridge penalty shrinks the coefficients of correlated predictors towards each other while the lasso tends to pick one and discard the others. Ridge is generally good at prediction but tends to be less interpretable.
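For concreteness, glmnet parameterizes the elastic net penalty as a mix of the two norms; the sketch below simply restates that documented form as an R helper (illustrative only, not code used later):

# elastic net penalty: lambda * ( (1-alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1 )
enet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}
# alpha = 1 recovers the lasso penalty; alpha = 0 recovers the ridge penalty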
Here, we will evaluate the usefulness of all three methods using a well-described dataset with the crime rate in 47 states of the USA in 1960 as the response variable and 15 predictor variables. This may seem an odd choice, as more recent data with richer sets of predictor features are certainly available. However, a small dataset with a limited number of features, some of which are linearly correlated, allows us to more easily interpret differences between the regression methods and to evaluate the usefulness of principal component analysis in combination with each of them.
# clear the environment and set seed
rm(list = ls())
set.seed(123)
library(MASS)
library(glmnet)
library(ggplot2)
library(gridExtra)
library(caret)
Load the crime dataset and display a summary. Crime is the response variable; all other variables are predictors.
dfC <- read.csv("http://www.statsci.org/data/general/uscrime.txt", sep="\t")
#print summary
summary(dfC)
## M So Ed Po1
## Min. :11.90 Min. :0.0000 Min. : 8.70 Min. : 4.50
## 1st Qu.:13.00 1st Qu.:0.0000 1st Qu.: 9.75 1st Qu.: 6.25
## Median :13.60 Median :0.0000 Median :10.80 Median : 7.80
## Mean :13.86 Mean :0.3404 Mean :10.56 Mean : 8.50
## 3rd Qu.:14.60 3rd Qu.:1.0000 3rd Qu.:11.45 3rd Qu.:10.45
## Max. :17.70 Max. :1.0000 Max. :12.20 Max. :16.60
## Po2 LF M.F Pop
## Min. : 4.100 Min. :0.4800 Min. : 93.40 Min. : 3.00
## 1st Qu.: 5.850 1st Qu.:0.5305 1st Qu.: 96.45 1st Qu.: 10.00
## Median : 7.300 Median :0.5600 Median : 97.70 Median : 25.00
## Mean : 8.023 Mean :0.5612 Mean : 98.30 Mean : 36.62
## 3rd Qu.: 9.700 3rd Qu.:0.5930 3rd Qu.: 99.20 3rd Qu.: 41.50
## Max. :15.700 Max. :0.6410 Max. :107.10 Max. :168.00
## NW U1 U2 Wealth
## Min. : 0.20 Min. :0.07000 Min. :2.000 Min. :2880
## 1st Qu.: 2.40 1st Qu.:0.08050 1st Qu.:2.750 1st Qu.:4595
## Median : 7.60 Median :0.09200 Median :3.400 Median :5370
## Mean :10.11 Mean :0.09547 Mean :3.398 Mean :5254
## 3rd Qu.:13.25 3rd Qu.:0.10400 3rd Qu.:3.850 3rd Qu.:5915
## Max. :42.30 Max. :0.14200 Max. :5.800 Max. :6890
## Ineq Prob Time Crime
## Min. :12.60 Min. :0.00690 Min. :12.20 Min. : 342.0
## 1st Qu.:16.55 1st Qu.:0.03270 1st Qu.:21.60 1st Qu.: 658.5
## Median :17.60 Median :0.04210 Median :25.80 Median : 831.0
## Mean :19.40 Mean :0.04709 Mean :26.60 Mean : 905.1
## 3rd Qu.:22.75 3rd Qu.:0.05445 3rd Qu.:30.45 3rd Qu.:1057.5
## Max. :27.60 Max. :0.11980 Max. :44.00 Max. :1993.0
Scale the predictor variables, except So. So is really a binary factor indicating whether a state is considered Southern and therefore should not be scaled.
colNames <- colnames(dfC[,-2])[1:14] # names of all predictors except So (Crime excluded)
normalize <- function(df, cols) {
result <- df # make a copy of the input data frame
for (j in cols) { # each specified col
m <- mean(df[,j]) # column mean
std <- sd(df[,j]) # column (sample) sd
result[,j] <- (result[,j] - m) / std # standardize the column (vectorized)
}
return(result)
}
#normalize predictors except 'So'
dfC.norm <- normalize(dfC, colNames)
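For reference, the same normalization can be done more compactly with base R's scale(), which centers each column and divides by its sample standard deviation (a sketch, assuming the same colNames selection):

# equivalent normalization using scale()
dfC.norm2 <- dfC
dfC.norm2[, colNames] <- scale(dfC[, colNames])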
Perform a backward stepwise regression with cross-validation. The output of the iterations is suppressed; the results of the final step are given in the commented section of the next code block.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)
lmFit_Step <- train(Crime ~ ., data = dfC.norm, "lmStepAIC", scope =
list(lower = Crime~1, upper = Crime~.), direction = "backward",
trControl=ctrl)
# Step: AIC=503.93
# .outcome ~ M + Ed + Po1 + M.F + U1 + U2 + Ineq + Prob
#
# Df Sum of Sq RSS AIC
# <none> 1453068 503.93
# - M.F 1 103159 1556227 505.16
# - U1 1 127044 1580112 505.87
# - Prob 1 247978 1701046 509.34
# - U2 1 255443 1708511 509.55
# - M 1 296790 1749858 510.67
# - Ed 1 445788 1898855 514.51
# - Ineq 1 738244 2191312 521.24
# - Po1 1 1672038 3125105 537.93
Develop a new model with the eight variables found with stepwise regression. This yields an adjusted R-squared of 0.7444.
mod_Step = lm(Crime ~ M.F+U1+Prob+U2+M+Ed+Ineq+Po1, data = dfC.norm)
summary(mod_Step)
##
## Call:
## lm(formula = Crime ~ M.F + U1 + Prob + U2 + M + Ed + Ineq + Po1,
## data = dfC.norm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -444.70 -111.07 3.03 122.15 483.30
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 905.09 28.52 31.731 < 2e-16 ***
## M.F 65.83 40.08 1.642 0.10874
## U1 -109.73 60.20 -1.823 0.07622 .
## Prob -86.31 33.89 -2.547 0.01505 *
## U2 158.22 61.22 2.585 0.01371 *
## M 117.28 42.10 2.786 0.00828 **
## Ed 201.50 59.02 3.414 0.00153 **
## Ineq 244.70 55.69 4.394 8.63e-05 ***
## Po1 305.07 46.14 6.613 8.26e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.5 on 38 degrees of freedom
## Multiple R-squared: 0.7888, Adjusted R-squared: 0.7444
## F-statistic: 17.74 on 8 and 38 DF, p-value: 1.159e-10
Next, use cross-validation to see how good this model really is. Because we only have 47 data points, let's use 47-fold cross-validation, i.e. leave-one-out cross-validation.
SStot <- sum((dfC.norm$Crime - mean(dfC.norm$Crime))^2) # total sum of squares
totsse <- 0
for(i in 1:nrow(dfC.norm)) {
  # refit the model with observation i held out, then predict that observation
  mod_Step_i = lm(Crime ~ M.F+U1+Prob+U2+M+Ed+Ineq+Po1, data = dfC.norm[-i,])
  pred_i <- predict(mod_Step_i, newdata=dfC.norm[i,])
  totsse <- totsse + ((pred_i - dfC.norm[i,16])^2) # column 16 is Crime
}
R2_mod <- 1 - totsse/SStot
R2_mod
## 1
## 0.667621
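Since this leave-one-out loop recurs throughout the analysis, it could be wrapped in a small helper function; a sketch is shown below (the explicit loops are kept in the rest of this document for transparency):

# leave-one-out cross-validated R-squared for an lm formula
loocv_r2 <- function(formula, data, response = "Crime") {
  sst <- sum((data[[response]] - mean(data[[response]]))^2)
  sse <- 0
  for (i in seq_len(nrow(data))) {
    fit <- lm(formula, data = data[-i, ])   # fit without observation i
    sse <- sse + (predict(fit, newdata = data[i, ]) - data[i, response])^2
  }
  1 - sse / sst
}
# e.g. loocv_r2(Crime ~ M.F + U1 + Prob + U2 + M + Ed + Ineq + Po1, dfC.norm)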
Note that in the previous model, the p-value for M.F is higher than 0.1. Let’s see what happens if it is removed.
mod_Step = lm(Crime ~ U1+Prob+U2+M+Ed+Ineq+Po1, data = dfC.norm)
summary(mod_Step)
##
## Call:
## lm(formula = Crime ~ U1 + Prob + U2 + M + Ed + Ineq + Po1, data = dfC.norm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -520.76 -105.67 9.53 136.28 519.37
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 905.09 29.14 31.062 < 2e-16 ***
## U1 -63.86 54.48 -1.172 0.2482
## Prob -84.83 34.61 -2.451 0.0188 *
## U2 134.13 60.71 2.209 0.0331 *
## M 134.20 41.70 3.218 0.0026 **
## Ed 244.38 54.07 4.520 5.62e-05 ***
## Ineq 264.65 55.52 4.767 2.61e-05 ***
## Po1 314.89 46.73 6.738 4.91e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 199.8 on 39 degrees of freedom
## Multiple R-squared: 0.7738, Adjusted R-squared: 0.7332
## F-statistic: 19.06 on 7 and 39 DF, p-value: 8.805e-11
Now it appears that U1 is not all that significant. Let's remove it as well and re-run the model.
mod_Step = lm(Crime ~ Prob+U2+M+Ed+Ineq+Po1, data = dfC.norm)
summary(mod_Step)
##
## Call:
## lm(formula = Crime ~ Prob + U2 + M + Ed + Ineq + Po1, data = dfC.norm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -470.68 -78.41 -19.68 133.12 556.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 905.09 29.27 30.918 < 2e-16 ***
## Prob -86.44 34.74 -2.488 0.01711 *
## U2 75.47 34.55 2.185 0.03483 *
## M 131.98 41.85 3.154 0.00305 **
## Ed 219.79 50.07 4.390 8.07e-05 ***
## Ineq 269.91 55.60 4.855 1.88e-05 ***
## Po1 341.84 40.87 8.363 2.56e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 200.7 on 40 degrees of freedom
## Multiple R-squared: 0.7659, Adjusted R-squared: 0.7307
## F-statistic: 21.81 on 6 and 40 DF, p-value: 3.418e-11
Next, use cross-validation to evaluate this latest model. The results show that the simpler model has an R-squared of 0.666, almost the same as that of the model which included M.F and U1. This suggests that these features may be omitted from the final model.
SStot <- sum((dfC.norm$Crime - mean(dfC.norm$Crime))^2)
totsse <- 0
for(i in 1:nrow(dfC.norm)) {
mod_Step_i = lm(Crime ~ Prob+U2+M+Ed+Ineq+Po1, data = dfC.norm[-i,])
pred_i <- predict(mod_Step_i,newdata=dfC.norm[i,])
totsse <- totsse + ((pred_i - dfC.norm[i,16])^2)
}
R2_mod <- 1 - totsse/SStot
R2_mod
## 1
## 0.6661638
The same result can be obtained with “step”, as shown below (output not shown).
model <- lm(Crime ~ ., data = dfC.norm)
step(model, direction = "backward")
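Forward selection works analogously, starting from the intercept-only model; a sketch on the same data (output not shown):

# forward selection from the intercept-only model
preds <- setdiff(names(dfC.norm), "Crime")
null_mod <- lm(Crime ~ 1, data = dfC.norm)
step(null_mod, scope = list(lower = Crime ~ 1,
                            upper = reformulate(preds, response = "Crime")),
     direction = "forward")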
Prepare the data for use in lasso regression.
# build the lasso model with 5-fold cross-validation
XP=data.matrix(dfC.norm[,-16])   # predictor matrix
YP=data.matrix(dfC.norm$Crime)   # response
lasso=cv.glmnet(x=XP, y=YP, alpha=1,
nfolds = 5, type.measure="mse", family="gaussian")
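Before extracting coefficients, it can be instructive to inspect the cross-validation curve and the chosen penalties (a brief aside; output not shown):

plot(lasso)      # CV mean-squared error as a function of log(lambda)
lasso$lambda.min # lambda that minimizes the CV error
lasso$lambda.1se # largest lambda within one standard error of the minimum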
Output the coefficients of the variables selected by lasso.
coef(lasso, s=lasso$lambda.min)
## 16 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 889.057299
## M 85.003217
## So 47.081683
## Ed 124.290064
## Po1 308.889820
## Po2 .
## LF 1.116734
## M.F 52.037497
## Pop .
## NW 4.443519
## U1 -22.257315
## U2 55.981384
## Wealth .
## Ineq 180.588656
## Prob -81.812835
## Time .
Fit a new model with ten of the ‘remaining’ variables from the lasso fit (U1 is dropped). The adjusted R-squared is slightly lower than that obtained with stepwise regression.
mod_lasso = lm(Crime ~So+M+Ed+Po1+LF+M.F+NW+U2+Ineq+Prob, data = dfC.norm)
summary(mod_lasso)
##
## Call:
## lm(formula = Crime ~ So + M + Ed + Po1 + LF + M.F + NW + U2 +
## Ineq + Prob, data = dfC.norm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -410.55 -121.42 5.76 110.54 550.24
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 863.284 53.510 16.133 < 2e-16 ***
## So 122.792 129.896 0.945 0.35080
## M 113.614 49.962 2.274 0.02903 *
## Ed 188.755 64.750 2.915 0.00608 **
## Po1 333.470 49.353 6.757 6.86e-08 ***
## LF 25.162 49.588 0.507 0.61496
## M.F 31.513 43.179 0.730 0.47021
## NW -2.883 59.595 -0.048 0.96169
## U2 72.881 41.795 1.744 0.08973 .
## Ineq 229.121 68.845 3.328 0.00203 **
## Prob -99.145 40.243 -2.464 0.01867 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 206.6 on 36 degrees of freedom
## Multiple R-squared: 0.7767, Adjusted R-squared: 0.7147
## F-statistic: 12.53 on 10 and 36 DF, p-value: 5.374e-09
Use cross-validation to evaluate the model. Note that four of the variables, namely So, LF, M.F, and NW, do not appear to be significant. In fact, without these variables, the model is the same as that obtained with stepwise regression.
SStot <- sum((dfC.norm$Crime - mean(dfC.norm$Crime))^2)
totsse <- 0
for(i in 1:nrow(dfC.norm)) {
mod_lasso_i = lm(Crime ~ So+M+Ed+Po1+LF+M.F+NW+U2+Ineq+Prob, data = dfC.norm[-i,])
pred_i <- predict(mod_lasso_i,newdata=dfC.norm[i,])
totsse <- totsse + ((pred_i - dfC.norm[i,16])^2)
}
R2_mod <- 1 - totsse/SStot
R2_mod
## 1
## 0.5840486
Next, vary the alpha value in steps of 0.1, from 0 to 1, and calculate the resulting R-squared values.
R2=c()
for (i in 0:10) {
mod_elastic = cv.glmnet(x=as.matrix(dfC.norm[,-16]),y=as.matrix(dfC.norm$Crime),
alpha=i/10,nfolds = 5,type.measure="mse",family="gaussian")
#dev.ratio gives the fraction of deviance explained,
#equivalent to R-squared in the case of Gaussian regression
R2 = cbind(R2, mod_elastic$glmnet.fit$dev.ratio[which(mod_elastic$glmnet.fit$lambda == mod_elastic$lambda.min)])
}
R2
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.7129289 0.7733668 0.7549545 0.7782892 0.7551427 0.7921597 0.7921574
## [,8] [,9] [,10] [,11]
## [1,] 0.7636336 0.7552367 0.7569563 0.7931003
alpha_best = (which.max(R2)-1)/10
alpha_best
## [1] 1
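Note that dev.ratio is an in-sample measure. An alternative, sketched below but not used for the results that follow, is to pick alpha by the cross-validated MSE itself:

# choose alpha by minimizing the cross-validated MSE
cv_mse <- c()
for (i in 0:10) {
  m <- cv.glmnet(x=as.matrix(dfC.norm[,-16]), y=as.matrix(dfC.norm$Crime),
                 alpha=i/10, nfolds = 5, type.measure="mse", family="gaussian")
  cv_mse <- c(cv_mse, min(m$cvm))  # CV error at lambda.min
}
(which.min(cv_mse) - 1) / 10       # alpha with the lowest CV error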
Use the “best” alpha value to build the model. Note that with alpha_best = 1 this is effectively another lasso fit; the coefficients below differ slightly from the earlier ones only because the cross-validation folds are drawn at random.
Elastic_net=cv.glmnet(x=as.matrix(dfC.norm[,-16]),y=as.matrix(dfC.norm$Crime),alpha=alpha_best,
nfolds = 5,type.measure="mse",family="gaussian")
#Output the coefficients of the variables selected by Elastic Net
coef(Elastic_net, s=Elastic_net$lambda.min)
## 16 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 888.760971
## M 82.888559
## So 47.952148
## Ed 116.058564
## Po1 310.189409
## Po2 .
## LF 2.228622
## M.F 49.818594
## Pop .
## NW 3.623195
## U1 -13.837819
## U2 46.698552
## Wealth .
## Ineq 175.547425
## Prob -80.415268
## Time .
Twelve variables are carried into the linear model below, compared to the ten used after the lasso and the eight found by stepwise regression. Next, compare how this new model performs against the lasso and stepwise models.
mod_Elastic_net = lm(Crime ~So+M+Ed+Po1+M.F+Pop+NW+U1+U2+Wealth+Ineq+Prob, data = dfC.norm)
summary(mod_Elastic_net)
##
## Call:
## lm(formula = Crime ~ So + M + Ed + Po1 + M.F + Pop + NW + U1 +
## U2 + Wealth + Ineq + Prob, data = dfC.norm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -434.18 -107.01 18.55 115.88 470.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 897.29 51.91 17.286 < 2e-16 ***
## So 22.89 125.35 0.183 0.85621
## M 112.71 49.35 2.284 0.02876 *
## Ed 195.70 62.94 3.109 0.00378 **
## Po1 293.18 64.99 4.511 7.32e-05 ***
## M.F 48.92 48.12 1.017 0.31656
## Pop -33.25 45.63 -0.729 0.47113
## NW 19.16 57.71 0.332 0.74195
## U1 -89.76 65.68 -1.367 0.18069
## U2 140.78 66.77 2.108 0.04245 *
## Wealth 83.30 95.53 0.872 0.38932
## Ineq 285.77 85.19 3.355 0.00196 **
## Prob -92.75 41.12 -2.255 0.03065 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 202.6 on 34 degrees of freedom
## Multiple R-squared: 0.7971, Adjusted R-squared: 0.7255
## F-statistic: 11.13 on 12 and 34 DF, p-value: 1.52e-08
The R-squared appears similar to that obtained with lasso and stepwise regression. Let's use cross-validation to evaluate it; note that the loop below refits the model with all 15 predictors, i.e. with none of the insignificant variables removed.
SStot <- sum((dfC.norm$Crime - mean(dfC.norm$Crime))^2)
totsse <- 0
for(i in 1:nrow(dfC.norm)) {
  # note: the full model with all 15 predictors is refit here
  mod_lasso_i = lm(Crime ~ So+M+Ed+Po1+Po2+M.F+LF+Pop+NW+U1+U2+Wealth+Ineq+Prob+Time, data = dfC.norm[-i,])
pred_i <- predict(mod_lasso_i,newdata=dfC.norm[i,])
totsse <- totsse + ((pred_i - dfC.norm[i,16])^2)
}
R2_mod <- 1 - totsse/SStot
R2_mod
## 1
## 0.485607
The resulting cross-validated R-squared estimate is much worse, which is not surprising: the p-values for most of the variables indicated that they were not significant. If the insignificant variables were removed, one would end up with a model akin to that obtained with lasso or stepwise regression.
The following code block shows PCA on a matrix of scaled predictors.
pca <- prcomp(dfC[,1:15], scale. = TRUE)
summary(pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 2.4534 1.6739 1.4160 1.07806 0.97893 0.74377
## Proportion of Variance 0.4013 0.1868 0.1337 0.07748 0.06389 0.03688
## Cumulative Proportion 0.4013 0.5880 0.7217 0.79920 0.86308 0.89996
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.56729 0.55444 0.48493 0.44708 0.41915 0.35804
## Proportion of Variance 0.02145 0.02049 0.01568 0.01333 0.01171 0.00855
## Cumulative Proportion 0.92142 0.94191 0.95759 0.97091 0.98263 0.99117
## PC13 PC14 PC15
## Standard deviation 0.26333 0.2418 0.06793
## Proportion of Variance 0.00462 0.0039 0.00031
## Cumulative Proportion 0.99579 0.9997 1.00000
The following are useful visualizations when deciding how many principal components to choose.
screeplot(pca, type="lines",col="blue")
var <- pca$sdev^2
propvar <- var/sum(var)
plot(propvar, xlab = "Principal Component", ylab = "Proportion of Variance Explained",ylim = c(0,1), type = "b")
plot(cumsum(propvar), xlab = "Principal Component", ylab = "Cumulative Proportion of Variance Explained",
ylim = c(0,1), type = "b")
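The choice can also be made programmatically, for example as the smallest number of PCs explaining at least 90% of the variance (an illustrative threshold):

# first PC index at which the cumulative variance explained reaches 90%
which(cumsum(propvar) >= 0.90)[1]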
Here, let's use all the principal components (PCs) instead of the original variables and evaluate the performance of the three models above.
# Creating a dataframe of response variable and PCs
PCcrime <- as.data.frame(cbind(pca$x, dfC[,16]))
colnames(PCcrime)[16] <- "Crime"
Use the code in the block below to perform five-fold cross-validation. The final model is shown in the commented section.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)
set.seed(1)
lmFit_Step_PC <- train(Crime ~ ., data = PCcrime, "lmStepAIC", scope =
list(lower = Crime~1, upper = Crime~.), direction = "backward",trControl=ctrl)
##Step: AIC=507.37
##.outcome ~ PC1 + PC2 + PC4 + PC5 + PC6 + PC7 + PC12 + PC14 +
##PC15
##
##Df Sum of Sq RSS AIC
##<none> 1498159 507.37
##- PC15 1 82173 1580332 507.88
##- PC6 1 92261 1590420 508.18
##- PC14 1 129212 1627371 509.26
##- PC7 1 203535 1701694 511.36
##- PC4 1 257832 1755991 512.83
##- PC12 1 494595 1992754 518.78
##- PC2 1 633037 2131196 521.94
##- PC1 1 1177568 2675727 532.63
##- PC5 1 2312556 3810715 549.25
Fit a new model with these nine PCs.
mod_Step_PC = lm(Crime ~ PC15+PC6+PC14+PC7+PC4+PC12+PC2+PC1+PC5, data = PCcrime)
summary(mod_Step_PC)
##
## Call:
## lm(formula = Crime ~ PC15 + PC6 + PC14 + PC7 + PC4 + PC12 + PC2 +
## PC1 + PC5, data = PCcrime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -351.54 -93.58 -20.42 121.74 553.91
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 905.09 29.35 30.836 < 2e-16 ***
## PC15 -622.21 436.77 -1.425 0.162660
## PC6 -60.21 39.89 -1.509 0.139665
## PC14 219.19 122.70 1.786 0.082235 .
## PC7 117.26 52.30 2.242 0.031040 *
## PC4 69.45 27.52 2.523 0.016048 *
## PC12 289.61 82.87 3.495 0.001249 **
## PC2 -70.08 17.72 -3.954 0.000334 ***
## PC1 65.22 12.09 5.393 4.17e-06 ***
## PC5 -229.04 30.31 -7.557 5.20e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201.2 on 37 degrees of freedom
## Multiple R-squared: 0.7823, Adjusted R-squared: 0.7293
## F-statistic: 14.77 on 9 and 37 DF, p-value: 8.755e-10
We obtain an adjusted R-squared of 0.729 using the nine PCs selected with backward stepwise regression and cross-validation. This is slightly lower than using the same method on the original variables. Note that PC15 and PC6 are not significant, so we will remove them before cross-validating, as shown below.
SStot <- sum((dfC$Crime - mean(dfC$Crime))^2)
totsse <- 0
for(i in 1:nrow(PCcrime)) {
mod_lasso_i = lm(Crime ~ PC14+PC7+PC4+PC12+PC2+PC1+PC5, data = PCcrime[-i,])
pred_i <- predict(mod_lasso_i,newdata=PCcrime[i,])
totsse <- totsse + ((pred_i - PCcrime[i,16])^2)
}
R2_mod <- 1 - totsse/SStot
R2_mod
## 1
## 0.6271988
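A model built on PCs can also be interpreted in terms of the original predictors by rotating its coefficients back through the loading matrix; a sketch for the stepwise PC model above (the resulting coefficients apply to the scaled predictors):

b_pc <- coef(mod_Step_PC)[-1]                    # PC coefficients, intercept dropped
b_scaled <- pca$rotation[, names(b_pc)] %*% b_pc # implied coefficients on the scaled variables
b_scaled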
Build the lasso model on the PCs and show the coefficients of the variables it selects.
XP=data.matrix(PCcrime[,-16])   # PC predictor matrix
YP=data.matrix(PCcrime$Crime)   # response
lasso_PC=cv.glmnet(x=XP, y=YP, alpha=1,
nfolds = 5, type.measure="mse", family="gaussian")
coef(lasso_PC, s=lasso_PC$lambda.min)
## 16 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 905.085106
## PC1 61.350753
## PC2 -64.418018
## PC3 18.497103
## PC4 60.649976
## PC5 -219.356045
## PC6 -47.463846
## PC7 100.540212
## PC8 11.613371
## PC9 -17.620884
## PC10 35.107541
## PC11 7.970122
## PC12 263.128175
## PC13 45.775592
## PC14 179.970042
## PC15 -482.612769
Fit a new model with 12 of the PCs, dropping PC8, PC9, and PC11, which have the smallest coefficients above; this compares with the ten original variables used in the earlier lasso model.
mod_lasso_PC = lm(Crime ~PC1+PC2+PC3+PC4+PC5+PC6+PC7+PC10+PC12+PC13+PC14+PC15, data = PCcrime)
summary(mod_lasso_PC)
##
## Call:
## lm(formula = Crime ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 +
## PC10 + PC12 + PC13 + PC14 + PC15, data = PCcrime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -404.62 -85.20 -6.35 111.78 526.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 905.09 29.48 30.698 < 2e-16 ***
## PC1 65.22 12.15 5.369 5.70e-06 ***
## PC2 -70.08 17.80 -3.936 0.000389 ***
## PC3 25.19 21.05 1.197 0.239582
## PC4 69.45 27.64 2.512 0.016913 *
## PC5 -229.04 30.44 -7.523 9.82e-09 ***
## PC6 -60.21 40.07 -1.503 0.142142
## PC7 117.26 52.53 2.232 0.032313 *
## PC10 56.32 66.66 0.845 0.404101
## PC12 289.61 83.24 3.479 0.001398 **
## PC13 81.79 113.18 0.723 0.474838
## PC14 219.19 123.25 1.778 0.084288 .
## PC15 -622.21 438.74 -1.418 0.165237
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 202.1 on 34 degrees of freedom
## Multiple R-squared: 0.7981, Adjusted R-squared: 0.7269
## F-statistic: 11.2 on 12 and 34 DF, p-value: 1.408e-08
We obtain a similar adjusted R-squared of 0.7269 using the 12 selected PCs instead of the ten original variables with lasso regression.
Let’s see how this cross-validates.
SStot <- sum((dfC$Crime - mean(dfC$Crime))^2)
totsse <- 0
for(i in 1:nrow(PCcrime)) {
mod_lasso_i = lm(Crime ~ PC1+PC2+PC3+PC4+PC5+PC6+PC7+PC10+PC12+PC13+PC14+PC15, data = PCcrime[-i,])
pred_i <- predict(mod_lasso_i,newdata=PCcrime[i,])
totsse <- totsse + ((pred_i - PCcrime[i,16])^2)
}
R2_mod <- 1 - totsse/SStot
R2_mod
## 1
## 0.5857863
The R-squared value is considerably worse than that obtained with stepwise regression. However, note that PCs 3, 6, 10, 13, and 15 do not appear to be significant. If we remove them, we get the same model as when we use only the significant variables from the stepwise PC model.
Develop an Elastic Net model using the PCs and display the best alpha value.
R2_PC=c()
for (i in 0:10) {
model = cv.glmnet(x=as.matrix(PCcrime[,-16]),y=as.matrix(PCcrime$Crime),
alpha=i/10,nfolds = 5,type.measure="mse",family="gaussian")
R2_PC = cbind(R2_PC,model$glmnet.fit$dev.ratio[which(model$glmnet.fit$lambda == model$lambda.min)])
}
R2_PC
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.7782769 0.7654027 0.7558492 0.7743684 0.8021926 0.7470497 0.7872682
## [,8] [,9] [,10] [,11]
## [1,] 0.7464576 0.7992367 0.7650309 0.7940698
alpha_best_PC = (which.max(R2_PC)-1)/10
alpha_best_PC
## [1] 0.4
Observe that the best alpha value is 0.4, which lies slightly closer to a ridge model than to a lasso model. The R-squared values are slightly higher here. Let's build the model using this alpha value and display the coefficients of the variables selected by the elastic net. Next, we will develop a new model using ten of the PCs, followed by cross-validation.
Elastic_net_PC=cv.glmnet(x=as.matrix(PCcrime[,-16]),y=as.matrix(PCcrime$Crime),alpha=alpha_best_PC,
nfolds = 5,type.measure="mse",family="gaussian")
coef(Elastic_net_PC, s=Elastic_net_PC$lambda.min)
## 16 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 905.085106
## PC1 61.350753
## PC2 -64.418018
## PC3 18.497103
## PC4 60.649976
## PC5 -219.356045
## PC6 -47.463846
## PC7 100.540212
## PC8 11.613371
## PC9 -17.620884
## PC10 35.107541
## PC11 7.970122
## PC12 263.128175
## PC13 45.775592
## PC14 179.970042
## PC15 -482.612769
mod_Elastic_net_PC = lm(Crime ~PC1+PC2+PC3+PC4+PC5+PC6+PC7+PC12+PC14+PC15, data = PCcrime)
summary(mod_Elastic_net_PC)
##
## Call:
## lm(formula = Crime ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 +
## PC12 + PC14 + PC15, data = PCcrime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -376.38 -92.82 -14.14 98.01 515.01
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 905.09 29.17 31.029 < 2e-16 ***
## PC1 65.22 12.02 5.427 4.06e-06 ***
## PC2 -70.08 17.61 -3.979 0.000321 ***
## PC3 25.19 20.82 1.210 0.234195
## PC4 69.45 27.35 2.539 0.015574 *
## PC5 -229.04 30.12 -7.605 5.37e-09 ***
## PC6 -60.21 39.64 -1.519 0.137515
## PC7 117.26 51.97 2.256 0.030239 *
## PC12 289.61 82.35 3.517 0.001201 **
## PC14 219.19 121.94 1.798 0.080642 .
## PC15 -622.21 434.06 -1.433 0.160350
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 200 on 36 degrees of freedom
## Multiple R-squared: 0.7908, Adjusted R-squared: 0.7327
## F-statistic: 13.61 on 10 and 36 DF, p-value: 1.785e-09
SStot <- sum((dfC$Crime - mean(dfC$Crime))^2)
totsse <- 0
for(i in 1:nrow(PCcrime)) {
mod_lasso_i = lm(Crime ~ PC1+PC2+PC3+PC4+PC5+PC6+PC7+PC12+PC14+PC15, data = PCcrime[-i,])
pred_i <- predict(mod_lasso_i,newdata=PCcrime[i,])
totsse <- totsse + ((pred_i - PCcrime[i,16])^2)
}
R2_mod <- 1 - totsse/SStot
R2_mod
## 1
## 0.6267523