Consider the Boston dataset, which is available in R (in the MASS package). The list of variables can be seen in the str() output below. Our variable of interest to predict is medv. We will be working with the entire data set in this example, so there is no need for a test/train split.
Examine the data with the str() and head() commands in R. This step helps you identify whether each variable is numeric or categorical.
NOTE - Convert the variable chas from int to factor (0 = tract does not bound the river, 1 = it does).
NOTE - With only 506 rows, the LOOCV method should not be computationally costly.
library(MASS)   # the Boston data set lives in the MASS package
str(Boston)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ black : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
Boston$chas <- as.factor(Boston$chas)
head(Boston)
## crim zn indus chas nox rm age dis rad tax ptratio black lstat
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
## medv
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7
nrow(Boston)
## [1] 506
ncol(Boston)
## [1] 14
Consider 2 multiple linear regression models where medv is the dependent variable:
Model 1: medv ~ . (all predictors)
Model 2: medv ~ rm + ptratio
Using 3 cross-validation techniques (leave-one-out, k-fold with K=5, and k-fold with K=10), determine which model provides the best predictive performance. (Hint: glm and cv.glm)
NOTE - Model 1, which includes all the predictors (not just rm + ptratio), provides the best predictive performance, with lower scores for LOOCV, 5-fold CV, 10-fold CV, and MSE. See the table below for further details.
library(boot)   # for cv.glm
modelcv1 <- glm(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio, data=Boston)
modelcv2 <- glm(medv~rm+ptratio, data=Boston)
Model 1:
loocv1 <- cv.glm(Boston, modelcv1)
loocv1$delta[1]
## [1] 29.48657
Model 2:
loocv2 <- cv.glm(Boston, modelcv2)
loocv2$delta[1]
## [1] 37.70066
Model 1, 5-fold:
kcv1 <- cv.glm(Boston, modelcv1, K=5)
kcv1$delta[1]
## [1] 30.75023
Model 1, 10-fold:
kcv2 <- cv.glm(Boston, modelcv1, K=10)
kcv2$delta[1]
## [1] 29.83194
Model 2, 5-fold:
kcv3 <- cv.glm(Boston, modelcv2, K=5)
kcv3$delta[1]
## [1] 38.0965
Model 2, 10-fold:
kcv4 <- cv.glm(Boston, modelcv2, K=10)
kcv4$delta[1]
## [1] 37.8784
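As a sanity check, the LOOCV estimate for Model 2 can also be computed by hand. The sketch below is a minimal illustration, assuming the Boston data is loaded; its result should be very close to loocv2$delta[1] above.
# Hand-rolled LOOCV for Model 2 (medv ~ rm + ptratio): fit n models, each one
# leaving out a single observation, then average the n squared prediction errors.
n <- nrow(Boston)
errs <- numeric(n)
for (i in 1:n) {
  fit  <- lm(medv ~ rm + ptratio, data = Boston[-i, ])
  pred <- predict(fit, newdata = Boston[i, , drop = FALSE])
  errs[i] <- (Boston$medv[i] - pred)^2
}
mean(errs)   # LOOCV estimate of test MSE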
## Prediction Accuracy of the Models - LOOCV, K-Fold CV, and MSE Comparisons (Using kable)
m1=lm(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio, data=Boston)
m2=lm(medv~rm+ptratio, data=Boston)
library(kableExtra)
Models=c("Model1","Model2")
LOOCV = c(loocv1$delta[1], loocv2$delta[1])
k5.Fold.cv = c(kcv1$delta[1], kcv3$delta[1])
k10.Fold.cv = c(kcv2$delta[1], kcv4$delta[1])
MSE=c(mean(m1$residuals^2), mean(m2$residuals^2))
text_tbl <- data.frame(Models, LOOCV, k5.Fold.cv, k10.Fold.cv, MSE)
kable(text_tbl) %>%
kable_styling(bootstrap_options = "striped", full_width = F)
| Models | LOOCV | k5.Fold.cv | k10.Fold.cv | MSE |
|---|---|---|---|---|
| Model1 | 29.48657 | 30.75023 | 29.83194 | 27.83194 |
| Model2 | 37.70066 | 38.09650 | 37.87840 | 37.03879 |
Using the 3 model selection algorithms (subset selection, forward selection, and backward selection), identify the best combination of independent variables (you need to use all independent variables as potential candidates).
Choose the model(s) that minimize BIC and Cp.
Estimate the best model(s) and calculate the MSE using 5-fold cross-validation. (Hint: ...$delta[1])
library(leaps)   # for regsubsets
model <- regsubsets(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio,
                    Boston, method="exhaustive", nvmax=15)
which.max(summary(model)$adjr2)
## [1] 10
which.min(summary(model)$bic)
## [1] 10
which.min(summary(model)$cp)
## [1] 10
plot(model)
summary(model)
## Subset selection object
## Call: regsubsets.formula(medv ~ crim + zn + indus + factor(chas) +
## nox + rm + age + dis + rad + tax + ptratio, Boston, method = "exhaustive",
## nvmax = 15)
## 11 Variables (and intercept)
## Forced in Forced out
## crim FALSE FALSE
## zn FALSE FALSE
## indus FALSE FALSE
## factor(chas)1 FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## 1 subsets of each size up to 11
## Selection Algorithm: exhaustive
## crim zn indus factor(chas)1 nox rm age dis rad tax ptratio
## 1 ( 1 ) " " " " " " " " " " "*" " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " "*" " " " " " " " " "*"
## 3 ( 1 ) " " " " " " " " "*" "*" " " " " " " " " "*"
## 4 ( 1 ) " " " " " " " " "*" "*" " " "*" " " " " "*"
## 5 ( 1 ) "*" " " " " " " "*" "*" " " "*" " " " " "*"
## 6 ( 1 ) "*" " " " " "*" "*" "*" " " "*" " " " " "*"
## 7 ( 1 ) "*" " " " " "*" "*" "*" "*" "*" " " " " "*"
## 8 ( 1 ) "*" "*" " " "*" "*" "*" "*" "*" " " " " "*"
## 9 ( 1 ) "*" " " " " "*" "*" "*" "*" "*" "*" "*" "*"
## 10 ( 1 ) "*" "*" " " "*" "*" "*" "*" "*" "*" "*" "*"
## 11 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
par(mfrow=c(2,2))
plot(summary(model)$rsq, type="o", ylab="R-Squared", xlab="")
plot(summary(model)$adjr2, type="o", ylab="Adj-R-Squared", xlab="")
plot(summary(model)$bic, type="o", ylab="BIC", xlab="")
plot(summary(model)$cp, type="o", ylab="Cp", xlab="")
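To read off the size-10 model chosen by the exhaustive search directly (rather than from the plots), coef() and summary()$which can be applied to the regsubsets object. A short sketch, assuming the model object fitted above:
# Coefficient estimates of the best 10-variable model from the exhaustive search
coef(model, 10)
# Logical indicators of which variables the 10-variable model includes
summary(model)$which[10, ]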
model1 <- regsubsets(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio,
Boston, nvmax=19, method="forward")
which.max(summary(model1)$adjr2)
## [1] 11
which.min(summary(model1)$bic)
## [1] 7
which.min(summary(model1)$cp)
## [1] 11
plot(model1)
summary(model1)
## Subset selection object
## Call: regsubsets.formula(medv ~ crim + zn + indus + factor(chas) +
## nox + rm + age + dis + rad + tax + ptratio, Boston, nvmax = 19,
## method = "forward")
## 11 Variables (and intercept)
## Forced in Forced out
## crim FALSE FALSE
## zn FALSE FALSE
## indus FALSE FALSE
## factor(chas)1 FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## 1 subsets of each size up to 11
## Selection Algorithm: forward
## crim zn indus factor(chas)1 nox rm age dis rad tax ptratio
## 1 ( 1 ) " " " " " " " " " " "*" " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " "*" " " " " " " " " "*"
## 3 ( 1 ) " " " " " " " " "*" "*" " " " " " " " " "*"
## 4 ( 1 ) " " " " " " " " "*" "*" " " "*" " " " " "*"
## 5 ( 1 ) "*" " " " " " " "*" "*" " " "*" " " " " "*"
## 6 ( 1 ) "*" " " " " "*" "*" "*" " " "*" " " " " "*"
## 7 ( 1 ) "*" " " " " "*" "*" "*" "*" "*" " " " " "*"
## 8 ( 1 ) "*" "*" " " "*" "*" "*" "*" "*" " " " " "*"
## 9 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" " " " " "*"
## 10 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" " " "*"
## 11 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
par(mfrow=c(2,2))
plot(summary(model1)$rsq, type="o", ylab="R-Squared", xlab="")
plot(summary(model1)$adjr2, type="o", ylab="Adj-R-Squared", xlab="")
plot(summary(model1)$bic, type="o", ylab="BIC", xlab="")
plot(summary(model1)$cp, type="o", ylab="Cp", xlab="")
model2<-regsubsets(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio,
Boston, nvmax=19, method="backward")
which.max(summary(model2)$adjr2)
## [1] 10
which.min(summary(model2)$bic)
## [1] 10
which.min(summary(model2)$cp)
## [1] 10
plot(model2)
summary(model2)
## Subset selection object
## Call: regsubsets.formula(medv ~ crim + zn + indus + factor(chas) +
## nox + rm + age + dis + rad + tax + ptratio, Boston, nvmax = 19,
## method = "backward")
## 11 Variables (and intercept)
## Forced in Forced out
## crim FALSE FALSE
## zn FALSE FALSE
## indus FALSE FALSE
## factor(chas)1 FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## 1 subsets of each size up to 11
## Selection Algorithm: backward
## crim zn indus factor(chas)1 nox rm age dis rad tax ptratio
## 1 ( 1 ) " " " " " " " " " " "*" " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " "*" " " " " " " " " "*"
## 3 ( 1 ) " " " " " " " " "*" "*" " " " " " " " " "*"
## 4 ( 1 ) " " " " " " " " "*" "*" " " "*" " " " " "*"
## 5 ( 1 ) "*" " " " " " " "*" "*" " " "*" " " " " "*"
## 6 ( 1 ) "*" " " " " "*" "*" "*" " " "*" " " " " "*"
## 7 ( 1 ) "*" " " " " "*" "*" "*" "*" "*" " " " " "*"
## 8 ( 1 ) "*" " " " " "*" "*" "*" "*" "*" "*" " " "*"
## 9 ( 1 ) "*" " " " " "*" "*" "*" "*" "*" "*" "*" "*"
## 10 ( 1 ) "*" "*" " " "*" "*" "*" "*" "*" "*" "*" "*"
## 11 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
par(mfrow=c(2,2))
plot(summary(model2)$rsq, type="o", ylab="R-Squared", xlab="")
plot(summary(model2)$adjr2, type="o", ylab="Adj-R-Squared", xlab="")
plot(summary(model2)$bic, type="o", ylab="BIC", xlab="")
plot(summary(model2)$cp, type="o", ylab="Cp", xlab="")
## Estimate the Best Models (Sizes 10, 11, and 7, as Determined Above) and Calculate MSEs Using 5-Fold CV
*NOTE - Best Subset Selection (max adj. R^2 = 10, min BIC = 10, min Cp = 10). Model_10_Subset = crim, zn, factor(chas), nox, rm, age, dis, rad, tax, ptratio
*NOTE - Best Forward Stepwise Selection (max adj. R^2 = 11, min BIC = 7, min Cp = 11). Model_11_Forward = crim, zn, indus, factor(chas), nox, rm, age, dis, rad, ptratio; Model_7_Forward = crim, zn, indus, factor(chas), nox, rm, age, dis, ptratio
*NOTE - Best Backward Stepwise Selection (max adj. R^2 = 10, min BIC = 10, min Cp = 10). Model_10_Backward = crim, zn, indus, factor(chas), nox, rm, age, dis, rad, tax, ptratio
#Estimate Model_10_Subset and Calculate MSE using 5-fold CV
Model_10_Subset_lm <- lm(medv~crim+zn+factor(chas)+nox+rm+age+dis+rad+tax+ptratio, data=Boston)
Model_10_Subset_glm <- glm(medv~crim+zn+factor(chas)+nox+rm+age+dis+rad+tax+ptratio, data=Boston)
Model_10_Subset_lm
##
## Call:
## lm(formula = medv ~ crim + zn + factor(chas) + nox + rm + age +
## dis + rad + tax + ptratio, data = Boston)
##
## Coefficients:
## (Intercept) crim zn factor(chas)1 nox
## 27.31009 -0.18360 0.04008 3.43034 -22.90594
## rm age dis rad tax
## 6.11392 -0.04543 -1.55523 0.26724 -0.01336
## ptratio
## -1.00820
kcv_Model_10_Subset_glm <- cv.glm(Boston, Model_10_Subset_glm, K=5)
kcv_Model_10_Subset_glm$delta[1]
## [1] 30.53076
#Estimate Model_11_Forward and Calculate MSE using 5-fold CV
Model_11_Forward_lm <- lm(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+ptratio, data=Boston)
Model_11_Forward_glm <- glm(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+ptratio, data=Boston)
Model_11_Forward_lm
##
## Call:
## lm(formula = medv ~ crim + zn + indus + factor(chas) + nox +
## rm + age + dis + rad + ptratio, data = Boston)
##
## Coefficients:
## (Intercept) crim zn indus factor(chas)1
## 25.51530 -0.18289 0.02918 -0.12985 3.82859
## nox rm age dis rad
## -23.12157 6.16478 -0.04624 -1.59654 0.08450
## ptratio
## -1.02772
kcv_Model_11_Forward_glm <- cv.glm(Boston, Model_11_Forward_glm, K=5)
kcv_Model_11_Forward_glm$delta[1]
## [1] 30.1764
#Estimate Model_7_Forward and Calculate MSE using 5-fold CV
Model_7_Forward_lm <- lm(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+ptratio, data=Boston)
Model_7_Forward_glm <- glm(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+ptratio, data=Boston)
Model_7_Forward_lm
##
## Call:
## lm(formula = medv ~ crim + zn + indus + factor(chas) + nox +
## rm + age + dis + ptratio, data = Boston)
##
## Coefficients:
## (Intercept) crim zn indus factor(chas)1
## 21.71783 -0.15166 0.03257 -0.11417 3.84178
## nox rm age dis ptratio
## -20.33621 6.29314 -0.04830 -1.59433 -0.91617
kcv_Model_7_Forward_glm <- cv.glm(Boston, Model_7_Forward_glm, K=5)
kcv_Model_7_Forward_glm$delta[1]
## [1] 30.65317
#Estimate Model_10_Backward and Calculate MSE using 5-fold CV
Model_10_Backward_lm <- lm(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio, data=Boston)
Model_10_Backward_glm <- glm(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio, data=Boston)
Model_10_Backward_lm
##
## Call:
## lm(formula = medv ~ crim + zn + indus + factor(chas) + nox +
## rm + age + dis + rad + tax + ptratio, data = Boston)
##
## Coefficients:
## (Intercept) crim zn indus factor(chas)1
## 27.15237 -0.18403 0.03910 -0.04232 3.48753
## nox rm age dis rad
## -22.18211 6.07574 -0.04519 -1.58385 0.25472
## tax ptratio
## -0.01221 -0.99621
kcv_Model_10_Backward_glm <- cv.glm(Boston, Model_10_Backward_glm, K=5)
kcv_Model_10_Backward_glm$delta[1]
## [1] 29.32097
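To make the four candidates easier to compare, their 5-fold CV estimates can be collected into one table, mirroring the kable comparison used earlier. A minimal sketch, assuming the kcv_* objects computed above are still in the workspace:
library(kableExtra)
# Gather the 5-fold CV MSE estimates computed above into a single comparison table
cv_tbl <- data.frame(
  Models = c("Model_10_Subset", "Model_11_Forward", "Model_7_Forward", "Model_10_Backward"),
  k5.Fold.cv = c(kcv_Model_10_Subset_glm$delta[1],
                 kcv_Model_11_Forward_glm$delta[1],
                 kcv_Model_7_Forward_glm$delta[1],
                 kcv_Model_10_Backward_glm$delta[1]))
kable(cv_tbl) %>%
  kable_styling(bootstrap_options = "striped", full_width = F)
In this run, Model_10_Backward has the lowest 5-fold CV estimate (about 29.3), though the exact numbers shift slightly with the random fold assignment.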
Using all the variables (except medv), the complete data set, and the same grid used in lecture, find the best lambda based on 10-fold cross-validation.
Estimate a lasso regression using the best lambda, predict using the whole data set (Hint: newx=x), and get the MSE.
library(glmnet)   # for glmnet and cv.glmnet
# Omit missing values:
Boston <- na.omit(Boston)
# Create the model matrix and the grid of candidate lambda values
x <- model.matrix(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio, Boston)[,-1]
grid <- 10^seq(10, -2, length=100)
# Train (80%) - Test (20%) Split
set.seed(123)
train <- sample(1:nrow(x), size=0.8*nrow(x))
test <- setdiff(1:nrow(x), train)
# Cross-Validation for Lasso
cv.out <- cv.glmnet(x[train,], Boston$medv[train], alpha=1, lambda=grid, nfolds=10)
plot(cv.out)
# Optimal Lambda Determination
bestlam=cv.out$lambda.min
print(bestlam)
## [1] 0.01
print(log(bestlam))
## [1] -4.60517
print(bestlam %in% grid)
## [1] TRUE
out <- glmnet(x[train, ], Boston$medv[train], alpha=1, lambda=bestlam)
lasso_pred <- predict(out, s=bestlam, newx=x[test, ])
lasso_coefs <- coef(out, s=bestlam)
lasso_coefs_matrix <- as.matrix(lasso_coefs)
num_coefs <- nrow(lasso_coefs_matrix)
print(lasso_coefs_matrix)
## s1
## (Intercept) 27.48993315
## crim -0.17930031
## zn 0.03824751
## indus -0.02473282
## factor(chas)1 3.55577963
## nox -22.56939957
## rm 6.03929936
## age -0.03783326
## dis -1.49138918
## rad 0.26012025
## tax -0.01140096
## ptratio -1.06801318
#Running a regression model for comparison
out2<-glmnet(x,Boston$medv, lambda=0)
regpred<-predict(out2, s=0, newx=x[test,], exact=T)
Methods=c("Lasso", "Regression")
Testing.MSE=c(mean((lasso_pred-Boston$medv[test])^2), mean((regpred-Boston$medv[test])^2))
tbl <- data.frame(Methods, Testing.MSE)
tbll<-kable(tbl, format = "html")
kable_styling(tbll, bootstrap_options = c("striped", "hover"))
| Methods | Testing.MSE |
|---|---|
| Lasso | 24.12722 |
| Regression | 23.45699 |
Using the whole data set, estimate a principal components regression. (Hint: no need to use the subset=train argument.)
Predict medv using 5 principal components and calculate the MSE.
library(pls)   # for pcr and validationplot
pcrfit <- pcr(medv~crim+zn+indus+factor(chas)+nox+rm+age+dis+rad+tax+ptratio, data=Boston, scale=TRUE, validation="CV")
summary(pcrfit)
## Data: X dimension: 506 11
## Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 11
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 9.206 7.660 7.131 6.301 6.225 5.753 5.758
## adjCV 9.206 7.657 7.117 6.295 6.221 5.746 5.751
## 7 comps 8 comps 9 comps 10 comps 11 comps
## CV 5.688 5.698 5.644 5.479 5.438
## adjCV 5.681 5.691 5.637 5.471 5.429
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 48.80 61.40 71.61 79.29 85.69 90.53 93.67 96.13
## medv 31.49 41.22 54.01 55.97 62.18 62.30 63.33 63.39
## 9 comps 10 comps 11 comps
## X 97.87 99.42 100.00
## medv 64.40 66.41 67.03
validationplot(pcrfit, val.type="MSEP")
pcrpredict_5_PC <- predict(pcrfit, x[test,], ncomp=5)
mse_PC_5=mean((pcrpredict_5_PC-Boston$medv[test])^2)
mse_PC_5
## [1] 27.40716
Shrinkage reduces, or fully eliminates, the influence on the response variable medv of predictor variables that have less of a direct impact on it. While the ridge shrinkage method does not eliminate any variables, the lasso shrinkage method can both shrink and eliminate predictor coefficients. This is advantageous for building a model with lower variance and easier interpretability. Note, however, that the trade-off for lower model variance is increased bias, since the model no longer accounts for as much of the fluctuation in the response variable given changes in the predictor variables.
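To make the contrast concrete, the sketch below fits a ridge and a lasso model at the same, deliberately large penalty and compares their coefficients side by side. It assumes the model matrix x built earlier, and lambda = 1 is chosen purely for illustration.
library(glmnet)
# Ridge (alpha = 0) shrinks every coefficient toward zero but keeps them all nonzero;
# lasso (alpha = 1) can drive some coefficients exactly to zero at the same penalty.
ridge_fit <- glmnet(x, Boston$medv, alpha = 0, lambda = 1)
lasso_fit <- glmnet(x, Boston$medv, alpha = 1, lambda = 1)
cmp <- cbind(as.matrix(coef(ridge_fit)), as.matrix(coef(lasso_fit)))
colnames(cmp) <- c("ridge", "lasso")
round(cmp, 4)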
The lasso shrinkage method can be applied in R with a straightforward workflow. First, ensure there are no missing values in the Boston dataset, so that the MSE of an ordinary lm model and the lasso model can be compared later. Next, create a matrix x containing all independent variables (hence the [,-1], which drops the intercept column) and a grid of candidate lambda values from 0.01 to 10^10. Split the Boston data into training and testing sets to allow for model development and independent prediction-accuracy testing, and set a seed so the results are reproducible.

Using the cv.glmnet function, perform 10-fold cross-validation to determine the optimal value of lambda (the regularization parameter for the lasso) over the candidate grid. The 10-fold procedure runs 10 fits, each using 9/10 of the available observations as training data and the remaining 1/10 as testing data. Averaging the 10 fold-level errors produces the output stored in cv.out, which can be plotted to see the average MSE at each value of lambda. The best lambda is the one that yields the lowest cross-validated MSE, and it can be found by inspecting the cv.out plot or by reading cv.out$lambda.min. Here, lambda = 0.01 produces the lowest MSE and is used to fit the penalized generalized linear model (lambda = bestlam = 0.01).

For the Boston data above, the fitted lasso model (out) uses a very small lambda, meaning the penalty on the predictor coefficients is light. As a result, even though the lasso can shrink coefficients exactly to zero at larger values of lambda, no predictors are dropped from the adjusted model: bias stays low, but little variance is removed. A lightly penalized model can be overfitted, in the sense that it tries to explain noise, fluctuations in the response that reflect happenstance rather than real changes in the predictors, and its MSE can then be higher on unseen test data. While the coefficients of the predictors in the Boston model above were shrunk slightly by the lasso, none were eliminated. The test MSE of the lasso model is 24.13, while the unpenalized regression model has a slightly lower test MSE of 23.46. This comparison is plausible given that the Boston data has 506 observations and only 14 variables; shrinkage tends to matter most when p, the number of predictors, approaches or exceeds n, the number of observations.
In this module 4 exercise, I learned how to better fit a regression or classification model so that it better represents the predictors' relationship with the response variable, using a variety of dimension reduction and shrinkage methods. I also learned how to test a regression model using leave-one-out cross-validation (LOOCV) and k-fold cross-validation (KFCV). LOOCV fits the model n times, each time training on n-1 observations and testing on the single held-out observation, then averages the n resulting errors. LOOCV can be a computationally expensive way to train and test, but it gives a nearly unbiased estimate of how well the underlying model will predict new observations. K-fold CV is much less computationally costly and involves 5 or 10 fits (depending on whether K=5 or K=10 is selected), each trained on roughly 80% or 90% of the data with the remaining 20% or 10% serving as the test fold. The result is an increased level of assurance that the underlying model will be able to predict new, unseen data with little error (MSE).

Furthermore, the concepts of bias and variance were explored in greater detail, particularly in the context of shrinkage (lasso and ridge) and dimension reduction (principal components regression [PCR] and partial least squares [PLS]) techniques. Efforts to reduce variance with the objective of minimizing overfitting will often result in increased bias, as the model no longer explains the same amount of change in the response variable. Statistical methods such as lasso, ridge, PLS, and PCR are powerful tools for adjusting how a regression model is fitted to the underlying data, and R greatly simplifies the math and processes involved in these lengthy modeling techniques. Understanding when and how to use R packages such as pls and glmnet, as well as the statistical and mathematical processes behind them, can add value to an organization looking to maintain a competitive advantage in a global marketplace dominated by competitors that can leverage the potential of big data.
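As a concrete illustration of the k-fold procedure described above, the sketch below computes 5-fold CV for Model 2 (medv ~ rm + ptratio) by hand. It assumes the Boston data is loaded; the estimate will vary slightly with the random fold assignment.
# Hand-rolled 5-fold CV: randomly assign each row to one of 5 folds, fit on the
# other 4 folds, predict the held-out fold, and average the 5 fold-level MSEs.
set.seed(1)
K <- 5
folds <- sample(rep(1:K, length.out = nrow(Boston)))
fold_mse <- numeric(K)
for (k in 1:K) {
  fit <- lm(medv ~ rm + ptratio, data = Boston[folds != k, ])
  pred <- predict(fit, newdata = Boston[folds == k, ])
  fold_mse[k] <- mean((Boston$medv[folds == k] - pred)^2)
}
mean(fold_mse)   # 5-fold estimate of test MSE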