For part (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
The Lasso regression relative to least squares is Less flexible and hence will get improved prediction accuracy when its increase in bias is less than its decrease in variance.At Least Squares coefficient estimates , the variance is high with zero bias whereas as the tuning parameter \(\lambda\) increases, the ridge regression coefficient get reduced some will reduce to zero which results in the reduction in variance at the cost of increase in bias.
The Ridge regression relative to least squares is Less flexible and hence will get improved prediction accuracy when its increase in bias is less than its decrease in variance.At Least Squares coefficient estimates , the variance is high with zero bias whereas as the tuning parameter \(\lambda\) increases, the ridge regression coefficient get reduced close to zero but not exactly to zero, which in turn results in the reduction in variance at the cost of increase in bias.
The lasso and ridge regression perform equally relative to least square . The only difference is lasso regression contains only limited variables relative to the shrinkage of the Lasso regression coefficient whereas the ridge regression contain all the variables with non zero coefficients due to the shrinkage of ridge coefficients close to zero but not exactly to zero.
The non-linear methods are more flexible relative to the Least squares. As the Flexibility increases, the accuracy prediction will get improved with slight increase in variance at the cost of decrease in bias
In this exercise , we will predict the number of application received using the other variables in the college data set
data("College")
str(College)
## 'data.frame': 777 obs. of 18 variables:
## $ Private : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Apps : num 1660 2186 1428 417 193 ...
## $ Accept : num 1232 1924 1097 349 146 ...
## $ Enroll : num 721 512 336 137 55 158 103 489 227 172 ...
## $ Top10perc : num 23 16 22 60 16 38 17 37 30 21 ...
## $ Top25perc : num 52 29 50 89 44 62 45 68 63 44 ...
## $ F.Undergrad: num 2885 2683 1036 510 249 ...
## $ P.Undergrad: num 537 1227 99 63 869 ...
## $ Outstate : num 7440 12280 11250 12960 7560 ...
## $ Room.Board : num 3300 6450 3750 5450 4120 ...
## $ Books : num 450 750 400 450 800 500 500 450 300 660 ...
## $ Personal : num 2200 1500 1165 875 1500 ...
## $ PhD : num 70 29 53 92 76 67 90 89 79 40 ...
## $ Terminal : num 78 30 66 97 72 73 93 100 84 41 ...
## $ S.F.Ratio : num 18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
## $ perc.alumni: num 12 16 30 37 2 11 26 37 23 15 ...
## $ Expend : num 7041 10527 8735 19016 10922 ...
## $ Grad.Rate : num 60 56 54 59 15 55 63 73 80 52 ...
college.x=model.matrix(Apps~.,College)[,-1]
college.y=College$Apps
Split the data set into a training and a test set
set.seed(1)
train=sample(c(TRUE,FALSE), nrow(College),rep=TRUE)
test= (!train)
set the \(\lambda\)
grid=10^seq(10,-2,length=100)
Fit a linear model using least squares on the training set and report the test error obtained
Train the training data set with Least square method
colg.lm.model= glmnet(college.x[train,],college.y[train],alpha=0,lambda = grid)
predict the Application received for the test data set based on the trained model
colg.lm.pred= predict(colg.lm.model,s=0,newx = college.x[test,],exact = T,x=college.x[train,],y=college.y[train])
Validation test error for least square method
(colg.lm.err=mean((colg.lm.pred-college.y[test])^2))
## [1] 984706.3
Accuracy of the Least Square method
(colg.lm.acc=(cor(college.y[test],colg.lm.pred)^2))
## s1
## [1,] 0.9287323
colg.regfit.full=regsubsets(Apps~.,data=College[train,],nvmax=18)
reg.bs.summary=summary(colg.regfit.full)
Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model
par(mfrow=c(2,2))
plot(reg.bs.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(reg.bs.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(reg.bs.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(reg.bs.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")
Create the Model Matrix for the test data in order find the best model using the best subset selection
colg.test.mat = model.matrix(Apps~.,data=College[test,])
Find the MSE using the test model matrix
Colg.val.errors=rep(NA,17)
colg.val.acc.bss= rep(0,17)
for (i in 1:17){
colg.coefi= coef(colg.regfit.full,id=i)
colg.pred=colg.test.mat[,names(colg.coefi)]%*%colg.coefi
colg.val.acc.bss[i]=(cor(College$Apps[test],colg.pred)^2)
Colg.val.errors[i]=mean((College$Apps[test]-colg.pred)^2)
}
Find the Minimum MSE to get the Best model
which.min(Colg.val.errors)
## [1] 10
Based on the subset selection method, model with 10 predictors is the best model.
coef(colg.regfit.full,10)
## (Intercept) PrivateYes Accept Enroll Top10perc
## 132.63597831 -686.81821880 1.67769305 -0.67442437 58.72119828
## Top25perc Outstate Room.Board PhD Expend
## -21.48113789 -0.08148979 0.11099350 -13.01400088 0.07754081
## Grad.Rate
## 10.17149120
(colg.bss.err=Colg.val.errors[10])
## [1] 969154.1
Accuracy of the Best Subset Selection model
(colg.bss.acc=colg.val.acc.bss[10])
## [1] 0.929913
colg.regfit.fwd=regsubsets(Apps~.,data=College[train,],nvmax=18,method="forward")
reg.fwd.summary=summary(colg.regfit.fwd)
Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model
par(mfrow=c(2,2))
plot(reg.fwd.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(reg.fwd.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(reg.fwd.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(reg.fwd.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")
Find the MSE using the test model matrix
Colg.val.errors.fwd=rep(NA,17)
colg.val.acc.fwd= rep(0,17)
for (i in 1:17){
colg.coefi.fwd= coef(colg.regfit.fwd,id=i)
colg.pred.fwd=colg.test.mat[,names(colg.coefi.fwd)]%*%colg.coefi.fwd
colg.val.acc.fwd[i]=(cor(College$Apps[test],colg.pred.fwd)^2)
Colg.val.errors.fwd[i]=mean((College$Apps[test]-colg.pred.fwd)^2)
}
Find the Minimum MSE to get the Best model
which.min(Colg.val.errors.fwd)
## [1] 10
Based on the forward selection method, model with 10 predictors is the best model.
coef(colg.regfit.fwd,10)
## (Intercept) PrivateYes Accept Enroll Top10perc
## 132.63597831 -686.81821880 1.67769305 -0.67442437 58.72119828
## Top25perc Outstate Room.Board PhD Expend
## -21.48113789 -0.08148979 0.11099350 -13.01400088 0.07754081
## Grad.Rate
## 10.17149120
(colg.fwd.err=Colg.val.errors.fwd[10])
## [1] 969154.1
Accuracy of the forward Selection model
(colg.fwd.acc=colg.val.acc.fwd[10])
## [1] 0.929913
colg.regfit.bkd=regsubsets(Apps~.,data=College[train,],nvmax=18,method="backward")
reg.bkd.summary=summary(colg.regfit.bkd)
Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model
par(mfrow=c(2,2))
plot(reg.bkd.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(reg.bkd.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(reg.bkd.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(reg.bkd.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")
Find the MSE using the test model matrix
Colg.val.errors.bkd=rep(NA,17)
colg.val.acc.bkd= rep(0,17)
for (i in 1:17){
colg.coefi.bkd= coef(colg.regfit.bkd,id=i)
colg.pred.bkd=colg.test.mat[,names(colg.coefi.bkd)]%*%colg.coefi.bkd
colg.val.acc.bkd[i]=(cor(College$Apps[test],colg.pred.bkd)^2)
Colg.val.errors.bkd[i]=mean((College$Apps[test]-colg.pred.bkd)^2)
}
Find the Minimum MSE to get the Best model
which.min(Colg.val.errors.bkd)
## [1] 10
Based on the Backward selection method, model with 10 predictors is the best model.
coef(colg.regfit.bkd,10)
## (Intercept) PrivateYes Accept Enroll Top10perc
## 132.63597831 -686.81821880 1.67769305 -0.67442437 58.72119828
## Top25perc Outstate Room.Board PhD Expend
## -21.48113789 -0.08148979 0.11099350 -13.01400088 0.07754081
## Grad.Rate
## 10.17149120
(colg.bkd.err=Colg.val.errors.bkd[10])
## [1] 969154.1
Accuracy of the backward Selection model
(colg.bkd.acc=colg.val.acc.bkd[10])
## [1] 0.929913
Fit a ridge regression on the training data set with \(\lambda\) chosen by cross-validation.report the test error obtained
Find out the ridge regression model for the training data set using the grid of \(\lambda\)
colg.ridge.model=glmnet(college.x[train,],college.y[train],alpha=0,lambda = grid)
Execute cross-validation to find out the \(\lambda\)
set.seed(2)
colg.cv.out= cv.glmnet(college.x[train,],college.y[train],alpha=0)
plot(colg.cv.out)
Find out the \(\lambda\) with minimum MSE value
(colg.bestlambda=colg.cv.out$lambda.min)
## [1] 394.2365
log(394.2365)
## [1] 5.976951
predict the coefficients of the predictors along with the intercept for the best \(\lambda\) value
predict(colg.ridge.model,s=colg.bestlambda,type = "coefficients")[,]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -1.301713e+03 -5.814072e+02 1.128794e+00 3.311844e-01 2.946961e+01
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## -3.410179e+00 7.381110e-02 1.454077e-02 -3.094319e-02 1.760185e-01
## Books Personal PhD Terminal S.F.Ratio
## 1.908980e-01 -6.514172e-02 -4.749598e+00 -8.127520e+00 2.021354e+01
## perc.alumni Expend Grad.Rate
## -9.731471e-02 8.349743e-02 1.175989e+01
Predict the application received for the new test data set based on the best \(\lambda\) value
colg.apps.pred.ridge=predict(colg.ridge.model,s=colg.bestlambda,newx = college.x[test,])
Find out the test error for the best fit \(\lambda\)
(colg.ridge.err=mean((colg.apps.pred.ridge-college.y[test])^2))
## [1] 940753.5
Accuracy of the Ridge model
(colg.ridge.acc=(cor(college.y[test],colg.apps.pred.ridge)^2))
## s1
## [1,] 0.9287269
fit the Lasso model on the training data set with the \(\lambda\) by cross validation. report the test error obtained, along with the number of non-zero coefficient estimates Find out the ridge regression model for the training data set using the grid of \(\lambda\)
colg.lasso.model=glmnet(college.x[train,],college.y[train],alpha=1,lambda = grid)
Execute cross-validation to find out the \(\lambda\)
set.seed(2)
colg.cv.out.lasso= cv.glmnet(college.x[train,],college.y[train],alpha=1)
plot(colg.cv.out.lasso)
Find out the \(\lambda\) with minimum MSE value
(colg.lasso.bestlambda=colg.cv.out.lasso$lambda.min)
## [1] 54.59727
predict the coefficients of the predictors along with the intercept for the best \(\lambda\) value
lasso_coef=predict(colg.lasso.model,s=colg.lasso.bestlambda,type = "coefficients")[,]
lasso_coef
## (Intercept) PrivateYes Accept Enroll Top10perc
## -6.662641e+02 -2.453703e+02 1.478906e+00 -4.745848e-03 2.606622e+01
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## 0.000000e+00 0.000000e+00 0.000000e+00 -1.375606e-02 2.384654e-03
## Books Personal PhD Terminal S.F.Ratio
## 0.000000e+00 0.000000e+00 -1.746498e+00 -5.433008e+00 0.000000e+00
## perc.alumni Expend Grad.Rate
## 0.000000e+00 6.674557e-02 3.297330e+00
Predict the application received for the new test data set based on the best \(\lambda\) value
colg.apps.pred.lasso=predict(colg.lasso.model,s=colg.lasso.bestlambda,newx = college.x[test,])
Find out the test error for the best fit \(\lambda\)
(colg.lasso.err=mean((colg.apps.pred.lasso-college.y[test])^2))
## [1] 991035.4
Find out the number of non zero coefficients for the best fit lasso model
length(lasso_coef[lasso_coef!=0])
## [1] 11
Accuracy of the Lasso model
(colg.lasso.acc=(cor(college.y[test],colg.apps.pred.lasso)^2))
## s1
## [1,] 0.9279095
fit a PCR model on the training data set, with M chosen by cross Validation. Report the test error obtained along with the value of M selected by cross-validation
Execute the PCR to find out M by cross validation
set.seed(3)
colg.pcr.model= pcr(Apps~.,data=College[train,],scale=TRUE,validation="CV")
summary(colg.pcr.model)
## Data: X dimension: 393 17
## Y dimension: 393 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4189 4167 2389 2392 2129 2109 2030
## adjCV 4189 4165 2381 2387 2108 2111 2015
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 2029 1999 1893 1860 1875 1892 1893
## adjCV 2017 1980 1879 1847 1867 1879 1881
## 14 comps 15 comps 16 comps 17 comps
## CV 1918 1759 1374 1368
## adjCV 1934 1716 1359 1352
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 31.858 57.44 64.20 69.91 75.10 80.17 83.82 87.30
## Apps 4.353 70.99 71.18 76.84 78.34 81.03 81.59 82.21
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.26 92.74 94.79 96.70 97.76 98.67 99.37
## Apps 83.31 83.97 83.97 84.34 84.58 84.70 91.28
## 16 comps 17 comps
## X 99.82 100.00
## Apps 92.83 93.02
validationplot(colg.pcr.model,val.type = "MSEP")
Based on the Validation plot, the MSE is low when the components is more than 15 which is very few predictors less than the actual 17 predictors in the data set.If we use the Components as 17, it is more or less equal t least squares and there is no dimension reduction occurs.As per the Plot, the MSE started to reduce when the M component is 2,also when the M component is 2 about 70% variance of application received is explained by that component so for analysis purpose i would consider M=2 is the best model
colg.pcr.pred=predict(colg.pcr.model,college.x[test,],ncomp = 2)
Find the Test Validation Error
(colg.pcr.err=mean((colg.pcr.pred-college.y[test])^2))
## [1] 2973971
Test Validation error is very large when compared to the Best subset selection,Ridge regression and Lasso regression
Accuracy of the PCR model
(colg.pcr.acc=(cor(college.y[test],colg.pcr.pred)^2))
## [1] 0.775845
Fit a PLS Model on the training set, with M chosen by cross validation.report the test error obtained along with the value of M selected by cross validation
Execute the PLS to find out M by cross validation
set.seed(3)
colg.pls.model= plsr(Apps~.,data=College[train,],scale=TRUE,validation="CV")
summary(colg.pls.model)
## Data: X dimension: 393 17
## Y dimension: 393 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4189 2228 1921 1785 1701 1516 1385
## adjCV 4189 2216 1917 1769 1671 1488 1369
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1359 1358 1360 1364 1368 1370 1370
## adjCV 1346 1344 1345 1349 1353 1354 1354
## 14 comps 15 comps 16 comps 17 comps
## CV 1368 1368 1368 1368
## adjCV 1352 1352 1352 1352
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 26.01 44.96 62.49 65.22 68.52 72.89 77.13 80.46
## Apps 75.74 82.40 86.74 90.58 92.34 92.79 92.88 92.93
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 82.45 84.76 88.08 90.76 92.80 94.45 97.02
## Apps 92.98 93.00 93.01 93.01 93.02 93.02 93.02
## 16 comps 17 comps
## X 98.03 100.00
## Apps 93.02 93.02
validationplot(colg.pls.model,val.type = "MSEP")
Based on the Validation plot, the MSE is low when the components is more than 6.When M=2, the percentage Variance of application received explained by the model 82% which is good. At the same time the percentage variance of all other predictors is around 45% which also good. So based on this explanation, M=2 is the best fit model,even though the test validation error is very high compared to ridge, lasso, best subset selection methods.
colg.pls.pred=predict(colg.pls.model,college.x[test,],ncomp = 2)
Find the Test Validation Error
(colg.pls.err=mean((colg.pls.pred-college.y[test])^2))
## [1] 1852867
Accuracy of the PLS model
(colg.pls.acc=(cor(college.y[test],colg.pls.pred)^2))
## [1] 0.8718939
Test Validation error is very large, when compared to the PCR, Best subset selection,Ridge regression and Lasso regression.
Comments on the results obtained. How accurately can we predict the number of college application received?is there much difference among the test errors resulting from the five approaches?
Model=c("Least Square","Best Subset","Forward","Backward","Ridge","Lasso","PCR","PLS")
Val_error = c(colg.lm.err,colg.bss.err,colg.fwd.err,colg.bkd.err,colg.ridge.err,colg.lasso.err,colg.pcr.err,colg.pls.err)
Accuracy = c(colg.lm.acc,colg.bss.acc,colg.fwd.acc,colg.bkd.acc,colg.ridge.acc,colg.lasso.acc,colg.pcr.acc,colg.pls.acc)
(results.df= data.frame(Model,Val_error,Accuracy))
## Model Val_error Accuracy
## 1 Least Square 984706.3 0.9287323
## 2 Best Subset 969154.1 0.9299130
## 3 Forward 969154.1 0.9299130
## 4 Backward 969154.1 0.9299130
## 5 Ridge 940753.5 0.9287269
## 6 Lasso 991035.4 0.9279095
## 7 PCR 2973970.9 0.7758450
## 8 PLS 1852867.5 0.8718939
Try out some of the regression methods explored in this chapter, such as best subset selection ,lasso regression,ridge regression and PCR. present and discuss the results for the approaches that you consider
Check the data
data(Boston)
str(Boston)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ black : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
boston.x=model.matrix(crim~.,Boston)[,-1]
boston.y=Boston$crim
Split the data set into a training and a test set
set.seed(1)
Boston.train=sample(c(TRUE,FALSE), nrow(Boston),rep=TRUE)
Boston.test= (!Boston.train)
set the \(\lambda\)
grid=10^seq(10,-2,length=100)
Train the training data set with Least square method
boston.lm.model= glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=0,lambda = grid)
predict the Application received for the test data set based on the trained model
boston.lm.pred= predict(boston.lm.model,s=0,newx = boston.x[Boston.test,],exact = T,x=boston.x[Boston.train,],y=boston.y[Boston.train])
Validation test error for least square method
(boston.lm.err=mean((boston.lm.pred-boston.y[Boston.test])^2))
## [1] 50.66302
Accuracy of the Least Square method
(boston.lm.acc=(cor(boston.y[Boston.test],boston.lm.pred)^2))
## s1
## [1,] 0.3969972
boston.regfit.full=regsubsets(crim~.,data=Boston[Boston.train,],nvmax=18)
(boston.reg.bs.summary=summary(boston.regfit.full))
## Subset selection object
## Call: regsubsets.formula(crim ~ ., data = Boston[Boston.train, ], nvmax = 18)
## 13 Variables (and intercept)
## Forced in Forced out
## zn FALSE FALSE
## indus FALSE FALSE
## chas FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## black FALSE FALSE
## lstat FALSE FALSE
## medv FALSE FALSE
## 1 subsets of each size up to 13
## Selection Algorithm: exhaustive
## zn indus chas nox rm age dis rad tax ptratio black lstat medv
## 1 ( 1 ) " " " " " " " " " " " " " " "*" " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " "*" " " " " "*" " " " "
## 3 ( 1 ) " " " " " " " " " " " " " " "*" " " " " "*" "*" " "
## 4 ( 1 ) " " "*" " " " " " " " " " " "*" " " " " "*" "*" " "
## 5 ( 1 ) "*" " " " " " " " " " " "*" "*" " " " " "*" " " "*"
## 6 ( 1 ) "*" "*" " " " " " " " " "*" "*" " " " " "*" " " "*"
## 7 ( 1 ) "*" "*" " " " " " " " " "*" "*" " " " " "*" "*" "*"
## 8 ( 1 ) "*" "*" " " "*" " " " " "*" "*" " " " " "*" "*" "*"
## 9 ( 1 ) "*" "*" " " "*" " " " " "*" "*" " " "*" "*" "*" "*"
## 10 ( 1 ) "*" "*" " " "*" " " "*" "*" "*" " " "*" "*" "*" "*"
## 11 ( 1 ) "*" "*" " " "*" "*" "*" "*" "*" " " "*" "*" "*" "*"
## 12 ( 1 ) "*" "*" " " "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
## 13 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model
par(mfrow=c(2,2))
plot(boston.reg.bs.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(boston.reg.bs.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(boston.reg.bs.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(boston.reg.bs.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")
Create the Model Matrix for the test data in order find the best model using the best subset selection
boston.test.mat = model.matrix(crim~.,data=Boston[Boston.test,])
Find the MSE using the test model matrix
boston.val.errors=rep(NA,13)
boston.val.acc.bss= rep(0,13)
for (i in 1:13){
boston.coefi= coef(boston.regfit.full,id=i)
boston.pred=boston.test.mat[,names(boston.coefi)]%*%boston.coefi
boston.val.acc.bss[i]=(cor(Boston$crim[Boston.test],boston.pred)^2)
boston.val.errors[i]=mean((Boston$crim[Boston.test]-boston.pred)^2)
}
Find the Minimum MSE to get the Best model
which.min(boston.val.errors)
## [1] 9
Based on the subset selection method, model with 9 predictors is the best model.
coef(boston.regfit.full,9)
## (Intercept) zn indus nox dis rad
## 16.49784501 0.04428501 -0.11356470 -6.80041892 -0.87067024 0.48133294
## ptratio black lstat medv
## -0.17759119 -0.01438142 0.12943566 -0.13215744
(boston.bss.err=boston.val.errors[9])
## [1] 50.43627
Accuracy of the Best Subset Selection model
(boston.bss.acc=boston.val.acc.bss[9])
## [1] 0.3996243
boston.regfit.fwd=regsubsets(crim~.,data=Boston[Boston.train,],nvmax=18,method="forward")
boston.reg.fwd.summary=summary(boston.regfit.fwd)
Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model
par(mfrow=c(2,2))
plot(boston.reg.fwd.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(boston.reg.fwd.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(boston.reg.fwd.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(boston.reg.fwd.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")
Find the MSE using the test model matrix
boston.val.errors.fwd=rep(NA,13)
boston.val.acc.fwd= rep(0,13)
for (i in 1:13){
boston.coefi.fwd= coef(boston.regfit.fwd,id=i)
boston.pred.fwd=boston.test.mat[,names(boston.coefi.fwd)]%*%boston.coefi.fwd
boston.val.acc.fwd[i]=(cor(Boston$crim[Boston.test],boston.pred.fwd)^2)
boston.val.errors.fwd[i]=mean((Boston$crim[Boston.test]-boston.pred.fwd)^2)
}
Find the Minimum MSE to get the Best model
which.min(boston.val.errors.fwd)
## [1] 9
Based on the forward selection method, model with 10 predictors is the best model.
coef(boston.regfit.fwd,9)
## (Intercept) zn indus nox dis rad
## 16.49784501 0.04428501 -0.11356470 -6.80041892 -0.87067024 0.48133294
## ptratio black lstat medv
## -0.17759119 -0.01438142 0.12943566 -0.13215744
(boston.fwd.err=boston.val.errors.fwd[9])
## [1] 50.43627
Accuracy of the forward Selection model
(boston.fwd.acc=boston.val.acc.fwd[10])
## [1] 0.3985968
boston.regfit.bkd=regsubsets(crim~.,data=Boston[Boston.train,],nvmax=18,method="backward")
boston.reg.bkd.summary=summary(boston.regfit.bkd)
Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model
par(mfrow=c(2,2))
plot(boston.reg.bkd.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(boston.reg.bkd.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(boston.reg.bkd.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(boston.reg.bkd.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")
Find the MSE using the test model matrix
boston.val.errors.bkd=rep(NA,13)
boston.val.acc.bkd= rep(0,13)
for (i in 1:13){
boston.coefi.bkd= coef(boston.regfit.bkd,id=i)
boston.pred.bkd=boston.test.mat[,names(boston.coefi.bkd)]%*%boston.coefi.bkd
boston.val.acc.bkd[i]=(cor(Boston$crim[Boston.test],boston.pred.bkd)^2)
boston.val.errors.bkd[i]=mean((Boston$crim[Boston.test]-boston.pred.bkd)^2)
}
Find the Minimum MSE to get the Best model
which.min(boston.val.errors.bkd)
## [1] 9
Based on the Backward selection method, model with 10 predictors is the best model.
coef(boston.regfit.bkd,9)
## (Intercept) zn indus nox dis rad
## 16.49784501 0.04428501 -0.11356470 -6.80041892 -0.87067024 0.48133294
## ptratio black lstat medv
## -0.17759119 -0.01438142 0.12943566 -0.13215744
(boston.bkd.err=boston.val.errors.bkd[9])
## [1] 50.43627
Accuracy of the backward Selection model
(boston.bkd.acc=boston.val.acc.bkd[9])
## [1] 0.3996243
Find out the ridge regression model for the training data set using the grid of \(\lambda\)
boston.ridge.model=glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=0,lambda = grid)
Execute cross-validation to find out the \(\lambda\)
set.seed(2)
boston.cv.out= cv.glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=0)
plot(boston.cv.out)
Find out the \(\lambda\) with minimum MSE value
(boston.bestlambda=boston.cv.out$lambda.min)
## [1] 0.5240686
log(0.5240686)
## [1] -0.6461327
predict the coefficients of the predictors along with the intercept for the best \(\lambda\) value
predict(boston.ridge.model,s=boston.bestlambda,type = "coefficients")[,]
## (Intercept) zn indus chas nox rm
## 11.40028509 0.03571144 -0.10725767 -0.30096628 -3.91773023 -0.21352290
## age dis rad tax ptratio black
## 0.01235778 -0.59941426 0.36811114 0.00436501 -0.08864320 -0.01513869
## lstat medv
## 0.12230979 -0.08876017
Predict the application received for the new test data set based on the best \(\lambda\) value
boston.apps.pred.ridge=predict(boston.ridge.model,s=boston.bestlambda,newx = boston.x[Boston.test,])
Find out the test error for the best fit \(\lambda\)
(boston.ridge.err=mean((boston.apps.pred.ridge-boston.y[Boston.test])^2))
## [1] 51.45778
Accuracy of the Ridge model
(boston.ridge.acc=(cor(boston.y[Boston.test],boston.apps.pred.ridge)^2))
## s1
## [1,] 0.3883494
Find out the lasso regression model for the training data set using the grid of \(\lambda\)
boston.lasso.model=glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=1,lambda = grid)
Execute cross-validation to find out the \(\lambda\)
set.seed(2)
boston.cv.out.lasso= cv.glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=1)
plot(boston.cv.out.lasso)
Find out the \(\lambda\) with minimum MSE value
(boston.lasso.bestlambda=boston.cv.out.lasso$lambda.min)
## [1] 0.115564
predict the coefficients of the predictors along with the intercept for the best \(\lambda\) value
boston.lasso_coef=predict(boston.lasso.model,s=boston.lasso.bestlambda,type = "coefficients")[,]
boston.lasso_coef
## (Intercept) zn indus chas nox rm
## 6.914027481 0.029229796 -0.057097795 -0.006762669 -0.315009674 0.000000000
## age dis rad tax ptratio black
## 0.001418141 -0.436111039 0.433432142 0.000000000 -0.005861043 -0.014563026
## lstat medv
## 0.123607242 -0.079365869
Predict the application received for the new test data set based on the best \(\lambda\) value
boston.apps.pred.lasso=predict(boston.lasso.model,s=boston.lasso.bestlambda,newx = boston.x[Boston.test,])
Find out the test error for the best fit \(\lambda\)
(boston.lasso.err=mean((boston.apps.pred.lasso-boston.y[Boston.test])^2))
## [1] 51.34046
Find out the number of non zero coefficients for the best fit lasso model
length(boston.lasso_coef[boston.lasso_coef!=0])
## [1] 12
Accuracy of the Lasso model
(boston.lasso.acc=(cor(boston.y[Boston.test],boston.apps.pred.lasso)^2))
## s1
## [1,] 0.3893684
Execute the PCR to find out M by cross validation
set.seed(3)
boston.pcr.model= pcr(crim~.,data=Boston[Boston.train,],scale=TRUE,validation="CV")
summary(boston.pcr.model)
## Data: X dimension: 262 13
## Y dimension: 262 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.065 6.574 6.576 6.078 6.042 6.036 6.114
## adjCV 8.065 6.572 6.574 6.072 6.035 6.031 6.105
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.085 5.984 5.977 5.976 5.981 5.962 5.906
## adjCV 6.076 5.964 5.960 5.964 5.969 5.948 5.892
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 44.58 58.02 68.58 75.64 82.03 87.37 90.66 93.02
## crim 33.90 34.29 44.32 44.91 45.57 45.62 46.13 48.13
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.01 96.87 98.4 99.54 100.00
## crim 48.80 48.96 49.0 49.91 50.71
validationplot(boston.pcr.model,val.type = "MSEP")
Based on the Validation plot, the MSE is low when the components is more than 12 which is very few predictors less than the actual 13 predictors in the data set.If we use the Components as 13, it is more or less equal to least squares and there is no dimension reduction occurs.As per the Plot, the MSE started to reduce when the M component is 2,also when the M component is 2 about 70% variance of application received is explained by that component so for analysis purpose i would consider M=2 is the best model
boston.pcr.pred=predict(boston.pcr.model,boston.x[Boston.test,],ncomp = 2)
Find the Test Validation Error
(boston.pcr.err=mean((boston.pcr.pred-boston.y[Boston.test])^2))
## [1] 61.27925
Test Validation error is very large when compared to the Best subset selection,Ridge regression and Lasso regression
Accuracy of the PCR model
(boston.pcr.acc=(cor(boston.y[Boston.test],boston.pcr.pred)^2))
## [1] 0.2767922
Execute the PLS to find out M by cross validation
set.seed(3)
boston.pls.model= plsr(crim~.,data=Boston[Boston.train,],scale=TRUE,validation="CV")
summary(boston.pls.model)
## Data: X dimension: 262 13
## Y dimension: 262 1
## Fit method: kernelpls
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.065 6.334 5.988 5.977 5.953 5.924 5.903
## adjCV 8.065 6.331 5.980 5.960 5.938 5.911 5.890
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 5.932 5.917 5.896 5.905 5.906 5.906 5.906
## adjCV 5.916 5.902 5.883 5.891 5.892 5.892 5.892
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 43.95 54.85 60.29 69.59 76.82 79.48 83.83 87.61
## crim 39.21 47.44 49.46 50.05 50.31 50.56 50.64 50.68
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 90.24 94.42 96.55 98.20 100.00
## crim 50.70 50.71 50.71 50.71 50.71
validationplot(boston.pls.model,val.type = "MSEP")
Based on the Validation plot, the MSE is low when the components is more than 2.When M=2, the percentage Variance of application received explained by the model 47.4% which is good. At the same time the percentage variance of all other predictors is around 55% which also good. So based on this explanation, M=2 is the best fit model,even though the test validation error is very high compared to ridge, lasso, best subset selection methods.
boston.pls.pred=predict(boston.pls.model,boston.x[Boston.test,],ncomp = 2)
Find the Test Validation Error
(boston.pls.err=mean((boston.pls.pred-boston.y[Boston.test])^2))
## [1] 53.7501
Accuracy of the PLS model
(boston.pls.acc=(cor(boston.y[Boston.test],boston.pls.pred)^2))
## [1] 0.3591553
Test Validation error is very large, when compared to the PCR, Best subset selection,Ridge regression and Lasso regression.
propose a model(or set of models) that seem to perform well on this data set and justify your answer.Make sure that you are evaluating model performance using validation set error, cross validation or some other reasonable alternative,as opposed to using training error
Boston.Model=c("Least Square","Best Subset","Forward","Backward","Ridge","Lasso","PCR","PLS")
Boston.Val_error = c(boston.lm.err,boston.bss.err,boston.fwd.err,boston.bkd.err,boston.ridge.err,boston.lasso.err,boston.pcr.err,boston.pls.err)
Boston.Accuracy = c(boston.lm.acc,boston.bss.acc,boston.fwd.acc,boston.bkd.acc,boston.ridge.acc,boston.lasso.acc,boston.pcr.acc,boston.pls.acc)
(Boston.results.df= data.frame(Boston.Model,Boston.Val_error,Boston.Accuracy))
## Boston.Model Boston.Val_error Boston.Accuracy
## 1 Least Square 50.66302 0.3969972
## 2 Best Subset 50.43627 0.3996243
## 3 Forward 50.43627 0.3985968
## 4 Backward 50.43627 0.3996243
## 5 Ridge 51.45778 0.3883494
## 6 Lasso 51.34046 0.3893684
## 7 PCR 61.27925 0.2767922
## 8 PLS 53.75010 0.3591553
Does your chosen model involve all the features in the data set? why or not?
As per the Best subset selection , all the features available in the data set are not used in the model. only the below predictors are used to predict the per capita crime rate in boston city.
* zn
* indus
* nox
* dis
* rad
* ptratio
* black
* lstat
* medv
If we use all the predictors it will result to over fitting so based on this analysis , will go with prediction of per capita crime using the above predictors.
As per the PLS method only two components will be used to predict the per capita crima rate in boston city. Two components capture the major variance explained by the predictors inorder to predict the per capita crime in boston city