Question 2

For part (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

Question 2a

The Lasso regression relative to least squares is Less flexible and hence will get improved prediction accuracy when its increase in bias is less than its decrease in variance.At Least Squares coefficient estimates , the variance is high with zero bias whereas as the tuning parameter \(\lambda\) increases, the ridge regression coefficient get reduced some will reduce to zero which results in the reduction in variance at the cost of increase in bias.

Question 2b

The Ridge regression relative to least squares is Less flexible and hence will get improved prediction accuracy when its increase in bias is less than its decrease in variance.At Least Squares coefficient estimates , the variance is high with zero bias whereas as the tuning parameter \(\lambda\) increases, the ridge regression coefficient get reduced close to zero but not exactly to zero, which in turn results in the reduction in variance at the cost of increase in bias.

The lasso and ridge regression perform equally relative to least square . The only difference is lasso regression contains only limited variables relative to the shrinkage of the Lasso regression coefficient whereas the ridge regression contain all the variables with non zero coefficients due to the shrinkage of ridge coefficients close to zero but not exactly to zero.

Question 2c

The non-linear methods are more flexible relative to the Least squares. As the Flexibility increases, the accuracy prediction will get improved with slight increase in variance at the cost of decrease in bias

Question 9

In this exercise , we will predict the number of application received using the other variables in the college data set

data("College")
str(College)
## 'data.frame':    777 obs. of  18 variables:
##  $ Private    : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Apps       : num  1660 2186 1428 417 193 ...
##  $ Accept     : num  1232 1924 1097 349 146 ...
##  $ Enroll     : num  721 512 336 137 55 158 103 489 227 172 ...
##  $ Top10perc  : num  23 16 22 60 16 38 17 37 30 21 ...
##  $ Top25perc  : num  52 29 50 89 44 62 45 68 63 44 ...
##  $ F.Undergrad: num  2885 2683 1036 510 249 ...
##  $ P.Undergrad: num  537 1227 99 63 869 ...
##  $ Outstate   : num  7440 12280 11250 12960 7560 ...
##  $ Room.Board : num  3300 6450 3750 5450 4120 ...
##  $ Books      : num  450 750 400 450 800 500 500 450 300 660 ...
##  $ Personal   : num  2200 1500 1165 875 1500 ...
##  $ PhD        : num  70 29 53 92 76 67 90 89 79 40 ...
##  $ Terminal   : num  78 30 66 97 72 73 93 100 84 41 ...
##  $ S.F.Ratio  : num  18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
##  $ perc.alumni: num  12 16 30 37 2 11 26 37 23 15 ...
##  $ Expend     : num  7041 10527 8735 19016 10922 ...
##  $ Grad.Rate  : num  60 56 54 59 15 55 63 73 80 52 ...
college.x=model.matrix(Apps~.,College)[,-1]
college.y=College$Apps

Question 9a

Split the data set into a training and a test set

set.seed(1)
train=sample(c(TRUE,FALSE), nrow(College),rep=TRUE)
test= (!train)

set the \(\lambda\)

grid=10^seq(10,-2,length=100)

Question 9b

Fit a linear model using least squares on the training set and report the test error obtained

Least square method:

Train the training data set with Least square method

colg.lm.model= glmnet(college.x[train,],college.y[train],alpha=0,lambda = grid)

predict the Application received for the test data set based on the trained model

colg.lm.pred= predict(colg.lm.model,s=0,newx = college.x[test,],exact = T,x=college.x[train,],y=college.y[train])

Validation test error for least square method

(colg.lm.err=mean((colg.lm.pred-college.y[test])^2))
## [1] 984706.3

Accuracy of the Least Square method

(colg.lm.acc=(cor(college.y[test],colg.lm.pred)^2))
##             s1
## [1,] 0.9287323

Best Subset Selection method

colg.regfit.full=regsubsets(Apps~.,data=College[train,],nvmax=18)
reg.bs.summary=summary(colg.regfit.full)

Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model

par(mfrow=c(2,2))
plot(reg.bs.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(reg.bs.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(reg.bs.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(reg.bs.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")

Create the Model Matrix for the test data in order find the best model using the best subset selection

colg.test.mat = model.matrix(Apps~.,data=College[test,])

Find the MSE using the test model matrix

Colg.val.errors=rep(NA,17)
colg.val.acc.bss= rep(0,17)
for (i in 1:17){
  colg.coefi= coef(colg.regfit.full,id=i)
  colg.pred=colg.test.mat[,names(colg.coefi)]%*%colg.coefi
  colg.val.acc.bss[i]=(cor(College$Apps[test],colg.pred)^2)
  Colg.val.errors[i]=mean((College$Apps[test]-colg.pred)^2)
}

Find the Minimum MSE to get the Best model

which.min(Colg.val.errors)
## [1] 10

Best Subset Selection Inference:

Based on the subset selection method, model with 10 predictors is the best model.

coef(colg.regfit.full,10)
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
##  132.63597831 -686.81821880    1.67769305   -0.67442437   58.72119828 
##     Top25perc      Outstate    Room.Board           PhD        Expend 
##  -21.48113789   -0.08148979    0.11099350  -13.01400088    0.07754081 
##     Grad.Rate 
##   10.17149120
(colg.bss.err=Colg.val.errors[10])
## [1] 969154.1

Accuracy of the Best Subset Selection model

(colg.bss.acc=colg.val.acc.bss[10])
## [1] 0.929913

Forward Selection method

colg.regfit.fwd=regsubsets(Apps~.,data=College[train,],nvmax=18,method="forward")
reg.fwd.summary=summary(colg.regfit.fwd)

Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model

par(mfrow=c(2,2))
plot(reg.fwd.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(reg.fwd.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(reg.fwd.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(reg.fwd.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")

Find the MSE using the test model matrix

Colg.val.errors.fwd=rep(NA,17)
colg.val.acc.fwd= rep(0,17)
for (i in 1:17){
  colg.coefi.fwd= coef(colg.regfit.fwd,id=i)
  colg.pred.fwd=colg.test.mat[,names(colg.coefi.fwd)]%*%colg.coefi.fwd
  colg.val.acc.fwd[i]=(cor(College$Apps[test],colg.pred.fwd)^2)
  Colg.val.errors.fwd[i]=mean((College$Apps[test]-colg.pred.fwd)^2)
}

Find the Minimum MSE to get the Best model

which.min(Colg.val.errors.fwd)
## [1] 10

Forward Selection Inference:

Based on the forward selection method, model with 10 predictors is the best model.

coef(colg.regfit.fwd,10)
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
##  132.63597831 -686.81821880    1.67769305   -0.67442437   58.72119828 
##     Top25perc      Outstate    Room.Board           PhD        Expend 
##  -21.48113789   -0.08148979    0.11099350  -13.01400088    0.07754081 
##     Grad.Rate 
##   10.17149120
(colg.fwd.err=Colg.val.errors.fwd[10])
## [1] 969154.1

Accuracy of the forward Selection model

(colg.fwd.acc=colg.val.acc.fwd[10])
## [1] 0.929913

Backward Selection method

colg.regfit.bkd=regsubsets(Apps~.,data=College[train,],nvmax=18,method="backward")
reg.bkd.summary=summary(colg.regfit.bkd)

Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model

par(mfrow=c(2,2))
plot(reg.bkd.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(reg.bkd.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(reg.bkd.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(reg.bkd.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")

Find the MSE using the test model matrix

Colg.val.errors.bkd=rep(NA,17)
colg.val.acc.bkd= rep(0,17)
for (i in 1:17){
  colg.coefi.bkd= coef(colg.regfit.bkd,id=i)
  colg.pred.bkd=colg.test.mat[,names(colg.coefi.bkd)]%*%colg.coefi.bkd
  colg.val.acc.bkd[i]=(cor(College$Apps[test],colg.pred.bkd)^2)
  Colg.val.errors.bkd[i]=mean((College$Apps[test]-colg.pred.bkd)^2)
}

Find the Minimum MSE to get the Best model

which.min(Colg.val.errors.bkd)
## [1] 10

Backward Selection Inference:

Based on the Backward selection method, model with 10 predictors is the best model.

coef(colg.regfit.bkd,10)
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
##  132.63597831 -686.81821880    1.67769305   -0.67442437   58.72119828 
##     Top25perc      Outstate    Room.Board           PhD        Expend 
##  -21.48113789   -0.08148979    0.11099350  -13.01400088    0.07754081 
##     Grad.Rate 
##   10.17149120
(colg.bkd.err=Colg.val.errors.bkd[10])
## [1] 969154.1

Accuracy of the backward Selection model

(colg.bkd.acc=colg.val.acc.bkd[10])
## [1] 0.929913

Question 9c

Fit a ridge regression on the training data set with \(\lambda\) chosen by cross-validation.report the test error obtained

Find out the ridge regression model for the training data set using the grid of \(\lambda\)

colg.ridge.model=glmnet(college.x[train,],college.y[train],alpha=0,lambda = grid)

Execute cross-validation to find out the \(\lambda\)

set.seed(2)
colg.cv.out= cv.glmnet(college.x[train,],college.y[train],alpha=0)
plot(colg.cv.out)

Find out the \(\lambda\) with minimum MSE value

(colg.bestlambda=colg.cv.out$lambda.min)
## [1] 394.2365
log(394.2365)
## [1] 5.976951

predict the coefficients of the predictors along with the intercept for the best \(\lambda\) value

predict(colg.ridge.model,s=colg.bestlambda,type = "coefficients")[,]
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -1.301713e+03 -5.814072e+02  1.128794e+00  3.311844e-01  2.946961e+01 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
## -3.410179e+00  7.381110e-02  1.454077e-02 -3.094319e-02  1.760185e-01 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##  1.908980e-01 -6.514172e-02 -4.749598e+00 -8.127520e+00  2.021354e+01 
##   perc.alumni        Expend     Grad.Rate 
## -9.731471e-02  8.349743e-02  1.175989e+01

Predict the application received for the new test data set based on the best \(\lambda\) value

colg.apps.pred.ridge=predict(colg.ridge.model,s=colg.bestlambda,newx = college.x[test,])

Find out the test error for the best fit \(\lambda\)

(colg.ridge.err=mean((colg.apps.pred.ridge-college.y[test])^2))
## [1] 940753.5

Accuracy of the Ridge model

(colg.ridge.acc=(cor(college.y[test],colg.apps.pred.ridge)^2))
##             s1
## [1,] 0.9287269

Question 9d

fit the Lasso model on the training data set with the \(\lambda\) by cross validation. report the test error obtained, along with the number of non-zero coefficient estimates Find out the ridge regression model for the training data set using the grid of \(\lambda\)

colg.lasso.model=glmnet(college.x[train,],college.y[train],alpha=1,lambda = grid)

Execute cross-validation to find out the \(\lambda\)

set.seed(2)
colg.cv.out.lasso= cv.glmnet(college.x[train,],college.y[train],alpha=1)
plot(colg.cv.out.lasso)

Find out the \(\lambda\) with minimum MSE value

(colg.lasso.bestlambda=colg.cv.out.lasso$lambda.min)
## [1] 54.59727

predict the coefficients of the predictors along with the intercept for the best \(\lambda\) value

lasso_coef=predict(colg.lasso.model,s=colg.lasso.bestlambda,type = "coefficients")[,]
lasso_coef
##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -6.662641e+02 -2.453703e+02  1.478906e+00 -4.745848e-03  2.606622e+01 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
##  0.000000e+00  0.000000e+00  0.000000e+00 -1.375606e-02  2.384654e-03 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##  0.000000e+00  0.000000e+00 -1.746498e+00 -5.433008e+00  0.000000e+00 
##   perc.alumni        Expend     Grad.Rate 
##  0.000000e+00  6.674557e-02  3.297330e+00

Predict the application received for the new test data set based on the best \(\lambda\) value

colg.apps.pred.lasso=predict(colg.lasso.model,s=colg.lasso.bestlambda,newx = college.x[test,])

Find out the test error for the best fit \(\lambda\)

(colg.lasso.err=mean((colg.apps.pred.lasso-college.y[test])^2))
## [1] 991035.4

Find out the number of non zero coefficients for the best fit lasso model

length(lasso_coef[lasso_coef!=0])
## [1] 11

Accuracy of the Lasso model

(colg.lasso.acc=(cor(college.y[test],colg.apps.pred.lasso)^2))
##             s1
## [1,] 0.9279095

Question 9e

fit a PCR model on the training data set, with M chosen by cross Validation. Report the test error obtained along with the value of M selected by cross-validation

Execute the PCR to find out M by cross validation

set.seed(3)
colg.pcr.model= pcr(Apps~.,data=College[train,],scale=TRUE,validation="CV")
summary(colg.pcr.model)
## Data:    X dimension: 393 17 
##  Y dimension: 393 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4189     4167     2389     2392     2129     2109     2030
## adjCV         4189     4165     2381     2387     2108     2111     2015
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        2029     1999     1893      1860      1875      1892      1893
## adjCV     2017     1980     1879      1847      1867      1879      1881
##        14 comps  15 comps  16 comps  17 comps
## CV         1918      1759      1374      1368
## adjCV      1934      1716      1359      1352
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      31.858    57.44    64.20    69.91    75.10    80.17    83.82    87.30
## Apps    4.353    70.99    71.18    76.84    78.34    81.03    81.59    82.21
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.26     92.74     94.79     96.70     97.76     98.67     99.37
## Apps    83.31     83.97     83.97     84.34     84.58     84.70     91.28
##       16 comps  17 comps
## X        99.82    100.00
## Apps     92.83     93.02
validationplot(colg.pcr.model,val.type = "MSEP")

Based on the Validation plot, the MSE is low when the components is more than 15 which is very few predictors less than the actual 17 predictors in the data set.If we use the Components as 17, it is more or less equal t least squares and there is no dimension reduction occurs.As per the Plot, the MSE started to reduce when the M component is 2,also when the M component is 2 about 70% variance of application received is explained by that component so for analysis purpose i would consider M=2 is the best model

colg.pcr.pred=predict(colg.pcr.model,college.x[test,],ncomp = 2)

Find the Test Validation Error

(colg.pcr.err=mean((colg.pcr.pred-college.y[test])^2))
## [1] 2973971

Test Validation error is very large when compared to the Best subset selection,Ridge regression and Lasso regression

Accuracy of the PCR model

(colg.pcr.acc=(cor(college.y[test],colg.pcr.pred)^2))
## [1] 0.775845

Question 9f

Fit a PLS Model on the training set, with M chosen by cross validation.report the test error obtained along with the value of M selected by cross validation

Execute the PLS to find out M by cross validation

set.seed(3)
colg.pls.model= plsr(Apps~.,data=College[train,],scale=TRUE,validation="CV")
summary(colg.pls.model)
## Data:    X dimension: 393 17 
##  Y dimension: 393 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4189     2228     1921     1785     1701     1516     1385
## adjCV         4189     2216     1917     1769     1671     1488     1369
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1359     1358     1360      1364      1368      1370      1370
## adjCV     1346     1344     1345      1349      1353      1354      1354
##        14 comps  15 comps  16 comps  17 comps
## CV         1368      1368      1368      1368
## adjCV      1352      1352      1352      1352
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       26.01    44.96    62.49    65.22    68.52    72.89    77.13    80.46
## Apps    75.74    82.40    86.74    90.58    92.34    92.79    92.88    92.93
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       82.45     84.76     88.08     90.76     92.80     94.45     97.02
## Apps    92.98     93.00     93.01     93.01     93.02     93.02     93.02
##       16 comps  17 comps
## X        98.03    100.00
## Apps     93.02     93.02
validationplot(colg.pls.model,val.type = "MSEP")

Based on the Validation plot, the MSE is low when the components is more than 6.When M=2, the percentage Variance of application received explained by the model 82% which is good. At the same time the percentage variance of all other predictors is around 45% which also good. So based on this explanation, M=2 is the best fit model,even though the test validation error is very high compared to ridge, lasso, best subset selection methods.

colg.pls.pred=predict(colg.pls.model,college.x[test,],ncomp = 2)

Find the Test Validation Error

(colg.pls.err=mean((colg.pls.pred-college.y[test])^2))
## [1] 1852867

Accuracy of the PLS model

(colg.pls.acc=(cor(college.y[test],colg.pls.pred)^2))
## [1] 0.8718939

Test Validation error is very large, when compared to the PCR, Best subset selection,Ridge regression and Lasso regression.

Question 9g

Comments on the results obtained. How accurately can we predict the number of college application received?is there much difference among the test errors resulting from the five approaches?

Model=c("Least Square","Best Subset","Forward","Backward","Ridge","Lasso","PCR","PLS")
Val_error = c(colg.lm.err,colg.bss.err,colg.fwd.err,colg.bkd.err,colg.ridge.err,colg.lasso.err,colg.pcr.err,colg.pls.err)
Accuracy = c(colg.lm.acc,colg.bss.acc,colg.fwd.acc,colg.bkd.acc,colg.ridge.acc,colg.lasso.acc,colg.pcr.acc,colg.pls.acc)
(results.df= data.frame(Model,Val_error,Accuracy))
##          Model Val_error  Accuracy
## 1 Least Square  984706.3 0.9287323
## 2  Best Subset  969154.1 0.9299130
## 3      Forward  969154.1 0.9299130
## 4     Backward  969154.1 0.9299130
## 5        Ridge  940753.5 0.9287269
## 6        Lasso  991035.4 0.9279095
## 7          PCR 2973970.9 0.7758450
## 8          PLS 1852867.5 0.8718939

Inference about the 8 approaches

  • As per the above table, the validation error is minimum for Ridge regression model.Accuracy is also good. But the Model is explained by 10 predictors.
  • Whereas in PCR and PLS the validation is quite large, but the variance of the application received is very well explained by very few components(PCR-6 components, PLS=2 components) with greater accuracy.
  • So as a conclusion, I feel like the PLS is the best model to predict the number of application received by a college

Question 11

Question 11a

Try out some of the regression methods explored in this chapter, such as best subset selection ,lasso regression,ridge regression and PCR. present and discuss the results for the approaches that you consider

Check the data

data(Boston)
str(Boston)
## 'data.frame':    506 obs. of  14 variables:
##  $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ rm     : num  6.58 6.42 7.18 7 7.15 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
##  $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ black  : num  397 397 393 395 397 ...
##  $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
boston.x=model.matrix(crim~.,Boston)[,-1]
boston.y=Boston$crim

Split the data set into a training and a test set

set.seed(1)
Boston.train=sample(c(TRUE,FALSE), nrow(Boston),rep=TRUE)
Boston.test= (!Boston.train)

set the \(\lambda\)

grid=10^seq(10,-2,length=100)

Least square method:

Train the training data set with Least square method

boston.lm.model= glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=0,lambda = grid)

predict the Application received for the test data set based on the trained model

boston.lm.pred= predict(boston.lm.model,s=0,newx = boston.x[Boston.test,],exact = T,x=boston.x[Boston.train,],y=boston.y[Boston.train])

Validation test error for least square method

(boston.lm.err=mean((boston.lm.pred-boston.y[Boston.test])^2))
## [1] 50.66302

Accuracy of the Least Square method

(boston.lm.acc=(cor(boston.y[Boston.test],boston.lm.pred)^2))
##             s1
## [1,] 0.3969972

Best Subset Selection method

boston.regfit.full=regsubsets(crim~.,data=Boston[Boston.train,],nvmax=18)
(boston.reg.bs.summary=summary(boston.regfit.full))
## Subset selection object
## Call: regsubsets.formula(crim ~ ., data = Boston[Boston.train, ], nvmax = 18)
## 13 Variables  (and intercept)
##         Forced in Forced out
## zn          FALSE      FALSE
## indus       FALSE      FALSE
## chas        FALSE      FALSE
## nox         FALSE      FALSE
## rm          FALSE      FALSE
## age         FALSE      FALSE
## dis         FALSE      FALSE
## rad         FALSE      FALSE
## tax         FALSE      FALSE
## ptratio     FALSE      FALSE
## black       FALSE      FALSE
## lstat       FALSE      FALSE
## medv        FALSE      FALSE
## 1 subsets of each size up to 13
## Selection Algorithm: exhaustive
##           zn  indus chas nox rm  age dis rad tax ptratio black lstat medv
## 1  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     " "   " "   " " 
## 2  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     "*"   " "   " " 
## 3  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     "*"   "*"   " " 
## 4  ( 1 )  " " "*"   " "  " " " " " " " " "*" " " " "     "*"   "*"   " " 
## 5  ( 1 )  "*" " "   " "  " " " " " " "*" "*" " " " "     "*"   " "   "*" 
## 6  ( 1 )  "*" "*"   " "  " " " " " " "*" "*" " " " "     "*"   " "   "*" 
## 7  ( 1 )  "*" "*"   " "  " " " " " " "*" "*" " " " "     "*"   "*"   "*" 
## 8  ( 1 )  "*" "*"   " "  "*" " " " " "*" "*" " " " "     "*"   "*"   "*" 
## 9  ( 1 )  "*" "*"   " "  "*" " " " " "*" "*" " " "*"     "*"   "*"   "*" 
## 10  ( 1 ) "*" "*"   " "  "*" " " "*" "*" "*" " " "*"     "*"   "*"   "*" 
## 11  ( 1 ) "*" "*"   " "  "*" "*" "*" "*" "*" " " "*"     "*"   "*"   "*" 
## 12  ( 1 ) "*" "*"   " "  "*" "*" "*" "*" "*" "*" "*"     "*"   "*"   "*" 
## 13  ( 1 ) "*" "*"   "*"  "*" "*" "*" "*" "*" "*" "*"     "*"   "*"   "*"

Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model

par(mfrow=c(2,2))
plot(boston.reg.bs.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(boston.reg.bs.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(boston.reg.bs.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(boston.reg.bs.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")

Create the Model Matrix for the test data in order find the best model using the best subset selection

boston.test.mat = model.matrix(crim~.,data=Boston[Boston.test,])

Find the MSE using the test model matrix

boston.val.errors=rep(NA,13)
boston.val.acc.bss= rep(0,13)
for (i in 1:13){
  boston.coefi= coef(boston.regfit.full,id=i)
  boston.pred=boston.test.mat[,names(boston.coefi)]%*%boston.coefi
  boston.val.acc.bss[i]=(cor(Boston$crim[Boston.test],boston.pred)^2)
  boston.val.errors[i]=mean((Boston$crim[Boston.test]-boston.pred)^2)
}

Find the Minimum MSE to get the Best model

which.min(boston.val.errors)
## [1] 9

Best Subset Selection Inference:

Based on the subset selection method, model with 9 predictors is the best model.

coef(boston.regfit.full,9)
## (Intercept)          zn       indus         nox         dis         rad 
## 16.49784501  0.04428501 -0.11356470 -6.80041892 -0.87067024  0.48133294 
##     ptratio       black       lstat        medv 
## -0.17759119 -0.01438142  0.12943566 -0.13215744
(boston.bss.err=boston.val.errors[9])
## [1] 50.43627

Accuracy of the Best Subset Selection model

(boston.bss.acc=boston.val.acc.bss[9])
## [1] 0.3996243

Forward Selection method

boston.regfit.fwd=regsubsets(crim~.,data=Boston[Boston.train,],nvmax=18,method="forward")
boston.reg.fwd.summary=summary(boston.regfit.fwd)

Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model

par(mfrow=c(2,2))
plot(boston.reg.fwd.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(boston.reg.fwd.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(boston.reg.fwd.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(boston.reg.fwd.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")

Find the MSE using the test model matrix

boston.val.errors.fwd=rep(NA,13)
boston.val.acc.fwd= rep(0,13)
for (i in 1:13){
  boston.coefi.fwd= coef(boston.regfit.fwd,id=i)
  boston.pred.fwd=boston.test.mat[,names(boston.coefi.fwd)]%*%boston.coefi.fwd
  boston.val.acc.fwd[i]=(cor(Boston$crim[Boston.test],boston.pred.fwd)^2)
  boston.val.errors.fwd[i]=mean((Boston$crim[Boston.test]-boston.pred.fwd)^2)
}

Find the Minimum MSE to get the Best model

which.min(boston.val.errors.fwd)
## [1] 9

Forward Selection Inference:

Based on the forward selection method, model with 10 predictors is the best model.

coef(boston.regfit.fwd,9)
## (Intercept)          zn       indus         nox         dis         rad 
## 16.49784501  0.04428501 -0.11356470 -6.80041892 -0.87067024  0.48133294 
##     ptratio       black       lstat        medv 
## -0.17759119 -0.01438142  0.12943566 -0.13215744
(boston.fwd.err=boston.val.errors.fwd[9])
## [1] 50.43627

Accuracy of the forward Selection model

(boston.fwd.acc=boston.val.acc.fwd[10])
## [1] 0.3985968

Backward Selection method

boston.regfit.bkd=regsubsets(crim~.,data=Boston[Boston.train,],nvmax=18,method="backward")
boston.reg.bkd.summary=summary(boston.regfit.bkd)

Plot the R2, Adjr2, Mallow CP, BIC score to find out the best model

par(mfrow=c(2,2))
plot(boston.reg.bkd.summary$rsq,xlab="Number of Predictor Variables",ylab = "R Square")
plot(boston.reg.bkd.summary$adjr2,xlab="Number of Predictor Variables",ylab = "Adjusted R Square")
plot(boston.reg.bkd.summary$cp,xlab="Number of Predictor Variables",ylab = "CP")
plot(boston.reg.bkd.summary$bic,xlab="Number of Predictor Variables",ylab = "bic")

Find the MSE using the test model matrix

boston.val.errors.bkd=rep(NA,13)
boston.val.acc.bkd= rep(0,13)
for (i in 1:13){
  boston.coefi.bkd= coef(boston.regfit.bkd,id=i)
  boston.pred.bkd=boston.test.mat[,names(boston.coefi.bkd)]%*%boston.coefi.bkd
  boston.val.acc.bkd[i]=(cor(Boston$crim[Boston.test],boston.pred.bkd)^2)
  boston.val.errors.bkd[i]=mean((Boston$crim[Boston.test]-boston.pred.bkd)^2)
}

Find the Minimum MSE to get the Best model

which.min(boston.val.errors.bkd)
## [1] 9

Backward Selection Inference:

Based on the Backward selection method, model with 10 predictors is the best model.

coef(boston.regfit.bkd,9)
## (Intercept)          zn       indus         nox         dis         rad 
## 16.49784501  0.04428501 -0.11356470 -6.80041892 -0.87067024  0.48133294 
##     ptratio       black       lstat        medv 
## -0.17759119 -0.01438142  0.12943566 -0.13215744
(boston.bkd.err=boston.val.errors.bkd[9])
## [1] 50.43627

Accuracy of the backward Selection model

(boston.bkd.acc=boston.val.acc.bkd[9])
## [1] 0.3996243

Ridge Regression Method:

Find out the ridge regression model for the training data set using the grid of \(\lambda\)

boston.ridge.model=glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=0,lambda = grid)

Execute cross-validation to find out the \(\lambda\)

set.seed(2)
boston.cv.out= cv.glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=0)
plot(boston.cv.out)

Find out the \(\lambda\) with minimum MSE value

(boston.bestlambda=boston.cv.out$lambda.min)
## [1] 0.5240686
log(0.5240686)
## [1] -0.6461327

predict the coefficients of the predictors along with the intercept for the best \(\lambda\) value

predict(boston.ridge.model,s=boston.bestlambda,type = "coefficients")[,]
## (Intercept)          zn       indus        chas         nox          rm 
## 11.40028509  0.03571144 -0.10725767 -0.30096628 -3.91773023 -0.21352290 
##         age         dis         rad         tax     ptratio       black 
##  0.01235778 -0.59941426  0.36811114  0.00436501 -0.08864320 -0.01513869 
##       lstat        medv 
##  0.12230979 -0.08876017

Predict the application received for the new test data set based on the best \(\lambda\) value

boston.apps.pred.ridge=predict(boston.ridge.model,s=boston.bestlambda,newx = boston.x[Boston.test,])

Find out the test error for the best fit \(\lambda\)

(boston.ridge.err=mean((boston.apps.pred.ridge-boston.y[Boston.test])^2))
## [1] 51.45778

Accuracy of the Ridge model

(boston.ridge.acc=(cor(boston.y[Boston.test],boston.apps.pred.ridge)^2))
##             s1
## [1,] 0.3883494

Lasso regression Model:

Find out the lasso regression model for the training data set using the grid of \(\lambda\)

boston.lasso.model=glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=1,lambda = grid)

Execute cross-validation to find out the \(\lambda\)

set.seed(2)
boston.cv.out.lasso= cv.glmnet(boston.x[Boston.train,],boston.y[Boston.train],alpha=1)
plot(boston.cv.out.lasso)

Find out the \(\lambda\) with minimum MSE value

(boston.lasso.bestlambda=boston.cv.out.lasso$lambda.min)
## [1] 0.115564

predict the coefficients of the predictors along with the intercept for the best \(\lambda\) value

boston.lasso_coef=predict(boston.lasso.model,s=boston.lasso.bestlambda,type = "coefficients")[,]
boston.lasso_coef
##  (Intercept)           zn        indus         chas          nox           rm 
##  6.914027481  0.029229796 -0.057097795 -0.006762669 -0.315009674  0.000000000 
##          age          dis          rad          tax      ptratio        black 
##  0.001418141 -0.436111039  0.433432142  0.000000000 -0.005861043 -0.014563026 
##        lstat         medv 
##  0.123607242 -0.079365869

Predict the application received for the new test data set based on the best \(\lambda\) value

boston.apps.pred.lasso=predict(boston.lasso.model,s=boston.lasso.bestlambda,newx = boston.x[Boston.test,])

Find out the test error for the best fit \(\lambda\)

(boston.lasso.err=mean((boston.apps.pred.lasso-boston.y[Boston.test])^2))
## [1] 51.34046

Find out the number of non zero coefficients for the best fit lasso model

length(boston.lasso_coef[boston.lasso_coef!=0])
## [1] 12

Accuracy of the Lasso model

(boston.lasso.acc=(cor(boston.y[Boston.test],boston.apps.pred.lasso)^2))
##             s1
## [1,] 0.3893684

PCR Analysis

Execute the PCR to find out M by cross validation

set.seed(3)
boston.pcr.model= pcr(crim~.,data=Boston[Boston.train,],scale=TRUE,validation="CV")
summary(boston.pcr.model)
## Data:    X dimension: 262 13 
##  Y dimension: 262 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.065    6.574    6.576    6.078    6.042    6.036    6.114
## adjCV        8.065    6.572    6.574    6.072    6.035    6.031    6.105
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.085    5.984    5.977     5.976     5.981     5.962     5.906
## adjCV    6.076    5.964    5.960     5.964     5.969     5.948     5.892
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       44.58    58.02    68.58    75.64    82.03    87.37    90.66    93.02
## crim    33.90    34.29    44.32    44.91    45.57    45.62    46.13    48.13
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.01     96.87      98.4     99.54    100.00
## crim    48.80     48.96      49.0     49.91     50.71
validationplot(boston.pcr.model,val.type = "MSEP")

Based on the Validation plot, the MSE is low when the components is more than 12 which is very few predictors less than the actual 13 predictors in the data set.If we use the Components as 13, it is more or less equal to least squares and there is no dimension reduction occurs.As per the Plot, the MSE started to reduce when the M component is 2,also when the M component is 2 about 70% variance of application received is explained by that component so for analysis purpose i would consider M=2 is the best model

boston.pcr.pred=predict(boston.pcr.model,boston.x[Boston.test,],ncomp = 2)

Find the Test Validation Error

(boston.pcr.err=mean((boston.pcr.pred-boston.y[Boston.test])^2))
## [1] 61.27925

Test Validation error is very large when compared to the Best subset selection,Ridge regression and Lasso regression

Accuracy of the PCR model

(boston.pcr.acc=(cor(boston.y[Boston.test],boston.pcr.pred)^2))
## [1] 0.2767922

PLS Method:

Execute the PLS to find out M by cross validation

set.seed(3)
boston.pls.model= plsr(crim~.,data=Boston[Boston.train,],scale=TRUE,validation="CV")
summary(boston.pls.model)
## Data:    X dimension: 262 13 
##  Y dimension: 262 1
## Fit method: kernelpls
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.065    6.334    5.988    5.977    5.953    5.924    5.903
## adjCV        8.065    6.331    5.980    5.960    5.938    5.911    5.890
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       5.932    5.917    5.896     5.905     5.906     5.906     5.906
## adjCV    5.916    5.902    5.883     5.891     5.892     5.892     5.892
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       43.95    54.85    60.29    69.59    76.82    79.48    83.83    87.61
## crim    39.21    47.44    49.46    50.05    50.31    50.56    50.64    50.68
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       90.24     94.42     96.55     98.20    100.00
## crim    50.70     50.71     50.71     50.71     50.71
validationplot(boston.pls.model,val.type = "MSEP")

Based on the Validation plot, the MSE is low when the components is more than 2.When M=2, the percentage Variance of application received explained by the model 47.4% which is good. At the same time the percentage variance of all other predictors is around 55% which also good. So based on this explanation, M=2 is the best fit model,even though the test validation error is very high compared to ridge, lasso, best subset selection methods.

boston.pls.pred=predict(boston.pls.model,boston.x[Boston.test,],ncomp = 2)

Find the Test Validation Error

(boston.pls.err=mean((boston.pls.pred-boston.y[Boston.test])^2))
## [1] 53.7501

Accuracy of the PLS model

(boston.pls.acc=(cor(boston.y[Boston.test],boston.pls.pred)^2))
## [1] 0.3591553

Test Validation error is very large, when compared to the PCR, Best subset selection,Ridge regression and Lasso regression.

Question 11b:

propose a model(or set of models) that seem to perform well on this data set and justify your answer.Make sure that you are evaluating model performance using validation set error, cross validation or some other reasonable alternative,as opposed to using training error

Boston.Model=c("Least Square","Best Subset","Forward","Backward","Ridge","Lasso","PCR","PLS")
Boston.Val_error = c(boston.lm.err,boston.bss.err,boston.fwd.err,boston.bkd.err,boston.ridge.err,boston.lasso.err,boston.pcr.err,boston.pls.err)
Boston.Accuracy = c(boston.lm.acc,boston.bss.acc,boston.fwd.acc,boston.bkd.acc,boston.ridge.acc,boston.lasso.acc,boston.pcr.acc,boston.pls.acc)
(Boston.results.df= data.frame(Boston.Model,Boston.Val_error,Boston.Accuracy))
##   Boston.Model Boston.Val_error Boston.Accuracy
## 1 Least Square         50.66302       0.3969972
## 2  Best Subset         50.43627       0.3996243
## 3      Forward         50.43627       0.3985968
## 4     Backward         50.43627       0.3996243
## 5        Ridge         51.45778       0.3883494
## 6        Lasso         51.34046       0.3893684
## 7          PCR         61.27925       0.2767922
## 8          PLS         53.75010       0.3591553

Inference about the 8 approaches

  • As per the above table, the validation error is minimum for Best subset selection.Accuracy is also good compared to other models. But the Model is explained by 9 predictors.
  • Whereas in PCR and PLS the validation error is quite large, but the variance of the application received is very well explained by very few components(PCR-2 components, PLS=2 components) with reasonable accuracy.
  • So as a conclusion, I feel like the PLS and best subset selection is the best model to predict the per capita crime rate in Boston

Question 11c

Does your chosen model involve all the features in the data set? why or not?

As per the Best subset selection , all the features available in the data set are not used in the model. only the below predictors are used to predict the per capita crime rate in boston city.

* zn
* indus
* nox
* dis
* rad
* ptratio
* black
* lstat
* medv

If we use all the predictors it will result to over fitting so based on this analysis , will go with prediction of per capita crime using the above predictors.

As per the PLS method only two components will be used to predict the per capita crima rate in boston city. Two components capture the major variance explained by the predictors inorder to predict the per capita crime in boston city