Repeat (a) for ridge regression relative to least squares. The same as part a, because it is less flexible hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
Repeat (a) for non-linear methods relative to least squares.
attach(College)
set.seed(1)
index = sample(1:nrow(College), 0.7*nrow(College))
college.train = College[index,]
college.test = College[-index,]
We find the test error rate to be 1261630
lm.apps = lm(Apps ~ ., college.train)
predict.apps = predict(lm.apps, college.test)
test.error = mean((college.test$Apps-predict.apps)^2)
test.error
## [1] 1261630
Here we get a test error of 1261271
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.0.4
## Loading required package: Matrix
## Loaded glmnet 4.1-1
matrix.train = model.matrix(Apps~.,data=college.train)
matrix.test = model.matrix(Apps~.,data=college.test)
grid.college = 10^seq(2,-2,length=100)
ridge.college = glmnet(matrix.train,college.train$Apps,alpha=0,lambda=grid.college,thresh = 1e-12)
college.cv.ridge = cv.glmnet(matrix.train,college.train$Apps,alpha=0,lambda=grid.college, thresh = 1e-12)
college.bestlam.ridge = college.cv.ridge$lambda.min
college.bestlam.ridge
## [1] 0.01
college.pred.newridge = predict(ridge.college,s = college.bestlam.ridge,newx = matrix.test)
ridge.mse.college = mean((college.test$Apps-college.pred.newridge)^2)
ridge.mse.college
## [1] 1261598
Test error rate for cross validation comes out to 1261591
lasso = glmnet(matrix.train,college.train$Apps,alpha=1,lambda=grid.college,thresh = 1e-12)
cv.lasso = cv.glmnet(matrix.train,college.train$Apps,alpha=1,lambda=grid.college,thresh=1e-12)
best.lasso = cv.lasso$lambda.min
best.lasso
## [1] 0.01
pred.lasso = predict(lasso,s=best.lasso,newx =matrix.test)
mean((college.test$Apps-pred.lasso)^2)
## [1] 1261591
predict(lasso,s=best.lasso,type="coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -5.377440e+02
## (Intercept) .
## PrivateYes -5.044661e+02
## Accept 1.722328e+00
## Enroll -1.054309e+00
## Top10perc 5.357483e+01
## Top25perc -1.613811e+01
## F.Undergrad 2.961764e-02
## P.Undergrad 7.161170e-02
## Outstate -8.840082e-02
## Room.Board 1.630175e-01
## Books 2.725806e-01
## Personal -7.289853e-03
## PhD -9.674379e+00
## Terminal -3.774252e-01
## S.F.Ratio 1.626145e+01
## perc.alumni 2.354532e+00
## Expend 5.986089e-02
## Grad.Rate 7.157181e+00
With PCR model on the training set we return a test error of 2067220
library(pls)
## Warning: package 'pls' was built under R version 4.0.4
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcrmodel = pcr(Apps~.,data=college.train,scale=TRUE,validation="CV")
summary(pcrmodel)
## Data: X dimension: 543 17
## Y dimension: 543 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 3895 3793 2133 2152 1835 1705 1719
## adjCV 3895 3794 2129 2153 1808 1695 1712
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1719 1670 1646 1639 1642 1651 1646
## adjCV 1713 1658 1639 1632 1635 1643 1639
## 14 comps 15 comps 16 comps 17 comps
## CV 1647 1621 1216 1197
## adjCV 1640 1603 1205 1185
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.051 57.00 64.42 70.27 75.65 80.65 84.26 87.61
## Apps 5.788 71.69 71.70 80.97 82.60 82.60 82.69 84.06
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.58 92.84 94.93 96.74 97.82 98.72 99.39
## Apps 84.55 84.82 84.86 84.86 85.01 85.05 89.81
## 16 comps 17 comps
## X 99.85 100.00
## Apps 93.03 93.32
predict.pcr = predict(pcrmodel,college.test,ncomp=6)
mean((college.test$Apps-predict.pcr)^2)
## [1] 2067220
Fitting a PLS model on a training set we return a test error of 1283996
plsr.model = plsr(Apps~.,data=college.train,scale=TRUE,validation="CV")
predict.plsr<-predict(plsr.model,college.test,ncomp=6)
mean((college.test$Apps-predict.plsr)^2)
## [1] 1283996
There isn’t much different between any of the five approaches. Overall we see that we return the highest accuracy with ridge regression.
test.avg <- mean(college.test$Apps)
lm <- 1 - mean((predict.apps - college.test$Apps)^2) / mean((test.avg - college.test$Apps)^2)
ridge <- 1 - mean((college.pred.newridge - college.test$Apps)^2) / mean((test.avg - college.test$Apps)^2)
lasso <- 1 - mean((pred.lasso - college.test$Apps)^2) / mean((test.avg - college.test$Apps)^2)
pcr <- 1 - mean((predict.pcr - college.test$Apps)^2) / mean((test.avg - college.test$Apps)^2)
pls <- 1 - mean((predict.plsr - college.test$Apps)^2) / mean((test.avg - college.test$Apps)^2)
lm
## [1] 0.9134458
ridge
## [1] 0.913448
lasso
## [1] 0.9134485
pcr
## [1] 0.8581783
pls
## [1] 0.9119114
attach(Boston)
dim(Boston)
## [1] 506 14
str(Boston)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ black : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
set.seed(1)
index = sample(1:nrow(Boston), 0.75*nrow(Boston))
boston.train = Boston[index,]
boston.test = Boston[-index,]
We’ll first start with Ridge Regression. Looking at the mse we return a value of 67.96, which at fist glance might seem high, but as it is our first model examining we’ll need to compare it with the others.
boston.x = model.matrix(crim~.,data=boston.train)
boston.y = model.matrix(crim~.,data=boston.test)
grid = 10^seq(4,-2,length=100)
ridge.model = glmnet(boston.x,boston.train$crim,alpha=0,lambda=grid)
bos.cv.ridge<-cv.glmnet(boston.x,boston.train$crim,alpha=0,lambda=grid)
boston.bestlam.ridge<-bos.cv.ridge$lambda.min
boston.bestlam.ridge
## [1] 0.1873817
ridge.model.bos = glmnet(boston.x,boston.train$crim,alpha=0,lambda=boston.bestlam.ridge)
coef(ridge.model.bos)
## 15 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 11.2629302279
## (Intercept) .
## zn 0.0305370585
## indus -0.0824127219
## chas -0.5553656195
## nox -5.2788768616
## rm 0.1767109007
## age -0.0009530483
## dis -0.6099655904
## rad 0.4452240286
## tax 0.0008107787
## ptratio -0.2203982049
## black -0.0108054534
## lstat 0.2037272188
## medv -0.0992378183
boston.pred.newridge = predict(ridge.model,s=boston.bestlam.ridge,newx=boston.y)
ridge.mse = mean((boston.test$crim-boston.pred.newridge)^2)
ridge.mse
## [1] 67.96956
Next we’ll look at Lasso. After computing the mse we return a value of 68.05, which in comparison to our Ridge regression model is only slightly lower.
lasso.model = glmnet(boston.x,boston.train$crim,alpha=0,lambda=grid)
bos.cv.lasso = cv.glmnet(boston.x,boston.train$crim,alpha=0,lambda=grid)
bos.bestlam.lasso<-bos.cv.lasso$lambda.min
bos.bestlam.lasso
## [1] 0.2154435
lasso.model.bos = glmnet(boston.x,boston.train$crim,alpha=0,lambda=bos.bestlam.lasso)
coef(lasso.model.bos)
## 15 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 10.9109334208
## (Intercept) .
## zn 0.0301534426
## indus -0.0827857455
## chas -0.5538777338
## nox -5.0631981876
## rm 0.1715961065
## age -0.0008422032
## dis -0.5978626881
## rad 0.4374621932
## tax 0.0011360955
## ptratio -0.2135348396
## black -0.0108410516
## lstat 0.2034722538
## medv -0.0970580327
boston.pred.las = predict(lasso.model,s=bos.bestlam.lasso,newx=boston.y)
lasso.mse<-mean((boston.test$crim-boston.pred.las)^2)
lasso.mse
## [1] 68.05862
For our next model we’ll look at PCR. We return a mse of 73.48, slightly higher than our Lasso and Ridge Regression model.
pcr.model.bos = pcr(crim~.,data=boston.train,scale=TRUE,validation="CV")
summary(pcr.model.bos)
## Data: X dimension: 379 13
## Y dimension: 379 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 7.897 6.531 6.543 6.085 6.040 6.063 6.099
## adjCV 7.897 6.528 6.539 6.079 6.031 6.055 6.090
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.079 5.943 5.930 5.942 5.951 5.953 5.909
## adjCV 6.070 5.932 5.918 5.930 5.939 5.940 5.895
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 48.38 61.00 70.43 77.12 83.29 88.16 91.41 93.66
## crim 32.67 32.85 42.57 43.46 43.46 43.55 43.96 46.49
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.45 97.10 98.44 99.50 100.00
## crim 46.91 47.19 47.21 47.63 48.67
predict.pcr.bos = predict(pcr.model.bos,boston.test,ncomp=5)
mean((boston.test$crim-predict.pcr.bos)^2)
## [1] 73.48524
pcr.fit.bos = pcr(crim ~ .,data = boston.train,scale=TRUE,ncomp=5)
summary(pcr.fit.bos)
## Data: X dimension: 379 13
## Y dimension: 379 1
## Fit method: svdpc
## Number of components considered: 5
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps
## X 48.38 61.00 70.43 77.12 83.29
## crim 32.67 32.85 42.57 43.46 43.46
Lastly, we’ll be looking at PLS. For PLS we return a 67.37 mse which is our lowest out of the other models. So it would ultimately appear that the PLS model preforms the best with the data.
pls.model.bos = plsr(crim~.,data=boston.train,scale=TRUE,validation="CV")
predict.pls.bos = predict(pls.model.bos,boston.test,ncomp=5)
mean((boston.test$crim-predict.pls.bos)^2)
## [1] 68.37759
pls.fit.bos = plsr(crim ~ ., data = boston.train, scale = TRUE, ncomp = 5)
summary(pls.fit.bos)
## Data: X dimension: 379 13
## Y dimension: 379 1
## Fit method: kernelpls
## Number of components considered: 5
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps
## X 47.89 57.54 62.26 71.53 78.23
## crim 36.89 45.51 47.55 48.04 48.27
As mentioned previously after performing Ridge Regression, Lasso, Principle Component Regression, and Partial Least Squares that we return the lowest validation set error of 67.37 from the PLS model, which would be our proposal for the best model to run on the data.
We ended up using the entire model but decided on 5 components as they accounted for a little over 80% of the model variation in the PCR model.