Q2

A)

III is correct. By increasing lambda we get a big reduction in variance and a small reduction in the bias. This reduces the MSE and produces better accuracy. Compared to Ridge, Lasso allows for some coefficients to be zero when lambda is large enough. This also uses variables selection which is better for interpreting the results.

B)

III is correct. When lambda is zero ridge will assimilate OLS. When the shrinkage parameter for ridge regression increases the flexibility of the model decreases. This will result in a substantial reduction in the variance with a small bias increase. This would reduce total MSE.

C)

II is correct. More flexible models

Q9

A)

library(ISLR)
## Warning: package 'ISLR' was built under R version 4.0.3
college = College

set.seed(42)
train = sample(nrow(college), .8*(nrow(college)) )

college.train = college[train, ]
college.test = college[-train, ]

B)

college.lm = lm(Apps~ ., college.train)

college.lm.pred = predict(college.lm, newdata = college.test)

college.error = mean((college.lm.pred - college.test$Apps)^2)
college.error
## [1] 1941715

C)

library(glmnet)
## Warning: package 'glmnet' was built under R version 4.0.4
## Loading required package: Matrix
## Loaded glmnet 4.1-1
set.seed(42)
xtrain = model.matrix(Apps ~., college.train)[,-1]
ytrain = college.train$Apps
xtest = model.matrix(Apps~., college.test)[,-1]
ytest = college.test$Apps

college.ridge = cv.glmnet(xtrain, ytrain, alpha = 0)
plot(college.ridge)

bestlam=college.ridge$lambda.min
bestlam
## [1] 337.0816
ridge.pred=predict(college.ridge,s=bestlam,newx=xtest)
mean((ridge.pred-ytest)^2)
## [1] 3844468

D)

set.seed(42)

college.lasso = cv.glmnet(xtrain, ytrain, alpha = 1)
plot(college.lasso)

bestlam= college.lasso$lambda.min
bestlam
## [1] 8.74735
lasso.pred=predict(college.lasso,s=bestlam,newx=xtest)
mean((lasso.pred-ytest)^2)
## [1] 2053837
lasso.coef=predict(college.lasso,type="coefficients",s=bestlam)[1:18,]
lasso.coef[lasso.coef!=0]
##   (Intercept)    PrivateYes        Accept     Top10perc     Top25perc 
## -8.726150e+02 -5.462323e+02  1.227041e+00  4.201700e+01 -1.039634e+01 
##   F.Undergrad   P.Undergrad      Outstate    Room.Board      Personal 
##  5.288884e-02  2.995941e-02 -3.894791e-02  1.973311e-01 -2.143431e-04 
##           PhD      Terminal     S.F.Ratio   perc.alumni        Expend 
## -6.177096e+00 -5.522741e+00  1.635132e+01 -4.051310e+00  7.855491e-02 
##     Grad.Rate 
##  8.035061e+00

E)

library(pls)
## Warning: package 'pls' was built under R version 4.0.4
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
set.seed(42)
college.pcr = pcr(Apps~., data = college.train, scale = TRUE, validation = "CV")
summary(college.pcr)
## Data:    X dimension: 621 17 
##  Y dimension: 621 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3606     3497     1705     1715     1480     1264     1257
## adjCV         3606     3498     1704     1716     1466     1253     1255
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1229     1212     1187      1179      1182      1181      1183
## adjCV     1226     1204     1186      1176      1180      1178      1180
##        14 comps  15 comps  16 comps  17 comps
## CV         1183      1190      1055      1035
## adjCV      1180      1187      1052      1032
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      31.872    57.63    64.85    70.51    75.83    80.81    84.41    87.82
## Apps    6.168    77.92    77.92    84.76    88.39    88.39    88.95    89.44
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.84     93.24     95.34     97.11     98.14     98.92     99.44
## Apps    89.66     89.91     89.97     90.07     90.08     90.12     90.13
##       16 comps  17 comps
## X        99.82    100.00
## Apps     92.34     92.71
validationplot(college.pcr,val.type="MSEP")

pcr.pred=predict(college.pcr,xtest,ncomp=9)
mean((pcr.pred-ytest)^2)
## [1] 5261564

F)

set.seed(42)
college.pls = plsr(Apps~., data = college.train, scale = TRUE, validation ="CV")
summary(college.pls)
## Data:    X dimension: 621 17 
##  Y dimension: 621 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3606     1546     1324     1154     1130     1096     1058
## adjCV         3606     1544     1324     1152     1125     1083     1051
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1041     1039     1037      1034      1036      1035      1035
## adjCV     1037     1036     1034      1031      1032      1031      1032
##        14 comps  15 comps  16 comps  17 comps
## CV         1035      1035      1035      1035
## adjCV      1032      1032      1032      1032
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       26.34    46.91    62.99    66.01    68.24    71.98    75.40    81.19
## Apps    82.05    87.06    90.33    91.21    92.20    92.54    92.63    92.64
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       83.51     85.23     87.18     88.80     91.34     93.31     97.11
## Apps    92.67     92.69     92.70     92.71     92.71     92.71     92.71
##       16 comps  17 comps
## X        99.34    100.00
## Apps     92.71     92.71
validationplot(college.pls,val.type="MSEP")

pls.pred=predict(college.pls,xtest,ncomp=9)
mean((pls.pred-ytest)^2)
## [1] 1960015

G)

There was a considerable amount of variability among the test errors. The best model was OLS and the worst was PCR.

Q11

A)

library(MASS)
boston = Boston
set.seed(42)
train = sample(nrow(boston), .8*(nrow(boston)) )
boston.train = boston[train,]
boston.test = boston[-train,]

RIDGE

set.seed(42)

xtrain = model.matrix(crim ~., boston.train)[,-1]
ytrain = boston.train$crim
xtest = model.matrix(crim~., boston.test)[,-1]
ytest = boston.test$crim

boston.ridge = cv.glmnet(xtrain, ytrain, alpha = 0)
plot(boston.ridge)

bestlam=boston.ridge$lambda.min
bestlam
## [1] 0.567925
ridge.pred=predict(boston.ridge,s=bestlam,newx=xtest)
mean((ridge.pred-ytest)^2)
## [1] 10.98377

LASSO

set.seed(42)
boston.lasso = cv.glmnet(xtrain, ytrain, alpha = 1)
plot(boston.lasso)

bestlam= boston.lasso$lambda.min
bestlam
## [1] 0.03102164
lasso.pred=predict(boston.lasso,s=bestlam,newx=xtest)
mean((lasso.pred-ytest)^2)
## [1] 12.76214
lasso.coef=predict(boston.lasso,type="coefficients",s=bestlam)[1:14,]
lasso.coef[lasso.coef!=0]
##   (Intercept)            zn         indus          chas           nox 
##  20.682539741   0.044316097  -0.076586514  -0.927983848 -12.483375461 
##            rm           dis           rad           tax       ptratio 
##   0.523101614  -1.105489998   0.614518358  -0.003255396  -0.299417050 
##         black         lstat          medv 
##  -0.010513309   0.110123592  -0.233892553

PCR

set.seed(42)

boston.pcr = pcr(crim~., data = boston.train, scale = TRUE, validation = "CV")
summary(boston.pcr)
## Data:    X dimension: 404 13 
##  Y dimension: 404 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           9.315    7.846    7.831    7.382    7.388    7.401    7.431
## adjCV        9.315    7.844    7.829    7.376    7.379    7.396    7.424
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       7.419    7.297     7.35     7.352     7.358     7.279     7.174
## adjCV    7.411    7.288     7.34     7.340     7.346     7.265     7.159
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       46.62    59.73    68.81    75.84    82.71    87.78    90.98    93.28
## crim    29.46    29.80    37.86    38.12    38.12    38.21    38.88    40.94
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.31     97.02     98.44     99.53    100.00
## crim    40.94     41.14     41.30     43.29     44.89
validationplot(boston.pcr,val.type="MSEP")

pcr.pred=predict(boston.pcr,xtest,ncomp=8)
mean((pcr.pred-ytest)^2)
## [1] 10.49009

B)

To evaluate the model we used the validation set. The best score that was achieved was by the PCR model with an error of 10.49.

C)

The PCR model includes 8 of the 13 variables. We manually selected the number of features to be included by examining the validation plot that was produced. After 8 variables, the chart rises and then ultimately decreases when it reached all the variables.