Answer: iii. As λ increases, the lasso fit becomes less flexible, which in turn decreases variance and increases bias; prediction accuracy improves as long as the drop in variance outweighs the added bias.
Answer: iii. As with the lasso, ridge regression becomes less flexible as λ increases, so the variance decreases and the bias increases.
Answer: ii. Non-linear methods are more flexible than least squares, and so improve prediction accuracy when the resulting decrease in bias exceeds the increase in variance.
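To see the shrinkage behind these answers, one can plot the glmnet coefficient paths: as λ grows, every coefficient is pulled toward zero, trading variance for bias. A minimal sketch (the object names x.demo and y.demo are illustrative; it assumes the ISLR and glmnet packages used later in this document):
library(ISLR)
library(glmnet)
# Coefficient paths on the College data as lambda varies
x.demo = model.matrix(Apps~., College)[,-1]
y.demo = College$Apps
par(mfrow=c(1,2))
plot(glmnet(x.demo, y.demo, alpha=0), xvar="lambda")  # ridge: smooth shrinkage toward zero
plot(glmnet(x.demo, y.demo, alpha=1), xvar="lambda")  # lasso: coefficients reach exactly zero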
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.1.2
summary(College)
## Private Apps Accept Enroll Top10perc
## No :212 Min. : 81 Min. : 72 Min. : 35 Min. : 1.00
## Yes:565 1st Qu.: 776 1st Qu.: 604 1st Qu.: 242 1st Qu.:15.00
## Median : 1558 Median : 1110 Median : 434 Median :23.00
## Mean : 3002 Mean : 2019 Mean : 780 Mean :27.56
## 3rd Qu.: 3624 3rd Qu.: 2424 3rd Qu.: 902 3rd Qu.:35.00
## Max. :48094 Max. :26330 Max. :6392 Max. :96.00
## Top25perc F.Undergrad P.Undergrad Outstate
## Min. : 9.0 Min. : 139 Min. : 1.0 Min. : 2340
## 1st Qu.: 41.0 1st Qu.: 992 1st Qu.: 95.0 1st Qu.: 7320
## Median : 54.0 Median : 1707 Median : 353.0 Median : 9990
## Mean : 55.8 Mean : 3700 Mean : 855.3 Mean :10441
## 3rd Qu.: 69.0 3rd Qu.: 4005 3rd Qu.: 967.0 3rd Qu.:12925
## Max. :100.0 Max. :31643 Max. :21836.0 Max. :21700
## Room.Board Books Personal PhD
## Min. :1780 Min. : 96.0 Min. : 250 Min. : 8.00
## 1st Qu.:3597 1st Qu.: 470.0 1st Qu.: 850 1st Qu.: 62.00
## Median :4200 Median : 500.0 Median :1200 Median : 75.00
## Mean :4358 Mean : 549.4 Mean :1341 Mean : 72.66
## 3rd Qu.:5050 3rd Qu.: 600.0 3rd Qu.:1700 3rd Qu.: 85.00
## Max. :8124 Max. :2340.0 Max. :6800 Max. :103.00
## Terminal S.F.Ratio perc.alumni Expend
## Min. : 24.0 Min. : 2.50 Min. : 0.00 Min. : 3186
## 1st Qu.: 71.0 1st Qu.:11.50 1st Qu.:13.00 1st Qu.: 6751
## Median : 82.0 Median :13.60 Median :21.00 Median : 8377
## Mean : 79.7 Mean :14.09 Mean :22.74 Mean : 9660
## 3rd Qu.: 92.0 3rd Qu.:16.50 3rd Qu.:31.00 3rd Qu.:10830
## Max. :100.0 Max. :39.80 Max. :64.00 Max. :56233
## Grad.Rate
## Min. : 10.00
## 1st Qu.: 53.00
## Median : 65.00
## Mean : 65.46
## 3rd Qu.: 78.00
## Max. :118.00
attach(College)
set.seed(1)
# Build the predictor matrix (dropping the intercept column) and the response
x=model.matrix(Apps~., College)[,-1]
y=College$Apps
# Randomly split the observations into training and test halves
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College.train = College[train, ]
College.test = College[test, ]
y.test=y[test]
lm.fit=lm(Apps~., data=College, subset=train)
summary(lm.fit)
##
## Call:
## lm(formula = Apps ~ ., data = College, subset = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5741.2 -479.5 15.3 359.6 7258.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.902e+02 6.381e+02 -1.238 0.216410
## PrivateYes -3.070e+02 2.006e+02 -1.531 0.126736
## Accept 1.779e+00 5.420e-02 32.830 < 2e-16 ***
## Enroll -1.470e+00 3.115e-01 -4.720 3.35e-06 ***
## Top10perc 6.673e+01 8.310e+00 8.030 1.31e-14 ***
## Top25perc -2.231e+01 6.533e+00 -3.415 0.000708 ***
## F.Undergrad 9.269e-02 5.529e-02 1.676 0.094538 .
## P.Undergrad 9.397e-03 5.493e-02 0.171 0.864275
## Outstate -1.084e-01 2.700e-02 -4.014 7.22e-05 ***
## Room.Board 2.115e-01 7.224e-02 2.928 0.003622 **
## Books 2.912e-01 3.985e-01 0.731 0.465399
## Personal 6.133e-03 8.803e-02 0.070 0.944497
## PhD -1.548e+01 6.681e+00 -2.316 0.021082 *
## Terminal 6.415e+00 7.290e+00 0.880 0.379470
## S.F.Ratio 2.283e+01 2.047e+01 1.115 0.265526
## perc.alumni 1.134e+00 6.083e+00 0.186 0.852274
## Expend 4.857e-02 1.619e-02 2.999 0.002890 **
## Grad.Rate 7.490e+00 4.397e+00 1.703 0.089324 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1083 on 370 degrees of freedom
## Multiple R-squared: 0.9389, Adjusted R-squared: 0.9361
## F-statistic: 334.3 on 17 and 370 DF, p-value: < 2.2e-16
lm.pred=predict(lm.fit, College.test)
# Test MSE for the least squares fit
mean((College.test$Apps - lm.pred)^2)
## [1] 1135758
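For context, this test MSE can be compared with a naive baseline that predicts the training-set mean for every test college; the least squares fit should do far better. A quick sketch using the objects defined above:
# Baseline test MSE: always predict the mean of the training responses
baseline.pred = mean(y[train])
mean((baseline.pred - y.test)^2)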
library(glmnet)
## Warning: package 'glmnet' was built under R version 4.1.2
## Loading required package: Matrix
## Loaded glmnet 4.1-3
grid=10^seq(10, -2, length=100)
# Note: this ridge model is fit on the full data set; fitting on the
# training observations only (as is done for the lasso below) would give a
# cleaner estimate of the test error.
ridge.mod=glmnet(x, y, alpha=0, lambda=grid)
set.seed(1)
cv.out=cv.glmnet(x[train,], y[train], alpha=0)
bestlam=cv.out$lambda.min
bestlam
## [1] 405.8404
ridge.pred=predict(ridge.mod, s=bestlam, newx=x[test,])
# Test MSE for ridge regression
mean((ridge.pred - y.test)^2)
## [1] 906029.4
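The ridge coefficients at the chosen λ can be extracted just as for the lasso below; since ridge regression never sets coefficients exactly to zero, all 17 predictors remain in the model. A sketch:
# Ridge coefficient estimates at the cross-validated lambda
ridge.coef=predict(ridge.mod, type="coefficients", s=bestlam)[1:18,]
ridge.coef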
lasso.mod=glmnet(x[train,], y[train], alpha=1, lambda=grid)
set.seed(1)
cv.out1=cv.glmnet(x[train,], y[train], alpha=1)
bestlam1=cv.out1$lambda.min
lasso.pred=predict(lasso.mod, s=bestlam1, newx=x[test,])
# Test MSE for the lasso
mean((lasso.pred - y.test)^2)
## [1] 1116252
lasso.coef=predict(lasso.mod, type="coefficients", s=bestlam1)[1:18,]
lasso.coef[lasso.coef!=0]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -7.700064e+02 -3.118246e+02 1.763118e+00 -1.324681e+00 6.484111e+01
## Top25perc F.Undergrad P.Undergrad Outstate Room.Board
## -2.084190e+01 7.231529e-02 1.210751e-02 -1.049092e-01 2.088339e-01
## Books Personal PhD Terminal S.F.Ratio
## 2.936591e-01 3.742302e-03 -1.443128e+01 5.292709e+00 2.176770e+01
## perc.alumni Expend Grad.Rate
## 5.149189e-01 4.824620e-02 7.035589e+00
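Note that at lambda.min the lasso retains all 17 predictors. A sparser model can be obtained with the one-standard-error rule via the lambda.1se value that cv.glmnet reports; a sketch:
# Lasso coefficients under the one-standard-error rule; this larger
# penalty typically sets several coefficients exactly to zero
lasso.coef.1se=predict(lasso.mod, type="coefficients", s=cv.out1$lambda.1se)[1:18,]
lasso.coef.1se[lasso.coef.1se != 0]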
library(pls)
## Warning: package 'pls' was built under R version 4.1.3
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcr.fit=pcr(Apps~., data=College.train, scale=TRUE, validation="CV")
summary(pcr.fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: svdpc
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4288 4013 2368 2388 2072 1926 1900
## adjCV 4288 4012 2364 2386 2025 1907 1893
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1905 1907 1868 1811 1815 1820 1809
## adjCV 1899 1903 1862 1799 1807 1812 1801
## 14 comps 15 comps 16 comps 17 comps
## CV 1819 1753 1312 1236
## adjCV 1813 1725 1298 1225
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 32.20 57.78 65.31 70.99 76.37 81.27 84.8 87.85
## Apps 13.44 70.93 71.07 79.87 81.15 82.25 82.3 82.33
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 90.62 92.91 94.98 96.74 97.79 98.72 99.42
## Apps 83.38 84.76 84.80 84.84 85.11 85.14 90.55
## 16 comps 17 comps
## X 99.88 100.00
## Apps 93.42 93.89
validationplot(pcr.fit, val.type="MSEP")
pcr.pred=predict(pcr.fit, x[test,], ncomp=10)
mean((pcr.pred - y.test)^2)
## [1] 1723100
The value of M selected by cross-validation is 10: the CV error levels off after about ten components, and using all 17 components would simply reproduce the least squares fit.
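The pls package can also pick the number of components automatically with selectNcomp(), for example via the one-sigma heuristic; a sketch (the same call works for the PLS fit below):
# Smallest M whose CV error is within one standard error of the minimum
selectNcomp(pcr.fit, method="onesigma", plot=TRUE)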
set.seed(1)
pls.fit=plsr(Apps~., data=College.train, scale=TRUE, validation="CV")
summary(pls.fit)
## Data: X dimension: 388 17
## Y dimension: 388 1
## Fit method: kernelpls
## Number of components considered: 17
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4288 2217 2019 1761 1630 1533 1347
## adjCV 4288 2211 2012 1749 1605 1510 1331
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 1309 1303 1286 1283 1283 1277 1271
## adjCV 1296 1289 1273 1270 1270 1264 1258
## 14 comps 15 comps 16 comps 17 comps
## CV 1270 1270 1270 1270
## adjCV 1258 1257 1257 1257
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 27.21 50.73 63.06 65.52 70.20 74.20 78.62 80.81
## Apps 75.39 81.24 86.97 91.14 92.62 93.43 93.56 93.68
## 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
## X 83.29 87.17 89.15 91.37 92.58 94.42 96.98
## Apps 93.76 93.79 93.83 93.86 93.88 93.89 93.89
## 16 comps 17 comps
## X 98.78 100.00
## Apps 93.89 93.89
validationplot(pls.fit, val.type="MSEP")
pls.pred=predict(pls.fit, x[test,], ncomp=8)
mean((pls.pred-y.test)^2)
## [1] 1090290
The value of M selected by cross-validation is 8: the CV error improves only marginally beyond eight components.
Based on these models, the number of college applications received can be predicted fairly accurately. The test errors of the five approaches were close in range, except for PCR (with M = 10), which had the highest test error.
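Since the raw MSEs are hard to interpret on their own, the five fits are easier to compare on the test R-squared scale, the fraction of variance in the held-out responses that each model explains. A sketch using the predictions computed above:
# Test R^2 for each approach: 1 - SS_res/SS_tot on the held-out data
test.avg = mean(y.test)
ss.tot = mean((y.test - test.avg)^2)
c(lm    = 1 - mean((lm.pred    - y.test)^2)/ss.tot,
  ridge = 1 - mean((ridge.pred - y.test)^2)/ss.tot,
  lasso = 1 - mean((lasso.pred - y.test)^2)/ss.tot,
  pcr   = 1 - mean((pcr.pred   - y.test)^2)/ss.tot,
  pls   = 1 - mean((pls.pred   - y.test)^2)/ss.tot)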
detach(College)
library(MASS)
attach(Boston)
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
set.seed(1)
# Note: unlike the College analysis above, the intercept column of the
# model matrix is not dropped here; glmnet assigns the constant column a
# zero coefficient, so the fits are unaffected.
x1=model.matrix(crim~., Boston)
y1=Boston$crim
# Randomly split the observations into training and test halves
train1=sample(1:nrow(x1), nrow(x1)/2)
test1=(-train1)
Boston.train = Boston[train1, ]
Boston.test = Boston[test1, ]
y1.test1=y1[test1]
Ridge Regression
grid=10^seq(10, -2, length=100)
# As before, the ridge model is fit on the full data set; fitting on the
# training half only would give a cleaner test-error estimate.
ridge.mod2=glmnet(x1, y1, alpha=0, lambda=grid)
set.seed(1)
cv.out2=cv.glmnet(x1[train1,], y1[train1], alpha=0)
bestlam2=cv.out2$lambda.min
bestlam2
## [1] 0.5919159
ridge.pred2=predict(ridge.mod2, s=bestlam2, newx=x1[test1,])
mean((ridge.pred2 - y1.test1)^2)
## [1] 38.01174
Lasso
lasso.mod3=glmnet(x1[train1,], y1[train1], alpha=1, lambda=grid)
set.seed(1)
cv.out3=cv.glmnet(x1[train1,], y1[train1], alpha=1)
bestlam3=cv.out3$lambda.min
lasso.pred3=predict(lasso.mod3, s=bestlam3, newx=x1[test1,])
mean((lasso.pred3 - y1.test1)^2)
## [1] 40.99066
lasso.coef3=predict(lasso.mod3, type="coefficients", s=bestlam3)[1:15,]
lasso.coef3[lasso.coef3!=0]
## (Intercept) zn indus chas nox
## 18.9503431634 0.0366638419 -0.1248394521 -0.4770056594 -8.3082861310
## rm age dis rad tax
## 0.1230732403 -0.0007472670 -0.8420393123 0.5337673781 -0.0001484142
## ptratio black lstat medv
## -0.3832539956 -0.0129716801 0.2580918024 -0.1611287956
set.seed(1)
pls.fit1=plsr(crim~., data=Boston.train, scale=TRUE, validation="CV")
summary(pls.fit1)
## Data: X dimension: 253 13
## Y dimension: 253 1
## Fit method: kernelpls
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 9.275 7.328 6.842 6.784 6.741 6.718 6.695
## adjCV 9.275 7.322 6.834 6.768 6.728 6.707 6.683
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.722 6.707 6.696 6.701 6.700 6.700 6.700
## adjCV 6.706 6.693 6.683 6.688 6.686 6.686 6.686
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 48.10 57.46 62.68 69.65 77.95 80.82 84.40 87.65
## crim 38.94 47.55 49.73 50.61 50.85 51.17 51.29 51.35
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 90.47 93.07 96.66 98.26 100.00
## crim 51.37 51.37 51.37 51.37 51.37
validationplot(pls.fit1, val.type="MSEP")
pls.pred1=predict(pls.fit1, Boston.test, ncomp=6)
mean((pls.pred1-y1.test1)^2)
## [1] 41.78158
Of the ridge regression, lasso, and PLS approaches, ridge regression has the lowest test error (about 38.0, versus 41.0 for the lasso and 41.8 for PLS).
The ridge regression model, tuned by cross-validation, therefore performs best on this data set.
Note that the ridge model retains all of the features: ridge regression shrinks the coefficient estimates toward zero but never sets them exactly to zero, so it performs no variable selection.
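Collecting the three Boston test errors side by side makes the comparison explicit; a sketch using the objects above:
# Test MSE for each approach on the Boston hold-out set
c(ridge = mean((ridge.pred2 - y1.test1)^2),
  lasso = mean((lasso.pred3 - y1.test1)^2),
  pls   = mean((pls.pred1   - y1.test1)^2))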