data("College")
data("Boston")
library(ISLR2)
library(glmnet)
library(pls)
For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
(a) The lasso, relative to least squares, is:
i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
(a) iii. The lasso is less flexible than least squares: it shrinks the coefficient estimates towards zero. This shrinkage reduces the variance of the predictions at the cost of a small increase in bias, so prediction accuracy improves when the increase in bias is less than the decrease in variance (see the illustration after part (c)).
(b) Repeat (a) for ridge regression relative to least squares. iii. Ridge regression is also less flexible than least squares: as λ increases, the flexibility of the ridge fit decreases, leading to decreased variance but increased bias.
(c) Repeat (a) for non-linear methods relative to least squares. ii. Non-linear methods are more flexible than least squares, so they give improved prediction accuracy when the resulting decrease in bias outweighs the increase in variance.
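To make the shrinkage idea in (a) and (b) concrete, the short sketch below fits the lasso to a small simulated data set and plots the coefficient paths: the coefficients move towards zero as λ grows, which is what lowers the variance. This is purely illustrative (simulated data, not part of the exercise); the same plot with alpha = 0 shows the ridge paths.
set.seed(1)
x = matrix(rnorm(100 * 10), 100, 10)             # 10 simulated predictors
y = 3 * x[, 1] + rnorm(100)                      # response depends only on the first one
demo.lasso = glmnet(x, y, alpha = 1)             # alpha = 1 is the lasso penalty
plot(demo.lasso, xvar = "lambda", label = TRUE)  # paths shrink to zero as lambda increases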
In this exercise, we will predict the number of applications received using the other variables in the College data set.
set.seed(1)
# 70/30 train/test split of the College data
s = sample(1:nrow(College), nrow(College) * 0.7)
train = College[s, ]
test = College[-s, ]
# least squares fit on the training set
ls = lm(Apps ~ ., data = train)
summary(ls)
##
## Call:
## lm(formula = Apps ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5816.1 -451.6 -1.0 327.2 7445.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.377e+02 5.076e+02 -1.059 0.28995
## PrivateYes -5.045e+02 1.648e+02 -3.061 0.00232 **
## Accept 1.722e+00 4.763e-02 36.159 < 2e-16 ***
## Enroll -1.055e+00 2.437e-01 -4.329 1.80e-05 ***
## Top10perc 5.358e+01 6.440e+00 8.320 7.64e-16 ***
## Top25perc -1.614e+01 5.092e+00 -3.170 0.00161 **
## F.Undergrad 2.970e-02 4.399e-02 0.675 0.49976
## P.Undergrad 7.162e-02 3.649e-02 1.963 0.05019 .
## Outstate -8.841e-02 2.176e-02 -4.064 5.57e-05 ***
## Room.Board 1.630e-01 5.577e-02 2.923 0.00362 **
## Books 2.727e-01 2.723e-01 1.001 0.31715
## Personal -7.316e-03 7.283e-02 -0.100 0.92002
## PhD -9.676e+00 5.360e+00 -1.805 0.07161 .
## Terminal -3.781e-01 6.015e+00 -0.063 0.94990
## S.F.Ratio 1.627e+01 1.608e+01 1.012 0.31214
## perc.alumni 2.358e+00 4.853e+00 0.486 0.62722
## Expend 5.986e-02 1.337e-02 4.476 9.34e-06 ***
## Grad.Rate 7.158e+00 3.520e+00 2.034 0.04248 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1022 on 525 degrees of freedom
## Multiple R-squared: 0.9332, Adjusted R-squared: 0.9311
## F-statistic: 431.6 on 17 and 525 DF, p-value: < 2.2e-16
# test-set predictions and test MSE for least squares
pred = predict(ls, test)
mean((pred - test$Apps)^2)
## [1] 1261630
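To put this error on the scale of the response, the square root of the test MSE gives a typical prediction error of roughly 1,100 applications (a quick check using the objects above):
sqrt(mean((pred - test$Apps)^2))  # test RMSE, in units of applications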
# ridge regression (alpha = 0), with lambda chosen by cross-validation over a grid
trmat = model.matrix(Apps ~ ., data = train)
tmat = model.matrix(Apps ~ ., data = test)
grid = 10^seq(4, -2, length = 100)
ridge = glmnet(trmat, train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
cv.out = cv.glmnet(trmat, train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
bestlam = cv.out$lambda.min
bestlam
## [1] 0.01
predridge = predict(ridge, s = bestlam, newx = tmat)
mean((predridge - test$Apps)^2)  # ridge test MSE
## [1] 1261598
The test MSE for ridge regression is 1261598, slightly less than the 1261630 from the least squares model; the selected λ is at the very small end of the grid, so the two fits are nearly identical.
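It can also help to look at the cross-validation curve that produced this choice of λ, and at the ridge coefficients it implies (a sketch using the fits above; nothing new is estimated):
plot(cv.out)                                        # CV error across the lambda grid
predict(ridge, s = bestlam, type = "coefficients")  # ridge coefficients at lambda.min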
# the lasso (alpha = 1), again with lambda chosen by cross-validation
lasso = glmnet(trmat, train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
cv.out2 = cv.glmnet(trmat, train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
bestlaml = cv.out2$lambda.min
bestlaml
## [1] 0.01
predlasso = predict(lasso, s = bestlaml, newx = tmat)
mean((predlasso - test$Apps)^2)
## [1] 1261591
The test MSE for the lasso is 1261591, marginally smaller than both of the previous models. The lasso coefficient estimates at the chosen λ are shown below.
predict(lasso, s = bestlaml, type = "coefficients")  # lasso coefficients at lambda.min
## 19 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.377539e+02
## (Intercept) .
## PrivateYes -5.044625e+02
## Accept 1.722329e+00
## Enroll -1.054317e+00
## Top10perc 5.357463e+01
## Top25perc -1.613792e+01
## F.Undergrad 2.961867e-02
## P.Undergrad 7.161220e-02
## Outstate -8.840061e-02
## Room.Board 1.630166e-01
## Books 2.725781e-01
## Personal -7.289348e-03
## PhD -9.674575e+00
## Terminal -3.772228e-01
## S.F.Ratio 1.626160e+01
## perc.alumni 2.354463e+00
## Expend 5.986103e-02
## Grad.Rate 7.157200e+00
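Because the selected λ is so small (0.01, the bottom of the grid), the lasso does almost no selection here and essentially every predictor keeps a nonzero coefficient. A quick way to check this is to count the nonzero entries (a sketch using the fit above):
sum(as.vector(predict(lasso, s = bestlaml, type = "coefficients")) != 0)  # number of nonzero coefficients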
# principal components regression, with the number of components assessed by cross-validation
pcr = pcr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
predpcr = predict(pcr, test)   # no ncomp given, so predictions for every number of components are returned
mean((predpcr - test$Apps)^2)  # this averages the test MSE over all component counts
## [1] 2670100
The test MSE for the PCR model is 2670100, which is larger than for the other models.
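Since predict() above was called without ncomp, the reported error is an average over every possible number of components rather than the error of one chosen model. A sketch of how a single number of components could be picked from the cross-validation results is shown below (ncomp = 10 is only an illustrative value, not a tuned choice):
validationplot(pcr, val.type = "MSEP")      # CV error as a function of the number of components
predpcr10 = predict(pcr, test, ncomp = 10)  # predictions using 10 components (illustrative)
mean((predpcr10 - test$Apps)^2)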
# partial least squares, also with cross-validation
plsr = plsr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
predplsr = predict(plsr, test)   # again no ncomp given, so all component counts are averaged below
mean((predplsr - test$Apps)^2)
## [1] 1413894
The test MSE for the PLS model is 1413894.
All of the models predict the number of applications reasonably accurately, and there is not much difference between their test errors; PCR is the only clear outlier, with a noticeably larger error.
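One way to back this comparison up is to put the models on a common scale using a test-set R², i.e. one minus the ratio of each model's test MSE to the variance of Apps in the test set (a sketch that reuses the prediction vectors computed above):
tss = mean((test$Apps - mean(test$Apps))^2)   # test-set variance of Apps
1 - mean((pred - test$Apps)^2) / tss          # least squares
1 - mean((predridge - test$Apps)^2) / tss     # ridge
1 - mean((predlasso - test$Apps)^2) / tss     # lasso
1 - mean((predpcr - test$Apps)^2) / tss       # PCR (averaged over component counts, as above)
1 - mean((predplsr - test$Apps)^2) / tss      # PLS (likewise averaged)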
We will now try to predict per capita crime rate in the Boston data set.
set.seed(1)
# 70/30 train/test split of the Boston data
s = sample(1:nrow(Boston), nrow(Boston) * 0.7)
btrain = Boston[s, ]
btest = Boston[-s, ]
btrmat = model.matrix(medv ~ ., data = btrain)
btmat = model.matrix(medv ~ ., data = btest)
grid = 10^seq(4, -2, length = 100)
# ridge regression (alpha = 0) with lambda chosen by cross-validation
bridge = glmnet(btrmat, btrain$medv, alpha = 0, lambda = grid)
bcv.out = cv.glmnet(btrmat, btrain$medv, alpha = 0, lambda = grid)
bbestlam = bcv.out$lambda.min
bpred = predict(bridge, s = bbestlam, newx = btmat)
mean((btest$medv - bpred)^2)  # ridge test MSE
## [1] 27.24035
# second penalized fit; note that alpha = 0 again applies the ridge penalty
# (alpha = 1 would give the lasso), which explains the identical test MSE below
lasso2 = glmnet(btrmat, btrain$medv, alpha = 0, lambda = grid)
bcv.out2 = cv.glmnet(btrmat, btrain$medv, alpha = 0, lambda = grid)
bbestlaml = bcv.out2$lambda.min
bpredl = predict(lasso2, s = bbestlaml, newx = btmat)
mean((bpredl - btest$medv)^2)
## [1] 27.24035
# principal components regression with cross-validation
pcr2 <- pcr(medv ~ ., data = btrain, scale = TRUE, validation = "CV")
bpredpcr <- predict(pcr2, btest)   # no ncomp given, as before
mean((bpredpcr - btest$medv)^2)
## [1] 33.42198
The best results come from the penalized regressions, whose λ was chosen by cross-validation (test MSE about 27.24), while the PCR model does noticeably worse (test MSE about 33.42).
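For reference, an unpenalized least-squares fit on the same split gives a baseline to judge these numbers against (a sketch; it was not run as part of the results above):
bls = lm(medv ~ ., data = btrain)           # least-squares baseline on the same split
mean((predict(bls, btest) - btest$medv)^2)  # baseline test MSE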