- The lasso, relative to least squares, is:
- Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance (see the simulation sketch after this list).
- Repeat (a) for ridge regression relative to least squares
- Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
- Repeat (a) for non-linear methods relative to least squares.
- More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
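To make the lasso trade-off in (a) concrete, here is a minimal simulation sketch (the settings and variable names are illustrative assumptions, not part of the exercise): with a sparse truth and many noise predictors, the lasso's added bias is outweighed by its variance reduction, so it tends to beat least squares on test MSE.
library(glmnet)
set.seed(1)
n <- 100; p <- 50
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))   # sparse truth: 5 signals, 45 noise predictors
y <- X %*% beta + rnorm(n)
Xte <- matrix(rnorm(n * p), n, p)     # independent test set
yte <- Xte %*% beta + rnorm(n)
ols <- lm(y ~ X)                      # least squares: low bias, high variance
mse.ols <- mean((yte - cbind(1, Xte) %*% coef(ols))^2)
las <- cv.glmnet(X, y, alpha = 1)     # lasso: adds bias, cuts variance
mse.las <- mean((yte - predict(las, Xte, s = "lambda.min"))^2)
c(OLS = mse.ols, lasso = mse.las)     # with a sparse truth, the lasso typically wins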
- Split the data set into a training set and a test set
library(ISLR)
attach(College)
x=model.matrix(Apps~.,College)[,-1]
y=College$Apps
set.seed(10)
train=sample(1:nrow(x), nrow(x)/2)
test=(-train)
College.train = College[train, ]
College.test = College[test, ]
y.test=y[test]
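Note that test = (-train) uses R's negative indexing, so College[test, ] (and y[test]) contains exactly the observations held out of the training sample.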
- Fit a linear model using least squares on the training set, and report the test error obtained.
LS.fit<-lm(Apps~., data=College, subset=train)
summary(LS.fit)
##
## Call:
## lm(formula = Apps ~ ., data = College, subset = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5139.5 -473.3 -21.1 353.2 7402.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -629.36179 639.35741 -0.984 0.325579
## PrivateYes -647.56836 192.17056 -3.370 0.000832 ***
## Accept 1.68912 0.05038 33.530 < 2e-16 ***
## Enroll -1.02383 0.27721 -3.693 0.000255 ***
## Top10perc 48.19124 8.10714 5.944 6.42e-09 ***
## Top25perc -10.51538 6.44952 -1.630 0.103865
## F.Undergrad 0.01992 0.05364 0.371 0.710574
## P.Undergrad 0.04213 0.05348 0.788 0.431373
## Outstate -0.09489 0.02674 -3.549 0.000436 ***
## Room.Board 0.14549 0.07243 2.009 0.045277 *
## Books 0.06660 0.31115 0.214 0.830623
## Personal 0.05663 0.09453 0.599 0.549475
## PhD -10.11489 7.11588 -1.421 0.156027
## Terminal -2.29300 8.03546 -0.285 0.775528
## S.F.Ratio 22.07117 18.70991 1.180 0.238897
## perc.alumni 2.08121 6.00673 0.346 0.729179
## Expend 0.07654 0.01672 4.577 6.45e-06 ***
## Grad.Rate 9.99706 4.49821 2.222 0.026857 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1092 on 370 degrees of freedom
## Multiple R-squared: 0.9395, Adjusted R-squared: 0.9367
## F-statistic: 338 on 17 and 370 DF, p-value: < 2.2e-16
pred.app<-predict(LS.fit, College.test)
test.error<-mean((College.test$Apps-pred.app)^2)
test.error
## [1] 1020100
The least squares test MSE is 1020100.
- Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-1
train.mat <- model.matrix(Apps ~ . - 1, data = College.train)  # predictor matrix, no intercept column
test.mat <- model.matrix(Apps ~ . - 1, data = College.test)
grid <- 10 ^ seq(4, -2, length = 100)  # lambda grid from 10^4 down to 10^-2
mod.ridge <- cv.glmnet(train.mat, College.train[, "Apps"],
                       alpha = 0, lambda = grid, thresh = 1e-12)
lambda.best <- mod.ridge$lambda.min
lambda.best
## [1] 0.01
ridge.pred = predict(mod.ridge, newx=test.mat, s=lambda.best)
mean((College.test[, "Apps"] - ridge.pred)^2)
## [1] 1020090
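Cross-validation picks the smallest λ in the grid (0.01), so the ridge fit is barely shrunk and its test MSE (1020090) is essentially identical to the least squares error.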
- Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
summary(lasso.mod)
## Length Class Mode
## a0 100 -none- numeric
## beta 1700 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 5 -none- call
## nobs 1 -none- numeric
cv.out=cv.glmnet(x[train,],y[train],alpha=1)
bestlam=cv.out$lambda.min
bestlam
## [1] 24.66235
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)
## [1] 1008160
out=glmnet(x,y,alpha=1,lambda = grid)
lasso.coef=predict(out,type="coefficients",s=bestlam)[1:18,]
lasso.coef[lasso.coef!=0]
## (Intercept) PrivateYes Accept Enroll Top10perc
## -632.19678987 -408.42335973 1.43816397 -0.14465029 31.41379876
## Top25perc P.Undergrad Outstate Room.Board PhD
## -0.83485017 0.01490164 -0.05356409 0.12037339 -5.11705456
## Terminal S.F.Ratio perc.alumni Expend Grad.Rate
## -3.37899557 2.77583630 -1.02745168 0.06845218 4.69309424
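The lasso test MSE is 1008160, with 14 non-zero coefficient estimates (excluding the intercept); F.Undergrad, Books, and Personal are shrunk exactly to zero.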
- Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
library(pls)
##
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
##
## loadings
pcr.fit = pcr(Apps~., data=College.train, scale=T, validation="CV")
validationplot(pcr.fit, val.type="MSEP")
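The exercise also asks for the test error at the selected M. A minimal sketch, assuming M is read off the MSEP plot above (ncomp = 10 below is a placeholder, not a value taken from the plot):
pcr.pred = predict(pcr.fit, College.test, ncomp = 10)  # placeholder M; use the plot's minimum
mean((College.test[, "Apps"] - pcr.pred)^2)            # PCR test MSE at that M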
- Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.
pls.fit = plsr(Apps~., data=College.train, scale=T, validation="CV")
validationplot(pls.fit, val.type="MSEP")
pls.pred = predict(pls.fit, College.test, ncomp=6)
mean((College.test[, "Apps"] - pls.pred)^2)
Here M = 6 components was chosen from the validation plot, and the expression above evaluates to the corresponding PLS test MSE.
- Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
library(leaps)
library(MASS)
set.seed(1)
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
b<-Boston
set.seed(1)
train<-sample(1:nrow(b), nrow(b)*.8)
b.train<-b[train,]
b.test<-b[-train,]
b.lm<-lm(crim~., data = b.train)
b.lm.pred<-predict(b.lm, b.test, type = 'response')
mean((b.lm.pred-b.test$crim)^2)
## [1] 69.18774
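This full least squares fit, with a test MSE of about 69.2, is the baseline against which the methods below are compared.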
predict.regsubsets = function(object, newdata, id, ...) {
  # Rebuild the model matrix from the regsubsets call and predict
  # with the coefficients of the id-predictor model
  form = as.formula(object$call[[2]])
  mat = model.matrix(form, newdata)
  coefi = coef(object, id = id)
  mat[, names(coefi)] %*% coefi
}
k = 10
p = ncol(Boston) - 1
folds = sample(rep(1:k, length = nrow(Boston)))
errors = matrix(NA, k, p)
# k-fold cross-validation over every subset size
for (i in 1:k) {
  fit = regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
  for (j in 1:p) {
    pred = predict(fit, Boston[folds == i, ], id = j)
    errors[i, j] = mean((Boston$crim[folds == i] - pred)^2)
  }
}
rmse.bss = sqrt(apply(errors, 2, mean))  # cross-validated RMSE by model size
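With the cross-validated errors in hand, the winning subset size can be read off directly (a small sketch using the rmse.bss vector computed above):
which.min(rmse.bss)   # subset size with the lowest CV RMSE
plot(rmse.bss, type = "b", xlab = "Number of predictors", ylab = "CV RMSE")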
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
lasso.model = cv.glmnet(x, y, type.measure = "mse")
plot(lasso.model)
coef(lasso.model)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 1.4186415
## zn .
## indus .
## chas .
## nox .
## rm .
## age .
## dis .
## rad 0.2298449
## tax .
## ptratio .
## black .
## lstat .
## medv .
sqrt(lasso.model$cvm[lasso.model$lambda == lasso.model$lambda.1se])
## [1] 7.536873
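At lambda.1se the lasso keeps only rad, with a cross-validated RMSE of about 7.54.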
x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
ridge.model = cv.glmnet(x, y, type.measure = "mse", alpha = 0)
plot(ridge.model)
sqrt(ridge.model$cvm[ridge.model$lambda == ridge.model$lambda.1se])
## [1] 7.735648
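Ridge shrinks but never zeroes coefficients, and its cross-validated RMSE of about 7.74 is slightly worse than the lasso's.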
library(pls)
pcr.fit = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.fit)
## Data: X dimension: 506 13
## Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 13
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 8.61 7.195 7.190 6.781 6.756 6.773 6.805
## adjCV 8.61 7.193 7.188 6.776 6.749 6.768 6.798
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 6.798 6.669 6.685 6.687 6.679 6.630 6.562
## adjCV 6.790 6.660 6.677 6.677 6.669 6.619 6.550
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 47.70 60.36 69.67 76.45 82.99 88.00 91.14 93.45
## crim 30.69 30.87 39.27 39.61 39.61 39.86 40.14 42.47
## 9 comps 10 comps 11 comps 12 comps 13 comps
## X 95.40 97.04 98.46 99.52 100.0
## crim 42.55 42.78 43.04 44.13 45.4
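Discussion: PCR only reaches its lowest cross-validated error (RMSE about 6.56) when all 13 components are used, so it provides no real dimension reduction on this data. By cross-validated RMSE it edges out the lasso (7.54) and ridge (7.74), though those figures are taken at the more conservative lambda.1se; the lasso remains attractive for producing by far the sparsest model, keeping only rad.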