library(ISLR)
library(leaps)
Warning: package ‘leaps’ was built under R version 4.0.5
library(glmnet)
Warning: package ‘glmnet’ was built under R version 4.0.5Loaded glmnet 4.1-4
library(pls)
Attaching package: ‘pls’
The following object is masked from ‘package:stats’:
loadings
2a. 3 is correct since the lasso is less flexible but worth it if the small increase in bias is offset by the larger reduction in variance. It also can remove non-essential predictors.
2b. 3 is correct since ridge-reduction and lasso are very similar but lasso can remove a predictor in its entirety. Ridge-Reduction would only result in a very small coef.
2c. 2 is the correct since non-linear models have less bias as well as being less biased than a model generated from least-squares regression.
attach(College)
x = model.matrix(Apps~.,College)[,-1]
y = Apps
set.seed(1)
train <- sample (c(TRUE , FALSE), nrow (Hitters),
replace = TRUE)
test <- (!train)
y.test=y[test]
linear <- lm(Apps~., data = College[train,])
linear.pred <- predict(linear,College[test,],type = "response")
mean((linear.pred-College[test,]$Apps)^2)
[1] 1048053
train.x<-model.matrix(Apps~.,data=College[train,])
train.y<-College[train,]$Apps
test.x<-model.matrix(Apps~.,data=College[test,])
test.y<-College[test,]$Apps
ridge.fit<-cv.glmnet(train.x,train.y, alpha = 0)
lam.ridge<- ridge.fit$lambda.min
ridge.pred<-predict(ridge.fit, s=lam.ridge, newx=test.x)
mean((ridge.pred-test.y)^2)
[1] 1087614
lasso.fit <- cv.glmnet(train.x,train.y, alpha = 1)
lam.lasso<- lasso.fit$lambda.min
lasso.pred<-predict(lasso.fit, s=lam.ridge, newx=test.x)
mean((lasso.pred-test.y)^2)
[1] 1325859
pcr.fit <- pcr(Apps~., data = College[train,], scale = TRUE, validation = "CV")
summary(pcr.fit)
Data: X dimension: 390 17
Y dimension: 390 1
Fit method: svdpc
Number of components considered: 17
VALIDATION: RMSEP
Cross-validated using 10 random segments.
(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
CV 4199 4125 2354 2346 1932 1864 1868 1868
adjCV 4199 4124 2350 2423 1914 1853 1860 1852
8 comps 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
CV 1852 1773 1753 1788 1790 1796 1795 1771
adjCV 1839 1762 1742 1777 1780 1786 1811 1747
16 comps 17 comps
CV 1408 1352
adjCV 1389 1335
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps 9 comps
X 31.458 56.66 63.24 69.60 75.22 80.12 83.82 87.41 90.37
Apps 3.856 70.36 70.57 81.38 82.73 82.82 83.50 83.76 84.62
10 comps 11 comps 12 comps 13 comps 14 comps 15 comps 16 comps 17 comps
X 92.82 94.99 96.93 97.98 98.71 99.34 99.82 100.00
Apps 85.23 85.29 85.38 85.40 85.42 90.56 93.25 93.63
xcol<-1:17
mse_pcr_comps<-numeric(length = length(xcol))
for (i in 1:length(xcol)){
mse_pcr_comps[i]<-mean(
(predict(pcr.fit,College[test,], ncomp = i )-College[test,]$Apps)^2)
}
mse_pcr_comps
[1] 11717019 3104455 3095180 2093298 1965883 1974813 1908495 1794200
[9] 1754765 1778336 1803771 1828637 1829753 1833692 1419045 1086308
[17] 1048053
pcr.pred=predict(pcr.fit,College[test,],ncomp=16)
mean((pcr.pred-y.test)^2)
[1] 1086308
pls.fit=plsr(Apps~., data= College[train,], scale = TRUE, validation="CV" )
summary(pls.fit)
Data: X dimension: 390 17
Y dimension: 390 1
Fit method: kernelpls
Number of components considered: 17
VALIDATION: RMSEP
Cross-validated using 10 random segments.
(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
CV 4199 2123 1825 1680 1670 1571 1448 1417
adjCV 4199 2116 1818 1671 1646 1538 1425 1396
8 comps 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
CV 1399 1385 1386 1388 1391 1390 1389 1390
adjCV 1379 1366 1367 1369 1371 1370 1370 1370
16 comps 17 comps
CV 1390 1390
adjCV 1371 1371
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
X 25.50 42.05 62.57 65.26 67.81 71.99 75.39 79.20
Apps 76.53 83.93 87.06 90.72 92.98 93.41 93.49 93.53
9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps 16 comps
X 82.55 85.58 89.18 90.83 92.57 95.11 97.12 99.13
Apps 93.58 93.61 93.62 93.63 93.63 93.63 93.63 93.63
17 comps
X 100.00
Apps 93.63
xcol<-1:17
mse_pls_comps<-numeric(length = length(xcol))
for (i in 1:length(xcol)){
mse_pls_comps[i]<-mean(
(predict(pls.fit,College[test,], ncomp = i )-College[test,]$Apps)^2)
}
mse_pls_comps
[1] 2529127 1968498 1595733 1269633 1140182 1075574 1061787 1049708 1056804 1051164
[11] 1057288 1045069 1049615 1048361 1048134 1048165 1048053
pls.pred=predict(pls.fit,College[test,],ncomp=8)
mean((pls.pred-y.test)^2)
[1] 1049708
G.Linear performed the best but PLS was within a hair of it.
library(MASS)
set.seed(1)
train <- sample (c(TRUE , FALSE), nrow (Boston),
replace = TRUE)
test <- (!train)
btrain <-Boston[train,]
btest <- Boston[test,]
y.test=y[test]
linear <- lm(crim~., data = btrain)
boston_pred <- predict(linear, btest)
mean((boston_pred - btest$crim)^2)
[1] 50.65678
forward <- regsubsets(crim~., data = btrain, nvmax = 13, method = "forward")
forward_summary <- summary(forward)
plot(forward, scale = "adjr2")
It seems that around 7 predictors is where the optimal model would be found.
RIDGE
train.x<-model.matrix(crim~.,data=btrain)
train.y<-btrain$crim
test.x<-model.matrix(crim~.,data=btest)
test.y<-btest$crim
ridge.fit<-cv.glmnet(train.x,train.y, alpha = 0)
ridge<- ridge.fit$lambda.min
ridge
[1] 0.5240686
ridge.pred<-predict(ridge.fit, s=ridge, newx=test.x)
mean((ridge.pred-test.y)^2)
[1] 51.46284
LASSO
lasso.fit<-cv.glmnet(train.x,train.y, alpha = 1)
lasso<- lasso.fit$lambda.min
lasso
[1] 0.01973085
lasso.pred<-predict(lasso.fit, s=lasso, newx=test.x)
mean((lasso.pred-test.y)^2)
[1] 50.7379
pcr.fit=pcr(crim~., data= btrain, scale = TRUE, validation="CV" )
summary(pcr.fit)
Data: X dimension: 262 13
Y dimension: 262 1
Fit method: svdpc
Number of components considered: 13
VALIDATION: RMSEP
Cross-validated using 10 random segments.
(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
CV 8.065 6.649 6.651 6.188 6.168 6.163 6.193 6.178
adjCV 8.065 6.642 6.646 6.177 6.158 6.151 6.179 6.163
8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
CV 6.068 6.048 6.053 6.061 6.018 5.972
adjCV 6.053 6.023 6.037 6.045 6.001 5.955
TRAINING: % variance explained
1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
X 44.58 58.02 68.58 75.64 82.03 87.37 90.66 93.02
crim 33.90 34.29 44.32 44.91 45.57 45.62 46.13 48.13
9 comps 10 comps 11 comps 12 comps 13 comps
X 95.01 96.87 98.4 99.54 100.00
crim 48.80 48.96 49.0 49.91 50.71
xcol<-1:13
mse_pcr_comps<-numeric(length = length(xcol))
for (i in 1:length(xcol))
{
mse_pcr_comps[i]<-mean((predict(pcr.fit, btest, ncomp = i )-btest$crim)^2)
}
mse_pcr<-as.data.frame(cbind(ncomps = c(1:13), mse = as.numeric(mse_pcr_comps)))
mse_pcr
B. it seems pcr with ncomp = 13 has the lowest mean squared error.
C. yes the mode has all the features since it resulted in the lowest mean squared error.