STA-6543 Assignment #5

Question 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

The lasso, relative to least squares, is:

Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Lasso adds a L1 penalty to the least squares loss, reducing coefficients to zero. This creates bias but drastically reduces variance.

Repeat (a) for ridge regression relative to least squares.

Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Ridge regression adds a penalty to the loss function using a L2 penalty. This reduces coefficients close to zero. Variance is reduced with bias increases.

Repeat (a) for non-linear methods relative to least squares.

More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Non-linear methods are good for complex relationships and more flexible than least squares but their flexibility leads to higher variance.

Question 9

In this exercise, we will predict the number of applications received using the other variables in the College data set. (a) Split the data set into a training set and a test set.

library(ISLR)

## Warning: package 'ISLR' was built under R version 4.4.2

attach(College)

set.seed(1)
train=sample(c(TRUE,FALSE),nrow(College),rep=TRUE)
test=(!train)

College.train = College[train,]
College.test = College[test,]

Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit = lm(Apps~., data=College.train)
lm.pred = predict(lm.fit, College.test, type="response")
mean((lm.pred-College.test$Apps)^2)

## [1] 984743.1

Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)

## Warning: package 'glmnet' was built under R version 4.4.3

## Loading required package: Matrix

## Loaded glmnet 4.1-10

set.seed(1)
train.mat = model.matrix(Apps~., data = College.train)
test.mat = model.matrix(Apps~., data = College.test)
grid=10^seq(10,-2,length=100)
ridge.mod=glmnet(train.mat,College.train$Apps,alpha=0,lambda=grid)
cv.out = cv.glmnet(train.mat,College.train$Apps,alpha=0)
bestlam = cv.out$lambda.min
bestlam

## [1] 394.2365

summary(ridge.mod)

##           Length Class     Mode   
## a0         100   -none-    numeric
## beta      1800   dgCMatrix S4     
## df         100   -none-    numeric
## dim          2   -none-    numeric
## lambda     100   -none-    numeric
## dev.ratio  100   -none-    numeric
## nulldev      1   -none-    numeric
## npasses      1   -none-    numeric
## jerr         1   -none-    numeric
## offset       1   -none-    logical
## call         5   -none-    call   
## nobs         1   -none-    numeric

ridge.pred = predict(ridge.mod,s=bestlam,newx = test.mat)
mean((ridge.pred - College.test$Apps)^2)

## [1] 940753.5

The test error of the ridge regression fit is 940753.5. Less than the test error of the linear model.

Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coeffcient estimates.

lasso.mod=glmnet(train.mat,College.train$Apps,alpha=1,lambda=grid)
summary(lasso.mod)

##           Length Class     Mode   
## a0         100   -none-    numeric
## beta      1800   dgCMatrix S4     
## df         100   -none-    numeric
## dim          2   -none-    numeric
## lambda     100   -none-    numeric
## dev.ratio  100   -none-    numeric
## nulldev      1   -none-    numeric
## npasses      1   -none-    numeric
## jerr         1   -none-    numeric
## offset       1   -none-    logical
## call         5   -none-    call   
## nobs         1   -none-    numeric

cv.outl2=cv.glmnet(train.mat,College.train$Apps,alpha=1)
bestlam2=cv.outl2$lambda.min
bestlam2

## [1] 2.309051

lasso.pred=predict(lasso.mod,s=bestlam2,newx = test.mat)
mean((lasso.pred-College.test$Apps)^2)

## [1] 978962.6

The test error of the lasso model is higher than the linear model test error and ridge regression test error.

out=glmnet(train.mat,College.train$Apps,alpha=1,lambda = grid)
lasso.coef=predict(out,type="coefficients",s=bestlam2)[1:18,]
lasso.coef[lasso.coef!=0]

##   (Intercept)    PrivateYes        Accept        Enroll     Top10perc 
## -173.89023228 -649.94882654    1.67083516   -0.75305706   57.37667424 
##     Top25perc   F.Undergrad   P.Undergrad      Outstate    Room.Board 
##  -20.72078908    0.01792667    0.03670117   -0.08378483    0.10937820 
##         Books      Personal           PhD      Terminal     S.F.Ratio 
##    0.05923882   -0.01840355   -9.09196597   -6.37371755   20.24530218 
##   perc.alumni        Expend 
##    8.95013487    0.08288697

Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.

library(pls)

## Warning: package 'pls' was built under R version 4.4.3

## 
## Attaching package: 'pls'

## The following object is masked from 'package:stats':
## 
##     loadings

pcr.college=pcr(Apps~., data=College.train,scale=TRUE,validation="CV")
summary(pcr.college)

## Data:    X dimension: 393 17 
##  Y dimension: 393 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4189     4147     2344     2340     2099     2048     1978
## adjCV         4189     4146     2338     2335     2092     2073     1968
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1972     1949     1889      1893      1895      1906      1913
## adjCV     1964     1933     1875      1881      1884      1892      1900
##        14 comps  15 comps  16 comps  17 comps
## CV         1944      1853      1396      1404
## adjCV      1960      1812      1379      1386
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      31.858    57.44    64.20    69.91    75.10    80.17    83.82    87.30
## Apps    4.353    70.99    71.18    76.84    78.34    81.03    81.59    82.21
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.26     92.74     94.79     96.70     97.76     98.67     99.37
## Apps    83.31     83.97     83.97     84.34     84.58     84.70     91.28
##       16 comps  17 comps
## X        99.82    100.00
## Apps     92.83     93.02

validationplot(pcr.college, val.type="MSEP")

pcr.pred=predict(pcr.college,College.test,ncomp=10)
mean((pcr.pred-College.test$Apps)^2)

## [1] 1682909

Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.

pls.college=plsr(Apps~., data=College.train,scale=TRUE, validation="CV")
validationplot(pls.college, val.type="MSEP")

summary(pls.college)

## Data:    X dimension: 393 17 
##  Y dimension: 393 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            4189     2180     1909     1755     1696     1455     1332
## adjCV         4189     2171     1906     1741     1665     1431     1318
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1310     1301     1298      1298      1297      1298      1297
## adjCV     1298     1289     1287      1286      1285      1286      1285
##        14 comps  15 comps  16 comps  17 comps
## CV         1297      1296      1296      1296
## adjCV      1285      1285      1285      1285
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       26.01    44.96    62.49    65.22    68.52    72.89    77.13    80.46
## Apps    75.74    82.40    86.74    90.58    92.34    92.79    92.88    92.93
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       82.45     84.76     88.08     90.76     92.80     94.45     97.02
## Apps    92.98     93.00     93.01     93.01     93.02     93.02     93.02
##       16 comps  17 comps
## X        98.03    100.00
## Apps     93.02     93.02

pls.pred=predict(pls.college,College.test,ncomp=8)
mean((pls.pred-College.test$Apps)^2)

## [1] 978534.3

pls.pred=predict(pls.college,College.test,ncomp=9)
mean((pls.pred-College.test$Apps)^2)

## [1] 1007163

M=8 was the best performing model.

Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much diference among the test errors resulting from these fve approaches?

PLS, linear, and lasso performed similarly. Ridge regression worse than those and PCR the worst by far.

Question 9

We will now try to predict per capita crime rate in the Boston data set. (a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

library(leaps)

## Warning: package 'leaps' was built under R version 4.4.3

library(MASS)

set.seed(1)
attach(Boston)
predict.regsubsets = function(object, newdata, id, ...) {
    form = as.formula(object$call[[2]])
    mat = model.matrix(form, newdata)
    coefi = coef(object, id = id)
    mat[, names(coefi)] %*% coefi
}

k = 10
p = ncol(Boston) - 1
folds = sample(rep(1:k, length = nrow(Boston)))
cv.errors = matrix(NA, k, p)
for (i in 1:k) {
    best.fit = regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
    for (j in 1:p) {
        pred = predict(best.fit, Boston[folds == i, ], id = j)
        cv.errors[i, j] = mean((Boston$crim[folds == i] - pred)^2)
    }
}
mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = "b", xlab = "Number of variables", ylab = "CV error")

which.min(mean.cv.errors)

## [1] 9

mean.cv.errors[which.min(mean.cv.errors)]

## [1] 42.81453

x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
cv.lasso = cv.glmnet(x, y, type.measure = "mse")
plot(cv.lasso)

coef(cv.lasso)

## 14 x 1 sparse Matrix of class "dgCMatrix"
##             lambda.1se
## (Intercept)   2.176491
## zn            .       
## indus         .       
## chas          .       
## nox           .       
## rm            .       
## age           .       
## dis           .       
## rad           0.150484
## tax           .       
## ptratio       .       
## black         .       
## lstat         .       
## medv          .

sqrt(cv.lasso$cvm[cv.lasso$lambda == cv.lasso$lambda.1se])

## [1] 7.921353

x = model.matrix(crim ~ . - 1, data = Boston)
y = Boston$crim
cv.ridge = cv.glmnet(x, y, type.measure = "mse", alpha = 0)
plot(cv.ridge)

coef(cv.ridge)

## 14 x 1 sparse Matrix of class "dgCMatrix"
##               lambda.1se
## (Intercept)  1.523899542
## zn          -0.002949852
## indus        0.029276741
## chas        -0.166526007
## nox          1.874769665
## rm          -0.142852604
## age          0.006207995
## dis         -0.094547258
## rad          0.045932737
## tax          0.002086668
## ptratio      0.071258052
## black       -0.002605281
## lstat        0.035745604
## medv        -0.023480540

sqrt(cv.ridge$cvm[cv.ridge$lambda == cv.ridge$lambda.1se])

## [1] 7.669133

pcr.crime = pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")
summary(pcr.crime)

## Data:    X dimension: 506 13 
##  Y dimension: 506 1
## Fit method: svdpc
## Number of components considered: 13
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            8.61    7.175    7.180    6.724    6.731    6.727    6.727
## adjCV         8.61    7.174    7.179    6.721    6.725    6.724    6.724
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       6.722    6.614    6.618     6.607     6.598     6.553     6.488
## adjCV    6.718    6.609    6.613     6.602     6.592     6.546     6.481
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       47.70    60.36    69.67    76.45    82.99    88.00    91.14    93.45
## crim    30.69    30.87    39.27    39.61    39.61    39.86    40.14    42.47
##       9 comps  10 comps  11 comps  12 comps  13 comps
## X       95.40     97.04     98.46     99.52     100.0
## crim    42.55     42.78     43.04     44.13      45.4

Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error.

The best subset selection model had the lowest cross-validation error.

Does your chosen model involve all of the features in the data set? Why or why not?

The model has 11 predictors. By excluding 2, the model has lower variance and higher accuracy.

STA-6543 Assignment #5

Luis Gutierrez

2025-07-21

Question 2

Question 9

Question 9