Problem 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

  i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

  iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

(a) The lasso, relative to least squares, is:

  iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Explanation: The lasso shrinks the coefficient estimates toward zero, setting some exactly to zero, which substantially reduces their variance at the cost of some added bias. Because the penalty constrains the fit, the lasso is less flexible than least squares and less prone to overfitting.

(b) Repeat (a) for ridge regression relative to least squares.

  iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Explanation: Ridge regression behaves like the lasso in that it also shrinks the coefficient estimates, significantly reducing their variance at the expense of a slight increase in bias; unlike the lasso, however, it never sets coefficients exactly to zero.
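For reference, the two penalized least-squares criteria (in their standard forms) are

$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j| \quad \text{(lasso)}, \qquad \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 \quad \text{(ridge)}.$$

Both penalties shrink the coefficient estimates toward zero as $\lambda$ grows, but only the lasso's absolute-value penalty can set them exactly to zero.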

(c) Repeat (a) for non-linear methods relative to least squares.

  ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Explanation: Non-linear methods relax the linearity assumption of least squares and are therefore more flexible; they reduce bias at the cost of higher variance, so prediction improves when the increase in variance is smaller than the decrease in bias.

Problem 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

library(ISLR2)
attach(College)

(a) Split the data set into a training set and a test set.

set.seed(1)
# 80/20 split; sample() truncates 0.8 * 777 = 621.6 to 621 training rows
inTrain <- sample(1:nrow(College), 0.8*nrow(College))
college_train <- College[inTrain,]
college_test <- College[-inTrain,]
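As a quick check, College has 777 rows, so this yields 621 training and 156 test observations:

dim(college_train)
dim(college_test)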

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

The test error after fitting the linear model is 1567324.

lm_fit <- lm(Apps~., data=college_train)
summary(lm_fit)
## 
## Call:
## lm(formula = Apps ~ ., data = college_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5555.2  -404.6    19.9   310.3  7577.7 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -630.58238  435.56266  -1.448 0.148209    
## PrivateYes  -388.97393  148.87623  -2.613 0.009206 ** 
## Accept         1.69123    0.04433  38.153  < 2e-16 ***
## Enroll        -1.21543    0.20873  -5.823 9.41e-09 ***
## Top10perc     50.45622    5.88174   8.578  < 2e-16 ***
## Top25perc    -13.62655    4.67321  -2.916 0.003679 ** 
## F.Undergrad    0.08271    0.03632   2.277 0.023111 *  
## P.Undergrad    0.06555    0.03367   1.947 0.052008 .  
## Outstate      -0.07562    0.01987  -3.805 0.000156 ***
## Room.Board     0.14161    0.05130   2.760 0.005947 ** 
## Books          0.21161    0.25184   0.840 0.401102    
## Personal       0.01873    0.06604   0.284 0.776803    
## PhD           -9.72551    4.91228  -1.980 0.048176 *  
## Terminal      -0.48690    5.43302  -0.090 0.928620    
## S.F.Ratio     18.26146   13.83984   1.319 0.187508    
## perc.alumni    1.39008    4.39572   0.316 0.751934    
## Expend         0.05764    0.01254   4.595 5.26e-06 ***
## Grad.Rate      5.89480    3.11185   1.894 0.058662 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 993.8 on 603 degrees of freedom
## Multiple R-squared:  0.9347, Adjusted R-squared:  0.9328 
## F-statistic: 507.5 on 17 and 603 DF,  p-value: < 2.2e-16
lm_pred <- predict(lm_fit, college_test, type="response")
mean((lm_pred-college_test$Apps)^2)
## [1] 1567324
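On the scale of the response, this corresponds to a root-mean-squared error of roughly sqrt(1567324), or about 1252 applications:

sqrt(mean((lm_pred - college_test$Apps)^2))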

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

The test error obtained after fitting the ridge regression model is 1441717.

library(glmnet)
set.seed(1)
train_mat <- model.matrix(Apps~., data = college_train)
test_mat <- model.matrix(Apps~., data = college_test)
# Ridge regression: alpha = 0; lambda.min chosen by 10-fold cross-validation
cv_out <- cv.glmnet(train_mat, college_train$Apps, alpha=0)
bestlam <- cv_out$lambda.min
bestlam
## [1] 362.9786
ridge_mod <- glmnet(train_mat,college_train$Apps,alpha = 0)
ridge_pred <- predict(ridge_mod,s=bestlam,newx = test_mat)
mean((ridge_pred - college_test$Apps)^2)
## [1] 1441717

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

The test error of the lasso model is 1526566, with 3 non-zero coefficient estimates (excluding the intercept).

set.seed(1)
cv_out2 <- cv.glmnet(train_mat,college_train$Apps,alpha=1)
bestlam2 <- cv_out2$lambda.min
bestlam2
## [1] 10.33776
lasso_mod <- glmnet(train_mat,college_train$Apps,alpha=1)
lasso_pred <- predict(lasso_mod,s=bestlam2,newx=test_mat)
mean((lasso_pred - college_test$Apps)^2)
## [1] 1526566
# Coefficients at the lasso's CV-chosen lambda (bestlam2, not the ridge model's bestlam)
lasso_coef <- predict(lasso_mod, type="coefficients", s=bestlam2)[1:18,]
lasso_coef[lasso_coef!=0]
##   (Intercept)        Accept     Top10perc        Expend 
## -1.807026e+02  1.349772e+00  1.645574e+01  1.014995e-03
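The count of non-zero coefficients (excluding the intercept) can be read off programmatically:

sum(lasso_coef != 0) - 1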

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

The test error obtained after fitting a PCR model is 2322618, using M = 10. Cross-validation actually attains its minimum error at M = 17, which is equivalent to least squares; among the genuinely lower-dimensional fits, the CV error levels off around M = 10, so that value is used here.

library(pls)
set.seed(1)
pcr_fit <- pcr(Apps~., data = college_train, scale = TRUE, validation = "CV")
summary(pcr_fit)
## Data:    X dimension: 621 17 
##  Y dimension: 621 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3837     3756     2050     2056     1660     1574     1578
## adjCV         3837     3756     2047     2056     1642     1562     1576
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1564     1522     1509      1496      1501      1503      1496
## adjCV     1563     1515     1506      1493      1498      1500      1493
##        14 comps  15 comps  16 comps  17 comps
## CV         1497      1497      1199      1118
## adjCV      1494      1487      1187      1109
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      32.003    57.06    64.13    70.03    75.36    80.38    84.09    87.44
## Apps    4.441    72.01    72.02    81.86    83.67    83.77    84.01    85.14
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       90.47     92.83     94.91     96.78     97.86     98.72     99.36
## Apps    85.42     85.76     85.76     85.76     85.89     85.95     89.93
##       16 comps  17 comps
## X        99.83    100.00
## Apps     92.89     93.47
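The cross-validation curve can also be inspected graphically, just as is done for the PLS fit below:

validationplot(pcr_fit, val.type = "MSEP")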
pcr_pred <- predict(pcr_fit, college_test, ncomp = 10)
mean((pcr_pred -college_test$Apps)^2)
## [1] 2322618

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

The test error obtained after fitting a PLS model is 1551898, with M = 10: the cross-validated error essentially stops improving after about 10 components (it reaches its minimum of 1118 by M = 13 and stays flat thereafter).

set.seed(1)
pls_fit <- plsr(Apps~., data=college_train, scale=TRUE,validation="CV")
summary(pls_fit)
## Data:    X dimension: 621 17 
##  Y dimension: 621 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3837     1865     1623     1441     1380     1244     1166
## adjCV         3837     1862     1623     1437     1363     1222     1154
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1147     1133     1123      1122      1122      1121      1118
## adjCV     1136     1125     1115      1114      1114      1112      1110
##        14 comps  15 comps  16 comps  17 comps
## CV         1118      1118      1118      1118
## adjCV      1109      1110      1109      1109
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       25.55    45.38    62.59    65.08    67.55    72.02    75.93    80.46
## Apps    77.30    83.57    87.51    90.88    92.88    93.15    93.24    93.31
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       82.51     85.43     87.83     91.09     92.73     95.12     96.95
## Apps    93.39     93.42     93.45     93.46     93.47     93.47     93.47
##       16 comps  17 comps
## X        97.97    100.00
## Apps     93.47     93.47
validationplot(pls_fit, val.type = "MSEP")

pls_pred <- predict(pls_fit, college_test, ncomp = 10)
mean((pls_pred - college_test$Apps)^2)
## [1] 1551898

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

The test errors are 1567324 for the linear model, 1441717 for ridge regression, 1526566 for the lasso, 2322618 for PCR, and 1551898 for PLS. All of the models except PCR have quite similar test errors, with ridge regression giving the best result.
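To put "how accurately" on an interpretable scale, one option is the test-set R-squared, i.e. the fraction of the variance of Apps in the test set that each model explains. Below is a minimal sketch using the predictions computed above; test_r2 is a helper defined here, not a library function:

# Test R^2: 1 minus the test MSE divided by the variance of Apps in the test set
test_r2 <- function(pred, actual) {
  1 - mean((pred - actual)^2) / mean((actual - mean(actual))^2)
}
sapply(list(lm = lm_pred, ridge = ridge_pred, lasso = lasso_pred,
            pcr = pcr_pred, pls = pls_pred),
       test_r2, actual = college_test$Apps)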

detach(College)

Problem 11

We will now try to predict per capita crime rate in the Boston data set.

library(ISLR2)
attach(Boston)
set.seed(1)
# 80/20 split: Boston has 506 rows, giving 404 training and 102 test rows
inTrain <- sample(1:nrow(Boston), 0.8*nrow(Boston))
boston_train <- Boston[inTrain,]
boston_test <- Boston[-inTrain,]

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.

The linear model produced a test error of 64.30352.

lm_fit <- lm(crim~., data=boston_train)
summary(lm_fit)
## 
## Call:
## lm(formula = crim ~ ., data = boston_train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.909 -1.991 -0.354  0.976 59.619 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 14.424748   7.433437   1.941  0.05303 .  
## zn           0.037134   0.019236   1.930  0.05427 .  
## indus       -0.061103   0.084930  -0.719  0.47229    
## chas        -0.694068   1.202456  -0.577  0.56413    
## nox         -8.523151   5.455552  -1.562  0.11903    
## rm           0.272036   0.637903   0.426  0.67001    
## age          0.001956   0.018389   0.106  0.91536    
## dis         -0.840921   0.295902  -2.842  0.00472 ** 
## rad          0.590820   0.086960   6.794 4.08e-11 ***
## tax         -0.003601   0.005140  -0.701  0.48392    
## ptratio     -0.344734   0.195586  -1.763  0.07875 .  
## lstat        0.138942   0.077358   1.796  0.07325 .  
## medv        -0.181755   0.060290  -3.015  0.00274 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.006 on 391 degrees of freedom
## Multiple R-squared:  0.4603, Adjusted R-squared:  0.4437 
## F-statistic: 27.79 on 12 and 391 DF,  p-value: < 2.2e-16
lm_pred <- predict(lm_fit, boston_test, type="response")
mean((lm_pred-boston_test$crim)^2)
## [1] 64.30352

The ridge model produced a test error of 65.04252.

library(glmnet)
set.seed(1)
train_mat <- model.matrix(crim~., data = boston_train)
test_mat <- model.matrix(crim~., data = boston_test)
# Ridge regression uses alpha = 0 (alpha = 1 would fit the lasso instead)
cv_out <- cv.glmnet(train_mat, boston_train$crim, alpha=0)
bestlam <- cv_out$lambda.min
bestlam
## [1] 0.04862993
ridge_mod <- glmnet(train_mat, boston_train$crim, alpha=0)
ridge_pred <- predict(ridge_mod,s=bestlam,newx = test_mat)
mean((ridge_pred - boston_test$crim)^2)
## [1] 65.04252

The lasso model produced a test error of 65.04252.

set.seed(1)
cv_out2 <- cv.glmnet(train_mat,boston_train$crim ,alpha=1)
bestlam2 <- cv_out2$lambda.min
bestlam2
## [1] 0.04862993
lasso_mod <- glmnet(train_mat,boston_train$crim,alpha=1)
lasso_pred <- predict(lasso_mod,s=bestlam2,newx=test_mat)
mean((lasso_pred - boston_test$crim)^2)
## [1] 65.04252
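To see which predictors the lasso retains at this λ, the coefficients can be extracted as in Problem 9(d). Boston's model matrix has 13 columns (an intercept plus 12 predictors), giving 14 coefficient rows; this is a sketch, with output not shown:

lasso_coef <- predict(lasso_mod, type = "coefficients", s = bestlam2)[1:14,]
lasso_coef[lasso_coef != 0]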

The PCR model produced a test error of 67.85974.

library(pls)
set.seed(1)
pcr_fit <- pcr(crim~., data = boston_train, scale = TRUE, validation = "CV")
summary(pcr_fit)
## Data:    X dimension: 404 12 
##  Y dimension: 404 1
## Fit method: svdpc
## Number of components considered: 12
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           8.062    6.812    6.809    6.433    6.374    6.347    6.325
## adjCV        8.062    6.808    6.804    6.427    6.369    6.342    6.320
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps
## CV       6.183    6.198    6.184     6.193     6.176     6.094
## adjCV    6.176    6.191    6.177     6.186     6.166     6.084
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       50.37    64.05    73.18    80.30    86.59    90.14    92.71    94.86
## crim    29.85    30.02    37.77    38.96    39.64    40.17    42.82    43.05
##       9 comps  10 comps  11 comps  12 comps
## X       96.71     98.23     99.44    100.00
## crim    43.45     43.53     44.50     46.03
pcr_pred <- predict(pcr_fit, boston_test, ncomp = 7)
mean((pcr_pred -boston_test$crim)^2)
## [1] 67.85974
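Best subset selection, which the problem statement also mentions, can be evaluated the same way. Below is a minimal sketch using regsubsets() from the leaps package; since regsubsets() has no predict() method, the validation-set MSE is computed by hand (output not shown):

library(leaps)
# Fit the best subset of each size from 1 to 12 predictors on the training set
best_fit <- regsubsets(crim ~ ., data = boston_train, nvmax = 12)
# Validation-set MSE for each subset size, using the test model matrix from above
val_errors <- sapply(1:12, function(i) {
  coefi <- coef(best_fit, id = i)
  pred <- test_mat[, names(coefi)] %*% coefi
  mean((boston_test$crim - pred)^2)
})
which.min(val_errors)  # subset size with the lowest validation-set error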

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

The linear model produced the lowest test error (64.30352) and is also the simplest of these models to implement and interpret.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

Yes, the chosen linear model includes all of the predictors: in a few comparisons, keeping every predictor in the model gave the lowest test error.