Problem 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

  1. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  2. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

  3. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

  4. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

  1. The lasso, relative to least squares, is:

iii.Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Explaination: This shrinkage is what reduces the variance of the predictions, at the cost of a small increase in bias.

  1. The ridge regression, relative to least squares, is:

iii.Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Explaination: The ridge regression won’t shrink coefficients of less-useful variables to exactly zero, but the shrinkage can reduce the variance, at the cost of an increase in bias.

  1. The non-linear method, relative to least squares, is:

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. Explaination: Using a non-linear method is more flexible because we are making less assumptions about the functional form.

Problem 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

  1. Split the data set into a training set and a test set.
set.seed(1)
train <- sample(c(TRUE, FALSE),length(College$Private),replace = T)
test <- (!train)
  1. Fit a linear model using least squares on the training set, and report the test error obtained.
set.seed(1)
lm.mod1 <- lm(Apps~., data = College, subset = train)

preds <- predict(lm.mod1, College[test,], type = "response")

TestMSE <- mean( (College[test,]$Apps - preds)^2 )

MSE = 984743.08

  1. Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.
y <- College[train,]$Apps
X <- data.matrix(subset(College[train,], select = -c(Apps)))
set.seed(1)
ridge.mod <- cv.glmnet(X, y, alpha = 0)

best_lambda <- ridge.mod$lambda.min

best.ridge <- glmnet(X, y, alpha = 0, lambda = best_lambda)

preds <- predict(best.ridge, s = best_lambda, newx = 
                   data.matrix(subset(College[test,], select = -c(Apps))))

TestMSE <- mean( (College[test,]$Apps - preds)^2 )

Using ridge regression with λ= 394.2364685, we obtain a test MSE of 941653.3.

  1. Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.
y <- College[train,]$Apps
X <- data.matrix(subset(College[train,], select = -c(Apps)))
set.seed(1)
lasso.mod <- cv.glmnet(X, y, alpha = 1)

best_lambda <- lasso.mod$lambda.min

best.lasso <- glmnet(X, y, alpha = 1, lambda = best_lambda)

preds <- predict(best.lasso, s = best_lambda, newx = 
                   data.matrix(subset(College[test,], select = -c(Apps))))

TestMSE <- mean( (College[test,]$Apps - preds)^2 )

N <- sum(coef(best.lasso)!=0)-1

Using lasso regression with λ=59.9204378, we obtain a test MSE of 993721.2. There are 8 non-zero coefficient estimates.

  1. Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
set.seed(1)
pcr.mod <- pcr(Apps~., data=College, subset=train, scale=TRUE, validation="CV")
validationplot(pcr.mod, val.type = "MSEP")

preds <- predict(pcr.mod, College[test,], ncomp = 10)
TestMSE <- mean( (College[test,]$Apps - preds)^2 )

Using principal components regression with M = 10, we obtain a test MSE of 1682909.

  1. Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.
set.seed(1)
plsr.mod <- plsr(Apps~., data=College, subset=train, scale=TRUE, validation="CV")
validationplot(plsr.mod, val.type = "MSEP")

preds <- predict(plsr.mod, College[test,], ncomp = 8)
TestMSE <- mean( (College[test,]$Apps - preds)^2 )

Using partial least squares regression with M = 8, we obtain a test MSE of 978534.3.

  1. Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

Results were comprable except principle components regression did not work as well.

Problem 11

We will now try to predict per capita crime rate in the Boston data set.

  1. Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
sum(is.na(Boston))
## [1] 0
summary(Boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          lstat      
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   : 1.73  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.: 6.95  
##  Median : 5.000   Median :330.0   Median :19.05   Median :11.36  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :12.65  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:16.95  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :37.97  
##       medv      
##  Min.   : 5.00  
##  1st Qu.:17.02  
##  Median :21.20  
##  Mean   :22.53  
##  3rd Qu.:25.00  
##  Max.   :50.00
Boston$chas <- as.factor(Boston$chas)
contrasts(Boston$chas)
##   1
## 0 0
## 1 1
set.seed(1)
train <- sample(c(TRUE, FALSE), length(Boston$crim), replace = T)
test <- !(train)
set.seed(1)
best.subset <- regsubsets(crim~., data = Boston, subset=train, nvmax = 20)

best.subsetSummary <- summary(best.subset)
best.subsetSummary
## Subset selection object
## Call: regsubsets.formula(crim ~ ., data = Boston, subset = train, nvmax = 20)
## 12 Variables  (and intercept)
##         Forced in Forced out
## zn          FALSE      FALSE
## indus       FALSE      FALSE
## chas1       FALSE      FALSE
## nox         FALSE      FALSE
## rm          FALSE      FALSE
## age         FALSE      FALSE
## dis         FALSE      FALSE
## rad         FALSE      FALSE
## tax         FALSE      FALSE
## ptratio     FALSE      FALSE
## lstat       FALSE      FALSE
## medv        FALSE      FALSE
## 1 subsets of each size up to 12
## Selection Algorithm: exhaustive
##           zn  indus chas1 nox rm  age dis rad tax ptratio lstat medv
## 1  ( 1 )  " " " "   " "   " " " " " " " " "*" " " " "     " "   " " 
## 2  ( 1 )  " " " "   " "   " " " " " " " " "*" " " " "     "*"   " " 
## 3  ( 1 )  " " " "   " "   " " " " " " " " "*" " " " "     "*"   "*" 
## 4  ( 1 )  "*" " "   " "   " " " " " " "*" "*" " " " "     " "   "*" 
## 5  ( 1 )  "*" "*"   " "   " " " " " " "*" "*" " " " "     " "   "*" 
## 6  ( 1 )  "*" "*"   " "   " " " " " " "*" "*" " " " "     "*"   "*" 
## 7  ( 1 )  "*" " "   " "   "*" " " " " "*" "*" " " "*"     "*"   "*" 
## 8  ( 1 )  "*" "*"   " "   "*" " " " " "*" "*" " " "*"     "*"   "*" 
## 9  ( 1 )  "*" "*"   " "   "*" " " "*" "*" "*" " " "*"     "*"   "*" 
## 10  ( 1 ) "*" "*"   "*"   "*" " " "*" "*" "*" " " "*"     "*"   "*" 
## 11  ( 1 ) "*" "*"   "*"   "*" " " "*" "*" "*" "*" "*"     "*"   "*" 
## 12  ( 1 ) "*" "*"   "*"   "*" "*" "*" "*" "*" "*" "*"     "*"   "*"
plot(best.subsetSummary$adjr2)

plot(best.subsetSummary$bic)

Choosing the model with the best 6 variables because of Maslow’s Cp and it has close to the highest adjusted R2.

lm.6var <- lm(crim ~ zn + indus + dis + rad + lstat + medv, data = Boston, subset = train)

pred <- predict(lm.6var, Boston[test,], type = "response")

TestMSE <- mean( (Boston[test,]$crim - pred)^2 )

Using 6 variables, we obtain a test MSE of 49.64277.

y <- Boston[train,]$crim
X <- data.matrix(subset(Boston[train,], select = -c(crim)))
set.seed(1)
lasso.mod <- cv.glmnet(X, y, alpha = 1)

best_lambda <- lasso.mod$lambda.min

best.lasso <- glmnet(X, y, alpha = 1, lambda = best_lambda)

pred <- predict(best.lasso, s = best_lambda, newx = 
                   data.matrix(subset(Boston[test,], select = -c(crim))))

TestMSE <- mean( (Boston[test,]$crim - pred)^2 )

Using λ=0.0455808, we obtain a test MSE of 49.38006 with LASSO.

y <- Boston[train,]$crim
X <- data.matrix(subset(Boston[train,], select = -c(crim)))
set.seed(1)
ridge.mod <- cv.glmnet(X, y, alpha = 0)

best_lambda <- ridge.mod$lambda.min

best.ridge <- glmnet(X, y, alpha = 0, lambda = best_lambda)

pred <- predict(best.ridge, s = best_lambda, newx = 
                   data.matrix(subset(Boston[test,], select = -c(crim))))

TestMSE <- mean( (Boston[test,]$crim - pred)^2 )

Using λ= 0.5240686, we obtain a test MSE of 49.91203 with ridge.

set.seed(1)
pcr.mod <- pcr(crim~., data=Boston, subset=train, scale=TRUE, validation="CV")
validationplot(pcr.mod, val.type = "MSEP")

pred <- predict(pcr.mod, Boston[test,], ncomp = 7)
TestMSE <- mean( (Boston[test,]$crim - pred)^2 )

Using M = 7, we obtain a test MSE of 51.7224 with principle component regression.