For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.
More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
iii.Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Explaination: This shrinkage is what reduces the variance of the predictions, at the cost of a small increase in bias.
iii.Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. Explaination: The ridge regression won’t shrink coefficients of less-useful variables to exactly zero, but the shrinkage can reduce the variance, at the cost of an increase in bias.
ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias. Explaination: Using a non-linear method is more flexible because we are making less assumptions about the functional form.
In this exercise, we will predict the number of applications received using the other variables in the College data set.
set.seed(1)
train <- sample(c(TRUE, FALSE),length(College$Private),replace = T)
test <- (!train)
set.seed(1)
lm.mod1 <- lm(Apps~., data = College, subset = train)
preds <- predict(lm.mod1, College[test,], type = "response")
TestMSE <- mean( (College[test,]$Apps - preds)^2 )
MSE = 984743.08
y <- College[train,]$Apps
X <- data.matrix(subset(College[train,], select = -c(Apps)))
set.seed(1)
ridge.mod <- cv.glmnet(X, y, alpha = 0)
best_lambda <- ridge.mod$lambda.min
best.ridge <- glmnet(X, y, alpha = 0, lambda = best_lambda)
preds <- predict(best.ridge, s = best_lambda, newx =
data.matrix(subset(College[test,], select = -c(Apps))))
TestMSE <- mean( (College[test,]$Apps - preds)^2 )
Using ridge regression with λ= 394.2364685, we obtain a test MSE of 941653.3.
y <- College[train,]$Apps
X <- data.matrix(subset(College[train,], select = -c(Apps)))
set.seed(1)
lasso.mod <- cv.glmnet(X, y, alpha = 1)
best_lambda <- lasso.mod$lambda.min
best.lasso <- glmnet(X, y, alpha = 1, lambda = best_lambda)
preds <- predict(best.lasso, s = best_lambda, newx =
data.matrix(subset(College[test,], select = -c(Apps))))
TestMSE <- mean( (College[test,]$Apps - preds)^2 )
N <- sum(coef(best.lasso)!=0)-1
Using lasso regression with λ=59.9204378, we obtain a test MSE of 993721.2. There are 8 non-zero coefficient estimates.
set.seed(1)
pcr.mod <- pcr(Apps~., data=College, subset=train, scale=TRUE, validation="CV")
validationplot(pcr.mod, val.type = "MSEP")
preds <- predict(pcr.mod, College[test,], ncomp = 10)
TestMSE <- mean( (College[test,]$Apps - preds)^2 )
Using principal components regression with M = 10, we obtain a test MSE of 1682909.
set.seed(1)
plsr.mod <- plsr(Apps~., data=College, subset=train, scale=TRUE, validation="CV")
validationplot(plsr.mod, val.type = "MSEP")
preds <- predict(plsr.mod, College[test,], ncomp = 8)
TestMSE <- mean( (College[test,]$Apps - preds)^2 )
Using partial least squares regression with M = 8, we obtain a test MSE of 978534.3.
Results were comprable except principle components regression did not work as well.
We will now try to predict per capita crime rate in the Boston data set.
sum(is.na(Boston))
## [1] 0
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio lstat
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 1.73
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.: 6.95
## Median : 5.000 Median :330.0 Median :19.05 Median :11.36
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :12.65
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:16.95
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :37.97
## medv
## Min. : 5.00
## 1st Qu.:17.02
## Median :21.20
## Mean :22.53
## 3rd Qu.:25.00
## Max. :50.00
Boston$chas <- as.factor(Boston$chas)
contrasts(Boston$chas)
## 1
## 0 0
## 1 1
set.seed(1)
train <- sample(c(TRUE, FALSE), length(Boston$crim), replace = T)
test <- !(train)
set.seed(1)
best.subset <- regsubsets(crim~., data = Boston, subset=train, nvmax = 20)
best.subsetSummary <- summary(best.subset)
best.subsetSummary
## Subset selection object
## Call: regsubsets.formula(crim ~ ., data = Boston, subset = train, nvmax = 20)
## 12 Variables (and intercept)
## Forced in Forced out
## zn FALSE FALSE
## indus FALSE FALSE
## chas1 FALSE FALSE
## nox FALSE FALSE
## rm FALSE FALSE
## age FALSE FALSE
## dis FALSE FALSE
## rad FALSE FALSE
## tax FALSE FALSE
## ptratio FALSE FALSE
## lstat FALSE FALSE
## medv FALSE FALSE
## 1 subsets of each size up to 12
## Selection Algorithm: exhaustive
## zn indus chas1 nox rm age dis rad tax ptratio lstat medv
## 1 ( 1 ) " " " " " " " " " " " " " " "*" " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " "*" " " " " "*" " "
## 3 ( 1 ) " " " " " " " " " " " " " " "*" " " " " "*" "*"
## 4 ( 1 ) "*" " " " " " " " " " " "*" "*" " " " " " " "*"
## 5 ( 1 ) "*" "*" " " " " " " " " "*" "*" " " " " " " "*"
## 6 ( 1 ) "*" "*" " " " " " " " " "*" "*" " " " " "*" "*"
## 7 ( 1 ) "*" " " " " "*" " " " " "*" "*" " " "*" "*" "*"
## 8 ( 1 ) "*" "*" " " "*" " " " " "*" "*" " " "*" "*" "*"
## 9 ( 1 ) "*" "*" " " "*" " " "*" "*" "*" " " "*" "*" "*"
## 10 ( 1 ) "*" "*" "*" "*" " " "*" "*" "*" " " "*" "*" "*"
## 11 ( 1 ) "*" "*" "*" "*" " " "*" "*" "*" "*" "*" "*" "*"
## 12 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
plot(best.subsetSummary$adjr2)
plot(best.subsetSummary$bic)
Choosing the model with the best 6 variables because of Maslow’s Cp and it has close to the highest adjusted R2.
lm.6var <- lm(crim ~ zn + indus + dis + rad + lstat + medv, data = Boston, subset = train)
pred <- predict(lm.6var, Boston[test,], type = "response")
TestMSE <- mean( (Boston[test,]$crim - pred)^2 )
Using 6 variables, we obtain a test MSE of 49.64277.
y <- Boston[train,]$crim
X <- data.matrix(subset(Boston[train,], select = -c(crim)))
set.seed(1)
lasso.mod <- cv.glmnet(X, y, alpha = 1)
best_lambda <- lasso.mod$lambda.min
best.lasso <- glmnet(X, y, alpha = 1, lambda = best_lambda)
pred <- predict(best.lasso, s = best_lambda, newx =
data.matrix(subset(Boston[test,], select = -c(crim))))
TestMSE <- mean( (Boston[test,]$crim - pred)^2 )
Using λ=0.0455808, we obtain a test MSE of 49.38006 with LASSO.
y <- Boston[train,]$crim
X <- data.matrix(subset(Boston[train,], select = -c(crim)))
set.seed(1)
ridge.mod <- cv.glmnet(X, y, alpha = 0)
best_lambda <- ridge.mod$lambda.min
best.ridge <- glmnet(X, y, alpha = 0, lambda = best_lambda)
pred <- predict(best.ridge, s = best_lambda, newx =
data.matrix(subset(Boston[test,], select = -c(crim))))
TestMSE <- mean( (Boston[test,]$crim - pred)^2 )
Using λ= 0.5240686, we obtain a test MSE of 49.91203 with ridge.
set.seed(1)
pcr.mod <- pcr(crim~., data=Boston, subset=train, scale=TRUE, validation="CV")
validationplot(pcr.mod, val.type = "MSEP")
pred <- predict(pcr.mod, Boston[test,], ncomp = 7)
TestMSE <- mean( (Boston[test,]$crim - pred)^2 )
Using M = 7, we obtain a test MSE of 51.7224 with principle component regression.