HW 5 Data Mining

Chapter 6

Packages

set.seed(2000)
library(ISLR)
library(tidyverse)
library(glmnet) # For lasso and ridge
library(pls) # PCR

Question 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

The lasso, relative to least squares, is
- More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  - Incorrect. The lasso is less flexible than OLS: its penalty shrinks the \(\beta\) estimates toward zero and can set some of them exactly to zero, which removes those variables from the model. It does increase bias and decrease the variance of the \(\beta\) estimates, but whether prediction accuracy improves depends on the data set. If all the variables have an effect on y, the lasso may perform worse than OLS.
- More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  - Incorrect. The lasso increases bias and lowers variance.
- Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  - Correct. The lasso is less flexible than OLS, and it improves prediction accuracy when its increase in bias is outweighed by its decrease in variance.
- Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  - Incorrect. The lasso is less flexible, but it decreases variance and increases bias rather than the reverse.
Ridge regression, relative to least squares, is
- More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  - Incorrect. Ridge regression is less flexible than OLS: its penalty shrinks the \(\beta\) estimates toward zero, although unlike the lasso it does not remove variables from the model. It does increase bias and decrease the variance of the \(\beta\) estimates, but whether prediction accuracy improves depends on the data set. If the added bias is not offset by the reduction in variance, ridge may perform worse than OLS.
- More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  - Incorrect. Ridge increases bias and lowers variance.
- Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  - Correct. Ridge is less flexible than OLS, and it improves prediction accuracy when its increase in bias is outweighed by its decrease in variance.
- Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  - Incorrect. Ridge is less flexible, but it decreases variance and increases bias rather than the reverse.
Non-linear methods, relative to least squares, are
- More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  - True. Non-linear methods are more flexible than OLS and will improve prediction accuracy when their increase in variance is less than their decrease in bias.
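
The trade-off behind these answers can be written out explicitly. As a reference sketch in the usual textbook notation (the symbols \(\hat f\) for the fitted model, \(x_0\) for a test point, and \(\lambda \ge 0\) for the tuning parameter are assumptions introduced here, not taken from the answers above), the expected test error decomposes into variance, squared bias, and irreducible error, and ridge and the lasso each add a penalty to the least squares criterion:

```latex
\[
E\left[\left(y_0 - \hat f(x_0)\right)^2\right]
  = \mathrm{Var}\left(\hat f(x_0)\right)
  + \left[\mathrm{Bias}\left(\hat f(x_0)\right)\right]^2
  + \mathrm{Var}(\varepsilon)
\]
\[
\hat\beta^{\mathrm{ridge}} = \arg\min_{\beta}\; \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^2,
\qquad
\hat\beta^{\mathrm{lasso}} = \arg\min_{\beta}\; \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} |\beta_j|
\]
```

With \(\lambda = 0\) both criteria reduce to least squares. A larger \(\lambda\) typically lowers the variance term and raises the bias term, so a penalized fit predicts better only when the variance it removes exceeds the squared bias it adds.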

Question 9

Predicting the number of applications received in the college data set

Reading in Data

df.r.9 <- College #Dataframe.raw.question 9

Training and Test set

partition <- .80 # Fraction of rows used for training
tt <- sample(nrow(df.r.9), nrow(df.r.9)*partition)
df.r.9.train <- df.r.9[tt,]
df.r.9.test <- df.r.9[-tt,]

Linear Model

lm.9.fit <- lm(Apps ~ ., data = df.r.9.train)
mean((df.r.9.test$Apps - predict(lm.9.fit, df.r.9.test))^2)

## [1] 1871831
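
The same test MSE expression is repeated for every model in this question. A minimal sketch of how it could be wrapped in a helper is shown below; the name `test_mse` is hypothetical, and the example uses the built-in `mtcars` data with an arbitrary seed rather than the College data, so its output is not comparable to the numbers in this report.

```r
# Hypothetical helper (not part of the original analysis): mean squared prediction error.
test_mse <- function(actual, predicted) {
  mean((actual - predicted)^2)
}

# Illustration on the built-in mtcars data with an 80/20 train/test split.
set.seed(1)
n <- nrow(mtcars)
train_idx <- sample(n, floor(0.8 * n))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# Fit on the training rows only, then score on the held-out rows.
fit <- lm(mpg ~ wt + hp, data = train)
mse <- test_mse(test$mpg, predict(fit, newdata = test))
print(mse)
```

Keeping the metric in one function makes it harder for the argument order or the test set to drift between models.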

Ridge Regression Model

Creating training and testing datasets

#grid =10^seq(10,-2, length =100)

x.9.train = model.matrix(Apps ~., df.r.9.train)
y.9.train = df.r.9.train$Apps

x.9.test = model.matrix(Apps ~., df.r.9.test)
y.9.test = df.r.9.test$Apps

Fitting the ridge regression with the best lambda value

ridge.9.cv = cv.glmnet(x.9.train,y.9.train,alpha =0)
bestlam.ridge = ridge.9.cv$lambda.min
ridge.9.fit = glmnet(x.9.train,y.9.train, alpha = 0, lambda = bestlam.ridge)

ridge.9.pred = predict(ridge.9.fit, newx =  x.9.test)
mean((y.9.test - ridge.9.pred )^2)

## [1] 3706089
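
To see what the ridge penalty does to the coefficients, the sketch below solves the ridge normal equations directly in base R on the built-in `mtcars` data. This is an illustration under stated assumptions, not the computation `glmnet` performs: the function name `ridge_coef` is hypothetical, the predictors are standardized by hand, and `glmnet` scales its objective differently, so these lambda values are not comparable to `bestlam.ridge`.

```r
# Illustrative only: closed-form ridge on standardized predictors (built-in mtcars data).
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))  # centered, unit-variance columns
y <- mtcars$mpg - mean(mtcars$mpg)                       # centered response, so no intercept is needed

# Hypothetical helper: solve (X'X + lambda * I) beta = X'y.
ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

lambdas <- c(0, 1, 10, 100)
coefs <- sapply(lambdas, function(l) ridge_coef(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)

# Euclidean length of the coefficient vector for each lambda.
l2_norm <- sqrt(colSums(coefs^2))
print(round(coefs, 3))
print(round(l2_norm, 3))
```

At lambda = 0 the result matches least squares on the same standardized data, and the length of the coefficient vector shrinks as lambda grows, while no coefficient is forced exactly to zero.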

Lasso Regression Model

lasso.9.cv = cv.glmnet(x.9.train,y.9.train,alpha = 1)
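bestlam.lasso = lasso.9.cv$lambda.min # Use the lasso CV fit, not ridge.9.cv; the output recorded below was produced with the ridge lambda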
lasso.9.fit = glmnet(x.9.train,y.9.train, alpha = 1, lambda = bestlam.lasso)

lasso.9.pred = predict(lasso.9.fit, newx =  x.9.test)
colSums(coef(lasso.9.fit)[,] != 0)

## s0 
##  5

mean((y.9.test - lasso.9.pred )^2)

## [1] 2216886
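
The count of nonzero coefficients above reflects the lasso's ability to set coefficients exactly to zero. In the special case of an orthonormal design, the lasso estimate is a soft-thresholded version of the least squares estimate. The sketch below shows that operator in base R; the function name `soft_threshold`, the threshold value, and the example coefficients are illustrative assumptions, and `glmnet` does not expose or require this function.

```r
# Hypothetical helper: soft-thresholding operator.
# Shrinks each value toward zero by `threshold`; values within
# `threshold` of zero become exactly zero.
soft_threshold <- function(z, threshold) {
  sign(z) * pmax(abs(z) - threshold, 0)
}

# Made-up least squares coefficients, for illustration only.
ols_coef <- c(-3, -0.5, 0.2, 2)
lasso_like <- soft_threshold(ols_coef, 1)
print(lasso_like)

# Number of coefficients that survive the threshold.
print(sum(lasso_like != 0))
```

Small coefficients are dropped and large ones are shrunk by a constant amount, which is why a larger lambda leaves fewer variables in the model.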

Principal Component Regression

pcr.9.fit = pcr(Apps~., data=df.r.9.train, scale=TRUE, validation ="CV")

summary(pcr.9.fit)

## Data:    X dimension: 621 17 
##  Y dimension: 621 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3561     3506     1692     1691     1693     1373     1319
## adjCV         3561     3506     1690     1692     1724     1349     1316
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1285     1254     1217      1214      1218      1219      1215
## adjCV     1283     1250     1215      1212      1215      1217      1212
##        14 comps  15 comps  16 comps  17 comps
## CV         1213      1213      1080      1064
## adjCV      1211      1211      1077      1060
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X      31.336    57.04    64.37    70.13    75.61    80.78    84.58    88.06
## Apps    3.688    77.99    77.99    78.00    86.88    87.00    87.72    88.35
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       91.01     93.23     95.27     97.03     98.06     98.92     99.45
## Apps    89.03     89.20     89.20     89.24     89.35     89.38     89.39
##       16 comps  17 comps
## X        99.86    100.00
## Apps     91.75     92.24

validationplot(pcr.9.fit) # CV error is lowest at M = 17 (all components, i.e. least squares); use M = 16, the next lowest

pcr.9.pred = predict(pcr.9.fit, df.r.9.test, ncomp = 16)

mean((pcr.9.pred - df.r.9.test$Apps)^2)

## [1] 2194053
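
PCR regresses the response on the first M principal component scores, and with all components it reproduces the least squares fit. The sketch below checks this in base R with `prcomp` on the built-in `mtcars` data; the helper name `pcr_fitted` and the choice of predictors are assumptions for illustration, and this is not how the `pls` package implements `pcr` internally.

```r
# Illustrative only: PCR by hand using prcomp (built-in mtcars data).
X <- as.matrix(mtcars[, c("wt", "hp", "disp", "qsec")])
y <- mtcars$mpg

# Principal components of the standardized predictors.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Hypothetical helper: fitted values from regressing y on the first M score columns.
pcr_fitted <- function(M) {
  scores <- pca$x[, seq_len(M), drop = FALSE]
  fitted(lm(y ~ scores))
}

# Training MSE for M = 1, ..., p components.
train_mse <- sapply(seq_len(ncol(X)), function(M) mean((y - pcr_fitted(M))^2))
print(round(train_mse, 3))

# With all components, PCR gives the same fitted values as least squares.
ols_fitted <- fitted(lm(y ~ X))
print(all.equal(unname(pcr_fitted(ncol(X))), unname(ols_fitted)))
```

This is why a PCR model that keeps nearly all of its components behaves much like ordinary linear regression.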

Partial Least Squares

plsr.9.fit=plsr(Apps~., data=df.r.9.train, scale=TRUE, validation ="CV")
summary(plsr.9.fit)

## Data:    X dimension: 621 17 
##  Y dimension: 621 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV            3561     1533     1302     1184     1171     1154     1111
## adjCV         3561     1531     1304     1182     1164     1132     1102
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        1092     1088     1082      1082      1078      1080      1080
## adjCV     1086     1083     1077      1077      1073      1075      1075
##        14 comps  15 comps  16 comps  17 comps
## CV         1080      1081      1081      1081
## adjCV      1075      1076      1076      1076
## 
## TRAINING: % variance explained
##       1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X       25.98    43.13    62.24    65.29    66.95    71.75    76.24    78.90
## Apps    81.88    87.11    89.61    90.59    91.76    91.96    92.05    92.11
##       9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
## X       82.28     84.93     87.01     90.65     94.26     96.35     97.37
## Apps    92.14     92.18     92.22     92.23     92.23     92.24     92.24
##       16 comps  17 comps
## X        97.99    100.00
## Apps     92.24     92.24

validationplot(plsr.9.fit) # CV error bottoms out at M = 11 (1078); M = 5 (1154) is used here as a smaller model

plsr.9.pred = predict(plsr.9.fit, df.r.9.test, ncomp = 5)

mean((plsr.9.pred - df.r.9.test$Apps)^2)

## [1] 2338137

Results

Of the five models, linear regression had the smallest test MSE (1,871,831), followed by PCR (2,194,053), the lasso (2,216,886), and PLS (2,338,137). Ridge regression had the largest test MSE (3,706,089). PCR used 16 components, and since there are only 17 predictors it is the model closest to linear regression, which is consistent with it being the second-best model.
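
The comparison can be double-checked by sorting the recorded test MSEs. The sketch below copies the five values from the output shown earlier in this report; the vector names (`pcr_16`, `pls_5`, and so on) are labels chosen here for illustration.

```r
# Test MSEs copied from the output recorded earlier in this report.
mse <- c(linear = 1871831, ridge = 3706089, lasso = 2216886,
         pcr_16 = 2194053, pls_5 = 2338137)

# Sort from best (smallest) to worst and add RMSE and ratio to the best model.
ranking <- sort(mse)
comparison <- data.frame(
  test_mse = ranking,
  test_rmse = round(sqrt(ranking)),
  ratio_to_best = round(ranking / min(ranking), 2)
)
print(comparison)
```

The RMSE column puts the errors back on the scale of the number of applications, which is easier to interpret than the squared values.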