Chapter 06 (page 259): 2, 9, 11

2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

(b) Repeat (a) for ridge regression relative to least squares.

i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

(c) Repeat (a) for non-linear methods relative to least squares.

i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

9. In this exercise, we will predict the number of applications received using the other variables in the College data set.

(a) Split the data set into a training set and a test set.

I decided to set the seed because, as I ran the program repeatedly, I noticed I was getting different numbers than on the previous run.

library(ISLR)
## Warning: package 'ISLR' was built under R version 3.5.3
attach(College)
set.seed(11)
fraction <- sample(1:dim(College)[1], dim(College)[1] / 2)  # row indices for a 50/50 split
train <- College[fraction, ]
test <- College[-fraction, ]

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

lm.fit <- lm(Apps ~ ., data = train)
pred <- predict(lm.fit, test)
mean((pred - test$Apps)^2)
## [1] 1538442

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)
## Warning: package 'glmnet' was built under R version 3.5.3
## Loading required package: Matrix
## Loading required package: foreach
## Warning: package 'foreach' was built under R version 3.5.3
## Loaded glmnet 2.0-16
X <- model.matrix(Apps ~ ., data = train)  # training model matrix
Y <- model.matrix(Apps ~ ., data = test)   # test model matrix
grid <- 10^seq(10, -2, length = 100)       # grid of candidate lambdas
ridge.mod <- cv.glmnet(X, train$Apps, alpha = 0, lambda = grid, thresh = 1e-12)
lambda.best <- ridge.mod$lambda.min
lambda.best
## [1] 18.73817

Reported test error:
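The test MSE can then be computed from the cross-validated fit, reusing the Y model matrix built from the test set above (a minimal sketch; the resulting number is not reproduced here):

ridge.pred <- predict(ridge.mod, newx = Y, s = lambda.best)  # predictions at the CV-chosen lambda
mean((test$Apps - ridge.pred)^2)                             # ridge test MSE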

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

lasso.fit <- cv.glmnet(X, train$Apps, alpha = 1, lambda = grid, thresh = 1e-12)
lambda.best1 <- lasso.fit$lambda.min
lambda.best1
## [1] 18.73817

Now the test error:

lasso.pred <- predict(lasso.fit, newx = Y, s = lambda.best1)
mean((test$Apps - lasso.pred)^2)
## [1] 1632226

Now for the coefficients:

lasso.mod <- glmnet(model.matrix(Apps ~ ., data = College), College$Apps, alpha = 1)
predict(lasso.mod, s = lambda.best1, type = "coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
##                         1
## (Intercept) -5.813707e+02
## (Intercept)  .           
## PrivateYes  -4.368911e+02
## Accept       1.471991e+00
## Enroll      -2.541925e-01
## Top10perc    3.574808e+01
## Top25perc   -3.838790e+00
## F.Undergrad  .           
## P.Undergrad  2.596168e-02
## Outstate    -6.179537e-02
## Room.Board   1.284644e-01
## Books        .           
## Personal     2.520096e-03
## PhD         -6.025350e+00
## Terminal    -3.256552e+00
## S.F.Ratio    6.026031e+00
## perc.alumni -8.955597e-01
## Expend       7.076914e-02
## Grad.Rate    5.580536e+00
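The exercise also asks for the number of non-zero coefficient estimates. In the table above, F.Undergrad and Books are shrunk exactly to zero, leaving 15 non-zero predictor coefficients. A quick way to count them from the sparse matrix (a sketch using the objects above):

lasso.coef <- predict(lasso.mod, s = lambda.best1, type = "coefficients")
sum(lasso.coef != 0)  # counts all non-zero rows, including the intercept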

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

library(pls)
## Warning: package 'pls' was built under R version 3.5.3
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
pcr.fit <- pcr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")

pcr.pred <- predict(pcr.fit, test, ncomp = 10)  # M = 10, the value taken from the validation plot
mean((test[, "Apps"] - pcr.pred)^2)
## [1] 3014496
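Reading M off the validation plot by eye works, but the pls package can also pick it from the CV results directly; a sketch using the one-sigma heuristic (the same call works for the PLS fit in part (f)):

selectNcomp(pcr.fit, method = "onesigma", plot = TRUE)  # smallest M within one SE of the CV minimum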

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

pls.fit <- plsr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
validationplot(pls.fit, val.type = "MSEP")

pls.predict <- predict(pls.fit, test, ncomp = 10)  # M = 10, as for PCR
mean((pls.predict - test$Apps)^2)
## [1] 1508987

(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

test.avg <- mean(test$Apps)  # average number of applications in the test set
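One way to answer is the test R^2 for each model, 1 - RSS/TSS, using test.avg from above (a sketch reusing the prediction objects computed earlier, including ridge.pred from the sketch in part (c)):

tss <- mean((test.avg - test$Apps)^2)        # total variation around the test mean
1 - mean((pred - test$Apps)^2) / tss         # least squares
1 - mean((ridge.pred - test$Apps)^2) / tss   # ridge
1 - mean((lasso.pred - test$Apps)^2) / tss   # lasso
1 - mean((pcr.pred - test$Apps)^2) / tss     # PCR, M = 10
1 - mean((pls.predict - test$Apps)^2) / tss  # PLS, M = 10

Judging from the MSEs printed above, least squares (about 1.54 million), lasso (about 1.63 million), and PLS (about 1.51 million) land close together, while PCR with M = 10 (about 3.01 million) is noticeably worse; so apart from PCR there is not much difference among these approaches.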

For question 11 I did have help writing these functions: tutors in the stats lab directed me, of course, to Google. I also could not find the bug in the next R section at first; one culprit is the pcr() call, where validation was misspelled as Validation (R argument names are case-sensitive, so cross-validation was silently skipped).

11. We will now try to predict per capita crime rate in the Boston data set.

(a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider.
#install.packages("MASS")
#library(MASS)
#library(leaps)
#library(glmnet)
#predict.regsubsets = function(object, newdata, id, ...) {
 #   form = as.formula(object$call[[2]])
  #  mat = model.matrix(form, newdata)
   # coefi = coef(object, id = id)
    #mat[, names(coefi)] %*% coefi
#}

#k = 10
#p = (ncol(Boston) -1)
#folds = sample(rep(1:k, length = nrow(Boston)))
#cv.errors = matrix(NA, k, p)
#for (i in 1:k) {
 #   best.fit = regsubsets(crim ~ ., data = Boston[folds != i, ], nvmax = p)
  #  for (j in 1:p) {
   #     pred = predict(best.fit, Boston[folds == i, ], id = j)
    #    cv.errors[i, j] = mean((Boston$crim[folds == i] - pred)^2)
#    }
#}
#rmse.cv = sqrt(apply(cv.errors, 2, mean))
#plot(rmse.cv, pch = 19, type = "b")

LASSO

x <- model.matrix(crim ~ . - 1, data = Boston)  # predictor matrix without an intercept column
Y <- Boston$crim
cv.lasso <- cv.glmnet(x, Y, type.measure = "mse")  # alpha = 1 (the lasso) is the default
plot(cv.lasso)
coef(cv.lasso)
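Note that coef() on a cv.glmnet object reports the coefficients at lambda.1se by default (the most regularized model within one standard error of the CV minimum); to inspect the fit at the CV-minimizing lambda instead:

coef(cv.lasso, s = "lambda.min")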

Ridge

X <- model.matrix(crim ~ . - 1, data = Boston)
Y <- Boston$crim
cv.ridge <- cv.glmnet(X, Y, type.measure = "mse", alpha = 0)  # alpha = 0 gives ridge
plot(cv.ridge)

coef(cv.ridge)

PCR

library(pls)
pcr.fit <- pcr(crim ~ ., data = Boston, scale = TRUE, validation = "CV")  # 'validation', not 'Validation'
summary(pcr.fit)
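To put the four approaches on a common scale, the cross-validated errors can be pulled from the fitted objects (a sketch assuming the objects created above):

min(rmse.cv)             # best subset: lowest k-fold RMSE across subset sizes
sqrt(min(cv.lasso$cvm))  # lasso: RMSE at its best lambda
sqrt(min(cv.ridge$cvm))  # ridge: RMSE at its best lambda
RMSEP(pcr.fit)           # PCR: cross-validated RMSEP for each number of components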

Here we see that PCR has the lowest cross-validated error.

(b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, cross-validation, or some other reasonable alternative, as opposed to using training error.

The best model here is PCR: of the approaches above, it has the lowest cross-validated error.

(c) Does your chosen model involve all of the features in the data set? Why or why not?

No. Of the 14 columns in Boston, crim is the response, so the PCR model is built from the other 13 predictors, and PCR combines those predictors into principal components rather than using each feature on its own.