# Compare the three node-impurity measures for a two-class problem as a
# function of pm1, the proportion of observations in class 1.
p <- seq(0, 1, 0.01)
gini <- 2 * p * (1 - p)
classerror <- 1 - pmax(p, 1 - p)
# cross-entropy is NaN at p = 0 and p = 1 (0 * log(0)); lines() simply skips those points
crossentropy <- -(p * log(p) + (1 - p) * log(1 - p))
plot(NA, NA, xlim = c(0, 1), ylim = c(0, 1),
     xlab = 'pm1 (proportion in class 1)', ylab = 'impurity')
lines(p, gini, col = 'purple')
lines(p, classerror, col = 'darkgreen')
lines(p, crossentropy, col = 'darkblue')
legend(x = 'top', legend = c('gini', 'class error', 'cross entropy'),
       col = c('purple', 'darkgreen', 'darkblue'), lty = 1, text.width = .4)
grid()
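Because the raw cross-entropy peaks at log 2 ≈ 0.693 while the Gini index and classification error both peak at 0.5, an optional rescaled cross-entropy curve can make the three measures easier to compare; a minimal sketch (not required by the exercise):
# optional overlay: rescale cross-entropy so its maximum (log 2 at p = 0.5)
# lines up with the 0.5 peak of the Gini and classification-error curves
lines(p, crossentropy * 0.5 / log(2), col = 'darkblue', lty = 2)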
library(ISLR)    # provides the OJ data set
library(rpart)   # rpart() classification trees used below
attach(OJ)       # optional here, since every fit below passes data = explicitly
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
dim(OJ)
## [1] 1070 18
set.seed(1)
inTrain=sample(1:nrow(OJ),800)
train=OJ[inTrain,]
test=OJ[-inTrain,]
(b) Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?
# a control object like this could be passed via rpart(..., control = control);
# the fit below (and the output that follows) uses rpart's defaults
control <- rpart.control(minsplit = 15, cp = 0.1)
tree.oj <- rpart(Purchase ~ ., data = train, method = "class")
summary(tree.oj)
## Call:
## rpart(formula = Purchase ~ ., data = train, method = "class")
## n= 800
##
## CP nsplit rel error xerror xstd
## 1 0.50476190 0 1.0000000 1.0000000 0.04387030
## 2 0.01904762 1 0.4952381 0.5365079 0.03665241
## 3 0.01587302 4 0.4253968 0.5174603 0.03616662
## 4 0.01269841 6 0.3936508 0.5174603 0.03616662
## 5 0.01000000 8 0.3682540 0.5047619 0.03583208
##
## Variable importance
## LoyalCH PriceDiff SalePriceMM PriceMM StoreID
## 46 8 8 6 6
## DiscMM WeekofPurchase PctDiscMM PriceCH ListPriceDiff
## 5 5 5 3 2
## STORE SpecialCH SpecialMM Store7 SalePriceCH
## 2 2 1 1 1
##
## Node number 1: 800 observations, complexity param=0.5047619
## predicted class=CH expected loss=0.39375 P(node) =1
## class counts: 485 315
## probabilities: 0.606 0.394
## left son=2 (489 obs) right son=3 (311 obs)
## Primary splits:
## LoyalCH < 0.48285 to the right, improve=133.25810, (0 missing)
## StoreID < 3.5 to the right, improve= 44.40685, (0 missing)
## Store7 splits as RL, improve= 28.30298, (0 missing)
## STORE < 0.5 to the left, improve= 28.30298, (0 missing)
## PriceDiff < 0.015 to the right, improve= 22.29786, (0 missing)
## Surrogate splits:
## StoreID < 3.5 to the right, agree=0.646, adj=0.090, (0 split)
## PriceMM < 1.89 to the right, agree=0.628, adj=0.042, (0 split)
## WeekofPurchase < 236.5 to the right, agree=0.625, adj=0.035, (0 split)
## PriceCH < 1.72 to the right, agree=0.622, adj=0.029, (0 split)
## SpecialMM < 0.5 to the left, agree=0.619, adj=0.019, (0 split)
##
## Node number 2: 489 observations, complexity param=0.01904762
## predicted class=CH expected loss=0.1635992 P(node) =0.61125
## class counts: 409 80
## probabilities: 0.836 0.164
## left son=4 (261 obs) right son=5 (228 obs)
## Primary splits:
## LoyalCH < 0.7645725 to the right, improve=16.514490, (0 missing)
## PriceDiff < 0.015 to the right, improve=14.720370, (0 missing)
## SalePriceMM < 1.84 to the right, improve=10.965130, (0 missing)
## ListPriceDiff < 0.255 to the right, improve= 8.289196, (0 missing)
## SpecialMM < 0.5 to the left, improve= 7.093301, (0 missing)
## Surrogate splits:
## WeekofPurchase < 257.5 to the right, agree=0.607, adj=0.158, (0 split)
## SalePriceMM < 1.84 to the right, agree=0.595, adj=0.132, (0 split)
## PriceMM < 2.04 to the right, agree=0.593, adj=0.127, (0 split)
## PriceDiff < 0.015 to the right, agree=0.593, adj=0.127, (0 split)
## PriceCH < 1.825 to the right, agree=0.589, adj=0.118, (0 split)
##
## Node number 3: 311 observations, complexity param=0.01587302
## predicted class=MM expected loss=0.244373 P(node) =0.38875
## class counts: 76 235
## probabilities: 0.244 0.756
## left son=6 (134 obs) right son=7 (177 obs)
## Primary splits:
## LoyalCH < 0.280875 to the right, improve=9.721989, (0 missing)
## PriceDiff < 0.49 to the right, improve=6.531048, (0 missing)
## STORE < 1.5 to the left, improve=6.506024, (0 missing)
## StoreID < 3.5 to the right, improve=6.184411, (0 missing)
## Store7 splits as RL, improve=5.771670, (0 missing)
## Surrogate splits:
## STORE < 1.5 to the left, agree=0.624, adj=0.127, (0 split)
## StoreID < 1.5 to the left, agree=0.617, adj=0.112, (0 split)
## SalePriceCH < 1.775 to the left, agree=0.595, adj=0.060, (0 split)
## PriceDiff < 0.325 to the right, agree=0.592, adj=0.052, (0 split)
## WeekofPurchase < 275.5 to the right, agree=0.588, adj=0.045, (0 split)
##
## Node number 4: 261 observations
## predicted class=CH expected loss=0.04214559 P(node) =0.32625
## class counts: 250 11
## probabilities: 0.958 0.042
##
## Node number 5: 228 observations, complexity param=0.01904762
## predicted class=CH expected loss=0.3026316 P(node) =0.285
## class counts: 159 69
## probabilities: 0.697 0.303
## left son=10 (148 obs) right son=11 (80 obs)
## Primary splits:
## PriceDiff < 0.015 to the right, improve=18.285490, (0 missing)
## ListPriceDiff < 0.235 to the right, improve=16.816390, (0 missing)
## SalePriceMM < 1.84 to the right, improve=13.398910, (0 missing)
## SpecialMM < 0.5 to the left, improve= 8.988505, (0 missing)
## DiscMM < 0.15 to the left, improve= 8.823708, (0 missing)
## Surrogate splits:
## SalePriceMM < 1.84 to the right, agree=0.961, adj=0.888, (0 split)
## PctDiscMM < 0.1155095 to the left, agree=0.890, adj=0.688, (0 split)
## DiscMM < 0.15 to the left, agree=0.873, adj=0.638, (0 split)
## PriceMM < 2.04 to the right, agree=0.794, adj=0.413, (0 split)
## ListPriceDiff < 0.18 to the right, agree=0.789, adj=0.400, (0 split)
##
## Node number 6: 134 observations, complexity param=0.01587302
## predicted class=MM expected loss=0.3880597 P(node) =0.1675
## class counts: 52 82
## probabilities: 0.388 0.612
## left son=12 (58 obs) right son=13 (76 obs)
## Primary splits:
## SalePriceMM < 2.04 to the right, improve=8.030176, (0 missing)
## PriceDiff < 0.05 to the right, improve=5.930605, (0 missing)
## DiscMM < 0.22 to the left, improve=4.398151, (0 missing)
## PctDiscMM < 0.0729725 to the left, improve=4.080526, (0 missing)
## SpecialCH < 0.5 to the right, improve=4.027225, (0 missing)
## Surrogate splits:
## PriceDiff < 0.135 to the right, agree=0.896, adj=0.759, (0 split)
## PriceMM < 2.04 to the right, agree=0.799, adj=0.534, (0 split)
## DiscMM < 0.08 to the left, agree=0.784, adj=0.500, (0 split)
## PctDiscMM < 0.038887 to the left, agree=0.784, adj=0.500, (0 split)
## WeekofPurchase < 244 to the right, agree=0.739, adj=0.397, (0 split)
##
## Node number 7: 177 observations
## predicted class=MM expected loss=0.1355932 P(node) =0.22125
## class counts: 24 153
## probabilities: 0.136 0.864
##
## Node number 10: 148 observations
## predicted class=CH expected loss=0.1554054 P(node) =0.185
## class counts: 125 23
## probabilities: 0.845 0.155
##
## Node number 11: 80 observations, complexity param=0.01904762
## predicted class=MM expected loss=0.425 P(node) =0.1
## class counts: 34 46
## probabilities: 0.425 0.575
## left son=22 (38 obs) right son=23 (42 obs)
## Primary splits:
## StoreID < 3.5 to the right, improve=6.177694, (0 missing)
## ListPriceDiff < 0.235 to the right, improve=4.729091, (0 missing)
## WeekofPurchase < 240.5 to the left, improve=4.130644, (0 missing)
## DiscMM < 0.47 to the left, improve=3.141026, (0 missing)
## PctDiscMM < 0.227263 to the left, improve=3.141026, (0 missing)
## Surrogate splits:
## Store7 splits as RL, agree=0.850, adj=0.684, (0 split)
## STORE < 0.5 to the left, agree=0.850, adj=0.684, (0 split)
## WeekofPurchase < 238 to the left, agree=0.775, adj=0.526, (0 split)
## SpecialMM < 0.5 to the left, agree=0.725, adj=0.421, (0 split)
## PriceCH < 1.825 to the left, agree=0.688, adj=0.342, (0 split)
##
## Node number 12: 58 observations
## predicted class=CH expected loss=0.4137931 P(node) =0.0725
## class counts: 34 24
## probabilities: 0.586 0.414
##
## Node number 13: 76 observations, complexity param=0.01269841
## predicted class=MM expected loss=0.2368421 P(node) =0.095
## class counts: 18 58
## probabilities: 0.237 0.763
## left son=26 (12 obs) right son=27 (64 obs)
## Primary splits:
## SpecialCH < 0.5 to the right, improve=5.265351, (0 missing)
## PriceDiff < -0.24 to the right, improve=1.925297, (0 missing)
## WeekofPurchase < 234 to the left, improve=1.903618, (0 missing)
## DiscMM < 0.47 to the left, improve=1.598684, (0 missing)
## PctDiscMM < 0.227263 to the left, improve=1.598684, (0 missing)
## Surrogate splits:
## DiscCH < 0.25 to the right, agree=0.868, adj=0.167, (0 split)
## SalePriceCH < 1.49 to the left, agree=0.868, adj=0.167, (0 split)
## PctDiscCH < 0.1366045 to the right, agree=0.868, adj=0.167, (0 split)
##
## Node number 22: 38 observations, complexity param=0.01269841
## predicted class=CH expected loss=0.3684211 P(node) =0.0475
## class counts: 24 14
## probabilities: 0.632 0.368
## left son=44 (30 obs) right son=45 (8 obs)
## Primary splits:
## WeekofPurchase < 272.5 to the left, improve=2.950877, (0 missing)
## PriceCH < 1.89 to the left, improve=1.455639, (0 missing)
## PriceMM < 2.04 to the left, improve=1.455639, (0 missing)
## LoyalCH < 0.5039495 to the right, improve=1.455639, (0 missing)
## SalePriceCH < 1.89 to the left, improve=1.455639, (0 missing)
## Surrogate splits:
## PriceCH < 1.89 to the left, agree=0.947, adj=0.75, (0 split)
## PriceMM < 2.04 to the left, agree=0.947, adj=0.75, (0 split)
## SalePriceCH < 1.89 to the left, agree=0.947, adj=0.75, (0 split)
## PriceDiff < -0.25 to the right, agree=0.947, adj=0.75, (0 split)
## DiscMM < 0.47 to the left, agree=0.895, adj=0.50, (0 split)
##
## Node number 23: 42 observations
## predicted class=MM expected loss=0.2380952 P(node) =0.0525
## class counts: 10 32
## probabilities: 0.238 0.762
##
## Node number 26: 12 observations
## predicted class=CH expected loss=0.3333333 P(node) =0.015
## class counts: 8 4
## probabilities: 0.667 0.333
##
## Node number 27: 64 observations
## predicted class=MM expected loss=0.15625 P(node) =0.08
## class counts: 10 54
## probabilities: 0.156 0.844
##
## Node number 44: 30 observations
## predicted class=CH expected loss=0.2666667 P(node) =0.0375
## class counts: 22 8
## probabilities: 0.733 0.267
##
## Node number 45: 8 observations
## predicted class=MM expected loss=0.25 P(node) =0.01
## class counts: 2 6
## probabilities: 0.250 0.750
The tree has 9 terminal nodes and uses LoyalCH, PriceDiff, SalePriceMM, SpecialCH, StoreID, and WeekofPurchase. The training error rate is the root-node error times the final relative error, 0.39375 × 0.3683 ≈ 0.145. The cross-validated relative error of 0.5048 corresponds to an error rate of about 0.39375 × 0.5048 ≈ 0.199.
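A quick sanity check of the training error rate, re-using the tree.oj and train objects from above:
# training error rate: fraction of training observations misclassified by the full tree
train.pred <- predict(tree.oj, newdata = train, type = "class")
mean(train.pred != train$Purchase)   # roughly 0.145 (= 0.39375 * 0.3683)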
(c) Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed.
tree.oj
## n= 800
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 315 CH (0.60625000 0.39375000)
## 2) LoyalCH>=0.48285 489 80 CH (0.83640082 0.16359918)
## 4) LoyalCH>=0.7645725 261 11 CH (0.95785441 0.04214559) *
## 5) LoyalCH< 0.7645725 228 69 CH (0.69736842 0.30263158)
## 10) PriceDiff>=0.015 148 23 CH (0.84459459 0.15540541) *
## 11) PriceDiff< 0.015 80 34 MM (0.42500000 0.57500000)
## 22) StoreID>=3.5 38 14 CH (0.63157895 0.36842105)
## 44) WeekofPurchase< 272.5 30 8 CH (0.73333333 0.26666667) *
## 45) WeekofPurchase>=272.5 8 2 MM (0.25000000 0.75000000) *
## 23) StoreID< 3.5 42 10 MM (0.23809524 0.76190476) *
## 3) LoyalCH< 0.48285 311 76 MM (0.24437299 0.75562701)
## 6) LoyalCH>=0.280875 134 52 MM (0.38805970 0.61194030)
## 12) SalePriceMM>=2.04 58 24 CH (0.58620690 0.41379310) *
## 13) SalePriceMM< 2.04 76 18 MM (0.23684211 0.76315789)
## 26) SpecialCH>=0.5 12 4 CH (0.66666667 0.33333333) *
## 27) SpecialCH< 0.5 64 10 MM (0.15625000 0.84375000) *
## 7) LoyalCH< 0.280875 177 24 MM (0.13559322 0.86440678) *
Terminal node 4 is reached by splitting on LoyalCH at the root (LoyalCH >= 0.48285) and again at node 2 (LoyalCH >= 0.7645725). It contains 261 training observations, of which 250 are CH and 11 are MM, so it predicts CH with a node misclassification rate of about 4.2%.
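The same per-node counts can be read off the fitted rpart object's frame; a small sketch (row names such as "4" correspond to the node numbers printed above):
# node-level summary: splitting variable, number of observations, misclassified
# count (dev), and fitted class code (yval) for each node in the printed tree
tree.oj$frame[, c("var", "n", "dev", "yval")]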
(d) Create a plot of the tree, and interpret the results.
library(rattle)
fancyRpartPlot(tree.oj)
Five terminal nodes classify as CH and four as MM. LoyalCH dominates the tree: observations with low Citrus Hill loyalty, especially in combination with an unfavorable price difference or sale price, tend to be classified as MM.
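If the rpart.plot package is available, it offers an alternative rendering that annotates each node with class probabilities and the share of training observations; an optional sketch, not required by the exercise:
# alternative plot: extra = 104 shows the predicted class, class probabilities,
# and the percentage of observations falling in each node
library(rpart.plot)
rpart.plot(tree.oj, type = 2, extra = 104)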
(e) Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?
oj.pred = predict(tree.oj, newdata = test, type = "class")
table(oj.pred, test$Purchase)
##
## oj.pred CH MM
## CH 154 35
## MM 14 67
(35+14)/270
## [1] 0.1814815
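The same test error rate can be computed directly from the prediction vector rather than by reading counts off the confusion matrix; a quick check using the objects above:
# test error rate = proportion of test observations with a wrong predicted label
mean(oj.pred != test$Purchase)   # 0.1814815, matching (35 + 14) / 270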
(f) Apply the cv.tree() function to the training set in order to determine the optimal tree size.
library(tree)
oj.tree2<-tree(Purchase~., data=train, method = "class")
cv.oj.tree2<-cv.tree(oj.tree2, FUN=prune.misclass)
summary(oj.tree2)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = train, method = "class")
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
cv.oj.tree2
## $size
## [1] 9 8 7 4 2 1
##
## $dev
## [1] 147 147 154 159 168 311
##
## $k
## [1] -Inf 0.000000 3.000000 4.333333 10.500000 151.000000
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
(g) Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.
plot(cv.oj.tree2$size, cv.oj.tree2$dev, type = "b", xlab = "Size", ylab = "CV Error Rate")
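To make the optimum easier to spot, the sizes with the lowest cross-validated deviance can be highlighted on the same plot (a small optional addition using the cv.oj.tree2 object above):
# mark every tree size that attains the minimum CV deviance (here, sizes 8 and 9)
best <- cv.oj.tree2$dev == min(cv.oj.tree2$dev)
points(cv.oj.tree2$size[best], cv.oj.tree2$dev[best], col = "red", pch = 19)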
(h) Which tree size corresponds to the lowest cross-validated classification error rate?
Tree sizes of 8 and 9 tie for the lowest cross-validated classification error (147 misclassified out of 800), so the smallest tree attaining the minimum has 8 terminal nodes.
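The same answer can be extracted programmatically from the cv.tree() result:
# smallest tree size whose CV deviance equals the minimum
min(cv.oj.tree2$size[cv.oj.tree2$dev == min(cv.oj.tree2$dev)])   # 8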
(i) Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.
oj.tree2.prune<-prune.misclass(oj.tree2, best=7)
summary(oj.tree2.prune)
##
## Classification tree:
## snip.tree(tree = oj.tree2, nodes = c(4L, 10L))
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "ListPriceDiff" "PctDiscMM"
## Number of terminal nodes: 7
## Residual mean deviance: 0.7748 = 614.4 / 793
## Misclassification error rate: 0.1625 = 130 / 800
plot(oj.tree2.prune)
text(oj.tree2.prune)
(j) Compare the training error rates between the pruned and unpruned trees. Which is higher?
Pruned tree misclassification error rate: 0.1625 = 130 / 800. Unpruned tree misclassification error rate: 0.1588 = 127 / 800.
The pruned tree has the higher training error rate. This is expected: pruning removes splits that were chosen to fit the training data, so training error can only stay the same or increase.
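The two training error rates can also be computed directly from the fitted objects, re-using oj.tree2, oj.tree2.prune, and train from above:
# training error rates: unpruned vs. pruned tree
mean(predict(oj.tree2,       newdata = train, type = "class") != train$Purchase)  # 0.1588
mean(predict(oj.tree2.prune, newdata = train, type = "class") != train$Purchase)  # 0.1625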
(k) Compare the test error rates between the pruned and unpruned trees. Which is higher?
oj.tree2.prune.pred<-predict(oj.tree2.prune, newdata = test, type="class")
table(oj.tree2.prune.pred, test$Purchase)
##
## oj.tree2.prune.pred CH MM
## CH 160 36
## MM 8 66
#Pruned Test Error
(36+8)/270
## [1] 0.162963
oj.tree2.pred<-predict(oj.tree2, newdata = test, type="class")
table(oj.tree2.pred, test$Purchase)
##
## oj.tree2.pred CH MM
## CH 160 38
## MM 8 64
#Non-Pruned test error
(38+8)/270
## [1] 0.1703704
The unpruned tree has the higher test error (0.170 vs. 0.163 for the pruned tree), so pruning slightly improved test performance here.
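For a compact side-by-side comparison, both test error rates can be computed directly from the prediction vectors defined above:
# test error rates: pruned vs. unpruned, computed from the prediction vectors
c(pruned   = mean(oj.tree2.prune.pred != test$Purchase),   # 0.1630
  unpruned = mean(oj.tree2.pred       != test$Purchase))   # 0.1704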