Problem 3. Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of p̂_m1. The x-axis should display p̂_m1, ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy. Hint: In a setting with two classes, p̂_m1 = 1 − p̂_m2. You could make this plot by hand, but it will be much easier to make in R.

p = seq(0, 1, 0.01)

# With two classes, p plays the role of p̂_m1 and (1 - p) of p̂_m2.
gini = 2*p*(1-p)
classerror = 1 - pmax(p, 1-p)
# Natural-log entropy; 0*log(0) evaluates to NaN at the endpoints, which lines() skips.
crossentropy = -(p*log(p) + (1-p)*log(1-p))

plot(NA, NA, xlim=c(0,1), ylim=c(0,1), xlab='pm1', ylab='value')

lines(p, gini, col='purple')
lines(p, classerror, col='darkgreen')
lines(p, crossentropy, col='darkblue')

legend(x='top', legend=c('gini','class error','cross entropy'),
       col=c('purple','darkgreen','darkblue'), lty=1, text.width=.4)
grid()
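
With the natural log, the entropy curve peaks at log(2) ≈ 0.693 at p = 0.5. If you prefer entropy in bits, so that it peaks at 1, a base-2 version is a one-line change (a sketch reusing the p defined above):

crossentropy2 = -(p*log2(p) + (1-p)*log2(1-p))
lines(p, crossentropy2, col='red', lty=2)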

Problem 9. This problem involves the OJ data set, which is part of the ISLR package.

library(ISLR)
attach(OJ)

(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

dim(OJ)
## [1] 1070   18
set.seed(1)
inTrain=sample(1:nrow(OJ),800)

train=OJ[inTrain,]
test=OJ[-inTrain,]
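
A quick check that the split came out as intended:

dim(train)  # 800  18
dim(test)   # 270  18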

(b) Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?

library(rpart)
tree.oj <- rpart(Purchase~., data=train, method="class")
summary(tree.oj)
## Call:
## rpart(formula = Purchase ~ ., data = train, method = "class")
##   n= 800 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.50476190      0 1.0000000 1.0000000 0.04387030
## 2 0.01904762      1 0.4952381 0.5365079 0.03665241
## 3 0.01587302      4 0.4253968 0.5174603 0.03616662
## 4 0.01269841      6 0.3936508 0.5174603 0.03616662
## 5 0.01000000      8 0.3682540 0.5047619 0.03583208
## 
## Variable importance
##        LoyalCH      PriceDiff    SalePriceMM        PriceMM        StoreID 
##             46              8              8              6              6 
##         DiscMM WeekofPurchase      PctDiscMM        PriceCH  ListPriceDiff 
##              5              5              5              3              2 
##          STORE      SpecialCH      SpecialMM         Store7    SalePriceCH 
##              2              2              1              1              1 
## 
## Node number 1: 800 observations,    complexity param=0.5047619
##   predicted class=CH  expected loss=0.39375  P(node) =1
##     class counts:   485   315
##    probabilities: 0.606 0.394 
##   left son=2 (489 obs) right son=3 (311 obs)
##   Primary splits:
##       LoyalCH   < 0.48285   to the right, improve=133.25810, (0 missing)
##       StoreID   < 3.5       to the right, improve= 44.40685, (0 missing)
##       Store7    splits as  RL, improve= 28.30298, (0 missing)
##       STORE     < 0.5       to the left,  improve= 28.30298, (0 missing)
##       PriceDiff < 0.015     to the right, improve= 22.29786, (0 missing)
##   Surrogate splits:
##       StoreID        < 3.5       to the right, agree=0.646, adj=0.090, (0 split)
##       PriceMM        < 1.89      to the right, agree=0.628, adj=0.042, (0 split)
##       WeekofPurchase < 236.5     to the right, agree=0.625, adj=0.035, (0 split)
##       PriceCH        < 1.72      to the right, agree=0.622, adj=0.029, (0 split)
##       SpecialMM      < 0.5       to the left,  agree=0.619, adj=0.019, (0 split)
## 
## Node number 2: 489 observations,    complexity param=0.01904762
##   predicted class=CH  expected loss=0.1635992  P(node) =0.61125
##     class counts:   409    80
##    probabilities: 0.836 0.164 
##   left son=4 (261 obs) right son=5 (228 obs)
##   Primary splits:
##       LoyalCH       < 0.7645725 to the right, improve=16.514490, (0 missing)
##       PriceDiff     < 0.015     to the right, improve=14.720370, (0 missing)
##       SalePriceMM   < 1.84      to the right, improve=10.965130, (0 missing)
##       ListPriceDiff < 0.255     to the right, improve= 8.289196, (0 missing)
##       SpecialMM     < 0.5       to the left,  improve= 7.093301, (0 missing)
##   Surrogate splits:
##       WeekofPurchase < 257.5     to the right, agree=0.607, adj=0.158, (0 split)
##       SalePriceMM    < 1.84      to the right, agree=0.595, adj=0.132, (0 split)
##       PriceMM        < 2.04      to the right, agree=0.593, adj=0.127, (0 split)
##       PriceDiff      < 0.015     to the right, agree=0.593, adj=0.127, (0 split)
##       PriceCH        < 1.825     to the right, agree=0.589, adj=0.118, (0 split)
## 
## Node number 3: 311 observations,    complexity param=0.01587302
##   predicted class=MM  expected loss=0.244373  P(node) =0.38875
##     class counts:    76   235
##    probabilities: 0.244 0.756 
##   left son=6 (134 obs) right son=7 (177 obs)
##   Primary splits:
##       LoyalCH   < 0.280875  to the right, improve=9.721989, (0 missing)
##       PriceDiff < 0.49      to the right, improve=6.531048, (0 missing)
##       STORE     < 1.5       to the left,  improve=6.506024, (0 missing)
##       StoreID   < 3.5       to the right, improve=6.184411, (0 missing)
##       Store7    splits as  RL, improve=5.771670, (0 missing)
##   Surrogate splits:
##       STORE          < 1.5       to the left,  agree=0.624, adj=0.127, (0 split)
##       StoreID        < 1.5       to the left,  agree=0.617, adj=0.112, (0 split)
##       SalePriceCH    < 1.775     to the left,  agree=0.595, adj=0.060, (0 split)
##       PriceDiff      < 0.325     to the right, agree=0.592, adj=0.052, (0 split)
##       WeekofPurchase < 275.5     to the right, agree=0.588, adj=0.045, (0 split)
## 
## Node number 4: 261 observations
##   predicted class=CH  expected loss=0.04214559  P(node) =0.32625
##     class counts:   250    11
##    probabilities: 0.958 0.042 
## 
## Node number 5: 228 observations,    complexity param=0.01904762
##   predicted class=CH  expected loss=0.3026316  P(node) =0.285
##     class counts:   159    69
##    probabilities: 0.697 0.303 
##   left son=10 (148 obs) right son=11 (80 obs)
##   Primary splits:
##       PriceDiff     < 0.015     to the right, improve=18.285490, (0 missing)
##       ListPriceDiff < 0.235     to the right, improve=16.816390, (0 missing)
##       SalePriceMM   < 1.84      to the right, improve=13.398910, (0 missing)
##       SpecialMM     < 0.5       to the left,  improve= 8.988505, (0 missing)
##       DiscMM        < 0.15      to the left,  improve= 8.823708, (0 missing)
##   Surrogate splits:
##       SalePriceMM   < 1.84      to the right, agree=0.961, adj=0.888, (0 split)
##       PctDiscMM     < 0.1155095 to the left,  agree=0.890, adj=0.688, (0 split)
##       DiscMM        < 0.15      to the left,  agree=0.873, adj=0.638, (0 split)
##       PriceMM       < 2.04      to the right, agree=0.794, adj=0.413, (0 split)
##       ListPriceDiff < 0.18      to the right, agree=0.789, adj=0.400, (0 split)
## 
## Node number 6: 134 observations,    complexity param=0.01587302
##   predicted class=MM  expected loss=0.3880597  P(node) =0.1675
##     class counts:    52    82
##    probabilities: 0.388 0.612 
##   left son=12 (58 obs) right son=13 (76 obs)
##   Primary splits:
##       SalePriceMM < 2.04      to the right, improve=8.030176, (0 missing)
##       PriceDiff   < 0.05      to the right, improve=5.930605, (0 missing)
##       DiscMM      < 0.22      to the left,  improve=4.398151, (0 missing)
##       PctDiscMM   < 0.0729725 to the left,  improve=4.080526, (0 missing)
##       SpecialCH   < 0.5       to the right, improve=4.027225, (0 missing)
##   Surrogate splits:
##       PriceDiff      < 0.135     to the right, agree=0.896, adj=0.759, (0 split)
##       PriceMM        < 2.04      to the right, agree=0.799, adj=0.534, (0 split)
##       DiscMM         < 0.08      to the left,  agree=0.784, adj=0.500, (0 split)
##       PctDiscMM      < 0.038887  to the left,  agree=0.784, adj=0.500, (0 split)
##       WeekofPurchase < 244       to the right, agree=0.739, adj=0.397, (0 split)
## 
## Node number 7: 177 observations
##   predicted class=MM  expected loss=0.1355932  P(node) =0.22125
##     class counts:    24   153
##    probabilities: 0.136 0.864 
## 
## Node number 10: 148 observations
##   predicted class=CH  expected loss=0.1554054  P(node) =0.185
##     class counts:   125    23
##    probabilities: 0.845 0.155 
## 
## Node number 11: 80 observations,    complexity param=0.01904762
##   predicted class=MM  expected loss=0.425  P(node) =0.1
##     class counts:    34    46
##    probabilities: 0.425 0.575 
##   left son=22 (38 obs) right son=23 (42 obs)
##   Primary splits:
##       StoreID        < 3.5       to the right, improve=6.177694, (0 missing)
##       ListPriceDiff  < 0.235     to the right, improve=4.729091, (0 missing)
##       WeekofPurchase < 240.5     to the left,  improve=4.130644, (0 missing)
##       DiscMM         < 0.47      to the left,  improve=3.141026, (0 missing)
##       PctDiscMM      < 0.227263  to the left,  improve=3.141026, (0 missing)
##   Surrogate splits:
##       Store7         splits as  RL, agree=0.850, adj=0.684, (0 split)
##       STORE          < 0.5       to the left,  agree=0.850, adj=0.684, (0 split)
##       WeekofPurchase < 238       to the left,  agree=0.775, adj=0.526, (0 split)
##       SpecialMM      < 0.5       to the left,  agree=0.725, adj=0.421, (0 split)
##       PriceCH        < 1.825     to the left,  agree=0.688, adj=0.342, (0 split)
## 
## Node number 12: 58 observations
##   predicted class=CH  expected loss=0.4137931  P(node) =0.0725
##     class counts:    34    24
##    probabilities: 0.586 0.414 
## 
## Node number 13: 76 observations,    complexity param=0.01269841
##   predicted class=MM  expected loss=0.2368421  P(node) =0.095
##     class counts:    18    58
##    probabilities: 0.237 0.763 
##   left son=26 (12 obs) right son=27 (64 obs)
##   Primary splits:
##       SpecialCH      < 0.5       to the right, improve=5.265351, (0 missing)
##       PriceDiff      < -0.24     to the right, improve=1.925297, (0 missing)
##       WeekofPurchase < 234       to the left,  improve=1.903618, (0 missing)
##       DiscMM         < 0.47      to the left,  improve=1.598684, (0 missing)
##       PctDiscMM      < 0.227263  to the left,  improve=1.598684, (0 missing)
##   Surrogate splits:
##       DiscCH      < 0.25      to the right, agree=0.868, adj=0.167, (0 split)
##       SalePriceCH < 1.49      to the left,  agree=0.868, adj=0.167, (0 split)
##       PctDiscCH   < 0.1366045 to the right, agree=0.868, adj=0.167, (0 split)
## 
## Node number 22: 38 observations,    complexity param=0.01269841
##   predicted class=CH  expected loss=0.3684211  P(node) =0.0475
##     class counts:    24    14
##    probabilities: 0.632 0.368 
##   left son=44 (30 obs) right son=45 (8 obs)
##   Primary splits:
##       WeekofPurchase < 272.5     to the left,  improve=2.950877, (0 missing)
##       PriceCH        < 1.89      to the left,  improve=1.455639, (0 missing)
##       PriceMM        < 2.04      to the left,  improve=1.455639, (0 missing)
##       LoyalCH        < 0.5039495 to the right, improve=1.455639, (0 missing)
##       SalePriceCH    < 1.89      to the left,  improve=1.455639, (0 missing)
##   Surrogate splits:
##       PriceCH     < 1.89      to the left,  agree=0.947, adj=0.75, (0 split)
##       PriceMM     < 2.04      to the left,  agree=0.947, adj=0.75, (0 split)
##       SalePriceCH < 1.89      to the left,  agree=0.947, adj=0.75, (0 split)
##       PriceDiff   < -0.25     to the right, agree=0.947, adj=0.75, (0 split)
##       DiscMM      < 0.47      to the left,  agree=0.895, adj=0.50, (0 split)
## 
## Node number 23: 42 observations
##   predicted class=MM  expected loss=0.2380952  P(node) =0.0525
##     class counts:    10    32
##    probabilities: 0.238 0.762 
## 
## Node number 26: 12 observations
##   predicted class=CH  expected loss=0.3333333  P(node) =0.015
##     class counts:     8     4
##    probabilities: 0.667 0.333 
## 
## Node number 27: 64 observations
##   predicted class=MM  expected loss=0.15625  P(node) =0.08
##     class counts:    10    54
##    probabilities: 0.156 0.844 
## 
## Node number 44: 30 observations
##   predicted class=CH  expected loss=0.2666667  P(node) =0.0375
##     class counts:    22     8
##    probabilities: 0.733 0.267 
## 
## Node number 45: 8 observations
##   predicted class=MM  expected loss=0.25  P(node) =0.01
##     class counts:     2     6
##    probabilities: 0.250 0.750

There are 9 terminal nodes. The model splits on LoyalCH, PriceDiff, SalePriceMM, SpecialCH, StoreID, and WeekofPurchase. The training error rate is the root node error times the final relative error: 0.39375 × 0.3682540 ≈ 0.145 (116/800 misclassified). The cross-validated error rate, computed the same way from xerror, is 0.39375 × 0.5047619 ≈ 0.199.
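
As a check, the training error rate can also be computed directly from the fitted tree (a quick sketch; train.pred is just an illustrative name):

train.pred = predict(tree.oj, newdata=train, type="class")
mean(train.pred != train$Purchase)  # should come out near 0.145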

(c) Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed.

tree.oj
## n= 800 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 800 315 CH (0.60625000 0.39375000)  
##    2) LoyalCH>=0.48285 489  80 CH (0.83640082 0.16359918)  
##      4) LoyalCH>=0.7645725 261  11 CH (0.95785441 0.04214559) *
##      5) LoyalCH< 0.7645725 228  69 CH (0.69736842 0.30263158)  
##       10) PriceDiff>=0.015 148  23 CH (0.84459459 0.15540541) *
##       11) PriceDiff< 0.015 80  34 MM (0.42500000 0.57500000)  
##         22) StoreID>=3.5 38  14 CH (0.63157895 0.36842105)  
##           44) WeekofPurchase< 272.5 30   8 CH (0.73333333 0.26666667) *
##           45) WeekofPurchase>=272.5 8   2 MM (0.25000000 0.75000000) *
##         23) StoreID< 3.5 42  10 MM (0.23809524 0.76190476) *
##    3) LoyalCH< 0.48285 311  76 MM (0.24437299 0.75562701)  
##      6) LoyalCH>=0.280875 134  52 MM (0.38805970 0.61194030)  
##       12) SalePriceMM>=2.04 58  24 CH (0.58620690 0.41379310) *
##       13) SalePriceMM< 2.04 76  18 MM (0.23684211 0.76315789)  
##         26) SpecialCH>=0.5 12   4 CH (0.66666667 0.33333333) *
##         27) SpecialCH< 0.5 64  10 MM (0.15625000 0.84375000) *
##      7) LoyalCH< 0.280875 177  24 MM (0.13559322 0.86440678) *

Terminal node 4 is reached by two consecutive splits on LoyalCH: observations with LoyalCH >= 0.48285 fall into node 2, and those with LoyalCH >= 0.7645725 then fall into terminal node 4. Node 4 contains 261 observations and predicts CH; 11 of them (the reported loss) are actually MM, giving a node error rate of 11/261 ≈ 0.042, which matches the printed class probabilities (0.958, 0.042).

(d) Create a plot of the tree, and interpret the results.

library(rattle)
fancyRpartPlot(tree.oj)

Five terminal nodes classify as CH and four classify as MM. LoyalCH dominates the top of the tree: MM classifications arise mainly from low CH loyalty, combined with price variables (PriceDiff, SalePriceMM) that make MM the relatively cheaper option.
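
If rattle is unavailable, the rpart.plot package (assuming it is installed) produces a comparable diagram:

library(rpart.plot)
rpart.plot(tree.oj)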

(e) Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?

oj.pred = predict(tree.oj, newdata = test, type = "class")
table(oj.pred, test$Purchase)
##        
## oj.pred  CH  MM
##      CH 154  35
##      MM  14  67
(35+14)/270
## [1] 0.1814815
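
The test error rate is about 18.1%. It can also be computed directly, without reading counts off the table (a quick sketch):

mean(oj.pred != test$Purchase)  # ~0.1815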

(f) Apply the cv.tree() function to the training set in order to determine the optimal tree size.

library(tree)
oj.tree2<-tree(Purchase~., data=train, method = "class")
cv.oj.tree2<-cv.tree(oj.tree2, FUN=prune.misclass)
summary(oj.tree2)
## 
## Classification tree:
## tree(formula = Purchase ~ ., data = train, method = "class")
## Variables actually used in tree construction:
## [1] "LoyalCH"       "PriceDiff"     "SpecialCH"     "ListPriceDiff"
## [5] "PctDiscMM"    
## Number of terminal nodes:  9 
## Residual mean deviance:  0.7432 = 587.8 / 791 
## Misclassification error rate: 0.1588 = 127 / 800
cv.oj.tree2
## $size
## [1] 9 8 7 4 2 1
## 
## $dev
## [1] 147 147 154 159 168 311
## 
## $k
## [1]       -Inf   0.000000   3.000000   4.333333  10.500000 151.000000
## 
## $method
## [1] "misclass"
## 
## attr(,"class")
## [1] "prune"         "tree.sequence"

(g) Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.

# cv.tree() reports dev as a misclassification count; divide by n for a rate
plot(cv.oj.tree2$size, cv.oj.tree2$dev / nrow(train), type = "b",
     xlab = "Tree Size", ylab = "CV Error Rate")

(h) Which tree size corresponds to the lowest cross-validated classification error rate?

Per the cv.tree() output, the lowest cross-validated error (dev = 147, i.e. 147/800 ≈ 0.184) is tied between tree sizes 8 and 9; size 4 gives dev = 159. So cross-validation does not select a substantially pruned tree here.
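
The smallest size achieving the minimum CV deviance can be extracted programmatically (a sketch; best.size is just an illustrative name):

best.size = min(cv.oj.tree2$size[cv.oj.tree2$dev == min(cv.oj.tree2$dev)])
best.size  # 8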

(i) Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.

# CV ties at sizes 8 and 9 (essentially the unpruned tree), so per the exercise
# we build a smaller pruned tree; size 7 had the next-lowest CV error (154).
oj.tree2.prune<-prune.misclass(oj.tree2, best=7)
summary(oj.tree2.prune)
## 
## Classification tree:
## snip.tree(tree = oj.tree2, nodes = c(4L, 10L))
## Variables actually used in tree construction:
## [1] "LoyalCH"       "PriceDiff"     "ListPriceDiff" "PctDiscMM"    
## Number of terminal nodes:  7 
## Residual mean deviance:  0.7748 = 614.4 / 793 
## Misclassification error rate: 0.1625 = 130 / 800
plot(oj.tree2.prune)
text(oj.tree2.prune)

(j) Compare the training error rates between the pruned and unpruned trees. Which is higher?

Pruned misclassification error rate: 0.1625 (130/800). Unpruned misclassification error rate: 0.1588 (127/800).

The pruned tree has the higher training error rate. This is expected rather than odd: pruning removes splits, so the tree fits the training data less closely.

(k) Compare the test error rates between the pruned and unpruned trees. Which is higher?

oj.tree2.prune.pred<-predict(oj.tree2.prune, newdata = test, type="class")
table(oj.tree2.prune.pred, test$Purchase)
##                    
## oj.tree2.prune.pred  CH  MM
##                  CH 160  36
##                  MM   8  66

# Pruned test error
(36+8)/270
## [1] 0.162963
oj.tree2.pred<-predict(oj.tree2, newdata = test, type="class")
table(oj.tree2.pred, test$Purchase)
##              
## oj.tree2.pred  CH  MM
##            CH 160  38
##            MM   8  64
# Unpruned test error
(38+8)/270
## [1] 0.1703704

The test error is higher for the unpruned tree (0.170 vs. 0.163), so the pruned tree generalizes slightly better here.
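
For completeness, both test error rates can also be computed directly (a sketch):

mean(oj.tree2.prune.pred != test$Purchase)  # pruned, ~0.163
mean(oj.tree2.pred != test$Purchase)        # unpruned, ~0.170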