Hint: In a setting with two classes, p̂_m1 = 1 − p̂_m2. You could make this plot by hand, but it will be much easier to make in R.
gini <- function(m1) {
  # Gini index for two classes: 2 * p(1 - p)
  2 * m1 * (1 - m1)
}
ent <- function(m1) {
  # Cross-entropy (natural log); treat 0 * log(0) as 0 so the
  # endpoints p = 0 and p = 1 are finite rather than NaN
  p <- c(m1, 1 - m1)
  p <- p[p > 0]
  -sum(p * log(p))
}
classerr <- function(m1) {
  # Classification error: 1 - max(p, 1 - p)
  1 - max(m1, 1 - m1)
}
# Evaluate each measure on a grid of p̂_m1 values
err <- seq(0, 1, by = 0.01)
c.err <- sapply(err, classerr)
g <- sapply(err, gini)
e <- sapply(err, ent)
d <- data.frame(Gini.Index = g, Cross.Entropy = e)
plot(err, c.err, type = "l", col = "green", xlab = "m1", ylab = "value", ylim = c(0, 0.8))
matlines(err, d, col = c("orange", "blue"))
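A legend makes the three curves easier to tell apart. Note that matlines cycles line types by column, so the cross-entropy curve is drawn dashed by default; a minimal sketch matching the colors used above:
legend("top", legend = c("Classification error", "Gini index", "Cross-entropy"),
       col = c("green", "orange", "blue"), lty = c(1, 1, 2))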
library(ISLR)  # provides the OJ data set
library(tree)  # provides tree(), cv.tree(), and prune.misclass()
# View(OJ)     # optional: browse the data interactively
set.seed(1)
train <- sample(1:nrow(OJ), 800)
OJ.training <- OJ[train, ]
OJ.testing <- OJ[-train, ]
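As a quick sanity check on the split (the OJ data has 1070 rows, so the held-out test set should contain 270 observations):
nrow(OJ.training)  # 800
nrow(OJ.testing)   # 270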
tree.oj <- tree(Purchase ~ ., data = OJ.training)
summary(tree.oj)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.training)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
The training misclassification error rate is 0.1588, and the fitted tree has 9 terminal nodes.
tree.oj
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1073.00 CH ( 0.60625 0.39375 )
## 2) LoyalCH < 0.5036 365 441.60 MM ( 0.29315 0.70685 )
## 4) LoyalCH < 0.280875 177 140.50 MM ( 0.13559 0.86441 )
## 8) LoyalCH < 0.0356415 59 10.14 MM ( 0.01695 0.98305 ) *
## 9) LoyalCH > 0.0356415 118 116.40 MM ( 0.19492 0.80508 ) *
## 5) LoyalCH > 0.280875 188 258.00 MM ( 0.44149 0.55851 )
## 10) PriceDiff < 0.05 79 84.79 MM ( 0.22785 0.77215 )
## 20) SpecialCH < 0.5 64 51.98 MM ( 0.14062 0.85938 ) *
## 21) SpecialCH > 0.5 15 20.19 CH ( 0.60000 0.40000 ) *
## 11) PriceDiff > 0.05 109 147.00 CH ( 0.59633 0.40367 ) *
## 3) LoyalCH > 0.5036 435 337.90 CH ( 0.86897 0.13103 )
## 6) LoyalCH < 0.764572 174 201.00 CH ( 0.73563 0.26437 )
## 12) ListPriceDiff < 0.235 72 99.81 MM ( 0.50000 0.50000 )
## 24) PctDiscMM < 0.196196 55 73.14 CH ( 0.61818 0.38182 ) *
## 25) PctDiscMM > 0.196196 17 12.32 MM ( 0.11765 0.88235 ) *
## 13) ListPriceDiff > 0.235 102 65.43 CH ( 0.90196 0.09804 ) *
## 7) LoyalCH > 0.764572 261 91.20 CH ( 0.95785 0.04215 ) *
Consider terminal node 8, with split criterion LoyalCH < 0.0356415: it contains 59 observations, has a deviance of 10.14, and an overall prediction of MM. Fewer than 2% of the observations in that node take the value CH; the remaining 98% take the value MM.
plot(tree.oj)
text(tree.oj, pretty = 0)
The first three splits (the root node and both of its children) all use LoyalCH, so this variable is the most important indicator of Purchase.
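As a rough check on this, the split variable for every node is stored in the frame component of the fitted tree object (leaves appear as "<leaf>"), so variable usage can be tabulated directly:
# Count how many nodes split on each predictor
table(tree.oj$frame$var)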
tree.pred <- predict(tree.oj, OJ.testing, type = "class")
table(tree.pred, OJ.testing$Purchase)
##
## tree.pred CH MM
## CH 160 38
## MM 8 64
1 - (160 + 64) / 270
## [1] 0.1703704
The test error rate is about 17%.
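Equivalently, the test error rate can be computed directly from the predictions instead of from the confusion-matrix counts:
# Fraction of misclassified test observations; matches the 17% above
mean(tree.pred != OJ.testing$Purchase)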
cv.oj <- cv.tree(tree.oj, FUN = prune.misclass)
cv.oj
## $size
## [1] 9 8 7 4 2 1
##
## $dev
## [1] 150 150 149 158 172 315
##
## $k
## [1] -Inf 0.000000 3.000000 4.333333 10.500000 151.000000
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
plot(cv.oj$size, cv.oj$dev, type = "b", xlab = "Tree size", ylab = "Deviance")
The 7-node tree is the smallest tree with the lowest cross-validated classification error (dev = 149 misclassifications).
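The same choice can also be read off programmatically from the $size and $dev components shown above, rather than from the plot:
# Smallest tree size that attains the minimum cross-validated error
best.size <- min(cv.oj$size[cv.oj$dev == min(cv.oj$dev)])
best.size  # 7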
prune.oj <- prune.misclass(tree.oj, best = 7)
plot(prune.oj)
text(prune.oj, pretty = 0)
Comparing the training fits of the unpruned and pruned trees:
summary(tree.oj)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.training)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
summary(prune.oj)
##
## Classification tree:
## snip.tree(tree = tree.oj, nodes = c(4L, 10L))
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "ListPriceDiff" "PctDiscMM"
## Number of terminal nodes: 7
## Residual mean deviance: 0.7748 = 614.4 / 793
## Misclassification error rate: 0.1625 = 130 / 800
The training misclassification error rate is slightly higher for the pruned tree (0.1625 vs. 0.1588), which is expected: pruning trades a little training fit for a simpler tree.
For the pruned tree:
prune.pred <- predict(prune.oj, OJ.testing, type = "class")
table(prune.pred, OJ.testing$Purchase)
##
## prune.pred CH MM
## CH 160 36
## MM 8 66
1 - (160 + 66) / 270
## [1] 0.162963
For the unpruned tree:
prune.pred <- predict(tree.oj, OJ.testing, type = "class")
table(prune.pred, OJ.testing$Purchase)
##
## prune.pred CH MM
## CH 160 38
## MM 8 64
1 - (160 + 64) / 270
## [1] 0.1703704
The pruned tree has a test error rate of about 16.3%, slightly lower than the 17% of the unpruned tree, so pruning improved test performance here.
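The comparison can be made explicit by computing both test error rates side by side:
# Test error rates for the pruned and unpruned trees
pruned.err   <- mean(predict(prune.oj, OJ.testing, type = "class") != OJ.testing$Purchase)
unpruned.err <- mean(predict(tree.oj,  OJ.testing, type = "class") != OJ.testing$Purchase)
c(pruned = pruned.err, unpruned = unpruned.err)  # ~0.163 vs. ~0.170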