# Impurity measures in a two-class setting, as a function of the proportion p of class 1
p=seq(0, 1, 0.01)
gini.ind=2 * p * (1 - p)                        # Gini index: 2p(1-p)
classification.error=1 - pmax(p, 1 - p)         # classification error rate: 1 - max(p, 1-p)
entropy=- (p * log(p) + (1 - p) * log(1 - p))   # cross-entropy; NaN at p = 0 and p = 1, so those endpoints are simply omitted from the plot
matplot(p, cbind(gini.ind, classification.error, entropy), pch=c(15,17,19), ylab = "gini.ind, classification.error, entropy", col = c("darkblue", "yellow", "red"), type = 'b')
legend('bottom', inset=.01, legend = c('gini.ind', 'classification.error', 'entropy'), col = c("darkblue", "yellow", "red"), pch=c(15,17,19))
This problem uses the OJ data set, which is part of the ISLR2 package.
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
library(ISLR2)
library(tree)                   # tree(), cv.tree(), and prune.misclass() below require the tree package
attach(OJ)
set.seed(5)
train=sample(dim(OJ)[1],800)    # row indices of 800 training observations
train.OJ=OJ[train,]
test.OJ=OJ[-train,]             # remaining observations form the test set
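As a quick sanity check (not part of the original output), the split sizes can be verified; OJ has 1070 rows, so the test set should contain the remaining 270 observations:
dim(train.OJ)   # should report 800 rows
dim(test.OJ)    # should report 270 rows (1070 - 800)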
(b) Fit a tree to the training data, with Purchase as the
response and the other variables as predictors. Use the
summary() function to produce summary statistics about the
tree, and describe the results obtained. What is the training error
rate? How many terminal nodes does the tree have?
oj.tree=tree(Purchase~.,data=train.OJ)
summary(oj.tree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = train.OJ)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "ListPriceDiff"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7347 = 581.1 / 791
## Misclassification error rate: 0.1662 = 133 / 800
The summary statistics for this tree are: the training misclassification error rate is 0.1662 (16.62%), the number of terminal nodes is 9, and only the variables "LoyalCH", "PriceDiff", and "ListPriceDiff" are actually used in the tree construction.
(c) Type in the name of the tree object in order to get a
detailed text output. Pick one of the terminal nodes, and interpret the
information displayed.
oj.tree
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1068.00 CH ( 0.61250 0.38750 )
## 2) LoyalCH < 0.5036 346 412.40 MM ( 0.28324 0.71676 )
## 4) LoyalCH < 0.280875 164 125.50 MM ( 0.12805 0.87195 )
## 8) LoyalCH < 0.0356415 56 10.03 MM ( 0.01786 0.98214 ) *
## 9) LoyalCH > 0.0356415 108 103.50 MM ( 0.18519 0.81481 ) *
## 5) LoyalCH > 0.280875 182 248.00 MM ( 0.42308 0.57692 )
## 10) PriceDiff < 0.05 71 67.60 MM ( 0.18310 0.81690 ) *
## 11) PriceDiff > 0.05 111 151.30 CH ( 0.57658 0.42342 ) *
## 3) LoyalCH > 0.5036 454 362.00 CH ( 0.86344 0.13656 )
## 6) PriceDiff < -0.39 31 40.32 MM ( 0.35484 0.64516 )
## 12) LoyalCH < 0.638841 10 0.00 MM ( 0.00000 1.00000 ) *
## 13) LoyalCH > 0.638841 21 29.06 CH ( 0.52381 0.47619 ) *
## 7) PriceDiff > -0.39 423 273.70 CH ( 0.90071 0.09929 )
## 14) LoyalCH < 0.705326 135 143.00 CH ( 0.77778 0.22222 )
## 28) ListPriceDiff < 0.255 67 89.49 CH ( 0.61194 0.38806 ) *
## 29) ListPriceDiff > 0.255 68 30.43 CH ( 0.94118 0.05882 ) *
## 15) LoyalCH > 0.705326 288 99.77 CH ( 0.95833 0.04167 ) *
Consider terminal node 29. It results from the split ListPriceDiff > 0.255 and contains 68 observations. The predicted class for this node is CH (Citrus Hill), its deviance is 30.43, and Citrus Hill is the actual purchase for 94.12% of the observations in the node, so the prediction is correct for about 94% of them.
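The same node-level information can also be read straight from the fitted object; the tree package stores it in oj.tree$frame, whose row names are the node numbers (a small sketch, output omitted):
oj.tree$frame["29", c("n", "dev", "yval")]   # terminal node 29: n = 68, deviance = 30.43, predicted class CH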
(d) Create a plot of the tree, and interpret the
results.
plot(oj.tree)
text(oj.tree,pretty=0)
LoyalCH, which measures a customer's brand loyalty to Citrus Hill, is the splitting variable at the top of the tree. It therefore appears to be the most important variable here, followed by PriceDiff and ListPriceDiff. Customers who are less loyal to Citrus Hill are more inclined to purchase Minute Maid, and it takes a larger price difference to persuade the more loyal Citrus Hill customers to switch brands.
(e) Predict the response on the test data, and produce a
confusion matrix comparing the test labels to the predicted test labels.
What is the test error rate?
oj.pred=predict(oj.tree,newdata=test.OJ,type="class")
table(test.OJ$Purchase,oj.pred)
## oj.pred
## CH MM
## CH 148 15
## MM 32 75
mean(oj.pred!=test.OJ$Purchase)
## [1] 0.1740741
The test error rate is therefore 17.41%.
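The same rate can be recovered by hand from the confusion matrix, since the off-diagonal cells are the misclassified test observations:
(15 + 32) / (148 + 15 + 32 + 75)   # 47 / 270 = 0.1741, matching mean(oj.pred != test.OJ$Purchase)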
(f) Apply the cv.tree() function to the training set in order to
determine the optimal tree size.
cv.oj=cv.tree(oj.tree,FUN=prune.misclass)
cv.oj
## $size
## [1] 9 6 5 3 2 1
##
## $dev
## [1] 155 156 156 172 170 310
##
## $k
## [1] -Inf 0.0 1.0 8.5 9.0 150.0
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
(g) Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.
plot(cv.oj$size,cv.oj$dev,type="b",xlab="Tree Size",ylab="CV Classification Errors")   # with FUN=prune.misclass, cv.oj$dev is the number of cross-validated misclassifications, not the deviance
(h) Which tree size corresponds to the lowest cross-validated
classification error rate?
The lowest cross-validated classification error rate corresponds to the tree with 9 terminal nodes (155 misclassifications out of 800 training observations).
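The optimal size can also be extracted programmatically instead of reading it from the plot:
cv.oj$size[which.min(cv.oj$dev)]   # 9: the size with the fewest cross-validated misclassifications (155)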
(i) Produce a pruned tree corresponding to the optimal tree size
obtained using cross-validation. If cross-validation does not lead to
selection of a pruned tree, then create a pruned tree with five terminal
nodes.
oj.pruned=prune.misclass(oj.tree,best=9)
plot(oj.pruned)
text(oj.pruned,pretty=0)
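Since cross-validation selects the full 9-node tree (i.e. no actual pruning), part (i) also allows a pruned tree with five terminal nodes; a sketch of that alternative is below (the object name oj.pruned5 is mine, its results are not reported here, and the comparisons in (j) and (k) use the 9-node tree):
oj.pruned5=prune.misclass(oj.tree,best=5)   # five-terminal-node alternative suggested in part (i)
plot(oj.pruned5)
text(oj.pruned5,pretty=0)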
(j) Compare the training error rates between the pruned and unpruned trees. Which is higher?
summary(oj.pruned)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = train.OJ)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "ListPriceDiff"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7347 = 581.1 / 791
## Misclassification error rate: 0.1662 = 133 / 800
The pruned and unpruned trees have the same training error rate of 0.1662. Cross-validation found the optimal size to be 9 terminal nodes, so the tree produced with best=9 is simply the unpruned tree.
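The training error rates can also be checked directly on train.OJ; since the two trees are identical, both calls below should reproduce the 0.1662 reported by summary():
mean(predict(oj.tree, newdata=train.OJ, type="class") != train.OJ$Purchase)     # unpruned training error
mean(predict(oj.pruned, newdata=train.OJ, type="class") != train.OJ$Purchase)   # pruned (9-node) training error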
(k) Compare the test error rates between the pruned and unpruned
trees. Which is higher?
ojprune.pred=predict(oj.pruned, newdata=test.OJ, type="class")
mean(ojprune.pred != test.OJ$Purchase)
## [1] 0.1740741
Because cross-validation found the unpruned tree to be optimal, the pruned tree is identical to it, and we once again obtain the same test error rate of 17.41%.
detach(OJ)