Chapter 8 (page 361): Exercises 3, 8, 9
3. Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of p̂_m1. The x-axis should display p̂_m1, ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy.
Hint: In a setting with two classes, p̂_m1 = 1 − p̂_m2. You could make this plot by hand, but it will be much easier to make in R.
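With two classes, writing p = p̂_m1 (so p̂_m2 = 1 − p), the three impurity measures from the chapter reduce to

$$
G = 2p(1-p), \qquad E = 1 - \max(p,\ 1-p), \qquad D = -p\log p - (1-p)\log(1-p),
$$

which are exactly the quantities computed in the R code below (the entropy here uses the natural log).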
p = seq(0, 1, 0.0001)
G = 2 * p * (1 - p)                          # Gini index
E = 1 - pmax(p, 1 - p)                       # classification error
D = -(p * log(p) + (1 - p) * log(1 - p))     # entropy (NaN at p = 0 and p = 1 is simply skipped by plot)
plot(p, D, type = "l", col = "red", xlab = "p", ylab = "Impurity measure")
lines(p, E, col = "blue")
lines(p, G, col = "green")
legend("topright", legend = c("Entropy", "Classification error", "Gini index"),
       col = c("red", "blue", "green"), lty = 1)
All three measures are maximized at p = 0.5; the Gini index and entropy are smooth and similar in shape, while the classification error is piecewise linear.
8. In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.
library(ISLR2)
attach(Carseats)
set.seed(96)
# Hold out one third of the observations as a test set; fit on the remaining two thirds
train = sample(dim(Carseats)[1], dim(Carseats)[1] / 3)
Carseats.train = Carseats[-train, ]
Carseats.test = Carseats[train, ]
library(tree)
tree.carseats = tree(Sales ~ ., data = Carseats.train)
summary(tree.carseats)
##
## Regression tree:
## tree(formula = Sales ~ ., data = Carseats.train)
## Variables actually used in tree construction:
## [1] "ShelveLoc" "Price" "Age" "Income" "CompPrice"
## [6] "Education" "Advertising"
## Number of terminal nodes: 19
## Residual mean deviance: 2.408 = 597.1 / 248
## Distribution of residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.87000 -1.10000 0.06133 0.00000 1.05900 3.93500
plot(tree.carseats)
text(tree.carseats, pretty = 0)
pred.carseats = predict(tree.carseats, Carseats.test)
mean((Carseats.test$Sales - pred.carseats)^2)
## [1] 4.106416
The test MSE for the unpruned regression tree is roughly 4.11.
cv.carseats = cv.tree(tree.carseats, FUN = prune.tree)
par(mfrow = c(1, 2))
plot(cv.carseats$size, cv.carseats$dev, type = "b")
plot(cv.carseats$k, cv.carseats$dev, type = "b")
Cross-validation selects a best tree size of 13 terminal nodes, the size at which the cross-validated deviance is smallest.
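Rather than reading the optimal size off the plot, it can be extracted directly from the $size and $dev components returned by cv.tree; a minimal sketch:
best.size = cv.carseats$size[which.min(cv.carseats$dev)]   # size with the smallest CV deviance
best.size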
pruned.carseats = prune.tree(tree.carseats, best = 13)
par(mfrow = c(1, 1))
plot(pruned.carseats)
text(pruned.carseats, pretty = 0)
pred.pruned = predict(pruned.carseats, Carseats.test)
mean((Carseats.test$Sales - pred.pruned)^2)
## [1] 4.316175
Pruning to 13 terminal nodes increases the test MSE slightly, from about 4.11 to about 4.32, so pruning does not improve the test error here.
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
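The exercise also asks about bagging and random forests for these data; the solution stops after loading randomForest, so the following is only a minimal sketch of how those fits might look (no output is shown, and the exact test MSEs will depend on the seed). Carseats has 10 predictors besides Sales, so mtry = ncol(Carseats) - 1 corresponds to bagging.
# Bagging: a random forest that considers all 10 predictors at every split
bag.carseats = randomForest(Sales ~ ., data = Carseats.train,
                            mtry = ncol(Carseats) - 1, importance = TRUE)
bag.pred = predict(bag.carseats, Carseats.test)
mean((Carseats.test$Sales - bag.pred)^2)   # test MSE for bagging

# Random forest: the default mtry for regression is p/3 predictors per split
rf.carseats = randomForest(Sales ~ ., data = Carseats.train, importance = TRUE)
rf.pred = predict(rf.carseats, Carseats.test)
mean((Carseats.test$Sales - rf.pred)^2)    # test MSE for the random forest

importance(rf.carseats)   # variable importance measures
varImpPlot(rf.carseats)
Both ensembles typically give a lower test MSE than the single pruned tree above, but the exact values are not reproduced here.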
9. This problem involves the OJ data set, which is part of the ISLR2 package.
a. Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
set.seed(1)
train = sample(1:nrow(OJ), 800)
OJtrain = OJ[train, ]
OJtest = OJ[-train, ]
tree.OJ = tree(Purchase ~ ., data = OJtrain)
summary(tree.OJ)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJtrain)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
plot(tree.OJ)
text(tree.OJ, pretty = 0)
Five variables (LoyalCH, PriceDiff, SpecialCH, ListPriceDiff, and PctDiscMM) are used in the tree construction. The tree has 9 terminal nodes, and the training error rate is 0.1588.
tree.OJ
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1073.00 CH ( 0.60625 0.39375 )
## 2) LoyalCH < 0.5036 365 441.60 MM ( 0.29315 0.70685 )
## 4) LoyalCH < 0.280875 177 140.50 MM ( 0.13559 0.86441 )
## 8) LoyalCH < 0.0356415 59 10.14 MM ( 0.01695 0.98305 ) *
## 9) LoyalCH > 0.0356415 118 116.40 MM ( 0.19492 0.80508 ) *
## 5) LoyalCH > 0.280875 188 258.00 MM ( 0.44149 0.55851 )
## 10) PriceDiff < 0.05 79 84.79 MM ( 0.22785 0.77215 )
## 20) SpecialCH < 0.5 64 51.98 MM ( 0.14062 0.85938 ) *
## 21) SpecialCH > 0.5 15 20.19 CH ( 0.60000 0.40000 ) *
## 11) PriceDiff > 0.05 109 147.00 CH ( 0.59633 0.40367 ) *
## 3) LoyalCH > 0.5036 435 337.90 CH ( 0.86897 0.13103 )
## 6) LoyalCH < 0.764572 174 201.00 CH ( 0.73563 0.26437 )
## 12) ListPriceDiff < 0.235 72 99.81 MM ( 0.50000 0.50000 )
## 24) PctDiscMM < 0.196196 55 73.14 CH ( 0.61818 0.38182 ) *
## 25) PctDiscMM > 0.196196 17 12.32 MM ( 0.11765 0.88235 ) *
## 13) ListPriceDiff > 0.235 102 65.43 CH ( 0.90196 0.09804 ) *
## 7) LoyalCH > 0.764572 261 91.20 CH ( 0.95785 0.04215 ) *
Node 9 is a terminal node (marked with an asterisk) containing 118 observations that satisfy 0.0356415 < LoyalCH < 0.280875. Its predicted class is MM: about 80.5% of the observations in this node are MM and just under 20% are CH. The node's deviance is 116.40.
plot(tree.OJ)
text(tree.OJ, pretty = 0)
LoyalCH is the most important variable: the top three splits all involve it. PriceDiff, SpecialCH, ListPriceDiff, and PctDiscMM appear in the lower splits.
treeOJ.pred = predict(tree.OJ, newdata = OJtest, type = "class")
table(treeOJ.pred, OJtest$Purchase)
##
## treeOJ.pred CH MM
## CH 160 38
## MM 8 64
(38 + 8) / 270
## [1] 0.1703704
The test error rate of the unpruned tree is (38 + 8) / 270 ≈ 0.170, i.e. about 17%.
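Equivalently, the test error rate can be computed directly, without reading counts off the confusion matrix; a small sketch:
mean(treeOJ.pred != OJtest$Purchase)   # proportion of misclassified test observations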
OJcv = cv.tree(tree.OJ, FUN = prune.misclass)
OJcv
## $size
## [1] 9 8 7 4 2 1
##
## $dev
## [1] 150 150 149 158 172 315
##
## $k
## [1] -Inf 0.000000 3.000000 4.333333 10.500000 151.000000
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
plot(OJcv$size, OJcv$dev, type = "b", xlab = "Tree Size", ylab = "cv classification error rate")
Here $size is the number of terminal nodes, $dev is the number of cross-validated misclassifications at each size, and $k is the cost-complexity parameter. A tree size of 7 corresponds to the lowest cross-validated classification error rate (149 misclassifications), so we prune the tree to 7 terminal nodes.
prune.OJ = prune.tree(tree.OJ, best = 7)
summary(tree.OJ)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJtrain)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
summary(prune.OJ)
##
## Classification tree:
## snip.tree(tree = tree.OJ, nodes = c(10L, 4L))
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "ListPriceDiff" "PctDiscMM"
## Number of terminal nodes: 7
## Residual mean deviance: 0.7748 = 614.4 / 793
## Misclassification error rate: 0.1625 = 130 / 800
The pruned tree has a higher training error rate (0.1625) than the un-pruned tree (0.1588).
treeOJ.pred = predict(tree.OJ, newdata = OJtest, type = "class")
table(treeOJ.pred, OJtest$Purchase)
##
## treeOJ.pred CH MM
## CH 160 38
## MM 8 64
unprunedOJvalerr = (38 + 8) / 270
unprunedOJvalerr
## [1] 0.1703704
pruneOJ.pred = predict(prune.OJ, newdata = OJtest, type = "class")
table(pruneOJ.pred, OJtest$Purchase)
##
## pruneOJ.pred CH MM
## CH 160 36
## MM 8 66
prunedOJvalerr = (36 + 8) / 270
prunedOJvalerr
## [1] 0.162963
The pruned tree has a slightly lower test error rate (about 0.163) than the unpruned tree (about 0.170), so in this case pruning slightly improves test performance.