Consider the Gini index, classification error, and cross-entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of p̂_m1, the proportion of observations in the m-th region that belong to the first class. The x-axis should display p̂_m1, ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and cross-entropy.
# Proportion of class-1 observations in the region
phat = seq(0, 1, 0.001)
# Gini index for two classes: 2 * p * (1 - p)
gindex = 2 * phat * (1 - phat)
# Classification error: 1 - max(p, 1 - p)
class_error = 1 - pmax(phat, 1 - phat)
# Cross-entropy: -[p log(p) + (1 - p) log(1 - p)]; the NaN at p = 0 and p = 1 is simply not drawn
cross_entropy = - (phat * log(phat) + (1 - phat) * log(1 - phat))
matplot(phat, cbind(gindex, class_error, cross_entropy), type = "l",
    ylab = "Gini index, classification error, cross-entropy", col = c("red", "pink", "black"))
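A legend helps tell the three curves apart (optional; the line types follow matplot's defaults):
# Label the three curves; colours and line types match the matplot call above
legend("top", legend = c("Gini index", "Classification error", "Cross-entropy"),
       col = c("red", "pink", "black"), lty = 1:3, cex = 0.8)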
In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.
library(ISLR)    # Carseats data
library(tree)    # regression and classification trees
set.seed(2)
train = sample(1:nrow(Carseats), 200)
Sales.test = Carseats[train, ]    # the 200 sampled rows are held out as the test set
Sales.train = Carseats[-train, ]  # the remaining 200 rows are used for training
tree.carseats = tree(Sales ~ ., data = Sales.train)
plot(tree.carseats)
text(tree.carseats,pretty=0)
treecarseat.pred = predict(tree.carseats, newdata = Sales.test)
mean((treecarseat.pred - Sales.test$Sales)^2)
## [1] 3.299254
cv.carseats=cv.tree(tree.carseats,FUN=prune.tree)
cv.carseats
## $size
## [1] 13 12 11 10 9 8 7 6 5 4 3 2 1
##
## $dev
## [1] 1730.991 1764.369 1810.329 1819.288 1856.248 1886.286 1880.487 1819.406
## [9] 1895.456 2102.823 2139.673 2409.865 3191.715
##
## $k
## [1] -Inf 38.65535 40.44960 51.05171 57.62047 70.52963 76.57441
## [8] 84.75430 105.50729 145.33849 162.67977 334.36974 797.19286
##
## $method
## [1] "deviance"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
par(mfrow=c(1,2))
plot(cv.carseats$size,cv.carseats$dev,type="b")
plot(cv.carseats$k,cv.carseats$dev,type="b")
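The size with the lowest cross-validated deviance can also be read off programmatically; here it is the full tree of size 13, so pruning to 9 nodes below trades a little CV deviance for a simpler tree.
# Tree size with the smallest cross-validated deviance
cv.carseats$size[which.min(cv.carseats$dev)]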
prune.carseats = prune.tree(tree.carseats, best = 9)
par(mfrow = c(1, 1))
plot(prune.carseats)
text(prune.carseats, pretty = 0)
prune.predict = predict(prune.carseats, Sales.test)
mean((Sales.test$Sales - prune.predict)^2)
## [1] 3.807178
Pruning the tree to 9 terminal nodes increases the test MSE (from about 3.30 to 3.81).
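To see how pruning affects the test MSE across all candidate sizes, one can loop over the sizes reported by cv.tree (a sketch; sizes and test.mse are just illustrative names):
# Test MSE for each pruned subtree size
sizes = 2:13
test.mse = sapply(sizes, function(b) {
  pred = predict(prune.tree(tree.carseats, best = b), newdata = Sales.test)
  mean((Sales.test$Sales - pred)^2)
})
plot(sizes, test.mse, type = "b", xlab = "Tree size", ylab = "Test MSE")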
library(randomForest)   # bagging and random forests
bag.carseats = randomForest(Sales ~ ., data = Sales.train, mtry = 10, ntree = 500,
    importance = TRUE)
## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
## range
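The warning means the requested mtry exceeded the number of predictors actually available; a safer way to request bagging is to compute mtry from the data (a sketch; bag.carseats2 is just an illustrative name):
# Bagging is a random forest that considers every predictor at each split,
# so set mtry to the number of columns excluding the response Sales.
p = ncol(Sales.train) - 1
bag.carseats2 = randomForest(Sales ~ ., data = Sales.train, mtry = p,
    ntree = 500, importance = TRUE)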
bag.predict = predict(bag.carseats, Sales.test)
mean((Sales.test$Sales - bag.predict)^2)
## [1] 0.5188452
importance(bag.carseats)
## %IncMSE IncNodePurity
## CompPrice 55.9539107 487.75206
## Advertising 34.3753400 326.10737
## Population 0.9788418 201.89978
## Price 90.8152469 1051.20729
## ShelveLoc 93.7762179 993.04044
## Urban -3.9210358 25.91243
Bagging lowers the test MSE substantially (to about 0.52). The most important predictors are ShelveLoc, Price, and CompPrice.
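The importance measures can also be displayed graphically with randomForest's varImpPlot:
# Dot charts of %IncMSE and IncNodePurity for the bagged model
varImpPlot(bag.carseats)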
randf.carseats = randomForest(Sales ~ ., data = Sales.train, mtry = 5, ntree = 500,
importance = TRUE)
randf.predict = predict(randf.carseats, Sales.test)
mean((Sales.test$Sales - randf.predict)^2)
## [1] 0.5326838
importance(randf.carseats)
## %IncMSE IncNodePurity
## CompPrice 50.5570433 481.91090
## Advertising 30.2499639 337.48141
## Population 0.9220121 213.59218
## Price 88.7035702 1036.46133
## ShelveLoc 94.1513389 990.39349
## Urban -1.9322968 26.73983
With mtry = 5, the random forest gives a slightly higher test MSE than bagging (about 0.53 versus 0.52). The most important predictors are again ShelveLoc, Price, and CompPrice.
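To see how the choice of mtry affects the error, one can refit the forest for each candidate value and compare test MSEs (a sketch; mtry.values and mtry.mse are just illustrative names, and results vary slightly from run to run):
# Test MSE as a function of the number of variables tried at each split
mtry.values = 1:(ncol(Sales.train) - 1)
mtry.mse = sapply(mtry.values, function(m) {
  fit = randomForest(Sales ~ ., data = Sales.train, mtry = m, ntree = 500)
  mean((Sales.test$Sales - predict(fit, Sales.test))^2)
})
plot(mtry.values, mtry.mse, type = "b", xlab = "mtry", ylab = "Test MSE")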
This problem involves the OJ data set, which is part of the ISLR package.
set.seed(1)
train = sample(1:nrow(OJ), 800)
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]
OJ.tree = tree(Purchase ~., data=OJ.train)
summary(OJ.tree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
The tree uses five variables (LoyalCH, PriceDiff, SpecialCH, ListPriceDiff, and PctDiscMM), has 9 terminal nodes, and has a training misclassification error rate of 0.1588.
OJ.tree
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1073.00 CH ( 0.60625 0.39375 )
## 2) LoyalCH < 0.5036 365 441.60 MM ( 0.29315 0.70685 )
## 4) LoyalCH < 0.280875 177 140.50 MM ( 0.13559 0.86441 )
## 8) LoyalCH < 0.0356415 59 10.14 MM ( 0.01695 0.98305 ) *
## 9) LoyalCH > 0.0356415 118 116.40 MM ( 0.19492 0.80508 ) *
## 5) LoyalCH > 0.280875 188 258.00 MM ( 0.44149 0.55851 )
## 10) PriceDiff < 0.05 79 84.79 MM ( 0.22785 0.77215 )
## 20) SpecialCH < 0.5 64 51.98 MM ( 0.14062 0.85938 ) *
## 21) SpecialCH > 0.5 15 20.19 CH ( 0.60000 0.40000 ) *
## 11) PriceDiff > 0.05 109 147.00 CH ( 0.59633 0.40367 ) *
## 3) LoyalCH > 0.5036 435 337.90 CH ( 0.86897 0.13103 )
## 6) LoyalCH < 0.764572 174 201.00 CH ( 0.73563 0.26437 )
## 12) ListPriceDiff < 0.235 72 99.81 MM ( 0.50000 0.50000 )
## 24) PctDiscMM < 0.196197 55 73.14 CH ( 0.61818 0.38182 ) *
## 25) PctDiscMM > 0.196197 17 12.32 MM ( 0.11765 0.88235 ) *
## 13) ListPriceDiff > 0.235 102 65.43 CH ( 0.90196 0.09804 ) *
## 7) LoyalCH > 0.764572 261 91.20 CH ( 0.95785 0.04215 ) *
Node 12 splits on ListPriceDiff at the value 0.235 and contains 72 observations with a deviance of 99.81. Because its line does not end with a *, node 12 is not a terminal node. Its predicted class is MM, but the observations in it are split exactly 50% CH and 50% MM.
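The same information can be read off the fitted object directly: in the tree package, the row names of OJ.tree$frame are the node numbers, so node 12's row gives its split variable, size, deviance, predicted class, and class proportions.
# Inspect node 12 of the fitted tree (row names of $frame are node numbers)
OJ.tree$frame["12", ]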
plot(OJ.tree)
text(OJ.tree, pretty=0)
We can see that LoyalCH is the most important variable: it is used in the top three splits. When LoyalCH < 0.280875 the tree always predicts MM, and when LoyalCH > 0.764572 it always predicts CH; for intermediate values the prediction depends on the price and promotion variables.
OJ.predict=predict(OJ.tree, OJ.test, type="class")
table(OJ.test$Purchase, OJ.predict)
## OJ.predict
## CH MM
## CH 160 8
## MM 38 64
(38+8)/270
## [1] 0.1703704
The test error rate is about 0.17: 46 of the 270 test observations are misclassified.
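The same error rate can be computed directly, without reading the counts off the table by hand:
# Fraction of test observations misclassified by the unpruned tree
mean(OJ.predict != OJ.test$Purchase)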
OJcv= cv.tree(OJ.tree, FUN=prune.misclass)
OJcv
## $size
## [1] 9 8 7 4 2 1
##
## $dev
## [1] 150 150 149 158 172 315
##
## $k
## [1] -Inf 0.000000 3.000000 4.333333 10.500000 151.000000
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
plot(OJcv$size, OJcv$dev, type = "b", xlab = "Tree Size", ylab = "Cross-validated classification error rate")
Which tree size corresponds to the lowest cross-validated classification error rate?
Trees of size 7 and 8 have essentially the lowest cross-validated error (149 and 150 misclassifications); we prune to 8 terminal nodes below.
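Reading the minimum off programmatically confirms this: which.min picks size 7 here, whose error of 149 is within one misclassification of size 8.
# Tree size with the smallest cross-validated misclassification count
OJcv$size[which.min(OJcv$dev)]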
OJ.prune=prune.tree(OJ.tree,best=8)
summary(OJ.prune)
##
## Classification tree:
## snip.tree(tree = OJ.tree, nodes = 10L)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "ListPriceDiff" "PctDiscMM"
## Number of terminal nodes: 8
## Residual mean deviance: 0.7582 = 600.5 / 792
## Misclassification error rate: 0.1625 = 130 / 800
summary(OJ.tree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
The pruned tree has a slightly higher training error rate (0.1625 versus 0.1588 for the full tree).
OJtree.predict=predict(OJ.tree, newdata=OJ.test, type="class")
table(OJtree.predict, OJ.test$Purchase)
##
## OJtree.predict CH MM
## CH 160 38
## MM 8 64
(38+8)/270
## [1] 0.1703704
OJtreepruned.predict=predict(OJ.prune, newdata=OJ.test, type="class")
table(OJtreepruned.predict, OJ.test$Purchase)
##
## OJtreepruned.predict CH MM
## CH 160 36
## MM 8 66
(36+8)/270
## [1] 0.162963
The pruned tree has a slightly lower test error rate (about 0.163 versus 0.170 for the unpruned tree).
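Computing both test error rates directly confirms the comparison:
# Test error rates: unpruned vs. pruned tree
mean(OJtree.predict != OJ.test$Purchase)
mean(OJtreepruned.predict != OJ.test$Purchase)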