Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of \(\hat{p}_{m1}\). The x-axis should display \(\hat{p}_{m1}\), ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy.
Hint: In a setting with two classes, \(\hat{p}_{m1} = 1 - \hat{p}_{m2}\). You could make this plot by hand, but it will be much easier to make in R.
p = seq(0, 1, 0.01)                              # grid of values for p-hat_m1
gini.index = 2 * p * (1 - p)                     # Gini index
class.err = 1 - pmax(p, 1 - p)                   # classification error
entropy = -(p * log(p) + (1 - p) * log(1 - p))   # cross-entropy (NaN at p = 0 and p = 1)
matplot(p, cbind(gini.index, class.err, entropy),
        col = c("dodgerblue", "darkorchid", "darkseagreen"), pch = 16,
        main = "Classification Trees", xlab = "p values", ylab = "Splitting Criterion")
legend("topright", pch = 16, col = c("dodgerblue", "darkorchid", "darkseagreen"),
       legend = c("Gini Index", "Class Error", "Entropy"))
In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.
library(ISLR)
set.seed(1)
data(Carseats)
trainingindex=sample(nrow(Carseats), 0.75*nrow(Carseats))   # 75/25 train/test split
head(trainingindex)
## [1] 107 149 228 361 80 355
Car.train<-Carseats[trainingindex, ]
Car.test<-Carseats[-trainingindex, ]
library(tree)
reg.tree=tree(Sales~., data=Car.train)
summary(reg.tree)
##
## Regression tree:
## tree(formula = Sales ~ ., data = Car.train)
## Variables actually used in tree construction:
## [1] "ShelveLoc" "Price" "Age" "Advertising" "Population"
## [6] "CompPrice"
## Number of terminal nodes: 15
## Residual mean deviance: 2.849 = 812 / 285
## Distribution of residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.12700 -1.04800 -0.01051 0.00000 1.03200 5.15700
The summary of the tree lists the variables used as internal nodes and reports the number of terminal nodes. The variables actually used are ShelveLoc (shelving location), Price, Age, Advertising, Population, and CompPrice (competitor price).
plot(reg.tree)
text(reg.tree, pretty=0)
Since the text summary of the tree can be hard to decipher or visualize, the tree is plotted above. A prettier version of this tree can be created with the function rpart.plot() from the rpart.plot package, although I was not able to take the analysis further with that approach and had a hard time producing the minimum deviance.
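As a rough sketch of that alternative (assuming the rpart and rpart.plot packages are installed; this is my own illustration rather than part of the original analysis):
library(rpart)        # fits CART-style trees
library(rpart.plot)   # nicer plotting of rpart trees
rpart.fit = rpart(Sales ~ ., data = Car.train)   # analogous fit to reg.tree above
rpart.plot(rpart.fit)                            # prettier rendering of the tree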
yhat=predict(reg.tree, newdata = Car.test)
mean((yhat-Car.test$Sales)^2)
## [1] 5.491716
The test MSE obtained from this regression tree is 5.491716.
library(tree)
reg.tree=tree(Sales ~ ., data = Car.train)
cv.car=cv.tree(reg.tree)
plot(cv.car$size, cv.car$dev, type = "b")
In this case, the tree size chosen by cross-validation is 7, so I will prune the tree to obtain the 7-node tree.
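The chosen size can also be extracted programmatically instead of read off the plot (a small sketch of my own; note that cv.tree() uses random folds, so the selected size can change with the seed):
best.size = cv.car$size[which.min(cv.car$dev)]   # size with the lowest CV deviance
best.size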
prune.car=prune.tree(reg.tree, best=7)
plot(prune.car)
text(prune.car, pretty=0)
yhat=predict(prune.car, newdata = Car.test)
mean((yhat-Car.test$Sales)^2)
## [1] 5.419728
The test MSE of the pruned tree is 5.419728, which is quite close to the test MSE of the unpruned tree (5.491716).
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
set.seed(1)
bag.car=randomForest(Sales~., data=Car.train, mtry=10, importance=TRUE)   # mtry = 10 uses all 10 predictors at each split, i.e., bagging
yhat.bag=predict(bag.car, newdata = Car.test)
mean((yhat.bag-Car.test$Sales)^2)
## [1] 2.209486
importance(bag.car)
## %IncMSE IncNodePurity
## CompPrice 30.260612 228.92041
## Income 8.102690 116.70390
## Advertising 21.884122 190.20770
## Population 3.621385 95.10519
## Price 72.265798 731.37342
## ShelveLoc 76.197166 655.36199
## Age 22.506044 218.59352
## Education 2.551942 63.46486
## Urban -3.905544 10.01294
## US 4.222018 13.39688
varImpPlot(bag.car)
The most important variables are the shelving location (ShelveLoc) and the Price of the car seats. The test MSE associated with the bagged regression trees is 2.209486, which is markedly smaller than the MSE obtained from the pruned single tree.
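Rather than hard-coding mtry = 10, the number of predictors can be computed from the data; this small sketch of my own makes explicit why mtry = 10 corresponds to bagging here:
n.pred = ncol(Car.train) - 1   # number of predictors, excluding the response Sales
n.pred                         # 10: setting mtry = n.pred uses every predictor at each split, i.e., bagging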
library(randomForest)
set.seed(1)
rf.car=randomForest(Sales~., data=Car.train, mtry=3, importance=TRUE)
yhat.rf=predict(rf.car, newdata=Car.test)
mean((yhat.rf-Car.test$Sales)^2)
## [1] 2.683383
The random forest test MSE is 2.683383, so with mtry = 3 the random forest did not outperform bagging on this split.
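To see how the choice of mtry affects performance, one could loop over a few values and compare test MSEs (a sketch of my own, not part of the original analysis; the exact numbers will vary with the random seed):
mtry.values = c(3, 5, 7, 10)
rf.test.mse = sapply(mtry.values, function(m) {
  fit = randomForest(Sales ~ ., data = Car.train, mtry = m)
  mean((predict(fit, newdata = Car.test) - Car.test$Sales)^2)
})
names(rf.test.mse) = paste0("mtry=", mtry.values)
rf.test.mse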
This problem involves the OJ data set which is part of the ISLR package.
library(ISLR)
data("OJ")
summary(OJ)
## Purchase WeekofPurchase StoreID PriceCH PriceMM
## CH:653 Min. :227.0 Min. :1.00 Min. :1.690 Min. :1.690
## MM:417 1st Qu.:240.0 1st Qu.:2.00 1st Qu.:1.790 1st Qu.:1.990
## Median :257.0 Median :3.00 Median :1.860 Median :2.090
## Mean :254.4 Mean :3.96 Mean :1.867 Mean :2.085
## 3rd Qu.:268.0 3rd Qu.:7.00 3rd Qu.:1.990 3rd Qu.:2.180
## Max. :278.0 Max. :7.00 Max. :2.090 Max. :2.290
## DiscCH DiscMM SpecialCH SpecialMM
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.05186 Mean :0.1234 Mean :0.1477 Mean :0.1617
## 3rd Qu.:0.00000 3rd Qu.:0.2300 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :0.50000 Max. :0.8000 Max. :1.0000 Max. :1.0000
## LoyalCH SalePriceMM SalePriceCH PriceDiff
## Min. :0.000011 Min. :1.190 Min. :1.390 Min. :-0.6700
## 1st Qu.:0.325257 1st Qu.:1.690 1st Qu.:1.750 1st Qu.: 0.0000
## Median :0.600000 Median :2.090 Median :1.860 Median : 0.2300
## Mean :0.565782 Mean :1.962 Mean :1.816 Mean : 0.1465
## 3rd Qu.:0.850873 3rd Qu.:2.130 3rd Qu.:1.890 3rd Qu.: 0.3200
## Max. :0.999947 Max. :2.290 Max. :2.090 Max. : 0.6400
## Store7 PctDiscMM PctDiscCH ListPriceDiff
## No :714 Min. :0.0000 Min. :0.00000 Min. :0.000
## Yes:356 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.140
## Median :0.0000 Median :0.00000 Median :0.240
## Mean :0.0593 Mean :0.02731 Mean :0.218
## 3rd Qu.:0.1127 3rd Qu.:0.00000 3rd Qu.:0.300
## Max. :0.4020 Max. :0.25269 Max. :0.440
## STORE
## Min. :0.000
## 1st Qu.:0.000
## Median :2.000
## Mean :1.631
## 3rd Qu.:3.000
## Max. :4.000
set.seed(1)
OJ.trainingindex=sample(dim(OJ)[1], 800)
OJ.train=OJ[OJ.trainingindex,]
OJ.test=OJ[-OJ.trainingindex,]
OJ.regtree=tree(Purchase~., data=OJ.train)
summary(OJ.regtree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## Number of terminal nodes: 8
## Residual mean deviance: 0.7305 = 578.6 / 792
## Misclassification error rate: 0.165 = 132 / 800
The classification tree fit to the OJ data set uses 4 variables (LoyalCH, PriceDiff, SpecialCH, and ListPriceDiff). It has 8 terminal nodes and a training misclassification error rate of 0.165.
OJ.regtree
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1064.00 CH ( 0.61750 0.38250 )
## 2) LoyalCH < 0.508643 350 409.30 MM ( 0.27143 0.72857 )
## 4) LoyalCH < 0.264232 166 122.10 MM ( 0.12048 0.87952 )
## 8) LoyalCH < 0.0356415 57 10.07 MM ( 0.01754 0.98246 ) *
## 9) LoyalCH > 0.0356415 109 100.90 MM ( 0.17431 0.82569 ) *
## 5) LoyalCH > 0.264232 184 248.80 MM ( 0.40761 0.59239 )
## 10) PriceDiff < 0.195 83 91.66 MM ( 0.24096 0.75904 )
## 20) SpecialCH < 0.5 70 60.89 MM ( 0.15714 0.84286 ) *
## 21) SpecialCH > 0.5 13 16.05 CH ( 0.69231 0.30769 ) *
## 11) PriceDiff > 0.195 101 139.20 CH ( 0.54455 0.45545 ) *
## 3) LoyalCH > 0.508643 450 318.10 CH ( 0.88667 0.11333 )
## 6) LoyalCH < 0.764572 172 188.90 CH ( 0.76163 0.23837 )
## 12) ListPriceDiff < 0.235 70 95.61 CH ( 0.57143 0.42857 ) *
## 13) ListPriceDiff > 0.235 102 69.76 CH ( 0.89216 0.10784 ) *
## 7) LoyalCH > 0.764572 278 86.14 CH ( 0.96403 0.03597 ) *
I will explain node 7, a terminal node as indicated by the asterisk. Starting at the root, the first split is on LoyalCH: if LoyalCH > 0.508643, we move to the right branch, where the next split tests whether LoyalCH is below or above 0.764572. Node 7 is reached when LoyalCH > 0.764572, and the tree stops splitting there. This node contains 278 observations with a deviance of 86.14. Fewer than 3.6% of the observations in node 7 take the value MM, while over 96% take the value CH, so the predicted class is CH.
plot(OJ.regtree)
text(OJ.regtree, pretty=TRUE)
OJtree.pred=predict(OJ.regtree, newdata = OJ.test, type="class")
table(OJtree.pred, OJ.test$Purchase)
##
## OJtree.pred CH MM
## CH 147 49
## MM 12 62
(147+62)/270   # fraction of test observations correctly classified
## [1] 0.7740741
Based on the OJ.regtree model, about 77.4% of the test observations are predicted correctly, so the test error rate is approximately 23%.
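The same quantities can be computed directly from the predictions rather than typing the confusion-matrix counts by hand (a small sketch of my own):
mean(OJtree.pred == OJ.test$Purchase)   # test accuracy
mean(OJtree.pred != OJ.test$Purchase)   # test error rate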
cv.OJtree=cv.tree(OJ.regtree, FUN=prune.misclass)
cv.OJtree
## $size
## [1] 8 5 2 1
##
## $dev
## [1] 156 156 156 306
##
## $k
## [1] -Inf 0.000000 4.666667 160.000000
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
plot(cv.OJtree)
plot(cv.OJtree$size, cv.OJtree$dev, type="b", xlab="Tree Size", ylab="CV Misclassification Errors")
The two plots show that trees of sizes 8, 5, and 2 all achieve the lowest cross-validated classification error (156 misclassified observations); I choose the smallest of these, the tree of size 2.
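Because several sizes tie at the minimum, the smallest of them can be selected programmatically (a sketch of my own):
# smallest tree size among those achieving the minimum cross-validated error
best.OJ.size = min(cv.OJtree$size[cv.OJtree$dev == min(cv.OJtree$dev)])
best.OJ.size   # 2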
prune.OJtree=prune.misclass(OJ.regtree, best=2)
plot(prune.OJtree)
text(prune.OJtree, pretty=0)
summary(OJ.regtree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## Number of terminal nodes: 8
## Residual mean deviance: 0.7305 = 578.6 / 792
## Misclassification error rate: 0.165 = 132 / 800
summary(prune.OJtree)
##
## Classification tree:
## snip.tree(tree = OJ.regtree, nodes = c(3L, 2L))
## Variables actually used in tree construction:
## [1] "LoyalCH"
## Number of terminal nodes: 2
## Residual mean deviance: 0.9115 = 727.4 / 798
## Misclassification error rate: 0.1825 = 146 / 800
The training error rate of the pruned tree (0.1825) is a little higher than that of the unpruned tree (0.165). This is expected: pruning removes splits, leaving fewer decision rules to distinguish CH purchases from MM purchases in the training data.
prune.OJtreepred=predict(prune.OJtree, OJ.test, type="class")
table(prune.OJtreepred, OJ.test$Purchase)
##
## prune.OJtreepred CH MM
## CH 119 30
## MM 40 81
# Unpruned tree: test error rate
1-((147+62)/270)
## [1] 0.2259259
# Pruned tree: test error rate
1-((119+81)/270)
## [1] 0.2592593
For this problem, the pruned tree produced a higher test error rate, about 26% versus 23% for the unpruned tree. However, the pruned tree is far easier to interpret, since it needs only a single variable (LoyalCH) to decide which brand a customer will purchase.
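For a quick side-by-side comparison, both test error rates can also be computed directly from the stored predictions (my own sketch):
c(unpruned = mean(OJtree.pred != OJ.test$Purchase),
  pruned   = mean(prune.OJtreepred != OJ.test$Purchase))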