Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of \(\hat{p}_{m1}\). The x-axis should display \(\hat{p}_{m1}\), ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy. Hint: In a setting with two classes, \(\hat{p}_{m1} = 1 - \hat{p}_{m2}\). You could make this plot by hand, but it will be much easier to make in R.
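For two classes with \(\hat{p}_{m1} = p\) and \(\hat{p}_{m2} = 1 - p\), the three measures reduce to the Gini index \(G = 2p(1-p)\), the classification error \(E = 1 - \max(p, 1-p)\), and the entropy \(D = -p\log p - (1-p)\log(1-p)\); the code below simply evaluates these on a grid of \(p\) values.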
prob <- c(0:100) / 100
# Gini index: sum over both classes of p(1 - p), i.e. 2 p (1 - p)
Gini_index <- 2 * prob * (1 - prob)
# Classification error: 1 - max(p, 1 - p)
Classify_error <- 1 - pmax(prob, 1 - prob)
# Entropy: -p log(p) - (1 - p) log(1 - p); the endpoints give 0 * log(0) = NaN in R,
# so those two points are simply not drawn
Cross_entropy <- -(prob * log(prob) + (1 - prob) * log(1 - prob))
matplot(prob, cbind(Gini_index, Classify_error, Cross_entropy),
col = c("purple", "maroon", "darkgreen"),
xlab = "Probability", ylab = "Value of impurity measure",
main = "Gini index, classification error, and entropy vs. probability",
cex = 1, pch = c(20, 21, 22))
legend("bottom", cex=0.75,inset=0.05, legend=c("Gini", "C_error", "Entropy"),
pch=c(20,21,22), col=c("purple","maroon","darkgreen"), horiz=TRUE)
In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.
library(ISLR)
library(tree)
## Warning: package 'tree' was built under R version 4.1.3
## Registered S3 method overwritten by 'tree':
## method from
## print.tree cli
library(MASS)
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.1.3
## randomForest 4.7-1
## Type rfNews() to see new features/changes/bug fixes.
set.seed(1)
train <- sample(1 : nrow(Carseats), nrow(Carseats) / 2)
tree.carseats <- tree(Sales ~ ., data = Carseats, subset = train)
plot(tree.carseats)
text(tree.carseats, pretty = 0)
tree.pred <- predict(tree.carseats, Carseats[-train, ])
mean((Carseats[-train, 'Sales'] - tree.pred) ^ 2)
## [1] 4.922039
set.seed(21)
cv.carseats=cv.tree(tree.carseats)
plot(cv.carseats, type = "b")
# Best size = 8
abline(h = min(cv.carseats$dev) + 0.2 * sd(cv.carseats$dev), col = "red", lty = 2)
points(cv.carseats$size[which.min(cv.carseats$dev)], min(cv.carseats$dev),
col = "#BC3C29FF", cex = 2, pch = 20)
prune.carseats = prune.tree(tree.carseats, best = 8)
plot(prune.carseats)
text(prune.carseats, pretty = 0)
tree.pred=predict(prune.carseats,Carseats[-train,])
mean((tree.pred-Carseats[-train,'Sales'])^2)
## [1] 5.113254
In this case, pruning the tree slightly increases the test MSE (from about 4.92 to 5.11).
library(randomForest)
set.seed(1)
# Using all 10 predictors at each split (mtry = p) corresponds to bagging
bag.carseats <- randomForest(Sales ~ ., data = Carseats, subset = train, mtry = ncol(Carseats) - 1, importance = TRUE)
yhat.bag <- predict(bag.carseats, newdata = Carseats[-train, ])
mean((yhat.bag - Carseats[-train, 'Sales']) ^ 2)
## [1] 2.605253
importance(bag.carseats)
## %IncMSE IncNodePurity
## CompPrice 24.8888481 170.182937
## Income 4.7121131 91.264880
## Advertising 12.7692401 97.164338
## Population -1.8074075 58.244596
## Price 56.3326252 502.903407
## ShelveLoc 48.8886689 380.032715
## Age 17.7275460 157.846774
## Education 0.5962186 44.598731
## Urban 0.1728373 9.822082
## US 4.2172102 18.073863
The two most important predictors are Price and ShelveLoc. With bagging, the test MSE is about 2.61.
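The importance ranking is often easier to read as a plot; the randomForest package provides varImpPlot() for this:
# Dot charts of %IncMSE and IncNodePurity for the bagged model
varImpPlot(bag.carseats)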
rf.carseats <- randomForest(Sales ~ ., data = Carseats, subset = train, importance = TRUE)
yhat.rf <- predict(rf.carseats, Carseats[-train, ])
mean((yhat.rf - Carseats[-train, 'Sales']) ^ 2)
## [1] 3.054306
importance(rf.carseats)
## %IncMSE IncNodePurity
## CompPrice 12.9540442 157.53376
## Income 2.1683293 129.18612
## Advertising 8.7289900 111.38250
## Population -2.5290493 102.78681
## Price 33.9482500 393.61313
## ShelveLoc 34.1358807 289.28756
## Age 12.0804387 172.03776
## Education 0.2213600 72.02479
## Urban 0.9793293 14.73763
## US 4.1072742 33.91622
The default m for a regression problem is \(p/3\), where p is the number of predictors. In this setting, the test MSE is about 3.05, higher than with bagging but lower than with the unpruned and pruned regression trees.
The two most important predictors are the same as with bagging: Price and ShelveLoc.
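To see how sensitive the random forest is to the choice of m, one could refit it for a few values of mtry and compare test MSEs. A rough sketch, reusing train from above (mtry.grid and rf.mse are illustrative names):
# Test MSE as a function of mtry; mtry = 10 reproduces bagging
mtry.grid <- c(3, 5, 7, 10)
rf.mse <- sapply(mtry.grid, function(m) {
  fit <- randomForest(Sales ~ ., data = Carseats, subset = train, mtry = m)
  mean((predict(fit, Carseats[-train, ]) - Carseats[-train, 'Sales'])^2)
})
names(rf.mse) <- mtry.grid
rf.mse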
This problem involves the OJ data set, which is part of the ISLR package. (a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
library(ISLR)
library(tree)
set.seed(1)
train <- sample(1 : nrow(OJ), 800)
oj.train <- OJ[train, ]
oj.test <- OJ[-train, ]
oj.tree <- tree(Purchase ~ ., OJ, subset = train)
summary(oj.tree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ, subset = train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
The tree uses five variables: LoyalCH, PriceDiff, SpecialCH, ListPriceDiff, and PctDiscMM. The training error rate is 0.1588, and the tree has 9 terminal nodes.
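The residual mean deviance reported by summary() is the total deviance divided by \(n - |T_0|\), i.e. 587.8 / (800 - 9) = 587.8 / 791 ≈ 0.743 here.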
oj.tree
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1073.00 CH ( 0.60625 0.39375 )
## 2) LoyalCH < 0.5036 365 441.60 MM ( 0.29315 0.70685 )
## 4) LoyalCH < 0.280875 177 140.50 MM ( 0.13559 0.86441 )
## 8) LoyalCH < 0.0356415 59 10.14 MM ( 0.01695 0.98305 ) *
## 9) LoyalCH > 0.0356415 118 116.40 MM ( 0.19492 0.80508 ) *
## 5) LoyalCH > 0.280875 188 258.00 MM ( 0.44149 0.55851 )
## 10) PriceDiff < 0.05 79 84.79 MM ( 0.22785 0.77215 )
## 20) SpecialCH < 0.5 64 51.98 MM ( 0.14062 0.85938 ) *
## 21) SpecialCH > 0.5 15 20.19 CH ( 0.60000 0.40000 ) *
## 11) PriceDiff > 0.05 109 147.00 CH ( 0.59633 0.40367 ) *
## 3) LoyalCH > 0.5036 435 337.90 CH ( 0.86897 0.13103 )
## 6) LoyalCH < 0.764572 174 201.00 CH ( 0.73563 0.26437 )
## 12) ListPriceDiff < 0.235 72 99.81 MM ( 0.50000 0.50000 )
## 24) PctDiscMM < 0.196197 55 73.14 CH ( 0.61818 0.38182 ) *
## 25) PctDiscMM > 0.196197 17 12.32 MM ( 0.11765 0.88235 ) *
## 13) ListPriceDiff > 0.235 102 65.43 CH ( 0.90196 0.09804 ) *
## 7) LoyalCH > 0.764572 261 91.20 CH ( 0.95785 0.04215 ) *
Node 7 is a terminal node containing 261 training observations, namely those with LoyalCH > 0.764572 (which implies LoyalCH > 0.5036). The deviance of this node is 91.20. Its predicted value of Purchase is CH, because about 95.8% of the observations in the node have Purchase equal to CH.
plot(oj.tree)
text(oj.tree, pretty = 0)
The variable LoyalCH is the most important: the top three splits all use it. If LoyalCH > 0.764572 the tree predicts CH, and for 0.5036 < LoyalCH < 0.764572 the prediction depends on ListPriceDiff and PctDiscMM. If LoyalCH < 0.5036 most terminal nodes predict MM, although PriceDiff and SpecialCH can flip the prediction to CH. Some splits separate leaves that share the same predicted class; these splits still increase node purity, since the resulting leaves differ in their predicted class probabilities, as the short check below illustrates.
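One way to see this is to look at the fitted class probabilities rather than the class labels, for instance:
# Predicted class probabilities for the first few test observations;
# leaves with the same class label can still carry very different probabilities
head(predict(oj.tree, oj.test))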
oj.pred <- predict(oj.tree, oj.test, type = 'class')
table(oj.pred, oj.test$Purchase)
##
## oj.pred CH MM
## CH 160 38
## MM 8 64
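The overall test error rate implied by this confusion matrix can also be computed directly, for instance:
# Fraction of misclassified test observations for the unpruned tree
mean(oj.pred != oj.test$Purchase)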
set.seed(1)
oj.cv <- cv.tree(oj.tree, FUN = prune.misclass)
oj.cv
## $size
## [1] 9 8 7 4 2 1
##
## $dev
## [1] 145 145 146 146 167 315
##
## $k
## [1] -Inf 0.000000 3.000000 4.333333 10.500000 151.000000
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
plot(oj.cv$size, oj.cv$dev, xlab="Size of the Tree", ylab="Cross validated error rate", type = "b")
points(oj.cv$size[which.min(oj.cv$dev)], min(oj.cv$dev), col = "red", cex = 2, pch = 20)
Tree sizes 8 and 9 tie for the lowest cross-validated classification error (deviance 145); which.min() selects size 9, the first of the two.
s <- oj.cv$size[which.min(oj.cv$dev)]
s
## [1] 9
# CV does not favor pruning, so we instead examine a pruned tree with five terminal nodes
prune.OJ <- prune.tree(oj.tree, best = 5)
plot(prune.OJ)
text(prune.OJ, pretty = 0)
summary("oj.pruned")
## Length Class Mode
## 1 character character
The training error rate for the pruned tree is higher than for the unpruned tree.
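This can be confirmed by computing the training error rates of the two trees directly, e.g.:
# Training error rates: unpruned vs. pruned tree
mean(predict(oj.tree, oj.train, type = "class") != oj.train$Purchase)
mean(predict(prune.OJ, oj.train, type = "class") != oj.train$Purchase)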
For the test error comparison, first the unpruned tree:
oj.pred <- predict(oj.tree, oj.test, type = 'class')
table(oj.pred, oj.test$Purchase)
##
## oj.pred CH MM
## CH 160 38
## MM 8 64
The test error rate for the unpruned tree is 0.1740741.
And then the pruned tree:
oj.pruned.pred <- predict(prune.OJ, oj.test, type = 'class')
table(oj.pruned.pred, oj.test$Purchase)
The test error rate for the pruned tree is 0.1851852.
So in this case, the test error rate for the pruned tree is also slightly higher than for the unpruned tree.