# Grid of class proportions p for a two-class problem
p = seq(0, 1, 0.01)
# Gini index, entropy, and classification error as functions of p
gini = 2 * p * (1 - p)
entropy = -(p * log(p) + (1 - p) * log(1 - p))  # NaN at p = 0 and p = 1, so those two points are skipped
class.err = 1 - pmax(p, 1 - p)
matplot(p, cbind(gini, entropy, class.err), col = c("pink", "purple", "blue"),
        ylab = "impurity measure")
legend("topright", legend = c("Gini", "Entropy", "Class. error"),
       col = c("pink", "purple", "blue"), pch = c("1", "2", "3"))
Lovaas answer ex 3: As p approaches 0 and 1, all three values approach 0. This implies that as the purity of the classification increases, these values decrease. Thus, the lower the classification error, Gini index, or entropy, the better.
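A quick numeric check of this claim (a small sketch; p.check and the values 0.01, 0.5, 0.99 are just illustrative points, not part of the exercise):
# All three impurity measures are near 0 at p = 0.01 and p = 0.99 and largest at p = 0.5
p.check = c(0.01, 0.5, 0.99)
data.frame(p = p.check,
           gini = 2 * p.check * (1 - p.check),
           entropy = -(p.check * log(p.check) + (1 - p.check) * log(1 - p.check)),
           class.err = 1 - pmax(p.check, 1 - p.check))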
In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.0.5
attach(Carseats)
set.seed(42)
train = sample(dim(Carseats)[1], dim(Carseats)[1]/2)
carseats.train = Carseats[train, ]
carseats.test = Carseats[-train, ]
library(tree)
## Warning: package 'tree' was built under R version 4.0.5
tree.carseats <- tree(formula = Sales ~ ., data = carseats.train)
plot(tree.carseats)
text(tree.carseats, pretty = 1)
tree.carseats
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 200 1575.000 7.642
## 2) ShelveLoc: Bad,Medium 154 926.400 6.904
## 4) Price < 124.5 100 486.400 7.679
## 8) Age < 50.5 41 145.600 8.813
## 16) Income < 60.5 18 38.490 7.585 *
## 17) Income > 60.5 23 58.720 9.774 *
## 9) Age > 50.5 59 251.500 6.891
## 18) Price < 97.5 20 57.420 7.976
## 36) Advertising < 1.5 9 22.880 6.914 *
## 37) Advertising > 1.5 11 16.080 8.845 *
## 19) Price > 97.5 39 158.400 6.334
## 38) CompPrice < 122.5 21 58.920 5.260
## 76) ShelveLoc: Bad 8 25.340 4.124 *
## 77) ShelveLoc: Medium 13 16.900 5.959 *
## 39) CompPrice > 122.5 18 46.960 7.588
## 78) Advertising < 8.5 11 15.690 6.623 *
## 79) Advertising > 8.5 7 4.921 9.104 *
## 5) Price > 124.5 54 268.600 5.469
## 10) CompPrice < 130.5 18 47.140 3.714
## 20) Population < 246.5 5 15.340 2.180 *
## 21) Population > 246.5 13 15.520 4.304 *
## 11) CompPrice > 130.5 36 138.300 6.346
## 22) ShelveLoc: Bad 11 28.160 4.781 *
## 23) ShelveLoc: Medium 25 71.390 7.034
## 46) Population < 174.5 11 24.460 6.094 *
## 47) Population > 174.5 14 29.540 7.774 *
## 3) ShelveLoc: Good 46 283.600 10.110
## 6) Price < 113 18 87.710 11.990
## 12) Urban: No 7 24.460 10.340 *
## 13) Urban: Yes 11 32.270 13.030 *
## 7) Price > 113 28 92.190 8.910
## 14) Price < 135 18 53.950 9.664
## 28) CompPrice < 132.5 10 10.790 8.725 *
## 29) CompPrice > 132.5 8 23.310 10.840 *
## 15) Price > 135 10 9.551 7.552 *
summary(tree.carseats)
##
## Regression tree:
## tree(formula = Sales ~ ., data = carseats.train)
## Variables actually used in tree construction:
## [1] "ShelveLoc" "Price" "Age" "Income" "Advertising"
## [6] "CompPrice" "Population" "Urban"
## Number of terminal nodes: 18
## Residual mean deviance: 2.266 = 412.4 / 182
## Distribution of residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.46100 -1.01400 -0.09829 0.00000 1.06900 3.68600
pred.carseats = predict(tree.carseats, carseats.test)
mean((carseats.test$Sales - pred.carseats)^2)
## [1] 5.686401
Lovaas answer ex 8b: The test MSE is 5.69!
cv.carseats=cv.tree(tree.carseats)
plot(cv.carseats$size,cv.carseats$dev,type="b")
Lovaas note: best size is either 6 or 11. I’ll go with 11.
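The same choice can be made programmatically from the cv.tree output instead of reading it off the plot; this is just a sketch (best.size is a new illustrative name):
# Tree size with the smallest cross-validated deviance
best.size = cv.carseats$size[which.min(cv.carseats$dev)]
best.size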
pruned.carseats = prune.tree(tree.carseats, best = 11)
plot(pruned.carseats)
text(pruned.carseats, pretty = 0)
pred.carseatsprune = predict(pruned.carseats, carseats.test)
mean((carseats.test$Sales - pred.carseatsprune)^2)
## [1] 5.571655
Definitely a lot more readable now! The test MSE is just a smidge lower than it was (5.57 vs. 5.69).
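For reference, the two test MSEs computed above can be printed side by side (a small convenience sketch using the prediction objects already created):
# Unpruned vs. pruned test MSE on the Carseats test set
c(unpruned = mean((carseats.test$Sales - pred.carseats)^2),
  pruned = mean((carseats.test$Sales - pred.carseatsprune)^2))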
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.0.5
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
set.seed(42)
bag.carseats = randomForest(Sales ~ ., data = carseats.train,
                            mtry = 10, importance = TRUE)
#bag.carseats
predict.bag = predict(bag.carseats ,carseats.test)
mean((predict.bag-carseats.test$Sales)^2)
## [1] 2.385845
importance(bag.carseats)
## %IncMSE IncNodePurity
## CompPrice 30.4193446 199.584719
## Income 10.1511002 103.836558
## Advertising 16.7343552 104.707939
## Population 3.6597098 61.436124
## Price 54.4799140 430.943270
## ShelveLoc 57.7543087 433.821848
## Age 11.5936567 135.079537
## Education 0.4079986 41.991442
## Urban 3.0021409 8.612624
## US -0.5142460 5.976240
Lovaas answer ex 8d: Test MSE is 2.39. The top three most important variables are ShelveLoc, Price, and (a distant third) CompPrice.
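The importance() table can also be shown graphically; varImpPlot() from the randomForest package draws the %IncMSE and IncNodePurity rankings as dot plots, which makes the gap between ShelveLoc/Price and the remaining predictors easy to see (plot not shown here):
# Dot plots of the two variable-importance measures for the bagged model
varImpPlot(bag.carseats)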
set.seed(42)
rf.carseats = randomForest(Sales ~ ., data = carseats.train, mtry = 6, importance = TRUE)
predict.rf = predict(rf.carseats,carseats.test)
mean((predict.rf-carseats.test$Sales)^2)
## [1] 2.370883
importance(rf.carseats)
## %IncMSE IncNodePurity
## CompPrice 22.527728 171.876245
## Income 6.314712 113.177670
## Advertising 10.951799 117.205292
## Population 1.894481 76.627787
## Price 41.519995 396.963595
## ShelveLoc 50.395736 408.490698
## Age 8.768635 150.251870
## Education 2.867641 56.051377
## Urban 2.691405 9.887319
## US 1.730384 9.343072
Lovaas answer ex 8e: Test MSE is 2.37, slightly lower than we got with bagging. The top three most important variables are the same as with bagging, with CompPrice still a distant third. Playing around with m changes the test MSE: here m = 6 gives a slightly lower test MSE (2.37) than full bagging with m = 10 (2.39). A quick sweep over a few values of m is sketched below.
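This sweep is only a sketch: the grid mtry.grid, the object names, and the reuse of seed 42 are illustrative choices, and no results are shown.
# Test MSE for a random forest at several values of mtry
mtry.grid = c(2, 4, 6, 8, 10)
mse.by.mtry = sapply(mtry.grid, function(m) {
  set.seed(42)
  fit = randomForest(Sales ~ ., data = carseats.train, mtry = m)
  mean((predict(fit, carseats.test) - carseats.test$Sales)^2)
})
names(mse.by.mtry) = paste0("mtry=", mtry.grid)
mse.by.mtry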
This problem involves the OJ data set which is part of the ISLR package.
attach(OJ)
set.seed(42)
train = sample(dim(OJ)[1], 800)
oj.train = OJ[train, ]
oj.test = OJ[-train, ]
tree.oj <- tree(formula = Purchase ~ ., data = oj.train)
summary(tree.oj)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = oj.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "SalePriceMM" "PriceDiff"
## Number of terminal nodes: 8
## Residual mean deviance: 0.7392 = 585.5 / 792
## Misclassification error rate: 0.1638 = 131 / 800
trainpred.oj = predict(tree.oj, oj.train, type = "class")
table(oj.train$Purchase, trainpred.oj)
## trainpred.oj
## CH MM
## CH 422 70
## MM 61 247
(70+61)/(800)
## [1] 0.16375
Lovaas answer ex 9b: The model splits on LoyalCH, SalePriceMM, and PriceDiff. Training (not test!!) error rate is (70+61)/(800) = 16.375%. The model has 8 terminal nodes.
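The same training error rate can also be computed directly from the predictions rather than summing the off-diagonal counts by hand:
# Training misclassification rate for the unpruned OJ tree
mean(trainpred.oj != oj.train$Purchase)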
tree.oj
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1066.00 CH ( 0.61500 0.38500 )
## 2) LoyalCH < 0.48285 285 296.00 MM ( 0.21404 0.78596 )
## 4) LoyalCH < 0.064156 64 0.00 MM ( 0.00000 1.00000 ) *
## 5) LoyalCH > 0.064156 221 260.40 MM ( 0.27602 0.72398 )
## 10) SalePriceMM < 2.04 128 123.50 MM ( 0.18750 0.81250 ) *
## 11) SalePriceMM > 2.04 93 125.00 MM ( 0.39785 0.60215 ) *
## 3) LoyalCH > 0.48285 515 458.10 CH ( 0.83689 0.16311 )
## 6) LoyalCH < 0.753545 230 282.70 CH ( 0.69565 0.30435 )
## 12) PriceDiff < 0.265 149 203.00 CH ( 0.57718 0.42282 )
## 24) PriceDiff < -0.165 32 38.02 MM ( 0.28125 0.71875 ) *
## 25) PriceDiff > -0.165 117 150.30 CH ( 0.65812 0.34188 )
## 50) LoyalCH < 0.703993 105 139.60 CH ( 0.61905 0.38095 ) *
## 51) LoyalCH > 0.703993 12 0.00 CH ( 1.00000 0.00000 ) *
## 13) PriceDiff > 0.265 81 47.66 CH ( 0.91358 0.08642 ) *
## 7) LoyalCH > 0.753545 285 111.70 CH ( 0.95088 0.04912 ) *
Lovaas answer ex 9c: Choosing node 10, a terminal node reached by LoyalCH < 0.48285, LoyalCH > 0.064156, and SalePriceMM < 2.04. The model classifies this group as "MM," meaning most of the 128 training observations at this node (about 81%) bought Minute Maid.
plot(tree.oj)
text(tree.oj, pretty = 1)
Lovaas answer ex 9d: IDK what to tell you that you don't already know. If you take a random data point and follow this tree, it will tell you, based on LoyalCH, SalePriceMM, and PriceDiff, whether that data point is more likely to buy CH (Citrus Hill) or MM (Minute Maid) OJ.
testpred.oj = predict(tree.oj, oj.test, type = "class")
table(oj.test$Purchase, testpred.oj)
## testpred.oj
## CH MM
## CH 125 36
## MM 15 94
(15+36)/(125+36+15+94)
## [1] 0.1888889
Lovaas answer ex 9e: The test (not training!) error rate is 18.89%.
cv.oj = cv.tree(tree.oj, FUN = prune.tree)
summary(cv.oj)
## Length Class Mode
## size 8 -none- numeric
## dev 8 -none- numeric
## k 8 -none- numeric
## method 1 -none- character
Lovaas answer ex 9f: optimal size is 8.
plot(cv.oj$size, cv.oj$dev, type = "b", xlab = "Tree Size", ylab = "Deviance")
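As with the Carseats tree, the deviance-minimizing size can be pulled out of cv.oj directly instead of being read off the plot (a small sketch):
# Tree size with the lowest cross-validated deviance
cv.oj$size[which.min(cv.oj$dev)]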
oj.pruned = prune.tree(tree.oj, best = 7)
summary(oj.pruned)
##
## Classification tree:
## snip.tree(tree = tree.oj, nodes = 25L)
## Variables actually used in tree construction:
## [1] "LoyalCH" "SalePriceMM" "PriceDiff"
## Number of terminal nodes: 7
## Residual mean deviance: 0.7518 = 596.2 / 793
## Misclassification error rate: 0.1638 = 131 / 800
summary(tree.oj)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = oj.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "SalePriceMM" "PriceDiff"
## Number of terminal nodes: 8
## Residual mean deviance: 0.7392 = 585.5 / 792
## Misclassification error rate: 0.1638 = 131 / 800
Lovaas answer ex 9j: The training misclassification rate is exactly the same (131/800)! That makes sense: pruning only collapsed the split below node 25, and both of its children (nodes 50 and 51) already predicted CH, so no fitted class changes.
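A quick way to confirm this is to check that the pruned and unpruned trees assign the same class to every training observation (a sketch using the fitted objects above):
# Fraction of training observations where the two trees disagree (0 if the collapsed leaves agreed)
mean(predict(oj.pruned, oj.train, type = "class") != predict(tree.oj, oj.train, type = "class"))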
pred.unpruned = predict(tree.oj, oj.test, type = "class")
misclass.unpruned = sum(oj.test$Purchase != pred.unpruned)
misclass.unpruned/length(pred.unpruned)
## [1] 0.1888889
pred.pruned = predict(oj.pruned, oj.test, type = "class")
misclass.pruned = sum(oj.test$Purchase != pred.pruned)
misclass.pruned/length(pred.pruned)
## [1] 0.1888889
Lovaas answer ex 9k: The test misclassification rates are also the same! The pruned tree makes exactly the same predictions as the unpruned one, because the collapsed split only separated two leaves that both predicted CH. I should have pruned down to 5 nodes just for the sake of this question.
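For completeness, a smaller tree that is more likely to change the predictions can be requested the same way. This is only a sketch of the call: oj.pruned5 and pred.pruned5 are new illustrative names, prune.misclass prunes on misclassification error rather than deviance, and no results are shown.
# Prune to 5 terminal nodes using misclassification error as the pruning criterion
oj.pruned5 = prune.misclass(tree.oj, best = 5)
pred.pruned5 = predict(oj.pruned5, oj.test, type = "class")
mean(oj.test$Purchase != pred.pruned5)  # test misclassification rate for the 5-node tree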