Chapter 08 (page 332): 3, 8, 9

Chapter 8 ex 3

p = seq(0, 1, 0.01)
# Two-class impurity measures as a function of the proportion p in the first class
gini = 2 * p * (1 - p)
entropy = -(p * log(p) + (1 - p) * log(1 - p))  # NaN at exactly p = 0 and p = 1; the limit is 0
class.err = 1 - pmax(p, 1 - p)
matplot(p, cbind(gini, entropy, class.err), type = "l", lty = 1,
        col = c("pink", "purple", "blue"))
legend("topright", legend = c("Gini", "Entropy", "Classification error"),
       col = c("pink", "purple", "blue"), lty = 1)

Lovaas answer ex 3: As p approaches 0 and 1, all three values approach 0. This implies that as the purity of the classification increases, these values decrease. Thus, the lower the classification error, Gini index, or entropy, the better.
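
A quick numerical check (a sketch using the same formulas as above; p.check is just an illustrative name): all three measures peak at p = 0.5, where the node is least pure, and shrink toward 0 at the extremes. Note that the entropy expression is NaN at exactly p = 0 and p = 1 in R because of 0 * log(0), even though its limit there is 0.

p.check = c(0.01, 0.25, 0.5, 0.75, 0.99)
rbind(gini      = 2 * p.check * (1 - p.check),
      entropy   = -(p.check * log(p.check) + (1 - p.check) * log(1 - p.check)),
      class.err = 1 - pmax(p.check, 1 - p.check))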

Chapter 8 ex 8

In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.

library(ISLR)
## Warning: package 'ISLR' was built under R version 4.0.5
attach(Carseats)
set.seed(42)

train = sample(dim(Carseats)[1], dim(Carseats)[1]/2)
carseats.train = Carseats[train, ]
carseats.test = Carseats[-train, ]
library(tree)
## Warning: package 'tree' was built under R version 4.0.5
tree.carseats <- tree(formula = Sales ~ ., data = carseats.train)
plot(tree.carseats)
text(tree.carseats, pretty = 1)

tree.carseats
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##  1) root 200 1575.000  7.642  
##    2) ShelveLoc: Bad,Medium 154  926.400  6.904  
##      4) Price < 124.5 100  486.400  7.679  
##        8) Age < 50.5 41  145.600  8.813  
##         16) Income < 60.5 18   38.490  7.585 *
##         17) Income > 60.5 23   58.720  9.774 *
##        9) Age > 50.5 59  251.500  6.891  
##         18) Price < 97.5 20   57.420  7.976  
##           36) Advertising < 1.5 9   22.880  6.914 *
##           37) Advertising > 1.5 11   16.080  8.845 *
##         19) Price > 97.5 39  158.400  6.334  
##           38) CompPrice < 122.5 21   58.920  5.260  
##             76) ShelveLoc: Bad 8   25.340  4.124 *
##             77) ShelveLoc: Medium 13   16.900  5.959 *
##           39) CompPrice > 122.5 18   46.960  7.588  
##             78) Advertising < 8.5 11   15.690  6.623 *
##             79) Advertising > 8.5 7    4.921  9.104 *
##      5) Price > 124.5 54  268.600  5.469  
##       10) CompPrice < 130.5 18   47.140  3.714  
##         20) Population < 246.5 5   15.340  2.180 *
##         21) Population > 246.5 13   15.520  4.304 *
##       11) CompPrice > 130.5 36  138.300  6.346  
##         22) ShelveLoc: Bad 11   28.160  4.781 *
##         23) ShelveLoc: Medium 25   71.390  7.034  
##           46) Population < 174.5 11   24.460  6.094 *
##           47) Population > 174.5 14   29.540  7.774 *
##    3) ShelveLoc: Good 46  283.600 10.110  
##      6) Price < 113 18   87.710 11.990  
##       12) Urban: No 7   24.460 10.340 *
##       13) Urban: Yes 11   32.270 13.030 *
##      7) Price > 113 28   92.190  8.910  
##       14) Price < 135 18   53.950  9.664  
##         28) CompPrice < 132.5 10   10.790  8.725 *
##         29) CompPrice > 132.5 8   23.310 10.840 *
##       15) Price > 135 10    9.551  7.552 *
summary(tree.carseats)
## 
## Regression tree:
## tree(formula = Sales ~ ., data = carseats.train)
## Variables actually used in tree construction:
## [1] "ShelveLoc"   "Price"       "Age"         "Income"      "Advertising"
## [6] "CompPrice"   "Population"  "Urban"      
## Number of terminal nodes:  18 
## Residual mean deviance:  2.266 = 412.4 / 182 
## Distribution of residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -3.46100 -1.01400 -0.09829  0.00000  1.06900  3.68600
pred.carseats = predict(tree.carseats, carseats.test)
mean((carseats.test$Sales - pred.carseats)^2)
## [1] 5.686401

Lovaas answer ex 8b: the test MSE is 5.69!

cv.carseats = cv.tree(tree.carseats)  # cross-validation to choose the tree size
plot(cv.carseats$size, cv.carseats$dev, type = "b")

Lovaas note ex 8c: the best size is either 6 or 11. I’ll go with 11.
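
Rather than eyeballing the plot, the best size can be pulled out programmatically from the cv.tree object (a quick sketch using the cv.carseats object already created above):

cv.carseats$size[which.min(cv.carseats$dev)]  # size with the lowest CV deviance
cv.carseats$size[order(cv.carseats$dev)]      # all sizes, best first, to check for near-ties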

pruned.carseats = prune.tree(tree.carseats, best = 11)
plot(pruned.carseats)
text(pruned.carseats, pretty = 0)

pred.carseatsprune = predict(pruned.carseats, carseats.test)
mean((carseats.test$Sales - pred.carseatsprune)^2)
## [1] 5.571655

Definitely a lot more readable now! The test MSE is a smidge lower than it was (5.57 vs. 5.69), so pruning helps slightly here.

library(randomForest)
## Warning: package 'randomForest' was built under R version 4.0.5
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
set.seed(42)
bag.carseats = randomForest(Sales ~ ., data = carseats.train,
                            mtry = 10, importance = TRUE)
#bag.carseats
predict.bag = predict(bag.carseats, carseats.test)
mean((predict.bag-carseats.test$Sales)^2)
## [1] 2.385845
importance(bag.carseats)
##                %IncMSE IncNodePurity
## CompPrice   30.4193446    199.584719
## Income      10.1511002    103.836558
## Advertising 16.7343552    104.707939
## Population   3.6597098     61.436124
## Price       54.4799140    430.943270
## ShelveLoc   57.7543087    433.821848
## Age         11.5936567    135.079537
## Education    0.4079986     41.991442
## Urban        3.0021409      8.612624
## US          -0.5142460      5.976240

Lovaas answer ex 8d: Test MSE is 2.39. The top three most important variables are ShelveLoc, Price, and (a distant third) CompPrice.
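
The same ranking can also be read off a plot; varImpPlot() from the randomForest package draws %IncMSE and IncNodePurity side by side (sketch only, output not reproduced here):

varImpPlot(bag.carseats)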

set.seed(42)
rf.carseats = randomForest(Sales ~ ., data = carseats.train, mtry = 6, importance = TRUE)
predict.rf = predict(rf.carseats, carseats.test)
mean((predict.rf-carseats.test$Sales)^2)
## [1] 2.370883
importance(rf.carseats)
##               %IncMSE IncNodePurity
## CompPrice   22.527728    171.876245
## Income       6.314712    113.177670
## Advertising 10.951799    117.205292
## Population   1.894481     76.627787
## Price       41.519995    396.963595
## ShelveLoc   50.395736    408.490698
## Age          8.768635    150.251870
## Education    2.867641     56.051377
## Urban        2.691405      9.887319
## US           1.730384      9.343072

Lovaas answer ex 8e: Test MSE is 2.37, slightly lower than we got with bagging. The top three most important variables are the same as with bagging, with CompPrice still a distant third. Varying m changes the test MSE; in these runs, m = 6 did slightly better than using all ten predictors (bagging), and the sketch below shows how to check other values of m.
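
To back up the point about m, here is a sketch (not run for this write-up; mtry.vals and test.mse are illustrative names) that refits the random forest for each value of mtry and records the test MSE:

mtry.vals = 1:10
test.mse = sapply(mtry.vals, function(m) {
  set.seed(42)
  rf = randomForest(Sales ~ ., data = carseats.train, mtry = m)
  mean((predict(rf, carseats.test) - carseats.test$Sales)^2)
})
plot(mtry.vals, test.mse, type = "b", xlab = "mtry", ylab = "Test MSE")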

Chapter 8 ex 9

This problem involves the OJ data set which is part of the ISLR package.

attach(OJ)
set.seed(42)

train = sample(dim(OJ)[1], 800)
oj.train = OJ[train, ]
oj.test = OJ[-train, ]
tree.oj <- tree(formula = Purchase ~ ., data = oj.train)

summary(tree.oj)
## 
## Classification tree:
## tree(formula = Purchase ~ ., data = oj.train)
## Variables actually used in tree construction:
## [1] "LoyalCH"     "SalePriceMM" "PriceDiff"  
## Number of terminal nodes:  8 
## Residual mean deviance:  0.7392 = 585.5 / 792 
## Misclassification error rate: 0.1638 = 131 / 800
trainpred.oj = predict(tree.oj, oj.train, type = "class")
table(oj.train$Purchase, trainpred.oj)
##     trainpred.oj
##       CH  MM
##   CH 422  70
##   MM  61 247
(70+61)/(800)
## [1] 0.16375

Lovaas answer ex 9b: The model splits on LoyalCH, SalePriceMM, and PriceDiff. Training (not test!!) error rate is (70+61)/(800) = 16.375%. The model has 8 terminal nodes.
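
The same training error rate can be computed directly from the predictions, as a quick check rather than counting confusion-matrix cells by hand:

mean(trainpred.oj != oj.train$Purchase)  # 0.16375, matching (70+61)/800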

tree.oj
## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 800 1066.00 CH ( 0.61500 0.38500 )  
##    2) LoyalCH < 0.48285 285  296.00 MM ( 0.21404 0.78596 )  
##      4) LoyalCH < 0.064156 64    0.00 MM ( 0.00000 1.00000 ) *
##      5) LoyalCH > 0.064156 221  260.40 MM ( 0.27602 0.72398 )  
##       10) SalePriceMM < 2.04 128  123.50 MM ( 0.18750 0.81250 ) *
##       11) SalePriceMM > 2.04 93  125.00 MM ( 0.39785 0.60215 ) *
##    3) LoyalCH > 0.48285 515  458.10 CH ( 0.83689 0.16311 )  
##      6) LoyalCH < 0.753545 230  282.70 CH ( 0.69565 0.30435 )  
##       12) PriceDiff < 0.265 149  203.00 CH ( 0.57718 0.42282 )  
##         24) PriceDiff < -0.165 32   38.02 MM ( 0.28125 0.71875 ) *
##         25) PriceDiff > -0.165 117  150.30 CH ( 0.65812 0.34188 )  
##           50) LoyalCH < 0.703993 105  139.60 CH ( 0.61905 0.38095 ) *
##           51) LoyalCH > 0.703993 12    0.00 CH ( 1.00000 0.00000 ) *
##       13) PriceDiff > 0.265 81   47.66 CH ( 0.91358 0.08642 ) *
##      7) LoyalCH > 0.753545 285  111.70 CH ( 0.95088 0.04912 ) *

Lovaas answer ex 9c: Choosing node 10, a terminal node reached by the split SalePriceMM < 2.04 (after satisfying the LoyalCH splits higher up the tree). The model classifies buyers in this node as “MM”: about 81% of the 128 training observations here purchased Minute Maid.

Create a plot of the tree, and interpret the results.

plot(tree.oj)
text(tree.oj, pretty = 1)

Lovaas answer ex 9d: IDK what to tell you that you don’t already know. If you take a random data point and follow this tree, it will tell you, based on LoyalCH, SalePriceMM, and PriceDiff, whether that customer is more likely to buy Citrus Hill (CH) or Minute Maid (MM) orange juice. Brand loyalty (LoyalCH) drives the top splits, with the price variables refining the prediction further down.

testpred.oj = predict(tree.oj, oj.test, type = "class")
table(oj.test$Purchase, testpred.oj)
##     testpred.oj
##       CH  MM
##   CH 125  36
##   MM  15  94
(15+36)/(125+36+15+94)
## [1] 0.1888889

Lovaas answer ex 9e: the test (not train!) error rate is (15+36)/270 = 18.89%.

cv.oj = cv.tree(tree.oj, FUN = prune.tree)
summary(cv.oj)
##        Length Class  Mode     
## size   8      -none- numeric  
## dev    8      -none- numeric  
## k      8      -none- numeric  
## method 1      -none- character

Lovaas answer ex 9f: based on the cross-validation results (plotted below), the optimal size is 8.

plot(cv.oj$size, cv.oj$dev, type = "b", xlab = "Tree Size", ylab = "Deviance")

Which tree size corresponds to the lowest cross-validated classification error rate?

Lovaas answer ex 9h: Seven or eight; I will go with seven.
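
Since this question asks about the classification error rate specifically, a misclassification-based cross-validation would be the more direct tool. A sketch (output not reproduced here; cv.oj.misclass is an illustrative name) using prune.misclass from the tree package:

cv.oj.misclass = cv.tree(tree.oj, FUN = prune.misclass)
# dev now counts CV misclassifications instead of deviance
plot(cv.oj.misclass$size, cv.oj.misclass$dev, type = "b",
     xlab = "Tree Size", ylab = "CV Misclassifications")
cv.oj.misclass$size[which.min(cv.oj.misclass$dev)]
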
oj.pruned = prune.tree(tree.oj, best = 7)
summary(oj.pruned)
## 
## Classification tree:
## snip.tree(tree = tree.oj, nodes = 25L)
## Variables actually used in tree construction:
## [1] "LoyalCH"     "SalePriceMM" "PriceDiff"  
## Number of terminal nodes:  7 
## Residual mean deviance:  0.7518 = 596.2 / 793 
## Misclassification error rate: 0.1638 = 131 / 800
summary(tree.oj)
## 
## Classification tree:
## tree(formula = Purchase ~ ., data = oj.train)
## Variables actually used in tree construction:
## [1] "LoyalCH"     "SalePriceMM" "PriceDiff"  
## Number of terminal nodes:  8 
## Residual mean deviance:  0.7392 = 585.5 / 792 
## Misclassification error rate: 0.1638 = 131 / 800

Lovaas answer ex 9j: the training misclassification rate is the same for both trees (131/800 = 16.375%)! That’s fun.

pred.unpruned = predict(tree.oj, oj.test, type = "class")
misclass.unpruned = sum(oj.test$Purchase != pred.unpruned)
misclass.unpruned/length(pred.unpruned)
## [1] 0.1888889
pred.pruned = predict(oj.pruned, oj.test, type = "class")
misclass.pruned = sum(oj.test$Purchase != pred.pruned)
misclass.pruned/length(pred.pruned)
## [1] 0.1888889

Lovaas answer ex 9k: Test misclassification rates are also the same! That makes sense: the split that was pruned away (at node 25) had both child nodes predicting CH, just like their parent, so the class predictions don’t change at all. I should have pruned down to 5 nodes just for the sake of this question; see the sketch below.
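
Following up on that last thought, a five-node pruned tree would only take a few lines (a sketch; oj.pruned5 and pred.pruned5 are illustrative names, and the results are not shown since this wasn’t run for the write-up):

oj.pruned5 = prune.misclass(tree.oj, best = 5)
plot(oj.pruned5)
text(oj.pruned5, pretty = 0)
pred.pruned5 = predict(oj.pruned5, oj.test, type = "class")
mean(oj.test$Purchase != pred.pruned5)  # test error rate of the 5-node tree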