\(\textbf{Chapter 8}\)
\(\textbf{Question 3:}\) Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of \(\hat{p}_{m1}\). The x-axis should display \(\hat{p}_{m1}\), ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy.
Hint: In a setting with two classes, \(\hat{p}_{m1} = 1 - \hat{p}_{m2}\). You could make this plot by hand, but it will be much easier to make in R.
p <- seq(0, 1, 0.01)                              # p_hat_m1 over [0, 1]
gi <- 2 * p * (1 - p)                             # Gini index
c_err <- 1 - pmax(p, 1 - p)                       # classification error
entropy <- -(p * log(p) + (1 - p) * log(1 - p))   # entropy (NaN at 0 and 1 is simply not drawn)
matplot(p, cbind(gi, c_err, entropy), type = "l", lty = 1,
        col = c("blue", "grey", "black"), xlab = expression(hat(p)[m1]), ylab = "Value")
legend("top", c("Gini index", "Classification error", "Entropy"),
       col = c("blue", "grey", "black"), lty = 1)
\(\textbf{Question 8:}\) In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.
library(tree)
library(ISLR)
library(randomForest)
set.seed(5)
# Split Carseats into equal-sized training and test sets
training_sample <- sample(1:nrow(Carseats), nrow(Carseats) / 2)
training_ds <- Carseats[training_sample, ]
test_ds <- Carseats[-training_sample, ]
# Fit a regression tree for Sales on the training data and plot it
tree_fit <- tree(Sales ~ ., data = training_ds)
plot(tree_fit)
text(tree_fit, pretty = 0)
# Test MSE of the unpruned tree
prediction <- predict(tree_fit, newdata = test_ds)
mean((prediction - test_ds$Sales)^2)
## [1] 5.041825
\(\textbf{Response:}\) The resulting test MSE is 5.041825.
set.seed(6)
# Cross-validation over the cost-complexity pruning sequence
tree_cv <- cv.tree(tree_fit)
plot(tree_cv$size, tree_cv$dev, type = "b")
points(tree_cv$size[which.min(tree_cv$dev)], min(tree_cv$dev), col = "blue", cex = 2, pch = 20)
# Prune to 9 terminal nodes and recompute the test MSE
prune_tree <- prune.tree(tree_fit, best = 9)
plot(prune_tree)
text(prune_tree, pretty = 0)
prediction <- predict(prune_tree, newdata = test_ds)
mean((prediction - test_ds$Sales)^2)
## [1] 4.774217
\(\textbf{Response:}\) The test MSE decreased to 4.774217 after pruning the tree to 9 terminal nodes.
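The size of 9 was read off the CV plot by eye. As a minimal sketch (reusing tree_cv, tree_fit, and test_ds from above; best_size and prune_cv are illustrative names), the CV-selected size could instead be extracted and used programmatically:
# Prune to the size with the lowest cross-validated deviance
best_size <- tree_cv$size[which.min(tree_cv$dev)]
prune_cv <- prune.tree(tree_fit, best = best_size)
mean((predict(prune_cv, newdata = test_ds) - test_ds$Sales)^2)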
set.seed(6)
# Bagging: a random forest with mtry = p = 10, i.e. all predictors tried at each split
bagging <- randomForest(Sales ~ ., data = training_ds, mtry = 10, importance = TRUE)
prediction <- predict(bagging, newdata = test_ds)
mean((prediction - test_ds$Sales)^2)
## [1] 2.568033
importance(bagging)
## %IncMSE IncNodePurity
## CompPrice 24.1507039 195.333645
## Income 10.9173085 140.555948
## Advertising 15.4635744 116.386749
## Population -0.9377838 53.144288
## Price 42.8334785 352.388293
## ShelveLoc 56.6800070 531.259260
## Age 14.8566179 139.218229
## Education 2.0203536 51.393441
## Urban 1.6053019 8.602717
## US 1.8223005 6.866832
varImpPlot(bagging)
\(\textbf{Response:}\) Bagging reduces the test MSE to 2.568033, well below the 4.774217 of the pruned tree. As the importance output shows, ShelveLoc and Price appear to be by far the most important variables.
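To make that ranking explicit, the importance matrix can be sorted on the permutation measure (a small sketch; imp is an illustrative name):
# Rank predictors by %IncMSE, the permutation-based importance measure
imp <- importance(bagging)
imp[order(imp[, "%IncMSE"], decreasing = TRUE), ]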
# Random forests for m = 1, ..., 8 variables tried at each split
ms <- 1:8
mse <- rep(NA, length(ms))
for (i in seq_along(ms)) {
  m <- ms[i]
  rf_fit <- randomForest(Sales ~ ., data = training_ds, mtry = m, importance = TRUE)
  print(m)
  print(importance(rf_fit))
  prediction <- predict(rf_fit, newdata = test_ds)
  mse[i] <- mean((prediction - test_ds$Sales)^2)
}
## [1] 1
## %IncMSE IncNodePurity
## CompPrice 9.471782418 135.25511
## Income 5.523634254 120.56066
## Advertising 7.514128967 108.82321
## Population -0.952666520 96.80034
## Price 17.551273056 169.87415
## ShelveLoc 23.861029135 210.72491
## Age 9.902839611 129.78439
## Education -0.127561028 72.90789
## Urban -0.004639537 22.18199
## US 3.294371377 33.54436
## [1] 2
## %IncMSE IncNodePurity
## CompPrice 15.1376052 180.01142
## Income 7.3622947 157.28786
## Advertising 7.8944897 142.00347
## Population -3.8387095 104.22104
## Price 25.7064486 279.67362
## ShelveLoc 34.3383852 319.62708
## Age 9.8895830 166.64765
## Education -2.2860866 84.58807
## Urban -0.7065511 19.73474
## US 3.7810012 28.33808
## [1] 3
## %IncMSE IncNodePurity
## CompPrice 15.4977462 182.83494
## Income 8.7248590 155.52146
## Advertising 9.2564508 146.19840
## Population -2.8426640 97.02143
## Price 30.3251786 303.77527
## ShelveLoc 40.8622773 380.66805
## Age 12.3746064 156.89713
## Education 0.1390324 71.65719
## Urban 1.9395501 13.36642
## US 3.0428562 21.76415
## [1] 4
## %IncMSE IncNodePurity
## CompPrice 16.8635237 184.53394
## Income 8.4133380 163.58147
## Advertising 11.7747188 138.75331
## Population -2.0190914 77.51777
## Price 33.0928308 317.86257
## ShelveLoc 44.1168563 419.76485
## Age 11.3157301 159.36913
## Education 0.5436397 65.71724
## Urban -0.1330979 12.97418
## US 4.4624163 17.32562
## [1] 5
## %IncMSE IncNodePurity
## CompPrice 18.8809593 186.85501
## Income 7.7980845 152.54766
## Advertising 11.4851745 129.75165
## Population -1.5748711 72.02456
## Price 36.7320399 339.98411
## ShelveLoc 47.5037691 443.59907
## Age 14.3102077 147.21637
## Education 1.3500123 61.26393
## Urban -0.5680042 11.45582
## US 3.6183442 13.54315
## [1] 6
## %IncMSE IncNodePurity
## CompPrice 19.8944066 192.180115
## Income 10.0528597 151.349181
## Advertising 11.8066517 124.075077
## Population -1.4861804 66.802037
## Price 36.3697614 336.195955
## ShelveLoc 50.8270188 467.038269
## Age 12.8520494 147.919400
## Education 1.6136479 59.442598
## Urban -0.1202717 9.730166
## US 3.8022388 12.221410
## [1] 7
## %IncMSE IncNodePurity
## CompPrice 22.9356355 191.882851
## Income 8.6450110 146.875049
## Advertising 13.6284419 119.662174
## Population -2.4806891 60.792529
## Price 40.3931279 345.296481
## ShelveLoc 55.8851733 494.066654
## Age 15.8031536 146.705284
## Education 3.3911822 51.593242
## Urban 0.5074606 8.938265
## US 3.5761544 10.430808
## [1] 8
## %IncMSE IncNodePurity
## CompPrice 23.0633877 192.830665
## Income 9.3654437 148.955729
## Advertising 13.6498786 116.669006
## Population -1.1591034 55.396291
## Price 39.0987042 341.801328
## ShelveLoc 55.0896660 503.280454
## Age 13.9036826 148.871036
## Education 1.3259643 50.651824
## Urban 0.2491894 7.774526
## US 3.7409931 8.157653
mse
## [1] 4.830256 3.457590 3.050524 2.855002 2.709845 2.702094 2.626212 2.593132
\(\textbf{Response:}\) As shown above, the test MSE decreased as m increased. That said, for m = \(\sqrt p \approx 3\), the MSE of 3.050524 was still noticeably higher than the 2.568033 obtained with bagging. Additionally, the importance measures for ShelveLoc and Price, which again stand out as the most important variables, increased as m increased.
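A quick plot of the test MSE against m makes the trend easier to see (a sketch reusing the ms and mse vectors computed above):
# Test MSE as a function of the number of variables tried at each split
plot(ms, mse, type = "b", xlab = "m", ylab = "Test MSE")
abline(h = 2.568033, lty = 2)  # bagging result (m = p = 10) for reference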
\(\textbf{Question 9:}\) This problem involves the OJ data set, which is part of the ISLR package.
set.seed(5)
# Training set of 800 observations; the remaining 270 form the test set
training_sample <- sample(1:nrow(OJ), 800)
training_ds <- OJ[training_sample, ]
test_ds <- OJ[-training_sample, ]
# Fit a classification tree for Purchase
tree_fit <- tree(Purchase ~ ., data = training_ds)
summary(tree_fit)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = training_ds)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "ListPriceDiff"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7347 = 581.1 / 791
## Misclassification error rate: 0.1662 = 133 / 800
\(\textbf{Response:}\) As shown above, the tree uses only LoyalCH, PriceDiff, and ListPriceDiff, has 9 terminal nodes, and has a training error rate of 0.1662.
tree_fit
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1068.00 CH ( 0.61250 0.38750 )
## 2) LoyalCH < 0.5036 346 412.40 MM ( 0.28324 0.71676 )
## 4) LoyalCH < 0.280875 164 125.50 MM ( 0.12805 0.87195 )
## 8) LoyalCH < 0.0356415 56 10.03 MM ( 0.01786 0.98214 ) *
## 9) LoyalCH > 0.0356415 108 103.50 MM ( 0.18519 0.81481 ) *
## 5) LoyalCH > 0.280875 182 248.00 MM ( 0.42308 0.57692 )
## 10) PriceDiff < 0.05 71 67.60 MM ( 0.18310 0.81690 ) *
## 11) PriceDiff > 0.05 111 151.30 CH ( 0.57658 0.42342 ) *
## 3) LoyalCH > 0.5036 454 362.00 CH ( 0.86344 0.13656 )
## 6) PriceDiff < -0.39 31 40.32 MM ( 0.35484 0.64516 )
## 12) LoyalCH < 0.638841 10 0.00 MM ( 0.00000 1.00000 ) *
## 13) LoyalCH > 0.638841 21 29.06 CH ( 0.52381 0.47619 ) *
## 7) PriceDiff > -0.39 423 273.70 CH ( 0.90071 0.09929 )
## 14) LoyalCH < 0.705326 135 143.00 CH ( 0.77778 0.22222 )
## 28) ListPriceDiff < 0.255 67 89.49 CH ( 0.61194 0.38806 ) *
## 29) ListPriceDiff > 0.255 68 30.43 CH ( 0.94118 0.05882 ) *
## 15) LoyalCH > 0.705326 288 99.77 CH ( 0.95833 0.04167 ) *
\(\textbf{Response:}\) In the output above, * marks the terminal nodes. Taking node 10 as an example: it contains the 71 training observations with 0.280875 < LoyalCH < 0.5036 and PriceDiff < 0.05, has a deviance of 67.60, and predicts MM; about 81.7% of the observations in this node are MM purchases.
plot(tree_fit)
text(tree_fit, pretty = 0)
\(\textbf{Response:}\) I find it interesting that, after the LoyalCH split at the root, all but one terminal node in the left subtree predict MM and all but one terminal node in the right subtree predict CH. In that sense, LoyalCH appears to be the most influential variable in the classification.
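This impression can be checked directly on the fitted object (a small sketch; it assumes only that the fitted tree stores its splits in a frame component with a var column, as tree objects do):
# The root split (first row of the tree frame) is on LoyalCH
tree_fit$frame$var[1]
# All variables used anywhere in the tree, dropping the "<leaf>" placeholder
setdiff(unique(as.character(tree_fit$frame$var)), "<leaf>")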
tree_prediction <- predict(tree_fit, test_ds, type = "class")
conf_mat <- table(test_ds$Purchase, tree_prediction)
conf_mat
## tree_prediction
## CH MM
## CH 148 15
## MM 32 75
(conf_mat[2, 1] + conf_mat[1, 2]) / sum(conf_mat)
## [1] 0.1740741
\(\textbf{Response:}\) The test error rate is 0.1740741.
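Equivalently, the test error rate can be computed directly from the predictions without going through the confusion matrix:
# Fraction of test observations misclassified by the unpruned tree
mean(tree_prediction != test_ds$Purchase)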
set.seed(6)
tree_cv <- cv.tree(tree_fit)
tree_cv
## $size
## [1] 9 8 7 6 5 4 3 2 1
##
## $dev
## [1] 699.2249 689.8235 690.5285 742.8358 765.6732 761.3058 775.4678
## [8] 787.4929 1068.9050
##
## $k
## [1] -Inf 11.25968 11.98017 23.10013 29.11546 30.91262 38.92794
## [8] 47.97460 293.76700
##
## $method
## [1] "deviance"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
\(\textbf{Response:}\) A size of 8 gives the lowest cross-validated deviance (689.8), but a size of 7 (deviance 690.5) is nearly as good and yields a simpler model.
plot(tree_cv$size, tree_cv$dev, type = "b")
points(tree_cv$size[which.min(tree_cv$dev)], min(tree_cv$dev), col = "blue", cex = 2, pch = 20)
\(\textbf{Response:}\) Though size 8 corresponds to the lowest cross-validated deviance, size 7 is comparable yet simpler.
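Since cv.tree() was run with its default, the curve above is based on deviance. If one wanted to select the size on the classification error rate itself, FUN = prune.misclass could be passed instead (a sketch; tree_cv_misclass is an illustrative name, and the selected size may differ from the deviance-based run):
# Cross-validate on the misclassification count instead of the deviance
set.seed(6)
tree_cv_misclass <- cv.tree(tree_fit, FUN = prune.misclass)
plot(tree_cv_misclass$size, tree_cv_misclass$dev, type = "b",
     xlab = "Tree size", ylab = "CV misclassification count")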
# Prune the classification tree to 7 terminal nodes
tree_pruned <- prune.tree(tree_fit, best = 7)
summary(tree_pruned)
##
## Classification tree:
## snip.tree(tree = tree_fit, nodes = c(6L, 4L))
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "ListPriceDiff"
## Number of terminal nodes: 7
## Residual mean deviance: 0.7622 = 604.4 / 793
## Misclassification error rate: 0.1675 = 134 / 800
\(\textbf{Response:}\) Since a tree of size 7 is comparable to one of size 8 but simpler, I opted for 7, with the motivation that it might generalize better to the test data.
\(\textbf{Response:}\) The training error rate is slightly higher for the pruned tree: 0.1675 (134/800) versus 0.1662 (133/800) for the unpruned tree.
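For a direct side-by-side check, the training error rates can be recomputed from the two fitted trees (a small sketch reusing objects from above; the values should match the printed summaries, 133/800 and 134/800):
# Resubstitution (training) error rates of the unpruned and pruned trees
mean(predict(tree_fit, training_ds, type = "class") != training_ds$Purchase)
mean(predict(tree_pruned, training_ds, type = "class") != training_ds$Purchase)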
# pruned tree error
tree_pruned_prediction <- predict(tree_pruned, test_ds, type = "class")
conf_mat <- table(test_ds$Purchase, tree_pruned_prediction)
(conf_mat[2, 1] + conf_mat[1, 2]) / sum(conf_mat)
## [1] 0.1777778
\(\textbf{Response:}\) The test error rate of the pruned tree, 0.1777778, is slightly higher than the 0.1740741 of the unpruned tree. Out of curiosity, I ran the same comparison with sizes 6 and 8 as well; both also gave 0.1777778.
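That spot check over several sizes can be packaged as a short loop (a sketch; sizes and test_err are illustrative names, and prune.tree() is used just as above):
# Test error rate of the pruned tree for a few candidate sizes
sizes <- c(6, 7, 8, 9)
test_err <- sapply(sizes, function(s) {
  pruned <- prune.tree(tree_fit, best = s)
  mean(predict(pruned, test_ds, type = "class") != test_ds$Purchase)
})
setNames(test_err, sizes)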