Question #3: Consider the Gini index, classification
error, and entropy in a simple classification setting with two classes.
Create a single plot that displays each of these quantities as a
function of $\hat{p}_{m1}$. The x-axis should display $\hat{p}_{m1}$, ranging from 0 to 1,
and the y-axis should display the value of the Gini index,
classification error, and entropy.
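p = seq(0, 1, 0.01)
gini = 2 * p * (1 - p)
# Use the convention 0 * log(0) = 0 so entropy is 0 (not NaN) at p = 0 and p = 1
entropy = -(ifelse(p > 0, p * log(p), 0) + ifelse(p < 1, (1 - p) * log(1 - p), 0))
class.err = 1 - pmax(p, 1 - p)
matplot(p, cbind(gini, entropy, class.err), type = "l", lty = 1,
        col = c("blue", "red", "green"), xlab = expression(hat(p)[m1]), ylab = "Value")
legend("topright", legend = c("Gini index", "Entropy", "Classification error"),
       col = c("blue", "red", "green"), lty = 1)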
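As a numerical check on the plot, the sketch below evaluates the three measures at a few values of $\hat{p}_{m1}$, using natural logarithms for the entropy as in the code above. The helper names (`gini.index`, `class.error`, `entropy.fn`) are introduced here only for illustration. All three measures are symmetric about 0.5 and peak there: the Gini index and classification error both reach 0.5, and the entropy reaches log 2 ≈ 0.693. At every point the entropy is at least as large as the Gini index, which in turn is at least as large as the classification error.

```r
# Two-class node impurity measures as functions of p = hat(p)_m1
gini.index <- function(p) 2 * p * (1 - p)
class.error <- function(p) 1 - pmax(p, 1 - p)

# Natural-log entropy, using the convention 0 * log(0) = 0
entropy.fn <- function(p) {
  xlogx <- function(x) ifelse(x > 0, x * log(x), 0)
  -(xlogx(p) + xlogx(1 - p))
}

# Evaluate all three measures at a handful of points
p.check <- c(0, 0.1, 0.25, 0.5, 0.75, 0.9, 1)
impurity <- data.frame(p = p.check,
                       gini = gini.index(p.check),
                       entropy = entropy.fn(p.check),
                       class.err = class.error(p.check))
print(round(impurity, 4))
```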

Question #8: In the lab, a classification tree was
applied to the Carseats data set after converting Sales into a
qualitative response variable. Now we will seek to predict Sales using
regression trees and related approaches, treating the response as a
quantitative variable.
library(ISLR2)
attach(Carseats)
(a) Split the data set into a training set and a
test set.
set.seed(1)
train = sample(dim(Carseats)[1], dim(Carseats)[1]/2)
train.carseats = Carseats[train, ]
test.carseats = Carseats[-train, ]
y.test <- test.carseats$Sales
(b) Fit a regression tree to the training set. Plot
the tree, and interpret the results. What test MSE do you obtain?
library(tree)
tree.carseats = tree(Sales ~ ., data = train.carseats)
summary(tree.carseats)
Regression tree:
tree(formula = Sales ~ ., data = train.carseats)
Variables actually used in tree construction:
[1] "ShelveLoc" "Price" "Age" "Advertising" "CompPrice"
[6] "US"
Number of terminal nodes: 18
Residual mean deviance: 2.167 = 394.3 / 182
Distribution of residuals:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.88200 -0.88200 -0.08712 0.00000 0.89590 4.09900
plot(tree.carseats)
text(tree.carseats, pretty = 0)

pred.carseats = predict(tree.carseats, test.carseats)
mean((test.carseats$Sales - pred.carseats)^2)
[1] 4.922039
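- The tree has 18 terminal nodes and splits on six of the ten predictors: ShelveLoc, Price, Age, Advertising, CompPrice, and US. Its test MSE is about 4.92.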
(c) Use cross-validation in order to determine the
optimal level of tree complexity. Does pruning the tree improve the test
MSE?
cv.carseats <- cv.tree(tree.carseats)
plot(cv.carseats$size, cv.carseats$dev, type='b')

prune.carseats <- prune.tree(tree.carseats, best=11)
plot(prune.carseats)
text(prune.carseats, pretty=0)

yhat <- predict(prune.carseats, test.carseats)
mean((yhat-y.test)^2)
[1] 4.757881
- Pruning to 11 terminal nodes slightly lowers the test MSE, from 4.92 to 4.76.
(d) Use the bagging approach in order to analyze
this data. What test MSE do you obtain? Use the importance() function to
determine which variables are most important.
library(randomForest)
# Bagging is a random forest with mtry equal to all 10 predictors
bag.carseats <- randomForest(Sales ~ ., data = train.carseats, mtry = 10, importance = TRUE)
yhat.bag <- predict(bag.carseats, newdata = test.carseats)
mean((yhat.bag - y.test)^2)
- Bagging decreases the test MSE to about 3.7.
importance(bag.carseats)
- Based on the results, the two most important variables are ShelveLoc and Price.
(e) Use random forests to analyze this data. What
test MSE do you obtain? Use the importance() function to determine which
variables are most important. Describe the effect of m, the number of
variables considered at each split, on the error rate obtained.
# Random forest with m = floor(p/3) = 3 of the 10 predictors tried at each split
rf.carseats <- randomForest(Sales ~ ., data = train.carseats,
                            mtry = floor((ncol(Carseats) - 1) / 3), importance = TRUE)
yhat.rf <- predict(rf.carseats, newdata = test.carseats)
mean((yhat.rf - y.test)^2)
- With m = p/3 (3 predictors) considered at each split, the random forest gives a higher test MSE (4.43) than bagging (about 3.7), which uses all m = 10 predictors.
importance(rf.carseats)
- The two most important variables are still ShelveLoc and Price.
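To look at the effect of m more directly, one option is to refit the forest for every m from 1 to 10 and record the test MSE; m = 10 is bagging. The sketch below recreates the split from part (a) with the same seed so it runs on its own. The loop and the `test.mse` and `n.pred` names are illustrative additions, and the resulting values depend on the random seed, so no numbers are shown here.

```r
library(ISLR2)
library(randomForest)

# Recreate the training/test split from part (a)
set.seed(1)
train <- sample(nrow(Carseats), nrow(Carseats) / 2)
train.carseats <- Carseats[train, ]
test.carseats <- Carseats[-train, ]
y.test <- test.carseats$Sales

n.pred <- ncol(Carseats) - 1          # number of predictors (10)
test.mse <- numeric(n.pred)
for (m in 1:n.pred) {
  # m = number of predictors sampled as split candidates at each node
  fit <- randomForest(Sales ~ ., data = train.carseats, mtry = m)
  test.mse[m] <- mean((predict(fit, newdata = test.carseats) - y.test)^2)
}

plot(1:n.pred, test.mse, type = "b", xlab = "m (mtry)", ylab = "Test MSE")
```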
(f) Now analyze the data using BART, and report your
results.
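No results were recorded for this part. A minimal sketch of how BART could be applied is below, using `gbart()` from the BART package as in the ISLR2 lab. It assumes the same split as part (a) and converts the predictors to a numeric design matrix with `model.matrix()` so that the factor variables (ShelveLoc, Urban, US) are dummy-coded. The exact test MSE depends on the split and the MCMC settings, so no value is reported here.

```r
library(ISLR2)
library(BART)

# Same 50/50 split as part (a)
set.seed(1)
train <- sample(nrow(Carseats), nrow(Carseats) / 2)

# Dummy-code the factor predictors and drop the intercept column
x <- model.matrix(Sales ~ ., data = Carseats)[, -1]
y <- Carseats$Sales
x.train <- x[train, ]
x.test <- x[-train, ]

# Fit BART on the training set; test-set predictions are returned with the fit
bart.fit <- gbart(x.train, y[train], x.test = x.test)
yhat.bart <- bart.fit$yhat.test.mean
bart.mse <- mean((y[-train] - yhat.bart)^2)
bart.mse

# Average number of times each predictor appears in the trees, across posterior draws
sort(bart.fit$varcount.mean, decreasing = TRUE)
```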
Question #9: This problem involves the OJ data set
which is part of the ISLR2 package.
data(OJ)
attach(OJ)
(a) Create a training set containing a random sample
of 800 observations, and a test set containing the remaining
observations.
set.seed(100)
train2 <- sample(1:nrow(OJ), 800)
train.OJ <- OJ[train2, ]
test.OJ <- OJ[-train2, ]
(b) Fit a tree to the training data, with Purchase
as the response and the other variables as predictors. Use the summary()
function to produce summary statistics about the tree, and describe the
results obtained. What is the training error rate? How many terminal
nodes does the tree have?
tree.OJ <- tree(Purchase ~., data=train.OJ)
summary(tree.OJ)
- The tree has 9 terminal nodes, and its training error rate is about 0.15 (15%).
(c) Type in the name of the tree object in order to
get a detailed text output. Pick one of the terminal nodes, and
interpret the information displayed.
tree.OJ
- Node 24 contains 31 observations and has a deviance of 82.11. The values in parentheses are the class proportions in the node: about 57% of its observations are CH and about 43% are MM. The node predicts the majority class, CH, so roughly 43% of its training observations (the MM purchases) are misclassified.
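For a classification tree, the deviance printed for a node is $-2\sum_k n_k \log \hat{p}_k$, where $n_k$ is the number of training observations of class k in the node and $\hat{p}_k = n_k/n$. The helpers below (hypothetical names, not functions from the tree package) compute this from class counts. With two classes the deviance is largest when the classes are evenly split, where it equals $2n\log 2$. For n = 31 that bound is about 42.98, so a deviance of 82.11 does not fit a 31-observation node, and the n and deviance values for node 24 are worth rechecking against the printed output.

```r
# Deviance of a classification-tree node from its class counts:
# -2 * sum_k n_k * log(n_k / n), with 0 * log(0) taken as 0
node.deviance <- function(counts) {
  n <- sum(counts)
  nz <- counts[counts > 0]
  -2 * sum(nz * log(nz / n))
}

# Largest possible two-class deviance for a node with n observations
max.two.class.deviance <- function(n) 2 * n * log(2)

max.two.class.deviance(31)
```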
(d) Create a plot of the tree, and interpret the
results.
plot(tree.OJ)
text(tree.OJ, pretty=0)
- The first split is on LoyalCH < 0.48, and LoyalCH is used in three further splits, which makes it the most important predictor. PriceDiff is also important and appears in three splits.
(e) Predict the response on the test data, and
produce a confusion matrix comparing the test labels to the predicted
test labels. What is the test error rate?
ytest.OJ <- test.OJ$Purchase
tree.pred <- predict(tree.OJ, test.OJ, type="class")
table(tree.pred, ytest.OJ)
(21+37)/(150+37+21+62)
[1] 0.2148148
- The test error rate is about 21.5%.
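Rather than typing the counts by hand, the error rate can be computed from the confusion matrix as one minus the proportion of predictions on the diagonal. The sketch below rebuilds a 2×2 table from the counts used above (correct predictions 150 and 62 on the diagonal, errors 21 and 37 off it). The placement of the two off-diagonal counts is an assumption, but it does not affect the error rate. With real output, the result of `table(tree.pred, ytest.OJ)` could be passed in directly; `error.rate` is a helper name introduced here.

```r
# Test error rate = 1 - (correct predictions / all predictions)
error.rate <- function(conf.mat) 1 - sum(diag(conf.mat)) / sum(conf.mat)

# Counts from the unpruned-tree confusion matrix (rows: predicted, columns: true);
# the off-diagonal placement is assumed and does not change the result
unpruned <- matrix(c(150, 21, 37, 62), nrow = 2,
                   dimnames = list(predicted = c("CH", "MM"), true = c("CH", "MM")))
error.rate(unpruned)
```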
(f) Apply the cv.tree() function to the training set
in order to determine the optimal tree size.
cv.OJ <- cv.tree(tree.OJ, FUN = prune.misclass)
cv.OJ
- The cross-validation error is lowest, and identical, for trees with 8 and 9 terminal nodes, so these two sizes are the optimal choices here.
(g) Produce a plot with tree size on the x-axis and
cross-validated classification error rate on the y-axis.
# With FUN = prune.misclass, dev counts CV misclassifications; divide by n for a rate
plot(cv.OJ$size, cv.OJ$dev / nrow(train.OJ), type = 'b',
     xlab = "Tree size", ylab = "CV classification error rate")
(h) Which tree size corresponds to the lowest
cross-validated classification error rate?
- The cross-validation method indicates that the tree size of 8 or 9
yields the minimum classification error rate, which is identical for
both sizes.
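When two sizes tie, as here, a common convention (an assumption, not something `cv.tree()` does automatically) is to choose the smallest tree with the minimum CV error, since it is simpler and no worse by this criterion. The hypothetical helper below does that from the `size` and `dev` components returned by `cv.tree()`; the input vectors in the example are made up for illustration.

```r
# Smallest tree size whose cross-validated error equals the minimum
best.cv.size <- function(size, dev) min(size[dev == min(dev)])

# Illustrative (made-up) values in the same layout as cv.tree() output
example.size <- c(9, 8, 6, 4, 2, 1)
example.dev <- c(145, 145, 150, 160, 170, 310)
best.cv.size(example.size, example.dev)   # returns 8
```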
(i) Produce a pruned tree corresponding to the
optimal tree size obtained using cross-validation. If cross-validation
does not lead to selection of a pruned tree, then create a pruned tree
with five terminal nodes.
prune.OJ <- prune.misclass(tree.OJ, best=5)
summary(prune.OJ)
- The pruned tree has 5 terminal nodes and a slightly higher training error rate than the unpruned tree.
plot(prune.OJ)
text(prune.OJ, pretty=0)
(j) Compare the training error rates between the
pruned and unpruned trees. Which is higher?
- The pruned tree has the higher training error rate, by about 0.0088.
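Since the training set has 800 observations, the gap in training error rate can also be read as a count of observations:

$$0.0088 \times 800 \approx 7.04, \qquad \frac{7}{800} = 0.00875 \approx 0.0088,$$

so the pruned tree misclassifies roughly seven more training observations than the unpruned tree.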
(k) Compare the test error rates between the pruned
and unpruned trees. Which is higher?
prune.pred <- predict(prune.OJ, test.OJ, type = "class")
table(prune.pred, ytest.OJ)
(29+33)/(138+29+33+70)
[1] 0.2296296
- The pruned tree has the higher test error rate: about 22.96%, compared with about 21.48% for the unpruned tree.
