Assignment 7

3. Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of ˆpm1. The x-axis should display ˆpm1, ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy. Hint: In a setting with two classes, ˆpm1 = 1 − ˆpm2. You could make this plot by hand, but it will be much easier to make in R.

probPos <-seq(0.0, 1.0, .001)
probNeg <- 1 - probPos

error <- probPos
error[probPos > 0.5] <- 1 - probPos[probPos > 0.5]

gini <- probPos * (1 - probPos) + probNeg * (1 - probNeg)
entropy <- -(probPos * log2(probPos) + probNeg * log2(probNeg))

plot(probPos, entropy, typ = "l", xlab = "Probability", ylab = "measure")
lines(probPos, gini, lty = "solid", col = "red")
lines(probPos, error, lty = "solid", col = "blue")
legend("topright", legend = c("Entropy", "Gini", "Error"),col = c("black", "red", "blue"), lty = c("solid", "solid", "solid"))

8. In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.

library(tree)
library(ISLR2)
library(randomForest)
randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.
library(BART)
Loading required package: nlme
Loading required package: nnet
Loading required package: survival

(a) Split the data set into a training set and a test set.

attach(Carseats)

set.seed(99)

train <- sample( x = nrow(Carseats), size = 0.7 * nrow(Carseats))
training <- Carseats[train,]
testing <- Carseats[-train,]

(b) Fit a regression tree to the training set. Plot the tree, and interpret the results. What test MSE do you obtain?

tree.car <- tree(Sales ~ ., data=training)

plot(tree.car)
text(tree.car, pretty=0)

tree.car
node), split, n, deviance, yval
      * denotes terminal node

 1) root 280 2245.000  7.525  
   2) ShelveLoc: Bad,Medium 224 1270.000  6.788  
     4) Price < 105.5 83  396.700  8.209  
       8) ShelveLoc: Bad 28  107.400  6.770  
        16) CompPrice < 110 12   42.490  5.588 *
        17) CompPrice > 110 16   35.600  7.656 *
       9) ShelveLoc: Medium 55  201.800  8.942  
        18) Price < 86.5 13   24.770 10.790 *
        19) Price > 86.5 42  118.700  8.369  
          38) CompPrice < 117 20   37.150  7.382 *
          39) CompPrice > 117 22   44.310  9.267 *
     5) Price > 105.5 141  606.600  5.952  
      10) ShelveLoc: Bad 44  151.400  4.782  
        20) Population < 196.5 19   62.510  3.775 *
        21) Population > 196.5 25   55.030  5.546 *
      11) ShelveLoc: Medium 97  367.600  6.483  
        22) Price < 127 53  146.100  7.204  
          44) CompPrice < 121.5 15   18.800  5.621 *
          45) CompPrice > 121.5 38   74.870  7.829  
            90) Advertising < 6 23   34.750  7.164 *
            91) Advertising > 6 15   14.350  8.849 *
        23) Price > 127 44  160.800  5.614  
          46) Age < 64.5 35   70.210  6.146 *
          47) Age > 64.5 9   42.140  3.546 *
   3) ShelveLoc: Good 56  368.700 10.470  
     6) Price < 142.5 46  197.500 11.200  
      12) Age < 68.5 40  139.900 11.600  
        24) Price < 107.5 16   43.680 12.770 *
        25) Price > 107.5 24   59.250 10.810  
          50) CompPrice < 120 8   11.250  9.400 *
          51) CompPrice > 120 16   24.070 11.520 *
      13) Age > 68.5 6    8.672  8.533 *
     7) Price > 142.5 10   34.980  7.125 *

According to the selection of nodes, Shelve Location is the first criterion to explain Sales, followed by price.

pred <- predict(tree.car, newdata = testing)
mse <- mean((pred - testing$Sales)^2)
mse
[1] 4.478367

The test MSE associated is: 4.478367

(c) Use cross-validation in order to determine the optimal level of tree complexity. Does pruning the tree improve the test MSE?

set.seed (99)
cv.car <- cv.tree(tree.car)
plot(cv.car, type='b')

6 seems an optimal number of nodes. After 6, deviance increases and it decreases again after 10.

prune.car <- prune.tree(tree.car , best = 6)
plot(prune.car)
text(prune.car , pretty = 0)

pred <- predict(prune.car, newdata = testing)
mse2 <- mean((pred - testing$Sales))
mse2
[1] 0.2689288

For the pruned model, the associated test MSE is 0.2689288, which is a decrease from the original tree.

(d) Use the bagging approach in order to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important.

bag.car <- randomForest(Sales ~ ., data=training, importance=TRUE, ntree=500, mtry=(dim(training)[2]-1))
importance(bag.car)
               %IncMSE IncNodePurity
CompPrice   26.3158803    208.017324
Income       7.0427971     98.141296
Advertising 17.3424074    120.537745
Population   0.2562002     84.611076
Price       67.7991630    696.376483
ShelveLoc   77.3716585    745.784385
Age         16.9891760    169.372835
Education    3.6256098     60.446975
Urban       -1.8416192      9.522343
US           4.1325398     11.394146

The most important variables to predict Sales are: ShelveLoc, Price, CompPrice, Advertising and Age.

pred <- predict(bag.car, newdata=testing)
mse3 <- mean((pred - testing$Sales)^2)
mse3
[1] 2.698012

The test MSE associated to the bagging approach is: 2.698012

(e) Use random forests to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important. Describe the effect of m, the number of variables considered at each split, on the error rate obtained.

random.car3 <- randomForest(Sales ~ ., data=training, mtry=3, importance=TRUE)
importance(random.car3)
                %IncMSE IncNodePurity
CompPrice   13.49669978     183.76349
Income       4.73595066     157.22747
Advertising 13.20260934     178.85366
Population  -0.07336213     146.20050
Price       40.09175888     529.80535
ShelveLoc   49.77274647     558.99264
Age         13.85402261     221.99139
Education    2.96752226      87.72847
Urban        0.21449770      20.73059
US           6.12376855      44.36670
pred <- predict(random.car3, newdata=testing)
mse4 <- mean((pred - testing$Sales)^2)
mse4
[1] 3.022386
random.car6 <- randomForest(Sales ~ ., data=training, mtry=6, importance=TRUE)
importance(random.car6)
                %IncMSE IncNodePurity
CompPrice   19.90162798     192.96568
Income       5.78630210     118.15504
Advertising 15.57367021     147.26581
Population   0.09272235     105.86732
Price       55.49726726     638.11654
ShelveLoc   66.84964124     694.31717
Age         17.44318064     192.90676
Education    3.28870043      71.09437
Urban       -0.30548496      11.92921
US           6.52292573      23.85291
pred <- predict(random.car6, newdata=testing)
mse5 <- mean((pred - testing$Sales)^2)
mse5
[1] 2.621258
random.car9 <- randomForest(Sales ~ ., data=training, mtry=9, importance=TRUE)
importance(random.car9)
               %IncMSE IncNodePurity
CompPrice   25.0970296     208.08514
Income       8.9099512     105.85453
Advertising 15.7663079     122.03898
Population  -0.5105425      87.76381
Price       64.3605523     682.57927
ShelveLoc   75.1341243     726.06571
Age         18.7602980     180.48013
Education    3.5500215      60.68802
Urban        0.3940331      10.32281
US           3.8596165      14.38256
pred <- predict(random.car9, newdata=testing)
mse6 <- mean((pred - testing$Sales)^2)
mse6
[1] 2.599911

Increasing m helped decrease the test MSE; however, from m=6 to m=9, the decrease was small. The associated test MSE for m=6 is: 2.662114.

  1. Now analyze the data using BART, and report your results.
x <- Carseats[, 1:10]
y <- Carseats[, "Sales"]
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]
set.seed (99)
bartfit <- gbart(xtrain , ytrain , x.test = xtest)
*****Calling gbart: type=1
*****Data:
data:n,p,np: 280, 13, 120
y1,yn: -0.544750, 5.045250
x1,x[n*p]: 6.980000, 1.000000
xp1,xp[np*p]: 11.220000, 1.000000
*****Number of Trees: 200
*****Number of Cut Points: 100 ... 1
*****burn,nd,thin: 100,1000,1
*****Prior:beta,alpha,tau,nu,lambda,offset: 2,0.95,0.287616,3,2.89524e-30,7.52475
*****sigma: 0.000000
*****w (weights): 1.000000 ... 1.000000
*****Dirichlet:sparse,theta,omega,a,b,rho,augment: 0,0,1,0.5,1,13,0
*****printevery: 100

MCMC
done 0 (out of 1100)
done 100 (out of 1100)
done 200 (out of 1100)
done 300 (out of 1100)
done 400 (out of 1100)
done 500 (out of 1100)
done 600 (out of 1100)
done 700 (out of 1100)
done 800 (out of 1100)
done 900 (out of 1100)
done 1000 (out of 1100)
time: 3s
trcnt,tecnt: 1000,1000
yhat.bart <- bartfit$yhat.test.mean
mse7 <- mean (( ytest - yhat.bart)^2)
mse7
[1] 0.08958535

For BART, the MSE associates is: 0.089585 which is the smalles MSE, followed by Prunning with 6 nodes.

detach(Carseats)

9. This problem involves the OJ data set which is part of the ISLR2 package.

attach(OJ)

a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

set.seed(99)
train <- sample(1:nrow(OJ), 800)
training <- OJ[train,]
testing <- OJ[-train,]

(b) Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?

set.seed(99)
tree.OJ <- tree(Purchase ~ ., data=training)
summary(tree.OJ)

Classification tree:
tree(formula = Purchase ~ ., data = training)
Variables actually used in tree construction:
[1] "LoyalCH"       "PriceDiff"     "ListPriceDiff"
Number of terminal nodes:  7 
Residual mean deviance:  0.7337 = 581.8 / 793 
Misclassification error rate: 0.155 = 124 / 800 

The missclassification error rate is: 0.155 and the number of terminal nodes is: 7. 3 variables were used to construct the tree: LoyalCH, ProceDIff and ListPriceDiff.

(c) Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed.

tree.OJ
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 800 1066.000 CH ( 0.61500 0.38500 )  
   2) LoyalCH < 0.48285 294  313.100 MM ( 0.22449 0.77551 )  
     4) LoyalCH < 0.0356415 59   10.140 MM ( 0.01695 0.98305 ) *
     5) LoyalCH > 0.0356415 235  277.200 MM ( 0.27660 0.72340 ) *
   3) LoyalCH > 0.48285 506  441.700 CH ( 0.84190 0.15810 )  
     6) LoyalCH < 0.764572 240  291.500 CH ( 0.70417 0.29583 )  
      12) PriceDiff < -0.165 37   38.630 MM ( 0.21622 0.78378 )  
        24) ListPriceDiff < 0.115 13   17.940 CH ( 0.53846 0.46154 ) *
        25) ListPriceDiff > 0.115 24    8.314 MM ( 0.04167 0.95833 ) *
      13) PriceDiff > -0.165 203  207.000 CH ( 0.79310 0.20690 )  
        26) PriceDiff < 0.265 114  140.600 CH ( 0.69298 0.30702 ) *
        27) PriceDiff > 0.265 89   49.030 CH ( 0.92135 0.07865 ) *
     7) LoyalCH > 0.764572 266   78.640 CH ( 0.96617 0.03383 ) *

Node 4 is a terminal node with 59 observations and a deviance of 10.140. At this level most of the observations take the MM categorical value (98.3%).

(d) Create a plot of the tree, and interpret the results.

plot(tree.OJ)
text(tree.OJ, pretty=0)

The most important variable to predict Purchase and first node is Loyal CH. For a value of less than 0.48285 of LoyalCH, the purchas will most likely be fall in MM. For a value above 0.48 and less than 0.764, the purchase will depend of different nodes as Price Diff.

(e) Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?

pred <- predict(tree.OJ, newdata=testing, type = 'class')
matrix <- table(pred, testing$Purchase)
matrix
    
pred  CH  MM
  CH 131  28
  MM  30  81
err <- 1 - ((matrix[1,1]+matrix[2,2])/sum(matrix))
err
[1] 0.2148148

The associated Test error rate is: 0.2148148

  1. Apply the cv.tree() function to the training set in order to determine the optimal tree size.
cv.OJ <- cv.tree(tree.OJ, FUN=prune.misclass)
cv.OJ
$size
[1] 7 5 4 2 1

$dev
[1] 135 138 138 149 308

$k
[1]  -Inf   0.0   1.0  10.5 162.0

$method
[1] "misclass"

attr(,"class")
[1] "prune"         "tree.sequence"
  1. Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.
plot(cv.OJ, type='b')

  1. Which tree size corresponds to the lowest cross-validated classification error rate?

The lowest cross-validated classification error rate appears to be a size of 4; after which there is no significant improvement.

  1. Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.
prune.OJ <- prune.misclass(tree.OJ, best=4)
summary(prune.OJ)

Classification tree:
snip.tree(tree = tree.OJ, nodes = c(2L, 13L, 12L))
Variables actually used in tree construction:
[1] "LoyalCH"   "PriceDiff"
Number of terminal nodes:  4 
Residual mean deviance:  0.8007 = 637.4 / 796 
Misclassification error rate: 0.1562 = 125 / 800 
  1. Compare the training error rates between the pruned and unpruned trees. Which is higher?
summary(prune.OJ)

Classification tree:
snip.tree(tree = tree.OJ, nodes = c(2L, 13L, 12L))
Variables actually used in tree construction:
[1] "LoyalCH"   "PriceDiff"
Number of terminal nodes:  4 
Residual mean deviance:  0.8007 = 637.4 / 796 
Misclassification error rate: 0.1562 = 125 / 800 
summary(tree.OJ)

Classification tree:
tree(formula = Purchase ~ ., data = training)
Variables actually used in tree construction:
[1] "LoyalCH"       "PriceDiff"     "ListPriceDiff"
Number of terminal nodes:  7 
Residual mean deviance:  0.7337 = 581.8 / 793 
Misclassification error rate: 0.155 = 124 / 800 

Both errors are almost similar; however the unpruned tree has a lower error by 0.001.

  1. Compare the test error rates between the pruned and unpruned trees. Which is higher?
pred.OJ <- predict(prune.OJ, testing, type='class')
matrix2 <- table(pred.OJ, testing$Purchase)
matrix2
       
pred.OJ  CH  MM
     CH 129  25
     MM  32  84
err2 <- 1 - ((matrix2[1,1]+matrix2[2,2])/sum(matrix2))
err2
[1] 0.2111111

The associated Test error for the unpruned tree is: 0.2148148 The associated Test error for the pruned tree is: 0.211111