probPos <-seq(0.0, 1.0, .001)
probNeg <- 1 - probPos
error <- probPos
error[probPos > 0.5] <- 1 - probPos[probPos > 0.5]
gini <- probPos * (1 - probPos) + probNeg * (1 - probNeg)
entropy <- -(probPos * log2(probPos) + probNeg * log2(probNeg))
plot(probPos, entropy, typ = "l", xlab = "Probability", ylab = "measure")
lines(probPos, gini, lty = "solid", col = "red")
lines(probPos, error, lty = "solid", col = "blue")
legend("topright", legend = c("Entropy", "Gini", "Error"),col = c("black", "red", "blue"), lty = c("solid", "solid", "solid"))Assignment 7
3. Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of ˆpm1. The x-axis should display ˆpm1, ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy. Hint: In a setting with two classes, ˆpm1 = 1 − ˆpm2. You could make this plot by hand, but it will be much easier to make in R.
8. In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.
library(tree)
library(ISLR2)
library(randomForest)randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.
library(BART)Loading required package: nlme
Loading required package: nnet
Loading required package: survival
(a) Split the data set into a training set and a test set.
attach(Carseats)
set.seed(99)
train <- sample( x = nrow(Carseats), size = 0.7 * nrow(Carseats))
training <- Carseats[train,]
testing <- Carseats[-train,](b) Fit a regression tree to the training set. Plot the tree, and interpret the results. What test MSE do you obtain?
tree.car <- tree(Sales ~ ., data=training)
plot(tree.car)
text(tree.car, pretty=0)tree.carnode), split, n, deviance, yval
* denotes terminal node
1) root 280 2245.000 7.525
2) ShelveLoc: Bad,Medium 224 1270.000 6.788
4) Price < 105.5 83 396.700 8.209
8) ShelveLoc: Bad 28 107.400 6.770
16) CompPrice < 110 12 42.490 5.588 *
17) CompPrice > 110 16 35.600 7.656 *
9) ShelveLoc: Medium 55 201.800 8.942
18) Price < 86.5 13 24.770 10.790 *
19) Price > 86.5 42 118.700 8.369
38) CompPrice < 117 20 37.150 7.382 *
39) CompPrice > 117 22 44.310 9.267 *
5) Price > 105.5 141 606.600 5.952
10) ShelveLoc: Bad 44 151.400 4.782
20) Population < 196.5 19 62.510 3.775 *
21) Population > 196.5 25 55.030 5.546 *
11) ShelveLoc: Medium 97 367.600 6.483
22) Price < 127 53 146.100 7.204
44) CompPrice < 121.5 15 18.800 5.621 *
45) CompPrice > 121.5 38 74.870 7.829
90) Advertising < 6 23 34.750 7.164 *
91) Advertising > 6 15 14.350 8.849 *
23) Price > 127 44 160.800 5.614
46) Age < 64.5 35 70.210 6.146 *
47) Age > 64.5 9 42.140 3.546 *
3) ShelveLoc: Good 56 368.700 10.470
6) Price < 142.5 46 197.500 11.200
12) Age < 68.5 40 139.900 11.600
24) Price < 107.5 16 43.680 12.770 *
25) Price > 107.5 24 59.250 10.810
50) CompPrice < 120 8 11.250 9.400 *
51) CompPrice > 120 16 24.070 11.520 *
13) Age > 68.5 6 8.672 8.533 *
7) Price > 142.5 10 34.980 7.125 *
According to the selection of nodes, Shelve Location is the first criterion to explain Sales, followed by price.
pred <- predict(tree.car, newdata = testing)
mse <- mean((pred - testing$Sales)^2)
mse[1] 4.478367
The test MSE associated is: 4.478367
(c) Use cross-validation in order to determine the optimal level of tree complexity. Does pruning the tree improve the test MSE?
set.seed (99)
cv.car <- cv.tree(tree.car)
plot(cv.car, type='b')6 seems an optimal number of nodes. After 6, deviance increases and it decreases again after 10.
prune.car <- prune.tree(tree.car , best = 6)
plot(prune.car)
text(prune.car , pretty = 0)pred <- predict(prune.car, newdata = testing)
mse2 <- mean((pred - testing$Sales))
mse2[1] 0.2689288
For the pruned model, the associated test MSE is 0.2689288, which is a decrease from the original tree.
(d) Use the bagging approach in order to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important.
bag.car <- randomForest(Sales ~ ., data=training, importance=TRUE, ntree=500, mtry=(dim(training)[2]-1))
importance(bag.car) %IncMSE IncNodePurity
CompPrice 26.3158803 208.017324
Income 7.0427971 98.141296
Advertising 17.3424074 120.537745
Population 0.2562002 84.611076
Price 67.7991630 696.376483
ShelveLoc 77.3716585 745.784385
Age 16.9891760 169.372835
Education 3.6256098 60.446975
Urban -1.8416192 9.522343
US 4.1325398 11.394146
The most important variables to predict Sales are: ShelveLoc, Price, CompPrice, Advertising and Age.
pred <- predict(bag.car, newdata=testing)
mse3 <- mean((pred - testing$Sales)^2)
mse3[1] 2.698012
The test MSE associated to the bagging approach is: 2.698012
(e) Use random forests to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important. Describe the effect of m, the number of variables considered at each split, on the error rate obtained.
random.car3 <- randomForest(Sales ~ ., data=training, mtry=3, importance=TRUE)
importance(random.car3) %IncMSE IncNodePurity
CompPrice 13.49669978 183.76349
Income 4.73595066 157.22747
Advertising 13.20260934 178.85366
Population -0.07336213 146.20050
Price 40.09175888 529.80535
ShelveLoc 49.77274647 558.99264
Age 13.85402261 221.99139
Education 2.96752226 87.72847
Urban 0.21449770 20.73059
US 6.12376855 44.36670
pred <- predict(random.car3, newdata=testing)
mse4 <- mean((pred - testing$Sales)^2)
mse4[1] 3.022386
random.car6 <- randomForest(Sales ~ ., data=training, mtry=6, importance=TRUE)
importance(random.car6) %IncMSE IncNodePurity
CompPrice 19.90162798 192.96568
Income 5.78630210 118.15504
Advertising 15.57367021 147.26581
Population 0.09272235 105.86732
Price 55.49726726 638.11654
ShelveLoc 66.84964124 694.31717
Age 17.44318064 192.90676
Education 3.28870043 71.09437
Urban -0.30548496 11.92921
US 6.52292573 23.85291
pred <- predict(random.car6, newdata=testing)
mse5 <- mean((pred - testing$Sales)^2)
mse5[1] 2.621258
random.car9 <- randomForest(Sales ~ ., data=training, mtry=9, importance=TRUE)
importance(random.car9) %IncMSE IncNodePurity
CompPrice 25.0970296 208.08514
Income 8.9099512 105.85453
Advertising 15.7663079 122.03898
Population -0.5105425 87.76381
Price 64.3605523 682.57927
ShelveLoc 75.1341243 726.06571
Age 18.7602980 180.48013
Education 3.5500215 60.68802
Urban 0.3940331 10.32281
US 3.8596165 14.38256
pred <- predict(random.car9, newdata=testing)
mse6 <- mean((pred - testing$Sales)^2)
mse6[1] 2.599911
Increasing m helped decrease the test MSE; however, from m=6 to m=9, the decrease was small. The associated test MSE for m=6 is: 2.662114.
- Now analyze the data using BART, and report your results.
x <- Carseats[, 1:10]
y <- Carseats[, "Sales"]
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]
set.seed (99)
bartfit <- gbart(xtrain , ytrain , x.test = xtest)*****Calling gbart: type=1
*****Data:
data:n,p,np: 280, 13, 120
y1,yn: -0.544750, 5.045250
x1,x[n*p]: 6.980000, 1.000000
xp1,xp[np*p]: 11.220000, 1.000000
*****Number of Trees: 200
*****Number of Cut Points: 100 ... 1
*****burn,nd,thin: 100,1000,1
*****Prior:beta,alpha,tau,nu,lambda,offset: 2,0.95,0.287616,3,2.89524e-30,7.52475
*****sigma: 0.000000
*****w (weights): 1.000000 ... 1.000000
*****Dirichlet:sparse,theta,omega,a,b,rho,augment: 0,0,1,0.5,1,13,0
*****printevery: 100
MCMC
done 0 (out of 1100)
done 100 (out of 1100)
done 200 (out of 1100)
done 300 (out of 1100)
done 400 (out of 1100)
done 500 (out of 1100)
done 600 (out of 1100)
done 700 (out of 1100)
done 800 (out of 1100)
done 900 (out of 1100)
done 1000 (out of 1100)
time: 3s
trcnt,tecnt: 1000,1000
yhat.bart <- bartfit$yhat.test.mean
mse7 <- mean (( ytest - yhat.bart)^2)
mse7[1] 0.08958535
For BART, the MSE associates is: 0.089585 which is the smalles MSE, followed by Prunning with 6 nodes.
detach(Carseats)9. This problem involves the OJ data set which is part of the ISLR2 package.
attach(OJ)a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
set.seed(99)
train <- sample(1:nrow(OJ), 800)
training <- OJ[train,]
testing <- OJ[-train,](b) Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?
set.seed(99)
tree.OJ <- tree(Purchase ~ ., data=training)
summary(tree.OJ)
Classification tree:
tree(formula = Purchase ~ ., data = training)
Variables actually used in tree construction:
[1] "LoyalCH" "PriceDiff" "ListPriceDiff"
Number of terminal nodes: 7
Residual mean deviance: 0.7337 = 581.8 / 793
Misclassification error rate: 0.155 = 124 / 800
The missclassification error rate is: 0.155 and the number of terminal nodes is: 7. 3 variables were used to construct the tree: LoyalCH, ProceDIff and ListPriceDiff.
(c) Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed.
tree.OJnode), split, n, deviance, yval, (yprob)
* denotes terminal node
1) root 800 1066.000 CH ( 0.61500 0.38500 )
2) LoyalCH < 0.48285 294 313.100 MM ( 0.22449 0.77551 )
4) LoyalCH < 0.0356415 59 10.140 MM ( 0.01695 0.98305 ) *
5) LoyalCH > 0.0356415 235 277.200 MM ( 0.27660 0.72340 ) *
3) LoyalCH > 0.48285 506 441.700 CH ( 0.84190 0.15810 )
6) LoyalCH < 0.764572 240 291.500 CH ( 0.70417 0.29583 )
12) PriceDiff < -0.165 37 38.630 MM ( 0.21622 0.78378 )
24) ListPriceDiff < 0.115 13 17.940 CH ( 0.53846 0.46154 ) *
25) ListPriceDiff > 0.115 24 8.314 MM ( 0.04167 0.95833 ) *
13) PriceDiff > -0.165 203 207.000 CH ( 0.79310 0.20690 )
26) PriceDiff < 0.265 114 140.600 CH ( 0.69298 0.30702 ) *
27) PriceDiff > 0.265 89 49.030 CH ( 0.92135 0.07865 ) *
7) LoyalCH > 0.764572 266 78.640 CH ( 0.96617 0.03383 ) *
Node 4 is a terminal node with 59 observations and a deviance of 10.140. At this level most of the observations take the MM categorical value (98.3%).
(d) Create a plot of the tree, and interpret the results.
plot(tree.OJ)
text(tree.OJ, pretty=0)The most important variable to predict Purchase and first node is Loyal CH. For a value of less than 0.48285 of LoyalCH, the purchas will most likely be fall in MM. For a value above 0.48 and less than 0.764, the purchase will depend of different nodes as Price Diff.
(e) Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?
pred <- predict(tree.OJ, newdata=testing, type = 'class')
matrix <- table(pred, testing$Purchase)
matrix
pred CH MM
CH 131 28
MM 30 81
err <- 1 - ((matrix[1,1]+matrix[2,2])/sum(matrix))
err[1] 0.2148148
The associated Test error rate is: 0.2148148
- Apply the cv.tree() function to the training set in order to determine the optimal tree size.
cv.OJ <- cv.tree(tree.OJ, FUN=prune.misclass)
cv.OJ$size
[1] 7 5 4 2 1
$dev
[1] 135 138 138 149 308
$k
[1] -Inf 0.0 1.0 10.5 162.0
$method
[1] "misclass"
attr(,"class")
[1] "prune" "tree.sequence"
- Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.
plot(cv.OJ, type='b')- Which tree size corresponds to the lowest cross-validated classification error rate?
The lowest cross-validated classification error rate appears to be a size of 4; after which there is no significant improvement.
- Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.
prune.OJ <- prune.misclass(tree.OJ, best=4)
summary(prune.OJ)
Classification tree:
snip.tree(tree = tree.OJ, nodes = c(2L, 13L, 12L))
Variables actually used in tree construction:
[1] "LoyalCH" "PriceDiff"
Number of terminal nodes: 4
Residual mean deviance: 0.8007 = 637.4 / 796
Misclassification error rate: 0.1562 = 125 / 800
- Compare the training error rates between the pruned and unpruned trees. Which is higher?
summary(prune.OJ)
Classification tree:
snip.tree(tree = tree.OJ, nodes = c(2L, 13L, 12L))
Variables actually used in tree construction:
[1] "LoyalCH" "PriceDiff"
Number of terminal nodes: 4
Residual mean deviance: 0.8007 = 637.4 / 796
Misclassification error rate: 0.1562 = 125 / 800
summary(tree.OJ)
Classification tree:
tree(formula = Purchase ~ ., data = training)
Variables actually used in tree construction:
[1] "LoyalCH" "PriceDiff" "ListPriceDiff"
Number of terminal nodes: 7
Residual mean deviance: 0.7337 = 581.8 / 793
Misclassification error rate: 0.155 = 124 / 800
Both errors are almost similar; however the unpruned tree has a lower error by 0.001.
- Compare the test error rates between the pruned and unpruned trees. Which is higher?
pred.OJ <- predict(prune.OJ, testing, type='class')
matrix2 <- table(pred.OJ, testing$Purchase)
matrix2
pred.OJ CH MM
CH 129 25
MM 32 84
err2 <- 1 - ((matrix2[1,1]+matrix2[2,2])/sum(matrix2))
err2[1] 0.2111111
The associated Test error for the unpruned tree is: 0.2148148 The associated Test error for the pruned tree is: 0.211111