Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of p̂m1.
The x-axis should display p̂m1, ranging from 0 to 1, and the y-axis should display the values of the Gini index, classification error, and entropy.
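Writing p for p̂m1, the two-class versions of these measures reduce to: Gini index G = 2p(1 − p); cross-entropy D = −p log(p) − (1 − p) log(1 − p); classification error E = 1 − max(p, 1 − p). The code below evaluates each of these on a fine grid of p values.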
p = seq(0, 1, 0.001)
gini = p * (1 - p) * 2                          # Gini index for two classes
entropy = -(p * log(p) + (1 - p) * log(1 - p))  # cross-entropy (natural log); NaN at p = 0 and p = 1, so those endpoints are dropped from the plot
classification.error = 1 - pmax(p, 1 - p)       # classification error
data = data.frame(p, gini, entropy, classification.error)
matplot(p, cbind(gini, entropy, classification.error), col = c("blue", "red", "brown"),
        ylab = "Value", xlab = "p", main = "Plot of metrics as a function of p",
        pch = 20, lwd = 1, type = "p")
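A legend makes the three curves easier to tell apart; a minimal optional addition (the labels below are my own, in the same order as the cbind() call above):
legend("top", legend = c("Gini index", "Entropy", "Classification error"),
       col = c("blue", "red", "brown"), pch = 20)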
library(ISLR)   # Carseats and OJ data sets
library(tree)   # regression and classification trees
attach(Carseats)
set.seed(1)
Train = sample(nrow(Carseats), nrow(Carseats) / 2)
cseat_train = Carseats[Train, ]
cseat_test = Carseats[-Train, ]
carseat_tree = tree(Sales ~ ., data = cseat_train)
summary(carseat_tree)
##
## Regression tree:
## tree(formula = Sales ~ ., data = cseat_train)
## Variables actually used in tree construction:
## [1] "ShelveLoc" "Price" "Age" "Advertising" "CompPrice"
## [6] "US"
## Number of terminal nodes: 18
## Residual mean deviance: 2.167 = 394.3 / 182
## Distribution of residuals:
##      Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## -3.88200 -0.88200 -0.08712  0.00000  0.89590  4.09900
plot(carseat_tree)
text(carseat_tree, pretty = 0, cex = .55)
carseat_pred = predict(carseat_tree, cseat_test)
carseat_mse <- mean((cseat_test$Sales - carseat_pred)^2)
carseat_mse
## [1] 4.922039
cv_carseat = cv.tree(carseat_tree, FUN = prune.tree)
par(mfrow = c(1, 2))
plot(cv_carseat$size, cv_carseat$dev, type = "b")
plot(cv_carseat$k, cv_carseat$dev, type = "b")
- The best size is 9.
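Instead of reading the size off the plot, the minimizing size can be extracted directly from the cv_carseat object; a one-line sketch (it should agree with the value read from the plot):
cv_carseat$size[which.min(cv_carseat$dev)]  # tree size with the lowest CV deviance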
carseat_prune = prune.tree(carseat_tree, best = 9)
par(mfrow = c(1, 1))
plot(carseat_prune)
text(carseat_prune, pretty = 0, cex = .55)
prune_pred = predict(carseat_prune, cseat_test)
prune_mse = mean((cseat_test$Sales - prune_pred)^2)
prune_mse
## [1] 4.918134
Pruning decreased the test MSE slightly, from 4.922 to 4.918.
library(randomForest)  # bagging and random forests
car_bag = randomForest(Sales ~ ., data = cseat_train, mtry = 10, ntree = 100, importance = TRUE)  # mtry = 10 uses all predictors, i.e. bagging
pred_bag = predict(car_bag, cseat_test)
bag_mse = mean((cseat_test$Sales - pred_bag)^2)
bag_mse
## [1] 2.747535
importance(car_bag)
##                 %IncMSE IncNodePurity
## CompPrice   12.55183177    170.375825
## Income       0.93971125     90.503890
## Advertising  5.25205647    103.493698
## Population  -1.46213439     59.646975
## Price       24.29931798    497.717927
## ShelveLoc   20.34955441    367.886332
## Age          6.75952990    154.875691
## Education   -0.01964881     50.794521
## Urban       -2.22920227      7.587184
## US           1.64730860     16.342743
Price, ShelveLoc, CompPrice, and Age are the most important variables for predicting Sales.
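The same ranking can be inspected graphically with randomForest's built-in importance plot; a minimal sketch using the fitted car_bag object above:
varImpPlot(car_bag, main = "Variable importance (bagging)")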
set.seed(1)
carseat_rf = randomForest(Sales ~ ., data = cseat_train, mtry = 9, ntree = 1000,
                          importance = TRUE)  # m = 9 of the 10 predictors considered at each split
pred_rf = predict(carseat_rf, cseat_test)
(rf_mse = mean((cseat_test$Sales - pred_rf)^2))
## [1] 2.604951
importance(carseat_rf)
##                %IncMSE IncNodePurity
## CompPrice   35.7427013    171.232879
## Income       7.9028240     92.450563
## Advertising 19.3708369    101.291431
## Population  -2.0231580     60.542328
## Price       77.4780930    503.635424
## ShelveLoc   69.3276236    367.336695
## Age         21.9503880    158.867412
## Education   -0.8694952     43.216765
## Urban       -0.9832457      8.928158
## US           7.9185075     17.933675
- Once again, Price, ShelveLoc, CompPrice, and Age are the most important variables for predicting Sales. Changing m (the mtry argument) changes the test MSE as well; see the sketch below.
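A small sketch of that claim, looping over a few values of mtry and recording the test MSE (the loop structure and variable names here are my own):
mtry_values = c(3, 5, 7, 9)
mtry_mse = sapply(mtry_values, function(m) {
  set.seed(1)
  fit = randomForest(Sales ~ ., data = cseat_train, mtry = m, ntree = 1000)
  mean((cseat_test$Sales - predict(fit, cseat_test))^2)
})
data.frame(mtry = mtry_values, test_mse = mtry_mse)  # test MSE varies with m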
This problem involves the OJ data set, which is part of the ISLR package.
set.seed(1)
ttrain = sample(nrow(OJ), 800)
OJ.train = OJ[ttrain,]
OJ.test = OJ[-ttrain, ]
OJ.tree = tree(Purchase ~ ., data = OJ.train)
summary(OJ.tree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
OJ.tree
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1073.00 CH ( 0.60625 0.39375 )
##   2) LoyalCH < 0.5036 365 441.60 MM ( 0.29315 0.70685 )
##     4) LoyalCH < 0.280875 177 140.50 MM ( 0.13559 0.86441 )
##       8) LoyalCH < 0.0356415 59 10.14 MM ( 0.01695 0.98305 ) *
##       9) LoyalCH > 0.0356415 118 116.40 MM ( 0.19492 0.80508 ) *
##     5) LoyalCH > 0.280875 188 258.00 MM ( 0.44149 0.55851 )
##       10) PriceDiff < 0.05 79 84.79 MM ( 0.22785 0.77215 )
##         20) SpecialCH < 0.5 64 51.98 MM ( 0.14062 0.85938 ) *
##         21) SpecialCH > 0.5 15 20.19 CH ( 0.60000 0.40000 ) *
##       11) PriceDiff > 0.05 109 147.00 CH ( 0.59633 0.40367 ) *
##   3) LoyalCH > 0.5036 435 337.90 CH ( 0.86897 0.13103 )
##     6) LoyalCH < 0.764572 174 201.00 CH ( 0.73563 0.26437 )
##       12) ListPriceDiff < 0.235 72 99.81 MM ( 0.50000 0.50000 )
##         24) PctDiscMM < 0.196197 55 73.14 CH ( 0.61818 0.38182 ) *
##         25) PctDiscMM > 0.196197 17 12.32 MM ( 0.11765 0.88235 ) *
##       13) ListPriceDiff > 0.235 102 65.43 CH ( 0.90196 0.09804 ) *
##   7) LoyalCH > 0.764572 261 91.20 CH ( 0.95785 0.04215 ) *
Consider terminal node 20. It is reached by the split SpecialCH < 0.5 (from node 10) and contains 64 training observations. The deviance of the observations in this node is 51.98, and the asterisk marks it as a terminal node. The prediction at this node is Purchase = MM: 14.1% of the observations in the node have Purchase = CH, and the remaining 85.9% have Purchase = MM.
plot(OJ.tree)
text(OJ.tree, pretty = 0, cex = 0.55)
- LoyalCH is the most important variable in the tree. If LoyalCH < 0.28, the tree predicts MM; if LoyalCH > 0.76, it predicts CH. For intermediate values of LoyalCH, the decision also depends on PriceDiff, SpecialCH, ListPriceDiff, and PctDiscMM.
pred.oj= predict(OJ.tree, OJ.test, type = "class")
caret::confusionMatrix(OJ.test$Purchase, pred.oj)  # note: caret expects confusionMatrix(predictions, reference); the reversed order here swaps sensitivity/specificity in the output, but accuracy is unaffected
## Confusion Matrix and Statistics
##
## Reference
## Prediction CH MM
## CH 160 8
## MM 38 64
##
## Accuracy : 0.8296
## 95% CI : (0.7794, 0.8725)
## No Information Rate : 0.7333
## P-Value [Acc > NIR] : 0.0001259
##
## Kappa : 0.6154
##
## Mcnemar's Test P-Value : 1.904e-05
##
## Sensitivity : 0.8081
## Specificity : 0.8889
## Pos Pred Value : 0.9524
## Neg Pred Value : 0.6275
## Prevalence : 0.7333
## Detection Rate : 0.5926
## Detection Prevalence : 0.6222
## Balanced Accuracy : 0.8485
##
## 'Positive' Class : CH
##
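The test error rate is simply 1 minus the accuracy reported above (1 − 0.8296); it can also be computed directly from the predictions with a one-line sketch:
mean(pred.oj != OJ.test$Purchase)  # test error rate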
cv.oj = cv.tree(OJ.tree, FUN = prune.tree)
plot(cv.oj$size, cv.oj$dev, type = "b", xlab = "Tree Size", ylab = "Deviance")
- A tree size of 9 gives the lowest cross-validation error.
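Since this is a classification tree, cross-validation can also be guided by the misclassification rate rather than the deviance; an alternative sketch (the object name cv.oj.misclass is my own):
cv.oj.misclass = cv.tree(OJ.tree, FUN = prune.misclass)
plot(cv.oj.misclass$size, cv.oj.misclass$dev, type = "b",
     xlab = "Tree Size", ylab = "CV misclassification count")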
OJ.prune = prune.tree(OJ.tree, best = 9)
(j) Compare the training error rates between the pruned and unpruned trees. Which is higher?
summary(OJ.prune)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM"
## Number of terminal nodes: 9
## Residual mean deviance: 0.7432 = 587.8 / 791
## Misclassification error rate: 0.1588 = 127 / 800
The results are exactly the same as for the original tree: the unpruned tree already has 9 terminal nodes, so pruning with best = 9 leaves it unchanged and the training error rate (0.1588) is identical.
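The training error rates can also be computed directly rather than read from summary(); a small sketch (both values should match the 0.1588 reported above):
mean(predict(OJ.tree, OJ.train, type = "class") != OJ.train$Purchase)   # unpruned training error
mean(predict(OJ.prune, OJ.train, type = "class") != OJ.train$Purchase)  # pruned training error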
unpruned_pred = predict(OJ.tree, OJ.test, type = "class")
unpruned_error = sum(OJ.test$Purchase != unpruned_pred)
unpruned_error / length(unpruned_pred)
## [1] 0.1703704
pruned_pred = predict(OJ.prune, OJ.test, type = "class")
pruned_error = sum(OJ.test$Purchase != pruned_pred)
pruned_error / length(pruned_pred)
## [1] 0.1703704
The test error rates are the same as well (0.1704 for both), which is expected since the pruned tree is identical to the unpruned tree.