Decision trees can be applied to both regression and classification problems. Tree-based methods involve stratifying or segmenting the predictor space into a number of simple regions. To make a prediction for a given observation, we typically use the mean (for regression) or the mode (for classification) of the response values of the training observations in the region to which it belongs.
We first discuss decision trees for regression and classification problems, followed by bagging and random forests in the next section.
## Regression Trees Using Advertising Data ##
library(ISLR2)
setwd("C:\\Users\\Asus\\Documents\\UP Files\\UPV Subjects\\Stat 197 (Intro to BI)")
Advertising <- read.csv(".\\Advertising.csv")
# Divide Data to Train and Test Set
set.seed(27)
train.index <- sample(c(1:200), 150, replace=FALSE)
train <- Advertising[train.index,]
test <- Advertising[-train.index,]
# Build the tree
library(tree)
tree.ads <- tree(Sales ~ TV + Radio + Newspaper, data=train)
summary(tree.ads)
##
## Regression tree:
## tree(formula = Sales ~ TV + Radio + Newspaper, data = train)
## Variables actually used in tree construction:
## [1] "TV" "Radio"
## Number of terminal nodes: 9
## Residual mean deviance: 1.684 = 237.4 / 141
## Distribution of residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5.1910 -0.8625 0.0858 0.0000 0.7661 3.5330
plot(tree.ads)
text(tree.ads, pretty=0)
# Size of Tree and Prediction Performance
cv.tree.ads <- cv.tree(tree.ads) # Cross-validation over tree sizes
plot(cv.tree.ads$size, cv.tree.ads$dev, type="b") # Deviance vs number of terminal nodes
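If the cross-validation results favor a smaller tree, it can be pruned to that size with prune.tree(). A minimal sketch (prune.ads is a name introduced here only for illustration):
# Prune to the size with the lowest cross-validated deviance
best.size <- cv.tree.ads$size[which.min(cv.tree.ads$dev)]
prune.ads <- prune.tree(tree.ads, best=best.size)
plot(prune.ads)
text(prune.ads, pretty=0)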
# Test Set Prediction
tree.ads.pred <- predict(tree.ads, newdata=test)
plot(tree.ads.pred, test$Sales) # Scatter Plot of Predicted vs Observed
abline(0,1) # 45-degree reference line: points on the line are perfect predictions
mean((tree.ads.pred - test$Sales)^2) # MSE
## [1] 2.906198
## Classification Trees Using Iris Data ##
# Iris Data
iris <- datasets::iris
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
# Divide Data into Train and Test Sets
library(caret)
set.seed(123)
train.index <- createDataPartition(iris$Species, list=FALSE, p=0.7)
train <- iris[train.index,]
test <- iris[-train.index,]
# Build the Classification Tree Using Train Data
tree.iris <- tree(Species ~ . , train)
summary(tree.iris)
##
## Classification tree:
## tree(formula = Species ~ ., data = train)
## Variables actually used in tree construction:
## [1] "Petal.Length" "Petal.Width"
## Number of terminal nodes: 5
## Residual mean deviance: 0.1173 = 11.73 / 100
## Misclassification error rate: 0.02857 = 3 / 105
plot(tree.iris)
text(tree.iris, pretty=0)
# Predict the Species in Test Data
tree.iris.pred <- predict(tree.iris, test, type="class")
# Construct Confusion Matrix
table(tree.iris.pred, test$Species)
##
## tree.iris.pred setosa versicolor virginica
## setosa 15 0 0
## versicolor 0 14 2
## virginica 0 1 13
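The confusion matrix shows 42 of the 45 test observations classified correctly; the overall test accuracy can also be computed directly:
# Proportion of test observations classified correctly
mean(tree.iris.pred == test$Species)
## [1] 0.9333333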
Decision trees for regression and classification have a number of advantages over more classical approaches:
- Trees are very easy to explain, arguably even easier than linear regression.
- Decision trees are believed to mirror human decision-making more closely than regression and classification approaches do.
- Trees can be displayed graphically, especially when they are small.
- Trees can handle qualitative predictors without the need to create dummy variables.
However, trees also have disadvantages:
- Trees generally do not have the same level of predictive accuracy as some other regression and classification approaches.
- Trees can be very non-robust: a small change in the data can cause a large change in the final estimated tree.
These disadvantages can be mitigated by aggregating many trees through methods such as bagging, random forests, and boosting, which are discussed in the next section.
An ensemble method is an approach that combines simple models (called weak learners) in order to obtain a single potentially powerful model.
Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the variance of a statistical learning method, and it is particularly useful for decision trees. The idea is to generate B training sets through bootstrap resampling and build a model on each one. For regression problems, the prediction is the average of the predictions of the B trees; for classification problems, the predicted class is the mode (majority vote) over the B trees. Note that using a very large number of bootstrap resamples B does not lead to overfitting; in practice, B is taken large enough for the error to settle down.
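To make the averaging step concrete, here is a minimal hand-rolled bagging sketch for the Advertising regression problem using the tree() function (B is kept small for illustration; the randomForest package used below handles this automatically):
# Hand-rolled bagging: grow B trees on bootstrap resamples and average their predictions
library(tree)
set.seed(27)
train.index <- sample(c(1:200), 150, replace=FALSE)
train <- Advertising[train.index,]
test <- Advertising[-train.index,]
B <- 25 # number of bootstrap resamples
preds <- matrix(NA, nrow(test), B)
for (b in 1:B) {
  boot <- train[sample(nrow(train), replace=TRUE), ] # bootstrap resample of the training set
  fit <- tree(Sales ~ TV + Radio + Newspaper, data=boot) # unpruned tree on the resample
  preds[, b] <- predict(fit, newdata=test)
}
bag.manual.pred <- rowMeans(preds) # average over the B trees (regression)
mean((bag.manual.pred - test$Sales)^2) # test MSE of the hand-rolled bagging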
The bagging procedure for regression and classification, using the randomForest package, is shown in R below.
## Bagging Using Advertising Data (m = p) ##
library(randomForest)
# Divide Data to Train and Test Set
set.seed(27)
train.index <- sample(c(1:200), 150, replace=FALSE)
train <- Advertising[train.index,]
test <- Advertising[-train.index,]
# Train the Model (B = 500)
set.seed(23)
bag.ads <- randomForest(Sales ~ TV + Radio + Newspaper,
mtry=3, data=train, importance=TRUE)
bag.ads
##
## Call:
## randomForest(formula = Sales ~ TV + Radio + Newspaper, data = train, mtry = 3, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 3
##
## Mean of squared residuals: 0.6962456
## % Var explained: 97.47
# Test the Model
bag.ads.pred <- predict(bag.ads, newdata=test)
plot(bag.ads.pred, test$Sales)
abline(0,1)
mean((bag.ads.pred - test$Sales)^2)
## [1] 0.532457
# Variable Importance
importance(bag.ads)
## %IncMSE IncNodePurity
## TV 99.47349400 2746.19196
## Radio 92.73291947 1266.24661
## Newspaper -0.05267697 25.06207
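The same importance measures can be displayed graphically with the package's built-in plot:
varImpPlot(bag.ads) # Dot charts of %IncMSE and IncNodePurity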
## Bagging Using Iris Data ##
library(randomForest)
# Divide Data into Train and Test Sets
library(caret)
set.seed(123)
train.index <- createDataPartition(iris$Species, list=FALSE, p=0.7)
train <- iris[train.index,]
test <- iris[-train.index,]
# Train the model
set.seed(21)
bag.iris <- randomForest(Species ~ . , data=train,
mtry=4, importance=TRUE) # mtry = p = 4: using all predictors at each split makes this bagging
bag.iris
##
## Call:
## randomForest(formula = Species ~ ., data = train, mtry = 4, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 4
##
## OOB estimate of error rate: 4.76%
## Confusion matrix:
## setosa versicolor virginica class.error
## setosa 35 0 0 0.00000000
## versicolor 0 33 2 0.05714286
## virginica 0 3 32 0.08571429
# Test the model
bag.iris.pred <- predict(bag.iris, test)
table(bag.iris.pred, test$Species)
##
## bag.iris.pred setosa versicolor virginica
## setosa 15 0 0
## versicolor 0 14 2
## virginica 0 1 13
# Variable Importance
importance(bag.iris) # The higher the better
## setosa versicolor virginica MeanDecreaseAccuracy
## Sepal.Length 1.001002 5.715499 1.162282 5.224539
## Sepal.Width 0.000000 2.032614 2.777429 3.412962
## Petal.Length 21.271979 30.320031 28.593691 32.827405
## Petal.Width 25.347255 35.362470 36.661378 38.108545
## MeanDecreaseGini
## Sepal.Length 0.6746152
## Sepal.Width 0.7066065
## Petal.Length 29.3222733
## Petal.Width 38.6080478
Random forests improve on bagged trees by randomly selecting a sample of m predictors from the full set of p predictors as split candidates each time a split is considered in a bootstrapped tree. By default, the randomForest package uses roughly the square root of p for classification and p/3 for regression as the value of m. The rationale is that the resulting trees are less correlated with one another: a single very strong predictor cannot dominate every tree, so averaging the trees yields a larger reduction in variance.
## Random Forest (m < p): Sales Regression Problem ##
# Divide Data to Train and Test Set
set.seed(27)
train.index <- sample(c(1:200), 150, replace=FALSE)
train <- Advertising[train.index,]
test <- Advertising[-train.index,]
# Train the model
set.seed(24)
rf.ads <- randomForest(Sales ~ TV + Radio + Newspaper, data=train,
mtry = 2, importance=TRUE) # (m = 2) < (3 = p)
rf.ads
##
## Call:
## randomForest(formula = Sales ~ TV + Radio + Newspaper, data = train, mtry = 2, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 2
##
## Mean of squared residuals: 0.8505621
## % Var explained: 96.9
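To check whether another value of m would do better, the out-of-bag error can be compared across mtry values. A quick sketch, reusing the train object from the split above:
# Compare OOB MSE across the number of split candidates
for (m in 1:3) {
  set.seed(24)
  fit <- randomForest(Sales ~ TV + Radio + Newspaper, data=train, mtry=m)
  cat("mtry =", m, "OOB MSE =", tail(fit$mse, 1), "\n")
}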
# Test the model
rf.ads.pred <- predict(rf.ads, newdata=test)
plot(rf.ads.pred, test$Sales)
abline(0,1)
mean((rf.ads.pred - test$Sales)^2)
## [1] 0.6254764
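Putting the three Advertising fits side by side on the same test set (this assumes tree.ads.pred and bag.ads.pred from the earlier chunks are still in the workspace; all three splits used set.seed(27), so the test sets are identical):
# Test MSE of the single tree, bagging, and the random forest
c(tree = mean((tree.ads.pred - test$Sales)^2),
  bagging = mean((bag.ads.pred - test$Sales)^2),
  randomForest = mean((rf.ads.pred - test$Sales)^2))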
## Random Forest (m < p): Iris Classification Problem ##
# Divide Data into Train and Test Sets
set.seed(123)
train.index <- createDataPartition(iris$Species, list=FALSE, p=0.7)
train <- iris[train.index,]
test <- iris[-train.index,]
# Train the model
set.seed(24)
rf.iris <- randomForest(Species ~ . , data=train,
mtry=2, importance=TRUE) # mtry = 2 = sqrt(p), with p = 4 predictors
# Test the model
rf.iris.pred <- predict(rf.iris, test)
table(rf.iris.pred, test$Species)
##
## rf.iris.pred setosa versicolor virginica
## setosa 15 0 0
## versicolor 0 14 2
## virginica 0 1 13
# Variable Importance
importance(rf.iris)
## setosa versicolor virginica MeanDecreaseAccuracy
## Sepal.Length 7.009222 8.402265 7.739983 10.914456
## Sepal.Width 4.394139 1.608043 4.537595 5.462419
## Petal.Length 22.205962 29.103983 27.855758 33.110557
## Petal.Width 21.538905 26.952956 31.627524 31.306264
## MeanDecreaseGini
## Sepal.Length 7.469165
## Sepal.Width 1.550380
## Petal.Length 30.039106
## Petal.Width 30.223216