Boosting is another approach for improving the predictions resulting from a decision tree. Like bagging, boosting is a general approach that can be applied to many statistical learning methods for regression or classification.
Boosting works in a similar way to bagging, except that the trees are grown sequentially: each tree is grown using information from previously grown trees. Boosting does not involve bootstrap sampling; instead, each tree is fit on a modified version of the original data set.
The boosting approach learns slowly. Given the current model, we fit a decision tree to the residuals from that model. That is, we fit a tree using the current residuals, rather than the outcome \(Y\), as the response. We then update the residuals and fit the next tree, repeating until a specified number of trees, denoted by \(B\), have been constructed. The shrinkage parameter \(\lambda\), a small positive number, controls the rate at which boosting learns; typical values are 0.01 or 0.001, depending on the problem.
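In symbols, writing \(\hat{f}\) for the current boosted model and \(r_i\) for the current residuals, each step of the standard algorithm fits a tree \(\hat{f}^b\) to the residuals and then updates

\[
\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^b(x), \qquad r_i \leftarrow r_i - \lambda \hat{f}^b(x_i),
\]

so that the final boosted model is \(\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^b(x)\).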
The boosting method is applied below to predict housing prices using the Boston data.
## BOOSTING ##
# Learns slowly
# Sequential: grow n.trees = B trees, one after another
# Decision Tree 1: (outcome as response)    -> Residual 1
# Decision Tree 2: (Residual 1 as outcome) -> Residual 2
# Decision Tree 3: (Residual 2 as outcome) -> Residual 3
# ...
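# A bare-bones manual version of the loop above (illustration only;
# gbm automates this, and rpart, B = 100, and lambda = 0.01 are
# arbitrary choices for the sketch):
library(rpart)
library(MASS)                  # for the Boston data
r <- Boston$medv               # residuals start out as the outcome itself
f.hat <- 0                     # running boosted prediction
lambda <- 0.01                 # shrinkage parameter
for (b in 1:100) {
  tree.b <- rpart(r ~ . - medv, data = Boston)  # fit a tree to the residuals
  f.hat <- f.hat + lambda * predict(tree.b)     # add the shrunken tree to the model
  r <- r - lambda * predict(tree.b)             # update the residuals
}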
library(gbm)
library(MASS)
Boston <- Boston # bring the MASS data set into the workspace
# Validation Set
set.seed(2)
train.index <- sample(1:nrow(Boston), 354, replace = FALSE) # ~70% of the 506 rows
train <- Boston[train.index,]
test <- Boston[-train.index,]
# Build the Boosted Regression Model
set.seed(24)
boost.boston <- gbm(medv ~ . , distribution = "gaussian", # squared-error loss
                    data = train, n.trees = 500,          # B = 500 trees
                    interaction.depth = 4,                # depth of each tree
                    shrinkage = 0.1)                      # learning rate lambda
summary(boost.boston)
## var rel.inf
## lstat lstat 38.59131366
## rm rm 31.39322601
## dis dis 7.70481752
## crim crim 4.53862014
## nox nox 4.53227291
## ptratio ptratio 3.90171594
## age age 3.45819700
## black black 2.61605484
## tax tax 1.53609406
## indus indus 1.01617657
## rad rad 0.56204710
## zn zn 0.12284648
## chas chas 0.02661777
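The relative influence output ranks lstat and rm as by far the most important predictors. gbm can also draw partial dependence plots for individual variables; for example:

plot(boost.boston, i = "rm")    # median value increases with average rooms
plot(boost.boston, i = "lstat") # and decreases with % lower-status residents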
# Validating the Model (MSE)
boost.pred <- predict(boost.boston, newdata = test, n.trees = 500) # use all B = 500 trees
mean((boost.pred - test$medv)^2)
## [1] 9.495608
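The test MSE depends on the choice of \(\lambda\) and \(B\). As a quick check, one can refit with a smaller learning rate and correspondingly more trees and compare test MSEs (the values below are illustrative, not tuned):

set.seed(24)
boost.slow <- gbm(medv ~ . , distribution = "gaussian",
                  data = train, n.trees = 5000, interaction.depth = 4,
                  shrinkage = 0.01) # 10x smaller learning rate, 10x more trees
slow.pred <- predict(boost.slow, newdata = test, n.trees = 5000)
mean((slow.pred - test$medv)^2)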
# Comparison with Random Forest / Bagging
library(randomForest)
set.seed(25)
rf.boston <- randomForest(medv ~ . , data = train, ntree = 500) # 500 trees (the randomForest default)
# MSE of random forest
rf.pred <- predict(rf.boston, newdata=test)
mean((rf.pred - test$medv)^2)
## [1] 10.82841
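Rather than fixing n.trees in advance, gbm can also cross-validate it: fit with cv.folds set, then gbm.perf() reports the iteration with the lowest cross-validation error. A sketch, where cv.folds = 5 and the seed are arbitrary choices:

set.seed(26)
boost.cv <- gbm(medv ~ . , distribution = "gaussian",
                data = train, n.trees = 5000, interaction.depth = 4,
                shrinkage = 0.01, cv.folds = 5)
best.B <- gbm.perf(boost.cv, method = "cv") # CV-optimal number of trees
cv.pred <- predict(boost.cv, newdata = test, n.trees = best.B)
mean((cv.pred - test$medv)^2)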