Model Ensembles

Mansun Kuo
2014-03-31

Outline

Chapter 11: Model Ensembles

Two Heads Are Better Than One

Ensemble Methods

  • Construct multiple diverse models
    • different models
    • different parameters
  • Use adapted versions of the training data
    • reweighted
    • resampled
  • Combine the (weighted) predictions (sketched below)
    • averaging: numerical response
    • voting: categorical response
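
As a toy sketch (not part of the chapter's code; the prediction vectors below are made up purely for illustration), combining the outputs of several models only takes a few lines of R:

# made-up predictions of three hypothetical models on two observations
num_preds = cbind(m1 = c(2.1, 3.0), m2 = c(1.9, 3.4), m3 = c(2.3, 2.9))
rowMeans(num_preds)                       # averaging: numerical response

cls_preds = cbind(m1 = c("a", "b"), m2 = c("a", "a"), m3 = c("b", "a"))
# voting: categorical response, most frequent class per row
apply(cls_preds, 1, function(x) names(which.max(table(x))))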

Ensemble Methods

  • Pros:
    • Improved accuracy
  • Cons:
    • Harder to interpret than a single model
    • More computing time
    • Beware of overfitting

Popular in Analytic Competitions

Data

Here we use the iris data set as our sample data.

data(iris)
str(iris, vec.len = 1)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 ...
 $ Sepal.Width : num  3.5 3 ...
 $ Petal.Length: num  1.4 1.4 ...
 $ Petal.Width : num  0.2 0.2 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 ...
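
The three species are perfectly balanced, which is worth confirming before we stratify on Species:

table(iris$Species)  # 50 observations per species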

Partition

Derive training and testing data via stratified sampling.

library(sampling)
# stratified split into training and testing sets
# (note: sampling::strata assumes the data are sorted by the strata variable,
#  which holds for iris, whose rows are grouped by Species)
Strata_part = function(data, stratanames, rate = 0.7, method = "srswor"){
    s_table = table(data[, stratanames])          # stratum sizes
    size = round(c(s_table * rate))               # rows to draw from each stratum
    training_strata = strata(data, stratanames, size, method = method)
    index = training_strata$ID_unit               # row indices of the training sample
    return(list(training = data[index, ], 
                testing = data[-index, ]))
}
set.seed(331)
dataset = Strata_part(iris, "Species")
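
A quick sanity check (not in the original code) confirms that the split preserves the class proportions: with rate = 0.7, each stratum contributes 35 of its 50 observations to the training set and the remaining 15 to the testing set.

# each species should appear 35 times in training and 15 times in testing
table(dataset$training$Species)
table(dataset$testing$Species)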

Base Model

model_formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
confusion = list()  # collect the confusion matrices of every model
library(rpart)
iris_rpart = rpart(model_formula, data = dataset$training, method = "class")

# training
confusion$iris_rpart$training = table(predict(iris_rpart, newdata = dataset$training, type = "class"), dataset$training$Species)

# testing
confusion$iris_rpart$testing = table(predict(iris_rpart, newdata = dataset$testing, type = "class"), dataset$testing$Species)
confusion$iris_rpart$training

             setosa versicolor virginica
  setosa         35          0         0
  versicolor      0         33         0
  virginica       0          2        35
confusion$iris_rpart$testing

             setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         11         1
  virginica       0          4        14
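
To compare the models that follow with a single number, a small helper (not part of the original slides) turns any of these confusion matrices into an accuracy:

# accuracy = correctly classified / total, computed from a confusion matrix
get_accuracy = function(confusion_matrix){
    sum(diag(confusion_matrix)) / sum(confusion_matrix)
}
get_accuracy(confusion$iris_rpart$testing)  # about 0.89 for the single tree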

Bagging (Bootstrap Aggregating)

  • Basic idea:
    • Bootstrap samples: sampling with replacement
    • Refit the model and recalculate predictions on each sample
    • Average or majority vote
    • Similar bias, reduced variance

Algorithm 11.1

Bagging(\( D, T, A \)) - train an ensemble of models from bootstrap samples.

  • Input :
    • data set \( D \)
    • ensemble size \( T \)
    • learning algorithm \( A \)
  • Output : ensemble of models whose predictions are to be combined by voting or averaging.
  • for \( t=1 \) to \( T \) do
    • build a bootstrap sample \( D_t \) from \( D \) by sampling \( |D| \) data points with replacement;
    • run \( A \) on \( D_t \) to produce a model \( M_t \) ;
  • end
  • return \( \{M_t |1 \leq t \leq T \} \)
# bagging: train an ensemble of rpart trees on bootstrap samples (Algorithm 11.1)
book_bagging = function(formula, data, ensemble_size = 100){
    n = nrow(data)
    model = list()
    for (i in 1:ensemble_size){
        # bootstrap sample: draw n rows with replacement
        boot_index = sample(1:n, n, replace = TRUE)
        bootstrap_sample = data[boot_index, ]
        model[[i]] = rpart(formula, data = bootstrap_sample, method = "class")
    }
    class(model) = "book_bagging"
    return(model)
}
# vote: majority vote over one row of ensemble predictions,
# breaking ties at random between the most frequent classes
get_vote = function(x){
    x_table = table(x)
    # classes that reach the maximum count
    candidates = names(x_table)[x_table == max(x_table)]
    if (length(candidates) == 1) return(candidates)
    # randomly select one of the tied classes
    sample(candidates, 1)
}
# predict bagging: collect each tree's class prediction,
# then take a majority vote per observation
predict.book_bagging = function(object, newdata){
    n = nrow(newdata)
    ensemble_size = length(object)
    results = matrix(character(n * ensemble_size), nrow = n, ncol = ensemble_size)
    for(i in 1:ensemble_size){
        result = predict(object[[i]], newdata = newdata, type = "class")
        results[, i] = as.character(result)
    }
    vote = apply(results, 1, get_vote)
    return(vote)
}
# derive model
iris_book_bagging = book_bagging(model_formula, data = dataset$training, ensemble_size = 100)

# training
confusion$iris_book_bagging$training = table(predict(iris_book_bagging, newdata = dataset$training), dataset$training$Species)

# testing
confusion$iris_book_bagging$testing = table(predict(iris_book_bagging, newdata = dataset$testing), dataset$testing$Species)
confusion$iris_book_bagging$training

             setosa versicolor virginica
  setosa         35          0         0
  versicolor      0         33         0
  virginica       0          2        35
confusion$iris_book_bagging$testing

             setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         12         1
  virginica       0          3        14

Package adabag

library(adabag)
iris_bagging = bagging(model_formula, data = dataset$training, mfinal = 100)

# training
confusion$iris_bagging$training = predict(iris_bagging, newdata = dataset$training)$confusion

# testing
confusion$iris_bagging$testing = predict(iris_bagging, newdata = dataset$testing)$confusion
confusion$iris_bagging$training
               Observed Class
Predicted Class setosa versicolor virginica
     setosa         35          0         0
     versicolor      0         33         0
     virginica       0          2        35
confusion$iris_bagging$testing
               Observed Class
Predicted Class setosa versicolor virginica
     setosa         15          0         0
     versicolor      0         11         1
     virginica       0          4        14
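
Besides the confusion matrices, recent versions of adabag (3.0 or later; an assumption about the version used here) also store the relative importance of each predictor on the fitted object:

# relative importance of each predictor, as reported by adabag
iris_bagging$importance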

Random Forest

  • Basic idea:
    • Bootstrap samples of the data
    • Randomly sample candidate variables (per tree, or per split as in randomForest)
    • Grow multiple trees and vote

Algorithm 11.2

Random Forest(\( D,T,d \)) - train an ensemble of tree models from bootstrap samples and random subspaces.

  • Input :

    • data set \( D \)
    • ensemble size \( T \)
    • subspace dimension \( d \)
  • Output : ensemble of tree models whose predictions are to be combined by voting or averaging.

  • for \( t = 1 \) to \( T \) do
    • build a bootstrap sample \( D_t \) from \( D \) by sampling \( |D| \) data points with replacement;
    • select \( d \) features at random and reduce dimensionality of \( D_t \) accordingly;
    • train a tree model \( M_t \) on \( D_t \) without pruning;
  • end
  • return \( \{M_t |1 \leq t \leq T \} \)
# extract the predictor names from the right-hand side of a formula
Get_features = function(formula){
    features_str = deparse(formula[[3]])         # right-hand side as text
    features_str = gsub(" ", "", features_str)   # drop spaces
    features = unlist(strsplit(features_str, "\\+"))
    return(features)
}

# random forest
# random forest: bootstrap the rows and, for each tree, a random subset of features
book_rf = function(formula, data, ensemble_size = 100){
    n = nrow(data)
    y = as.character(formula[[2]])               # response name
    features = Get_features(formula)
    features_dim = 1:length(features)
    model = list()
    for (i in 1:ensemble_size){
        # bootstrap sample of the rows
        boot_index = sample(1:n, n, replace = TRUE)
        bootstrap_sample = data[boot_index, ]
        # random subspace: pick a random number of features, then that many features
        temp_dim = sample(features_dim, 1)
        temp_features = sample(features, temp_dim)
        temp_formula = as.formula(paste(y, "~", paste(temp_features, collapse = " + ")))
        model[[i]] = rpart(temp_formula, data = bootstrap_sample, method = "class")
    }
    class(model) = "book_rf"
    return(model)
}

# predict random forest
predict.book_rf = predict.book_bagging

# derive model
iris_book_rf = book_rf(model_formula, data = dataset$training, ensemble_size = 100)

# training
confusion$iris_book_rf$training = table(predict(iris_book_rf, newdata = dataset$training), dataset$training$Species)

# testing
confusion$iris_book_rf$testing = table(predict(iris_book_rf, newdata = dataset$testing), dataset$testing$Species)
confusion$iris_book_rf$training

             setosa versicolor virginica
  setosa         35          0         0
  versicolor      0         33         1
  virginica       0          2        34
confusion$iris_book_rf$testing

             setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         12         1
  virginica       0          3        14

Package randomForest

library(randomForest)
iris_rf = randomForest(model_formula, data = dataset$training, ntree = 100)

# training
confusion$iris_rf$training = table(predict(iris_rf, newdata = dataset$training), dataset$training$Species)

# testing
confusion$iris_rf$testing = table(predict(iris_rf, newdata = dataset$testing), dataset$testing$Species)
confusion$iris_rf$training

             setosa versicolor virginica
  setosa         35          0         0
  versicolor      0         35         0
  virginica       0          0        35
confusion$iris_rf$testing

             setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         12         1
  virginica       0          3        14
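
The perfect training confusion above is expected, since every observation was used to grow most of the trees. A fairer estimate that does not touch the testing data is the out-of-bag (OOB) confusion matrix, which randomForest builds from the observations left out of each bootstrap sample:

# OOB confusion matrix stored on the fitted object
iris_rf$confusion

# equivalently, predict() without newdata returns the OOB predictions
table(predict(iris_rf), dataset$training$Species)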

Boosting

  • Basic idea
    • Update instance weights on each iteration
    • Increase the weights of misclassified instances
    • Reduce bias

Boosting vs Bagging

Algorithm 11.3

Boosting(\( D,T,A \)) - train an ensemble of binary classifiers from reweighted training sets.

  • Input :
    • data set \( D \)
    • ensemble size \( T \)
    • learning algorithm \( A \)
  • Output : weighted ensemble of models.

  • \( w_{1i} \leftarrow 1 / |D| \) for all \( x_i \in D \)
  • for \( t = 1 \) to \( T \) do
    • run \( A \) on \( D \) with weights \( w_{ti} \) to produce a model \( M_t \)
    • calculate weighted error \( \epsilon_t \)
    • if \( \epsilon_t \geq \frac{1}{2} \) then
      • set \( T \leftarrow t - 1 \) and break
    • end
    • \( \alpha_t \leftarrow \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t} \)
    • \( w_{(t+1)i} \leftarrow \frac{w_{ti}}{2\epsilon_t} \) for misclassified instances \( x_i \in D \)
    • \( w_{(t+1)j} \leftarrow \frac{w_{tj}}{2(1-\epsilon_t)} \) for correctly classified instances \( x_j \in D \)
  • end
  • return \( M(x)=\sum_{t=1}^{T}\alpha_t M_t(x) \)

  • \( w_{ti} \): weight of data point \( x_i \) at iteration \( t \)
    • start with uniform weights
    • updated at each iteration
  • \( \alpha_t \): confidence factor
    • decreasing function of \( \epsilon_t \)
    • \( \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t} \) minimizes the overall error (see the numeric sketch below)
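
A small numeric sketch (illustrative values only, not from the slides) shows how the update behaves. With \( \epsilon_t = 0.25 \), misclassified weights are multiplied by \( \frac{1}{2\epsilon_t} = 2 \), correctly classified weights by \( \frac{1}{2(1-\epsilon_t)} = \frac{2}{3} \), and the updated weights still sum to 1:

# boosting weight update for 4 equally weighted points,
# the first of which is misclassified, so epsilon_t = 0.25
w = rep(1/4, 4)
epsilon = 0.25
alpha = 0.5 * log((1 - epsilon) / epsilon)  # confidence factor, about 0.55
w_new = w
w_new[1]  = w[1]  / (2 * epsilon)           # misclassified: weight doubled
w_new[-1] = w[-1] / (2 * (1 - epsilon))     # correct: weight scaled by 2/3
sum(w_new)                                  # still 1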

Boosting in adabag

  • Generalizations of AdaBoost for more than two classes
    • AdaBoost.M1 (Freund and Schapire, 1996)
      • coeflearn = 'Breiman' sets \( \alpha = \frac{1}{2}\ln\frac{1-\epsilon}{\epsilon} \)
      • coeflearn = 'Freund' sets \( \alpha = \ln\frac{1-\epsilon}{\epsilon} \)
    • SAMME (Zhu et al., 2009)
      • coeflearn = 'Zhu' sets \( \alpha = \ln\frac{1-\epsilon}{\epsilon}+\ln(nclasses-1) \)
iris_boosting = boosting(model_formula, data = dataset$training, mfinal = 100, coeflearn = 'Breiman')

# training
confusion$iris_boosting$training = predict(iris_boosting, newdata = dataset$training)$confusion

# testing
confusion$iris_boosting$testing = predict(iris_boosting, newdata = dataset$testing)$confusion
confusion$iris_boosting$training
               Observed Class
Predicted Class setosa versicolor virginica
     setosa         35          0         0
     versicolor      0         35         0
     virginica       0          0        35
confusion$iris_boosting$testing
               Observed Class
Predicted Class setosa versicolor virginica
     setosa         15          0         0
     versicolor      0         13         1
     virginica       0          2        14
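
To see whether mfinal = 100 iterations are really needed, adabag's errorevol function tracks how the ensemble's error changes as trees are added (a quick check, not part of the original slides):

# error of the boosted ensemble as trees are added, evaluated on the testing data
evol = errorevol(iris_boosting, newdata = dataset$testing)
plot(evol$error, type = "l", xlab = "number of trees", ylab = "test error")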

References

  • Flach, P. (2012). Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press. (Chapter 11: Model Ensembles)
  • Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning.
  • Zhu, J., Zou, H., Rosset, S. and Hastie, T. (2009). Multi-class AdaBoost. Statistics and Its Interface, 2, 349-360.
  • Alfaro, E., Gámez, M. and García, N. (2013). adabag: An R Package for Classification with Boosting and Bagging. Journal of Statistical Software, 54(2), 1-35.

Thanks