Title: Bagging Ensemble Selection
Authors: Sun and Pfahringer
Year: 2011
Conference: Advances in Artificial Intelligence (AI 2011)
DOI: https://doi.org/10.1007/978-3-642-25832-9_26


1. Topic

  • Improving ensemble selection by using the bagging strategy
    • Ensemble selection constructs an ensemble of classifiers from a library of base classifiers (Caruana et al., Ensemble Selection from Libraries of Models, ICML, 2004)
        1. Choose base learning algorithms for the library: SVM, ANN, DT, KNN, …
        2. Train M models to build the library
        3. Repeatedly add to the ensemble the model in the library that most improves the ensemble's performance on the error metric, measured on a hillclimb set (a minimal sketch follows below)
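
A minimal sketch of step 3 (forward stepwise selection with replacement), assuming scikit-learn-style fitted models, a binary target, and AUC as the performance metric. The function name, parameters, and stopping rule here are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ensemble_selection(models, X_hill, y_hill, max_iters=50):
    """Greedily add the model (with replacement) that most improves
    the ensemble's AUC on the hillclimb set."""
    # Cache each model's positive-class probability on the hillclimb set.
    preds = [m.predict_proba(X_hill)[:, 1] for m in models]
    selected = []                        # indices of chosen models (duplicates allowed)
    ensemble_sum = np.zeros(len(y_hill))
    best_auc = 0.0
    for _ in range(max_iters):
        best_i, best_candidate_auc = None, best_auc
        for i, p in enumerate(preds):
            auc = roc_auc_score(y_hill, (ensemble_sum + p) / (len(selected) + 1))
            if auc > best_candidate_auc:
                best_i, best_candidate_auc = i, auc
        if best_i is None:               # no candidate improves the ensemble: stop
            break
        selected.append(best_i)
        ensemble_sum += preds[best_i]
        best_auc = best_candidate_auc
    return selected
```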


2. Problem

  • Simple ensemble selection is unstable and sometimes overfits the hillclimb set
    • (a): As the number of models in the model library increases, the performance of ensemble selection on the hillclimb set gradually increases, but performance on the test set does not always improve
  • It is unclear how much data should be reserved for the hillclimb (validation) set
    • (b): Different datasets may have different optimal hillclimb-set ratios


3. Motivation

  • Apply the bagging idea to construct an ensemble of simple ensemble selection classifiers, which should be more robust than an individual ensemble selection classifier


4. Proposed method

  • BaggingES-Simple algorithm
    • A straightforward application of bagging to ensemble selection
    • Each bootstrap sample is split, according to a user-specified ratio, into a training set (e.g., 70%) and a hillclimb set (e.g., 30%)
    • The models in the library are trained on the training portion (e.g., 70%)
    • Trained models are then selected according to the performance metric (e.g., AUC, accuracy, …) until the ensemble's performance on the hillclimb set is maximized
  • BaggingES-OOB algorithm
    • Uses the full bootstrap sample for model generation (training)
    • The respective out-of-bag instances serve as the hillclimb set for selection
    • A minimal code sketch of this procedure follows after this list
  • BaggingES-OOB-EX
    • An extreme case of BaggingES-OOB, where in each bagging iteration only the single best classifier is selected
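
A hedged sketch of BaggingES-OOB under the same assumptions as before (numpy arrays X and y, a binary target, and the ensemble_selection helper sketched in Section 1); the function names and the simple probability averaging at prediction time are illustrative choices, not details taken from the paper:

```python
import numpy as np
from sklearn.base import clone

def bagging_es_oob(base_models, X, y, n_bags=50, seed=None):
    """Each bagging iteration trains the whole model library on a full
    bootstrap sample and selects models on the out-of-bag instances."""
    rng = np.random.default_rng(seed)
    n, bag_ensembles = len(y), []
    for _ in range(n_bags):
        boot = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag instances = hillclimb set
        # Train a fresh copy of every library model on the full bootstrap sample.
        library = [clone(m).fit(X[boot], y[boot]) for m in base_models]
        # Hill-climbing selection on the out-of-bag instances
        # (ensemble_selection is the sketch from Section 1).
        chosen = ensemble_selection(library, X[oob], y[oob])
        bag_ensembles.append([library[i] for i in chosen])
    return bag_ensembles

def bagging_es_predict(bag_ensembles, X_test):
    """Average the selected models' positive-class probabilities over all bags."""
    scores = [m.predict_proba(X_test)[:, 1]
              for ensemble in bag_ensembles for m in ensemble]
    return np.mean(scores, axis=0)
```

BaggingES-OOB-EX would correspond to keeping only the single best classifier per bag instead of the whole selected ensemble.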

5. Experiments

  • Dataset
    • Multiclass datasets were converted to binary problems
  • Base classifier: random tree (a rough analogue is sketched after this list)
    • number of random attributes: 5
    • minimum number of instances at each leaf node: 50
  • Comparison with ES, ES++
    • Number of bootstrap samples (bagging size): 50
    • Model library size: 500, 1000, 1500, …, 5000
    • Model library size per bag: 10, 20, 30, …, 100
    • Repeated 5 times
    • Learning curves
    • Final ensemble sizes
  • Comparison with other ensemble learning algorithms
    • Training set: 66%, test set: 34%
    • 5000 base classifiers
    • AUC results
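
A rough scikit-learn analogue of the base-classifier configuration above; the paper uses a Weka-style random tree, so the mapping to DecisionTreeClassifier parameters is an assumption:

```python
from sklearn.tree import DecisionTreeClassifier

def make_base_model():
    # Approximates a random tree that considers 5 randomly chosen candidate
    # attributes per split and requires at least 50 instances at each leaf.
    return DecisionTreeClassifier(
        max_features=5,       # 5 random candidate attributes per split
        min_samples_leaf=50,  # minimum of 50 instances per leaf node
    )
```

A per-bag model library could then be built as, e.g., `[make_base_model() for _ in range(library_size_per_bag)]` (library_size_per_bag is a hypothetical name) and passed to the BaggingES-OOB sketch above.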

6. References

  • Caruana, R., Niculescu-Mizil, A., Crew, G., Ksikes, A.: Ensemble selection from libraries of models. In: Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004) [the paper that proposed Ensemble Selection]