Title: Bagging Ensemble Selection
Authors: Sun and Pfahringer
Year: 2011
Conference: Advances in Artificial Intelligence (AI 2011)
DOI: https://doi.org/10.1007/978-3-642-25832-9_26


1. Topic

  • Improving ensemble selection by using the bagging strategy
    • Ensemble selection constructs an ensemble of classifiers from a library of base classifiers (Caruana et al., Ensemble Selection from Libraries of Models, ICML, 2004)
        1. Choose base learning algorithms for the library: SVM, ANN, DT, KNN, …
        2. Train M models to build the library
        3. Repeatedly add to the ensemble the model in the library that most improves the ensemble's performance on the error metric, measured on a hillclimb set (a minimal sketch follows below)
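
A minimal sketch of step 3 (forward stepwise selection with replacement), assuming scikit-learn-style fitted models, a binary target, and AUC as the performance metric. The function name, parameters, and stopping rule here are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ensemble_selection(models, X_hill, y_hill, max_iters=50):
    """Greedily add the model (with replacement) that most improves
    the ensemble's AUC on the hillclimb set."""
    # Cache each model's positive-class probability on the hillclimb set.
    preds = [m.predict_proba(X_hill)[:, 1] for m in models]
    selected = []                        # indices of chosen models (duplicates allowed)
    ensemble_sum = np.zeros(len(y_hill))
    best_auc = 0.0
    for _ in range(max_iters):
        best_i, best_candidate_auc = None, best_auc
        for i, p in enumerate(preds):
            auc = roc_auc_score(y_hill, (ensemble_sum + p) / (len(selected) + 1))
            if auc > best_candidate_auc:
                best_i, best_candidate_auc = i, auc
        if best_i is None:               # no candidate improves the ensemble: stop
            break
        selected.append(best_i)
        ensemble_sum += preds[best_i]
        best_auc = best_candidate_auc
    return selected
```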


2. Problem

  • Simple ensemble selection is unstable and sometimes overfits the hillclimb set
    • (a): As the number of models in the model library increases, the performance of ensemble selection on the hillclimb set gradually increases, but performance on the test set does not always improve
  • It is unclear how much data should be reserved for the hillclimb (validation) set
    • (b): Different datasets may have different optimal hillclimb-set ratios


3. Motivation

  • Apply the bagging idea to construct an ensemble of simple ensemble selection classifiers, which should be more robust than an individual ensemble selection classifier


4. Proposed method

  • BaggingES-Simple algorithm
    • A straightforward application of bagging to ensemble selection
    • Each bootstrap sample is split, according to a user-specified ratio, into a training set (e.g., 70%) and a hillclimb set (e.g., 30%)
    • The models in the library are trained on the training portion (e.g., 70%)
    • Trained models are then selected according to the performance metric (e.g., AUC, accuracy, …) until the ensemble's performance on the hillclimb set is maximized
  • BaggingES-OOB algorithm
    • Uses the full bootstrap sample for model generation (training)
    • The respective out-of-bag instances serve as the hillclimb set for selection
    • A minimal code sketch of this procedure follows after this list
  • BaggingES-OOB-EX
    • An extreme case of BaggingES-OOB, where in each bagging iteration only the single best classifier is selected
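
A hedged sketch of BaggingES-OOB under the same assumptions as before (numpy arrays X and y, a binary target, and the ensemble_selection helper sketched in Section 1); the function names and the simple probability averaging at prediction time are illustrative choices, not details taken from the paper:

```python
import numpy as np
from sklearn.base import clone

def bagging_es_oob(base_models, X, y, n_bags=50, seed=None):
    """Each bagging iteration trains the whole model library on a full
    bootstrap sample and selects models on the out-of-bag instances."""
    rng = np.random.default_rng(seed)
    n, bag_ensembles = len(y), []
    for _ in range(n_bags):
        boot = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag instances = hillclimb set
        # Train a fresh copy of every library model on the full bootstrap sample.
        library = [clone(m).fit(X[boot], y[boot]) for m in base_models]
        # Hill-climbing selection on the out-of-bag instances
        # (ensemble_selection is the sketch from Section 1).
        chosen = ensemble_selection(library, X[oob], y[oob])
        bag_ensembles.append([library[i] for i in chosen])
    return bag_ensembles

def bagging_es_predict(bag_ensembles, X_test):
    """Average the selected models' positive-class probabilities over all bags."""
    scores = [m.predict_proba(X_test)[:, 1]
              for ensemble in bag_ensembles for m in ensemble]
    return np.mean(scores, axis=0)
```

BaggingES-OOB-EX would correspond to keeping only the single best classifier per bag instead of the whole selected ensemble.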

5. Experiments

  • Dataset
    • Multiclass datasets were converted to binary problems
  • Base classifier: random tree (a rough analogue is sketched after this list)
    • number of random attributes: 5
    • minimum number of instances at each leaf node: 50
  • Comparison with ES, ES++
    • Number of bootstrap samples (bagging size): 50
    • Model library size: 500, 1000, 1500, …, 5000
    • Model library size per bag: 10, 20, 30, …, 100
    • Repeated 5 times
    • Learning curves
    • Final ensemble sizes
  • Comparison with other ensemble learning algorithms
    • Training set: 66%, test set: 34%
    • 5000 base classifiers
    • AUC results
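
A rough scikit-learn analogue of the base-classifier configuration above; the paper uses a Weka-style random tree, so the mapping to DecisionTreeClassifier parameters is an assumption:

```python
from sklearn.tree import DecisionTreeClassifier

def make_base_model():
    # Approximates a random tree that considers 5 randomly chosen candidate
    # attributes per split and requires at least 50 instances at each leaf.
    return DecisionTreeClassifier(
        max_features=5,       # 5 random candidate attributes per split
        min_samples_leaf=50,  # minimum of 50 instances per leaf node
    )
```

A per-bag model library could then be built as, e.g., `[make_base_model() for _ in range(library_size_per_bag)]` (library_size_per_bag is a hypothetical name) and passed to the BaggingES-OOB sketch above.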

6. References

  • Caruana, R., Niculescu-Mizil, A., Crew, G., Ksikes, A.: Ensemble selection from libraries of models. In: Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004) [the paper that proposed Ensemble Selection]