1. Topic
- Improving ensemble selection by using the bagging strategy
- Ensemble selection constructs an ensemble of classifiers from a library of base classifiers (Caruana et al., Ensemble Selection from Libraries of Models, ICML 2004)
- Generate the library of base classifiers: SVM, ANN, DT, KNN, …
- Train M models
- Iteratively add to the ensemble the model in the library that most improves the ensemble's performance on the chosen error metric, evaluated on a hillclimb set (see the sketch below)
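A minimal Python sketch of this greedy forward-selection loop (the probability averaging, the AUC metric, and all function and parameter names here are illustrative assumptions, not the exact implementation from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ensemble_selection(models, X_hill, y_hill, max_size=50):
    """Greedy forward selection: repeatedly add the library model that
    most improves the ensemble's metric on the hillclimb set."""
    ensemble = []                      # indices of selected models
    best_score = -np.inf
    # cache each model's hillclimb predictions (probability of the positive class)
    preds = [m.predict_proba(X_hill)[:, 1] for m in models]
    for _ in range(max_size):
        scores = []
        for i, p in enumerate(preds):
            # score of the ensemble if model i were added (simple averaging)
            avg = np.mean([preds[j] for j in ensemble] + [p], axis=0)
            scores.append(roc_auc_score(y_hill, avg))
        i_best = int(np.argmax(scores))
        if scores[i_best] <= best_score:   # stop when no model improves the metric
            break
        best_score = scores[i_best]
        ensemble.append(i_best)
    return ensemble, best_score
```

This sketch selects with replacement (one of the variants Caruana et al. discuss), so a strong model may be added more than once and thereby carry more weight in the averaged prediction.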
2. Problem
- Simple ensemble selection is unstable and sometimes overfits the hillclimb set
- (a): As the number of models in the model library increases, the performance of ensemble selection on the hillclimb set gradually increases, but the performance on the test set does not always increase
- We don’t know how much data should be used for the hillclimb set (validation set)
- (b): Different datasets may have different optimal hillclimb-set ratios

3. Motivation
- Apply the bagging idea to construct an ensemble of simple ensemble selection classifiers, which should be more robust than an individual ensemble selection classifier
4. Proposed method
- BaggingES-Simple algorithm
- It is a straightforward application of bagging to ensemble selection
- Each bootstrap sample is split, according to a user-specified ratio, into a training set (e.g. 70%) and a hillclimbing set (e.g. 30%)
- Models in the library are trained on the training portion (e.g. 70%)
- Trained models are greedily selected according to a performance metric (e.g. AUC, accuracy) until the ensemble's performance on the hillclimbing set is maximized (see the sketch below)
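A sketch of BaggingES-Simple under these assumptions (Python/scikit-learn; the 70/30 split, the `ensemble_selection` helper from the earlier sketch, and parameter names such as `n_bags` and `lib_size` are illustrative, not the paper's exact setup):

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

def bagging_es_simple(base_learner, X, y, n_bags=50, lib_size=10,
                      hill_ratio=0.3, random_state=0):
    """BaggingES-Simple: for each bootstrap sample, split it into a training
    part and a hillclimbing part, build a model library on the training part,
    and run ensemble selection on the hillclimbing part."""
    rng = np.random.RandomState(random_state)
    bagged_ensembles = []
    for b in range(n_bags):
        # bootstrap sample of the original training data
        X_boot, y_boot = resample(X, y, random_state=rng)
        # user-specified split, e.g. 70% train / 30% hillclimb
        X_tr, X_hill, y_tr, y_hill = train_test_split(
            X_boot, y_boot, test_size=hill_ratio, random_state=rng)
        library = []
        for _ in range(lib_size):
            model = clone(base_learner)
            # randomized base learners (e.g. random trees) get distinct seeds
            if 'random_state' in model.get_params():
                model.set_params(random_state=rng.randint(2**31 - 1))
            library.append(model.fit(X_tr, y_tr))
        selected, _ = ensemble_selection(library, X_hill, y_hill)
        bagged_ensembles.append([library[i] for i in selected])
    return bagged_ensembles
```

At prediction time the selected models from all bags are combined, e.g. by averaging their predicted probabilities.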
- BaggingES-OOB algorithm
- It uses the full bootstrap sample for model generation (training)
- The corresponding out-of-bag instances serve as the hillclimb set for selection
- Algorithm (a pseudocode-style sketch is given below)
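A sketch of BaggingES-OOB along the same lines (again assuming the `ensemble_selection` helper from above, numpy-array inputs, and illustrative parameter names):

```python
import numpy as np
from sklearn.base import clone

def bagging_es_oob(base_learner, X, y, n_bags=50, lib_size=10, random_state=0):
    """BaggingES-OOB: train the library on the full bootstrap sample and use
    the out-of-bag instances of that sample as the hillclimb set."""
    rng = np.random.RandomState(random_state)
    n = len(y)
    bagged_ensembles = []
    for b in range(n_bags):
        boot_idx = rng.randint(0, n, size=n)      # bootstrap indices (with replacement)
        oob_mask = np.ones(n, dtype=bool)
        oob_mask[boot_idx] = False                # instances never drawn are out-of-bag
        X_boot, y_boot = X[boot_idx], y[boot_idx]
        X_oob, y_oob = X[oob_mask], y[oob_mask]
        library = []
        for _ in range(lib_size):
            model = clone(base_learner)
            if 'random_state' in model.get_params():
                model.set_params(random_state=rng.randint(2**31 - 1))
            library.append(model.fit(X_boot, y_boot))
        # hill-climb on the out-of-bag instances
        selected, _ = ensemble_selection(library, X_oob, y_oob)
        bagged_ensembles.append([library[i] for i in selected])
    return bagged_ensembles
```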

- BaggingES-OOB-EX
- An extreme case of BaggingES-OOB, where in each bagging iteration only the single best classifier is selected (see the sketch below)
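In code, the BaggingES-OOB-EX variant replaces the greedy selection step with a single argmax over per-model out-of-bag scores (AUC is an assumed metric here):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def select_single_best(library, X_oob, y_oob):
    """BaggingES-OOB-EX selection step: keep only the one library model
    with the best score on the out-of-bag hillclimb set."""
    scores = [roc_auc_score(y_oob, m.predict_proba(X_oob)[:, 1]) for m in library]
    return library[int(np.argmax(scores))]
```

Since exactly one model survives per bag, the final bagged ensemble contains one model per bootstrap sample.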
5. Experiments
- Dataset
- Multiclass datasets were converted to binary problems

- Base classifier: Random tree
- The number of random attributes considered at each split: 5
- The minimum number of instances at each leaf node: 50 (a rough scikit-learn approximation of this setting is sketched below)
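A rough scikit-learn approximation of this base-learner configuration (the experiments presumably use a Weka-style random tree; this `DecisionTreeClassifier` is only an analogous setting, not the exact learner):

```python
from sklearn.tree import DecisionTreeClassifier

# 5 randomly chosen attributes considered at each split,
# at least 50 instances required at each leaf node
base_learner = DecisionTreeClassifier(max_features=5, min_samples_leaf=50)
```

Because fewer attributes than available are considered at each split, each clone trained with a different seed in the sketches above yields a different tree, which keeps the model library diverse.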
- Comparison with ES, ES++
- The number of bootstrap samples (bagging size): 50
- Model library size: 500, 1000, 1500, …, 5000
- Model library size per bag: 10, 20, 30, …, 100
- Repeated 5 times
- Learning curves

- Final ensemble sizes

- Comparison with other ensemble learning algorithms
- Training set: 66%, test set: 34%
- 5000 base classifiers
- AUC results

6. References
- Caruana, R., Niculescu-Mizil, A., Crew, G., Ksikes, A.: Ensemble selection from libraries of models. In: Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004) [the paper that proposed Ensemble Selection]