Title: Ensemble Pruning via Individual Contribution Ordering
Authors: Lu et al. 
Year: 2010
Conference: KDD
DOI: https://dl.acm.org/doi/abs/10.1145/1835804.1835914


1. Topic

  • Effective ensemble pruning
  • Propose an evaluation metric, individual contribution (IC), to measure how much each member contributes to the ensemble


2. Motivation

  • To assess the amount of contribution that each ensemble member can bring to the ensemble


3. Problems

  • Optimal ensemble pruning is NP-complete
  • Handling the trade-off between accuracy and diversity


4. Proposed method

  • Accuracy/Diversity trade-off
    • When members of two ensembles are similarly accurate, the more diverse ensemble should perform better than the other one
    • When two ensembles are similarly diverse, the one with more accurate individual members should perform better

    Proof

    • Theorem 1. Given a two-class learning problem, and two classifiers \(\small c_i\) and \(\small c_j\) of a classifier ensemble \(\small E\), if \(\small c_i\) and \(\small c_j\) are equally accurate, the classifier which disagrees more with the other ensemble members is expected to contain more useful knowledge, in terms of diversity contribution, for constructing the classifier ensemble.
      • Diversity of \(\small c_i\) and \(\small c_j\) \[\small Div_{i,j} = \frac {N^{(01)}+N^{(10)}}{N}\] \[\scriptsize N^{(01)} \text{: the number of data points on which $\scriptsize c_i$ is incorrect and $\scriptsize c_j$ is correct ($\scriptsize N^{(10)}$: the reverse); $\scriptsize N$: the total number of data points} \]
      • \(\small c_i\)’s diversity contribution (given two-class classification) \[\small ConDiv_i = \sum_{j=1}^{M} Div_{i,j} = \frac {1}{N} \sum_{k=1}^{N}(M - v_{c_{i}(\mathbf{x}_k)}^{(k)})\] \[\scriptsize v_{c_{i}(\mathbf{x}_k)}^{(k)} \text{: the number of classifiers that agree with $\scriptsize c_i$'s prediction on $\scriptsize \mathbf{x}_k$} \]
      • The expected diversity contribution of a classifier \(\small c_i\) \[\small E(ConDiv_i) = \frac {N_{ma}^{(i)}AVG_{mi} + N_{mi}^{(i)}AVG_{ma}}{N}\] \[\scriptsize N_{ma}^{(i)}, N_{mi}^{(i)} \text{: the number of data points on which $\scriptsize c_i$ votes with the majority / a minority group, respectively} \] \[\scriptsize AVG_{ma}, AVG_{mi} \text{: the average number of votes in the majority / minority groups, respectively} \]

      • For classifiers \(\small c_i\) and \(\small c_j\), the difference between their expected diversity contributions is \[\small \frac {(N_{ma}^{(i)} -N_{ma}^{(j)})AVG_{mi} + (N_{mi}^{(i)}-N_{mi}^{(j)})AVG_{ma}}{N}\]

      • Four cases

      • \(\small a_i\): number of data points on which \(\small c_i\) is correct, the ensemble is incorrect, and \(\small c_i\)'s vote is in the minority group
      • \(\small b_i\): number of data points on which \(\small c_i\) is correct, the ensemble is correct, and \(\small c_i\)'s vote is in the majority group
      • \(\small u_i\): number of data points on which \(\small c_i\) is incorrect, the ensemble is correct, and \(\small c_i\)'s vote is in the minority group
      • \(\small t_i\): number of data points on which \(\small c_i\) is incorrect, the ensemble is incorrect, and \(\small c_i\)'s vote is in the majority group

      • If \(\small c_i\) and \(\small c_j\) have the same accuracy, note that \(\small N_{ma}^{(i)} = b_i + t_i\) and \(\small N_{mi}^{(i)} = a_i + u_i\), and that every data point falls into exactly one of the four cases, so \(\small b_i + t_i = N - (a_i + u_i)\). The difference between their expected diversity contributions therefore becomes \[\small \frac {(b_i + t_i - b_j - t_j)AVG_{mi} + (a_i + u_i - a_j - u_j)AVG_{ma}}{N} = \frac {(a_i + u_i - a_j - u_j)(AVG_{ma} - AVG_{mi})}{N}\]
      • Since \(\small AVG_{ma} - AVG_{mi} > 0\), the difference is positive whenever \(\small a_i + u_i > a_j + u_j\).
      • In other words, of two equally accurate classifiers, the one whose votes fall in the minority group on more data points is expected to bring the larger diversity contribution to the ensemble (a numerical check of the \(\small ConDiv_i\) identity is sketched below).
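      • A minimal numerical check of the \(\small ConDiv_i\) identity above (a sketch, not the paper's code): for two classes, \(\small c_i\) and \(\small c_j\) disagree in correctness exactly when they predict different classes, so summing the pairwise disagreement rates reproduces the closed form with \(\small M - v_{c_i(\mathbf{x}_k)}^{(k)}\). The random predictions and variable names below are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 7, 50                                # classifiers, data points
preds = rng.integers(0, 2, size=(M, N))     # hypothetical 0/1 predictions

i = 0
# Pairwise diversity Div_{i,j}: fraction of points where c_i and c_j disagree.
div = [(preds[i] != preds[j]).mean() for j in range(M)]
con_div = sum(div)                          # c_i's diversity contribution

# Closed form: v_k = number of classifiers (including c_i) that agree with c_i on x_k.
v = (preds == preds[i]).sum(axis=0)
con_div_closed = (M - v).sum() / N

assert np.isclose(con_div, con_div_closed)
print(con_div, con_div_closed)
```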


  • Evaluation of individual contributions
    • Rules for evaluating contributions of predictions

    • Positive contribution
      • When \(\small c_i\) makes a correct prediction on \(\small d_j\), if \(\small c_i(\mathbf{x}_j)\) is in the minority group \[\small IC_i^{(j)} = 2v_{max}^{(j)} - v_{c_i(\mathbf{x}_j)}^{(j)}\] \[\scriptsize v_{max}^{(j)} \text{: the number of majority votes on $\scriptsize d_j$} \] \[\scriptsize v_{c_{i}(\mathbf{x}_j)}^{(j)} \text{: the number of votes on $\scriptsize d_j$ for the class predicted by $\scriptsize c_i$} \]
      • When \(\small c_i\) makes a correct prediction on \(\small d_j\), if \(\small c_i(\mathbf{x}_j)\) is in the majority group \[\small IC_i^{(j)} = v_{sec}^{(j)}\] \[\scriptsize v_{sec}^{(j)} \text{: the number of votes in the second-largest group on $\scriptsize d_j$} \]
    • Negative contribution: when \(\small c_i\) makes an incorrect prediction on \(\small d_j\) \[\small IC_i^{(j)} = v_{correct}^{(j)} - v_{c_i(\mathbf{x}_j)}^{(j)} - v_{max}^{(j)}\] \[\scriptsize v_{correct}^{(j)} \text{: the number of votes for the correct class on $\scriptsize d_j$} \]

    • Individual Contribution of a classifier \(\small c_i\) \[\small IC_i = \sum_{j=1}^N IC_i^{(j)} \] (a worked example follows below)
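    • A small hypothetical example of the three rules (the numbers are illustrative, not from the paper): suppose \(\small M = 11\) classifiers vote on a two-class point \(\small d_j\) where the correct class receives 5 votes and the incorrect class receives 6 (so \(\small v_{max}^{(j)} = 6\), \(\small v_{sec}^{(j)} = 5\), \(\small v_{correct}^{(j)} = 5\)). A member voting the correct, minority class gets \(\small IC_i^{(j)} = 2 \cdot 6 - 5 = 7\); a member voting the incorrect, majority class gets \(\small IC_i^{(j)} = 5 - 6 - 6 = -7\); and if the correct class instead held the 6 majority votes, a correct majority voter would get \(\small IC_i^{(j)} = v_{sec}^{(j)} = 5\). Correct minority votes are rewarded most, incorrect majority votes are penalized most.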


  • Algorithm (EPIC: Ensemble Pruning via Individual Contribution ordering)
    • Evaluate \(\small IC_i\) for every ensemble member on the pruning set, order the members by decreasing \(\small IC_i\), and keep the top fraction of the ordered list as the pruned ensemble (a sketch follows below)
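    • A minimal Python sketch of this ordering procedure under the rules above; the function names, the keep ratio, and the tie handling for majority votes are my own choices, not the paper's:

```python
import numpy as np

def individual_contributions(preds, labels, n_classes):
    """preds: (M, N) predicted class ids on the pruning set; labels: (N,) true classes."""
    M, N = preds.shape
    ic = np.zeros(M)
    for j in range(N):
        counts = np.bincount(preds[:, j], minlength=n_classes)   # votes per class on d_j
        v_sorted = np.sort(counts)[::-1]
        v_max, v_sec = v_sorted[0], v_sorted[1]                  # largest / second-largest groups
        v_correct = counts[labels[j]]                            # votes for the correct class
        for i in range(M):
            v_ci = counts[preds[i, j]]                           # votes for c_i's predicted class
            if preds[i, j] == labels[j]:                         # correct prediction
                ic[i] += v_sec if v_ci == v_max else 2 * v_max - v_ci
            else:                                                # incorrect prediction
                ic[i] += v_correct - v_ci - v_max
    return ic

def epic_prune(preds, labels, n_classes, keep_ratio=0.2):
    """Order members by IC and return the indices of the retained classifiers."""
    ic = individual_contributions(preds, labels, n_classes)
    k = max(1, int(round(keep_ratio * len(ic))))
    return np.argsort(-ic)[:k]

# Illustrative usage with random predictions (200 members, 120 pruning points).
rng = np.random.default_rng(1)
preds = rng.integers(0, 2, size=(200, 120))
labels = rng.integers(0, 2, size=120)
print(epic_prune(preds, labels, n_classes=2))
```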


5. Experiments

  • Competitors: Bagging, Orientation Ordering (OO)
  • Base classifiers: 200 C4.5 decision trees
  • 26 UCI datasets
  • Requires a training set, a pruning set, and a test set
    • Each dataset is split into three equal parts, and each part is used once as the training, pruning, and test set (\(\small 3! = 6\) arrangements)
    • Each arrangement is repeated 50 times
    • In total, \(\small 6 \times 50 = 300\) runs per dataset (protocol sketched below)
  • Error curve
  • Error rate
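  • A small sketch of the split-rotation protocol above (part names and the dict layout are illustrative):

```python
from itertools import permutations

# Rotate three equal parts of a dataset through the roles of training,
# pruning, and test set (3! = 6 arrangements) and repeat each arrangement
# 50 times, giving 6 * 50 = 300 runs per dataset.
def evaluation_runs(parts=("part1", "part2", "part3"), repeats=50):
    runs = []
    for rep in range(repeats):
        for train, prune, test in permutations(parts):
            runs.append({"repeat": rep, "train": train, "prune": prune, "test": test})
    return runs

print(len(evaluation_runs()))  # 300
```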


6. Reference

  • G. Martínez-Muñoz and A. Suárez. Pruning in ordered bagging ensembles. In Proceedings of the 23rd International Conference on Machine Learning, pages 609–616, 2006. [OO]
  • R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer. Ensemble diversity measures and their application to thinning. Information Fusion, 6(1):49–62, 2005. [introduces the four cases]
  • R. Caruana, A. Munson, and A. Niculescu-Mizil. Getting the most out of ensemble selection. In Proceedings of the 6th International Conference on Data Mining, pages 828–833, 2006. [builds a very large, diverse pool of trees and prunes it using accuracy or AUC]
  • I. Partalas, G. Tsoumakas, and I. Vlahavas. Focused ensemble selection: A diversity-based method for greedy ensemble selection. In Proceedings of the 18th European Conference on Artificial Intelligence, pages 117–121, 2008. [idea similar to this paper; also discusses the four cases]
  • T. G. Dietterich. Ensemble methods in machine learning. In Proceedings of the 1st International Workshop on Multiple Classifier Systems, pages 1–15, 2000. [conditions under which an ensemble improves performance]
  • C. Tamon and J. Xiang. On the boosting pruning problem. In Proceedings of the 11th European Conference on Machine Learning, pages 404–412, 2000. [shows that pruning a boosted ensemble is NP-complete]