Title: Ensemble Pruning via Individual Contribution Ordering
Authors: Lu et al. 
Year: 2010
Conference: KDD
DOI: https://dl.acm.org/doi/abs/10.1145/1835804.1835914


1. Topic

  • Effective ensemble pruning
  • Propose an evaluation metric, individual contribution (IC), to measure how much each member contributes to the ensemble


2. Motivation

  • To assess the amount of contribution that each ensemble member can bring to the ensemble


3. Problems

  • Optimal ensemble pruning is NP-complete
  • Handling the trade-off between accuracy and diversity


4. Proposed method

  • Accuracy/Diversity trade-off
    • When members of two ensembles are similarly accurate, the more diverse ensemble should perform better than the other one
    • When two ensembles are similarly diverse, the one with more accurate individual members should perform better

    Proof

    • Theorem 1. Given a two-class learning problem, and two classifiers \(\small c_i\) and \(\small c_j\) of a classifier ensemble \(\small E\), if \(\small c_i\) and \(\small c_j\) are equally accurate, the classifier which disagrees more with the other ensemble members is expected to contain more useful knowledge, in terms of diversity contribution, for constructing the classifier ensemble.
      • Diversity of \(\small c_i\) and \(\small c_j\) \[\small Div_{i,j} = \frac {N^{(01)}+N^{(10)}}{N}\] \[\scriptsize N^{(01)} \text{: the number of data points on which $\scriptsize c_i$ is incorrect and $\scriptsize c_j$ is correct ($\scriptsize N^{(10)}$: the reverse); $\scriptsize N$: the total number of data points} \]
      • \(\small c_i\)’s diversity contribution (given two-class classification) \[\small ConDiv_i = \sum_{j=1}^{M} Div_{i,j} = \frac {1}{N} \sum_{k=1}^{N}(M - v_{c_{i}(\mathbf{x}_k)}^{(k)})\] \[\scriptsize v_{c_{i}(\mathbf{x}_k)}^{(k)} \text{: the number of classifiers that agree with $\scriptsize c_i$'s prediction on $\scriptsize \mathbf{x}_k$} \]
      • The expected diversity contribution of a classifier \(\small c_i\) \[\small E(ConDiv_i) = \frac {N_{ma}^{(i)}AVG_{mi} + N_{mi}^{(i)}AVG_{ma}}{N}\] \[\scriptsize N_{ma}^{(i)}, N_{mi}^{(i)} \text{: the number of data points on which $\scriptsize c_i$ votes with the majority / a minority group, respectively} \] \[\scriptsize AVG_{ma}, AVG_{mi} \text{: the average number of votes in the majority / minority groups, respectively} \]

      • For classifiers \(\small c_i\) and \(\small c_j\), the difference between their expected diversity contributions is \[\small \frac {(N_{ma}^{(i)} -N_{ma}^{(j)})AVG_{mi} + (N_{mi}^{(i)}-N_{mi}^{(j)})AVG_{ma}}{N}\]

      • Four cases

      • \(\small a_i\): number of data points on which \(\small c_i\) is correct, the ensemble is incorrect, and \(\small c_i\)'s vote is in the minority group
      • \(\small b_i\): number of data points on which \(\small c_i\) is correct, the ensemble is correct, and \(\small c_i\)'s vote is in the majority group
      • \(\small u_i\): number of data points on which \(\small c_i\) is incorrect, the ensemble is correct, and \(\small c_i\)'s vote is in the minority group
      • \(\small t_i\): number of data points on which \(\small c_i\) is incorrect, the ensemble is incorrect, and \(\small c_i\)'s vote is in the majority group

      • If \(\small c_i\) and \(\small c_j\) have the same accuracy, note that \(\small N_{ma}^{(i)} = b_i + t_i\) and \(\small N_{mi}^{(i)} = a_i + u_i\), and that every data point falls into exactly one of the four cases, so \(\small b_i + t_i = N - (a_i + u_i)\). The difference between their expected diversity contributions therefore becomes \[\small \frac {(b_i + t_i - b_j - t_j)AVG_{mi} + (a_i + u_i - a_j - u_j)AVG_{ma}}{N} = \frac {(a_i + u_i - a_j - u_j)(AVG_{ma} - AVG_{mi})}{N}\]
      • Since \(\small AVG_{ma} - AVG_{mi} > 0\), the difference is positive whenever \(\small a_i + u_i > a_j + u_j\).
      • In other words, of two equally accurate classifiers, the one whose votes fall in the minority group on more data points is expected to bring the larger diversity contribution to the ensemble (a numerical check of the \(\small ConDiv_i\) identity is sketched below).
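      • A minimal numerical check of the \(\small ConDiv_i\) identity above (a sketch, not the paper's code): for two classes, \(\small c_i\) and \(\small c_j\) disagree in correctness exactly when they predict different classes, so summing the pairwise disagreement rates reproduces the closed form with \(\small M - v_{c_i(\mathbf{x}_k)}^{(k)}\). The random predictions and variable names below are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 7, 50                                # classifiers, data points
preds = rng.integers(0, 2, size=(M, N))     # hypothetical 0/1 predictions

i = 0
# Pairwise diversity Div_{i,j}: fraction of points where c_i and c_j disagree.
div = [(preds[i] != preds[j]).mean() for j in range(M)]
con_div = sum(div)                          # c_i's diversity contribution

# Closed form: v_k = number of classifiers (including c_i) that agree with c_i on x_k.
v = (preds == preds[i]).sum(axis=0)
con_div_closed = (M - v).sum() / N

assert np.isclose(con_div, con_div_closed)
print(con_div, con_div_closed)
```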


  • Evaluation of individual contributions
    • Rules for evaluating contributions of predictions

    • Positive contribution
      • When \(\small c_i\) makes a correct prediction on \(\small d_j\), if \(\small c_i(\mathbf{x}_j)\) is in the minority group \[\small IC_i^{(j)} = 2v_{max}^{(j)} - v_{c_i(\mathbf{x}_j)}^{(j)}\] \[\scriptsize v_{max}^{(j)} \text{: the number of majority votes on $\scriptsize d_j$} \] \[\scriptsize v_{c_{i}(\mathbf{x}_j)}^{(j)} \text{: the number of votes on $\scriptsize d_j$ for the class predicted by $\scriptsize c_i$} \]
      • When \(\small c_i\) makes a correct prediction on \(\small d_j\), if \(\small c_i(\mathbf{x}_j)\) is in the majority group \[\small IC_i^{(j)} = v_{sec}^{(j)}\] \[\scriptsize v_{sec}^{(j)} \text{: the number of votes in the second-largest group on $\scriptsize d_j$} \]
    • Negative contribution: when \(\small c_i\) makes an incorrect prediction on \(\small d_j\) \[\small IC_i^{(j)} = v_{correct}^{(j)} - v_{c_i(\mathbf{x}_j)}^{(j)} - v_{max}^{(j)}\] \[\scriptsize v_{correct}^{(j)} \text{: the number of votes for the correct class on $\scriptsize d_j$} \]

    • Individual Contribution of a classifier \(\small c_i\) \[\small IC_i = \sum_{j=1}^N IC_i^{(j)} \] (a worked example follows below)
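    • A small hypothetical example of the three rules (the numbers are illustrative, not from the paper): suppose \(\small M = 11\) classifiers vote on a two-class point \(\small d_j\) where the correct class receives 5 votes and the incorrect class receives 6 (so \(\small v_{max}^{(j)} = 6\), \(\small v_{sec}^{(j)} = 5\), \(\small v_{correct}^{(j)} = 5\)). A member voting the correct, minority class gets \(\small IC_i^{(j)} = 2 \cdot 6 - 5 = 7\); a member voting the incorrect, majority class gets \(\small IC_i^{(j)} = 5 - 6 - 6 = -7\); and if the correct class instead held the 6 majority votes, a correct majority voter would get \(\small IC_i^{(j)} = v_{sec}^{(j)} = 5\). Correct minority votes are rewarded most, incorrect majority votes are penalized most.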


  • Algorithm (EPIC: Ensemble Pruning via Individual Contribution ordering)
    • Evaluate \(\small IC_i\) for every ensemble member on the pruning set, order the members by decreasing \(\small IC_i\), and keep the top fraction of the ordered list as the pruned ensemble (a sketch follows below)
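    • A minimal Python sketch of this ordering procedure under the rules above; the function names, the keep ratio, and the tie handling for majority votes are my own choices, not the paper's:

```python
import numpy as np

def individual_contributions(preds, labels, n_classes):
    """preds: (M, N) predicted class ids on the pruning set; labels: (N,) true classes."""
    M, N = preds.shape
    ic = np.zeros(M)
    for j in range(N):
        counts = np.bincount(preds[:, j], minlength=n_classes)   # votes per class on d_j
        v_sorted = np.sort(counts)[::-1]
        v_max, v_sec = v_sorted[0], v_sorted[1]                  # largest / second-largest groups
        v_correct = counts[labels[j]]                            # votes for the correct class
        for i in range(M):
            v_ci = counts[preds[i, j]]                           # votes for c_i's predicted class
            if preds[i, j] == labels[j]:                         # correct prediction
                ic[i] += v_sec if v_ci == v_max else 2 * v_max - v_ci
            else:                                                # incorrect prediction
                ic[i] += v_correct - v_ci - v_max
    return ic

def epic_prune(preds, labels, n_classes, keep_ratio=0.2):
    """Order members by IC and return the indices of the retained classifiers."""
    ic = individual_contributions(preds, labels, n_classes)
    k = max(1, int(round(keep_ratio * len(ic))))
    return np.argsort(-ic)[:k]

# Illustrative usage with random predictions (200 members, 120 pruning points).
rng = np.random.default_rng(1)
preds = rng.integers(0, 2, size=(200, 120))
labels = rng.integers(0, 2, size=120)
print(epic_prune(preds, labels, n_classes=2))
```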


5. Experiments

  • Competitors: Bagging, Orientation Ordering (OO)
  • Base classifiers: 200 C4.5 decision trees
  • 26 UCI datasets
  • Requires a training set, a pruning set, and a test set
    • Each dataset is split into three equal parts, and each part is used once as the training, pruning, and test set (\(\small 3! = 6\) arrangements)
    • Each arrangement is repeated 50 times
    • In total, \(\small 6 \times 50 = 300\) runs per dataset (protocol sketched below)
  • Error curve
  • Error rate
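  • A small sketch of the split-rotation protocol above (part names and the dict layout are illustrative):

```python
from itertools import permutations

# Rotate three equal parts of a dataset through the roles of training,
# pruning, and test set (3! = 6 arrangements) and repeat each arrangement
# 50 times, giving 6 * 50 = 300 runs per dataset.
def evaluation_runs(parts=("part1", "part2", "part3"), repeats=50):
    runs = []
    for rep in range(repeats):
        for train, prune, test in permutations(parts):
            runs.append({"repeat": rep, "train": train, "prune": prune, "test": test})
    return runs

print(len(evaluation_runs()))  # 300
```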


6. Reference

  • G. Martínez-Muñoz and A. Suárez. Pruning in ordered bagging ensembles. In Proceedings of the 23rd International Conference on Machine Learning, pages 609–616, 2006. [OO]
  • R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer. Ensemble diversity measures and their application to thinning. Information Fusion, 6(1):49–62, 2005. [introduces the four cases]
  • R. Caruana, A. Munson, and A. Niculescu-Mizil. Getting the most out of ensemble selection. In Proceedings of the 6th International Conference on Data Mining, pages 828–833, 2006. [builds a very large, diverse pool of trees and prunes it using accuracy or AUC]
  • I. Partalas, G. Tsoumakas, and I. Vlahavas. Focused ensemble selection: A diversity-based method for greedy ensemble selection. In Proceedings of the 18th European Conference on Artificial Intelligence, pages 117–121, 2008. [idea similar to this paper; also discusses the four cases]
  • T. G. Dietterich. Ensemble methods in machine learning. In Proceedings of the 1st International Workshop on Multiple Classifier Systems, pages 1–15, 2000. [conditions under which an ensemble improves performance]
  • C. Tamon and J. Xiang. On the boosting pruning problem. In Proceedings of the 11th European Conference on Machine Learning, pages 404–412, 2000. [shows that pruning a boosted ensemble is NP-complete]