Title: Negative Correlation Ensemble Learning for Ordinal Regression
Authors: Fernandez et al.
Year: 2013
Journal: IEEE Transactions on Neural Networks and Learning Systems
DOI: https://doi.org/10.1109/TNNLS.2013.2268279
\(\small \mathbf{\theta}\): vector of thresholds
Prediction (\(\small \mathbf{z} = \{\mathbf{w}, \mathbf{\beta}, \mathbf{\theta} \}\)) \[\small r_{(f(\mathbf{x}), \mathbf{z})} = \min \{j: f(\mathbf{x}) \le \theta_j \}\]
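A minimal Python sketch of the prediction rule above. It assumes the thresholds are sorted in ascending order and that the last class has an implicit threshold of \(+\infty\), which the notes do not state explicitly; the function name `predict_rank` is hypothetical.

```python
import numpy as np

def predict_rank(f_x, thetas):
    """Return the 1-based rank min{j : f(x) <= theta_j}.

    Assumption (not stated in the notes): `thetas` holds the finite,
    ascending thresholds, and the last class uses theta_J = +inf.
    """
    thetas = np.asarray(thetas, dtype=float)
    # Append +inf so that projections above every finite threshold
    # fall into the last class.
    bounds = np.append(thetas, np.inf)
    # side="left" gives the first index i with bounds[i] >= f_x.
    return int(np.searchsorted(bounds, f_x, side="left")) + 1

# Three finite thresholds define four ordered classes.
thetas = [-1.0, 0.5, 2.0]
ranks = [predict_rank(v, thetas) for v in (-5.0, 0.5, 0.6, 3.0)]
print(ranks)
```

A projection that lands exactly on a threshold is assigned to the lower class, matching the \(\le\) in the rule.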
Posterior probability if \(\small f(\mathbf{x})\) follows a logistic cumulative distribution \[\small \begin{aligned} P(y=C_j|\mathbf{x}, \mathbf{z}) &= P(y \le C_j|\mathbf{x}, \mathbf{z}) - P(y \le C_{j-1}|\mathbf{x}, \mathbf{z}) \\ &= \frac{1}{1+\exp(f(\mathbf{x})-\theta_j)} - \frac{1}{1+\exp(f(\mathbf{x})-\theta_{j-1})} \end{aligned}\]
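The posterior formula above can be evaluated numerically as in the sketch below. It assumes the boundary conventions \(\theta_0 = -\infty\) and \(\theta_J = +\infty\), which are not given in these notes, so that the class probabilities sum to one; `ordinal_posteriors` is a hypothetical helper name.

```python
import numpy as np

def ordinal_posteriors(f_x, thetas):
    """Class posteriors P(y = C_j | x) under the logistic cumulative model.

    Assumptions (not stated in the notes): `thetas` are the finite,
    ascending thresholds, with theta_0 = -inf and theta_J = +inf.
    """
    thetas = np.asarray(thetas, dtype=float)
    bounds = np.concatenate(([-np.inf], thetas, [np.inf]))
    # Cumulative probabilities P(y <= C_j) = 1 / (1 + exp(f(x) - theta_j)).
    with np.errstate(over="ignore"):
        cumulative = 1.0 / (1.0 + np.exp(f_x - bounds))
    # Differences of consecutive cumulative values give P(y = C_j).
    return np.diff(cumulative)

# Two finite thresholds define three ordered classes.
probs = ordinal_posteriors(0.0, [-1.0, 1.0])
print(probs, probs.sum())
```

With ascending thresholds every difference is non-negative, so the output is a valid probability vector.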
Example of good and bad threshold models
Optimal classification rule \[\small \mathbf{C(x)} = \hat{j}, \quad \text{where} \quad \hat{j} = \arg\max_j \{\bar{\hat{P}}(y=C_j|\mathbf{x}, \mathbf{Z}) \}\]
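A short sketch of the classification rule above, under the assumption that the bar in \(\bar{\hat{P}}\) denotes a plain arithmetic mean of the ensemble members' posterior estimates (the notes do not spell out the averaging scheme). The function name `ensemble_classify` and the example numbers are illustrative only.

```python
import numpy as np

def ensemble_classify(member_posteriors):
    """Pick the class with the largest averaged posterior.

    `member_posteriors` has shape (M, J): one row of J class
    probabilities per ensemble member. Assumption: members are
    combined by an unweighted mean.
    """
    P = np.asarray(member_posteriors, dtype=float)
    mean_posterior = P.mean(axis=0)
    # Classes are reported 1-based, as C_1, ..., C_J.
    return int(np.argmax(mean_posterior)) + 1, mean_posterior

# Illustrative posteriors from three members over three classes.
members = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.5, 0.4],
]
label, mean_posterior = ensemble_classify(members)
print(label, mean_posterior)
```

Here the first member alone would pick class 1, but the averaged posterior favors class 2.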
Comparison methods
[1] T.-S. Lim, W.-Y. Loh, and Y.-S. Shih, “A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms,” Mach. Learn., vol. 40, no. 3, pp. 203–228, 2000.
[2] D. Kibler, D. W. Aha, and M. K. Albert, “Instance-based prediction of real-valued attributes,” Comput. Intell., vol. 5, no. 2, pp. 51–57, 1989.
[3] K.-J. Kim and H. Ahn, “A corporate credit rating model using multi-class support vector machines with an ordinal pairwise partitioning approach,” Comput. Oper. Res., vol. 39, no. 8, pp. 1800–1811, 2012.
[4] E. Frank and M. Hall, “A simple approach to ordinal classification,” in Proc. ECML, 2001, pp. 145–156.
[5] W. Waegeman and L. Boullart, “An ensemble of weighted support vector machines for ordinal regression,” Int. J. Comput. Syst. Sci. Eng., vol. 3, no. 1, pp. 47–51, 2009.
[6] H.-T. Lin and L. Li, “Large-margin thresholded ensembles for ordinal regression: Theory and practice,” in Proc. 17th Int. Conf. Algorithmic Learn. Theory, 2006, pp. 319–333.
[7] J. S. Cardoso and J. F. Pinto da Costa, “Learning to classify ordinal data: The data replication method,” J. Mach. Learn. Res., vol. 8, pp. 1393–1429, Sep. 2007.
[8] W. Chu and S. S. Keerthi, “Support vector ordinal regression,” Neural Comput., vol. 19, no. 3, pp. 792–815, 2007.
[9] W. Chu and S. S. Keerthi, “New approaches to support vector ordinal regression,” in Proc. 22nd Int. Conf. Mach. Learn., 2005, pp. 145–152.
[10] W. Chu and Z. Ghahramani, “Gaussian processes for ordinal regression,” J. Mach. Learn. Res., vol. 6, pp. 1019–1041, Jan. 2005.