Motivating question from Brendan:
Binary models have well-established measures of performance, such as precision and accuracy. With multi-class classification methods, it is not this simple. What measures exist for assessing and describing the performance of multi-class classification models, and how can we express those measures, both visually and semantically, in a way that is interpretable by our (non-data scientist) business partners?
We’ve decided to begin by
Here’s a function that simulates a confusion matrix for a multinomial classifier where
The function takes as its arguments:
Nj, specifying the number of observations in each class \(1, ..., C\).
Li.given.j, specifying the likelihood of the classifier predicting that an observation is from class \(i\) when it is actually from class \(j\).
Hmmm… this is not quite going to work because we are not simulating the underlying continuous values… dang.
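For reference, the count-level simulation described above could be sketched roughly as follows. This is a minimal Python sketch, not the original function: the names `simulate_confusion_matrix`, `n_j`, `l_i_given_j` and `seed` are hypothetical stand-ins for the arguments Nj and Li.given.j, and the layout (rows are predicted class \(i\), columns are true class \(j\)) is an assumption. It draws class counts directly, so it has exactly the limitation noted above: no underlying continuous scores are simulated.

```python
import random
from collections import Counter


def simulate_confusion_matrix(n_j, l_i_given_j, seed=None):
    """Simulate a C x C confusion matrix of counts.

    n_j[j] is the number of observations whose true class is j.
    l_i_given_j[i][j] is the probability of predicting class i when the
    true class is j, so every column must sum to 1 (assumed layout).
    Returns cm where cm[i][j] counts class-j observations predicted as i.
    """
    c = len(n_j)
    if len(l_i_given_j) != c or any(len(row) != c for row in l_i_given_j):
        raise ValueError("l_i_given_j must be a C x C matrix")

    rng = random.Random(seed)
    cm = [[0] * c for _ in range(c)]
    for j in range(c):
        column = [l_i_given_j[i][j] for i in range(c)]
        if abs(sum(column) - 1.0) > 1e-9:
            raise ValueError(f"column {j} of l_i_given_j must sum to 1")
        # One multinomial draw: each of the n_j[j] observations is assigned
        # a predicted class independently with the column's probabilities.
        predicted = rng.choices(range(c), weights=column, k=n_j[j])
        for i, count in Counter(predicted).items():
            cm[i][j] = count
    return cm


likelihoods = [
    [0.8, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.7],
]
cm = simulate_confusion_matrix([50, 30, 20], likelihoods, seed=1)
for row in cm:
    print(row)
```

Each column of the result sums to the corresponding entry of `n_j`, which is a quick sanity check on the simulated matrix.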
And to do that, we have to decide how to implement a multinomial classifier from multiple binary classifiers.
See
…creates a binary classifier for each of the \(C\) classes. Each of these classifiers learns to detect its specific class and reject all others.
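As an illustration of this scheme, here is a minimal Python sketch of the two bookkeeping steps around the binary classifiers: recoding the labels into one "this class vs. everything else" target per class, and combining the per-class scores into a single prediction. The function names are hypothetical, the binary classifiers themselves are left out, and picking the class with the highest score is a common convention assumed here rather than something stated above.

```python
def one_vs_rest_targets(labels):
    """Recode multi-class labels into one binary target vector per class.

    For class c, the target is 1 where the label equals c and 0 otherwise.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_rest_predict(scores_by_class):
    """Combine per-class binary scores into one predicted class per observation.

    scores_by_class[c][k] is the score that the class-c classifier gave to
    observation k. Assumption: the class with the highest score wins.
    """
    classes = list(scores_by_class)
    n_obs = len(scores_by_class[classes[0]])
    return [
        max(classes, key=lambda c: scores_by_class[c][k])
        for k in range(n_obs)
    ]


targets = one_vs_rest_targets(["a", "b", "c", "a"])
print(targets["a"])  # binary target for the "a vs. rest" classifier

scores = {
    "a": [0.90, 0.20, 0.10],
    "b": [0.05, 0.70, 0.30],
    "c": [0.05, 0.10, 0.60],
}
print(one_vs_rest_predict(scores))
```

Note that the scores from separately trained binary classifiers are not guaranteed to be on a comparable scale, which is one reason the combination rule matters when interpreting performance.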
Points to note:
…creates \(C(C-1)/2\) classifiers, one for each pair of classes. The final decision is based on a majority vote.
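The pair count and the vote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `class_pairs` and `one_vs_one_vote` are hypothetical names, the pairwise classifiers are represented just by the class each one picked, and ties are broken by taking the first class in sorted order, which is an arbitrary choice made here.

```python
from collections import Counter
from itertools import combinations


def class_pairs(classes):
    """Return every unordered pair of classes: C(C-1)/2 pairs for C classes."""
    return list(combinations(sorted(classes), 2))


def one_vs_one_vote(pairwise_winners):
    """Pick the class with the most pairwise wins for a single observation.

    pairwise_winners maps each (class_a, class_b) pair to the class that
    the corresponding binary classifier chose. Assumption: ties go to the
    first class in sorted order.
    """
    votes = Counter(pairwise_winners.values())
    most_votes = max(votes.values())
    return min(c for c, v in votes.items() if v == most_votes)


pairs = class_pairs(["a", "b", "c", "d"])
print(len(pairs))  # number of binary classifiers needed for 4 classes

winners = {("a", "b"): "a", ("a", "c"): "c", ("b", "c"): "c"}
print(one_vs_one_vote(winners))
```

With three classes a three-way tie (one win each) is possible, so any one-vs-one implementation needs some tie-breaking rule, and that rule affects the resulting confusion matrix.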
Points to note:
…creates a set of binary classifiers that partition the
How we combine binary classifiers has a big impact on how we interpret the performance of the resulting multi-class classifier.