Motivation: summarizing multinomial classifier performance

Motivating question from Brendan:

Binary models have well-established measures of performance, such as precision and accuracy. With multi-class classification methods, it is not this simple. What measures exist for assessing and describing the performance of multi-class classification models, and how can we express those measures, both visually and semantically, in a way that is interpretable by our (non-data-scientist) business partners?

We’ve decided to begin by

Simulating confusion matrices

Here’s a function that simulates a confusion matrix for a multinomial classifier where

The function takes as its arguments

Hmmm… this is not quite going to work because we are not simulating the underlying continuous values… dang.

And to do that, we have to decide how to implement a multinomial classifier from multiple binary classifiers.

Implementing multinomial classification

See

One versus rest (OvR)

…creates a binary classifier for each of the \(C\) classes. Each of these classifiers learns to detect its specific class and reject all others.

Points to note:

  • We still have to compare and combine the outputs of all \(C\) classifiers
    • Do we combine the thresholded class decisions, or do we somehow consider the continuous values prior to thresholding?
  • Contention between two or more classifiers could arise
    • Or could be avoided if a sequence of binary classifications is followed
  • How do we treat situations where no classifier detects an observation?
  • Note that both situations (contention and no detection) are masked by using softmax
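
To make the contention and no-detection cases concrete, here is a minimal sketch of combining OvR outputs for a single observation. The function name `ovr_decide`, the 0.5 threshold, and the example scores are all assumptions made for illustration, not something we have settled on. It compares the thresholded decisions with simply taking the class with the largest continuous score, which (like a softmax followed by argmax) always returns exactly one class.

```python
def ovr_decide(scores, threshold=0.5):
    """Combine C one-vs-rest scores for one observation.

    scores: list of C continuous scores, one per binary classifier
            (hypothetical values in [0, 1]).
    threshold: assumed common decision threshold for every classifier.
    """
    # Classes whose binary classifier "fires" after thresholding.
    detected = [c for c, s in enumerate(scores) if s >= threshold]

    if len(detected) == 1:
        status = "unique"
    elif not detected:
        status = "none"          # no classifier detects the observation
    else:
        status = "contention"    # two or more classifiers claim it

    # Argmax over the continuous scores always picks one class,
    # which hides both the "none" and the "contention" cases.
    argmax_class = max(range(len(scores)), key=lambda c: scores[c])
    return detected, status, argmax_class


print(ovr_decide([0.9, 0.2, 0.1]))  # one classifier fires
print(ovr_decide([0.7, 0.6, 0.1]))  # two classifiers fire
print(ovr_decide([0.3, 0.2, 0.4]))  # no classifier fires
```

In the second and third calls the argmax still returns a single class, so a confusion matrix built from argmax decisions would not show that the underlying binary classifiers disagreed or abstained.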

One versus one (OvO)

…creates \(C(C-1)/2\) classifiers, one for each pair of classes. The final decision is based on a majority vote.
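
A rough sketch of the majority vote, assuming each pairwise classifier can be represented by a hypothetical callable `pairwise_winner(a, b)` that returns whichever of the two classes it prefers for a given observation. The names and the toy winners below are illustrative assumptions only. The sketch returns every class tied for the most votes, because a majority vote does not by itself guarantee a unique winner.

```python
from collections import Counter
from itertools import combinations


def ovo_vote(pairwise_winner, n_classes):
    """Majority vote over all C(C-1)/2 pairwise classifiers.

    pairwise_winner: hypothetical callable (a, b) -> a or b, standing in
                     for the trained binary classifier for that pair.
    n_classes: number of classes C.
    Returns (list of classes tied for most votes, vote counts per class).
    """
    votes = Counter()
    for a, b in combinations(range(n_classes), 2):
        votes[pairwise_winner(a, b)] += 1

    top = max(votes.values())
    tied = sorted(c for c, v in votes.items() if v == top)
    return tied, dict(votes)


# Toy case 1: pairwise decisions consistent with an underlying score.
scores = [0.2, 0.5, 0.3]
by_score = lambda a, b: a if scores[a] >= scores[b] else b
print(ovo_vote(by_score, 3))

# Toy case 2: cyclic preferences (0 beats 1, 1 beats 2, 2 beats 0).
cycle = {(0, 1): 0, (0, 2): 2, (1, 2): 1}
print(ovo_vote(lambda a, b: cycle[(a, b)], 3))
```

With three classes there are \(3 \cdot 2 / 2 = 3\) pairwise classifiers. In the cyclic case every class receives one vote, so some tie-breaking rule would still be needed.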

Points to note:

  • Outputs still have to be combined, but the

Error-correcting output codes (ECOC)

Hierarchical classification (sequence of binary classifiers)

…creates a set of binary classifiers that partition the

  • No contention

How we combine the binary classifiers has a big impact on how we interpret the performance of the resulting multi-class classifier.