C. Donovan
13 April 2018
NB: If it's not in the lecture or lab, it's not in the exam
Evaluating and comparing classifiers
Suppose we have two candidate models, \( M_1 \) and \( M_2 \), each producing predicted class probabilities:
\[ P(C_1 | \mathbf{X_1}, \mathbf{X_2}, \ldots , \mathbf{X_p}, M_1) \mathrm{~and~} P(C_1 | \mathbf{X_1}, \mathbf{X_2}, \ldots , \mathbf{X_p}, M_2) \]
Problem: predicted probabilities \( \hat{p} \) do not automatically give a class prediction. Consider a logistic regression: it returns a probability in \( (0, 1) \), so we must choose a cut-off to convert \( \hat{p} \) into a predicted class.
Since this is a two-class problem
\[ P(C_1) = 1 - P(C_0) \]
and
\[ P(C_1 | \mathbf{X_1}, \mathbf{X_2}, \ldots , \mathbf{X_p}, M_k) = 1 - P(C_0 | \mathbf{X_1}, \mathbf{X_2}, \ldots , \mathbf{X_p}, M_k) \]
We need to define a positive prediction to discuss performance: let's say \( C_1 \) is the positive class.
Instance | True Class | \( P(C_1|x_1,x_2,\ldots,x_p,M_1) \) | \( P(C_1|x_1,x_2,\ldots,x_p,M_2) \) |
---|---|---|---|
1 | \( C_1 \) | 0.73 | 0.61 |
2 | \( C_1 \) | 0.69 | 0.03 |
3 | \( C_0 \) | 0.44 | 0.68 |
4 | \( C_0 \) | 0.55 | 0.31 |
5 | \( C_1 \) | 0.67 | 0.45 |
6 | \( C_1 \) | 0.47 | 0.09 |
7 | \( C_0 \) | 0.08 | 0.38 |
8 | \( C_0 \) | 0.15 | 0.05 |
9 | \( C_1 \) | 0.45 | 0.01 |
10 | \( C_0 \) | 0.35 | 0.04 |
Test data and predicted probabilities for models \( M_1 \) and \( M_2 \)
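For concreteness, the table above can be encoded directly. A minimal sketch in Python, encoding \( C_1 \) as 1 and \( C_0 \) as 0 (a labelling convention assumed here, not fixed by the notes):

```python
# Test data from the table above: true classes and the predicted
# probabilities P(C1 | x, M_k) for models M1 and M2.
y_true = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]                               # true classes
p_m1 = [0.73, 0.69, 0.44, 0.55, 0.67, 0.47, 0.08, 0.15, 0.45, 0.35]  # model M1
p_m2 = [0.61, 0.03, 0.68, 0.31, 0.45, 0.09, 0.38, 0.05, 0.01, 0.04]  # model M2
```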
These are summarised with ROC plots. For any classification threshold, let TP, FN, FP and TN denote the counts of true positives, false negatives, false positives and true negatives; then the true positive rate (TPR) and false positive rate (FPR) are
\[ TPR = \frac{TP}{TP + FN} \qquad FPR = \frac{FP}{FP + TN} \]
Plotting TPR against FPR as the threshold varies gives a Receiver Operating Characteristic (ROC) curve. Models will give rise to a line that runs from (1, 1) (threshold 0: everything declared positive) to (0, 0) (threshold 1: nothing declared positive); the more the line bows towards the top-left corner, the better the classifier. The Area Under the Curve (AUC) is the integral under a model's line: it is at most 1 and usually above 0.5 (the AUC of random guessing). Bigger is better.
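The following helper computes these counts and rates at any threshold. A minimal sketch, assuming the `y_true`, `p_m1` and `p_m2` arrays defined above (the function name `rates` is my own):

```python
def rates(y_true, p_hat, threshold):
    """Confusion-matrix counts plus TPR and FPR at a given threshold.

    An observation is predicted C1 when its probability is >= threshold.
    """
    y_pred = [1 if p >= threshold else 0 for p in p_hat]
    tp = sum(yt == 1 and yp == 1 for yt, yp in zip(y_true, y_pred))
    fn = sum(yt == 1 and yp == 0 for yt, yp in zip(y_true, y_pred))
    fp = sum(yt == 0 and yp == 1 for yt, yp in zip(y_true, y_pred))
    tn = sum(yt == 0 and yp == 0 for yt, yp in zip(y_true, y_pred))
    return tp, fn, fp, tn, tp / (tp + fn), fp / (fp + tn)

# Reproduces the M1 table at p = 0.5 below.
print(rates(y_true, p_m1, 0.5))  # (3, 2, 1, 4, 0.6, 0.2)
```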
At each threshold \( p \) we predict \( \hat{C}_1 \) when \( \hat{p} \ge p \), and cross-tabulate the true class (rows) against the predicted class (columns):

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | TP | FN |
\( C_0 \) | FP | TN |

TPR = ?, FPR = ?

First, for model \( M_1 \):
Threshold \( p \) = 0

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 5 | 0 |
\( C_0 \) | 5 | 0 |

TPR = 1, FPR = 1
Threshold \( p \) = 0.25

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 5 | 0 |
\( C_0 \) | 3 | 2 |

TPR = 1, FPR = 0.6
Threshold \( p \) = 0.5

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 3 | 2 |
\( C_0 \) | 1 | 4 |

TPR = 0.6, FPR = 0.2
Threshold \( p \) = 0.75

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 0 | 5 |
\( C_0 \) | 0 | 5 |

TPR = 0, FPR = 0
Threshold \( p \) = 1

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 0 | 5 |
\( C_0 \) | 0 | 5 |

TPR = 0, FPR = 0
This demonstrates the method; we would get a better estimate of the AUC for \( M_1 \) had we calculated TPR and FPR at more values of \( p \).
Repeating the same calculations for model \( M_2 \):
Threshold \( p \) = 0

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 5 | 0 |
\( C_0 \) | 5 | 0 |

TPR = 1, FPR = 1
Threshold \( p \) = 0.25

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 2 | 3 |
\( C_0 \) | 3 | 2 |

TPR = 0.4, FPR = 0.6
Threshold \( p \) = 0.5

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 1 | 4 |
\( C_0 \) | 1 | 4 |

TPR = 0.2, FPR = 0.2
Threshold \( p \) = 0.75

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 0 | 5 |
\( C_0 \) | 0 | 5 |

TPR = 0, FPR = 0
Threshold \( p \) = 1

True class | \( \hat{C}_1 \) | \( \hat{C}_0 \) |
---|---|---|
\( C_1 \) | 0 | 5 |
\( C_0 \) | 0 | 5 |

TPR = 0, FPR = 0
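The hand calculations above sample each ROC curve at only five thresholds. As noted, a finer grid of thresholds gives a better AUC estimate; here is a sketch using the trapezoidal rule, building on the `rates` helper above:

```python
def roc_points(y_true, p_hat, thresholds):
    """(FPR, TPR) points, sweeping the threshold from 1 down to 0."""
    points = []
    for t in sorted(thresholds, reverse=True):
        _, _, _, _, tpr, fpr = rates(y_true, p_hat, t)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Trapezoidal-rule area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

grid = [i / 100 for i in range(101)]  # 101 thresholds in [0, 1]
print("AUC M1:", auc(roc_points(y_true, p_m1, grid)))
print("AUC M2:", auc(roc_points(y_true, p_m2, grid)))
```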
From the ROC curve we can also determine a suitable operating threshold \( p \), by choosing the point on the curve with the best trade-off between TPR and FPR for the problem at hand.
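As a sketch of one way to do this, the threshold maximising TPR - FPR (Youden's J statistic) is a common choice; note this specific rule is an assumption for illustration, not something the notes prescribe:

```python
def youden_threshold(y_true, p_hat, thresholds):
    # Youden's J statistic: the threshold maximising TPR - FPR.
    return max(thresholds,
               key=lambda t: rates(y_true, p_hat, t)[4] - rates(y_true, p_hat, t)[5])

print(youden_threshold(y_true, p_m1, grid))
```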
Gains and lift charts are another type of summary plot for comparing classifiers. In every case the \( x \)-axis will be the percentile of the data, with the \( y \)-axis being a quantity such as the cumulative proportion of positives captured (gains) or the ratio of positives captured to the number expected with no model (lift).
The common \( x \)-axis can be determined relatively easily.
Imagine a customer database ordered by \( \hat{p} \): from left to right we are 'penetrating' deeper into the database, from high \( \hat{p} \) observations to low \( \hat{p} \) observations.
In most cases it is informative to consider a baseline case where there is no model. In that case our observations are arranged randomly with respect to the \( x \)-axis, and all observations have the same predicted probability of being a positive: the proportion of positives in the training dataset.
The utility of these charts is hopefully clear: the further a model's curve sits above the no-model baseline, the more positives the model finds for a given depth of penetration into the database.
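A sketch of how a cumulative gains curve could be computed from the arrays above; the choice of cumulative gains for the \( y \)-axis is one option among those mentioned, and the function name is my own:

```python
def cumulative_gains(y_true, p_hat):
    """Points (percentile of database, share of positives captured) when
    observations are ordered by p-hat from highest to lowest."""
    order = sorted(range(len(p_hat)), key=lambda i: p_hat[i], reverse=True)
    total_pos = sum(y_true)
    found, points = 0, []
    for k, i in enumerate(order, start=1):
        found += y_true[i]
        points.append((k / len(order), found / total_pos))
    return points

for pct, gain in cumulative_gains(y_true, p_m1):
    print(f"top {pct:.0%} of database -> {gain:.0%} of positives")
# With no model, observations fall in random order, so the expected gains
# curve is the diagonal y = x (the baseline described above).
```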