ROC class notes

Le Kang

2024-01-24

Conditional probabilities

Score          Negative (N)         Positive (P)
\(s \leq t\)   \(p(s \leq t|N)\)    \(p(s \leq t|P)\)
\(s>t\)        \(p(s>t|N)\)         \(p(s>t|P)\)

Why are these probabilities independent of the disease prevalence?
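One way to see the answer: each entry conditions on the true class, so it depends only on the score distribution within N or within P, not on how many individuals each class contributes. The sketch below checks this by simulation. The normal score distributions (N(0,1) in N, N(1,1) in P), the threshold 0.5, and the helper names `rates_at` and `simulate_rates` are assumptions made for illustration only.

```r
set.seed(1)

# Empirical fp and tp at threshold thr, each computed within its own true class.
rates_at <- function(score, is_pos, thr) {
  c(fp = mean(score[!is_pos] > thr),   # estimate of p(s > t | N)
    tp = mean(score[is_pos] > thr))    # estimate of p(s > t | P)
}

# Draw n individuals with the given prevalence of P.
# Assumed score model: N(0,1) in population N, N(1,1) in population P.
simulate_rates <- function(n, prevalence, thr = 0.5) {
  is_pos <- runif(n) < prevalence
  score <- rnorm(n, mean = ifelse(is_pos, 1, 0))
  rates_at(score, is_pos, thr)
}

low  <- simulate_rates(200000, prevalence = 0.05)
high <- simulate_rates(200000, prevalence = 0.50)
print(rbind(low, high))
```

Up to sampling noise, the two rows should agree even though the prevalence differs by a factor of ten.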

ROC

The ROC curve is a complete representation of classifier performance, as the choice of the classification threshold \(t\) varies.

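  • The classifier is least successful when the two populations have exactly the same score distribution: then \(tp=fp\) for every \(t\), and the ROC curve is the diagonal from (0,0) to (1,1).

  • At the other (usually unattainable) extreme there is complete separation: at least one \(t\) exists such that \(tp=1, fp=0\).

  • In this case, for all smaller values of \(t\), \(tp = 1\) while \(fp\) varies from 0 to 1; for all larger values, \(fp=0\) while \(tp\) varies from 1 to 0. The curve runs along the left and upper borders of the graph.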

ROC in the lower triangle

  • The score distribution has the wrong orientation

  • A reversal is needed

  • No loss in generality in assuming that the curve will always lie in the upper triangle of the graph.

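rm(list=ls())
library(mvtnorm)
library(ggplot2)
# Wrong orientation on purpose: positives (P) score lower on average than negatives (N)
Psub=rnorm(100,mean=0)
Nsub=rnorm(100,mean=1)
# Assumed layout of the data frame printed below: P rows first, coded as class 1
dat=data.frame(x=c(Psub,Nsub),class=rep(c(1,0),each=100))
head(dat)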

           x class
1  1.5894868     1
2 -1.2510017     1
3 -0.8095787     1
4 -0.2261170     1
5  0.7260182     1
6 -1.6305611     1
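To illustrate the wrong orientation and its reversal, here is a minimal sketch that re-simulates the same setup (with a seed, so the numbers will not match the output above) and compares \(tp\) with \(fp\) at a few thresholds before and after negating the scores. The helper `roc_point` and the chosen thresholds are hypothetical.

```r
set.seed(2)
Psub <- rnorm(100, mean = 0)   # positives, scoring lower on average
Nsub <- rnorm(100, mean = 1)   # negatives

# (fp, tp) when an individual is allocated to P for scores above thr.
roc_point <- function(pos, neg, thr) c(fp = mean(neg > thr), tp = mean(pos > thr))

thresholds <- c(0, 0.5, 1)
original <- t(sapply(thresholds, function(thr) roc_point(Psub, Nsub, thr)))
# Reverse the orientation: negate the scores (and hence the thresholds).
reversed <- t(sapply(-thresholds, function(thr) roc_point(-Psub, -Nsub, thr)))

print(cbind(threshold = thresholds, original))
print(cbind(threshold = -thresholds, reversed))
```

In the original orientation the points have \(tp<fp\) (lower triangle); after the reversal each point \((fp, tp)\) becomes \((1-fp, 1-tp)\), which lies in the upper triangle.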

Properties of the ROC

  1. The ROC curve \(tp=y=h(x)=h(fp)\), traced by the points \((x(t),y(t))\) as \(t\) varies, is a monotone increasing (non-decreasing) function in the positive quadrant, running from (0,0) to (1,1).

  2. The ROC curve is unaltered if the classification scores undergo a strictly increasing transformation.

  3. The slope of the ROC at the point with threshold \(t\) is \[\frac{dy}{dx}=\frac{p(t|P)}{p(t|N)}\]
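Properties 1 and 2 can be checked on an empirical ROC curve. The sketch below is one simple way to build the curve, not a reference implementation; the simulated normal scores, the helper `roc_points`, and the choice of `exp()` as the strictly increasing transformation are assumptions for illustration.

```r
set.seed(3)
score <- c(rnorm(200, mean = 0), rnorm(200, mean = 1))
label <- rep(c(0, 1), each = 200)   # 0 = population N, 1 = population P

# Empirical ROC points: lower the threshold through the sorted scores and
# accumulate the fraction of N (x = fp) and of P (y = tp) lying above it.
roc_points <- function(score, label) {
  ord <- order(score, decreasing = TRUE)
  lab <- label[ord]
  data.frame(fp = c(0, cumsum(lab == 0) / sum(lab == 0)),
             tp = c(0, cumsum(lab == 1) / sum(lab == 1)))
}

roc_raw <- roc_points(score, label)
roc_exp <- roc_points(exp(score), label)   # exp() is strictly increasing

print(identical(roc_raw, roc_exp))
print(head(roc_raw))
```

Only the ordering of the scores enters the computation, which is why a strictly increasing transformation leaves the curve unchanged.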

Continuous scores

For population N, assume pdf and cdf to be \(f\) and \(F\), and for population P, assume pdf and cdf to be \(g\) and \(G\), respectively.

\(fp=x(t)=1-F(t)\), \(tp=y(t)=1-G(t)\), so \[y=1-G[F^{-1}(1-x)], ~~0\leq x \leq 1\]
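As a concrete instance of this formula, the sketch below assumes (for illustration only) that \(F\) is the N(0,1) cdf and \(G\) is the N(1,1) cdf, and cross-checks the closed form against the parametric points \((x(t), y(t))\). The names `F_inv`, `G`, and `roc` are hypothetical.

```r
# Illustrative assumption: population N ~ N(0,1), population P ~ N(1,1).
F_inv <- function(p) qnorm(p, mean = 0, sd = 1)   # F^{-1}
G     <- function(s) pnorm(s, mean = 1, sd = 1)   # cdf of the scores in P

# ROC curve y = 1 - G(F^{-1}(1 - x)).
roc <- function(x) 1 - G(F_inv(1 - x))

# Cross-check against the parametric form (x(t), y(t)) at a few thresholds.
thr <- c(-1, 0, 0.5, 2)
x_t <- 1 - pnorm(thr, mean = 0, sd = 1)
y_t <- 1 - G(thr)
print(cbind(threshold = thr, x = x_t, y = y_t, roc_of_x = roc(x_t)))
```

The last two columns should agree up to rounding, since eliminating \(t\) from \(x(t)\) and \(y(t)\) is exactly what the formula does.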

Cost-weighted misclassification rates

The two misclassification rates are \(x(t)\) (false positives) and \(1-y(t)\) (false negatives), so their unweighted sum is \(x(t)+1-y(t)\). Suppose the costs of making these errors are \(c(P|N)\) and \(c(N|P)\), and the relative proportions of P and N individuals in the population are \(q\) and \(1-q\), respectively. Then the expected cost due to misclassification when using the classification threshold \(t\) is

\[C=q[1-y(t)]c(N|P)+(1-q)x(t)c(P|N)\]

How do we find the threshold \(t\) that minimizes the cost?
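One route, sketched here under the assumption that \(x(t)\) and \(y(t)\) are differentiable in \(t\) (continuous scores), is to set the derivative of \(C\) with respect to \(t\) to zero:

\[\frac{dC}{dt}=-q\,c(N|P)\frac{dy}{dt}+(1-q)\,c(P|N)\frac{dx}{dt}=0 \quad\Longrightarrow\quad \frac{dy}{dx}=\frac{(1-q)\,c(P|N)}{q\,c(N|P)}\]

So a cost-minimizing threshold corresponds to a point on the ROC curve with this slope. This is only a stationarity condition; one should still check that the point is a minimum.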

When the two costs are equal and \(q=0.5\), the optimal threshold is at the point where the ROC has slope 1.
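The equal-cost case can be checked numerically. This is a minimal sketch assuming N ~ N(0,1) and P ~ N(1,1) (an illustrative choice, not from the notes); `c_NP` and `c_PN` stand for \(c(N|P)\) and \(c(P|N)\), and the search interval is arbitrary.

```r
# Illustrative assumptions: N ~ N(0,1), P ~ N(1,1), equal costs, q = 0.5.
q <- 0.5
c_NP <- 1   # cost c(N|P) of allocating a P individual to N
c_PN <- 1   # cost c(P|N) of allocating an N individual to P

x <- function(thr) 1 - pnorm(thr, mean = 0)   # fp = x(t)
y <- function(thr) 1 - pnorm(thr, mean = 1)   # tp = y(t)

# Expected cost C as a function of the threshold.
cost <- function(thr) q * (1 - y(thr)) * c_NP + (1 - q) * x(thr) * c_PN
t_opt <- optimize(cost, interval = c(-5, 5))$minimum

# ROC slope at t_opt, i.e. the density ratio p(t|P) / p(t|N).
slope <- dnorm(t_opt, mean = 1) / dnorm(t_opt, mean = 0)
print(c(t_opt = t_opt, slope = slope))
```

Under these assumptions the minimizer sits midway between the two means, where the two densities cross and the slope is 1.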

The Neyman-Pearson Lemma

The slope of the ROC curve at the point with threshold value \(t\) is equal to the likelihood ratio

\[\mathcal{L}(t)=\frac{p(t|P)}{p(t|N)}\]

It measures how much more probable a score value \(t\) is to have occurred in population P than in population N, which in turn can be interpreted as a measure of confidence in allocation to population P.
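The identity between slope and likelihood ratio can be verified numerically. The sketch below again assumes N ~ N(0,1) and P ~ N(1,1) for illustration, and compares a finite-difference slope of the ROC curve with the density ratio. The names `roc` and `lik_ratio` and the step size `h` are hypothetical choices.

```r
# Illustrative assumption: population N ~ N(0,1), population P ~ N(1,1).
roc <- function(x) 1 - pnorm(qnorm(1 - x, mean = 0), mean = 1)
lik_ratio <- function(s) dnorm(s, mean = 1) / dnorm(s, mean = 0)

thr <- c(-0.5, 0.5, 1.5)
x_t <- 1 - pnorm(thr, mean = 0)   # fp at each threshold

# Central finite difference for dy/dx at x(t).
h <- 1e-6
slope_num <- (roc(x_t + h) - roc(x_t - h)) / (2 * h)

print(cbind(threshold = thr, slope_num = slope_num, lik_ratio = lik_ratio(thr)))
```

The two columns should match closely. The slope is large at high thresholds (near the origin of the ROC plot) and small at low ones.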

Now consider the case of testing the simple null hypothesis \(H_0\) that an individual belongs to population N against the simple alternative \(H_1\) that the individual belongs to population P.

The classification score \(S\) forms the data on which the hypothesis test is conducted as \(S\) is used to make the allocation of the individual.

Suppose that \(\mathcal{R}\) is the set of values of \(S\) for which we allocate the individual to population P, i.e., the set of values of \(S\) for which we reject the null hypothesis.

By the Neyman-Pearson Lemma, the most powerful test of size \(\alpha\) has region \(\mathcal{R}\) comprising all values \(s\) of \(S\) such that \[\mathcal{L}(s)=\frac{p(s|P)}{p(s|N)}\geq k,\] where \(k\) is determined by the condition \(p(s\in \mathcal{R}|N)=\alpha\).

For the classifier under consideration, \(p(s\in \mathcal{R}|N)=\alpha \Leftrightarrow fp=\alpha\), and the power of the test, i.e., the probability of correctly rejecting the null hypothesis is just the \(tp\).

For a fixed \(fp\) rate, the \(tp\) rate will be maximized by a classifier whose set of score values \(S\) for allocation to population P is given by \(\mathcal{L}(S)\geq k\), where \(k\) is determined by the target \(fp\) rate.

Ex: If the likelihood ratio \(\mathcal{L}(S)\) is monotonically increasing in \(S\), then a classification rule based on \(S\) exceeding a threshold is an optimal decision rule, and the ROC curve for \(\mathcal{L}(S)\) is uniformly above all other ROC curves based on \(S\).
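For contrast, here is a sketch of a case where the likelihood ratio is not monotone in \(S\), so thresholding \(S\) itself is not optimal. It assumes N ~ N(0,1) and P ~ N(0, sd = 2), an illustrative choice for which \(\mathcal{L}(s)\) increases in \(|s|\), and compares the two rules at the same \(fp\). The variable names are hypothetical.

```r
# Illustrative assumption: N ~ N(0,1), P ~ N(0, sd = 2).
# Then L(s) = 0.5 * exp(3 * s^2 / 8), which increases in |s| rather than in s.
alpha <- 0.10   # target fp rate

# Rule A: allocate to P when S > t, with t chosen so that fp = alpha.
t_A  <- qnorm(1 - alpha)
tp_A <- 1 - pnorm(t_A, mean = 0, sd = 2)

# Rule B: allocate to P when L(S) >= k, which here is equivalent to |S| >= c_B,
# with c_B chosen so that fp = alpha.
c_B  <- qnorm(1 - alpha / 2)
tp_B <- 2 * (1 - pnorm(c_B, mean = 0, sd = 2))

print(c(tp_threshold_on_S = tp_A, tp_threshold_on_LR = tp_B))
```

Both rules have the same \(fp\), and the likelihood-ratio rule attains the higher \(tp\), as the Neyman-Pearson Lemma predicts.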

AUC

Particular attention has been focused on single scalar values that might capture the essential features of a ROC curve, motivated by the way that simple summary measures such as mean and variance capture the essential features of statistical data sets. \[AUC=\int_0^1 y(x)dx\]

  • AUC is the average true positive rate, taken uniformly over all possible false positive rates in the range (0, 1).

  • If A and B are two classifiers such that the ROC curve for A nowhere lies below the ROC curve for B, then AUC for A must be greater than or equal to AUC for B. However, the converse is not true.

  • It is also the probability that the classifier will allocate a higher score to a randomly chosen individual from population P than it will to a randomly and independently chosen individual from population N.

\[AUC=p(S_P>S_N)\]

  • There are other interpretations connected with the Lorenz curve and the Gini index (income inequality); for example, the Gini coefficient equals \(2\,AUC-1\).
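The area and probability interpretations of the AUC can be compared on simulated data. This is a minimal sketch assuming normal scores (P ~ N(1,1), N ~ N(0,1)) and no ties; the variable names are hypothetical.

```r
set.seed(4)
s_P <- rnorm(1000, mean = 1)   # scores from population P
s_N <- rnorm(1000, mean = 0)   # scores from population N

# (1) Probability interpretation: fraction of (P, N) pairs with S_P > S_N.
auc_pairs <- mean(outer(s_P, s_N, ">"))

# (2) Area under the empirical ROC curve, by the trapezoidal rule.
score  <- c(s_P, s_N)
is_pos <- rep(c(TRUE, FALSE), each = 1000)
ord <- order(score, decreasing = TRUE)
tp <- c(0, cumsum(is_pos[ord]) / sum(is_pos))
fp <- c(0, cumsum(!is_pos[ord]) / sum(!is_pos))
auc_area <- sum(diff(fp) * (head(tp, -1) + tail(tp, -1)) / 2)

print(c(pairs = auc_pairs, area = auc_area))
```

Without ties the two numbers coincide. Under the assumed normal model the population value is \(\Phi(1/\sqrt{2})\), so both estimates should fall near `pnorm(1 / sqrt(2))`.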