ROC class notes

Le Kang

2024-01-31

AUC

\[AUC=\int_0^1 y(x)dx=p(S_P>S_N)\]

The latter is the probability that the classifier will allocate a higher score to a randomly chosen individual from population P than to a randomly and independently chosen individual from population N.
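The probabilistic reading of the AUC can be checked directly on a sample by comparing every score from P with every score from N. The sketch below is only illustrative: the function name `empirical_auc`, the toy scores, and the choice to count ties as one half are assumptions, not part of the notes.

```python
def empirical_auc(scores_p, scores_n):
    """Estimate p(S_P > S_N) from all (P, N) score pairs."""
    wins = 0.0
    for sp in scores_p:
        for sn in scores_n:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # assumption: a tie counts as half a win
    return wins / (len(scores_p) * len(scores_n))


# Hypothetical toy scores, for illustration only.
scores_p = [0.9, 0.8, 0.6, 0.55]
scores_n = [0.7, 0.5, 0.4, 0.3]
auc = empirical_auc(scores_p, scores_n)
```

Here 14 of the 16 pairs have the P score above the N score, so the estimate is 14/16 = 0.875.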

Partial AUC (pAUC)

\[pAUC(a,b)=\int_a^b y(x)dx\]

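Let \(M=(b-a)\) be the maximum possible value of \(pAUC(a,b)\), attained by a perfect classifier with \(y(x)=1\), and let \(m=(b-a)(b+a)/2\) be its value under the chance diagonal \(y(x)=x\).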

Standardized pAUC, rescaled so that chance gives 0.5 and a perfect classifier gives 1: \[\dfrac{1}{2}\left(1+\dfrac{pAUC(a,b)-m}{M-m}\right)\] This allows direct comparison between two pAUCs.
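As a small check of the standardization formula, the sketch below rescales a raw pAUC using \(M\) and \(m\) as defined above. The function name `standardized_pauc` and the example interval are assumptions made for illustration.

```python
def standardized_pauc(pauc, a, b):
    """Rescale pAUC(a, b) so that chance maps to 0.5 and a perfect classifier to 1."""
    M = b - a                       # area under y(x) = 1 on [a, b]
    m = (b - a) * (b + a) / 2.0     # area under y(x) = x on [a, b]
    return 0.5 * (1.0 + (pauc - m) / (M - m))


# Hypothetical interval of false positive rates [0, 0.2]: M = 0.2, m = 0.02.
at_chance = standardized_pauc(0.02, 0.0, 0.2)
at_perfect = standardized_pauc(0.2, 0.0, 0.2)
halfway = standardized_pauc(0.11, 0.0, 0.2)
```

A raw pAUC midway between \(m\) and \(M\) maps to 0.75, midway between 0.5 and 1.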

Youden index (YI)

YI = \(\max({tp-fp})=\max(tp+tn-1)\)

The threshold \(t\) at the point on the ROC curve corresponding to the Youden Index is often taken to be the optimal classification threshold.
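A minimal sketch of finding the Youden Index and its threshold from a sample follows. It assumes an individual is allocated to P when its score exceeds \(t\), and it searches only the observed score values; the function name and toy data are hypothetical.

```python
def youden_index(scores_p, scores_n):
    """Return (YI, t) maximising tp - fp over the observed score values."""
    best_yi, best_t = 0.0, None
    for t in sorted(set(scores_p) | set(scores_n)):
        tp = sum(s > t for s in scores_p) / len(scores_p)  # true positive rate
        fp = sum(s > t for s in scores_n) / len(scores_n)  # false positive rate
        if tp - fp > best_yi:
            best_yi, best_t = tp - fp, t
    return best_yi, best_t


# Hypothetical toy scores, for illustration only.
scores_p = [0.9, 0.8, 0.6, 0.55]
scores_n = [0.7, 0.5, 0.4, 0.3]
yi, t_opt = youden_index(scores_p, scores_n)
```

For these scores the maximum of \(tp-fp\) is 0.75, reached at \(t=0.5\), where all four P scores and one N score exceed the threshold.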

Maximum vertical distance (MVD)

MVD between the chance diagonal and the ROC curve:

\[\max_x|y(x)-x|=\max_t|y(t)-x(t)|=\max_t|p(S>t|P)-p(S>t|N)|\]

This is the maximum distance between the cdfs of \(S\) in P and N, and it ranges between 0 and 1. Since \(p(S>t|P)=tp\) and \(p(S>t|N)=fp\), it is just the maximum of \(tp-fp\), so MVD is equivalent to YI.

The binormal model

The normal distribution is a common modelling choice, partly because empirical evidence suggests that many measurements taken in practice behave roughly like observations from normal populations, and partly because mathematical results such as the central limit theorem show that the normal distribution provides an adequate approximation to the true probability distribution of many important statistics.

The normal model is thus a “standard” against which any other suggestion is usually measured in common statistical practice.

We assume the scores \(S\) of the classifier to have a normal distribution in each of the two populations P and N. This model will always be “correct” if the original measurements \(\boldsymbol{X}\) have multivariate normal distributions in the two populations and the classifier is a linear function of the measurements.

Parameters in binormal model

\(\mu_P, \mu_N, \sigma_P, \sigma_N\)

We assume that \(\mu_P>\mu_N\), in accord with the convention that large values of \(S\) are indicative of population P and small ones indicative of population N.

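Standardizing the scores in each population, with \(Z\) a standard normal variable, gives:

\[x(t)=p(S>t|N)=p\left(Z>\dfrac{t-\mu_N}{\sigma_N}\right)=\Phi\left(\dfrac{\mu_N-t}{\sigma_N}\right)\]

\[y(t)=p(S>t|P)=p\left(Z>\dfrac{t-\mu_P}{\sigma_P}\right)=\Phi\left(\dfrac{\mu_P-t}{\sigma_P}\right)\]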

ROC curve under binormal model

\[\Phi^{-1}(y)=a+b\Phi^{-1}(x),\] where \(a=(\mu_P-\mu_N)/\sigma_P, b=\sigma_N/\sigma_P.\)
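The step to this linear relation can be filled in by eliminating the threshold \(t\). The following is a sketch using only the definitions of \(x\), \(y\), \(a\) and \(b\) given above:

\[x=\Phi\left(\dfrac{\mu_N-t}{\sigma_N}\right)\;\Rightarrow\;t=\mu_N-\sigma_N\Phi^{-1}(x)\]

\[y=\Phi\left(\dfrac{\mu_P-t}{\sigma_P}\right)=\Phi\left(\dfrac{\mu_P-\mu_N+\sigma_N\Phi^{-1}(x)}{\sigma_P}\right)=\Phi\left(a+b\Phi^{-1}(x)\right)\]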

It follows from the earlier assumptions that \(a>0\), while \(b\) is positive by definition, being a ratio of standard deviations.

\[AUC=p(S_P>S_N)=\Phi\left(\dfrac{\mu_P-\mu_N}{\sqrt{\sigma^2_P+\sigma^2_N}}\right)=\Phi\left(\dfrac{a}{\sqrt{1+b^2}}\right)\]
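The closed form can be compared with a direct numerical integration of the binormal ROC curve. The sketch below uses Python's standard-library `statistics.NormalDist` for \(\Phi\) and \(\Phi^{-1}\); the function names, the midpoint rule, and the parameter values are assumptions chosen for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def binormal_params(mu_p, sigma_p, mu_n, sigma_n):
    """a = (mu_P - mu_N) / sigma_P, b = sigma_N / sigma_P."""
    return (mu_p - mu_n) / sigma_p, sigma_n / sigma_p


def binormal_roc(x, a, b):
    """y = Phi(a + b * Phi^{-1}(x)), valid for 0 < x < 1."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(x))


def binormal_auc(a, b):
    """Closed form Phi(a / sqrt(1 + b^2))."""
    return STD_NORMAL.cdf(a / (1.0 + b * b) ** 0.5)


def auc_by_midpoint_rule(a, b, n=20000):
    """Integrate the ROC curve over (0, 1) at interval midpoints."""
    h = 1.0 / n
    return h * sum(binormal_roc((i + 0.5) * h, a, b) for i in range(n))


# Hypothetical population parameters, for illustration only.
a, b = binormal_params(mu_p=2.0, sigma_p=1.0, mu_n=0.0, sigma_n=1.5)
closed_form = binormal_auc(a, b)
numeric = auc_by_midpoint_rule(a, b)
```

The midpoint rule is used here because it never evaluates \(\Phi^{-1}\) at 0 or 1, where it is infinite.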

Unfortunately, there is no correspondingly simple analytical form for \(pAUC(a, b)\) (where \(a\) and \(b\) denote the limits of integration, not the binormal parameters), which must therefore be evaluated either by numerical integration or by some approximation formula in any specific application.
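One way to carry out the numerical integration is sketched below. To avoid the clash of notation, the integration limits are called `x_lo` and `x_hi`, while `a` and `b` are the binormal parameters; these names, the midpoint rule, and the parameter values are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def binormal_roc(x, a, b):
    """y = Phi(a + b * Phi^{-1}(x)), valid for 0 < x < 1."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(x))


def binormal_pauc(x_lo, x_hi, a, b, n=20000):
    """Midpoint-rule approximation to the integral of y(x) over [x_lo, x_hi]."""
    h = (x_hi - x_lo) / n
    return h * sum(binormal_roc(x_lo + (i + 0.5) * h, a, b) for i in range(n))


# Hypothetical binormal parameters, for illustration only.
a, b = 1.5, 1.0
pauc_low = binormal_pauc(0.0, 0.2, a, b)    # low false positive rates
pauc_rest = binormal_pauc(0.2, 1.0, a, b)
pauc_full = binormal_pauc(0.0, 1.0, a, b)   # should approximate the full AUC
```

Integrating over the whole interval recovers the closed-form AUC up to numerical error, which gives a simple sanity check on the routine.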

The binormal model will be appropriate for any ROC curve pertaining to populations that can be transformed to normality by some monotone transformation.

ROC Estimation

Empirical counting process (jagged)
Parametric (binormal, bigamma)
Nonparametric
Semi-parametric

ROC curve

\[y=1-G[F^{-1}(1-x)], ~~0\leq x \leq 1,\]

assuming pdf and cdf to be \(f\) and \(F\) for population N, and pdf and cdf to be \(g\) and \(G\) for population P.

The empirical estimate replaces \(F\) and \(G\) by the empirical cdfs \(\hat{F}\) and \(\hat{G}\):

\[y=1-\hat{G}[\hat{F}^{-1}(1-x)], ~~0\leq x \leq 1,\]

Empirical CDFs are step functions, depending only on the ranks of the combined set of test scores.
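The jagged empirical curve can be traced by sweeping the threshold over the observed scores. The sketch below assumes allocation to P when the score exceeds the threshold; the function names and toy data are hypothetical.

```python
def empirical_roc(scores_p, scores_n):
    """Return empirical ROC points (fp, tp), running from (0, 0) to (1, 1)."""
    # Thresholds from the largest score down, then -inf so the curve reaches (1, 1).
    thresholds = sorted(set(scores_p) | set(scores_n), reverse=True)
    thresholds.append(float("-inf"))
    points = []
    for t in thresholds:
        tp = sum(s > t for s in scores_p) / len(scores_p)
        fp = sum(s > t for s in scores_n) / len(scores_n)
        points.append((fp, tp))
    return points


def trapezoid_area(points):
    """Area under the piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy scores, for illustration only.
scores_p = [0.9, 0.8, 0.6, 0.55]
scores_n = [0.7, 0.5, 0.4, 0.3]
roc_points = empirical_roc(scores_p, scores_n)
area = trapezoid_area(roc_points)
```

Only the ordering of the combined scores matters here: applying any strictly increasing transformation to all scores leaves `roc_points` unchanged.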

Parametric estimation

Sometimes the irregular appearance of the empirical ROC curve is not deemed adequate as an estimate of the underlying “true” smooth curve.

Ex: The binormal model - estimating \(a\) and \(b\)

Dorfman and Alf method

With ordered categorical data, Dorfman and Alf ¹ proposed a maximum likelihood method.

Assume the score \(S\) can take only one of a finite set of ranked values or categories \(C_1, C_2, \ldots, C_k\), say. Then there is a latent random variable \(W\) and a set of unknown thresholds \(-\infty=w_0<w_1<w_2<\cdots<w_k=\infty\) such that \(S\) falls in category \(C_i\) if and only if \(w_{i-1}<W\leq w_i\). We can then define \(p_{iN}=p(w_{i-1}<W\leq w_i|N)\) and \(p_{iP}=p(w_{i-1}<W\leq w_i|P)\), the probabilities of category \(C_i\) in populations N and P.

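The log-likelihood function is \[\mathcal{L}=\sum_{i=1}^k(n_{iN}\log p_{iN}+n_{iP}\log p_{iP})\] where \(n_{iN}\) and \(n_{iP}\) are the numbers of individuals from N and P observed in category \(C_i\).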
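A minimal sketch of evaluating this log-likelihood follows. It rests on an assumption not stated in the notes: the latent \(W\) is scaled so that its cdf is \(\Phi(w)\) in N and \(\Phi(bw-a)\) in P, which reproduces the binormal ROC curve \(\Phi^{-1}(y)=a+b\Phi^{-1}(x)\). The function names, thresholds, and category counts are hypothetical, and the code only evaluates \(\mathcal{L}\); it does not perform the maximization over \(a\), \(b\) and the thresholds.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def category_probs(a, b, cuts):
    """Category probabilities (p_iN, p_iP) for interior thresholds w_1 < ... < w_{k-1}."""
    # Assumed latent scaling: cdf Phi(w) in N and Phi(b*w - a) in P.
    cdf_n = [0.0] + [STD_NORMAL.cdf(w) for w in cuts] + [1.0]
    cdf_p = [0.0] + [STD_NORMAL.cdf(b * w - a) for w in cuts] + [1.0]
    p_n = [hi - lo for lo, hi in zip(cdf_n, cdf_n[1:])]
    p_p = [hi - lo for lo, hi in zip(cdf_p, cdf_p[1:])]
    return p_n, p_p


def log_likelihood(a, b, cuts, counts_n, counts_p):
    """Sum over categories of n_iN * log(p_iN) + n_iP * log(p_iP)."""
    p_n, p_p = category_probs(a, b, cuts)
    return sum(n_n * math.log(q_n) + n_p * math.log(q_p)
               for n_n, q_n, n_p, q_p in zip(counts_n, p_n, counts_p, p_p))


# Hypothetical counts in k = 4 ordered categories, for illustration only.
counts_n = [30, 20, 10, 5]
counts_p = [5, 10, 20, 30]
cuts = [-0.5, 0.3, 1.0]
ll_plausible = log_likelihood(1.2, 1.0, cuts, counts_n, counts_p)
ll_reversed = log_likelihood(-1.2, 1.0, cuts, counts_n, counts_p)
```

With these counts, P is concentrated in the higher categories, so a positive \(a\) gives a larger log-likelihood than the same value with its sign reversed.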