ROC class notes

Le Kang

2024-01-29

Breast Cancer Image Dataset

Scanned films from mammography studies.

Breast Imaging Reporting and Data System (BI-RADS) by the American College of Radiology.

Benign vs malignant cases with verified pathology information.

Dense breast tissue can look light gray or white on a mammogram. Image features include calcifications, patterns, textures, etc.

Check VCU Canvas Course Page

Continuous scores

Let the score \(S\) have pdf \(f\) and cdf \(F\) in population N, and pdf \(g\) and cdf \(G\) in population P. An individual is allocated to population P when \(S>t\) for a threshold \(t\).

\(fp=x(t)=1-F(t)\),

\(tp=y(t)=1-G(t)\),

therefore, \[y=1-G[F^{-1}(1-x)], ~~0\leq x \leq 1\]
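As a minimal sketch of the relation above, the snippet below evaluates \(y=1-G[F^{-1}(1-x)]\) with Python's standard-library `statistics.NormalDist`. The two normal score distributions and their parameters are illustrative assumptions, not part of the notes.

```python
from statistics import NormalDist

# Hypothetical score distributions (illustrative values only).
score_N = NormalDist(mu=0.0, sigma=1.0)   # population N: pdf f, cdf F
score_P = NormalDist(mu=1.5, sigma=1.0)   # population P: pdf g, cdf G

def roc(x):
    """Return y = 1 - G(F^{-1}(1 - x)) for a false positive rate 0 < x < 1."""
    t = score_N.inv_cdf(1.0 - x)   # threshold t with fp = 1 - F(t) = x
    return 1.0 - score_P.cdf(t)    # tp = 1 - G(t) at that threshold

for x in (0.05, 0.10, 0.50):
    print(f"fp = {x:.2f} -> tp = {roc(x):.3f}")
```

Any pair of continuous distributions with a cdf and an inverse cdf could be substituted for the two `NormalDist` objects.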

The Neyman-Pearson Lemma

The slope of the ROC curve at the point with threshold value \(t\) is equal to the likelihood ratio

\[\mathcal{L}(t)=\left.\frac{dy}{dx}\right|_t=\frac{p(t|P)}{p(t|N)}=\frac{g(t)}{f(t)}\] It measures how much more probable the score value \(t\) is in population P than in population N, which in turn can be interpreted as a measure of confidence in allocation to population P.
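The slope identity can be checked numerically. The sketch below compares a finite-difference estimate of \(dy/dx\) with the density ratio at a few thresholds. The two normal distributions are assumed for illustration, and the helper names (`fp`, `tp`, `slope_numeric`, `likelihood_ratio`) are hypothetical.

```python
from statistics import NormalDist

# Hypothetical score distributions (illustrative values only).
score_N = NormalDist(0.0, 1.0)
score_P = NormalDist(1.5, 1.2)

def fp(t):
    return 1.0 - score_N.cdf(t)   # x(t) = 1 - F(t)

def tp(t):
    return 1.0 - score_P.cdf(t)   # y(t) = 1 - G(t)

def slope_numeric(t, h=1e-5):
    """Estimate dy/dx at threshold t as (dy/dt) / (dx/dt) by central differences."""
    dy = tp(t + h) - tp(t - h)
    dx = fp(t + h) - fp(t - h)
    return dy / dx

def likelihood_ratio(t):
    """Density ratio p(t|P) / p(t|N)."""
    return score_P.pdf(t) / score_N.pdf(t)

for t in (0.0, 0.5, 1.0):
    print(t, slope_numeric(t), likelihood_ratio(t))
```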

Now consider the case of testing the simple null hypothesis \(H_0\) that an individual belongs to population N against the simple alternative \(H_1\) that the individual belongs to population P.

The classification score \(S\) forms the data on which the hypothesis test is conducted as \(S\) is used to make the allocation of the individual.

Suppose that \(\mathcal{R}\) is the set of values of \(S\) for which we allocate the individual to population P, i.e., the set of values of \(S\) for which we reject the null hypothesis.

By the Neyman-Pearson Lemma, the most powerful test of size \(\alpha\) has region \(\mathcal{R}\) comprising all values \(s\) of \(S\) such that \[\mathcal{L}(s)=\frac{p(s|P)}{p(s|N)}\geq k,\] where \(k\) is determined by the condition \(p(s\in \mathcal{R}|N)=\alpha\).

For the classifier under consideration, \(p(s\in \mathcal{R}|N)=\alpha \Leftrightarrow fp=\alpha\), and the power of the test, i.e., the probability of correctly rejecting the null hypothesis, is just \(tp\).

For a fixed \(fp\) rate, the \(tp\) rate will be maximized by a classifier whose set of score values \(S\) for allocation to population P is given by \(\mathcal{L}(S)\geq k\), where \(k\) is determined by the target \(fp\) rate.

Ex: If the likelihood ratio \(\mathcal{L}(S)\) is monotonically increasing in \(S\), then a classification rule based on \(S\) exceeding a threshold is an optimal decision rule, and the ROC curve for \(\mathcal{L}(S)\) lies on or above every other ROC curve based on \(S\).
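To illustrate why monotonicity matters, the Monte Carlo sketch below uses two normal populations with unequal variances, so the likelihood ratio is not monotone in \(S\). At a fixed \(fp\) rate, it compares the \(tp\) rate of the rule "\(S\) exceeds a threshold" with that of the rule "\(\mathcal{L}(S)\geq k\)". The distributions, sample size, seed, and helper names are all assumptions made for the example.

```python
import random
from statistics import NormalDist

random.seed(0)

# Hypothetical populations with unequal variances: the likelihood ratio
# is then not monotone in the score s.
dist_N = NormalDist(0.0, 1.0)
dist_P = NormalDist(0.5, 2.0)

def lr(s):
    """Likelihood ratio p(s|P) / p(s|N)."""
    return dist_P.pdf(s) / dist_N.pdf(s)

n = 50_000
s_N = [random.gauss(0.0, 1.0) for _ in range(n)]
s_P = [random.gauss(0.5, 2.0) for _ in range(n)]
alpha = 0.10  # target fp rate

def tp_at_fp(stat_N, stat_P, alpha):
    """tp of the rule 'statistic >= k', with k chosen so the empirical fp is about alpha."""
    k = sorted(stat_N)[int((1 - alpha) * len(stat_N))]
    return sum(v >= k for v in stat_P) / len(stat_P)

tp_threshold = tp_at_fp(s_N, s_P, alpha)                                  # rule: S >= t
tp_lr = tp_at_fp([lr(s) for s in s_N], [lr(s) for s in s_P], alpha)       # rule: L(S) >= k

print(f"tp with threshold on S:    {tp_threshold:.3f}")
print(f"tp with threshold on L(S): {tp_lr:.3f}")
```

With equal variances the likelihood ratio would be monotone in \(S\), and the two rules would give the same \(tp\) rate.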

AUC

Particular attention has been focused on single scalar values that might capture the essential features of a ROC curve, motivated by the way that simple summary measures such as mean and variance capture the essential features of statistical data sets. \[AUC=\int_0^1 y(x)dx\]

  • AUC is the average true positive rate, taken uniformly over all possible false positive rates in the range (0, 1).

  • If A and B are two classifiers such that the ROC curve for A nowhere lies below the ROC curve for B, then AUC for A must be greater than or equal to AUC for B. However, the converse is not true: a larger AUC does not imply that one ROC curve lies everywhere on or above the other.

  • It is also the probability that the classifier will allocate a higher score to a randomly chosen individual from population P than it will to a randomly and independently chosen individual from population N. \[AUC=p(S_P>S_N)\]

  • There are other interpretations connected with the Lorenz curve and the Gini index (income inequality).
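The area and probability interpretations of AUC can be compared on simulated data. The sketch below computes the fraction of (P, N) pairs with \(S_P>S_N\) and the trapezoidal area under the empirical ROC curve. The simulated scores, sample sizes, and seed are assumptions for illustration only.

```python
import random

random.seed(1)

# Hypothetical simulated scores (continuous, so ties are not expected).
s_N = [random.gauss(0.0, 1.0) for _ in range(300)]
s_P = [random.gauss(1.0, 1.0) for _ in range(300)]

# (1) Probability interpretation: fraction of (P, N) pairs with S_P > S_N.
auc_pairs = sum(p > n for p in s_P for n in s_N) / (len(s_P) * len(s_N))

# (2) Area under the empirical ROC curve by the trapezoidal rule.
#     Sweep the threshold from high to low; allocate to P when score >= t.
thresholds = sorted(set(s_N + s_P), reverse=True)
xs, ys = [0.0], [0.0]
for t in thresholds:
    xs.append(sum(n >= t for n in s_N) / len(s_N))   # fp at threshold t
    ys.append(sum(p >= t for p in s_P) / len(s_P))   # tp at threshold t
auc_trapz = sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
                for i in range(len(xs) - 1))

print(f"pairwise estimate: {auc_pairs:.4f}")
print(f"trapezoidal area:  {auc_trapz:.4f}")
```

Without ties, the two numbers agree up to floating-point rounding, since every step of the empirical ROC curve is either horizontal or vertical.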

Partial AUC (pAUC)

If a specific \(fp\) rate \(x_0\) is of interest, then the relevant summary from the ROC curve is the single point \(y(x_0)\). This is just the \(tp\) rate corresponding to a \(fp\) rate of \(x_0\) for the given classifier.

If just a single threshold is used, then it is better to calculate all four rates previously defined, while if several classifiers are being compared then it may be difficult to control the \(fp\) rate at a specified value \(x_0\) and hence \(y(x_0)\) may not be calculable for all classifiers.

More commonly, interest centers on a range of values \((a, b)\) of \(fp\) that is greater than a single value but less than the full range \((0, 1)\).

\[pAUC(a,b)=\int_a^b y(x)dx\] Both the maximum and minimum values depend on \(a\) and \(b\) which makes interpretation of any one value, and comparison of two or more values, somewhat problematic.

Standardized pAUC

Maximum value (perfect classifier, \(y=1\)): \(M=(b-a)\). Minimum value (chance diagonal, \(y=x\)): \(m=(b-a)(b+a)/2\).

Standardized pAUC: \[\dfrac{1}{2}\left(1+\dfrac{pAUC(a,b)-m}{M-m}\right),\] which lies between 0.5 (chance diagonal) and 1 (perfect classification).
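A minimal sketch of both quantities follows, using trapezoidal integration of a ROC curve. The ROC curve comes from two assumed normal score distributions, and the function names `pauc` and `standardized_pauc` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical score distributions (illustrative values only).
score_N = NormalDist(0.0, 1.0)
score_P = NormalDist(1.5, 1.0)

def roc(x):
    """ROC curve y(x) = 1 - G(F^{-1}(1 - x)), with the end points handled explicitly."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 1.0 - score_P.cdf(score_N.inv_cdf(1.0 - x))

def pauc(a, b, n=2000):
    """Trapezoidal approximation of the integral of y(x) over (a, b)."""
    h = (b - a) / n
    total = 0.5 * (roc(a) + roc(b)) + sum(roc(a + i * h) for i in range(1, n))
    return total * h

def standardized_pauc(a, b):
    M = b - a                      # maximum: y = 1 on (a, b)
    m = (b - a) * (b + a) / 2      # minimum: chance diagonal y = x
    return 0.5 * (1 + (pauc(a, b) - m) / (M - m))

print(f"pAUC(0, 0.2)              = {pauc(0.0, 0.2):.4f}")
print(f"standardized pAUC(0, 0.2) = {standardized_pauc(0.0, 0.2):.4f}")
```

On the full range \((0, 1)\), \(M=1\) and \(m=1/2\), so the standardized value reduces to the ordinary AUC.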

Youden index (YI)

YI = \(\max_t(tp-fp)=\max_t(tp+tn-1)\)

The threshold \(t\) at the point on the ROC curve corresponding to the Youden Index is often taken to be the optimal classification threshold.

Maximum vertical distance (MVD)

MVD between the chance diagonal and the ROC curve: \[\max_x|y(x)-x|=\max_t|y(t)-x(t)|=\max_t|p(S>t|P)-p(S>t|N)|\] It is the maximum distance between the cdfs of \(S\) in P and N, and ranges between 0 and 1. When the ROC curve lies on or above the chance diagonal, this is just the maximum of \(tp-fp\), so MVD is equivalent to YI.
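The sketch below finds the Youden index by a grid search over thresholds and compares it with the maximum distance between the two cdfs. The normal score distributions and the grid are assumptions for illustration.

```python
from statistics import NormalDist

# Hypothetical score distributions (illustrative values only).
score_N = NormalDist(0.0, 1.0)
score_P = NormalDist(1.5, 1.0)

def fp(t):
    return 1.0 - score_N.cdf(t)

def tp(t):
    return 1.0 - score_P.cdf(t)

# Grid of candidate thresholds from -4 to 6 in steps of 0.001.
grid = [-4.0 + 0.001 * i for i in range(10001)]

# Youden index: maximum of tp - fp, and the threshold where it is attained.
t_best = max(grid, key=lambda t: tp(t) - fp(t))
youden = tp(t_best) - fp(t_best)

# Maximum vertical distance, written as the largest gap between the cdfs.
mvd = max(abs(score_N.cdf(t) - score_P.cdf(t)) for t in grid)

print(f"threshold = {t_best:.3f}, YI = {youden:.4f}, MVD = {mvd:.4f}")
```

With equal variances, as assumed here, the selected threshold falls midway between the two means.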

The binormal model

The normal distribution is widely used partly because empirical evidence suggests that many measurements taken in practice behave roughly like observations from normal populations, and partly because mathematical results such as the central limit theorem show that the normal distribution provides an adequate approximation to the true probability distribution of many important statistics.

The normal model is thus a “standard” against which any other suggestion is usually measured in common statistical practice.

We assume the scores \(S\) of the classifier to have a normal distribution in each of the two populations P and N. This model will always be “correct” if the original measurements \(\boldsymbol{X}\) have multivariate normal distributions in the two populations and the classifier is a linear function of the measurements.

Parameters in binormal model

\(\mu_P, \mu_N, \sigma_P, \sigma_N\)

We assume that \(\mu_P>\mu_N\), in accord with the convention that large values of \(S\) are indicative of population P and small ones indicative of population N.

Standardizing the scores, with \(Z\) a standard normal variable and \(\Phi\) its cdf:

\[x(t)=p(S>t|N)=p\left(Z>\dfrac{t-\mu_N}{\sigma_N}\right)=\Phi\left(\dfrac{\mu_N-t}{\sigma_N}\right)\] \[y(t)=p(S>t|P)=\Phi\left(\dfrac{\mu_P-t}{\sigma_P}\right)\]

ROC curve under binormal model

\[\Phi^{-1}(y)=a+b\Phi^{-1}(x),\] where \(a=(\mu_P-\mu_N)/\sigma_P, b=\sigma_N/\sigma_P.\)
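As a quick check, the sketch below computes \((x, y)\) directly from a threshold and confirms that the point satisfies \(\Phi^{-1}(y)=a+b\Phi^{-1}(x)\). The parameter values are illustrative assumptions, and the function names are hypothetical.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

# Hypothetical binormal parameters (illustrative values only).
mu_N, sigma_N = 0.0, 1.0
mu_P, sigma_P = 2.0, 1.5

a = (mu_P - mu_N) / sigma_P
b = sigma_N / sigma_P

def binormal_roc(x):
    """y as a function of x from Phi^{-1}(y) = a + b * Phi^{-1}(x), for 0 < x < 1."""
    return std.cdf(a + b * std.inv_cdf(x))

def roc_point(t):
    """(x, y) computed directly from the threshold t."""
    x = std.cdf((mu_N - t) / sigma_N)
    y = std.cdf((mu_P - t) / sigma_P)
    return x, y

for t in (-1.0, 0.5, 2.0):
    x, y = roc_point(t)
    print(f"t = {t:4.1f}: x = {x:.4f}, y = {y:.4f}, curve = {binormal_roc(x):.4f}")
```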

It follows from the earlier assumptions that \(a>0\), while \(b\) is positive by definition, being a ratio of standard deviations.

\[AUC=\Phi\left(\dfrac{\mu_P-\mu_N}{\sqrt{\sigma^2_P+\sigma^2_N}}\right)\]
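The closed form can be compared with a direct numerical integration of the binormal ROC curve. The sketch below uses a midpoint rule, which avoids evaluating \(\Phi^{-1}\) at 0 and 1. The parameter values are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()  # standard normal

# Hypothetical binormal parameters (illustrative values only).
mu_N, sigma_N = 0.0, 1.0
mu_P, sigma_P = 2.0, 1.5

# Closed-form AUC under the binormal model.
auc_closed = std.cdf((mu_P - mu_N) / sqrt(sigma_P**2 + sigma_N**2))

# Midpoint-rule integration of y(x) = Phi(a + b * Phi^{-1}(x)) over (0, 1).
a = (mu_P - mu_N) / sigma_P
b = sigma_N / sigma_P
n = 20000
auc_numeric = sum(std.cdf(a + b * std.inv_cdf((i + 0.5) / n)) for i in range(n)) / n

print(f"closed form: {auc_closed:.5f}")
print(f"numerical:   {auc_numeric:.5f}")
```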

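Unfortunately, there is no correspondingly simple analytical form for the partial area \(pAUC(a, b)\) (here \(a\) and \(b\) denote the limits of the \(fp\) range, not the binormal parameters), which must therefore be evaluated either by numerical integration or by some approximation formula in any specific application.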

The binormal model will be appropriate for any ROC curve pertaining to populations that can be transformed to normality by some monotone transformation.