ROC class notes

Le Kang

2024-02-12

ROC Estimation

- Empirical counting process (jagged)
- Parametric (binormal, bigamma)
- Nonparametric
- Semi-parametric

The nonparametric empirical method

\[y=1-\hat{G}[\hat{F}^{-1}(1-x)], ~~0\leq x \leq 1,\]

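The empirical ROC curve plots \(\hat{tp}(t)\) against \(\hat{fp}(t)\) as the threshold \(t\) varies:

\[\hat{tp}(t)=\dfrac{\sum\mathbf{1}(S_P>t)}{n_P}=1-\hat{G}(t)\]

\[\hat{fp}(t)=\dfrac{\sum\mathbf{1}(S_N>t)}{n_N}=1-\hat{F}(t)\]

The area under this curve is the Mann-Whitney statistic (ties, if any, are usually counted as 1/2):

\[\widehat{AUC}=\hat{P}(S_P>S_N)=\dfrac{\sum_{i=1}^{n_P}\sum_{j=1}^{n_N} I(S_i^P>S_j^N)}{n_P n_N}\]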
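A minimal R sketch of the empirical estimators, using the small N and P samples from the Metz example in these notes. It evaluates \(\hat{fp}(t)\) and \(\hat{tp}(t)\) at every observed score (plus \(t=-\infty\)), computes the AUC as the Mann-Whitney double sum with ties counted as 1/2, and checks it against the trapezoidal area under the jagged empirical curve. Variable names are illustrative.

```r
# Example scores (same samples as the Metz example)
sN <- c(6.24, 1.77, 4.61, 8.29)            # population N
sP <- c(12.87, 10.22, 15.90, 5.01, 13.35)  # population P

# Thresholds: every observed score, plus -Inf so the curve starts at (1, 1)
thr <- c(-Inf, sort(unique(c(sN, sP))))
fp <- sapply(thr, function(t) mean(sN > t))  # fp-hat(t) = 1 - F-hat(t)
tp <- sapply(thr, function(t) mean(sP > t))  # tp-hat(t) = 1 - G-hat(t)

# Mann-Whitney form of the AUC; ties (none here) count 1/2
auc_mw <- mean(outer(sP, sN, ">")) + 0.5 * mean(outer(sP, sN, "=="))

# Trapezoidal area under the empirical ROC points, ordered along the curve
o <- order(fp, tp)
auc_trap <- sum(diff(fp[o]) * (head(tp[o], -1) + tail(tp[o], -1)) / 2)

auc_mw    # 0.9: 18 of the 20 (P, N) pairs are correctly ordered
```

Both numbers agree because the area under the step-shaped empirical curve is exactly the Mann-Whitney count divided by \(n_P n_N\).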

Parametric estimation

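Ex: the binormal model, with \(S_N\sim N(\mu_N,\sigma_N^2)\) and \(S_P\sim N(\mu_P,\sigma_P^2)\). The ROC curve satisfies \[\Phi^{-1}(y)=a+b\Phi^{-1}(x),\] where \(a=(\mu_P-\mu_N)/\sigma_P\) and \(b=\sigma_N/\sigma_P\), so estimating the curve amounts to estimating \(a\) and \(b\).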

If the positive population tends to score higher (\(\mu_P>\mu_N\)), then \(a>0\), while \(b>0\) by definition, being a ratio of standard deviations.

\[\widehat{AUC}=\Phi\left(\dfrac{\hat{\mu}_P-\hat{\mu}_N}{\sqrt{\hat{\sigma}^2_P+\hat{\sigma}^2_N}}\right)=\Phi\left(\dfrac{\hat{a}}{\sqrt{1+\hat{b}^2}}\right)\]
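The notes do not say how \(\mu\) and \(\sigma\) are estimated; the R sketch below assumes the simplest plug-in choice (sample means and standard deviations) applied to simulated, hypothetical normal scores. It computes \(\hat a\), \(\hat b\), the fitted curve \(y=\Phi(\hat a+\hat b\,\Phi^{-1}(x))\), and confirms that the two expressions for \(\widehat{AUC}\) agree.

```r
set.seed(42)
sN <- rnorm(200, mean = 0,   sd = 1)    # hypothetical N scores
sP <- rnorm(150, mean = 1.5, sd = 1.3)  # hypothetical P scores

# Plug-in (moment) estimates of the normal parameters (an assumption)
muN <- mean(sN); sdN <- sd(sN)
muP <- mean(sP); sdP <- sd(sP)

a_hat <- (muP - muN) / sdP  # intercept on the normal-deviate scale
b_hat <- sdN / sdP          # slope on the normal-deviate scale

# Fitted binormal ROC curve: y = Phi(a + b * Phi^{-1}(x))
roc_binormal <- function(x) pnorm(a_hat + b_hat * qnorm(x))

auc_means <- pnorm((muP - muN) / sqrt(sdP^2 + sdN^2))
auc_ab    <- pnorm(a_hat / sqrt(1 + b_hat^2))
c(a = a_hat, b = b_hat, AUC = auc_ab)
```

Maximum likelihood estimates (variance divisor \(n\) instead of \(n-1\)) would change the numbers slightly but not the identity between the two AUC forms.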

The Dorfman and Alf method

With ordered categorical data, Dorfman and Alf¹ proposed fitting the binormal model by maximum likelihood.

Assume the score \(S\) can take only one of a finite set of ranked values or categories, say \(C_1, C_2, \ldots, C_k\). Assume further that there is a latent random variable \(W\) and a set of unknown thresholds \(-\infty=w_0<w_1<w_2<\cdots<w_{k-1}<w_k=\infty\) such that \(S\) falls in category \(C_i\) if and only if \(w_{i-1}<W\leq w_i\). We can then define the category probabilities \(p_{i|N}=P(S\in C_i\mid N)\) and \(p_{i|P}=P(S\in C_i\mid P)\). Under the binormal model, \(W\sim N(0,1)\) in population N and \(W\sim N(a/b,1/b^2)\) in population P.

The log-likelihood function \[\log\mathcal{L}=\sum_{i=1}^k(n_{i|N}\log p_{i|N}+n_{i|P}\log p_{i|P})\] where \(n_{i|N}\) and \(n_{i|P}\) are the observed numbers of individuals from populations N and P respectively falling in category \(C_i\).

\[p_{i|N}=\Phi(w_i)-\Phi(w_{i-1})\] \[p_{i|P}=\Phi(b w_i-a)-\Phi(bw_{i-1}-a)\]

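Setting the partial derivatives to zero, where \(\phi\) is the standard normal density and \(\phi(bw_0-a)=\phi(bw_k-a)=0\):

\[\dfrac{\partial \log\mathcal{L}}{\partial a}=\sum_{i=1}^k\dfrac{n_{i|P}}{p_{i|P}}\left\{-\phi(b w_i-a)+\phi(bw_{i-1}-a)\right\} = 0\] or equivalently, \[\sum_{i=1}^{k-1}\phi(b w_i-a)\left\{\dfrac{n_{i+1|P}}{p_{i+1|P}}-\dfrac{n_{i|P}}{p_{i|P}}\right\} = 0\]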

\[\dfrac{\partial \log\mathcal{L}}{\partial b}=\sum_{i=1}^k\dfrac{n_{i|P}}{p_{i|P}}\left\{\phi(b w_i-a)w_i-\phi(bw_{i-1}-a)w_{i-1}\right\}=0\] or equivalently (since \(w\,\phi(bw-a)\to 0\) as \(w\to\pm\infty\)), \[\sum_{i=1}^{k-1}\phi(b w_i-a)\, w_i \left\{\dfrac{n_{i|P}}{p_{i|P}}-\dfrac{n_{i+1|P}}{p_{i+1|P}}\right\} = 0\]

For each threshold \(w_i\), \(i=1,\ldots,k-1\), set

\[\dfrac{\partial \log\mathcal{L}}{\partial w_i}=\phi(w_i)\left\{\dfrac{n_{i|N}}{p_{i|N}}-\dfrac{n_{i+1|N}}{p_{i+1|N}}\right\}+b\,\phi(b w_i-a)\left\{\dfrac{n_{i|P}}{p_{i|P}}-\dfrac{n_{i+1|P}}{p_{i+1|P}}\right\}=0\]

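These equations have no closed-form solution and are solved iteratively (e.g. by Newton-Raphson). The second derivatives form the Hessian matrix; the inverse of the negative Hessian at the maximum likelihood estimates gives the asymptotic variance-covariance matrix of \((\hat a,\hat b,\hat w_1,\ldots,\hat w_{k-1})\).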
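The same log-likelihood can be maximized with a general-purpose optimizer. The R sketch below is not Dorfman and Alf's original algorithm: it uses optim() with BFGS, parameterizes \(b\) on the log scale and the thresholds as a first threshold plus positive increments (so the ordering \(w_1<\cdots<w_{k-1}\) is automatic), and inverts the numerical Hessian returned by optim(). The data are hypothetical: simulated latent binormal scores with \(a=1.5\), \(b=0.8\), cut into \(k=5\) categories at made-up thresholds.

```r
# Negative log-likelihood of the binormal model for k ordered categories.
# theta = (a, log b, w_1, log(w_2 - w_1), ..., log(w_{k-1} - w_{k-2}))
negloglik <- function(theta, nN, nP) {
  a <- theta[1]
  b <- exp(theta[2])                            # keeps b > 0
  w <- cumsum(c(theta[3], exp(theta[-(1:3)])))  # ordered thresholds w_1 < ... < w_{k-1}
  pN <- diff(c(0, pnorm(w), 1))                 # p_{i|N} = Phi(w_i) - Phi(w_{i-1})
  pP <- diff(c(0, pnorm(b * w - a), 1))         # p_{i|P} = Phi(b w_i - a) - Phi(b w_{i-1} - a)
  -sum(nN * log(pmax(pN, 1e-300)) + nP * log(pmax(pP, 1e-300)))
}

# Hypothetical data: W ~ N(0, 1) in N and N(a/b, 1/b^2) in P, with a = 1.5, b = 0.8
set.seed(1)
k <- 5
true_w <- c(-0.5, 0.3, 1.0, 1.8)
wN <- rnorm(4000)
wP <- rnorm(4000, mean = 1.5 / 0.8, sd = 1 / 0.8)
nN <- tabulate(findInterval(wN, true_w) + 1, nbins = k)  # counts n_{i|N}
nP <- tabulate(findInterval(wP, true_w) + 1, nbins = k)  # counts n_{i|P}

start <- c(0, 0, -1, rep(log(0.5), k - 2))
fit <- optim(start, negloglik, nN = nN, nP = nP, method = "BFGS",
             hessian = TRUE, control = list(maxit = 1000))

vc <- solve(fit$hessian)        # inverse observed information = asymptotic covariance
a_hat <- fit$par[1]
b_hat <- exp(fit$par[2])
se_a <- sqrt(vc[1, 1])
se_b <- b_hat * sqrt(vc[2, 2])  # delta method for b = exp(log b)
c(a = a_hat, b = b_hat, se_a = se_a, se_b = se_b)
```

Reparameterizing the thresholds does not change the asymptotic covariance block for \((\hat a,\hat b)\); only \(b\) needs the delta-method correction because it is optimized as \(\log b\).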

The Metz method

For continuous data, Metz et al.¹ proposed using the truth-state runs in the rank-ordered data as a natural categorization of the continuously distributed test results, and then estimating the ROC curve by maximum likelihood (ML) on these categories.

Truth-state runs in rank-ordered data

An example:

N sample: (6.24, 1.77, 4.61, 8.29)

P sample: (12.87, 10.22, 15.90, 5.01, 13.35)

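Sorting the pooled scores in increasing order and recording the truth state of each gives the sequence \(\{n, n, p, n, n, p, p, p, p\}\), or in runs, \(\{2n, 1p, 2n, 4p\}\). This sequence preserves all the information needed for the ROC curve. Treating each run as a category gives the category counts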

N sample: \(\{2, 0, 2, 0\}\)

P sample: \(\{0, 1, 0, 4\}\)
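The run-based categorization is easy to reproduce. The R sketch below sorts the pooled example scores, reads off the truth-state runs with rle(), and turns each run into one category. It assumes there are no ties between N and P scores, since tied scores would need an extra rule for ordering them.

```r
sN <- c(6.24, 1.77, 4.61, 8.29)
sP <- c(12.87, 10.22, 15.90, 5.01, 13.35)

scores <- c(sN, sP)
truth  <- rep(c("n", "p"), c(length(sN), length(sP)))

# Truth states in increasing order of score; assumes no N/P ties
states <- truth[order(scores)]
runs <- rle(states)  # lengths 2 1 2 4, values "n" "p" "n" "p"

# One category per run: the run length goes to its own group, 0 to the other
nN_cat <- ifelse(runs$values == "n", runs$lengths, 0L)
nP_cat <- ifelse(runs$values == "p", runs$lengths, 0L)
rbind(N = nN_cat, P = nP_cat)
```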

The runs of truth states in rank-ordered test result outcomes provide a natural categorization of inherently continuous data that retains any information relevant to ROC curve fitting.

- What if there are too many categories?

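With large samples there can be very many runs, and hence many threshold parameters \(w_i\) to estimate. Adjacent runs can be merged, e.g. into 20 categories, to improve computational efficiency.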
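One simple way to do the binning, sketched in R below as an assumption (Metz et al.'s own grouping rule may differ), is to merge consecutive runs into at most \(m=20\) groups that hold roughly equal numbers of observations. Merging only adjacent runs keeps the categories ordered, which the latent-threshold model requires. The data are simulated and hypothetical.

```r
# Hypothetical binning rule (an assumption): merge consecutive runs into at most
# m categories, each holding roughly sum(size) / m observations.
bin_runs <- function(nN_run, nP_run, m = 20) {
  size <- nN_run + nP_run
  grp <- ceiling(cumsum(size) / sum(size) * m)  # non-decreasing group index in 1..m
  list(nN = as.vector(rowsum(nN_run, grp)),     # N counts per merged category
       nP = as.vector(rowsum(nP_run, grp)))     # P counts per merged category
}

# Hypothetical continuous data with many truth-state runs
set.seed(2)
s <- c(rnorm(300), rnorm(300, mean = 1))
truth <- rep(c("n", "p"), each = 300)
runs <- rle(truth[order(s)])
nN_run <- ifelse(runs$values == "n", runs$lengths, 0L)
nP_run <- ifelse(runs$values == "p", runs$lengths, 0L)

binned <- bin_runs(nN_run, nP_run, m = 20)
c(runs = length(runs$lengths), categories = length(binned$nN))
```

The binned counts can be passed directly to a categorical ML fit such as the Dorfman and Alf likelihood.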

Nonparametric (kernel) estimation

Kernel density methods obtain smooth estimates of the densities \(f\) and \(g\) of the N and P scores (and hence of \(F\) and \(G\)) directly from the data, without imposing any distributional constraints:

\[\hat{f}(x)=\dfrac{1}{n_N h_N}\sum_{i=1}^{n_N}k\left(\dfrac{x-s_{N_i}}{h_N}\right)\] \[\hat{g}(x)=\dfrac{1}{n_P h_P}\sum_{i=1}^{n_P}k\left(\dfrac{x-s_{P_i}}{h_P}\right)\]

where \(k(\cdot)\) is the kernel function and \(h_N, h_P\) are the bandwidths for the N and P samples.

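```r
x <- rnorm(500)           # simulated sample from N(0, 1)
hist(x, freq = FALSE)     # histogram on the density scale
dens <- density(x)        # Gaussian kernel, bandwidth bw.nrd0(x) by default
lines(dens, col = "red")  # overlay the kernel density estimate

dens                      # print a summary of the estimate
```

```

Call:
    density.default(x = x)

Data: x (500 obs.); Bandwidth 'bw' = 0.2767

       x                 y            
 Min.   :-4.1913   Min.   :0.0000351  
 1st Qu.:-2.1852   1st Qu.:0.0131271  
 Median :-0.1791   Median :0.0634206  
 Mean   :-0.1791   Mean   :0.1244967  
 3rd Qu.: 1.8270   3rd Qu.:0.2349642  
 Max.   : 3.8331   Max.   :0.3759868
```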

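The choice among the many available kernel functions is relatively unimportant, since all give comparable results, but more care is needed in selecting the bandwidth. A general-purpose choice is Silverman's rule of thumb, \(h=0.9\min(\mathrm{sd},\mathrm{IQR}/1.34)\,n^{-1/5}\), which is the default bandwidth selector (bw.nrd0) of R's density().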
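As a quick check of the usual general-purpose rule, the R sketch below computes Silverman's \(h=0.9\min(\mathrm{sd},\mathrm{IQR}/1.34)\,n^{-1/5}\) by hand on simulated data and compares it with bw.nrd0(), the default used by density(); bw.SJ() (the Sheather-Jones plug-in selector) is shown as one alternative from the same stats package.

```r
set.seed(1)
x <- rnorm(500)
n <- length(x)

h_manual <- 0.9 * min(sd(x), IQR(x) / 1.34) * n^(-1/5)  # Silverman's rule of thumb
h_nrd0   <- bw.nrd0(x)                                   # R's default for density()
h_sj     <- bw.SJ(x)                                     # Sheather-Jones plug-in alternative
c(manual = h_manual, nrd0 = h_nrd0, SJ = h_sj)
```

In the ROC setting each group gets its own bandwidth, e.g. bw.nrd0 applied separately to the N and P scores.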

Because \(F\) and \(G\) are smoothed separately on the original score scale, the kernel ROC estimator, unlike the empirical one, is not invariant under monotone transformations of the scores.

With a Gaussian kernel, the area under the kernel-smoothed ROC curve has the closed form

\[\widehat{AUC}=\dfrac{1}{n_N n_P}\sum_{i=1}^{n_N}\sum_{j=1}^{n_P} \Phi\left(\dfrac{s_{P_j}-s_{N_i}}{\sqrt{h_N^2+h_P^2}}\right)\]
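A minimal R sketch of the kernel estimator with a Gaussian kernel, on simulated (hypothetical) data, with bandwidths chosen separately per group by bw.nrd0() (an assumption; any selector could be used). The smoothed CDFs are \(\hat F(t)=n_N^{-1}\sum_i\Phi\{(t-s_{N_i})/h_N\}\) and similarly \(\hat G\); the closed-form AUC is checked against a trapezoidal integral of the smoothed ROC curve.

```r
set.seed(3)
sN <- rnorm(100, mean = 0, sd = 1)    # hypothetical N scores
sP <- rnorm(120, mean = 1, sd = 1.2)  # hypothetical P scores
hN <- bw.nrd0(sN)                     # bandwidth for the N sample
hP <- bw.nrd0(sP)                     # bandwidth for the P sample

# Kernel-smoothed CDFs (Gaussian kernel): average of normal CDFs centred at the data
Fhat <- function(t) sapply(t, function(u) mean(pnorm((u - sN) / hN)))
Ghat <- function(t) sapply(t, function(u) mean(pnorm((u - sP) / hP)))

# Smoothed ROC curve traced over a fine grid of thresholds
pad   <- 5 * max(hN, hP)
tgrid <- seq(min(sN, sP) - pad, max(sN, sP) + pad, length.out = 4001)
fp <- 1 - Fhat(tgrid)
tp <- 1 - Ghat(tgrid)

# Closed-form kernel AUC: average of Phi((sP_j - sN_i) / sqrt(hN^2 + hP^2))
auc_kernel <- mean(outer(sP, sN, function(p, n) pnorm((p - n) / sqrt(hN^2 + hP^2))))

# Trapezoidal area under the smoothed curve, points ordered by increasing fp
o <- order(fp)
auc_numeric <- sum(diff(fp[o]) * (head(tp[o], -1) + tail(tp[o], -1)) / 2)
c(kernel = auc_kernel, numeric = auc_numeric)
```

The closed form holds because a Gaussian kernel turns each smoothed score into a normal variable, so \(P(S_P>S_N)\) for a single pair of data points is a normal probability with variance \(h_N^2+h_P^2\).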

Spline smoothing is also a popular approach to density estimation.

PAUC estimation

\[PAUC(f_1,f_2)=\int_{f_1}^{f_2} y(x)dx\]

Under the binormal model,

\[\widehat{PAUC}(f_1,f_2)=\int_{f_1}^{f_2} \Phi(\hat{a}+\hat{b}z_x)\,dx,\] where \(z_x=\Phi^{-1}(x)\).
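The binormal PAUC is evaluated numerically in general. The R sketch below uses integrate() with illustrative values \(\hat a=1.2\), \(\hat b=0.9\) (hypothetical, not estimates from any data) and checks that \(f_1=0,\ f_2=1\) recovers the full AUC \(\Phi(\hat a/\sqrt{1+\hat b^2})\).

```r
# Binormal partial AUC over fp in [f1, f2] by numerical integration
pauc_binormal <- function(a, b, f1, f2) {
  roc <- function(x) pnorm(a + b * qnorm(x))  # y(x) = Phi(a + b z_x)
  integrate(roc, lower = f1, upper = f2)$value
}

a_hat <- 1.2; b_hat <- 0.9                     # illustrative values only
pauc_low  <- pauc_binormal(a_hat, b_hat, 0, 0.2)  # area over fp in [0, 0.2]
auc_full  <- pauc_binormal(a_hat, b_hat, 0, 1)    # full AUC by integration
auc_exact <- pnorm(a_hat / sqrt(1 + b_hat^2))     # closed-form AUC for comparison
c(pauc = pauc_low, auc_integrated = auc_full, auc_exact = auc_exact)
```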

The nonparametric method

\[PAUC(f_1,f_2)=P(S_P>S_N, f_1\leq 1-F(S_N)\leq f_2)\]

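\[\widehat{PAUC}(f_1,f_2)=\dfrac{1}{n_N n_P}\sum_{i=1}^{n_N}\sum_{j=1}^{n_P} I(S_{P_j}>S_{N_i})\, I\!\left(f_1\leq \dfrac{n_N(S_{N_i})}{n_N}\leq f_2\right),\] where \(n_N(t)=\sum_{l=1}^{n_N} I(S_{N_l}>t)\) is the number of negative scores above \(t\), so that \(n_N(t)/n_N=1-\hat{F}(t)\).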
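A direct R transcription of the double sum, using the example N and P samples from the Metz section: each negative score \(S_{N_i}\) is kept only if its empirical false-positive fraction \(n_N(S_{N_i})/n_N\) lies in \([f_1,f_2]\). As in the formula, P/N ties contribute 0.

```r
pauc_empirical <- function(sN, sP, f1, f2) {
  nN <- length(sN); nP <- length(sP)
  fpr_at_N <- sapply(sN, function(s) sum(sN > s)) / nN  # n_N(S_Ni) / n_N
  keep <- fpr_at_N >= f1 & fpr_at_N <= f2               # negatives inside [f1, f2]
  sum(outer(sP, sN[keep], ">")) / (nN * nP)             # correctly ordered pairs kept
}

sN <- c(6.24, 1.77, 4.61, 8.29)
sP <- c(12.87, 10.22, 15.90, 5.01, 13.35)
pauc_empirical(sN, sP, 0, 1)    # 0.9, the full empirical AUC
pauc_empirical(sN, sP, 0, 0.5)  # 0.65: 13 of the 20 pairs qualify
```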