ROC class notes

Le Kang

2024-03-20

rocreg in package [pcvsuite]

https://faculty.washington.edu/abansal/software.html

Any successful experience or questions?

Another look at DeLong¹’s approach

\[\Delta \widehat{AUC}=\dfrac{1}{n_N n_P}\sum_{i=1}^{n_N}\sum_{j=1}^{n_P}\left[\psi(X_i,Y_j)-\psi(U_i,V_j)\right]\\ =\dfrac{1}{n_N n_P}\sum_{i=1}^{n_N}\sum_{j=1}^{n_P} \delta(X_i,Y_j,U_i,V_j) \]

where the difference kernel is \(\delta(X,Y,U,V)=\psi(X,Y)-\psi(U,V)\). For conciseness, we write \(\delta(X_i,Y_j,U_i,V_j)\) as \(\delta_{ij}\).
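As a small illustration, the sketch below computes \(\Delta\widehat{AUC}\) from the difference kernel matrix. It assumes the usual Mann-Whitney form of the kernel, \(\psi(x,y)=1\) if \(y>x\), \(1/2\) if \(y=x\), and \(0\) otherwise, with \(x\) a non-diseased score and \(y\) a diseased score. The toy scores and the name `psi` are made up for illustration.

```r
# Assumed Mann-Whitney kernel: x = non-diseased score, y = diseased score
psi <- function(x, y) (sign(y - x) + 1) / 2

# Toy paired scores (made up): test 1 gives X (non-diseased) and Y (diseased),
# test 2 gives U and V on the same subjects
X <- c(1.0, 2.0, 3.0); Y <- c(2.5, 4.0)
U <- c(1.5, 2.5, 2.0); V <- c(2.0, 3.0)

# Difference kernel matrix [delta_ij]: n_N rows, n_P columns
delta <- outer(X, Y, psi) - outer(U, V, psi)

AUC1 <- mean(outer(X, Y, psi))   # 5/6
AUC2 <- mean(outer(U, V, psi))   # 3/4 (one tie contributes 1/2)
dAUC <- mean(delta)              # 5/6 - 3/4 = 1/12
```

Averaging the difference kernel matrix gives the same number as subtracting the two empirical AUCs, which is all the double sum says.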

The variance of this estimated difference \(\Delta\widehat{AUC}\) can be shown to be

\[\begin{aligned}\textrm{var}(\Delta\widehat{AUC}) =\dfrac{1}{n_N n_P} \Big\{&(n_N-1)\textrm{cov}\left[\delta_{ij},\delta_{i^{\prime}j}\right]+(n_P-1)\textrm{cov}\left[\delta_{ij},\delta_{ij^{\prime}}\right]\\ &+\textrm{cov}\left[\delta_{ij},\delta_{ij}\right]\Big\},\end{aligned}\] where \(i^{\prime}\neq i\) and \(j^{\prime}\neq j\).

DeLong et al. estimated \(\textrm{cov}\left[\delta_{ij},\delta_{i^{\prime}j}\right]\) and \(\textrm{cov}\left[\delta_{ij},\delta_{ij^{\prime}}\right]\) by the method of structural components and ignored the term \(\textrm{cov}\left[\delta_{ij},\delta_{ij}\right]\). Its contribution to the variance is of order \(\mathcal{O}(n_N^{-1}n_P^{-1})\), so it converges to zero faster than the first two terms, which are of order \(\mathcal{O}(n_P^{-1})\) and \(\mathcal{O}(n_N^{-1})\), respectively.

More specifically, after obtaining the difference kernel matrix \([\delta_{ij}]_{n_N\times n_P }\), the row components and column components are defined, respectively, as \[R_i = \dfrac{1}{n_P}\sum\limits_{j=1}^{n_P} \delta_{ij}, \quad i= 1, 2, \ldots, n_N,\] \[C_j = \dfrac{1}{n_N}\sum\limits_{i=1}^{n_N} \delta_{ij}, \quad j= 1, 2, \ldots, n_P,\] which are simply the row and column means of the difference kernel matrix \([\delta_{ij}]_{n_N\times n_P }\).

Then the structural components (SC) estimator for \(\textrm{cov}\left[\delta_{ij},\delta_{i^{\prime}j}\right]\) is \[\widehat{\textrm{cov}}^{SC}\left[\delta_{ij},\delta_{i^{\prime}j}\right]=\dfrac{1}{n_P-1}\sum\limits_{j=1}^{n_P}\left(C_j-\bar{C}\right)^2,\] and the structural components estimator for \(\textrm{cov}\left[\delta_{ij},\delta_{ij^{\prime}}\right]\) is \[\widehat{\textrm{cov}}^{SC}\left[\delta_{ij},\delta_{ij^{\prime}}\right]=\dfrac{1}{n_N-1}\sum\limits_{i=1}^{n_N}\left(R_i-\bar{R}\right)^2.\]

Notice that \(\bar{R}=\bar{C}=\frac{1}{n_N n_P}\sum_i\sum_j \delta_{ij}=\Delta \widehat{AUC}\).
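The structural components can be sketched in a few lines of R. The simulated scores, the sample sizes, and the names `psi`, `cov_shared_j`, and `cov_shared_i` are assumptions made for illustration. The kernel is again taken to be the Mann-Whitney kernel, and `delta` is laid out as \(n_N\times n_P\) to match the formulas.

```r
set.seed(1)
n_N <- 30; n_P <- 20                      # illustrative sample sizes

# Simulated paired scores: a shared subject effect correlates the two tests
s_N <- rnorm(n_N); s_P <- rnorm(n_P)
X <- s_N + rnorm(n_N);        U <- s_N + rnorm(n_N)         # non-diseased
Y <- s_P + rnorm(n_P) + 1.0;  V <- s_P + rnorm(n_P) + 0.5   # diseased

psi   <- function(x, y) (sign(y - x) + 1) / 2   # assumed kernel
delta <- outer(X, Y, psi) - outer(U, V, psi)    # n_N x n_P matrix [delta_ij]

R <- rowMeans(delta)   # R_i, one per non-diseased subject
C <- colMeans(delta)   # C_j, one per diseased subject

# var() divides by (length - 1), matching the SC formulas
cov_shared_j <- var(C)   # SC estimate of cov[delta_ij, delta_i'j]
cov_shared_i <- var(R)   # SC estimate of cov[delta_ij, delta_ij']

# The three means coincide up to floating-point error
c(mean_R = mean(R), mean_C = mean(C), dAUC = mean(delta))
```

The sample variance of the column components estimates the covariance of two kernel values that share a diseased subject \(j\), and the sample variance of the row components does the same for a shared non-diseased subject \(i\).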

Consequently, DeLong et al. proposed an estimator for \(\textrm{var}(\Delta\widehat{AUC})\),

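\[\widehat{\textrm{var}}(\Delta\widehat{AUC})=\dfrac{\widehat{\textrm{cov}}^{SC}\left[\delta_{ij},\delta_{i^{\prime}j}\right]}{n_P}+\dfrac{\widehat{\textrm{cov}}^{SC}\left[\delta_{ij},\delta_{ij^{\prime}}\right]}{n_N},\] which follows from the expression for \(\textrm{var}(\Delta\widehat{AUC})\) by dropping the \(\textrm{cov}\left[\delta_{ij},\delta_{ij}\right]\) term and using \(\frac{n_N-1}{n_N} \rightarrow 1\) and \(\frac{n_P-1}{n_P} \rightarrow 1\) in large samples.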

DeLong et al.’s variance estimator involves several approximations, and the direction of the resulting bias is not obvious. First, the structural components estimators \(\widehat{\textrm{cov}}^{SC}\left[\delta_{ij},\delta_{i^{\prime}j}\right]\) and \(\widehat{\textrm{cov}}^{SC}\left[\delta_{ij},\delta_{ij^{\prime}}\right]\) are themselves biased. Second, the estimator ignores the non-negative term \(\textrm{cov}\left[\delta_{ij},\delta_{ij}\right]/(n_N n_P)\). Third, it inflates the contributions \(\textrm{cov}\left[\delta_{ij},\delta_{i^{\prime}j}\right]/n_P\) and \(\textrm{cov}\left[\delta_{ij},\delta_{ij^{\prime}}\right]/n_N\) by replacing \(\frac{n_N-1}{n_N}\) and \(\frac{n_P-1}{n_P}\) with \(1\). The last two effects act in opposite directions.

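U-statistic method

Alternatively, each of the three covariance terms in \(\textrm{var}(\Delta\widehat{AUC})\) can be estimated without bias by a U-statistic: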

\[ \widehat{\textrm{cov}}^{U}\left[\delta_{ij},\delta_{i^{\prime}j}\right]=\sum_{i=1}^{n_N}\sum_{j=1}^{n_P}\sum_{i^{\prime}\neq i}^{n_N}\dfrac{\delta_{ij}\delta_{i^{\prime}j}}{n_N n_P(n_N-1)}-\\ \sum_{i=1}^{n_N}\sum_{j=1}^{n_P}\sum_{i^{\prime}\neq i}^{n_N}\sum_{j^{\prime}\neq j}^{n_P}\dfrac{\delta_{ij}\delta_{i^{\prime}j^{\prime}}}{n_N n_P (n_N-1)(n_P-1)}, \]

\[ \widehat{\textrm{cov}}^{U}\left[\delta_{ij},\delta_{ij^{\prime}}\right]=\sum_{i=1}^{n_N}\sum_{j=1}^{n_P}\sum_{j^{\prime}\neq j}^{n_P}\dfrac{\delta_{ij}\delta_{ij^{\prime}}}{n_N n_P(n_P-1)}-\\ \sum_{i=1}^{n_N}\sum_{j=1}^{n_P}\sum_{i^{\prime}\neq i}^{n_N}\sum_{j^{\prime}\neq j}^{n_P}\dfrac{\delta_{ij}\delta_{i^{\prime}j^{\prime}}}{n_N n_P (n_N-1)(n_P-1)}, \]

\[ \widehat{\textrm{cov}}^{U}\left[\delta_{ij},\delta_{ij}\right]=\sum_{i=1}^{n_N}\sum_{j=1}^{n_P}\dfrac{\delta_{ij}^2}{n_N n_P}-\\ \sum_{i=1}^{n_N}\sum_{j=1}^{n_P}\sum_{i^{\prime}\neq i}^{n_N}\sum_{j^{\prime}\neq j}^ {n_P}\dfrac{\delta_{ij}\delta_{i^{\prime}j^{\prime}}}{n_N n_P (n_N-1)(n_P-1)}.\]

Substituting these unbiased estimators for the three covariance terms in the expression for \(\textrm{var}(\Delta\widehat{AUC})\) and simplifying gives the following estimator,

\[\widehat{\textrm{var}}(\Delta\widehat{AUC})= \left(\sum\limits_{i=1}^{n_N}\sum\limits_{j=1}^{n_P}\dfrac{\delta_{ij}}{n_N n_P}\right)^2 -\\ \sum\limits_{i=1}^{n_N}\sum\limits_{j=1}^{n_P}\sum\limits_{i^{\prime}\neq i}^{n_N}\sum\limits_{j^{\prime} \neq j}^{n_P} \dfrac{\delta_{ij}\delta_{i^{\prime}j^{\prime}}}{n_N n_P(n_N-1)(n_P-1)}\]
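The simplification can be sketched as follows. The shorthand \(A\), \(B\), \(D\), \(E\) is introduced here only for this derivation and is not part of the original notation. Let

\[A=\sum_{i}\sum_{j}\sum_{i^{\prime}\neq i}\dfrac{\delta_{ij}\delta_{i^{\prime}j}}{n_N n_P(n_N-1)},\quad B=\sum_{i}\sum_{j}\sum_{j^{\prime}\neq j}\dfrac{\delta_{ij}\delta_{ij^{\prime}}}{n_N n_P(n_P-1)},\quad D=\sum_{i}\sum_{j}\dfrac{\delta_{ij}^2}{n_N n_P},\quad E=\sum_{i}\sum_{j}\sum_{i^{\prime}\neq i}\sum_{j^{\prime}\neq j}\dfrac{\delta_{ij}\delta_{i^{\prime}j^{\prime}}}{n_N n_P(n_N-1)(n_P-1)},\]

so that the three U-statistic covariance estimators are \(A-E\), \(B-E\) and \(D-E\). Plugging them into the variance expression gives

\[\widehat{\textrm{var}}(\Delta\widehat{AUC})=\dfrac{(n_N-1)A+(n_P-1)B+D}{n_N n_P}-\dfrac{n_N+n_P-1}{n_N n_P}E.\]

Expanding the square of the double sum by index pattern (same cell, same \(j\) only, same \(i\) only, both indices different) gives

\[\left(\Delta\widehat{AUC}\right)^2=\dfrac{D+(n_N-1)A+(n_P-1)B+(n_N-1)(n_P-1)E}{n_N n_P}.\]

Eliminating \((n_N-1)A+(n_P-1)B+D\) between the two displays and using \((n_N-1)(n_P-1)+n_N+n_P-1=n_N n_P\) leaves \(\widehat{\textrm{var}}(\Delta\widehat{AUC})=\left(\Delta\widehat{AUC}\right)^2-E\), which is the estimator above.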


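# Kernel: 1 if x > y, 1/2 if tied, 0 otherwise (x = diseased, y = non-diseased)
kernel = function(x, y) (sign(x - y) + 1) / 2
# Difference kernel matrix. Rows index diseased (P) subjects and columns index
# non-diseased (N) subjects, i.e. the transpose of [delta_ij] in the formulas.
Kdiff = outer(S1_P, S1_N, kernel) - outer(S2_P, S2_N, kernel)
n_P = nrow(Kdiff); n_N = ncol(Kdiff)
AUC_diff = mean(Kdiff)

# (method=="DeLong")
# rowMeans(Kdiff) are the C_j (one per P subject); colMeans(Kdiff) are the R_i
varAUC_diff = var(rowMeans(Kdiff))/n_P + var(colMeans(Kdiff))/n_N

# (method=="Ustats")
# For each cell, multiply by the mean of all cells in other rows and columns
varAUC_diff = AUC_diff^2 - mean(mapply(
  function(i, j) Kdiff[i, j] * mean(Kdiff[-i, -j]),
  rep(1:n_P, each = n_N), rep(1:n_N, times = n_P)))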
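As a numerical check, the sketch below plugs the three U-statistic covariance estimators into the variance expression by brute force and compares the result with the simplified form. The simulated data, the small sample sizes, and all variable names are assumptions for illustration. Here `delta` is laid out as \(n_N\times n_P\) to match the formulas, so it is the transpose of `Kdiff`.

```r
set.seed(2024)
n_N <- 8; n_P <- 6                                  # small, so brute force is cheap
s_N <- rnorm(n_N); s_P <- rnorm(n_P)
X <- s_N + rnorm(n_N);      U <- s_N + rnorm(n_N)         # non-diseased
Y <- s_P + rnorm(n_P) + 1;  V <- s_P + rnorm(n_P) + 0.5   # diseased
psi   <- function(x, y) (sign(y - x) + 1) / 2       # assumed Mann-Whitney kernel
delta <- outer(X, Y, psi) - outer(U, V, psi)        # n_N x n_P

# Sums of products over the three index patterns
s_j <- 0; s_i <- 0; s_0 <- 0
for (i in 1:n_N) for (j in 1:n_P) {
  s_j <- s_j + delta[i, j] * sum(delta[-i, j])      # i' != i, same j
  s_i <- s_i + delta[i, j] * sum(delta[i, -j])      # same i, j' != j
  s_0 <- s_0 + delta[i, j] * sum(delta[-i, -j])     # i' != i and j' != j
}
E        <- s_0 / (n_N * n_P * (n_N - 1) * (n_P - 1))
cov_j    <- s_j / (n_N * n_P * (n_N - 1)) - E       # cov^U[delta_ij, delta_i'j]
cov_i    <- s_i / (n_N * n_P * (n_P - 1)) - E       # cov^U[delta_ij, delta_ij']
cov_same <- mean(delta^2) - E                       # cov^U[delta_ij, delta_ij]

# Plug-in form versus the simplified form
var_plugin <- ((n_N - 1) * cov_j + (n_P - 1) * cov_i + cov_same) / (n_N * n_P)
var_short  <- mean(delta)^2 - E

# Structural components (DeLong) estimate on the same data, for comparison
var_delong <- var(colMeans(delta)) / n_P + var(rowMeans(delta)) / n_N

c(plugin = var_plugin, short = var_short, delong = var_delong)
```

The plug-in and simplified values should agree up to floating-point error. Unlike the structural components estimate, the unbiased estimate is not guaranteed to be non-negative in a given sample.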

Comparing entire curves