ROC class notes

Le Kang

2024-02-21

Measurement errors

\(S^{\prime}_{N_i}=S_{N_i}+\varepsilon_i\)
\(S^{\prime}_{P_j}=S_{P_j}+\eta_j\)

\(\widehat{AUC}^{\prime}\), computed from the error-contaminated scores, is still an unbiased estimator of \(AUC^{\prime}=P(S_P^{\prime}>S_N^{\prime})\).

However, it is no longer unbiased for \(AUC=P(S_P>S_N)\).

Bias correction

Assume \(\sigma_P^2=\sigma_N^2=\sigma^2\), and \(\sigma_{\varepsilon}^2=\sigma_{\eta}^2=\sigma_E^2\),

\(E(\widehat{AUC}^{\prime})\approx AUC-\frac{1}{2}E[(\varepsilon-\eta)^2]E[G^{\prime\prime}(S_N)]\)

What is the variance for \(S^{\prime}_{N_i}\) or \(S^{\prime}_{P_j}\)?
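One way to answer this, as a sketch that assumes the measurement errors have mean zero and are independent of the true scores, and writes \(\theta=\sigma_E/\sigma\):

\[\mathrm{Var}(S^{\prime}_{N_i})=\mathrm{Var}(S_{N_i})+\mathrm{Var}(\varepsilon_i)=\sigma^2+\sigma_E^2=\sigma^2(1+\theta^2),\]

and the same argument gives \(\mathrm{Var}(S^{\prime}_{P_j})=\sigma^2(1+\theta^2)\). The difference \(S^{\prime}_P-S^{\prime}_N\) then has variance \(2\sigma^2(1+\theta^2)\), which is the quantity under the square root in \(AUC^{\prime}\).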

\[AUC^{\prime}=\Phi\left(\dfrac{\mu_P-\mu_N}{\sqrt{2\sigma^2(1+\theta^2)}}\right),~~~AUC=\Phi\left(\dfrac{\mu_P-\mu_N}{\sqrt{2\sigma^2}}\right),\]
where \(\theta=\sigma_E/\sigma\).
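When \(\theta\) is known, the two expressions above can be combined into \(\Phi^{-1}(AUC)=\sqrt{1+\theta^2}\,\Phi^{-1}(AUC^{\prime})\), which suggests a plug-in correction. The following Python sketch (NumPy and SciPy) simulates normal scores with additive normal errors and applies that correction. The parameter values, sample size, and the function names `empirical_auc` and `correct_auc` are illustrative assumptions, not part of the notes.

```python
import numpy as np
from scipy.stats import norm

def empirical_auc(neg, pos):
    """Mann-Whitney estimate: share of (neg, pos) pairs with pos > neg (ties count 1/2)."""
    neg = np.asarray(neg)[:, None]
    pos = np.asarray(pos)[None, :]
    return np.mean((pos > neg) + 0.5 * (pos == neg))

def correct_auc(auc_obs, theta):
    """Plug-in correction assuming normality, equal variances, and known theta."""
    return norm.cdf(np.sqrt(1.0 + theta**2) * norm.ppf(auc_obs))

# Illustrative (assumed) values: equal score variances and equal error variances.
rng = np.random.default_rng(0)
mu_n, mu_p, sigma, sigma_e = 0.0, 1.0, 1.0, 1.0
theta = sigma_e / sigma
n = 2000

s_n = rng.normal(mu_n, sigma, n)              # true scores, population N
s_p = rng.normal(mu_p, sigma, n)              # true scores, population P
s_n_obs = s_n + rng.normal(0.0, sigma_e, n)   # observed S'_N = S_N + epsilon
s_p_obs = s_p + rng.normal(0.0, sigma_e, n)   # observed S'_P = S_P + eta

auc_true = norm.cdf((mu_p - mu_n) / np.sqrt(2 * sigma**2))
auc_obs = empirical_auc(s_n_obs, s_p_obs)      # estimates AUC', attenuated toward 0.5
auc_corrected = correct_auc(auc_obs, theta)    # should sit close to auc_true

print(auc_true, auc_obs, auc_corrected)
```

The correction only rescales the probit of the observed AUC, so it leans entirely on the normality and equal-variance assumptions stated above.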

Bias correction with unknown \(\theta\)

Consider the analogy between the measurement error model and the random-effects ANOVA model under normality.

Intraclass correlation coefficient (ICC)
Variance components
“MSTR” terms
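To make the analogy concrete, here is a hedged Python sketch of how \(\theta\) might be estimated from replicate measurements on the same subjects. It assumes a balanced one-way random-effects layout within a single population (each of `n_subjects` measured `n_reps` times), which the notes do not specify; the function name `variance_components` and all numerical values are hypothetical. Under this layout \(E(\mathrm{MSE})=\sigma_E^2\) and \(E(\mathrm{MSTR})=\sigma_E^2+m\sigma^2\), and \(ICC=\sigma^2/(\sigma^2+\sigma_E^2)=1/(1+\theta^2)\).

```python
import numpy as np

def variance_components(y):
    """Method-of-moments variance components from a balanced (subjects x replicates) array."""
    y = np.asarray(y, dtype=float)
    n, m = y.shape
    subject_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Between-subject mean square ("MSTR") and within-subject mean square ("MSE").
    mstr = m * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    mse = np.sum((y - subject_means[:, None]) ** 2) / (n * (m - 1))
    sigma2_e = mse                          # estimates the error variance sigma_E^2
    sigma2 = max((mstr - mse) / m, 0.0)     # estimates sigma^2, truncated at zero
    return sigma2, sigma2_e

# Simulated replicates (assumed values): true score plus independent error per replicate.
rng = np.random.default_rng(1)
n_subjects, n_reps = 500, 3
sigma, sigma_e = 1.0, 0.5
true_scores = rng.normal(0.0, sigma, n_subjects)
y = true_scores[:, None] + rng.normal(0.0, sigma_e, (n_subjects, n_reps))

sigma2_hat, sigma2_e_hat = variance_components(y)
theta2_hat = sigma2_e_hat / sigma2_hat               # undefined if sigma2_hat is truncated to 0
icc_hat = sigma2_hat / (sigma2_hat + sigma2_e_hat)   # equals 1 / (1 + theta2_hat)

print(theta2_hat, icc_hat)
```

The estimated \(\theta^2\) can then replace the known value in the bias correction, at the cost of extra sampling variability.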

ROC curves and covariates

Once a classifier \(S(\boldsymbol{X})\) has been constructed from the vector \(\boldsymbol{X}\) of primary variables and is in use for allocating individuals to one or other of the populations N and P, it frequently transpires that a further variable or set of variables will provide useful classificatory information which will modify the behavior of the classifier in one or both of the populations.

Indirect adjustment: the effect of the covariates on the distributions of \(S\) is first modeled in the two populations, and the ROC curve is then derived from the modified distributions.
Direct adjustment: the effect of the covariates is modeled on the ROC curve itself.

Indirect adjustment

Define \(\boldsymbol{Z}_P\) and \(\boldsymbol{Z}_N\) as the covariates for P and N.

The means of \(S_P\) and \(S_N\) for given values of the covariates can then be modeled as

\(\mu_P(\boldsymbol{Z}_P)=\alpha_P+\beta^T_P\boldsymbol{Z}_P\)

\(\mu_N(\boldsymbol{Z}_N)=\alpha_N+\beta^T_N\boldsymbol{Z}_N\)

Under the normality assumption, this model is essentially the same as the one underlying the binormal model, the only difference being in the specification of the population means.

The ROC curve is given by

\[y=\Phi\left(\frac{\mu_P(\boldsymbol{Z}_P)-\mu_N(\boldsymbol{Z}_N)+\sigma_N \times \Phi^{-1}(x)}{\sigma_P}\right)\]

Ordinary least-squares regression can be used to obtain point estimates of the parameters, and substituting these estimates into the formula at given values of \(\boldsymbol{z}_P\) and \(\boldsymbol{z}_N\) yields the covariate-specific ROC curves.
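The procedure can be sketched in Python as follows. This is a minimal illustration on simulated data with a single covariate, assuming normal errors with constant variance in each population. The helper names `fit_ols` and `covariate_roc` and the simulation coefficients are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def fit_ols(z, s):
    """OLS of scores s on covariates z (with intercept); returns coefficients and residual SD."""
    X = np.column_stack([np.ones(len(s)), z])
    coef, *_ = np.linalg.lstsq(X, s, rcond=None)
    resid = s - X @ coef
    sd = np.sqrt(resid @ resid / (len(s) - X.shape[1]))
    return coef, sd

def covariate_roc(x, coef_p, sd_p, z_p, coef_n, sd_n, z_n):
    """Binormal ROC curve at false positive rates x for covariate values z_p and z_n."""
    mu_p = coef_p[0] + np.dot(coef_p[1:], np.atleast_1d(z_p))
    mu_n = coef_n[0] + np.dot(coef_n[1:], np.atleast_1d(z_n))
    return norm.cdf((mu_p - mu_n + sd_n * norm.ppf(x)) / sd_p)

# Simulated data (assumed values): the covariate shifts the P mean more than the N mean.
rng = np.random.default_rng(2)
n = 1000
z_n = rng.uniform(0.0, 1.0, n)
z_p = rng.uniform(0.0, 1.0, n)
s_n = 0.0 + 0.5 * z_n + rng.normal(0.0, 1.0, n)
s_p = 1.0 + 1.5 * z_p + rng.normal(0.0, 1.0, n)

coef_n, sd_n = fit_ols(z_n, s_n)
coef_p, sd_p = fit_ols(z_p, s_p)

x = np.linspace(0.01, 0.99, 99)   # false positive rates, avoiding the endpoints
roc_z0 = covariate_roc(x, coef_p, sd_p, 0.0, coef_n, sd_n, 0.0)
roc_z1 = covariate_roc(x, coef_p, sd_p, 1.0, coef_n, sd_n, 1.0)
```

Here `roc_z1` lies above `roc_z0` because the simulated separation between the population means grows with the covariate.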

Direct adjustment

We could instead model the effects of the covariates directly on the ROC curve itself. With this approach, any parameters associated with the covariates have a direct interpretation in terms of the curve.

A natural choice is the use of generalized linear model methodology for direct modeling of ROC curves.

ROC-GLM model

\[h(y)=b(x)+\beta^T\boldsymbol{Z}\]

\(b(\cdot)\) is an unknown baseline function, monotonic on (0, 1), and \(h(\cdot)\) is the link function, specified as part of the model and also monotonic on (0, 1), such as the inverse normal CDF, logit, or logarithmic link.
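As an illustration only, the Python sketch below evaluates an ROC-GLM curve for one particular choice: the probit link \(h=\Phi^{-1}\) and a parametric baseline \(b(x)=\alpha_0+\alpha_1\Phi^{-1}(x)\). The notes leave \(b(\cdot)\) unspecified, so this baseline form, the function name `roc_glm`, and the parameter values are assumptions. With this choice the model reduces to the binormal form when \(\alpha_0=(\mu_P-\mu_N)/\sigma_P\) and \(\alpha_1=\sigma_N/\sigma_P\).

```python
import numpy as np
from scipy.stats import norm

def roc_glm(x, z, beta, alpha0, alpha1):
    """ROC_z(x) = Phi(alpha0 + alpha1 * Phi^{-1}(x) + beta' z), an assumed probit-link form."""
    linear_part = np.dot(np.atleast_1d(beta), np.atleast_1d(z))
    return norm.cdf(alpha0 + alpha1 * norm.ppf(x) + linear_part)

x = np.linspace(0.01, 0.99, 99)   # false positive rates, avoiding the endpoints

# Hypothetical parameter values: a positive beta raises the whole curve as z increases.
roc_low = roc_glm(x, z=[0.0], beta=[0.5], alpha0=1.0, alpha1=1.0)
roc_high = roc_glm(x, z=[1.0], beta=[0.5], alpha0=1.0, alpha1=1.0)
```

In this parameterization the sign of each component of \(\beta\) says directly whether the corresponding covariate improves or degrades discrimination.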

Covariate adjustment of AUC

Plugging in the adjusted means of P and N, we have

\[AUC(\boldsymbol{z}_P,\boldsymbol{z}_N)\]
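Under the binormal setup of the indirect adjustment, one way to write this quantity out, as a sketch assuming normal scores whose variances \(\sigma_P^2\) and \(\sigma_N^2\) do not depend on the covariates, is

\[AUC(\boldsymbol{z}_P,\boldsymbol{z}_N)=\Phi\left(\frac{\mu_P(\boldsymbol{z}_P)-\mu_N(\boldsymbol{z}_N)}{\sqrt{\sigma_P^2+\sigma_N^2}}\right)=\Phi\left(\frac{\alpha_P-\alpha_N+\beta^T_P\boldsymbol{z}_P-\beta^T_N\boldsymbol{z}_N}{\sqrt{\sigma_P^2+\sigma_N^2}}\right).\]

When \(\sigma_P=\sigma_N=\sigma\), the denominator becomes \(\sqrt{2\sigma^2}\), matching the equal-variance expression for \(AUC\) in the measurement error section.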