ROC class notes

Le Kang

2024-04-10

Jackknife variance estimation

We use the sample mean \(\bar{X}\) as an example. The mean with the \(i^{th}\) observation removed as \(\bar{X}_{(i)}=\dfrac{\sum_{j=1}^n X_j-X_i}{n-1}\), thus an individual \(X_i\) can be written as

\[X_i=\sum_{j=1}^n X_j-(n-1)\bar{X}_{(i)}=n\bar{X}-(n-1)\bar{X}_{(i)}\] This applies to other statistics: \(i^{th}\) pseudo-value = \(n\hat{\theta}-(n-1)\hat{\theta}_{(i)}\).

Each pseudo-value can be viewed as an estimate of \(\theta\), the average of the pseudo-value is \[n\hat{\theta}-(n-1)\hat{\theta}_{(\cdot)}\]

Equivalently, the jackknife variance estimator is \[\dfrac{n-1}{n}\sum_{i=1}^n \left(\hat{\theta}_{(i)}-\hat{\theta}_{(\cdot)}\right)^2\]

n=500
x=rnorm(n)
1/n

PV=NULL
theta=NULL
for (i in 1:n) {
PV[i]= n*mean(x)-(n-1)*mean(x[-i])
theta[i]=mean(x[-i])
}
var(PV)/n


var(theta)
(n-1)*var(theta)
(n-1)*mean((theta-mean(theta))^2)

VUS estimation

Nonparametric
Parametric
Semi-parametric

Nonparametric estimation

Assume the sample sizes for non-diseased, intermediate and diseased subjects are \(n_1\), \(n_2\) and \(n_3\), respectively. The unbiased nonparametric Mann-Whitney U statistic of the probability \(P(Y_{1}<Y_{2}<Y_{3})\), i.e, the VUS, is given by \[\begin{equation*} U=\frac{1}{n_1 n_2 n_3}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_3} I(Y_{1i} < Y_{2j} < Y_{3k}), \end{equation*}\] where \(I(\cdot)\) stands for the indicator function.

vus.MWU <- function(new.1,new.2,new.3) {
n1=length(new.1)
n2=length(new.2)
n3=length(new.3)
usum <- function(u) {
is.small = new.2[u<new.2]
if (length(is.small)>0) return(sum(sapply(is.small, function(v) sum(v<new.3)))) else return(0)
}
return(sum(sapply(new.1,usum))/n1/n2/n3) 
}

Let \(\hat{F}_1\), \(\hat{F}_2\) and \(\hat{F}_3\) denote the empirical distribution functions for the diagnostic measures from non-diseased, intermediate and diseased category, respectively, \[ \widehat{VUS}=\int_{0}^{1}\int_{0}^{1-\hat{F}_{3}\left[ \hat{F}_{1}^{-1}(\delta_{1})\right] } \hat{F}_{2}\left[ \hat{F}_{3}^{-1}(1-\delta_{3})\right] -\hat{F}_{2}\left[ \hat{F}_{1}^{-1} (\delta_{1})\right] d\delta_{3}d\delta_{1}\]

vus.empirical <- function(new.1,new.2,new.3,nbin=100) {
F.func <- function(u) quantile(new.1,probs=u,type=1)
H.func <- function(u) quantile(new.3,probs=u,type=1)
G.func <- ecdf(new.2)
Q.mat = matrix(0,nrow=nbin,ncol=nbin)
u <- seq(0,1,length=nbin)
v <- seq(0,1,length=nbin)
for (i in 1:nbin) {
for (j in 1:nbin) {
if (F.func(v[j])<=H.func(1-u[i])) Q.mat[i,j]=
G.func(H.func(1-u[i]))-G.func(F.func(v[j])) }}
return(sum(Q.mat)/(nbin-1)^2)
}

Trinormal model

A simple parametric approach is to assume \(Y_d \thicksim N(\mu_d,\sigma_d^2)\), \(d=1, 2, 3\). The VUS under the normality assumption can be expressed as \[ VUS=\int_{-\infty}^{\infty}\Phi\left( a s-b\right) \Phi \left(-c s+d \right) \phi\left( s\right) ds, \label{kernelVUS:equXiong} \] where \(a=\left. \sigma_{2}\right/ \sigma_{1}, b=\left. \left(\mu_{1}-\mu_{2}\right) \right/ \sigma_{1}, c=\left. \sigma_{2}\right/ \sigma_{3}, d=\left. \left( \mu_{3}-\mu_{2}\right) \right/ \sigma_{3}.\)

\(\Phi\left( \cdot\right)\) is the standard normal CDF, and \(\phi\left(\cdot\right)\) is the standard normal PDF.

The maximum likelihood estimate (MLE) of VUS can then be obtained by substitution (invariance property).}

vus.normal <- function(new.1,new.2,new.3) {
mu1 = mean(new.1)
mu2 = mean(new.2)
mu3 = mean(new.3)
sigma1 = var(new.1)
sigma2 = var(new.2)
sigma3 = var(new.3)
if ((sigma1<=0)|(sigma2<=0)|(sigma3<=0)) return(NA)
a = sqrt(sigma2/sigma1)
c = sqrt(sigma2/sigma3)
b = (mu1-mu2)/sqrt(sigma1)
d = (mu3-mu2)/sqrt(sigma3)
integrand <- function(s) { pnorm(a*s-b)*pnorm(-c*s+d)*dnorm(s) }
integral <- try(integrate(integrand,lower=-Inf,upper=Inf),silent=T)
if (class(integral)=="try-error") return(NA) else return(integral$value)
}

Box-Cox type transformation: \[ Y_d^{(\lambda)}= (Y_d^{\lambda}-1)/\lambda ~\text{if}~ \lambda\neq 0, ~\text{or}~ \log\left( Y_{d}\right) ~\text{if}~\lambda=0 \] where it is assumed that \(Y_{d}^{\left( \lambda\right) }\thicksim N(\mu_d,\sigma_d^2)\), \(d=1, 2, 3.\)

The estimated VUS can be calculated using transformed data due to the following equality, \[ VUS=P(Y_1<Y_2<Y_3)=P(Y_1^{(\lambda)}<Y_2^{(\lambda)}<Y_3^{(\lambda)}) \]

Kernel method

The Gaussian kernel to estimate the PDF \(f_1(t)=F_1^{\prime} (t)\) by \[ \hat{f}_1(t)=\frac{1}{n_1}\sum_{i=1}^{n_1}\frac{1}{h_{1}} \phi\left(\frac{t-Y_{1i}}{h_{1}}\right)\label{kernelVUS:equPDF} \] and the CDF \(F_1(t)\) by \[ \hat{F}_1(t)=\frac{1}{n_1}\sum_{i=1}^{n_1} \Phi\left(\frac{t-Y_{1i}}{h_{1}}\right)\label{kernelVUS:equCDF}. \]

For bandwidth \(h_{1}\) which controls the amount of smoothing, we considered the asymptotic value \[ h_{1}=\left(\dfrac{4}{3 n_1}\right)^{1/5} \min(SD_1,IQR_1/1.349) \] where \(SD_1\) and \(IQR_1\) are the standard deviation and interquartile range for the samples \(Y_{1i}\) from non-diseased subjects.

\[ P(Y_{1}<Y_{2}<Y_{3}) = E_{(Y_1, Y_2, Y_3)} I(Y_{1}<Y_{2}<Y_{3})\\ = E_{Y_2} E_{(Y_1, Y_3)}\left[ I(Y_{1}<Y_{2}<Y_{3}) | Y_2 = y \right]\\ = E_{Y_2} E_{(Y_1, Y_3)}\left[ I(Y_{1}<Y_{2})\cap I(Y_{3}>Y_{2}) | Y_2 = y \right]\\ = E_{Y_2} P(Y_1<y) P(Y_3>y)\\ = \int_{-\infty}^{\infty} F_1(y) [1-F_3(y)] f_2(y) dy. \]

A straightforward estimator for the VUS with kernel smoothing technique is \[ \widehat{VUS}_{K1}=\int_{-\infty}^{\infty}\hat{F}_1(y) [1-\hat{F}_3(y)] \hat{f}_2(y) dy. \]

vus.ker1 <- function(new.1,new.2,new.3) {
n1=length(new.1)
n2=length(new.2)
n3=length(new.3)
h1=(4/3/n1)^(1/5)*min(sd(new.1),IQR(new.1)/1.349)
h2=(4/3/n2)^(1/5)*min(sd(new.2),IQR(new.2)/1.349)
h3=(4/3/n3)^(1/5)*min(sd(new.3),IQR(new.3)/1.349)
F1 <- function(x)  sapply(x, function(u)      mean(pnorm((u-new.1)/h1)) )
f2 <- function(x)  sapply(x, function(u) 1/h2*mean(dnorm((u-new.2)/h2)) )
F3 <- function(x)  sapply(x, function(u)    1-mean(pnorm((u-new.3)/h3)) )
integrand <- function(s) { F1(s)*F3(s)*f2(s) }
integral <- try(integrate(integrand,lower=-Inf,upper=Inf),silent=T)
if (class(integral)=="try-error") return(NA) else return(integral$value)
}

Using kernel density estimator to estimate the VUS \[ \widehat{VUS}_{K2}=\frac{1}{n_1 n_2 n_3}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_3} \Phi\left(\frac{Y_{2j} - Y_{1i}}{\sqrt{h_1^2+h_2^2}}\right) \Phi\left(\frac{Y_{3k} - Y_{2j}}{\sqrt{h_2^2+h_3^2}}\right) \] Again, we use the asymptotically optimal bandwidths \[ h_{d}=\left(\dfrac{4}{3 n_d}\right)^{1/5} \min(SD_{d},IQR_{d}/1.349),\ d=1,\ 2,\ 3. \]

vus.ker2 <- function(new.1,new.2,new.3) {
n1=length(new.1)
n2=length(new.2)
n3=length(new.3)
h1=(4/3/n1)^(1/5)*min(sd(new.1),IQR(new.1)/1.349)
h2=(4/3/n2)^(1/5)*min(sd(new.2),IQR(new.2)/1.349)
h3=(4/3/n3)^(1/5)*min(sd(new.3),IQR(new.3)/1.349)
sum.idx=0
for (i in 1:n1) {
for (j in 1:n2) {
for (k in 1:n3) {
sum.idx=sum.idx+pnorm((new.2[j]-new.1[i])/sqrt(h1^2+h2^2))*pnorm((new.3[k]-new.2[j])/sqrt(h3^2+h2^2))
}}}
return(sum.idx/n1/n2/n3)
}

Linear combinations of biomarkers to improve VUS

Note: AUC as a special case.

Question of interest: to find the linear combination coefficient \(\boldsymbol{c}^{\prime}\) such that the VUS associated with univariate scores \(\boldsymbol{c}^{\prime} \boldsymbol{Y}_d\) \((d=1,2,3)\) is improved.

Multivariate normal data

Assume that \(\boldsymbol{Y}_{d}\thicksim\) \(N_p\left(\boldsymbol{\mu}_{d}, \boldsymbol{\Sigma}_{d}\right)\), \(d=1,2,3\).

To minimize \(\boldsymbol{c}^{\prime}\left( {\sum_{d=1}^{3}} \boldsymbol{\Sigma}_{d}\right) \boldsymbol{c}\) (similar to total within-group variance) and maximize \(\boldsymbol{c}^{\prime}\sum_{d=1}^{3}\left[ \boldsymbol{\mu}_{d} -\overline{\boldsymbol{\mu}}\right] \left[ \boldsymbol{\mu}_{d} -\overline{\boldsymbol{\mu}}\right] ^{T}\boldsymbol{c}\) (similar to between-group variance).

Penalized-Distance: \[\boldsymbol{c}^{\prime}\left\{ \sum\limits_{d=1}^{3}\left(\boldsymbol{\mu}_{d}-\overline{\boldsymbol{\mu}}\right) \left(\boldsymbol{\mu}_{d}-\overline{\boldsymbol{\mu}}\right) ^{T}-\left( \boldsymbol{\Sigma}_{1}+ \boldsymbol{\Sigma}_{2}+\boldsymbol{\Sigma}_{3}\right)\right\}\boldsymbol{c}\]

Scaled-Distance: \[ \boldsymbol{c}^{\prime}\left\{\left( \boldsymbol{\Sigma}_{1}+ \boldsymbol{\Sigma}_{2}+\boldsymbol{\Sigma}_{3}\right) ^{-1} \sum\limits_{d=1}^{3}\left(\boldsymbol{\mu}_{d}-\overline{\boldsymbol{\mu}}\right) \left(\boldsymbol{\mu}_{d}-\overline{\boldsymbol{\mu}}\right) ^{T}\right\}\boldsymbol{c} \] How to find \(\boldsymbol{c}\)?

Multivariate non-normal data

Stepwise search using empirical distribution-free approach to maximize the Mann-Whitney U statistic.

Order the \(p\) diagnostic tests or biomarkers according to their Mann-Whitney statistic, which is a nonparametric estimation for VUS.
Employ empirical approach to combine two diagnostic tests or biomarkers with first two largest VUSs.

After combining the first two diagnostic tests or biomarkers to get a new biomarker, employ empirical search approach again to combine the newly obtained biomarker with the original biomarker having third largest VUS.
Proceed stepwisely until only one biomarker is obtained, with which we maximize the VUS.

Step-Down? Step-Up?

Other considerations

A cumulative logistic regression approach (multinomial logistic regression approach is not working, why?)
A min-max nonparametric approach