2024-04-15
rm(list=ls())
kernel=function(x,y) (sign(y-x)+1)/2
m=n=50
simiter=10000
AUC=varAUC_Ustat=varAUC_jack=varAUC_boot=numeric(simiter)
for (iter in 1:simiter) {
  S_N=rnorm(m,mean=0)
  S_P=rnorm(n,mean=1)
  Ker=outer(S_N,S_P,kernel)
  AUC[iter]=mean(Ker) # nonparametric AUC estimator
  # U-statistic-based variance estimator
  varAUC_Ustat[iter] = mean(Ker)^2 - mean(mapply(
    function(i,j) Ker[i,j]*mean(Ker[-i,-j]),rep(1:m,each=n),rep(1:n,times=m)))
  # Jackknife variance estimator
  rAUC=numeric(m)
  cAUC=numeric(n)
  for (i in 1:m) rAUC[i]=mean(Ker[-i,]) # leave-one-out AUC, row i removed
  for (j in 1:n) cAUC[j]=mean(Ker[,-j]) # leave-one-out AUC, column j removed
  pseudo_i = m*mean(Ker)-(m-1)*rAUC # pseudo-values
  pseudo_j = n*mean(Ker)-(n-1)*cAUC
  varAUC_jack[iter]=var(pseudo_i)/m + var(pseudo_j)/n
  # Bootstrap variance estimator
  AUC.boot=numeric(2000)
  for (i in 1:2000) {
    S_N.boot=sample(S_N,size=m,replace=TRUE)
    S_P.boot=sample(S_P,size=n,replace=TRUE)
    AUC.boot[i]=mean(outer(S_N.boot,S_P.boot,kernel))
  }
  varAUC_boot[iter]=var(AUC.boot)
}
hist(AUC,100)
# the sample variance across replicates is an unbiased estimator of the population variance
var(AUC)
## [1] 0.002199891
mean(varAUC_Ustat)
## [1] 0.002263924
mean(varAUC_jack)
## [1] 0.002292265
mean(varAUC_boot)
## [1] 0.002273582
Note: AUC as a special case.
Question of interest: find the coefficient vector \(\boldsymbol{c}\) of a linear combination such that the VUS associated with the univariate scores \(\boldsymbol{c}^{\prime} \boldsymbol{Y}_d\) \((d=1,2,3)\) is improved.
Assume that \(\boldsymbol{Y}_{d} \sim N_p\left(\boldsymbol{\mu}_{d}, \boldsymbol{\Sigma}_{d}\right)\), \(d=1,2,3\).
Goal: minimize \(\boldsymbol{c}^{\prime}\left( \sum_{d=1}^{3} \boldsymbol{\Sigma}_{d}\right) \boldsymbol{c}\) (analogous to the total within-group variance) and maximize \(\boldsymbol{c}^{\prime}\left\{\sum_{d=1}^{3}\left( \boldsymbol{\mu}_{d} -\overline{\boldsymbol{\mu}}\right) \left( \boldsymbol{\mu}_{d} -\overline{\boldsymbol{\mu}}\right)^{\prime}\right\}\boldsymbol{c}\) (analogous to the between-group variance).
Penalized-Distance: \[\boldsymbol{c}^{\prime}\left\{ \sum\limits_{d=1}^{3}\left(\boldsymbol{\mu}_{d}-\overline{\boldsymbol{\mu}}\right) \left(\boldsymbol{\mu}_{d}-\overline{\boldsymbol{\mu}}\right)^{\prime}-\left( \boldsymbol{\Sigma}_{1}+ \boldsymbol{\Sigma}_{2}+\boldsymbol{\Sigma}_{3}\right)\right\}\boldsymbol{c}\]
Scaled-Distance: \[ \boldsymbol{c}^{\prime}\left\{\left( \boldsymbol{\Sigma}_{1}+ \boldsymbol{\Sigma}_{2}+\boldsymbol{\Sigma}_{3}\right)^{-1} \sum\limits_{d=1}^{3}\left(\boldsymbol{\mu}_{d}-\overline{\boldsymbol{\mu}}\right) \left(\boldsymbol{\mu}_{d}-\overline{\boldsymbol{\mu}}\right)^{\prime}\right\}\boldsymbol{c} \]
How to find \(\boldsymbol{c}\)?
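One possible answer under the normal model, offered only as a sketch: if a normalization is added (the notes do not state one), both criteria reduce to eigenproblems. The code below assumes the constraint \(\boldsymbol{c}^{\prime}\boldsymbol{c}=1\) for the penalized distance, so the maximizer is the leading eigenvector of the bracketed matrix, and it reads the scaled distance as the ratio \(\boldsymbol{c}^{\prime}B\boldsymbol{c} / \boldsymbol{c}^{\prime}W\boldsymbol{c}\), whose maximizer is the leading eigenvector of \(W^{-1}B\). The names `B`, `W`, `mu`, `Sigma`, the parameter values, and the use of the unweighted mean for \(\overline{\boldsymbol{\mu}}\) are all illustrative assumptions.

```r
# Illustrative parameters (assumed, not from the notes): p = 3 markers, 3 classes
p <- 3
mu <- list(c(0, 0, 0), c(1, 0.5, 0.2), c(2, 1, 0.3))
Sigma <- list(diag(p), diag(p) + 0.3, 2 * diag(p))

mu_bar <- Reduce(`+`, mu) / 3                                      # unweighted mean (assumed)
B <- Reduce(`+`, lapply(mu, function(m) tcrossprod(m - mu_bar)))   # between-group matrix
W <- Reduce(`+`, Sigma)                                            # Sigma_1 + Sigma_2 + Sigma_3

# Penalized distance: maximize c'(B - W)c subject to c'c = 1 (assumed constraint)
c_pen <- eigen(B - W, symmetric = TRUE)$vectors[, 1]

# Scaled distance, read as the ratio c'Bc / c'Wc (assumed reading):
# the maximizer is the leading eigenvector of W^{-1} B
ev <- eigen(solve(W, B))
c_sca <- Re(ev$vectors[, which.max(Re(ev$values))])
c_sca <- c_sca / sqrt(sum(c_sca^2))

# Helpers to evaluate each criterion at a given coefficient vector
pen_value <- function(cc) drop(t(cc) %*% (B - W) %*% cc)
ratio_value <- function(cc) drop(t(cc) %*% B %*% cc) / drop(t(cc) %*% W %*% cc)
```

Calling `pen_value(c_pen)` and `ratio_value(c_sca)` gives the attained maxima under these assumptions. Both solutions depend on the normal-model parameters, which in practice would have to be replaced by estimates.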
Stepwise search: use an empirical, distribution-free approach to maximize the Mann-Whitney U statistic.
Order the \(p\) diagnostic tests or biomarkers by their Mann-Whitney statistic, which is a nonparametric estimate of the VUS.
Use the empirical search to combine the two diagnostic tests or biomarkers with the two largest VUSs.
After combining these two into a new biomarker, apply the empirical search again to combine the new biomarker with the original biomarker that has the third-largest VUS.
Proceed stepwise until only one biomarker remains, which is the combination taken to maximize the VUS.
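The steps above can be sketched in R as follows. This is a minimal illustration, not the procedure's reference implementation: the function names `vus` and `stepwise_vus`, the grid over angles used for the empirical search, and the simulated data are all assumptions. It also assumes the class order 1 < 2 < 3 is known, that every marker is oriented so larger values indicate a higher class, and that ties can be ignored.

```r
# Empirical VUS: proportion of triples with s1 < s2 < s3 (ties ignored)
vus <- function(s1, s2, s3) {
  below <- sapply(s2, function(v) sum(s1 < v))   # class-1 scores below each class-2 score
  above <- sapply(s2, function(v) sum(s3 > v))   # class-3 scores above each class-2 score
  sum(below * above) / (length(s1) * length(s2) * length(s3))
}

# Stepwise combination. Y: n x p marker matrix; d: class labels in {1, 2, 3}
stepwise_vus <- function(Y, d, n_grid = 180) {
  score_vus <- function(s) vus(s[d == 1], s[d == 2], s[d == 3])
  # Order the markers by their individual empirical VUS, largest first
  ord <- order(apply(Y, 2, score_vus), decreasing = TRUE)
  # Candidate directions (cos t, sin t); t = 0 keeps the current score unchanged
  theta <- seq(0, 2 * pi, length.out = n_grid + 1)[-(n_grid + 1)]
  coef <- numeric(ncol(Y))
  coef[ord[1]] <- 1
  s <- Y[, ord[1]]
  for (k in ord[-1]) {
    # Empirical search: VUS of each candidate combination of s and marker k
    v <- sapply(theta, function(t) score_vus(cos(t) * s + sin(t) * Y[, k]))
    t_best <- theta[which.max(v)]
    coef <- cos(t_best) * coef        # rescale the coefficients found so far
    coef[k] <- sin(t_best)
    s <- cos(t_best) * s + sin(t_best) * Y[, k]
  }
  list(coef = coef, vus = score_vus(s))
}

# Usage on simulated data (assumed setup): the third marker carries no signal
set.seed(1)
n <- 30
d <- rep(1:3, each = n)
Y <- cbind(rnorm(3 * n, mean = 0.8 * (d - 1)),
           rnorm(3 * n, mean = 0.5 * (d - 1)),
           rnorm(3 * n))
fit <- stepwise_vus(Y, d)
```

Because the grid includes the angle 0, the empirical VUS cannot decrease from one step to the next, so `fit$vus` is at least the largest single-marker VUS. The result is a greedy solution and need not be the global maximizer over all coefficient vectors.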
Step-Down? Step-Up?