Compilation of ROC curve values and comprehension concepts

Two Normal Distributions

The ROC curve is often plotted from the empirical distributions of two normal samples with different means. The less the two distributions overlap, the better the classification and the higher the AUC value.

# random samples
a3<-rnorm(1000)
b3<-rnorm(1000, mean=4, sd=1)

# bimodal distribution
hist(c(a3,b3),breaks=50) 
abline(v = 2, lty = 5)  # dashed vertical line at the midpoint between the two modes

da3 <- data.frame(p = c(a3, b3), o = c(rep(0, 1000), rep(1, 1000)))

Likelihood ratios

The positive likelihood ratio (\(LR_+\)) is defined as \(LR_+ = \frac{sensitivity}{1-specificity} = \frac{P(\hat{y}=1\mid y=1)}{P(\hat{y}=1\mid y=0)}\), where \(\hat{y}\) denotes the prediction in the ROC classification. It can be interpreted as “the probability that a person who has the disease tests positive divided by the probability that a person who does not have the disease tests positive”. Hence, the greater the value of \(LR_+\) for a particular test, the more likely a positive test result is a true positive, and the more statistically credible the test is for diagnosing a positive result.

The negative likelihood ratio (\(LR_-\)) is defined as \(LR_- = \frac{1-sensitivity}{specificity} = \frac{P(\hat{y}=0\mid y=1)}{P(\hat{y}=0\mid y=0)}\). It can be interpreted as “the probability that a person who has the disease tests negative divided by the probability that a person who does not have the disease tests negative”. Hence, the smaller the value of \(LR_-\) for a particular test, the more likely a negative test result is a true negative, and the more statistically credible the test is for ruling the disease out. The predicted outcome itself can be calculated with the logistic function \(\frac{1}{1+e^{-(\beta_0+\beta_1 x_i)}}\), where \(x_i\) is a point of the distribution.
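
As a minimal sketch, assuming the da3 data frame built above and an illustrative cutoff of 2, both likelihood ratios can be computed from the empirical sensitivity and specificity:

# likelihood ratios at a fixed cutoff (the cutoff value 2 is an assumption)
lr_at_cutoff <- function(p, o, cutoff) {
  pred <- as.integer(p > cutoff)
  sens <- mean(pred[o == 1] == 1)  # P(yhat = 1 | y = 1)
  spec <- mean(pred[o == 0] == 0)  # P(yhat = 0 | y = 0)
  c(LR_plus = sens / (1 - spec), LR_minus = (1 - sens) / spec)
}
lr_at_cutoff(da3$p, da3$o, 2)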

# regular ROC curve (roc() from the pROC package)
library(pROC)
dd <- roc(o ~ p, da3)
plot(dd)

# ROC curve with confidence interval
pROC_obj <- roc(o ~ p, da3, smoothed = TRUE, ci = TRUE, ci.alpha = 0.9,
                stratified = FALSE, plot = TRUE, auc.polygon = TRUE,
                max.auc.polygon = TRUE, grid = TRUE, print.auc = TRUE,
                show.thres = TRUE)

sens.ci <- ci.se(pROC_obj)  # confidence interval of sensitivity across specificities
plot(sens.ci, type="shape", col="lightblue")
plot(sens.ci, type="bars")

Some equations of the ROC curve

In the empirical approach, the ROC curve is defined as \(\tilde{ROC}(t)=1-\tilde{G}(\tilde{F}^{-1}(1-t))\), where \(F\) and \(G\) are the CDFs of the negative and positive populations, respectively. In other words, for a cutoff point \(c\), sensitivity is \(se(c) = 1-G(c)\) and specificity is \(sp(c)=F(c)\), so \(1-F(c)\) is the false positive rate at the cutoff point. Here, for \(t\in [0,1]\), \(F^{-1}(1-t) = \inf\{x : F(x)\geq 1-t\}\). Now let \(X\sim N(\mu_0, \sigma_0^2)\) and \(Y\sim N(\mu_1, \sigma_1^2)\) be independent normal variables, so that \(F(t) = \Phi(\frac{t-\mu_0}{\sigma_0})\) and \(G(t) = \Phi(\frac{t-\mu_1}{\sigma_1})\). Then the binormal ROC curve is \(ROC(t) = \Phi\{a+b\Phi^{-1}(t)\}\), \(t\in [0,1]\), with \(a=\frac{\mu_1-\mu_0}{\sigma_1}\) and \(b=\frac{\sigma_0}{\sigma_1}\), and AUC\(=\Phi\big(\frac{a}{\sqrt{1+b^2}}\big)\).
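
Since the samples above were drawn with \(\mu_0=0, \sigma_0=1\) and \(\mu_1=4, \sigma_1=1\), the closed-form binormal AUC can be checked against the empirical value (a sketch using the true simulation parameters):

# binormal AUC: a = (mu1 - mu0)/sigma1, b = sigma0/sigma1
mu0 <- 0; sigma0 <- 1; mu1 <- 4; sigma1 <- 1
a <- (mu1 - mu0) / sigma1
b <- sigma0 / sigma1
pnorm(a / sqrt(1 + b^2))  # theoretical AUC ~ 0.9977, in line with the empirical estimate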

Information theory

Hughes and Bhattacharya (2013) characterize the symmetry properties of bi-normal and bi-gamma ROC curves in terms of the Kullback–Leibler divergences.

ROC curves as CDFs

Recognizing the similarity between ROC curves and CDFs motivated our choice of EDF statistics. For a random observation \(V\) from the positive class and the CDF \(F\) of the negative class, the placement value of \(V\) is defined as \(1-F(V)\). The ROC curve is then exactly the CDF of the placement value: \(Pr(1-F(V)\leq p) = Pr(F(V)\geq 1-p) = Pr(V\geq F^{-1}(1-p)) = 1-Pr(V\leq F^{-1}(1-p)) = 1-G(F^{-1}(1-p)) = ROC(p)\).
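
This identity can be illustrated empirically, as a sketch using the simulated samples a3 and b3 from above:

# placement values: 1 - F(V), with F the empirical CDF of the negative class
Fneg <- ecdf(a3)
pv <- 1 - Fneg(b3)
plot(ecdf(pv), main = "ECDF of placement values traces the ROC curve")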

# Epi package ROC curve
library(Epi)
a1 <- ROC(form = o ~ p, data = da3, plot = "ROC")

## AUC value
a1$AUC
## [1] 0.998787
## FPR, TPR values
head(a1$res)
##                      sens  spec pvp       pvn       lr.eta
##                         1 0.000 NaN 0.5000000         -Inf
## 7.22594409431227e-12    1 0.001   0 0.4997499 7.225944e-12
## 3.36357377388641e-11    1 0.002   0 0.4994995 3.363574e-11
## 1.58203418985643e-10    1 0.003   0 0.4992489 1.582034e-10
## 3.70761276043604e-10    1 0.004   0 0.4989980 3.707613e-10
## 4.65828859565068e-10    1 0.005   0 0.4987469 4.658289e-10
## optimal threshold, slope
optimal_lr.eta=function(x){
  # index where sensitivity + specificity is maximal (Youden-optimal point)
  no=which.max(x$res$sens+x$res$spec)[1]
  result=x$res$lr.eta[no]
  result
}
optimal_lr.eta(a1) 
## [1] 0.3773207
## optimal point in distribution
optimal_cutpoint=function(x){
   y=optimal_lr.eta(x)
   b0=unname(x$lr$coeff[1])
   b1=unname(x$lr$coeff[2])
   # invert the logistic curve: solve y = 1/(1 + exp(-(b0 + b1*x))) for x
   result=(-log(1/y-1)-b0)/b1
   result
}
optimal_cutpoint(a1)
## [1] 1.912751
# ROC classification values (rocit() from the ROCit package)
library(ROCit)
roc4 <- rocit(score = da3$p, class = da3$o, negref = 0)
da<-cbind(Cutoff=roc4$Cutoff, TPR=roc4$TPR, FPR=roc4$FPR)
head(da)
##        Cutoff   TPR FPR
## [1,]      Inf 0.000   0
## [2,] 7.260327 0.001   0
## [3,] 6.827645 0.002   0
## [4,] 6.749615 0.003   0
## [5,] 6.727526 0.004   0
## [6,] 6.665403 0.005   0

Youden index

The Youden index is often used to select an optimal threshold. The goal in ROC analysis is to find the optimal cutoff value, i.e. the value providing the best tradeoff between sensitivity and specificity when separating the classes 0 and 1.

The Youden index (Youden’s J statistic) is defined as \(J = sensitivity + specificity - 1\). Its maximized form is \(J_{max} = \max_t\{sensitivity(t) + specificity(t) - 1\}\), where \(t\) is the classification threshold at which \(J\) is maximal.

For example, for a test with \(sen.=0.8\) and \(spe.=0.8\), \(J_{max} = 0.8+0.8-1 = 0.6\). From the data above, with \(sen.=0.971\) and \(spe.=0.97\), the Youden index at the optimal point is \(0.971+0.97-1=0.941\).

Notice that this index can be further decomposed as \(J=\frac{\text{true positive}}{\text{true positive}+\text{false negative}}+\frac{\text{true negative}}{\text{true negative}+\text{false positive}}-1\) from the ROC curve analysis. Hence, the Youden index is a single statistic that captures the performance of a dichotomous diagnostic test.
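
As a quick check, \(J_{max}\) and the cutoff attaining it can be read directly off the rocit object computed above (a sketch reusing roc4):

# Youden index J(t) = TPR(t) - FPR(t); its maximum picks the optimal cutoff
J <- roc4$TPR - roc4$FPR
max(J)                      # J_max
roc4$Cutoff[which.max(J)]   # threshold attaining J_max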

# ROC curve with Youden Index (optimal threshold in graph)
roc4 <- rocit(score = da3$p, class = da3$o, negref = 0) 
plot(roc4)

KS statistic for FPR and TPR values

The KS statistic is the maximum distance between the TPR and FPR curves. With \(FPR(t) = P(\hat{y}=1\mid y=0)\) and \(TPR(t) = P(\hat{y}=1\mid y=1)\), the difference \(TPR(t)-FPR(t)\) is computed at each threshold \(t\), and its maximum is achieved at the optimal threshold point, so the KS statistic coincides with \(J_{max}\).

# KS statistic shows the maximum distance between the two curves.
par(mfrow=c(1,3))
ksplot(roc4)
plot(roc4$TPR)
plot(roc4$FPR)

## calculation
d <- roc4$TPR - roc4$FPR
j <- which(d == max(d))  # all indices attaining the maximum gap
j
## [1] 1005 1007 1009
roc4$TPR[j]
## [1] 0.987 0.988 0.989
roc4$FPR[j]
## [1] 0.017 0.018 0.019

Gini Coefficient

The Gini coefficient is another measure of the goodness of a binary classifier, defined as a ratio of two areas. It is related to the AUC by \(G_1 = 2\,AUC - 1\), which can be computed over the ROC points \((X_k, Y_k)\) via the trapezoidal form \(G_1 = 1 - \sum_{k=1}^n (X_k-X_{k-1})(Y_k+Y_{k-1})\); the same coefficient is also used in economics as a measure of inequality.
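
Using the rocit fit from above, the Gini coefficient follows immediately from the AUC (a one-line sketch):

# Gini coefficient from the AUC of the rocit object: G1 = 2*AUC - 1
2 * roc4$AUC - 1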

C-statistic

The C-index (a.k.a. concordance index or C-statistic) is a measure of goodness of fit for binary outcomes in a logistic regression model. In clinical studies, the C-statistic gives the probability that a randomly selected patient who experienced an event (e.g. a disease or condition) had a higher risk score than a patient who had not experienced the event. It is equal to the area under the Receiver Operating Characteristic (ROC) curve and ranges from 0.5 to 1.
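
The concordance interpretation can be checked directly on the simulated scores (a sketch; ties are counted with weight 1/2):

# C-statistic as P(score of a random positive > score of a random negative)
conc <- outer(b3, a3, ">")   # pairwise comparisons, positives vs negatives
ties <- outer(b3, a3, "==")
mean(conc) + 0.5 * mean(ties)  # equals the empirical AUC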

Kernel Density Estimation (KDE)

Let \((x_1,\ldots, x_n)\) and \((y_1,\ldots, y_m)\) be two independent samples from \(X\) and \(Y\). Then the kernel density estimators of \(f\) and \(g\), the density functions associated with \(F\) and \(G\), are \[\hat{f}(x) = \frac{1}{nh_0}\sum_{i=1}^n K_0\big(\frac{x-x_i}{h_0}\big), \quad \hat{g}(y) = \frac{1}{mh_1}\sum_{j=1}^m K_1\big(\frac{y-y_j}{h_1}\big),\] where \(h_i>0\) and the \(K_i\) are kernel functions satisfying \(\int K_i(x)dx = 1\), \(\int xK_i(x)dx = 0\) and \(\int x^2K_i(x)dx >0\) for \(i=0,1\). The corresponding CDF estimates, used for estimating sensitivity and specificity, are \(\hat{F}(x) = \frac{1}{n}\sum^n_{i=1} \int^{x}_{-\infty} \frac{1}{h_0}K_0\big(\frac{u-x_i}{h_0}\big)du\) and \(\hat{G}(y) = \frac{1}{m}\sum^m_{j=1} \int^{y}_{-\infty} \frac{1}{h_1}K_1\big(\frac{v-y_j}{h_1}\big)dv\). For Gaussian kernels these reduce to \(\hat{F}(x) = \frac{1}{n}\sum^n_{i=1} \Phi\big(\frac{x-x_i}{h_0}\big)\) and \(\hat{G}(y) = \frac{1}{m}\sum^m_{j=1} \Phi\big(\frac{y-y_j}{h_1}\big)\), where \(\Phi\) is the standard normal CDF. The kernel-based AUC is then estimated as \(\widehat{AUC} = \frac{1}{nm}\sum_{i=1}^n\sum_{j=1}^m\Phi \bigg(\frac{y_j-x_i}{\sqrt{h_0^2+h_1^2}}\bigg)\).
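
The closed-form kernel AUC can be computed directly; the sketch below uses the Sheather-Jones bandwidth bw.SJ() from base R as an assumed stand-in for the plug-in bandwidths used by kROC:

# kernel-based AUC with Gaussian kernels
h0 <- bw.SJ(a3)  # bandwidth for the negative sample
h1 <- bw.SJ(b3)  # bandwidth for the positive sample
mean(pnorm(outer(b3, a3, "-") / sqrt(h0^2 + h1^2)))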

# kernel density ROC curve estimation
# kROC returns a kernel-smoothed ROC; "pi_sj" requests a plug-in (Sheather-Jones) bandwidth
a3 <- rnorm(1000)
b3 <- rnorm(1000, mean = 4, sd = 1)
xy.ROC <- kROC(b3, a3, bw.x = "pi_sj", bw.y = "pi_sj")
xy.ROC
## 
## Call:
##  kROC(x = b3, y = a3, bw.x = "pi_sj", bw.y = "pi_sj")
## 
## Data: x (1000 obs.) and y (1000 obs.);
## 
##  Bandwidth 'bw.x' = 0.3468 'bw.y' = 0.3348
## 
##       FPR               TPR          
##  Min.   :0.00000   Min.   :0.000002  
##  1st Qu.:0.00000   1st Qu.:0.187928  
##  Median :0.02644   Median :0.976531  
##  Mean   :0.33623   Mean   :0.670848  
##  3rd Qu.:0.84126   3rd Qu.:1.000000  
##  Max.   :1.00000   Max.   :1.000000
plot(xy.ROC)

Dirichlet process estimation

Bayesian approaches enable the introduction of prior information into the estimation process, which reduces the uncertainty of the inferences. This point is especially important when no gold-standard test, one that correctly classifies all subjects, is available. Nonparametric Bayesian methods are meant to overcome the restrictions imposed by a fixed parametric model and the consequent difficulty in capturing nonstandard data features such as multimodality and skewness. The nonparametric approach entails a modeling framework in which a prior distribution is specified over the space of all probability measures. One reported nonparametric Bayesian method uses Dirichlet process mixtures and mixtures of Polya trees for analyzing continuous serologic data.

# TPR
library(dirichletprocess)
dp <- DirichletProcessGaussian(roc4$TPR)
dp <- Fit(dp, 2)  # only 2 MCMC iterations, for illustration
dp
## Dirichlet process object run for 2 iterations.
##                                      
##   Mixing distribution          normal
##   Base measure parameters  0, 1, 1, 1
##   Alpha Prior parameters         2, 4
##   Conjugacy                 conjugate
##   Sample size                    2001
##                                      
##   Mean number of clusters        1.50
##   Median alpha                   0.50
plot(dp)

# posterior cluster weights and parameters
dd <- data.frame(Weights = dp$weights,
                 mu = c(dp$clusterParameters[[1]]),
                 sigma = c(dp$clusterParameters[[2]]))
xGrid <- seq(-3, 3, by=0.01)
postSamples <- data.frame(replicate(100, PosteriorFunction(dp)(xGrid)))
postFrame <- data.frame(x=xGrid, y=rowMeans(postSamples))

# FPR
dp2 <- DirichletProcessGaussian(roc4$FPR)
dp2 <- Fit(dp2, 2)
dp2
## Dirichlet process object run for 2 iterations.
##                                      
##   Mixing distribution          normal
##   Base measure parameters  0, 1, 1, 1
##   Alpha Prior parameters         2, 4
##   Conjugacy                 conjugate
##   Sample size                    2001
##                                      
##   Mean number of clusters        2.00
##   Median alpha                   0.35
plot(dp2)

# posterior cluster weights and parameters (FPR fit)
dd2 <- data.frame(Weights = dp2$weights,
                  mu = c(dp2$clusterParameters[[1]]),
                  sigma = c(dp2$clusterParameters[[2]]))
xGrid <- seq(-3, 3, by=0.01)
postSamples <- data.frame(replicate(100, PosteriorFunction(dp2)(xGrid)))
postFrame <- data.frame(x=xGrid, y=rowMeans(postSamples))
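
The posterior mean curve collected in postFrame can then be drawn, e.g.:

# plot the posterior mean density estimate over the grid
plot(postFrame$x, postFrame$y, type = "l",
     xlab = "x", ylab = "posterior mean density")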