2024-03-27
\(H_0: a_{12}=a_1-a_2=0 ~\text{and}~b_{12}=b_1-b_2=0\) versus \(H_1: a_{12}=a_1-a_2\neq 0 ~\text{or}~b_{12}=b_1-b_2\neq 0\)
Recall that for a single ROC curve, \[\chi^2_{(2)}=(\hat{a}-a_0,\hat{b}-b_0)[\boldsymbol{S}]^{-1}\begin{pmatrix} \hat{a}-a_0 \\ \hat{b}-b_0 \end{pmatrix},\] where \(\boldsymbol{S}\) is the covariance matrix of \(\hat{a}\) and \(\hat{b}\).
Typically, \(a_0=0, b_0=1\).
For two ROC curves, the analogous statistic is \[\chi^2_{(2)}=(\hat{a}_1-\hat{a}_2,\hat{b}_1-\hat{b}_2)[\boldsymbol{S}]^{-1}\begin{pmatrix} \hat{a}_1-\hat{a}_2 \\ \hat{b}_1-\hat{b}_2 \end{pmatrix},\] where \(\boldsymbol{S}\) is now the covariance matrix of \(\hat{a}_{12}=\hat{a}_1-\hat{a}_2\) and \(\hat{b}_{12}=\hat{b}_1-\hat{b}_2\), i.e.,
\[\chi^2_{(2)}=\dfrac{\hat{a}^2_{12}\text{var}(\hat{b}_{12})+\hat{b}^2_{12}\text{var}(\hat{a}_{12})-2\hat{a}_{12}\hat{b}_{12}\text{cov}(\hat{a}_{12},\hat{b}_{12})}{\text{var}(\hat{a}_{12})\text{var}(\hat{b}_{12})-\text{cov}^2(\hat{a}_{12},\hat{b}_{12})}\]
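The statistic above can be sketched numerically as follows. This is a minimal illustration, not a definitive implementation: the function name `compare_roc_parameters` and all numeric inputs are hypothetical, and the covariance of the differences is taken to be the sum of the two per-classifier covariance matrices, which assumes the two classifiers were fitted on independent samples.

```python
import numpy as np
from scipy.stats import chi2


def compare_roc_parameters(a1, b1, S1, a2, b2, S2):
    """Chi-square test (2 df) of H0: a1 = a2 and b1 = b2.

    S1 and S2 are the 2x2 covariance matrices of (a_hat, b_hat) for each
    classifier. Assumption: the samples are independent, so the covariance
    matrix of (a_hat_12, b_hat_12) is S1 + S2.
    """
    d = np.array([a1 - a2, b1 - b2], dtype=float)
    S = np.asarray(S1, dtype=float) + np.asarray(S2, dtype=float)
    stat = float(d @ np.linalg.solve(S, d))  # quadratic form d' S^{-1} d
    p_value = float(chi2.sf(stat, df=2))     # upper-tail probability
    return stat, p_value


# Hypothetical estimates, for illustration only.
S1 = [[0.04, 0.01], [0.01, 0.02]]
S2 = [[0.05, 0.00], [0.00, 0.03]]
stat, p_value = compare_roc_parameters(1.5, 0.9, S1, 1.0, 1.1, S2)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

If the two classifiers are scored on the same individuals, the estimates are correlated and `S1 + S2` would no longer be the right covariance matrix; the cross-covariance terms would have to be supplied instead.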
Once again, assume \(f_i\) and \(g_i\) are the PDFs, and \(F_i\) and \(G_i\) are the CDFs of class N and P scores respectively, for classifier \(i\). Let \(x_{i\pi}\) be the \(\pi\)th quantile for classifier \(i\), so that
\[p(N)F_1(x_{1\pi})+[1-p(N)]G_1(x_{1\pi})=\pi\] \[p(N)F_2(x_{2\pi})+[1-p(N)]G_2(x_{2\pi})=\pi\]
The ROC curves are identical if and only if the misclassification rates of the classifiers are the same for all \(\pi\).
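Writing \(e_i(\pi)\) for the misclassification rate of classifier \(i\) at the threshold \(x_{i\pi}\), we can therefore test whether the integrated unsigned difference between the misclassification rates is zero,
\[\int |e_1 ({\pi})-e_2 ({\pi}) | d\pi=0,\] using the estimate \(\int |\hat{e}_1 ({\pi})-\hat{e}_2 ({\pi}) | d\pi\) as the test statistic.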
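To make the quantile and the integrated difference concrete, here is a rough sketch under stated assumptions. The notes do not give a formula for the misclassification rate, so the code assumes that scores above the threshold are classified as P; the normal score distributions, the helper names, and the grid used for the integral are all hypothetical choices for illustration.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ndtr  # standard normal CDF


def normal_cdf(mu, sd):
    """Return the CDF of a normal distribution (assumed score model)."""
    return lambda x: ndtr((x - mu) / sd)


def quantile(pi, p_n, F, G, lo=-50.0, hi=50.0):
    """Solve p(N) F(x) + [1 - p(N)] G(x) = pi for the threshold x."""
    return brentq(lambda x: p_n * F(x) + (1.0 - p_n) * G(x) - pi, lo, hi)


def error_rate(pi, p_n, F, G):
    """Misclassification rate at the pi-th quantile threshold.

    Assumption: scores above the threshold are classified P, so errors are
    N cases above the threshold plus P cases at or below it.
    """
    x = quantile(pi, p_n, F, G)
    return p_n * (1.0 - F(x)) + (1.0 - p_n) * G(x)


def integrated_abs_difference(p_n, F1, G1, F2, G2, n_grid=199):
    """Trapezoid approximation of the integral of |e1(pi) - e2(pi)|."""
    grid = np.linspace(0.001, 0.999, n_grid)
    diff = np.array([
        abs(error_rate(p, p_n, F1, G1) - error_rate(p, p_n, F2, G2))
        for p in grid
    ])
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid)))


# Hypothetical classifiers: class N and class P scores are normal.
p_n = 0.5
F1, G1 = normal_cdf(0.0, 1.0), normal_cdf(2.0, 1.0)
F2, G2 = normal_cdf(0.0, 1.0), normal_cdf(1.0, 1.0)
print(integrated_abs_difference(p_n, F1, G1, F2, G2))
```

In practice the CDFs would be replaced by empirical estimates, and the null distribution of the statistic would still have to be obtained separately, for example by resampling; neither step is shown here.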
We model the true positive rate \(y\) in terms of the false positive rate \(x\) by a generalized linear model \[h(y)=b(x)+\boldsymbol{\beta}^T \boldsymbol{Z},\] where \(h(\cdot)\) is the link function, \(b(\cdot)\) is a baseline model, both being monotonic on (0,1), and \(\boldsymbol{Z}\) is a vector of covariates.
To compare classifiers, code them as dummy variables in \(\boldsymbol{Z}\) and test \(H_0: \boldsymbol{\beta}=\boldsymbol{0}\).
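As a worked illustration (an assumed special case, not a form given above), take a probit link \(h=\Phi^{-1}\), a baseline \(b(x)=\alpha_0+\alpha_1\Phi^{-1}(x)\) with hypothetical coefficients \(\alpha_0\) and \(\alpha_1\), and a single dummy covariate \(Z\) equal to 0 for classifier 1 and 1 for classifier 2. The model then reads \[\Phi^{-1}(y)=\alpha_0+\alpha_1\Phi^{-1}(x)+\beta Z,\] so classifier 1 has ROC curve \(y=\Phi\left(\alpha_0+\alpha_1\Phi^{-1}(x)\right)\) and classifier 2 has \(y=\Phi\left(\alpha_0+\beta+\alpha_1\Phi^{-1}(x)\right)\). Under this sketch the two curves coincide exactly when \(\beta=0\).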
An essential requirement for conducting any of the ROC analyses described earlier is that the classification score \(S\) has been obtained for samples of individuals, each of which has been labeled, and labeled correctly, as either N or P.
Very often, to achieve correct labeling the sample members may have to be,
Methods of ROC analysis should cater for the possibility of sample mislabeling as well as being able to cope with missing labels.
In addition to the classification score \(S\), we denote the “group label” variable as \(L\).
Assume that the “true” group labels are given by the latent (i.e., unobservable) binary variable \(Z\).
\[L|Z \sim \text{Bernoulli}(\pi)\]
where \(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0\) when \(Z=\) N, and \(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1\) when \(Z=\) P.
In addition, \(Z \sim \text{Bernoulli}(\zeta)\) where \(\zeta \sim \text{Beta}(a,b)\).
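The hierarchy above can be simulated in a few lines, which may help in seeing how the observed labels relate to the latent ones. This is only a sketch: the function name `simulate_labels`, the 0/1 coding (1 for P, 0 for N, for both \(Z\) and \(L\)), and the parameter values are assumptions made for illustration.

```python
import numpy as np


def simulate_labels(n, a, b, beta0, beta1, rng):
    """Draw latent true labels Z and observed labels L.

    Assumed coding: 1 = P, 0 = N for both Z and L.
    """
    zeta = rng.beta(a, b)                 # zeta ~ Beta(a, b)
    z = rng.binomial(1, zeta, size=n)     # Z ~ Bernoulli(zeta)
    logit = beta0 + beta1 * z             # beta0 if Z = N, beta0 + beta1 if Z = P
    pi = 1.0 / (1.0 + np.exp(-logit))     # inverse logit gives P(L = 1 | Z)
    l = rng.binomial(1, pi)               # L | Z ~ Bernoulli(pi)
    return z, l, zeta


# Hypothetical parameter values, for illustration only.
rng = np.random.default_rng(0)
z, l, zeta = simulate_labels(1000, a=2.0, b=2.0, beta0=-2.0, beta1=4.0, rng=rng)
print("prevalence:", zeta, "share of labels matching truth:", np.mean(z == l))
```

With \(\beta_0\) strongly negative and \(\beta_0+\beta_1\) strongly positive, the observed labels almost always agree with the latent ones; values of \(\beta_0\) and \(\beta_0+\beta_1\) near zero correspond to heavy mislabeling.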