Kernel density methods obtain smooth estimates of the functions \(F\) and \(G\) directly from the data, without imposing any distributional constraints.
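Assuming observed scores \(s_{N1},\dots,s_{Nn_N}\) from population N and \(s_{P1},\dots,s_{Pn_P}\) from population P (this notation is an assumption made here, not taken from the text), the kernel density estimates in the two populations would typically take the standard form below, with \(\hat F\) and \(\hat G\) then obtained by integrating \(\hat f_N\) and \(\hat f_P\):

\[
\hat f_N(x) = \frac{1}{n_N h_N}\sum_{i=1}^{n_N} k\!\left(\frac{x - s_{Ni}}{h_N}\right),
\qquad
\hat f_P(x) = \frac{1}{n_P h_P}\sum_{i=1}^{n_P} k\!\left(\frac{x - s_{Pi}}{h_P}\right),
\]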
where \(k(\cdot)\) is the kernel function and \(h_N\), \(h_P\) are the bandwidths for populations N and P, respectively.
x <- rnorm(500)
hist(x, freq = FALSE)
dens <- density(x)
lines(dens, col = "red")
dens

Call:
	density.default(x = x)

Data: x (500 obs.);	Bandwidth 'bw' = 0.266

       x                 y
 Min.   :-4.1683   Min.   :0.0000337
 1st Qu.:-2.2261   1st Qu.:0.0057572
 Median :-0.2838   Median :0.0715865
 Mean   :-0.2838   Mean   :0.1285882
 3rd Qu.: 1.6585   3rd Qu.:0.2440612
 Max.   : 3.6008   Max.   :0.3553230
Choosing between the many available kernel functions is relatively unimportant, as all give comparable results, but more care is needed in selecting the bandwidth. General-purpose bandwidths may be used, such as the rule-of-thumb default that density() applied in the output above.
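As a rough illustration of both points, the sketch below compares several general-purpose bandwidth selectors from base R's stats package and then checks how little the estimate changes when only the kernel is swapped. The simulated data, the seed, and the variable names are assumptions made for this example, not part of the original analysis.

```r
# Simulated data for illustration only (not from the text)
set.seed(1)
x <- rnorm(500)

# General-purpose bandwidth selectors available in base R (stats)
bws <- c(
  nrd0 = bw.nrd0(x),  # Silverman's rule of thumb (the density() default)
  nrd  = bw.nrd(x),   # Scott's variation of the rule of thumb
  ucv  = bw.ucv(x),   # unbiased cross-validation
  SJ   = bw.SJ(x)     # Sheather-Jones plug-in
)
print(round(bws, 3))

# Same bandwidth, different kernels: both estimates share one evaluation
# grid because the grid depends only on the data range and the bandwidth
d_gauss <- density(x, bw = bws[["nrd0"]], kernel = "gaussian")
d_epan  <- density(x, bw = bws[["nrd0"]], kernel = "epanechnikov")
max_diff <- max(abs(d_gauss$y - d_epan$y))
print(max_diff)
```

In density(), the bandwidth is scaled to be the standard deviation of the kernel, which is what makes the two kernels directly comparable at the same value of bw.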
However, since \(F\) and \(G\) are estimated separately, the final ROC curve estimator is not invariant under a monotone transformation of the data.
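The lack of invariance can be seen in a small simulation. The sketch below is a minimal version under stated assumptions: it takes \(F\) to be the score distribution in population N and \(G\) that in population P, uses a Gaussian kernel with the rule-of-thumb bandwidth, and works with simulated scores. The helper names kernel_cdf, smooth_roc and auc_trapz are hypothetical and introduced only for this example.

```r
# Simulated scores for illustration only (not from the text)
set.seed(1)
s_N <- rnorm(200, mean = 0)  # scores from population N
s_P <- rnorm(200, mean = 1)  # scores from population P

# Gaussian-kernel estimate of a CDF: the average of normal CDFs
# centred at the observations, each with standard deviation h
kernel_cdf <- function(t, s, h = bw.nrd0(s)) {
  vapply(t, function(u) mean(pnorm((u - s) / h)), numeric(1))
}

# Smoothed ROC curve traced over a grid of thresholds t:
# (fpr, tpr) = (1 - F_hat(t), 1 - G_hat(t))
smooth_roc <- function(s_N, s_P, n_grid = 512) {
  h_N <- bw.nrd0(s_N)
  h_P <- bw.nrd0(s_P)
  pad <- 4 * max(h_N, h_P)
  t <- seq(min(s_N, s_P) - pad, max(s_N, s_P) + pad, length.out = n_grid)
  data.frame(fpr = 1 - kernel_cdf(t, s_N, h_N),
             tpr = 1 - kernel_cdf(t, s_P, h_P))
}

# Area under the curve by the trapezoidal rule
auc_trapz <- function(roc) {
  o <- order(roc$fpr, roc$tpr)
  x <- roc$fpr[o]
  y <- roc$tpr[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

auc_raw <- auc_trapz(smooth_roc(s_N, s_P))
auc_exp <- auc_trapz(smooth_roc(exp(s_N), exp(s_P)))  # monotone transform
print(c(raw = auc_raw, exp_transformed = auc_exp))
```

The empirical ROC curve depends only on the ranks of the scores, so it would be unchanged by exp(); the smoothed version generally is not, because the bandwidths are chosen on whichever scale the data happen to be recorded.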
Spline smoothing is also popular in density estimation.
Variance estimation for AUC
For parametric and semiparametric methods, maximum likelihood theory yields asymptotic expressions for the variances and covariances of the parameter estimates, so the delta method then gives the required variance of the AUC estimate.
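For the nonparametric estimate of AUC (the Mann-Whitney statistic), a commonly quoted variance expression is that of Hanley and McNeil (1982). Assuming this is the formula intended here, and writing \(n_N\) and \(n_P\) for the two sample sizes (notation assumed for this display), it reads

\[
s^2(\widehat{\mathrm{AUC}}) = \frac{1}{n_N n_P}\left[\widehat{\mathrm{AUC}}\,(1-\widehat{\mathrm{AUC}}) + (n_P - 1)\,(Q_1 - \widehat{\mathrm{AUC}}^2) + (n_N - 1)\,(Q_2 - \widehat{\mathrm{AUC}}^2)\right],
\]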
where \(Q_1\) is the probability that the classification scores of two randomly chosen individuals from population P exceed the score of a randomly chosen individual from population N, and \(Q_2\) is the converse probability that the classification score of a randomly chosen individual from population P exceeds both scores of two randomly chosen individuals from population N.
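A minimal sketch of how such a variance might be computed, assuming the Hanley and McNeil form \(\left[\mathrm{AUC}(1-\mathrm{AUC}) + (n_P-1)(Q_1-\mathrm{AUC}^2) + (n_N-1)(Q_2-\mathrm{AUC}^2)\right]/(n_N n_P)\), continuous scores with no ties, and empirical plug-in estimates of \(Q_1\) and \(Q_2\). The function name hanley_mcneil_var and the simulated data are hypothetical, introduced only for illustration.

```r
# Nonparametric AUC with a Hanley-McNeil style variance estimate.
# Assumes continuous scores (no ties) and at least two scores per group.
hanley_mcneil_var <- function(s_N, s_P) {
  n_N <- length(s_N)
  n_P <- length(s_P)
  gt <- outer(s_P, s_N, ">")  # gt[i, j] is TRUE when P score i > N score j
  auc <- mean(gt)             # Mann-Whitney estimate of AUC

  a <- colSums(gt)  # for each N score: number of P scores above it
  b <- rowSums(gt)  # for each P score: number of N scores below it

  # Q1: two distinct P scores both exceed one N score
  Q1 <- mean(a * (a - 1)) / (n_P * (n_P - 1))
  # Q2: one P score exceeds two distinct N scores
  Q2 <- mean(b * (b - 1)) / (n_N * (n_N - 1))

  v <- (auc * (1 - auc) +
          (n_P - 1) * (Q1 - auc^2) +
          (n_N - 1) * (Q2 - auc^2)) / (n_N * n_P)
  list(auc = auc, Q1 = Q1, Q2 = Q2, var = v, se = sqrt(v))
}

# Simulated scores for illustration only (not from the text)
set.seed(1)
s_N <- rnorm(100, mean = 0)
s_P <- rnorm(100, mean = 1)
res <- hanley_mcneil_var(s_N, s_P)
print(unlist(res))
```

The AUC returned here equals the Wilcoxon statistic from wilcox.test(s_P, s_N) divided by \(n_N n_P\), which gives a quick consistency check on the pairwise comparison step.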