These are functions to obtain the proportion of positive and negative agreement for binary ratings from two judges. A common approach here is to use Cohen's kappa, but that is not necessarily a good idea. You can use kappa to test whether overall agreement is better than chance, but you could just as easily use a Chi-square test for that. The main issue is that kappa is sensitive to prevalence: it can be high just because most responses are positive or negative. The simple solution is to compute agreement separately for positive and negative ratings (e.g., see https://www.john-uebersax.com/stat/raw.htm#binspe).
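For intuition, here is a minimal sketch with made-up counts: the two raters agree on 96 of 100 cases, so overall agreement looks excellent, yet agreement on the positive ratings alone is only 0.5.

# hypothetical skewed 2x2 table (rows = rater 1, columns = rater 2; positive cell first)
skew.tab <- matrix(c(2, 2, 2, 94), 2, 2, byrow = TRUE)
sum(diag(skew.tab)) / sum(skew.tab)   # overall agreement: 96/100 = 0.96
2 * 2 / (2 * 2 + 2 + 2)               # positive agreement: 2a/(2a+b+c) = 0.5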

The big advantage of the popa/pona approach is that it has an intuitive interpretation: each is the (conditional) probability that a randomly selected rater will agree with the other rater, given a positive (popa) or negative (pona) rating. Good agreement is indicated when both popa and pona are high. It is also helpful if the lower bound of the interval for both is reasonably high; if not, you probably don't have enough data to estimate agreement precisely for the positive response (for popa) or the negative response (for pona).
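This interpretation can be checked directly by simulation. The sketch below uses the Graham and Bull counts introduced later (a = 73, b = 12, c = 27, d = 344): sample a random pair and a random judge, condition on that judge's rating being positive, and see how often the other judge agrees.

# quick simulation check of the conditional-probability reading of popa
r1 <- rep(c(1, 1, 0, 0), times = c(73, 12, 27, 344))  # judge 1's rating in each pair (cells a, b, c, d)
r2 <- rep(c(1, 0, 1, 0), times = c(73, 12, 27, 344))  # judge 2's rating in each pair
pair  <- sample(length(r1), 10^5, replace = TRUE)     # pick a pair at random...
judge <- sample(1:2, 10^5, replace = TRUE)            # ...and one of the two judges
own   <- ifelse(judge == 1, r1[pair], r2[pair])       # the selected judge's rating
other <- ifelse(judge == 1, r2[pair], r1[pair])       # the other judge's rating
mean(other[own == 1])                                 # close to 2*73/(2*73 + 12 + 27) = 0.789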

If you have prior information about agreement you can adjust the prior. For example, if you have prior information that agreement is high you might set alpha1 and alpha4 higher than alpha2 and alpha3 (e.g., alpha1 = alpha4 = 0.5 and alpha2 = alpha3 = 0.25; see the example call at the end of this post). However, these priors won't have much impact if there are many ratings. The sum of alpha1 to alpha4 gives the weight of the prior in terms of data (i.e., the default prior, with alphas summing to 1, is equivalent to one data point).

These computations are very simple and the popana function below calculates them for a 2x2 table. Simply replace the values in the example matrix with your own table values (traditionally labelled a, b, c and d by row, with a the positive-positive cell). It's easy enough to compute the table from two vectors as well (see the example with simulated data and no true agreement).

Getting CIs is a bit trickier. Graham and Bull (1998) compare a range of methods, but most of them seem to do badly in small samples. One of the exceptions is the Bayesian interval (technically a credible or posterior probability interval). I have implemented that here with the default Dirichlet prior that they propose. This function also computes overall agreement, the Egon Pearson corrected Chi-square test for agreement, and the CI for overall agreement (using a Beta(1,1) prior by default).

One advantage of the Bayesian approach is that you can also easily obtain a CI for the difference between popa and pona, so the difference is reported along with its interval estimate.

Graham, P., & Bull, B. (1998). Approximate standard errors and confidence intervals for indices of positive and negative agreement. Journal of Clinical Epidemiology, 51(9), 763-771.

# simulate some data (two independent raters, so any agreement is chance)

# put the positive level first so the positive-positive cell lands in row 1, column 1
# (table() on raw 0/1 vectors sorts 0 before 1, which would swap popa and pona)
rating1 <- factor(rbinom(30, 1, .5), levels = c(1, 0))
rating2 <- factor(rbinom(30, 1, .5), levels = c(1, 0))
data.tab <- table(rating1, rating2)

# data from Graham and Bull (1998)
data.tab <- matrix(c(73, 12, 27, 344), 2, 2, byrow = TRUE)
data.tab
##      [,1] [,2]
## [1,]   73   12
## [2,]   27  344
# just popa and pona
popana <- function(abcd_table){
    # cells of the 2x2 table, by row: a = both positive, b and c = disagreements, d = both negative
    a <- abcd_table[1,1]; b <- abcd_table[1,2]; c <- abcd_table[2,1]; d <- abcd_table[2,2]
    popa <- 2*a/(2*a + b + c)   # proportion of positive agreement
    pona <- 2*d/(2*d + b + c)   # proportion of negative agreement
    list("Proportion of positive agreement" = popa, "Proportion of negative agreement" = pona)
}

popana(data.tab)
## $`Proportion of positive agreement`
## [1] 0.7891892
## 
## $`Proportion of negative agreement`
## [1] 0.9463549
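As a sanity check on the simulated ratings created earlier (two independent raters, so no true agreement), both proportions should come out near 0.5 on average; output is omitted here because it varies from run to run. Since data.tab now holds the Graham and Bull counts, the table is recomputed from the vectors:

popana(table(rating1, rating2))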
# full output including CI
popana.ci <- function(abcd_table, alpha1 = 0.25, alpha2 = alpha1, alpha3 = alpha1, alpha4 = alpha1,
                      confidence = .95, K = 10^6, op = c(1, 1)){
    # implements the Bayesian CI for popa and pona from Graham & Bull (1998) - technically a posterior/credible interval
    # uses the Dirichlet prior alpha1 = alpha2 = alpha3 = alpha4 = 0.25 by default for popa and pona
    # and a Beta(1,1) prior (op) by default for overall agreement
    a <- abcd_table[1,1]; b <- abcd_table[1,2]; c <- abcd_table[2,1]; d <- abcd_table[2,2]
    popa <- 2*a/(2*a + b + c)
    pona <- 2*d/(2*d + b + c)
    # posterior draws of the cell probabilities via the beta decomposition of the Dirichlet:
    # pi1 has a Beta marginal, and pi4/(1 - pi1) an independent Beta distribution
    pi1_draws <- rbeta(K, a + alpha1, b + c + d + alpha2 + alpha3 + alpha4)
    beta4_draws <- rbeta(K, d + alpha4, b + c + alpha2 + alpha3)
    pi4_draws <- (1 - pi1_draws) * beta4_draws
    popa_draws <- 2 * pi1_draws / (pi1_draws + 1 - pi4_draws)   # posterior draws of popa
    pona_draws <- 2 * pi4_draws / (pi4_draws + 1 - pi1_draws)   # posterior draws of pona
    qts <- c((1 - confidence)/2, 1 - (1 - confidence)/2)
    ci.lab <- paste0(100*confidence, "% CI")   # so the labels track the confidence argument
    # Egon Pearson (N-1) corrected Chi-square test of independence (NHST for agreement)
    uncorrected.test <- suppressWarnings(chisq.test(abcd_table, simulate.p.value = FALSE, correct = FALSE))
    corrected.stat <- uncorrected.test$statistic[[1]] * (a + b + c + d - 1)/(a + b + c + d)
    pval <- pchisq(corrected.stat, uncorrected.test$parameter, lower.tail = FALSE)
    epc.list <- list('Egon Pearson corrected Chi-Square (1 d.f.)' = corrected.stat, 'p' = pval)
    out <- list((a + d)/(a + b + c + d), qbeta(qts, a + d + op[1], b + c + op[2]),
                popa, quantile(popa_draws, qts),
                pona, quantile(pona_draws, qts),
                popa - pona, quantile(popa_draws - pona_draws, qts),
                epc.list)
    names(out) <- c("Proportion of overall agreement", paste(ci.lab, "for overall agreement"),
                    "Proportion of positive agreement (popa)", paste(ci.lab, "for popa"),
                    "Proportion of negative agreement (pona)", paste(ci.lab, "for pona"),
                    "Difference in agreement", paste(ci.lab, "for difference"),
                    "Test of agreement")
    out
}

popana.ci(data.tab)
## $`Proportion of overall agreement`
## [1] 0.9144737
## 
## $`95% CI for overall agreement`
## [1] 0.8851847 0.9367334
## 
## $`Proportion of positive agreement (popa)`
## [1] 0.7891892
## 
## $`95% CI for popa`
##      2.5%     97.5% 
## 0.7176442 0.8470188 
## 
## $`Proportion of negative agreement (pona)`
## [1] 0.9463549
## 
## $`95% CI for pona`
##      2.5%     97.5% 
## 0.9275009 0.9612328 
## 
## $`Difference in agreement`
## [1] -0.1571657
## 
## $`95% CI for difference`
##       2.5%      97.5% 
## -0.2161975 -0.1102818 
## 
## $`Test of agreement`
## $`Test of agreement`$`Egon Pearson corrected Chi-Square (1 d.f.)`
## [1] 249.0299
## 
## $`Test of agreement`$p
## [1] 4.225969e-56
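
Finally, the informative prior mentioned earlier can be supplied via the alpha arguments (output omitted). The alphas sum to 1.5, so the prior carries the weight of about one and a half ratings and barely shifts these estimates:

popana.ci(data.tab, alpha1 = 0.5, alpha2 = 0.25, alpha3 = 0.25, alpha4 = 0.5)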