Here is an example of Fleiss’ kappa for hypothetical data. Fleiss’ kappa is appropriate in this instance because there are more than two raters (Cohen’s kappa handles only two).
Assume we have 4 raters (Tom, Brooks, Chris, & Steve) and they are rating 10 different images for correct placement of ECG electrodes (1 = correct, 0 = incorrect).
The following are their ratings…
tom <- c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1)
brooks <- c(1, 0, 0, 0, 1, 1, 1, 1, 0, 1)
chris <- c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0)
steve <- c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0)
data <- data.frame(tom, brooks, chris, steve)
data
##    tom brooks chris steve
## 1    1      1     1     0
## 2    0      0     0     0
## 3    1      0     1     0
## 4    0      0     0     0
## 5    1      1     1     1
## 6    0      1     1     1
## 7    1      1     1     1
## 8    1      1     1     1
## 9    0      0     0     0
## 10   1      1     0     0
Using the “irr” package in R, I can obtain Fleiss’ kappa easily.
library(irr)
# detail = TRUE adds the per-category kappa table after the summary
kappam.fleiss(data, detail = TRUE)
## Fleiss' Kappa for m Raters
##
## Subjects = 10
## Raters = 4
## Kappa = 0.529
##
## z = 4.09
## p-value = 4.23e-05
##
##   Kappa     z p.value
## 0 0.529 4.095   0.000
## 1 0.529 4.095   0.000
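To see where the 0.529 comes from, the statistic can be reproduced by hand in base R. This is only a sketch of the standard Fleiss formula, not the internals of the irr package; the names ratings, counts, p_obs, p_exp and kappa_by_hand are illustrative and do not appear in the original analysis.

```r
# Rebuild the ratings from the example (1 = correct, 0 = incorrect).
ratings <- data.frame(
  tom    = c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1),
  brooks = c(1, 0, 0, 0, 1, 1, 1, 1, 0, 1),
  chris  = c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0),
  steve  = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0)
)

n_raters <- ncol(ratings)

# Per-image counts of raters choosing each category.
counts <- cbind(incorrect = rowSums(ratings == 0),
                correct   = rowSums(ratings == 1))

# Observed agreement: share of agreeing rater pairs, averaged over images.
p_obs <- mean((rowSums(counts^2) - n_raters) / (n_raters * (n_raters - 1)))

# Chance agreement, from the overall proportion of ratings in each category.
p_cat <- colSums(counts) / sum(counts)
p_exp <- sum(p_cat^2)

kappa_by_hand <- (p_obs - p_exp) / (1 - p_exp)
round(kappa_by_hand, 3)
```

Here the observed agreement is about 0.767 and the chance agreement is 0.505, so the rounded result matches the Kappa line in the output above.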
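Individual kappa scores for each row won’t be useful. With a single image, the chance-agreement term is estimated from the same four ratings that are being assessed, so the expected agreement is never lower than the observed agreement. Any row containing a disagreement therefore gives a negative kappa of -1/(4 - 1) = -0.333, and a unanimous row gives an undefined 0/0 (see below).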
data[1,]
##   tom brooks chris steve
## 1   1      1     1     0
kappam.fleiss(data[1,])
## Fleiss' Kappa for m Raters
##
## Subjects = 1
## Raters = 4
## Kappa = -0.333
##
## z = -0.816
## p-value = 0.414
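The arithmetic behind that -0.333 can be laid out directly. The sketch below applies the Fleiss formula to the first image alone, in base R; the variable names are illustrative.

```r
# Image 1: three raters scored "correct" (1), one scored "incorrect" (0).
counts <- c(correct = 3, incorrect = 1)
n_raters <- sum(counts)

# Observed agreement among the six rater pairs for this one image.
p_obs <- (sum(counts^2) - n_raters) / (n_raters * (n_raters - 1))

# "Chance" agreement, estimated from the same four ratings.
p_exp <- sum((counts / n_raters)^2)

kappa_single <- (p_obs - p_exp) / (1 - p_exp)
round(c(p_obs = p_obs, p_exp = p_exp, kappa = kappa_single), 3)
```

The observed agreement (0.5) falls below the expected agreement (0.625), which is what pushes kappa negative. Setting counts to c(2, 2) gives the same kappa, since a single non-unanimous row always works out to -1/(n_raters - 1).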
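Instead, if you wanted a per-image figure, you could simply calculate the mean of each row of data. Note that this is the proportion of raters who scored the image as correct rather than agreement as such: 0.00 and 1.00 are both unanimous, and 0.50 is an even split. (The share of raters siding with the majority is the larger of this value and its complement, so it cannot fall below 50% for a group of 4 raters.) Not sure how useful this would be.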
data$Percent_Agreement <- rowSums(data)/4
data
##    tom brooks chris steve Percent_Agreement
## 1    1      1     1     0              0.75
## 2    0      0     0     0              0.00
## 3    1      0     1     0              0.50
## 4    0      0     0     0              0.00
## 5    1      1     1     1              1.00
## 6    0      1     1     1              0.75
## 7    1      1     1     1              1.00
## 8    1      1     1     1              1.00
## 9    0      0     0     0              0.00
## 10   1      1     0     0              0.50
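Because the row mean counts raters who scored 1, a unanimous row of zeros shows up as 0.00. One possible alternative is the share of raters who side with the majority, sketched below in base R. The names ratings, prop_correct and majority_share are illustrative and not part of the original analysis.

```r
# Rebuild the ratings from the example (1 = correct, 0 = incorrect).
ratings <- data.frame(
  tom    = c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1),
  brooks = c(1, 0, 0, 0, 1, 1, 1, 1, 0, 1),
  chris  = c(1, 0, 1, 0, 1, 1, 1, 1, 0, 0),
  steve  = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0)
)

# Proportion of raters scoring each image as correct.
prop_correct <- rowMeans(ratings)

# Share of raters in the majority: the larger of the proportion and its complement.
majority_share <- pmax(prop_correct, 1 - prop_correct)
majority_share
```

With two categories this value ranges from 0.50 (an even split) to 1.00 (unanimous), so images 2, 4 and 9 now score 1.00 instead of 0.00.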
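Finally, here is a dataset where all four raters had near-perfect agreement. The ratings are coded 2 and 1 here instead of 1 and 0; Fleiss’ kappa treats them as nominal labels, so the coding itself does not affect the result.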
tom <- rep(2:1, 5)
brooks <- rep(2:1, 5)
chris <- rep(2:1, 5)
steve <- c(rep(2:1, 3), 2, 2, 2, 2)
data <- data.frame(tom, brooks, chris, steve)
data
##    tom brooks chris steve
## 1    2      2     2     2
## 2    1      1     1     1
## 3    2      2     2     2
## 4    1      1     1     1
## 5    2      2     2     2
## 6    1      1     1     1
## 7    2      2     2     2
## 8    1      1     1     2
## 9    2      2     2     2
## 10   1      1     1     2
kappam.fleiss(data)
## Fleiss' Kappa for m Raters
##
## Subjects = 10
## Raters = 4
## Kappa = 0.798
##
## z = 6.18
## p-value = 6.36e-10