Written: 2019-05-29
Last run: 2019-05-29


Comparison between 4 raters on different bases

This analysis reviews inter-rater agreement of DWI QC ratings for the POND dataset (N=99). Each of the 4 raters worked from a different basis: volume-to-volume visual review (Hajer), dtifit QC outputs (Grace; binary ratings), eddy QUAD QC reports (Navona), and a multivariate calculation using data from eddy QUAD and CSD residuals (John; N=77; reported below as Hclust).

This plot displays ratings (y axis) and absolute difference in rating (colour). Here, 1=FAIL, 2=INDETERMINATE, 3=PASS.

For visual review, the participants are listed below in order of absolute difference in ratings, from smallest (high agreement) to largest (low agreement). Note that this list does not adjust for the 22 missing Hclust ratings.

##  [1] "sub-0880044" "sub-0880046" "sub-0880050" "sub-0880100" "sub-0880155"
##  [6] "sub-0880157" "sub-0880221" "sub-0880279" "sub-0880293" "sub-0880353"
## [11] "sub-0880418" "sub-0880419" "sub-0880464" "sub-0880494" "sub-0880533"
## [16] "sub-0880601" "sub-0880664" "sub-0880665" "sub-1050015" "sub-1050019"
## [21] "sub-1050090" "sub-1050219" "sub-1050221" "sub-1050239" "sub-1050253"
## [26] "sub-1050288" "sub-1050307" "sub-1050349" "sub-1050353" "sub-1050363"
## [31] "sub-1050369" "sub-1050378" "sub-1050399" "sub-1050431" "sub-1050432"
## [36] "sub-1050440" "sub-1130160" "sub-0880085" "sub-0880096" "sub-0880107"
## [41] "sub-0880254" "sub-0880265" "sub-0880624" "sub-0880669" "sub-0880692"
## [46] "sub-0880693" "sub-0880738" "sub-0880138" "sub-0880303" "sub-0880308"
## [51] "sub-0880372" "sub-0880396" "sub-0880473" "sub-1050029" "sub-1050065"
## [56] "sub-1050089" "sub-1050134" "sub-1050172" "sub-1050178" "sub-1050224"
## [61] "sub-1050254" "sub-1050265" "sub-1050336" "sub-1050341" "sub-1050528"
## [66] "sub-0880061" "sub-0880081" "sub-0880112" "sub-0880369" "sub-0880428"
## [71] "sub-0880558" "sub-0880703" "sub-0880742" "sub-1050136" "sub-1050195"
## [76] "sub-1050218" "sub-1050229" "sub-1050403" "sub-1050509" "sub-0880043"
## [81] "sub-0880397" "sub-1050105" "sub-1050250" "sub-1050402" "sub-0885005"
## [86] "sub-1050007" "sub-1050032" "sub-1050054" "sub-1050055" "sub-1050092"
## [91] "sub-1050131" "sub-1050179" "sub-1050188" "sub-1050194" "sub-1050235"
## [96] "sub-1050383" "sub-1050429" "sub-0880146" "sub-1050135"
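
The report does not state how the absolute difference across 4 raters is computed. A minimal Python sketch, assuming it is the sum of absolute differences over all rater pairs, reproduces the relative order of the five participants below. Their ratings are copied from the table later in this report; the function name is illustrative.

```python
from itertools import combinations

# Ratings in the order Hajer, Grace, Navona, Hclust (copied from the table).
# None would mark a missing rating and is simply skipped.
ratings = {
    "sub-0880146": [3, 3, 1, 1],
    "sub-0880044": [3, 3, 3, 3],
    "sub-0880043": [3, 1, 3, 3],
    "sub-0880138": [3, 3, 2, 3],
    "sub-0880061": [3, 3, 2, 2],
}

def total_abs_difference(values):
    """Assumed measure: sum of |a - b| over every pair of available ratings."""
    present = [v for v in values if v is not None]
    return sum(abs(a - b) for a, b in combinations(present, 2))

scores = {pid: total_abs_difference(vals) for pid, vals in ratings.items()}

# Smallest score (highest agreement) first, as in the list above.
ordered = sorted(scores, key=scores.get)
print(ordered)
```

Because missing ratings are skipped, a participant with only 3 ratings contributes fewer pairs and so tends to score lower. This may be the effect the note about unadjusted missing Hclust ratings refers to.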

In total, we have perfect agreement in only 37 out of 99 instances. As before, the ratings based on eddy QC (Navona, average=2.03; John’s calculation, average=2.23) are lower, i.e., more conservative and more likely to rate FAIL, than those based on visual review of volumes (Hajer, average=2.67) or on review of the tensor residuals (Grace, average=2.76). However, Grace may appear the most liberal in part because she rated on a binary PASS / FAIL scale.

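We can calculate inter-rater agreement for our 4 raters using Fleiss’s Kappa. Note that it treats the ratings as nominal categories and is computed only on the 77 participants rated by all 4 raters: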

##  Fleiss' Kappa for m Raters
## 
##  Subjects = 77 
##    Raters = 4 
##     Kappa = 0.277 
## 
##         z = 8.07 
##   p-value = 6.66e-16
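
For readers who want to see what the statistic measures, here is a minimal Python sketch of the textbook Fleiss' kappa formula. It is not the R call that produced the output above. The five example rows are copied from the table later in this report, and the function name is illustrative.

```python
from collections import Counter

def fleiss_kappa(rows):
    """Fleiss' kappa for complete rows of nominal ratings (one row per subject)."""
    n_subjects = len(rows)
    n_raters = len(rows[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for row in rows:
        counts = Counter(row)
        category_totals.update(counts)
        # Share of rater pairs that agree on this subject.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))
    p_observed = agreement_sum / n_subjects
    total_ratings = n_subjects * n_raters
    # Chance agreement from the pooled category proportions.
    p_expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    return (p_observed - p_expected) / (1 - p_expected)

# Hajer, Grace, Navona, Hclust for five participants from the table.
example = [
    [3, 3, 3, 3],  # sub-0880044
    [3, 3, 2, 2],  # sub-0880061
    [3, 3, 1, 1],  # sub-0880146
    [1, 1, 1, 1],  # sub-1050015
    [3, 1, 1, 1],  # sub-1050250
]
print(round(fleiss_kappa(example), 3))
```

The sketch requires every row to have the same number of ratings, which is why participants with a missing rating drop out of this statistic.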

Compare this to Krippendorff’s alpha for ordinal ratings of two or more raters, which accommodates missing ratings and therefore uses all 99 participants:

##  Krippendorff's alpha
## 
##  Subjects = 99 
##    Raters = 4 
##     alpha = 0.363
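
The sketch below shows in Python how an ordinal Krippendorff's alpha can be computed from a coincidence matrix while tolerating missing ratings. It follows the standard definition and is not the R routine that produced the output above. The function name and the use of `None` for a missing rating are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_ordinal(rows, categories):
    """Ordinal Krippendorff's alpha; `categories` is the ordered list of levels."""
    coincidences = Counter()
    for row in rows:
        present = [v for v in row if v is not None]
        m = len(present)
        if m < 2:
            continue  # a unit with fewer than two ratings cannot be paired
        # Every ordered pair of ratings from different raters, weighted 1/(m-1).
        for a, b in permutations(present, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    marginals = Counter()
    for (a, _), weight in coincidences.items():
        marginals[a] += weight
    n_total = sum(marginals.values())

    def delta_sq(c, k):
        # Ordinal distance: ratings lying between the two ranks, squared.
        lo, hi = sorted((categories.index(c), categories.index(k)))
        between = sum(marginals[categories[g]] for g in range(lo, hi + 1))
        return (between - (marginals[c] + marginals[k]) / 2.0) ** 2

    observed = sum(w * delta_sq(a, b) for (a, b), w in coincidences.items())
    expected = sum(marginals[c] * marginals[k] * delta_sq(c, k)
                   for c in categories for k in categories)
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1))
    return 1.0 - (n_total - 1) * observed / expected

# Three complete rows from the table plus one hypothetical row with a missing Hclust rating.
example = [[3, 3, 3, 3], [3, 3, 2, 2], [1, 1, 1, 1], [3, 3, 2, None]]
print(round(krippendorff_alpha_ordinal(example, [1, 2, 3]), 3))
```

Unlike the Fleiss' kappa calculation, a row with one missing rating still contributes its remaining pairs here.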

We can also calculate intra-class correlation (ICC):

## Call: ICC(x = df[, c("Hajer", "Grace", "Navona", "Hclust")])
## 
## Intraclass correlation coefficients 
##                          type  ICC   F df1 df2       p lower bound upper bound
## Single_raters_absolute   ICC1 0.44 4.2  98 297 1.5e-21        0.34        0.55
## Single_random_raters     ICC2 0.47 6.2  98 294 1.4e-34        0.29        0.61
## Single_fixed_raters      ICC3 0.57 6.2  98 294 1.4e-34        0.47        0.66
## Average_raters_absolute ICC1k 0.76 4.2  98 297 1.5e-21        0.67        0.83
## Average_random_raters   ICC2k 0.78 6.2  98 294 1.4e-34        0.62        0.86
## Average_fixed_raters    ICC3k 0.84 6.2  98 294 1.4e-34        0.78        0.89
## 
##  Number of subjects = 99     Number of Judges =  4
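
As a cross-check, the ICC1 and ICC3 rows can be recovered from the reported F statistics using the standard relations ICC = (F - 1) / (F + k - 1) for a single rater and (F - 1) / F for the average of k raters. The Python sketch below applies them with k = 4. Because the F values in the output are rounded, agreement to two decimals is the most that can be expected. ICC2 is omitted because it also needs the between-rater mean square, which the output does not show. The function names are illustrative.

```python
def single_rater_icc(f_stat, n_raters):
    """ICC for one rater, from F = MS_between_subjects / MS_error."""
    return (f_stat - 1.0) / (f_stat + n_raters - 1.0)

def average_rater_icc(f_stat):
    """ICC for the mean of all raters, from the same F statistic."""
    return (f_stat - 1.0) / f_stat

K = 4  # number of raters (judges) in the output above

icc1 = single_rater_icc(4.2, K)    # one-way model, F = 4.2
icc1k = average_rater_icc(4.2)
icc3 = single_rater_icc(6.2, K)    # two-way fixed-raters model, F = 6.2
icc3k = average_rater_icc(6.2)

print(round(icc1, 2), round(icc1k, 2), round(icc3, 2), round(icc3k, 2))
# -> 0.44 0.76 0.57 0.84
```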

Exploratory cut-off on basis of extracted eddy QC metrics


Below is a table that summarizes all raters’ ratings for the 77 participants who have an Hclust rating, alongside 5 quantitative metrics extracted from the eddy QUAD reports: (1) percent outliers, (2) average signal-to-noise ratio, (3) average contrast-to-noise ratio, (4) average relative motion, and (5) noise. Note: Absolute motion has been replaced by noise since the last report. We have also included a count of problematic metrics (Multiple_Issues) and a summary value (Weighted_Score) estimating PASS, CAUTION, or FAIL.

The thresholds for these variables were set as follows:

Metric            Threshold for FAIL
Percent_Outliers  > 0.2
Average_SNR       < 20
Average_CNR       < 1.4
Rel_Motion        > 1 mm
Noise             > 0.1
Multiple_Issues   simple count
Weighted_Score    <=2 PASS | 3 CAUTION | >3 FAIL

The thresholds for the first 5 (eddy-extracted) metrics were set by John based on review of this sample. They can be adjusted on the basis of a larger subset or after discussion.
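
The Multiple_Issues count can be sketched in Python as one FAIL check per metric followed by a simple sum. The four example rows are copied from the table in this report. The dictionary layout and function name are illustrative, and Weighted_Score is not reproduced because its weights are not given here.

```python
# One FAIL test per metric, using the thresholds listed above.
FAIL_CHECKS = {
    "Percent_Outliers": lambda v: v > 0.2,
    "Average_SNR": lambda v: v < 20,
    "Average_CNR": lambda v: v < 1.4,
    "Rel_Motion": lambda v: v > 1.0,   # mm
    "Noise": lambda v: v > 0.1,
}

def count_issues(metrics):
    """Number of metrics that cross their FAIL threshold."""
    return sum(1 for name, is_fail in FAIL_CHECKS.items() if is_fail(metrics[name]))

participants = {
    "sub-0880044": {"Percent_Outliers": 0.083, "Average_SNR": 28.3,
                    "Average_CNR": 1.60, "Rel_Motion": 0.31, "Noise": 0.045},
    "sub-0880061": {"Percent_Outliers": 0.500, "Average_SNR": 20.1,
                    "Average_CNR": 1.34, "Rel_Motion": 0.38, "Noise": 0.059},
    "sub-0880146": {"Percent_Outliers": 1.607, "Average_SNR": 15.6,
                    "Average_CNR": 1.04, "Rel_Motion": 0.64, "Noise": 0.103},
    "sub-1050019": {"Percent_Outliers": 2.893, "Average_SNR": 9.4,
                    "Average_CNR": 0.65, "Rel_Motion": 1.03, "Noise": 0.204},
}

issue_counts = {pid: count_issues(m) for pid, m in participants.items()}
print(issue_counts)
```

The table values are rounded for display, so a participant sitting exactly on a threshold (for example, a noise value shown as 0.100) may not reproduce from the printed numbers alone.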

The purpose of this visualization is to discuss which of these metrics could/should be used to inform our QC ratings, and to see how each aligns with our 4 raters’ ratings, which were done independently of their review.

Participant Hajer Grace Navona Hclust Percent_Outliers Average_SNR Average_CNR Rel_Motion Noise Multiple_Issues Weighted_Score
sub-0880043 3 1 3 3 0.024 20.1 1.37 0.21 0.070 1 Pass
sub-0880044 3 3 3 3 0.083 28.3 1.60 0.31 0.045 0 Pass
sub-0880046 3 3 3 3 0.000 20.3 1.46 0.30 0.065 0 Pass
sub-0880050 3 3 3 3 0.179 23.9 1.45 0.39 0.063 0 Pass
sub-0880061 3 3 2 2 0.500 20.1 1.34 0.38 0.059 2 Caution
sub-0880112 3 3 2 2 0.512 21.2 1.21 0.35 0.083 2 Caution
sub-0880138 3 3 2 3 0.095 26.0 1.48 0.24 0.038 0 Pass
sub-0880146 3 3 1 1 1.607 15.6 1.04 0.64 0.103 4 Fail
sub-0880155 3 3 3 3 0.405 28.7 1.58 0.26 0.027 1 Pass
sub-0880157 3 3 3 3 0.107 26.6 1.60 0.38 0.048 0 Pass
sub-0880221 3 3 3 3 0.452 21.6 1.61 0.25 0.045 1 Pass
sub-0880279 3 3 3 3 0.083 25.3 1.57 0.30 0.056 0 Pass
sub-0880293 3 3 3 3 0.083 24.0 1.56 0.30 0.085 0 Pass
sub-0880303 3 3 2 3 0.083 18.1 1.62 0.24 0.041 1 Pass
sub-0880308 3 3 2 3 1.262 30.9 1.81 0.31 0.029 1 Pass
sub-0880353 3 3 3 3 0.143 26.3 1.66 0.32 0.042 0 Pass
sub-0880369 3 3 2 2 0.667 20.8 1.33 0.43 0.112 3 Fail
sub-0880372 3 3 2 3 0.833 27.0 1.65 0.20 0.039 1 Pass
sub-0880396 3 3 2 3 0.298 24.8 1.44 0.22 0.052 1 Pass
sub-0880397 1 3 1 1 2.214 12.9 0.89 0.66 0.100 4 Fail
sub-0880418 3 3 3 3 0.024 22.0 1.49 0.24 0.059 0 Pass
sub-0880419 3 3 3 3 0.179 23.0 1.44 0.41 0.057 0 Pass
sub-0880428 3 3 2 2 0.048 21.7 1.34 0.37 0.109 2 Caution
sub-0880464 3 3 3 3 0.143 25.0 1.57 0.37 0.054 0 Pass
sub-0880473 2 1 1 1 1.976 9.8 0.45 0.42 0.063 3 Fail
sub-0885005 3 3 1 2 2.202 21.9 1.33 0.38 0.075 2 Caution
sub-1050007 3 3 1 2 0.524 20.6 1.24 0.40 0.090 2 Caution
sub-1050015 1 1 1 1 2.310 10.7 1.04 0.67 0.120 4 Fail
sub-1050019 1 1 1 1 2.893 9.4 0.65 1.03 0.204 5 Fail
sub-1050029 2 3 2 2 1.381 15.9 1.19 0.44 0.096 3 Fail
sub-1050032 2 3 1 1 1.690 7.8 0.76 0.65 0.149 4 Fail
sub-1050054 2 3 1 1 1.976 9.5 1.03 0.94 0.113 4 Fail
sub-1050055 3 3 1 2 0.667 21.4 1.26 0.51 0.103 3 Fail
sub-1050065 3 3 3 2 0.226 20.2 1.25 0.32 0.086 2 Caution
sub-1050089 3 3 3 2 0.190 16.5 1.24 0.29 0.063 2 Caution
sub-1050090 3 3 3 3 0.202 26.1 1.59 0.22 0.038 1 Pass
sub-1050092 3 3 1 2 0.452 18.5 1.24 0.33 0.066 3 Fail
sub-1050105 1 3 1 1 1.298 14.3 0.79 0.92 0.102 4 Fail
sub-1050131 3 3 1 2 1.619 23.0 1.26 0.46 0.059 2 Caution
sub-1050134 3 3 3 2 0.071 17.0 0.89 0.25 0.087 2 Caution
sub-1050135 3 3 1 1 0.655 14.1 0.95 0.57 0.126 4 Fail
sub-1050136 3 3 2 2 0.881 21.5 1.29 0.37 0.072 2 Caution
sub-1050172 3 3 2 3 0.238 26.2 1.47 0.29 0.067 1 Pass
sub-1050178 3 3 2 3 0.429 21.6 1.31 0.31 0.037 2 Caution
sub-1050179 2 3 1 1 4.274 12.1 1.04 0.52 0.066 3 Fail
sub-1050188 3 3 1 2 1.786 17.5 1.27 0.53 0.043 3 Fail
sub-1050194 2 3 1 1 0.536 17.6 0.98 0.67 0.134 4 Fail
sub-1050195 3 3 2 2 0.524 19.5 1.11 0.44 0.131 4 Fail
sub-1050218 3 3 2 2 0.500 20.6 1.32 0.31 0.070 2 Caution
sub-1050219 3 3 3 3 0.083 26.8 1.61 0.30 0.045 0 Pass
sub-1050221 3 3 3 3 0.155 28.8 1.56 0.22 0.041 0 Pass
sub-1050224 3 3 2 3 0.464 28.7 1.79 0.22 0.035 1 Pass
sub-1050229 3 3 2 2 0.464 22.5 1.28 0.29 0.080 2 Caution
sub-1050235 1 3 1 2 0.917 15.0 1.08 0.41 0.062 3 Fail
sub-1050239 3 3 3 3 0.143 32.4 1.80 0.23 0.031 0 Pass
sub-1050250 3 1 1 1 2.167 15.5 1.05 0.55 0.174 4 Fail
sub-1050253 1 1 1 1 5.500 13.3 0.85 0.64 0.098 3 Fail
sub-1050254 3 3 2 3 0.321 23.8 1.50 0.34 0.063 1 Pass
sub-1050265 3 3 2 3 0.560 23.4 1.63 0.29 0.058 1 Pass
sub-1050336 3 3 2 3 0.250 24.0 1.44 0.33 0.090 1 Pass
sub-1050341 3 3 2 3 0.952 26.2 1.57 0.35 0.069 1 Pass
sub-1050349 1 1 1 1 2.583 8.2 0.53 0.86 0.150 4 Fail
sub-1050353 1 1 1 1 5.702 12.8 0.65 1.07 0.115 5 Fail
sub-1050363 3 3 3 3 0.107 25.5 1.64 0.27 0.023 0 Pass
sub-1050369 1 1 1 1 2.012 12.2 0.86 0.68 0.161 4 Fail
sub-1050378 1 1 1 1 6.060 10.6 0.84 0.70 0.096 3 Fail
sub-1050383 3 3 1 2 0.905 21.5 1.37 0.36 0.060 2 Caution
sub-1050399 3 3 3 3 0.048 22.5 1.42 0.33 0.035 0 Pass
sub-1050402 3 1 2 2 0.595 16.1 1.02 0.41 0.086 3 Fail
sub-1050403 3 3 2 2 0.988 15.2 0.96 0.35 0.107 4 Fail
sub-1050429 2 3 1 1 3.345 16.7 1.09 0.58 0.121 4 Fail
sub-1050431 3 3 3 3 0.095 23.7 1.47 0.23 0.074 0 Pass
sub-1050432 3 3 3 3 0.095 25.8 1.60 0.22 0.076 0 Pass
sub-1050440 3 3 3 3 0.012 41.2 2.06 0.25 0.012 0 Pass
sub-1050509 3 3 2 2 2.190 18.9 1.23 0.47 0.039 3 Fail
sub-1050528 3 3 2 3 0.833 23.0 1.87 0.33 0.025 1 Pass
sub-1130160 3 3 3 3 0.774 28.8 1.99 0.16 0.016 1 Pass
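
One simple way to see how the summary label aligns with a given rater is to cross-tabulate the two columns. The Python sketch below does this for Navona's ratings on six rows copied from the table above. Mapping Fail/Caution/Pass onto 1/2/3 is an assumption made here for comparison only.

```python
from collections import Counter

# (participant, Navona rating, Weighted_Score label), copied from the table.
rows = [
    ("sub-0880044", 3, "Pass"),
    ("sub-0880061", 2, "Caution"),
    ("sub-0880146", 1, "Fail"),
    ("sub-0880308", 2, "Pass"),
    ("sub-1050055", 1, "Fail"),
    ("sub-1050065", 3, "Caution"),
]

# Assumed correspondence between the summary label and the 1-3 rating scale.
LABEL_TO_RATING = {"Fail": 1, "Caution": 2, "Pass": 3}

# Count how often each (rating, label) combination occurs.
crosstab = Counter((rating, label) for _, rating, label in rows)

# Rows where the rater and the metric-based label agree under that mapping.
matches = sum(1 for _, rating, label in rows if LABEL_TO_RATING[label] == rating)

for (rating, label), count in sorted(crosstab.items()):
    print(rating, label, count)
print("matching rows:", matches, "of", len(rows))
```

Running the same tabulation over all 77 rows, once per rater, would show which raters the metric-based label tracks most closely.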