Written: 2019-05-29
Last run: 2019-05-29
This analysis reviews inter-rater agreement of DWI QC ratings for the POND dataset (N=99). Ratings were based on volume-to-volume visual review (Hajer), dtifit QC outputs (Grace; binary ratings), eddy QUAD QC reports (Navona), and a multivariate calculation using data from eddy QUAD and CSD residuals (John; N=77; reported below as Hclust).
This plot displays ratings (y axis) and absolute difference in rating (colour). Here, 1=FAIL, 2=INDETERMINATE, 3=PASS.
For visual review, the participants are listed below in order of absolute difference in rating, from smallest (high agreement) to largest (low agreement). Note that this list does not adjust for the 22 missing Hclust ratings.
## [1] "sub-0880044" "sub-0880046" "sub-0880050" "sub-0880100" "sub-0880155"
## [6] "sub-0880157" "sub-0880221" "sub-0880279" "sub-0880293" "sub-0880353"
## [11] "sub-0880418" "sub-0880419" "sub-0880464" "sub-0880494" "sub-0880533"
## [16] "sub-0880601" "sub-0880664" "sub-0880665" "sub-1050015" "sub-1050019"
## [21] "sub-1050090" "sub-1050219" "sub-1050221" "sub-1050239" "sub-1050253"
## [26] "sub-1050288" "sub-1050307" "sub-1050349" "sub-1050353" "sub-1050363"
## [31] "sub-1050369" "sub-1050378" "sub-1050399" "sub-1050431" "sub-1050432"
## [36] "sub-1050440" "sub-1130160" "sub-0880085" "sub-0880096" "sub-0880107"
## [41] "sub-0880254" "sub-0880265" "sub-0880624" "sub-0880669" "sub-0880692"
## [46] "sub-0880693" "sub-0880738" "sub-0880138" "sub-0880303" "sub-0880308"
## [51] "sub-0880372" "sub-0880396" "sub-0880473" "sub-1050029" "sub-1050065"
## [56] "sub-1050089" "sub-1050134" "sub-1050172" "sub-1050178" "sub-1050224"
## [61] "sub-1050254" "sub-1050265" "sub-1050336" "sub-1050341" "sub-1050528"
## [66] "sub-0880061" "sub-0880081" "sub-0880112" "sub-0880369" "sub-0880428"
## [71] "sub-0880558" "sub-0880703" "sub-0880742" "sub-1050136" "sub-1050195"
## [76] "sub-1050218" "sub-1050229" "sub-1050403" "sub-1050509" "sub-0880043"
## [81] "sub-0880397" "sub-1050105" "sub-1050250" "sub-1050402" "sub-0885005"
## [86] "sub-1050007" "sub-1050032" "sub-1050054" "sub-1050055" "sub-1050092"
## [91] "sub-1050131" "sub-1050179" "sub-1050188" "sub-1050194" "sub-1050235"
## [96] "sub-1050383" "sub-1050429" "sub-0880146" "sub-1050135"
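The report does not state how the absolute difference in rating was computed. As a minimal sketch, one measure that reproduces the ordering of the fully rated participants above is the sum of absolute differences over all rater pairs, ignoring missing ratings. This is an assumption, and the function name `pairwise_abs_diff` is hypothetical; the four rows are copied from the ratings table later in this report.

```r
# Ratings for four participants, copied from the ratings table in this report
# (1 = FAIL, 2 = INDETERMINATE, 3 = PASS).
ratings <- rbind(
  "sub-0880044" = c(3, 3, 3, 3),
  "sub-0880138" = c(3, 3, 2, 3),
  "sub-0880043" = c(3, 1, 3, 3),
  "sub-0880146" = c(3, 3, 1, 1)
)
colnames(ratings) <- c("Hajer", "Grace", "Navona", "Hclust")

# Assumed disagreement measure: sum of |a - b| over all rater pairs,
# dropping missing ratings first (hypothetical helper, not the report's code).
pairwise_abs_diff <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 2) return(NA_real_)
  sum(abs(outer(x, x, "-"))) / 2  # each pair appears twice in the outer product
}

disagreement <- apply(ratings, 1, pairwise_abs_diff)

# Participants ordered from high agreement (small) to low agreement (large).
ordered_ids <- names(sort(disagreement))
print(disagreement)
print(ordered_ids)
```

Because missing ratings are dropped rather than imputed, a participant with only three ratings has fewer pairs and therefore tends to score lower, which is one way the missing Hclust ratings could affect the ordering.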
In total, we have perfect agreement in only 37 of 99 instances. As before, the ratings based on eddy QC (Navona, average=2.03; John’s calculation, average=2.23) are lower, i.e., more conservative and more likely to be FAIL, than those based on visual review of volumes (Hajer, average=2.67) or review of the tensor residuals (Grace, average=2.76). However, Grace may appear the most liberal in part because she rated on a binary PASS / FAIL scale.
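The summary figures above (count of perfect agreement, average rating per rater) can be sketched as follows. This is an illustration on four rows copied from the ratings table in this report, not the code used to produce the reported values; the treatment of missing ratings (dropped before comparison) is an assumption.

```r
# Four participants copied from the ratings table (1 = FAIL, 2 = INDETERMINATE, 3 = PASS).
ratings <- rbind(
  "sub-0880044" = c(3, 3, 3, 3),
  "sub-0880061" = c(3, 3, 2, 2),
  "sub-0880397" = c(1, 3, 1, 1),
  "sub-1050015" = c(1, 1, 1, 1)
)
colnames(ratings) <- c("Hajer", "Grace", "Navona", "Hclust")

# Average rating per rater; a lower average means a more conservative rater.
rater_means <- colMeans(ratings, na.rm = TRUE)

# Perfect agreement: every available rating for the participant is identical
# (assumption: missing ratings are ignored rather than counted as disagreement).
perfect <- apply(ratings, 1, function(r) length(unique(r[!is.na(r)])) == 1)

print(rater_means)
print(sum(perfect))
```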
We can calculate inter-rater agreement for our 4 raters using Fleiss’ kappa. Only the 77 participants with ratings from all 4 raters are included:
## Fleiss' Kappa for m Raters
##
## Subjects = 77
## Raters = 4
## Kappa = 0.277
##
## z = 8.07
## p-value = 6.66e-16
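For reference, the kappa statistic itself can be computed from first principles. The sketch below is a base R illustration of the standard Fleiss' kappa formula, not the call that produced the output above (whose layout resembles that of `kappam.fleiss` in the irr package, though the report does not say). The function name `fleiss_kappa` is hypothetical, and dropping incomplete rows is an assumption that matches Subjects = 77. The six example rows are copied from the ratings table in this report.

```r
# Fleiss' kappa for a participants x raters matrix of categorical ratings.
# Hypothetical helper; rows with any missing rating are dropped (assumption).
fleiss_kappa <- function(ratings) {
  ratings <- as.matrix(ratings)
  ratings <- ratings[stats::complete.cases(ratings), , drop = FALSE]
  n_subjects <- nrow(ratings)
  n_raters <- ncol(ratings)
  categories <- sort(unique(as.vector(ratings)))

  # counts[i, j] = number of raters who gave participant i category j
  counts <- matrix(0, n_subjects, length(categories))
  for (j in seq_along(categories)) {
    counts[, j] <- rowSums(ratings == categories[j])
  }

  # Observed agreement per participant, then averaged
  p_subject <- (rowSums(counts^2) - n_raters) / (n_raters * (n_raters - 1))
  p_observed <- mean(p_subject)

  # Chance agreement from the overall category proportions
  p_category <- colSums(counts) / (n_subjects * n_raters)
  p_expected <- sum(p_category^2)

  (p_observed - p_expected) / (1 - p_expected)
}

example <- rbind(
  "sub-0880044" = c(3, 3, 3, 3),
  "sub-0880061" = c(3, 3, 2, 2),
  "sub-0880146" = c(3, 3, 1, 1),
  "sub-0880397" = c(1, 3, 1, 1),
  "sub-0880473" = c(2, 1, 1, 1),
  "sub-1050015" = c(1, 1, 1, 1)
)
kappa_example <- fleiss_kappa(example)
print(round(kappa_example, 3))  # 0.345 for these six rows only
```

Fleiss' kappa treats the three ratings as unordered categories, so a FAIL versus PASS disagreement counts the same as FAIL versus INDETERMINATE.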
Compare this to Krippendorff’s alpha for ordinal ratings, which accommodates two or more raters and tolerates missing ratings, so all 99 participants are included:
## Krippendorff's alpha
##
## Subjects = 99
## Raters = 4
## alpha = 0.363
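To make the calculation concrete, here is a base R sketch of Krippendorff's alpha with the ordinal distance metric, built from the coincidence matrix. It is an illustration of the standard definition under stated assumptions, not the code behind the output above; the function name `kripp_alpha_ordinal` is hypothetical, and the example rows with a missing rating are hypothetical as well.

```r
# Krippendorff's alpha (ordinal metric) for a participants x raters matrix.
# Hypothetical helper; participants with fewer than 2 ratings contribute nothing.
kripp_alpha_ordinal <- function(ratings) {
  ratings <- as.matrix(ratings)
  values <- sort(unique(ratings[!is.na(ratings)]))
  k <- length(values)

  # Coincidence matrix: each participant adds its ordered value pairs,
  # weighted by 1 / (number of ratings - 1).
  coincidence <- matrix(0, k, k)
  for (i in seq_len(nrow(ratings))) {
    r <- match(ratings[i, !is.na(ratings[i, ])], values)
    m <- length(r)
    if (m < 2) next
    cnt <- tabulate(r, nbins = k)
    coincidence <- coincidence + (outer(cnt, cnt) - diag(cnt, nrow = k)) / (m - 1)
  }

  n_c <- rowSums(coincidence)  # marginal frequency of each value
  n <- sum(n_c)

  # Ordinal distance: squared count of ranks lying between two values
  delta2 <- matrix(0, k, k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      lo <- min(a, b)
      hi <- max(a, b)
      delta2[a, b] <- (sum(n_c[lo:hi]) - (n_c[lo] + n_c[hi]) / 2)^2
    }
  }

  # alpha = 1 - observed disagreement / expected disagreement
  1 - (n - 1) * sum(coincidence * delta2) / sum(outer(n_c, n_c) * delta2)
}

# First three rows copied from the ratings table; the last row is hypothetical
# and shows that a missing rating (NA) is tolerated.
example <- rbind(
  c(3, 3, 3, 3),
  c(3, 3, 1, 1),
  c(1, 1, 1, 1),
  c(3, 3, 2, NA)
)
print(kripp_alpha_ordinal(example))
```

Unlike Fleiss' kappa, the ordinal metric penalises a FAIL versus PASS disagreement more heavily than a disagreement between adjacent categories.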
We can also calculate intra-class correlation (ICC):
## Call: ICC(x = df[, c("Hajer", "Grace", "Navona", "Hclust")])
##
## Intraclass correlation coefficients
##                         type   ICC   F df1 df2       p lower bound upper bound
## Single_raters_absolute  ICC1  0.44 4.2  98 297 1.5e-21        0.34        0.55
## Single_random_raters    ICC2  0.47 6.2  98 294 1.4e-34        0.29        0.61
## Single_fixed_raters     ICC3  0.57 6.2  98 294 1.4e-34        0.47        0.66
## Average_raters_absolute ICC1k 0.76 4.2  98 297 1.5e-21        0.67        0.83
## Average_random_raters   ICC2k 0.78 6.2  98 294 1.4e-34        0.62        0.86
## Average_fixed_raters    ICC3k 0.84 6.2  98 294 1.4e-34        0.78        0.89
##
## Number of subjects = 99 Number of Judges = 4
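The six coefficients follow the Shrout and Fleiss definitions, which can be written in terms of ANOVA mean squares. The base R sketch below illustrates those formulas for complete data only; it is not the `ICC()` call shown above, and the function name `icc_shrout_fleiss` is hypothetical. Dropping incomplete rows is an assumption of this sketch, so it would not reproduce results that use all 99 participants.

```r
# Shrout & Fleiss intraclass correlations from two-way ANOVA mean squares.
# Hypothetical helper for a participants x raters matrix; complete rows only.
icc_shrout_fleiss <- function(ratings) {
  x <- as.matrix(ratings)
  x <- x[stats::complete.cases(x), , drop = FALSE]
  n <- nrow(x)  # participants
  k <- ncol(x)  # raters
  grand <- mean(x)

  ss_total <- sum((x - grand)^2)
  ss_subjects <- k * sum((rowMeans(x) - grand)^2)
  ss_raters <- n * sum((colMeans(x) - grand)^2)

  ms_subjects <- ss_subjects / (n - 1)
  ms_within <- (ss_total - ss_subjects) / (n * (k - 1))
  ms_raters <- ss_raters / (k - 1)
  ms_error <- (ss_total - ss_subjects - ss_raters) / ((n - 1) * (k - 1))

  c(
    ICC1  = (ms_subjects - ms_within) / (ms_subjects + (k - 1) * ms_within),
    ICC2  = (ms_subjects - ms_error) /
      (ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n),
    ICC3  = (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error),
    ICC1k = (ms_subjects - ms_within) / ms_subjects,
    ICC2k = (ms_subjects - ms_error) / (ms_subjects + (ms_raters - ms_error) / n),
    ICC3k = (ms_subjects - ms_error) / ms_subjects
  )
}

# Six participants copied from the ratings table in this report.
example <- rbind(
  c(3, 3, 3, 3),
  c(3, 3, 2, 2),
  c(3, 3, 1, 1),
  c(1, 3, 1, 1),
  c(2, 1, 1, 1),
  c(1, 1, 1, 1)
)
colnames(example) <- c("Hajer", "Grace", "Navona", "Hclust")
print(round(icc_shrout_fleiss(example), 2))
```

ICC3 ignores systematic differences in rater means, whereas ICC1 and ICC2 count them as disagreement. That is consistent with ICC3 being the largest single-rater value in the output above, given that the raters differ in how conservative they are.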
Below is a table that visually summarizes all raters’ ratings for the 99 participants, alongside 5 quantitative metrics extracted from various eddy QUAD reports: (1) percent outliers, (2) average signal-to-noise ratio, (3) average contrast-to-noise ratio, (4) average relative motion, and (5) noise. Note: absolute motion has been replaced by noise since the last report. We have also included a summary value that counts the problematic metrics, and a resulting estimate of PASS, CAUTION, or FAIL.
The thresholds for these variables were set as follows:
Metric | Threshold for FAIL |
---|---|
Percent_Outliers | > 0.2 |
Average_SNR | < 20 |
Average_CNR | < 1.4 |
Rel_Motion | > 1 mm |
Noise | > 0.1 |
Multiple_Issues | simple count |
Weighted_Score | <= 2 PASS; 3 CAUTION; > 3 FAIL |
The thresholds for the 5 eddy-extracted metrics were set by John based on review of this sample. They can be adjusted after review of a larger subset or after discussion.
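As a sketch of how the thresholds translate into the Multiple_Issues count, the base R example below flags each metric and sums the flags per participant. The three rows are copied from the table in this report; the object names `qc` and `flags` are hypothetical. The Weighted_Score is not computed here because the report does not specify the weights.

```r
# Eddy QUAD metrics for three participants, copied from the table in this report.
qc <- data.frame(
  Participant      = c("sub-0880044", "sub-0880061", "sub-1050019"),
  Percent_Outliers = c(0.083, 0.500, 2.893),
  Average_SNR      = c(28.3, 20.1, 9.4),
  Average_CNR      = c(1.60, 1.34, 0.65),
  Rel_Motion       = c(0.31, 0.38, 1.03),
  Noise            = c(0.045, 0.059, 0.204)
)

# One logical column per metric: TRUE when the value is past its FAIL threshold.
flags <- cbind(
  Percent_Outliers = qc$Percent_Outliers > 0.2,
  Average_SNR      = qc$Average_SNR < 20,
  Average_CNR      = qc$Average_CNR < 1.4,
  Rel_Motion       = qc$Rel_Motion > 1,
  Noise            = qc$Noise > 0.1
)

# Simple count of problematic metrics per participant.
qc$Multiple_Issues <- rowSums(flags)
print(qc[, c("Participant", "Multiple_Issues")])
```

Keeping the thresholds in one place like this makes it straightforward to adjust them and re-count if they are revised.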
The purpose of this visualization is to discuss which of these metrics could/should be used to inform our QC ratings, and to see how each aligns with our 4 raters’ ratings, which were done independently of their review.
Participant | Hajer | Grace | Navona | Hclust | Percent_Outliers | Average_SNR | Average_CNR | Rel_Motion | Noise | Multiple_Issues | Weighted_Score |
---|---|---|---|---|---|---|---|---|---|---|---|
sub-0880043 | 3 | 1 | 3 | 3 | 0.024 | 20.1 | 1.37 | 0.21 | 0.070 | 1 | Pass |
sub-0880044 | 3 | 3 | 3 | 3 | 0.083 | 28.3 | 1.60 | 0.31 | 0.045 | 0 | Pass |
sub-0880046 | 3 | 3 | 3 | 3 | 0.000 | 20.3 | 1.46 | 0.30 | 0.065 | 0 | Pass |
sub-0880050 | 3 | 3 | 3 | 3 | 0.179 | 23.9 | 1.45 | 0.39 | 0.063 | 0 | Pass |
sub-0880061 | 3 | 3 | 2 | 2 | 0.500 | 20.1 | 1.34 | 0.38 | 0.059 | 2 | Caution |
sub-0880112 | 3 | 3 | 2 | 2 | 0.512 | 21.2 | 1.21 | 0.35 | 0.083 | 2 | Caution |
sub-0880138 | 3 | 3 | 2 | 3 | 0.095 | 26.0 | 1.48 | 0.24 | 0.038 | 0 | Pass |
sub-0880146 | 3 | 3 | 1 | 1 | 1.607 | 15.6 | 1.04 | 0.64 | 0.103 | 4 | Fail |
sub-0880155 | 3 | 3 | 3 | 3 | 0.405 | 28.7 | 1.58 | 0.26 | 0.027 | 1 | Pass |
sub-0880157 | 3 | 3 | 3 | 3 | 0.107 | 26.6 | 1.60 | 0.38 | 0.048 | 0 | Pass |
sub-0880221 | 3 | 3 | 3 | 3 | 0.452 | 21.6 | 1.61 | 0.25 | 0.045 | 1 | Pass |
sub-0880279 | 3 | 3 | 3 | 3 | 0.083 | 25.3 | 1.57 | 0.30 | 0.056 | 0 | Pass |
sub-0880293 | 3 | 3 | 3 | 3 | 0.083 | 24.0 | 1.56 | 0.30 | 0.085 | 0 | Pass |
sub-0880303 | 3 | 3 | 2 | 3 | 0.083 | 18.1 | 1.62 | 0.24 | 0.041 | 1 | Pass |
sub-0880308 | 3 | 3 | 2 | 3 | 1.262 | 30.9 | 1.81 | 0.31 | 0.029 | 1 | Pass |
sub-0880353 | 3 | 3 | 3 | 3 | 0.143 | 26.3 | 1.66 | 0.32 | 0.042 | 0 | Pass |
sub-0880369 | 3 | 3 | 2 | 2 | 0.667 | 20.8 | 1.33 | 0.43 | 0.112 | 3 | Fail |
sub-0880372 | 3 | 3 | 2 | 3 | 0.833 | 27.0 | 1.65 | 0.20 | 0.039 | 1 | Pass |
sub-0880396 | 3 | 3 | 2 | 3 | 0.298 | 24.8 | 1.44 | 0.22 | 0.052 | 1 | Pass |
sub-0880397 | 1 | 3 | 1 | 1 | 2.214 | 12.9 | 0.89 | 0.66 | 0.100 | 4 | Fail |
sub-0880418 | 3 | 3 | 3 | 3 | 0.024 | 22.0 | 1.49 | 0.24 | 0.059 | 0 | Pass |
sub-0880419 | 3 | 3 | 3 | 3 | 0.179 | 23.0 | 1.44 | 0.41 | 0.057 | 0 | Pass |
sub-0880428 | 3 | 3 | 2 | 2 | 0.048 | 21.7 | 1.34 | 0.37 | 0.109 | 2 | Caution |
sub-0880464 | 3 | 3 | 3 | 3 | 0.143 | 25.0 | 1.57 | 0.37 | 0.054 | 0 | Pass |
sub-0880473 | 2 | 1 | 1 | 1 | 1.976 | 9.8 | 0.45 | 0.42 | 0.063 | 3 | Fail |
sub-0885005 | 3 | 3 | 1 | 2 | 2.202 | 21.9 | 1.33 | 0.38 | 0.075 | 2 | Caution |
sub-1050007 | 3 | 3 | 1 | 2 | 0.524 | 20.6 | 1.24 | 0.40 | 0.090 | 2 | Caution |
sub-1050015 | 1 | 1 | 1 | 1 | 2.310 | 10.7 | 1.04 | 0.67 | 0.120 | 4 | Fail |
sub-1050019 | 1 | 1 | 1 | 1 | 2.893 | 9.4 | 0.65 | 1.03 | 0.204 | 5 | Fail |
sub-1050029 | 2 | 3 | 2 | 2 | 1.381 | 15.9 | 1.19 | 0.44 | 0.096 | 3 | Fail |
sub-1050032 | 2 | 3 | 1 | 1 | 1.690 | 7.8 | 0.76 | 0.65 | 0.149 | 4 | Fail |
sub-1050054 | 2 | 3 | 1 | 1 | 1.976 | 9.5 | 1.03 | 0.94 | 0.113 | 4 | Fail |
sub-1050055 | 3 | 3 | 1 | 2 | 0.667 | 21.4 | 1.26 | 0.51 | 0.103 | 3 | Fail |
sub-1050065 | 3 | 3 | 3 | 2 | 0.226 | 20.2 | 1.25 | 0.32 | 0.086 | 2 | Caution |
sub-1050089 | 3 | 3 | 3 | 2 | 0.190 | 16.5 | 1.24 | 0.29 | 0.063 | 2 | Caution |
sub-1050090 | 3 | 3 | 3 | 3 | 0.202 | 26.1 | 1.59 | 0.22 | 0.038 | 1 | Pass |
sub-1050092 | 3 | 3 | 1 | 2 | 0.452 | 18.5 | 1.24 | 0.33 | 0.066 | 3 | Fail |
sub-1050105 | 1 | 3 | 1 | 1 | 1.298 | 14.3 | 0.79 | 0.92 | 0.102 | 4 | Fail |
sub-1050131 | 3 | 3 | 1 | 2 | 1.619 | 23.0 | 1.26 | 0.46 | 0.059 | 2 | Caution |
sub-1050134 | 3 | 3 | 3 | 2 | 0.071 | 17.0 | 0.89 | 0.25 | 0.087 | 2 | Caution |
sub-1050135 | 3 | 3 | 1 | 1 | 0.655 | 14.1 | 0.95 | 0.57 | 0.126 | 4 | Fail |
sub-1050136 | 3 | 3 | 2 | 2 | 0.881 | 21.5 | 1.29 | 0.37 | 0.072 | 2 | Caution |
sub-1050172 | 3 | 3 | 2 | 3 | 0.238 | 26.2 | 1.47 | 0.29 | 0.067 | 1 | Pass |
sub-1050178 | 3 | 3 | 2 | 3 | 0.429 | 21.6 | 1.31 | 0.31 | 0.037 | 2 | Caution |
sub-1050179 | 2 | 3 | 1 | 1 | 4.274 | 12.1 | 1.04 | 0.52 | 0.066 | 3 | Fail |
sub-1050188 | 3 | 3 | 1 | 2 | 1.786 | 17.5 | 1.27 | 0.53 | 0.043 | 3 | Fail |
sub-1050194 | 2 | 3 | 1 | 1 | 0.536 | 17.6 | 0.98 | 0.67 | 0.134 | 4 | Fail |
sub-1050195 | 3 | 3 | 2 | 2 | 0.524 | 19.5 | 1.11 | 0.44 | 0.131 | 4 | Fail |
sub-1050218 | 3 | 3 | 2 | 2 | 0.500 | 20.6 | 1.32 | 0.31 | 0.070 | 2 | Caution |
sub-1050219 | 3 | 3 | 3 | 3 | 0.083 | 26.8 | 1.61 | 0.30 | 0.045 | 0 | Pass |
sub-1050221 | 3 | 3 | 3 | 3 | 0.155 | 28.8 | 1.56 | 0.22 | 0.041 | 0 | Pass |
sub-1050224 | 3 | 3 | 2 | 3 | 0.464 | 28.7 | 1.79 | 0.22 | 0.035 | 1 | Pass |
sub-1050229 | 3 | 3 | 2 | 2 | 0.464 | 22.5 | 1.28 | 0.29 | 0.080 | 2 | Caution |
sub-1050235 | 1 | 3 | 1 | 2 | 0.917 | 15.0 | 1.08 | 0.41 | 0.062 | 3 | Fail |
sub-1050239 | 3 | 3 | 3 | 3 | 0.143 | 32.4 | 1.80 | 0.23 | 0.031 | 0 | Pass |
sub-1050250 | 3 | 1 | 1 | 1 | 2.167 | 15.5 | 1.05 | 0.55 | 0.174 | 4 | Fail |
sub-1050253 | 1 | 1 | 1 | 1 | 5.500 | 13.3 | 0.85 | 0.64 | 0.098 | 3 | Fail |
sub-1050254 | 3 | 3 | 2 | 3 | 0.321 | 23.8 | 1.50 | 0.34 | 0.063 | 1 | Pass |
sub-1050265 | 3 | 3 | 2 | 3 | 0.560 | 23.4 | 1.63 | 0.29 | 0.058 | 1 | Pass |
sub-1050336 | 3 | 3 | 2 | 3 | 0.250 | 24.0 | 1.44 | 0.33 | 0.090 | 1 | Pass |
sub-1050341 | 3 | 3 | 2 | 3 | 0.952 | 26.2 | 1.57 | 0.35 | 0.069 | 1 | Pass |
sub-1050349 | 1 | 1 | 1 | 1 | 2.583 | 8.2 | 0.53 | 0.86 | 0.150 | 4 | Fail |
sub-1050353 | 1 | 1 | 1 | 1 | 5.702 | 12.8 | 0.65 | 1.07 | 0.115 | 5 | Fail |
sub-1050363 | 3 | 3 | 3 | 3 | 0.107 | 25.5 | 1.64 | 0.27 | 0.023 | 0 | Pass |
sub-1050369 | 1 | 1 | 1 | 1 | 2.012 | 12.2 | 0.86 | 0.68 | 0.161 | 4 | Fail |
sub-1050378 | 1 | 1 | 1 | 1 | 6.060 | 10.6 | 0.84 | 0.70 | 0.096 | 3 | Fail |
sub-1050383 | 3 | 3 | 1 | 2 | 0.905 | 21.5 | 1.37 | 0.36 | 0.060 | 2 | Caution |
sub-1050399 | 3 | 3 | 3 | 3 | 0.048 | 22.5 | 1.42 | 0.33 | 0.035 | 0 | Pass |
sub-1050402 | 3 | 1 | 2 | 2 | 0.595 | 16.1 | 1.02 | 0.41 | 0.086 | 3 | Fail |
sub-1050403 | 3 | 3 | 2 | 2 | 0.988 | 15.2 | 0.96 | 0.35 | 0.107 | 4 | Fail |
sub-1050429 | 2 | 3 | 1 | 1 | 3.345 | 16.7 | 1.09 | 0.58 | 0.121 | 4 | Fail |
sub-1050431 | 3 | 3 | 3 | 3 | 0.095 | 23.7 | 1.47 | 0.23 | 0.074 | 0 | Pass |
sub-1050432 | 3 | 3 | 3 | 3 | 0.095 | 25.8 | 1.60 | 0.22 | 0.076 | 0 | Pass |
sub-1050440 | 3 | 3 | 3 | 3 | 0.012 | 41.2 | 2.06 | 0.25 | 0.012 | 0 | Pass |
sub-1050509 | 3 | 3 | 2 | 2 | 2.190 | 18.9 | 1.23 | 0.47 | 0.039 | 3 | Fail |
sub-1050528 | 3 | 3 | 2 | 3 | 0.833 | 23.0 | 1.87 | 0.33 | 0.025 | 1 | Pass |
sub-1130160 | 3 | 3 | 3 | 3 | 0.774 | 28.8 | 1.99 | 0.16 | 0.016 | 1 | Pass |
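One way to examine how the summary value aligns with a given rater is a simple cross-tabulation. The base R sketch below does this for Navona's ratings on eight rows copied from the table above. Treating INDETERMINATE as equivalent to Caution is an assumption made only for this illustration, and the eight rows are not representative of the full sample.

```r
# Eight participants copied from the table above.
subset_qc <- data.frame(
  Participant = c("sub-0880044", "sub-0880061", "sub-0880146", "sub-0880369",
                  "sub-0880043", "sub-1050131", "sub-1050015", "sub-0880138"),
  Navona = c(3, 2, 1, 2, 3, 1, 1, 2),
  Weighted_Score = c("Pass", "Caution", "Fail", "Fail",
                     "Pass", "Caution", "Fail", "Pass")
)

# Put both ratings on the same ordered scale (assumption: INDETERMINATE ~ Caution).
navona <- factor(subset_qc$Navona, levels = 1:3,
                 labels = c("FAIL", "INDETERMINATE", "PASS"))
weighted <- factor(subset_qc$Weighted_Score,
                   levels = c("Fail", "Caution", "Pass"))

# Rows: Navona's rating; columns: metric-based summary.
agreement_table <- table(Navona = navona, Weighted_Score = weighted)
print(agreement_table)

# Proportion of participants where the two labels match exactly.
exact_match <- sum(diag(agreement_table)) / sum(agreement_table)
print(exact_match)
```

The same table can be built for each of the other raters, or for an individual metric after binning it at its threshold.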