DWI QC: inter-rater agreement review (N=99)

Written: 2019-05-29
Last run: 2019-05-29

Comparison of 4 raters using different rating bases

This analysis reviews inter-rater agreement of DWI QC ratings for the POND dataset (N=99). Each rater used a different basis: volume-to-volume visual review (Hajer), dtifit QC outputs (Grace; binary ratings), eddy QUAD QC reports (Navona), and a multivariate calculation using eddy QUAD data and CSD residuals (John, reported as Hclust below; N=77).

This plot displays ratings (y axis) and absolute difference in rating (colour). Here, 1=FAIL, 2=INDETERMINATE, 3=PASS.

For visual review, participants are listed below from the smallest absolute rating difference (high agreement) to the largest (low agreement). Note that this ordering does not adjust for the 22 participants with missing Hclust ratings.

##  [1] "sub-0880044" "sub-0880046" "sub-0880050" "sub-0880100" "sub-0880155"
##  [6] "sub-0880157" "sub-0880221" "sub-0880279" "sub-0880293" "sub-0880353"
## [11] "sub-0880418" "sub-0880419" "sub-0880464" "sub-0880494" "sub-0880533"
## [16] "sub-0880601" "sub-0880664" "sub-0880665" "sub-1050015" "sub-1050019"
## [21] "sub-1050090" "sub-1050219" "sub-1050221" "sub-1050239" "sub-1050253"
## [26] "sub-1050288" "sub-1050307" "sub-1050349" "sub-1050353" "sub-1050363"
## [31] "sub-1050369" "sub-1050378" "sub-1050399" "sub-1050431" "sub-1050432"
## [36] "sub-1050440" "sub-1130160" "sub-0880085" "sub-0880096" "sub-0880107"
## [41] "sub-0880254" "sub-0880265" "sub-0880624" "sub-0880669" "sub-0880692"
## [46] "sub-0880693" "sub-0880738" "sub-0880138" "sub-0880303" "sub-0880308"
## [51] "sub-0880372" "sub-0880396" "sub-0880473" "sub-1050029" "sub-1050065"
## [56] "sub-1050089" "sub-1050134" "sub-1050172" "sub-1050178" "sub-1050224"
## [61] "sub-1050254" "sub-1050265" "sub-1050336" "sub-1050341" "sub-1050528"
## [66] "sub-0880061" "sub-0880081" "sub-0880112" "sub-0880369" "sub-0880428"
## [71] "sub-0880558" "sub-0880703" "sub-0880742" "sub-1050136" "sub-1050195"
## [76] "sub-1050218" "sub-1050229" "sub-1050403" "sub-1050509" "sub-0880043"
## [81] "sub-0880397" "sub-1050105" "sub-1050250" "sub-1050402" "sub-0885005"
## [86] "sub-1050007" "sub-1050032" "sub-1050054" "sub-1050055" "sub-1050092"
## [91] "sub-1050131" "sub-1050179" "sub-1050188" "sub-1050194" "sub-1050235"
## [96] "sub-1050383" "sub-1050429" "sub-0880146" "sub-1050135"
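A minimal Python sketch of one way to produce this ordering, assuming the absolute rating difference is the sum of |difference| over all pairs of available ratings, with ties broken by participant ID. This definition is an assumption; it is consistent with the order above for the rows we checked against the table below. `pairwise_disagreement` is a hypothetical helper, and dropping missing ratings before pairing is also assumed.

```python
from itertools import combinations


def pairwise_disagreement(ratings):
    """Sum of |a - b| over all pairs of available ratings (assumed metric).

    Missing ratings (None) are dropped before pairing, which is an assumption.
    """
    present = [r for r in ratings if r is not None]
    return sum(abs(a - b) for a, b in combinations(present, 2))


# Excerpt of rows from the table below: Hajer, Grace, Navona, Hclust.
ratings = {
    "sub-1050135": [3, 3, 1, 1],
    "sub-0880044": [3, 3, 3, 3],
    "sub-0880043": [3, 1, 3, 3],
    "sub-0880061": [3, 3, 2, 2],
    "sub-0880138": [3, 3, 2, 3],
}

# Smallest score (high agreement) first; ties broken by participant ID.
order = sorted(ratings, key=lambda s: (pairwise_disagreement(ratings[s]), s))
print(order)
# → ['sub-0880044', 'sub-0880138', 'sub-0880061', 'sub-0880043', 'sub-1050135']
```

Summing over pairs, rather than taking the range, separates a single dissenting rater (score 3) from a two-against-two split (score 4), even though both have a range of 1.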

In total, we have perfect agreement among the available ratings for only 37 of the 99 participants. As before, ratings based on eddy QC (Navona, average=2.03; John’s Hclust calculation, average=2.23) tend to be lower, i.e., more conservative and more likely to be FAIL, than those based on visual review of volumes (Hajer, average=2.67) or review of the tensor residuals (Grace, average=2.76). However, Grace’s ratings may appear the most liberal in part because she used a binary PASS/FAIL scale with no INDETERMINATE option.
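To make the agreement count and per-rater averages concrete, here is a minimal Python sketch on a five-row excerpt of the table below. Perfect agreement is taken to mean that all available ratings for a participant are identical, and missing ratings (None) are skipped; both are assumptions, and `rater_means` and `perfect_agreement` are hypothetical helpers. On an excerpt the figures naturally differ from the full-sample averages quoted above.

```python
from statistics import mean

RATERS = ["Hajer", "Grace", "Navona", "Hclust"]

# Excerpt of rows from the table below (1=FAIL, 2=INDETERMINATE, 3=PASS).
rows = [
    [3, 1, 3, 3],  # sub-0880043
    [3, 3, 3, 3],  # sub-0880044
    [3, 3, 2, 2],  # sub-0880061
    [1, 1, 1, 1],  # sub-1050015
    [2, 3, 1, 1],  # sub-1050032
]


def rater_means(rows):
    """Mean rating per rater, skipping missing (None) ratings."""
    return {
        name: mean([r[j] for r in rows if r[j] is not None])
        for j, name in enumerate(RATERS)
    }


def perfect_agreement(rows):
    """Number of participants whose available ratings are all identical."""
    return sum(1 for r in rows if len({v for v in r if v is not None}) == 1)


print(rater_means(rows))
print(perfect_agreement(rows))  # → 2
```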

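We can calculate inter-rater agreement for our 4 raters using Fleiss’ kappa, which treats the three rating categories as nominal and uses only the 77 participants rated by all four raters: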

##  Fleiss' Kappa for m Raters
## 
##  Subjects = 77 
##    Raters = 4 
##     Kappa = 0.277 
## 
##         z = 8.07 
##   p-value = 6.66e-16
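For reference, a minimal Python sketch of the standard Fleiss’ kappa formula: the per-subject agreement P_i is averaged and compared with the chance agreement P_e implied by the overall category proportions. `fleiss_kappa` is a hypothetical helper, not the function that produced the output above, and it assumes complete rows, as in the 77-subject analysis.

```python
from collections import Counter


def fleiss_kappa(rows):
    """Fleiss' kappa for N subjects rated by the same n raters.

    Categories are treated as nominal; rows must contain no missing values.
    """
    N, n = len(rows), len(rows[0])
    counts = [Counter(r) for r in rows]  # n_ij: raters putting subject i in category j
    categories = {c for r in rows for c in r}
    # p_j: share of all N*n assignments that fall in category j.
    p = [sum(cnt[c] for cnt in counts) / (N * n) for c in categories]
    # P_i: proportion of agreeing rater pairs for subject i.
    P_i = [(sum(v * v for v in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    P_bar = sum(P_i) / N
    P_e = sum(pj * pj for pj in p)
    if P_e == 1:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    return (P_bar - P_e) / (1 - P_e)


# Excerpt of complete rows from the table below (Hajer, Grace, Navona, Hclust).
rows = [[3, 3, 3, 3], [3, 3, 2, 3], [3, 3, 1, 1], [1, 1, 1, 1], [2, 3, 1, 1]]
print(round(fleiss_kappa(rows), 3))
```

Because kappa is nominal, a PASS versus INDETERMINATE disagreement counts as much as PASS versus FAIL, which is one reason it can come out lower than the ordinal alpha.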

Compare this with Krippendorff’s alpha for ordinal data, which accommodates missing ratings and can therefore use all 99 participants:

##  Krippendorff's alpha
## 
##  Subjects = 99 
##    Raters = 4 
##     alpha = 0.363
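A minimal Python sketch of Krippendorff’s alpha with the ordinal difference metric, written from the standard coincidence-matrix definition rather than taken from the package that produced the output above. Missing ratings are encoded as None, and participants with fewer than two ratings are dropped, which is how alpha accommodates incomplete data. The helper name is hypothetical, and the second row in the example is a made-up participant with a missing Hclust rating.

```python
from itertools import permutations


def krippendorff_alpha_ordinal(units):
    """Krippendorff's alpha with the ordinal difference metric.

    units: one list of ratings per participant; None marks a missing rating.
    """
    # Only participants with at least two ratings contribute pairable values.
    units = [[v for v in u if v is not None] for u in units]
    units = [u for u in units if len(u) >= 2]
    values = sorted({v for u in units for v in u})

    # Coincidence matrix: each ordered pair within a unit adds 1 / (m_u - 1).
    o = {c: {k: 0.0 for k in values} for c in values}
    for u in units:
        for a, b in permutations(u, 2):
            o[a][b] += 1.0 / (len(u) - 1)
    n_c = {c: sum(o[c].values()) for c in values}
    n = sum(n_c.values())

    def delta2(c, k):
        # Ordinal metric: (sum of n_g for c <= g <= k, minus (n_c + n_k) / 2) squared.
        lo, hi = min(c, k), max(c, k)
        between = sum(n_c[g] for g in values if lo <= g <= hi)
        return (between - (n_c[lo] + n_c[hi]) / 2) ** 2

    observed = sum(o[c][k] * delta2(c, k) for c in values for k in values)
    expected = sum(n_c[c] * n_c[k] * delta2(c, k) for c in values for k in values)
    if expected == 0:
        raise ValueError("only one rating value observed; alpha is undefined")
    return 1 - (n - 1) * observed / expected


# Second row is hypothetical (missing Hclust) to show missing-data handling.
units = [[3, 3, 3, 3], [3, 3, 2, None], [3, 3, 1, 1], [1, 1, 1, 1], [2, 3, 1, 1]]
print(round(krippendorff_alpha_ordinal(units), 3))
```

Unlike kappa, the ordinal metric weights a PASS versus FAIL disagreement more heavily than a one-step disagreement involving INDETERMINATE.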

We can also calculate the intraclass correlation coefficient (ICC), which treats the 1 to 3 ratings as interval-scaled scores:

## Call: ICC(x = df[, c("Hajer", "Grace", "Navona", "Hclust")])
## 
## Intraclass correlation coefficients 
##                          type  ICC   F df1 df2       p lower bound
## Single_raters_absolute   ICC1 0.44 4.2  98 297 1.5e-21        0.34
## Single_random_raters     ICC2 0.47 6.2  98 294 1.4e-34        0.29
## Single_fixed_raters      ICC3 0.57 6.2  98 294 1.4e-34        0.47
## Average_raters_absolute ICC1k 0.76 4.2  98 297 1.5e-21        0.67
## Average_random_raters   ICC2k 0.78 6.2  98 294 1.4e-34        0.62
## Average_fixed_raters    ICC3k 0.84 6.2  98 294 1.4e-34        0.78
##                         upper bound
## Single_raters_absolute         0.55
## Single_random_raters           0.61
## Single_fixed_raters            0.66
## Average_raters_absolute        0.83
## Average_random_raters          0.86
## Average_fixed_raters           0.89
## 
##  Number of subjects = 99     Number of Judges =  4
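The six ICC forms in the table follow the Shrout and Fleiss definitions based on two-way ANOVA mean squares. Below is a minimal Python sketch, assuming complete data (`icc_shrout_fleiss` is a hypothetical helper). As a check on the formulas, ICC3 can be written as (F - 1)/(F + k - 1) with F = MSB/MSE, so F = 6.2 and k = 4 give 5.2/9.2 ≈ 0.57, and ICC3k = 1 - 1/F ≈ 0.84, matching the output above.

```python
def icc_shrout_fleiss(x):
    """Six ICC forms from two-way ANOVA mean squares (Shrout & Fleiss).

    x: n subjects x k raters, complete data only.
    """
    n, k = len(x), len(x[0])
    grand = sum(v for r in x for v in r) / (n * k)
    row_means = [sum(r) / k for r in x]
    col_means = [sum(r[j] for r in x) / n for j in range(k)]

    ss_total = sum((v - grand) ** 2 for r in x for v in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msb = ss_rows / (n - 1)
    msw = (ss_total - ss_rows) / (n * (k - 1))  # within subjects (used by ICC1)
    msj = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return {
        "ICC1": (msb - msw) / (msb + (k - 1) * msw),
        "ICC2": (msb - mse) / (msb + (k - 1) * mse + k * (msj - mse) / n),
        "ICC3": (msb - mse) / (msb + (k - 1) * mse),
        "ICC1k": (msb - msw) / msb,
        "ICC2k": (msb - mse) / (msb + (msj - mse) / n),
        "ICC3k": (msb - mse) / msb,
    }


# Toy data: rater 2 is always one point higher than rater 1.
print(icc_shrout_fleiss([[1, 2], [2, 3], [3, 4]]))
```

In the toy example the consistency form ICC3 is 1.0 while the absolute-agreement form ICC1 drops to 0.6. Systematic differences in rater leniency, such as the eddy-based raters scoring lower, count against ICC1 and ICC2 but not ICC3, which is consistent with the fixed-rater rows above being the highest.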

Exploratory cut-offs based on extracted eddy QC metrics

The table below summarizes the four raters’ ratings for the 77 participants with Hclust ratings, alongside 5 quantitative metrics extracted from the eddy QUAD reports: (1) percent outliers, (2) average signal-to-noise ratio (SNR), (3) average contrast-to-noise ratio (CNR), (4) average relative motion, and (5) noise. Note: absolute motion has been replaced by noise since the last report. Two summary columns are also included: Multiple_Issues, the count of metrics beyond their thresholds, and Weighted_Score, an estimated PASS, CAUTION, or FAIL value.

The thresholds for these variables were set as follows:

Metric	Threshold for FAIL
Percent_Outliers	> 0.2
Average_SNR	< 20
Average_CNR	< 1.4
Rel_Motion	> 1 mm
Noise	> 0.1
Multiple_Issues	count of the 5 metrics above that exceed their threshold
Weighted_Score	<= 2 PASS; 3 CAUTION; > 3 FAIL

The thresholds for the first 5 eddy-extracted metrics were set by John based on review of this sample. They can be adjusted using a larger subset or following discussion.
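Applying the thresholds is a simple per-metric comparison. The minimal Python sketch below reproduces the Multiple_Issues count for the rows we checked (`THRESHOLDS` and `count_issues` are hypothetical names). It covers only the simple count; the weighting behind Weighted_Score is not described in this report, so it is not reproduced. Keeping the thresholds in one dictionary makes them easy to revise.

```python
# FAIL thresholds from the table above; each returns True when the metric fails.
THRESHOLDS = {
    "Percent_Outliers": lambda v: v > 0.2,
    "Average_SNR": lambda v: v < 20,
    "Average_CNR": lambda v: v < 1.4,
    "Rel_Motion": lambda v: v > 1.0,  # mm
    "Noise": lambda v: v > 0.1,
}


def count_issues(metrics):
    """Number of metrics beyond their FAIL threshold (the Multiple_Issues column)."""
    return sum(1 for name, fails in THRESHOLDS.items() if fails(metrics[name]))


# Two rows copied from the table below.
sub_0880369 = {"Percent_Outliers": 0.667, "Average_SNR": 20.8,
               "Average_CNR": 1.33, "Rel_Motion": 0.43, "Noise": 0.112}
sub_1050019 = {"Percent_Outliers": 2.893, "Average_SNR": 9.4,
               "Average_CNR": 0.65, "Rel_Motion": 1.03, "Noise": 0.204}
print(count_issues(sub_0880369), count_issues(sub_1050019))  # → 3 5
```

Because the table shows rounded values, borderline cases may not reproduce exactly: sub-0880397 is listed with Noise 0.100 and 4 issues, which suggests an unrounded noise value just above 0.1.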

The purpose of this table is to discuss which of these metrics could or should inform our QC ratings, and to see how each aligns with the 4 raters’ ratings, which were made independently of this metric review.

Participant	Hajer	Grace	Navona	Hclust	Percent_Outliers	Average_SNR	Average_CNR	Rel_Motion	Noise	Multiple_Issues	Weighted_Score
sub-0880043	3	1	3	3	0.024	20.1	1.37	0.21	0.070	1	Pass
sub-0880044	3	3	3	3	0.083	28.3	1.60	0.31	0.045	0	Pass
sub-0880046	3	3	3	3	0.000	20.3	1.46	0.30	0.065	0	Pass
sub-0880050	3	3	3	3	0.179	23.9	1.45	0.39	0.063	0	Pass
sub-0880061	3	3	2	2	0.500	20.1	1.34	0.38	0.059	2	Caution
sub-0880112	3	3	2	2	0.512	21.2	1.21	0.35	0.083	2	Caution
sub-0880138	3	3	2	3	0.095	26.0	1.48	0.24	0.038	0	Pass
sub-0880146	3	3	1	1	1.607	15.6	1.04	0.64	0.103	4	Fail
sub-0880155	3	3	3	3	0.405	28.7	1.58	0.26	0.027	1	Pass
sub-0880157	3	3	3	3	0.107	26.6	1.60	0.38	0.048	0	Pass
sub-0880221	3	3	3	3	0.452	21.6	1.61	0.25	0.045	1	Pass
sub-0880279	3	3	3	3	0.083	25.3	1.57	0.30	0.056	0	Pass
sub-0880293	3	3	3	3	0.083	24.0	1.56	0.30	0.085	0	Pass
sub-0880303	3	3	2	3	0.083	18.1	1.62	0.24	0.041	1	Pass
sub-0880308	3	3	2	3	1.262	30.9	1.81	0.31	0.029	1	Pass
sub-0880353	3	3	3	3	0.143	26.3	1.66	0.32	0.042	0	Pass
sub-0880369	3	3	2	2	0.667	20.8	1.33	0.43	0.112	3	Fail
sub-0880372	3	3	2	3	0.833	27.0	1.65	0.20	0.039	1	Pass
sub-0880396	3	3	2	3	0.298	24.8	1.44	0.22	0.052	1	Pass
sub-0880397	1	3	1	1	2.214	12.9	0.89	0.66	0.100	4	Fail
sub-0880418	3	3	3	3	0.024	22.0	1.49	0.24	0.059	0	Pass
sub-0880419	3	3	3	3	0.179	23.0	1.44	0.41	0.057	0	Pass
sub-0880428	3	3	2	2	0.048	21.7	1.34	0.37	0.109	2	Caution
sub-0880464	3	3	3	3	0.143	25.0	1.57	0.37	0.054	0	Pass
sub-0880473	2	1	1	1	1.976	9.8	0.45	0.42	0.063	3	Fail
sub-0885005	3	3	1	2	2.202	21.9	1.33	0.38	0.075	2	Caution
sub-1050007	3	3	1	2	0.524	20.6	1.24	0.40	0.090	2	Caution
sub-1050015	1	1	1	1	2.310	10.7	1.04	0.67	0.120	4	Fail
sub-1050019	1	1	1	1	2.893	9.4	0.65	1.03	0.204	5	Fail
sub-1050029	2	3	2	2	1.381	15.9	1.19	0.44	0.096	3	Fail
sub-1050032	2	3	1	1	1.690	7.8	0.76	0.65	0.149	4	Fail
sub-1050054	2	3	1	1	1.976	9.5	1.03	0.94	0.113	4	Fail
sub-1050055	3	3	1	2	0.667	21.4	1.26	0.51	0.103	3	Fail
sub-1050065	3	3	3	2	0.226	20.2	1.25	0.32	0.086	2	Caution
sub-1050089	3	3	3	2	0.190	16.5	1.24	0.29	0.063	2	Caution
sub-1050090	3	3	3	3	0.202	26.1	1.59	0.22	0.038	1	Pass
sub-1050092	3	3	1	2	0.452	18.5	1.24	0.33	0.066	3	Fail
sub-1050105	1	3	1	1	1.298	14.3	0.79	0.92	0.102	4	Fail
sub-1050131	3	3	1	2	1.619	23.0	1.26	0.46	0.059	2	Caution
sub-1050134	3	3	3	2	0.071	17.0	0.89	0.25	0.087	2	Caution
sub-1050135	3	3	1	1	0.655	14.1	0.95	0.57	0.126	4	Fail
sub-1050136	3	3	2	2	0.881	21.5	1.29	0.37	0.072	2	Caution
sub-1050172	3	3	2	3	0.238	26.2	1.47	0.29	0.067	1	Pass
sub-1050178	3	3	2	3	0.429	21.6	1.31	0.31	0.037	2	Caution
sub-1050179	2	3	1	1	4.274	12.1	1.04	0.52	0.066	3	Fail
sub-1050188	3	3	1	2	1.786	17.5	1.27	0.53	0.043	3	Fail
sub-1050194	2	3	1	1	0.536	17.6	0.98	0.67	0.134	4	Fail
sub-1050195	3	3	2	2	0.524	19.5	1.11	0.44	0.131	4	Fail
sub-1050218	3	3	2	2	0.500	20.6	1.32	0.31	0.070	2	Caution
sub-1050219	3	3	3	3	0.083	26.8	1.61	0.30	0.045	0	Pass
sub-1050221	3	3	3	3	0.155	28.8	1.56	0.22	0.041	0	Pass
sub-1050224	3	3	2	3	0.464	28.7	1.79	0.22	0.035	1	Pass
sub-1050229	3	3	2	2	0.464	22.5	1.28	0.29	0.080	2	Caution
sub-1050235	1	3	1	2	0.917	15.0	1.08	0.41	0.062	3	Fail
sub-1050239	3	3	3	3	0.143	32.4	1.80	0.23	0.031	0	Pass
sub-1050250	3	1	1	1	2.167	15.5	1.05	0.55	0.174	4	Fail
sub-1050253	1	1	1	1	5.500	13.3	0.85	0.64	0.098	3	Fail
sub-1050254	3	3	2	3	0.321	23.8	1.50	0.34	0.063	1	Pass
sub-1050265	3	3	2	3	0.560	23.4	1.63	0.29	0.058	1	Pass
sub-1050336	3	3	2	3	0.250	24.0	1.44	0.33	0.090	1	Pass
sub-1050341	3	3	2	3	0.952	26.2	1.57	0.35	0.069	1	Pass
sub-1050349	1	1	1	1	2.583	8.2	0.53	0.86	0.150	4	Fail
sub-1050353	1	1	1	1	5.702	12.8	0.65	1.07	0.115	5	Fail
sub-1050363	3	3	3	3	0.107	25.5	1.64	0.27	0.023	0	Pass
sub-1050369	1	1	1	1	2.012	12.2	0.86	0.68	0.161	4	Fail
sub-1050378	1	1	1	1	6.060	10.6	0.84	0.70	0.096	3	Fail
sub-1050383	3	3	1	2	0.905	21.5	1.37	0.36	0.060	2	Caution
sub-1050399	3	3	3	3	0.048	22.5	1.42	0.33	0.035	0	Pass
sub-1050402	3	1	2	2	0.595	16.1	1.02	0.41	0.086	3	Fail
sub-1050403	3	3	2	2	0.988	15.2	0.96	0.35	0.107	4	Fail
sub-1050429	2	3	1	1	3.345	16.7	1.09	0.58	0.121	4	Fail
sub-1050431	3	3	3	3	0.095	23.7	1.47	0.23	0.074	0	Pass
sub-1050432	3	3	3	3	0.095	25.8	1.60	0.22	0.076	0	Pass
sub-1050440	3	3	3	3	0.012	41.2	2.06	0.25	0.012	0	Pass
sub-1050509	3	3	2	2	2.190	18.9	1.23	0.47	0.039	3	Fail
sub-1050528	3	3	2	3	0.833	23.0	1.87	0.33	0.025	1	Pass
sub-1130160	3	3	3	3	0.774	28.8	1.99	0.16	0.016	1	Pass
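One simple way to quantify how the metric-based summary aligns with each rater is an exact-match rate. The minimal Python sketch below assumes Weighted_Score maps onto the raters’ scale as Fail=1, Caution=2, Pass=3 (an assumption) and uses a five-row excerpt of the table above; `exact_match_rate` is a hypothetical helper.

```python
# Assumed mapping of Weighted_Score onto the raters' 1-3 scale.
SCORE = {"Fail": 1, "Caution": 2, "Pass": 3}
RATERS = ["Hajer", "Grace", "Navona", "Hclust"]

# Excerpt from the table above: (ratings, Weighted_Score).
rows = [
    ([3, 1, 3, 3], "Pass"),     # sub-0880043
    ([3, 3, 2, 2], "Caution"),  # sub-0880061
    ([3, 3, 1, 1], "Fail"),     # sub-0880146
    ([1, 1, 1, 1], "Fail"),     # sub-1050015
    ([3, 3, 3, 2], "Caution"),  # sub-1050134
]


def exact_match_rate(rows):
    """Share of participants where each rater's rating equals the mapped score."""
    rates = {}
    for j, name in enumerate(RATERS):
        hits = sum(1 for ratings, label in rows if ratings[j] == SCORE[label])
        rates[name] = hits / len(rows)
    return rates


print(exact_match_rate(rows))
```

Because Grace rated on a binary scale, her ratings can never match a Caution label, so an ordinal agreement statistic may be a fairer comparison for her than exact matches.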