Multi-Center Multi-Vendor Validation of Liver QSM in Patients with Iron Overload

Subject characteristics

Descriptive statistics for subject-level variables are tabulated below by site and overall. Quantitative variables are summarized as median (interquartile range, IQR, reported as first and third quartiles) and categorical variables as N (%).

Table 1. Patient-level summary.
	UW (N = 52)	UTSW (N = 42)	JHU (N = 35)	Stanford (N = 39)	Overall (N = 168)
Age (years)	50.5 (22.2, 59.2)	46 (36, 56.8)	48 (21, 63)	17 (13, 24)	42.5 (19, 58)
Sex - m	33 (63.5%)	25 (59.5%)	18 (51.4%)	22 (56.4%)	98 (58.3%)
Sex - f	19 (36.5%)	17 (40.5%)	17 (48.6%)	17 (43.6%)	70 (41.7%)
Weight (kg)	74.7 (65.4, 96.7)	83.6 (61.5, 105.7)	68 (61, 82.8)	53 (37, 67.5)	69.6 (58.6, 90.8)
Height (cm)	171.4 (162.6, 178)	172.7 (162.5, 182.9)	167.6 (161, 174.5)	157 (140.2, 165.2)	168 (158, 177.8)
FerriScan	3.6 (2, 6.5)	1.4 (0.7, 2.5)	2.4 (1.1, 6.5)	2.7 (1.4, 7.6)	2.5 (1.2, 6.3)
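
As an illustration of the summary format used in Table 1, the following minimal sketch computes a median with first and third quartiles. It assumes quartiles are obtained by linear interpolation between order statistics; the report does not state which quantile definition was used. The function names and the sample values are hypothetical and are not study data.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3) using linear interpolation between order statistics."""
    # method="inclusive" treats the data as spanning the full range (min to max).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def format_median_iqr(values, digits=1):
    """Format a variable in the 'median (Q1, Q3)' style of Table 1."""
    med, q1, q3 = median_iqr(values)
    return f"{med:.{digits}f} ({q1:.{digits}f}, {q3:.{digits}f})"


# Hypothetical ages, for illustration only (not study data).
ages = [13, 17, 21, 24, 36]
summary = format_median_iqr(ages)
```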

Linear regressions of QSM vs R2* and QSM vs LIC

Numbers of observations per site, field strength (FS), and test/retest are tabulated below.

Table 2. Number (n) of observations per site, FS, and test/retest
Site	FS	Retest	n
UW	1.5	test	50
UW	1.5	retest	30
UW	3.0	test	50
UW	3.0	retest	28
UTSW	1.5	test	42
UTSW	1.5	retest	33
UTSW	3.0	test	42
UTSW	3.0	retest	30
JHU	1.5	test	29
JHU	1.5	retest	11
JHU	3.0	test	33
JHU	3.0	retest	12
Stanford	1.5	test	33
Stanford	3.0	test	34

QSM vs R2*

Site-specific regression lines of QSM vs R2* for each FS and test/retest scenario are plotted below.

We perform F-tests (with 2 numerator degrees of freedom, testing intercept and slope jointly) to compare the regression lines between pairs of sites. The p-values are summarized in Table 3a.

Table 3a. P-values for pairwise tests of site-specific regression lines for QSM vs R2*
	UW vs UTSW	UW vs JHU	UW vs Stanford	UTSW vs JHU	UTSW vs Stanford	JHU vs Stanford
1.5T	0.902	0.836	0.024	0.762	0.01	0.237
3.0T	0.542	<0.001	0.17	<0.001	0.373	<0.001
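
A pairwise comparison of this kind can be sketched as a nested-model F-test: one common line for both sites (reduced model) against a separate intercept and slope per site (full model). The sketch below assumes ordinary least squares with independent observations, which the report does not confirm. The function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, rss)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def compare_lines(x1, y1, x2, y2):
    """F-test of H0: both groups share one intercept and one slope."""
    # Full model: separate line per group. Reduced model: one pooled line.
    rss_full = fit_line(x1, y1)[2] + fit_line(x2, y2)[2]
    rss_reduced = fit_line(list(x1) + list(x2), list(y1) + list(y2))[2]
    df2 = len(x1) + len(x2) - 4  # four parameters in the full model
    f_stat = ((rss_reduced - rss_full) / 2) / (rss_full / df2)
    # Closed-form upper tail of the F distribution, valid only for 2 numerator df.
    p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
    return f_stat, p_value


# Hypothetical measurements for two sites (not study data).
x = [1, 2, 3, 4, 5, 6]
site_a = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
site_b = [3.0, 5.1, 6.9, 9.2, 10.8, 13.1]
f_stat, p_value = compare_lines(x, site_a, x, site_b)
```

The closed-form tail probability avoids a dependency on a statistics library, but it holds only because the numerator has exactly 2 degrees of freedom.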

Overall regression

1.5T: \[\begin{aligned} y &= -0.3920289 + 0.0034519x \\ &\quad \pm 1.9794387 \sqrt{6.3324567\times 10^{-4} - 5.0394595\times 10^{-6}x + 1.6460459\times 10^{-8}x^2} \end{aligned}\]

3.0T: \[\begin{aligned} y &= -0.4065473 + 0.0019006x \\ &\quad \pm 1.9792801 \sqrt{9.2935532\times 10^{-4} - 3.6591751\times 10^{-6}x + 6.266775\times 10^{-9}x^2} \end{aligned}\]
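
Each expression above has the form of a fitted line plus or minus a multiplier times the square root of a quadratic in x. Given the section heading, y is presumably QSM and x is R2*. If the quadratic is read as the estimated variance of the fitted mean and the multiplier as a t quantile (an interpretation the report does not state), the expression describes a pointwise 95% confidence band. The sketch below evaluates the 1.5T expression at a chosen x; the function name and the value x = 200 are arbitrary illustrations.

```python
from math import sqrt

# Coefficients copied from the 1.5T expression above.
INTERCEPT, SLOPE = -0.3920289, 0.0034519
T_QUANTILE = 1.9794387
C0, C1, C2 = 6.3324567e-4, -5.0394595e-6, 1.6460459e-8


def band_1p5t(x):
    """Return (lower, fit, upper) of the 1.5T expression at x."""
    fit = INTERCEPT + SLOPE * x
    half_width = T_QUANTILE * sqrt(C0 + C1 * x + C2 * x * x)
    return fit - half_width, fit, fit + half_width


# x = 200 is an arbitrary illustrative value, not a study measurement.
lower, fit, upper = band_1p5t(200.0)
```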

QSM vs LIC

Site-specific regression lines of QSM vs liver iron concentration (LIC) for each FS and test/retest scenario are plotted below. Test data show stronger associations than retest data.

Similarly, we perform F-tests to compare the regression lines between pairs of sites. The p-values are summarized in Table 3b.

Table 3b. P-values for pairwise tests of site-specific regression lines for QSM vs LIC
	UW vs UTSW	UW vs JHU	UW vs Stanford	UTSW vs JHU	UTSW vs Stanford	JHU vs Stanford
1.5T	0.691	0.397	0.056	0.56	0.008	0.079
3.0T	0.233	0.008	0.002	0.397	0.392	0.034

Overall regression

1.5T: \[\begin{aligned} y &= -0.2896426 + 0.1083183x \\ &\quad \pm 1.9756939 \sqrt{0.0010086 - 2.9226344\times 10^{-4}x + 3.8046129\times 10^{-5}x^2} \end{aligned}\]

3.0T: \[\begin{aligned} y &= -0.3156332 + 0.1169661x \\ &\quad \pm 1.9792801 \sqrt{0.0016396 - 4.4648903\times 10^{-4}x + 5.6280129\times 10^{-5}x^2} \end{aligned}\]

Repeatability & reproducibility of QSM

Test-retest repeatability

Bland-Altman plots (difference vs mean) for test-retest QSM are shown below by FS and overall. Data points are color-coded by site.

The bias (mean test-retest difference), repeatability coefficient (RC; range covering 95% of test-retest differences), and intraclass correlation coefficient (ICC) with 95% confidence interval (CI) and p-value (for testing ICC = 0) are presented in Table 4 below.

Table 4. Test-retest repeatability analysis of QSM.
FS	Bias	RC	ICC	P
1.5T	-0.013	0.283	0.944 (0.912, 0.964)	<0.001
3T	0.029	0.247	0.958 (0.934, 0.974)	<0.001
Overall	0.007	0.268	0.951 (0.932, 0.964)	<0.001
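
The bias and RC in Table 4 can be illustrated with a minimal sketch. It assumes the difference is taken as test minus retest and that RC is 1.96 times the sample standard deviation of the differences. Both are assumptions, since the report gives only the verbal definitions above. The function name and data are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Return (bias, rc, (lower_loa, upper_loa)) for paired measurements."""
    diffs = [t - r for t, r in zip(test, retest)]  # assumed sign: test - retest
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    rc = 1.96 * sd     # assumed definition of the repeatability coefficient
    return bias, rc, (bias - rc, bias + rc)


# Hypothetical paired QSM values (not study data).
test_vals = [1.0, 2.0, 3.0, 4.0]
retest_vals = [1.1, 1.9, 3.1, 3.9]
bias, rc, limits = bland_altman(test_vals, retest_vals)
```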

Field strength reproducibility

The Bland-Altman plot for reproducibility between 1.5T and 3T is shown below, with bias, ICC (95% CI), and p-value indicated on the figure.
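
The report does not state which form of the ICC was used. As one possibility, the sketch below computes a one-way random-effects ICC for subjects measured twice (for example, once at 1.5T and once at 3T). The function name and data are hypothetical, and other ICC forms (such as two-way models) would give different values.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for subjects measured twice."""
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, subject_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical (1.5T, 3T) QSM pairs (not study data).
pairs = [(0.10, 0.14), (0.52, 0.47), (0.95, 1.01), (1.40, 1.36)]
icc = icc_oneway(pairs)
```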

Sex differences in LIC

LIC values are compared between females and males by site and overall in Table 5 below. The p-values are based on the Wilcoxon rank sum test.

Table 5. Median (IQR) of LIC by site and sex.
Site	Female	Male	P
UW	3.8 (2.3, 6.1)	3.4 (1.8, 7.5)	0.82
UTSW	2.4 (1.4, 5.8)	0.9 (0.7, 1.6)	0.021
JHU	3.8 (1.2, 6.5)	2.1 (1.1, 3.7)	0.346
Stanford	6.2 (2.3, 9.4)	2.1 (1.2, 3.4)	0.018
Overall	3.8 (1.8, 6.6)	2 (1, 4)	0.006
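
For reference, a Wilcoxon rank sum test can be sketched as follows. This version uses the normal approximation with tie and continuity corrections; statistical software may instead compute exact p-values for small samples, so results can differ from those in Table 5. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (U statistic for x, p-value). Tied values get average ranks.
    Not defined when all pooled values are identical (zero variance).
    """
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (abs(u - mu) - 0.5) / sqrt(var)  # continuity correction
    p = 2 * (1 - NormalDist().cdf(max(z, 0.0)))
    return u, min(p, 1.0)


# Hypothetical LIC values by sex (not study data).
female = [3.8, 6.2, 2.4, 5.1, 4.4]
male = [0.9, 2.1, 3.4, 1.6, 2.0]
u_stat, p_value = rank_sum_test(female, male)
```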

The following boxplot visualizes the comparisons.

Sex differences in QSM vs (R2*, LIC)

Sex-specific regression lines of QSM vs R2* for each FS and test/retest scenario are plotted below, with F-test p-values comparing the two lines indicated on the plots.

Similarly, sex-specific regression lines of QSM vs LIC for each FS and test/retest scenario are plotted below, with F-test p-values comparing the two lines indicated on the plots.