CCA analysis

Same analysis as the stats class analysis, but with all participants

Methods

Participants. Data from 162 participants (78 HC, 84 SSD) were available for analysis.

Canonical Correlation Analysis. Our \(X\) set comprised FA values for 6 white matter tracts: AF (L), AF (R), ILF (L), ILF (R), UF (L), UF (R). Note that, presumably due to pre-processing issues, there were 36 missing values across the \(X\) set, which were imputed via a multivariate imputation method. Missing values per tract were as follows:

FA.AF_L	FA.AF_R	FA.ILF_L	FA.ILF_R	FA.UF_L	FA.UF_R
4	32	0	0	0	0

The \(Y\) set comprised 8 behavioural variables: 6 neurocognition variables (MATRICS MCCB factor scores for Processing speed, Attention & vigilance, Working memory, Verbal learning, Visual learning, and Problem solving), and 2 social cognition variables (Mentalizing and Simulation; Oliver, 2018).

Results

Statistical preprocessing. We examined raw correlations within and between the \(X\) and \(Y\) sets, to confirm our conceptual grouping. As expected, we found small-to-moderate, mostly positive correlations among the brain variables (\(R_{xx}\) matrix mean r = 0.247, range = -0.027 to 0.563) and moderate-to-strong positive correlations among the neurocognitive and social cognitive variables (\(R_{yy}\) matrix mean r = 0.487, range = 0.306 to 0.864).

Significance test and canonical correlation. We employed a permutation test (500 permutations) to evaluate the null hypothesis of no correlation between the \(X\) and \(Y\) sets. We rejected the null hypothesis; the first function was significant, Wilks’ \(\lambda\) = 0.624, F(48, 732.28) = 1.54, p = 0.013. Similar values were seen when employing the Hotelling-Lawley Trace (p = 0.012) and the Pillai-Bartlett Trace (p = 0.012), but not Roy’s Largest Root (p = 0.218), with all canonical correlations included in all models.

root	id	stat	approx	df1	df2	p.value
canonical variate 1-6	Wilks	0.6238036	1.5356690	48	732.2844	0.0129470
canonical variate 2-6	Wilks	0.7335834	1.3740115	35	629.2162	0.0766686
canonical variate 3-6	Wilks	0.8194504	1.2836731	24	524.4975	0.1668902
canonical variate 4-6	Wilks	0.9101444	0.9650733	15	417.2459	0.4916385
canonical variate 5-6	Wilks	0.9568344	0.8476901	8	304.0000	0.5613245
canonical variate 6-6	Wilks	0.9971584	0.1453366	3	153.0000	0.9325380
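As a cross-check on the table above, the sketch below applies Rao's F approximation to each Wilks' \(\lambda\) and recovers the canonical correlations from successive \(\lambda\) values, using \(\lambda_k / \lambda_{k+1} = 1 - R_{c,k}^2\). It is an assumption that the original software used Rao's approximation; the function names are hypothetical, and the sample size and set sizes are taken from the text. The p values are not computed, because the Python standard library has no F distribution.

```python
import math

# Sample size and set sizes as stated in the text
N, P, Q = 162, 6, 8

# Wilks' lambda for functions k+1..6 (k = 0..5), copied from the table above
WILKS = [0.6238036, 0.7335834, 0.8194504, 0.9101444, 0.9568344, 0.9971584]


def rao_f(lam, k, n=N, p=P, q=Q):
    """Rao's F approximation for Wilks' lambda after removing the first k functions.

    Returns (F, df1, df2). Assumed formula, not confirmed by the source.
    """
    pk, qk = p - k, q - k
    df1 = pk * qk
    denom = pk ** 2 + qk ** 2 - 5
    # s is conventionally set to 1 when the ratio is degenerate
    s = math.sqrt((df1 ** 2 - 4) / denom) if denom > 0 else 1.0
    df2 = (n - 1.5 - (p + q) / 2) * s - df1 / 2 + 1
    root = lam ** (1 / s)
    f = (1 - root) / root * df2 / df1
    return f, df1, df2


def canonical_correlations(wilks):
    """Recover each R_c from the ratio of successive Wilks' lambda values."""
    padded = list(wilks) + [1.0]  # lambda with no functions left is 1
    return [math.sqrt(1 - padded[i] / padded[i + 1]) for i in range(len(wilks))]


if __name__ == "__main__":
    rcs = canonical_correlations(WILKS)
    for k, lam in enumerate(WILKS):
        f, df1, df2 = rao_f(lam, k)
        print(f"functions {k + 1}-6: F({df1}, {df2:.4f}) = {f:.4f}, R_c = {rcs[k]:.4f}")
```

If the printed F and df values agree with the table, that is consistent with (but does not prove) the table having been produced with this approximation.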

Redundancy. The Stewart-Love redundancy index (i.e., the proportion of variance in one set that can be explained by the other set) was 0.083 for the entire \(X\) set and 0.093 for the \(Y\) set. These values are very low.
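To give a sense of scale, the sketch below computes the redundancy contributed by the first function alone. It assumes the usual per-function definition (mean squared structure coefficient within a set, multiplied by \(R_c^2\)) and uses the squared structure coefficients and first canonical correlation reported in this document. The full index sums this quantity over all six functions, which cannot be reproduced here because only the first function's coefficients are reported. The function name is hypothetical.

```python
# First canonical correlation, as reported in this document
RC1 = 0.3868445

# Squared structure coefficients for function 1, copied from the
# structure coefficient table in this document
RS2_X = {
    "AF (L)": 0.344, "AF (R)": 0.024, "ILF (L)": 0.540,
    "ILF (R)": 0.765, "UF (L)": 0.013, "UF (R)": 0.101,
}
RS2_Y = {
    "Processing speed": 0.448, "Attention & vigilance": 0.247,
    "Working memory": 0.244, "Verbal learning": 0.553,
    "Visual learning": 0.503, "Problem solving": 0.334,
    "Simulation": 0.795, "Mentalizing": 0.726,
}


def first_function_redundancy(rs2, rc):
    """Mean squared structure coefficient times the squared canonical correlation."""
    mean_rs2 = sum(rs2.values()) / len(rs2)
    return mean_rs2 * rc ** 2


red_x = first_function_redundancy(RS2_X, RC1)
red_y = first_function_redundancy(RS2_Y, RC1)

if __name__ == "__main__":
    print(f"Function 1 redundancy, X set: {red_x:.4f}")
    print(f"Function 1 redundancy, Y set: {red_y:.4f}")
```

Both results are proportions of variance. On these inputs the first function contributes roughly 0.04 for the \(X\) set and 0.07 for the \(Y\) set.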

Canonical correlation. The canonical correlation between the \(X\) and \(Y\) scores of the first function was \(r\) = 0.387. This \(r\) value falls within the null distribution of \(r\) values, i.e., it is similar to canonical correlations obtained after randomly shuffling the rows of the \(Y\) set (500 permutations).
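The row-shuffling logic can be sketched as follows. This is a simplified illustration, not the analysis code: to keep it within the Python standard library, Pearson's r between two single variables stands in for the first canonical correlation, and all names are hypothetical. In the real analysis the statistic would be the canonical correlation refitted on each shuffled copy of the \(Y\) set.

```python
import random


def pearson_r(x, y):
    """Pearson correlation; stands in here for the first canonical correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, statistic, n_perm=500, seed=0):
    """One-sided permutation test: shuffle the rows of y, recompute the statistic.

    Returns (observed, p), where p is the share of permuted statistics at
    least as large as the observed one, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(y)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # sampling without replacement, unlike a bootstrap
        if statistic(x, shuffled) >= observed:
            n_at_least += 1
    return observed, (n_at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data only: a noisy linear relation, not the study data
    rng = random.Random(1)
    x = [float(i) for i in range(40)]
    y = [0.5 * v + rng.gauss(0, 5) for v in x]
    obs, p = permutation_p_value(x, y, pearson_r)
    print(f"observed statistic = {obs:.3f}, permutation p = {p:.3f}")
```

An observed value that sits inside the bulk of the permuted values yields a large p, which is the situation the paragraph above describes for the canonical correlation.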

Structure coefficients. The table shows standardized canonical function coefficients (\(\beta\)), structure coefficients (\(r_s\)), and squared structure coefficients (\(r_s^2\); here equivalent to communalities, \(h^2\)) for the first derived function. Structure coefficients are most commonly reported, and represent the correlation between a given observed variable and the synthetic variable for the set to which it belongs. We see that all variables in the \(Y\) set contribute highly to the synthetic \(Y\) variable, and the left AF and left and right ILF contribute highly to the synthetic \(X\) variable. Neither the right AF nor the left or right UF contributed highly to the synthetic \(X\) variable, suggesting these tracts may not be very strongly related to the synthetic variable combining neurocognition and social cognition.

	Function 1
Variable	\(\beta\)	\(r_s\)	\(r_s^2\) (\(h^2\))
X
AF (L)	0.243	0.587	0.344
AF (R)	-0.194	0.155	0.024
ILF (L)	0.210	0.735	0.540
ILF (R)	0.742	0.875	0.765
UF (L)	-0.118	0.116	0.013
UF (R)	0.307	0.318	0.101
Y
Processing speed	-0.012	0.669	0.448
Attention & vigilance	-0.080	0.497	0.247
Working memory	-0.077	0.494	0.244
Verbal learning	0.372	0.744	0.553
Visual learning	0.041	0.709	0.503
Problem solving	0.278	0.578	0.334
Simulation	0.554	0.892	0.795
Mentalizing	0.147	0.852	0.726
Note:
\(r_s\) values > .450 are emphasized, following convention; \(\beta\) = standardized canonical function coefficient; \(r_s\) = structure coefficient; \(r_s^2\) = squared structure coefficient, here also communality \(h^2\)
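The thresholding described in the note can be sketched as below, using the structure coefficients from the table above. The function name is hypothetical. Because the table's \(r_s^2\) values were presumably computed from unrounded coefficients, squaring the rounded \(r_s\) values may differ from the table in the third decimal.

```python
THRESHOLD = 0.45  # emphasis threshold stated in the table note

# Structure coefficients (r_s) for function 1, copied from the table above
STRUCTURE = {
    "X": {
        "AF (L)": 0.587, "AF (R)": 0.155, "ILF (L)": 0.735,
        "ILF (R)": 0.875, "UF (L)": 0.116, "UF (R)": 0.318,
    },
    "Y": {
        "Processing speed": 0.669, "Attention & vigilance": 0.497,
        "Working memory": 0.494, "Verbal learning": 0.744,
        "Visual learning": 0.709, "Problem solving": 0.578,
        "Simulation": 0.892, "Mentalizing": 0.852,
    },
}


def summarize(coefs, threshold=THRESHOLD):
    """Return (name, r_s, r_s squared, above_threshold) for each variable."""
    return [
        (name, rs, round(rs ** 2, 3), abs(rs) > threshold)
        for name, rs in coefs.items()
    ]


if __name__ == "__main__":
    for set_name, coefs in STRUCTURE.items():
        print(set_name)
        for name, rs, rs2, flagged in summarize(coefs):
            marker = "*" if flagged else " "
            print(f"  {marker} {name:<22} r_s = {rs:.3f}  r_s^2 = {rs2:.3f}")
```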

Validation. Lastly, we validated our model via iterative feature removal, i.e., we removed one feature at a time from the combined 14 features of the \(X\) and \(Y\) sets, re-ran the CCA, and compared the derived canonical correlation coefficients. This procedure leverages the CCA property that each variable relates to all other variables in both sets: if the model is stable, canonical correlation estimates will remain similar; if unstable, estimates will vary widely. We observed similar values across all iterations, suggesting that our model was stable.

Exploratory analysis. We also compared the respective \(R_{xx}\) and \(R_{yy}\) correlation matrices, and \(R_{xy}\) cross-correlation matrix, between HC and SSD populations. HC is first, followed by SSD. Some differences are evident, particularly in the \(R_{xy}\) matrix:

Discussion

Summary

X set size: 6
Y set size: 8 (scog factor scores)
Number of participants: 162

We found one significant component, λ=.624, F(48, 732.28)=1.54, p=.013, with a moderate magnitude, \(R_c\)=.387. Examination of structure coefficients (\(r_s\)) for the \(X\) set revealed that FA values in the right ILF contributed most highly (\(r_s\)=.875), followed by the left ILF (\(r_s\)=.735) and left AF (\(r_s\)=.587). The ILF ordering matches the original stats class analysis, although there the right AF, not the left AF, ranked third. Examination of the \(Y\) set showed high structure coefficients throughout. As in the stats class analysis, the social cognition variables, represented by the Simulation (\(r_s\)=.892) and Mentalizing (\(r_s\)=.852) factor scores, made the highest contributions to the synthetic \(Y\) variable. All neurocognition variables made contributions above threshold (cf. 4/6 in stats class analysis). All structure coefficients had the same sign, indicating that all variables are positively related (cf. negative contribution of left UF in stats class analysis).

Comparison to stats class analysis.

X set size: 6
Y set size: 8 (scog factor scores)
Number of participants: 91

We found one significant component, λ=.397, F(48, 382.93)=1.65, p=.006, with a large magnitude, \(R_c\)=.571. Examination of structure coefficients (\(r_s\)) for the \(X\) set revealed that FA values in the right ILF contributed most highly (\(r_s\)=.832), followed by the left ILF (\(r_s\)=.509) and right AF (\(r_s\)=.464). In contrast, the left AF and right UF made modest contributions, and the left UF a negligible contribution. Examination of the \(Y\) set also showed some very high structure coefficients. Specifically, the social cognition variables, represented by Simulation (\(r_s\)=.911) and Mentalizing (\(r_s\)=.943) factor scores made substantive contributions. Additionally, four of six neurocognition variables made large contributions: Processing speed (\(r_s\)=.593), Working memory (\(r_s\)=.512), Verbal learning (\(r_s\)=.547), and Visual learning (\(r_s\)=.489). All of the \(Y\) set structure coefficients had the same sign, indicating that all variables are positively related; however, the left UF in the \(X\) set was negatively related to the synthetic \(X\) variable. The Attention & vigilance and Problem solving factors did not reveal high structure coefficients, suggesting that those factors were not strongly related to white matter integrity in the included tracts.

Comparison to PAC analysis.

X set size: 72
Y set size: 14 (scog total scores)
Number of participants: 162

No component was significant. The model should not be interpreted. However, for exploration, \(R_c\)=.866. Examination of structure coefficients for the \(X\) set showed no large contribution above threshold. The neurocognition variables Processing speed and Attention & vigilance contributed above threshold to the \(Y\) set.