07 | Confound correction / covariate adjustment

Description. We adjust ("correct") all estimates of white matter FA for (i) age, (ii) sex, (iii) parental education (the higher of the mother's and father's years of education), and, for patients only, (iv) antipsychotic exposure.

Correction method. Correction was applied via residual linear adjustment, in keeping with an earlier lab paper employing PLS. In essence, we fit a linear model predicting a given white matter FA estimate from age, sex, parental education and, when present, antipsychotic exposure (i.e., fit <- lm(TOI_FA ~ age + sex + parental_ed + cpze)). We then extracted participant-wise residuals (resid(fit)) from this model and added them to the model's intercept (coef(fit)[1]). Note that n=25 patients were missing CPZE data (either because it was unavailable, or because they were not taking medication), and a separate n=5 patients were missing parental education data. In both cases, the model was fit on the three covariates that were present. This is important to note explicitly because, by default, lm in R drops any observation with a missing value in a model variable, so these participants would otherwise receive no adjusted value. Note also that this correction happens after data cleaning (removal of impossible values) and interpolation of missing values, and before review/removal of outliers and transformation to normality.
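The adjustment described above can be sketched as follows. This is a minimal illustration on simulated data, not the code used for the analysis: the helper name adjust_fa, the toy data frame, and the choice to refit each reduced model on every participant with complete data for that covariate subset are all assumptions (the text does not say which participants the reduced models were fit on).

```r
# Toy data; all values are simulated purely for illustration
set.seed(1)
n <- 60
toy <- data.frame(
  age         = rnorm(n, 30, 8),
  sex         = factor(sample(c("F", "M"), n, replace = TRUE)),
  parental_ed = rnorm(n, 14, 2),
  cpze        = rexp(n, 1 / 300)
)
toy$TOI_FA <- 0.5 - 0.002 * toy$age + rnorm(n, sd = 0.02)
toy$cpze[1:10]         <- NA  # participants without CPZE data
toy$parental_ed[11:13] <- NA  # separate participants without parental education

# Hypothetical helper: residual + intercept adjustment that falls back to the
# covariates each participant actually has
adjust_fa <- function(df, outcome, covars) {
  adjusted <- rep(NA_real_, nrow(df))
  avail    <- !is.na(df[, covars, drop = FALSE])      # which covariates are present
  pattern  <- apply(avail, 1, paste, collapse = "")   # one label per missingness pattern
  for (p in unique(pattern)) {
    in_pattern <- pattern == p
    use <- covars[avail[which(in_pattern)[1], ]]      # covariates available in this pattern
    if (length(use) == 0) next                        # nothing to adjust for
    # na.exclude keeps resid() aligned with the rows of df (NA where a row was dropped)
    fit <- lm(reformulate(use, response = outcome), data = df,
              na.action = na.exclude)
    adjusted[in_pattern] <- (resid(fit) + coef(fit)[1])[in_pattern]
  }
  adjusted
}

toy$TOI_FA_adj <- adjust_fa(toy, "TOI_FA",
                            c("age", "sex", "parental_ed", "cpze"))
```

For participants with all four covariates this reproduces resid(fit) + coef(fit)[1] from the full model; participants missing one covariate receive the same quantity from the corresponding three-covariate model.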

A note on corrected variables. This code applies correction to all white matter FA estimates, irrespective of whether they are intended for inclusion in the CCA X set. We do not apply correction to the neurocognition or social cognition variables intended for the Y set. The neurocognition variables have already been corrected for age and sex at the point of data entry, in accordance with published norms. The social cognition variables have not been corrected, but do not show large correlations with age or sex. Note that this is a departure from the lab PLS paper, which corrected only for age and antipsychotic exposure, but did so for both brain measures and cognitive performance scores.
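One way the check on the uncorrected social cognition variables could be run is sketched below. The data frame toy_y and the column names soc_cog_1 and soc_cog_2 are hypothetical stand-ins on simulated data, and the text does not specify a threshold for what counts as a "large" correlation.

```r
# Simulated stand-in for the Y-set data; column names are hypothetical
set.seed(2)
n <- 80
toy_y <- data.frame(
  demo_age  = rnorm(n, 30, 8),
  demo_sex  = factor(sample(c("F", "M"), n, replace = TRUE)),
  soc_cog_1 = rnorm(n),
  soc_cog_2 = rnorm(n)
)
y_vars <- c("soc_cog_1", "soc_cog_2")

# Pearson correlation with age; for sex, the factor is coded 1/2, so the
# Pearson correlation is the point-biserial correlation
check <- data.frame(
  variable = y_vars,
  r_age = sapply(y_vars, function(v)
    cor(toy_y[[v]], toy_y$demo_age, use = "complete.obs")),
  r_sex = sapply(y_vars, function(v)
    cor(toy_y[[v]], as.numeric(toy_y$demo_sex), use = "complete.obs")),
  row.names = NULL
)
print(check)
```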

A note on potential caveats. Some recent literature suggests that care should be taken when performing this sort of correction before CCA, as doing so may introduce more error (citation).

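#melt dataframe to long format: one row per participant x tract, keeping group and covariates as id columns
df_plot <- reshape2::melt(df_plot, id.vars=c('group','demo_sex','demo_age','parental_ed', 'cpz_eq'))
#patients only, for the CPZE plots
df_plot_SSD <- df_plot[df_plot$group == 'case',]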

#function for ggplots -- continuous covariate on x, FA on y, one panel per tract
plot_con_fn <- function(df, xvar, xlab){
  #aes_string() is deprecated in ggplot2; index the .data pronoun with the column name instead
  ggplot(df, aes(x=.data[[xvar]], y=value, color=group)) +
    geom_point(alpha=.5) +
    geom_smooth(method=lm, se=FALSE) + #per-group linear fit, no confidence band
    geom_rug() +
    theme(legend.position = 'top',
          axis.ticks.y = element_blank(),
          axis.text.y = element_blank(),
          legend.title = element_blank()) +
    ylab('FA') +
    xlab(xlab) +
    facet_wrap(~variable, scales = 'free', ncol=6)
}

#function for ggplots -- categorical (sex on x, FA on y, one panel per tract)
plot_cat_fn <- function(df){
  ggplot(df, aes(x=demo_sex, y=value)) +
    geom_boxplot(outlier.shape = NA) + #outliers hidden here; all points are drawn by the jitter layer
    geom_jitter(alpha=.5, aes(color=group)) +
    geom_rug(aes(color=group)) +
    theme(legend.position = 'top',
          axis.ticks.y = element_blank(),
          axis.text.y = element_blank(),
          legend.title = element_blank()) +
    scale_color_brewer(palette='Dark2') +
    ylab('FA') +
    xlab('Sex') +
    facet_wrap(~variable, scales = 'free', ncol=6)
}

Pre-correction

Visualization. In the tabs below, we visualize correlations, pre-correction, between estimates of white matter FA in all available tracts, and the four covariates/confounds.

Age

plot_con_fn(df_plot, 'demo_age', 'Age')

Parental education

plot_con_fn(df_plot, 'parental_ed', 'Parental Education (highest value)')

CPZE

plot_con_fn(df_plot_SSD, 'cpz_eq', 'CPZ Equivalents')