imaging data pulled: 2021-10-12
clinical data pulled: 2020-11-16
code written: 2021-11-05
last ran: 2021-11-15
Description. Here, we examine the distributions of the cognitive and LC NM-MRI data. Both the cognition and imaging variables have already been corrected for age and sex in the prior script, 01_dataCleaning.Rmd. The present script finds that we additionally need to correct the reaction time cognition variables on the D-KEFS for non-normality. (Note that variables are not standardized to zero mean and unit variance here; this is done prior to the CCA in 06_CCA.)
Load libraries and data
#clear environment
rm(list = ls())
#list required libraries
packages <- c(
'tidyverse',
'kableExtra', #pretty tables
'reshape2', #wrangling
'bestNormalize',
'MVN'
)
#load required libraries
lapply(packages, require, character.only = TRUE)
#read in cleaned participant demographic/clinical/cognition/lc data
df <- read.csv(dir('../clinical', full.names=T, pattern="^df_2021")) #48
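The read step above relies on exactly one file in the folder matching the pattern. As a hedged sketch of how that assumption could be checked, the helper below (`find_single_file`, a hypothetical name that is not part of the original script) stops with an error when zero or several files match. The temporary folder and the file name `df_2021-11-05.csv` are made up for the demonstration.

```r
# Hypothetical helper (not in the original script): return the single file in
# `path` whose name matches `pattern`, and fail loudly otherwise.
find_single_file <- function(path, pattern) {
  matches <- dir(path, full.names = TRUE, pattern = pattern)
  if (length(matches) != 1) {
    stop("Expected exactly 1 file matching '", pattern, "', found ", length(matches))
  }
  matches
}

# Demonstration in a temporary folder with a made-up file name
demo_dir <- tempfile("clinical_")
dir.create(demo_dir)
write.csv(data.frame(id = "A01", value = 1),
          file.path(demo_dir, "df_2021-11-05.csv"), row.names = FALSE)

found <- find_single_file(demo_dir, "^df_2021")
demo_df <- read.csv(found)
nrow(demo_df)
```

If a second dated file were later added to the folder, the helper would stop instead of passing two paths to read.csv.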
Functions for analyses
#function to plot violin distributions for univariate analysis
plotDistribution_fn <- function(df){
  ggplot(df, aes(x=Diagnosis, y=value, color=Diagnosis, fill=Diagnosis)) +
    geom_violin(alpha=.1) +
    geom_dotplot(dotsize=1, binaxis='y', stackdir='center') +
    geom_boxplot(alpha=.2, width=0.5, outlier.colour='black', outlier.size=3, outlier.alpha=1) +
    theme_minimal() +
    theme(legend.position = 'none',
          axis.title = element_blank()) +
    facet_wrap(~variable, scales='free')
}
The plots below show the LC NM-MRI data distributions, by diagnosis. Outliers are plotted in black. Note that we decided not to remove any outlying LC NM-MRI values, as all appear to be plausible.
#identify the LC variables
vars_LC <- names(df[,grep('^avg_max.+[0-9]_cor$', names(df))])
#pull out LC variables, ID, and diagnosis
df_LC <- df[, c('id', 'Diagnosis', vars_LC)]
#melt data, for easier plotting
df_LC <- melt(df_LC, id.vars=c('id', 'Diagnosis'))
#run plotting function
plotDistribution_fn(df_LC)
#function to pull outliers out of data -- for manual review
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
#identify outliers
df_LC <- df_LC %>%
  group_by(variable) %>%
  mutate(outlier = ifelse(is_outlier(value), value, as.numeric(NA)))
The participants whose values should be double checked, for possible manual intervention, are as follows: SEN039 avg_max_seg_1_cor, SEN046 avg_max_seg_3_cor, SEN047 avg_max_seg_5_cor, SEN029 avg_max_seg_6_cor, SEN087 avg_max_seg_6_cor.
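For reference, the outlier rule used here is the usual boxplot convention: a value is flagged when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below applies the same rule to a toy vector (illustrative numbers, not study data) so that the fences can be inspected directly.

```r
# Toy vector with one clearly extreme value (illustrative, not study data)
x <- c(10, 11, 12, 12, 13, 13, 14, 15, 30)

# Flag values beyond 1.5 * IQR from the first and third quartiles
q <- quantile(x, c(0.25, 0.75))
lower_fence <- q[[1]] - 1.5 * IQR(x)
upper_fence <- q[[2]] + 1.5 * IQR(x)
flagged <- x < lower_fence | x > upper_fence

c(lower = lower_fence, upper = upper_fence)
x[flagged]
```

Here the quartiles are 12 and 14, so the fences sit at 9 and 17 and only the value 30 is flagged.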
We also reviewed univariate normality, via the Shapiro-Wilk test. Normality is indicated by p values >.05. We see that LC NM-MRI values are effectively normal, with a potential small deviation in avg_max_seg_6 (i.e., right caudal LC) in the combined sample and in the HC group.
mvn(df[,vars_LC], mvnTest = 'mardia', univariatePlot='qqplot')$univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk avg_max_seg_1_cor 0.9731 0.3322 YES
## 2 Shapiro-Wilk avg_max_seg_2_cor 0.9774 0.4769 YES
## 3 Shapiro-Wilk avg_max_seg_3_cor 0.9722 0.3087 YES
## 4 Shapiro-Wilk avg_max_seg_4_cor 0.9929 0.9916 YES
## 5 Shapiro-Wilk avg_max_seg_5_cor 0.9677 0.2065 YES
## 6 Shapiro-Wilk avg_max_seg_6_cor 0.9468 0.0299 NO
mvn(df[df$Diagnosis == 'LLD', vars_LC], mvnTest = 'mardia', univariatePlot='qqplot')$univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk avg_max_seg_1_cor 0.9699 0.6437 YES
## 2 Shapiro-Wilk avg_max_seg_2_cor 0.9359 0.1190 YES
## 3 Shapiro-Wilk avg_max_seg_3_cor 0.9785 0.8540 YES
## 4 Shapiro-Wilk avg_max_seg_4_cor 0.9780 0.8423 YES
## 5 Shapiro-Wilk avg_max_seg_5_cor 0.9330 0.1017 YES
## 6 Shapiro-Wilk avg_max_seg_6_cor 0.9468 0.2120 YES
mvn(df[df$Diagnosis == 'HC', vars_LC], mvnTest = 'mardia', univariatePlot='qqplot')$univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk avg_max_seg_1_cor 0.9254 0.0871 YES
## 2 Shapiro-Wilk avg_max_seg_2_cor 0.9733 0.7673 YES
## 3 Shapiro-Wilk avg_max_seg_3_cor 0.9523 0.3257 YES
## 4 Shapiro-Wilk avg_max_seg_4_cor 0.9590 0.4432 YES
## 5 Shapiro-Wilk avg_max_seg_5_cor 0.9826 0.9457 YES
## 6 Shapiro-Wilk avg_max_seg_6_cor 0.9066 0.0346 NO
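To make the reading of these tables concrete, here is a minimal base-R sketch using `shapiro.test`, which we assume reports the same W statistic and p value as the univariateNormality table above. The data are simulated (one normal sample, one right-skewed sample), so the numbers are purely illustrative.

```r
set.seed(1)
normal_sample <- rnorm(48)           # simulated draws from a normal distribution
skewed_sample <- rexp(48, rate = 1)  # simulated right-skewed data

sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)

# W close to 1 is consistent with normality; p < .05 would be labelled "NO"
data.frame(
  sample = c("normal", "skewed"),
  W = c(unname(sw_normal$statistic), unname(sw_skewed$statistic)),
  p = c(sw_normal$p.value, sw_skewed$p.value)
)
```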
Next, we assessed multivariate normality via Mardia’s test. Normality is indicated by p values >.05. Though the combined sample shows evidence of skewness, neither diagnostic group does on its own, so it does not seem particularly problematic, and we have opted not to adjust the LC NM-MRI values as a consequence.
mvn(df[,vars_LC], mvnTest = 'mardia', multivariateOutlierMethod='adj')$multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 80.2701250017644 0.0184102830303479 NO
## 2 Mardia Kurtosis 0.160730717974568 0.872305494533103 YES
## 3 MVN <NA> <NA> NO
mvn(df[df$Diagnosis == 'LLD', vars_LC], mvnTest = 'mardia', multivariateOutlierMethod='adj')$multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 57.4114310562735 0.422626028728036 YES
## 2 Mardia Kurtosis -0.102512139294482 0.918350177847724 YES
## 3 MVN <NA> <NA> YES
mvn(df[df$Diagnosis == 'HC', vars_LC], mvnTest = 'mardia', multivariateOutlierMethod='adj')$multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 62.5737714401465 0.254353775116634 YES
## 2 Mardia Kurtosis -0.656660408378788 0.511399297074907 YES
## 3 MVN <NA> <NA> YES
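As a rough illustration of what Mardia’s test computes, the sketch below implements the textbook skewness and kurtosis statistics in base R. The function name `mardia_sketch` is hypothetical, the data are simulated, and the MVN package may apply corrections that this sketch omits, so its values should not be expected to match the output above exactly.

```r
# Textbook Mardia statistics (a sketch; not the MVN package implementation)
mardia_sketch <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)  # centre each column
  S <- crossprod(Xc) / n                        # covariance with divisor n
  D <- Xc %*% solve(S) %*% t(Xc)                # n x n matrix of d_ij

  b1 <- sum(D^3) / n^2                          # multivariate skewness
  b2 <- sum(diag(D)^2) / n                      # multivariate kurtosis

  skew_stat <- n * b1 / 6                       # approx. chi-square under normality
  skew_df <- p * (p + 1) * (p + 2) / 6
  kurt_z <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)  # approx. N(0, 1)

  list(
    skew_stat = skew_stat,
    skew_df = skew_df,
    skew_p = pchisq(skew_stat, df = skew_df, lower.tail = FALSE),
    kurt_z = kurt_z,
    kurt_p = 2 * pnorm(abs(kurt_z), lower.tail = FALSE)
  )
}

set.seed(2)
toy <- matrix(rnorm(48 * 6), nrow = 48, ncol = 6)  # simulated, not study data
res <- mardia_sketch(toy)
res$skew_df
```

With six variables the skewness statistic is referred to a chi-square distribution with 56 degrees of freedom, which is why a statistic near 80 yields a small p value in the combined sample.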
Lastly, we performed a variance test for all LC NM-MRI values, to ensure that the LLD and HC groups show similar variance. Equal variance is indicated by p values >.05. We see no significant difference in variance between groups for any LC NM-MRI value.
lapply(df[, vars_LC], function(x) var.test(x ~ df$Diagnosis, alternative='two.sided'))
## $avg_max_seg_1_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 1.3011, num df = 22, denom df = 24, p-value = 0.5287
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.5666854 3.0334117
## sample estimates:
## ratio of variances
## 1.301056
##
##
## $avg_max_seg_2_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.5961, num df = 22, denom df = 24, p-value = 0.2267
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2596359 1.3898055
## sample estimates:
## ratio of variances
## 0.5960994
##
##
## $avg_max_seg_3_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.73734, num df = 22, denom df = 24, p-value = 0.4758
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.3211538 1.7191049
## sample estimates:
## ratio of variances
## 0.7373387
##
##
## $avg_max_seg_4_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 1.4009, num df = 22, denom df = 24, p-value = 0.4208
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.6101724 3.2661933
## sample estimates:
## ratio of variances
## 1.400898
##
##
## $avg_max_seg_5_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.57311, num df = 22, denom df = 24, p-value = 0.1937
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.249623 1.336207
## sample estimates:
## ratio of variances
## 0.5731108
##
##
## $avg_max_seg_6_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 1.5124, num df = 22, denom df = 24, p-value = 0.3242
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.6587491 3.5262195
## sample estimates:
## ratio of variances
## 1.512426
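The F statistic reported by `var.test` is simply the ratio of the two sample variances, with degrees of freedom equal to each group size minus one. The sketch below checks this on simulated data. The group sizes of 23 and 25 are an assumption chosen to reproduce the degrees of freedom shown above (22 and 24), and the labelling of which group is which is arbitrary.

```r
set.seed(3)
# Simulated two-group data; sizes and labels are assumptions for illustration
toy <- data.frame(
  Diagnosis = factor(rep(c("HC", "LLD"), times = c(23, 25))),
  value = c(rnorm(23, sd = 1), rnorm(25, sd = 1.2))
)

ft <- var.test(value ~ Diagnosis, data = toy, alternative = "two.sided")

# Reproduce the F statistic by hand: variance of first level / second level
v <- tapply(toy$value, toy$Diagnosis, var)
manual_F <- v[["HC"]] / v[["LLD"]]

c(reported_F = unname(ft$statistic), manual_F = manual_F)
ft$parameter
```

Note that the F test is itself sensitive to non-normality, which is worth keeping in mind when reading it alongside the Shapiro-Wilk results.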
As above, these plots show the cognition data distributions, by diagnosis. Outliers are plotted in black. Note that we decided not to remove any outlying cognition values, as all are “real” values, within the possible range of scores on the various assessments.
#identify the cognition variables
vars_cognition <- names(df[,grep('^rbans.+index_cor$|^dkefs.+_cor$', names(df))])
#put cognition, ID, and diagnosis in a separate df
df_cognition <- df[, c('id', 'Diagnosis', vars_cognition)]
#melt data
df_cognition <- melt(df_cognition, id.vars=c('id', 'Diagnosis'))
#run function
plotDistribution_fn(df_cognition)
As above, we review univariate normality with the Shapiro-Wilk test (normality shown by p values >.05). We see that several variables are not normal. Most are reaction time scores on the D-KEFS. We have opted to correct all reaction time variables for non-normality. We do not correct the non-normal, non-reaction time variables, as they do not appear to deviate substantially from normality.
mvn(df[, vars_cognition], mvnTest = 'mardia', univariatePlot='qqplot')$univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk rbans_immmemory_index_cor 0.9664 0.1834 YES
## 2 Shapiro-Wilk rbans_visuo_index_cor 0.9856 0.8137 YES
## 3 Shapiro-Wilk rbans_language_index_cor 0.9712 0.2811 YES
## 4 Shapiro-Wilk rbans_attention_index_cor 0.9763 0.4341 YES
## 5 Shapiro-Wilk rbans_delmem_index_cor 0.9346 0.0101 NO
## 6 Shapiro-Wilk dkefs_trails4_time_cor 0.9319 0.008 NO
## 7 Shapiro-Wilk dkefs_trails5_time_cor 0.8303 <0.001 NO
## 8 Shapiro-Wilk dkefs_cwi_1_time_cor 0.8402 <0.001 NO
## 9 Shapiro-Wilk dkefs_cwi_2_time_cor 0.8349 <0.001 NO
## 10 Shapiro-Wilk dkefs_cwi_3_time_cor 0.8569 <0.001 NO
## 11 Shapiro-Wilk dkefs_cwi_4_time_cor 0.9679 0.2104 YES
## 12 Shapiro-Wilk dkefs_vf_lftotalcorrect_cor 0.9799 0.5766 YES
## 13 Shapiro-Wilk dkefs_vf_cftotalcorrect_cor 0.9759 0.4225 YES
## 14 Shapiro-Wilk dkefs_vf_csswitchtotcor_cor 0.9337 0.0094 NO
As expected, not all variables are normal in LLD.
mvn(df[df$Diagnosis == 'LLD', vars_cognition], mvnTest = 'mardia', univariatePlot='qqplot')$univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk rbans_immmemory_index_cor 0.9430 0.1736 YES
## 2 Shapiro-Wilk rbans_visuo_index_cor 0.9806 0.8967 YES
## 3 Shapiro-Wilk rbans_language_index_cor 0.9280 0.0783 YES
## 4 Shapiro-Wilk rbans_attention_index_cor 0.9730 0.7220 YES
## 5 Shapiro-Wilk rbans_delmem_index_cor 0.8709 0.0045 NO
## 6 Shapiro-Wilk dkefs_trails4_time_cor 0.8942 0.0137 NO
## 7 Shapiro-Wilk dkefs_trails5_time_cor 0.9739 0.7431 YES
## 8 Shapiro-Wilk dkefs_cwi_1_time_cor 0.8034 0.0003 NO
## 9 Shapiro-Wilk dkefs_cwi_2_time_cor 0.7882 0.0001 NO
## 10 Shapiro-Wilk dkefs_cwi_3_time_cor 0.8150 0.0004 NO
## 11 Shapiro-Wilk dkefs_cwi_4_time_cor 0.9476 0.2217 YES
## 12 Shapiro-Wilk dkefs_vf_lftotalcorrect_cor 0.9693 0.6283 YES
## 13 Shapiro-Wilk dkefs_vf_cftotalcorrect_cor 0.9766 0.8105 YES
## 14 Shapiro-Wilk dkefs_vf_csswitchtotcor_cor 0.9166 0.0428 NO
As expected, not all variables are normal in HC.
mvn(df[df$Diagnosis == 'HC', vars_cognition], mvnTest = 'mardia', univariatePlot='qqplot')$univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk rbans_immmemory_index_cor 0.9686 0.6565 YES
## 2 Shapiro-Wilk rbans_visuo_index_cor 0.9673 0.6244 YES
## 3 Shapiro-Wilk rbans_language_index_cor 0.9370 0.1549 YES
## 4 Shapiro-Wilk rbans_attention_index_cor 0.9508 0.3046 YES
## 5 Shapiro-Wilk rbans_delmem_index_cor 0.9697 0.6826 YES
## 6 Shapiro-Wilk dkefs_trails4_time_cor 0.9639 0.5454 YES
## 7 Shapiro-Wilk dkefs_trails5_time_cor 0.7399 <0.001 NO
## 8 Shapiro-Wilk dkefs_cwi_1_time_cor 0.8983 0.0233 NO
## 9 Shapiro-Wilk dkefs_cwi_2_time_cor 0.8624 0.0046 NO
## 10 Shapiro-Wilk dkefs_cwi_3_time_cor 0.9277 0.0977 YES
## 11 Shapiro-Wilk dkefs_cwi_4_time_cor 0.9803 0.9107 YES
## 12 Shapiro-Wilk dkefs_vf_lftotalcorrect_cor 0.9633 0.5322 YES
## 13 Shapiro-Wilk dkefs_vf_cftotalcorrect_cor 0.9600 0.4638 YES
## 14 Shapiro-Wilk dkefs_vf_csswitchtotcor_cor 0.9479 0.2642 YES
As above, we review Mardia’s test for multivariate normality (normality shown by p values >.05). As with the LC NM-MRI values, we see evidence of skewness in the combined sample, but this is not present in the separate diagnostic groups.
mvn(df[,vars_cognition], mvnTest = 'mardia', multivariateOutlierMethod='adj')$multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 687.45883813858 0.000176887770014033 NO
## 2 Mardia Kurtosis 0.624677068615212 0.532183026436021 YES
## 3 MVN <NA> <NA> NO
mvn(df[df$Diagnosis == 'LLD', vars_cognition], mvnTest = 'mardia', multivariateOutlierMethod='adj')$multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 544.834053007212 0.669074356353767 YES
## 2 Mardia Kurtosis -1.16565590897873 0.24375359338542 YES
## 3 MVN <NA> <NA> YES
mvn(df[df$Diagnosis == 'HC', vars_cognition], mvnTest = 'mardia', multivariateOutlierMethod='adj')$multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 511.497343769698 0.929667137951828 YES
## 2 Mardia Kurtosis -1.85729924566618 0.0632685916860407 YES
## 3 MVN <NA> <NA> YES
Lastly, we review variance, to ensure that the HC and LLD groups show similar variance (equal variance is indicated by p values >.05). Before correction for non-normality, two reaction time variables show unequal variance (dkefs_trails5_time_cor, p = .006; dkefs_cwi_4_time_cor, p = .038) and a third is marginal (dkefs_cwi_1_time_cor, p = .052). This contributes to our decision to correct the reaction-time cognition variables.
lapply(df[, vars_cognition], function(x) var.test(x ~ df$Diagnosis, alternative='two.sided'))
## $rbans_immmemory_index_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.73587, num df = 22, denom df = 24, p-value = 0.4729
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.3205138 1.7156790
## sample estimates:
## ratio of variances
## 0.7358693
##
##
## $rbans_visuo_index_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.82984, num df = 22, denom df = 24, p-value = 0.6635
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.3614423 1.9347653
## sample estimates:
## ratio of variances
## 0.8298373
##
##
## $rbans_language_index_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 1.627, num df = 22, denom df = 24, p-value = 0.2469
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7086466 3.7933161
## sample estimates:
## ratio of variances
## 1.626986
##
##
## $rbans_attention_index_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.91167, num df = 22, denom df = 24, p-value = 0.8314
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.3970834 2.1255487
## sample estimates:
## ratio of variances
## 0.9116659
##
##
## $rbans_delmem_index_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 1.6607, num df = 22, denom df = 24, p-value = 0.2278
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7233127 3.8718225
## sample estimates:
## ratio of variances
## 1.660658
##
##
## $dkefs_trails4_time_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.67567, num df = 22, denom df = 24, p-value = 0.3589
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.294294 1.575327
## sample estimates:
## ratio of variances
## 0.6756713
##
##
## $dkefs_trails5_time_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 3.2655, num df = 22, denom df = 24, p-value = 0.005829
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 1.422326 7.613574
## sample estimates:
## ratio of variances
## 3.265527
##
##
## $dkefs_cwi_1_time_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.43191, num df = 22, denom df = 24, p-value = 0.05186
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1881199 1.0069875
## sample estimates:
## ratio of variances
## 0.4319055
##
##
## $dkefs_cwi_2_time_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.65002, num df = 22, denom df = 24, p-value = 0.3136
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2831203 1.5155151
## sample estimates:
## ratio of variances
## 0.6500174
##
##
## $dkefs_cwi_3_time_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.51505, num df = 22, denom df = 24, p-value = 0.1223
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2243334 1.2008351
## sample estimates:
## ratio of variances
## 0.5150484
##
##
## $dkefs_cwi_4_time_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.40693, num df = 22, denom df = 24, p-value = 0.03773
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1772430 0.9487646
## sample estimates:
## ratio of variances
## 0.4069332
##
##
## $dkefs_vf_lftotalcorrect_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 0.79548, num df = 22, denom df = 24, p-value = 0.5929
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.3464759 1.8546518
## sample estimates:
## ratio of variances
## 0.795476
##
##
## $dkefs_vf_cftotalcorrect_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 1.1254, num df = 22, denom df = 24, p-value = 0.7747
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.4901959 2.6239707
## sample estimates:
## ratio of variances
## 1.125443
##
##
## $dkefs_vf_csswitchtotcor_cor
##
## F test to compare two variances
##
## data: x by df$Diagnosis
## F = 1.1225, num df = 22, denom df = 24, p-value = 0.7795
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.4888947 2.6170058
## sample estimates:
## ratio of variances
## 1.122456
Identify best transform.
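Here, we compare several common transformations for all six of the reaction-time cognition variables (in the table below, the lowest fit statistic in each row marks the selected transformation). We find that the two D-KEFS TMT variables are best left untransformed. We see that the four D-KEFS CWI variables ought to be transformed, although for dkefs_cwi_4_time_cor several options, including no transformation, tie for the lowest statistic. Arcsinh is best for two of the four variables; thus, we opt to apply it to all four of the D-KEFS CWI variables.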
#first, pull out the age- and sex- corrected reaction time variables
df_cognitionTime <- df[ ,grep('time_cor', names(df))]
#compare transformations on all age- and sex- corrected reaction time variables
transformComparison <- lapply(df_cognitionTime, function(x) bestNormalize(x, allow_orderNorm = FALSE, out_of_sample = FALSE))
#extract goodness of fit and model selection
transform_fit <- lapply(transformComparison, `[`, 'norm_stats')
#bind together elements of list into a df
df_transform <- do.call(cbind, transform_fit) %>%
as.data.frame() %>% gather() %>% separate(col=value, sep=',', remove=TRUE,
into=c('arcsinh_x','boxcox','exp_x','log_x','no_transform','sqrt_x','yeojohnson'))
#remove non-numeric components of variables
df_transform <- cbind(df_transform[1], sapply(df_transform[2:ncol(df_transform)], function(x) gsub("[^0-9.-]", "", x)))
#make sure all variables with numbers are numeric
df_transform[2:ncol(df_transform)] <- sapply(df_transform[2:ncol(df_transform)],as.numeric)
#make a variable, indicating selection
df_transform$selection <- colnames(df_transform)[apply(df_transform,1,which.min)]
#round all numeric values
df_transform <- df_transform %>% mutate_if(is.numeric, round, 4)
#remove unneeded tables, dfs
rm(transform_fit, transformComparison)
#put into table
df_transform %>% kable() %>% kable_styling()
key | arcsinh_x | boxcox | exp_x | log_x | no_transform | sqrt_x | yeojohnson | selection |
---|---|---|---|---|---|---|---|---|
dkefs_trails4_time_cor | 20.2262 | 58.9167 | 7.7857 | 2.0119 | 1.0595 | 1.2976 | NA | no_transform |
dkefs_trails5_time_cor | 2.0714 | 58.9167 | 4.7500 | 2.6667 | 0.1667 | 0.2857 | NA | no_transform |
dkefs_cwi_1_time_cor | 0.6429 | 0.7619 | 58.9167 | 0.6429 | 1.7143 | 0.7619 | 0.9405 | arcsinh_x |
dkefs_cwi_2_time_cor | 2.8452 | 1.2976 | 56.1786 | 2.8452 | 3.3214 | 2.2500 | 0.5238 | yeojohnson |
dkefs_cwi_3_time_cor | 0.2857 | 0.4048 | 58.9167 | 0.2857 | 1.5357 | 1.0000 | 0.4048 | arcsinh_x |
dkefs_cwi_4_time_cor | 0.8214 | 0.3452 | 58.9167 | 0.8214 | 0.3452 | 0.3452 | 0.3452 | boxcox |
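One detail worth noting when reading the selection column: `which.min` returns the first minimum it encounters, so a tie is resolved by column order rather than by any preference among transformations. The sketch below uses the values copied from the dkefs_cwi_4_time_cor row of the table above to show this.

```r
# Fit statistics copied from the dkefs_cwi_4_time_cor row of the table
cwi4 <- c(arcsinh_x = 0.8214, boxcox = 0.3452, exp_x = 58.9167,
          log_x = 0.8214, no_transform = 0.3452, sqrt_x = 0.3452,
          yeojohnson = 0.3452)

# which.min() reports only the first of the tied minima
selected <- names(cwi4)[which.min(cwi4)]

# Listing every column that attains the minimum exposes the tie
tied <- names(cwi4)[cwi4 == min(cwi4)]

selected
tied
```

For this row, "boxcox" is selected only because it appears first among four tied columns (boxcox, no_transform, sqrt_x, and yeojohnson).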
Apply transformation. Note: though the variables are not standardized here, they are substantially different from their original values.
#transform the four variables with arcsinh
dkefs_cwi_1_time_normcor <- arcsinh_x(df_cognitionTime$dkefs_cwi_1_time_cor, standardize = F)$x.t
dkefs_cwi_2_time_normcor <- arcsinh_x(df_cognitionTime$dkefs_cwi_2_time_cor, standardize = F)$x.t
dkefs_cwi_3_time_normcor <- arcsinh_x(df_cognitionTime$dkefs_cwi_3_time_cor, standardize = F)$x.t
dkefs_cwi_4_time_normcor <- arcsinh_x(df_cognitionTime$dkefs_cwi_4_time_cor, standardize = F)$x.t
#replace the D-KEFS CWI reaction time variables in the dataset
df <- df[, !names(df) %in% names(df_cognitionTime[, grep('cwi', names(df_cognitionTime))])]
df <- cbind(df, dkefs_cwi_1_time_normcor, dkefs_cwi_2_time_normcor, dkefs_cwi_3_time_normcor, dkefs_cwi_4_time_normcor)
#remove unneeded dfs
rm(df_cognitionTime, df_transform)
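The inverse hyperbolic sine is defined as log(x + sqrt(x^2 + 1)). Unlike the log or square root, it accepts zero and negative inputs, which matters for variables that have been residualized for age and sex. The sketch below verifies the identity with base R's `asinh` on toy values. We assume, without having checked every bestNormalize version, that `arcsinh_x(..., standardize = FALSE)$x.t` returns the same quantity.

```r
# Toy values spanning negative, zero, and positive inputs (not study data)
x <- c(-3.5, -1, 0, 0.5, 2, 10)

# Definition of the inverse hyperbolic sine, written out by hand
manual <- log(x + sqrt(x^2 + 1))
identity_holds <- isTRUE(all.equal(asinh(x), manual))

# The transform is odd-symmetric and compresses large magnitudes
is_odd <- isTRUE(all.equal(asinh(-x), -asinh(x)))
range_before <- diff(range(x))
range_after <- diff(range(asinh(x)))

c(identity_holds = identity_holds, is_odd = is_odd)
c(range_before = range_before, range_after = range_after)
```

Because the transform is nonlinear, the transformed scores are on a different scale from the originals even without standardization, consistent with the note above.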
Write out data
#pull dataframe with variables of interest
df <- df[, grep('^id$|Diagnosis|Age|Sex|^rbans.+index_cor$|^dkefs.+_cor$|^dkefs.+normcor$|avg_max_seg.+[0-9]_cor', names(df))]
#write out
write.csv(df, paste0('../clinical/dfCorrected_', Sys.Date(), '.csv'), row.names = FALSE)