imaging data pulled: 2021-01-21
clinical data pulled: 2020-11-16
code written: 2020-12-22
last ran: 2021-04-13
website: http://rpubs.com/navona/NM_prediction
code: https://github.com/navonacalarco/NM-MRI/blob/master/analyses/06_prediction.Rmd
related analysis: CCA | Hierarchical clustering
Description
This report contains a series of analyses relating the CCA results (namely, participants’ CV1 scores on the \(X\) and \(Y\) sets) to a variety of unseen variables of interest. The legend in all coloured plots is as follows:
We opted to determine if participants’ CV1 scores on the \(X\) (brain) set and/or \(Y\) (behaviour/cognition) set predicted depression symptom severity, as measured by the PHQ-9 and MADRS total scores. The PHQ-9 was administered at four study timepoints (screening, T0, T1, and T2), and the MADRS was administered at three (T0, T1, T2). However, not all of the n=48 participants in our dataset with NM scans returned for followup assessments. Frequency counts of the PHQ-9 and MADRS at each timepoint are as follows:
timepoint | PHQ-9 screening | PHQ-9 T0 | PHQ-9 T1 | PHQ-9 T2 | MADRS screening | MADRS T0 | MADRS T1 | MADRS T2 |
---|---|---|---|---|---|---|---|---|
participant count | 48 | 10 | 22 | 7 | 0 | 48 | 22 | 6 |
Here, we review depression severity scores taken at two timepoints: ‘closest-to-scan’ and ‘longitudinal’. The closest-to-scan timepoint (coloured in green) is ‘screening’ for the PHQ-9 (n=48) and ‘T0’ for the MADRS (n=48). The ‘longitudinal’ timepoint (coloured in yellow) is ‘T1’ for both scales, which provides PHQ-9 data from n=22 participants (14 LLD) and MADRS data from n=22 participants (14 LLD). Note that this longitudinal estimate of depression severity is therefore based on roughly 46% (22/48) of the participants included in our dataset.
For both the ‘closest-to-scan’ and ‘longitudinal’ depression data, and for the LLD and HC groups separately as well as the combined LLD-HC group, we review two methods of prediction: (i) simple linear regression, with \(X\) and \(Y\) as separate predictors, and (ii) multiple linear regression, with \(X\) and \(Y\) as combined predictors. With the exception of one HC-only multiple regression reported below (adjusted R-squared = 0.712, model p = 0.019), neither the simple nor the multiple linear regression models find an association between participants’ \(X\) and/or \(Y\) scores and (continuous) depression symptom severity in any diagnostic grouping, at either the ‘closest-to-scan’ or ‘longitudinal’ timepoint. That is, participants’ CCA CV1 scores do not consistently predict depression symptom severity.
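The two prediction approaches described above can be sketched in R as follows. This is a minimal illustration on simulated data, not the report's actual code: the column names (`cv1_x`, `cv1_y`, `phq9_total`) and the helper `model_summary()` are hypothetical, and the printed values will not match the results reported below.

```r
# Simulated stand-in data; all column names here are hypothetical.
set.seed(1)
n <- 48
df <- data.frame(
  cv1_x = rnorm(n),                        # CV1 score on the X (brain) set
  cv1_y = rnorm(n),                        # CV1 score on the Y (behaviour/cognition) set
  phq9_total = rnorm(n, mean = 8, sd = 5)  # depression severity (simulated)
)

# Helper (hypothetical): extract adjusted R-squared and the overall model
# p-value (from the F statistic) for a fitted linear model.
model_summary <- function(fit) {
  s <- summary(fit)
  f <- s$fstatistic  # named vector: value, numdf, dendf
  c(adj_r2 = s$adj.r.squared,
    p = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}

# (i) Simple linear regression: X and Y as separate predictors
simple_x <- model_summary(lm(phq9_total ~ cv1_x, data = df))
simple_y <- model_summary(lm(phq9_total ~ cv1_y, data = df))

# (ii) Multiple linear regression: X and Y as combined predictors
multiple_xy <- model_summary(lm(phq9_total ~ cv1_x + cv1_y, data = df))

print(round(rbind(simple_x, simple_y, multiple_xy), 3))
```

Running the same helper on subsets of the data (for example, `df[df$Diagnosis == "HC", ]`, assuming a `Diagnosis` column) would give the per-group figures. Note that adjusted R-squared can be negative when the predictors explain less variance than expected by chance, as in several of the models below.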
Simple linear regression
Multiple linear regression
Combined HC-LLD group: Adjusted R-squared = -0.016, model p = 0.539.
HC only: Adjusted R-squared = 0.048, model p = 0.236.
LLD only: Adjusted R-squared = -0.062, model p = 0.742.
Simple linear regression
Multiple linear regression
Combined HC-LLD group: Adjusted R-squared = 0.03, model p = 0.289.
HC only: Adjusted R-squared = 0.712, model p = 0.019.
LLD only: Adjusted R-squared = -0.029, model p = 0.467.
Simple linear regression
Multiple linear regression
Combined HC-LLD group: Adjusted R-squared = -0.019, model p = 0.579.
HC only: Adjusted R-squared = -0.063, model p = 0.712.
LLD only: Adjusted R-squared = -0.064, model p = 0.76.
Simple linear regression
Multiple linear regression
Combined HC-LLD group: Adjusted R-squared = -0.003, model p = 0.399.
HC only: Adjusted R-squared = -0.01, model p = 0.442.
LLD only: Adjusted R-squared = -0.151, model p = 0.865.
We also reviewed associations with the SASP (Senescence-Associated Secretory Phenotype) Index. The SASP Index is a composite, integrated measure that reflects the dysregulation of distinct senescence-related pathways. We reviewed associations between the SASP Index and a number of variables, including participants’ CV1 scores on the X (brain) set and/or Y (behaviour/cognition) set. Note that we are missing SASP data from 2 participants. We find no predictive association between participants’ CCA CV1 scores and the SASP Index.
We were also interested to see if participants’ CV1 scores on the X (brain) set and/or Y (behaviour/cognition) set correlated with the CIRS-G. We find no predictive association between participants’ CCA CV1 scores and the CIRS-G.
For post-hoc exploration, we reviewed associations between participants’ CV1 scores on the X (brain) set and/or Y (behaviour/cognition) set, and a number of total scores from the ‘health’ variables in the SENDEP dataset. The variables are:
Variable | Scale name | Rating scale |
---|---|---|
health_ecog_total | Everyday Cognition | higher=worse |
health_fas_total | Fatigue Assessment Scale | higher=worse |
health_frail_total | FRAIL scale | higher=worse |
health_gad7_total | Generalized Anxiety Disorder Scale | higher=worse |
health_moca_total | Montreal Cognitive Assessment | higher=better |
health_pss_total | Perceived Stress Scale | higher=worse |
health_ucla3_total | UCLA Loneliness Scale | higher=worse |
health_whodasv2_total | WHO Disability Assessment Schedule | higher=worse |
health_wrat_total | Word Reading Subtest | higher=better |
Correlations and significance tests with the CCA CV1 scores, by diagnosis and across diagnoses, are shown below. We find no predictive association between participants’ CCA CV1 scores and the health measures.
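As a rough sketch of how correlations "by diagnosis and across diagnoses" might be computed, the snippet below runs `cor.test()` on the full sample and on each diagnostic group. The data are simulated, the helper `cor_by_group()` is hypothetical, and Pearson correlation is assumed here because the report does not state which coefficient was used.

```r
# Simulated stand-in data; group sizes mirror the baseline sample (23 HC, 25 LLD).
set.seed(2)
df <- data.frame(
  Diagnosis = rep(c("HC", "LLD"), times = c(23, 25)),
  cv1_x = rnorm(48),
  health_moca_total = rnorm(48, mean = 26, sd = 2)
)

# Helper (hypothetical): correlation between columns x and y, computed across
# all participants and then separately within each diagnostic group.
cor_by_group <- function(data, x, y) {
  groups <- c(list(all = data), split(data, data$Diagnosis))
  out <- lapply(groups, function(d) {
    ct <- cor.test(d[[x]], d[[y]])  # Pearson by default (an assumption here)
    # n counts rows; with missing values, cor.test() would use fewer pairs
    data.frame(n = nrow(d), r = unname(ct$estimate), p = ct$p.value)
  })
  do.call(rbind, out)
}

res <- cor_by_group(df, "cv1_x", "health_moca_total")
print(round(res, 3))
```

With nine health measures, two CV1 scores, and three groupings, this produces many tests, so uncorrected p-values from such a loop are best read as exploratory.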
Lastly, we opted to determine if participants’ CV1 scores on the \(X\) (brain) set and/or \(Y\) (behaviour/cognition) set predicted longitudinal cognition, as measured by the same scales included in the \(Y\) set (namely RBANS and D-KEFS), at followup. Both scales were administered at three study timepoints (T0, T1, and T2). However, not all of the 48 participants in our dataset with NM scans returned for followup assessments. Frequency counts of the RBANS and D-KEFS at each timepoint are as follows:
timepoint | RBANS screening | RBANS T0 | RBANS T1 | RBANS T2 | D-KEFS screening | D-KEFS T0 | D-KEFS T1 | D-KEFS T2 |
---|---|---|---|---|---|---|---|---|
participant count | 0 | 48 | 22 | 5 | 0 | 48 | 22 | 6 |
Based on participant counts, we review longitudinal cognition scores at the T1 timepoint, which is the same ‘longitudinal’ timepoint reviewed for the depression severity scores above. We also calculated a delta score, representing the difference between the T0 and T1 timepoints. At T1, RBANS data are available from n=22 participants (14 LLD), and D-KEFS data from n=22 participants (14 LLD), coloured in green. Note that this longitudinal evaluation of cognition at follow-up is therefore based on roughly 46% (22/48) of the participants included in our dataset.
First, we wanted to assess if participants’ cognition scores were relatively stable over time. The plots below show cognition scores across the two timepoints, coloured by diagnostic group.
The following visualization shows the delta (change) in cognition score between the two timepoints, coloured by group. The t-test reports if there is a difference in delta between the HC and LLD groups. The table reports the mean and SD per group, per cognitive task.
Diagnosis | timepoint | var | n | mean | sd |
---|---|---|---|---|---|
HC | t0 | dkefs_cwi_1_time | 23 | 34.217 | 6.987 |
HC | t0 | dkefs_cwi_2_time | 23 | 25.522 | 5.526 |
HC | t0 | dkefs_cwi_3_time | 23 | 62.391 | 16.784 |
HC | t0 | dkefs_cwi_4_time | 23 | 64.435 | 11.965 |
HC | t0 | dkefs_trails4_time | 23 | 105.435 | 47.958 |
HC | t0 | dkefs_trails5_time | 23 | 41.565 | 28.428 |
HC | t0 | rbans_attention_index | 23 | 107.174 | 16.892 |
HC | t0 | rbans_delmem_index | 23 | 97.000 | 14.894 |
HC | t0 | rbans_immmemory_index | 23 | 96.435 | 12.427 |
HC | t0 | rbans_language_index | 23 | 98.870 | 12.524 |
HC | t0 | rbans_visuo_index | 23 | 97.174 | 15.799 |
HC | t1 | dkefs_cwi_1_time | 8 | 35.500 | 8.053 |
HC | t1 | dkefs_cwi_2_time | 8 | 27.125 | 4.794 |
HC | t1 | dkefs_cwi_3_time | 8 | 61.125 | 10.934 |
HC | t1 | dkefs_cwi_4_time | 8 | 64.500 | 13.617 |
HC | t1 | dkefs_trails4_time | 8 | 106.875 | 29.705 |
HC | t1 | dkefs_trails5_time | 8 | 43.000 | 12.750 |
HC | t1 | rbans_attention_index | 8 | 104.500 | 18.868 |
HC | t1 | rbans_delmem_index | 8 | 109.375 | 12.153 |
HC | t1 | rbans_immmemory_index | 8 | 106.750 | 12.903 |
HC | t1 | rbans_language_index | 8 | 104.500 | 14.531 |
HC | t1 | rbans_visuo_index | 8 | 100.750 | 15.746 |
LLD | t0 | dkefs_cwi_1_time | 25 | 34.160 | 10.351 |
LLD | t0 | dkefs_cwi_2_time | 25 | 24.280 | 6.361 |
LLD | t0 | dkefs_cwi_3_time | 25 | 68.120 | 23.446 |
LLD | t0 | dkefs_cwi_4_time | 25 | 68.680 | 22.026 |
LLD | t0 | dkefs_trails4_time | 25 | 122.680 | 58.996 |
LLD | t0 | dkefs_trails5_time | 25 | 41.240 | 14.301 |
LLD | t0 | rbans_attention_index | 25 | 101.240 | 16.435 |
LLD | t0 | rbans_delmem_index | 25 | 100.960 | 11.077 |
LLD | t0 | rbans_immmemory_index | 25 | 98.480 | 13.292 |
LLD | t0 | rbans_language_index | 25 | 99.640 | 9.331 |
LLD | t0 | rbans_visuo_index | 25 | 92.680 | 17.303 |
LLD | t1 | dkefs_cwi_1_time | 14 | 33.714 | 7.087 |
LLD | t1 | dkefs_cwi_2_time | 14 | 24.286 | 4.548 |
LLD | t1 | dkefs_cwi_3_time | 14 | 67.357 | 27.692 |
LLD | t1 | dkefs_cwi_4_time | 13 | 74.538 | 31.384 |
LLD | t1 | dkefs_trails4_time | 12 | 119.583 | 63.572 |
LLD | t1 | dkefs_trails5_time | 14 | 40.643 | 17.046 |
LLD | t1 | rbans_attention_index | 14 | 93.643 | 16.118 |
LLD | t1 | rbans_delmem_index | 14 | 102.643 | 11.126 |
LLD | t1 | rbans_immmemory_index | 14 | 101.000 | 14.486 |
LLD | t1 | rbans_language_index | 14 | 102.571 | 7.439 |
LLD | t1 | rbans_visuo_index | 14 | 89.857 | 17.168 |
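The delta score, the between-group t-test on the delta, and a per-group summary can be sketched as below. This uses simulated scores for a single cognitive measure; the direction of the delta (T1 minus T0) and the use of R's default Welch t-test are assumptions, since the report does not specify either.

```r
# Simulated scores for one cognitive measure in the 22 participants with T1 data.
set.seed(3)
wide <- data.frame(
  ID = sprintf("S%02d", 1:22),
  Diagnosis = rep(c("HC", "LLD"), times = c(8, 14)),
  t0 = rnorm(22, mean = 100, sd = 15),
  t1 = rnorm(22, mean = 103, sd = 15)
)

# Delta (change) score; T1 minus T0 is an assumed direction.
wide$delta <- wide$t1 - wide$t0

# Two-sample t-test on delta between HC and LLD (Welch test, R's default).
tt <- t.test(delta ~ Diagnosis, data = wide)
print(tt)

# Per-group n, mean, and SD of the delta score.
summ <- aggregate(delta ~ Diagnosis, data = wide,
                  FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v)))
summ <- do.call(data.frame, summ)  # flatten the matrix column into plain columns
print(summ)
```

For timed D-KEFS measures a lower score is better, so the sign of the delta needs to be interpreted per task.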
Next, we performed repeated-measures ANOVAs for each of the cognition variables. First, we consider just one within-subject factor, timepoint, to evaluate whether cognition differs across the two timepoints. The code takes the form `aov(score ~ timepoint + Error(ID/timepoint), data)`. The predictor is `timepoint`, and the outcome is `score`. `Error(ID/timepoint)` divides the error variance into two strata (between participants, and between timepoints within participants), which takes the repeated measures into account. To examine the effect of timepoint, check the within-participant section of the output, labelled `Error: ID:timepoint` below. If the p value is significant, there is a significant difference between the two timepoints. Most of the RBANS components show a difference over time; few of the D-KEFS do. Note that this analysis does not take diagnosis into consideration.
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 96 96.37 0.428 0.516
## Residuals 46 10358 225.16
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 378.2 378.2 5.717 0.0262 *
## Residuals 21 1389.3 66.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 533 533.5 1.423 0.239
## Residuals 46 17246 374.9
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 349.5 349.5 7.515 0.0122 *
## Residuals 21 976.5 46.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 0 0.03 0 0.988
## Residuals 46 6394 139.01
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 327.3 327.3 5.354 0.0309 *
## Residuals 21 1283.7 61.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 1393 1393.1 3.703 0.0605 .
## Residuals 46 17307 376.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 46 46.02 0.627 0.437
## Residuals 21 1540 73.36
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 258 257.6 1.246 0.27
## Residuals 46 9514 206.8
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 311.1 311.1 4.831 0.0393 *
## Residuals 21 1352.4 64.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 13 13.20 0.132 0.718
## Residuals 46 4588 99.73
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 9.09 9.091 1.201 0.285
## Residuals 21 158.91 7.567
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 2.7 2.69 0.06 0.807
## Residuals 46 2050.3 44.57
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 9.09 9.091 2.618 0.121
## Residuals 21 72.91 3.472
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 932 932.2 1.541 0.221
## Residuals 46 27827 604.9
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 396 396.0 5.092 0.0348 *
## Residuals 21 1633 77.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 548 547.6 0.931 0.34
## Residuals 46 27066 588.4
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 13.7 13.71 0.221 0.643
## Residuals 20 1239.3 61.96
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 284 284 0.077 0.783
## Residuals 46 169234 3679
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 109 108.9 0.106 0.748
## Residuals 19 19476 1025.1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 770 769.5 2 0.164
## Residuals 46 17700 384.8
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 270 270.0 0.638 0.434
## Residuals 21 8893 423.5
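The one-factor repeated-measures model described above can be sketched on simulated data as follows. As an assumption of this sketch, only participants with both timepoints are included (n=22), so the design is balanced and every term is tested in a single stratum; the scores themselves are invented.

```r
# Simulated scores for 22 participants measured at both T0 and T1.
set.seed(4)
n <- 22
ids <- factor(sprintf("S%02d", seq_len(n)))
t0 <- rnorm(n, mean = 100, sd = 15)
t1 <- t0 + rnorm(n, mean = 3, sd = 8)

# Long format: one row per participant per timepoint.
long <- data.frame(
  ID = rep(ids, times = 2),
  timepoint = factor(rep(c("t0", "t1"), each = n)),
  score = c(t0, t1)
)

# Repeated-measures ANOVA with timepoint as the within-subject factor.
fit <- aov(score ~ timepoint + Error(ID / timepoint), data = long)
s <- summary(fit)
print(s)

# The timepoint effect is tested in the within-participant stratum.
within <- s[["Error: ID:timepoint"]][[1]]
f_value <- within[1, "F value"]
p_value <- within[1, "Pr(>F)"]

# With two timepoints and complete data, this F test is equivalent to a
# paired t-test (F equals t squared, with the same p-value).
paired <- t.test(t1, t0, paired = TRUE)
print(c(F = f_value, t_squared = unname(paired$statistic)^2))
```

Because there are only two timepoints, the paired t-test offers a simple cross-check on the ANOVA result for each cognitive measure.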
Second, we consider one within-subject factor, timepoint, and one between-subject factor, diagnosis. With the two factors, we can also test the interaction between `timepoint` and `Diagnosis`. The code takes the form `aov(score ~ timepoint * Diagnosis + Error(ID/(timepoint * Diagnosis)), data)`. As above, most of the RBANS components show a difference over time; few of the D-KEFS do. There is no significant effect of diagnosis or interaction with diagnosis.
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 96 96.4 0.438 0.5116
## Diagnosis 1 1 0.7 0.003 0.9541
## timepoint:Diagnosis 1 673 673.2 3.059 0.0873 .
## Residuals 44 9684 220.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 378.2 378.2 5.455 0.03 *
## timepoint:Diagnosis 1 2.5 2.5 0.037 0.85
## Residuals 20 1386.8 69.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 533 533.5 1.499 0.2274
## Diagnosis 1 1038 1037.9 2.915 0.0948 .
## timepoint:Diagnosis 1 544 544.4 1.529 0.2228
## Residuals 44 15664 356.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 349.5 349.5 7.344 0.0135 *
## timepoint:Diagnosis 1 24.9 24.9 0.522 0.4782
## Residuals 20 951.7 47.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 0 0.03 0.000 0.988
## Diagnosis 1 2 2.35 0.016 0.899
## timepoint:Diagnosis 1 21 21.22 0.147 0.704
## Residuals 44 6371 144.79
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 327.3 327.3 5.181 0.034 *
## timepoint:Diagnosis 1 20.3 20.3 0.321 0.577
## Residuals 20 1263.5 63.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 1393 1393.1 3.700 0.0609 .
## Diagnosis 1 673 673.3 1.788 0.1880
## timepoint:Diagnosis 1 67 67.1 0.178 0.6750
## Residuals 44 16567 376.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 46.0 46.02 0.632 0.436
## timepoint:Diagnosis 1 84.7 84.68 1.163 0.294
## Residuals 20 1455.8 72.79
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 258 257.61 1.222 0.275
## Diagnosis 1 6 5.85 0.028 0.868
## timepoint:Diagnosis 1 232 232.49 1.103 0.299
## Residuals 44 9275 210.81
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 311.1 311.11 5.372 0.0312 *
## timepoint:Diagnosis 1 194.1 194.09 3.351 0.0821 .
## Residuals 20 1158.3 57.91
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 13 13.20 0.128 0.722
## Diagnosis 1 2 2.43 0.024 0.879
## timepoint:Diagnosis 1 54 54.15 0.526 0.472
## Residuals 44 4531 102.98
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 9.09 9.091 1.144 0.298
## timepoint:Diagnosis 1 0.01 0.007 0.001 0.976
## Residuals 20 158.90 7.945
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 2.7 2.69 0.062 0.804
## Diagnosis 1 44.6 44.58 1.033 0.315
## timepoint:Diagnosis 1 107.1 107.08 2.482 0.122
## Residuals 44 1898.6 43.15
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 9.09 9.091 2.747 0.113
## timepoint:Diagnosis 1 6.72 6.722 2.031 0.170
## Residuals 20 66.19 3.309
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 932 932.2 1.493 0.228
## Diagnosis 1 327 327.1 0.524 0.473
## timepoint:Diagnosis 1 32 31.8 0.051 0.822
## Residuals 44 27468 624.3
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 396.0 396.0 4.864 0.0393 *
## timepoint:Diagnosis 1 4.8 4.8 0.059 0.8104
## Residuals 20 1628.2 81.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 548 547.6 0.918 0.343
## Diagnosis 1 487 487.1 0.816 0.371
## timepoint:Diagnosis 1 318 317.8 0.533 0.469
## Residuals 44 26262 596.9
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 13.7 13.71 0.212 0.650
## timepoint:Diagnosis 1 10.4 10.39 0.161 0.693
## Residuals 19 1228.9 64.68
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 284 284 0.076 0.784
## Diagnosis 1 4031 4031 1.077 0.305
## timepoint:Diagnosis 1 516 516 0.138 0.712
## Residuals 44 164687 3743
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 109 108.9 0.104 0.751
## timepoint:Diagnosis 1 561 561.2 0.534 0.474
## Residuals 18 18915 1050.8
##
## Error: ID
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 770 769.5 2.005 0.164
## Diagnosis 1 95 95.0 0.248 0.621
## timepoint:Diagnosis 1 717 717.1 1.868 0.179
## Residuals 44 16888 383.8
##
## Error: ID:timepoint
## Df Sum Sq Mean Sq F value Pr(>F)
## timepoint 1 270 270.0 0.627 0.438
## timepoint:Diagnosis 1 280 279.7 0.649 0.430
## Residuals 20 8614 430.7
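A mixed design with timepoint as the within-subject factor and diagnosis as the between-subject factor can be sketched as below. Two assumptions are worth flagging: the data are simulated and restricted to participants with both timepoints, and the error term is written as `Error(ID / timepoint)` rather than the form quoted above, because diagnosis does not vary within a participant. This is a common alternative specification, not necessarily the one used to produce the output above.

```r
# Simulated scores: 8 HC and 14 LLD participants, each measured at T0 and T1.
set.seed(5)
n_hc <- 8
n_lld <- 14
n <- n_hc + n_lld
ids <- factor(sprintf("S%02d", seq_len(n)))
dx <- factor(rep(c("HC", "LLD"), times = c(n_hc, n_lld)))
t0 <- rnorm(n, mean = 100, sd = 15)
t1 <- t0 + rnorm(n, mean = 3, sd = 8)

# Long format: one row per participant per timepoint.
long <- data.frame(
  ID = rep(ids, times = 2),
  Diagnosis = rep(dx, times = 2),
  timepoint = factor(rep(c("t0", "t1"), each = n)),
  score = c(t0, t1)
)

# Mixed ANOVA: timepoint (within) x Diagnosis (between).
# Error(ID / timepoint) is an assumed simplification of the error term.
fit <- aov(score ~ timepoint * Diagnosis + Error(ID / timepoint), data = long)
s <- summary(fit)
print(s)

# Diagnosis is tested between participants; timepoint and the
# timepoint:Diagnosis interaction are tested within participants.
between <- s[["Error: ID"]][[1]]
within <- s[["Error: ID:timepoint"]][[1]]
```

With complete data, the main effect of diagnosis appears only in the `Error: ID` stratum, and the timepoint and interaction effects appear only in the `Error: ID:timepoint` stratum.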
We also reviewed the correlation between the longitudinal (T1) cognition scores and the CV1 scores. There are a small number of significant associations.
Here, we review the correlation between the cognition score delta and the CV1 score. There are few, if any, significant associations.
Lastly, we review the correlation between cognition scores at the follow-up visit, residualized by cognition at baseline, and the CV1 score. There are no significant associations.
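Residualizing follow-up cognition by baseline can be sketched as regressing the T1 score on the T0 score and keeping the residuals, which are then correlated with the CV1 score. The example below uses simulated data and hypothetical column names; it assumes a simple linear adjustment and Pearson correlation, neither of which is confirmed by the report.

```r
# Simulated data for the 22 participants with follow-up; names are hypothetical.
set.seed(6)
n <- 22
df <- data.frame(
  cv1_x = rnorm(n),
  score_t0 = rnorm(n, mean = 100, sd = 15)
)
df$score_t1 <- 0.7 * df$score_t0 + rnorm(n, mean = 30, sd = 10)

# Residualize follow-up by baseline: the part of the T1 score that is not
# linearly predicted by the T0 score. (No missing values here; with NAs,
# na.action = na.exclude would keep the residuals aligned with the rows.)
df$score_t1_resid <- resid(lm(score_t1 ~ score_t0, data = df))

# Correlate the residualized follow-up score with the CV1 score.
ct <- cor.test(df$cv1_x, df$score_t1_resid)
print(ct)
```

By construction the residuals are uncorrelated with baseline, so any association with CV1 reflects change beyond what baseline cognition predicts.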