EXIT Project - Analysis

Procedure:
Step 1 - Identify good candidate items to include in a principal component analysis of externalizing phenotypes

Step 2 - Create principal component (PC) of externalizing behavior
Step 3 - Estimate direct and indirect genetic effects of externalizing PGS

Step 1:

Estimate \(\Delta\)R2 of the externalizing PGS in Texas Twins phenotypes

My first task was to merge together all of the possible externalizing phenotypes in the Texas Twins. After collecting all of the phenotypes, I computed the \(\Delta\)R2 for each phenotype by (1) estimating one linear model of the phenotype on age, sex, zygosity, and the first ten genetic PCs, then (2) estimating the same linear model but also including the externalizing PGS and finally (3) taking the difference in R2.

Below are displayed the results from the second linear model, as well as the calculated \(\Delta\)R2:

##                         DV  Beta     P            CI   N Inc. R2
##  1:           bfi_opn_mean  0.03 0.436  (-0.04,0.09) 900   0.001
##  2:           bfi_con_mean -0.14 0.000 (-0.21,-0.08) 900   0.020
##  3:         bfi_extrv_mean  0.08 0.024   (0.01,0.14) 900   0.006
##  4:           bfi_agr_mean -0.10 0.002 (-0.17,-0.04) 900   0.010
##  5:           bfi_neu_mean  0.06 0.085  (-0.01,0.12) 900   0.003
##  6:       bfi_agr_mean_rev  0.10 0.002   (0.04,0.17) 900   0.010
##  7:       bfi_con_mean_rev  0.14 0.000   (0.08,0.21) 900   0.020
##  8:          cbcl_agg_mean  0.09 0.009   (0.02,0.16) 867   0.008
##  9:      cbcl_withdep_mean  0.06 0.080  (-0.01,0.13) 864   0.004
## 10:       cbcl_anxdep_mean  0.05 0.134  (-0.02,0.12) 863   0.003
## 11:      cbcl_rulebrk_mean  0.05 0.102  (-0.01,0.12) 863   0.003
## 12:  conners_adhdinat_mean  0.11 0.002   (0.04,0.18) 826   0.011
## 13:   conners_adhdimp_mean  0.10 0.004   (0.03,0.17) 823   0.010
## 14:                EXT_PC1  0.14 0.000   (0.08,0.21) 822   0.019
## 15:           impulse_mean  0.05 0.289  (-0.04,0.14) 431   0.002
## 16:          exp_seek_mean  0.04 0.477  (-0.06,0.14) 412   0.001
## 17:     disinhibition_mean  0.04 0.371  (-0.05,0.14) 412   0.002
## 18:          bore_sus_mean  0.13 0.015   (0.03,0.23) 411   0.014
## 19:            thrill_mean -0.09 0.074  (-0.19,0.01) 411   0.007
## 20: conners_learnprob_mean  0.04 0.471  (-0.07,0.15) 391   0.001
## 21:       conners_odd_mean  0.03 0.554  (-0.07,0.14) 391   0.001
## 22:        conners_cd_mean  0.06 0.340  (-0.06,0.17) 391   0.002
## 23:       conners_agg_mean  0.05 0.370  (-0.06,0.16) 391   0.002
## 24:               icu_mean  0.14 0.007   (0.04,0.24) 389   0.016
## 25:           callous_mean  0.16 0.003   (0.06,0.26) 389   0.020
## 26:             uncar_mean  0.15 0.005   (0.05,0.25) 389   0.018
## 27:             unemo_mean  0.00 0.974  (-0.11,0.11) 389   0.000
## 28:     prosocaggress_mean  0.05 0.362  (-0.06,0.15) 386   0.002
## 29:            fright_mean -0.12 0.034 (-0.22,-0.01) 380   0.010
## 30:         hurt_sick_mean -0.08 0.136  (-0.19,0.03) 380   0.005
## 31:           harmful_mean -0.02 0.755  (-0.13,0.09) 380   0.000
## 32:             risks_mean -0.07 0.210  (-0.19,0.04) 380   0.004
## 33:               del_mean  0.09 0.108  (-0.02,0.19) 379   0.006
## 34:         pos_risk_scale  0.02 0.813  (-0.13,0.17) 253   0.000
## 35:           rut_alc_mean  0.04 0.708  (-0.17,0.26) 107   0.001
## 36:           drk_mot_mean -0.01 0.950   (-0.2,0.19) 106   0.000
##                         DV  Beta     P            CI   N Inc. R2

Note that the above table includes the entire sample (DZ and MZ). The following table includes only the DZ members of the sample (Note: these models did not include zygosity as a covariate):

##                         DV  Beta     P           CI   N Inc. R2
##  1:           bfi_opn_mean  0.05 0.253 (-0.03,0.12) 545   0.002
##  2:           bfi_con_mean -0.08 0.044    (-0.16,0) 545   0.007
##  3:         bfi_extrv_mean  0.09 0.034  (0.01,0.17) 545   0.008
##  4:           bfi_agr_mean -0.07 0.067    (-0.15,0) 545   0.006
##  5:           bfi_neu_mean  0.01 0.898 (-0.07,0.08) 545   0.000
##  6:       bfi_agr_mean_rev  0.07 0.067     (0,0.15) 545   0.006
##  7:       bfi_con_mean_rev  0.08 0.044     (0,0.16) 545   0.007
##  8:          cbcl_agg_mean  0.07 0.100 (-0.01,0.16) 525   0.005
##  9:       cbcl_anxdep_mean  0.03 0.461 (-0.05,0.11) 524   0.001
## 10:      cbcl_withdep_mean  0.07 0.105 (-0.01,0.15) 523   0.005
## 11:      cbcl_rulebrk_mean  0.06 0.118 (-0.02,0.15) 523   0.004
## 12:  conners_adhdinat_mean  0.02 0.563 (-0.06,0.11) 505   0.001
## 13:   conners_adhdimp_mean  0.08 0.064     (0,0.17) 503   0.006
## 14:                EXT_PC1  0.09 0.035  (0.01,0.18) 502   0.008
## 15:           impulse_mean  0.13 0.030  (0.01,0.24) 273   0.017
## 16:          exp_seek_mean  0.00 0.949 (-0.14,0.13) 235   0.000
## 17:     disinhibition_mean  0.00 0.967 (-0.13,0.12) 235   0.000
## 18:          bore_sus_mean  0.10 0.150 (-0.03,0.23) 234   0.009
## 19:            thrill_mean -0.01 0.907 (-0.13,0.12) 234   0.000
## 20: conners_learnprob_mean  0.04 0.540  (-0.1,0.19) 226   0.002
## 21:       conners_odd_mean  0.09 0.208 (-0.05,0.22) 226   0.007
## 22:        conners_cd_mean  0.09 0.284 (-0.07,0.25) 226   0.005
## 23:       conners_agg_mean  0.11 0.159 (-0.04,0.26) 226   0.008
## 24:               icu_mean  0.11 0.111 (-0.02,0.24) 225   0.009
## 25:           callous_mean  0.13 0.060     (0,0.26) 225   0.013
## 26:             uncar_mean  0.06 0.346  (-0.07,0.2) 225   0.003
## 27:             unemo_mean  0.04 0.604 (-0.11,0.19) 225   0.001
## 28:     prosocaggress_mean  0.04 0.530 (-0.09,0.17) 222   0.002
## 29:            fright_mean -0.05 0.458 (-0.19,0.08) 220   0.002
## 30:         hurt_sick_mean -0.11 0.122 (-0.24,0.03) 220   0.010
## 31:           harmful_mean  0.03 0.706 (-0.11,0.17) 220   0.001
## 32:             risks_mean  0.00 0.961 (-0.14,0.14) 220   0.000
## 33:               del_mean  0.11 0.101 (-0.02,0.24) 219   0.010
## 34:         pos_risk_scale  0.20 0.089 (-0.03,0.42) 132   0.021
## 35:           rut_alc_mean -0.03 0.809 (-0.24,0.18)  75   0.001
## 36:           drk_mot_mean  0.01 0.908 (-0.21,0.24)  74   0.000
##                         DV  Beta     P           CI   N Inc. R2

Both table are sorted according to sample size. In both cases, there is a sharp contrast in sample size about half way through. The BFI, CBCL, and half of the Conners inventory (specifically the ADHD-related scales) are much better powered compared to the other inventories. Given this, I selected two elements from each domain to include in a principal components analysis (PCA): conscientiousness and agreeableness (BFI), aggression and rule-breaking (CBCL), and impulsive and inattentive ADHD symptoms (Conners). (NOTE: the first PC of the PCA of these items is present in the above tables, but for completeness I will describe the process of attaining it).

Step 2:

Create principal component (PC) of externalizing behavior

I begin by examining the correlations between the selected items prior to PCA (Note: BFI items are reverse coded):

After PCA, the factor loadings of each item were not especially convincing (ranging from .36-.46), although the first PC did explain around 53% of the variance in these items:

## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6
## Standard deviation     1.7837 0.9424 0.8512 0.7881 0.55699 0.52393
## Proportion of Variance 0.5303 0.1480 0.1207 0.1035 0.05171 0.04575
## Cumulative Proportion  0.5303 0.6783 0.7990 0.9025 0.95425 1.00000

Factor loadings:

##                                  PC1        PC2          PC3         PC4
## bfi_con_mean_rev_std      -0.3750466 -0.6736202  0.133559992 -0.23137613
## bfi_agr_mean_rev_std      -0.3671269  0.1099506  0.801236982  0.32639778
## cbcl_agg_mean_std         -0.4657734  0.3857812 -0.001483291  0.09915666
## cbcl_rulebrk_mean_std     -0.3817820  0.3916450 -0.039183803 -0.77270079
## conners_adhdinat_mean_std -0.4367067 -0.4441475 -0.258595784  0.03821611
## conners_adhdimp_mean_std  -0.4136940  0.1861901 -0.521317251  0.48121778
##                                   PC5           PC6
## bfi_con_mean_rev_std       0.56899068 -0.1022362103
## bfi_agr_mean_rev_std      -0.11269480  0.3031705216
## cbcl_agg_mean_std          0.04866791 -0.7886857402
## cbcl_rulebrk_mean_std     -0.03590842  0.3177504955
## conners_adhdinat_mean_std -0.73735189  0.0004440832
## conners_adhdimp_mean_std   0.34088663  0.4179050380

After reverse coding, the new externalizing PC correlated well with all of the constituent items:

Having constructed an externalizing PC, consider again the results from the above tables of coefficients. The externalizing PC is in most ways comparable to conscientiousness (full sample: both were \(\beta\)=0.14 and an \(\Delta\)R2=.02-.019; DZ sample: both were \(\beta\)=.08-.09 and an \(\Delta\)R2=.07-.08). The conscientiousness scale actually has more cases, however. Despite this, I proceed using the externalizing PC as it will no doubt be the method we require other analysts to use for future analyses.

Step 3:

Estimate direct and indirect genetic effects of externalizing PGS

Finally, we proceed to the main analysis. I began setting up the mixed linear model for estimating the different family-level effects. This was accomplished in few steps.

First, I generated the analytic sample of DZ twins and z-standardized the externalizing PC and PGS within that sample:

Descriptive statistics of the DZ twins in the Texas Twins Project
Overall
(N=423)
Externalizing PC
Mean (SD) -0.00000946 (1.00)
Median [Min, Max] -0.133 [-2.14, 4.34]
Externalizing PGS
Mean (SD) 0.0000426 (1.00)
Median [Min, Max] 0.0560 [-3.22, 3.54]
Age (yrs)
Mean (SD) 12.9 (2.74)
Median [Min, Max] 13.0 [8.00, 19.0]
factor(sex)
Female 195 (46.1%)
Male 228 (53.9%)
factor(pair_sex)
Diff. Sex 211 (49.9%)
Same Sex 212 (50.1%)


Next, I computed within-family average scores for the externalizing PGS:

## # A tibble: 20 x 2
##    fam_id EXT_pgs_btwnFam
##     <int>           <dbl>
##  1   1027         -1.58  
##  2   1053          0.854 
##  3   1060          0.220 
##  4   1082         -0.211 
##  5   1232         -0.0405
##  6   1241         -0.850 
##  7   1252          0.0548
##  8   1262          0.772 
##  9   1265         -0.0085
## 10   1278          0.606 
## 11   1281         -0.124 
## 12   1282         -1.01  
## 13   1283         -0.193 
## 14   1302          0.174 
## 15   1314          0.115 
## 16   1350         -0.976 
## 17   1404          0.136 
## 18   1418          0.389 
## 19   1445          0.310 
## 20   1592         -0.232


And after merging these scores with the sample, we can see that each twin pair has an identical family-specific mean PGS score:

## # A tibble: 20 x 5
##    twin_id fam_id eventname EXT_60_std[,1] EXT_pgs_btwnFam
##      <int>  <int>     <int>          <dbl>           <dbl>
##  1   10821   1082         1         -0.154         -0.211 
##  2   10822   1082         1         -0.267         -0.211 
##  3   12321   1232         1         -1.03          -0.0405
##  4   12321   1232         2         -1.03          -0.0405
##  5   12321   1232         3         -1.03          -0.0405
##  6   12322   1232         1          0.953         -0.0405
##  7   12322   1232         2          0.953         -0.0405
##  8   12322   1232         3          0.953         -0.0405
##  9   12521   1252         1         -0.662          0.0548
## 10   12621   1262         2          0.915          0.772 
## 11   12621   1262         3          0.915          0.772 
## 12   12622   1262         1          0.676          0.772 
## 13   12622   1262         2          0.676          0.772 
## 14   12622   1262         3          0.676          0.772 
## 15   13021   1302         1         -0.137          0.174 
## 16   13021   1302         2         -0.137          0.174 
## 17   13021   1302         3         -0.137          0.174 
## 18   13022   1302         1          0.485          0.174 
## 19   13022   1302         2          0.485          0.174 
## 20   13022   1302         3          0.485          0.174

I now compute within-family externalizing PGSs by subtracting the family-specific mean PGS from individual twin PGS values.

## # A tibble: 20 x 6
##    twin_id fam_id eventname EXT_60_std[,1] EXT_pgs_btwnFam EXT_pgs_wthnFam[,1]
##      <int>  <int>     <int>          <dbl>           <dbl>               <dbl>
##  1   10821   1082         1         -0.154         -0.211               0.0565
##  2   10822   1082         1         -0.267         -0.211              -0.0565
##  3   12321   1232         1         -1.03          -0.0405             -0.994 
##  4   12321   1232         2         -1.03          -0.0405             -0.994 
##  5   12321   1232         3         -1.03          -0.0405             -0.994 
##  6   12322   1232         1          0.953         -0.0405              0.994 
##  7   12322   1232         2          0.953         -0.0405              0.994 
##  8   12322   1232         3          0.953         -0.0405              0.994 
##  9   12521   1252         1         -0.662          0.0548             -0.717 
## 10   12621   1262         2          0.915          0.772               0.143 
## 11   12621   1262         3          0.915          0.772               0.143 
## 12   12622   1262         1          0.676          0.772              -0.0956
## 13   12622   1262         2          0.676          0.772              -0.0956
## 14   12622   1262         3          0.676          0.772              -0.0956
## 15   13021   1302         1         -0.137          0.174              -0.311 
## 16   13021   1302         2         -0.137          0.174              -0.311 
## 17   13021   1302         3         -0.137          0.174              -0.311 
## 18   13022   1302         1          0.485          0.174               0.311 
## 19   13022   1302         2          0.485          0.174               0.311 
## 20   13022   1302         3          0.485          0.174               0.311
Finally, I estimate a series of linear models:

I included this final model in order to accommodate the multiple observations for some twins. This is essentially a three-level MLM with observations (level 1) nested within twins (level 2) nested within families (level 3).

#OLS model including PGS and covariates
m1.0 <- lm(EXT_PC1_std ~ EXT_60_std + age + sex + age*sex +
             PC_1 + PC_2 + PC_3 + PC_4 + PC_5 + PC_6 + PC_7 + PC_8 + PC_9 + PC_10, 
           na.action=na.omit, 
           data=dz_sample)

#mixed linear model including within and between PGSs (random effect for family only)
m1.1 <- lme(EXT_PC1_std ~ EXT_pgs_wthnFam + EXT_pgs_btwnFam, 
            random=list(~1|fam_id), 
            method="ML",
            na.action=na.omit, 
            data=dz_sample)

#add covariates (random effect for family only)
m1.2 <- lme(EXT_PC1_std ~ EXT_pgs_wthnFam + EXT_pgs_btwnFam + age + sex + age*sex +
              PC_1 + PC_2 + PC_3 + PC_4 + PC_5 + PC_6 + PC_7 + PC_8 + PC_9 + PC_10, 
            random=~1|fam_id, 
            method="ML", 
            na.action=na.omit, 
            data=dz_sample)

#add random effect for twin_id
m1.3 <- lme(EXT_PC1_std ~ EXT_pgs_wthnFam + EXT_pgs_btwnFam + age + sex + age*sex + 
              PC_1 + PC_2 + PC_3 + PC_4 + PC_5 + PC_6 + PC_7 + PC_8 + PC_9 + PC_10, 
            random=~1|fam_id/twin_id, 
            method="ML", 
            na.action=na.omit, 
            data=dz_sample)

#combine models in table
mlm_ext_pc <- stargazer(m1.0, m1.1, m1.2, m1.3,
                        type = "html", 
                        keep = c("EXT_60_std", "EXT_pgs_btwnFam", "EXT_pgs_wthnFam"))
Dependent variable:
EXT_PC1_std
OLS linear
mixed effects
(1) (2) (3) (4)
EXT_60_std 0.067
(0.048)
EXT_pgs_wthnFam 0.118 0.149* 0.124
(0.079) (0.078) (0.098)
EXT_pgs_btwnFam 0.055 0.070 0.065
(0.077) (0.076) (0.074)
Observations 423 423 423 423
R2 0.095
Adjusted R2 0.063
Log Likelihood -568.472 -551.423 -536.896
Akaike Inf. Crit. 1,146.945 1,138.845 1,111.793
Bayesian Inf. Crit. 1,167.181 1,211.698 1,188.693
Residual Std. Error 0.968 (df = 408)
F Statistic 3.043*** (df = 14; 408)
Note: p<0.1; p<0.05; p<0.01

This is where I am really getting tripped up. The effects are as follows:

The indirect effect is calculated by subtracting the between (total) effect from the within (direct) effect; however, the results I am getting are a little non-intuitive, as the direct effect is much larger than the total effect. We should be seeing the reverse, with the total being much larger than the direct effect, and the indirect making up the difference.

Results are similar when the externalizing PC scores are averaged across events for each twin (this removes a level (events) from the analysis):

Dependent variable:
EXT_PC1_std_avg
OLS linear
mixed effects
(1) (2) (3)
EXT_60_std 0.067
(0.043)
EXT_pgs_wthnFam 0.118* 0.152**
(0.063) (0.061)
EXT_pgs_btwnFam 0.057 0.080
(0.077) (0.076)
Observations 423 423 423
R2 0.116
Adjusted R2 0.085
Log Likelihood -497.038 -472.127
Akaike Inf. Crit. 1,004.076 980.254
Bayesian Inf. Crit. 1,024.313 1,053.107
Residual Std. Error 0.869 (df = 408)
F Statistic 3.816*** (df = 14; 408)
Note: p<0.1; p<0.05; p<0.01


Looking at the supplemental files from Demange et al.(https://www.biorxiv.org/content/10.1101/2020.09.15.296236v2), I noticed that this pattern of results did occur in at least one of their analyses, but the size of the gap between total and direct was much smaller than we are seeing here.
I have been going back through all of my code to make sure everything is coded correctly and I spotted a few errors, but fixing them really have no impact on these results. I would like to know if you have any thoughts about what we’re seeing here and maybe any ideas about how to proceed.

Thanks!