Step 2 - Create principal component (PC) of externalizing behavior
Step 3 - Estimate direct and indirect genetic effects of externalizing PGS
My first task was to merge together all of the possible externalizing phenotypes in the Texas Twins. After collecting all of the phenotypes, I computed the \(\Delta\)R2 for each phenotype by (1) estimating one linear model of the phenotype on age, sex, zygosity, and the first ten genetic PCs, then (2) estimating the same linear model but also including the externalizing PGS and finally (3) taking the difference in R2.
Below are displayed the results from the second linear model, as well as the calculated \(\Delta\)R2:
## DV Beta P CI N Inc. R2
## 1: bfi_opn_mean 0.03 0.436 (-0.04,0.09) 900 0.001
## 2: bfi_con_mean -0.14 0.000 (-0.21,-0.08) 900 0.020
## 3: bfi_extrv_mean 0.08 0.024 (0.01,0.14) 900 0.006
## 4: bfi_agr_mean -0.10 0.002 (-0.17,-0.04) 900 0.010
## 5: bfi_neu_mean 0.06 0.085 (-0.01,0.12) 900 0.003
## 6: bfi_agr_mean_rev 0.10 0.002 (0.04,0.17) 900 0.010
## 7: bfi_con_mean_rev 0.14 0.000 (0.08,0.21) 900 0.020
## 8: cbcl_agg_mean 0.09 0.009 (0.02,0.16) 867 0.008
## 9: cbcl_withdep_mean 0.06 0.080 (-0.01,0.13) 864 0.004
## 10: cbcl_anxdep_mean 0.05 0.134 (-0.02,0.12) 863 0.003
## 11: cbcl_rulebrk_mean 0.05 0.102 (-0.01,0.12) 863 0.003
## 12: conners_adhdinat_mean 0.11 0.002 (0.04,0.18) 826 0.011
## 13: conners_adhdimp_mean 0.10 0.004 (0.03,0.17) 823 0.010
## 14: EXT_PC1 0.14 0.000 (0.08,0.21) 822 0.019
## 15: impulse_mean 0.05 0.289 (-0.04,0.14) 431 0.002
## 16: exp_seek_mean 0.04 0.477 (-0.06,0.14) 412 0.001
## 17: disinhibition_mean 0.04 0.371 (-0.05,0.14) 412 0.002
## 18: bore_sus_mean 0.13 0.015 (0.03,0.23) 411 0.014
## 19: thrill_mean -0.09 0.074 (-0.19,0.01) 411 0.007
## 20: conners_learnprob_mean 0.04 0.471 (-0.07,0.15) 391 0.001
## 21: conners_odd_mean 0.03 0.554 (-0.07,0.14) 391 0.001
## 22: conners_cd_mean 0.06 0.340 (-0.06,0.17) 391 0.002
## 23: conners_agg_mean 0.05 0.370 (-0.06,0.16) 391 0.002
## 24: icu_mean 0.14 0.007 (0.04,0.24) 389 0.016
## 25: callous_mean 0.16 0.003 (0.06,0.26) 389 0.020
## 26: uncar_mean 0.15 0.005 (0.05,0.25) 389 0.018
## 27: unemo_mean 0.00 0.974 (-0.11,0.11) 389 0.000
## 28: prosocaggress_mean 0.05 0.362 (-0.06,0.15) 386 0.002
## 29: fright_mean -0.12 0.034 (-0.22,-0.01) 380 0.010
## 30: hurt_sick_mean -0.08 0.136 (-0.19,0.03) 380 0.005
## 31: harmful_mean -0.02 0.755 (-0.13,0.09) 380 0.000
## 32: risks_mean -0.07 0.210 (-0.19,0.04) 380 0.004
## 33: del_mean 0.09 0.108 (-0.02,0.19) 379 0.006
## 34: pos_risk_scale 0.02 0.813 (-0.13,0.17) 253 0.000
## 35: rut_alc_mean 0.04 0.708 (-0.17,0.26) 107 0.001
## 36: drk_mot_mean -0.01 0.950 (-0.2,0.19) 106 0.000
## DV Beta P CI N Inc. R2
Note that the above table includes the entire sample (DZ and MZ). The following table includes only the DZ members of the sample (Note: these models did not include zygosity as a covariate):
## DV Beta P CI N Inc. R2
## 1: bfi_opn_mean 0.05 0.253 (-0.03,0.12) 545 0.002
## 2: bfi_con_mean -0.08 0.044 (-0.16,0) 545 0.007
## 3: bfi_extrv_mean 0.09 0.034 (0.01,0.17) 545 0.008
## 4: bfi_agr_mean -0.07 0.067 (-0.15,0) 545 0.006
## 5: bfi_neu_mean 0.01 0.898 (-0.07,0.08) 545 0.000
## 6: bfi_agr_mean_rev 0.07 0.067 (0,0.15) 545 0.006
## 7: bfi_con_mean_rev 0.08 0.044 (0,0.16) 545 0.007
## 8: cbcl_agg_mean 0.07 0.100 (-0.01,0.16) 525 0.005
## 9: cbcl_anxdep_mean 0.03 0.461 (-0.05,0.11) 524 0.001
## 10: cbcl_withdep_mean 0.07 0.105 (-0.01,0.15) 523 0.005
## 11: cbcl_rulebrk_mean 0.06 0.118 (-0.02,0.15) 523 0.004
## 12: conners_adhdinat_mean 0.02 0.563 (-0.06,0.11) 505 0.001
## 13: conners_adhdimp_mean 0.08 0.064 (0,0.17) 503 0.006
## 14: EXT_PC1 0.09 0.035 (0.01,0.18) 502 0.008
## 15: impulse_mean 0.13 0.030 (0.01,0.24) 273 0.017
## 16: exp_seek_mean 0.00 0.949 (-0.14,0.13) 235 0.000
## 17: disinhibition_mean 0.00 0.967 (-0.13,0.12) 235 0.000
## 18: bore_sus_mean 0.10 0.150 (-0.03,0.23) 234 0.009
## 19: thrill_mean -0.01 0.907 (-0.13,0.12) 234 0.000
## 20: conners_learnprob_mean 0.04 0.540 (-0.1,0.19) 226 0.002
## 21: conners_odd_mean 0.09 0.208 (-0.05,0.22) 226 0.007
## 22: conners_cd_mean 0.09 0.284 (-0.07,0.25) 226 0.005
## 23: conners_agg_mean 0.11 0.159 (-0.04,0.26) 226 0.008
## 24: icu_mean 0.11 0.111 (-0.02,0.24) 225 0.009
## 25: callous_mean 0.13 0.060 (0,0.26) 225 0.013
## 26: uncar_mean 0.06 0.346 (-0.07,0.2) 225 0.003
## 27: unemo_mean 0.04 0.604 (-0.11,0.19) 225 0.001
## 28: prosocaggress_mean 0.04 0.530 (-0.09,0.17) 222 0.002
## 29: fright_mean -0.05 0.458 (-0.19,0.08) 220 0.002
## 30: hurt_sick_mean -0.11 0.122 (-0.24,0.03) 220 0.010
## 31: harmful_mean 0.03 0.706 (-0.11,0.17) 220 0.001
## 32: risks_mean 0.00 0.961 (-0.14,0.14) 220 0.000
## 33: del_mean 0.11 0.101 (-0.02,0.24) 219 0.010
## 34: pos_risk_scale 0.20 0.089 (-0.03,0.42) 132 0.021
## 35: rut_alc_mean -0.03 0.809 (-0.24,0.18) 75 0.001
## 36: drk_mot_mean 0.01 0.908 (-0.21,0.24) 74 0.000
## DV Beta P CI N Inc. R2
Both table are sorted according to sample size. In both cases, there is a sharp contrast in sample size about half way through. The BFI, CBCL, and half of the Conners inventory (specifically the ADHD-related scales) are much better powered compared to the other inventories. Given this, I selected two elements from each domain to include in a principal components analysis (PCA): conscientiousness and agreeableness (BFI), aggression and rule-breaking (CBCL), and impulsive and inattentive ADHD symptoms (Conners). (NOTE: the first PC of the PCA of these items is present in the above tables, but for completeness I will describe the process of attaining it).
I begin by examining the correlations between the selected items prior to PCA (Note: BFI items are reverse coded):
After PCA, the factor loadings of each item were not especially convincing (ranging from .36-.46), although the first PC did explain around 53% of the variance in these items:
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 1.7837 0.9424 0.8512 0.7881 0.55699 0.52393
## Proportion of Variance 0.5303 0.1480 0.1207 0.1035 0.05171 0.04575
## Cumulative Proportion 0.5303 0.6783 0.7990 0.9025 0.95425 1.00000
Factor loadings:
## PC1 PC2 PC3 PC4
## bfi_con_mean_rev_std -0.3750466 -0.6736202 0.133559992 -0.23137613
## bfi_agr_mean_rev_std -0.3671269 0.1099506 0.801236982 0.32639778
## cbcl_agg_mean_std -0.4657734 0.3857812 -0.001483291 0.09915666
## cbcl_rulebrk_mean_std -0.3817820 0.3916450 -0.039183803 -0.77270079
## conners_adhdinat_mean_std -0.4367067 -0.4441475 -0.258595784 0.03821611
## conners_adhdimp_mean_std -0.4136940 0.1861901 -0.521317251 0.48121778
## PC5 PC6
## bfi_con_mean_rev_std 0.56899068 -0.1022362103
## bfi_agr_mean_rev_std -0.11269480 0.3031705216
## cbcl_agg_mean_std 0.04866791 -0.7886857402
## cbcl_rulebrk_mean_std -0.03590842 0.3177504955
## conners_adhdinat_mean_std -0.73735189 0.0004440832
## conners_adhdimp_mean_std 0.34088663 0.4179050380
After reverse coding, the new externalizing PC correlated well with all of the constituent items:
Having constructed an externalizing PC, consider again the results from the above tables of coefficients. The externalizing PC is in most ways comparable to conscientiousness (full sample: both were \(\beta\)=0.14 and an \(\Delta\)R2=.02-.019; DZ sample: both were \(\beta\)=.08-.09 and an \(\Delta\)R2=.07-.08). The conscientiousness scale actually has more cases, however. Despite this, I proceed using the externalizing PC as it will no doubt be the method we require other analysts to use for future analyses.
Finally, we proceed to the main analysis. I began setting up the mixed linear model for estimating the different family-level effects. This was accomplished in few steps.
First, I generated the analytic sample of DZ twins and z-standardized the externalizing PC and PGS within that sample:
| Overall (N=423) |
|
|---|---|
| Externalizing PC | |
| Mean (SD) | -0.00000946 (1.00) |
| Median [Min, Max] | -0.133 [-2.14, 4.34] |
| Externalizing PGS | |
| Mean (SD) | 0.0000426 (1.00) |
| Median [Min, Max] | 0.0560 [-3.22, 3.54] |
| Age (yrs) | |
| Mean (SD) | 12.9 (2.74) |
| Median [Min, Max] | 13.0 [8.00, 19.0] |
| factor(sex) | |
| Female | 195 (46.1%) |
| Male | 228 (53.9%) |
| factor(pair_sex) | |
| Diff. Sex | 211 (49.9%) |
| Same Sex | 212 (50.1%) |
Next, I computed within-family average scores for the externalizing PGS:
## # A tibble: 20 x 2
## fam_id EXT_pgs_btwnFam
## <int> <dbl>
## 1 1027 -1.58
## 2 1053 0.854
## 3 1060 0.220
## 4 1082 -0.211
## 5 1232 -0.0405
## 6 1241 -0.850
## 7 1252 0.0548
## 8 1262 0.772
## 9 1265 -0.0085
## 10 1278 0.606
## 11 1281 -0.124
## 12 1282 -1.01
## 13 1283 -0.193
## 14 1302 0.174
## 15 1314 0.115
## 16 1350 -0.976
## 17 1404 0.136
## 18 1418 0.389
## 19 1445 0.310
## 20 1592 -0.232
And after merging these scores with the sample, we can see that each twin pair has an identical family-specific mean PGS score:
## # A tibble: 20 x 5
## twin_id fam_id eventname EXT_60_std[,1] EXT_pgs_btwnFam
## <int> <int> <int> <dbl> <dbl>
## 1 10821 1082 1 -0.154 -0.211
## 2 10822 1082 1 -0.267 -0.211
## 3 12321 1232 1 -1.03 -0.0405
## 4 12321 1232 2 -1.03 -0.0405
## 5 12321 1232 3 -1.03 -0.0405
## 6 12322 1232 1 0.953 -0.0405
## 7 12322 1232 2 0.953 -0.0405
## 8 12322 1232 3 0.953 -0.0405
## 9 12521 1252 1 -0.662 0.0548
## 10 12621 1262 2 0.915 0.772
## 11 12621 1262 3 0.915 0.772
## 12 12622 1262 1 0.676 0.772
## 13 12622 1262 2 0.676 0.772
## 14 12622 1262 3 0.676 0.772
## 15 13021 1302 1 -0.137 0.174
## 16 13021 1302 2 -0.137 0.174
## 17 13021 1302 3 -0.137 0.174
## 18 13022 1302 1 0.485 0.174
## 19 13022 1302 2 0.485 0.174
## 20 13022 1302 3 0.485 0.174
I now compute within-family externalizing PGSs by subtracting the family-specific mean PGS from individual twin PGS values.
## # A tibble: 20 x 6
## twin_id fam_id eventname EXT_60_std[,1] EXT_pgs_btwnFam EXT_pgs_wthnFam[,1]
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 10821 1082 1 -0.154 -0.211 0.0565
## 2 10822 1082 1 -0.267 -0.211 -0.0565
## 3 12321 1232 1 -1.03 -0.0405 -0.994
## 4 12321 1232 2 -1.03 -0.0405 -0.994
## 5 12321 1232 3 -1.03 -0.0405 -0.994
## 6 12322 1232 1 0.953 -0.0405 0.994
## 7 12322 1232 2 0.953 -0.0405 0.994
## 8 12322 1232 3 0.953 -0.0405 0.994
## 9 12521 1252 1 -0.662 0.0548 -0.717
## 10 12621 1262 2 0.915 0.772 0.143
## 11 12621 1262 3 0.915 0.772 0.143
## 12 12622 1262 1 0.676 0.772 -0.0956
## 13 12622 1262 2 0.676 0.772 -0.0956
## 14 12622 1262 3 0.676 0.772 -0.0956
## 15 13021 1302 1 -0.137 0.174 -0.311
## 16 13021 1302 2 -0.137 0.174 -0.311
## 17 13021 1302 3 -0.137 0.174 -0.311
## 18 13022 1302 1 0.485 0.174 0.311
## 19 13022 1302 2 0.485 0.174 0.311
## 20 13022 1302 3 0.485 0.174 0.311
Finally, I estimate a series of linear models:
I included this final model in order to accommodate the multiple observations for some twins. This is essentially a three-level MLM with observations (level 1) nested within twins (level 2) nested within families (level 3).
#OLS model including PGS and covariates
m1.0 <- lm(EXT_PC1_std ~ EXT_60_std + age + sex + age*sex +
PC_1 + PC_2 + PC_3 + PC_4 + PC_5 + PC_6 + PC_7 + PC_8 + PC_9 + PC_10,
na.action=na.omit,
data=dz_sample)
#mixed linear model including within and between PGSs (random effect for family only)
m1.1 <- lme(EXT_PC1_std ~ EXT_pgs_wthnFam + EXT_pgs_btwnFam,
random=list(~1|fam_id),
method="ML",
na.action=na.omit,
data=dz_sample)
#add covariates (random effect for family only)
m1.2 <- lme(EXT_PC1_std ~ EXT_pgs_wthnFam + EXT_pgs_btwnFam + age + sex + age*sex +
PC_1 + PC_2 + PC_3 + PC_4 + PC_5 + PC_6 + PC_7 + PC_8 + PC_9 + PC_10,
random=~1|fam_id,
method="ML",
na.action=na.omit,
data=dz_sample)
#add random effect for twin_id
m1.3 <- lme(EXT_PC1_std ~ EXT_pgs_wthnFam + EXT_pgs_btwnFam + age + sex + age*sex +
PC_1 + PC_2 + PC_3 + PC_4 + PC_5 + PC_6 + PC_7 + PC_8 + PC_9 + PC_10,
random=~1|fam_id/twin_id,
method="ML",
na.action=na.omit,
data=dz_sample)
#combine models in table
mlm_ext_pc <- stargazer(m1.0, m1.1, m1.2, m1.3,
type = "html",
keep = c("EXT_60_std", "EXT_pgs_btwnFam", "EXT_pgs_wthnFam"))
| Dependent variable: | ||||
| EXT_PC1_std | ||||
| OLS | linear | |||
| mixed effects | ||||
| (1) | (2) | (3) | (4) | |
| EXT_60_std | 0.067 | |||
| (0.048) | ||||
| EXT_pgs_wthnFam | 0.118 | 0.149* | 0.124 | |
| (0.079) | (0.078) | (0.098) | ||
| EXT_pgs_btwnFam | 0.055 | 0.070 | 0.065 | |
| (0.077) | (0.076) | (0.074) | ||
| Observations | 423 | 423 | 423 | 423 |
| R2 | 0.095 | |||
| Adjusted R2 | 0.063 | |||
| Log Likelihood | -568.472 | -551.423 | -536.896 | |
| Akaike Inf. Crit. | 1,146.945 | 1,138.845 | 1,111.793 | |
| Bayesian Inf. Crit. | 1,167.181 | 1,211.698 | 1,188.693 | |
| Residual Std. Error | 0.968 (df = 408) | |||
| F Statistic | 3.043*** (df = 14; 408) | |||
| Note: | p<0.1; p<0.05; p<0.01 | |||
The indirect effect is calculated by subtracting the between (total) effect from the within (direct) effect; however, the results I am getting are a little non-intuitive, as the direct effect is much larger than the total effect. We should be seeing the reverse, with the total being much larger than the direct effect, and the indirect making up the difference.
Results are similar when the externalizing PC scores are averaged across events for each twin (this removes a level (events) from the analysis):
| Dependent variable: | |||
| EXT_PC1_std_avg | |||
| OLS | linear | ||
| mixed effects | |||
| (1) | (2) | (3) | |
| EXT_60_std | 0.067 | ||
| (0.043) | |||
| EXT_pgs_wthnFam | 0.118* | 0.152** | |
| (0.063) | (0.061) | ||
| EXT_pgs_btwnFam | 0.057 | 0.080 | |
| (0.077) | (0.076) | ||
| Observations | 423 | 423 | 423 |
| R2 | 0.116 | ||
| Adjusted R2 | 0.085 | ||
| Log Likelihood | -497.038 | -472.127 | |
| Akaike Inf. Crit. | 1,004.076 | 980.254 | |
| Bayesian Inf. Crit. | 1,024.313 | 1,053.107 | |
| Residual Std. Error | 0.869 (df = 408) | ||
| F Statistic | 3.816*** (df = 14; 408) | ||
| Note: | p<0.1; p<0.05; p<0.01 | ||
Looking at the supplemental files from Demange et al.(https://www.biorxiv.org/content/10.1101/2020.09.15.296236v2), I noticed that this pattern of results did occur in at least one of their analyses, but the size of the gap between total and direct was much smaller than we are seeing here.
I have been going back through all of my code to make sure everything is coded correctly and I spotted a few errors, but fixing them really have no impact on these results. I would like to know if you have any thoughts about what we’re seeing here and maybe any ideas about how to proceed.
Thanks!