\[ \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\Expec{\text{E}} \def\logit{\text{logit}} \def\diag{\text{diag}} \]
Aim: “Research to improve human health”
Analysis of twin data in health science
What is it about and how to compute?
## tvparnr bmi age gender zyg id num
## 1 1 26.3 57.5 male DZ 1 1
## 2 1 25.5 57.5 male DZ 1 2
## 3 2 28.7 56.6 male MZ 2 1
## 5 3 28.4 57.7 male DZ 3 1
## 7 4 27.3 53.7 male DZ 4 1
## 8 4 28.1 53.7 male DZ 4 2
## tvparnr bmi1 age gender zyg id num bmi2
## 1 1 26.3 57.5 male DZ 1 1 25.5
## 3 2 28.7 56.6 male MZ 2 1 NA
## 5 3 28.4 57.7 male DZ 3 1 NA
## 7 4 27.3 53.7 male DZ 4 1 28.1
## 9 5 27.8 52.6 male DZ 5 1 NA
## 11 6 28.0 52.5 male DZ 6 1 22.3
R code in supplement.
To target the within-pair dependencies in twin pairs of MZ and DZ origin.
To infer on sources of variation in continuous and dichotomous traits
We review the classic twin design of study to introduce measures of within pair dependence and see how they constitute the basis for biometric modelling.
We need a model for interpretation and comparison of within pair dependence.
##
## Pearson's product-moment correlation
##
## data: mz[, 1] and mz[, 2]
## t = 37, df = 1481, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.665 0.718
## sample estimates:
## cor
## 0.692
##
## Pearson's product-moment correlation
##
## data: dz[, 1] and dz[, 2]
## t = 22, df = 2786, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.356 0.419
## sample estimates:
## cor
## 0.388
\[ \left\{\begin{array}{l} Y = \textrm{Genes} + \textrm{Environment} \\ \Sigma_Y = \Sigma_{\textrm{Genes}} + \Sigma_{\textrm{Environment}} \\ \end{array}\right. \] Biometric model for the contribution of genetic and environmental factors to the variation in outcome.
What is this?
The pair \((Y_{1},Y_{2})\) is bivariate normal distributed with mean \((\mu_1,\mu_2)\) and variance-covariance matrix given by
\[\Sigma_{(Y_1,Y_2)} = \left( \begin{array}{cc} {\sigma_{Y_1}^2} & \rho{\sigma_{Y_1}}{\sigma_{Y_2}} \\ \rho{\sigma_{Y_1}}{\sigma_{Y_2}} & {\sigma_{Y_2}^2} \\ \end{array} \right) \]
The biometric ADCE-model, in which the individual outcome decomposes into \[Y_i=A_i+D_i+C_i+E_i,\] where
Assume \(z=u=1\) for MZ, \(z=\frac{1}{2}\) and \(u=\frac{1}{4}\) for DZ pairs: \[ \Sigma_{(G_1,G_2)} = \underbrace{\left( \begin{array}{cc} {\sigma_A^2} & z{\sigma_A^2} \\ z{\sigma_A^2} & {\sigma_A^2} \\ \end{array} \right)}_{\text{additive genetic covariance}} + \underbrace{\left( \begin{array}{cc} {\sigma_D^2} & u{\sigma_D^2} \\ u{\sigma_D^2} & {\sigma_D^2} \\ \end{array} \right)}_{\text{dominant genetic covariance}} \]
\[ \Sigma_{(E_1,E_2)} = \underbrace{\left( \begin{array}{cc} {\sigma_C^2} & {\sigma_C^2} \\ {\sigma_C^2} & {\sigma_C^2} \\ \end{array} \right)}_{\text{Common environment covariance}} + \underbrace{\left( \begin{array}{cc} {\sigma_E^2} & {0} \\ {0} & {\sigma_E^2} \\ \end{array} \right)}_{\text{Unique environment covariance}} \]
\[ \begin{align*} \Sigma_{(Y_1,Y_2)} &= \left( \begin{array}{cc} {\sigma_A^2} & z{\sigma_A^2} \\ z{\sigma_A^2} & {\sigma_A^2} \\ \end{array} \right) + \left( \begin{array}{cc} {\sigma_D^2} & u{\sigma_D^2} \\ u{\sigma_D^2} & {\sigma_D^2} \\ \end{array} \right) \\ &+ \left( \begin{array}{cc} {\sigma_C^2} & {\sigma_C^2} \\ {\sigma_C^2} & {\sigma_C^2} \\ \end{array} \right) + \left( \begin{array}{cc} {\sigma_E^2} & {0} \\ {0} & {\sigma_E^2} \\ \end{array} \right) \end{align*} \]
\[ h^2_{Y}=\frac{\sigma_{A}^2+\sigma_{D}^2}{\sigma_{A}^2+\sigma_{D}^2+ \sigma_{C}^2+ \sigma_{E}^2} \]
\[ c^2_{Y}=\frac{\sigma_{C}^2}{\sigma_{A}^2+\sigma_{D}^2+ \sigma_{C}^2+ \sigma_{E}^2} \]
The polygenic models may be represented in terms of a path diagram, the ACE below:
New trait: Liability to state “Yes”
\[ Y_i = \left\{ \begin{array}{rl} \textrm{disease yes} & \text{if } Z_i>\tau\\ \textrm{disease no} & \text{if } Z_i\le\tau \end{array} \right. \]
Bivariate standard normality of liabilities
\[ (Z_1,Z_2)\sim\textrm{MvN}\{(0,\left( \begin{array}{cc} 1 & \rho \ \\ \rho & 1 \\ \end{array} \right) )\} \] Tetrachoric correlation of categorical variables is by definition the usual correlation in liabilities to outcomes, \(\rho\). Thresholds and tetrachorics are estimated from the liability-threshold model.
The biometric ADCE model can now be applied to the continuous liability to outome:
\[Z_i=A_i+D_i+C_i+E_i\] providing heritability and common environmental effects:
\[ h^2_{Z}=\sigma_{A}^2+\sigma_{D}^2, \qquad c^2_{Z}=\sigma_{C}^2 \]
We illustrate the classic biometric modelling in application to BMI data in twins and then to the dichotoumous twin data of stuttering.
## tvparnr bmi age gender zyg id num
## 1 1 26.3 57.5 male DZ 1 1
## 2 1 25.5 57.5 male DZ 1 2
## 3 2 28.7 56.6 male MZ 2 1
## 5 3 28.4 57.7 male DZ 3 1
## 7 4 27.3 53.7 male DZ 4 1
## 8 4 28.1 53.7 male DZ 4 2
Regression model of BMI on sex and (non-linear) age.
l1 <- lm(bmi ~ gender*ns(age,3), data=twinbmi)
estimate(l1, id=twinbmi$tvparnr) # Robust standard errors ## Estimate Std.Err 2.5% 97.5% P-value
## (Intercept) 22.2413 0.1642 21.9194 22.5632 0.000e+00
## gendermale 1.4538 0.2449 0.9738 1.9338 2.916e-09
## ns(age, 3)1 2.3792 0.2376 1.9134 2.8450 1.358e-23
## ns(age, 3)2 4.0896 0.4180 3.2704 4.9089 1.312e-22
## ns(age, 3)3 3.2738 0.2310 2.8212 3.7265 1.309e-45
## gendermale:ns(age, 3)1 -0.5309 0.3278 -1.1733 0.1116 1.053e-01
## gendermale:ns(age, 3)2 -0.3136 0.6111 -1.5113 0.8842 6.079e-01
## gendermale:ns(age, 3)3 -1.4572 0.3126 -2.0698 -0.8446 3.128e-06
-providing curves above and suggests sex and age dependence.
| Zygosity | Number of pairs | Correlation (95% CI) |
|---|---|---|
| MZ pairs | 1483 | ? (?,?) |
| DZ pairs | 2788 | ? (?,?) |
-we aim for the intraclass correlation in log of BMI for MZ and DZ pairs
Analysis implemented in R script ‘twinbmi22.R’ in supplements.
We begin by fitting the saturated bivariate normal model.
lnbmi.sat <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="sat")
lnbmi.sat## ____________________________________________________
## Group 1: MZ (n=1483)
## Estimate Std. Error Z value Pr(>|z|)
## Regressions:
## logbmi.1~age.1 0.006 0.001 10.221 <1e-12
## logbmi.1~gendermale.1 0.165 0.042 3.939 8.2e-05
## logbmi.1~age:gendermale.1 -0.002 0.001 -2.678 0.0074
## logbmi.2~age.1 0.006 0.001 11.069 <1e-12
## logbmi.2~gendermale.1 0.186 0.041 4.592 4.4e-06
## logbmi.2~age:gendermale.1 -0.003 0.001 -3.064 0.0022
## Intercepts:
## logbmi.1 2.889 0.026 110.667 <1e-12
## logbmi.2 2.871 0.025 113.421 <1e-12
## Additional Parameters:
## log(var(MZ)).1 -4.009 0.037 -108.912 <1e-12
## log(var(MZ)).2 -4.071 0.037 -110.588 <1e-12
## atanh(rhoMZ) 0.770 0.026 29.493 <1e-12
## ____________________________________________________
## Group 2: DZ (n=2788)
## Estimate Std. Error Z value Pr(>|z|)
## Regressions:
## logbmi.1~age.1 0.006 0.000 13.117 <1e-12
## logbmi.1~gendermale.1 0.155 0.030 5.170 2.3e-07
## logbmi.1~age:gendermale.1 -0.002 0.001 -3.007 0.0026
## logbmi.2~age.1 0.006 0.000 13.270 <1e-12
## logbmi.2~gendermale.1 0.168 0.030 5.609 2e-08
## logbmi.2~age:gendermale.1 -0.003 0.001 -3.865 0.00011
## Intercepts:
## logbmi.1 2.914 0.019 149.928 <1e-12
## logbmi.2 2.914 0.019 150.096 <1e-12
## Additional Parameters:
## log(var(DZ)).1 -4.024 0.027 -149.867 <1e-12
## log(var(DZ)).2 -4.026 0.027 -149.946 <1e-12
## atanh(rhoDZ) 0.314 0.019 16.574 <1e-12
##
## Estimate 2.5% 97.5%
## Correlation within MZ: 0.647 0.616 0.675
## Correlation within DZ: 0.304 0.270 0.337
##
## 'log Lik.' 5629 (df=22)
## AIC: -11214
## BIC: -11074
## Equal regressions, intercepts and residual variances for twin 1 and twin 2.
lnbmi.flex <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi,
type="flex")
lnbmi.flex## ____________________________________________________
## Group 1: MZ (n=1483)
## Estimate Std. Error Z value Pr(>|z|)
## Regressions:
## logbmi.1~age.1 0.006 0.001 11.796 <1e-12
## logbmi.1~gendermale.1 0.175 0.037 4.695 2.7e-06
## logbmi.1~age:gendermale.1 -0.003 0.001 -3.161 0.0016
## Intercepts:
## logbmi.1 2.880 0.023 124.195 <1e-12
## Additional Parameters:
## log(var(MZ)) -4.039 0.031 -130.319 <1e-12
## atanh(rhoMZ) 0.768 0.026 29.532 <1e-12
## ____________________________________________________
## Group 2: DZ (n=2788)
## Estimate Std. Error Z value Pr(>|z|)
## Regressions:
## logbmi.1~age.1 0.006 0.000 16.336 <1e-12
## logbmi.1~gendermale.1 0.162 0.024 6.674 2.5e-11
## logbmi.1~age:gendermale.1 -0.002 0.001 -4.255 2.1e-05
## Intercepts:
## logbmi.1 2.914 0.016 185.745 <1e-12
## Additional Parameters:
## log(var(DZ)) -4.024 0.020 -202.753 <1e-12
## atanh(rhoDZ) 0.313 0.019 16.495 <1e-12
##
## Estimate 2.5% 97.5%
## Correlation within MZ: 0.646 0.615 0.674
## Correlation within DZ: 0.303 0.269 0.337
##
## 'log Lik.' 5623 (df=12)
## AIC: -11223
## BIC: -11146
##
## - Likelihood ratio test -
##
## data:
## chisq = 12, df = 10, p-value = 0.3
## sample estimates:
## log likelihood (model 1) log likelihood (model 2)
## 5629 5623
## Further, equal for MZ and DZ twins as well.
lnbmi.u <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg", data=twinbmi, type="u", control=list(refit=TRUE))
lnbmi.u## ____________________________________________________
## Group 1: MZ (n=1483)
## Estimate Std. Error Z value Pr(>|z|)
## Regressions:
## logbmi.1~age.1 0.006 0.000 20.151 <1e-12
## logbmi.1~gendermale.1 0.166 0.020 8.156 <1e-12
## logbmi.1~age:gendermale.1 -0.002 0.000 -5.278 1.3e-07
## Intercepts:
## logbmi.1 2.903 0.013 222.926 <1e-12
## Additional Parameters:
## log(var) -4.026 0.017 -240.678 <1e-12
## atanh(rhoMZ) 0.775 0.023 33.521 <1e-12
## ____________________________________________________
## Group 2: DZ (n=2788)
## Estimate Std. Error Z value Pr(>|z|)
## Regressions:
## logbmi.1~age.1 0.006 0.000 20.151 <1e-12
## logbmi.1~gendermale.1 0.166 0.020 8.156 <1e-12
## logbmi.1~age:gendermale.1 -0.002 0.000 -5.278 1.3e-07
## Intercepts:
## logbmi.1 2.903 0.013 222.926 <1e-12
## Additional Parameters:
## log(var) -4.026 0.017 -240.678 <1e-12
## atanh(rhoDZ) 0.313 0.019 16.747 <1e-12
##
## Estimate 2.5% 97.5%
## Correlation within MZ: 0.650 0.623 0.675
## Correlation within DZ: 0.303 0.270 0.336
##
## 'log Lik.' 5614 (df=7)
## AIC: -11215
## BIC: -11170
##
## - Likelihood ratio test -
##
## data:
## chisq = 18, df = 5, p-value = 0.003
## sample estimates:
## log likelihood (model 1) log likelihood (model 2)
## 5614 5623
Saturated submodels - model selection
| Submodel | Likelihood | df | \(-2\Delta\textrm{X}^2\) | \(\Delta\textrm{df}\) | p | AIC |
|---|---|---|---|---|---|---|
| Saturated | 5629.137 | 22 | -11214.27 | |||
| “equal 1 and 2” | 5623.369 | 12 | 11.537 | 10 | 0.3172 | -11222.74 |
| “equal MZ and DZ” | 5614.387 | 7 | 17.962 | 5 | 0.002994 | -11214.77 |
# A formal test of genetic effects can be obtained by comparing the MZ and DZ correlation:
estimate(lnbmi.u,contr(6:7,7))## Estimate Std.Err 2.5% 97.5% P-value
## [atanh(rhoMZ)@1] - [a.... 0.4622 0.03848 0.3867 0.5376 3.114e-33
##
## Null Hypothesis:
## [atanh(rhoMZ)@1] - [atanh(rhoDZ)@2] = 0
| Zygosity | Number of pairs | Correlation (95% CI) |
|---|---|---|
| MZ pairs | 1483 | 0.65 (0.62,0.68) |
| DZ pairs | 2788 | 0.30 (0.27,0.34) |
The within pair intraclass correlation is the amount of variance between pairs of the total variance in the trait.
lnbmi.ace <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="ace", control=list(start=coef(lnbmi.sat)))
lnbmi.ace## Estimate Std. Error Z value Pr(>|z|)
## logbmi 2.90e+00 1.31e-02 221.76 < 2e-16
## sd(A) 1.07e-01 1.66e-03 64.60 < 2e-16
## sd(C) 6.10e-09 1.59e-02 0.00 1
## sd(E) 7.97e-02 1.35e-03 58.92 < 2e-16
## logbmi~age 5.83e-03 2.91e-04 20.05 < 2e-16
## logbmi~gendermale 1.66e-01 2.05e-02 8.11 5.0e-16
## logbmi~age:gendermale -2.36e-03 4.49e-04 -5.25 1.5e-07
##
## MZ-pairs DZ-pairs
## 1483 2788
##
## Variance decomposition:
## Estimate 2.5% 97.5%
## A 0.645 0.619 0.671
## C 0.000 0.000 0.000
## E 0.355 0.329 0.381
##
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.645 0.619 0.671
##
## Estimate 2.5% 97.5%
## Correlation within MZ: 0.645 0.618 0.670
## Correlation within DZ: 0.322 0.310 0.335
##
## 'log Lik.' 5614 (df=7)
## AIC: -11213
## BIC: -11169
## df AIC
## lnbmi.u 7 -11215
## lnbmi.ace 7 -11213
-corresponds in fit to the selected correlation model.
lnbmi.ade <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="ade", control=list(refit=TRUE))
lnbmi.ade ## Estimate Std. Error Z value Pr(>|z|)
## logbmi 2.902614 0.013020 222.93 < 2e-16
## sd(A) 0.100269 0.006090 16.46 < 2e-16
## sd(D) 0.039375 0.015531 2.54 0.011
## sd(E) 0.079048 0.001428 55.34 < 2e-16
## logbmi~age 0.005833 0.000289 20.15 < 2e-16
## logbmi~gendermale 0.166060 0.020360 8.16 3.5e-16
## logbmi~age:gendermale -0.002355 0.000446 -5.28 1.3e-07
##
## MZ-pairs DZ-pairs
## 1483 2788
##
## Variance decomposition:
## Estimate 2.5% 97.5%
## A 0.563 0.434 0.693
## D 0.087 -0.048 0.221
## E 0.350 0.324 0.376
##
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.650 0.624 0.676
##
## Estimate 2.5% 97.5%
## Correlation within MZ: 0.650 0.623 0.675
## Correlation within DZ: 0.303 0.271 0.335
##
## 'log Lik.' 5614 (df=7)
## AIC: -11215
## BIC: -11170
## df AIC
## lnbmi.ace 7 -11213
## lnbmi.ade 7 -11215
lnbmi.ae <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="ae",control=list(refit=TRUE))
lnbmi.ae ## Estimate Std. Error Z value Pr(>|z|)
## logbmi 2.902481 0.013088 221.76 < 2e-16
## sd(A) 0.107432 0.001663 64.60 < 2e-16
## sd(E) 0.079726 0.001353 58.92 < 2e-16
## logbmi~age 0.005834 0.000291 20.05 < 2e-16
## logbmi~gendermale 0.166111 0.020481 8.11 5.0e-16
## logbmi~age:gendermale -0.002356 0.000449 -5.25 1.5e-07
##
## MZ-pairs DZ-pairs
## 1483 2788
##
## Variance decomposition:
## Estimate 2.5% 97.5%
## A 0.645 0.619 0.671
## E 0.355 0.329 0.381
##
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.645 0.619 0.671
##
## Estimate 2.5% 97.5%
## Correlation within MZ: 0.645 0.618 0.670
## Correlation within DZ: 0.322 0.310 0.335
##
## 'log Lik.' 5614 (df=6)
## AIC: -11215
## BIC: -11177
##
## - Likelihood ratio test -
##
## data:
## chisq = 2, df = 1, p-value = 0.2
## sample estimates:
## log likelihood (model 1) log likelihood (model 2)
## 5614 5614
lnbmi.ce <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="ce",control=list(method="NR"))
lnbmi.ce ## Estimate Std. Error Z value Pr(>|z|)
## logbmi 2.900785 0.013069 221.96 < 2e-16
## sd(C) 0.086848 0.001712 50.72 < 2e-16
## sd(E) 0.101487 0.001099 92.33 < 2e-16
## logbmi~age 0.005856 0.000291 20.15 < 2e-16
## logbmi~gendermale 0.166762 0.020482 8.14 3.9e-16
## logbmi~age:gendermale -0.002372 0.000449 -5.28 1.3e-07
##
## MZ-pairs DZ-pairs
## 1483 2788
##
## Variance decomposition:
## Estimate 2.5% 97.5%
## C 0.423 0.398 0.447
## E 0.577 0.553 0.602
##
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0 0 0
##
## Estimate 2.5% 97.5%
## Correlation within MZ: 0.423 0.398 0.447
## Correlation within DZ: 0.423 0.398 0.447
##
## 'log Lik.' 5496 (df=6)
## AIC: -10979
## BIC: -10941
## df AIC
## lnbmi.ae 6 -11215
## lnbmi.ce 6 -10979
##
## - Likelihood ratio test -
##
## data:
## chisq = 236, df = 1, p-value <2e-16
## sample estimates:
## log likelihood (model 1) log likelihood (model 2)
## 5614 5496
## df AIC
## lnbmi.ace 7 -11213
## lnbmi.ade 7 -11215
## lnbmi.ae 6 -11215
## lnbmi.ce 6 -10979
## lnbmi.e 5 -10141
| Models | ‘Likelihood’ | df | \(-2\Delta\textrm{X}^2\) | \(\Delta\textrm{df}\) | p | AIC |
|---|---|---|---|---|---|---|
| Saturated | 5629.13 | 22 | -11214.27 | |||
| ACE | 5613.62 | 7 | -11213.25 | |||
| ADE | 5614.39 | 7 | -11214.77 | |||
| AE (*) | 5613.62 | 6 | 1.527 | 1 | \(0.22\) | -11215.25 |
| CE | 5495.68 | 6 | 235.9 | 1 | \(<0.0001\) | -10979.37 |
Remarks below
| Zygosity | Number of pairs | Correlation (95% CI) | Heritability (95% CI) |
|---|---|---|---|
| MZ pairs | 1483 | 0.65 (0.62,0.68) | 0.64 (0.62,0.67) |
| DZ pairs | 2788 | 0.30 (0.27,0.34) | AE model |
We illustrate the classic biometric modelling in application to the dichotoumous twin data of stuttering.
Case study: Genetic influence on Stuttering
| Zyg and sex | reported answer | positive answer | prevalence |
|---|---|---|---|
| MZ females | 4947 | 187 | 0.0378 |
| DZ females | 6674 | 245 | 0.0367 |
| MZ males | 3936 | 325 | 0.0826 |
| DZ males | 6005 | 498 | 0.0829 |
## tvparnr zyg stutter sex age nr binstut
## 1 1 mz no female 71 1 0
## 2 1 mz no female 71 2 0
## 3 2 dz no female 71 1 0
## 4 3 os no male 71 1 0
## 6 4 os no male 71 2 0
## 8 5 mz no female 71 1 0
twinstutwide <- fast.reshape(twinstut, id="tvparnr",varying=c("stutter","binstut"))
#head(twinstutwide)
with(twinstutwide[twinstutwide$zyg=="mz",], table(stutter1,stutter2))## stutter2
## stutter1 no yes
## no 2992 85
## yes 88 90
## stutter2
## stutter1 no yes
## no 3640 204
## yes 193 21
stut.flex <- bptwin(binstut~sex,data=twinstut,id="tvparnr",zyg="zyg",DZ="dz",
type="flex", control=list(method="NR"))
#score(stut.flex)
summary(stut.flex)##
## Estimate Std.Err Z p-value
## (Intercept) MZ -1.8609 0.0243 -76.72 <2e-16 ***
## sexmale MZ 0.4885 0.0306 15.96 <2e-16 ***
## (Intercept) DZ -1.8035 0.0296 -60.99 <2e-16 ***
## sexmale DZ 0.4113 0.0379 10.85 <2e-16 ***
## atanh(rho) MZ 0.6868 0.0485 14.17 <2e-16 ***
## atanh(rho) DZ 0.1325 0.0624 2.12 0.034 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 20383/12511 6762/4058
##
## Estimate 2.5% 97.5%
## Tetrachoric correlation MZ 0.596 0.531 0.654
## Tetrachoric correlation DZ 0.132 0.010 0.249
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.008 0.007 0.010
## Casewise Concordance 0.265 0.222 0.312
## Marginal 0.031 0.028 0.035
## Rel.Recur.Risk 8.430 6.910 9.949
## log(OR) 2.690 2.398 2.982
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.002 0.001 0.004
## Casewise Concordance 0.064 0.038 0.103
## Marginal 0.036 0.031 0.040
## Rel.Recur.Risk 1.784 0.927 2.640
## log(OR) 0.639 0.092 1.185
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.928 0.659 1.198
Note we estimate from the bivariate probit model:
stut.u <- bptwin(binstut~sex,data=twinstut,id="tvparnr",zyg="zyg",DZ="dz",type="un")
summary(stut.u)##
## Estimate Std.Err Z p-value
## (Intercept) -1.8394 0.0188 -98.09 <2e-16 ***
## sexmale 0.4603 0.0238 19.32 <2e-16 ***
## atanh(rho) MZ 0.6894 0.0486 14.18 <2e-16 ***
## atanh(rho) DZ 0.1339 0.0622 2.15 0.031 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 20383/12511 6762/4058
##
## Estimate 2.5% 97.5%
## Tetrachoric correlation MZ 0.598 0.533 0.655
## Tetrachoric correlation DZ 0.133 0.012 0.250
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.009 0.007 0.011
## Casewise Concordance 0.270 0.227 0.317
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 8.195 6.779 9.612
## log(OR) 2.674 2.384 2.964
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.002 0.001 0.003
## Casewise Concordance 0.060 0.036 0.098
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 1.824 0.928 2.720
## log(OR) 0.659 0.104 1.214
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.929 0.660 1.198
##
## - Likelihood ratio test -
##
## data:
## chisq = 3, df = 2, p-value = 0.2
## sample estimates:
## log likelihood (model 1) log likelihood (model 2)
## -6765 -6766
-looks fine!
stut.sex.u <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz",type="un")
summary(stut.sex.u)##
## Estimate Std.Err Z p-value
## (Intercept) -1.8407 0.0188 -97.70 <2e-16 ***
## atanh(rho) MZ 1.0542 0.1127 9.36 <2e-16 ***
## atanh(rho) DZ 0.1915 0.0957 2.00 0.045 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 11352/6581 1900/2336
##
## Estimate 2.5% 97.5%
## Tetrachoric correlation MZ 0.783 0.682 0.855
## Tetrachoric correlation DZ 0.189 0.004 0.362
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.014 0.011 0.019
## Casewise Concordance 0.440 0.343 0.541
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 13.401 10.302 16.501
## log(OR) 3.703 3.119 4.286
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.002 0.001 0.005
## Casewise Concordance 0.075 0.037 0.146
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 2.279 0.703 3.854
## log(OR) 0.914 0.109 1.719
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 1 NaN NaN
##
##
## Estimate Std.Err Z p-value
## (Intercept) -1.3822 0.0153 -90.52 <2e-16 ***
## atanh(rho) MZ 1.1160 0.0964 11.58 <2e-16 ***
## atanh(rho) DZ 0.0934 0.0799 1.17 0.24
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 9031/5930 1355/1722
##
## Estimate 2.5% 97.5%
## Tetrachoric correlation MZ 0.806 0.729 0.863
## Tetrachoric correlation DZ 0.093 -0.063 0.245
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.046 0.039 0.054
## Casewise Concordance 0.550 0.475 0.623
## Marginal 0.083 0.079 0.088
## Rel.Recur.Risk 6.596 5.678 7.513
## log(OR) 3.357 2.887 3.827
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.009 0.006 0.015
## Casewise Concordance 0.112 0.070 0.176
## Marginal 0.083 0.079 0.088
## Rel.Recur.Risk 1.343 0.723 1.964
## log(OR) 0.362 -0.222 0.945
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 1 NaN NaN
stut.ace <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz", type="ace")
summary(stut.ace)##
## Estimate Std.Err Z p-value
## (Intercept) -3.723 0.366 -10.18 < 2e-16 ***
## log(var(A)) 1.128 0.261 4.33 1.5e-05 ***
## log(var(C)) -16.397 0.413 -39.67 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 11352/6581 1900/2336
##
## Estimate 2.5% 97.5%
## A 0.756 0.661 0.850
## C 0.000 0.000 0.000
## E 0.244 0.150 0.339
## MZ Tetrachoric Cor 0.756 0.645 0.835
## DZ Tetrachoric Cor 0.378 0.330 0.424
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.013 0.010 0.018
## Casewise Concordance 0.409 0.312 0.513
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 12.446 9.315 15.577
## log(OR) 3.519 2.927 4.112
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.005 0.004 0.006
## Casewise Concordance 0.143 0.122 0.167
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 4.361 3.707 5.014
## log(OR) 1.719 1.521 1.917
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.756 0.661 0.850
##
##
## Estimate Std.Err Z p-value
## (Intercept) -2.860 0.247 -11.56 < 2e-16 ***
## log(var(A)) 1.193 0.226 5.28 1.3e-07 ***
## log(var(C)) -16.763 0.264 -63.41 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 9031/5930 1355/1722
##
## Estimate 2.5% 97.5%
## A 0.767 0.688 0.846
## C 0.000 0.000 0.000
## E 0.233 0.154 0.312
## MZ Tetrachoric Cor 0.767 0.676 0.835
## DZ Tetrachoric Cor 0.384 0.343 0.422
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.043 0.036 0.051
## Casewise Concordance 0.510 0.430 0.589
## Marginal 0.084 0.079 0.089
## Rel.Recur.Risk 6.080 5.113 7.047
## log(OR) 3.096 2.608 3.584
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.020 0.017 0.022
## Casewise Concordance 0.235 0.214 0.258
## Marginal 0.084 0.079 0.089
## Rel.Recur.Risk 2.806 2.556 3.057
## log(OR) 1.408 1.265 1.551
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.767 0.688 0.846
stut.ade <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz", type="ade")
summary(stut.ade)##
## Estimate Std.Err Z p-value
## (Intercept) -3.952 0.394 -10.04 < 2e-16 ***
## log(var(A)) -12.257 13.579 -0.90 0.37
## log(var(D)) 1.284 0.255 5.04 4.8e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 11352/6581 1900/2336
##
## Estimate 2.5% 97.5%
## A 0.000 0.000 0.000
## D 0.783 0.698 0.868
## E 0.217 0.132 0.302
## MZ Tetrachoric Cor 0.783 0.683 0.855
## DZ Tetrachoric Cor 0.196 0.174 0.217
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.014 0.011 0.019
## Casewise Concordance 0.440 0.344 0.540
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 13.389 10.307 16.470
## log(OR) 3.700 3.120 4.280
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.003 0.002 0.003
## Casewise Concordance 0.077 0.069 0.085
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 2.336 2.144 2.528
## log(OR) 0.943 0.848 1.037
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.783 0.698 0.868
##
##
## Estimate Std.Err Z p-value
## (Intercept) -3.097 0.270 -11.5 < 2e-16 ***
## log(var(A)) -14.081 0.697 -20.2 < 2e-16 ***
## log(var(D)) 1.393 0.218 6.4 1.6e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 9031/5930 1355/1722
##
## Estimate 2.5% 97.5%
## A 0.000 0.000 0.000
## D 0.801 0.733 0.869
## E 0.199 0.131 0.267
## MZ Tetrachoric Cor 0.801 0.722 0.859
## DZ Tetrachoric Cor 0.200 0.183 0.217
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.046 0.039 0.053
## Casewise Concordance 0.545 0.469 0.619
## Marginal 0.084 0.079 0.088
## Rel.Recur.Risk 6.521 5.595 7.446
## log(OR) 3.320 2.846 3.793
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.013 0.011 0.014
## Casewise Concordance 0.151 0.142 0.161
## Marginal 0.084 0.079 0.088
## Rel.Recur.Risk 1.811 1.727 1.894
## log(OR) 0.754 0.693 0.815
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.801 0.733 0.869
## df AIC
## stut.ace 6 13491
## stut.ade 6 13471
stut.ae <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz", type="ae")
summary(stut.ae)##
## Estimate Std.Err Z p-value
## (Intercept) -3.723 0.366 -10.18 < 2e-16 ***
## log(var(A)) 1.128 0.261 4.33 1.5e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 11352/6581 1900/2336
##
## Estimate 2.5% 97.5%
## A 0.756 0.661 0.850
## E 0.244 0.150 0.339
## MZ Tetrachoric Cor 0.756 0.645 0.835
## DZ Tetrachoric Cor 0.378 0.330 0.424
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.013 0.010 0.018
## Casewise Concordance 0.409 0.312 0.513
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 12.446 9.315 15.577
## log(OR) 3.519 2.927 4.112
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.005 0.004 0.006
## Casewise Concordance 0.143 0.122 0.167
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 4.361 3.707 5.014
## log(OR) 1.719 1.521 1.917
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.756 0.661 0.850
##
##
## Estimate Std.Err Z p-value
## (Intercept) -2.860 0.247 -11.56 < 2e-16 ***
## log(var(A)) 1.193 0.226 5.28 1.3e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 9031/5930 1355/1722
##
## Estimate 2.5% 97.5%
## A 0.767 0.688 0.846
## E 0.233 0.154 0.312
## MZ Tetrachoric Cor 0.767 0.676 0.835
## DZ Tetrachoric Cor 0.384 0.343 0.422
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.043 0.036 0.051
## Casewise Concordance 0.510 0.430 0.589
## Marginal 0.084 0.079 0.089
## Rel.Recur.Risk 6.080 5.113 7.047
## log(OR) 3.096 2.608 3.584
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.020 0.017 0.022
## Casewise Concordance 0.235 0.214 0.258
## Marginal 0.084 0.079 0.089
## Rel.Recur.Risk 2.806 2.556 3.057
## log(OR) 1.408 1.265 1.551
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0.767 0.688 0.846
##
## - Likelihood ratio test -
##
## data:
## chisq = 19, df = 2, p-value = 6e-05
## sample estimates:
## log likelihood (model 1) log likelihood (model 2)
## -6730 -6739
stut.ce <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz", type="ce", control=list(method="NR"))
summary(stut.ce)##
## Estimate Std.Err Z p-value
## (Intercept) -2.715 0.141 -19.23 <2e-16 ***
## log(var(C)) 0.160 0.192 0.84 0.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 11352/6581 1900/2336
##
## Estimate 2.5% 97.5%
## C 0.540 0.447 0.633
## E 0.460 0.367 0.553
## MZ Tetrachoric Cor 0.540 0.440 0.627
## DZ Tetrachoric Cor 0.540 0.440 0.627
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.008 0.006 0.010
## Casewise Concordance 0.230 0.175 0.296
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 7.026 5.199 8.853
## log(OR) 2.413 1.999 2.827
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.008 0.006 0.010
## Casewise Concordance 0.230 0.175 0.296
## Marginal 0.033 0.030 0.036
## Rel.Recur.Risk 7.026 5.199 8.853
## log(OR) 2.413 1.999 2.827
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0 0 0
##
##
## Estimate Std.Err Z p-value
## (Intercept) -1.9940 0.0858 -23.2 <2e-16 ***
## log(var(C)) 0.0814 0.1634 0.5 0.62
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total MZ/DZ Complete pairs MZ/DZ
## 9031/5930 1355/1722
##
## Estimate 2.5% 97.5%
## C 0.520 0.440 0.600
## E 0.480 0.400 0.560
## MZ Tetrachoric Cor 0.520 0.436 0.596
## DZ Tetrachoric Cor 0.520 0.436 0.596
##
## MZ:
## Estimate 2.5% 97.5%
## Concordance 0.026 0.022 0.032
## Casewise Concordance 0.314 0.264 0.368
## Marginal 0.084 0.079 0.088
## Rel.Recur.Risk 3.749 3.139 4.359
## log(OR) 1.922 1.608 2.236
## DZ:
## Estimate 2.5% 97.5%
## Concordance 0.026 0.022 0.032
## Casewise Concordance 0.314 0.264 0.368
## Marginal 0.084 0.079 0.088
## Rel.Recur.Risk 3.749 3.139 4.359
## log(OR) 1.922 1.608 2.236
##
## Estimate 2.5% 97.5%
## Broad-sense heritability 0 0 0
## df AIC
## stut.ade 6 13471
## stut.ace 6 13491
## stut.ae 4 13487
## stut.ce 4 13576
Liability threshold model:
| Zyg and sex | prevalence | concordance | tetrachorics | heritability (95% CI) |
|---|---|---|---|---|
| MZ females | .04 | .46 (.36,.56) | .79 (.69,.86) | .79 (.70,.87) |
| DZ females | .04 | .08 (.07,.09) | .20 (.18,.22) | ADE model |
| MZ males | .08 | .54 (.46,.62) | .80 (.72,.86) | .80 (.73,.87) |
| DZ males | .08 | .15 (.14,.16) | .20 (.18,.22) | ADE model |
(Fibiger et al. 2008)
The classic twin analysis consists in modelling within pair dependence using the linear correlation measure. This ties with variance components within and between pairs of total variance in trait of study.
Such components may be further decomposed into genetic and environmental origin based on natural assumptions from quantitative genetics.
We discuss various extensions.
| Relation | Genetics | Environment | Examples |
|---|---|---|---|
| \(\rho_{mz}>4\rho_{dz}\) | Epistasis | albinism | |
| \(\rho_{mz}>2\rho_{dz}\) | Genetic dominance D | ||
| \(\rho_{mz}=2\rho_{dz}\) | Additive effect A (mono- or polygenic) and small D | Small C | BMI |
| \(2\rho_{dz}>\rho_{mz}>\rho_{dz}\) | Additive genes A | Shared environment C | longevity |
| \(\rho_{mz}=\rho_{dz}>0\) | No genetic effect | C | brain cancer |
| \(\rho_{mz}=\rho_{dz}=0\) | No genetic effect | No familial aggregation |
The classic model has different names due to different equivalent representations:
-some implementations works better than others.
Statistical software for implementation and estimation having emphasis will be:
We discuss now the SEM representation of the biometric twin model.
\[ {\Sigma_Y} = \left( \begin{array}{cc} \beta^2{\sigma_X^2}+{\sigma_{\epsilon}^2} & \beta{\sigma_X^2} \\ \beta{\sigma_X^2} & {\sigma_X^2} \\ \end{array} \right) \]
The presentation is at rpubs.com/jhjelmborg/SDU_biometric_analysis_twins_2022.