Biometric Analysis of Twin Data

dummy slide

Introduction

\[ \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\Expec{\text{E}} \def\logit{\text{logit}} \def\diag{\text{diag}} \]

Knowledge from data

Aim: “Research to improve human health”

Confounding

Outline

Analysis of twin data in health science

  • Biometric modelling
  • Matched analysis

What is it about and how to compute?

Session: Biometric analysis

  • Introduction: aims and applications
  • Methodology: measures of within pair dependence and biometric models
  • Results: Applications having continuous or categorical outcomes.
  • Discussion: Pro and con of models and perspectives.

Biometrics

  • 1918: Biometric landmark papers by R.A. Fisher
  • ‘underlying sources of variation in phenotype?’

Empirical basis

# install.packages("mets")

library(mets)
data("twinbmi")
head(twinbmi)
##   tvparnr  bmi  age gender zyg id num
## 1       1 26.3 57.5   male  DZ  1   1
## 2       1 25.5 57.5   male  DZ  1   2
## 3       2 28.7 56.6   male  MZ  2   1
## 5       3 28.4 57.7   male  DZ  3   1
## 7       4 27.3 53.7   male  DZ  4   1
## 8       4 28.1 53.7   male  DZ  4   2

Twin-twin plot

##    tvparnr bmi1  age gender zyg id num bmi2
## 1        1 26.3 57.5   male  DZ  1   1 25.5
## 3        2 28.7 56.6   male  MZ  2   1   NA
## 5        3 28.4 57.7   male  DZ  3   1   NA
## 7        4 27.3 53.7   male  DZ  4   1 28.1
## 9        5 27.8 52.6   male  DZ  5   1   NA
## 11       6 28.0 52.5   male  DZ  6   1 22.3

R code in supplement.

Goal

To target the within-pair dependencies in twin pairs of MZ and DZ origin.

To infer on sources of variation in continuous and dichotomous traits

Methodology

Synopsis

We review the classic twin design of study to introduce measures of within pair dependence and see how they constitute the basis for biometric modelling.

Measures of within pair dependence

Measures of within pair dependence - classic

  • Pearson product moment correlation, \(\rho\)
  • Alternatives: Spearman rank correlation, Kendalls tau, \(\ldots\)
  • Other measures to be discussed.

Correlation path diagram representation

We need a model for interpretation and comparison of within pair dependence.

Measures of within pair dependence - MZ

## 
##  Pearson's product-moment correlation
## 
## data:  mz[, 1] and mz[, 2]
## t = 37, df = 1481, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.665 0.718
## sample estimates:
##   cor 
## 0.692

Measures of within pair dependence - DZ

## 
##  Pearson's product-moment correlation
## 
## data:  dz[, 1] and dz[, 2]
## t = 22, df = 2786, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.356 0.419
## sample estimates:
##   cor 
## 0.388

Biometrics - classics

  • Landmark papers in Biometrical Genetics 1918.
  • Outcome: Continuous trait (eg.~BMI, \(\ldots\)).
  • Family members variation and covariation biometric model.

Biometrics - classics

\[ \left\{\begin{array}{l} Y = \textrm{Genes} + \textrm{Environment} \\ \Sigma_Y = \Sigma_{\textrm{Genes}} + \Sigma_{\textrm{Environment}} \\ \end{array}\right. \] Biometric model for the contribution of genetic and environmental factors to the variation in outcome.

Biometrics - classics

  • Family members variance-covariance matrix: \[\Sigma_{(Y_1,Y_2)}=\Sigma_{(G_1,G_2)}+\Sigma_{(E_1,E_2)}\]

What is this?

Biometrics - classics

The pair \((Y_{1},Y_{2})\) is bivariate normal distributed with mean \((\mu_1,\mu_2)\) and variance-covariance matrix given by

\[\Sigma_{(Y_1,Y_2)} = \left( \begin{array}{cc} {\sigma_{Y_1}^2} & \rho{\sigma_{Y_1}}{\sigma_{Y_2}} \\ \rho{\sigma_{Y_1}}{\sigma_{Y_2}} & {\sigma_{Y_2}^2} \\ \end{array} \right) \]

  • The correlation \(\rho\) measuring within pair dependence enters here.

Biometrics - classics

  • equal mean and equal variance for twins labelled 1 and 2.

The polygenic ADCE biometric model

The biometric ADCE-model, in which the individual outcome decomposes into \[Y_i=A_i+D_i+C_i+E_i,\] where

  • \(A\): Additive genetic effects of alleles
  • \(D\): Dominant genetic effects
  • \(C\): Shared environmental effects
  • \(E\): Unique environmental effects

The polygenic ADCE biometric model

Assume \(z=u=1\) for MZ, \(z=\frac{1}{2}\) and \(u=\frac{1}{4}\) for DZ pairs: \[ \Sigma_{(G_1,G_2)} = \underbrace{\left( \begin{array}{cc} {\sigma_A^2} & z{\sigma_A^2} \\ z{\sigma_A^2} & {\sigma_A^2} \\ \end{array} \right)}_{\text{additive genetic covariance}} + \underbrace{\left( \begin{array}{cc} {\sigma_D^2} & u{\sigma_D^2} \\ u{\sigma_D^2} & {\sigma_D^2} \\ \end{array} \right)}_{\text{dominant genetic covariance}} \]

The polygenic ADCE biometric model

\[ \Sigma_{(E_1,E_2)} = \underbrace{\left( \begin{array}{cc} {\sigma_C^2} & {\sigma_C^2} \\ {\sigma_C^2} & {\sigma_C^2} \\ \end{array} \right)}_{\text{Common environment covariance}} + \underbrace{\left( \begin{array}{cc} {\sigma_E^2} & {0} \\ {0} & {\sigma_E^2} \\ \end{array} \right)}_{\text{Unique environment covariance}} \]

The polygenic ADCE biometric model

  • Assume \(z=u=1\) for MZ, \(z=\frac{1}{2}\) and \(u=\frac{1}{4}\) for DZ pairs:

\[ \begin{align*} \Sigma_{(Y_1,Y_2)} &= \left( \begin{array}{cc} {\sigma_A^2} & z{\sigma_A^2} \\ z{\sigma_A^2} & {\sigma_A^2} \\ \end{array} \right) + \left( \begin{array}{cc} {\sigma_D^2} & u{\sigma_D^2} \\ u{\sigma_D^2} & {\sigma_D^2} \\ \end{array} \right) \\ &+ \left( \begin{array}{cc} {\sigma_C^2} & {\sigma_C^2} \\ {\sigma_C^2} & {\sigma_C^2} \\ \end{array} \right) + \left( \begin{array}{cc} {\sigma_E^2} & {0} \\ {0} & {\sigma_E^2} \\ \end{array} \right) \end{align*} \]

Biometrics - effects

  • Heritability and Shared environmental effects:

\[ h^2_{Y}=\frac{\sigma_{A}^2+\sigma_{D}^2}{\sigma_{A}^2+\sigma_{D}^2+ \sigma_{C}^2+ \sigma_{E}^2} \]

\[ c^2_{Y}=\frac{\sigma_{C}^2}{\sigma_{A}^2+\sigma_{D}^2+ \sigma_{C}^2+ \sigma_{E}^2} \]

Biometric analysis - polygenic model

Main assumptions

  • Equal environments assumption for MZ and DZ twins.
  • No gene-environment interaction and correlation.
  • No gene-gene interaction (link: epistasis).
  • Equal mean and variance of twin 1 and twin 2, MZ and DZ.
  • Estimation and inference by maximum likelihood principle.

Biometric analysis - polygenic model

  • Only three of four variance-components are estimable in same model since only three equations in model.
  • Submodels of the ADCE: ACE, ADE, AE, CE and E are compared.
  • Parsimony principle for model selection: Best fit to data of least complexity

Biometric analysis - polygenic model

The polygenic models may be represented in terms of a path diagram, the ACE below:

ACE path diagram representation

Biometric analyses - dichotomous outcome

  • Above for continuous traits, how about binary traits?
  • Common approaches: risk scale or liability scale modelling.
  • We consider the classic liability threshold model (aka probit model).
  • Risk scale in Session III on time to event data.

Biometric analyses - liability threshold model

  • A liability scale of outcome is constructed.
  • Within pair dependence measure by tetrachoric correlation
  • Tetrachorics does not depend on prevalence of trait.
  • Biometric ADCE twin model of liability to outcome

Biometric analyses - liability threshold model

New trait: Liability to state “Yes”

\[ Y_i = \left\{ \begin{array}{rl} \textrm{disease yes} & \text{if } Z_i>\tau\\ \textrm{disease no} & \text{if } Z_i\le\tau \end{array} \right. \]

  • \(Z\) is a continuous random variable with standard normal distribution.
  • \(Z\) is not fully observed.
  • \(\tau\) is named the threshold of liability \(Z\).
  • Many small additive effects on the liability to disease corresponds via the normality assumption to multiplicative effects on the risk of disease.

Biometric analyses - liability threshold model

Biometric analyses - liability threshold model

Bivariate standard normality of liabilities

\[ (Z_1,Z_2)\sim\textrm{MvN}\{(0,\left( \begin{array}{cc} 1 & \rho \ \\ \rho & 1 \\ \end{array} \right) )\} \] Tetrachoric correlation of categorical variables is by definition the usual correlation in liabilities to outcomes, \(\rho\). Thresholds and tetrachorics are estimated from the liability-threshold model.

Biometric analyses - liability threshold model

Biometric analyses - liability threshold model

Biometric analyses - liability threshold ADCE model

The biometric ADCE model can now be applied to the continuous liability to outome:

\[Z_i=A_i+D_i+C_i+E_i\] providing heritability and common environmental effects:

\[ h^2_{Z}=\sigma_{A}^2+\sigma_{D}^2, \qquad c^2_{Z}=\sigma_{C}^2 \]

Biometrics - conclusion

  • The within pair intraclass correlation \(\rho\) measured in MZ and DZ twin pairs determines the biometric ADCE model for a continuous outcome
  • If dichotomous outcome, a latent liability scale can be constructed which is continous and the biometric ADCE model applies.

Results - continuous outcome

Synopsis

We illustrate the classic biometric modelling in application to BMI data in twins and then to the dichotoumous twin data of stuttering.

Case study: Body Mass Index

  • Body Mass Index defined as weight (\(kg\)) per squared height (\(m^2\)) is a complex trait related to several health related factors (eg.~obesity, diabetes, aging).
  • The index may allow for comparison among individuals with different height, but is not regarded invariant to sex.
  • Estimated heritability of 0.60-0.70 in BMI has been reported from with the remaining 30-40% due to a unique environmental variance component.
  • The genetic influence is a complex action of several genes. Some genetic variants have been identified.
  • The interplay with environmental factors is under investigation.

Biometric twin study of BMI

  • Correlation: within pair dependence to be compared for MZ and DZ pairs
  • The polygenic model allows for modelling type and magnitude of genetic influence on BMI by decomposing the variance in BMI into genetic and environmental components.

Biometric twin study of BMI - Finnish data

Biometric twin study of BMI

##   tvparnr  bmi  age gender zyg id num
## 1       1 26.3 57.5   male  DZ  1   1
## 2       1 25.5 57.5   male  DZ  1   2
## 3       2 28.7 56.6   male  MZ  2   1
## 5       3 28.4 57.7   male  DZ  3   1
## 7       4 27.3 53.7   male  DZ  4   1
## 8       4 28.1 53.7   male  DZ  4   2

Biometric twin study of BMI

Biometric twin study of BMI

Regression model of BMI on sex and (non-linear) age.

l1 <- lm(bmi ~ gender*ns(age,3), data=twinbmi)
estimate(l1, id=twinbmi$tvparnr)  # Robust standard errors 
##                        Estimate Std.Err    2.5%   97.5%   P-value
## (Intercept)             22.2413  0.1642 21.9194 22.5632 0.000e+00
## gendermale               1.4538  0.2449  0.9738  1.9338 2.916e-09
## ns(age, 3)1              2.3792  0.2376  1.9134  2.8450 1.358e-23
## ns(age, 3)2              4.0896  0.4180  3.2704  4.9089 1.312e-22
## ns(age, 3)3              3.2738  0.2310  2.8212  3.7265 1.309e-45
## gendermale:ns(age, 3)1  -0.5309  0.3278 -1.1733  0.1116 1.053e-01
## gendermale:ns(age, 3)2  -0.3136  0.6111 -1.5113  0.8842 6.079e-01
## gendermale:ns(age, 3)3  -1.4572  0.3126 -2.0698 -0.8446 3.128e-06

-providing curves above and suggests sex and age dependence.

Familial influence on logarithm of BMI

Zygosity Number of pairs Correlation (95% CI)
MZ pairs 1483 ? (?,?)
DZ pairs 2788 ? (?,?)

-we aim for the intraclass correlation in log of BMI for MZ and DZ pairs

Familial influence on logarithm of BMI

Analysis implemented in R script ‘twinbmi22.R’ in supplements.

Familial influence on logarithm of BMI

We begin by fitting the saturated bivariate normal model.

lnbmi.sat <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="sat")
lnbmi.sat
## ____________________________________________________
## Group 1: MZ (n=1483)
##                               Estimate Std. Error  Z value Pr(>|z|)
## Regressions:                                                       
##    logbmi.1~age.1                0.006      0.001   10.221   <1e-12
##    logbmi.1~gendermale.1         0.165      0.042    3.939  8.2e-05
##    logbmi.1~age:gendermale.1    -0.002      0.001   -2.678   0.0074
##     logbmi.2~age.1               0.006      0.001   11.069   <1e-12
##     logbmi.2~gendermale.1        0.186      0.041    4.592  4.4e-06
##     logbmi.2~age:gendermale.1   -0.003      0.001   -3.064   0.0022
## Intercepts:                                                        
##    logbmi.1                      2.889      0.026  110.667   <1e-12
##    logbmi.2                      2.871      0.025  113.421   <1e-12
## Additional Parameters:                                             
##    log(var(MZ)).1               -4.009      0.037 -108.912   <1e-12
##    log(var(MZ)).2               -4.071      0.037 -110.588   <1e-12
##    atanh(rhoMZ)                  0.770      0.026   29.493   <1e-12
## ____________________________________________________
## Group 2: DZ (n=2788)
##                               Estimate Std. Error  Z value Pr(>|z|)
## Regressions:                                                       
##    logbmi.1~age.1                0.006      0.000   13.117   <1e-12
##    logbmi.1~gendermale.1         0.155      0.030    5.170  2.3e-07
##    logbmi.1~age:gendermale.1    -0.002      0.001   -3.007   0.0026
##     logbmi.2~age.1               0.006      0.000   13.270   <1e-12
##     logbmi.2~gendermale.1        0.168      0.030    5.609    2e-08
##     logbmi.2~age:gendermale.1   -0.003      0.001   -3.865  0.00011
## Intercepts:                                                        
##    logbmi.1                      2.914      0.019  149.928   <1e-12
##    logbmi.2                      2.914      0.019  150.096   <1e-12
## Additional Parameters:                                             
##    log(var(DZ)).1               -4.024      0.027 -149.867   <1e-12
##    log(var(DZ)).2               -4.026      0.027 -149.946   <1e-12
##    atanh(rhoDZ)                  0.314      0.019   16.574   <1e-12
## 
##                        Estimate 2.5%  97.5%
## Correlation within MZ: 0.647    0.616 0.675
## Correlation within DZ: 0.304    0.270 0.337
## 
## 'log Lik.' 5629 (df=22)
## AIC: -11214 
## BIC: -11074

Familial influence on logarithm of BMI

## Equal regressions, intercepts and residual variances for twin 1 and twin 2.
lnbmi.flex <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi,
            type="flex")
lnbmi.flex
## ____________________________________________________
## Group 1: MZ (n=1483)
##                              Estimate Std. Error  Z value Pr(>|z|)
## Regressions:                                                      
##    logbmi.1~age.1               0.006      0.001   11.796   <1e-12
##    logbmi.1~gendermale.1        0.175      0.037    4.695  2.7e-06
##    logbmi.1~age:gendermale.1   -0.003      0.001   -3.161   0.0016
## Intercepts:                                                       
##    logbmi.1                     2.880      0.023  124.195   <1e-12
## Additional Parameters:                                            
##    log(var(MZ))                -4.039      0.031 -130.319   <1e-12
##    atanh(rhoMZ)                 0.768      0.026   29.532   <1e-12
## ____________________________________________________
## Group 2: DZ (n=2788)
##                              Estimate Std. Error  Z value Pr(>|z|)
## Regressions:                                                      
##    logbmi.1~age.1               0.006      0.000   16.336   <1e-12
##    logbmi.1~gendermale.1        0.162      0.024    6.674  2.5e-11
##    logbmi.1~age:gendermale.1   -0.002      0.001   -4.255  2.1e-05
## Intercepts:                                                       
##    logbmi.1                     2.914      0.016  185.745   <1e-12
## Additional Parameters:                                            
##    log(var(DZ))                -4.024      0.020 -202.753   <1e-12
##    atanh(rhoDZ)                 0.313      0.019   16.495   <1e-12
## 
##                        Estimate 2.5%  97.5%
## Correlation within MZ: 0.646    0.615 0.674
## Correlation within DZ: 0.303    0.269 0.337
## 
## 'log Lik.' 5623 (df=12)
## AIC: -11223 
## BIC: -11146

Familial influence on logarithm of BMI

#comparison with saturated model
compare(lnbmi.sat,lnbmi.flex)
## 
##  - Likelihood ratio test -
## 
## data:  
## chisq = 12, df = 10, p-value = 0.3
## sample estimates:
## log likelihood (model 1) log likelihood (model 2) 
##                     5629                     5623
#no significant worsening by imposing constraints.
  • -equal mean and variance for twin 1 and 2 is assumed, ok.

Familial influence on logarithm of BMI

## Further, equal for MZ and DZ twins as well.  
lnbmi.u <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg", data=twinbmi, type="u",  control=list(refit=TRUE))
lnbmi.u
## ____________________________________________________
## Group 1: MZ (n=1483)
##                              Estimate Std. Error  Z value Pr(>|z|)
## Regressions:                                                      
##    logbmi.1~age.1               0.006      0.000   20.151   <1e-12
##    logbmi.1~gendermale.1        0.166      0.020    8.156   <1e-12
##    logbmi.1~age:gendermale.1   -0.002      0.000   -5.278  1.3e-07
## Intercepts:                                                       
##    logbmi.1                     2.903      0.013  222.926   <1e-12
## Additional Parameters:                                            
##    log(var)                    -4.026      0.017 -240.678   <1e-12
##    atanh(rhoMZ)                 0.775      0.023   33.521   <1e-12
## ____________________________________________________
## Group 2: DZ (n=2788)
##                              Estimate Std. Error  Z value Pr(>|z|)
## Regressions:                                                      
##    logbmi.1~age.1               0.006      0.000   20.151   <1e-12
##    logbmi.1~gendermale.1        0.166      0.020    8.156   <1e-12
##    logbmi.1~age:gendermale.1   -0.002      0.000   -5.278  1.3e-07
## Intercepts:                                                       
##    logbmi.1                     2.903      0.013  222.926   <1e-12
## Additional Parameters:                                            
##    log(var)                    -4.026      0.017 -240.678   <1e-12
##    atanh(rhoDZ)                 0.313      0.019   16.747   <1e-12
## 
##                        Estimate 2.5%  97.5%
## Correlation within MZ: 0.650    0.623 0.675
## Correlation within DZ: 0.303    0.270 0.336
## 
## 'log Lik.' 5614 (df=7)
## AIC: -11215 
## BIC: -11170

Familial influence on logarithm of BMI

compare(lnbmi.u,lnbmi.flex)
## 
##  - Likelihood ratio test -
## 
## data:  
## chisq = 18, df = 5, p-value = 0.003
## sample estimates:
## log likelihood (model 1) log likelihood (model 2) 
##                     5614                     5623
# Not quite. But natural assumption and we go on.
# Hence intraclass correlations of log(BMI) adjusted for age and sex 
# for MZ and DZ pairs are obtained from 'lnbmi.u'.

Familial influence on logarithm of BMI

Saturated submodels - model selection

Submodel Likelihood df \(-2\Delta\textrm{X}^2\) \(\Delta\textrm{df}\) p AIC
Saturated 5629.137 22 -11214.27
“equal 1 and 2” 5623.369 12 11.537 10 0.3172 -11222.74
“equal MZ and DZ” 5614.387 7 17.962 5 0.002994 -11214.77
  • Saturated model: No constraints on mean and variance.
  • Lowest AIC for “equal 1 and 2” (“lnbmi.flex”) but same for saturated and simplest models.
  • We insist on natural assumptions (“equal 1 and 2 for MZ and DZ, lnbmi.u”) although data may not greatly support these.

Familial influence on logarithm of BMI

# A formal test of genetic effects can be obtained by comparing the MZ and DZ correlation:
estimate(lnbmi.u,contr(6:7,7))
##                           Estimate Std.Err   2.5%  97.5%   P-value
## [atanh(rhoMZ)@1] - [a....   0.4622 0.03848 0.3867 0.5376 3.114e-33
## 
##  Null Hypothesis: 
##   [atanh(rhoMZ)@1] - [atanh(rhoDZ)@2] = 0
  • Evidence for genetic influence (why?)

Familial influence on logarithm of BMI

Zygosity Number of pairs Correlation (95% CI)
MZ pairs 1483 0.65 (0.62,0.68)
DZ pairs 2788 0.30 (0.27,0.34)

The within pair intraclass correlation is the amount of variance between pairs of the total variance in the trait.

Biometric modelling of logarithm of BMI

  • What type and magnitudes of familial influence?

Biometric modelling of logarithm of BMI

Main assumptions

  • Equal environments assumption for MZ and DZ twins.
  • No gene-environment interaction and correlation.
  • No gene-gene interaction (link: epistasis).
  • Equal mean and variance of twin 1 and twin 2, MZ and DZ.

Biometric modelling of logarithm of BMI

  • Submodels of the ADCE polygenic model: ACE, ADE, AE, CE and E are compared.
  • Estimation and inference by maximum likelihood principle.
  • Parsimony principle for model selection: Best fit to data of least complexity

Biometric modelling of logarithm of BMI

lnbmi.ace <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="ace", control=list(start=coef(lnbmi.sat)))
lnbmi.ace
##                        Estimate Std. Error Z value Pr(>|z|)
## logbmi                 2.90e+00   1.31e-02  221.76  < 2e-16
## sd(A)                  1.07e-01   1.66e-03   64.60  < 2e-16
## sd(C)                  6.10e-09   1.59e-02    0.00        1
## sd(E)                  7.97e-02   1.35e-03   58.92  < 2e-16
## logbmi~age             5.83e-03   2.91e-04   20.05  < 2e-16
## logbmi~gendermale      1.66e-01   2.05e-02    8.11  5.0e-16
## logbmi~age:gendermale -2.36e-03   4.49e-04   -5.25  1.5e-07
## 
## MZ-pairs DZ-pairs 
##     1483     2788 
## 
## Variance decomposition:
##   Estimate 2.5%  97.5%
## A 0.645    0.619 0.671
## C 0.000    0.000 0.000
## E 0.355    0.329 0.381
## 
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.645    0.619 0.671
## 
##                        Estimate 2.5%  97.5%
## Correlation within MZ: 0.645    0.618 0.670
## Correlation within DZ: 0.322    0.310 0.335
## 
## 'log Lik.' 5614 (df=7)
## AIC: -11213 
## BIC: -11169

Biometric modelling of logarithm of BMI

##           df    AIC
## lnbmi.u    7 -11215
## lnbmi.ace  7 -11213

-corresponds in fit to the selected correlation model.

Biometric modelling of logarithm of BMI

lnbmi.ade <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="ade", control=list(refit=TRUE))
lnbmi.ade  
##                        Estimate Std. Error Z value Pr(>|z|)
## logbmi                 2.902614   0.013020  222.93  < 2e-16
## sd(A)                  0.100269   0.006090   16.46  < 2e-16
## sd(D)                  0.039375   0.015531    2.54    0.011
## sd(E)                  0.079048   0.001428   55.34  < 2e-16
## logbmi~age             0.005833   0.000289   20.15  < 2e-16
## logbmi~gendermale      0.166060   0.020360    8.16  3.5e-16
## logbmi~age:gendermale -0.002355   0.000446   -5.28  1.3e-07
## 
## MZ-pairs DZ-pairs 
##     1483     2788 
## 
## Variance decomposition:
##   Estimate 2.5%   97.5% 
## A  0.563    0.434  0.693
## D  0.087   -0.048  0.221
## E  0.350    0.324  0.376
## 
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.650    0.624 0.676
## 
##                        Estimate 2.5%  97.5%
## Correlation within MZ: 0.650    0.623 0.675
## Correlation within DZ: 0.303    0.271 0.335
## 
## 'log Lik.' 5614 (df=7)
## AIC: -11215 
## BIC: -11170

Biometric modelling of logarithm of BMI

#comparison of non-nested models
AIC(lnbmi.ace,lnbmi.ade)
##           df    AIC
## lnbmi.ace  7 -11213
## lnbmi.ade  7 -11215
#lower (=preferable) Akaike information index for ADE

Biometric modelling of logarithm of BMI

lnbmi.ae <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="ae",control=list(refit=TRUE))
lnbmi.ae  
##                        Estimate Std. Error Z value Pr(>|z|)
## logbmi                 2.902481   0.013088  221.76  < 2e-16
## sd(A)                  0.107432   0.001663   64.60  < 2e-16
## sd(E)                  0.079726   0.001353   58.92  < 2e-16
## logbmi~age             0.005834   0.000291   20.05  < 2e-16
## logbmi~gendermale      0.166111   0.020481    8.11  5.0e-16
## logbmi~age:gendermale -0.002356   0.000449   -5.25  1.5e-07
## 
## MZ-pairs DZ-pairs 
##     1483     2788 
## 
## Variance decomposition:
##   Estimate 2.5%  97.5%
## A 0.645    0.619 0.671
## E 0.355    0.329 0.381
## 
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.645    0.619 0.671
## 
##                        Estimate 2.5%  97.5%
## Correlation within MZ: 0.645    0.618 0.670
## Correlation within DZ: 0.322    0.310 0.335
## 
## 'log Lik.' 5614 (df=6)
## AIC: -11215 
## BIC: -11177

Biometric modelling of logarithm of BMI

compare(lnbmi.ade,lnbmi.ae)
## 
##  - Likelihood ratio test -
## 
## data:  
## chisq = 2, df = 1, p-value = 0.2
## sample estimates:
## log likelihood (model 1) log likelihood (model 2) 
##                     5614                     5614
  • less evidence for dominant genetic effect \(D\).

Biometric modelling of logarithm of BMI

lnbmi.ce <- twinlm(logbmi~age*gender, id="tvparnr", DZ="DZ", zyg="zyg",data=twinbmi, type="ce",control=list(method="NR"))
lnbmi.ce  
##                        Estimate Std. Error Z value Pr(>|z|)
## logbmi                 2.900785   0.013069  221.96  < 2e-16
## sd(C)                  0.086848   0.001712   50.72  < 2e-16
## sd(E)                  0.101487   0.001099   92.33  < 2e-16
## logbmi~age             0.005856   0.000291   20.15  < 2e-16
## logbmi~gendermale      0.166762   0.020482    8.14  3.9e-16
## logbmi~age:gendermale -0.002372   0.000449   -5.28  1.3e-07
## 
## MZ-pairs DZ-pairs 
##     1483     2788 
## 
## Variance decomposition:
##   Estimate 2.5%  97.5%
## C 0.423    0.398 0.447
## E 0.577    0.553 0.602
## 
## 
##                          Estimate 2.5% 97.5%
## Broad-sense heritability 0        0    0    
## 
##                        Estimate 2.5%  97.5%
## Correlation within MZ: 0.423    0.398 0.447
## Correlation within DZ: 0.423    0.398 0.447
## 
## 'log Lik.' 5496 (df=6)
## AIC: -10979 
## BIC: -10941

Biometric modelling of logarithm of BMI

AIC(lnbmi.ae,lnbmi.ce)
##          df    AIC
## lnbmi.ae  6 -11215
## lnbmi.ce  6 -10979
# favour for AE

Biometric modelling of logarithm of BMI

compare(lnbmi.ace,lnbmi.ce)
## 
##  - Likelihood ratio test -
## 
## data:  
## chisq = 236, df = 1, p-value <2e-16
## sample estimates:
## log likelihood (model 1) log likelihood (model 2) 
##                     5614                     5496
#not good at all.
#We report the AE model.

Biometric modelling of logarithm of BMI

AIC(lnbmi.ace,lnbmi.ade,lnbmi.ae,lnbmi.ce,lnbmi.e)
##           df    AIC
## lnbmi.ace  7 -11213
## lnbmi.ade  7 -11215
## lnbmi.ae   6 -11215
## lnbmi.ce   6 -10979
## lnbmi.e    5 -10141

Biometric modelling of logarithm of BMI

Models ‘Likelihood’ df \(-2\Delta\textrm{X}^2\) \(\Delta\textrm{df}\) p AIC
Saturated 5629.13 22 -11214.27
ACE 5613.62 7 -11213.25
ADE 5614.39 7 -11214.77
AE (*) 5613.62 6 1.527 1 \(0.22\) -11215.25
CE 5495.68 6 235.9 1 \(<0.0001\) -10979.37

Remarks below

Biometric modelling of logarithm of BMI

  • The additive genetic effect A is significant in all models
  • The ADE model is more parsimonious than the ACE model in terms of Akaike’s criterion having lowest AIC value (given by \(-2\ln(\textrm{Likelihood})-2\textrm{df}\)).
  • The AE model is the most parsimonious model
  • The ADE versus AE p-value of \(0.22\) is too conservative and can be halved (Dominicus et al, 2006).
  • The C component in the ACE model vanishes at zero, otherwise we should report it.

Biometric modelling of logarithm of BMI

Zygosity Number of pairs Correlation (95% CI) Heritability (95% CI)
MZ pairs 1483 0.65 (0.62,0.68) 0.64 (0.62,0.67)
DZ pairs 2788 0.30 (0.27,0.34) AE model

Biometric modelling of continuous outcome

  • Mean structure by regression on covariates
  • Variance covariance structure for within family dependence.
  • Biometric model of within pair correlation, magnitude and type of influence.
  • Full analysis by R packages ‘mets’, ‘mglm4twin’ or ‘OpenMx’

Practicals - Use R

  • Try to estimate within pair correlations constrained to same mean and variance for twin 1 and 2, MZ and DZ.
  • Carry on estimating the ACE model and its submodels (AE, CE and E).
  • Try to estimate the ADE model as well and compare it to ACE using the parsimony principle.
  • Which model and hence which results to report?
  • The R ‘mets’ or ‘OpenMx’ packages may be used as shown in scripts.

Results - dichotomous outcome

Synopsis

We illustrate the classic biometric modelling in application to the dichotoumous twin data of stuttering.

Biometric modelling of dichotomous outcome

  • Prevalence structure by probit regression on covariates
  • Variance covariance structure for within family dependence in liability.
  • Full biometric analysis by R packages ‘mets’ and ‘OpenMx’

Biometric study of stuttering

Case study: Genetic influence on Stuttering

  • Part of study on specific language impairment and fluency disorders: Dysphasia, stuttering and cluttering
  • Population based cohort of 46.418 twins, born 1931 to 1982.

The questions relevant to communication disorders:

    1. Did you as a child have periods with middle ear infections (Otitis Media) or pain in your ears?
    1. Do you have or did you have a hearing problem?
    1. Do you have a hearing aid?
    1. Do you have Meniere’s disease?
    1. Do you have tinnitus (buzzing in your ears)?
    1. Did you and your co-twin use an interpersonal language, which other people did not understood?
    1. Did you have any speech disorders in your childhood?

The questions relevant to communication disorders:

    1. Do you stutter or have you stuttered?
    1. Do you speak too fast or did you do so, stumbling over the words and omitted syllables (cluttering)?
    1. Do you have any speech disorders as an adult?
    1. Have you had difficulties talking or understanding speech because of a brain stroke or other kind of brain damage?
    1. Reading: Did you ever have difficulties having enough time for reading Danish subtitles in foreign TV-programs and movies?

Digression on background

  • Both stuttering and cluttering have been regarded as disorders of speech motor control (Kent, 2000) and language factors may be important in both disorders (Guitar, 2006).
  • disorders are explained as different maturation of the brain (Kent, 2000 and Ors 2003).
  • Estimated heritability of 0.70 in stuttering has been reported (Andrew 1991, Felsenfeld 2000, Oliver 2007) with the remaining 30% due to a unique environmental variance component.

Digression on background

  • stuttering has high heritability and little shared environment effect in early childhood and for recovered and persistent groups of children, by age 7 (Dworzynski 2007).
  • Gender differences have been found in people who stutter; linkage on chromosome 7 for men, and chromosome 21 for women (Suresh et al., 2006).

Biometric study of stuttering - data

  • Danish twin omnibus survey 2002. Language, speech, hearing, reading and communication disorders
  • A cohort of 46.418 twins. Response rate is 76%
Zyg and sex reported answer positive answer prevalence
MZ females 4947 187 0.0378
DZ females 6674 245 0.0367
MZ males 3936 325 0.0826
DZ males 6005 498 0.0829
  • Opposite sexed DZ twins, \(11755\) of whom \(643\) answered positive, will be treated later.

Biometric study of stuttering

library(mets)
data(twinstut)
twinstut$binstut <- 1*(twinstut$stutter=="yes")
head(twinstut)
##   tvparnr zyg stutter    sex age nr binstut
## 1       1  mz      no female  71  1       0
## 2       1  mz      no female  71  2       0
## 3       2  dz      no female  71  1       0
## 4       3  os      no   male  71  1       0
## 6       4  os      no   male  71  2       0
## 8       5  mz      no female  71  1       0

Biometric study of stuttering

twinstutwide <- fast.reshape(twinstut, id="tvparnr",varying=c("stutter","binstut"))
#head(twinstutwide)

with(twinstutwide[twinstutwide$zyg=="mz",], table(stutter1,stutter2))
##         stutter2
## stutter1   no  yes
##      no  2992   85
##      yes   88   90
with(twinstutwide[twinstutwide$zyg=="dz",], table(stutter1,stutter2))
##         stutter2
## stutter1   no  yes
##      no  3640  204
##      yes  193   21

Biometric study of stuttering

  • Let’s aim for estimating the within pair dependence.
  • The tetrachorics relates to the Biometric ADCE model.
  • The saturated model do not converge to a solution
    • we bypass starting with flex model assumming equal threshold (prevalence) in twin 1 and twin 2.

Biometric study of stuttering

stut.flex <- bptwin(binstut~sex,data=twinstut,id="tvparnr",zyg="zyg",DZ="dz",
                   type="flex", control=list(method="NR"))
#score(stut.flex)
summary(stut.flex)
## 
##                Estimate Std.Err      Z p-value    
## (Intercept) MZ  -1.8609  0.0243 -76.72  <2e-16 ***
## sexmale MZ       0.4885  0.0306  15.96  <2e-16 ***
## (Intercept) DZ  -1.8035  0.0296 -60.99  <2e-16 ***
## sexmale DZ       0.4113  0.0379  10.85  <2e-16 ***
## atanh(rho) MZ    0.6868  0.0485  14.17  <2e-16 ***
## atanh(rho) DZ    0.1325  0.0624   2.12   0.034 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  20383/12511 6762/4058           
## 
##                            Estimate 2.5%  97.5%
## Tetrachoric correlation MZ 0.596    0.531 0.654
## Tetrachoric correlation DZ 0.132    0.010 0.249
## 
## MZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.008    0.007 0.010
## Casewise Concordance 0.265    0.222 0.312
## Marginal             0.031    0.028 0.035
## Rel.Recur.Risk       8.430    6.910 9.949
## log(OR)              2.690    2.398 2.982
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.002    0.001 0.004
## Casewise Concordance 0.064    0.038 0.103
## Marginal             0.036    0.031 0.040
## Rel.Recur.Risk       1.784    0.927 2.640
## log(OR)              0.639    0.092 1.185
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.928    0.659 1.198

Biometric study of stuttering

Note we estimate from the bivariate probit model:

  • -the tetrachoric correlations in MZ and DZ
  • -prevalence (marginal) is estimated
  • -casewise concordance, the risk of stuttering if co-twin is stuttering.
  • We continue assumming equal threshold (prevalence) in twin 1 and twin 2 for MZ and DZ:

Biometric study of stuttering

stut.u <- bptwin(binstut~sex,data=twinstut,id="tvparnr",zyg="zyg",DZ="dz",type="un")
summary(stut.u)
## 
##               Estimate Std.Err      Z p-value    
## (Intercept)    -1.8394  0.0188 -98.09  <2e-16 ***
## sexmale         0.4603  0.0238  19.32  <2e-16 ***
## atanh(rho) MZ   0.6894  0.0486  14.18  <2e-16 ***
## atanh(rho) DZ   0.1339  0.0622   2.15   0.031 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  20383/12511 6762/4058           
## 
##                            Estimate 2.5%  97.5%
## Tetrachoric correlation MZ 0.598    0.533 0.655
## Tetrachoric correlation DZ 0.133    0.012 0.250
## 
## MZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.009    0.007 0.011
## Casewise Concordance 0.270    0.227 0.317
## Marginal             0.033    0.030 0.036
## Rel.Recur.Risk       8.195    6.779 9.612
## log(OR)              2.674    2.384 2.964
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.002    0.001 0.003
## Casewise Concordance 0.060    0.036 0.098
## Marginal             0.033    0.030 0.036
## Rel.Recur.Risk       1.824    0.928 2.720
## log(OR)              0.659    0.104 1.214
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.929    0.660 1.198

Biometric study of stuttering

compare(stut.flex,stut.u)
## 
##  - Likelihood ratio test -
## 
## data:  
## chisq = 3, df = 2, p-value = 0.2
## sample estimates:
## log likelihood (model 1) log likelihood (model 2) 
##                    -6765                    -6766

-looks fine!

Digression: Stratifying by a covariate

stut.sex.u <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz",type="un")
summary(stut.sex.u)
## 
##               Estimate Std.Err      Z p-value    
## (Intercept)    -1.8407  0.0188 -97.70  <2e-16 ***
## atanh(rho) MZ   1.0542  0.1127   9.36  <2e-16 ***
## atanh(rho) DZ   0.1915  0.0957   2.00   0.045 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  11352/6581  1900/2336           
## 
##                            Estimate 2.5%  97.5%
## Tetrachoric correlation MZ 0.783    0.682 0.855
## Tetrachoric correlation DZ 0.189    0.004 0.362
## 
## MZ:
##                      Estimate 2.5%   97.5% 
## Concordance           0.014    0.011  0.019
## Casewise Concordance  0.440    0.343  0.541
## Marginal              0.033    0.030  0.036
## Rel.Recur.Risk       13.401   10.302 16.501
## log(OR)               3.703    3.119  4.286
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.002    0.001 0.005
## Casewise Concordance 0.075    0.037 0.146
## Marginal             0.033    0.030 0.036
## Rel.Recur.Risk       2.279    0.703 3.854
## log(OR)              0.914    0.109 1.719
## 
##                          Estimate 2.5% 97.5%
## Broad-sense heritability   1      NaN  NaN  
## 
## 
##               Estimate Std.Err      Z p-value    
## (Intercept)    -1.3822  0.0153 -90.52  <2e-16 ***
## atanh(rho) MZ   1.1160  0.0964  11.58  <2e-16 ***
## atanh(rho) DZ   0.0934  0.0799   1.17    0.24    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  9031/5930   1355/1722           
## 
##                            Estimate 2.5%   97.5% 
## Tetrachoric correlation MZ  0.806    0.729  0.863
## Tetrachoric correlation DZ  0.093   -0.063  0.245
## 
## MZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.046    0.039 0.054
## Casewise Concordance 0.550    0.475 0.623
## Marginal             0.083    0.079 0.088
## Rel.Recur.Risk       6.596    5.678 7.513
## log(OR)              3.357    2.887 3.827
## DZ:
##                      Estimate 2.5%   97.5% 
## Concordance           0.009    0.006  0.015
## Casewise Concordance  0.112    0.070  0.176
## Marginal              0.083    0.079  0.088
## Rel.Recur.Risk        1.343    0.723  1.964
## log(OR)               0.362   -0.222  0.945
## 
##                          Estimate 2.5% 97.5%
## Broad-sense heritability   1      NaN  NaN

Biometric study of stuttering

  • We now turn to biometric modelling
  • Sex is retained as a stratifying factor.

Biometric study of stuttering

stut.ace <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz", type="ace")
summary(stut.ace)
## 
##             Estimate Std.Err      Z p-value    
## (Intercept)   -3.723   0.366 -10.18 < 2e-16 ***
## log(var(A))    1.128   0.261   4.33 1.5e-05 ***
## log(var(C))  -16.397   0.413 -39.67 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  11352/6581  1900/2336           
## 
##                    Estimate 2.5%  97.5%
## A                  0.756    0.661 0.850
## C                  0.000    0.000 0.000
## E                  0.244    0.150 0.339
## MZ Tetrachoric Cor 0.756    0.645 0.835
## DZ Tetrachoric Cor 0.378    0.330 0.424
## 
## MZ:
##                      Estimate 2.5%   97.5% 
## Concordance           0.013    0.010  0.018
## Casewise Concordance  0.409    0.312  0.513
## Marginal              0.033    0.030  0.036
## Rel.Recur.Risk       12.446    9.315 15.577
## log(OR)               3.519    2.927  4.112
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.005    0.004 0.006
## Casewise Concordance 0.143    0.122 0.167
## Marginal             0.033    0.030 0.036
## Rel.Recur.Risk       4.361    3.707 5.014
## log(OR)              1.719    1.521 1.917
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.756    0.661 0.850
## 
## 
##             Estimate Std.Err      Z p-value    
## (Intercept)   -2.860   0.247 -11.56 < 2e-16 ***
## log(var(A))    1.193   0.226   5.28 1.3e-07 ***
## log(var(C))  -16.763   0.264 -63.41 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  9031/5930   1355/1722           
## 
##                    Estimate 2.5%  97.5%
## A                  0.767    0.688 0.846
## C                  0.000    0.000 0.000
## E                  0.233    0.154 0.312
## MZ Tetrachoric Cor 0.767    0.676 0.835
## DZ Tetrachoric Cor 0.384    0.343 0.422
## 
## MZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.043    0.036 0.051
## Casewise Concordance 0.510    0.430 0.589
## Marginal             0.084    0.079 0.089
## Rel.Recur.Risk       6.080    5.113 7.047
## log(OR)              3.096    2.608 3.584
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.020    0.017 0.022
## Casewise Concordance 0.235    0.214 0.258
## Marginal             0.084    0.079 0.089
## Rel.Recur.Risk       2.806    2.556 3.057
## log(OR)              1.408    1.265 1.551
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.767    0.688 0.846

Biometric study of stuttering

stut.ade <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz", type="ade")
summary(stut.ade)
## 
##             Estimate Std.Err      Z p-value    
## (Intercept)   -3.952   0.394 -10.04 < 2e-16 ***
## log(var(A))  -12.257  13.579  -0.90    0.37    
## log(var(D))    1.284   0.255   5.04 4.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  11352/6581  1900/2336           
## 
##                    Estimate 2.5%  97.5%
## A                  0.000    0.000 0.000
## D                  0.783    0.698 0.868
## E                  0.217    0.132 0.302
## MZ Tetrachoric Cor 0.783    0.683 0.855
## DZ Tetrachoric Cor 0.196    0.174 0.217
## 
## MZ:
##                      Estimate 2.5%   97.5% 
## Concordance           0.014    0.011  0.019
## Casewise Concordance  0.440    0.344  0.540
## Marginal              0.033    0.030  0.036
## Rel.Recur.Risk       13.389   10.307 16.470
## log(OR)               3.700    3.120  4.280
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.003    0.002 0.003
## Casewise Concordance 0.077    0.069 0.085
## Marginal             0.033    0.030 0.036
## Rel.Recur.Risk       2.336    2.144 2.528
## log(OR)              0.943    0.848 1.037
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.783    0.698 0.868
## 
## 
##             Estimate Std.Err     Z p-value    
## (Intercept)   -3.097   0.270 -11.5 < 2e-16 ***
## log(var(A))  -14.081   0.697 -20.2 < 2e-16 ***
## log(var(D))    1.393   0.218   6.4 1.6e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  9031/5930   1355/1722           
## 
##                    Estimate 2.5%  97.5%
## A                  0.000    0.000 0.000
## D                  0.801    0.733 0.869
## E                  0.199    0.131 0.267
## MZ Tetrachoric Cor 0.801    0.722 0.859
## DZ Tetrachoric Cor 0.200    0.183 0.217
## 
## MZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.046    0.039 0.053
## Casewise Concordance 0.545    0.469 0.619
## Marginal             0.084    0.079 0.088
## Rel.Recur.Risk       6.521    5.595 7.446
## log(OR)              3.320    2.846 3.793
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.013    0.011 0.014
## Casewise Concordance 0.151    0.142 0.161
## Marginal             0.084    0.079 0.088
## Rel.Recur.Risk       1.811    1.727 1.894
## log(OR)              0.754    0.693 0.815
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.801    0.733 0.869

Biometric study of stuttering

AIC(stut.ace,stut.ade)
##          df   AIC
## stut.ace  6 13491
## stut.ade  6 13471

Biometric study of stuttering

stut.ae <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz", type="ae")
summary(stut.ae)
## 
##             Estimate Std.Err      Z p-value    
## (Intercept)   -3.723   0.366 -10.18 < 2e-16 ***
## log(var(A))    1.128   0.261   4.33 1.5e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  11352/6581  1900/2336           
## 
##                    Estimate 2.5%  97.5%
## A                  0.756    0.661 0.850
## E                  0.244    0.150 0.339
## MZ Tetrachoric Cor 0.756    0.645 0.835
## DZ Tetrachoric Cor 0.378    0.330 0.424
## 
## MZ:
##                      Estimate 2.5%   97.5% 
## Concordance           0.013    0.010  0.018
## Casewise Concordance  0.409    0.312  0.513
## Marginal              0.033    0.030  0.036
## Rel.Recur.Risk       12.446    9.315 15.577
## log(OR)               3.519    2.927  4.112
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.005    0.004 0.006
## Casewise Concordance 0.143    0.122 0.167
## Marginal             0.033    0.030 0.036
## Rel.Recur.Risk       4.361    3.707 5.014
## log(OR)              1.719    1.521 1.917
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.756    0.661 0.850
## 
## 
##             Estimate Std.Err      Z p-value    
## (Intercept)   -2.860   0.247 -11.56 < 2e-16 ***
## log(var(A))    1.193   0.226   5.28 1.3e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  9031/5930   1355/1722           
## 
##                    Estimate 2.5%  97.5%
## A                  0.767    0.688 0.846
## E                  0.233    0.154 0.312
## MZ Tetrachoric Cor 0.767    0.676 0.835
## DZ Tetrachoric Cor 0.384    0.343 0.422
## 
## MZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.043    0.036 0.051
## Casewise Concordance 0.510    0.430 0.589
## Marginal             0.084    0.079 0.089
## Rel.Recur.Risk       6.080    5.113 7.047
## log(OR)              3.096    2.608 3.584
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.020    0.017 0.022
## Casewise Concordance 0.235    0.214 0.258
## Marginal             0.084    0.079 0.089
## Rel.Recur.Risk       2.806    2.556 3.057
## log(OR)              1.408    1.265 1.551
## 
##                          Estimate 2.5%  97.5%
## Broad-sense heritability 0.767    0.688 0.846

Biometric study of stuttering

compare(stut.ade,stut.ae)
## 
##  - Likelihood ratio test -
## 
## data:  
## chisq = 19, df = 2, p-value = 6e-05
## sample estimates:
## log likelihood (model 1) log likelihood (model 2) 
##                    -6730                    -6739

Biometric study of stuttering

stut.ce <- bptwin(binstut~strata(sex),data=twinstut,id="tvparnr",zyg="zyg",DZ="dz", type="ce", control=list(method="NR"))
summary(stut.ce)
## 
##             Estimate Std.Err      Z p-value    
## (Intercept)   -2.715   0.141 -19.23  <2e-16 ***
## log(var(C))    0.160   0.192   0.84     0.4    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  11352/6581  1900/2336           
## 
##                    Estimate 2.5%  97.5%
## C                  0.540    0.447 0.633
## E                  0.460    0.367 0.553
## MZ Tetrachoric Cor 0.540    0.440 0.627
## DZ Tetrachoric Cor 0.540    0.440 0.627
## 
## MZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.008    0.006 0.010
## Casewise Concordance 0.230    0.175 0.296
## Marginal             0.033    0.030 0.036
## Rel.Recur.Risk       7.026    5.199 8.853
## log(OR)              2.413    1.999 2.827
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.008    0.006 0.010
## Casewise Concordance 0.230    0.175 0.296
## Marginal             0.033    0.030 0.036
## Rel.Recur.Risk       7.026    5.199 8.853
## log(OR)              2.413    1.999 2.827
## 
##                          Estimate 2.5% 97.5%
## Broad-sense heritability 0        0    0    
## 
## 
##             Estimate Std.Err     Z p-value    
## (Intercept)  -1.9940  0.0858 -23.2  <2e-16 ***
## log(var(C))   0.0814  0.1634   0.5    0.62    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Total MZ/DZ Complete pairs MZ/DZ
##  9031/5930   1355/1722           
## 
##                    Estimate 2.5%  97.5%
## C                  0.520    0.440 0.600
## E                  0.480    0.400 0.560
## MZ Tetrachoric Cor 0.520    0.436 0.596
## DZ Tetrachoric Cor 0.520    0.436 0.596
## 
## MZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.026    0.022 0.032
## Casewise Concordance 0.314    0.264 0.368
## Marginal             0.084    0.079 0.088
## Rel.Recur.Risk       3.749    3.139 4.359
## log(OR)              1.922    1.608 2.236
## DZ:
##                      Estimate 2.5%  97.5%
## Concordance          0.026    0.022 0.032
## Casewise Concordance 0.314    0.264 0.368
## Marginal             0.084    0.079 0.088
## Rel.Recur.Risk       3.749    3.139 4.359
## log(OR)              1.922    1.608 2.236
## 
##                          Estimate 2.5% 97.5%
## Broad-sense heritability 0        0    0

Biometric study of stuttering

AIC(stut.ade,stut.ace,stut.ae,stut.ce)
##          df   AIC
## stut.ade  6 13471
## stut.ace  6 13491
## stut.ae   4 13487
## stut.ce   4 13576

Biometric study of stuttering

  • We report the ADE model in both sexes.
  • -evidence for epistasis? (may be modelled)
  • -sex limitation effects? (modelling DZ OS pairs)

Biometric study of stuttering

Liability threshold model:

Zyg and sex prevalence concordance tetrachorics heritability (95% CI)
MZ females .04 .46 (.36,.56) .79 (.69,.86) .79 (.70,.87)
DZ females .04 .08 (.07,.09) .20 (.18,.22) ADE model
MZ males .08 .54 (.46,.62) .80 (.72,.86) .80 (.73,.87)
DZ males .08 .15 (.14,.16) .20 (.18,.22) ADE model

(Fibiger et al. 2008)

Biometric study of stuttering

  • Analysis script ‘twinstut22.R’
  • If censored data we may extend above (on Day 2).

Discussion

Synopsis

  • The classic twin analysis consists in modelling within pair dependence using the linear correlation measure. This ties with variance components within and between pairs of total variance in trait of study.

  • Such components may be further decomposed into genetic and environmental origin based on natural assumptions from quantitative genetics.

  • We discuss various extensions.

Heuristics of MZ and DZ correlations

  • For instance, the polygenic ACE model relates to correlations via \(\rho_{mz}=h^2+c^2\) and \(\rho_{dz}=\frac{1}{2}h^2+c^2\).
Relation Genetics Environment Examples
\(\rho_{mz}>4\rho_{dz}\) Epistasis albinism
\(\rho_{mz}>2\rho_{dz}\) Genetic dominance D
\(\rho_{mz}=2\rho_{dz}\) Additive effect A (mono- or polygenic) and small D Small C BMI
\(2\rho_{dz}>\rho_{mz}>\rho_{dz}\) Additive genes A Shared environment C longevity
\(\rho_{mz}=\rho_{dz}>0\) No genetic effect C brain cancer
\(\rho_{mz}=\rho_{dz}=0\) No genetic effect No familial aggregation

Discussion

The classic model has different names due to different equivalent representations:

  • The variance components twin model
  • The structural equation twin model , SEM.
  • The mixed effects twin regression model
  • The LaBuda Defries Fulker twin regression model.
  • The multivariate generalized linear models for twin data.
  • (\(\ldots\))

-some implementations works better than others.

Discussion

Statistical software for implementation and estimation having emphasis will be:

We discuss now the SEM representation of the biometric twin model.

Discussion

  • Statistical program for Structural Equation Modelling.
  • The focus in SEM is the basically the covariance matrix, \(\Sigma=\Sigma(\theta)\).
  • Great many statistical methods can be formulated via SEM.
  • Simple linear regression, \(Y=\beta X+\epsilon\), corresponds in SEM to

\[ {\Sigma_Y} = \left( \begin{array}{cc} \beta^2{\sigma_X^2}+{\sigma_{\epsilon}^2} & \beta{\sigma_X^2} \\ \beta{\sigma_X^2} & {\sigma_X^2} \\ \end{array} \right) \]

  • -now, find parameters \(\beta\in\theta\) minimizing the difference between the sample covariance matrix and the one predicted by the model, the right-side.

Discussion

  • A general SEM is of form, \(\eta=B\eta+\Gamma\xi+\zeta\), where \(\eta\) and \(\xi\) denotes endogenous and exogenous variables respectively, \(B\) and \(\Gamma\) are matrices of coefficients and \(\zeta\) denotes errors.
  • -this induces the covariance matrix, \(\Sigma(\theta)\), to be compared with the observed covariance matrix.
  • -indeed the purpose of programs OpenMx, Mx, LISREL, M-Plus and lava.

Discussion - perspectives

  • The biometric polygenic model is general for all family studies.
  • Hence other family data may be included, eg parental data.

Discussion - perspectives

  • More outcomes may be analysed in same model - multivariate analysis (Day 2 of course).
  • Genetic pleiotropy models
  • Gene by Environment interaction models
  • Longitudinal models
  • Sex-limitation models
  • Direction-of-causation models
  • Bayesian statistics models
  • Time-to-event models

Thank You!

References