Quantification in quantitative genetics: from Rasch to Copulas.

Stéphanie van den Berg, University of Twente, The Netherlands

Co-authors:

Jacob von B. Hjelmborg, SDU

Ulrich Halekoh, SDU

Quantitative genetics

Quantitative traits, for instance height, measured in cm.

\[\begin{aligned} V_{height} &= V_G + V_E \\ &= V_A + V_D + V_E \end{aligned}\]

Heritability (narrow sense) is the proportion of phenotypic variance explained by additive genetic variance, the component transmitted from parents to offspring

\[ h^2 = \frac{V_A}{V_A + V_D+V_E} \]
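A trivial worked instance of the formula above, using made-up variance components (illustrative numbers, not estimates from any study):

```python
def narrow_sense_h2(v_a: float, v_d: float, v_e: float) -> float:
    """Narrow-sense heritability: additive variance over total phenotypic variance."""
    v_p = v_a + v_d + v_e  # phenotypic variance under V_P = V_A + V_D + V_E
    if v_p <= 0:
        raise ValueError("total variance must be positive")
    return v_a / v_p


# Hypothetical components for height, in cm^2.
print(narrow_sense_h2(v_a=30.0, v_d=10.0, v_e=10.0))  # 0.6
```

Note that the dominance variance \(V_D\) enters the denominator but not the numerator.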

Variance decomposition

Based on phenotypic correlations between family members

  • parent-offspring

  • siblings

  • half-siblings

  • identical and non-identical twins

Linear mixed models applied to covariance structure
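Linear mixed models fit the full covariance structure. As a much simpler illustration of the same logic, the sketch below uses the classical expected twin correlations under an ADE model (additive, dominance, unique environment) and inverts them by the method of moments. This is not the authors' estimation procedure; the function names and variance proportions are hypothetical.

```python
def expected_twin_correlations(a2: float, d2: float) -> tuple[float, float]:
    """Expected MZ and DZ correlations under an ADE model.

    a2 = V_A / V_P and d2 = V_D / V_P are standardized variance components.
    MZ twins share all additive and dominance variance; DZ twins share,
    on average, 1/2 of the additive and 1/4 of the dominance variance.
    """
    return a2 + d2, 0.5 * a2 + 0.25 * d2


def ade_from_twin_correlations(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Solve the two equations above for (a2, d2, e2), ignoring sampling error."""
    a2 = 4 * r_dz - r_mz
    d2 = 2 * r_mz - 4 * r_dz
    e2 = 1 - r_mz
    return a2, d2, e2


r_mz, r_dz = expected_twin_correlations(a2=0.6, d2=0.2)
print(r_mz, r_dz)
print(ade_from_twin_correlations(r_mz, r_dz))  # recovers approximately (0.6, 0.2, 0.2)
```

With real data the correlations carry sampling error and the moment solution can fall outside [0, 1], which is one reason to prefer likelihood-based mixed models.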

Motivating example 1: BMI \(\frac{\textrm{kg}}{\textrm{m}^2}\)

Motivating example 2: Number of correct answers on an exam

Quantitative traits

  • height (\(\textrm{m}\))
  • BMI (\(\frac{\textrm{kg}}{\textrm{m}^2}\))
  • blood pressure (\(\textrm{mmHg}\))
  • blood platelet count (number of platelets per microliter)

Quantitative-ish traits

  • disorders like ADHD (yes/no): assume an underlying continuous Gaussian liability and use tetrachoric correlations
  • behavioural traits like personality and intelligence: no units of measurement.

(Intelligence) test scores are rescaled such that they have a Gaussian distribution
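One common way to force scores onto a Gaussian shape is a rank-based inverse normal transform. The sketch below is only an illustration of that idea (published intelligence tests are normed against reference samples, which this does not do); the function name and the offset parameter `c` are assumptions of this sketch.

```python
from statistics import NormalDist


def rank_inverse_normal(scores: list[float], c: float = 0.5) -> list[float]:
    """Map scores to standard normal quantiles of their ranks.

    Uses (rank - c) / (n - 2c + 1); c = 0.5 gives (rank - 0.5) / n.
    Tied scores receive the average of their ranks, so they stay tied.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this block of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


sum_scores = [0, 1, 1, 2, 3, 4, 4, 4]  # hypothetical sum scores on a 4-item scale
print([round(z, 2) for z in rank_inverse_normal(sum_scores)])
```

Because ties stay tied, the transformed scores are no more fine-grained than the original sum scores; only the spacing between them changes.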

Two issues

  • Quantitative traits that are not Gaussian

  • Traits that we treat as quantitative (and rescale to be Gaussian)

Behavioural traits

Behaviour measured using sum scores

  • I like hanging out with others after school or work, yes/no
  • I usually go out on the weekend, yes/no
  • I enjoy parties, yes/no
  • I prefer being around people, yes/no

The number of yes answers is used as a quantification of extraversion, just as a blood platelet count is a number of platelets. But does this qualify as measurement?

Measurement

  • invariant (independent of the particular questions on the test, independent of the instrument, and independent of the other persons being tested)

  • at least interval level (not simply ordinal) with a defined unit

Georg Rasch

\[P(Y_{ij} = \textrm{'yes'}) = \frac{1}{1+ \exp(\beta_j - \theta_i)}\]

\[\textrm{logit}(P(Y_{ij} = \textrm{'yes'})) =\theta_i - \beta_j\]

Person abilities and item difficulties now lie on a common scale in logits: \(\theta_i - \beta_j\) is the log-odds of a 'yes' answer.
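A minimal sketch of why the Rasch model yields invariant comparisons: the log-odds difference between two persons is the same on every item, whatever its difficulty (Rasch's "specific objectivity"). The abilities and difficulties below are hypothetical.

```python
import math


def rasch_p_yes(theta: float, beta: float) -> float:
    """P(Y = 'yes') for a person with ability theta on an item with difficulty beta."""
    return 1.0 / (1.0 + math.exp(beta - theta))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


# Hypothetical abilities of two persons, on the logit scale.
theta_a, theta_b = 1.2, -0.3
for beta in (-2.0, 0.0, 1.5):  # hypothetical item difficulties
    diff = logit(rasch_p_yes(theta_a, beta)) - logit(rasch_p_yes(theta_b, beta))
    print(f"item difficulty {beta:+.1f}: log-odds difference = {diff:.3f}")
```

Every line prints a difference of 1.500, i.e. \(\theta_a - \theta_b\): the comparison of the two persons does not depend on which item is used.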

Analogy: pole vault


Data from Wanda Diamond League, China Textile City Sports Center - Shanghai/Keqiao (CHN), 3rd May 2025


Columns are bar heights (m); 1 = cleared, 0 = not cleared, NA = not attempted.

athlete    5.42  5.62  5.72  5.82  5.92  6.01  6.11  6.28
vloon         1     1     1     1     0    NA    NA    NA
marschall     1     1     1     0    NA    NA    NA    NA
kendricks     0     1     1     0    NA    NA    NA    NA
karalis      NA     1    NA     1     1     1     0    NA
sasma        NA     0    NA    NA    NA    NA    NA    NA
li           NA     1    NA     0    NA    NA    NA    NA
duplantis    NA    NA    NA    NA     1     1     1     0

Rasch analysis

height (m)  logit difficulty
5.42              -0.7004235
5.62              -1.6213842
5.82               0.4079731
5.92              -0.6815580
6.11               0.0181823

rank  athlete    logit ability  best cleared height (m)
1     duplantis          0.021  6.11
2     karalis            0.015  6.01
3     vloon              0.011  5.82
4     marschall          0.002  5.72
5     li                -0.006  5.62
6     sasma             -0.021  NA
7     kendricks         -0.023  5.72
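The slides do not state how the estimates above were obtained. As a hedged sketch only, the code below fits a Rasch model to the pole-vault table by penalized joint maximum likelihood: gradient ascent with a small ridge penalty, an assumption introduced here so that heights or athletes with only clears or only failures get finite estimates. Unattempted heights are skipped (treated as missing at random, another assumption). The function name and tuning values are hypothetical, and the output is not expected to reproduce the table above.

```python
import math

# Responses from the pole-vault table: 1 = cleared, 0 = not cleared, None = not attempted.
HEIGHTS = [5.42, 5.62, 5.72, 5.82, 5.92, 6.01, 6.11, 6.28]
RESPONSES = {
    "vloon":     [1, 1, 1, 1, 0, None, None, None],
    "marschall": [1, 1, 1, 0, None, None, None, None],
    "kendricks": [0, 1, 1, 0, None, None, None, None],
    "karalis":   [None, 1, None, 1, 1, 1, 0, None],
    "sasma":     [None, 0, None, None, None, None, None, None],
    "li":        [None, 1, None, 0, None, None, None, None],
    "duplantis": [None, None, None, None, 1, 1, 1, 0],
}


def fit_rasch_penalized(resp, n_items, ridge=0.1, lr=0.05, n_iter=3000):
    """Ridge-penalized joint maximum likelihood for the Rasch model (gradient ascent)."""
    theta = {p: 0.0 for p in resp}
    beta = [0.0] * n_items
    for _ in range(n_iter):
        # Gradient of the penalty -ridge/2 * (sum theta^2 + sum beta^2).
        g_theta = {p: -ridge * theta[p] for p in resp}
        g_beta = [-ridge * b for b in beta]
        for p, row in resp.items():
            for j, y in enumerate(row):
                if y is None:
                    continue  # not attempted: contributes nothing to the likelihood
                residual = y - 1.0 / (1.0 + math.exp(beta[j] - theta[p]))
                g_theta[p] += residual  # d loglik / d theta_p
                g_beta[j] -= residual   # d loglik / d beta_j
        for p in theta:
            theta[p] += lr * g_theta[p]
        beta = [b + lr * g for b, g in zip(beta, g_beta)]
    # Report on a scale where difficulties average zero; theta - beta is unchanged.
    shift = sum(beta) / n_items
    return {p: t - shift for p, t in theta.items()}, [b - shift for b in beta]


abilities, difficulties = fit_rasch_penalized(RESPONSES, len(HEIGHTS))
for h, b in zip(HEIGHTS, difficulties):
    print(f"{h:.2f} m  difficulty {b:+.2f}")
for name, t in sorted(abilities.items(), key=lambda kv: -kv[1]):
    print(f"{name:<10} ability {t:+.2f}")
```

The ridge penalty shrinks all estimates towards zero, so the logit values are smaller in magnitude than an unpenalized fit would give; a conditional or marginal maximum likelihood fit is the more usual choice in practice.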

Intermediate conclusion

Using Rasch modelling, we can check whether behavioural sum scores qualify as measurement and, when the model fits, obtain a fixed unit of measurement (the logit).

Importance of scale in genetics

Measurement needs to be invariant, with a fixed unit, so that:

  • findings regarding non-additive gene action are comparable

  • findings regarding gene-environment interaction are comparable

  • pooling data from different sources is possible

  • the distribution is meaningful
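A minimal illustration, with hypothetical genotypic values, of why scale matters for non-additive gene action: a locus that acts purely additively on one scale shows a dominance deviation (the heterozygote's deviation from the homozygote midpoint, d in Falconer's notation) after a monotonic change of scale.

```python
import math


def dominance_deviation(hom1: float, het: float, hom2: float) -> float:
    """Heterozygote deviation from the midpoint of the two homozygotes."""
    return het - (hom1 + hom2) / 2.0


# Hypothetical purely additive locus: each copy of allele A adds 1 unit.
genotypic_values = [0.0, 1.0, 2.0]  # aa, Aa, AA
print(dominance_deviation(*genotypic_values))  # 0.0: no dominance on this scale

# The same locus after a monotonic (exponential) rescaling of the trait.
rescaled = [math.exp(g) for g in genotypic_values]
print(dominance_deviation(*rescaled))  # negative: apparent dominance
```

Whether "dominance" is found therefore depends on the scale, which is why findings are only comparable when the unit is fixed.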

Points of attention

  • during variance decomposition, model the uncertainty in scale scores with Bayesian hierarchical modelling (Van den Berg, Boomsma & Glas, 2007; Schwabe & Van den Berg, 2014)

  • existing behavioural and diagnostic scales often don’t fit the Rasch model well. Consider omitting misfitting items.

To scale or not to scale?

  • For sum scores, it is generally better to try to fix the scale.

  • But sometimes distributions are clearly non-Gaussian, and this also holds for non-behavioural traits such as BMI (\(\frac{\textrm{kg}}{\textrm{m}^2}\)).

  • Scales are always arbitrary in some sense, the logit scale included.

To scale or not to scale?

If scale is so crucial for non-additive gene action and gene-environment interaction, why not find a methodology that is invariant to scale?

Dependence

Alternative: ignore the marginal distributions, and focus on the dependency among relatives.

Mutual information

In classic quantitative genetics, Gaussian distributions are assumed, and product-moment correlations are used as sufficient statistics for genetic parameters of interest.

Moving to a non-Gaussian world: mutual information can be seen as a sufficient statistic for the genetic parameters of interest. For qualitative (categorical) traits, it does not depend on how the categories are coded, so it is scale-free. For quantitative traits, mutual information is invariant to strictly monotonic transformations of each variable (also scale-free).
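A minimal plug-in estimator of mutual information for discrete traits, with hypothetical data from pairs of relatives, illustrating that MI does not change when the categories are recoded. Plug-in estimates are biased upwards in small samples, so this is an illustration, not an estimation method.

```python
import math
from collections import Counter


def mutual_information(xs, ys) -> float:
    """Plug-in (empirical) mutual information, in nats, for two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with counts
        mi += n_xy / n * math.log(n_xy * n / (px[x] * py[y]))
    return mi


# Hypothetical categorical trait in relative 1 and relative 2 of eight pairs.
rel1 = ["low", "low", "mid", "mid", "high", "high", "mid", "low"]
rel2 = ["low", "mid", "mid", "mid", "high", "mid", "high", "low"]

recode = {"low": 3, "mid": 7, "high": -1}  # arbitrary relabelling of the categories
mi_original = mutual_information(rel1, rel2)
mi_recoded = mutual_information([recode[v] for v in rel1], [recode[v] for v in rel2])
print(mi_original, mi_recoded)  # identical
```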

Example

Mutual information and Linfoot informational correlation

MI ranges from 0 to infinity. It can be mapped onto the Linfoot informational correlation, which lies in [0, 1].

For quantitative traits:

\[L(X, Y) = \sqrt{1- \exp(-2I(X;Y))}\]

For discrete traits:

\[ L(X, Y)_{discr} = \sqrt{1- \exp\left(\frac{-2I(X;Y)}{1- I(X;Y)/\min(H(X), H(Y))}\right)} \]
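The two mappings above, written as functions. This is a sketch: mutual information and entropies are assumed to be in nats, which the \(\exp(-2I)\) form requires, and the function names are illustrative. For a bivariate Gaussian, \(I = -\tfrac{1}{2}\log(1-\rho^2)\), so the Linfoot correlation recovers \(|\rho|\).

```python
import math


def linfoot(mi: float) -> float:
    """Linfoot informational correlation for quantitative traits: sqrt(1 - exp(-2 I))."""
    return math.sqrt(1.0 - math.exp(-2.0 * mi))


def linfoot_discrete(mi: float, h_x: float, h_y: float) -> float:
    """Discrete version: the exponent is divided by 1 - I / min(H(X), H(Y))."""
    h_min = min(h_x, h_y)
    if h_min <= 0:
        raise ValueError("both variables need positive entropy")
    if mi >= h_min:  # maximal dependence: the denominator reaches zero and L -> 1
        return 1.0
    return math.sqrt(1.0 - math.exp(-2.0 * mi / (1.0 - mi / h_min)))


def gaussian_mi(rho: float) -> float:
    """Mutual information (nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


for rho in (0.2, 0.5, -0.8):
    print(rho, linfoot(gaussian_mi(rho)))  # recovers |rho| up to rounding
```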

Example

Copulas

Parametric forms for dependency, irrespective of scale.

Copulas

Consider the random vector \((X_1, X_2)\) of a phenotype measured in two relatives.

Applying the probability integral transform to each component, \((U_1, U_2)=(F_1(X_1), F_2(X_2))\), where \(F_1\) and \(F_2\) are the (continuous) marginal distribution functions, gives marginals that are \(\sim \textrm{Unif}[0,1]\). We get rid of information about scale!

The copula of \((X_1, X_2)\) is defined as the joint cumulative distribution function of \((U_1, U_2)\):

\[C(u_1, u_2) = \Pr[U_1 \leq u_1, U_2 \leq u_2]\]

The copula captures only the dependence structure, irrespective of scale.
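In practice \(F_1\) and \(F_2\) are unknown. A common approach is to replace them by empirical distribution functions, which gives rank-based pseudo-observations. The sketch below assumes no ties and uses hypothetical data; dividing by n + 1 keeps the values strictly inside (0, 1).

```python
import math


def pseudo_observations(values: list[float]) -> list[float]:
    """Empirical probability integral transform: rank / (n + 1), assuming no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


# Hypothetical phenotype values in relative 1 and relative 2 of five pairs.
x1 = [1.8, 0.4, 2.9, 1.1, 3.5]
x2 = [2.0, 0.9, 2.2, 0.7, 4.1]

u1, u2 = pseudo_observations(x1), pseudo_observations(x2)
# A strictly increasing rescaling (here exp) leaves the copula data unchanged.
u1_rescaled = pseudo_observations([math.exp(v) for v in x1])
print(list(zip(u1, u2)))
print(u1 == u1_rescaled)  # True
```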

Copulas and mutual information

There is a rich family of copula types.

For several of them, there is a known relationship between the copula parameter(s) and the mutual information, and therefore the Linfoot correlation.

The Linfoot correlation can serve as the sufficient statistic for a new, information-based quantitative genetic theory.
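The Gaussian copula is the simplest case of such a known relationship: with copula parameter \(\rho\), \(I = -\tfrac{1}{2}\log(1-\rho^2)\), so the Linfoot correlation equals \(|\rho|\). The sketch below checks the closed form against a Monte Carlo estimate of the copula's mutual information, the expected log copula density. Function names, sample size and seed are illustrative choices; relationships for other families are not shown here.

```python
import math
import random


def gaussian_copula_mi_closed_form(rho: float) -> float:
    """Mutual information (nats) of the Gaussian copula with parameter rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


def gaussian_copula_mi_monte_carlo(rho: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[log c(U1, U2)], computed on normal scores.

    On Z_k = Phi^{-1}(U_k), the copula density equals the bivariate normal
    density divided by the product of the standard normal densities.
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho ** 2)
    total = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)  # (z1, z2) bivariate normal, correlation rho
        log_joint = (-math.log(2 * math.pi * s)
                     - (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (2 * s * s))
        log_margins = -math.log(2 * math.pi) - (z1 * z1 + z2 * z2) / 2
        total += log_joint - log_margins
    return total / n


rho = 0.5
mi_exact = gaussian_copula_mi_closed_form(rho)
mi_mc = gaussian_copula_mi_monte_carlo(rho)
print(mi_exact, mi_mc)
print(math.sqrt(1 - math.exp(-2 * mi_exact)))  # Linfoot correlation: |rho| up to rounding
```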

Recap:

  • In quantitative genetics, scale matters; otherwise results are meaningless and not comparable across studies.

  • Try to scale using the Rasch model, and then use the linear models of classic quantitative genetics.

  • If that is not possible: concentrate only on dependency.

  • For non-Gaussian or unscalable phenotypes, we propose to use mutual information, which is invariant to scale.

  • Copulas: a rich family of dependency structures, often with a direct relationship to mutual information.

  • Linfoot correlation can replace Pearson correlation.

Motivating example 1: BMI

Tomorrow: Ulrich Halekoh discusses our neural network Linfoot correlation estimator