Jacob von B. Hjelmborg, SDU
Ulrich Halekoh, SDU
Quantitative traits, for instance height, measured in cm.
\[\begin{aligned} V_{\textrm{height}} &= V_G + V_E \\ &= V_A + V_D + V_E \end{aligned}\]
Heritability (narrow sense) is the proportion of variance explained by genetic transmission
\[ h^2 = \frac{V_A}{V_A + V_D+V_E} \]
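As a minimal sketch of the ratio above, the snippet below computes \(h^2\) from three variance components. The function name and the numeric values are hypothetical and chosen only for illustration; they are not estimates from any study.

```python
def narrow_sense_heritability(v_a: float, v_d: float, v_e: float) -> float:
    """Return h^2 = V_A / (V_A + V_D + V_E)."""
    total = v_a + v_d + v_e
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return v_a / total


# Hypothetical variance components for height (in cm^2), for illustration only.
h2 = narrow_sense_heritability(v_a=30.0, v_d=5.0, v_e=15.0)
print(round(h2, 2))  # 0.6
```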
Based on phenotypic correlations between family members
parent-offspring
siblings
half-siblings
identical and non-identical twins
Linear mixed models applied to covariance structure
(Intelligence) test scores are rescaled such that they have a Gaussian distribution
Quantitative traits that are not Gaussian
Traits that we treat as quantitative (and rescale to be Gaussian)
Behaviour measured using sum scores
The number of 'yes' answers is used to quantify extraversion, in the same way as a blood platelet count. But does this qualify as measurement?
invariant (independent of the questions on the test, independent of the instrument, and independent of the other persons being tested)
at least interval level (not simply ordinal) with a defined unit
\[P(Y_{ij} = \textrm{'yes'}) = \frac{1}{1+ \exp(\beta_i - \theta_j)}\]
\[\textrm{logit}(P(Y_{ij} = \textrm{'yes'})) = \theta_j - \beta_i\]
Distances between persons and items are now on a scale defined by logits.
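The Rasch probability above can be sketched in a few lines. The function names and the example values for the person parameter (`theta`) and the item parameter (`beta`) are assumptions made for illustration, not estimates from data.

```python
import math


def rasch_probability(theta: float, beta: float) -> float:
    """P('yes') for person parameter theta on an item with parameter beta."""
    return 1.0 / (1.0 + math.exp(beta - theta))


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# When person and item sit at the same point on the scale, P('yes') is 0.5.
p_equal = rasch_probability(theta=0.5, beta=0.5)

# The logit recovers the person-item distance theta - beta (here 1.5).
p = rasch_probability(theta=1.0, beta=-0.5)
distance = logit(p)
```

The second call shows why distances are expressed in logits: the log-odds of a 'yes' equal the difference between the two parameters.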
Data from Wanda Diamond League, China Textile City Sports Center - Shanghai/Keqiao (CHN), 3rd May 2025
| athlete | 5.42 | 5.62 | 5.72 | 5.82 | 5.92 | 6.01 | 6.11 | 6.28 |
|---|---|---|---|---|---|---|---|---|
| vloon | 1 | 1 | 1 | 1 | 0 | NA | NA | NA |
| marschall | 1 | 1 | 1 | 0 | NA | NA | NA | NA |
| kendricks | 0 | 1 | 1 | 0 | NA | NA | NA | NA |
| karalis | NA | 1 | NA | 1 | 1 | 1 | 0 | NA |
| sasma | NA | 0 | NA | NA | NA | NA | NA | NA |
| li | NA | 1 | NA | 0 | NA | NA | NA | NA |
| duplantis | NA | NA | NA | NA | 1 | 1 | 1 | 0 |
| item | height_m | logit_difficulty |
|---|---|---|
| 5.42 | 5.42 | -0.7004235 |
| 5.62 | 5.62 | -1.6213842 |
| 5.82 | 5.82 | 0.4079731 |
| 5.92 | 5.92 | -0.6815580 |
| 6.11 | 6.11 | 0.0181823 |
| ranking | athlete | logit_ability | best_successful_attempt |
|---|---|---|---|
| 1 | duplantis | 0.021 | 6.11 |
| 2 | karalis | 0.015 | 6.01 |
| 3 | vloon | 0.011 | 5.82 |
| 4 | marschall | 0.002 | 5.72 |
| 5 | li | -0.006 | 5.62 |
| 6 | sasma | -0.021 | NA |
| 7 | kendricks | -0.023 | 5.72 |
Using Rasch modelling, we can check whether behavioural measurements qualify as measurement, and we obtain a fixed unit of measurement.
Measurement needs to be invariant, with a fixed unit, so that:
findings regarding non-additive gene action are comparable
findings regarding gene-environment interaction are comparable
pooling data from different sources is possible
the distribution is meaningful
During variance decomposition, uncertainty in the scale scores can be modelled with Bayesian hierarchical modelling (Van den Berg, Boomsma & Glas, 2007; Schwabe & Van den Berg, 2014).
Existing behavioural and diagnostic scales often don't fit the Rasch model well. Consider omitting misfitting items.
For sum scores it is generally better to try and fix the scale.
But sometimes the distributions look very non-Gaussian. This also holds for non-behavioural traits, such as BMI (\(\frac{\textrm{kg}}{\textrm{m}^2}\)).
Scales are always in some sense arbitrary, logit scale included.
If scale is so crucial for non-additive gene action and gene-environment interaction, why not find a methodology that is invariant to scale?
Alternative: ignore the marginal distributions, and focus on the dependency among relatives.
In classic quantitative genetics, Gaussian distributions are assumed, and product-moment correlations are used as sufficient statistics for genetic parameters of interest.
Moving to a non-Gaussian world: mutual information can be seen as a sufficient statistic for genetic parameters of interest. For qualitative characters it is scale-free. For quantitative traits, mutual information is invariant to monotonic transformation (scale-free).
Mutual information (MI) ranges from 0 to infinity. It can be mapped onto the Linfoot correlation, which lies in [0,1].
For quantitative traits:
\[L(X, Y) = \sqrt{1- \exp(-2I(X;Y))}\]
For discrete traits:
\[L(X, Y)_{\textrm{discr}} = \sqrt{1- \exp\left(\frac{-2I(X;Y)}{1- I(X;Y)/\min(H(X), H(Y))}\right)}\]
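The two mappings above can be sketched as follows. The function names are hypothetical. The sketch assumes that mutual information and entropies are given in nats (natural logarithm), and it treats the case \(I(X;Y) = \min(H(X), H(Y))\) as the limit \(L = 1\), which is an interpretation of the formula rather than something stated on the slides.

```python
import math


def linfoot_continuous(mi: float) -> float:
    """Linfoot correlation for quantitative traits: sqrt(1 - exp(-2 I))."""
    return math.sqrt(1.0 - math.exp(-2.0 * mi))


def linfoot_discrete(mi: float, h_x: float, h_y: float) -> float:
    """Linfoot correlation for discrete traits, rescaled by min(H(X), H(Y))."""
    h_min = min(h_x, h_y)
    if h_min <= 0:
        raise ValueError("entropies must be positive")
    ratio = mi / h_min
    if ratio >= 1.0:
        # Assumed limit: the exponent tends to -infinity, so L tends to 1.
        return 1.0
    return math.sqrt(1.0 - math.exp(-2.0 * mi / (1.0 - ratio)))


# Independence (I = 0) maps to 0 in both versions.
l_zero = linfoot_continuous(0.0)
l_zero_discrete = linfoot_discrete(0.0, math.log(2), math.log(2))

# Two perfectly dependent fair binary traits: I = H = log(2), so L = 1.
l_perfect = linfoot_discrete(math.log(2), math.log(2), math.log(2))
```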
Parametric forms for dependency, irrespective of scale.
Consider the random vector of a phenotype in two relatives, \((X_1, X_2)\).
Applying the probability integral transform to each component, we get \((U_1, U_2)=(F_1(X_1), F_2(X_2))\), whose marginals are \(\sim \textrm{Unif}[0,1]\). We get rid of information about scale!
The copula of \((X_1, X_2)\) is defined as the joint cumulative distribution function of \((U_1, U_2)\):
\[C(u_1, u_2) = \Pr[U_1 \leq u_1, U_2 \leq u_2]\]
This focuses only on the dependence structure, irrespective of scale.
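A minimal sketch of the probability integral transform in practice, assuming the marginal distribution functions are unknown and are replaced by scaled ranks (often called pseudo-observations). The simulated data, the function name, and the choice of monotone transformations are illustrative assumptions. Ties are not handled.

```python
import numpy as np


def pseudo_observations(x: np.ndarray) -> np.ndarray:
    """Empirical probability integral transform: ranks scaled into (0, 1)."""
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n, assuming no ties
    return ranks / (len(x) + 1)


# Simulated phenotype in pairs of relatives (illustration only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=200)

# Pseudo-observations on the original scale.
u = np.column_stack([pseudo_observations(x1), pseudo_observations(x2)])

# Pseudo-observations after strictly increasing rescalings of each trait.
u_rescaled = np.column_stack(
    [pseudo_observations(np.exp(x1)), pseudo_observations(x2 ** 3)]
)

# Ranks are unchanged by increasing transformations, so both arrays agree.
same_dependence = np.array_equal(u, u_rescaled)
print(same_dependence)
```

Because only ranks enter, any dependence measure computed from `u` is unaffected by how each trait was scaled.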
A rich family of types of copulas.
For several, there is known relationship between the copula parameters and the mutual information, and therefore Linfoot correlation.
Linfoot correlation can serve as the new sufficient statistic for a new information-based quantitative genetic theory.
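One standard case, added here as an illustration rather than taken from the slides: for the Gaussian copula with correlation parameter \(\rho\), the mutual information has a closed form, and inserting it into the Linfoot formula for quantitative traits gives back the absolute value of \(\rho\).

\[I(X_1; X_2) = -\tfrac{1}{2}\log(1-\rho^2) \quad\Rightarrow\quad L(X_1, X_2) = \sqrt{1- \exp\left(\log(1-\rho^2)\right)} = |\rho|\]

In the Gaussian case the Linfoot correlation therefore agrees with the product-moment correlation up to sign.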
In quantitative genetics, scale is important; otherwise results are meaningless and not comparable across studies.
Try to scale using Rasch and use linear models from classic genetics.
If not: concentrate only on dependency.
For non-Gaussian or unscalable phenotypes, we propose to use mutual information, invariant to scale.
Copulas: rich family of dependency structures; often have direct relationship to mutual information.
Linfoot correlation can replace Pearson correlation.
Tomorrow: Ulrich Halekoh discusses our neural network Linfoot correlation estimator