Setegn plan

Epistasis and Dominance

We agreed to extend the simulation to include epistasis and dominance. I now have a clear understanding of the simulation I am implementing, but I want to share the strategy in case it contains any mistakes. First, I take the genotype file with the QTL positions from the XSim output. Second, I simulate the additive, dominance and epistatic effects using the QTL genotypes.

In brief, the simulation is as follows. The additive effects (a) of the QTL are sampled from a univariate normal distribution with mean zero and variance one. Dominance factors (δ) are sampled from a univariate normal distribution with mean 1 and standard deviation 0.3. Dominance effects (d) are computed as δ|a|. The epistatic factors (γ) of all pairwise combinations of QTL are sampled from a univariate normal distribution with mean zero and variance one. I followed this paper: https://hal.archives-ouvertes.fr/hal-02610774/document

I will test different dominance factor distributions for each trait (e.g. a standard normal distribution, or a normal distribution with mean 1.2 and variance 0.1), and possibly different epistatic factors. I am still reading the literature on these parameters. My main focus, however, is to include the three common types of epistasis described in the same paper: Additive X Additive, Complementary and Interaction.
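The sampling steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual simulation script: the genotype matrix is a random placeholder standing in for the XSim output, the 0/1/2 genotype coding, the numbers of animals and QTL, the seed and all variable names are hypothetical.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility only

n_animals, n_qtl = 200, 50  # hypothetical sizes

# ASSUMPTION: placeholder genotypes coded 0/1/2 (copies of one allele).
# In the real workflow these would be read from the XSim genotype file.
genotypes = rng.integers(0, 3, size=(n_animals, n_qtl))

a = rng.normal(0.0, 1.0, size=n_qtl)      # additive effects, N(0, 1)
delta = rng.normal(1.0, 0.3, size=n_qtl)  # dominance factors, mean 1, SD 0.3
d = delta * np.abs(a)                     # dominance effects, d = delta * |a|

# One epistatic factor per pairwise combination of QTL, N(0, 1)
pairs = list(combinations(range(n_qtl), 2))
gamma = rng.normal(0.0, 1.0, size=len(pairs))

# ASSUMPTION: additive value uses centred codes (-1, 0, 1);
# dominance value applies d to heterozygotes only.
additive_value = (genotypes - 1) @ a
dominance_value = (genotypes == 1).astype(float) @ d
```

How the epistatic factors enter the genetic value depends on the epistasis type chosen, so that step is left out of this sketch.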

Additive X Additive

The epistasis coefficients used to simulate the Additive X Additive model, for each genotype combination at loci A and B, are given below.
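For orientation only, one common textbook way to build additive-by-additive coefficients is to take the product of the centred genotype codes at the two loci. The sketch below assumes that coding (-1, 0, 1 per locus); it is not taken from the paper or from my script, and the names `codes`, `axa_coeff` and `axa_value` are hypothetical.

```python
import numpy as np

# ASSUMPTION: genotypes at each locus are coded as centred allele counts,
# -1, 0, 1 for aa, Aa, AA (and bb, Bb, BB). This coding is illustrative only.
codes = np.array([-1, 0, 1])

# 3 x 3 coefficient table. Rows: genotype at locus A; columns: genotype at locus B.
axa_coeff = np.outer(codes, codes)

def axa_value(x_a, x_b, gamma):
    """Epistatic value of one locus pair: gamma times the product of the two codes."""
    return gamma * x_a * x_b
```

Under this coding, double homozygotes get a coefficient of +1 or -1 and any genotype combination that includes a heterozygote gets 0.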

Complementary

Interaction

The non-additive genetic variance (epistatic plus dominance variance) will follow one of these three options.

Non-additive genetic variance equals additive genetic variance (so broad-sense heritability is twice narrow-sense heritability).
Non-additive genetic variance equals half of additive genetic variance.
Non-additive genetic variance equals a quarter of additive genetic variance.
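
One possible way to impose these three ratios is to rescale the simulated non-additive genetic values so that their variance is a fixed multiple of the additive variance. The sketch below is an assumption about how this could be done, not the method in my script; the helper `scale_to_ratio` and the placeholder vectors are hypothetical.

```python
import numpy as np

def scale_to_ratio(additive, non_additive, ratio):
    """Rescale non_additive so that var(non_additive) = ratio * var(additive)."""
    target_sd = np.sqrt(ratio * np.var(additive))
    return non_additive * target_sd / np.std(non_additive)

rng = np.random.default_rng(2)               # arbitrary seed
additive = rng.normal(size=1000)             # placeholder additive genetic values
non_additive = 3.0 * rng.normal(size=1000)   # placeholder dominance + epistasis values

# The three options: ratio of non-additive to additive variance of 1, 1/2 and 1/4
achieved = {}
for ratio in (1.0, 0.5, 0.25):
    scaled = scale_to_ratio(additive, non_additive, ratio)
    achieved[ratio] = np.var(scaled) / np.var(additive)
```

Note that the simple relation between broad-sense and narrow-sense heritability only holds if the additive and non-additive values are uncorrelated, which this rescaling does not enforce.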

Before implementing deep learning, I should redo the simulation with epistasis and dominance using the single-trait, multi-trait and structural equation models. For deep learning, I will choose the scenarios in which these methods have lower accuracy, because if the accuracy is already reasonably good (e.g. > 0.5), we do not need to implement deep learning.
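
The selection rule above could be expressed roughly as below. This is a sketch only: it assumes accuracy is measured as the Pearson correlation between phenotypes and EBV, the data are random placeholders rather than simulation results, and the function and scenario names are hypothetical.

```python
import numpy as np

def accuracy(phenotypes, ebv):
    """Pearson correlation between phenotypes and estimated breeding values."""
    return float(np.corrcoef(phenotypes, ebv)[0, 1])

def deep_learning_candidates(accuracies, threshold=0.5):
    """Return the scenarios whose accuracy does not exceed the threshold."""
    return [name for name, acc in accuracies.items() if acc <= threshold]

rng = np.random.default_rng(3)  # arbitrary seed
phenotypes = rng.normal(size=500)

# Placeholder EBVs, NOT real results: one set tracks the phenotypes closely,
# the other is unrelated noise.
ebv_by_scenario = {
    "well_predicted": phenotypes + 0.5 * rng.normal(size=500),
    "poorly_predicted": rng.normal(size=500),
}

accuracies = {name: accuracy(phenotypes, ebv) for name, ebv in ebv_by_scenario.items()}
candidates = deep_learning_candidates(accuracies)
```

Only the scenarios returned in `candidates` would then be carried forward to the deep learning step.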

Results

Milk volume

The broad-sense heritability \(H^{2}\) is 0.44 for milk volume and 0.60 for fat percent. The narrow-sense heritability \(h^{2}\) is 0.2 for milk volume and 0.35 for fat percent. The dominance value I used for this simulation is 0.3 for both traits. There were 50 QTL in this simulation and, in addition to the dominance effects, there was an epistatic interaction between each pair of loci. Once we decide to include epistasis, we will reduce the number of loci involved in epistatic interactions (possibly to fewer than 10). As a starting point, however, all pairs of the 50 loci have an epistatic effect. The additive genetic variance is 1.117 for milk volume and 0.1402 for fat percent, and their covariance is -0.265. The dominance variance is 1.2 for milk volume and 0.12 for fat percent, and their covariance is -0.23. Note that in the following plots, the x-axis is the correlation between the last-generation phenotypes and the EBV estimated at different iterations (the first ten iterations followed by the last ten iterations of the MCMC chain).
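
As a quick arithmetic check on the numbers above, the sketch below derives the correlations implied by the reported (co)variances, the ratio of non-additive to additive variance implied by the two heritabilities, and the number of locus pairs among 50 QTL. Only the input values come from the text; the variable names are my own.

```python
import math

# Reported additive and dominance (co)variances: milk volume, fat percent, covariance
var_a_milk, var_a_fat, cov_a = 1.117, 0.1402, -0.265
var_d_milk, var_d_fat, cov_d = 1.2, 0.12, -0.23

# Correlation = covariance / sqrt(product of the two variances)
r_a = cov_a / math.sqrt(var_a_milk * var_a_fat)
r_d = cov_d / math.sqrt(var_d_milk * var_d_fat)

# Ratio of non-additive to additive variance implied by H^2 and h^2: (H^2 - h^2) / h^2
ratio_milk = (0.44 - 0.2) / 0.2
ratio_fat = (0.60 - 0.35) / 0.35

# Number of distinct pairs among 50 QTL
n_pairs = math.comb(50, 2)

print(round(r_a, 2), round(r_d, 2))                # -0.67 -0.61
print(round(ratio_milk, 2), round(ratio_fat, 2))   # 1.2 0.71
print(n_pairs)                                     # 1225
```

So the starting scenario involves 1225 pairwise interactions, which is one reason to restrict epistasis to a smaller set of loci later.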

Fat percent

Fat yield

I increased the dominance value to 0.5 for both traits; all other factors are the same as for the plots above.

Milk volume

Fat percent

Fat yield

Next plan

Repeat the above simulation with dominance effects only. This was eighty percent done and is now complete; the results are below:

Dominance effect with a dominance value of 0.3 for both traits (no epistasis)

Milk volume

Fat percent

Fat yield

Dominance effect increased to 0.5 for both traits (no epistasis)

Milk volume

Fat percent

Fat yield

Redo the dominance-only simulation with a phenotype value of 13 for milk volume and 5.16 for fat percent. I did not do this right away because I first wanted to confirm that the simulation results agree with previous findings, to make sure my script is right.
Develop a Shiny app (R Shiny + JavaScript) that displays the simulation results in a comprehensive way. I already have the skills, but I need to refresh them before implementing it. It might take one full week, but it is worth it. This is particularly relevant for the paper and for the LIC presentation.

My expectation for the result

I expect the deep learning model to perform even better when I adjust the phenotype values to 13 for milk volume and 5.16 for fat percent, particularly for milk volume, because the non-linearity increases in proportion to the added phenotypic value.