We agreed to extend the simulation to include epistasis and dominance. I now have a clear understanding of the simulation I am implementing, but I want to share it in case there is a mistake in my strategy. The strategy is as follows. First, I take the genotype file with the QTL positions from the XSim output. Second, I simulate the additive, dominance, and epistatic effects from the QTL genotypes. In brief, the additive effects (a) of the QTL were sampled from a univariate normal distribution with mean zero and variance one. Dominance factors (δ) were sampled from a univariate normal distribution with mean 1 and standard deviation 0.3, and dominance effects (d) were computed as δ|a|. The epistatic factors (γ) of all pairwise combinations of QTL were sampled from a univariate normal distribution with mean zero and variance one. I followed this paper: https://hal.archives-ouvertes.fr/hal-02610774/document
I will test different dominance factors for each trait (e.g., a standard normal distribution, or a normal distribution with mean 1.2 and variance 0.1) and possibly different epistatic factors. I am still reading the literature on these parameters. However, my main focus is to include the three common types of epistasis described in the same paper: additive-by-additive, complementary, and interaction.
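The sampling steps described above can be sketched as follows. This is only a minimal illustration using the Python standard library, not the actual simulation code: the function name `simulate_qtl_effects`, its parameters, and the seed are hypothetical, and the genotypes themselves would still come from the XSim output.

```python
import itertools
import random


def simulate_qtl_effects(n_qtl, delta_mean=1.0, delta_sd=0.3, seed=None):
    """Sample additive, dominance and pairwise epistatic effects for n_qtl QTL.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    rng = random.Random(seed)

    # Additive effects a ~ N(0, 1)
    a = [rng.gauss(0.0, 1.0) for _ in range(n_qtl)]

    # Dominance factors delta ~ N(delta_mean, delta_sd^2)
    delta = [rng.gauss(delta_mean, delta_sd) for _ in range(n_qtl)]

    # Dominance effects d = delta * |a|
    d = [dl * abs(ai) for dl, ai in zip(delta, a)]

    # Epistatic factors gamma ~ N(0, 1), one per unordered pair of QTL
    gamma = {
        pair: rng.gauss(0.0, 1.0)
        for pair in itertools.combinations(range(n_qtl), 2)
    }
    return a, delta, d, gamma


a, delta, d, gamma = simulate_qtl_effects(50, seed=1)
print(len(a), len(d), len(gamma))
```

With 50 QTL this gives 50 additive effects, 50 dominance effects, and 1225 pairwise epistatic factors. Testing another dominance distribution only requires changing `delta_mean` and `delta_sd` (note that a variance of 0.1 corresponds to a standard deviation of about 0.316).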
The epistasis coefficients used in the simulation for the additive-by-additive model (loci A and B, for each combination of their genotypes) are given below.
The non-additive genetic variance (epistatic plus dominance variance) will be one of the following three options.
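For illustration only, one common convention for additive-by-additive coefficients is the product of the two loci's additive codings. This is an assumption on my side and may differ from the coefficients in the cited paper; the genotype coding (0, 1, 2 copies of the reference allele mapped to -1, 0, 1) and the function names are also hypothetical.

```python
def additive_code(genotype):
    """Map an allele count (0, 1 or 2) to the additive coding -1, 0 or 1."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    return genotype - 1


def aa_coefficient(genotype_a, genotype_b):
    """Additive-by-additive coefficient as the product of additive codings.

    Assumed convention, not necessarily the one used in the cited paper.
    """
    return additive_code(genotype_a) * additive_code(genotype_b)


# 3 x 3 table of coefficients: rows are locus A, columns are locus B,
# both ordered by allele count 0, 1, 2.
table = [[aa_coefficient(ga, gb) for gb in (0, 1, 2)] for ga in (0, 1, 2)]
for ga, row in zip((0, 1, 2), table):
    print(f"A={ga}:", row)
```

Under this convention the coefficient is zero whenever either locus is heterozygous, and the epistatic contribution of a pair of loci is the coefficient multiplied by the sampled factor γ for that pair.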
Before I implement deep learning, I should redo the simulation with epistasis and dominance using single-trait, multi-trait, and structural equation models. I will then choose for deep learning the scenarios in which these methods have lower accuracy, because if the accuracy is already reasonably good (e.g., > 0.5), we do not need deep learning.
The broad-sense heritability \(H^{2}\) is 0.44 for milk volume and 0.60 for fat percent. The narrow-sense heritability \(h^{2}\) is 0.2 for milk volume and 0.35 for fat percent. The dominance value I used for this simulation is 0.3 for both traits. In this simulation there were 50 QTL and, in addition to the dominance effects, there was an epistatic interaction between each pair of loci. Once we decide to include epistasis, we will reduce the number of loci involved in epistatic interactions (possibly < 10). As a starting point, however, every pair among the 50 loci has an epistatic effect. The additive genetic variance is 1.117 for milk volume and 0.1402 for fat percent, and their covariance is -0.265. The dominance variance is 1.2 for milk volume and 0.12 for fat percent, and the covariance between milk volume and fat percent is -0.23. Note that in the following plot, the x-axis is the correlation between the last-generation phenotype and the estimated EBV at different iterations (the first ten iterations followed by the last ten iterations of the MCMC chain).
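As a quick check of the variance components quoted above, the additive and dominance correlations between the two traits follow from r = cov / sqrt(var1 * var2). The sketch below only redoes this arithmetic with the reported values; the function name `correlation` is hypothetical.

```python
import math


def correlation(var_x, var_y, cov_xy):
    """Correlation implied by two variances and their covariance."""
    return cov_xy / math.sqrt(var_x * var_y)


# Additive genetic (co)variances: milk volume, fat percent
r_additive = correlation(1.117, 0.1402, -0.265)

# Dominance (co)variances: milk volume, fat percent
r_dominance = correlation(1.2, 0.12, -0.23)

print(f"additive correlation:  {r_additive:.3f}")
print(f"dominance correlation: {r_dominance:.3f}")
```

With these values the additive correlation between milk volume and fat percent is about -0.67 and the dominance correlation is about -0.61.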
I increased the dominance value to 0.5 for both traits; all other settings are the same as for the plot above.
I expect the deep learning model to perform even better when I adjust the phenotype by the mean value (milk volume = 13 and fat percent = 5.16), particularly for milk volume, because the non-linearity increases in proportion to the added phenotypic value.