We agreed to extend the simulation to include epistasis and dominance. I now have a clear understanding of the simulation I am implementing. However, I want to share it in case there is a mistake in my simulation strategy. The strategy is as follows. First, I obtain the genotype file with the QTL positions from the XSim output. Second, I simulate the additive, dominance and epistasis effects using the QTL genotypes. In brief, the additive effects (a) of the QTL were sampled from a univariate normal distribution with zero mean and unit variance. Dominance factors (δ) were sampled from a univariate normal distribution with mean 1 and standard deviation 0.3, and dominance effects (d) were computed as d = δ|a|. The epistatic factors (γ) for all pairwise combinations of QTL were sampled from a univariate normal distribution with zero mean and unit variance. I followed this paper: https://hal.archives-ouvertes.fr/hal-02610774/document
I will test different dominance factors for each trait, for example a standard normal distribution or a normal distribution with mean 1.2 and variance 0.1. I may also test different epistatic factors, and I am reading the literature on these parameters. However, my main focus is to include the three common types of epistasis used in the same paper (https://hal.archives-ouvertes.fr/hal-02610774/document): Additive × Additive, Complementary and Interaction.
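A minimal Python sketch of the effect-sampling step described above is given below. It assumes genotypes are coded as 0/1/2 copies of one allele, re-coded as -1/0/1 for the additive part, with the dominance effect added only for heterozygotes. It uses Additive × Additive coefficients (product of the two additive codes) for every QTL pair; the Complementary and Interaction types would need their own coefficient tables. The function names are hypothetical, and this is not the XSim file format. Any rescaling of the genetic values to target variances is not shown.

```python
import itertools
import random


def sample_effects(n_qtl, dom_mean=1.0, dom_sd=0.3, seed=1):
    """Sample additive (a), dominance (d = delta*|a|) and pairwise epistatic (gamma) effects."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n_qtl)]            # a ~ N(0, 1)
    delta = [rng.gauss(dom_mean, dom_sd) for _ in range(n_qtl)]  # delta ~ N(dom_mean, dom_sd^2)
    d = [dl * abs(ai) for dl, ai in zip(delta, a)]               # d = delta * |a|
    # one gamma ~ N(0, 1) for every pair of QTL: n_qtl * (n_qtl - 1) / 2 values
    gamma = {pair: rng.gauss(0.0, 1.0)
             for pair in itertools.combinations(range(n_qtl), 2)}
    return a, d, gamma


def genetic_value(genotype, a, d, gamma):
    """Total genetic value of one animal; genotype = allele counts (0/1/2) per QTL."""
    x = [g - 1 for g in genotype]                # additive code -1/0/1 (assumption)
    het = [1 if g == 1 else 0 for g in genotype]  # heterozygote indicator for dominance
    additive = sum(ai * xi for ai, xi in zip(a, x))
    dominance = sum(di * hi for di, hi in zip(d, het))
    # Additive x Additive epistasis only: coefficient is x_i * x_j
    epistasis = sum(g_ij * x[i] * x[j] for (i, j), g_ij in gamma.items())
    return additive + dominance + epistasis
```

To try the alternative dominance factor with mean 1.2 and variance 0.1, the call would be `sample_effects(50, dom_mean=1.2, dom_sd=0.1 ** 0.5)`, since `random.gauss` takes a standard deviation rather than a variance.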
The epistasis coefficients used to simulate the Additive × Additive model (loci A and B, with their genotype combinations) are given below.
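A common way to generate these coefficients assumes each locus is coded -1/0/1 (aa/Aa/AA and bb/Bb/BB). Under that coding, the Additive × Additive coefficient for a genotype combination is the product of the two codes. This is an assumption and may differ in sign convention or scaling from the table in the paper. A small Python sketch:

```python
# Genotype labels indexed by the count of the upper-case allele (0, 1, 2).
LOCUS_A = ["aa", "Aa", "AA"]
LOCUS_B = ["bb", "Bb", "BB"]


def axa_coefficient(count_a, count_b):
    """Additive x Additive coefficient, assuming -1/0/1 coding at each locus."""
    return (count_a - 1) * (count_b - 1)


def axa_table():
    """3 x 3 table: rows are locus A genotypes, columns are locus B genotypes."""
    return [[axa_coefficient(i, j) for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    print("      " + "  ".join(f"{g:>3}" for g in LOCUS_B))
    for label, row in zip(LOCUS_A, axa_table()):
        print(f"{label:>4}  " + "  ".join(f"{c:>3}" for c in row))
```

With this coding, the double homozygotes of the same orientation (aabb, AABB) get +1, the opposite double homozygotes get -1, and any heterozygous locus gives 0.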
The non-additive genetic variance (epistasis plus dominance variance) will be set to one of the following three options.
Before implementing deep learning, I had better redo the simulation with epistasis and dominance using single-trait, multi-trait and structural equation models. I will then choose, for deep learning, the scenarios that give lower accuracy with these methods. If the accuracy is reasonably good (e.g., >0.5), we do not need to implement deep learning.
The broad-sense heritability \(H^{2}\) is 0.44 for milk volume and 0.60 for fat percent. The narrow-sense heritability \(h^{2}\) is 0.2 for milk volume and 0.35 for fat percent. The dominance value I used in this simulation is 0.3 for both traits. The simulation had 50 QTL, and in addition to the dominance effects there was an epistatic interaction between each pair of loci. Once we decide to include epistasis, we will reduce the number of loci involved in epistatic interactions (possibly fewer than 10). As a starting value, however, all pairs among the 50 loci (50 × 49 / 2 = 1,225 pairs) have an epistatic effect. The additive genetic variance is 1.117 for milk volume and 0.1402 for fat percent, with a covariance of -0.265. The dominance variance is 1.2 for milk volume and 0.12 for fat percent, with a covariance of -0.23. In the following plot, the x axis is the correlation between the last-generation phenotype and the estimated EBV at different iterations of the MCMC sample (the first ten iterations followed by the last ten iterations of the chain).
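A quick arithmetic check on the parameters above (my own calculation, rounded). The difference \(H^{2} - h^{2}\) is the share of phenotypic variance due to non-additive (dominance plus epistasis) effects. The stated variances and covariances imply additive (\(r_A\)) and dominance (\(r_D\)) genetic correlations between milk volume and fat percent as follows:

```latex
\begin{aligned}
H^{2} - h^{2} &= 0.44 - 0.20 = 0.24 && \text{(milk volume)}\\
H^{2} - h^{2} &= 0.60 - 0.35 = 0.25 && \text{(fat percent)}\\
r_A &= \frac{-0.265}{\sqrt{1.117 \times 0.1402}} \approx -0.67\\
r_D &= \frac{-0.23}{\sqrt{1.2 \times 0.12}} \approx -0.61
\end{aligned}
```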
I increased the dominance value to 0.5 for both traits. All other factors are the same as in the simulation above.
Redo the simulation with dominance only, using phenotypic values of 13 for milk volume and 5.16 for fat percent. I did not do this right away because I first wanted to confirm that the simulation results agree with previous findings, to make sure my script is correct.
Develop a Shiny app (R Shiny + JavaScript) that displays the simulation results comprehensively. I already have the skills, but I need to re-read and implement them. It might take a full week, but it is worth it. This is particularly relevant for the paper and for the LIC presentation.
The above simulation results are for a purebred population. After step 2, I need to redo the simulation for a crossbred population with dominance effects only and compare the results with the purebred ones. I will build the Shiny app (R Shiny + JavaScript) after completing steps 2 and 4, i.e., after finishing the simulation.
I simulated a purebred population and a three-way crossbred population: breed A was mated with breed B, and the crossbred AB was then mated with breed C. I simulated additive plus dominance effects, using phenotypic values of 13 for milk volume and 5.16 for fat percent. Thus, apart from the crossbreeding strategy, this simulation is close to realistic. I will modify the crossbreeding strategy after discussing it with you. To compare the effect of crossbreeding, I simulated a purebred population with the same parameters. I can send you the simulation strategy I implemented as a separate file.
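The three-way mating design can be illustrated with a simplified Python sketch. It is not the XSim implementation: it assumes unlinked loci, Hardy-Weinberg sampling within each breed, and hypothetical per-locus allele frequencies `p_a`, `p_b`, `p_c`. It also shows why dominance matters for crossbreds. With a dominance effect d added for heterozygotes, the expected dominance contribution is the sum of d times the probability of being heterozygous. That probability rises when the breeds differ in allele frequency.

```python
import random


def three_way_offspring(p_a, p_b, p_c, rng):
    """Genotype (0/1/2 copies of allele 1) of one (A x B) x C offspring at each locus."""
    offspring = []
    for pa, pb, pc in zip(p_a, p_b, p_c):
        # AB parent: one allele inherited from breed A and one from breed B
        ab_alleles = (int(rng.random() < pa), int(rng.random() < pb))
        # AB gamete transmits one of its two alleles at random (unlinked loci assumed)
        from_ab = ab_alleles[rng.randrange(2)]
        # breed C parent contributes an allele drawn with frequency pc
        from_c = int(rng.random() < pc)
        offspring.append(from_ab + from_c)
    return offspring


def expected_heterozygosity(pa, pb, pc):
    """Expected probability that an (A x B) x C offspring is heterozygous at one locus."""
    q = (pa + pb) / 2  # allele frequency among gametes of AB parents
    return q * (1 - pc) + (1 - q) * pc


def purebred_heterozygosity(p):
    """Hardy-Weinberg heterozygosity 2p(1 - p) within a purebred population."""
    return 2 * p * (1 - p)


def expected_dominance_mean(d, p_a, p_b, p_c):
    """Mean dominance contribution: sum over loci of d_i * P(heterozygous at locus i)."""
    return sum(di * expected_heterozygosity(pa, pb, pc)
               for di, pa, pb, pc in zip(d, p_a, p_b, p_c))
```

For example, if breeds A and B are fixed for one allele and breed C for the other, every three-way offspring is heterozygous, whereas each purebred has zero heterozygosity at that locus.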
The deep learning results are based on a single replicate, primarily because of computational cost. Gradient descent may get stuck in a local minimum, which might lead to low accuracy.
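The local-minimum issue can be illustrated with a toy example that is unrelated to the G3 method itself. The one-dimensional non-convex loss below was chosen only for illustration. It shows how a single gradient-descent run can end in a local minimum, and how restarting from several random initial values and keeping the best run reduces that risk. For a real network the equivalent would be training with several random seeds and selecting on validation loss, not test accuracy.

```python
import random


def loss(x):
    # toy loss: local minimum near x = 1.13, global minimum near x = -1.30
    return x**4 - 3 * x**2 + x


def grad(x):
    # derivative of the toy loss
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent from a single starting value."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def multi_start(n_starts=20, seed=0):
    """Run gradient descent from several random starts and keep the lowest-loss end point."""
    rng = random.Random(seed)
    ends = [gradient_descent(rng.uniform(-2.0, 2.0)) for _ in range(n_starts)]
    return min(ends, key=loss)
```

Starting at `x0 = 2.0`, a single run stops at the local minimum near 1.13. The multi-start version almost always reaches the global minimum near -1.30.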
The simulation for the crossbred population is running. In the meantime, I prefer to finalize the Shiny app using R + JavaScript. I have done this for the purebred population and deployed the Shiny app at the following link: https://setegnstat.shinyapps.io/Nonadditive_first_start/. The above results can be understood more easily with this app. The next step will be to deploy the crossbred results in the app. Furthermore, I am close to finishing the materials and methods for the simulation strategy.
The simulation for the crossbred population is finished. The results are deployed at the following link: https://setegnstat.shinyapps.io/Deeplearning_Crossbreed_dominant_only/. Note: I simulated dominance effects only for the crossbred population. In the plot legend, “0.30.3 NoEp” refers to a dominance value of 0.3 for both traits, and the same holds for “0.50.5NoEp”. At this point, we have a clear idea that the deep learning method fits milk production traits when these traits are influenced by non-additive genetic effects. The next step is to scale up this simulation. I need to simulate at least ten replicates for each scenario (at least ten scenarios, for three traits) and average the estimates over replicates to reduce sample-to-sample variation. More importantly, I can then run a t-test to compare the performance of the methods (deep learning, multi-trait or single-trait). Thus, we need to discuss where I can submit multiple jobs in parallel (e.g., on a cluster).
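Averaging over replicates and comparing methods could be done roughly as in the sketch below. It assumes each replicate yields one accuracy per method on the same simulated data set, so a paired t-test is appropriate. The function names are hypothetical. If SciPy is available, `scipy.stats.ttest_rel` computes the same paired statistic together with a p-value.

```python
import math
import statistics


def summarize(accuracies):
    """Mean accuracy and its standard error over replicates."""
    mean = statistics.mean(accuracies)
    se = statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, se


def paired_t(acc_a, acc_b):
    """Paired t statistic comparing two methods evaluated on the same replicates.

    Returns (mean difference, t statistic, degrees of freedom).
    Raises ZeroDivisionError if all differences are identical (zero spread).
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 replicates")
    diffs = [x - y for x, y in zip(acc_a, acc_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return mean_d, t, len(diffs) - 1
```

Pairing by replicate removes the sample-to-sample variation that both methods share, which is why it usually has more power than comparing the two method means independently.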
The deep learning method I have been running (https://academic.oup.com/g3journal/article/11/10/jkab228/6318779) is a customized deep learning method for genomic prediction, optimized for both computational speed and prediction accuracy. However, if computational resources are available, it might be better to also investigate conventional deep learning methods. I am planning to implement the two most common ones.