Since we are interested in polymorphic sites, like the SNPs that are the ouput of Genotyping By Sequencing (GBS) assays, instead of using the nucleotide bases (a,c,t,g) we will use the following symbols for a haploid genotype:
In the case of a diploid genotype, the symbols will be as follows.
You would notice the convenience that if phasing is not important or unknown, a diploid genotype can be stored as a one-dimensional array obtained by adding the values of the two parental haploid genotypes.
Our basic probabilistic simulator can be now reformulated like this:
sample(c(0,1),10,rep=TRUE,prob=c(0.5,0.5))
## [1] 0 1 0 0 0 0 1 1 1 1
Now, let’s make a generalized simulator that receives the length of the genome to simulate and the whole genome recombination rate and generates two parental haploid genomes and simulates child diploid genomes. A lot of the code, including the main loop is copied from our previous code.
Input:
Output:
SimulateChromatid <- function(num.markers = 10, recomb.prob = 0.3) {
paternal <- sample(c(0, 1), num.markers, rep = TRUE, prob = c(0.5, 0.5))
maternal <- sample(c(0, 1), num.markers, rep = TRUE, prob = c(0.5, 0.5))
recombination.event <- sample(c(TRUE, FALSE), num.markers, prob = c(recomb.prob,
1 - recomb.prob), rep = TRUE)
child <- c()
parent.path <- c()
current.genome <- sample(c("paternal", "maternal"), 1, prob = c(0.5, 0.5))
for (base.index in 1:length(paternal)) {
if (recombination.event[base.index]) {
if (current.genome == "paternal") {
current.genome <- "maternal"
}
else {
current.genome <- "paternal"
}
}
if (current.genome == "paternal") {
child[base.index] <- paternal[base.index]
}
else {
child[base.index] <- maternal[base.index]
}
parent.path[base.index] <- current.genome
}
return(list(maternal = maternal, paternal = paternal, parent.path = parent.path,
child = child))
}
To test the function we just created we can call it without parameters. I will use its defaults, 10 markers and 0.3 probability of recombination.
SimulateChromatid()
## $maternal
## [1] 1 0 0 1 0 0 0 1 0 1
##
## $paternal
## [1] 1 0 0 1 0 1 0 1 1 1
##
## $parent.path
## [1] "maternal" "maternal" "maternal" "maternal" "maternal" "maternal"
## [7] "paternal" "maternal" "paternal" "paternal"
##
## $child
## [1] 1 0 0 1 0 0 0 1 1 1