The probability of a crossover between two genes depends on the distance between the two loci. One map unit (m.u.), or centiMorgan (cM), is the distance over which the expected number of crossovers is 0.01. This means that there’s roughly, but not exactly, a 1% chance of a crossover: not exactly, because there’s also some small chance of two or even three crossovers.
To see how this works, think back to how you calculated expected values. Try calculating the expected value of the number of crossovers given the table below and then multiply by 100 to determine the number of map units between the two loci. Remember that you can use R to help you with the calculation.
Number of Crossovers | % chance |
---|---|
0 | 81.9 |
1 | 16.4 |
2 | 1.6 |
3 | 0.1 |
Using the table above, what is the expected value of the number of crossovers?
How does the probability of exactly 1 crossover compare to the number of map units separating the loci?
How does the probability of 1 or more crossovers compare to the number of map units separating the loci?
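If you would like a starting point in R, here is one possible sketch of the setup. The variable names (`crossovers`, `prob`, `expected`, `map.units`, `p.one.or.more`) are just illustrative choices; the probabilities are copied from the table above.

```r
# Probabilities copied from the table above, converted from percent to proportions
crossovers <- 0:3
prob <- c(81.9, 16.4, 1.6, 0.1) / 100

# Expected number of crossovers: each count weighted by its probability
expected <- sum(crossovers * prob)

# Map units are 100 times the expected number of crossovers
map.units <- 100 * expected

# Chance of at least one crossover, for comparison with the map distance
p.one.or.more <- 1 - prob[1]

expected
map.units
p.one.or.more
```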
We can use the Poisson distribution to model the number of crossover events. The Poisson distribution is used to model rare events or, more specifically, events that have many opportunities to occur but are quite unlikely to occur at any given opportunity. Mathematically, start with the binomial distribution with n independent trials, each with probability p of success. Let n get really big (\(n \to \infty\)) and p get really small (\(p \to 0\)) while the product of n and p stays equal to some constant (usually called \(\lambda\)). In the limit, you have a Poisson distribution.
Starting with the binomial formula and using a little algebra, it’s possible to show that the probability of any number of events is given by the equation:
\[ P(x\ events) = e^{-\lambda} \frac{\lambda^x}{x!} \]
where \(e = 2.71828...\)
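To see the limit described above in action, here is a small sketch that compares binomial probabilities with the Poisson probabilities as n grows while n times p stays fixed. The value \(\lambda = 0.4\) and the two values of n are arbitrary choices for illustration; `dbinom()` and `dpois()` are R's built-in binomial and Poisson probability functions.

```r
lambda <- 0.4
x <- 0:5

# Binomial probabilities with more and more trials, keeping n * p fixed at lambda
binom.100 <- dbinom(x, size = 100, prob = lambda / 100)
binom.1e6 <- dbinom(x, size = 1e6, prob = lambda / 1e6)

# Poisson probabilities with the same lambda
pois <- dpois(x, lambda)

# Side-by-side comparison
round(cbind(x, binom.100, binom.1e6, pois), 5)

# Largest disagreement with the Poisson probabilities for each n
max(abs(binom.100 - pois))
max(abs(binom.1e6 - pois))
```

The disagreement should shrink as n increases, which is the sense in which the binomial "becomes" the Poisson.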
The Poisson distribution fits our situation fairly well. The probability of a crossover event between any two adjacent base pairs in our DNA strand is quite small, but there are a large number of base pairs between the two loci. The product of the probability of a crossover between any two adjacent base pairs and the number of base pairs between two gene loci gives \(\lambda\), the expected value of the number of crossovers. This also happens to be 1/100th of the number of map units. In other words, the number of map units is just \(100 \cdot \lambda\).
Let’s use this idea to calculate the probability of any number of crossovers between two genes that are 40 map units apart. With a “distance” of 40 map units, we need to use a lambda of 0.40.
First, we’ll let x be a vector of values from 0 through 5 (representing possible numbers of crossovers) and then, we’ll run each of these values through the Poisson formula with \(\lambda = 0.4\) to find the probability of each number of crossovers:
x <- 0:5
exp(-0.4)*(0.4^x/factorial(x))
What is the probability of exactly 1 crossover event when genes are 40 map units apart?
What is the probability of exactly 2 crossover events when genes are 40 map units apart?
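As a cross-check on the hand-typed formula, R's built-in `dpois()` function computes the same Poisson probabilities. This sketch assumes the same \(\lambda = 0.4\) as above; the names `by.hand` and `built.in` are made up for the comparison.

```r
x <- 0:5

# Poisson formula typed out by hand
by.hand <- exp(-0.4) * (0.4^x / factorial(x))

# The same probabilities from R's built-in Poisson function
built.in <- dpois(x, lambda = 0.4)

# TRUE if the two agree (up to floating-point rounding)
all.equal(by.hand, built.in)
```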
We might also wonder about the expected value of the number of crossovers. We can calculate that as follows:
x <- 0:5
probs <- exp(-0.4)*(0.4^x/factorial(x))
sum(x*probs)
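Because x stops at 5, that sum leaves out the (very small) chance of 6 or more crossovers, so it should land just under \(\lambda = 0.4\). The sketch below, which uses the built-in `dpois()` and an arbitrary upper limit of 50, shows the sum closing in on \(\lambda\) as more values of x are included.

```r
lambda <- 0.4

# Expected value using only 0 through 5 crossovers
x.short <- 0:5
ev.short <- sum(x.short * dpois(x.short, lambda))

# Expected value using 0 through 50 crossovers (50 is an arbitrary cutoff)
x.long <- 0:50
ev.long <- sum(x.long * dpois(x.long, lambda))

# Compare both sums with lambda itself
c(ev.short, ev.long, lambda)
```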
Recombination between two genes only occurs when there is an odd number of crossovers between those two genes, because each additional crossover undoes the effect of the one before it. Using the Poisson distribution and a little math we can show that the probability of an odd number of crossover events is:
\[P(odd\ number\ crossovers) = \frac{1 - e^{-2\lambda}}{2}\]
or, in terms of map units (m):
\[P(odd\ number\ crossovers) = \frac{1 - e^{-2m/100}}{2}\]
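For anyone curious about the "little math", here is one way the result can be derived (a sketch, using the series for \(e^{\lambda}\)). Adding up the Poisson probabilities for odd x gives:
\[ P(odd\ number\ crossovers) = e^{-\lambda} \sum_{x\ odd} \frac{\lambda^x}{x!} \]
The series \(e^{\lambda} = \sum_{x=0}^{\infty} \frac{\lambda^x}{x!}\) and \(e^{-\lambda} = \sum_{x=0}^{\infty} \frac{(-\lambda)^x}{x!}\) have identical even terms and opposite odd terms, so subtracting them cancels the even terms and doubles the odd ones:
\[ e^{\lambda} - e^{-\lambda} = 2 \sum_{x\ odd} \frac{\lambda^x}{x!} \]
Substituting this into the first line:
\[ P(odd\ number\ crossovers) = e^{-\lambda} \cdot \frac{e^{\lambda} - e^{-\lambda}}{2} = \frac{1 - e^{-2\lambda}}{2} \]
Replacing \(\lambda\) with \(m/100\) gives the version in terms of map units.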
Let’s use this to write a function in R that gives the probability of recombination given a distance in map units:
recomb.prob <- function(m){
  (1-exp(-2*m/100))/2
}
Now, let’s use it!
recomb.prob(1)
recomb.prob(20)
recomb.prob(50)
recomb.prob(100)
recomb.prob(100000)
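Notice that the probability never climbs above 0.5, no matter how far apart the genes are. As a sanity check on the formula, we can also add up the Poisson probabilities of 1, 3, 5, ... crossovers directly and compare. In this sketch, `odd.sum` is a helper made up for the comparison, and the cutoff of 201 crossovers is an arbitrary stand-in for "all odd numbers".

```r
# Same function as above, repeated so this block runs on its own
recomb.prob <- function(m) {
  (1 - exp(-2 * m / 100)) / 2
}

# Hypothetical helper: sum Poisson probabilities over odd crossover counts
odd.sum <- function(m, max.crossovers = 201) {
  odd <- seq(1, max.crossovers, by = 2)
  sum(dpois(odd, lambda = m / 100))
}

# Compare the closed-form formula with the direct sum at a few distances
m <- c(1, 20, 50, 100)
cbind(m, formula = recomb.prob(m), summed = sapply(m, odd.sum))
```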
We could even make a plot of probability of recombination versus distance in map units. We’ll also add a red line showing where the probability of recombination is equal to map units divided by 100.
plot(0:200, recomb.prob(0:200), type="l", xlab="Map Units",
ylab = "Probability of Recombination")
abline(a=0,b=1/100, col="red")
Finally, I created some code to simulate a number of crosses based on the “distances” (in map units) between gene loci. You can paste this into your console and run it, without worrying too much about how it works:
sim.linked.cross <- function(n=1000, distances){
  # One row per gamete, one column per gene
  gametes <- matrix(data=NA, nrow=n, ncol=length(distances)+1)
  # The allele at the first gene is chosen at random
  gametes[,1] <- sample(c("A", "a"), n, replace=TRUE)
  for(i in seq_along(distances)){
    # 1 if the previous gene carries the dominant (uppercase) allele, 0 otherwise
    last.dom <- ifelse(gametes[,i]==LETTERS[i],1,0)
    # An odd number of Poisson crossovers flips dominant to recessive or back
    dom <- abs(last.dom - (rpois(n, lambda=distances[i]/100) %% 2))
    gametes[,i+1] <- ifelse(dom, LETTERS[i+1], letters[i+1])
  }
  # Collapse each row into a genotype string and count the genotypes
  gametes <- apply(gametes, 1, paste, collapse="")
  table(gametes)
}
Now, let’s use it. First, let’s simulate a cross with 3 genes: A, B and C, with a distance of 20 map units between A and B and 30 map units between B and C. Here we’re imagining an AaBbCc fruit fly/pea plant/what-have-you descended from two pure-bred parents, one AABBCC and the other aabbcc. Let’s simulate 500 possible gametes:
sim.linked.cross(500, c(20, 30))
## gametes
## abc abC aBc aBC Abc AbC ABc ABC
## 157 52 8 29 37 13 48 156
We could, of course, try this with more genes and different distances:
sim.linked.cross(1000, c(5, 8, 17))
## gametes
## abcd abcD abCd abCD aBcd aBcD aBCd aBCD Abcd AbcD AbCd AbCD ABcd ABcD ABCd
## 404 52 6 21 4 1 3 26 21 2 1 1 27 10 74
## ABCD
## 347
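Simulated counts like these can be turned back into estimated recombination frequencies and compared with the odd-crossover formula. The sketch below does this for the three-gene cross, using the counts copied from the first output above (your own run will give different numbers, since the simulation is random). The helper `is.upper` and the other variable names are made up for this example.

```r
# Counts copied from the simulated three-gene output above (500 gametes)
counts <- c(abc = 157, abC = 52, aBc = 8, aBC = 29,
            Abc = 37, AbC = 13, ABc = 48, ABC = 156)

# Hypothetical helper: TRUE where the allele at a position is uppercase (dominant)
is.upper <- function(genotypes, position) {
  allele <- substr(genotypes, position, position)
  allele == toupper(allele)
}

g <- names(counts)

# A gamete is recombinant for a pair of genes when the two alleles differ in case
recomb.AB <- is.upper(g, 1) != is.upper(g, 2)
recomb.BC <- is.upper(g, 2) != is.upper(g, 3)

# Observed recombination frequencies
obs.AB <- sum(counts[recomb.AB]) / sum(counts)
obs.BC <- sum(counts[recomb.BC]) / sum(counts)

# Expected values from the odd-crossover formula at 20 and 30 map units
exp.AB <- (1 - exp(-2 * 20 / 100)) / 2
exp.BC <- (1 - exp(-2 * 30 / 100)) / 2

rbind(observed = c(AB = obs.AB, BC = obs.BC),
      expected = c(AB = exp.AB, BC = exp.BC))
```

With only 500 gametes, the observed frequencies should be close to the expected ones but will not match them exactly.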