In class, we saw the the transition matrix for a two allele to two allele transition is:
\[P = \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{4}& \frac{1}{2} & \frac{1}{4}\\ 0 & 0 & 1 \\ \end{bmatrix}\]
This matrix implies that if you start with two A alleles, you must end with two A alleles. However if you start with one A alleles (and one a allele) that there’s a 1/4 chance that you end up with AA, 1/2 chance of Aa and 1/4 of aa.
We can make an equivalent three allele to four allele transition probability matrix:
\[P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0\\ {(\frac{2}{3})}^4& 4 (\frac{1}{3}){(\frac{2}{3})}^3 & 6{(\frac{1}{3})}^2{(\frac{2}{3})}^2 & 4{(\frac{1}{3})}^3(\frac{2}{3}) & {(\frac{1}{3})}^4\\ {(\frac{1}{3})}^4& 4 (\frac{2}{3}){(\frac{1}{3})}^3 & 6{(\frac{2}{3})}^2{(\frac{1}{3})}^2 & 4{(\frac{2}{3})}^3(\frac{1}{3}) & {(\frac{2}{3})}^4\\ 0 &0 &0 & 0 & 1 \\ \end{bmatrix}\]
Does this remind you of anything?
Going back to square matrices (with the same population size in consecutive generations), we can write a formula for the \(i^{th}\) for and \(j^{th}\) column of any transition probability matrix for a population of n alleles as:
\[T_{ij} = {{n}\choose{j-1}} \cdot {(\frac{i-1}{n})}^{j-1} \cdot {(\frac{n-i+1}{n})}^{n-j+1}\]
Let’s write this as a function in R that takes as inputs, i, j and n.
Tprob <- function(i, j, n){
choose(n, j-1)*(((i-1)/n)^(j-1))*(((n-i+1)/n)^(n-j+1))
}
Now we can test it by having it return the values in the two by two matrix we created at the beginning of this lab:
for (i in 1:3){
for (j in 1:3){
print(Tprob(i,j,2))
}
}
## [1] 1
## [1] 0
## [1] 0
## [1] 0.25
## [1] 0.5
## [1] 0.25
## [1] 0
## [1] 0
## [1] 1
Now it’s time to think (a bit) bigger. Let’s imagine a population of 10 alleles. Here we’ll need an 11 x 11 transition probability matrix. To fill it, we can use our function:
T <- matrix(0, 11, 11)
for (i in 1:11){
for (j in 1:11){
T[i,j] <- Tprob(i,j,10)
}
}
Now type T into your console to take a look at what you’ve created.
Let’s not go one step further and create a function that creates a square transition probability matrix of any size. Here, we only need one input, n:
Tmatrix <- function(n){
T <- matrix(0, n+1, n+1)
for (i in 1:(n+1)){
for (j in 1:(n+1)){
T[i,j] <- Tprob(i,j,n)
}
}
return(T)
}
We can use this code to quickly create a transition probability matrix for a 20 allele population:
T <- Tmatrix(20)
Try, trying T into your console again to see what you’ve created.
Now, it’s time to simulate random genetic drift as a Markov Chain. Let’s start with a population with 10 A’s and 10 a’s. We’ll make a vector representing this starting numbers (which could have take on 21 possible values, 0 through 20):
initial <- rep(0, 21)
initial[11] <- 1 #the 11th value which is 10 A and 10 a gets all the probability
Next, let’s bring back the function we wrote for Chutes and Ladders that raises a transition probability matrix to a power:
powermat<-function(P,h){
Ph<-P
if(h>1)
{for(k in 2:h){
Ph<-Ph%*%P}}
return(Ph)
}
Finally, let’s look at the distribution of number of A alleles after 5 generations
gen5 <- initial%*%powermat(T,h=5)
plot(0:20, gen5)
and after 10 generations:
gen10 <- initial%*%powermat(T,h=10)
plot(0:20, gen10)
and after 20 generations:
gen20 <- initial%*%powermat(T,h=20)
plot(0:20, gen20)
Fixation happens when all alleles are either A or a and we can use our Markov chain to compute the probability of fixation after any number of generations.
After 10 generations:
gen10 <- initial%*%powermat(T,h=10)
fixation10 <- sum(gen10[c(1,21)])
fixation10
After 20 generations:
gen20 <- initial%*%powermat(T,h=20)
fixation20 <- sum(gen20[c(1,21)])
fixation20
We can also create a vector of fixation probably as a function of the number of generations:
fixation.prob <- rep(NA, 100) # creating an empty vector 200 generations long
for (g in 1:100){
gen <- initial%*%powermat(T,h=g)
fixation.prob[g] <- sum(gen[c(1,21)])
}
fixation.prob
and we can plot these probabilities:
plot(1:100, fixation.prob, main="Total Fixation Probability v. Generations (20 alleles)", xlab="Generation", ylab="Probability",
type="l")
abline(v=20)
abline(h=0.5)
Challenge #1: Create a plot of the fixation probability versus time for a population of 6 alleles (3 A and 3 a)
We can also calculate the probability that fixation in any particular generation by taking the differences in the total probability of fixation between one generation and the next.
gen.fixation.probabilities <- diff(c(0, fixation.prob), lag=1)
plot(1:100, gen.fixation.probabilities, main="Marginal Fixation Probability v. Generations (20 alleles)", type="l")
And we could use these marginal probabilities to calculate the mean fixation time using our favorite little equation:
\[E[X] = \Sigma x \cdot P(x)\]
sum(gen.fixation.probabilities*(1:100))
Challenge #2 Calculation the mean fixation time for a population of 6 alleles (3 A and 3 a).
So far we assumed that our population of alleles is evenly split between two types but what if the split is not even? What if these alleles emerged from a population that was 50% A but the 20 alleles that survived a bottleneck may have had a different distribution of A. We can solve this problem by using an initial vector of probabilities.
initial <- dbinom(0:20, 20, prob=0.5)
for (g in 1:100){
gen <- initial%*%powermat(T,h=g)
fixation.prob[g] <- sum(gen[c(1,21)])
}
fixation.prob
plot(1:100, fixation.prob, main="Total Fixation Probability v. Generations (20 alleles)", xlab="Generation", ylab="Probability",
type="l")
abline(v=20)
abline(h=0.5)
We can also calculate the mean time to fixation as we did before.
gen.fixation.probabilities <- diff(c(0, fixation.prob), lag=1)
sum(gen.fixation.probabilities*(1:100))
Challenge #3 Plot total fixation probability versus time and calculate the mean fixation time for a population of 6 alleles that survived a bottleneck from population that is 50% A.
Finally, let’s try this with an unequal initial ratio of alleles but pay attention to which allele wins out. After 200 generations, the probability is split but two extreme outcomes (all A and all a) but how much probability does each of these two possibilities get?
initial <- rep(0, 21)
initial[6] <- 1 #the 6th value which is 5 A and 15 a gets all the probability
gen200 <- initial%*%powermat(T,h=200)
gen200[c(1,21)]
Challenge #4 Make a general statement about the probability of each of two alleles dominating the population. If a population with allele frequencies p and q passes through a bottleneck and then undergoes random genetic drift, what can we say about the probability of each allele dominating the population?