Creating a Transition Probability Matrix

In class, we saw the the transition matrix for a two allele to two allele transition is:

\[P = \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{4}& \frac{1}{2} & \frac{1}{4}\\ 0 & 0 & 1 \\ \end{bmatrix}\]

This matrix implies that if you start with two A alleles, you must end with two A alleles. However if you start with one A alleles (and one a allele) that there’s a 1/4 chance that you end up with AA, 1/2 chance of Aa and 1/4 of aa.

We can make an equivalent three allele to four allele transition probability matrix:

\[P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0\\ {(\frac{2}{3})}^4& 4 (\frac{1}{3}){(\frac{2}{3})}^3 & 6{(\frac{1}{3})}^2{(\frac{2}{3})}^2 & 4{(\frac{1}{3})}^3(\frac{2}{3}) & {(\frac{1}{3})}^4\\ {(\frac{1}{3})}^4& 4 (\frac{2}{3}){(\frac{1}{3})}^3 & 6{(\frac{2}{3})}^2{(\frac{1}{3})}^2 & 4{(\frac{2}{3})}^3(\frac{1}{3}) & {(\frac{2}{3})}^4\\ 0 &0 &0 & 0 & 1 \\ \end{bmatrix}\]

Does this remind you of anything?

Going back to square matrices (with the same population size in consecutive generations), we can write a formula for the \(i^{th}\) for and \(j^{th}\) column of any transition probability matrix for a population of n alleles as:

\[T_{ij} = {{n}\choose{j-1}} \cdot {(\frac{i-1}{n})}^{j-1} \cdot {(\frac{n-i+1}{n})}^{n-j+1}\]

Let’s write this as a function in R that takes as inputs, i, j and n.

Tprob <- function(i, j, n){
  choose(n, j-1)*(((i-1)/n)^(j-1))*(((n-i+1)/n)^(n-j+1))
}

Now we can test it by having it return the values in the two by two matrix we created at the beginning of this lab:

for (i in 1:3){
  for (j in 1:3){
    print(Tprob(i,j,2))
  }
}
## [1] 1
## [1] 0
## [1] 0
## [1] 0.25
## [1] 0.5
## [1] 0.25
## [1] 0
## [1] 0
## [1] 1

Now it’s time to think (a bit) bigger. Let’s imagine a population of 10 alleles. Here we’ll need an 11 x 11 transition probability matrix. To fill it, we can use our function:

T <- matrix(0, 11, 11)

for (i in 1:11){
  for (j in 1:11){
    T[i,j] <- Tprob(i,j,10)
  }
}

Now type T into your console to take a look at what you’ve created.

Let’s not go one step further and create a function that creates a square transition probability matrix of any size. Here, we only need one input, n:

Tmatrix <- function(n){
  T <- matrix(0, n+1, n+1)

for (i in 1:(n+1)){
  for (j in 1:(n+1)){
    T[i,j] <- Tprob(i,j,n)
  }
}
  return(T)
}

We can use this code to quickly create a transition probability matrix for a 20 allele population:

T <- Tmatrix(20)

Try, trying T into your console again to see what you’ve created.

Random Genetic Drift

Now, it’s time to simulate random genetic drift as a Markov Chain. Let’s start with a population with 10 A’s and 10 a’s. We’ll make a vector representing this starting numbers (which could have take on 21 possible values, 0 through 20):

initial <- rep(0, 21)
initial[11] <- 1 #the 11th value which is 10 A and 10 a gets all the probability

Next, let’s bring back the function we wrote for Chutes and Ladders that raises a transition probability matrix to a power:

powermat<-function(P,h){
  Ph<-P
  if(h>1)
  {for(k in 2:h){
      Ph<-Ph%*%P}}
  return(Ph)
}

Finally, let’s look at the distribution of number of A alleles after 5 generations

gen5 <- initial%*%powermat(T,h=5)
plot(0:20, gen5)

and after 10 generations:

gen10 <- initial%*%powermat(T,h=10)
plot(0:20, gen10)

and after 20 generations:

gen20 <- initial%*%powermat(T,h=20)
plot(0:20, gen20)

Fixation happens when all alleles are either A or a and we can use our Markov chain to compute the probability of fixation after any number of generations.

After 10 generations:

gen10 <- initial%*%powermat(T,h=10)
fixation10 <- sum(gen10[c(1,21)])
fixation10

After 20 generations:

gen20 <- initial%*%powermat(T,h=20)
fixation20 <- sum(gen20[c(1,21)])
fixation20

We can also create a vector of fixation probably as a function of the number of generations:

fixation.prob <- rep(NA, 100) # creating an empty vector 200 generations long

for (g in 1:100){
  gen <- initial%*%powermat(T,h=g)
  fixation.prob[g] <- sum(gen[c(1,21)])
}

fixation.prob

and we can plot these probabilities:

plot(1:100, fixation.prob, main="Total Fixation Probability v. Generations (20 alleles)", xlab="Generation", ylab="Probability",
     type="l")
abline(v=20)
abline(h=0.5)

Challenge #1: Create a plot of the fixation probability versus time for a population of 6 alleles (3 A and 3 a)

We can also calculate the probability that fixation in any particular generation by taking the differences in the total probability of fixation between one generation and the next.

gen.fixation.probabilities <-  diff(c(0, fixation.prob), lag=1)
plot(1:100, gen.fixation.probabilities, main="Marginal Fixation Probability v. Generations (20 alleles)", type="l")

And we could use these marginal probabilities to calculate the mean fixation time using our favorite little equation:

\[E[X] = \Sigma x \cdot P(x)\]

sum(gen.fixation.probabilities*(1:100))

Challenge #2 Calculation the mean fixation time for a population of 6 alleles (3 A and 3 a).

Surviving a Bottleneck

So far we assumed that our population of alleles is evenly split between two types but what if the split is not even? What if these alleles emerged from a population that was 50% A but the 20 alleles that survived a bottleneck may have had a different distribution of A. We can solve this problem by using an initial vector of probabilities.

initial <- dbinom(0:20, 20, prob=0.5)

for (g in 1:100){
  gen <- initial%*%powermat(T,h=g)
  fixation.prob[g] <- sum(gen[c(1,21)])
}

fixation.prob
plot(1:100, fixation.prob, main="Total Fixation Probability v. Generations (20 alleles)", xlab="Generation", ylab="Probability",
     type="l")
abline(v=20)
abline(h=0.5)

We can also calculate the mean time to fixation as we did before.

gen.fixation.probabilities <-  diff(c(0, fixation.prob), lag=1)
sum(gen.fixation.probabilities*(1:100))

Challenge #3 Plot total fixation probability versus time and calculate the mean fixation time for a population of 6 alleles that survived a bottleneck from population that is 50% A.

Which Allele Wins Out?

Finally, let’s try this with an unequal initial ratio of alleles but pay attention to which allele wins out. After 200 generations, the probability is split but two extreme outcomes (all A and all a) but how much probability does each of these two possibilities get?

initial <- rep(0, 21)
initial[6] <- 1 #the 6th value which is 5 A and 15 a gets all the probability


gen200 <- initial%*%powermat(T,h=200)

gen200[c(1,21)]

Challenge #4 Make a general statement about the probability of each of two alleles dominating the population. If a population with allele frequencies p and q passes through a bottleneck and then undergoes random genetic drift, what can we say about the probability of each allele dominating the population?