In this lab we’ll be looking at random genetic drift and reversible mutations using Markov chains. In particular, we’ll be looking at how each of these factors effects the heterozygosity of the population – the percent of the population that has two of the same allele.

Random Genetic Drift

First let’s recreate our random drift model, using the formula for transition probabilities that we used in the last lab:

\[T_{ij} = {{n}\choose{j-1}} \cdot {(\frac{i-1}{n})}^{j-1} \cdot {(\frac{n-i+1}{n})}^{n-j+1}\]

Tprob <- function(i, j, n){
  choose(n, j-1)*(((i-1)/n)^(j-1))*(((n-i+1)/n)^(n-j+1))
}

We’ll also need the function we created that takes this formula and fills in a matrix for a population of n/2 diploid individuals:

T.drift.matrix <- function(n){
  T <- matrix(0, n+1, n+1)

for (i in 1:(n+1)){
  for (j in 1:(n+1)){
    T[i,j] <- Tprob(i,j,n)
  }
}
  return(T)
}

Let’s remember what this looks like for a population of 2 individuals (or 4 gametes). This matrix gives the probability of transitioning, from any number of A alleles (0 through 4) to any number of A alleles in the next generation (with alleles randomly sample with replacement from the last generation)?

T.drift.matrix(4)
##            [,1]     [,2]      [,3]     [,4]       [,5]
## [1,] 1.00000000 0.000000 0.0000000 0.000000 0.00000000
## [2,] 0.31640625 0.421875 0.2109375 0.046875 0.00390625
## [3,] 0.06250000 0.250000 0.3750000 0.250000 0.06250000
## [4,] 0.00390625 0.046875 0.2109375 0.421875 0.31640625
## [5,] 0.00000000 0.000000 0.0000000 0.000000 1.00000000

Q1: What is the probability of ending up with 3 A alleles in the next generation if you have 2 A alleles?

Reversible Mutations

Now for something new. Let’s imagine that we have a population with two types of alleles, A and a, which can both mutate into the other type. We can think of the number of “forward” mutations as a binomial random variable with the number of draws determined by the number of A alleles and the chance of “success” determined by the rate of forward mutations. Likewise, for the number of backwards mutations using the number of a alleles and the rate of back mutations.

We can write these binomial random variables into R code as follows:

library(discreteRV)
forward.RV <- function(num.a, u){
  RV(outcomes=0:num.a, probs=dbinom(0:num.a, num.a, u))
}

reverse.RV <- function(num.A, v){
  RV(outcomes=0:num.A, probs=dbinom(0:num.A, num.A, v))
}

The net number of mutations (from A to a) is the number of forward mutations less the number of back mutations. In R, this is as easy as subtracting one random variable from the other:

f <- forward.RV(4, .02)

r <- reverse.RV(2, .01)

net.mutations <- f-r
probs(net.mutations)
##           -2           -1            0            1            2 
## 9.223682e-05 1.827042e-02 9.055041e-01 7.384262e-02 2.259712e-03 
##            3            4 
## 3.073910e-05 1.568160e-07

That’s actually all we need to create our mutation probability transition matrix. The code below looks ugly but it comes down to simply using the numbers of alleles of each type and the forward and backward mutation rates to calculate the probability of any number of mutations.

T.mutation.matrix <- function(n,u,v){
  T <- matrix(data=0, nrow=n+1, ncol=n+1)
 for(i in 1:(n+1)){
   f <- forward.RV(n-i+1, u); r <- reverse.RV(i-1, v)
   net.mutations <-  f-r
   T[i,i+range(net.mutations)[1]:range(net.mutations)[2]] <- probs(net.mutations)
 }
  return(T)
}

Let’s try this out with a small matrix (a population of 4 alleles):

T.mutation.matrix(n=4, u=.02, v=.01)
##            [,1]       [,2]       [,3]       [,4]       [,5]
## [1,] 0.92236816 0.07529536 0.00230496 0.00003136 0.00000016
## [2,] 0.00941192 0.93235632 0.05705952 0.00116432 0.00000792
## [3,] 0.00009604 0.01901984 0.94206424 0.03842784 0.00039204
## [4,] 0.00000098 0.00029108 0.02882088 0.95148108 0.01940598
## [5,] 0.00000000 0.00000396 0.00058806 0.03881196 0.96059602

In the first row we start with 0 A alleles and 4 a alleles. The mutation rate for A to a is 2% for each of the 4 a alleles so we expect .08 mutations and, in fact, we get one mutation 7.5% of the time and 2 mutations 2.3% of the time and there’s even some (small) chance of 3 mutations.

In the second row, we start with 1 A and 3 a alleles. We expect .06 forward mutations and .01 reverse mutations and every so often they happen at the same time and cancel each other out!

Combining Drift and Mutations

Creating a matrix that combines the two elements that might shake up our population from one generation to the next (random drift and mutation) is as simple as multiplying the probability transition matrices. In R:

T.combo.matrix <- function(n, u, v){
  T <- T.drift.matrix(n) %*% T.mutation.matrix(n,u,v)
  return(T)
}

Now, let’s take a look at a the transition matrix for a population of 4 alleles, with forward and reverse mutation rates of 20% and 10% respectively.

T.combo.matrix(4, 0.02, 0.01)
##             [,1]       [,2]       [,3]       [,4]        [,5]
## [1,] 0.922368160 0.07529536 0.00230496 0.00003136 0.000000160
## [2,] 0.295834009 0.42118740 0.22487124 0.05335928 0.004748071
## [3,] 0.060037250 0.24500050 0.37492500 0.25499950 0.065037751
## [4,] 0.004064856 0.04813437 0.21374522 0.42184700 0.312208550
## [5,] 0.000000000 0.00000396 0.00058806 0.03881196 0.960596020

Q2: What is the probabilty that a population that has 2 alleles of each type in one generation has only A alleles in the next generation?

Create a couple of your own matrices and make see if you can make some sense of them.

Putting it all together

First, we’ll need to load a couple of packages. expm let’s us raise matrices to exponents and spatstat lets us calculated weighted cumulative distributions (more on that in a moment).

library(expm); library(Hmisc)

Next, we’re going to create a function that allows you to choose, the number of gametes (n), the forward and backward mutation rates (u and v), the number of generations (t) you would like to pass through and the initial frequency of the A allele in the population (startp).

Remember how a Markov chain works?

\[u^{(t)} = u^{(0)} \cdot P^{t}\] In the formula above, \(x^{(t)}\) is the probability of being in any possible state (having any possible number of A alleles) after t units of time (t generations). \(x^{(0)}\) is the initial probability vector \(P^{t}\) is the probability transition matrix, \(P\) raised to the power of the number of time units, \(t\).

Our function below starts (step 1) by creating \(x^{(0)}\), the initial probabilities of any number of A alleles. To do this, it uses the binomial distribution (good old, dbinom) along with the starting A allele frequency.

Next (step 2), our function creates, P, the probability transition matrix, using the function we created above that accounts for reversible mutations and random genetic drift. In step 3, we use the Markov chain to calculate \(x^{(t)}\), the probability of any number of A alleles after t generations. In step 4, we calculate the heterozygosity for every possible number of A alleles. This formula looks funny, but really we’re just calculation a vector of 2pq’s for a series of p’s. In step 5, we calculate the expected value of the heterozygosity by weighing each heterozygosity by its probability. You’ve seen the formula we’re using before:

\[ E[X] = \Sigma \ p(x) \cdot x \] Finally, in step 6, we add up the probabilities of states where one allele has fixated (when there are either 0 or n A alleles).

The remainder of the code simply takes this information and creates three plots.

T.combo.matrix.plot <- function(n=100, u=0, v=0, t=100, startp=0.5){
  x0 <- dbinom(0:n, n, startp) # step 1: initial probability vector
  P <- T.combo.matrix(n, u, v) # step 2: transition probability matrix
  xt <- x0 %*% (P %^% t) # step 3: markov chain
  het <- 2*(0:n)*(n:0)/n^2 # step 4: hetrozygosity of every possible state
  mean.het <- sum(xt*het) # step 5: expected value of heterozygosity
  fixation <- sum(xt[c(1,n+1)]) #step 6: calculate the fixation probability
  
  par(mfrow=c(3,1))
  
  plot(0:n, xt, main="Probability Distribution of A alleles", ylab="Probability",
       xlab="number of A alles", type="l")
  plot(density(het, weights=xt, bw=1/n, from=0, to=0.5), main="Probability Density of Heterozygosity",
       xlab="Heterozygosity")
  text(0.25, .7*max(density(het, weights=xt, bw=1/n, from=0, to=0.5)$y), 
       paste("Fixation Probability: ", round(fixation,3)))
  text(0.25, .5*max(density(het, weights=xt, bw=1/n, from=0, to=0.5)$y), 
       paste("Mean Heterozygosity: ", round(mean.het,3)))
  plot(wtd.Ecdf(het, weights=xt)$x, wtd.Ecdf(het, weights=xt)$ecdf, main="Cumulative Distribution of Heterozygosity", 
       xlab="Heterozygosity", ylab="Probability", type="l")
  
}

The Fun Part!

First, let’s try this out with no mutations and look at a population of 100 gametes (each with an initial 20% chance of being A) after 10 generations:

T.combo.matrix.plot(300, t=10, u=0, v=0, startp=0.2)

The first two plots are distribution plots the likes of which you’ve seen before. They tell you the probabilities of different numbers of A alleles and the probabilities of different heterozygosities after 10 generations.

The third plot, a cumulative distribution plot may take a little getting used it. Mathematically, it’s simply the area under the heterozygosity plot curve up to (and including) each value. Put another way, it’s the probability of having a Heterozygosity equal to or lower than each value.

Let’s fast-forward in time, still without any mutations:

T.combo.matrix.plot(300, t=100, u=0, v=0, startp=0.2)

and further in time:

T.combo.matrix.plot(300, t=500, u=0, v=0, startp=0.2)

Q3: What happens to this population?

Now, let’s add some mutation, specifically a 0.2% chance (1 in 500) of a forward mutation and a 0.2% chance of a back mutation.

T.combo.matrix.plot(300, t=500, u=0.002, v=0.002, startp=0.2)

Q4: How does mutation affect the destiny of this population?

Let’s try the same thing with a much smaller population both with and without mutation:

T.combo.matrix.plot(16, t=500, u=0, v=0, startp=0.2)
T.combo.matrix.plot(16, t=500, u=0.002, v=0.002, startp=0.2)

Q5: How does population size affect the destiny of this population?

Explore

Try out different combination of population size, numbers of generations, forward and backward mutation rates and initial allele frequencies. Try to determine the relationships between how these variables affect our population of genes.