Wright-Fisherian populations

Using R, we can conduct simple simulations to observe how genetic drift occurs and why it leads to inbreeding. In this demonstration we will simulate idealized small populations, also referred to as ‘Wright-Fisherian’ populations, to illustrate the process of random mating in a finite population and how this leads to changes in allele frequency over generations. In this demonstration, allele frequency changes are due to genetic drift alone. We will only simulate one locus, but you can imagine that the same process is occurring at all loci in the genome.

Learn how to create a random genotype at an allele

To create a random genotype at one locus for one individual, we must randomly sample two alleles from a set of possible alleles. First create a genotype by sampling 2 alleles at random with replacement from the set of possible alleles “A” and “a”

g<- sample(x=c('A', 'a'), size=2, replace=TRUE)
g
## [1] "A" "A"

Then combine the alleles into a single character using the paste() function

g2<- paste(g, collapse="")
g2
## [1] "AA"

Simulate single-locus genotypes for a base population of many individuals

The base population is a population that does not change over time. It is in Hardy-Weinburg equilibrium. It must be infinite in size. All Wright-Fisherian populations track back to a base population, thus we must simulate this base population first. Because we cannot simulate an infinite population, we simulate a very large population of size 1000. To do this we create random genotypes as described in the code above and then we repeat it 1000 times in a loop. Each time we create a random genotype we append it to a vector. This vector is then the vector of genotypes for our base population

Nbase<- 1000
base<- c()
for(i in 1:Nbase){
  g<- sample(x=c('A', 'a'), size=2, replace=TRUE)
  g2<- paste(g, collapse="")
  base<- append(base, g2)
}
head(base)
## [1] "Aa" "Aa" "Aa" "aA" "aa" "AA"

Learn how to compute the allele frequency

First get a table of the genotype counts

tb<- table(base)
tb
## base
##  aa  aA  Aa  AA 
## 269 228 246 257

First get a table allele dosage for each genotype in the table

ctvec<- stringr::str_count(names(tb), "a")
ctvec
## [1] 2 1 1 0

Compute the number of a alleles using matrix multiplication

numbaa<- tb %*% ctvec

compute the proportion of alleles that are aa

freq0<- numbaa/(length(base)*2)

Make a function to compute allele frequencies using the above process so we can more easily compute allele frequencies later on

freqFcn<- function(pop){
  tb<- table(pop)
  numbaa<- tb %*% stringr::str_count(names(tb), "a")
  fq<- numbaa/(length(pop)*2)
  return(fq)
}

Simulate the sampling of founder individuals from the base population

Create the initial population by sampling N individuals from the base population

N<-200
found<- sample(base, N)
head(found)
## [1] "Aa" "aA" "AA" "Aa" "AA" "Aa"

Calculate the frequency of the ‘a’ allele in the initial population

freq1<- freqFcn(found)
freq1
##        [,1]
## [1,] 0.5025

Simulate one generation of random mating

Create a pool of 2N gametes from the initial population

pl<- paste(found, collapse="") #all alleles in one character
gvec<- strsplit(pl, '')[[1]] #split up into a vector with 2N elements  
gvec
##   [1] "A" "a" "a" "A" "A" "A" "A" "a" "A" "A" "A" "a" "a" "A" "a" "A" "a" "A"
##  [19] "A" "a" "A" "A" "a" "a" "a" "a" "a" "A" "a" "A" "A" "a" "A" "A" "A" "A"
##  [37] "A" "a" "A" "A" "A" "a" "a" "a" "A" "A" "a" "A" "a" "a" "A" "a" "a" "a"
##  [55] "a" "A" "A" "a" "a" "A" "A" "A" "A" "A" "a" "a" "A" "a" "A" "A" "a" "a"
##  [73] "A" "a" "a" "a" "a" "a" "a" "A" "a" "A" "A" "a" "A" "A" "a" "a" "A" "a"
##  [91] "a" "a" "A" "A" "A" "a" "a" "a" "A" "A" "A" "A" "a" "A" "a" "A" "A" "A"
## [109] "a" "a" "a" "a" "A" "A" "a" "A" "a" "A" "A" "A" "a" "a" "A" "A" "A" "a"
## [127] "a" "a" "A" "A" "a" "A" "A" "a" "a" "A" "a" "A" "a" "A" "A" "A" "A" "A"
## [145] "a" "A" "a" "a" "A" "a" "a" "a" "a" "a" "A" "A" "A" "A" "a" "a" "a" "A"
## [163] "a" "A" "A" "a" "A" "a" "a" "a" "a" "a" "a" "A" "A" "a" "a" "a" "A" "A"
## [181] "a" "a" "a" "a" "a" "A" "a" "A" "A" "a" "a" "a" "a" "a" "a" "A" "A" "a"
## [199] "a" "A" "A" "a" "A" "A" "a" "a" "a" "a" "a" "A" "A" "a" "a" "a" "a" "a"
## [217] "a" "A" "A" "a" "A" "a" "A" "a" "A" "A" "A" "a" "A" "a" "A" "a" "a" "A"
## [235] "a" "a" "A" "a" "a" "A" "A" "A" "A" "a" "A" "A" "A" "A" "a" "A" "A" "A"
## [253] "a" "A" "a" "a" "a" "A" "a" "A" "A" "a" "a" "a" "A" "A" "a" "A" "A" "a"
## [271] "a" "A" "A" "a" "a" "A" "A" "A" "a" "A" "a" "A" "A" "A" "A" "A" "A" "A"
## [289] "A" "A" "A" "a" "A" "A" "A" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "A"
## [307] "a" "A" "A" "A" "a" "A" "a" "A" "A" "a" "a" "a" "a" "a" "a" "A" "A" "a"
## [325] "a" "a" "a" "A" "A" "A" "a" "a" "a" "a" "a" "A" "a" "a" "a" "A" "a" "A"
## [343] "A" "a" "A" "a" "a" "A" "A" "A" "A" "A" "A" "A" "A" "A" "a" "A" "a" "a"
## [361] "a" "A" "a" "A" "a" "a" "A" "a" "A" "A" "a" "A" "a" "a" "a" "a" "A" "a"
## [379] "a" "A" "A" "A" "a" "a" "A" "A" "a" "A" "A" "A" "a" "A" "A" "a" "A" "A"
## [397] "A" "a" "a" "A"

Sample two alleles from the pool of gametes, unite them to form a zygote, and repeat N times to produce N individuals. It is this sampling that causes genetic drift. A small sample (small N) will be less representative of the pool of gametes than a bigger one.

pop<- c()
for(i in 1:N){
  g<- sample(x=gvec, size=2, replace=TRUE) #random sample two gametes from gamete pool
  g2<- paste(g, collapse="") #unite gametes to form a zygote 
  pop<- append(pop, g2)
}
head(pop)
## [1] "AA" "AA" "AA" "AA" "aa" "AA"

Compute the frequency of the ‘a’ allele

freq1<- freqFcn(pop) 
freq1
##      [,1]
## [1,]  0.5

For later use, lets make a function for random mating

randomMatefcn<- function(pop, N){
  pl<- paste(pop, collapse="") #all alleles in one character
  gvec<- strsplit(pl, '')[[1]] #split up into a vector with 2N elements 
  pop<- c()
  for(i in 1:N){
    g<- sample(x=gvec, size=2, replace=TRUE) #random sample two gametes from gamete pool
    g2<- paste(g, collapse="") #unite gametes to form a zygote 
    pop<- append(pop, g2)
  }
return(pop)
}

Simulate more generations of random mating and compute frequency at each generation

This time we will use the functions randomMatefcn() and freqFcn() to simplify the code. Random mating and allele frequency calculation is repeated in a loop. Notice that at each generation j of intermating, the current population, ‘pop’, is intermated and the resulting population becomes the current population for the next generation of intermating.

freqVec<-freq1
for(j in 1:99){
  pop<- randomMatefcn(pop, N)
  freqVec<- append(freqVec, freqFcn(pop)) 
}

Plot the allele frequency changes

Notice how the frequency of the ‘a’ allele fluctuates over time. Each time you run the code you simulate another Wright-Fisherian population sampled from the same base population. Allele frequency changes over time will be different for each Wright-Fisherian population because drift is involves random sampling. The direction of the allele frequency changes is not predictable. However the variation in the allele frequencies across multiple Wright-Fisherian populations can be predicted based on the population size.

plot(freqVec, type='l', ylim=c(0,1))

Repeat whole process for a different population size

Note that we are starting from the same base population as before and we follow all the same steps described in detail above. I have made the code more concise to so it is easier for you to modify and see the whole process.

N<- 500 #population size
freqVec<- c()
found<- sample(base, N) #sample N founders from the base population
pop<- found #use this as the initial population
for(j in 1:100){
  pop<- randomMatefcn(pop, N) #random mate
  freqVec<- append(freqVec, freqFcn(pop)) #calculate frequency of 'a' and append to vector
}
plot(freqVec, type='l', ylim=c(0,1)) #plot the frequency of the 'a' allele

Exercises

In this set of exercises, you will learn more about genetic drift and inbreeding

First create a base population for this set of exercises

This code was described previously. Run this code to create the base population.

Nbase<- 1000
base<- c()
for(i in 1:Nbase){
  g<- sample(x=c('A', 'a'), size=2, replace=TRUE)
  g2<- paste(g, collapse="")
  base<- append(base, g2)
}

1. Simulate a small Wright-Fisherian population

Use the code provided in the last part of the Class 2 demonstration, simulate a Wright-Fisherian population of size 20 for 100 generations of intermating, and observe the allele frequency over generations. Run the code a few times to observe the variation in allele frequency changes among multiple Wright-Fisherian populations.

2. Is this population in Hardy-Weinburg equilibrium?

Refer to the lecture slides to help answer this question.

3. Determine if the genotypic frequencies in this population follow Hardy-Weinberg Law

Refer to the lecture slides to determine the expected genotype frequencies based on Hardy-Weinberg Law and then compute the chi-squared test statistic to determine if the genotype frequencies that you observe are significantly different from the expected frequencies. For a p-value of 0.05 and one degree of freedom, the critical value is 3.84. Chi-squared test statistic values larger than 3.84 would indicate that the genotypes are not in the expected proportions based on Hardy-Weinberg Law. For a helpful demonstration refer to this video https://youtu.be/hT9_laBrqWg

4. Why does drift lead to inbreeding? Discuss in your group.

Recall that inbreeding is the increase in homozygosity over time.

5. Advanced exercise: Compute the variance of the frequency of ‘a’ allele at generation 100 for N=20

To do this, you will need to simulate a large number of Wright-Fisherian populations of size 20 for 100 generations and record the final allele frequency of ‘a’ for each simulation.