Using R, we can conduct simple simulations to observe how genetic drift occurs and why it leads to inbreeding. In this demonstration we will simulate idealized small populations, also referred to as ‘Wright-Fisherian’ populations, to illustrate the process of random mating in a finite population and how this leads to changes in allele frequency over generations. In this demonstration, allele frequency changes are due to genetic drift alone. We will only simulate one locus, but you can imagine that the same process is occurring at all loci in the genome.
To create a random genotype at one locus for one individual, we must randomly sample two alleles from a set of possible alleles. First create a genotype by sampling 2 alleles at random with replacement from the set of possible alleles “A” and “a”
g<- sample(x=c('A', 'a'), size=2, replace=TRUE)
g
## [1] "A" "A"
Then combine the alleles into a single character using the paste() function
g2<- paste(g, collapse="")
g2
## [1] "AA"
The base population is a population that does not change over time. It is in Hardy-Weinburg equilibrium. It must be infinite in size. All Wright-Fisherian populations track back to a base population, thus we must simulate this base population first. Because we cannot simulate an infinite population, we simulate a very large population of size 1000. To do this we create random genotypes as described in the code above and then we repeat it 1000 times in a loop. Each time we create a random genotype we append it to a vector. This vector is then the vector of genotypes for our base population
Nbase<- 1000
base<- c()
for(i in 1:Nbase){
g<- sample(x=c('A', 'a'), size=2, replace=TRUE)
g2<- paste(g, collapse="")
base<- append(base, g2)
}
head(base)
## [1] "Aa" "Aa" "Aa" "aA" "aa" "AA"
First get a table of the genotype counts
tb<- table(base)
tb
## base
## aa aA Aa AA
## 269 228 246 257
First get a table allele dosage for each genotype in the table
ctvec<- stringr::str_count(names(tb), "a")
ctvec
## [1] 2 1 1 0
Compute the number of a alleles using matrix multiplication
numbaa<- tb %*% ctvec
compute the proportion of alleles that are aa
freq0<- numbaa/(length(base)*2)
Make a function to compute allele frequencies using the above process so we can more easily compute allele frequencies later on
freqFcn<- function(pop){
tb<- table(pop)
numbaa<- tb %*% stringr::str_count(names(tb), "a")
fq<- numbaa/(length(pop)*2)
return(fq)
}
Create the initial population by sampling N individuals from the base population
N<-200
found<- sample(base, N)
head(found)
## [1] "Aa" "aA" "AA" "Aa" "AA" "Aa"
Calculate the frequency of the ‘a’ allele in the initial population
freq1<- freqFcn(found)
freq1
## [,1]
## [1,] 0.5025
Create a pool of 2N gametes from the initial population
pl<- paste(found, collapse="") #all alleles in one character
gvec<- strsplit(pl, '')[[1]] #split up into a vector with 2N elements
gvec
## [1] "A" "a" "a" "A" "A" "A" "A" "a" "A" "A" "A" "a" "a" "A" "a" "A" "a" "A"
## [19] "A" "a" "A" "A" "a" "a" "a" "a" "a" "A" "a" "A" "A" "a" "A" "A" "A" "A"
## [37] "A" "a" "A" "A" "A" "a" "a" "a" "A" "A" "a" "A" "a" "a" "A" "a" "a" "a"
## [55] "a" "A" "A" "a" "a" "A" "A" "A" "A" "A" "a" "a" "A" "a" "A" "A" "a" "a"
## [73] "A" "a" "a" "a" "a" "a" "a" "A" "a" "A" "A" "a" "A" "A" "a" "a" "A" "a"
## [91] "a" "a" "A" "A" "A" "a" "a" "a" "A" "A" "A" "A" "a" "A" "a" "A" "A" "A"
## [109] "a" "a" "a" "a" "A" "A" "a" "A" "a" "A" "A" "A" "a" "a" "A" "A" "A" "a"
## [127] "a" "a" "A" "A" "a" "A" "A" "a" "a" "A" "a" "A" "a" "A" "A" "A" "A" "A"
## [145] "a" "A" "a" "a" "A" "a" "a" "a" "a" "a" "A" "A" "A" "A" "a" "a" "a" "A"
## [163] "a" "A" "A" "a" "A" "a" "a" "a" "a" "a" "a" "A" "A" "a" "a" "a" "A" "A"
## [181] "a" "a" "a" "a" "a" "A" "a" "A" "A" "a" "a" "a" "a" "a" "a" "A" "A" "a"
## [199] "a" "A" "A" "a" "A" "A" "a" "a" "a" "a" "a" "A" "A" "a" "a" "a" "a" "a"
## [217] "a" "A" "A" "a" "A" "a" "A" "a" "A" "A" "A" "a" "A" "a" "A" "a" "a" "A"
## [235] "a" "a" "A" "a" "a" "A" "A" "A" "A" "a" "A" "A" "A" "A" "a" "A" "A" "A"
## [253] "a" "A" "a" "a" "a" "A" "a" "A" "A" "a" "a" "a" "A" "A" "a" "A" "A" "a"
## [271] "a" "A" "A" "a" "a" "A" "A" "A" "a" "A" "a" "A" "A" "A" "A" "A" "A" "A"
## [289] "A" "A" "A" "a" "A" "A" "A" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "A"
## [307] "a" "A" "A" "A" "a" "A" "a" "A" "A" "a" "a" "a" "a" "a" "a" "A" "A" "a"
## [325] "a" "a" "a" "A" "A" "A" "a" "a" "a" "a" "a" "A" "a" "a" "a" "A" "a" "A"
## [343] "A" "a" "A" "a" "a" "A" "A" "A" "A" "A" "A" "A" "A" "A" "a" "A" "a" "a"
## [361] "a" "A" "a" "A" "a" "a" "A" "a" "A" "A" "a" "A" "a" "a" "a" "a" "A" "a"
## [379] "a" "A" "A" "A" "a" "a" "A" "A" "a" "A" "A" "A" "a" "A" "A" "a" "A" "A"
## [397] "A" "a" "a" "A"
Sample two alleles from the pool of gametes, unite them to form a zygote, and repeat N times to produce N individuals. It is this sampling that causes genetic drift. A small sample (small N) will be less representative of the pool of gametes than a bigger one.
pop<- c()
for(i in 1:N){
g<- sample(x=gvec, size=2, replace=TRUE) #random sample two gametes from gamete pool
g2<- paste(g, collapse="") #unite gametes to form a zygote
pop<- append(pop, g2)
}
head(pop)
## [1] "AA" "AA" "AA" "AA" "aa" "AA"
Compute the frequency of the ‘a’ allele
freq1<- freqFcn(pop)
freq1
## [,1]
## [1,] 0.5
For later use, lets make a function for random mating
randomMatefcn<- function(pop, N){
pl<- paste(pop, collapse="") #all alleles in one character
gvec<- strsplit(pl, '')[[1]] #split up into a vector with 2N elements
pop<- c()
for(i in 1:N){
g<- sample(x=gvec, size=2, replace=TRUE) #random sample two gametes from gamete pool
g2<- paste(g, collapse="") #unite gametes to form a zygote
pop<- append(pop, g2)
}
return(pop)
}
This time we will use the functions randomMatefcn() and freqFcn() to simplify the code. Random mating and allele frequency calculation is repeated in a loop. Notice that at each generation j of intermating, the current population, ‘pop’, is intermated and the resulting population becomes the current population for the next generation of intermating.
freqVec<-freq1
for(j in 1:99){
pop<- randomMatefcn(pop, N)
freqVec<- append(freqVec, freqFcn(pop))
}
Notice how the frequency of the ‘a’ allele fluctuates over time. Each time you run the code you simulate another Wright-Fisherian population sampled from the same base population. Allele frequency changes over time will be different for each Wright-Fisherian population because drift is involves random sampling. The direction of the allele frequency changes is not predictable. However the variation in the allele frequencies across multiple Wright-Fisherian populations can be predicted based on the population size.
plot(freqVec, type='l', ylim=c(0,1))
Note that we are starting from the same base population as before and we follow all the same steps described in detail above. I have made the code more concise to so it is easier for you to modify and see the whole process.
N<- 500 #population size
freqVec<- c()
found<- sample(base, N) #sample N founders from the base population
pop<- found #use this as the initial population
for(j in 1:100){
pop<- randomMatefcn(pop, N) #random mate
freqVec<- append(freqVec, freqFcn(pop)) #calculate frequency of 'a' and append to vector
}
plot(freqVec, type='l', ylim=c(0,1)) #plot the frequency of the 'a' allele
In this set of exercises, you will learn more about genetic drift and inbreeding
This code was described previously. Run this code to create the base population.
Nbase<- 1000
base<- c()
for(i in 1:Nbase){
g<- sample(x=c('A', 'a'), size=2, replace=TRUE)
g2<- paste(g, collapse="")
base<- append(base, g2)
}
Use the code provided in the last part of the Class 2 demonstration, simulate a Wright-Fisherian population of size 20 for 100 generations of intermating, and observe the allele frequency over generations. Run the code a few times to observe the variation in allele frequency changes among multiple Wright-Fisherian populations.
Refer to the lecture slides to help answer this question.
Refer to the lecture slides to determine the expected genotype frequencies based on Hardy-Weinberg Law and then compute the chi-squared test statistic to determine if the genotype frequencies that you observe are significantly different from the expected frequencies. For a p-value of 0.05 and one degree of freedom, the critical value is 3.84. Chi-squared test statistic values larger than 3.84 would indicate that the genotypes are not in the expected proportions based on Hardy-Weinberg Law. For a helpful demonstration refer to this video https://youtu.be/hT9_laBrqWg
Recall that inbreeding is the increase in homozygosity over time.
To do this, you will need to simulate a large number of Wright-Fisherian populations of size 20 for 100 generations and record the final allele frequency of ‘a’ for each simulation.