Author: Grace Strobbe
1. Dependencies
2. Before You Get Started…
3. Modeling Allele Frequencies Across Generations: Introducing Key Parameters
4. Modeling the Effect of Selection on Allele Frequencies using the Selection Command
6. Reflection
7. Resources
- 7.1. Data References
- 7.2. Resources You Consulted

1. Dependencies

#This command loads required packages
#Note that you may need to install the learnPopGen package first, e.g. with install.packages("learnPopGen")
library(learnPopGen)

Important tip: In this and the next exercise, we will focus entirely on running different evolutionary models. The plots for these models are generated automatically, and you do not need to use ggplot to produce the outputs.

2. Before You Get Started…

In lecture, I asked you to pause and predict what happens to the allele frequency of an allele A that provides a fitness advantage to its carriers. What was your prediction?

My prediction for an allele A that provides a fitness advantage is that, over time, it would completely take over the population (i.e., become fixed).

3. Modeling Allele Frequencies Across Generations: Introducing Key Parameters

Applying the Hardy-Weinberg (HW) principle is a powerful approach to test whether any evolutionary forces are impacting a population at any given locus. It does not tell us, however, which evolutionary forces might be acting on that particular locus. In the following sections, we will get to know different evolutionary forces and discover how they impact allele frequencies across generations.

First, let’s consider how natural selection affects allele frequencies. Natural selection essentially manifests itself as different genotypes having different fitnesses. Mathematically, we assign a fitness of 1 to one genotype (e.g., AA), and the fitness of the other genotypes can then be defined relative to AA with the parameter s (the strength of selection). For example, if the genotype aa has a 10 % higher fitness than AA (s=0.1), its fitness would be 1+s = 1.1. If genotype Aa has a 15 % lower fitness than AA (s=-0.15), its fitness would be 1+s = 0.85.

It takes just a few key parameters for us to model how selection affects allele frequencies over time: (1) The strength of selection, s. (2) The mode of inheritance (dominant/recessive vs. additive). If an allele at a particular locus is dominant over another, s for the heterozygote is the same as for the dominant homozygote. Inheritance, however, may not be dominant/recessive, in which case you need two values of s to describe how selection might impact evolution.
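For reference, the standard deterministic one-locus model behind this kind of simulation updates the frequency p of allele A once per generation, with q = 1 - p and w_AA, w_Aa, w_aa the three genotype fitnesses. It assumes random mating, an effectively infinite population, and no mutation, migration, or drift. We assume, without having verified it in the package source, that selection() iterates an equivalent recursion.

```latex
\begin{aligned}
\bar{w} &= p^2 w_{AA} + 2pq\, w_{Aa} + q^2 w_{aa} \\
p' &= \frac{p^2 w_{AA} + pq\, w_{Aa}}{\bar{w}} \\
\Delta p = p' - p &= \frac{pq\left[\,p\,(w_{AA} - w_{Aa}) + q\,(w_{Aa} - w_{aa})\,\right]}{\bar{w}}
\end{aligned}
```

With the dominant/recessive fitnesses (1, 1, 1+s), the first bracketed term is zero and the expression reduces to Δp = -s p q² / w̄. The q² factor is why change becomes very slow once a is rare.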

4. Modeling the Effect of Selection on Allele Frequencies using the Selection Command

We can model how allele frequencies change in response to selection with the selection() function. All we need to do is tell the function the fitness (w) of the three genotypes (AA, Aa, aa). If the mode of inheritance is dominant/recessive, fitness is defined as:

s <- -0.1                    # example selection coefficient against aa
fitness <- c(1, 1, 1 + s)    # fitness of AA, Aa, aa

If the mode of inheritance is not dominant/recessive, fitness is defined as:

s1 <- -0.05; s2 <- -0.1             # example selection coefficients for Aa and aa
fitness <- c(1, 1 + s1, 1 + s2)     # fitness of AA, Aa, aa
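To make the two fitness definitions concrete, the sketch below builds both vectors for one selection coefficient. The value s = -0.2 is an arbitrary example. The additive case is one common form of "not dominant/recessive" inheritance, in which the heterozygote sits exactly halfway between the two homozygotes, so s1 = s2 / 2.

```r
# Example selection coefficient (arbitrary illustration value)
s <- -0.2

# Dominant/recessive: Aa has the same fitness as AA; only aa is affected
w_dominant <- c(AA = 1, Aa = 1, aa = 1 + s)

# Additive: Aa is halfway between AA and aa, so s1 = s / 2 and s2 = s
s1 <- s / 2
s2 <- s
w_additive <- c(AA = 1, Aa = 1 + s1, aa = 1 + s2)

print(w_dominant)  # AA 1.0, Aa 1.0, aa 0.8
print(w_additive)  # AA 1.0, Aa 0.9, aa 0.8
```

The names are only labels. unname() strips them if a plain numeric vector is needed.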

Other required parameters are p0 (the starting frequency of allele A, i.e., p) and time (the number of generations you want to run the simulation for). We define these as:

start.freq <- 0.5   # any number between 0 and 1

gen.time <- 100     # a positive whole number of generations; be cautious with high numbers, as they use significant computation time

Setting these parameters, we can model the effects of selection with the selection() function as follows:

result <- selection(w=fitness, time=gen.time, p0=start.freq)
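To show what a call like this computes, here is a minimal base-R sketch of the standard deterministic recursion (random mating, no drift, mutation, or migration). The helpers next_p() and simulate_p() are hypothetical and not part of learnPopGen. We assume, but have not checked, that selection() iterates an equivalent update. Like the tables used in this exercise, the result has one entry per generation, starting with p0.

```r
# Hypothetical helper (not part of learnPopGen): one generation of selection.
# w = fitnesses of AA, Aa, aa (in that order); p = frequency of allele A.
next_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]  # mean fitness
  (p^2 * w[1] + p * q * w[2]) / wbar                   # frequency of A after selection
}

# Hypothetical helper: iterate next_p() and return p for each generation,
# starting with p0, so the result has `time` entries.
simulate_p <- function(p0, w, time) {
  p <- numeric(time)
  p[1] <- p0
  for (i in seq_len(time - 1) + 1) {
    p[i] <- next_p(p[i - 1], w)
  }
  p
}

# Recessive lethal (aa fitness 0), p0 = 0.5, 50 generations
p <- simulate_p(p0 = 0.5, w = c(1, 1, 0), time = 50)
tail(data.frame(Generation = 1:50, p = p))
```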

Let’s put this to the test. Let’s model the effect of selection on a gene with a dominant/recessive mode of inheritance, with the recessive allele being deleterious (i.e., the aa homozygote has a fitness of zero). Set the starting allele frequency at 0.5 and run the simulation for 50 generations.

#Define the fitness of each genotype
fitness <- c(1, 1, 0)

#Define the starting allele frequency p0
start.freq <- 0.5

#Define how many generations you want to simulate
gen.time <- 50

#Model the allele changes and store the results in an object called r
r <- selection(w=fitness, time=gen.time, p0=start.freq)

#By default, the selection() function plots p, but you can also plot q by adding an extra argument
r <- selection(w=fitness, time=gen.time, p0=start.freq, show = "q")

Note that running the selection() function automatically plots the results of the model. You can also take a look at the numbers in a table using the following code.

#You can also look at the actual numbers if you want to
r.table <- data.frame(1:gen.time, r$p)
names(r.table) <- c("Generation", "p")

#Show last part of the table in the results
tail(r.table)

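I hope you can observe that the frequency of A (p) increases over time. The change slows as A becomes more common, and the curve eventually flattens. Once a is rare, most a alleles are carried by Aa heterozygotes, which have the same fitness as AA, so selection can act only on the few remaining aa homozygotes.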
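For a recessive lethal, the slowdown has a closed form. With fitnesses (1, 1, 0), the standard recursion reduces to q' = q / (1 + q), so 1/q grows by exactly 1 each generation and q_t = q0 / (1 + t q0). The base-R sketch below is independent of learnPopGen. It checks the closed form against direct iteration and shows how the per-generation change in q shrinks.

```r
# Recessive lethal (w = 1, 1, 0): q' = q / (1 + q), so 1/q grows by 1 each generation
q0 <- 0.5
gens <- 0:49  # generations elapsed

q_closed <- q0 / (1 + gens * q0)  # closed-form solution

q_iter <- numeric(length(gens))   # direct iteration of q' = q / (1 + q)
q_iter[1] <- q0
for (i in 2:length(gens)) q_iter[i] <- q_iter[i - 1] / (1 + q_iter[i - 1])

# Change in q per generation: large at first, tiny once a is rare
delta_q <- diff(q_iter)
print(head(delta_q, 3))  # about -0.167, -0.083, -0.050
print(tail(delta_q, 3))  # about -0.00043, -0.00041, -0.00039
```

With q0 = 0.5, the closed form simplifies to q_t = 1 / (t + 2). Halving q therefore takes roughly as many generations as have already elapsed.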

4.1. Varying the Strength of Selection

In the coming set of simulations, let’s explore how varying the strength of selection (s) impacts how allele frequencies change across generations. We will focus on a single gene with complete dominance of the A allele. In these first scenarios, the dominant phenotype has a selective advantage over the recessive one, but the A allele is starting at a very low frequency in the population (p0=0.0001). In the first example, the strength of selection against the recessive homozygotes is s=-0.1. Run the simulation and record what happens to the allele frequencies of A and a over 1000 generations.

#Define the fitness of each genotype
fitness1 <- c(1, 1, 0.9)  # aa fitness = 1 + s = 1 - 0.1

#Define the starting allele frequency p0
start.freq1 <- 0.0001

#Define how many generations you want to simulate
gen.time1 <- 1000

#Model the allele changes and store the results in an object r1; note that the argument time sets how many generations will be simulated
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)

The frequency of A appears to increase very quickly and then level out around generation 300. The steep increase during roughly the first 100 generations shows that selection acts against the a allele (s = -0.1 for aa homozygotes), so A, which confers the higher fitness, rises in frequency.
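To put a number on the fast early phase, the sketch below reuses the standard recursion in base R. The helper next_p() is hypothetical, not a learnPopGen function. The sketch uses the fitness vector c(1, 1, 0.9), i.e. s = -0.1 against aa, and reports the first generation in which A becomes the majority allele, along with p after 1000 generations. The numbers come from this sketch, not from selection().

```r
# Hypothetical helper (not part of learnPopGen): one generation of selection
next_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / wbar
}

w <- c(1, 1, 0.9)      # A dominant; aa has fitness 1 + s with s = -0.1
p <- numeric(1000)     # one entry per generation, starting with p0
p[1] <- 0.0001
for (i in 2:1000) p[i] <- next_p(p[i - 1], w)

first_majority <- which(p > 0.5)[1]  # first generation in which A is the majority allele
print(first_majority)
print(p[1000])                       # frequency of A in the last generation
```

While A is rare, p grows by a factor of about 1 / 0.9 per generation. Growth then slows as the remaining a alleles hide in heterozygotes.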

To truly understand how the strength of selection impacts allele frequencies, we need to run multiple models with different fitnesses for our genotypes. Using the same parameters as above, run multiple models with the selection coefficient varying between -1 and 0. Rather than plotting each model in a separate graph, you can overlay them in a single graph using the add=TRUE argument.

#Define the fitness of each genotype. Unlike above, you want to vary the strength of selection, so we will define multiple fitness sets (one for each model you want to run)
fitness1 <- c(1,1,1)
fitness2 <- c(1,1,0.7)
fitness3 <- c(1,1,0.5)
fitness4 <- c(1,1,0.3)
fitness5 <- c(1,1,0)

#Define the starting allele frequency p0. This parameter will be the same for all models, so you only need to define it once.
start.freq1 <- 0.0001

#Define how many generations you want to simulate. This parameter will be the same for all models, so you only need to define it once.
gen.time1 <- 1000

#Model the allele changes for the different strengths of selection. You need to run one model for each fitness distribution you defined above. Note that you can plot all results in a single graph by adding "add=TRUE" starting at the second model. 
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)
r2 <- selection(w=fitness2, time=gen.time1, p0=start.freq1, add=TRUE, color = 'red')
r3 <- selection(w=fitness3, time=gen.time1, p0=start.freq1, add=TRUE, color = 'blue')
r4 <- selection(w=fitness4, time=gen.time1, p0=start.freq1, add=TRUE, color = 'green')
r5 <- selection(w=fitness5, time=gen.time1, p0=start.freq1, add=TRUE, color = 'purple')
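One compact way to compare the five runs is to record, for each aa fitness, the first generation in which p exceeds 0.5. The base-R sketch below reimplements the standard recursion with a hypothetical helper rather than calling selection(), so its numbers come from the sketch, not from learnPopGen.

```r
# Hypothetical helper (not part of learnPopGen): one generation of selection
next_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / wbar
}

aa_fitness <- c(1, 0.7, 0.5, 0.3, 0)  # same aa fitnesses as fitness1 ... fitness5

gen_to_half <- sapply(aa_fitness, function(w_aa) {
  p <- numeric(1000)
  p[1] <- 0.0001
  for (i in 2:1000) p[i] <- next_p(p[i - 1], c(1, 1, w_aa))
  which(p > 0.5)[1]  # NA if p never exceeds 0.5
})

print(data.frame(aa_fitness, s = aa_fitness - 1, gen_to_half))
```

With s = 0, p never changes, so the result is NA. With a recessive lethal (s = -1), a single generation is enough to push p above 0.5 from p0 = 0.0001.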

Looking at the results of your models, how does varying the strength of selection affect changes in allele frequencies across generations?

As selection against aa becomes stronger (i.e., as s decreases from 0 toward -1), the time it takes for p to approach 1 decreases. For example, when aa has a fitness of 0 (s = -1), A reaches high frequencies much sooner than when aa has a fitness of 0.7 (s = -0.3). With no selection (fitness1, s = 0), p does not change at all.

4.2. Selection on Dominant vs. Recessive phenotypes

Let’s examine what happens when the phenotype with higher fitness is dominant vs. recessive. First, set the starting frequency of the A allele to p0=0.01, and give the dominant phenotype a fitness advantage over the recessive phenotype (s against the recessive homozygotes is -0.2). Run the simulation for 1000 generations.

#Define the fitness of each genotype
fitness1 <- c(1,1,0.8)

#Define the starting allele frequency p0
start.freq1 <- 0.01

#Define how many generations you want to simulate
gen.time1 <- 1000

#Model the allele changes and store the results in an object r1; note that the argument time sets how many generations will be simulated
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)

#In this case, it may be useful to look at the actual numbers for p at the end of your simulation
r1.table <- data.frame(1:gen.time1, r1$p)
names(r1.table) <- c("Generation", "p")
tail(r1.table)

Now let’s examine the opposite scenario. Set the starting allele frequency p0=0.99 but this time give the recessive phenotype the fitness advantage over the dominant phenotype (same s as above). Run the simulation for 1000 generations.

#Define the fitness of each genotype
fitness2 <- c(0.8,0.8,1)  # AA and Aa share the dominant phenotype (s = -0.2); aa has the advantage

#Define the starting allele frequency p0
start.freq2 <- 0.99

#Define how many generations you want to simulate
gen.time2 <- 1000

#Model the allele changes and store the results in an object r2; note that the argument time sets how many generations will be simulated
r2 <- selection(w=fitness2, time=gen.time2, p0=start.freq2)

#In this case, it may be useful to look at the actual numbers for p at the end of your simulation
r2.table <- data.frame(1:gen.time2, r2$p)
names(r2.table) <- c("Generation", "p")
tail(r2.table)

Compare and contrast the results of the two simulations. What do you notice about the pace of change and the final allele frequencies? Why does the recessive allele persist in a population when selected against, but the dominant allele does not?

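The results of the two simulations look similar at first, but they differ once we look at the allele being selected against. In the first simulation, the recessive allele a is still present at a low frequency after 1000 generations. Once it is rare, almost all copies are hidden in Aa heterozygotes, which have the same fitness as AA, so selection acts only on the rare aa homozygotes. In the second simulation, the favored recessive allele increases slowly at first, because few aa homozygotes exist while it is rare (this is where the difference in starting frequencies matters). The deleterious dominant allele A, however, is expressed in every carrier, so selection drives it to essentially zero.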
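The difference is easiest to see when the selected-against allele is rare. The base-R sketch below computes the one-generation change in that allele's frequency when it sits at 0.01. The helper delta_p() is hypothetical and implements the standard Δp formula. The sketch applies the same s = -0.2 first against the recessive phenotype, c(1, 1, 0.8), and then against the dominant phenotype, c(0.8, 0.8, 1).

```r
# Hypothetical helper: change in the frequency of A over one generation
delta_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  p * q * (p * (w[1] - w[2]) + q * (w[2] - w[3])) / wbar
}

# Recessive phenotype selected against; the rare allele is a (q = 0.01)
change_recessive <- -delta_p(p = 0.99, w = c(1, 1, 0.8))  # change in q

# Dominant phenotype selected against; the rare allele is A (p = 0.01)
change_dominant <- delta_p(p = 0.01, w = c(0.8, 0.8, 1))  # change in p

print(c(recessive_q = change_recessive, dominant_p = change_dominant))
```

Here the rare recessive allele is removed roughly 100 times more slowly. Selection sees it only in aa homozygotes (frequency q²), whereas a rare dominant allele is exposed in every carrier.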

4.3. Selection For Heterozygotes; Symmetrical Fitness between Homozygotes

So far we have only considered scenarios where fitness was distributed in a dominant/recessive way. Now we are going to investigate the consequences of selection for heterozygous individuals (i.e., heterozygotes having a fitness advantage over homozygous individuals). Give the heterozygous genotypes a 1 % fitness advantage over both of the homozygous genotypes. Run multiple simulations for 1000 generations and vary the starting allele frequency for values between 0 and 1.

#Define the parameters and set up the models
fitness1=c(1,1.01,1)
fitness2=c(1,1.01,1)
fitness3=c(1,1.01,1)
gen.time=1000
start.freq1=0.9
start.freq2=0.5
start.freq3=0.1
results1 <- selection(time = gen.time, p0=start.freq1, w=fitness1)
results2 <- selection(time = gen.time, p0=start.freq2, w=fitness2, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq3, w=fitness3, add=TRUE, color="blue")

What happens to the allele frequencies in the population? Can you explain why?

Because heterozygotes have a fitness advantage, neither allele is lost. Whenever one allele becomes common, the rarer allele is carried mostly by the fitter heterozygotes and therefore increases. As a result, all three simulations move toward the same stable equilibrium. Because the two homozygotes have equal fitness, that equilibrium is p = 0.5, where the proportion of heterozygotes (2pq) is highest. The starting frequency only determines the direction and size of the change: when A starts common (p0 = 0.9), p drops toward 0.5, and when A starts rare (p0 = 0.1), p rises toward 0.5.
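Setting Δp = 0 in the standard recursion gives an interior equilibrium p̂ = (w_aa - w_Aa) / (w_AA - 2 w_Aa + w_aa). The base-R sketch below uses hypothetical helpers and is independent of learnPopGen. It computes p̂ for the fitness set used here and checks that all three starting frequencies end close to it after 1000 generations.

```r
# Hypothetical helper (not part of learnPopGen): one generation of selection
next_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / wbar
}
# Hypothetical helper: interior equilibrium, the p at which delta p = 0 (0 < p < 1)
p_hat <- function(w) (w[3] - w[2]) / (w[1] - 2 * w[2] + w[3])

w <- c(1, 1.01, 1)  # 1 % heterozygote advantage, equal homozygotes

final_p <- sapply(c(0.9, 0.5, 0.1), function(p0) {
  p <- p0
  for (i in 2:1000) p <- next_p(p, w)  # 999 updates, i.e. generation 1000
  p
})

print(p_hat(w))
print(final_p)
```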

4.4. Selection For Heterozygotes; Asymmetrical Fitness between Homozygotes

What happens if heterozygous individuals have a fitness advantage, but selection is stronger against one of the homozygous genotypes? Set the starting allele frequency to p0=0.1, and give the aa genotype a fitness of 0.0. Run the simulation with various fitness values for the AA genotype (between 0 and 1), while keeping the fitness of the heterozygotes at 1.

#Define the parameters and set up the models
fitness1=c(0.9,1,0.0)
fitness2=c(0.5,1,0.0)
fitness3=c(0.1,1,0.0)
gen.time=100
start.freq=0.1

results1 <- selection(time = gen.time, p0=start.freq, w=fitness1)
results2 <- selection(time = gen.time, p0=start.freq, w=fitness2, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq, w=fitness3, add=TRUE, color="blue")

How does changing the fitness of AA individuals influence the final allele frequencies in the population over time? Will the deleterious allele ever be lost from the population? What are the lowest and highest frequencies the a allele can reach? Why?

The final frequency of A depends on the fitness of AA individuals: the lower the fitness of AA, the lower the final frequency of A (and the higher the final frequency of a). The deleterious a allele is never lost, even though aa individuals have a fitness of zero, because it persists in the fitter Aa heterozygotes. The final (equilibrium) frequency of a can therefore range from close to 0, when the fitness of AA approaches that of the heterozygotes, up to a maximum of 0.5, when the fitness of AA approaches 0 so that both homozygotes are equally unfit.
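The equilibrium can be computed directly. With aa lethal and w_AA = 1 - s, the standard formula p̂ = (w_aa - w_Aa) / (w_AA - 2 w_Aa + w_aa) gives q̂ = s / (1 + s). This approaches 0 as w_AA approaches 1 and cannot exceed 0.5 because s ≤ 1. The base-R sketch below evaluates it for the three AA fitnesses used above. The helper p_hat() is hypothetical, not a learnPopGen function.

```r
# Hypothetical helper (not part of learnPopGen): interior equilibrium frequency of A
p_hat <- function(w) (w[3] - w[2]) / (w[1] - 2 * w[2] + w[3])

# AA fitnesses used above; Aa = 1 and aa = 0 in every case
aa_lethal <- data.frame(AA_fitness = c(0.9, 0.5, 0.1))
aa_lethal$q_hat <- sapply(aa_lethal$AA_fitness, function(w_AA) 1 - p_hat(c(w_AA, 1, 0)))

print(aa_lethal)
```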

4.5. Selection Against Heterozygotes; Symmetrical Fitness between Homozygotes

Selection may also act against heterozygotes. How do allele frequencies change if the two homozygotes have equal fitness, but heterozygotes have a 50 % reduced fitness? Simulate allele frequency changes with multiple values of p0 (between 0 and 1).

#Define the parameters and set up the models
fitness=c(1,0.5,1)
gen.time=100
start.freq1=0.9
start.freq2=0.5
start.freq3=0.1

results1 <- selection(time = gen.time, p0=start.freq1, w=fitness)
results2 <- selection(time = gen.time, p0=start.freq2, w=fitness, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq3, w=fitness, add=TRUE, color="blue")

What happens? Can you explain why?

In the graph and data above, the outcome depends mostly on the starting frequency of A. When A starts at a high frequency (p0 = 0.9), it ends at a high frequency. When it starts at a low frequency (p0 = 0.1), the population ends up almost entirely aa. When A starts at 0.5, it stays at 0.5. Because AA and aa have equal fitness, neither allele is favored on its own. Instead, whichever allele is rarer is found mostly in the less fit heterozygotes and is selected against. The point p = 0.5 is therefore an unstable equilibrium: any deviation from it grows until the more common allele is fixed.
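The instability of p = 0.5 is easy to check numerically. The base-R sketch below starts the standard recursion just above, exactly at, and just below 0.5 with fitnesses c(1, 0.5, 1), then runs it for 100 generations. The helper next_p() is hypothetical, not a learnPopGen function.

```r
# Hypothetical helper (not part of learnPopGen): one generation of selection
next_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / wbar
}

w <- c(1, 0.5, 1)  # heterozygotes at 50 % reduced fitness, equal homozygotes

# Start just above, exactly at, and just below p = 0.5; run for 100 generations
final_p <- sapply(c(0.51, 0.5, 0.49), function(p0) {
  p <- p0
  for (i in 2:100) p <- next_p(p, w)
  p
})

print(final_p)
```

A starting difference of only 0.01 from the equilibrium is enough to send the population to fixation of one allele or the other.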

4.6. Selection Against Heterozygotes; Asymmetrical Fitness between Homozygotes

Now conduct the same simulation, but the homozygotes have a different fitness. Again, run the simulation multiple times with different p0. You can also play with different fitness distributions to observe additional patterns.

#Define the parameters and set up the models
fitness1=c(0.9,0.5,0.4)

fitness2=c(0.5,0.5,0.456)
fitness3=c(0.3,0.5,0.1)
gen.time=100
start.freq1=0.9
start.freq2=0.5
start.freq3=0.1
results1 <- selection(time = gen.time, p0=start.freq1, w=fitness1)
results2 <- selection(time = gen.time, p0=start.freq2, w=fitness2, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq3, w=fitness3, add=TRUE, color="blue")

What happens? Can you explain why?

It appears that the allele with the higher fitness reaches the highest frequency in the population, and the allele that is more common at the start influences how common it will be at the end. When the starting frequency of A is lower, the final frequency of A (and of AA individuals) is also lower.
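In the fitness sets above, the heterozygote is never less fit than both homozygotes. fitness3 = c(0.3, 0.5, 0.1), for instance, actually favors heterozygotes. As a sketch of genuine heterozygote disadvantage with unequal homozygotes, the base-R code below uses the assumed fitnesses c(1, 0.5, 0.8), which are not from the exercise. Here the interior equilibrium p̂ = (w_aa - w_Aa) / (w_AA - 2 w_Aa + w_aa) = 0.375 is unstable. A, although carried by the fittest genotype, is lost if it starts below p̂ and fixed if it starts above. The helpers are hypothetical, not learnPopGen functions.

```r
# Hypothetical helpers (not part of learnPopGen)
next_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / wbar
}
p_hat <- function(w) (w[3] - w[2]) / (w[1] - 2 * w[2] + w[3])

w <- c(1, 0.5, 0.8)  # assumed example: AA fittest, Aa least fit
print(p_hat(w))      # unstable equilibrium

# Start below and above the equilibrium; run for 100 generations
final_p <- sapply(c(0.3, 0.45), function(p0) {
  p <- p0
  for (i in 2:100) p <- next_p(p, w)
  p
})
print(final_p)
```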

6. Reflection

If you think back to your initial prediction, what have you learned from running all of these models? What are some misconceptions about selection that you were able to clarify with these simulations?

My initial prediction was somewhat correct. The higher the fitness an allele confers, the more likely it is that the population will end up made mostly of individuals carrying that allele. However, the allele does not always become fixed: when heterozygotes have the highest fitness, for example, both alleles persist. The starting frequency also matters. With selection against heterozygotes, even an allele carried by the fittest genotype can be lost if it starts out rare.

7. Resources

7.1. Data References

This exercise does not contain original data.

7.2. Resources You Consulted

Consulting additional resources to solve this assignment is absolutely allowed, but failure to disclose those resources is plagiarism. Please list any collaborators you worked with and resources you used below or state that you have not used any.

- RPubs

Simulating the Effects of Selection and Drift