Nathan Stewart 811847789

fitness1=c(1,1,0.9)
fitness2=c(1,1,0.5)
fitness3=c(1,1,0.1)
gen.time=1000
start.freq=0.5

results1 <- selection(time = gen.time, p0=start.freq, w=fitness1)
results2 <- selection(time = gen.time, p0=start.freq, w=fitness2, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq, w=fitness3, add=TRUE, color="blue")

This is problem 1. Black is the weakest selection and blue is the strongest. The strength of selection only changes the speed at which A becomes fixed, i.e., the speed of evolution. The fitness vector is always given in the order AA, Aa, aa.

1. Initiate the Project

#This command loads required packages
#Note that you may need to install the learnPopGen package first, e.g., with install.packages("learnPopGen")
library(learnPopGen)

Important tip: In this and the next exercise, we will focus entirely on running different evolutionary models. The plots for these models are generated automatically, and you do not need to use ggplot to produce the outputs.

2. Before You Get Started…

In the introduction video, I asked you to pause and predict what happens to the allele frequency of an allele A that provides a fitness advantage to its carriers. What is your prediction?

I would predict that an allele that provides a fitness advantage would increase in frequency until it eventually becomes fixed (reaches a frequency of 1) in the population.

3. Modeling Allele Frequencies Across Generations: Introducing Key Parameters

Applying the HW principle is a powerful approach to test whether any evolutionary forces are impacting a population at any given locus. It does not tell us, however, what evolutionary force might be acting on that locus. In the following sections, we get to know different evolutionary forces and discover how they impact allele frequencies across generations.

First let’s consider how natural selection affects allele frequencies. Natural selection essentially manifests itself in different genotypes having different fitnesses. Mathematically, we assign a fitness of 1 to one genotype (AA), and the fitness of the other genotypes can then be defined relative to AA with the parameter s (the strength of selection). For example, if the genotype aa has a 10% higher fitness than AA (s=0.1), its fitness would be 1+s = 1.1. If genotype Aa has a 15% lower fitness than AA (s=-0.15), its fitness would be 1+s = 0.85.

It takes just a few key parameters for us to model how selection affects allele frequency over time: (1) The strength of selection, s. (2) The mode of inheritance (dominant/recessive vs. additive). If an allele at a particular locus is dominant over another, s for the heterozygote is the same as for the dominant homozygote. Inheritance, however, may not be dominant/recessive, in which case you need two values of s to describe how selection might impact evolution.

4. Modeling the Effect of Selection on Allele Frequencies using the Selection Command

We can model the genotype frequencies in response to selection with the selection function. All we need to do is to tell the function the fitness (w) of the three genotypes (AA, Aa, aa). If the mode of inheritance is dominant/recessive, fitness is defined as:

s <- -0.1          #Example value for the strength of selection against aa
fitness <- c(1, 1, 1+s)

If the mode of inheritance is not dominant/recessive, fitness is defined as:

s1 <- -0.05; s2 <- -0.1     #Example values for the selection coefficients of Aa and aa
fitness <- c(1, 1+s1, 1+s2)
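The two definitions above can be wrapped in small helper functions so that a fitness vector is built directly from the selection coefficients. This is a minimal sketch: the function names are hypothetical (they are not part of learnPopGen), and the additive case assumes the common convention that the heterozygote's fitness lies exactly halfway between the two homozygotes.

```r
# Hypothetical helpers (not part of learnPopGen) that build the fitness
# vector c(AA, Aa, aa), with the fitness of AA fixed at 1.

# Dominant/recessive: A is dominant, so Aa has the same fitness as AA.
fitness_dominant <- function(s) c(1, 1, 1 + s)

# General case: one coefficient for Aa (s1) and one for aa (s2).
fitness_general <- function(s1, s2) c(1, 1 + s1, 1 + s2)

# Additive case (assumed convention): Aa sits halfway between AA and aa.
fitness_additive <- function(s) fitness_general(s / 2, s)

fitness_dominant(-0.1)        # c(1, 1, 0.9)
fitness_general(0.15, -0.1)   # c(1, 1.15, 0.9)
fitness_additive(-0.2)        # c(1, 0.9, 0.8)
```

Any of these vectors can then be passed to the `w` argument of `selection`.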

Other parameters that are required are p0 (the starting frequency p of allele A) and time (the number of generations you want to run the simulation for). We define these as:

start.freq <- 0.5   #Any number between 0 and 1

gen.time <- 100     #Any positive whole number; be cautious with high numbers, as they use significant computation time

Setting these parameters, we can model the effects of selection with the “selection” command as follows:

result <- selection(w=fitness, time=gen.time, p0=start.freq)
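For intuition about what a model like this computes, here is a minimal base-R sketch of the standard single-locus selection recursion, p' = (p^2 w_AA + p q w_Aa) / w_bar, where w_bar = p^2 w_AA + 2pq w_Aa + q^2 w_aa is the mean fitness. It assumes random mating, a very large population, and no mutation, migration or drift. It is not learnPopGen's source code, and the names `next_p` and `simulate_p` are hypothetical; element 1 of the returned vector is the starting frequency, so it lines up with the `data.frame(1:gen.time, ...)` tables used in this exercise.

```r
# Minimal sketch (not learnPopGen's source) of one-locus, two-allele
# selection with random mating. w = c(w_AA, w_Aa, w_aa); p = frequency of A.
next_p <- function(p, w) {
  q <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]  # mean fitness
  (p^2 * w[1] + p * q * w[2]) / w_bar                  # frequency of A next generation
}

# Hypothetical wrapper: returns `time` values, the first being p0.
simulate_p <- function(w, p0, time) {
  p <- numeric(time)
  p[1] <- p0
  for (i in seq_len(time - 1)) p[i + 1] <- next_p(p[i], w)
  p
}

# aa lethal, A dominant, p0 = 0.5, 50 generations
p <- simulate_p(w = c(1, 1, 0), p0 = 0.5, time = 50)
round(tail(p), 4)
```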

Let’s put this to the test. Let’s model the effect of selection on a gene with a dominant/recessive mode of inheritance, in which the recessive homozygote aa is lethal (i.e., its fitness is zero). Set the starting allele frequency at 0.5 and run the simulation for 50 generations.

#Define the fitness of each genotype
fitness <- c(1, 1, 0)

#Define the starting allele frequency p0
start.freq <- 0.5

#Define how many generations you want to simulate
gen.time <- 50

#Model the allele changes and store the results in an object called r
r <- selection(w=fitness, time=gen.time, p0=start.freq)

Note that running the selection command automatically plots the results of the model. You can also directly look at the numbers as a table using the following code.

#You can also look at the actual numbers if you want to
r.table <- data.frame(1:gen.time, r$p)
names(r.table) <- c("Generation", "p")

#Show last part of the table in the results
tail(r.table)

I hope what you can observe is that the frequency of A (p) increases over time. The speed of change slows as A gets more common, and it eventually flattens.
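The flattening can be worked out exactly for this example. With fitness c(1, 1, 0), the recursion for the frequency of a reduces to q' = pq / (1 - q^2) = q / (1 + q), so 1/q grows by exactly 1 each generation and q_t = q_0 / (1 + t q_0). Starting from q_0 = 0.5 this gives q_t = 1/(t + 2), so the gain in p during generation t is 1/((t + 2)(t + 3)): 1/6, then 1/12, then 1/20, and so on. The gain shrinks because most remaining a alleles sit in heterozygotes, where they are not exposed to selection. This base-R sketch (independent of learnPopGen, random mating assumed) checks the closed form against the recursion:

```r
# aa is lethal and A is dominant: fitness of c(AA, Aa, aa)
w <- c(1, 1, 0)
gen.time <- 50
q <- numeric(gen.time)
q[1] <- 0.5                                   # frequency of a when p0 = 0.5
for (i in seq_len(gen.time - 1)) {
  p <- 1 - q[i]
  w_bar <- p^2 * w[1] + 2 * p * q[i] * w[2] + q[i]^2 * w[3]
  q[i + 1] <- (q[i]^2 * w[3] + p * q[i] * w[2]) / w_bar
}

# Closed form for this special case: q_t = q0 / (1 + t * q0)
gens <- 0:(gen.time - 1)                      # generations of selection so far
q_closed <- 0.5 / (1 + gens * 0.5)
all.equal(q, q_closed)                        # TRUE: recursion and closed form agree

# Gain in p per generation (= drop in q): 1/6, 1/12, 1/20, ... shrinking
delta_p <- -diff(q)
head(delta_p)
```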

4.1. Varying the Strength of Selection

In the coming set of simulations, let’s explore how varying the strength of selection impacts how allele frequencies change across generations. We focus on a single gene with complete dominance of the A allele. In these first scenarios the dominant phenotype has a selective advantage over the recessive one, but the A allele is starting at a very low frequency in the population (p0=0.0001). In the first example, the strength of selection against the recessive homozygotes is s=-0.1. Run the simulation and record what happens to the allele frequencies of A and a over 1000 generations. What do you observe? Explain the pattern.

#Define the fitness of each genotype
fitness1 <- c(1, 1, 0.9)   #s = -0.1 against aa, so the fitness of aa is 1+s = 0.9

#Define the starting allele frequency p0
start.freq1 <- 0.0001 

#Define how many generations you want to simulate
gen.time1 <- 1000

#Model the allele changes and store the results in an object r1; note that the parameter time just signifies how many generations will be simulated
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)

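I observed that the frequency of A shoots up very quickly and appears to level out within about 300 generations, with the sharpest increase during roughly the first 100 generations. Because selection acts against the aa homozygotes (s = -0.1), the A allele is favored, so even though A starts at a very low frequency, it increases. The increase is fast at first because A is dominant, so every carrier of A benefits, and it slows near fixation because the remaining a alleles are mostly hidden in heterozygotes, where they are not selected against.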
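The early, almost exponential rise can be checked with a little arithmetic. While A is rare, nearly every copy of A sits in a heterozygote with fitness 1 and q is close to 1, so p' = p / w_bar with w_bar ≈ 1 - 0.1 = 0.9; A therefore grows by roughly 11% per generation. This sketch assumes the standard random-mating recursion; it is not learnPopGen's code, and `next_p` is a hypothetical helper.

```r
# Hypothetical helper (not learnPopGen's code): one generation of selection
# with random mating; w = c(w_AA, w_Aa, w_aa), p = frequency of A.
next_p <- function(p, w) {
  q <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / w_bar
}

w <- c(1, 1, 0.9)                 # s = -0.1 against aa, A dominant
gen.time <- 1000
p <- numeric(gen.time)
p[1] <- 0.0001
for (i in seq_len(gen.time - 1)) p[i + 1] <- next_p(p[i], w)

p[2] / p[1]                       # growth factor while A is rare (close to 1 / 0.9)
which(p >= 0.5)[1]                # first generation in which p reaches 0.5
p[gen.time]                       # frequency of A at the end of the run
```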

To truly understand how the strength of selection impacts allele frequencies, we need to run multiple models assuming different fitnesses for our genotypes. Using the same parameters as above, run multiple models with the selection coefficient varying between -1 and 0. Rather than running each model separately, you can actually run them in bulk, and the following code snippet teaches you how.

#Define the fitness of each genotype. Unlike above, you want to vary the strength of selection, so we will define multiple fitness sets (one for each model you want to run)
fitness1 <- c(1,1,1)
fitness2 <- c(1,1,0.7)
fitness3 <- c(1,1,0.5)
fitness4 <- c(1,1,0.3)
fitness5 <- c(1,1,0)

#Define the starting allele frequency p0. This parameter will be the same for all models, so you only need to define it once.
start.freq1 <- 0.0001

#Define how many generations you want to simulate. This parameter will be the same for all models, so you only need to define it once.
gen.time1 <- 1000

#Model the allele changes for the different strengths of selection. You need to run one model for each fitness distribution you defined above. Note that you can plot all results in a single graph by adding "add=TRUE" starting at the second model. 
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)
r2 <- selection(w=fitness2, time=gen.time1, p0=start.freq1, add=TRUE, color = 'red')
r3 <- selection(w=fitness3, time=gen.time1, p0=start.freq1, add=TRUE, color = 'blue')
r4 <- selection(w=fitness4, time=gen.time1, p0=start.freq1, add=TRUE, color = 'green')
r5 <- selection(w=fitness5, time=gen.time1, p0=start.freq1, add=TRUE, color = 'purple')

Looking at the results of your models, how does varying the strength of selection affect changes in allele frequencies across generations?

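With decreasing strength of selection, the time it takes for the frequency of A to reach 1 increases. For example, when aa has a fitness of 0 (s = -1, purple), A rises almost immediately and approaches fixation fastest, while with weaker selection (s = -0.3, red) it takes many more generations to reach high frequencies of A. With s = 0 (black), all genotypes have the same fitness and the frequency of A does not change at all.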
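Instead of reading the crossing points off the plot, one can count the generations each model needs for p to reach 0.99. This is a minimal base-R sketch assuming the standard random-mating recursion; `gens_to_reach`, the 0.99 target and the 1000-generation cap are illustrative choices, not part of learnPopGen. Generation 1 is the starting frequency, and `NA` means the target was never reached.

```r
# Hypothetical helper (not learnPopGen's code): one generation of selection
next_p <- function(p, w) {
  q <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / w_bar
}

# Generation in which p first reaches `target` (generation 1 = p0), or NA
gens_to_reach <- function(w, p0, target = 0.99, max_gen = 1000) {
  p <- p0
  for (gen in seq_len(max_gen)) {
    if (p >= target) return(gen)
    p <- next_p(p, w)
  }
  NA_integer_
}

# Same fitness sets as in the models above: aa fitness of 1, 0.7, 0.5, 0.3 and 0
s_values <- c(0, -0.3, -0.5, -0.7, -1)
res <- sapply(s_values, function(s) gens_to_reach(c(1, 1, 1 + s), p0 = 0.0001))
setNames(res, s_values)
```

With s = 0 every genotype has the same fitness, so p never changes and the result is `NA`; the stronger the selection, the fewer generations are needed.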

4.2. Selection on Dominant vs. Recessive phenotypes

Let’s examine what happens when the phenotype with higher fitness is dominant vs. recessive. First, set the starting frequency of the A allele to p0=0.01, and give the dominant phenotype a fitness advantage over the recessive phenotype (s against the recessive homozygotes is -0.2). Run the simulation for 1000 generations.

#Define the fitness of each genotype
fitness1 <- c(1,1,0.8)

#Define the starting allele frequency p0
start.freq1 <-0.01

#Define how many generations you want to simulate
gen.time1 <- 1000

#Model the allele changes and store the results in an object r1; note that the parameter time just signifies how many generations will be simulated
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)


#In this case, it may be useful to look at the actual numbers for p at the end of your simulation
r1.table <- data.frame(1:gen.time1, r1$p)
names(r1.table) <- c("Generation", "p")
tail(r1.table)

Now let’s examine the opposite scenario. Set the starting allele frequency p0=0.99 but this time give the recessive phenotype the fitness advantage over the dominant phenotype (same s as above). Run the simulation for 1000 generations.

#Define the fitness of each genotype
fitness2 <- c(0.8,0.8,1)   #AA and Aa both show the dominant phenotype, so both have the lower fitness

#Define the starting allele frequency p0
start.freq2 <- 0.99

#Define how many generations you want to simulate
gen.time2 <- 1000

#Model the allele changes and store the results in an object r2; note that the parameter time just signifies how many generations will be simulated
r2 <- selection(w=fitness2, time=gen.time2, p0=start.freq2)


#In this case, it may be useful to look at the actual numbers for p at the end of your simulation
r2.table <- data.frame(1:gen.time2, r2$p)
names(r2.table) <- c("Generation", "p")
tail(r2.table)

Compare and contrast the results of the two simulations. What do you notice about the pace of change and the final allele frequencies? Why does the recessive allele persist in a population when selected against, but the dominant allele does not?

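The two simulations look similar at first, mirrored between the two alleles, but they differ in pace and in the final frequencies. When the dominant phenotype is favored (first simulation), A rises quickly at first, but the recessive allele a lingers at a low frequency for a long time. When the recessive phenotype is favored (second simulation), a increases only slowly while it is rare, but once it becomes common the dominant allele A is removed almost completely. The difference is not caused by the starting frequencies (the favored allele starts at 0.01 in both cases) but by dominance: a rare recessive allele is almost always carried by heterozygotes, where it is not expressed and is therefore hidden from selection. A dominant allele is expressed in every carrier, so selection acts on every copy of it.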
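The persistence of the recessive allele can be reproduced with a short base-R sketch of the standard random-mating recursion (hypothetical helper names, not learnPopGen's code). In the second scenario both AA and Aa have fitness 0.8, because heterozygotes show the dominant phenotype.

```r
# Hypothetical helpers (not learnPopGen's code); random mating assumed
next_p <- function(p, w) {
  q <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / w_bar
}
simulate_p <- function(w, p0, time) {
  p <- numeric(time)
  p[1] <- p0
  for (i in seq_len(time - 1)) p[i + 1] <- next_p(p[i], w)
  p
}

# Dominant phenotype favored: the recessive allele a is selected against
p_dom_favored <- simulate_p(w = c(1, 1, 0.8), p0 = 0.01, time = 1000)
# Recessive phenotype favored: the dominant allele A is selected against
p_rec_favored <- simulate_p(w = c(0.8, 0.8, 1), p0 = 0.99, time = 1000)

# Frequency of the disfavored allele after 1000 generations
c(recessive_a = 1 - p_dom_favored[1000], dominant_A = p_rec_favored[1000])
```

In the first run the disfavored recessive allele declines only roughly like 1/(|s| t) and is still present after 1000 generations; in the second run, once a is common, the disfavored dominant allele shrinks by a roughly constant factor (about 0.8) each generation, so its frequency becomes vanishingly small.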

4.3. Selection For Heterozygotes; Symmetrical Fitness between Homozygotes

So far we have only considered scenarios where fitness was distributed in a dominant/recessive way. Now we are going to investigate the consequences of selection for heterozygous individuals (they have a fitness advantage over homozygous individuals). Give the heterozygous genotypes a 1% fitness advantage over both of the homozygous genotypes. Run multiple simulations for 1000 generations and vary the starting allele frequency for values between 0 and 1.

#Define the parameters and set up the models
fitness1=c(1,1.01,1)
fitness2=c(1,1.01,1)
fitness3=c(1,1.01,1)
gen.time=1000
start.freq1=0.9
start.freq2=0.5
start.freq3=0.1
results1 <- selection(time = gen.time, p0=start.freq1, w=fitness1)
results2 <- selection(time = gen.time, p0=start.freq2, w=fitness2, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq3, w=fitness3, add=TRUE, color="blue")

What happens to the allele frequencies in the population? Can you explain why?

Because heterozygotes have the highest fitness, selection does not drive either allele to fixation. Instead, the population moves toward an intermediate equilibrium where both alleles are maintained. Since AA and aa have equal fitness here, that equilibrium is p = 0.5, which is also the frequency at which heterozygotes are most common (2pq is largest). The starting frequency only determines the direction and size of the change: when A starts at a high frequency (0.9), p falls toward 0.5, and when A starts at a low frequency (0.1), p rises toward 0.5. A population that starts at 0.5 stays there.
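The balance point can be calculated directly. Setting the change in p to zero gives the interior equilibrium p_hat = (w_Aa - w_aa) / (2 w_Aa - w_AA - w_aa), which for c(1, 1.01, 1) is 0.01 / 0.02 = 0.5. This base-R sketch (standard random-mating recursion; helper names are hypothetical) compares the formula with simulated runs from the three starting frequencies used above:

```r
# Interior equilibrium frequency of A (where the change in p is zero)
equilibrium_p <- function(w) (w[2] - w[3]) / (2 * w[2] - w[1] - w[3])

# Hypothetical helpers (not learnPopGen's code); random mating assumed
next_p <- function(p, w) {
  q <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / w_bar
}
simulate_p <- function(w, p0, time) {
  p <- numeric(time)
  p[1] <- p0
  for (i in seq_len(time - 1)) p[i + 1] <- next_p(p[i], w)
  p
}

w <- c(1, 1.01, 1)          # 1% heterozygote advantage, symmetric homozygotes
equilibrium_p(w)            # 0.5
final_p <- sapply(c(0.9, 0.5, 0.1), function(p0) simulate_p(w, p0, 1000)[1000])
round(final_p, 3)           # all three runs end close to 0.5
```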

4.4. Selection For Heterozygotes; Asymmetrical Fitness between Homozygotes

What happens when heterozygous individuals have a fitness advantage, but selection is stronger against one of the homozygous genotypes? Set the starting allele frequency to p0=0.1, and give the aa genotype a fitness of 0.0. Run the simulation with various fitness coefficients for the AA genotype (between 0 and 1), while keeping the fitness of the heterozygotes at 1.

#Define the parameters and set up the models
fitness1=c(0.9,1,0.0)
fitness2=c(0.5,1,0.0)
fitness3=c(0.1,1,0.0)
gen.time=100
start.freq=0.1

results1 <- selection(time = gen.time, p0=start.freq, w=fitness1)
results2 <- selection(time = gen.time, p0=start.freq, w=fitness2, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq, w=fitness3, add=TRUE, color="blue")

How does changing the fitness of AA individuals influence the final allele frequencies in the population over time? Will the deleterious allele ever be lost from the population? What are the lowest and highest frequencies the a allele can reach? Why?

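The fitness of AA individuals determines where the allele frequencies settle. When AA has a low fitness, A ends up at a lower frequency (and a at a higher frequency) than when AA has a high fitness. The deleterious a allele is never lost from the population, even though aa individuals have zero fitness, because copies of a are carried by heterozygotes, which have the highest fitness, and matings between heterozygotes keep producing new aa individuals. The population settles where the loss of a alleles in aa individuals is balanced by the loss of A alleles through the reduced fitness of AA. The frequency of a can therefore range from close to 0 (when AA is nearly as fit as Aa) up to 0.5 (when AA also has a fitness of 0, so that only heterozygotes survive).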
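For this model the end point can also be written down. With fitness c(w_AA, 1, 0), the equilibrium formula p_hat = (w_Aa - w_aa) / (2 w_Aa - w_AA - w_aa) gives p_hat = 1 / (2 - w_AA), so the frequency of a settles at q_hat = (1 - w_AA) / (2 - w_AA). This is above 0 whenever w_AA < 1 and reaches its maximum of 0.5 when w_AA = 0. A base-R sketch, assuming the standard random-mating recursion (helper names are hypothetical, and the 1000-generation run length is an illustrative choice):

```r
# Equilibrium frequency of a when w = c(w_AA, 1, 0)
q_hat <- function(w_AA) (1 - w_AA) / (2 - w_AA)

# Hypothetical helpers (not learnPopGen's code); random mating assumed
next_p <- function(p, w) {
  q <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / w_bar
}
simulate_p <- function(w, p0, time) {
  p <- numeric(time)
  p[1] <- p0
  for (i in seq_len(time - 1)) p[i + 1] <- next_p(p[i], w)
  p
}

w_AA <- c(0.9, 0.5, 0.1)
round(q_hat(w_AA), 3)        # 0.091 0.333 0.474

# Frequency of a after 1000 generations, starting from p0 = 0.1
q_sim <- sapply(w_AA, function(x) 1 - simulate_p(c(x, 1, 0), 0.1, 1000)[1000])
round(q_sim, 3)
```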

4.5. Selection Against Heterozygotes; Symmetrical Fitness between Homozygotes

Selection may also act against heterozygotes. How do allele frequencies change if the two homozygotes have equal fitness, but heterozygotes have a 50% reduced fitness? Simulate allele frequency changes with multiple values of p0 (between 0 and 1).

#Define the parameters and set up the models
fitness=c(1,0.5,1)
gen.time=100
start.freq1=0.9
start.freq2=0.5
start.freq3=0.1

results1 <- selection(time = gen.time, p0=start.freq1, w=fitness)
results2 <- selection(time = gen.time, p0=start.freq2, w=fitness, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq3, w=fitness, add=TRUE, color="blue")

What happens? Can you explain why?

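It appears that the outcome depends almost entirely on the starting frequency of A. When A starts at a high frequency (0.9), it goes to fixation; when it starts at a low frequency (0.1), it is lost and the population ends up entirely aa. When A starts at exactly 0.5, its frequency stays at 0.5. This happens because heterozygotes have the lowest fitness: whichever allele is more common is more often paired with a copy of itself, so it ends up in the low-fitness heterozygotes less often than the rarer allele does. The common allele is therefore favored and keeps increasing. At p = 0.5 both alleles are affected equally, so nothing changes, but this equilibrium is unstable: any small deviation from 0.5 grows until one allele is fixed.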
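The instability at p = 0.5 follows from the standard expression for the change in allele frequency, Δp = pq [p (w_AA - w_Aa) + q (w_Aa - w_aa)] / w_bar. With fitness c(1, 0.5, 1) this becomes Δp = 0.5 pq (p - q) / w_bar, which is positive whenever A is the more common allele, negative whenever it is rarer, and zero at exactly p = 0.5. This base-R sketch (hypothetical helper names, standard random-mating recursion) starts just above, at, and just below 0.5:

```r
# Hypothetical helpers (not learnPopGen's code); random mating assumed
next_p <- function(p, w) {
  q <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / w_bar
}
simulate_p <- function(w, p0, time) {
  p <- numeric(time)
  p[1] <- p0
  for (i in seq_len(time - 1)) p[i + 1] <- next_p(p[i], w)
  p
}

w <- c(1, 0.5, 1)                 # heterozygotes have half the fitness
start <- c(0.6, 0.5, 0.4)         # just above, at, and just below p = 0.5
final_p <- sapply(start, function(p0) simulate_p(w, p0, 100)[100])
round(final_p, 4)                 # A fixed, unchanged, lost
```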

4.6. Selection Against Heterozygotes; Asymmetrical Fitness between Homozygotes

Now conduct the same simulation, but the homozygotes have a different fitness. Again, run the simulation multiple times with different p0. You can also play with different fitness distributions to observe additional patterns.

#Define the parameters and set up the models
#Heterozygotes (0.5) have a lower fitness than both homozygotes, and AA (1) is fitter than aa (0.8)
fitness=c(1,0.5,0.8)
gen.time=100
start.freq1=0.9
start.freq2=0.5
start.freq3=0.1
results1 <- selection(time = gen.time, p0=start.freq1, w=fitness)
results2 <- selection(time = gen.time, p0=start.freq2, w=fitness, add=TRUE, color="red")
results3 <- selection(time = gen.time, p0=start.freq3, w=fitness, add=TRUE, color="blue")

What happens? Can you explain why?

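As in the symmetric case, one allele ends up fixed, and which one depends on the starting frequency. Because the homozygotes now differ in fitness, however, the threshold that separates the two outcomes is no longer at p = 0.5. It shifts so that the allele whose homozygote has the higher fitness has the larger range of starting frequencies leading to its fixation, so it can win even when it starts below 50%. If A starts below the threshold, it is lost and the population ends up fixed for a.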
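The shifted threshold can be computed from the same Δp expression as before: it is the frequency p* = (w_aa - w_Aa) / (w_AA - 2 w_Aa + w_aa) at which the change in p is zero. This sketch uses c(1, 0.5, 0.8) as an illustrative fitness vector (an assumption, with AA fitter than aa); p* is then 0.3 / 0.8 = 0.375, so A should go to fixation from starting frequencies above 0.375 even though they are below 0.5. Helper names are hypothetical and the recursion assumes random mating.

```r
# Unstable threshold: frequency of A at which the change in p is zero
threshold_p <- function(w) (w[3] - w[2]) / (w[1] - 2 * w[2] + w[3])

# Hypothetical helpers (not learnPopGen's code); random mating assumed
next_p <- function(p, w) {
  q <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / w_bar
}
simulate_p <- function(w, p0, time) {
  p <- numeric(time)
  p[1] <- p0
  for (i in seq_len(time - 1)) p[i + 1] <- next_p(p[i], w)
  p
}

w <- c(1, 0.5, 0.8)          # example: AA fitter than aa, Aa least fit
threshold_p(w)               # 0.375
final_p <- sapply(c(0.40, 0.35), function(p0) simulate_p(w, p0, 1000)[1000])
round(final_p, 4)            # A fixed from 0.40, lost from 0.35
```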

5. Reflection

If you think back to your initial prediction, what have you learned from running all of these models?

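I would say that I was partially correct. An allele with a fitness advantage does tend to increase, and under simple directional selection it eventually becomes fixed, but the outcome also depends on the mode of inheritance and on the starting frequency. A favored recessive allele spreads very slowly while it is rare; when heterozygotes have the highest fitness, both alleles are maintained in the population; and when heterozygotes have the lowest fitness, the starting frequency decides which allele becomes fixed. This means that even a high fitness does not guarantee that an allele will take over the population.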

6. Resources

6.1. Data References

This exercise does not contain original data.

6.2. Resources You Consulted

Consulting additional resources to solve this assignment is absolutely allowed, but failure to disclose those resources is plagiarism. Please list any collaborators you worked with and resources you used below or state that you have not used any.

I received help from the discussion post for the assignment.

