---
title: "Simulating the Effects of Selection and Drift"
output:
  html_notebook:
    fig_caption: yes
    toc: yes
    toc_depth: 3
    toc_float: yes
---

## Author: Shaelah Holmes 24007742

------------------------------------------------------------------------

# 1. Dependencies

```{r message=FALSE}
#This command loads required packages
#Note that you may need to install the learnPopGen package first!
library(learnPopGen)
```

Important tip: In this and the next exercise, we will focus entirely on running different evolutionary models. The plots for these models are generated automatically, and you do not need to use `ggplot()` to produce the outputs.

------------------------------------------------------------------------

# 2. Before You Get Started...

In lecture, I asked you to pause and predict what happens to the allele frequency of a rare allele *A* that provides a fitness advantage to its carriers. What was your prediction?

*Your answer goes here*

------------------------------------------------------------------------

# 3. Modeling Allele Frequencies Across Generations: Introducing Key Parameters

The Hardy-Weinberg (HW) principle provides a powerful approach to test whether any evolutionary forces are impacting a population at any given locus. It does not tell us, however, which evolutionary forces might be acting on that particular locus. In the following sections, we will get to know different evolutionary forces and discover how they impact allele frequencies across generations.

First, let's consider how natural selection affects allele frequencies. Natural selection manifests itself as differences in fitness among genotypes. Mathematically, we assign a fitness of 1 to one reference genotype (e.g., *AA*) and define the fitness of each other genotype relative to *AA* with the parameter *s* (the selection coefficient, which measures the strength of selection). For example, if the genotype *aa* has a 10 % higher fitness than *AA* (*s*=0.1), its fitness would be 1+*s* = 1.1. If genotype *Aa* has a 15 % lower fitness than *AA* (*s*=-0.15), its fitness would be 1+*s* = 0.85.
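
The arithmetic in the two examples above can be checked directly in R. This is only a sketch: `relative_fitness()` is a hypothetical helper written for this illustration and is not part of learnPopGen.

```{r}
# Hypothetical helper (not from learnPopGen): fitness relative to the
# reference genotype AA, which is assigned a fitness of 1
relative_fitness <- function(s) 1 + s

w_aa_example <- relative_fitness(0.1)    # aa is 10 % fitter than AA
w_Aa_example <- relative_fitness(-0.15)  # Aa is 15 % less fit than AA
c(AA = 1, Aa = w_Aa_example, aa = w_aa_example)
```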

It takes just a few key parameters for us to model how selection affects allele frequencies over time: (1) the strength of selection, *s*, and (2) the mode of inheritance (dominant/recessive vs. additive). If one allele at a locus is completely dominant over the other, the heterozygote has the same fitness as the dominant homozygote, so a single value of *s* is enough. Inheritance, however, may not be dominant/recessive, in which case you need two values of *s* (one for the heterozygote and one for the other homozygote) to describe how selection might impact evolution (see [textbook](https://www.k-state.edu/biology/p2e/evolutionary-mechanisms-i-modeling-selection.html#relative-fitness) for details).

![](dominant_additive.png)

------------------------------------------------------------------------

# 4. Modeling the Effect of Selection on Allele Frequencies using the Selection Command

We can model how allele frequencies respond to selection with the `selection()` function. All we need to do is tell the function the fitness (*w*) of the three genotypes, in the order *AA*, *Aa*, *aa*. If the mode of inheritance is dominant/recessive (with *A* dominant), fitness is defined as:

`fitness <- c(1, 1, 1+s)`

If the mode of inheritance is not dominant/recessive, fitness is defined as:

`fitness <- c(1, 1+s1, 1+s2)`
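
Both forms above can be written as one fitness vector if you add a dominance coefficient *h*, a standard population-genetics parameter that this notebook does not otherwise use: the heterozygote's fitness becomes 1 + *h* *s*, where *s* is the selection coefficient on *aa*. The sketch below assumes that parameterization. `make_fitness()` is a hypothetical helper, not a learnPopGen function, and it returns a vector in the `c(AA, Aa, aa)` order used throughout this notebook.

```{r}
# Hypothetical helper (not part of learnPopGen): build c(AA, Aa, aa) from
# the selection coefficient s on aa and a dominance coefficient h:
#   h = 0   -> a is recessive (A dominant): Aa has the same fitness as AA
#   h = 0.5 -> additive: Aa is exactly halfway between AA and aa
#   h = 1   -> a is dominant: Aa has the same fitness as aa
make_fitness <- function(s, h) {
  c(AA = 1, Aa = 1 + h * s, aa = 1 + s)
}

make_fitness(s = -0.2, h = 0)    # dominant/recessive: AA = 1, Aa = 1, aa = 0.8
make_fitness(s = -0.2, h = 0.5)  # additive: AA = 1, Aa = 0.9, aa = 0.8
```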

The other parameters we need are `p0` (the starting frequency *p* of allele *A*) and `time` (the number of generations you want to run the simulation for). We define these as:

`start.freq <-` a number between 0 and 1

`gen.time <-` a whole number of generations (1 or more; be cautious with large values, as they can take significant computation time)

Setting these parameters, we can model the effects of selection with the `selection()` function as follows:

`result <- selection(w=fitness, time=gen.time, p0=start.freq)`
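
For intuition about what such a call computes each generation, here is a minimal base-R sketch of the standard deterministic recursion for one locus with two alleles under random mating: `p_next = (p^2 * w_AA + p * q * w_Aa) / wbar`, where `wbar` is the population's mean fitness. `iterate_selection()` is a hypothetical helper. We assume `selection()` iterates an equivalent recursion, with generation 1 holding `p0`, but this sketch is not the package's code and has not been checked against its source.

```{r}
# Hypothetical helper, not learnPopGen code: iterate the standard deterministic
# selection recursion for one locus with two alleles under random mating.
# w is c(w_AA, w_Aa, w_aa); the result holds p for generations 1..time,
# with generation 1 equal to the starting frequency p0.
iterate_selection <- function(w, p0, time) {
  p <- numeric(time)
  p[1] <- p0
  for (gen in seq_len(time)[-1]) {
    p_prev <- p[gen - 1]
    q_prev <- 1 - p_prev
    # Mean fitness with genotypes at Hardy-Weinberg frequencies p^2, 2pq, q^2
    wbar <- p_prev^2 * w[1] + 2 * p_prev * q_prev * w[2] + q_prev^2 * w[3]
    # A alleles transmitted: all from AA (p^2 * w_AA) plus half of those
    # from Aa (2pq * w_Aa / 2 = pq * w_Aa), scaled by mean fitness
    p[gen] <- (p_prev^2 * w[1] + p_prev * q_prev * w[2]) / wbar
  }
  p
}

# Example: a lethal recessive allele, starting at p0 = 0.5, for 50 generations
p_sketch <- iterate_selection(w = c(1, 1, 0), p0 = 0.5, time = 50)
tail(p_sketch)
```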

Let's put this to the test. Let's model the effect of selection on a gene with a dominant/recessive mode of inheritance, with the recessive allele being deleterious (i.e., its fitness is zero). Set the starting allele frequency at 0.5 and run the simulation for 50 generations.

```{r}
#Define the fitness of each genotype
fitness <- c(1, 1, 0)

#Define the starting allele frequency p0
start.freq <- 0.5

#Define how many generations you want to simulate
gen.time <- 50

#Model the allele changes and store the results in an object called r
r <- selection(w=fitness, time=gen.time, p0=start.freq)

#By default, the selection() function plots p, but you can also plot q by adding an extra argument
r <- selection(w=fitness, time=gen.time, p0=start.freq, show = "q")
```

Note that running the `selection()` function automatically plots the results of the model. You can also take a look at the numbers in a table using the following code:

```{r}
#Store the simulated frequency of A in a table with one row per generation
r.table <- data.frame(1:gen.time, r$p)
names(r.table) <- c("Generation", "p")

#Show last part of the table in the results
tail(r.table)
```

What you should observe is that the frequency of *A* (*p*) increases over time. The change is largest in the first generations and slows as *A* becomes more common, so the curve flattens as *p* approaches 1.
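
Why does the curve flatten? For the fitnesses used here, `c(1, 1, 0)`, the standard recursion has an exact solution. Writing *q* = 1 - *p*, each generation gives `q_next = q / (1 + q)`, so 1/*q* grows by exactly 1 per generation and `q_t = q0 / (1 + t * q0)`. The per-generation drop in *q* is `q^2 / (1 + q)`, which shrinks quickly as *q* gets small. The sketch below assumes that textbook recursion instead of calling `selection()`, and checks the closed form against a direct iteration.

```{r}
# With w = c(1, 1, 0), the recursion for q = 1 - p reduces to q' = q / (1 + q)
q0 <- 0.5
n_gen <- 50
q_iter <- numeric(n_gen + 1)
q_iter[1] <- q0
for (gen in 1:n_gen) q_iter[gen + 1] <- q_iter[gen] / (1 + q_iter[gen])

# Closed form: q_t = q0 / (1 + t * q0) for t = 0, 1, ..., n_gen
q_closed <- q0 / (1 + (0:n_gen) * q0)
all.equal(q_iter, q_closed)   # the closed form matches the iteration
q_closed[n_gen + 1]           # q after 50 generations: 0.5 / 26, about 0.019
```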

------------------------------------------------------------------------

## 4.1. Varying the Strength of Selection

In the coming set of simulations, let's explore how varying the strength of selection (*s*) impacts how allele frequencies change across generations. We will focus on a single gene with complete dominance of the *A* allele. In these first scenarios, the dominant phenotype has a selective advantage over the recessive one, but the *A* allele starts at a very low frequency in the population (`p0=0.0001`). In the first example, the strength of selection against the recessive homozygotes is *s*=-0.1. Run the simulation and record what happens to the frequencies of *A* and *a* over 1000 generations.

```{r}
#Define the fitness of each genotype: A is dominant, and aa has s = -0.1
fitness1 <- c(1, 1, 0.9)

#Define the starting allele frequency p0
start.freq1 <- 0.0001

#Define how many generations you want to simulate
gen.time1 <- 1000

#Model the allele changes and store the results in an object r1; the time argument sets how many generations are simulated
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)
```

To truly understand how the strength of selection impacts allele frequencies, we need to run multiple models with different values of fitness for our genotypes. Using the same parameters as above, run multiple models with the selection coefficient against *aa* varying between -1 and 0. Rather than visualizing each model separately, you can plot them together using the `add=TRUE` argument.

```{r}
#Define the fitness of each genotype. Unlike above, you want to vary the strength of selection, so we will define multiple fitness sets (one for each model you want to run)
fitness1 <- c(1, 1, 0)     # s = -1
fitness2 <- c(1, 1, 0.5)   # s = -0.5
fitness3 <- c(1, 1, 0.8)   # s = -0.2
fitness4 <- c(1, 1, 0.9)   # s = -0.1
fitness5 <- c(1, 1, 0.95)  # s = -0.05

#Define the starting allele frequency p0. This parameter will be the same for all models, so you only need to define it once.
start.freq1 <- 0.0001

#Define how many generations you want to simulate. This parameter will be the same for all models, so you only need to define it once.
gen.time1 <- 1000

#Model the allele changes for the different strengths of selection. You need to run one model for each fitness distribution you defined above. Note that you can plot all results in a single graph by adding "add=TRUE" starting at the second model. 
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)
r2 <- selection(w=fitness2, time=gen.time1, p0=start.freq1, add=TRUE, color = 'red')
r3 <- selection(w=fitness3, time=gen.time1, p0=start.freq1, add=TRUE, color = 'blue')
r4 <- selection(w=fitness4, time=gen.time1, p0=start.freq1, add=TRUE, color = 'green')
r5 <- selection(w=fitness5, time=gen.time1, p0=start.freq1, add=TRUE, color = 'purple')
```
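
If you would like numbers to go with the overlaid curves, the sketch below counts how many generations the standard recursion needs to carry *A* from `p0 = 0.0001` to a frequency above 0.5 for several selection coefficients against *aa*. `gens_to_half()` and the chosen values of *s* are hypothetical illustrations. The sketch re-implements the textbook recursion rather than calling `selection()`, so its counts may differ by a generation from the plots, depending on how generations are indexed.

```{r}
# Hypothetical helper: generations needed for p to exceed 0.5 when A is
# dominant and aa has fitness 1 + s (standard recursion, not learnPopGen)
gens_to_half <- function(s, p0 = 1e-4, max_gen = 1e5) {
  w <- c(1, 1, 1 + s)
  p <- p0
  for (g in 0:max_gen) {
    if (p > 0.5) return(g)
    q <- 1 - p
    wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
    p <- (p^2 * w[1] + p * q * w[2]) / wbar
  }
  NA_integer_   # threshold not reached within max_gen generations
}

s_values <- c(-1, -0.5, -0.2, -0.1, -0.05)
half_gens <- sapply(s_values, gens_to_half)
data.frame(s = s_values, generations_to_p_half = half_gens)
```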

Looking at the results of your models, how does varying the strength of selection affect changes in allele frequencies across generations?

*Your answer goes here.*

------------------------------------------------------------------------

## 4.2. Selection on Dominant vs. Recessive Phenotypes

Let's examine what happens when the phenotype with higher fitness is dominant vs. recessive. First, set the starting frequency of the *A* allele to `p0=0.01`, and give the dominant phenotype a fitness advantage over the recessive phenotype (*s* against the recessive homozygotes is -0.2). Run the simulation for 1000 generations.

```{r}
#Define the fitness of each genotype: the dominant phenotype is favored (s = -0.2 against aa)
fitness1 <- c(1, 1, 0.8)

#Define the starting allele frequency p0
start.freq1 <- 0.01

#Define how many generations you want to simulate
gen.time1 <- 1000

#Model the allele changes and store the results in an object r1; the time argument sets how many generations are simulated
r1 <- selection(w=fitness1, time=gen.time1, p0=start.freq1)

#In this case, it may be useful to look at the actual numbers for p at the end of your simulation
r1.table <- data.frame(1:gen.time1, r1$p)
names(r1.table) <- c("Generation", "p")
tail(r1.table)
```

Now let's examine the opposite scenario. Set the starting allele frequency to `p0=0.99`, but this time give the recessive phenotype the fitness advantage over the dominant phenotype (same `s` as above). Run the simulation for 1000 generations.

```{r}
#Define the fitness of each genotype: the recessive phenotype is now favored; AA and Aa have 0.8 of the fitness of aa, mirroring s = -0.2 above
fitness2 <- c(0.8, 0.8, 1)

#Define the starting allele frequency p0
start.freq2 <- 0.99

#Define how many generations you want to simulate
gen.time2 <- 1000

#Model the allele changes and store the results in an object r2; the time argument sets how many generations are simulated
r2 <- selection(w=fitness2, time=gen.time2, p0=start.freq2)

#In this case, it may be useful to look at the actual numbers for p at the end of your simulation
r2.table <- data.frame(1:gen.time2, r2$p)
names(r2.table) <- c("Generation", "p")
tail(r2.table)
```
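
A quantity that can help when interpreting these two runs is the share of an allele's copies that sit in homozygotes. Under Hardy-Weinberg proportions, *aa* individuals carry 2*q*² copies of *a* and *Aa* individuals carry 2*pq*, so the share in *aa* is 2*q*² / (2*q*² + 2*pq*) = *q*. The sketch below only tabulates that identity for a few values of *q*. It assumes random mating, does not call learnPopGen, and `share_in_homozygotes()` is a hypothetical helper.

```{r}
# Fraction of a-allele copies carried by aa homozygotes under
# Hardy-Weinberg proportions: 2q^2 / (2q^2 + 2pq), which simplifies to q
share_in_homozygotes <- function(q) {
  p <- 1 - q
  (2 * q^2) / (2 * q^2 + 2 * p * q)
}

q_values <- c(0.5, 0.1, 0.01, 0.001)
data.frame(q = q_values, share_of_a_in_aa = share_in_homozygotes(q_values))
```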

Compare and contrast the results of the two simulations. What do you notice about the pace of change and the final allele frequencies? Why does a recessive allele persist in the population when it is selected against, while a dominant allele does not?

*Your answer goes here.*

------------------------------------------------------------------------

## 4.3. Selection For Heterozygotes; Symmetrical Fitness between Homozygotes

So far we have only considered scenarios where fitness was distributed in a dominant/recessive way. Now we are going to investigate the consequences of selection for heterozygous individuals (i.e., heterozygotes having a fitness advantage over homozygous individuals). Give the heterozygous genotype a 1 % fitness advantage over both homozygous genotypes. Run multiple simulations for 1000 generations, varying the starting allele frequency between 0 and 1.

```{r}
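#Define the parameters: heterozygotes have a 1 % fitness advantage over both homozygotes
fitness <- c(1, 1.01, 1)
gen.time <- 1000
#Run one model per starting frequency and overlay the results with add=TRUE
r1 <- selection(w=fitness, time=gen.time, p0=0.05)
r2 <- selection(w=fitness, time=gen.time, p0=0.3, add=TRUE, color='red')
r3 <- selection(w=fitness, time=gen.time, p0=0.7, add=TRUE, color='blue')
r4 <- selection(w=fitness, time=gen.time, p0=0.95, add=TRUE, color='green')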
```
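
Setting the change in *p* to zero in the standard one-locus recursion gives an interior equilibrium at `p_star = (w_Aa - w_aa) / (2 * w_Aa - w_AA - w_aa)` whenever that value lies between 0 and 1. The sketch below evaluates this formula for the 1 % heterozygote advantage, then iterates the recursion from several starting frequencies. `equilibrium_p()` and `step_p()` are hypothetical helpers that re-implement the textbook model. They do not call `selection()`.

```{r}
# Hypothetical helpers (standard textbook model, not learnPopGen code)
# Interior equilibrium from p * (w_AA - w_Aa) + q * (w_Aa - w_aa) = 0
equilibrium_p <- function(w) (w[2] - w[3]) / (2 * w[2] - w[1] - w[3])

# One generation of selection; w is c(w_AA, w_Aa, w_aa)
step_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / wbar
}

w_over <- c(1, 1.01, 1)        # 1 % heterozygote advantage
equilibrium_p(w_over)          # 0.5 when the homozygotes have equal fitness

# Iterate 1000 generations from several starting frequencies
final_p <- sapply(c(0.05, 0.3, 0.7, 0.95), function(p) {
  for (g in 1:1000) p <- step_p(p, w_over)
  p
})
final_p
```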

What happens to the allele frequencies in the population? Can you explain why?

*Your answer goes here.*

------------------------------------------------------------------------

## 4.4. Selection For Heterozygotes; Asymmetrical Fitness between Homozygotes

What happens if heterozygous individuals have a fitness advantage, but selection is stronger against one of the homozygous genotypes? Set the starting allele frequency to `p0=0.1`, and give the *aa* genotype a fitness of 0.0. Run the simulation with various fitness values for the *AA* genotype (between 0 and 1), while keeping the fitness of the heterozygotes at 1.

```{r}
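#Define the parameters: p0 = 0.1, Aa fitness fixed at 1, aa fitness 0, and AA fitness varied between 0 and 1
start.freq <- 0.1
gen.time <- 100
r1 <- selection(w=c(0.2, 1, 0), time=gen.time, p0=start.freq)
r2 <- selection(w=c(0.5, 1, 0), time=gen.time, p0=start.freq, add=TRUE, color='red')
r3 <- selection(w=c(0.8, 1, 0), time=gen.time, p0=start.freq, add=TRUE, color='blue')
r4 <- selection(w=c(0.95, 1, 0), time=gen.time, p0=start.freq, add=TRUE, color='green')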
```

How does changing the fitness of *AA* individuals influence the final allele frequencies in the population over time? Will the deleterious allele ever be lost from the population? What are the lowest and highest frequencies the *a* allele can reach? Why?

*Your answer goes here.*

------------------------------------------------------------------------

## 4.5. Selection Against Heterozygotes; Symmetrical Fitness between Homozygotes

Selection may also act against heterozygotes. How do allele frequencies change if the two homozygotes have equal fitness but heterozygotes have a 50 % lower fitness? Simulate allele frequency changes with multiple values of `p0` (between 0 and 1).

```{r}
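#Define the parameters: equal homozygote fitness, heterozygote fitness reduced by 50 %
fitness <- c(1, 0.5, 1)
gen.time <- 100
r1 <- selection(w=fitness, time=gen.time, p0=0.2)
r2 <- selection(w=fitness, time=gen.time, p0=0.45, add=TRUE, color='red')
r3 <- selection(w=fitness, time=gen.time, p0=0.55, add=TRUE, color='blue')
r4 <- selection(w=fitness, time=gen.time, p0=0.8, add=TRUE, color='green')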
```

What happens? Can you explain why?

*Your answer goes here.*

------------------------------------------------------------------------

## 4.6. Selection Against Heterozygotes; Asymmetrical Fitness between Homozygotes

Now run the same simulation, but give the two homozygotes different fitnesses. Again, run the simulation multiple times with different values of `p0`. You can also experiment with different fitness distributions to observe additional patterns.

```{r}
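#Define the parameters: heterozygotes have reduced fitness and the homozygotes differ (example values)
fitness <- c(1, 0.5, 0.8)
gen.time <- 100
r1 <- selection(w=fitness, time=gen.time, p0=0.2)
r2 <- selection(w=fitness, time=gen.time, p0=0.35, add=TRUE, color='red')
r3 <- selection(w=fitness, time=gen.time, p0=0.4, add=TRUE, color='blue')
r4 <- selection(w=fitness, time=gen.time, p0=0.6, add=TRUE, color='green')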
```
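
When heterozygotes are the least fit genotype, the standard recursion still has an interior point where *p* stops changing, `p_star = (w_Aa - w_aa) / (2 * w_Aa - w_AA - w_aa)`. That point is unstable, however: runs that start just below it and just above it move in opposite directions. In the sketch below, the fitness vector `c(1, 0.5, 0.8)` is an arbitrary example, and `step_p()` and `run_p()` are hypothetical helpers that re-implement the textbook recursion. They are not learnPopGen code.

```{r}
# Hypothetical helpers (standard textbook model, not learnPopGen code)
step_p <- function(p, w) {
  q <- 1 - p
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / wbar
}
run_p <- function(p0, w, time = 200) {
  p <- p0
  for (g in seq_len(time)) p <- step_p(p, w)
  p
}

w_under <- c(1, 0.5, 0.8)   # example fitnesses for AA, Aa, aa
# Interior point where the change in p is zero
p_star <- (w_under[2] - w_under[3]) / (2 * w_under[2] - w_under[1] - w_under[3])
p_star                      # 0.375 for these fitnesses

starts <- c(0.30, 0.37, 0.38, 0.45)
end_p <- sapply(starts, run_p, w = w_under)
round(end_p, 4)
```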

What happens? Can you explain why?

*Your answer goes here.*

------------------------------------------------------------------------

# 5. Changing Selection: Negative Frequency-Dependent Selection

One big assumption we made in the simulations above is that the fitness of each genotype does not change across generations. In reality, the selection coefficient *s* is not always constant and may vary for many reasons, including changing environmental conditions. Selection can also change without any environmental fluctuation: under [negative frequency-dependent selection](https://www.k-state.edu/biology/p2e/evolutionary-mechanisms-i-modeling-selection.html#frequency-dependent-selection), the fitness of a genotype depends on its frequency, and common genotypes have lower fitness. You can simulate negative frequency-dependent selection using the `freqdep()` function. As before, you will need to specify the starting allele frequency (`p0`), `s` (which determines the strength of selection against heterozygotes when they are common), and the number of generations to simulate (`time`). Run multiple simulations varying `p0` and `s` (I recommend starting with *s*=1).

```{r}
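#Vary the starting frequency p0 and the strength of selection s across runs
r1 <- freqdep(p0=0.6, s=1, time=100)
r2 <- freqdep(p0=0.1, s=1, time=100)
r3 <- freqdep(p0=0.9, s=1, time=100)
r4 <- freqdep(p0=0.1, s=0.5, time=100)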
```
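
To build some intuition for why negative frequency-dependent selection can keep both alleles in a population, the sketch below uses a toy model of our own. In it, each genotype's fitness falls with that genotype's own Hardy-Weinberg frequency: `w_AA = 1 - s * p^2`, `w_Aa = 1 - s * 2pq`, `w_aa = 1 - s * q^2`. This fitness function is an assumption made for illustration only. We do not know that `freqdep()` uses it, so the package's curves and its `s` may behave differently. `nfds_step()` is a hypothetical helper.

```{r}
# Toy model (an assumption, not necessarily what freqdep() implements):
# each genotype's fitness declines as that genotype becomes more common
nfds_step <- function(p, s) {
  q <- 1 - p
  w <- c(1 - s * p^2, 1 - s * 2 * p * q, 1 - s * q^2)  # AA, Aa, aa
  wbar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]
  (p^2 * w[1] + p * q * w[2]) / wbar
}

s_toy <- 0.5
starts <- c(0.05, 0.3, 0.6, 0.95)
end_p <- sapply(starts, function(p) {
  for (g in 1:200) p <- nfds_step(p, s_toy)
  p
})
round(end_p, 3)   # every run ends near an intermediate frequency
```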

What do you observe? Why does neither of the alleles go to fixation? Can you think of an example of negative frequency-dependent selection in nature?

*Your answer goes here.*

------------------------------------------------------------------------

# 6. Reflection

If you think back to your initial prediction, what have you learned from running all of these models? What are some misconceptions about selection that you were able to clarify with these simulations?

*Your answer goes here.*

------------------------------------------------------------------------

# 7. Resources

------------------------------------------------------------------------

## 7.1. Data References

This exercise does not contain original data, but kudos to [Liam J. Revell](https://faculty.umb.edu/liam.revell/) who developed the learnPopGen package.

------------------------------------------------------------------------

## 7.2. Resources You Consulted

Consulting additional resources to solve this assignment is absolutely allowed, but failure to disclose those resources is plagiarism. Please list any collaborators you worked with and resources you used below or state that you have not used any.

*Your answer goes here.*
