Deriving, parameterizing, and evaluating analytical models is one of the fundamental skills scientists use for describing the world. Model derivation means constructing a mathematical expression that allows prediction based on knowing something about the world. Parameterization refers to estimating the value of key variables that enable prediction. And evaluation usually means investigating properties of the model, such as how the predictions change across a range of parameters and conditions.

In this lab we will derive, parameterize, and evaluate a simple single-locus, two-allele, selection model. Details about the model and parameterization of the model are available in the handout. The model will be parameterized by recording the outcomes of a struggle-for-existence for three genotypes because there are three possible genotypes when there are two alleles. The three parameters are the relative fitness values of the three genotypes labeled w11, w12, and w22.

We will use the model to describe three properties of the world:

  1. Average fitness as a function of the frequency of the three genotypes in a population;

  2. The rate of evolution as a function of the frequency of the three genotypes in a population;

  3. The change in allele frequency over time from an arbitrary starting condition.

PARAMETERIZATION

You will parameterize three separate models. The parameters of each model (the values for w11, w12, and w22) will be estimated by having each student in a group of three compete for resources using a phenotype based on the genotype. Because each phenotype may be influenced by other traits of individuals, each student will estimate their fitness for each of the three genotypes. Thus, there will be three parameters estimated for each of three models.

#define the parameter variables and assign values to each parameter. Each
  #parameter will be labeled with a lower case w, the student's number (1, 2 or 
  #3), the two allele genotype (kk, ks, ss), and the model number (1, 2 or 3).
  #Enter the data as a ratio of the volume of beans collected with the maximum
  #number of beans collected by one of the three individuals in the denominator
  #and the number of beans collected by the three students in the numerator
  #k = knife and s = spoon. 

#the parameters for the first model
w1kk1 <-50/75 
w2ks1 <- 75/75
w3ss1 <- 55/75

#the parameters for the second model
w1kk2 <- 56/84
w2ks2 <- 40/84
w3ss2 <- 84/84

#the parameters for the third model
w1kk3 <- 50/74
w2ks3 <- 55/74
w3ss3 <- 74/74

EVALUATION OF AVERAGE FITNESS

Calculate and visualize average fitness The equation for average fitness is the sum of the frequency of each genotype multiplied by its relative fitness: W = p^2w11 + 2pqw12 + q^2w22

Your goal is to use your parameters for each model to construct a graph showing the dependence of average fitness of the frequency of one of the alleles for a range of allele frequencies ranging from 0 to 1 in increments of 0.01. We provide the annotations and you need to provide the R code for all annotations that ask you to use code you have already used.

#make a vector of values of p (allele frequency) that range from 0 to 1 in 
#increments of 0.01. Hint: you will use the seq(from, to, by) function
freq_p <- seq(0, 1, 0.01)

#evaluate average fitness for all three models. Use W_model1, W_model2 and 
  #W_model3 as the average fitness variable for the three models...Importantly,
  #you will need to replace the w11, w12, and w22 terms with your 
  #parameterization variables. 

#Here is the equation for the first model. 
W_model1 <- freq_p^2*w1kk1 + 2*freq_p*(1-freq_p)*w2ks1 + (1 - freq_p)^2*w3ss1

#############################################
# Create the equation for the second model. # 
# Make sure you check to make sure you are  #
#       using the correct fitness values    # 
#############################################

W_model2 <- freq_p^2*w1kk1 + 2*freq_p*(1-freq_p)*w2ks2 + (1 - freq_p)^2*w3ss2

#############################################
# Create the equation for the third model.  # 
# Make sure you check to make sure you are  #
#       using the correct fitness values    # 
#############################################


W_model3 <- freq_p^2*w1kk3 + 2*freq_p*(1-freq_p)*w2ks3 + (1 - freq_p)^2*w3ss3


#plot the dependence of W on p for all three models using the lines() 
  #function. Importantly, you need to set up your plot using 
  #plot() with NA as the x and y variables
plot(NA, NA, xlim=c(0,1), ylim=c(0,1), xlab="Allele Frequency (p)", ylab="Relative Fitness(W)",main="Relative Fitness of Alleles Based on Frequency")

#plot the lines for each model using a differnt color and 
  #use a line width of 3 (lwd=3)
  #here is the line from the first model
lines(freq_p, W_model1, col="blue", lwd=3)

#########################################
# create the line from the second model #
#########################################

lines(freq_p, W_model2, col="purple", lwd=3)

########################################
# create the line from the third model #
########################################

lines(freq_p, W_model3, col="green", lwd=3)


################
# add a legend #
################

#legend will be for model 1, model 2 and model 3, 
  #and you will have lines with the three colors. 
  #lty = line type and 1 = a solid line

legend("bottomright", legend=c("Model 1", "Model 2", "Model 3"),lty=rep(1,3), lwd=3, col=c("blue", "purple", "green"))
Figure 1: In this figure we see the average fitness of alleles in each model. The fitness is based off the fitness of the most fit genotype in each model. In model1 the most fit genotype was ks(knife spoon) and in models 2 and 3 the most successful genotype was ss(spoon spoon). In this graph we see how the addition of the knife allele into the population changes the population's relative frequency.

Figure 1: In this figure we see the average fitness of alleles in each model. The fitness is based off the fitness of the most fit genotype in each model. In model1 the most fit genotype was ks(knife spoon) and in models 2 and 3 the most successful genotype was ss(spoon spoon). In this graph we see how the addition of the knife allele into the population changes the population’s relative frequency.

Briefly describe this graph, and make an evidence based claim about the information being presented.

In this graph we see how the allele of knives present in the population affects the relative fitness of the population.For models 2 and 3, when no knives are present we have a fitness value of 1.0 because the most fit genotype is spoon spoon. Since the knife spoon genotype was most successful in model 1, the relative fitness of the population was only 0.7 when no knives were present. In model 1 we see a peak of fitness, jsut under 1 at about 50% allele frequency of knives. In th other two models we see a general trend in fitness to decrease as the presence of knife alleles increases.

Based off the data from all three models, spoons are the most effective alleles in the population and the homozygotic genotype will be the most successful on average. A knife spoon combination is the next most successful. The knife knife genotype is least successful. Since spoon spoon and knife spoon are succesful the presence of all three alleles will continue forward. In some populations we may see a directional shift away from the knife alleles but we are unlikely to see the complete disappearance of the knife allele.

EVALUATION OF RATE OF EVOLUTION

Calculate and visualize the rate of evolution as a function of allele frequency. Be sure to include a legend and figure caption.

#We provide the equation for the first model. 
Rate1 <- (freq_p^2*w1kk1 + freq_p*(1-freq_p)*w2ks1)/W_model1 - freq_p

#################################################
#   Create the equation for the second model.  #
# Make sure you use the correct fitness values. #
#################################################

Rate2 <- (freq_p^2*w1kk2 + freq_p*(1-freq_p)*w2ks2)/W_model2 - freq_p

#################################################
#   Create the equation for the third model.  #
# Make sure you use the correct fitness values. #
#################################################

Rate3 <- (freq_p^2*w1kk3 + freq_p*(1-freq_p)*w2ks3)/W_model3 - freq_p

#set up reasonable y axis range for plotting
yrange <- range(c(Rate1, Rate2, Rate3))*1.1

#make a plot but do not include any data: in other words, set up a space to 
  #plot the data where the x axis ranges from 0 to 1 and the yaxis range is the 
  #values in yrange from above
plot(NA, NA, xlim = c(0,1), ylim = yrange, xlab="Allele Frequency (p)", ylab="Change in p",main="Rate of Evolution on Allele Frequency")

#add lines of allele the rate of evolution's dependence of allele frequency
  #for each model (add changes to color and line widths) 
  #here is the code for the line showing the rate for the first model
lines(freq_p, Rate1, col="blue", lwd=2)

######################################################################
# create the code for the line showing the rate for the second model #
######################################################################

lines(freq_p, Rate2, col="purple", lwd=3)

#####################################################################
# create the code for the line showing the rate for the third model #
#####################################################################

lines(freq_p, Rate3, col="green", lwd=4)


#Make a line at y = 0 showing where the rate of evolution is zero.
abline(h=0,lty=3)


#Add a legend.

legend("bottomright", c("Rate1", "Rate2", "Rate3"), pch = rep(16,2), col = c("blue", "purple", "green"))
Figure 2: In this figure we see the rate of evolution of allele frequencies. In the graph we see the change in p as the frequency of knife alleles is increased in the population.We see that the rate of evolution in a population changes based on the frequency of the knife allele in the population. We see that the rate of evolution in model one begins with an increase, peaks at 0.02 until the knife allele population is about 20%. In models 2 and 3 we see that the initial entrance and addition of knife alleles into the population leads to a decrease in the rate of evolution. In model 2 we see that the rate of evolution drops very drastically to -0.09 at about 30% frequency before begining to increase. In model 3 rate of evolution drops to its lowest at about 0.4 with a rate of -0.06.In all 3 models when the allele frequency reaches 1.0 or 100% the rate of evolution is zero.

Figure 2: In this figure we see the rate of evolution of allele frequencies. In the graph we see the change in p as the frequency of knife alleles is increased in the population.We see that the rate of evolution in a population changes based on the frequency of the knife allele in the population. We see that the rate of evolution in model one begins with an increase, peaks at 0.02 until the knife allele population is about 20%. In models 2 and 3 we see that the initial entrance and addition of knife alleles into the population leads to a decrease in the rate of evolution. In model 2 we see that the rate of evolution drops very drastically to -0.09 at about 30% frequency before begining to increase. In model 3 rate of evolution drops to its lowest at about 0.4 with a rate of -0.06.In all 3 models when the allele frequency reaches 1.0 or 100% the rate of evolution is zero.

Briefly describe this graph, and make an evidence based claim about the information being presented.

In this graph we see that in model 1 as the knife allele is initially introduced the rate of evolution increases but once the percent of knife alleles in the population reaches 20% the rate of evolution begins to decrease. It decreases until the knife allele reaches about 80% frequency in the population and then it increases until the rate of change is zero at 100% presense. In model 2 and 3 as the knife allele is introduced the rate of evolution decreases. In model 2 the presence of the knife allele leads to a dramamtic decrease in the rate of evolution until it reaches about 30% frequency in the population. The rate of change increases until it reaches about 90% frequency in the population at which point it decreases until reaching zero at 100% frequency. In model 3 the rate of evolution reaches its lowest at about 40% frequency and then slowly increases until the population reaches 100% frequency and zero rate of change. In model 3 the rate of change never increases to a point above zero.

Based off this data we can predict that the on average the presence of the knife allele negativly impacts evolution at most frequencies. This suggests that the presence of the knife allele will be kept at lower frequencies in order to maximize the population’s opportunity for evolutionary success.

Last, we will evaluate the trajectory of allele frequencies over 100 generations for the three models. Be sure to include a legend and figure caption.

#First, set up four vectors: one with time going from 1 to 100, and three others 
  #filled with NAs that we will replace using a simulation of the models.
Time <- seq(1,100)
p1 <- rep(NA,100)
p2 <- rep(NA,100)
p3 <- rep(NA,100)


#We need to start the simulation at some arbitrary value of p...we'll start 
  #with p = 0.1. After you run the simulation once, you may want to 
  #experiment with starting at different values of p (any value between 0 and 1).
  #For example, you could change each of the 0.1s below to all be 0.5, and 
  #run the simulation again. You may see different patterns in 
  #allele frequency over time.

p1[1] <- 0.2
p2[1] <- 0.2
p3[1] <- 0.2


#Now we will run the simulation. We'll calculate the p value in the next 
  #generation for 99 generations starting with the p value in the first generation. 
  #To do so, we need to calculate average fitness and then calculate p' ("p prime") 
  #using the same equations we used above. This section of code is all one function, 
  #starting with 'for' and ending with the last closing curly bracket (}).
  #You can run it line by line from the top, or highlight the whole section
  #from the forward bracket { to the reverese bracket ] and execute it.

for (i in 1:99){
    W1 <- p1[i]^2* w1kk1 + 2*p1[i]*(1-p1[i])* w2ks1 + (1-p1[i])^2* w3ss1
    W2 <- p2[i]^2* w1kk2 + 2*p2[i]*(1-p2[i])* w2ks2 + (1-p2[i])^2* w3ss2
    W3 <- p3[i]^2* w1kk3 + 2*p3[i]*(1-p3[i])* w2ks3 + (1-p3[i])^2* w3ss3
    p1[i+1] <- (p1[i]^2* w1kk1 + p1[i]*(1-p1[i])* w2ks1)/W1
    p2[i+1] <- (p2[i]^2* w1kk2 + p2[i]*(1-p2[i])* w2ks2)/W2
    p3[i+1] <- (p3[i]^2* w1kk3 + p3[i]*(1-p3[i])* w2ks3)/W3
}

#Plot the results of the simulation for the three models.
plot(NA, NA, xlab="Generations", ylab="Allele Frequency", main="Allele Frequency Over 100 Generations",
     xlim=c(1,100),ylim=c(0,1))

#add line for p1
lines(Time, p1,col="blue", lwd=2)

###################################
#     add lines for each model    #
# (change colors and line widths) #
lines(Time,p2,col="purple", lwd=3)

lines(Time,p3,col="green", lwd=4)
###################################


################
# Add a legend #

legend("topright", c("p1", "p2", "p3"), pch = rep(16,2), col = c("blue", "purple", "green"))
Figure 3: In this figure we see how the frequency of the knife allele changes over 100 generations when the starting frequency is 0.2 or 20%. We see that in models 2 and 3 the knife allele frequency drops and levels out at zero in under 20 generations. In model 1 we see the knife frequency increases to about 0.4 or 40% and plateaus by roughtly 20 generations.

Figure 3: In this figure we see how the frequency of the knife allele changes over 100 generations when the starting frequency is 0.2 or 20%. We see that in models 2 and 3 the knife allele frequency drops and levels out at zero in under 20 generations. In model 1 we see the knife frequency increases to about 0.4 or 40% and plateaus by roughtly 20 generations.

################

Briefly describe this graph, and make an evidence based claim about the information being presented.

In this graph we see the knife allele frequency changes over 100 generations when the population begins with a 20% frequency of the knife allele. In model one, after about 20 generation the frequency has increased from 0.2 to 0.4 and then plateua. A 40% frequency is then maintained in the population. In model 2 the frequency rapidly decreases from 0.2 to 0.0 in about 5 generations. In model 3 the frequency decreases from 0.2 to 0.0 in about 15 generations. In both these models the knife allele is completely removed from the population in under 20 generations.

Based off the date the knife allele is not very beneficial to the populations in models 2 and 3 and will rather quickly be selected against in those populations (directional selection). However, in model 1 population the presence of the knife allele is important to the population and is most beneficial to the population when it is present in about 40% of the population. On average the knife allele is likely to be weeded out of a population as it has a negative population. The knife allele will be not be selected for and will therefore be extinct from most populations rather quickly (especially if the populations begin with a relatively low frequency of knife alleles).

Questions

Q1: What two factors determine how fast allele frequency changes per generation?

original frequency at generation one and the rate of evolution at that frequency

Q2: What does the graph of average fitness as function of the frequency of p look like when wAA < wAa > waa ? Briefly describe what the graph would look like (hint, try editing the first code block).

This graph would show stabilizing selection, in other words the heterozygote genotype is the most fit. This would mean that the graph of average fitness would be an upside-down u showing lower fitness when p is not present or only present and highest frequency around the mid-point where the presence of p and q are equal..

Q3: For the fitness model in Q2, do you predict that the most-fit allele will increase and eventually go to fixation (so that its frequency is 1)? Provide justification for your answer.

Since the most-fit allele is the heterozygote we would expect neither the p allele or the q allele to increase and reach fixation at 100% presence. Both the p and q allele would continue to be present in the population because they are both needed in order to form the most fit genotype.