Week10 E115L: Genetic Drift and selection

by Grace Yuh Chwen Lee, Spring 2024

Directional selection

There are several forms of natural selection. Selection acts “agasint” (i.e., removing) an allele that makes an individual “less fit,” which is known as purifying selection. On the other hand, selection could “favor” an allele that makes an individual “more fit.” In this case, selection acts to increase the frequency of such an allele, which is known as directional selection. We are going to review some key concepts of directional selection today.

Estimating selection coefficient - haploid

Consider one locus with two alleles, A1 and A2.

	A1	A2
number of individuals at time t	\(P_{t}\)	\(Q_{t}\)
allele frequencies at time t	\(p_{t} = P_{t}/(P_{t}+Q_{t})\)	\(q_{t} = Q_{t}/(P_{t}+Q_{t})\)
absolute fitness (with unit)	\(W1\)	\(W2\)
relative fitness (unitless)	\(\omega_{1} = W_{1}/W_{1} = 1\)	\(\omega_{2} = W2/W1 = 1-s\)

What are the possible units of absolute fitness?

Hint: think about what parameters you have estimated for lab report 1 and 2.

	A1	A2
number of individuals at time t	5	95
allele frequencies at time t	??	??
absolute fitness	100	50
relative fitness (unitless)	1	??

For the above table, what is the intital allele frquency of A1?

What is the selection coefficient (s)?

Do you think A1 or A2 allele is “selected for”?

Will A1 or A2 increase in allele frequency?

Estimating selection coefficient - diploid

	A1A1	A1A2	A2A2
genotype frequency	\(p^2\)	\(2pq\)	\(q^2\)
absolute fitness	\(W_{11}\)	\(W_{12}\)	\(W_{22}\)
relative fitness	\(\omega_{11}=1\)	\(\omega_{12} = 1-hs\)	\(\omega_{22} = 1-s\)

What are the two components of fitness?

What are the possible units of absolute fitness?

Hint: think about what you are counting for your plant experiment.

	A11	A12	A22
number of individuals	25	950	9025
genotype frequency	?	?	?
survival component	100%	50%	80%
fertility component	80	70	80
absolute fitness	?	?	?
relative fitness	?	?	?

For the above table, what is the initial allele frequency of A1?

What is the selection coefficient (s)?

What is the dominance coefficient (h)?

Do you think A1 or A2 allele is “selected for”?

Will A1 and A2 increase in allele frequency?

Changes in allele frequencies due to selection and drift

Many evolutionary forces can change allele frequencies:

Directional selection
Genetic drift
Migration
Linked selection (genetic hitchhiking, genetic draft etc…)

In Week 5, we learned about how population genetic theories could predict the changes in frequency of a selected allele over time. (Refer to Week 5 slide if you forgot what the following equations are doing).

\(p_{t+1} = \frac{\omega_{1}}{\bar{\omega}}p_{t}\) (haploid)

\(p_{t+1} = \frac{\omega_{11}p_{t}^2+\omega_{12}p_{t}q_{t}}{\bar{\omega}}\) (diploid)

Together, we defined the p_next_fun(p, omega2) for haploid and p_next_fun_diploid(p, h, s) for diploid. Using these functions, we could predict the changes in allele frequencies for the next ngen generations. Here is an example.

#for haploid case; using the same approach, you should be able to define p_next_fun_diploid() for the diploid case
p_next_fun = function(p, omega2){
  p_next = p/(p*1 + (1-p)*omega2) 
  return(p_next)
}

#inital allele frequeny of A1 = 0.1, selection coefficient against A2 = 0.2
my_p = 0.1
my_s = 0.2
my_ngen = 100

v_p = c() #remember what this is doing?
for(i in 1:my_ngen){
  p_next = p_next_fun(my_p, 1-my_s)
  v_p[i] = p_next
  my_p = p_next ######
}

plot(v_p, type = 'l', xlab = "generation", ylab = "allele frequency", ylim = c(0, 1))

Like what we did with genetic drift, we could define a new function selection_fun(p, s, ngen).

selection_haploid = function(p, s, ngen){
  v_p = c() 
  for(i in 1:ngen){
    p_next = p_next_fun(p, s)
    v_p[i] = p_next
    p = p_next 
  }
  return(v_p)
}

my_p = 0.1
my_s = 0.2
my_ngen = 100
colors = c("red", "pink", "orange", "yellow", "darkgreen", "green", "blue", "skyblue", "violet", "gray50")

v1 = selection_haploid(my_p, 1-my_s, my_ngen)
plot(v1, type = 'l', xlab = "generation", ylab = "allele frequency", ylim = c(0, 1), col = colors[1])

In class exercise:

Try running selection_haploid for 10 times (i.e., predicting changes in allele frequency for the next 100 generations 10 times), and make a plot including the results from these 10 runs.

Are the trajectories for allele frequencies the same or different for your 10 runs?

How about drift_fun that we used last week? Did you get the same allele frequency changes for the 50 runs you tried?

The major difference between the selection vs drift simulation is one is deterministic while the other is stochastic. These two terms are general to various simulations and you may encounter them in context other than population genetics.

Relative importance of directional selection and genetic drift

In real population, both forces lead to changes in allele frequencies. But which one dominates depend on the strength of the forces (or “scale” in population genetics). A rule of thumb is to compare s vs 1/N. When s > 1/N, selection dominates and the changes in allele frequency is more or less deterministic. On the other hand, when s < 1/N, drift dominates and changes in allele frequency is stochastic.

Let’s try using simulations to explore these predictions. We will use functions we defined previously.

Here, I am simulating haploid case. These can be easily extended to diploids by using the corresponding diploid functions.

#define the drift function to make sure it is for haploid
p_next_drift_fun = function(p, N){
  n_next = rbinom(1, size = N, prob = p)
  p_next = n_next/N
  return(p_next)
}

#set up our parameters
N = 1000
p = 0.5
omega2 = 0.9
ngen = 1000

v_p = c()
for (i in 1:ngen){
  
  #here, we let selection happens first. In real population genetic simulations, the order sometimes matter. 
  #NOTE: did we use N here?
  p_next_after_selection = p_next_fun(p, omega2)
  
  #drift happens after
  #NOTE: did we use N here?
  p_next_after_selection_after_drift = p_next_drift_fun(p_next_after_selection, N)
  
  v_p[i] = p_next_after_selection_after_drift
  
  p = p_next_after_selection_after_drift ######## important ######
}

plot(v_p, type = 'l', ylim = c(0, 1), xlab = "generation", ylab = "allele frequency")

Let’s try defining a new function.

sel_drift_fun = function(p, omega2, N, ngen){
  v_p = c()
  
  for (i in 1:ngen){
    p_next_after_selection = p_next_fun(p, omega2)
    p_next_after_selection_after_drift = p_next_drift_fun(p_next_after_selection, N)
    v_p[i] = p_next_after_selection_after_drift
    p = p_next_after_selection_after_drift 
  }
  return(v_p)
}

Let’s use this new function to explore the relative importance of selection and drift. In this case, s >> 1/N.

p = 0.5
ngen = 100

N = 100
s = 0.1
omega2 = 1-s ## rembmer this?

v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "blue", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency")

Let’s try weaker selection, where s ~ 1/N.

p = 0.5
ngen = 100

N = 100
s = 0.01
omega2 = 1-s ## rembmer this?

v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "red", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency")

Let’s try running the simulations multiple times (like what we did for drift alone) and explore the trajectories.

For the case where s >> 1/N, do you think selection or drift dominates?

p = 0.5
ngen = 100
nrun = 10

N = 100
s = 0.1
omega2 = 1-s 


v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "blue", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency", main = "N = 100, s = 0.1")

for(i in 1:nrun-1){
  v_freq = sel_drift_fun(p, omega2, N, ngen)
  lines(v_freq, type = 'l', col = "blue")
}

For the case where s ~ 1/N, do you think selection or drift dominates?

p = 0.5
ngen = 100
nrun = 10

N = 100
s = 0.01
omega2 = 1-s 


v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "red", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency", main = "N = 100, s = 0.01")

for(i in 1:nrun-1){
  v_freq = sel_drift_fun(p, omega2, N, ngen)
  lines(v_freq, type = 'l', col = "red")
}

It is the relative scale of s and N that matters. Dirft would dominate for both N = 100, s = 0.01 and N = 1000, s = 0.001.

par(mfrow = c(1, 2))  #having two figures in one row

p = 0.5
ngen = 1000
nrun = 10

N = 100
s = 0.01
omega2 = 1-s 

v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "red", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency", main = "N = 100, s = 0.01")

for(i in 1:nrun-1){
  v_freq = sel_drift_fun(p, omega2, N, ngen)
  lines(v_freq, type = 'l', col = "red")
}



N = 1000
s = 0.001
omega2 = 1-s 

v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "red", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency", main = "N = 1000, s = 0.001")

for(i in 1:nrun-1){
  v_freq = sel_drift_fun(p, omega2, N, ngen)
  lines(v_freq, type = 'l', col = "red")
}

You may notice that how quickly the allele frequency changes differ between N = 100 and 1000, despite drift dominates both simulations. The time scale for drift depends on the inverse of population size: the expected time to fixation/lost of an allele due to drift is ~ 1/N. Accordingly, allele frequencies change with faster rate when N is smaller.

What I expect in the final plant report

Estimation of fitness in the two enviornment
- clear definition of fitness
Estimation of selection coefficient in the two environment
Discussion of predicted changes in allele frequency over time in the two environment
- Assume that BB is homozygous for A1 allele, and EP is homozygous for A2 allele
- Assume h = 0.5 and h = -0.5
- Assume initial allele frequency = 0.5 for A1 (i.e., p = 0.5 for A2).
Discussion of whether you think these two alleles will be maintained or one of the two will be selected out in nature.