#for haploid case; using the same approach, you should be able to define p_next_fun_diploid() for the diploid case
p_next_fun = function(p, omega2){
p_next = p/(p*1 + (1-p)*omega2)
return(p_next)
}
#inital allele frequeny of A1 = 0.1, selection coefficient against A2 = 0.2
my_p = 0.1
my_s = 0.2
my_ngen = 100
v_p = c() #remember what this is doing?
for(i in 1:my_ngen){
p_next = p_next_fun(my_p, 1-my_s)
v_p[i] = p_next
my_p = p_next ######
}
plot(v_p, type = 'l', xlab = "generation", ylab = "allele frequency", ylim = c(0, 1))Week10 E115L: Genetic Drift and selection
by Grace Yuh Chwen Lee, Spring 2024
Directional selection
There are several forms of natural selection. Selection acts “agasint” (i.e., removing) an allele that makes an individual “less fit,” which is known as purifying selection. On the other hand, selection could “favor” an allele that makes an individual “more fit.” In this case, selection acts to increase the frequency of such an allele, which is known as directional selection. We are going to review some key concepts of directional selection today.
Estimating selection coefficient - haploid
Consider one locus with two alleles, A1 and A2.
| A1 | A2 | |
|---|---|---|
| number of individuals at time t | \(P_{t}\) | \(Q_{t}\) |
| allele frequencies at time t | \(p_{t} = P_{t}/(P_{t}+Q_{t})\) | \(q_{t} = Q_{t}/(P_{t}+Q_{t})\) |
| absolute fitness (with unit) | \(W1\) | \(W2\) |
| relative fitness (unitless) | \(\omega_{1} = W_{1}/W_{1} = 1\) | \(\omega_{2} = W2/W1 = 1-s\) |
| A1 | A2 | |
|---|---|---|
| number of individuals at time t | 5 | 95 |
| allele frequencies at time t | ?? | ?? |
| absolute fitness | 100 | 50 |
| relative fitness (unitless) | 1 | ?? |
Estimating selection coefficient - diploid
| A1A1 | A1A2 | A2A2 | |
|---|---|---|---|
| genotype frequency | \(p^2\) | \(2pq\) | \(q^2\) |
| absolute fitness | \(W_{11}\) | \(W_{12}\) | \(W_{22}\) |
| relative fitness | \(\omega_{11}=1\) | \(\omega_{12} = 1-hs\) | \(\omega_{22} = 1-s\) |
| A11 | A12 | A22 | |
|---|---|---|---|
| number of individuals | 25 | 950 | 9025 |
| genotype frequency | ? | ? | ? |
| survival component | 100% | 50% | 80% |
| fertility component | 80 | 70 | 80 |
| absolute fitness | ? | ? | ? |
| relative fitness | ? | ? | ? |
Changes in allele frequencies due to selection and drift
Many evolutionary forces can change allele frequencies:
Directional selection
Genetic drift
Migration
Linked selection (genetic hitchhiking, genetic draft etc…)
In Week 5, we learned about how population genetic theories could predict the changes in frequency of a selected allele over time. (Refer to Week 5 slide if you forgot what the following equations are doing).
\(p_{t+1} = \frac{\omega_{1}}{\bar{\omega}}p_{t}\) (haploid)
\(p_{t+1} = \frac{\omega_{11}p_{t}^2+\omega_{12}p_{t}q_{t}}{\bar{\omega}}\) (diploid)
Together, we defined the p_next_fun(p, omega2) for haploid and p_next_fun_diploid(p, h, s) for diploid. Using these functions, we could predict the changes in allele frequencies for the next ngen generations. Here is an example.
Like what we did with genetic drift, we could define a new function selection_fun(p, s, ngen).
selection_haploid = function(p, s, ngen){
v_p = c()
for(i in 1:ngen){
p_next = p_next_fun(p, s)
v_p[i] = p_next
p = p_next
}
return(v_p)
}
my_p = 0.1
my_s = 0.2
my_ngen = 100
colors = c("red", "pink", "orange", "yellow", "darkgreen", "green", "blue", "skyblue", "violet", "gray50")
v1 = selection_haploid(my_p, 1-my_s, my_ngen)
plot(v1, type = 'l', xlab = "generation", ylab = "allele frequency", ylim = c(0, 1), col = colors[1])The major difference between the selection vs drift simulation is one is deterministic while the other is stochastic. These two terms are general to various simulations and you may encounter them in context other than population genetics.
Relative importance of directional selection and genetic drift
In real population, both forces lead to changes in allele frequencies. But which one dominates depend on the strength of the forces (or “scale” in population genetics). A rule of thumb is to compare s vs 1/N. When s > 1/N, selection dominates and the changes in allele frequency is more or less deterministic. On the other hand, when s < 1/N, drift dominates and changes in allele frequency is stochastic.
Let’s try using simulations to explore these predictions. We will use functions we defined previously.
Here, I am simulating haploid case. These can be easily extended to diploids by using the corresponding diploid functions.
#define the drift function to make sure it is for haploid
p_next_drift_fun = function(p, N){
n_next = rbinom(1, size = N, prob = p)
p_next = n_next/N
return(p_next)
}
#set up our parameters
N = 1000
p = 0.5
omega2 = 0.9
ngen = 1000
v_p = c()
for (i in 1:ngen){
#here, we let selection happens first. In real population genetic simulations, the order sometimes matter.
#NOTE: did we use N here?
p_next_after_selection = p_next_fun(p, omega2)
#drift happens after
#NOTE: did we use N here?
p_next_after_selection_after_drift = p_next_drift_fun(p_next_after_selection, N)
v_p[i] = p_next_after_selection_after_drift
p = p_next_after_selection_after_drift ######## important ######
}
plot(v_p, type = 'l', ylim = c(0, 1), xlab = "generation", ylab = "allele frequency")Let’s try defining a new function.
sel_drift_fun = function(p, omega2, N, ngen){
v_p = c()
for (i in 1:ngen){
p_next_after_selection = p_next_fun(p, omega2)
p_next_after_selection_after_drift = p_next_drift_fun(p_next_after_selection, N)
v_p[i] = p_next_after_selection_after_drift
p = p_next_after_selection_after_drift
}
return(v_p)
}Let’s use this new function to explore the relative importance of selection and drift. In this case, s >> 1/N.
p = 0.5
ngen = 100
N = 100
s = 0.1
omega2 = 1-s ## rembmer this?
v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "blue", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency")Let’s try weaker selection, where s ~ 1/N.
p = 0.5
ngen = 100
N = 100
s = 0.01
omega2 = 1-s ## rembmer this?
v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "red", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency")Let’s try running the simulations multiple times (like what we did for drift alone) and explore the trajectories.
For the case where s >> 1/N, do you think selection or drift dominates?
p = 0.5
ngen = 100
nrun = 10
N = 100
s = 0.1
omega2 = 1-s
v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "blue", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency", main = "N = 100, s = 0.1")
for(i in 1:nrun-1){
v_freq = sel_drift_fun(p, omega2, N, ngen)
lines(v_freq, type = 'l', col = "blue")
}For the case where s ~ 1/N, do you think selection or drift dominates?
p = 0.5
ngen = 100
nrun = 10
N = 100
s = 0.01
omega2 = 1-s
v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "red", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency", main = "N = 100, s = 0.01")
for(i in 1:nrun-1){
v_freq = sel_drift_fun(p, omega2, N, ngen)
lines(v_freq, type = 'l', col = "red")
}It is the relative scale of s and N that matters. Dirft would dominate for both N = 100, s = 0.01 and N = 1000, s = 0.001.
par(mfrow = c(1, 2)) #having two figures in one row
p = 0.5
ngen = 1000
nrun = 10
N = 100
s = 0.01
omega2 = 1-s
v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "red", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency", main = "N = 100, s = 0.01")
for(i in 1:nrun-1){
v_freq = sel_drift_fun(p, omega2, N, ngen)
lines(v_freq, type = 'l', col = "red")
}
N = 1000
s = 0.001
omega2 = 1-s
v_freq = sel_drift_fun(p, omega2, N, ngen)
plot(v_freq, type = 'l', col = "red", ylim = c(0, 1), xlab = "generation", ylab = "allele frequency", main = "N = 1000, s = 0.001")
for(i in 1:nrun-1){
v_freq = sel_drift_fun(p, omega2, N, ngen)
lines(v_freq, type = 'l', col = "red")
}You may notice that how quickly the allele frequency changes differ between N = 100 and 1000, despite drift dominates both simulations. The time scale for drift depends on the inverse of population size: the expected time to fixation/lost of an allele due to drift is ~ 1/N. Accordingly, allele frequencies change with faster rate when N is smaller.
What I expect in the final plant report
Estimation of fitness in the two enviornment
- clear definition of fitness
Estimation of selection coefficient in the two environment
Discussion of predicted changes in allele frequency over time in the two environment
Assume that BB is homozygous for A1 allele, and EP is homozygous for A2 allele
Assume h = 0.5 and h = -0.5
Assume initial allele frequency = 0.5 for A1 (i.e., p = 0.5 for A2).
Discussion of whether you think these two alleles will be maintained or one of the two will be selected out in nature.