Only one for now
library(lattice)
Download data from Dolores’s growouts. 360ppm and 215ppm branch lengths (3 years data lumped together).
system("wget -O b360.txt https://gist.github.com/rossibarra/5964105/raw/71adcff6492367aa7608ca561df3c4a6f4c2716f/bl360")
bl360 <- scan("b360.txt")
system("wget -O b215.txt https://gist.github.com/rossibarra/5964126/raw/980f82bc928e45529c12ccb0dc74075e74a6eea4/bl215")
bl215 <- scan("b215.txt")
Now we estimate pehnotypic variance \(\sigma^2P \approx 0.03\) and mean \(\mu \approx 0.42\) for branch length at 215ppm. Also look at data.
Vp = var(bl215)
Vp
## [1] 0.03434
bmu = mean(bl215)
bmu
## [1] 0.3707
hist(bl215, xlab = "branch length 215ppm", main = "")
And repeatfor 360ppm, checking for normality here by comparing to curve.
Vp = var(bl360)
Vp
## [1] 0.0298
bmu = mean(bl360)
bmu
## [1] 0.4165
hist(bl360, xlab = "branch length 360ppm", main = "",freq=F)
curve(dnorm(x,mean=0.4165313,sd=0.1726368),from=0,to=1,add=T)
This gets mean of shortest branch length selected plants of sample size x
get_s <- function(x,data) {
mean(data) - mean(data[1:x])
}
This gives response to selection
resp <- function(h2, s) {
h2 * s
}
Following p. 210 in Falconer & Mackay to estimate sampling variance in response to selection. \(\sigma^2_R=\frac{\sigma^2_Pth^2}{Ne}+\frac{1}{M}\)
Vr <- function(Ne, h2, Vp, t, M) {
Vp * t * h2/Ne + 1/M
}
We will fix some number of plants in a chamber (num_plants below). We have no idea what \(h^2\) or \(s\) are, so try lots. But \(s=\frac{s_f+s_m}{2}\) and because we phenotype after flowering, there is no selection on males (\(s_m=0\)) and \(s=\frac{s_f}{2}\). To get \(s\) we decide how many shortest branch plants we choose. We’ll try \(n=1-20\) shortest plants from our population of plants and estimate \(s\) as if we were sampling from a normal. Note that this \(n\) coresponds roughly to \(N_e\) so we’ll call it that!
num_plants = 48 # Ne in our theoretical experimental pop
Ne = c(1:20) # number of adults kept for next generation.
h2 = 1:10/10 # 5-100% h^2 in 5%
Get our simulated data by averaging over resamples from observed (ugly, yes, but it’ll do)
#sorted_bl360 = bl360[order(bl360)] # sort observed data <- now using simulated data instead of observed
#really ugly way to get a "normal" set of plants of size n=24 drawn from the data (or from the normal, now commented out)
#for(i in 1:1000){ bob[[i]]=rnorm(24,mean=0.4165313,sd=0.1726368)}
bob=list()
for(i in 1:1000){ bob[[i]]=sample(bl360,num_plants,replace=TRUE)}
bob=lapply(1:1000,function(x) bob[[x]][order(bob[[x]])])
plants=(rep(0,num_plants));
for(i in 1:1000){ plants=plants+bob[[i]] }
plants=plants/1000
Now we can calculate our selection strength \(s\)
s = sapply(Ne, function(x) get_s(x,plants))/2 # divide by two because of s=sf/2 above
\(R=h^2s\) is easy enough. This shows that the fewer plants you select (smaller \(N_e\)) the bigger your response to selection is – smaller branch lengths in next generation. This makes sense – we expect a bigger shift if we choose progeny only from the 2 plants with the shortest branch lengths than if we choose from the bottom 50% say.
R = sapply(h2, function(x) resp(h2 = x, s = s))
rownames(R) = Ne
colnames(R) = h2
wireframe(R, scales = list(arrows = F), xlab = expression(N[e]), ylab = expression(h^2))
Assume \(t=1\) generations. Total plants measured \(M\) is our variable num_plants. Have to use range of \(h^2\) as above, and now substitute in \(N_e\) from above too. This gives us \(\sigma^2_R\).
est_vr = sapply(h2, function(x) Vr(h2 = x, Ne = Ne, t = 1, M=num_plants, Vp=var(plants)))
rownames(est_vr) = Ne
colnames(est_vr) = h2
wireframe(est_vr, scales = list(arrows = F), screen = list(z = 230, x = -70,
y = 0), zlab = expression(sigma[R]^2), ylab = expression(h^2), xlab = expression(N[e]))
This graphs shows that as we select harder (smaller \(N_e\)) our variance goes up because of genetic drift.
With \(h^2\) and \(s\) (and our bad estimate of \(N_e\) we can get \(R\) and \(\sigma^2_R\) which let’s us calculate a confidence interval.
We find the lower bound of the confidence interval, which we will call \(R_{min}\), assuming the confidence interval is \(R \pm 2∗\sigma_R\). We want \(R_{min}>0\) in order to be able to have confidence that our selection program will yield nonzero results.
sd_R = est_vr^0.5
Rmin = R - 2 * sd_R
rownames(Rmin) = Ne
colnames(Rmin) = h2
wireframe(Rmin, zlab = list("Rmin", rot = 90), scales = list(arrows = F), xlab = expression(N[e]),
ylab = expression(h^2))
This graph shows that for even our confidence interval on the response to selection after one generation includes 0. Which means we can’t rule out genetic drift (let alone show statistical difference to a different selection experiment under different conditions). We get a peak in the middle as selecting harder gives a stronger response, but also leads to more drift, so there’s an otpimum level.
Here we let the experiment run for \(t=5\) generations. We have to make some assumptions. We don’t know how strong selection (difference between final and initial mean) we can really do. Let’s assume \(3x\) original difference, but \(N_e\) stays same. This is equivalent of saying we get 3 times as much response over 5 generations as we see in the original 1 generation.
R = sapply(h2, function(x) resp(h2 = x, s = 3 * s)) #re-estimate response with 3X s
est_vr = sapply(h2, function(x) Vr(h2 = x, Ne = Ne, M=num_plants, Vp=var(plants), t = 5)) #var. response now over 5 gens
rownames(est_vr) = Ne
colnames(est_vr) = h2
sd_R = est_vr^0.5
Rmin = R - 2 * sd_R
rownames(Rmin) = Ne
colnames(Rmin) = h2
wireframe(Rmin, zlab = list("Rmin_5gens", rot = 90), scales = list(arrows = F),
xlab = expression(N[e]), ylab = expression(h^2))
Not so great outlook. \(R_{min}<0\) unless \(h^2\) is really high. This means that unless we have confidence that \(h^2\) is high (which it almost never is for single-plant traits), there is no gaurantee we will see any response to selection.
Finally, if we claim we can make 5 times the single generation progress in 5 generations, we see:
R = sapply(h2, function(x) resp(h2 = x, s = 5 * s)) #re-estimate response with 3X s
est_vr = sapply(h2, function(x) Vr(h2 = x, Ne = Ne, M=num_plants, Vp=var(plants), t = 5)) #var. response now over 5 gens
rownames(est_vr) = Ne
colnames(est_vr) = h2
sd_R = est_vr^0.5
Rmin = R - 2 * sd_R
rownames(Rmin) = Ne
colnames(Rmin) = h2
wireframe(Rmin, zlab = list("Rmin_5gens", rot = 90), scales = list(arrows = F),
xlab = expression(N[e]), ylab = expression(h^2))
This looks much more positive, but makes some very generous assumptions about how well we can select while maintaining \(N_e\) and still only works for \(h^2>0.4\).