Set up two gamma distributions of fitness effects (DFE). The first is all TEs, the second is those that affect expression (“eQTL”). The eQTL DFE has a higher mean \(2Ns\) (\(N\) is effective population size, \(s\) the selection coeffecient). The mean of each is shown with the vertical dashed line.

M=30
g1_shape=.5; g1_rate=.1; # mean = shape/rate
g2_shape=.25; g2_rate=.01;
curve(dgamma(x,shape=g1_shape,rate=g1_rate),lwd=2,from=0,to=M,xaxt='n',xlab="2Ns",yaxt='n',ylab="density"); 
abline(v=g1_shape/g1_rate,lty=2,col="black",lwd=2)
axis(side=1,at=seq(0,M,2))
legend("top",lwd=c(2,2),col=c("black","blue"),legend=c("ALL TEs","qQTL TEs"),bty='n')
curve(dgamma(x,shape=g2_shape,rate=g2_rate),add=T,col="blue",lwd=2,from=0,to=M,xaxt='n',xlab="s"); 
abline(v=g2_shape/g2_rate,lty=2,col="blue",lwd=2)

Now we call “weakly deleterious” anything with \(2<2Ns<20\) (hatched areas on plot).

curve(dgamma(x,shape=g1_shape,rate=g1_rate),lwd=2,from=0,to=M,xaxt='n',xlab="2Ns",yaxt='n',ylab="density"); 
axis(side=1,at=seq(0,M,2))
curve(dgamma(x,shape=g2_shape,rate=g2_rate),add=T,col="blue",lwd=2,from=0,to=M,xaxt='n',xlab="s"); 
legend("top",lwd=c(2,2),col=c("black","blue"),legend=c("ALL TEs","qQTL TEs"),bty='n')

sue=200:2000/100
polygon(c(sue,rev(sue)),c(dgamma(sue,shape=g1_shape,rate=g1_rate),rep(0,length(sue))),col=rgb(0,0,0,0.25),density=30,border=0)
polygon(c(sue,rev(sue)),c(dgamma(sue,shape=g2_shape,rate=g2_rate),rep(0,length(sue))),col=rgb(0,0,1,0.25),angle=-45,density=30,border=0)

So let’s first ask what % of each “observed” DFE is neutral (\(2Ns<2\)). As expected, fewer of the eQTL TEs are neutral.

all_neut<-pgamma(2,shape=g1_shape,rate=g1_rate)
eqtl_neut<-pgamma(2,shape=g2_shape,rate=g2_rate)
barplot(c(all_neut,eqtl_neut),col=c("black","blue"),ylab="% total that are neutral")
axis(side=1,at=c(0.75,2),labels=c("All TEs","eQTL TEs"))

We don’t see the strongly selected stuff (\(2Ns>20\)) because it is removed from the population or at very low frequency. We only see the neutral and weak. If we condition only on the stuff we see, then ask what % are neutral, we get a different answer!

all_cond<-pgamma(20,shape=g1_shape,rate=g1_rate)
eqtl_cond<-pgamma(20,shape=g2_shape,rate=g2_rate)
barplot(c(all_neut/all_cond,eqtl_neut/eqtl_cond),col=c("black","blue"),ylab="% observed that are neutral")
axis(side=1,at=c(0.75,2),labels=c("All TEs","eQTL TEs"))