Sweep Probs for RAD-Seq

Parameters

##Sample
n=100 #sample size
tag_density=c(10,20,50,100,200,500)*10^-6 #sequence tags per bp. 50,000 tags in a 1Gb genome corresponds to 50 tags per Mb or a tag density of 50*10^-6

##Recombination
c=1 #cm/Mb
r=(c/100)/1E6 #recombination per bp

##Diversity
mu=1E-8 #mutation rate per bp
theta=1.5 #150bp tags and assuming nucleotide diversity of 0.01 per bp
Ne=theta/(4*mu*150) #ne

## Selection
s=5*10^((-20:-50)/10) #selection coefficient
alpha=2*Ne*s #2Nes

Function to calculate the reduction in heterozygosity at genetic distance r from sweep (modified from Eq. 3 Kim Stephan 2000)

h<-function(r,alpha,Ne) # originally function of r1, r2 but we hve no r1 (dist between favorable and deleterious)
{ 
  s=alpha/(2*Ne)
  2*r/s*alpha^(-2*r/s)*gamma_inc(-2*r/s,1/alpha)
}

Given a vector of distances, this function finds the distance in bp from a sweep over which diverty (# of SNPs S) is less than X standard deviations below genome wide mean.

d=1:1000*150 #bp distance vector
get_length<-function(d,r,alpha,Ne,n,S,sd_S,x){
  r2=d*r #genetic distance from sweep
  divS=theta*h(r2,alpha,Ne)*sum(1/(1:(n-1))) #S at distance from sweep assuming sweep finished yesterday (tau==0), modified from Kim and Stephan 2000
  return(min(which(divS>S-x*sd_S))) #distance for 1.65 sd, roughly 1-tailed 5%
}

Diversity. Expected number of segregating sites per tag and variance.

S=theta*sum(1/(1:(n-1))) #From watterson's estimator
var_S=theta*sum(1/(1:(n-1)))+theta^2*sum((1/(1:(n-1)))^2) #modified from Wakeley equation 4.8
sd_S=var_S^0.5 #standard dev.

Set up array of variables to iterate over

dat1=expand.grid(s,tag_density); colnames(dat1)=c("s","tag_density") #array over both
dat1<-mutate(dat1,alpha=2*Ne*s)

Do some calculations over array of selection strengths and tag densities.

sweeps=100 #assume 100 sweeps
X=1.65 #how many standard deviations below expected we will consider "significant"
one_side=sapply(dat1$alpha,function(i) get_length(d,r,i,Ne,n,S,sd_S,X)) #distance from sweep that has S below X standard deviations from expectation

Probability of finding one sweep. Assume Poisson probability of ≥1 tag within the region of low diversity.

p2sd=1-exp(-1*dat1$tag_density*2*d[one_side]) #prob of distance to first tag being <= distance 2sd

Expected number of sweeps (out of 100)

pp=p2sd*sweeps #prob times sample size = expectation of binomial
dat2=data.frame(dat1$tag_density,dat1$s,pp,dat1$alpha) #combine with param values 
dat2=mutate(dat2,alpha=dat1.alpha,tag_density=dat1.tag_density,s=dat1.s)

Graph it!

levelplot(data=dat2,
          pp~tag_density*alpha,
          at=c(0:100),
          col.regions=viridis,
          scales=list(
            x=list(labels=c(10,20,50,100,200,500),
                   at=c(10,20,50,100,200,500)*10^-6,
                   cex=1.25,log=TRUE),
            y=list(log=TRUE,cex=1.25,labels=5*2*10^c(-2:-5)*Ne,
                   at=5*2*10^c(-2:-5)*Ne)
          ),
          ylab=list(label=expression(paste(alpha,"=2",N[e],"s")),cex=1.25),
          cex.lab=1.25, cex.axis=1.25, cex.main=2,
          xlab=list(label="tags per Mb",cex=1.25),
          main=list(label="Expected number of 100 sweeps successfully tagged by RAD", cex=1.25)
)