set up loci, pops, neutral SFS
loci=1E7
sample_size=40
ancestral_sfs<-rep(1:999/1000,times=c(rmultinom(n=1,size=loci,prob=c(1/1:999))))
ignore selection in landraces, just sample neutrally. here 6 generations of drift from some ancestral population, just to get differences among populations
pop_ancestral=lapply(1:5, function(pop) ancestral_sfs)
for(i in 1:6){ pop_ancestral[[1]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[1]])/(sample_size)}
for(i in 1:6){ pop_ancestral[[2]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[2]])/(sample_size)}
for(i in 1:6){ pop_ancestral[[3]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[3]])/(sample_size)}
for(i in 1:6){ pop_ancestral[[4]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[4]])/(sample_size)}
for(i in 1:6){ pop_ancestral[[5]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[5]])/(sample_size)}
keep only loci polymorphic in at least one population
is_poly=Reduce(`+`,pop_ancestral)
filtered_pop_ancestral=lapply(1:5, function(pop) pop_ancestral[[pop]][is_poly>0 & is_poly<5])
filtered_loci=sum(is_poly>0 & is_poly<5)
sanity check – how similar are the populations?
cor(filtered_pop_ancestral[[1]],filtered_pop_ancestral[[2]])
## [1] 0.7933197
make DH pop by two rounds of binomial sampling from ancestral neutral freqs.
dh_freq=lapply(1:5, function(pop) rbinom(size=sample_size,n=loci,prob=rbinom(size=sample_size,n=loci,prob=filtered_pop_ancestral[[pop]])/sample_size))
find outliers, shared loci, ancestral frequency, and mean frequency of outliers (see outliers.r). set a binomial p-value cutoff to identify outliers, then make plots.
cutoff=0.05
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")
cutoff=0.01
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")
cutoff=0.001
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")
cutoff=0.0001
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")
We start seeing blank columns because at this level there simply aren’t any shared outliers among many populations.
Now apply selection. assume 5% loci selected, \(s\) comes from some gamma distribution New allele frequency comes from simple haploid selection model \(p'=(p-sp)/(1-sp)\)
curve(from=0,to=1,dgamma(x,shape=.25,rate=5),xlab="s")
selected<-(runif(filtered_loci)<0.05)*rgamma(filtered_loci,shape=.25,rate=5)
selected[selected>=1]=0.99 # truncate the stupid gamma
after_selection<-lapply(1:5, function(pop) (filtered_pop_ancestral[[pop]]-selected*filtered_pop_ancestral[[pop]])/(1-filtered_pop_ancestral[[pop]]*selected) )
Now we make DH pop by two rounds of binomial sampling from allele frequencies after selection
dh_freq=lapply(1:5, function(pop) rbinom(size=sample_size,n=loci,prob=rbinom(size=sample_size,n=loci,prob=after_selection[[pop]])/sample_size))
cutoff=0.05
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")
cutoff=0.01
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")
cutoff=0.001
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")
cutoff=0.0001
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")