Setup

set up loci, pops, neutral SFS

loci=1E7
sample_size=40
ancestral_sfs<-rep(1:999/1000,times=c(rmultinom(n=1,size=loci,prob=c(1/1:999))))

ignore selection in landraces, just sample neutrally. here 6 generations of drift from some ancestral population, just to get differences among populations

pop_ancestral=lapply(1:5, function(pop) ancestral_sfs)

for(i in 1:6){ pop_ancestral[[1]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[1]])/(sample_size)}
for(i in 1:6){ pop_ancestral[[2]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[2]])/(sample_size)}
for(i in 1:6){ pop_ancestral[[3]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[3]])/(sample_size)}
for(i in 1:6){ pop_ancestral[[4]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[4]])/(sample_size)}
for(i in 1:6){ pop_ancestral[[5]]=rbinom(size=sample_size,n=loci,prob=pop_ancestral[[5]])/(sample_size)}

keep only loci polymorphic in at least one population

is_poly=Reduce(`+`,pop_ancestral)
filtered_pop_ancestral=lapply(1:5, function(pop) pop_ancestral[[pop]][is_poly>0 & is_poly<5])
filtered_loci=sum(is_poly>0 & is_poly<5)

sanity check – how similar are the populations?

cor(filtered_pop_ancestral[[1]],filtered_pop_ancestral[[2]])

## [1] 0.7933197

Neutrality

make DH pop by two rounds of binomial sampling from ancestral neutral freqs.

dh_freq=lapply(1:5, function(pop) rbinom(size=sample_size,n=loci,prob=rbinom(size=sample_size,n=loci,prob=filtered_pop_ancestral[[pop]])/sample_size))

find outliers, shared loci, ancestral frequency, and mean frequency of outliers (see outliers.r). set a binomial p-value cutoff to identify outliers, then make plots.

Change cutoff to see impacts:

5%

cutoff=0.05
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")

1%

cutoff=0.01
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")

.1%

cutoff=0.001
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")

.01%

cutoff=0.0001
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")

We start seeing blank columns because at this level there simply aren’t any shared outliers among many populations.

Selection

Now apply selection. assume 5% loci selected, \(s\) comes from some gamma distribution New allele frequency comes from simple haploid selection model \(p'=(p-sp)/(1-sp)\)

curve(from=0,to=1,dgamma(x,shape=.25,rate=5),xlab="s")

selected<-(runif(filtered_loci)<0.05)*rgamma(filtered_loci,shape=.25,rate=5)
selected[selected>=1]=0.99 # truncate the stupid gamma
after_selection<-lapply(1:5, function(pop) (filtered_pop_ancestral[[pop]]-selected*filtered_pop_ancestral[[pop]])/(1-filtered_pop_ancestral[[pop]]*selected)  )

Now we make DH pop by two rounds of binomial sampling from allele frequencies after selection

dh_freq=lapply(1:5, function(pop) rbinom(size=sample_size,n=loci,prob=rbinom(size=sample_size,n=loci,prob=after_selection[[pop]])/sample_size))

Change cutoff to see impacts:

5%

cutoff=0.05
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")

1%

cutoff=0.01
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")

.1%

cutoff=0.001
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")

.01%

cutoff=0.0001
source("outliers.r")
par(mfrow=(c(1,2)))
boxplot(anc_out,ylab="ancestral freqeuncy",xlab="number of pops sharing outlier")
boxplot(mean_out,ylab="average frequency of outlier in DH",xlab="number of pops sharing outlier")

eurosims

Setup

Neutrality

Change cutoff to see impacts:

5%

1%

.1%

.01%

Selection

Change cutoff to see impacts:

5%

1%

.1%

.01%