1000 random values of \(\theta\).

thetas<-runif(1000,1,100)
write.table(thetas,file="~/src/msdir/file.tbs",col.names=F,row.names=F,quote=F)

Use these 1,000 diversity values to simulate a constant size neutral population. Then calculate summary statistics using msstats

system("~/src/msdir/ms 10 1000 -t tbs < ~/src/msdir/file.tbs | ~/src/msstats/src/msstats | cut -f 4,5,8 > ~/src/msdir/sumstats.txt")

Read in data, plot relationship between Tajima’s D and diversity. Do a simple linear regression.

sumstats<-read.table("~/src/msdir/sumstats.txt",header=T)
diversity<-scan("~/src/msdir/file.tbs")
data<-cbind(sumstats,diversity)

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.4
ggplot(data,aes(y=tajd,x=diversity))+geom_point()+geom_smooth(method="loess")+coord_cartesian(ylim = c(-2, 2))

summary(lm(data=data,tajd~diversity))
## 
## Call:
## lm(formula = tajd ~ diversity, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.94321 -0.59504 -0.03147  0.57605  2.69632 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.1024097  0.0572462  -1.789   0.0739 .
## diversity    0.0001055  0.0009706   0.109   0.9134  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8712 on 998 degrees of freedom
## Multiple R-squared:  1.184e-05,  Adjusted R-squared:  -0.0009901 
## F-statistic: 0.01182 on 1 and 998 DF,  p-value: 0.9134