1000 random values of \(\theta\).

thetas<-runif(1000,1,100)
write.table(thetas,file="~/src/msdir/file.tbs",col.names=F,row.names=F,quote=F)

Use these 1,000 diversity values to simulate a constant size neutral population. Then calculate summary statistics using msstats

system("~/src/msdir/ms 20 1000 -t tbs < ~/src/msdir/file.tbs | ~/src/msstats/src/msstats | cut -f 4,5,8 > ~/src/msdir/sumstats.txt")

Read in data, plot relationship between Tajima’s D and diversity. Do a simple linear regression.

sumstats<-read.table("~/src/msdir/sumstats.txt",header=T)
diversity<-scan("~/src/msdir/file.tbs")
data<-cbind(sumstats,diversity)

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.4
ggplot(data,aes(y=tajd,x=diversity))+geom_point()+geom_smooth(method="loess")
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).

summary(lm(data=data,tajd~diversity))
## 
## Call:
## lm(formula = tajd ~ diversity, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.03292 -0.63347 -0.01331  0.58207  2.70681 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.1296632  0.0547802  -2.367   0.0181 *
## diversity    0.0004962  0.0009483   0.523   0.6009  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8586 on 997 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0002746,  Adjusted R-squared:  -0.0007282 
## F-statistic: 0.2738 on 1 and 997 DF,  p-value: 0.6009

In contrast, \(\theta_\pi\) for example, is a great measure of diversity.

sumstats<-read.table("~/src/msdir/sumstats.txt",header=T)
diversity<-scan("~/src/msdir/file.tbs")
data<-cbind(sumstats,diversity)

library(ggplot2)
ggplot(data,aes(y=pi,x=diversity))+geom_point()+geom_smooth(method="loess")

summary(lm(data=data,pi~diversity))
## 
## Call:
## lm(formula = pi ~ diversity, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -68.399 -14.323  -2.883   8.043 251.663 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.50989    1.75857   -0.29    0.772    
## diversity    0.98626    0.03046   32.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.62 on 998 degrees of freedom
## Multiple R-squared:  0.5124, Adjusted R-squared:  0.5119 
## F-statistic:  1049 on 1 and 998 DF,  p-value: < 2.2e-16