1000 random values of \(\theta\).
thetas<-runif(1000,1,100)
write.table(thetas,file="~/src/msdir/file.tbs",col.names=F,row.names=F,quote=F)
Use these 1,000 diversity values to simulate a constant size neutral population. Then calculate summary statistics using msstats
system("~/src/msdir/ms 20 1000 -t tbs < ~/src/msdir/file.tbs | ~/src/msstats/src/msstats | cut -f 4,5,8 > ~/src/msdir/sumstats.txt")
Read in data, plot relationship between Tajima’s D and diversity. Do a simple linear regression.
sumstats<-read.table("~/src/msdir/sumstats.txt",header=T)
diversity<-scan("~/src/msdir/file.tbs")
data<-cbind(sumstats,diversity)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.4
ggplot(data,aes(y=tajd,x=diversity))+geom_point()+geom_smooth(method="loess")
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
summary(lm(data=data,tajd~diversity))
##
## Call:
## lm(formula = tajd ~ diversity, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.03292 -0.63347 -0.01331 0.58207 2.70681
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1296632 0.0547802 -2.367 0.0181 *
## diversity 0.0004962 0.0009483 0.523 0.6009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8586 on 997 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.0002746, Adjusted R-squared: -0.0007282
## F-statistic: 0.2738 on 1 and 997 DF, p-value: 0.6009
In contrast, \(\theta_\pi\) for example, is a great measure of diversity.
sumstats<-read.table("~/src/msdir/sumstats.txt",header=T)
diversity<-scan("~/src/msdir/file.tbs")
data<-cbind(sumstats,diversity)
library(ggplot2)
ggplot(data,aes(y=pi,x=diversity))+geom_point()+geom_smooth(method="loess")
summary(lm(data=data,pi~diversity))
##
## Call:
## lm(formula = pi ~ diversity, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -68.399 -14.323 -2.883 8.043 251.663
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.50989 1.75857 -0.29 0.772
## diversity 0.98626 0.03046 32.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.62 on 998 degrees of freedom
## Multiple R-squared: 0.5124, Adjusted R-squared: 0.5119
## F-statistic: 1049 on 1 and 998 DF, p-value: < 2.2e-16