Study of the convergence of diffsel on Besnard’s data, using 2 chains.

We use the files .trace output by diffsel.

Reading data

library(coda)

setwd("/home/boussau/Data/Convergenomix/outputPhilTrans/outdir_real_data/besnard2009/cyp_coding/Detection_tools")
d1<-read.table("diffsel/raw_results/myrun.trace", h=T, comment.char="&")[,1:7]
d2<-read.table("diffsel_bis/raw_results/myrun.trace", h=T, comment.char="&")[,1:7]

Autocorrelation plots

d1

a<-autocorr.plot(d1, auto.layout = F)

### d2

a<-autocorr.plot(d2, auto.layout = F)

Autocorrelation is low in both chains, and similar.

Effective sizes

d1 and d2

d1ess <- apply(d1, 2, effectiveSize)
d2ess <- apply(d2, 2, effectiveSize)

ess<-cbind(d1ess, d2ess)
plot(ess[,1], ess[,2], xlab="ESS of chain 1", ylab="ESS of chain 2")
abline(a=0, b=1)

We have an ESS of more than 500 for all variables, but the correlation is not perfect.

Trace plots

for (i in 1:length(d1[1,])) {
  plot(d1[,1], t="l", ylab=colnames(d1)[i])
  lines(d2[,1], col="red")
}

Convergence seems to have been reached, and 200 looks like a reasonable choice for discarding a burnin.

Gelman and Rubin potential scale reduction factor (PSRF) and MPSRF (Brooks and Gelman)

md1 <- mcmc(d1)
md2 <- mcmc(d2)
twoChains <- mcmc.list(md1, md2)
gelman.diag(twoChains)

## Potential scale reduction factors:
## 
##            Point est. Upper C.I.
## X.logprior       1.00       1.00
## lnL              1.01       1.04
## length           1.00       1.01
## globent          1.01       1.04
## selvar1          1.00       1.00
## statent          1.01       1.03
## rrent            1.00       1.00
## 
## Multivariate psrf
## 
## 1.01

The values are all close to 1, convergence seems to have been reached.

Convergence in quantiles (Raftery and Lewis)

We require that estimation of the 0.025 quantile should be precise at the 1% level.

raftery.diag(twoChains, r=0.01)

## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                   
##             Burn-in  Total Lower bound  Dependence
##             (M)      (N)   (Nmin)       factor (I)
##  X.logprior 2        966   937          1.03      
##  lnL        5        1523  937          1.63      
##  length     4        1257  937          1.34      
##  globent    3        1112  937          1.19      
##  selvar1    3        1026  937          1.09      
##  statent    3        1134  937          1.21      
##  rrent      3        1047  937          1.12      
## 
## 
## [[2]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                   
##             Burn-in  Total Lower bound  Dependence
##             (M)      (N)   (Nmin)       factor (I)
##  X.logprior 3        1047  937          1.120     
##  lnL        4        1334  937          1.420     
##  length     2        984   937          1.050     
##  globent    4        1206  937          1.290     
##  selvar1    2        925   937          0.987     
##  statent    2        983   937          1.050     
##  rrent      3        1026  937          1.090

2000 iterations seems to be enough for all variables in the trace, and the burnin suggested by this measure is always very small.

Diffsel MCMC convergence study

Bastien Boussau

July 13, 2018