Importance sampling

On this report we denote \(D\) as the observed-part or extant-species, \(+_i\) as the missing-part or extinct-species of the tree and \(D^+\) is then the complete phylogenetic tree.

The EM algorithms consists on two steps. First, we calculate the conditional expectation:

\[ Q(\theta|\theta^*) = E_{\theta^* } [log P(D^+|\theta) | D] \]

and then we perform the maximization:

\[ \theta ^{**} = argmax_{(\theta)} Q( \theta | \theta ^*) \]

Given that the calculation of the conditional expectation is really hard (if not impossible), we use an approximation, sampling complete-phylogenies under a montecarlo approach. This simulations should be sampled from (real density)

\[ f_{\theta^*} (+_i | D) \]

But instead we sample it from

\[ g_{\theta^*}(+_i | D) \]

To correct this we re-weigh the approximation of the expectation by importance scaling:

\[w_i = \frac{f_{\theta^*} (+_i | D)}{g_{\theta^*}(+_i | D)} = \frac{f_i}{g_i}\]

Thus, the montecarlo re-weighted approximation has the form

\[ E_{\theta^* } [log P(D^+|\theta) | D] \approx \frac{1}{N} \sum^{N}_{i=1} log P(D_i^+ | \theta) \frac{f_i}{g_i}\]

\(f_i\) is the density function (likelihood) of the complete phylogenetic tree.
to calculate \(g_i\) we have 2 ways so far. The first one is related with the diagram above

we multiply the corresponding probabilities to have every outcome, this is done on the piece of code bellow

  if (t_spe < cwt){
      t_ext = rexp(1,mu0)
      t_ext = cbt + t_spe + t_ext
      if (t_ext < ct){
        up = update_tree(wt=wt,t_spe = (cbt + t_spe), t_ext = t_ext, E = E, n = n)
        E = up$E
        n = up$n
        wt = up$wt
        fake = FALSE
        prob[i] = dexp(t_ext,rate=mu0,log=TRUE) + dexp(t_spe,rate=s,log = TRUE)
      }else{
        prob[i] = pexp(q = ct,rate = mu0,lower.tail = F,log.p = TRUE) + dexp(t_spe,rate=s, log = TRUE)
        fake = TRUE
        i = i-1
      }
    }else{
      fake = FALSE
      prob[i] = pexp(q = cwt,rate = s,lower.tail = F,log.p = TRUE)
    }

Note that we are working with multiplication of many densities, which in the end are very small values. To avoid cutting most numbers to zero we consider the following fact:

\[ w_i = \frac{f_i}{g_i} = e^{log(f_i) - log(g_i)} = e^{loglik - \sum_j log(g_{i,j})} \]

and that is why on the code above the log of the densities is calculated. Considering that we calculate the weight in the following way

f_n = -llik(b=pars,n=n,E=E,t=wt)
weight = exp(f_n-sum(prob))
return(list(wt=wt,E=E,n=n,weight=weight,L=L,g=prob,f_n=f_n))

another way to calculate \(g\) is considering the probabilities of not having a new species on the interval \(\Delta t_i\)

\[ \int_0^{\Delta t_i} s_{\lambda_i}e^{-s_{\lambda_i} t}[1-e^{-\mu (r_i-t)}] dt = s_{\lambda_i}\int_0^{\Delta t_i}e^{-s_{\lambda_i} t}-e^{ t(\mu-s_{\lambda_i})-\mu r_i} = s_{\lambda_i}\int_0^{\Delta t_i}e^{-s_{\lambda_i} t}dt - s_{\lambda_i} e^{-\mu r_i} \int_0^{\Delta t_i}e^{ t(\mu-s_{\lambda_i})} dt \] \[= 1-e^{-s_{\lambda_i} \Delta t} -\frac{s_{\lambda_i} e^{-\mu r_i}}{\mu - s_{\lambda_i}}[e^{\Delta t(\mu-s_{\lambda_i})}-1] \] so we do it with this function

convol <-function(wt,lambda,mu,remt){
  out = 1-exp(-lambda*wt)-lambda*exp(-mu*remt)/(mu-lambda)*(exp(wt*(mu-lambda))-1)
  return(out)
}

and we add this option to the code

 if (t_spe < cwt){
      t_ext = rexp(1,mu0)
      t_ext = cbt + t_spe + t_ext
      if (t_ext < ct){
        up = update_tree(wt=wt,t_spe = (cbt + t_spe), t_ext = t_ext, E = E, n = n)
        E = up$E
        n = up$n
        wt = up$wt
        fake = FALSE
        prob[i] = dexp(t_ext,rate=mu0,log=TRUE) + dexp(t_spe,rate=s,log = TRUE)
      }else{
        prob[i] = pexp(q = ct,rate = mu0,lower.tail = F,log.p = TRUE) + dexp(t_spe,rate=s,log = TRUE)
        if(v2){ prob[i] = log(convol(wt = t_spe,lambda = s,mu = mu,remt = ct-cbt))}
        fake = TRUE
        i = i-1
      }
    }else{
      fake = FALSE
      prob[i] = pexp(q = cwt,rate = s,lower.tail = F,log.p = TRUE)
      if(v2){prob[i] = log(convol(wt = t_spe,lambda = s,mu = mu,remt = ct-cbt))}
    }

Implementation

To have a better idea on how this implementation look like we do some simple observations.

We start simulating a whole tree

s <- sim_phyl()

Them we drop extinct species

s2 <- drop.fossil(s$newick)

we transform phylo format into branching-times, number of species, and topology vectors

s2 <- phylo2p(s2)

Now we simulate a complete tree (extinct+extans species) based on the observed tree (extant species only)

s3 <- rec_tree(wt=s2$wt)

and we observe the calculated weight for this tree

s3$weight

## [1] -Inf

it seems very small

This is a random process, if we do it again we end with another complete-tree and its corresponding weight, we do it for 30 iterations just to have an idea of the variability of the weight

w=0
nLTT=0
n_it = 100
st=sim_srt(wt=s2$wt,pars=c(0.8,0.0175,0.1),n_trees = n_it)
for(i in 1:n_it){
  st3 = st[[i]]
  w[i] = st3$weight 
  rec = p2phylo(st3)
  nLTT[i] = nLTTstat(s$newick, rec, distance_method = "abs")
}
su=summary(w)
su

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    -Inf  -222.4  -212.5    -Inf  -198.5  -174.5

boxplot(w)

q3 = w>su[5]
boxplot(w[q3])

They seems very small to be the logarithm of the weight, we centreate it to avoid numerical issues

w = w - max(w)
boxplot(w)

## Warning in bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z
## $group == : Outlier (-Inf) in boxplot 1 is not drawn

points(rep(1,length(w)),w)

now we would like to check if trees with larger weight are really ‘’better trees’’. To do that we check the nLTT statistic

qplot(w,nLTT)

qplot(w[q3],nLTT[q3])

we can see that there is not relationship between the weight and the nLTT statistic. There is clearly something wrong about it.

Now let´s check the real weights, that is

w = exp(w)
qplot(w,nLTT)

now the same but with the second way

s3 <- rec_tree(wt=s2$wt,v2=T)

and we observe the calculated weight for this tree

s3$weight

## [1] -194.5687

it seems very small

This is a random process, if we do it again we end with another complete-tree and its corresponding weight, we do it for 30 iterations just to have an idea of the variability of the weight

w=0
for(i in 1:100){
st3 <- rec_tree(wt=s2$wt,v2=T)
w[i] = st3$weight 
}
print(w)

##   [1]    0.0000    0.0000 -111.6870 -131.3811    0.0000    0.0000 -116.4134
##   [8] -103.9187 -131.2390 -110.5780    0.0000 -126.2164    0.0000 -127.8409
##  [15] -124.0819 -113.7044    0.0000    0.0000    0.0000    0.0000 -187.8608
##  [22] -120.7836    0.0000 -104.1887 -134.6352 -112.7757    0.0000 -121.1497
##  [29] -106.3640    0.0000 -124.7770 -135.5457 -127.6677      -Inf -119.5629
##  [36] -125.2948 -127.8070    0.0000 -125.0288 -118.8490 -104.3103 -112.2139
##  [43]    0.0000 -115.4659 -104.5541    0.0000 -119.6001 -123.9491    0.0000
##  [50] -129.0546 -131.0357    0.0000 -110.5199 -120.9251      -Inf    0.0000
##  [57] -231.5872    0.0000    0.0000 -127.4698 -138.7073 -121.3749 -111.1664
##  [64] -121.7523 -105.8351 -120.2086    0.0000 -105.6984 -143.6311 -186.6319
##  [71] -128.2664  -99.8520 -178.0863 -130.9410 -121.6407 -122.3238 -137.7087
##  [78] -120.0660 -123.5752 -111.7697 -201.1914 -137.8371 -133.5209 -187.5243
##  [85]    0.0000 -132.6997 -117.4974    0.0000 -181.0748      -Inf    0.0000
##  [92]    0.0000    0.0000 -196.5135      -Inf      -Inf -108.0838 -113.7509
##  [99]      -Inf -106.5920

w = w - max(w)
print(w)

##   [1]    0.0000    0.0000 -111.6870 -131.3811    0.0000    0.0000 -116.4134
##   [8] -103.9187 -131.2390 -110.5780    0.0000 -126.2164    0.0000 -127.8409
##  [15] -124.0819 -113.7044    0.0000    0.0000    0.0000    0.0000 -187.8608
##  [22] -120.7836    0.0000 -104.1887 -134.6352 -112.7757    0.0000 -121.1497
##  [29] -106.3640    0.0000 -124.7770 -135.5457 -127.6677      -Inf -119.5629
##  [36] -125.2948 -127.8070    0.0000 -125.0288 -118.8490 -104.3103 -112.2139
##  [43]    0.0000 -115.4659 -104.5541    0.0000 -119.6001 -123.9491    0.0000
##  [50] -129.0546 -131.0357    0.0000 -110.5199 -120.9251      -Inf    0.0000
##  [57] -231.5872    0.0000    0.0000 -127.4698 -138.7073 -121.3749 -111.1664
##  [64] -121.7523 -105.8351 -120.2086    0.0000 -105.6984 -143.6311 -186.6319
##  [71] -128.2664  -99.8520 -178.0863 -130.9410 -121.6407 -122.3238 -137.7087
##  [78] -120.0660 -123.5752 -111.7697 -201.1914 -137.8371 -133.5209 -187.5243
##  [85]    0.0000 -132.6997 -117.4974    0.0000 -181.0748      -Inf    0.0000
##  [92]    0.0000    0.0000 -196.5135      -Inf      -Inf -108.0838 -113.7509
##  [99]      -Inf -106.5920

w = exp(w)
print(w)

##   [1]  1.000000e+00  1.000000e+00  3.125850e-49  8.747748e-58  1.000000e+00
##   [6]  1.000000e+00  2.768883e-51  7.390368e-46  1.008427e-57  9.474798e-49
##  [11]  1.000000e+00  1.530780e-55  1.000000e+00  3.015763e-56  1.294008e-54
##  [16]  4.157152e-50  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00
##  [21]  2.588722e-82  3.502226e-53  1.000000e+00  5.641896e-46  3.378213e-59
##  [26]  1.052274e-49  1.000000e+00  2.428484e-53  6.407605e-47  1.000000e+00
##  [31]  6.456903e-55  1.359120e-59  3.586219e-56  0.000000e+00  1.187123e-52
##  [36]  3.847463e-55  3.119747e-56  1.000000e+00  5.019619e-55  2.423942e-52
##  [41]  4.995741e-46  1.845534e-49  1.000000e+00  7.141399e-51  3.914848e-46
##  [46]  1.000000e+00  1.143792e-52  1.477681e-54  1.000000e+00  8.959484e-57
##  [51]  1.235673e-57  1.000000e+00  1.004223e-48  3.040044e-53  0.000000e+00
##  [56]  1.000000e+00 2.648124e-101  1.000000e+00  1.000000e+00  4.370842e-56
##  [61]  5.757119e-61  1.938897e-53  5.260800e-49  1.329424e-53  1.087426e-46
##  [66]  6.223931e-53  1.000000e+00  1.246675e-46  4.186119e-63  8.847096e-82
##  [71]  1.970696e-56  4.313476e-44  4.550750e-78  1.358498e-57  1.486298e-53
##  [76]  7.506445e-54  1.562729e-60  7.178102e-53  2.147716e-54  2.877661e-49
##  [81]  4.204082e-88  1.374422e-60  1.029511e-58  3.624429e-82  1.000000e+00
##  [86]  2.340148e-58  9.365684e-52  1.000000e+00  2.291980e-79  0.000000e+00
##  [91]  1.000000e+00  1.000000e+00  1.000000e+00  4.521481e-86  0.000000e+00
##  [96]  0.000000e+00  1.147589e-47  3.968524e-50  0.000000e+00  5.101265e-47

So, a first observation, the values of w are really small and varies a lot. That is amaking a lot of numerical inestabilities.

boxplot(w)

summary(w)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00    0.27    1.00    1.00

we can also see how few trees has almost the whole weight and defines the outcome