We are interested in studying macro-evolutionary drivers given a phylogenetic tree.
However, we normally do not observe extinct species.
To infer information from incomplete trees, we treat them as a ‘missing-data problem’ and therefore use an EM algorithm.
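Writing \(D\) for the observed tree and \(D^+\) for the complete tree, which also contains the unobserved extinct lineages, the EM algorithm consists of two steps, an expectation (E) step and a maximization (M) step: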
\[ Q(\theta \mid \theta^*) = E_{\theta^*} [\log P(D^+ \mid \theta) \mid D] \]
\[ \theta^{**} = \arg\max_{\theta} Q(\theta \mid \theta^*) \]
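To make the two steps concrete, here is a minimal sketch on a toy missing-data problem where the E-step does have a closed form. It is not the phylogenetic model: as an assumption for illustration, the data are exponential lifetimes with rate \(\theta\), some of which are right-censored, so the true lifetimes play the role of \(D^+\) and the recorded times play the role of \(D\). The names `em_censored_exponential`, `times` and `censored` are hypothetical.

```python
def em_censored_exponential(times, censored, theta=1.0, n_iter=200):
    """Exact EM for the rate of exponential lifetimes with right-censoring.

    times[i] is the recorded time; censored[i] is True when the true
    lifetime is only known to exceed times[i] (the missing data).
    """
    n = len(times)
    for _ in range(n_iter):
        # E-step: by memorylessness, E[lifetime | lifetime > t] = t + 1/theta,
        # so Q(theta | theta*) = n*log(theta) - theta * expected_total.
        expected_total = sum(
            t + (1.0 / theta if c else 0.0) for t, c in zip(times, censored)
        )
        # M-step: the maximizer of Q over theta is n / expected_total.
        theta = n / expected_total
    return theta


# Toy data (made up for illustration): two of the five lifetimes are censored.
times = [0.5, 1.2, 2.0, 2.0, 0.3]
censored = [False, False, True, True, False]
theta_hat = em_censored_exponential(times, censored)
```

In this toy case the iteration converges to the usual censored-data maximum-likelihood estimate, the number of uncensored observations divided by the total recorded time (here 3 / 6.0 = 0.5).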
However, in the phylogenetic-tree context the E-step has no closed form. An alternative is to approximate \(E_{\theta^*} [\log P(D^+ \mid \theta) \mid D]\) by Monte Carlo, averaging over \(N\) complete trees \(D_i^+\) sampled conditionally on \(D\) under \(\theta^*\):
\[ E_{\theta^*} [\log P(D^+ \mid \theta) \mid D] \approx \frac{1}{N} \sum_{i=1}^{N} \log P(D_i^+ \mid \theta) \]
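The Monte Carlo approximation above can be sketched on a toy problem. This is only an illustration under stated assumptions, not the tree-based procedure: the data are exponential lifetimes with rate \(\theta\), some right-censored, and each completed data set \(D_i^+\) is obtained by drawing the censored lifetimes from their conditional distribution under the current \(\theta^*\). The function name `mcem_censored_exponential`, its parameters and the data are hypothetical.

```python
import random


def mcem_censored_exponential(times, censored, theta=1.0, n_iter=50,
                              n_samples=2000, seed=0):
    """Monte Carlo EM for the rate of right-censored exponential lifetimes."""
    rng = random.Random(seed)
    n = len(times)
    for _ in range(n_iter):
        # Monte Carlo E-step: draw n_samples completed data sets D_i^+ from
        # P(D^+ | D, theta*). By memorylessness, a lifetime censored at t
        # is distributed as t + Exponential(theta*).
        total = 0.0
        for _sample in range(n_samples):
            total += sum(
                t + (rng.expovariate(theta) if c else 0.0)
                for t, c in zip(times, censored)
            )
        mean_total = total / n_samples
        # M-step: (1/N) * sum_i log P(D_i^+ | theta)
        #         = n*log(theta) - theta*mean_total,
        # which is maximized at theta = n / mean_total.
        theta = n / mean_total
    return theta


# Toy data (made up for illustration): two of the five lifetimes are censored.
times = [0.5, 1.2, 2.0, 2.0, 0.3]
censored = [False, False, True, True, False]
theta_mc = mcem_censored_exponential(times, censored)
```

Because the E-step is estimated from a finite sample, the iterates do not settle on a single value but fluctuate around the maximum-likelihood estimate (0.5 for this toy data); increasing `n_samples` reduces that noise at the cost of more simulation.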
In order to use the Monte Carlo method we need to simulate reconstructed trees…