The reconstruction process

Introduction

We are interested in studying macro-evolutionary drivers given a phylogenetic tree.

However, we normally do not observe extinct species: the reconstructed tree contains only the lineages that survive to the present.

To infer information from such incomplete trees, we analyze them as a ‘missing-data problem’ and therefore use an expectation-maximization (EM) algorithm.

EM algorithm

The EM algorithm alternates between two steps:

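  1. E-step: compute the expected complete-data log-likelihood, where \(D\) is the observed (reconstructed) tree, \(D^+\) is the complete tree including extinct lineages, and \(\theta^*\) is the current parameter estimate:

\[ Q(\theta \mid \theta^*) = E_{\theta^*}\left[\log P(D^+ \mid \theta) \mid D\right] \]

  2. M-step: update the parameters by maximizing \(Q\):

\[ \theta^{**} = \arg\max_{\theta} Q(\theta \mid \theta^*) \]

The two steps are repeated, setting \(\theta^* = \theta^{**}\), until the estimates converge.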
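To make the two steps concrete, here is a minimal sketch on a deliberately simple stand-in for the phylogenetic problem, not the tree model itself: exponential lifetimes with rate \(\theta\), some of which are right-censored (we only know they exceed a censoring time, loosely analogous to the partial information left by unobserved lineages). The function name `em_censored_exponential` and its parameters are hypothetical. In this toy model the memoryless property gives the E-step in closed form, so the M-step reduces to a one-line update.

```python
def em_censored_exponential(observed, censored, theta0=1.0, tol=1e-10, max_iter=1000):
    """EM estimate of an exponential rate when some lifetimes are right-censored.

    observed: fully observed lifetimes
    censored: censoring times (the true lifetime is only known to exceed them)
    Hypothetical toy model, used only to illustrate the E- and M-steps.
    """
    n = len(observed) + len(censored)
    theta = theta0
    for _ in range(max_iter):
        # E-step: by memorylessness, E[T | T > c, theta*] = c + 1/theta*, so
        # Q(theta | theta*) = n*log(theta) - theta * expected_total.
        expected_total = sum(observed) + sum(c + 1.0 / theta for c in censored)
        # M-step: n*log(theta) - theta*expected_total is maximized at n / expected_total.
        new_theta = n / expected_total
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta
```

For observed lifetimes `[1.0, 2.0, 3.0]` and two lifetimes censored at `2.0`, the iterations converge to approximately 0.3, the known maximum-likelihood estimate for censored exponential data (number of uncensored lifetimes divided by total observed time, 3/10).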

However, in the phylogenetic tree context the expectation in the E-step has no closed form. Instead, we approximate \(E_{\theta^*}\left[\log P(D^+ \mid \theta) \mid D\right]\) with a Monte Carlo method.
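The idea of replacing the expectation by an average over simulated completions of the missing data can be checked on a toy model where the exact \(Q(\theta \mid \theta^*)\) is available for comparison. A hedged sketch, assuming exponential lifetimes with right-censoring rather than a tree model; `q_exact` and `q_monte_carlo` are hypothetical names. The missing lifetimes are drawn from their conditional distribution given the observed data under \(\theta^*\), and the complete-data log-likelihood \(n \log\theta - \theta \sum_j t_j\) is averaged over the draws.

```python
import math
import random


def q_exact(theta, theta_star, observed, censored):
    """Closed-form Q(theta | theta*) for right-censored exponential lifetimes (toy model)."""
    n = len(observed) + len(censored)
    expected_total = sum(observed) + sum(c + 1.0 / theta_star for c in censored)
    return n * math.log(theta) - theta * expected_total


def q_monte_carlo(theta, theta_star, observed, censored, n_samples, rng):
    """Monte Carlo estimate of Q(theta | theta*): average complete-data log-likelihood
    over completions of the censored lifetimes drawn given the data and theta*."""
    n = len(observed) + len(censored)
    total = 0.0
    for _ in range(n_samples):
        # Memorylessness: given T > c, the excess T - c ~ Exponential(theta*).
        completed = [c + rng.expovariate(theta_star) for c in censored]
        t_sum = sum(observed) + sum(completed)
        total += n * math.log(theta) - theta * t_sum
    return total / n_samples
```

With a few thousand samples the Monte Carlo value agrees with the closed form up to sampling error, which shrinks roughly as \(1/\sqrt{N}\).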

MCEM

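\[ E_{\theta^*}\left[\log P(D^+ \mid \theta) \mid D\right] \approx \frac{1}{N} \sum_{i=1}^{N} \log P(D_i^+ \mid \theta) \]

where \(D_1^+, \dots, D_N^+\) are complete trees drawn from \(P(D^+ \mid D, \theta^*)\).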
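Combining a Monte Carlo E-step with the M-step gives an MCEM loop. Below is a sketch on the same hypothetical censored-exponential toy model, where each simulated completion \(D_i^+\) is drawn given the observed data and the current estimate \(\theta^*\). The function name and its defaults (`n_samples`, `n_iter`, `seed`) are illustrative assumptions. In this model, maximizing the Monte Carlo average of \(n \log\theta - \theta T_i\) (with \(T_i\) the total lifetime of the \(i\)-th completion) gives \(\theta^{**} = n / \bar{T}\).

```python
import random


def mcem_censored_exponential(observed, censored, theta0=1.0, n_samples=2000,
                              n_iter=50, seed=0):
    """MCEM estimate of an exponential rate with right-censored lifetimes (toy model)."""
    rng = random.Random(seed)
    n = len(observed) + len(censored)
    theta = theta0
    for _ in range(n_iter):
        # Monte Carlo E-step: simulate N completed data sets given the data and theta*.
        totals = []
        for _ in range(n_samples):
            completed = [c + rng.expovariate(theta) for c in censored]
            totals.append(sum(observed) + sum(completed))
        # M-step: (1/N) * sum_i [n*log(theta) - theta*T_i] is maximized at n / mean(T_i).
        theta = n * n_samples / sum(totals)
    return theta
```

Because each E-step is random, the estimates fluctuate around the maximum-likelihood value instead of converging exactly. That is why this sketch runs a fixed number of iterations rather than using a tolerance-based stopping rule.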

To use the Monte Carlo method, we need to simulate reconstructed trees…
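A common model for simulating phylogenetic trees is the birth-death process, and pruning the extinct lineages from a complete tree yields its reconstructed tree. The sketch below is a simplified, unconditioned forward simulation of a constant-rate birth-death process using the Gillespie algorithm. It only tracks lineage counts and is not the simulation conditioned on an observed tree that the Monte Carlo E-step ultimately needs. `simulate_birth_death` and its parameters are hypothetical, and it assumes `birth_rate + death_rate > 0`.

```python
import random


def simulate_birth_death(birth_rate, death_rate, t_max, rng):
    """Forward Gillespie simulation of a constant-rate birth-death process starting
    from one lineage. Returns (surviving, extinct) lineage counts at time t_max.
    Assumes birth_rate + death_rate > 0."""
    t = 0.0
    alive = 1
    extinct = 0
    while alive > 0:
        # Each living lineage speciates at birth_rate and goes extinct at death_rate.
        total_rate = alive * (birth_rate + death_rate)
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        if rng.random() < birth_rate / (birth_rate + death_rate):
            alive += 1  # speciation: one lineage splits into two
        else:
            alive -= 1  # extinction: this lineage is pruned from the reconstructed tree
            extinct += 1
    return alive, extinct
```

Every lineage counted in `extinct` is present in the complete tree but absent from the reconstructed one, which is exactly the information the missing-data formulation has to account for.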