Why EM?

Essentially, EM is MLE in the presence of latent (or missing) data.

Steps (from Andrew Ng)

\(Z\) denotes the unobserved (latent) variable.
1. E-step: compute \(Q_{ti} = P(Z_i|D_i, \theta_t)\)
2. M-step: maximize the lower bound, \(\theta_{t+1} = \mathop{argmax}\limits_{\theta}\left(\sum\limits_i{ \sum\limits_{Z_i}{ Q_{ti} \cdot \log\frac{P(D_i, Z_i|\theta)}{Q_{ti}}}}\right)\)
3. Go to step 1. (A runnable sketch of these two steps follows the list.)
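
To make the steps concrete, here is a minimal sketch for a toy two-component 1-D Gaussian mixture; all names (e.g. `em_gmm_1d`) and the synthetic data are illustrative assumptions, not from the source. The E-step computes the responsibilities \(Q_{ti}\), and for this model the M-step has a closed form.

```python
import numpy as np

def em_gmm_1d(D, n_iter=50, seed=0):
    """EM for a 2-component 1-D Gaussian mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    w, mu, sigma = 0.5, rng.normal(size=2), np.ones(2)   # initial theta_0
    for _ in range(n_iter):
        # E-step: Q_ti = P(Z_i | D_i, theta_t), the posterior "responsibilities"
        pdf = np.stack([np.exp(-0.5 * ((D - mu[k]) / sigma[k]) ** 2)
                        / (sigma[k] * np.sqrt(2 * np.pi)) for k in range(2)], axis=1)
        joint = pdf * np.array([w, 1.0 - w])             # P(D_i, Z_i=k | theta_t)
        Q = joint / joint.sum(axis=1, keepdims=True)     # shape (n, 2)
        # M-step: argmax over theta of the lower bound; closed form for a GMM
        Nk = Q.sum(axis=0)
        w = Nk[0] / len(D)
        mu = (Q * D[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((Q * (D[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return w, mu, sigma

# Example: recover two well-separated components from synthetic data.
rng = np.random.default_rng(42)
D = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 300)])
print(em_gmm_1d(D))
```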

EM as a Maximization-Maximization Procedure

Let \(Q_i\) be a probability distribution over \(Z_i\). For any such \(Q_i\),

\[\begin{eqnarray*} \log P(D_i|\theta) &=& \log P(D_i,Z_i|\theta) - \log P(Z_i|D_i, \theta) \end{eqnarray*}\]

Take the expectation of both sides with respect to \(Q_i\). The left side does not depend on \(Z_i\), so \(\sum{Q_i \cdot \log P(D_i|\theta)} = \log P(D_i|\theta)\), and

\[\begin{eqnarray*} \log P(D_i|\theta) &=& \sum{Q_i \cdot \log P(D_i,Z_i|\theta)} - \sum{Q_i \cdot \log P(Z_i|D_i, \theta)} \\ &=& \sum{Q_i \cdot \log\frac{P(D_i,Z_i|\theta)}{Q_i}} - \sum{Q_i \cdot \log\frac{P(Z_i|D_i, \theta)}{Q_i}} \\ &=& \sum{Q_i \cdot \log\frac{P(D_i,Z_i|\theta)}{Q_i}} + KL(Q_i\|P(Z_i|D_i, \theta)) \end{eqnarray*}\]

Since \(KL \geq 0\),

\[\log P(D_i|\theta) \geq \sum{Q_i \cdot \log\frac{P(D_i,Z_i|\theta)}{Q_i}}\]

i.e. \(\sum{Q_i \cdot \log\frac{P(D_i,Z_i|\theta)}{Q_i}}\) (the ELBO) is a lower bound on \(\log P(D_i|\theta)\).
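
A quick numerical check of this decomposition on a toy discrete model; the joint values below are invented purely for illustration:

```python
import numpy as np

# Toy model: Z_i in {0, 1}, a single observation D_i, fixed theta.
joint = np.array([0.10, 0.30])           # P(D_i, Z_i=z | theta)
evidence = joint.sum()                   # P(D_i | theta)
posterior = joint / evidence             # P(Z_i | D_i, theta)

Q = np.array([0.6, 0.4])                 # an arbitrary distribution over Z_i

elbo = np.sum(Q * np.log(joint / Q))     # sum_Z Q log(P(D,Z|theta) / Q)
kl = np.sum(Q * np.log(Q / posterior))   # KL(Q || P(Z|D, theta))

print(np.log(evidence), elbo + kl)       # equal: log-evidence = lower bound + KL
print(elbo <= np.log(evidence))          # True: the bound holds
```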

So the E-step fixes \(\theta_t\); \(\log P(D_i|\theta_t)\) is then fixed, and choosing \(Q_{ti} = P(Z_i|D_i, \theta_t)\) minimizes the KL term (driving it to zero), which yields the largest (tightest) lower bound at \(\theta_t\).
The M-step fixes \(Q_{ti}\) and maximizes the lower bound over \(\theta\); at the new \(\theta_{t+1}\), \(Q_{ti}\) is in general no longer the posterior, so the KL term becomes positive and the log-likelihood increases by at least as much as the lower bound does.
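
Continuing the toy snippet above, this checks the E-step claim: setting \(Q_i\) to the posterior makes the KL term zero, so the lower bound is tight at \(\theta_t\) (again purely illustrative):

```python
# E-step choice Q = posterior: KL(Q || posterior) = 0,
# so the bound touches log P(D_i | theta_t) exactly.
Q_estep = posterior
elbo_tight = np.sum(Q_estep * np.log(joint / Q_estep))
print(np.isclose(elbo_tight, np.log(evidence)))   # True: the bound is tight
```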