MPE and MAP Queries

MAP: Maximum a posteriori hypothesis
- Given map variables \(\mathbf{M}\subseteq\mathbf{X}\) and evidence \(\mathbf{e}\), our goal is to find instantiation \(\mathbf{m}\) for which \(P(\mathbf{m|e})\) is maximised
MPE: Maximum a psoteriori explanation
- Most probable probable variable instantiation given evidence \(\mathbf{e}\)
- Special case of MAP where \(\mathbf{M=X}\)
- Maximise the posterior marginal:

\[ \max P(x_1,...,x_n|\mathbf{e}) \]

Computing MPE

MPE probability for the variables \(\mathbf{Q}\) of a netowrk given evidence \(\mathbf{e}\):

\[ MPE_p(\mathbf{e})=\max_q P(\mathbf{q,e}) \]

There may be several instatiations with maximal probability
- Each of them are known as a MPE instnatiation

\[ MPE_p(\mathbf{e})=argmax_q P(\mathbf{q,e}) \]

We are technically maximising the posterior distribution \(P(\mathbf{q|e})=\frac{P(\mathbf{q,e})}{P(\mathbf{e})}\), however \(P(\mathbf{e})\) is independent of instantiation \(\mathbf{q}\), therefore we only need to calculate the joint marginal.

We can utilise the variable elimination algorithm, however we maximise rather than summing out.
- Merge rows that agree on variables that are not of interest, and drop the instantiations that are not max
- Example:

Notes on maximisation
- Commutative, therefore can refer to maximisation without specifying order

\[ \max_{X_1}\max_{X_2}\phi = \max_{X_2}\max_{X_1}\phi \]

Can take factors that do not include the maximised variable out of the operation

\[ \max_X\phi_1\phi_2 = \phi_1\max_X\phi_2\]

Algorithm
- All factors are eliminated resulting in the trivial factor
- Same complexity as VE, therefore time and space complexity \(O(n\exp(w))\) for \(n\) variables and elimination width \(w\)

Note that we need to normalise with probability of evidence as we are computing \(\max P(\mathbf{q,e})\), however we want \(P(\mathbf{q|e})\):

\[ P(\mathbf{q|e}) = \frac{P(\mathbf{q,e})}{P(\mathbf{e})}\]

The algorithm results in a trivial factor with the MPE probability, but stores no information about the instantiation that results in that probability.
Can use extended factors that records the MPE instantiation as it is being constructed

Considering the elimination order \(\pi = O_1,S_1,O_2,S_2,...,O_n,S_n\):
- Applying the MPE algorithm with this elimination order would obtain the Viterbi algorithm
- Applying the VE algorithm with this elimination order would obtain the forward algorithm
The elimination order has width 1 for HMMs
- Therefore both algorithms have linear time and space complexity

The MAP probability for the variables \(\mathbf{M}\) given evidence \(\mathbf{e}\) is defined as:

\[ MAP_p(\mathbf{M,e}) = \max_mP(\mathbf{m,e})\]

There may be several instantiations \(\mathbf{m}\) with maximal probability
- Each of them is a MAP instantiation

\[ MAP_p(\mathbf{M,e}) = argmax_mP(\mathbf{m,e}) \]

We are technically maximising the posterior distribution \(P(\mathbf{m|e})=\frac{P(\mathbf{m,e})}{P(\mathbf{e})}\), however \(P(\mathbf{e})\) is independent of instantiation \(\mathbf{m}\), therefore we only need to calculate the joint marginal.

We first need to sum out the non-MAP variables, computing the marginal \(P(\mathbf{M,e})\) in factored form.
Then we maximise out MAP variables \(\mathbf{M}\), solving MPE problem over the resulting marginal.
- Can use extended factors just as when computing MPE instantiation
Algorithm
- If the network is Bayesian, can prune nodes and edges that are unneeded
- MAP variables must appear last in the elimination order
- Time and space complexity \(O(n\exp(w))\)

Elimination order is constrained to have MAP variables last
- Therefore, may not be able to get a good order as low-width orders may not satisfy the constraint
\(\mathbf{M}\)-constrained tree-width of a graph is the width of its best \(\mathbf{M}\)-constrained variable order
- Complexity of MAP VE is at best exponential in this constrained treewidth