MPE and MAP Queries

Jake

02/11/2022

MPE and MAP Queries

  • MAP: Maximum a posteriori hypothesis
    • Given map variables \(\mathbf{M}\subseteq\mathbf{X}\) and evidence \(\mathbf{e}\), our goal is to find instantiation \(\mathbf{m}\) for which \(P(\mathbf{m|e})\) is maximised
  • MPE: Maximum a psoteriori explanation
    • Most probable probable variable instantiation given evidence \(\mathbf{e}\)
    • Special case of MAP where \(\mathbf{M=X}\)
    • Maximise the posterior marginal:

\[ \max P(x_1,...,x_n|\mathbf{e}) \]

Computing MPE

  • MPE probability for the variables \(\mathbf{Q}\) of a netowrk given evidence \(\mathbf{e}\):

\[ MPE_p(\mathbf{e})=\max_q P(\mathbf{q,e}) \]

  • There may be several instatiations with maximal probability
    • Each of them are known as a MPE instnatiation

\[ MPE_p(\mathbf{e})=argmax_q P(\mathbf{q,e}) \]

  • We are technically maximising the posterior distribution \(P(\mathbf{q|e})=\frac{P(\mathbf{q,e})}{P(\mathbf{e})}\), however \(P(\mathbf{e})\) is independent of instantiation \(\mathbf{q}\), therefore we only need to calculate the joint marginal.

Computation through Variable Elimination

  • We can utilise the variable elimination algorithm, however we maximise rather than summing out.
    • Merge rows that agree on variables that are not of interest, and drop the instantiations that are not max

    • Example:

  • Notes on maximisation
    • Commutative, therefore can refer to maximisation without specifying order

\[ \max_{X_1}\max_{X_2}\phi = \max_{X_2}\max_{X_1}\phi \]

  • Can take factors that do not include the maximised variable out of the operation

\[ \max_X\phi_1\phi_2 = \phi_1\max_X\phi_2\]

  • Algorithm
    • All factors are eliminated resulting in the trivial factor
    • Same complexity as VE, therefore time and space complexity \(O(n\exp(w))\) for \(n\) variables and elimination width \(w\)

  • Note that we need to normalise with probability of evidence as we are computing \(\max P(\mathbf{q,e})\), however we want \(P(\mathbf{q|e})\):

\[ P(\mathbf{q|e}) = \frac{P(\mathbf{q,e})}{P(\mathbf{e})}\]

Recovering MPE Instantiation

  • The algorithm results in a trivial factor with the MPE probability, but stores no information about the instantiation that results in that probability.
  • Can use extended factors that records the MPE instantiation as it is being constructed

MPE and HMM

  • Considering the elimination order \(\pi = O_1,S_1,O_2,S_2,...,O_n,S_n\):
    • Applying the MPE algorithm with this elimination order would obtain the Viterbi algorithm
    • Applying the VE algorithm with this elimination order would obtain the forward algorithm
  • The elimination order has width 1 for HMMs
    • Therefore both algorithms have linear time and space complexity

Computing MAP

  • The MAP probability for the variables \(\mathbf{M}\) given evidence \(\mathbf{e}\) is defined as:

\[ MAP_p(\mathbf{M,e}) = \max_mP(\mathbf{m,e})\]

  • There may be several instantiations \(\mathbf{m}\) with maximal probability
    • Each of them is a MAP instantiation

\[ MAP_p(\mathbf{M,e}) = argmax_mP(\mathbf{m,e}) \]

  • We are technically maximising the posterior distribution \(P(\mathbf{m|e})=\frac{P(\mathbf{m,e})}{P(\mathbf{e})}\), however \(P(\mathbf{e})\) is independent of instantiation \(\mathbf{m}\), therefore we only need to calculate the joint marginal.

Computing MAP through Variable Elimination

  • We first need to sum out the non-MAP variables, computing the marginal \(P(\mathbf{M,e})\) in factored form.
  • Then we maximise out MAP variables \(\mathbf{M}\), solving MPE problem over the resulting marginal.
    • Can use extended factors just as when computing MPE instantiation
  • Algorithm
    • If the network is Bayesian, can prune nodes and edges that are unneeded
    • MAP variables must appear last in the elimination order
    • Time and space complexity \(O(n\exp(w))\)

MAP complexity

  • Elimination order is constrained to have MAP variables last
    • Therefore, may not be able to get a good order as low-width orders may not satisfy the constraint
  • \(\mathbf{M}\)-constrained tree-width of a graph is the width of its best \(\mathbf{M}\)-constrained variable order
    • Complexity of MAP VE is at best exponential in this constrained treewidth