Jointree

Jake

06/12/2022

Jointree

  • The jointree algorithm is a variation of variable elimination (VE) that works by eliminating factors rather than variables

  • Jointree algorithm is particularly useful when we want to compute posterior marginals for each of \(n\) variables

    • Jointree has a complexity of \(O(n\exp(w))\) while running VE \(n\) times would lead to \(O(n^2\exp(w))\)
      • Saves recomputing shared factors across queries
    • For Bayesian nets, it will compute the posterior marginals for all network families
    • For Markov nets, it will provide posterior marginals for all cliques

Factor Elimination

  • Consider the computation of the marginal \(P(Q)\) over some variable \(Q\)
    • Variable elimination eliminates every other variable
    • Factor elimination will eliminate all factors except for the one that contains \(Q\)
  • Elimination of a factor \(f_i\) from a set of factors is a two-step process:
    • Eliminate all variables \(V\) that appear only in \(f_i\)
    • Multiply the result \(\sum_V f_i\) by some other factor \(f_j\) in the set

  • Factor projection operation:

\[ project(f,Q) = \sum_{vars(f)-Q}f\]

  • Factor elimination is a variation of VE that eliminates a set of variables \(V\) at once rather than a single variable.
    • Added step of multiplying the new factor by another factor \(f_j\)
    • At the end, sum out of the last factor any remaining variables that are not in \(Q\), as shown in the sketch below
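
Below is a minimal Python sketch of the two elimination steps (project, then multiply into another factor); the `Factor` class, variable names, and numbers are illustrative assumptions, not part of the notes.

```python
from itertools import product

class Factor:
    """A factor over named binary variables, stored as a dict from assignments to values."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)   # e.g. ("A", "B")
        self.table = dict(table)            # e.g. {(0, 0): 0.1, (0, 1): 0.9, ...}

    def multiply(self, other):
        """Pointwise product of two factors, defined over the union of their variables."""
        joint_vars = self.variables + tuple(v for v in other.variables if v not in self.variables)
        table = {}
        for assignment in product([0, 1], repeat=len(joint_vars)):
            val = dict(zip(joint_vars, assignment))
            table[assignment] = (self.table[tuple(val[v] for v in self.variables)]
                                 * other.table[tuple(val[v] for v in other.variables)])
        return Factor(joint_vars, table)

    def project(self, keep):
        """project(f, Q): sum out every variable of f that is not in Q."""
        keep_vars = tuple(v for v in self.variables if v in keep)
        table = {}
        for assignment, value in self.table.items():
            key = tuple(a for v, a in zip(self.variables, assignment) if v in keep)
            table[key] = table.get(key, 0.0) + value
        return Factor(keep_vars, table)

# Eliminating f_a in favour of f_ab: sum out variables private to f_a (none here),
# then multiply the result into f_ab; finally project the last factor onto Q = {B}.
f_a  = Factor(["A"],      {(0,): 0.6, (1,): 0.4})
f_ab = Factor(["A", "B"], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
f_ab = f_ab.multiply(f_a.project(["A"]))
print(f_ab.project(["B"]).table)   # P(B), roughly {(0,): 0.62, (1,): 0.38}
```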

Elimination Trees

  • In factor elimination, elimination trees are used to specify an elimination strategy

    • Analogous to the elimination order for variable elimination
  • The width of the tree can be used to quantify the amount of work performed.

  • An elimination tree \(T\) for a set of factors \(S\) has:

    • Each factor in \(S\) assigned to exactly one node in \(T\)
    • \(\phi_i\) denotes the product of factors assigned to node \(i\) in \(T\)

  • When computing the marginal, we need to choose a root node \(r\):
    • The root must contain the variables of interest, i.e. \(Q\subseteq vars(r)\)
    • Not strictly needed but simplifies calculations
  • Given an elimination tree and corresponding root \(r\), the strategy is:
    • Eliminate factor \(\phi_i\) only if it has a single neighbor \(j\) and \(i\neq r\) (not root node)
    • Sum out variables \(V\) that only appear in \(\phi_i\)
    • Multiply result \(\sum_V\phi_i\) into the factor \(\phi_j\) associated with the single neighbor \(j\)
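
A small symbolic sketch of this strategy, tracking only the variable scopes of the \(\phi_i\) (the node labels and factor scopes are made up for illustration):

```python
# Elimination tree over three factors, given as node -> set of variables in phi_i.
phi = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "D"}}
edges = {1: {2}, 2: {1, 3}, 3: {2}}     # a chain 1 - 2 - 3
root = 3                                 # choose a root whose factor contains the query, e.g. Q = {D}

while len(phi) > 1:
    # Pick any non-root node i that has a single remaining neighbor j.
    i = next(n for n in phi if n != root and len(edges[n]) == 1)
    j = next(iter(edges[i]))
    # Variables appearing only in phi_i are summed out; the rest is multiplied into phi_j.
    others = set().union(*(phi[k] for k in phi if k != i))
    print(f"eliminate node {i}: sum out {phi[i] - others}, pass {phi[i] & others} to node {j}")
    phi[j] |= phi[i] & others
    del phi[i]
    edges[j].discard(i)
    del edges[i]

print("remaining factor at root:", phi[root])   # project this onto Q to finish
```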

Separators

  • The separator of an edge \(i-j\) in an elimination tree is the set of variables:
    • \(vars(i,j)\) denotes the variables appearing in factors on the \(i\) side of edge \(i-j\) (and \(vars(j,i)\) those on the \(j\) side)

\[S_{ij} = vars(i,j)\cap vars(j,i) \]

  • When variables \(V\) are summed out of factor \(\phi_i\) before node \(i\) is eliminated, the resulting factor is guaranteed to be over the separator \(S_{ij}\)
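
As a small worked example (with the same illustrative factor scopes as the chain sketch above, \(f_1(A,B) - f_2(B,C) - f_3(C,D)\)):

\[ vars(1,2) = \{A,B\},\quad vars(2,1) = \{B,C,D\},\quad S_{12} = vars(1,2)\cap vars(2,1) = \{B\} \]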

Clusters

  • The cluster of a node \(i\) in an elimination tree is a set of variables:

\[ C_i= vars(i)\cup\bigcup_j S_{ij}\]

  • The width of an elimination tree is the size of its largest cluster - 1
  • When we are about to eliminate node \(i\), the variables of factor \(\phi_i\) are exactly the cluster of node \(i\), \(C_i\)
  • The factor \(\phi_r\) must be over the cluster of root \(r\), \(C_r\)
    • Therefore, the following algorithm can be used to compute the marginal over any subset of cluster \(C_r\)
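
Continuing the illustrative chain \(f_1(A,B) - f_2(B,C) - f_3(C,D)\) from above, the clusters are

\[ C_1 = \{A,B\},\quad C_2 = vars(2)\cup S_{12}\cup S_{23} = \{B,C\},\quad C_3 = \{C,D\},\]

so this elimination tree has width \(2-1=1\).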

Message Passing

  • Message passing allows us to save intermediate computations and reuse them across different queries
    • This reuse is the key to the reduced complexity, and the reason for the algorithm
  • Given an elimination tree \(T\) with root \(r\):
    • For each node \(i\) that is not the root, there is a unique neighbor that is closest to root \(r\)
    • A node \(i\) will be eliminated from the tree only after all its neighbors, except the one closest to the root, have been eliminated
    • When a node \(i\) is about to be eliminated, it will have a single neighbor \(j\). All variables of its current factor except the separator \(S_{ij}\) are summed out, and the result is multiplied into the factor \(\phi_j\)
  • Consider the elimination of node \(i\) with a single neighbor \(j\) as a process of passing a message \(M_{ij}\) from node \(i\) to \(j\)
    • When \(j\) receives the message, it multiplies it into its current factor \(\phi_j\)
    • Node \(i\) can’t send a message to \(j\) until it has received messages from all other neighbors \(k\neq j\)
    • After \(i\) receives all these messages, its current factor will be:

\[ \phi_i\prod_{k\neq j}M_{ki}\]

  • The message \(i\) sends to \(j\) will be:
    • It eliminates all variables except for the separator variables \(S_{ij}\)

\[ \sum_{C_i\backslash S_{ij}}\phi_i\prod_{k\neq j}M_{ki}\]

  • Formulating factor elimination as message passing now:
    • To compute marginal over some variables \(Q\):
    • Select a root \(r\) in the elimination tree such that \(Q\subseteq C_r\)
    • Push messages towards root \(r\)
    • When all messages into the root are available, we multiply them by \(\phi_r\) and eliminate variables not in \(Q\)
    • If our elimination tree has \(m\) nodes and \(m-1\) edges, then \(m-1\) messages need to be sent
  • This allows us to easily compute the marginal over any other cluster \(C_i, i\neq r\)
    • Choose \(i\) as new root node and repeat message passing process
    • Some additional messages may need to be passed/computed, but no more than \(m-1\), assuming we saved the messages sent to \(r\)
      • Choosing every node as a root, the total number of messages is \(2(m-1)\)
  • Normally all \(2(m-1)\) messages are computed in 2 phases
    • Inward towards a root \(r\)
    • Outwards away from root \(r\)
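
The two-phase schedule can be sketched as follows, on the same illustrative chain as above; the sketch only records which messages are sent and over which separators, whereas a full implementation would compute \(\sum_{C_i\backslash S_{ij}}\phi_i\prod_{k\neq j}M_{ki}\) for each message.

```python
# Two-phase message passing on the illustrative chain 1 - 2 - 3 (bookkeeping only, no tables).
neighbours = {1: [2], 2: [1, 3], 3: [2]}
separator = {(1, 2): {"B"}, (2, 1): {"B"}, (2, 3): {"C"}, (3, 2): {"C"}}
messages = {}   # (i, j) -> variables of the message M_ij

def send(i, j):
    # A full implementation would compute sum_{C_i \ S_ij} phi_i * prod_{k != j} M_ki here.
    messages[(i, j)] = separator[(i, j)]

def pull(i, parent):
    """Inward phase: each node sends to its parent after hearing from all its children."""
    for k in neighbours[i]:
        if k != parent:
            pull(k, i)
    if parent is not None:
        send(i, parent)

def push(i, parent):
    """Outward phase: each node sends to its children once it has heard from its parent."""
    for k in neighbours[i]:
        if k != parent:
            send(i, k)
            push(k, i)

root = 2
pull(root, None)
push(root, None)
print(len(messages), sorted(messages))   # 4 messages, i.e. 2(m-1) for m = 3 nodes
```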

Complexity

  • Message \(i\) to \(j\) is computed by multiplying a few factors
  • Complexity of both multiplication and summation is \(O(\exp(z))\), where \(z\) is the size of cluster \(C_i\)
  • Therefore, the cost of any message is \(O(\exp(w))\) where \(w\) is the width of the elimination tree
  • Considering \(2(m-1)\) messages, the total cost is \(O(m\exp(w))\) for \(m\) nodes

Evidence

  • Given some evidence \(e\), we want to use factor elimination to compute joint marginal \(P(C_i,e)\) for each cluster \(C_i\). We can:
    • Reduce each factor given evidence \(e\) to a set of reduced factors \(f^e\)
      • More efficient if plan to compute marginals with respect to one piece of evidence \(e\)
    • Introduce an evidence indicator \(\lambda_E\) for every variable \(E\) in evidence \(e\).
      • More efficient if plan to compute marginals with respect to multiple pieces of evidence \(e_1,...,e_n\)
      • Assign the indicator to a node \(i\) in the elimination tree while ensuring \(E\in C_i\)
    • Adjust outcome space
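
A minimal sketch of the evidence-indicator option, assuming binary variables (the function name and values are illustrative):

```python
def evidence_indicator(variable, observed_value, values=(0, 1)):
    """Build the indicator factor lambda_E over a single variable E:
    1 for the observed value, 0 for every other value."""
    table = {(v,): (1.0 if v == observed_value else 0.0) for v in values}
    return ((variable,), table)

# Evidence e: E = 1.  Multiply this factor into phi_i for some node i with E in C_i;
# message passing then yields joint marginals P(C_j, e) at every cluster.
print(evidence_indicator("E", 1))   # (('E',), {(0,): 0.0, (1,): 1.0})
```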

FE with Evidence and Messages

  • Complexity of \(O(m\exp(w))\)

  • When evidence at node \(i\) changes, need to invalidate all messages that depend on the factor at that node.
    • Messages directed away from node \(i\) in the elimination tree
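
A small sketch of this invalidation rule (the tree layout and node labels are illustrative): the messages to recompute are exactly those directed away from the changed node.

```python
def invalidated(tree, changed):
    """Directed messages (i, j) that must be recomputed when the factor at `changed` changes:
    exactly those pointing away from `changed` in the elimination tree."""
    stale = []
    def walk(i, parent):
        for k in tree[i]:
            if k != parent:
                stale.append((i, k))   # message i -> k points away from `changed`
                walk(k, i)
    walk(changed, None)
    return stale

tree = {1: [2], 2: [1, 3], 3: [2]}
print(invalidated(tree, 1))   # [(1, 2), (2, 3)]
```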

Jointree Algorithm

  • A jointree is a method for constructing an elimination tree/strategy for factor elimination over a network \(G\)
  • A jointree for \(G\) must satisfy the following properties:
    • The cluster \(C_i\) is a set of nodes (variables) from \(G\)
    • The variables of each factor of \(G\) must be contained in some cluster \(C_i\)
    • If a variable appears in two clusters \(C_i\) and \(C_j\), it must appear in every cluster \(C_k\) on the path connecting nodes \(i\) and \(j\) in the jointree.
  • Separator of edge \(i-j\) is denoted by \(S_{ij}\) and is defined as:

\[ S_{ij} = C_i\cap C_j\]

  • Width of jointree is defined as max cluster size \(-1\)
  • Jointree is minimal if it ceases to be a jointree for \(G\) once we remove a variable from one of its clusters
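
A minimal sketch that checks the third (running-intersection) property on a candidate jointree; the tree, clusters, and helper names are illustrative assumptions.

```python
from itertools import combinations

def path(tree, i, j, parent=None):
    """Return the unique path from node i to node j in a tree given as adjacency lists."""
    if i == j:
        return [i]
    for k in tree[i]:
        if k != parent:
            rest = path(tree, k, j, i)
            if rest:
                return [i] + rest
    return []

def running_intersection(tree, clusters):
    """A variable shared by two clusters must appear in every cluster on the path between them."""
    for i, j in combinations(clusters, 2):
        shared = clusters[i] & clusters[j]
        if not all(shared <= clusters[k] for k in path(tree, i, j)):
            return False
    return True

# The illustrative chain again: clusters {A,B} - {B,C} - {C,D}
tree = {1: [2], 2: [1, 3], 3: [2]}
clusters = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "D"}}
print(running_intersection(tree, clusters))   # True
```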

Answering Queries

  • Marginal probability queries on variables that appear together in a clique
    • Sum out irrelevant variables from any clique containing those variables
  • For posterior marginal queries with evidence \(E=e\)
    • If \(Q\) and \(E\) appear in same cluster
      • Eliminate entries that do not agree with evidence \(e\)
      • Sum out irrelevant variables and renormalize
    • If \(Q\) does not appear in a cluster with \(E\)
      • Set evidence indicators in one or more clusters containing \(E\)
      • Propagate messages along the path to a cluster containing \(Q\)
      • Sum out irrelevant variables
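
As a small sketch of the first case (query and evidence in the same cluster), assuming a cluster table \(P(C_i,e)\) over \(C_i=\{B,C\}\) with made-up numbers: sum out the irrelevant variable and renormalize.

```python
# Joint cluster table P(B, C, e) after evidence has been incorporated (illustrative numbers).
cluster_table = {("b0", "c0"): 0.06, ("b0", "c1"): 0.09,
                 ("b1", "c0"): 0.21, ("b1", "c1"): 0.14}

marginal = {}
for (b, c), p in cluster_table.items():      # sum out the irrelevant variable C
    marginal[b] = marginal.get(b, 0.0) + p

total = sum(marginal.values())                # P(e), the normalising constant
posterior = {b: p / total for b, p in marginal.items()}
print(posterior)                              # roughly {'b0': 0.3, 'b1': 0.7}
```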

Common Jointree architectures

  • Common methods for propagating messages in a jointree
    • Shenoy-Shafer
    • Hugin
  • Shenoy-Shafer generally requires less space but takes more time