Jointree
Jointree is a variation of variable elimination that utilises factor elimination rather than variable elimination
Jointree algorithm is particularly useful when we want to compute posterior marginals for each of \(n\) variables
- Jointree has a complexity of \(O(n\exp(w))\) while running VE \(n\) times would lead to \(O(n^2\exp(w))\)
- Saves computations of factors
- For bayesian nets, it will copmpute the posterior marginals for all network families
- For Markov nets, it will provide posterior marginals for all cliques
- Jointree has a complexity of \(O(n\exp(w))\) while running VE \(n\) times would lead to \(O(n^2\exp(w))\)
Factor Elimination
- Consider computation of a marginal over some variable \(Q, P(Q)\)
- Variable elimination eliminates every other variable
- Factor elimination will eliminate all factors except for the one that contains \(Q\)
- Elimination of a factor \(f_i\) from a set of factors is a 2 step process
- Eliminate all variables \(V\) that appear only in \(f\)
- Multiply the result \(\sum_v f_i\) by some other factor \(f_j\) in the set
- Factor projection operation:
\[ project(f,Q) = \sum_{vars(f)-Q}f\]
- Factor elimination is a variation of VE that eliminates a set of variables \(V\) at once rather than a single variable.
- Added step of multiplying the new factor by another factor \(f_j\)
- At the end, sum out any remaining variables that are not in \(Q\) in the last factor
Elimination Trees
In factor elimination, elimination trees are used to specify an elimination strategy
- Analogous to the elimination order for variable elimination
The width of the tree can be used to quantify the amount of work performed.
An elimination tree \(T\) for a set of factors \(S\) has:
- Each factor in \(S\) assigned to exactly one node in \(T\)
- \(\phi_i\) denotes the product of factors assigned to node \(i\) in \(T\)
- When computing the marginal, need to choose a root node \(r\):
- Root must have the variables of interest, aka \(Q\subseteq vars(r)\)
- Not strictly needed but simplifies calculations
- Given an elimination tree and corresponding root \(r\), the strategy is:
- Eliminate factor \(\phi_i\) only if it has a single neighbor \(j\) and \(i\neq r\) (not root node)
- Sum out variables \(V\) that only appear in \(\phi_i\)
- Multiply result \(\sum_V\phi_i\) into the factor \(\phi_j\) associated with the single neighbor \(j\)
Seperators
- The seperator of an edge \(i-j\) in an elim tree is the set of variables:
- \(vars(i,j)\) are variables on the \(i\) side of the edge
\[S_{ij} = vars(i,j)\cap vars(j,i) \]
- When variables \(V\) are summed out of factor \(f_i\) before it is eliminated, the resulting factor is guaranteed to be over separator \(S_{ij}\)
Clusters
- The cluster of a node \(i\) in an elimination tree is a set of variables:
\[ C_i= vars(i)\cup\bigcup_j S_{ij}\]
- The width of an elimination tree is the size of its largest cluster - 1
- When we are about to eliminate node \(i\), the variables of factor \(\phi_i\) are exactly the cluster of node \(i\), \(C_i\)
- The factor \(\phi_r\) must be over the cluster of root \(r\), \(C_r\)
- Therefore, the following algorithm can be used to compute the marginal over any subset of cluster \(C_r\)
Message Passing
- Message passing allows us to save intermediate computations and reuse them across different queries
- The key to decreasing complexity, the reason for the algorithm
- Given an elimination tree \(T\) with root \(r\):
- For each node \(i\) that is not the root, there is a unique neighbor that is closest to root \(r\)
- A node \(i\) will be eliminated from the tree only after all its neighbors, except the one closest to the root have been eliminated
- When a node \(i\) is about to be eliminated, it will have a single neighbor \(j\). Its current factor will have all varaibles by the seperator \(S_{ij}\) eliminated and will be multiplied by factor \(j\)
- Consider the elimination of node \(i\) with a single neighbor \(j\) as a process of passing a message \(M_{ij}\) from node \(i\) to \(j\)
- When \(j\) recieves the message, it multiplies it into its current factor \(\phi_j\)
- Node \(i\) can’t sent a message to \(j\) until it has recieved messages from all other neighbors \(k\neq j\)
- After \(i\) recieves all these messages, its current factor will be:
\[ \phi_i\prod_{k\neq j}M_{ki}\]
- The message \(i\) sends to \(j\) will be
- Eliminates all variables except for the separator variables \(S_{ij}\)
\[ \sum_{C_i\backslash S_{ij}}\phi_i\prod_{k\neq j}M_{ki}\]
- Formulating factor elimination as message passing now:
- To compute marginal over some variables \(Q\):
- Select a root \(r\) in the elimination tree such that \(Q\subseteq C_r\)
- Push messages towards root \(r\)
- When all messages into the root are avalaible, we multiply them by \(\phi_r\) and eliminate variables not in \(Q\)
- If our elimination tree has \(m\) nodes and \(m-1\) edges, then \(m-1\) messages need to be sent
- This allows us to easily compute the marginal over any other cluster \(C_i, i\neq r\)
- Choose \(i\) as new root node and repeat message passing process
- Some additional messages may need to be passed/computed, but not as many as \(m-\) assuming we saved messages sent to \(r\)
- Choosing every node as a root, the total number of messages is \(2(m-1)\)
- Normally compute all \(2(m-1)\) in 2 phases
- Inward towards a root \(r\)
- Outwards away from root \(r\)
Complexity
- Message \(i\) to \(j\) is computed by multiplying a few factors
- Complexity of both multiplication and summation is \(O(\exp(z))\), where \(z\) is size of cluster \(C_i\)
- Therefore, the cost of any message is \(O(\exp(w))\) where \(w\) is the width of the elimination
- Considering \(2(m-1)\) messages, the total cost is \(O(m\exp(w))\) for \(m\) nodes
Evidence
- Given some evidence \(e\), we want to use factor elimination to compute joint marginal \(P(C_i,e)\) for each cluster \(C_i\). We can:
- Reduce each factor given evidence \(e\) to a set of reduced factors \(f^e\)
- More efficient if plan to compute marginals with respect to one piece of evidence \(e\)
- Introduce an evidence indicator \(\lambda_E\) for every variable \(E\) in evidence \(e\).
- More efficient if plan to compute marginals with respect to multiple pieces of evidence \(e_1,...,e_n\)
- Assign indicator to node \(i\) in elimnination tree while ensuring \(E\in C_i\)
- Adjust outcome space
- Reduce each factor given evidence \(e\) to a set of reduced factors \(f^e\)
FE with Evidence and Messages
- Complexity of \(O(m\exp(w))\)
- When evidence at node \(i\) changes, need to invalidate all messages that depend on the factor at that node.
- Messages directed away from node \(i\) in the elimination tree
Jointree Algorithm
- Jointree is a method for constructing an elimination tree/strategy for factor elimination
- Jointree must satisfy the properties
- The cluster \(C_i\) is a set of nodes from \(G\)
- Each factor in \(G\) must appear in some cluster \(C_i\)
- If a varaible appears in two clusters \(C_i\) and \(C_j\), it must appear in every cluster \(C_k\) on the path connecting nodes \(i\) and \(j\) in the jointree.
- Separator of edge \(i-j\) is denoted by \(S_{ij}\) and is defined as:
\[ S_{ij} = C_i\cap C_j\]
- Width of jointree is defined as max cluster size \(-1\)
- Jointree is minimal if it ceases to be a jointree for \(G\) once we remove a variable from one of its clusters
Jointree algorithm
Answering Queries
- Marginal probability queries on variables that appear together in a clique
- Sum out irrelevant variables from any clique containing those variables
- For posterior marginal queries with evidence \(E=e\)
- If \(Q\) and \(E\) appear in same cluster
- Eliminate entries that do not agree with evidence \(e\)
- SUm out irrelevant variables are renormalize
- If \(Q\) does not appear in a cluster with \(E\)
- Set evidence indicators in one or more clusters containing \(E\)
- Propogate messages along path to cluster containing \(Q\)
- Sum out irrelevant variables
- If \(Q\) and \(E\) appear in same cluster
Common Jointree architechtures
- Common methods for propagating messages in a jointree
- Shenoy-Shafer
- Hugin
- Shenoy-Shafer generally requires less space but takes more time