- Bayesian Networks can be queried for probabilities
Probability of Evidence
- A simple query is the probability of some variable instantiation
- Consider evidence \(\mathbf{E}\) and network variables \(\mathbf{W}\)
\[ P(\mathbf{e}),\quad\mathbf{E}\subseteq\mathbf{W}\]
- For example:
\[ P(X=True, D=True)\]
Prior and Posterior Marginals
- A prior marginal is a marginal expression with no evidence
\[ P(x_1,...,x_m) = \sum_{x_{m+1},...,x_{n}}P(x_1,...,x_n)\]
- A posterior marginal is a marginal expression with evidence
\[ P(x_1,...,x_m|\mathbf{e}) = \sum_{x_{m+1},...,x_{n}}P(x_1,...,x_n|\mathbf{e})\]
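- A minimal brute-force sketch of both queries, assuming a made-up joint distribution over three binary variables A, B and C rather than a real network:

```python
from itertools import product

# Toy joint distribution over three binary variables (A, B, C); the weights
# below are invented purely so the marginals are non-trivial.
weights = [4, 2, 2, 1, 3, 1, 2, 1]
assignments = list(product([True, False], repeat=3))
total = sum(weights)
joint = {a: w / total for a, w in zip(assignments, weights)}

def prior_marginal(query_idx):
    """Prior marginal: sum the joint over every variable not in the query."""
    marginal = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in query_idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return marginal

def posterior_marginal(query_idx, evidence):
    """Posterior marginal: same sum, restricted to rows consistent with the
    evidence (index -> value), renormalised by the probability of evidence."""
    p_e = sum(p for a, p in joint.items()
              if all(a[i] == v for i, v in evidence.items()))
    marginal = {}
    for assignment, p in joint.items():
        if all(assignment[i] == v for i, v in evidence.items()):
            key = tuple(assignment[i] for i in query_idx)
            marginal[key] = marginal.get(key, 0.0) + p / p_e
    return marginal

print(prior_marginal([0]))                 # P(A)
print(posterior_marginal([0], {2: True}))  # P(A | C = True)
```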
Auxiliary Nodes
- Auxiliary nodes can be used to calculate simple evidence queries, as well as more complicated evidence queries. For example, an OR query:
\[ P(X=True\lor D=True) \]
- This is only practical when there is a small number of evidence variables, as the auxiliary node's CPT grows exponentially with them.
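- A minimal sketch of the auxiliary-node idea, reusing the variable names X and D from the example above; the auxiliary node's CPT is simply a deterministic OR of its parents:

```python
from itertools import product

# CPT of a hypothetical auxiliary node Aux with parents X and D:
# P(Aux=True | x, d) = 1 exactly when x or d is True, and 0 otherwise.
aux_cpt = {}
for x, d in product([True, False], repeat=2):
    aux_cpt[(x, d)] = {True: 1.0 if (x or d) else 0.0,
                       False: 0.0 if (x or d) else 1.0}

# P(X=True or D=True) then reduces to the probability-of-evidence query
# P(Aux=True) in the extended network, using any standard inference routine.
print(aux_cpt)
```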
Most Probable Explanation
- A popular query is the MPE, the most probable variable instantiation given evidence \(\mathbf{e}\)
- Maximise the posterior over all network variables:
\[ \max_{x_1,...,x_n} P(x_1,...,x_n|\mathbf{e}) \]
- Note that the MPE cannot be obtained by considering the individual marginal distributions for each variable.
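- A tiny invented example of why this fails, using a joint over two binary variables rather than a full network:

```python
# Invented joint over two binary variables (A, B). Individually, A=True and
# B=True are each the most probable values, but together they are not the MPE.
joint = {
    (True, True): 0.30,
    (True, False): 0.35,
    (False, True): 0.25,
    (False, False): 0.10,
}

mpe = max(joint, key=joint.get)                        # (True, False), p = 0.35

p_a_true = joint[(True, True)] + joint[(True, False)]  # 0.65
p_b_true = joint[(True, True)] + joint[(False, True)]  # 0.55
marginal_guess = (p_a_true > 0.5, p_b_true > 0.5)      # (True, True), p = 0.30

print(mpe, marginal_guess)
```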
Maximum a Posteriori Hypothesis (MAP)
- Given MAP variables \(\mathbf{M}\subseteq\mathbf{X}\) and evidence \(\mathbf{e}\), our goal is to find the instantiation \(\mathbf{m}\) for which \(P(\mathbf{m|e})\) is maximal
- MPE is a special case of the MAP query where \(\mathbf{M=X}\)
- MPE is however easier to compute, therefore we typically take this as an approximation
- Calculate the MPE and take the results for the variables \(\mathbf{M}\) only.
Sensitivity Analysis
- We can conduct sensitivity analysis on network parameters
- Find which (non-evidence) parameters need to change, and by how much, to bring a posterior marginal probability within a certain range.
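- A brute-force sketch on a hypothetical two-node network \(A\rightarrow B\), sweeping a single parameter to see which values keep a posterior above a target (all numbers are invented):

```python
# Hypothetical two-node network A -> B. Sweep theta = P(B=True | A=True) and
# report which values keep P(A=True | B=True) at or above an invented target.
p_a = 0.3              # P(A=True)
p_b_given_not_a = 0.2  # P(B=True | A=False)
target = 0.6

def posterior_a_given_b(theta):
    # Bayes' rule for P(A=True | B=True)
    num = theta * p_a
    return num / (num + p_b_given_not_a * (1 - p_a))

acceptable = [i / 100 for i in range(1, 101)
              if posterior_a_given_b(i / 100) >= target]
print(min(acceptable))  # smallest parameter value that meets the constraint
```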
Intermediate Variables
- Intermediate variables are neither query nor evidence variables, but they can help with modelling.
- For example, Progesterone Level (L) in the following graph:
- Intermediate variables can be bypassed without affecting model accuracy if:
\[ P(\mathbf{q,e}) = \underbrace{P'(\mathbf{q,e})}_{\text{network without the intermediate variable}}\]
- This is only satisfied if the intermediate variable \(X\) has a single child \(Y\)
- However, even if you can bypass an intermediate variable, deleting it tends to create larger CPTs.
- When bypassing a variable \(X\) with a single child \(Y\), the new CPT is calculated as follows (with \(\mathbf{U}\) the parents of \(X\), and \(\mathbf{V}\) the parents of \(Y\) other than \(X\)):
\[ \theta'_{y|\mathbf{uv}}=\sum_x\theta_{y|x\mathbf{v}}\theta_{x|\mathbf{u}}\]
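- A sketch of this calculation with invented CPT entries for binary variables, assuming \(\mathbf{U}\) and \(\mathbf{V}\) are each a single binary parent:

```python
from itertools import product

# Bypassing intermediate X with single child Y, as in the formula above.
# U = parent of X, V = other parent of Y. All CPT entries are invented.
theta_x_given_u = {   # theta_{x|u} = P(X=x | U=u), keyed by (x, u)
    (True, True): 0.9, (False, True): 0.1,
    (True, False): 0.3, (False, False): 0.7,
}
theta_y_given_xv = {  # theta_{y|x,v} = P(Y=y | X=x, V=v), keyed by (y, x, v)
    (True, True, True): 0.8,   (False, True, True): 0.2,
    (True, True, False): 0.6,  (False, True, False): 0.4,
    (True, False, True): 0.5,  (False, False, True): 0.5,
    (True, False, False): 0.1, (False, False, False): 0.9,
}

# theta'_{y|u,v} = sum_x theta_{y|x,v} * theta_{x|u}
theta_y_given_uv = {}
for y, u, v in product([True, False], repeat=3):
    theta_y_given_uv[(y, u, v)] = sum(
        theta_y_given_xv[(y, x, v)] * theta_x_given_u[(x, u)]
        for x in [True, False]
    )

print(theta_y_given_uv[(True, True, True)])  # P(Y=True | U=True, V=True) = 0.77
```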
Dealing with Large CPTs
- CPTs have \(d^{k+1}\) parameters, where \(k\) is the number of parents and \(d\) is the number of values each variable can take
- There are modelling issues as \(k\) grows, but there are also issues with manually specifying a probability for every parent combination; for example, a binary variable with 10 binary parents already needs \(2^{11}=2048\) entries.
Micro-Model - Noisy-Or
- Noisy-Or specifies relationships between parents and a common child.
- Each parent/cause \(C_i\) is capable of establishing effect \(E\) on its own
- \(C_i\) can however have its effect suppressed by \(Q_i\)
- Leak variable \(L\) represents all other causes of \(E\)
- Therefore the model can be specified with \(n+1\) parameters
- \(\theta_{q_i} = P(Q_i=active)\)
- Probability that suppressor of cause \(C_i\) is active
- \(\theta_{L} = P(L=active)\)
- Probability that leak variable is active
- Let \(I_\alpha\) be the indices \(i\) for which \(C_i=True\) in parent instantiation \(\alpha\):
\[ P(E=False|\alpha) = \underbrace{(1-\theta_L)}_{\text{Leak not active}}\overbrace{\prod_{i\in I_{\alpha}}\theta_{q_i}}^{\text{Each suppressor is active}}\]
- Often calculated as \(P(E=True|\alpha) = 1-P(E=False|\alpha)\)
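- A sketch of recovering the full CPT of \(E\) from the \(n+1\) noisy-or parameters (all numbers invented):

```python
from itertools import product

# Noisy-or: build P(E | C_1..C_n) from the n+1 parameters described above.
# theta_q[i] = P(Q_i = active), probability that cause C_i's suppressor is active.
# theta_L   = P(L = active), the leak probability. Numbers are illustrative only.
theta_q = [0.1, 0.3, 0.4]   # suppressors for causes C_1, C_2, C_3
theta_L = 0.05              # leak

def p_e_false(cause_values):
    """P(E=False | c_1..c_n): leak not active AND every active cause suppressed."""
    p = 1 - theta_L
    for c_i, q_i in zip(cause_values, theta_q):
        if c_i:                      # only causes that are True need suppressing
            p *= q_i
    return p

# The full (exponential-size) CPT is recovered from just n+1 parameters
for causes in product([True, False], repeat=len(theta_q)):
    print(causes, "P(E=True) =", round(1 - p_e_false(causes), 4))
```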
Structured Data Representations
- A CPT with a lot of structure may not fit into a noisy-or model.
- There are many ways to compactly represent a structured CPT.
- Consider the structured CPT:
- Decision Trees:
- With enough structure, the decision tree grows linearly in the number of parents. For an unstructured CPT it grows exponentially, like the full table.
- If-Then Rules:
- The CPT of \(E\) can be represented with a set of if-then rules, which is very similar to a decision tree.
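- A sketch of the decision-tree (and equivalent if-then rule) representation; since the original table isn't reproduced here, the structure and numbers below are invented:

```python
# Decision-tree representation of a structured CPT for E with parents A, B, C.
# In this invented structure, once A=True the values of B and C are irrelevant,
# so the tree needs only 3 leaves instead of the 8 rows of the full table.
def p_e_true(a, b, c):
    if a:            # A=True: B and C do not matter
        return 0.9
    if b and c:      # A=False branch splits on B and C
        return 0.5
    return 0.1

# Equivalent if-then rules:
#   if A=True                  then P(E=True) = 0.9
#   if A=False, B=True, C=True then P(E=True) = 0.5
#   otherwise                       P(E=True) = 0.1
print(p_e_true(True, False, False))   # 0.9: B and C irrelevant once A=True
print(p_e_true(False, True, True))    # 0.5
```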
- Deterministic CPTs:
- Deterministic CPTs can be represented compactly using propositional sentences.
- Have one rule for each value \(e_i\) of \(E\)
- The premisses \(\Gamma_i\) are mutually exclusive and exhaustive
\[ \Gamma_i\Leftrightarrow E=e_i\]
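- For example, assuming \(E\) is a deterministic OR of two causes \(C_1\) and \(C_2\) (this particular function is just an illustration):
\[ \Gamma_1: C_1\lor C_2,\qquad \Gamma_2: \neg C_1\land\neg C_2 \]
\[ \Gamma_1\Leftrightarrow E=True,\qquad \Gamma_2\Leftrightarrow E=False \]
- The two sentences replace the full table of 0/1 entries over \(C_1\), \(C_2\) and \(E\).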