Modelling_Bayesian

Bayesian networks can be queried for probabilities.

Probability of Evidence

A simple query is the probability of some variable instantiation \(\mathbf{e}\)
- Consider evidence variables \(\mathbf{E}\), a subset of the network variables \(\mathbf{W}\)

\[ P(\mathbf{e}),\quad\mathbf{E}\subseteq\mathbf{W}\]

For example:

\[ P(X=True, D=True)\]
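One way to answer such a query is to sum the joint distribution over every instantiation that agrees with the evidence. The sketch below does this by brute-force enumeration. It uses a hypothetical three-node network \(C \to X\), \(C \to D\), with made-up CPT values, and the helper names are illustrative only. Enumeration is exponential in the number of variables, so it only suits toy examples.

```python
from itertools import product

# Hypothetical CPTs for a toy network C -> X, C -> D (all Boolean).
P_C = {True: 0.3, False: 0.7}
P_X_given_C = {True: 0.9, False: 0.2}   # P(X=True | C=c)
P_D_given_C = {True: 0.7, False: 0.1}   # P(D=True | C=c)

def bern(p_true, value):
    """Probability of `value` for a Boolean variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(c, x, d):
    """Chain rule for this network: P(c, x, d) = P(c) P(x|c) P(d|c)."""
    return P_C[c] * bern(P_X_given_C[c], x) * bern(P_D_given_C[c], d)

def prob_evidence(evidence):
    """P(e): sum the joint over every instantiation consistent with e."""
    total = 0.0
    for c, x, d in product([True, False], repeat=3):
        world = {"C": c, "X": x, "D": d}
        if all(world[var] == val for var, val in evidence.items()):
            total += joint(c, x, d)
    return total

print(round(prob_evidence({"X": True, "D": True}), 3))  # 0.203
```

With these numbers, \(P(X=True, D=True) = 0.3\cdot0.9\cdot0.7 + 0.7\cdot0.2\cdot0.1 = 0.203\).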

Prior and Posterior Marginals

A prior marginal is a marginal distribution over a subset of the variables, computed with no evidence

\[ P(x_1,...,x_m) = \sum_{x_{m+1},...,x_{n}}P(x_1,...,x_n)\]

A posterior marginal is a marginal distribution over a subset of the variables, computed given evidence \(\mathbf{e}\)

\[ P(x_1,...,x_m|\mathbf{e}) = \sum_{x_{m+1},...,x_{n}}P(x_1,...,x_n|\mathbf{e})\]
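The two definitions differ only in whether worlds inconsistent with \(\mathbf{e}\) are discarded and the result normalised by \(P(\mathbf{e})\). The minimal sketch below computes both by enumeration. It assumes a hypothetical toy network \(C \to X\), \(C \to D\) with made-up CPT values, and the function names are illustrative.

```python
from itertools import product

VARS = ["C", "X", "D"]
# Hypothetical toy network C -> X, C -> D with illustrative numbers.
P_C = 0.3                      # P(C=True)
P_X = {True: 0.9, False: 0.2}  # P(X=True | C)
P_D = {True: 0.7, False: 0.1}  # P(D=True | C)

def bern(p, v):
    return p if v else 1.0 - p

def joint(w):
    return bern(P_C, w["C"]) * bern(P_X[w["C"]], w["X"]) * bern(P_D[w["C"]], w["D"])

def marginal(query, evidence=None):
    """P(query | evidence): sum out all other variables, then normalise by P(e)."""
    evidence = evidence or {}
    table = {}
    for values in product([True, False], repeat=len(VARS)):
        w = dict(zip(VARS, values))
        if any(w[v] != val for v, val in evidence.items()):
            continue  # world is inconsistent with the evidence
        key = tuple(w[q] for q in query)
        table[key] = table.get(key, 0.0) + joint(w)
    p_e = sum(table.values())  # equals P(e); 1.0 when there is no evidence
    return {k: p / p_e for k, p in table.items()}

prior = marginal(["X"])
posterior = marginal(["X"], {"D": True})
print(round(prior[(True,)], 3), round(posterior[(True,)], 3))  # 0.41 0.725
```

Observing \(D=True\) raises the probability of \(X=True\) from 0.41 to 0.725, because both variables depend on \(C\).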

Auxiliary Nodes

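Auxiliary nodes let more complicated evidence queries be answered as simple ones. For example, an OR query:

\[ P(X=True\lor D=True) \]

can be answered by adding an auxiliary node with parents \(X\) and \(D\), whose deterministic CPT makes it true exactly when \(X=True\) or \(D=True\), and then querying the probability that this node is true.
- This is only practical when there is a small number of evidence variables, as the auxiliary node's CPT grows exponentially with its number of parents.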
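An auxiliary node \(A\) with parents \(X\) and \(D\) and a deterministic CPT \(A \Leftrightarrow X \lor D\) turns the OR query into the simple evidence query \(P(A=True)\). The sketch below illustrates this. It assumes a hypothetical toy network \(C \to X\), \(C \to D\) with made-up numbers; `theta_A` is an illustrative name.

```python
from itertools import product

# Hypothetical toy network C -> X, C -> D, extended with an auxiliary node A
# whose parents are X and D and whose CPT is deterministic: A = X or D.
P_C = 0.3
P_X = {True: 0.9, False: 0.2}  # P(X=True | C)
P_D = {True: 0.7, False: 0.1}  # P(D=True | C)

def bern(p, v):
    return p if v else 1.0 - p

def theta_A(a, x, d):
    """Deterministic CPT of the auxiliary node: 1 if a == (x or d), else 0."""
    return 1.0 if a == (x or d) else 0.0

def joint(c, x, d, a):
    return bern(P_C, c) * bern(P_X[c], x) * bern(P_D[c], d) * theta_A(a, x, d)

# P(X=True or D=True) is now the simple evidence query P(A=True).
p_or = sum(joint(c, x, d, True) for c, x, d in product([True, False], repeat=3))
print(round(p_or, 3))  # 0.487
```

With \(k\) variables in the disjunction, the auxiliary node has \(k\) parents and its CPT has \(2^{k+1}\) entries, which is why this only scales to a few evidence variables.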

Most Probable Explanation

A popular query is the Most Probable Explanation (MPE): the most probable instantiation of all network variables given evidence \(\mathbf{e}\)
- Maximise the posterior over all variables:

\[ \arg\max_{x_1,...,x_n} P(x_1,...,x_n|\mathbf{e}) \]

Note that the MPE cannot, in general, be obtained by choosing the most likely value of each variable from its individual marginal distribution.
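A small counterexample makes this concrete. The sketch below uses a hypothetical two-node network \(A \to B\) with made-up numbers and no evidence. It compares the true MPE against the combination of per-variable marginal maxima.

```python
from itertools import product

# Hypothetical two-node network A -> B.
P_A = 0.6                              # P(A=True)
P_B_given_A = {True: 0.5, False: 1.0}  # P(B=True | A)

def joint(a, b):
    pa = P_A if a else 1 - P_A
    pb = P_B_given_A[a] if b else 1 - P_B_given_A[a]
    return pa * pb

worlds = list(product([True, False], repeat=2))

# MPE (no evidence here): the single most probable full instantiation.
mpe = max(worlds, key=lambda w: joint(*w))

# Naive alternative: pick each variable's most likely value from its own marginal.
p_a = sum(joint(True, b) for b in (True, False))
p_b = sum(joint(a, True) for a in (True, False))
naive = (p_a > 0.5, p_b > 0.5)

print(mpe, joint(*mpe))      # (False, True) 0.4
print(naive, joint(*naive))  # (True, True) 0.3
```

Each variable's marginal favours True (\(P(A=True)=0.6\), \(P(B=True)=0.7\)). However, the joint instantiation \((True, True)\) has probability 0.3, while the MPE \((False, True)\) has 0.4.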

Maximum a Posteriori Hypothesis (MAP)

Given MAP variables \(\mathbf{M}\subseteq\mathbf{X}\) and evidence \(\mathbf{e}\), our goal is to find the instantiation \(\mathbf{m}\) for which \(P(\mathbf{m}|\mathbf{e})\) is maximal.
MPE is a special case of the MAP query where \(\mathbf{M}=\mathbf{X}\)
- MPE is, however, easier to compute, so it is often used as an approximation to MAP:
  - Calculate the MPE and keep only the values of the variables in \(\mathbf{M}\).
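The approximation is not exact. The sketch below reuses a hypothetical network \(A \to B\) with made-up numbers, takes \(\mathbf{M}=\{A\}\) and assumes no evidence. It compares exact MAP, which sums out \(B\) before maximising, with the MPE projected onto \(A\).

```python
from itertools import product

# Hypothetical two-node network A -> B; MAP variables M = {A}, no evidence.
P_A = 0.6                              # P(A=True)
P_B_given_A = {True: 0.5, False: 1.0}  # P(B=True | A)

def joint(a, b):
    pa = P_A if a else 1 - P_A
    pb = P_B_given_A[a] if b else 1 - P_B_given_A[a]
    return pa * pb

# Exact MAP: maximise P(m) after summing over the non-MAP variable B.
map_a = max([True, False], key=lambda a: sum(joint(a, b) for b in (True, False)))

# Approximation: compute the MPE, then keep only A's value.
mpe_a, _ = max(product([True, False], repeat=2), key=lambda w: joint(*w))

print(map_a, mpe_a)  # True False
```

Here exact MAP picks \(A=True\) (probability 0.6), but the projected MPE picks \(A=False\) (probability 0.4).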

Sensitivity Analysis

We can conduct sensitivity analysis on network parameters
- Find which (non-evidence) parameters need to change, and by how much, so that a posterior marginal probability falls within a given range.
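For a single parameter the question can sometimes be answered in closed form. The minimal sketch below assumes a hypothetical network \(A \to B\) with made-up CPT values. It treats \(\theta = P(A=True)\) as the only parameter that may change, with \(P(A=False) = 1-\theta\) co-varying. It then solves for the smallest \(\theta\) that makes \(P(A=True|B=True)\) reach a target. Real sensitivity analysis may vary several parameters at once; this shows only the one-parameter case.

```python
# Hypothetical network A -> B with evidence B=True and query A=True.
# Only theta = P(A=True) is varied; the other parameters stay fixed.
P_B_GIVEN_A = 0.8      # theta_{b|a}
P_B_GIVEN_NOT_A = 0.3  # theta_{b|not a}

def posterior(theta):
    """P(A=True | B=True) as a function of theta = P(A=True)."""
    num = theta * P_B_GIVEN_A
    return num / (num + (1 - theta) * P_B_GIVEN_NOT_A)

def min_theta_for(target):
    """Smallest theta with P(A=True | B=True) >= target, for 0 < target < 1.

    Rearranged from theta*p*(1 - target) >= target*(1 - theta)*q.
    """
    p, q = P_B_GIVEN_A, P_B_GIVEN_NOT_A
    return target * q / (p * (1 - target) + target * q)

print(round(posterior(0.2), 3))      # 0.4 when theta = 0.2
print(round(min_theta_for(0.5), 4))  # 0.2727
```

With these numbers, raising \(P(A=True)\) from 0.2 to about 0.273 is enough to lift the posterior from 0.4 to 0.5.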

Intermediate Variables

Intermediate variables are neither query nor evidence variables, but they can help with modelling.
- For example, Progesterone Level (L) in the example network [figure not available]

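An intermediate variable can be bypassed without affecting model accuracy if, for every instantiation \(\mathbf{q}\) of the query variables and \(\mathbf{e}\) of the evidence variables: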

\[ P(\mathbf{q,e}) = \underbrace{P'(\mathbf{q,e})}_{\text{G without intermediate}}\]

This is guaranteed when the intermediate variable \(X\) has a single child \(Y\)
- However, even when an intermediate variable can be bypassed, removing it tends to create larger CPTs, since its child inherits its parents.
- When bypassing \(X\) with single child \(Y\), let \(\mathbf{U}\) be the parents of \(X\) and \(\mathbf{V}\) the parents of \(Y\) other than \(X\). The new CPT of \(Y\), whose parents are now \(\mathbf{U}\cup\mathbf{V}\), is:

\[ \theta'_{y|\mathbf{uv}}=\sum_x\theta_{y|x\mathbf{v}}\theta_{x|\mathbf{u}}\]
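The bypass formula can be checked numerically. The sketch below assumes a hypothetical network \(U \to X \to Y \leftarrow V\) with made-up CPT values. It computes \(\theta'_{y|uv}\), then verifies that the joint over \(U, V, Y\) is the same with and without \(X\), which is the bypass condition for this network.

```python
from itertools import product

B = (True, False)

# Hypothetical CPTs for U -> X -> Y <- V, all Boolean.
# theta_x[u][x] = P(X=x | U=u); theta_y[(x, v)][y] = P(Y=y | X=x, V=v).
theta_x = {True: {True: 0.9, False: 0.1},
           False: {True: 0.2, False: 0.8}}
theta_y = {(True, True): {True: 0.95, False: 0.05},
           (True, False): {True: 0.7, False: 0.3},
           (False, True): {True: 0.4, False: 0.6},
           (False, False): {True: 0.1, False: 0.9}}
P_U = {True: 0.4, False: 0.6}  # hypothetical root priors
P_V = {True: 0.5, False: 0.5}

# Bypass X: Y's parents become U and V, with
# theta'_{y|uv} = sum_x theta_{y|xv} * theta_{x|u}.
theta_y_new = {
    (u, v): {y: sum(theta_y[(x, v)][y] * theta_x[u][x] for x in B) for y in B}
    for u, v in product(B, B)
}

# P(u, v, y) is unchanged: summing X out of the original network
# gives the same numbers as the network without X.
for u, v, y in product(B, B, B):
    original = sum(P_U[u] * P_V[v] * theta_x[u][x] * theta_y[(x, v)][y] for x in B)
    bypassed = P_U[u] * P_V[v] * theta_y_new[(u, v)][y]
    assert abs(original - bypassed) < 1e-12

print(round(theta_y_new[(True, True)][True], 3))  # 0.895
```

For example, \(\theta'_{Y=T|U=T,V=T} = 0.95\cdot0.9 + 0.4\cdot0.1 = 0.895\). Here \(X\) has a single parent, so \(Y\)'s CPT keeps four rows. If \(X\) had several parents, \(Y\)'s new CPT would grow accordingly.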

Dealing with Large CPTs

CPTs have \(d^{k+1}\) parameters, where \(k\) is the number of parents and \(d\) is the number of values each variable can take
- As \(k\) grows this causes modelling problems, and specifying a probability for every parent combination by hand quickly becomes impractical.
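A quick calculation shows how fast \(d^{k+1}\) grows. The snippet assumes binary variables (\(d=2\)), and `cpt_size` is an illustrative name.

```python
def cpt_size(k, d=2):
    """Number of CPT entries for a variable with k parents, all with d values."""
    return d ** (k + 1)

for k in (1, 5, 10, 20):
    print(k, cpt_size(k))
# 1 4
# 5 64
# 10 2048
# 20 2097152
```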

Micro-Model - Noisy Or

Noisy-Or models the relationship between several parents (causes) and a common child (effect).
Each parent/cause \(C_i\) is capable of establishing effect \(E\) on its own
- \(C_i\) can, however, have its effect suppressed by \(Q_i\)
- Leak variable \(L\) represents all other causes of \(E\)
Therefore the model can be specified with \(n+1\) parameters, rather than one row per parent instantiation (\(2^n\) rows for \(n\) binary causes)
- \(\theta_{q_i} = P(Q_i=active)\)
  - Probability that the suppressor of cause \(C_i\) is active
- \(\theta_{L} = P(L=active)\)
  - Probability that the leak variable is active
Let \(I_\alpha\) be the indices of the causes with \(C_i=True\) in instantiation \(\alpha\):

\[ P(E=False|\alpha) = \underbrace{(1-\theta_L)}_{\text{Leak not active}}\overbrace{\prod_{i\in I_{\alpha}}\theta_{q_i}}^{\text{Suppressor of each true cause is active}}\]

Often calculated as \(P(E=True|\alpha) = 1-P(E=False|\alpha)\)
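The formula expands into a full CPT from only \(n+1\) numbers. The sketch below builds \(P(E=True|\alpha)\) for every instantiation \(\alpha\) of the causes. The suppressor and leak probabilities are hypothetical, and `noisy_or_cpt` is an illustrative name.

```python
from itertools import product

def noisy_or_cpt(theta_q, theta_leak):
    """Build P(E=True | alpha) for every instantiation alpha of the causes.

    theta_q[i]  = P(Q_i = active): probability that cause C_i is suppressed.
    theta_leak  = P(L = active):   probability that the leak fires.
    """
    n = len(theta_q)
    cpt = {}
    for alpha in product([True, False], repeat=n):
        p_false = 1.0 - theta_leak          # leak not active
        for i, cause_true in enumerate(alpha):
            if cause_true:                  # i is in I_alpha
                p_false *= theta_q[i]       # suppressor of C_i is active
        cpt[alpha] = 1.0 - p_false          # P(E=True | alpha)
    return cpt

# Hypothetical parameters for three causes.
cpt = noisy_or_cpt([0.1, 0.3, 0.5], theta_leak=0.05)
print(round(cpt[(True, False, False)], 3))   # 0.905
print(round(cpt[(False, False, False)], 3))  # 0.05
```

Four numbers here determine all eight rows. With no cause true, only the leak can produce \(E\), so \(P(E=True|\alpha)=\theta_L\).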

Structured Data Representations

A CPT with a lot of structure may not fit into a noisy-or model.
- There are many ways to compactly represent a structured CPT.
Consider the structured CPT: [table not available]

Decision Trees:
- With enough structure, the decision tree can be linear in size in the number of parents. For an unstructured CPT, it would be exponential, like the full table.

If-Then Rules:
- The CPT of \(E\) can be represented as a set of if-then rules, which is very similar to a decision tree.
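The structured CPT from the notes is not available, so the sketch below uses a hypothetical CPT for \(P(E=True|A,B,C,D)\) with made-up probabilities. It stores the CPT as a decision tree that tests one parent per level. Each root-to-leaf path corresponds to one if-then rule. The tree has 5 leaves (linear in the 4 parents), while the full table it encodes has 16 rows.

```python
from itertools import product

# Hypothetical structured CPT: P(E=True | A, B, C, D).
# Internal node: (parent, subtree_if_true, subtree_if_false); leaf: probability.
tree = ("A", 0.9,
        ("B", 0.3,
         ("C", 0.5,
          ("D", 0.2, 0.0))))

def lookup(node, assignment):
    """Walk the tree, testing one parent per level, until a leaf is reached."""
    while isinstance(node, tuple):
        var, if_true, if_false = node
        node = if_true if assignment[var] else if_false
    return node

def count_leaves(node):
    if not isinstance(node, tuple):
        return 1
    return count_leaves(node[1]) + count_leaves(node[2])

# Equivalent if-then rules, one per root-to-leaf path:
#   if A then 0.9; if not A and B then 0.3; if not A, not B and C then 0.5; ...
full_table = {vals: lookup(tree, dict(zip("ABCD", vals)))
              for vals in product([True, False], repeat=4)}
print(count_leaves(tree), len(full_table))  # 5 16
```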

Deterministic CPTs:

Deterministic CPTs can be represented compactly using propositional sentences.
- Use one rule for each value \(e_i\) of \(E\)
- The premises \(\Gamma_i\) are mutually exclusive and exhaustive

\[ \Gamma_i\Leftrightarrow E=e_i\]
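The sketch below shows a hypothetical deterministic CPT for a three-valued \(E\) with Boolean parents \(A\) and \(B\). It writes one propositional sentence \(\Gamma_i\) per value, checks that exactly one sentence holds for every parent instantiation, and reads off the 0/1 CPT entries. The variable names and sentences are illustrative only.

```python
from itertools import product

# Hypothetical deterministic CPT: E in {"high", "mid", "low"}, Boolean parents A, B.
# One sentence Gamma_i per value e_i, with Gamma_i <=> (E = e_i).
rules = {
    "high": lambda a, b: a and b,
    "mid":  lambda a, b: a != b,          # exactly one of A, B
    "low":  lambda a, b: not a and not b,
}

def theta(e, a, b):
    """P(E=e | a, b) is 1 if Gamma_e holds for (a, b), else 0."""
    return 1.0 if rules[e](a, b) else 0.0

# The premises must be mutually exclusive and exhaustive:
# exactly one Gamma_i holds for every parent instantiation.
for a, b in product([True, False], repeat=2):
    assert sum(bool(g(a, b)) for g in rules.values()) == 1

print(theta("mid", True, False), theta("high", True, False))  # 1.0 0.0
```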