Modelling_Bayesian

Jake

29/09/2022

  • Bayesian Networks can be queried for probabilities

Probability of Evidence

  • A simple query is the probability of some variable instantiation
    • Consider evidence \(\mathbf{E}\) and network variables \(\mathbf{W}\)

\[ P(\mathbf{e}),\quad\mathbf{E}\subseteq\mathbf{W}\]

  • For example:

\[ P(X=True, D=True)\]
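As a minimal sketch, the probability of evidence can be computed by brute force: sum the joint over every instantiation that agrees with the evidence. The network below (C -> X, C -> D) and all of its numbers are assumptions made up for illustration; only the variable names X and D come from the example above.

```python
from itertools import product

# Hypothetical network C -> X, C -> D. All numbers are invented for illustration.
P_C = {True: 0.2, False: 0.8}    # P(C=c)
P_X = {True: 0.9, False: 0.1}    # P(X=True | C=c)
P_D = {True: 0.7, False: 0.2}    # P(D=True | C=c)

def bern(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(c, x, d):
    """Chain-rule factorisation of the joint for this network."""
    return P_C[c] * bern(P_X[c], x) * bern(P_D[c], d)

def prob_evidence(evidence):
    """Sum the joint over every world that agrees with the evidence dict."""
    total = 0.0
    for c, x, d in product([True, False], repeat=3):
        world = {"C": c, "X": x, "D": d}
        if all(world[name] == value for name, value in evidence.items()):
            total += joint(c, x, d)
    return total

p_xd = prob_evidence({"X": True, "D": True})
print(round(p_xd, 3))
```

Enumeration is exponential in the number of variables, so this only illustrates the definition and is not a practical inference method.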

Prior and Posterior Marginals

  • A prior marginal is a marginal expression with no evidence

\[ P(x_1,...,x_m) = \sum_{x_{m+1},...,x_{n}}P(x_1,...,x_n)\]

  • A posterior marginal is a marginal expression with evidence

\[ P(x_1,...,x_m|\mathbf{e}) = \sum_{x_{m+1},...,x_{n}}P(x_1,...,x_n|\mathbf{e})\]
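The two sums can be sketched directly on a small joint table. The table over (A, B, C) and the helper `marginal` are hypothetical, chosen only to make the arithmetic easy to follow; in a real network the joint would come from the CPTs.

```python
# Hypothetical joint over (A, B, C), given directly as a table; values are invented.
NAMES = ("A", "B", "C")
joint = {
    (True, True, True): 0.10, (True, True, False): 0.05,
    (True, False, True): 0.15, (True, False, False): 0.10,
    (False, True, True): 0.05, (False, True, False): 0.20,
    (False, False, True): 0.05, (False, False, False): 0.30,
}

def marginal(query, evidence=None):
    """Return P(query | evidence) as a dict from value tuples to probabilities."""
    evidence = evidence or {}
    idx = [NAMES.index(v) for v in query]
    table = {}
    for world, p in joint.items():
        # Keep only worlds consistent with the evidence, then sum out the rest.
        if all(world[NAMES.index(k)] == v for k, v in evidence.items()):
            key = tuple(world[i] for i in idx)
            table[key] = table.get(key, 0.0) + p
    z = sum(table.values())  # P(e); equals 1 when there is no evidence
    return {key: p / z for key, p in table.items()}

prior_a = marginal(["A"])                   # prior marginal P(A)
posterior_a = marginal(["A"], {"C": True})  # posterior marginal P(A | C=True)
```

With these numbers the prior P(A=True) is 0.4, while observing C=True raises it to 0.25/0.35.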

Auxiliary Nodes

  • Auxiliary nodes can be used to answer simple evidence queries as well as more complicated ones, by adding a node whose CPT encodes the query.
    • This is only practical when there is a small number of evidence variables, as the auxiliary node's CPT grows exponentially with the number of its parents.
  • For example, an OR query:

\[ P(X=True\lor D=True) \]
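A minimal sketch of the idea: add an auxiliary node Z with parents X and D and a deterministic CPT encoding Z = X OR D, then ask for the ordinary evidence probability P(Z=True). The network C -> X, C -> D, its numbers, and the name Z are all assumptions for illustration.

```python
from itertools import product

# Hypothetical network C -> X, C -> D, plus auxiliary node Z with parents X and D.
P_C = {True: 0.2, False: 0.8}    # P(C=c)
P_X = {True: 0.9, False: 0.1}    # P(X=True | C=c)
P_D = {True: 0.7, False: 0.2}    # P(D=True | C=c)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def cpt_z(z, x, d):
    """Deterministic CPT of the auxiliary node: Z is True exactly when X or D is."""
    return 1.0 if z == (x or d) else 0.0

def joint(c, x, d, z):
    return P_C[c] * bern(P_X[c], x) * bern(P_D[c], d) * cpt_z(z, x, d)

# P(X=True or D=True) is now a plain evidence query on Z.
p_or = sum(joint(c, x, d, True) for c, x, d in product([True, False], repeat=3))
print(round(p_or, 3))
```

The CPT of Z has one row per instantiation of its parents, which is why the approach stops being practical once the query involves many evidence variables.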

Most Probable Explanation

  • A popular query is the MPE, the most probable instantiation of all network variables given evidence \(\mathbf{e}\)
    • Find the instantiation that maximises the posterior:

\[ \underset{x_1,...,x_n}{\operatorname{argmax}}\; P(x_1,...,x_n|\mathbf{e}) \]

  • Note that the MPE cannot be obtained by considering the individual marginal distributions for each variable.
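A small counterexample makes this concrete. The joint over two binary variables (A, B) below is invented for illustration; picking the most likely value of each variable separately yields an instantiation that is not the MPE.

```python
# Hypothetical joint over (A, B); values are invented for illustration.
joint = {
    (False, False): 0.4,
    (True, False): 0.3,
    (True, True): 0.3,
    (False, True): 0.0,
}

# MPE: the single most probable complete instantiation.
mpe = max(joint, key=joint.get)

def best_marginal_value(i):
    """Most probable value of variable i according to its own marginal."""
    totals = {}
    for world, p in joint.items():
        totals[world[i]] = totals.get(world[i], 0.0) + p
    return max(totals, key=totals.get)

per_variable = tuple(best_marginal_value(i) for i in range(2))

print(mpe, joint[mpe])                    # (False, False) 0.4
print(per_variable, joint[per_variable])  # (True, False) 0.3
```

Here P(A=True) = 0.6 and P(B=False) = 0.7, so the per-variable choice is (True, False), which has joint probability 0.3 and is less probable than the MPE.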

Maximum a Posteriori Hypothesis (MAP)

  • Given MAP variables \(\mathbf{M}\subseteq\mathbf{X}\) and evidence \(\mathbf{e}\), our goal is to find the instantiation \(\mathbf{m}\) for which \(P(\mathbf{m|e})\) is maximal

  • MPE is a special case of the MAP query where \(\mathbf{M=X}\)

    • MPE is, however, easier to compute, so it is often used to approximate MAP
      • Calculate the MPE and keep only the values of the variables in \(\mathbf{M}\).
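The sketch below compares exact MAP with the MPE projection on a tiny, made-up joint over a MAP variable M and one other variable Y (evidence is assumed to be already conditioned on). It is only meant to show that the projection is an approximation and can disagree with the exact answer.

```python
# Hypothetical posterior over (M, Y); values are invented for illustration.
joint = {
    ("m0", "y0"): 0.35,
    ("m1", "y0"): 0.20,
    ("m1", "y1"): 0.25,
    ("m1", "y2"): 0.20,
}

# Approximation: compute the MPE, then keep only the value of M.
mpe = max(joint, key=joint.get)
approx_map = mpe[0]

# Exact MAP: sum out Y first, then maximise over M.
p_m = {}
for (m, _y), p in joint.items():
    p_m[m] = p_m.get(m, 0.0) + p
exact_map = max(p_m, key=p_m.get)

print(approx_map, exact_map)  # m0 m1
```

The probability mass of m1 is spread over three values of Y, so no single complete instantiation containing m1 beats ("m0", "y0"), even though m1 is the more probable value of M.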

Sensitivity Analysis

  • We can conduct sensitivity analysis on network parameters
    • Find which (non-evidence) parameters need to change, by how much, to get a posterior marginal probability within a certain range.
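As a rough sketch of a one-parameter sensitivity analysis, consider a hypothetical two-node network Disease -> Test with evidence Test=positive. The code asks how large the false-positive rate may be while keeping the posterior P(Disease | positive) at or above a target. The network, the numbers, and the function names are assumptions for illustration.

```python
def posterior(fp_rate, prior=0.01, sensitivity=0.95):
    """P(Disease=True | Test=positive) as a function of the false-positive rate."""
    numerator = prior * sensitivity
    return numerator / (numerator + (1.0 - prior) * fp_rate)

def max_fp_rate(target, lo=0.0, hi=1.0, iters=60):
    """Largest false-positive rate with posterior >= target, found by bisection.

    Relies on the posterior being decreasing in the false-positive rate.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if posterior(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo

threshold = max_fp_rate(0.3)
print(round(threshold, 4), round(posterior(threshold), 4))
```

With these assumed values, a false-positive rate of about 0.0224 or lower is needed for the posterior to reach 0.3.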

Intermediate Variables

  • Intermediate variables are neither query nor evidence variables, but can help with modelling.
    • Progesterone Level (L) in the following graph:

  • Intermediate variables can be bypassed without affecting model accuracy if:

\[ P(\mathbf{q,e}) = \underbrace{P'(\mathbf{q,e})}_{\text{G without intermediate}}\]

  • This is only satisfied if the intermediate variable \(X\) has a single child \(Y\)
    • However, even if you can bypass an intermediate variable, deleting it tends to create larger CPTs.
    • When bypassing a variable \(X\) with a single child \(Y\), the new CPT of \(Y\) is calculated as follows, where \(\mathbf{U}\) are the parents of \(X\) and \(\mathbf{V}\) are the parents of \(Y\) other than \(X\):

\[ \theta'_{y|\mathbf{uv}}=\sum_x\theta_{y|x\mathbf{v}}\theta_{x|\mathbf{u}}\]
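The formula can be sketched for a hypothetical chain U -> X -> Y <- V with binary variables. The CPT values are invented; the point is only to show the sum over the bypassed variable X.

```python
from itertools import product

# Hypothetical CPTs; all numbers are invented for illustration.
theta_x_given_u = {True: 0.8, False: 0.3}   # P(X=True | U=u)
theta_y_given_xv = {                         # P(Y=True | X=x, V=v)
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.5, (False, False): 0.1,
}

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def bypassed_entry(u, v):
    """theta'_{y|uv} for Y=True: sum over x of theta_{y|xv} * theta_{x|u}."""
    return sum(
        theta_y_given_xv[(x, v)] * bern(theta_x_given_u[u], x)
        for x in (True, False)
    )

# New CPT of Y with parents U and V once X has been removed.
theta_y_given_uv = {
    (u, v): bypassed_entry(u, v) for u, v in product([True, False], repeat=2)
}
```

For example, the entry for U=True, V=True is 0.9 * 0.8 + 0.5 * 0.2 = 0.82. Y now has both U and V as parents, which is how bypassing tends to enlarge CPTs.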

Dealing with Large CPTs

  • CPTs have \(d^{k+1}\) parameters, where \(k\) is the number of parents and \(d\) is the number of values each variable can take
    • Large \(k\) causes modelling issues, and specifying a probability for every combination by hand also becomes impractical.
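A quick calculation shows how fast the table grows. The helper names are made up for this sketch; the second function counts independent parameters, using the fact that each row of a CPT sums to one.

```python
def cpt_entries(d, k):
    """Number of entries in a CPT: d values for the child times d**k parent rows."""
    return d ** (k + 1)

def free_parameters(d, k):
    """Independent parameters, since each of the d**k rows sums to one."""
    return (d - 1) * d ** k

# Binary variables with a growing number of parents.
sizes = {k: cpt_entries(2, k) for k in (2, 5, 10, 20)}
print(sizes)
```

With binary variables, ten parents already give 2048 entries and twenty parents give over two million.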

Micro-Model - Noisy Or

  • Noisy-Or specifies relationships between parents and a common child.
  • Each parent/cause \(C_i\) is capable of establishing effect \(E\) on its own
    • \(C_i\) can, however, have its effect suppressed by a suppressor \(Q_i\)
    • Leak variable \(L\) represents all other causes of \(E\)
  • Therefore the model can be specified with \(n+1\) parameters
    • \(\theta_{q_i} = P(Q_i=active)\)
      • Probability that the suppressor of cause \(C_i\) is active
    • \(\theta_{L} = P(L=active)\)
      • Probability that the leak variable is active
  • Let \(I_\alpha\) be the indices of the causes with \(C_i=True\) in parent instantiation \(\alpha\):

\[ P(E=False|\alpha) = \underbrace{(1-\theta_L)}_{\text{Leak not active}}\overbrace{\prod_{i\in I_{\alpha}}\theta_{q_i}}^{\text{Each suppressor is active}}\]

  • Often calculated as \(P(E=True|\alpha) = 1-P(E=False|\alpha)\)
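A minimal sketch of the noisy-or formula, expanding \(n+1\) parameters into the full CPT of \(2^n\) rows. The three suppressor probabilities and the leak probability are invented for illustration.

```python
from itertools import product
from math import prod

# Hypothetical parameters for n = 3 causes; all numbers are invented.
theta_q = [0.4, 0.2, 0.1]   # P(Q_i = active) for each cause C_i
theta_leak = 0.05           # P(L = active)

def p_effect_true(alpha):
    """P(E=True | alpha), where alpha is a tuple of truth values for C_1..C_n."""
    active = [i for i, c in enumerate(alpha) if c]
    # E is False only if the leak is inactive and every active cause is suppressed.
    p_false = (1.0 - theta_leak) * prod(theta_q[i] for i in active)
    return 1.0 - p_false

# Full CPT generated from the n + 1 parameters.
cpt = {
    alpha: p_effect_true(alpha)
    for alpha in product([False, True], repeat=len(theta_q))
}
```

When no cause is present the product is empty and equals one, so P(E=True) reduces to the leak probability.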

Structured Data Representations

  • A CPT with a lot of structure may not fit into a noisy-or model.
    • There are many ways to compactly represent a structured CPT.
  • Consider the structured CPT:

  • Decision Trees:
    • With enough structure, the size of the decision tree is linear in the number of parents. For an unstructured CPT, it is exponential, like a normal CPT.

  • If-Then Rules:
    • The CPT of \(E\) can be represented with a set of if-then rules, very similar to a decision tree.
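As an illustrative sketch, a structured CPT can be written as a chain of tests, which reads as both a decision tree and a list of if-then rules. The parents A, B, C and the probabilities are hypothetical and are not the CPT from the original slide.

```python
from itertools import product

def p_e_true(a, b, c):
    """P(E=True | a, b, c) as a decision tree, equivalently ordered if-then rules."""
    if a:                      # if A then 0.9, regardless of B and C
        return 0.9
    if b:                      # else if B then 0.6, regardless of C
        return 0.6
    return 0.3 if c else 0.05  # otherwise the value depends on C only

# Expanding the tree recovers the ordinary tabular CPT.
cpt = {world: p_e_true(*world) for world in product([False, True], repeat=3)}

print(len(cpt), len(set(cpt.values())))  # 8 4
```

The table has 8 rows but only 4 distinct values, one per leaf. With this kind of structure, each extra parent adds one leaf rather than doubling the table.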

Deterministic CPTs:

  • Deterministic CPTs can be represented compactly using propositional sentences.
    • Have one rule for each value \(e_i\) of \(E\)
    • The premisses \(\Gamma_i\) are mutually exclusive and exhaustive

\[ \Gamma_i\Leftrightarrow E=e_i\]
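A small sketch of this representation: each value of \(E\) gets one propositional premise over the parents, and the CPT entry is 1 exactly when that premise holds. The node, its values, and the premises are hypothetical examples.

```python
from itertools import product

# Hypothetical deterministic node E with binary parents A and B.
# One premise (a propositional sentence over the parents) per value of E.
rules = {
    "high": lambda a, b: a and b,
    "mid": lambda a, b: a != b,
    "low": lambda a, b: not a and not b,
}

def cpt_entry(e, a, b):
    """P(E=e | a, b): 1 if the premise for e holds, otherwise 0."""
    return 1.0 if rules[e](a, b) else 0.0

def premises_partition():
    """Check the premises are mutually exclusive and exhaustive over all parent states."""
    return all(
        sum(bool(rule(a, b)) for rule in rules.values()) == 1
        for a, b in product([False, True], repeat=2)
    )

print(premises_partition())  # True
```

If the check fails, some parent instantiation either matches no rule or matches more than one, and the rules no longer define a valid CPT.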