Markov Models

Jake

31/10/2022

Dynamic Bayesian Networks

  • Graphical models with a repeating structure that grows with time
    • Time/space dynamic
  • Utilise the Markov property
    • Future states are independent of past given the current state

Markov Chains

  • Consider a discrete variable \(X\) with states
    • Transitions between states are nondeterministic

  • Prior probabilities \(P(X_1)\) denote the starting distribution
    • Transition probability/dynamics are denoted \(P(X_t|X_{t-1})\)

  • Assuming stationarity, there are \(|X| + |X|^2\) parameters
    • \(|X|\) for the prior \(P(X_1)\) (one entry per state)
    • \(|X|^2\) for the transition probability table (all combos of \(X_{t-1},X_t\))
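  • For example, a hypothetical two-state weather chain \(X \in \{sun, rain\}\) has \(2 + 2^2 = 6\) table entries: two for \(P(X_1)\) and four for \(P(X_t|X_{t-1})\) (only \(1 + 2\) of these are free, since each distribution must sum to 1).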

Properties

  • Stationary property
    • Chains are time-homogeneous
    • Transition probabilities are the same for all values of \(t\)
  • Markov property
    • Future states are independent of the past given the current state

\[ X_{t+1}\perp X_{t-1}|X_t\]

Chain Rule

  • The chain rule for Bayesian networks works here for a sequence of states:

\[ P(X_1,...,X_n) = P(X_1)\prod_{t=2}^nP(X_t|X_{t-1}) \]

Staying in State

  • Probability of staying in a certain state for \(d\) steps:
    • Make \(d-1\) self-transitions, then transition out on the \(d\)th step.

\[ P(S_s^d)=P(s|s)^{d-1}(1-P(s|s))\]

  • Expected time spent in a state:

\[ E[S_s]=\frac{1}{1-P(s|s)}\]
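  • As a quick worked example, assuming a self-transition probability \(P(s|s) = 0.9\):

\[ P(S_s^3) = 0.9^{2}\times 0.1 = 0.081, \qquad E[S_s] = \frac{1}{1-0.9} = 10 \text{ steps}\]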

Mini-Forward Algorithm

  • We can find \(P(X_t)\) for any time \(t\) by simulating the chain with the mini-forward algorithm.
    • \(O(n|X|^2)\)

\[ P(x_t) = \sum_{x_{t-1}}P(x_t,x_{t-1}) = \sum_{x_{t-1}}P(x_t|x_{t-1})P(x_{t-1}) \]

  • Example

\[ P(X_t=sun) = P(X_t=sun|X_{t-1}=sun)P(X_{t-1}=sun) + P(X_t=sun|X_{t-1}=rain)P(X_{t-1}=rain) \]
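  • A minimal sketch of the mini-forward update in Python (the transition matrix and prior below are made-up illustrative values, not taken from the notes):

```python
import numpy as np

# States: 0 = sun, 1 = rain (illustrative values).
# T[i, j] = P(X_t = j | X_{t-1} = i), rows sum to 1.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

p = np.array([1.0, 0.0])  # prior P(X_1): certainly sun

# Mini-forward algorithm: push the distribution through the dynamics each step.
for t in range(2, 6):
    p = p @ T             # P(x_t) = sum_{x_{t-1}} P(x_t | x_{t-1}) P(x_{t-1})
    print(f"P(X_{t}) =", p)
```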

Irreducible Chains

  • An MC is irreducible if every state \(x'\) is reachable from every other state \(x\)
    • For every pair of states \((x,x')\), there is a time \(t\) such that \(P(X_t=x'|X_1=x) > 0\)
    • Also known as a regular/ergodic chain
      • The states of an irreducible MC are recurrent, meaning they are guaranteed to be visited an infinite number of times when simulating the chain.

Aperiodic Chains

  • An MC is aperiodic if it is possible to return to any state at any time
    • There exists a \(t\) such that for all states \(x\) and all \(t'\geq t\), \(P(X_{t'}=x|X_1=x)>0\)

Stationary Distribution

  • For most chains, \(P(X_t)\) will converge as \(t\) grows, independently of the initial distribution.
  • Stationary distribution \(\pi\) is the distribution we obtain if the chain converges

\[\pi(X) = \sum_xP(X|x)\pi(x)\]

\[ \pi = \pi T\]

  • Example:

\[ \pi(sun) = P(sun|sun)\pi(sun) + P(sun|rain)\pi(rain)\]

\[ \pi(rain) = P(rain|sun)\pi(sun) + P(rain|rain)\pi(rain)\]

\[ \pi(sun) + \pi(rain) = 1\]

  • The stationary distribution can also be interpreted as the long-run \(\%\) of time spent in each state

  • An irreducible and aperiodic MC converges to a unique stationary distribution

    • Irreducible: We can go from any state to any state
    • Aperiodic: Avoids chains that alternate forever and never settle into a stationary distribution
  • Note that convergence may be possible but very slow
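  • A small numerical sketch of finding \(\pi\) for a two-state chain, either by iterating the mini-forward update or by solving \(\pi = \pi T\) directly (the transition values are assumed for illustration):

```python
import numpy as np

T = np.array([[0.9, 0.1],   # P(X_t | X_{t-1} = sun), illustrative values
              [0.3, 0.7]])  # P(X_t | X_{t-1} = rain)

# Option 1: power iteration -- run the mini-forward update until it settles.
pi = np.array([0.5, 0.5])
for _ in range(1000):
    pi = pi @ T
print(pi)                   # ~[0.75, 0.25]

# Option 2: solve pi = pi T together with the constraint sum(pi) = 1.
A = np.vstack([T.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi_exact)             # [0.75, 0.25]
```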

Hidden Markov Models

  • Extension of Markov chains where states are not directly observable
    • We are able to observe ‘emissions’ that are probabilistically linked to the hidden states
  • HMMs have two main components
    • Underlying Markov chain over states \(X\)
    • Observable outputs (effects of states) at each time step.
      • Often called emissions

  • HMM parameters include:
    • Initial distribution \(P(X_1)\)
    • Transition probabilities \(P(X_t|X_{t-1})\)
    • Emission probabilities \(P(E_t|X_t)\)

Chain Rule

  • Same as usual

\[ P(X_1,E_1,...,X_n,E_n) = P(X_1)P(E_1|X_1)\prod^n_{t=2}P(X_t|X_{t-1})P(E_t|X_t)\]
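  • A small sketch evaluating this joint probability for a concrete state/emission sequence (all parameter values are assumed for illustration):

```python
import numpy as np

# Illustrative HMM: hidden states 0 = sun, 1 = rain; emissions 0 = no umbrella, 1 = umbrella.
prior = np.array([0.5, 0.5])              # P(X_1)
T = np.array([[0.9, 0.1], [0.3, 0.7]])    # P(X_t | X_{t-1})
O = np.array([[0.8, 0.2], [0.1, 0.9]])    # P(E_t | X_t)

states = [0, 0, 1]       # x_1, x_2, x_3
emissions = [0, 1, 1]    # e_1, e_2, e_3

# P(x_1, e_1, ..., x_n, e_n) = P(x_1) P(e_1|x_1) * prod_t P(x_t|x_{t-1}) P(e_t|x_t)
p = prior[states[0]] * O[states[0], emissions[0]]
for t in range(1, len(states)):
    p *= T[states[t - 1], states[t]] * O[states[t], emissions[t]]
print(p)
```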

Independencies

  • State is independent of all past states and evidence given previous state

\[ X_t\perp X_1,...,X_{t-2},E_1,...,E_{t-1}|X_{t-1}\]

  • Evidence is independent of all past evidence and states given the current state

\[ E_t\perp X_1,...,X_{t-1},E_1,...,E_{t-1}|X_t\]

Inference

  • Need to track \(P(X_t)\) over time using evidence.
    • \(B(X_t) = P(X_t|e_1,...,e_t)\) denotes belief of state at \(t\)
    • \(B(X_1)\) is typically a uniform distribution
  • Inference has two main steps
    • Passage of time
    • Observation of evidence

Passage of Time

  • Consider a current state of belief

\[ B(X_t) = P(X_t|e_{1:t}) \]

  • Need to update the belief by one time step
    • This is the belief before we observe the next emission

\[ P(X_{t+1}|e_{1:t}) \]

  • To compute this:

\[ P(X_{t+1}|e_{1:t}) = \sum_{x_t}P(X_{t+1}|x_t)B(x_t)\]
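  • A minimal numpy sketch of this time-elapse update (the transition values and current belief are assumed for illustration):

```python
import numpy as np

T = np.array([[0.9, 0.1],   # P(X_{t+1} | X_t = sun), illustrative values
              [0.3, 0.7]])  # P(X_{t+1} | X_t = rain)

B = np.array([0.6, 0.4])    # current belief B(X_t) = P(X_t | e_{1:t})

# Passage of time: P(X_{t+1} | e_{1:t}) = sum_{x_t} P(X_{t+1} | x_t) B(x_t)
B_prime = B @ T
print(B_prime)              # [0.66, 0.34]
```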

Observing Evidence

  • After the time step we get an emission that can be used to update belief.

\[ B(X_{t+1})=P(X_{t+1}|e_{1:t+1}) \propto P(e_{t+1}|X_{t+1})P(X_{t+1}|e_{1:t})\]

  • Renormalise by dividing by the sum of the unnormalised values:

\[\sum_{x_{t+1}} P(e_{t+1}|x_{t+1})P(x_{t+1}|e_{1:t}) \]
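  • A matching sketch of the observation update (the emission table, observation, and predicted belief are assumed for illustration):

```python
import numpy as np

O = np.array([[0.8, 0.2],           # P(E | X = sun), illustrative values
              [0.1, 0.9]])          # P(E | X = rain)

B_prime = np.array([0.66, 0.34])    # P(X_{t+1} | e_{1:t}) from the time update
e = 1                               # observed emission

# Observation: B(X_{t+1}) is proportional to P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})
B = O[:, e] * B_prime
B = B / B.sum()                     # renormalise by the sum of the unnormalised values
print(B)                            # ~[0.30, 0.70]
```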

Forward Algorithm

  • Considering a sequence of emissions, we want to know the state of belief

\[ B(X_t) = P(X_t|e_{1:t}) \]

  • We consider:

\[ P(X_t|e_{1:t})\propto P(X_t,e_{1:t}) = \underbrace{P(e_t|X_t)}_{\text{Observation}}\overbrace{\sum_{x_{t-1}}P(X_t|x_{t-1})P(x_{t-1},e_{1:t-1})}^{\text{Passage of Time}} \]
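  • Putting the two steps together, a sketch of the full forward algorithm (parameters and emissions are assumed for illustration):

```python
import numpy as np

def forward(prior, T, O, emissions):
    """Return B(X_t) = P(X_t | e_{1:t}) after each observed emission."""
    B = prior * O[:, emissions[0]]   # incorporate the first emission into the prior
    B = B / B.sum()
    beliefs = [B]
    for e in emissions[1:]:
        B = (B @ T) * O[:, e]        # passage of time, then observation
        B = B / B.sum()              # renormalise
        beliefs.append(B)
    return beliefs

# Illustrative parameters: states sun/rain, emissions no-umbrella/umbrella.
prior = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1], [0.3, 0.7]])
O = np.array([[0.8, 0.2], [0.1, 0.9]])

for t, B in enumerate(forward(prior, T, O, [1, 1, 0]), start=1):
    print(f"B(X_{t}) =", B)
```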

Most Probable Explanation

  • We want to compute the most probable sequence of states conditioned on a sequence of evidence.

\[ \arg\max_{x_{1:t}}P(x_{1:t}|e_{1:t}) \]

State Trellis

  • A state trellis is a graph that illustrates state transition over time
    • Each arc represents one time step together with an evidence observation, with weight

\[ P(x_t|x_{t-1})P(e_t|x_t)\]

  • Consider a path as a sequence of states
    • The product of the weights along a path gives the probability of that sequence together with the evidence
    • This links to the forward algorithm, which sums over all paths ending in the same state
    • MPE corresponds to finding the path with the highest probability

Viterbi Algorithm

  • Viterbi Algorithm computes the maximum of path probabilities that lead to the same final state

\[ m[x_t] = \max_{x_{1:t-1}}P(x_{1:t-1},x_t|e_{1:t}) = P(e_t|x_t)\max_{x_{t-1}}P(x_t|x_{t-1})m[x_{t-1}]\]

  • Algorithm:
    • \(O(n|X|^2)\) complexity

  • Example:
    • Consider the max probability path to each possible state. We backtrack from the maximal probability.

  • This algorithm provides the probability of the most likely sequence but not the sequence itself.
    • Can keep an additional structure pointing to parent of each node
    • Backtrack computation
      • Divide \(m[x_t]\) by the evidence probability \(P(e_t|x_t)\).
      • For each state \(x_{t-1}\), divide by \(P(x_t|x_{t-1})\).
      • See which value of \(x_{t-1}\) has \(m[x_{t-1}]\) matching the result.
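  • A compact sketch of Viterbi using the back-pointer approach mentioned above, so the most likely sequence can be recovered directly by backtracking (parameter values are assumed for illustration):

```python
import numpy as np

def viterbi(prior, T, O, emissions):
    """Return the most probable state sequence and its (unnormalised) probability."""
    n, S = len(emissions), len(prior)
    m = prior * O[:, emissions[0]]       # m[x_1] = P(x_1) P(e_1 | x_1)
    back = np.zeros((n, S), dtype=int)
    for t in range(1, n):
        scores = T * m[:, None]          # scores[i, j] = P(x_t=j | x_{t-1}=i) * m[i]
        back[t] = scores.argmax(axis=0)  # best predecessor for each state
        m = O[:, emissions[t]] * scores.max(axis=0)
    path = [int(m.argmax())]             # most probable final state
    for t in range(n - 1, 0, -1):        # follow back-pointers to recover the path
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(m.max())

prior = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1], [0.3, 0.7]])
O = np.array([[0.8, 0.2], [0.1, 0.9]])
print(viterbi(prior, T, O, [1, 1, 0]))   # -> ([1, 1, 0], ...)
```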

Log Probability Modification

  • Log probabilities are often used with Viterbi to avoid vanishing probabilities
    • Path probabilities shrink exponentially with sequence length, leading to numerical underflow

\[ m[x_t] = \log P(e_t|x_t)+\max_{x_{t-1}}\left[\log P(x_t|x_{t-1})+m[x_{t-1}]\right]\]


Particle Filtering

  • Sometimes \(|X|\) is too big for use with exact inference.
    • \(|X|\) may be too big to even store \(B(X)\)
    • E.g. \(X\) is continuous
  • An approximate solution to inference
    • Track samples of \(X\), not all values
    • Samples are called particles
    • Time per step is linear in number of samples
    • In memory, we can store a list of particles rather than a list of states.

  • Representation of \(P(X)\) is now a list of \(N\) particles (samples)
    • Generally \(N \ll |X|\)
    • \(P(x)\) is approximated by the fraction of particles with value \(x\)
      • More particles, higher accuracy
      • Many \(x\) values will therefore have \(P(x) = 0\)

Elapsing Time

  • Each particle is moved by sampling its next position from the transition model

\[ x' = sample(P(X'|x))\]
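  • A minimal sketch of this step, assuming the particles are stored as an array of state indices and an illustrative transition table:

```python
import numpy as np

rng = np.random.default_rng(0)

T = np.array([[0.9, 0.1],               # P(X' | X = sun), illustrative values
              [0.3, 0.7]])              # P(X' | X = rain)

particles = np.array([0, 0, 1, 0, 1])   # N = 5 particles, each a state index

# Elapse time: move each particle by sampling its successor from the dynamics.
particles = np.array([rng.choice(2, p=T[x]) for x in particles])
print(particles)
```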

Observing Evidence

  • Downweight samples based on the evidence
    • The weights don’t sum to 1; their total is \(N\) times an approximation of \(P(e)\)

\[ w(x) = P(e|x) \]

\[B(X)\propto P(e|X)B'(X) \]

  • Rather than tracking weighted samples, we resample
    • Draw \(N\) new particles from the weighted distribution, with replacement
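  • A matching sketch of the weighting and resampling step (the emission table and observation are again assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

O = np.array([[0.8, 0.2],               # P(E | X = sun), illustrative values
              [0.1, 0.9]])              # P(E | X = rain)

particles = np.array([0, 0, 1, 0, 1])   # current particles (state indices)
e = 1                                   # observed emission

# Downweight each particle by the likelihood of the evidence given its state.
w = O[particles, e]                     # w(x) = P(e | x)

# Resample: draw N new particles with replacement, in proportion to the weights.
particles = rng.choice(particles, size=len(particles), replace=True, p=w / w.sum())
print(particles)
```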