Dynamic Bayesian Networks
- Graphical models with a repeating structure that grows with time
- Dynamic in both time and space: the network grows as \(t\) increases
- Utilise the Markov property
- Future states are independent of past given the current state
Markov Chains
- Consider a discrete variable \(X\) with a finite set of states
- Transitions between states are nondeterministic
- Prior probabilities \(P(X_1)\) denote the starting distribution
- Transition probabilities (the dynamics) are denoted \(P(X_t|X_{t-1})\)
- Assuming stationarity, there are \(|X| + |X|^2\) parameters
- \(|X|\) for number of states
- \(|X|^2\) for probability table (all combos of \(X_{t-1},X_t\))
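- E.g. a two-state chain (such as the sun/rain example below) needs \(2 + 2^2 = 6\) parameters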
Properties
- Stationary property
- Chains are time-homogeneous
- Transition probabilities are the same for all values of \(t\)
- Markov property
- Future states are independent of the past given the current state
\[ X_{t+1}\perp X_{t-1}|X_t\]
Chain Rule
- The chain rule for Bayesian networks applies to a sequence of states:
\[ P(X_1,\dots,X_n) = P(X_1)\prod_{t=2}^{n}P(X_t|X_{t-1}) \]
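- E.g. for a three-step sequence:
\[ P(X_1,X_2,X_3) = P(X_1)P(X_2|X_1)P(X_3|X_2) \]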
Staying in State
- Probability of staying in a certain state for \(d\) steps:
- Stay for \(d-1\) steps, transition out on the \(d\)-th step.
\[ P(S_s^d)=P(s|s)^{d-1}(1-P(s|s))\]
- Expected time spent in a state:
\[ E[S_s]=\frac{1}{1-P(s|s)}\]
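- E.g. with \(P(s|s)=0.9\) (an illustrative number), the expected stay is \(E[S_s]=\frac{1}{1-0.9}=10\) steps.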
Mini-Forward Algorithm
- We can find \(P(X_t)\) for any time \(t\) by simulating the chain with the mini-forward algorithm.
- Runs in \(O(n|X|^2)\) time for \(n\) time steps
\[ P(x_t) = \sum_{x_{t-1}}P(x_t,x_{t-1}) = \sum_{x_{t-1}}P(x_t|x_{t-1})P(x_{t-1}) \]
- Example
\[ P(X_t=sun) = P(X_t=sun|X_{t-1}=sun)P(X_{t-1}=sun)\]
\[ + P(X_t=sun|X_{t-1}=rain)P(X_{t-1}=rain) \]
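A minimal Python sketch of the mini-forward update; the sun/rain transition probabilities below are assumed for illustration, not taken from the notes:

```python
# Mini-forward: push P(X_{t-1}) through the transition model to get P(X_t).
states = ["sun", "rain"]
T = {
    "sun":  {"sun": 0.9, "rain": 0.1},   # P(X_t | X_{t-1} = sun), assumed
    "rain": {"sun": 0.3, "rain": 0.7},   # P(X_t | X_{t-1} = rain), assumed
}

def mini_forward(prior, steps):
    """Return P(X_t) after `steps` transitions from the prior P(X_1)."""
    p = dict(prior)
    for _ in range(steps):
        # P(x_t) = sum over x_{t-1} of P(x_t | x_{t-1}) P(x_{t-1})
        p = {x: sum(T[xp][x] * p[xp] for xp in states) for x in states}
    return p

print(mini_forward({"sun": 1.0, "rain": 0.0}, steps=10))
```

Each step touches every \((x_{t-1}, x_t)\) pair, which is where the \(O(n|X|^2)\) bound comes from.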
Irreducible Chains
- An MC is irreducible if every state \(x'\) is reachable from every other state \(x\)
- For every pair of states \((x,x')\), there is a time \(t\) such that \(P(X_t=x'|X_1=x) > 0\)
- Also known as a regular/ergodic chain
- The states of an irreducible MC are recurrent, meaning they are guaranteed to be visited an infinite number of times when simulating the chain.
Aperiodic Chains
- An MC is aperiodic if it is possible to return to any state at any time
- There exists a \(t\) such that for all states \(x\) and all \(t'\geq t\), \(P(X_{t'}=x|X_1=x)>0\)
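As a supplementary numerical check (a sketch, not from the notes): by Wielandt's theorem, a non-negative \(n \times n\) transition matrix is primitive, i.e. the chain is both irreducible and aperiodic, exactly when \(T^{(n-1)^2+1}\) is strictly positive:

```python
import numpy as np

def is_primitive(T):
    """Irreducible + aperiodic check via Wielandt's bound:
    T is primitive iff T^((n-1)^2 + 1) has all entries > 0."""
    n = T.shape[0]
    return bool(np.all(np.linalg.matrix_power(T, (n - 1) ** 2 + 1) > 0))

# A periodic chain that alternates forever and never settles:
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
print(is_primitive(flip))                     # False (period 2)
print(is_primitive(np.array([[0.9, 0.1],
                             [0.3, 0.7]])))   # True
```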
Stationary Distribution
- For most chains, \(P(X_t)\) converges to a distribution that does not depend on the initial distribution.
- Stationary distribution \(\pi\) is the distribution we obtain if the chain converges
\[\pi(X) = \sum_xP(X|x)\pi(x)\]
\[ \pi = \pi T\]
- Example:
\[ \pi(sun) = P(sun|sun)\pi(sun) + P(sun|rain)\pi(rain)\]
\[ \pi(rain) = P(rain|sun)\pi(sun) + P(rain|rain)\pi(rain)\]
\[ \pi(sun) + \pi(rain) = 1\]
- The stationary distribution can also be interpreted as the long-run \(\%\) of time spent in each state
- An irreducible and aperiodic MC converges to a unique stationary distribution
- Irreducible: We can go from any state to any state
- Aperiodic: Avoids chains that alternate forever and never settle into a stationary distribution
- Note that convergence may be possible but very slow
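A short sketch of computing \(\pi\) by power iteration, repeatedly applying \(\pi \leftarrow \pi T\) until it stops changing; the transition matrix is the assumed sun/rain example from earlier:

```python
import numpy as np

# Row-stochastic transition matrix (assumed values): rows index X_{t-1},
# columns index X_t, so the stationary condition reads pi = pi @ T.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

pi = np.array([0.5, 0.5])        # any starting distribution works
for _ in range(10_000):          # power iteration: pi <- pi T
    new_pi = pi @ T
    if np.allclose(new_pi, pi, atol=1e-12):
        break
    pi = new_pi

print(pi)  # ~[0.75, 0.25], satisfying pi(sun) = P(sun|sun)pi(sun) + P(sun|rain)pi(rain)
```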
Particle Filtering
- Sometimes \(|X|\) is too big for exact inference.
- \(|X|\) may be too big to even store \(B(X)\)
- E.g. \(X\) is continuous
- An approximate solution to inference
- Track samples of \(X\), not all values
- Samples are called particles
- Time per step is linear in number of samples
- In memory, we can store a list of particles rather than a list of states.
- Representation of \(P(X)\) is now a list of \(N\) particles (samples)
- Generally \(N \ll |X|\)
- \(P(x)\) is approximated by the fraction of particles with value \(x\)
- More particles, higher accuracy
- Many \(x\) values will therefore have \(P(x) = 0\)
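A tiny illustration of this representation, with made-up particles:

```python
from collections import Counter

particles = ["sun", "sun", "rain", "sun", "rain"]   # N = 5 samples (made up)
N = len(particles)
# P(x) is approximated by the fraction of particles with value x;
# any x that no particle takes on gets P(x) = 0.
P_hat = {x: count / N for x, count in Counter(particles).items()}
print(P_hat)   # {'sun': 0.6, 'rain': 0.4}
```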
Elapsing Time
- Each particle is moved by sampling its next position from the transition model
\[ x' = \text{sample}(P(X'|x))\]
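A minimal sketch of the elapse step, assuming particles are stored as a plain list of states and reusing the illustrative sun/rain dynamics:

```python
import random

# T[x] maps each next state x' to P(X' = x' | X = x); values are assumed
T = {"sun":  {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.3, "rain": 0.7}}

def elapse_time(particles):
    """Move every particle forward: x' = sample(P(X'|x))."""
    return [random.choices(list(T[x]), weights=list(T[x].values()), k=1)[0]
            for x in particles]

print(elapse_time(["sun", "sun", "rain"]))
```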
Observing Evidence
- Downweight samples based on evidence
- The weights don’t sum to 1; they sum to \(N\) times an approximation of \(P(e)\)
\[ w(x) = P(e|x) \]
\[B(X)\propto P(e|X)B'(X) \]
- Rather than tracking weighted samples, we resample
- Draw \(N\) samples from our weighted distribution with replacement
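A sketch of the weight-and-resample step; the sensor model \(P(e|x)\) and all numbers here are hypothetical:

```python
import random

def observe_and_resample(particles, evidence, sensor):
    """Weight each particle by w(x) = P(e|x), then draw N new particles
    with replacement from the weighted distribution."""
    weights = [sensor[x][evidence] for x in particles]
    # The raw weights sum to roughly N * P(e); resampling renormalises.
    return random.choices(particles, weights=weights, k=len(particles))

# Hypothetical noisy sensor model P(e|x); all values illustrative
sensor = {"sun":  {"bright": 0.8, "dark": 0.2},
          "rain": {"bright": 0.3, "dark": 0.7}}
particles = ["sun"] * 7 + ["rain"] * 3
print(observe_and_resample(particles, "dark", sensor))
```

After resampling, all particles are unweighted again, so the next elapse step can proceed exactly as before.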