05-12/03/2018

A Refresher for your Stat/Prob HP

  • A probability space: \((\Omega,\mathcal{F},P)\); \(\omega \in \Omega\) are outcomes (sample points); \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\), namely a collection of events (subsets of \(\Omega\)); \(P(\mathcal{A})\) is a probability measure defined for every event \(\mathcal{A}\in\mathcal{F}\).

  • A random variable \(X(\omega)\) has multiple possible outcomes, say in a set \(\mathcal{X}\) (a state space), with \(X^{-1}(A) \in \mathcal{F}\) for every measurable \(A \subseteq \mathcal{X}\).

  • Formal definition: a random variable \(X\) is a measurable function \(X: \Omega \rightarrow \mathcal{X}\) from the sample space \(\Omega\) to another measurable space \(\mathcal{X}\). Probabilities for this random variable take the form \(P(X\in A)=P(\{\omega \in \Omega: X(\omega) \in A\})\) for measurable \(A \subseteq \mathcal{X}\).

  • Conditional Probability: For events \(\mathcal{A}\) and \(\mathcal{B}\) in \((\Omega,\mathcal{F},P)\) such that \(P(\mathcal{B})>0\), the conditional probability: \[P(\mathcal{A}|\mathcal{B})=\frac{P(\mathcal{A}\cap\mathcal{B})}{P(\mathcal{B})},\;\mbox{or }P(\mathcal{A}|\mathcal{B})=\frac{P(\mathcal{B}|\mathcal{A}) P(\mathcal{A})}{P(\mathcal{B})}.\]

Conditional Probability and Dynamics

  • Recall the ARMA setting: e.g. AR(1), \(X_t = \beta X_{t-1} + \varepsilon_t\), \(\varepsilon_t \sim \mathcal{GWN}(0,\sigma^2)\).

  • Conditional distribution: \(X_t \mid X_{t-1}=x_{t-1} \sim \mathcal{N}(\beta x_{t-1}, \sigma^2)\) (a simulation sketch follows this list).

  • Generalization: for a larger class of time series, what would be the corresponding probabilities for describing their dynamics?

  • Markov chains: possibly the easiest way to approach time series in a nonlinear context.
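
  • A minimal simulation sketch (not from the original notes; the values \(\beta=0.6\) and \(\sigma=1\) are arbitrary): simulate an AR(1) path and recover the innovations, which should behave like \(\mathcal{N}(0,\sigma^2)\).

set.seed(2019); n = 1000; beta = 0.6; sigma = 1   # arbitrary illustrative parameters
x = numeric(n)
for (t in 2:n) x[t] = beta * x[t-1] + rnorm(1, 0, sigma)  # X_t | X_{t-1} ~ N(beta*x_{t-1}, sigma^2)
eps = x[-1] - beta * x[-n]   # recovered innovations
c(mean(eps), sd(eps))        # should be close to (0, sigma)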

Filtered Space

  • Filtered space: \((\Omega,\mathcal{F},\{\mathcal{F}_{t}\},P)\)

  • \((\Omega,\mathcal{F},P)\): a probability space

  • \(\{\mathcal{F}_{t}:t\geq0\}\): a filtration - an increasing family of sub-\(\sigma\)-algebras of \(\mathcal{F}\) \[\mathcal{F}_{0} \subseteq \mathcal{F}_{1} \subseteq \cdots \subseteq \mathcal{F}_{t} \subseteq\cdots\subseteq\mathcal{F}\]

  • A time series \(X=\{X_{t},\, t\in\mathbb{N}\}\) is a countable collection of random variables defined on a filtered space \((\Omega,\mathcal{F},\{\mathcal{F}_{t}\},P)\), where each \(X_{t}\) is \(\mathcal{F}_{t}\)-measurable; the natural filtration is \[\mathcal{F}_{t}=\sigma(X_{0},\dots,X_{t}),\] the \(\sigma\)-algebra generated by the preimages \(X^{-1}_{s}(A)\), \(s\leq t\), of measurable sets \(A\subseteq\mathcal{X}\).

Markov chains and State space categories

  • Consider a stochastic process \(\{ X_t \}\). For each fixed \(t\), \(X_t(\omega)\) is a random variable.
  1. Discrete \(\mathcal{X}\), e.g. \(\mathcal{X} = \{0, \pm1, \pm 2,\dots\}\), and \(t\in\{1,2,\dots\}\): simple random walk, discrete-time Markov chain.

  2. Discrete \(\mathcal{X}\), e.g. \(\mathcal{X} = \{0, \pm1, \pm 2,\dots\}\), and \(t\in [0,\infty)\): a birth process, Poisson process, continuous-time Markov chain (see the sketch after this list).

  3. Continuous \(\mathcal{X}\), e.g. \(\mathcal{X} = \mathbb{R}^{+}=[0,\infty)\), and \(t\in\{1,2,\dots\}\): a continuous state space, a birth and death process modeled by a stochastic difference equation.

  4. Continuous \(\mathcal{X}\), e.g. \(\mathcal{X} = \mathbb{R}^{+}=[0,\infty)\), and \(t\in [0,\infty)\): a birth and death process modeled by a stochastic differential equation
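
  • As a minimal illustration of case 2 above (discrete states, continuous time), assuming a unit rate: the event times of a Poisson process are cumulative sums of i.i.d. exponential inter-arrival times.

set.seed(2019)
cumsum(rexp(10, rate = 1))  # first 10 event times of a rate-1 Poisson (pure birth) process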

Markov chain

  • Markov property (with order one): \[\Pr \{ X_t = x_t | X_0 = x_0, \dots, X_{t-1} = x_{t-1} \} = \Pr \{ X_t = x_t | X_{t-1} = x_{t-1} \}\]

  • Discrete-time Markov chain of order \(m\): a time series \(\{X_{t},\, t\in\mathbb{N}\}\) in \((\Omega,\mathcal{F},\{\mathcal{F}_{t}\},P)\) with the property \[P(X_{t+1}=x|\,\mbox{information up to time }t)=P(X_{t+1}=x|\, X_{t},\dots,X_{t-m+1}).\]

  • A common notation \(p_{ij}(m) = \Pr \{ X_t= s_i | X_{t-m} = s_j \}\) where \(s_i, s_j \in \mathcal{X}\).

Markov chain: Stationarity

  • Consider a Markov chain of order \(1\): \[P(X_{t}|X_{t-1})\] is called the transition probability.

  • If \[P(X_{t+i}=x'|X_{t}=x)=P(X_{i}=x'|X_{0}=x)\] then the chain is time-homogeneous or stationary. Otherwise, the chain is nonhomogeneous or non-stationary.

2D-Random Walk

  • In each round, the particle randomly chooses one direction from \(\{\mbox{up, down, right, left}\}\) to proceed; each direction is chosen with equal probability \(1/4\).

  • Pair-index in 2D \((1,2)\): \(1\) stands for the \(x\)-axis, namely \(\{\mbox{right, left}\}\), and \(2\) for the \(y\)-axis, namely \(\{\mbox{up, down}\}\).

set.seed(2019)
sample(c(1, 2), 5, replace=TRUE) # or sample(c(1, 2), 5, TRUE, prob=c(0.5,0.5))
## [1] 2 2 1 2 1
  • Randomly decide move forward \((1)\) or backward \((-1)\) in the selected axis.
sample(c(-1, 1), 5, TRUE)
## [1] -1  1 -1 -1  1

2D-Random Walk

  • One-step transition probability \(P(X_{1}=i|X_{0}=(0,0))=1/4\) for \(i\in\{(0,1),(1,0),(0,-1),(-1,0)\}\).

  • Given the state \(j=(j_{1},j_{2})\) at time \(t-1\), the transition probability is \(P(X_{t}=i|X_{t-1}=j)=1/4\) for \(i\in\{(j_{1},j_{2}+1),(j_{1}+1,j_{2}),(j_{1},j_{2}-1),(j_{1}-1,j_{2})\}\).

set.seed(2019); n = 10000; rw = matrix(0, ncol = 2, nrow = n)
index = cbind(seq(n), sample(c(1, 2), n, TRUE)) # x or y-axis
rw[index] = sample(c(-1, 1), n, TRUE)  # the walk
rw[, 1] = cumsum(rw[, 1]); rw[, 2] = cumsum(rw[, 2]); head(cbind(rw,index), 4); 
##      [,1] [,2] [,3] [,4]
## [1,]    0    1    1    2
## [2,]    0    0    2    2
## [3,]    1    0    3    1
## [4,]    1   -1    4    2

2D-Random Walk

end = cbind(rw[10000,1],rw[10000,2]); start = cbind(rw[1,1],rw[1,2]); end
##      [,1] [,2]
## [1,]  -47  -15
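
  • A plotting sketch (assuming rw, start and end from the chunks above are still in the workspace):

plot(rw[, 1], rw[, 2], type = "l", xlab = "x", ylab = "y", main = "2D random walk")
points(start, col = "blue", pch = 19)  # starting position
points(end, col = "red", pch = 19)     # position after 10000 steps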

Transition Matrix

  • A square matrix \(\mathbf{P}\) with entries \(\{p_{ij}\}\) is called a (probability) transition matrix if
  1. \(p_{ij}\geq0\) for all \(i\), \(j\).

  2. For each column \(j\), \(\sum_{i}p_{ij}=1\).

  3. \(p_{ij}=P(X_{t+1}=s_i|X_{t}=s_j)\) for a homogeneous Markov chain \(\{X_{t},\, t\in\mathbb{N}\}\).

  • The \(n\)-step transition matrix is simply \(\mathbf{P}^n\) for a homogeneous Markov chain.

  • The \(0\)-step transition matrix is \(\mathbf{P}^0=\mathbf{I}\).

  • For an inhomogeneous Markov chain, \(p_{ij}(t)\) is time-dependent, and so is \(\mathbf{P}_{t}\).
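
  • A quick sketch checking the two defining properties for a candidate matrix (here the matrix that appears in the migration example below):

P = matrix(c(0.8, 0.3, 0.2, 0.7), nrow = 2, byrow = TRUE)
all(P >= 0)                       # property 1: non-negative entries
all(abs(colSums(P) - 1) < 1e-12)  # property 2: each column sums to one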

Example: Migration

  • Migration: each period, those in the West stay in the West with probability \(0.8\) and move East with probability \(0.2\); those in the East stay in the East with probability \(0.7\) and move West with probability \(0.3\). Suppose at \(t=0\) there are \(10\) in the West and \(10\) in the East. After \(30\) periods, what is the distribution of the population? \[ \mathbf{P} = \left[\begin{array}{cc} 0.8 & 0.3\\ 0.2 & 0.7 \end{array}\right]\]
library(expm); P=matrix(c(0.8,0.3,0.2,0.7), nrow=2, byrow=TRUE)
P%^%30 %*% c(10,10) # or P%^%30 %*%matrix(c(10,10),nrow=2)
##      [,1]
## [1,]   12
## [2,]    8

Example: Migration

P%^%5
##        [,1]    [,2]
## [1,] 0.6125 0.58125
## [2,] 0.3875 0.41875
P%^%15
##           [,1]      [,2]
## [1,] 0.6000122 0.5999817
## [2,] 0.3999878 0.4000183
P%^%30
##      [,1] [,2]
## [1,]  0.6  0.6
## [2,]  0.4  0.4
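
  • A simulation sketch (assuming P from above; state 1 = West, 2 = East): the long-run fraction of time a single simulated chain spends in each state should approach the limiting distribution \((0.6, 0.4)\).

set.seed(2019); n = 1e5; s = integer(n); s[1] = 1
for (t in 2:n) s[t] = sample(1:2, 1, prob = P[, s[t - 1]])  # column s[t-1] of P holds the "from" probabilities
table(s) / n   # empirical occupation frequencies, close to (0.6, 0.4)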

Graph Plot

library(diagram)
rownames(P)=c("West", "East"); plotmat(P, relsize = 0.7)

Classification of States

  • Transient: there is a non-zero probability that the Markov chain will never return to this state. (It can be visited only a finite number of times)

  • Recurrent: the Markov chain returns to this state infinitely often (with probability one).

  • Positive and null recurrent: recurrent states whose mean recurrence time is finite (infinite) are said to be positive (null) recurrent.

Classification of States

  • In the example chain (figure not shown): Transient states (1,2,3,4,5), Recurrent states (6,7,8).

Matrix Representation

  • (Transition matrix of the example chain; figure not shown.)

Classification of States

  • Periodic with period \(d(i)\) for state \(i\) (states 6 and 7 have period \(2\)): returns to the state can occur only at multiples of \(d\) time steps. (\(d(i)\) is the greatest common divisor - gcd - of the set of integers \(n\) for which \(p_{ii}(n)>0\).)

  • Aperiodic (state 8): the period of the state equals one, \(d=1\).

  • Absorbing (8): If the Markov chain ever arrives in this state, it will remain there forever.

Classification of States

  • State \(i\) can be reached from state \(j\) if there is a non-zero probability \(p_{ij}(t)>0\) for some \(t\geq 0\). This relationship is denoted as \(j \rightarrow i\).

  • If \(j \rightarrow i\) and \(i \rightarrow j\), then \(i\) and \(j\) are said to communicate \(i\leftrightarrow j\).

  • A system is irreducible if all states communicate. (There is only one communication class.)

  • A state is ergodic if it is both aperiodic and positive recurrent. An ergodic chain is a Markov chain that is irreducible, aperiodic and positive recurrent. An aperiodic, irreducible and positive recurrent Markov chain has a stationary limiting distribution (the basic limit theorem for aperiodic Markov chains).

Example: a Markov chain with four states.

Example

  • The transition matrix (all listed \(p_{ij}\) are positive): \[\left[\begin{array}{cccc} 0 & 0 & p_{13} & 0\\ p_{21} & 0 & p_{23} & p_{24}\\ 0 & 0 & 0 & 0\\ 0 & p_{42} & 0 & 0 \end{array}\right]\]

  • \(4\leftrightarrow 2 \leftarrow 1 \leftarrow 3\) and \(4 \leftrightarrow 2 \leftarrow 3\)

  • Three communication classes: \(\{1\}\), \(\{3\}\), \(\{2,4\}\). \(\{2,4\}\) is closed but the others are not.

  • If \(p_{32}\) or \(p_{34}\) were positive, then there would be a single communication class \(\{1,2,3,4\}\), so the Markov chain would be irreducible.
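
  • A reachability sketch for this example (the actual \(p_{ij}\) values are unspecified, so only their zero/non-zero pattern is used): powers of the adjacency pattern show which states can reach which, and the symmetric part gives the communication classes.

library(expm)
A = matrix(0, 4, 4)
A[2, 1] = 1; A[1, 3] = 1; A[2, 3] = 1; A[2, 4] = 1; A[4, 2] = 1  # A[i, j] = 1 wherever p_ij > 0
R = ((diag(4) + A) %^% 4) > 0   # R[i, j]: state i reachable from state j in at most 4 steps
R & t(R)                        # TRUE blocks mark the communication classes {1}, {3}, {2,4}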

Unconditional Probability

  • The unconditional probability \(\mu_i(t)=P(X_{t}=s_i)\) is different from the transition probability \(p_{ij}\).

  • Let \(\mu(t)=(\mu_0(t),\mu_1(t),\dots)\) be a probability measure on the countable state space \(\mathcal{X}\) at time \(t\).

  • For a countable state space, \(\mu(t)\) is a probability vector \[\mu(t)=(P(X_{t}=s_0),P(X_{t}=s_1),\dots).\]

Invariance

P%^%30 %*% c(11,9)
##      [,1]
## West   12
## East    8
P%^%30 %*% c(1,19)
##      [,1]
## West   12
## East    8

Invariant Probability

  • A Markov chain \(X_{t}\) at time \(t\) is a random variable associated with its own unconditional probability \(\mu(t)\).

  • The evolution of \(\mu(t)\) follows \[\mu(t+1)=\mathbf{P}_{t}\,\mu(t)\] where \(\mathbf{P}_{t}\) is the probability transition matrix of \(X_{t}\).

  • Invariant probability: the limiting distribution \(\mu= (\mu_0, \mu_1,\dots)\) is an invariant probability measure on \(\mathcal{X}\). Each component \(\mu_i\) of \(\mu\) satisfies \[\mu_{i}=\lim_{t\rightarrow\infty}P(X_{t}= s_i)\,\, \mbox{for any } s_i \in\mathcal{X}.\]

Invariant Probability

  • If \(X_{t}\) is time-homogeneous, then \(\mu(t+1)=\mathbf{P}\mu(t)\) and \[\mu(t+1)=\mathbf{P} \mu(t)=\mathbf{P}^{2}\mu(t-1)=\cdots=\mathbf{P}^{t+1}\mu(0).\]

  • In particular, \[\mathbf{P}^{t+s}=\mathbf{P}^{t}\cdot\mathbf{P}^{s},\] which is called the Chapman-Kolmogorov (CK) equation; in components, \[p_{ij}(t)=\sum_{k}p_{ik}(t-1)\,p_{kj}.\]
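
  • A quick numerical check of this property with the migration matrix \(\mathbf{P}\) from above (expm already loaded):

all.equal(P %^% 7, (P %^% 4) %*% (P %^% 3))  # P^(t+s) = P^t . P^s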

CK equation (the simplest version)

  • A relationship exists between the \(n\)-step transition probabilities and the \(m\)-step and \((n-m)\)-step transition probabilities: \[ p_{ij} (n) = \sum_{k=1}^{\infty} p_{ik}(n-m)p_{kj}(m),\,\, 0<m<n.\]

  • If \(\lim_{t\rightarrow\infty}p_{ij}(t)= \mu_{i}\) for any \(j\), namely the initial state does not matter, \[\lim_{t\rightarrow\infty}P(X_{t}= s_i |X_{0}= s_j)=\lim_{t\rightarrow\infty}P(X_{t}= s_i)=\mu_{i},\] then taking the limit of the CK equation gives the equilibrium equation \[\mu_{i}=\sum_{k}p_{ik}\mu_{k}, \mbox{ or in matrix form } \mu=\mathbf{P}\mu.\]
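
  • Besides the eigen-decomposition shown below, the equilibrium equation can also be solved directly as a linear system together with the normalization \(\sum_{i}\mu_{i}=1\) (a sketch using the \(2\times 2\) migration matrix \(\mathbf{P}\)):

M = rbind(diag(2) - P, rep(1, 2))  # stack (I - P) mu = 0 with the constraint sum(mu) = 1
qr.solve(M, c(0, 0, 1))            # least-squares solution: the invariant probability (0.6, 0.4)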

Eigen property of Invariant Probability

  • For a square matrix \(\mathbf{A}\) with eigenpair \((\lambda,\mathbf{x})\), \(\mathbf{A} \mathbf{x}=\lambda \mathbf{x}\). Then \(\mathbf{A}^2 \mathbf{x}=\lambda^2 \mathbf{x}\) (eigenvectors keep their own directions under \(\mathbf{A}\)).

  • The matrix form is \(\mu=\mathbf{P}\mu\), where \(\mathbf{P}\) is a Markov matrix: its entries are non-negative and every column sums to one.

  • The largest eigenvalue of \(\mathbf{P}\) is \(\lambda=1\). The corresponding eigenvector is the steady state (fixed point), \(\mathbf{P} \mathbf{x}=\mathbf{x}\), and after normalization it is the invariant probability, \(\mathbf{P}\mu = \mu\).

Eigen property of Invariant Probability

v = eigen(P, symmetric=FALSE); v
## eigen() decomposition
## $values
## [1] 1.0 0.5
## 
## $vectors
##           [,1]       [,2]
## [1,] 0.8320503 -0.7071068
## [2,] 0.5547002  0.7071068
x = v$vectors[,1]; mu = x/sum(x); mu
## [1] 0.6 0.4

Remarks

  • Markov property (Memoryless in continuous time)

  • State space and different types of Markov chains

  • Transition Matrix for discrete states and discrete time

  • Classifications of states

  • Invariant Probability

  • Evolution (Chapman-Kolmogorov equation/Fokker-Planck equation in continuous time)