05-12/03/2018

A Refresher for your Stat/Prob HP

  • A probability space: \((\Omega,\mathcal{F},P)\); \(\omega \in \Omega\) are outcomes (sample points); \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\), namely a collection of events (subsets of \(\Omega\)); \(P(\mathcal{A})\) is a probability measure defined for every event \(\mathcal{A}\in\mathcal{F}\).

  • A random variable \(X(\omega)\) has multiple possible outcomes, say in a set \(\mathcal{X}\) (a state space), with \(X^{-1}(A) \in \mathcal{F}\) for every measurable \(A \subseteq \mathcal{X}\).

  • Formal definition: a random variable \(X\) is a measurable function \(X: \Omega \rightarrow \mathcal{X}\) from the sample space \(\Omega\) to another measurable space \(\mathcal{X}\). Probabilities for this random variable take the form \(P(X\in A)=P(\{\omega \in \Omega: X(\omega) \in A\})\) for measurable \(A \subseteq \mathcal{X}\).

  • Conditional Probability: For events \(\mathcal{A}\) and \(\mathcal{B}\) in \((\Omega,\mathcal{F},P)\) such that \(P(\mathcal{B})>0\), the conditional probability: \[P(\mathcal{A}|\mathcal{B})=\frac{P(\mathcal{A}\cap\mathcal{B})}{P(\mathcal{B})},\;\mbox{or }P(\mathcal{A}|\mathcal{B})=\frac{P(\mathcal{B}|\mathcal{A}) P(\mathcal{A})}{P(\mathcal{B})}.\]

Conditional Probability and Dynamics

  • Recall the ARMA setting: e.g. AR(1), \(X_t = \beta X_{t-1} + \varepsilon_t\), \(\varepsilon_t \sim \mathcal{GWN}(0,\sigma^2)\).

  • Conditional distribution: \(X_t \mid X_{t-1}=x_{t-1} \sim \mathcal{N}(\beta x_{t-1}, \sigma^2)\) (a simulation sketch follows this list).

  • Generalization: for a larger class of time series, what would be the corresponding probabilities for describing their dynamics?

  • Markov chains: possibly the easiest way to approach time series in a nonlinear context.
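
  • A minimal simulation sketch (not from the original notes; the values \(\beta=0.6\) and \(\sigma=1\) are arbitrary): simulate an AR(1) path and recover the innovations, which should behave like \(\mathcal{N}(0,\sigma^2)\).

set.seed(2019); n = 1000; beta = 0.6; sigma = 1   # arbitrary illustrative parameters
x = numeric(n)
for (t in 2:n) x[t] = beta * x[t-1] + rnorm(1, 0, sigma)  # X_t | X_{t-1} ~ N(beta*x_{t-1}, sigma^2)
eps = x[-1] - beta * x[-n]   # recovered innovations
c(mean(eps), sd(eps))        # should be close to (0, sigma)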

Filtered Space

  • Filtered space: \((\Omega,\mathcal{F},\{\mathcal{F}_{t}\},P)\)

  • \((\Omega,\mathcal{F},P)\): a probability space

  • \(\{\mathcal{F}_{t}:t\geq0\}\): a filtration - an increasing family of sub-\(\sigma\)-algebras of \(\mathcal{F}\) \[\mathcal{F}_{0} \subseteq \mathcal{F}_{1} \subseteq \cdots \subseteq \mathcal{F}_{t} \subseteq\cdots\subseteq\mathcal{F}\]

  • A time series \(X=\{X_{t},\, t\in\mathbb{N}\}\) is a countable collection of random variables defined on a filtered space \((\Omega,\mathcal{F},\{\mathcal{F}_{t}\},P)\), where each \(X_{t}\) is \(\mathcal{F}_{t}\)-measurable; the natural filtration is \[\mathcal{F}_{t}=\sigma(X_{0},\dots,X_{t}),\] the \(\sigma\)-algebra generated by the preimages \(X^{-1}_{s}(A)\), \(s\leq t\), of measurable sets \(A\subseteq\mathcal{X}\).

Markov chains and State space categories

  • Consider a stochastic process \(\{ X_t \}\). For each fixed \(t\), \(X_t(\omega)\) is a random variable.
  1. Discrete \(\mathcal{X}\), e.g. \(\mathcal{X} = \{0, \pm1, \pm 2,\dots\}\), and \(t\in\{1,2,\dots\}\): simple random walk, discrete-time Markov chain.

  2. Discrete \(\mathcal{X}\), e.g. \(\mathcal{X} = \{0, \pm1, \pm 2,\dots\}\), and \(t\in [0,\infty)\): a birth process, Poisson process, continuous-time Markov chain (see the sketch after this list).

  3. Continuous \(\mathcal{X}\), e.g. \(\mathcal{X} = \mathbb{R}^{+}=[0,\infty)\), and \(t\in\{1,2,\dots\}\): a continuous state space, a birth and death process modeled by a stochastic difference equation.

  4. Continuous \(\mathcal{X}\), e.g. \(\mathcal{X} = \mathbb{R}^{+}=[0,\infty)\), and \(t\in [0,\infty)\): a birth and death process modeled by a stochastic differential equation
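
  • As a minimal illustration of case 2 above (discrete states, continuous time), assuming a unit rate: the event times of a Poisson process are cumulative sums of i.i.d. exponential inter-arrival times.

set.seed(2019)
cumsum(rexp(10, rate = 1))  # first 10 event times of a rate-1 Poisson (pure birth) process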

Markov chain

  • Markov property (with order one): \[\Pr \{ X_t = x_t | X_0 = x_0, \dots, X_{t-1} = x_{t-1} \} = \Pr \{ X_t = x_t | X_{t-1} = x_{t-1} \}\]

  • Discrete-time Markov chain of order \(m\): a time series \(\{X_{t},\, t\in\mathbb{N}\}\) in \((\Omega,\mathcal{F},\{\mathcal{F}_{t}\},P)\) with the property \[P(X_{t+1}=x|\,\mbox{information up to time }t)=P(X_{t+1}=x|\, X_{t},\dots,X_{t-m+1}).\]

  • A common notation \(p_{ij}(m) = \Pr \{ X_t= s_i | X_{t-m} = s_j \}\) where \(s_i, s_j \in \mathcal{X}\).

Markov chain: Stationarity

  • Consider a Markov chain of order \(1\): \[P(X_{t}|X_{t-1})\] is called the transition probability.

  • If \[P(X_{t+i}=x'|X_{t}=x)=P(X_{i}=x'|X_{0}=x)\] then the chain is time-homogeneous or stationary. Otherwise, the chain is nonhomogeneous or non-stationary.

2D-Random Walk

  • In each round, the particle randomly chooses one direction from \(\{\mbox{up, down, right, left}\}\) to proceed; each direction is chosen with equal probability \(1/4\).

  • Pair-index in 2D \((1,2)\): \(1\) stands for the \(x\)-axis, namely \(\{\mbox{right, left}\}\), and \(2\) for the \(y\)-axis, namely \(\{\mbox{up, down}\}\).

set.seed(2019)
sample(c(1, 2), 5, replace=TRUE) # or sample(c(1, 2), 5, TRUE, prob=c(0.5,0.5))
## [1] 2 2 1 2 1
  • Randomly decide move forward \((1)\) or backward \((-1)\) in the selected axis.
sample(c(-1, 1), 5, TRUE)
## [1] -1  1 -1 -1  1

2D-Random Walk

  • One-step transition probability \(P(X_{1}=i|X_{0}=(0,0))=1/4\) for \(i\in\{(0,1),(1,0),(0,-1),(-1,0)\}\).

  • Given the state \(j=(j_{1},j_{2})\) at time \(t-1\), the transition probability is \(P(X_{t}=i|X_{t-1}=j)=1/4\) for \(i\in\{(j_{1},j_{2}+1),(j_{1}+1,j_{2}),(j_{1},j_{2}-1),(j_{1}-1,j_{2})\}\).

set.seed(2019); n = 10000; rw = matrix(0, ncol = 2, nrow = n)
index = cbind(seq(n), sample(c(1, 2), n, TRUE)) # x or y-axis
rw[index] = sample(c(-1, 1), n, TRUE)  # the walk
rw[, 1] = cumsum(rw[, 1]); rw[, 2] = cumsum(rw[, 2]); head(cbind(rw,index), 4); 
##      [,1] [,2] [,3] [,4]
## [1,]    0    1    1    2
## [2,]    0    0    2    2
## [3,]    1    0    3    1
## [4,]    1   -1    4    2

2D-Random Walk

end = cbind(rw[10000,1],rw[10000,2]); start = cbind(rw[1,1],rw[1,2]); end
##      [,1] [,2]
## [1,]  -47  -15
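
  • A plotting sketch (assuming rw, start and end from the chunks above are still in the workspace):

plot(rw[, 1], rw[, 2], type = "l", xlab = "x", ylab = "y", main = "2D random walk")
points(start, col = "blue", pch = 19)  # starting position
points(end, col = "red", pch = 19)     # position after 10000 steps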

Transition Matrix

  • A square matrix \(\mathbf{P}\) with entries \(\{p_{ij}\}\) is called a (probability) transition matrix if
  1. \(p_{ij}\geq0\) for all \(i\), \(j\).

  2. For each column \(j\), \(\sum_{i}p_{ij}=1\).

  3. \(p_{ij}=P(X_{t+1}=s_i|X_{t}=s_j)\) for a homogeneous Markov chain \(\{X_{t},\, t\in\mathbb{N}\}\).

  • The \(n\)-step transition matrix is simply \(\mathbf{P}^n\) for a homogeneous Markov chain.

  • The \(0\)-step transition matrix is \(\mathbf{P}^0=\mathbf{I}\).

  • For an inhomogeneous Markov chain, \(p_{ij}(t)\) is time-dependent, and so is \(\mathbf{P}_{t}\).
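
  • A quick sketch checking the two defining properties for a candidate matrix (here the matrix that appears in the migration example below):

P = matrix(c(0.8, 0.3, 0.2, 0.7), nrow = 2, byrow = TRUE)
all(P >= 0)                       # property 1: non-negative entries
all(abs(colSums(P) - 1) < 1e-12)  # property 2: each column sums to one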

Example: Migration

  • Migration: each period, those in the West stay in the West with probability \(0.8\) and move East with probability \(0.2\); those in the East stay in the East with probability \(0.7\) and move West with probability \(0.3\). Suppose at \(t=0\) there are \(10\) in the West and \(10\) in the East. After \(30\) periods, what is the distribution of the population? \[ \mathbf{P} = \left[\begin{array}{cc} 0.8 & 0.3\\ 0.2 & 0.7 \end{array}\right]\]
library(expm); P=matrix(c(0.8,0.3,0.2,0.7), nrow=2, byrow=TRUE)
P%^%30 %*% c(10,10) # or P%^%30 %*%matrix(c(10,10),nrow=2)
##      [,1]
## [1,]   12
## [2,]    8

Example: Migration

P%^%5
##        [,1]    [,2]
## [1,] 0.6125 0.58125
## [2,] 0.3875 0.41875
P%^%15
##           [,1]      [,2]
## [1,] 0.6000122 0.5999817
## [2,] 0.3999878 0.4000183
P%^%30
##      [,1] [,2]
## [1,]  0.6  0.6
## [2,]  0.4  0.4
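
  • A simulation sketch (assuming P from above; state 1 = West, 2 = East): the long-run fraction of time a single simulated chain spends in each state should approach the limiting distribution \((0.6, 0.4)\).

set.seed(2019); n = 1e5; s = integer(n); s[1] = 1
for (t in 2:n) s[t] = sample(1:2, 1, prob = P[, s[t - 1]])  # column s[t-1] of P holds the "from" probabilities
table(s) / n   # empirical occupation frequencies, close to (0.6, 0.4)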

Graph Plot

library(diagram)
rownames(P)=c("West", "East"); plotmat(P, relsize = 0.7)

Classification of States

  • Transient: there is a non-zero probability that the Markov chain will never return to this state. (It can be visited only a finite number of times)

  • Recurrent: the Markov chain returns to this state infinitely often (with probability one).

  • Positive and null recurrent: recurrent states whose mean recurrence time is finite (infinite) are said to be positive (null) recurrent.

Classification of States

  • In the example chain (figure not shown): Transient states (1,2,3,4,5), Recurrent states (6,7,8).

Matrix Representation

  • (Transition matrix of the example chain; figure not shown.)

Classification of States

  • Periodic with period \(d(i)\) for state \(i\) (states 6 and 7 have period \(2\)): returns to the state can occur only at multiples of \(d\) time steps. (\(d(i)\) is the greatest common divisor - gcd - of the set of integers \(n\) for which \(p_{ii}(n)>0\).)

  • Aperiodic (state 8): the period of the state equals one, \(d=1\).

  • Absorbing (8): If the Markov chain ever arrives in this state, it will remain there forever.

Classification of States

  • State \(i\) can be reached from state \(j\) if there is a non-zero probability \(p_{ij}(t)>0\) for some \(t\geq 0\). This relationship is denoted as \(j \rightarrow i\).

  • If \(j \rightarrow i\) and \(i \rightarrow j\), then \(i\) and \(j\) are said to communicate \(i\leftrightarrow j\).

  • A system is irreducible if all states communicate. (There is only one communication class.)

  • A state is ergodic if it is both aperiodic and positive recurrent. An ergodic chain is a Markov chain that is irreducible, aperiodic and positive recurrent. An aperiodic, irreducible and positive recurrent Markov chain has a stationary limiting distribution (the basic limit theorem for aperiodic Markov chains).

Example: a Markov chain with four states.

Example

  • The transition matrix (all listed \(p_{ij}\) are positive): \[\left[\begin{array}{cccc} 0 & 0 & p_{13} & 0\\ p_{21} & 0 & p_{23} & p_{24}\\ 0 & 0 & 0 & 0\\ 0 & p_{42} & 0 & 0 \end{array}\right]\]

  • \(4\leftrightarrow 2 \leftarrow 1 \leftarrow 3\) and \(4 \leftrightarrow 2 \leftarrow 3\)

  • Three communication classes: \(\{1\}\), \(\{3\}\), \(\{2,4\}\). \(\{2,4\}\) is closed but the others are not.

  • If \(p_{32}\) or \(p_{34}\) were positive, then there would be a single communication class \(\{1,2,3,4\}\), so the Markov chain would be irreducible.
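
  • A reachability sketch for this example (the actual \(p_{ij}\) values are unspecified, so only their zero/non-zero pattern is used): powers of the adjacency pattern show which states can reach which, and the symmetric part gives the communication classes.

library(expm)
A = matrix(0, 4, 4)
A[2, 1] = 1; A[1, 3] = 1; A[2, 3] = 1; A[2, 4] = 1; A[4, 2] = 1  # A[i, j] = 1 wherever p_ij > 0
R = ((diag(4) + A) %^% 4) > 0   # R[i, j]: state i reachable from state j in at most 4 steps
R & t(R)                        # TRUE blocks mark the communication classes {1}, {3}, {2,4}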

Unconditional Probability

  • The unconditional probability \(\mu_i(t)=P(X_{t}=s_i)\) is different from the transition probability \(p_{ij}\).

  • Let \(\mu(t)=(\mu_0(t),\mu_1(t),\dots)\) be a probability measure on the countable state space \(\mathcal{X}\) at time \(t\).

  • For a countable state space, \(\mu(t)\) is a probability vector \[\mu(t)=(P(X_{t}=s_0),P(X_{t}=s_1),\dots).\]

Invariance

P%^%30 %*% c(11,9)
##      [,1]
## West   12
## East    8
P%^%30 %*% c(1,19)
##      [,1]
## West   12
## East    8

Invariant Probability

  • A Markov chain \(X_{t}\) at time \(t\) is a random variable associated with its own unconditional probability \(\mu(t)\).

  • The evolution of \(\mu(t)\) follows \[\mu(t+1)=\mathbf{P}_{t}\,\mu(t)\] where \(\mathbf{P}_{t}\) is the probability transition matrix of \(X_{t}\).

  • Invariant probability: the limiting distribution \(\mu= (\mu_0, \mu_1,\dots)\) is an invariant probability measure on \(\mathcal{X}\). Each component \(\mu_i\) of \(\mu\) satisfies \[\mu_{i}=\lim_{t\rightarrow\infty}P(X_{t}= s_i)\,\, \mbox{for any } s_i \in\mathcal{X}.\]

Invariant Probability

  • If \(X_{t}\) is time-homogeneous, then \(\mu(t+1)=\mathbf{P}\mu(t)\) and \[\mu(t+1)=\mathbf{P} \mu(t)=\mathbf{P}^{2}\mu(t-1)=\cdots=\mathbf{P}^{t+1}\mu(0).\]

  • In particular, \[\mathbf{P}^{t+s}=\mathbf{P}^{t}\cdot\mathbf{P}^{s},\] which is called the Chapman-Kolmogorov (CK) equation; in components, \[p_{ij}(t)=\sum_{k}p_{ik}(t-1)\,p_{kj}.\]
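
  • A quick numerical check of this property with the migration matrix \(\mathbf{P}\) from above (expm already loaded):

all.equal(P %^% 7, (P %^% 4) %*% (P %^% 3))  # P^(t+s) = P^t . P^s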

CK equation (the simplest version)

  • A relationship exists between the \(n\)-step transition probabilities and the \(m\)-step and \((n-m)\)-step transition probabilities: \[ p_{ij} (n) = \sum_{k=1}^{\infty} p_{ik}(n-m)p_{kj}(m),\,\, 0<m<n.\]

  • If \(\lim_{t\rightarrow\infty}p_{ij}(t)= \mu_{i}\) for any \(j\), namely the initial state does not matter, \[\lim_{t\rightarrow\infty}P(X_{t}= s_i |X_{0}= s_j)=\lim_{t\rightarrow\infty}P(X_{t}= s_i)=\mu_{i},\] then taking the limit of the CK equation gives the equilibrium equation \[\mu_{i}=\sum_{k}p_{ik}\mu_{k}, \mbox{ or in matrix form } \mu=\mathbf{P}\mu.\]
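
  • Besides the eigen-decomposition shown below, the equilibrium equation can also be solved directly as a linear system together with the normalization \(\sum_{i}\mu_{i}=1\) (a sketch using the \(2\times 2\) migration matrix \(\mathbf{P}\)):

M = rbind(diag(2) - P, rep(1, 2))  # stack (I - P) mu = 0 with the constraint sum(mu) = 1
qr.solve(M, c(0, 0, 1))            # least-squares solution: the invariant probability (0.6, 0.4)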

Eigen property of Invariant Probability

  • For a square matrix \(\mathbf{A}\) with eigenpair \((\lambda,\mathbf{x})\), \(\mathbf{A} \mathbf{x}=\lambda \mathbf{x}\). Then \(\mathbf{A}^2 \mathbf{x}=\lambda^2 \mathbf{x}\) (eigenvectors keep their own directions under \(\mathbf{A}\)).

  • The matrix form is \(\mu=\mathbf{P}\mu\), where \(\mathbf{P}\) is a Markov matrix: its entries are non-negative and every column sums to one.

  • The largest eigenvalue of \(\mathbf{P}\) is \(\lambda=1\). The corresponding eigenvector is the steady state (fixed point), \(\mathbf{P} \mathbf{x}=\mathbf{x}\), and after normalization it is the invariant probability, \(\mathbf{P}\mu = \mu\).

Eigen property of Invariant Probability

v = eigen(P, symmetric=FALSE); v
## eigen() decomposition
## $values
## [1] 1.0 0.5
## 
## $vectors
##           [,1]       [,2]
## [1,] 0.8320503 -0.7071068
## [2,] 0.5547002  0.7071068
x = v$vectors[,1]; mu = x/sum(x); mu
## [1] 0.6 0.4

Remarks

  • Markov property (Memoryless in continuous time)

  • State space and different types of Markov chains

  • Transition Matrix for discrete states and discrete time

  • Classifications of states

  • Invariant Probability

  • Evolution (Chapman-Kolmogorov equation/Fokker-Planck equation in continuous time)