23-30/04/2018

Estimation for Deterministic ODEs

  • (Weighted) Nonlinear Least Squares - (W) NLS: \[ \min_{\theta} \sum_{t=1}^{T} w_t \left(y_{\mbox{Obs}}(t) - y_{\mbox{Sim}}(t,\theta)\right)^2.\]

  • Three steps:

  1. define model \(y_{\mbox{Sim}}(t,\theta)\), collect data \(y_{\mbox{Obs}}(t)\).

  2. Set up the objective function \(\sum_{t=1}^{T} w_t \left(y_{\mbox{Obs}}(t) - y_{\mbox{Sim}}(t,\theta)\right)^2\)

  3. Minimize the cost/loss of the objective function w.r.t. the parameters \(\theta\).

Prey-Predator Model

  • Lotka-Volterra model - \(y_{\mbox{Sim}}(t,\theta)\) where \(\theta = (a_1, a_2, b_1, b_2)\): \[\frac{d y_1}{dt} =a_{1} y_1 - a_{2} y_1 y_2, \,\,\, \frac{d y_2}{dt} = -b_{1} y_2 + b_{2} y_1 y_2 \]
solve.lv = function(parms, times){
  lv = function (time, init, parms) 
  {
       y = init; p = parms
       dy1 = p["a1"] * y[1] - p["a2"] * y[1] * y[2]
       dy2 = -p["b1"] * y[2] + p["b2"] * y[1] * y[2]
       list(c(dy1, dy2))
  }
  state = c(30,4) # initial states
  return(ode(y=state,times=times, func=lv, parms=parms))
}

Simulate an Initial Model

library(deSolve)
parms=c(a1=0.8, a2=0.1,b1=0.7,b2=0.1); times=seq(1900,1920,by=0.1)
init.mod=solve.lv(parms, times); matplot(init.mod[,1],init.mod[,-1],type="l")

Data - Hares vs Lynx

PPmodel = "Year  Hares.x.1000    Lynx.x.1000
1900    30  4
1901    47.2    6.1
1902    70.2    9.8
1903    77.4    35.2
...
1917    7.6 15.8
1918    14.6    9.7
1919    16.2    10.1
1920    24.7    8.6"

PPobs = read.table(text=PPmodel, header=TRUE)
names(PPobs) = c("time", "1", "2") 

Initial Model and Observations

matplot(init.mod[,1],init.mod[,-1],type="l", ylab="Population", xlab="Year")
points(PPobs[,1],PPobs[,2], col="black"); points(PPobs[,1],PPobs[,3], col="red")

Objective Function

  • First consider to calculate the cost: \[\sum_t \left(y_{\mbox{Obs}}(t) - y_{\mbox{Sim}}(t,\theta_0)\right)^2, \,\mbox{ for observed time }t\]
library(FME) # Flexible Modelling Environment 
obj=modCost(obs=PPobs, model=init.mod); obj$model; obj$var
## [1] 35792.13
##   name scale  N SSR.unweighted SSR.unscaled      SSR
## 1    1     1 21       25768.12     25768.12 25768.12
## 2    2     1 21       10024.01     10024.01 10024.01

Objective Function

plot(obj)

Minimize the Objective Function

# Objective Function
PPobj.func = function(x){
  times=seq(1900,1920, by=0.1)
  out = solve.lv(parms=x, times=times)
  return(modCost(obs=PPobs, model=out))
}
# Estimation
initParm = c(a1=0.8, a2=0.1,b1=0.7,b2=0.1);
fit=modFit(f=PPobj.func, p=initParm, lower=c(a1=0, a2=0, b1=0, b2=0),
       upper=c(a1=1, a2=1, b1=1, b2=1)) # you can add method="Marq" default optimizer
  • “Marq” is Levenberg-Marquardt algorithm, a nonlinear least square method. Other options, such as Newton, BFGS, may need gradients and Hessians.

  • Note: this is a local estimate. Try: initParm = c(a1=0.5, a2=0.5,b1=0.5,b2=0.5).

Estimate Results

summary(fit)
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## a1 0.547533   0.022588   24.24   <2e-16 ***
## a2 0.028119   0.001643   17.11   <2e-16 ***
## b1 0.843175   0.037046   22.76   <2e-16 ***
## b2 0.026558   0.001141   23.29   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.454 on 38 degrees of freedom
## 
## Parameter correlation:
##         a1      a2      b1      b2
## a1  1.0000  0.8667 -0.9536 -0.9218
## a2  0.8667  1.0000 -0.8788 -0.8615
## b1 -0.9536 -0.8788  1.0000  0.8690
## b2 -0.9218 -0.8615  0.8690  1.0000

Estimate Results

plot(fit)

Estimate Results

parms=fit$par; Simlv=solve.lv(parms, times)
matplot(Simlv[,1],Simlv[,-1],type="l", ylab="Population", xlab="Year")
points(PPobs[,1],PPobs[,2], col="black"); points(PPobs[,1],PPobs[,3], col="red")

Robustness Check

sf = summary(fit); var=sf$modVariance; cov = sf$cov.scaled
MCMC=modMCMC(p=coef(fit), f=PPobj.func, jump=cov, var0=var); plot(MCMC)
## number of accepted runs: 353 out of 1000 (35.3%)

Robustness Check

pairs(MCMC)

Back to the Generator Q

  • In the CK equation, we’ve derived the following limit and set it to \(\mathbf{Q}\) \[\lim_{\Delta t \rightarrow 0} \frac{\left[\mathbf{P}(\Delta t) - \mathbf{I} \right]}{\Delta t} = \mathbf{Q} \]

  • But if \(\Delta t = 0\) and \(\mathbf{P}(0) = \mathbf{I}\), then the diagonal of \(\mathbf{Q}\) always degenerates to zero which means the states are absobing (never tend to leave).

  • To avoid the degeneration, we need also to modify the diagonal values of \(\mathbf{Q}\) to be non-zero such that \[\mathbf{Q}=\begin{cases} \lim_{\Delta t\rightarrow0}\frac{\mathbf{P}(\Delta t)-\mathbf{I}}{\Delta t}=q_{ij}, & \mbox{ for }i\neq j,\\ -\sum_{i \neq j}q_{ij}=q_{jj}, & \mbox{ for }i=j, \end{cases}\]

Example

  • CK backward equation \(\frac{d\mathbf{P}(t)}{dt} = \mathbf{Q} \mathbf{P}(t)\) has the solution of matrix exponential \[ \mathbf{P}(t) = e^{\mathbf{Q}t} \mathbf{P}(0) = e^{\mathbf{Q}t} = \left( \mathbf{I} + \sum_{n=1}^{\infty} \frac{\mathbf{Q}^n t^n }{n!} \right).\]
library(Matrix); Q = matrix(c(-4, 3, 0, 0,
                               4, -6, 2, 0,
                               0, 3, -4, 1,
                               0, 0, 2, -1), 4, 4, byrow=TRUE); round(expm(0.001*Q), 4)
## 4 x 4 Matrix of class "dgeMatrix"
##       [,1]  [,2]  [,3]  [,4]
## [1,] 0.996 0.003 0.000 0.000
## [2,] 0.004 0.994 0.002 0.000
## [3,] 0.000 0.003 0.996 0.001
## [4,] 0.000 0.000 0.002 0.999

Example

round(expm(1*Q), 4)
## 4 x 4 Matrix of class "dgeMatrix"
##        [,1]   [,2]   [,3]   [,4]
## [1,] 0.2285 0.1929 0.1283 0.0644
## [2,] 0.2572 0.2282 0.1715 0.1072
## [3,] 0.2567 0.2573 0.2503 0.2249
## [4,] 0.2576 0.3216 0.4499 0.6035
expm(100*Q)
## 4 x 4 Matrix of class "dgeMatrix"
##      [,1] [,2] [,3] [,4]
## [1,] 0.12 0.12 0.12 0.12
## [2,] 0.16 0.16 0.16 0.16
## [3,] 0.24 0.24 0.24 0.24
## [4,] 0.48 0.48 0.48 0.48

Generator Q

  • The matrix \(\mathbf{Q}\) for homogeneous continuous time Markov Chain is time-independent.

  • \(\mathbf{Q}\) capturing the rate of transition is called an instantaneous rate or an infinitesimal generator. \[p_{ij} (\Delta t) = q_{ij} \Delta t, \,\, p_{ii}(\Delta t) = 1 + q_{ii} \Delta t.\]

  • We have seen that it can also generate the evolutionary probability \(\mu(t)\): \[ \frac{d\mathbf{P}(t)\mathbf{y}(0)}{dt} = \mathbf{Q} (\mathbf{P}(t)\mathbf{y}(0)) \mbox{ implies } \frac{d\mu(t)}{dt} =\mathbf{Q} \mu(t) \] where \(\mu(t)=\mathbf{P}(t)\mathbf{y}(0)\) for arbitrary initials \(\mathbf{y}(0)\). The probability \(\mu(t)\) will converge to the invariant probability \(\mathbf{Q} \mu^{*} = 0\).

General Birth and Death Process

  • You probability notice that birth and death are not necessary probability rates.

  • The birth and death matrix in fact is the \(\mathbf{Q}\) matrix. But we don’t have \(\mathbf{Q}\) matrix in discrete time. \[\mathbf{Q} = \left[\begin{array}{ccccc} -\lambda_0 & d_1 \\ \lambda_0 & -(d_1+\lambda_1) & d_2 \\ & \lambda_1 & -(d_2+\lambda_2) & \ddots\\ & & \ddots & -(d_{N-1} +\lambda_{N-1}) & d_N \\ & & & \lambda_{N-1} & -d_N \end{array}\right]\]

General Birth and Death Process

  • The probabilities are summarized as follows: \[ \Pr \{ \mbox{one birth in } (t, t+\Delta t] | n \mbox{ in population} \} = \lambda_n \Delta t, \\ \Pr \{ \mbox{one death in } (t, t+\Delta t] | n \mbox{ in population} \} = d_n \Delta t, \\ \Pr \{ \mbox{zero birth in } (t, t+\Delta t] | n \mbox{ in population} \} = 1- \lambda_n \Delta t, \\ \Pr \{ \mbox{zero death in } (t, t+\Delta t] | n \mbox{ in population} \} = 1- d_n \Delta t.\]

  • Consider the probability vector \(\mu (t) = [\mu_0(t),\mu_1(t),\dots, \mu_N(t)]^T\).

  • From \(\frac{d\mu(t)}{dt} = \mathbf{Q} \mu(t)\) and tri-diagonal matrix \(\mathbf{Q}\), we know that for \(1 < n < N\): \[ \frac{d \mu_n (t)}{dt} = \lambda_{n-1} \mu_{n-1}(t) + d_{n+1} \mu_{n+1}(t) - (\lambda_n + d_n) \mu_n (t), \\ n=1, \,\,\, \frac{d \mu_0 (t)}{dt} = d_1 \mu_1 (t) - \lambda_0 \mu_0 (t).\]

Pure Birth Process

  • If \(\lambda_n = \lambda\), \(d_n = 0\) for all \(n \in \{0,1,\dots, N \}\). Then \[\frac{d \mu_n (t)}{dt} = \lambda \mu_{n-1}(t) - \lambda \mu_n (t), \, \frac{d \mu_0 (t)}{dt} = - \lambda_0 \mu_0 (t).\]

  • We know \(\mu_0 (t) = e^{-\lambda t}\). So the solution of \[\frac{d \mu_1 (t)}{dt} = - \lambda \mu_{1}(t) + \lambda e^{-\lambda t} \] is obvious \(\mu_1 (t) = \lambda t e^{-\lambda t}\). Continue by induction, we will obtain \(\mu_n (t) = e^{-\lambda t} \frac{(\lambda t)^n}{n!}\) which is a Poisson distribution with mean \(\lambda t\).

  • So the pure birth process is in fact a Poisson process with constant birth rate \(\lambda\).

Poisson Process

  • The transition matrix function is \[\mathbf{P}(t)=\left[\begin{array}{cccc} e^{-\lambda t} & 0 & 0 & \dots\\ e^{-\lambda t}(\lambda t) & e^{-\lambda t} & 0 & \ddots\\ e^{-\lambda t}(\lambda t)^{2} & e^{-\lambda t}(\lambda t) & e^{-\lambda t} & \ddots\\ \vdots & \vdots & \vdots & \ddots \end{array}\right].\]

  • The generator is \[\mathbf{Q}=\left[\begin{array}{ccccc} -\lambda & 0 & 0 & 0 & \dots\\ \lambda & -\lambda & 0 & 0 & \dots\\ 0 & \lambda & -\lambda & 0 & \dots\\ 0 & 0 & \lambda & -\lambda & \dots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{array}\right].\]

Poisson Process

  • \(\mu_0 (t) = e^{-\lambda t} = \Pr\{ \mbox{in a time interval [0,t], zero event happens} \}\)

  • or say \(1 - e^{-\lambda t} = \Pr\{\mbox{in a time interval [0,t], an event happens} \}\)

  • Memorylessness (Markov property) in the Poisson process: \[\Pr(Y_{t+\Delta t}= n+1 |Y_{t}=n)= \frac{\Pr (Y_{t+\Delta t}=n+1,Y_{t}=n)}{\Pr(Y_{t}=n)}\\ = \frac{\Pr(Y_{\Delta t}=1)\Pr(Y_{t}=n)}{\Pr(Y_{t}=n)} = \Pr(Y_{\Delta t}=1)= 1 - e^{-\lambda \Delta t}= \mbox{Exp}_{\lambda}(\Delta t).\]

  • A single jump that takes a time interval longer than \(\Delta t\) to happen follows an exponential distribution with the density \(\lambda e^{-\lambda \Delta t}\).

  • Simulate continuous time Markov chain without matrices.

Poisson Process

set.seed(2019); t = 80; lambda = 0.06; n = rpois(1, lambda*t)
JumpTime = rexp(n, lambda); # n: number of events. rexp: time intervals for n events
Time = c(0,cumsum(JumpTime)); Y = 0:n; plot(Time, Y, type="s") # no use of matrix!

Memoryless and autonomous ODEs

  • Markovian System is memoryless: \(\Pr \{Y_t | Y_{t-1}, \dots \} = \Pr \{ Y_t | Y_{t-1} \}\).

  • (Autonomous) ODE does not explicitly depend on time variable: \[ \frac{d y(t)}{dt} = f(y(t)).\]

  • Poissonization: for an autonomous ODE, Poisson processes can approximate the ODE system.

  • Let \(f(y(t))\) be the parameter \(\lambda\) of characterizing the jump. When the system converge, \(f(y^*)=0\) then no jump happens.

Return to Logistic Growth

  • Recall the deterministic logistic growth model \[y_{t+1} - y_t = r y_t \left(1 - \frac{y_t}{K} \right)=\frac{r}{K} y_t \left(K - y_t \right).\] For the stochastic counterpart, we constructed a Markov probability matrix for random transitions. For \(60\) states, the probability matrix was a \(60 \times 60\) matrix with roughly \(200\) non-zero entities.

  • For a continuous time model with a large state space, the previous modeling method by Markovian transition matrix is not efficient (sometimes nor even feasible).

  • Poisson process (monotone increasing / purely birth) for approximating the continuous time logistic growth.

Poisson Approximation

# Deterministic Logistic Model
logit.dfunc = function(SimTime,yinit,parms){
  yt = numeric(); y=yinit; yt[1]=y
  for (i in 2:SimTime) {
    y =  y + parms[1]/parms[2]*y*(parms[2]-y)
    yt[i] = y}
  return(yt)}
# Poisson approximation
logit.func = function(SimTime, yinit, parms){
  yt = time = numeric(); y=yinit; t=0; time[1]=t; yt[1]=y
  for (i in 2:SimTime) {
    eventRate = parms[1]/parms[2]*y*(parms[2]-y)
    if (eventRate<0.001) {break}
    JumpTime = rexp(n=1,rate=eventRate)
    t = JumpTime; time[i] = time[i-1] + t   
    y = y+1; yt[i] = y
    }
  return(data.frame(time, yt))}

Poisson Approximation

SimTime = 1000; set.seed(2019); 
y1 = logit.func(SimTime=SimTime,yinit=5,parms=c(0.01,50));
y2 = logit.func(SimTime=SimTime,yinit=5,parms=c(0.05,500));
head(y2)
##       time yt
## 1 0.000000  5
## 2 2.235150  6
## 3 2.433128  7
## 4 2.987381  8
## 5 5.771947  9
## 6 9.071242 10

Poisson Approximation

par(mfrow=c(1,2)); plot(y1, type="s"); plot(y2, type="s"); 

Poisson Approximation

y3 = logit.dfunc(SimTime=250,yinit=5,parms=c(0.05,500)); plot(y3, type="l", ylab="y(t)", xlab="Time")
for( i in 1:30){y = logit.func(SimTime=SimTime,yinit=5,parms=c(0.05,500));
  lines(y, col=adjustcolor("black", alpha.f = 0.2), type="s")}

Embedded Markov Chain

  • Continuous time Markov process can be approximated by Poisson process.

  • Markov process with transition probability matrix \(\mathbf{P}(t)\) and generator \(\mathbf{Q}\). Let the constant \(\lambda=\max_{j}q_{jj}\). Let \[\mathbf{R}=\frac{1}{\lambda}\mathbf{Q}+\mathbf{I}\] then \(\mathbf{R}\) is a stochastic matrix as for any \(j\) column, \[\sum_{i\neq j} \frac{q_{ij}}{\lambda} + \frac{q_{jj}}{\lambda} + 1= - \frac{q_{jj}}{\lambda} + \frac{q_{jj}}{\lambda} + 1 =1.\]

Embedded Markov Chain

  • The matrix \(\mathbf{R}\) can induce a discrete time Markov chain. Such a Markov chain is called the embedded Markov chain.

  • The Markov transition probability \(\mathbf{R}\) is discrete time chain, comes from \[\mathbf{P}(t)=e^{t\mathbf{Q}}=e^{-\lambda t}e^{t(\mathbf{Q}+\lambda\mathbf{I})}=e^{-\lambda t}\sum_{k=0}^{\infty}\frac{t^{k}(\mathbf{Q}+\lambda\mathbf{I})^{k}}{k!} \\ = e^{-\lambda t}\sum_{k=0}^{\infty}\frac{(\lambda t)^{k}(\frac{\mathbf{Q}}{\lambda}+\mathbf{I})^{k}}{k!} = \sum_{k=0}^{\infty}\mathbf{R}^{k}\frac{e^{-\lambda t}(\lambda t)^{k}}{k!}.\]

Embedded Markov Chain

  • So we have the following expression for any continuous time homogenous Markov transition probability matrix: \[\mathbf{P}(t)=\sum_{k=0}^{\infty}\mathbf{R}^{k}\mathcal{Poi}(k,\lambda t).\]
  • It can be thoughts as
  1. “Poissonizing” the embedded Markov chain can generate the continuous time Markov process
  2. Redistributing the Poisson processes \(\mathcal{Poi}(k,\lambda t)\) according to the order of \(k\) of the embedded Markov chain can also generate the Markov process.
  • This trick can even work for some non-Markovian system.

Embedded Markov Chain

  • Let a new Markov chain generated by \(\mathbf{R}\). The chain has the same stationary distribution \(\mu\) as that in the original chain \[\mu\mathbf{R}=\mu\] if and only if \(\mu\mathbf{Q}=0\).

  • This result comes from \[\mu\mathbf{Q}=\mu\lambda(\mathbf{R}-\mathbf{I})=\lambda\mu\mathbf{R}-\lambda\mu.\]

Remarks

  • Estimation deterministic ODE

  • Generator of continuous time discrete states Markov chain

  • Poisson Process

  • Poissonization

  • Embedded Markov Chain