Last lecture:
Granger causality
Vectorial ARIMA, with seasonalities
Bayesian method
This lecture:
State space model (Dynamic linear model)
Kalman filter
12/10/2018
Last lecture:
Granger causality
Vectorial ARIMA, with seasonalities
Bayesian method
This lecture:
State space model (Dynamic linear model)
Kalman filter
So far, we are in the ARMA type framework: AR, MA, VAR, etc
In fact, this (Box-Jenkins) type of specifications can be fully covered in a more flexible framework - State space models (SSM).
SSM can incorporate time series predictors in a more straightforward manner: on-line parameter and state estimation.
Online parameter/state estimation is a recursive algorithm. To estimate the parameter values at a time step, recursive algorithms use the current measurements and previous parameter estimates. It is more efficient in terms of memory usage.
The cost: adapt yourself to the probabilistic taste.
SSM was developped for engineering controls, such as robotics or GPS monitoring.
In the control context, the state variables define the dynamics of the system.
Only some of these state variables can be measured directly, and these measurements are subject to noise.
The objective of the Kalman filter is to infer values for all the state variables from the noisy measurements. The estimated values of the state variables are then used to control the system.
In economics, the states are often the unknown coefficients. However, now we can allow these unknown coefficients to change over the time.
A general control problem in SSM: let the state vector of the system be \(\theta_t\) and a control vector be \(u_t\): \[ \begin{align} y_t =& F_t \theta_t + v_t \\ \theta_t =& G_t \theta_{t-1} + H_t u_{t} + w_{t} \end{align} \] where \(u_t\) can be regulated by the controller in order to obtain a desired level of the state \(\theta_t\). \(H_t\) is a known matrix. \(v_t\) and \(w_t\) are stochastic errors.
The goal of a control problem is to drive the state from the initial value \(\theta_0\) to a target value \(\theta^{*}\) in a finite time \(T\), setting appropriately the control variables \(u_1, \cdots, u_T\).
Consider SSM: \[ \begin{align} y_t =& F_t \theta_t + v_t \\ \theta_t =& G_t \theta_{t-1} + w_{t} \end{align} \] where \(v_t \sim \mathcal{N} ( 0 ,\mathbf{V}_t )\), \(w_t \sim \mathcal{N} ( 0 ,\mathbf{W}_t )\) and \(\theta_t \sim \mathcal{N} ( m_t ,\mathbf{C}_t )\)
Alternatively, the uncontrollable “control” term can be thought of mixing with the error \(w_t\).
A linear growth model: \[ \begin{align} y_t =& \mu_t + \beta_{t} p_t + v_t \\ \mu_t =& \mu_{t-1} + w_{1,t}\\ \beta_t =& \beta_{t-1} + w_{2,t} \end{align} \] where \(y_t\) is the individual sales process, \(p_t\) is the price, and \(\mu_t\) is a general level of sales in the sector.
The state space form: \[\theta_t = \left(\begin{array}{c} \mu_{t}\\ \beta_{t} \end{array}\right), \, w_t = \left(\begin{array}{c} w_{1,t}\\ w_{2,t} \end{array}\right), \, F_t^{T} = \left(\begin{array}{c} 1\\ p_{t} \end{array}\right), \, G_t = \left(\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right)\]
Filtering for SSM is making the best estimate of the current values of the state from the record of observations.
It follows the Bayes’ theorem: \[p(\theta_t | \mathbf{y}_{1:t-1}, y_t) = \frac{p(y_t|\theta_t) p(\theta_t | \mathbf{y}_{1:t-1})}{p(y_t | \mathbf{y}_{1:t-1})} \\ p(\theta_t | \mathbf{y}_{1:t-1}, y_{t}) \propto p(y_t|\theta_t) p(\theta_t | \mathbf{y}_{1:t-1}) \] where \(\mathbf{y}_{1:t-1}= (y_1,\dots,y_{t-1})\) represent the data up until time \(t-1\).
The posterior belief of the state at \(t\), given data up to \(t\), is proportional to the product of the likelihood, and the prior belief of the state at \(t\), given data up to \(t-1\).
The last expression \[\scriptsize \exp \left\{ - \frac{1}{2W C_0} \left((t C_0 +W)\theta^2 - 2(t C_0 \bar{y} + W m_0 )\theta \right) \right\}\] is proportional to \[\scriptsize \exp \left\{ - \frac{1}{2 W C_t}(\theta - m_t)^2 \right\}\] where \(m_t = \frac{tC_0 \bar{y} + W m_0}{tC_0 +W}, C_t = \frac{W C_0}{tC_0 +W}\).
This can be written as a recursive form \[\scriptsize m_t = m_{t-1} + \frac{C_{t-1}}{C_{t-1} +W}(y_t - m_{t-1}), C_t =\left(\frac{1}{W}+\frac{1}{C_{t-1}}\right)^{-1}\]
Posteriror at \(t-1\): \((\theta_{t-1} | \mathbf{y}_{1:(t-1)} ) \sim \mathcal{N}(m_{t-1},C_{t-1})\)
Prior at \(t\): \((\theta_{t} | \mathbf{y}_{1:t-1} ) \sim \mathcal{N}(m_{t-1},R_{t})\)
Marginal likelihood: \((y_{t} | \mathbf{y}_{1:(t-1)} ) \sim \mathcal{N}(m_{t-1},Q_{t})\)
Posteriro at \(t\): \((\theta_{t} | \mathbf{y}_{1:t} ) \sim \mathcal{N}(m_{t},C_{t})\)
where \(R_t = G_t C_{t-1} G_t^{T} + W_t\), \(Q_t = F^T_t R_t F_t + V_t\).
y = y; R1 = 1; theta = rep(0,n); R = rep(0,n);
m = rep(0,n); C = rep(0,n);
# time t=1
theta[1] = 0; R[1] = 10; Q = R[1]+V
A = R[1]/Q; m[1] = theta[1]+A*(y[1]-theta[1]); C[1] = R[1]-Q*A^2
# Filtering
for (t in 2:n){
theta[t] = m[t-1]; R[t] = C[t-1] + W
Q = R[t]+V; A = R[t]/Q
m[t] = (1- A)*m[t-1] +A*y[t]; C[t] = A*V
}
ts.plot(dropFirst(filter$m), col="black"); lines(m, lty=3)
Predictions start from the posterior estimate of the state, obtained from the Kalman filter, on the time \(t\) on which the forecast is made.
One-step-ahead forecasts follows: \[ \mathbb{E}[y_{t+1} | \mathbf{y}_{1:t}] = F^{T}_{t+1} m_{t+1} \\ \mbox{Var}[y_{t+1} | \mathbf{y_{1:t}}] = F^{T}_{t+1}R_{t+1}F_{t+1} + V_{t+1} = Q_{t+1}\\ y_{t+1} \sim \mathcal{N}(\mathbb{E}[y_{t+1} | \mathbf{y}_{1:t}],\mbox{Var}[y_{t+1} | \mathbf{y_{1:t}}]) \]
This recursive scheme can be extended to \(k\)-steps ahead forecast.
mforecast = dlmForecast(filter, nAhead=1); mforecast$f
## [,1] ## [1,] 1.831276
mforecast$Q
## [[1]] ## [,1] ## [1,] 12.5
One of the attractive features of state space models is that estimation and forecasting can be applied sequentially, as new data become available.
However, when we obtain the observations up to time \(T\), we want to retrospectively reconstruct the behavior of the system.
In this case, one can use a backward-recursive algorithm to compute the conditional distributions of \(\theta_t\) given \(\mathbf{y}_{1:T}\), for any \(t < T\), starting from the filtering distribution \(p(\theta_T|\mathbf{y}_{1:T})\) and estimating backward all the states’ history.
The smoothing scheme is to look back to the system: retain the essential information and dump the unnecessary information.
For one step back, the procedure follows: \[p(\theta_{t-1}|\mathbf{y}_{1:t}) = \int p(\theta_{t-1} | \theta_t, \mathbf{y}_{1:t}) p(\theta_t | \mathbf{y}_{1:t}) d \theta_t\]
smooth = dlmSmooth(y, buildMod);
ts.plot(dropFirst(filter$m), col="black"); lines(dropFirst(smooth$s), col="grey")
To estimate the state vector we compute the conditional densities \(p(\theta_s | \mathbf{y}_{1:t})\).
We distinguish between problems of filtering (when \(s = t\)), forecasting (\(s > t\)) and smoothing (\(s < t\)).
It is worth underlining the difference between filtering and smoothing. In the filtering problem, the data are supposed to arrive sequentially in time. While in smoothing, it is a retrospective analysis where all the observations are collected.
An application of the probabilistic view
A recursive filter
Forecasting and smoothing
Last lecture (optional): review, start at 9:30am