Particle Smoothing


$$ \definecolor{cora}{RGB}{230,159,0} \definecolor{blugre}{RGB}{0,158,115} \definecolor{vermilion}{RGB}{213,94,0} \def\mat#1{{\mathbf{#1}}} \def\vect#1{{\boldsymbol{#1}}} \def\diff{{\mathop{}\!\mathrm{d}}} \def\LP#1{{\left(#1\right)}} \def\LB#1{{\left\{#1\right\}}} \def\LPV #1#2{{\left(\left.#1\vphantom{#2} \right\vert #2\right)}} \def\PF#1#2#3{\overrightarrow{#1}_{#2}^{(#3)}} \def\PB#1#2#3{\overleftarrow{#1}_{#2}^{(#3)}} \def\PS#1#2#3{\overleftrightarrow{#1}_{#2}^{(#3)}} \def\bigO#1{\mathcal{O}\LP{#1}} \def\argmax{\mathop{\mathrm{argmax}}} $$

Introduction

Discrete-time models are popular for modeling corporate distress, e.g., the event that a firm misses an interest payment. Shumway (2001), Chava and Jarrow (2004), and Campbell, Hilscher, and Szilagyi (2008) had 4,876 citations on Google Scholar on 7/11/2018.

\[ \begin{align*} y_{it} &\in\{0,1\} \\ h\LP{E\LPV{y_{it}}{\vect z_t}} &= \vect\beta^\top\vect z_{it} \end{align*} \]

\(h\) is a link function and \(\vect z_{it}\) are known covariates.

Data set

We use a data set of US public firms. It contains between 1,048 and 1,833 firms in each month \(t\). There are 624,717 monthly observations, 4,433 firms, and 721 distress events.

Cannot capture aggregate distress levels

The gray area shows pointwise 95% prediction intervals, the line is the predicted distress rate, and the dots are realized distress rates. Black dots are not covered by the prediction intervals.

Model

\[ \begin{align*} h\LP{E\LPV{y_{it}}{\vect z_t, \vect x_t, \vect u_{it}}} &= \vect\beta^\top\vect z_{it} + \vect x_t^\top\vect u_{it} \\ \vect x_t &\sim \mathcal N\LP{\mat F\vect x_{t-1}, \mat \Sigma} \end{align*} \]

\(\vect u_{it}\) are known firm covariates.

Hidden Markov model

\[ g_t\LPV{\vect Y_t}{\vect X_t} \qquad f\LPV{\vect X_t}{\vect X_{t-1}} \qquad f\LP{\vect X_1} \]

\[ P\LP{\vect y_{1:d}} = \int f\LP{\vect x_1} g_1\LPV{\vect y_1}{\vect x_1} \prod_{t = 2}^d g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\vect x_{t - 1}} \diff \vect x_{1:d} \]

All densities implicitly depend on the parameters in the model. The \(g_t\)s also implicitly depend on \(\mat Z_t\) and \(\mat U_t\).

EM algorithm

\[ \begin{align*} Q\LPV{\vect\theta}{\vect\theta^{(k)}} &= E\LPV{\log P\LP{\vect y_{1:d}, \vect X_{1:d}}}{\vect y_{1:d}, \vect\theta^{(k)}} \\ \log P\LP{\vect y_{1:d}, \vect X_{1:d}} &= \log f\LP{\vect X_1} + \sum_{t = 2}^d \log \varphi\LP{\vect X_t;\mat F\vect X_{t-1},\mat \Sigma} \\ &\quad + \sum_{t = 1}^d\sum_{i \in R_t} y_{it}\log\LP{h^{-1}\LP{\eta_{it}}} + \LP{1 - y_{it}}\log\LP{1 - h^{-1}\LP{\eta_{it}}} \\ \eta_{it}\LP{\vect x_t} &= \vect\beta^\top\vect z_{it} + \vect x_t^\top\vect u_{it} \end{align*} \]

where \(\varphi\) is a multivariate normal density function, \(R_t\) is the risk set at time \(t\), and \(h^{-1}\) is the inverse link function.

EM algorithm

Need to evaluate

\[ E\LPV{\phi\LP{\vect X_{t-1:t}}}{\vect\theta^{(k)},\vect y_{1:d}} \qquad E\LPV{\psi\LP{\vect X_t}}{\vect\theta^{(k)},\vect y_{1:d}} \]

Talk overview

Importance sampling.

Particle filtering.

Two particle smoothers.

The implementation.

Examples.

Importance sampling

We want to approximate a target density \(\tilde d(x) = \alpha d(x)\) where only \(d\) is known; the normalizing constant \(\alpha\) is unknown.

  1. Sample \(x^{(1)},x^{(2)},\dots,x^{(N)}\) from proposal distribution with density \(q\).
  2. Compute unnormalised weights \(\tilde W^{(i)} = d(x^{(i)})/q(x^{(i)})\).
  3. Normalize weights \(W^{(i)} = \tilde W^{(i)} / \sum_{i=1}^N \tilde W^{(i)}\).

Importance sampling

Yields a discrete approximation of distribution

\[\widehat P(X) = \sum_{i=1}^N W^{(i)} \delta_{x^{(i)}}(X)\] where \(\delta_{x^{(i)}}\) is the Dirac delta at \(x^{(i)}\).

\[ E\LP{\phi(X)} = \int \phi(x)\tilde d(x)\diff x \approx \sum_{i = 1}^N W^{(i)} \phi\LP{x^{(i)}} \]

Use the effective sample size to judge the quality of the approximation

\[ESS = \LP{\sum_{i = 1}^N W^{(i)2}}^{-1}\]
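A minimal R sketch of the three steps and the effective sample size, assuming an illustrative target known only up to a constant and a t-distribution proposal (both choices are made up for the example and are not from the slides):

set.seed(1)
N <- 50L
d_unnorm <- function(x) exp(-x^2 / 2)         # unnormalised target density
q_dens   <- function(x) dt(x, df = 3)         # proposal density

x_prop  <- rt(N, df = 3)                      # 1. sample from the proposal
w_tilde <- d_unnorm(x_prop) / q_dens(x_prop)  # 2. unnormalised weights
w       <- w_tilde / sum(w_tilde)             # 3. normalised weights

sum(w * x_prop)  # approximates E(X) under the target
1 / sum(w^2)     # effective sample size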

Example: Target

Example: Proposal

Example: Importance sampling

Effective sample size is 29.59 with 50 samples.

Example: Proposal

Example: Importance sampling

Effective sample size is 17.51 with 50 samples.

Requirements

\[ d(x) > 0 \Rightarrow q(x) > 0 \]

\[ \frac{d(x)}{q(x)} < C < \infty \]

Relating to hidden Markov models

Need

\[ P\LPV{\vect X_1}{\vect y_1} = \frac{g_1\LPV{\vect y_1}{\vect x_1}f\LP{\vect x_1}}{P\LP{\vect y_1}} \]

Use importance sampling to get

\[ \widehat P\LPV{\vect X_1}{\vect y_1} = \sum_{i = 1}^N \PF{W}{1}{i}\delta_{\PF{\vect x}{1}{i}}\LP{\vect X_1} \]

Notation and goal

“Forward” particle clouds \(\{\PF{W}{t}{i}, \PF{\vect x}{t}{i}\}_{i=1,\dots,N}\).

“Backward” particle clouds \(\{\PB{W}{t}{i}, \PB{\vect x}{t}{i}\}_{i=1,\dots,N}\).

“Smoothed” particle clouds \(\{\PS{W}{t}{i}, \PS{\vect x}{t}{i}\}_{i=1,\dots,N}\).

End goal is \[ E\LPV{\phi(\vect X_t)}{\vect y_{1:d}} \approx \sum_{i = 1}^N \PS{W}{t}{i} \phi\LP{\PS{\vect x}{t}{i}} \]

Particle filtering

Also known as sequential Monte Carlo.

Use \(\widehat P\LPV{\vect X_{1:t-1}}{\vect y_{1:t-1}}\) in

\[ \begin{align*} P\LPV{\vect x_{1:t}}{\vect y_{1:t}} &= P\LPV{\vect x_{1:t-1}}{\vect y_{1:t-1}} \frac{ g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\vect x_{t-1}} }{P\LPV{\vect y_t}{\vect y_{1:t -1}}} \\ &\approx \sum_{i = 1}^N \PF{W}{t-1}{i} \delta_{\PF{\vect x}{1:{t-1}}{i}}\LP{\vect x_{1:t-1}} \frac{ g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\PF{\vect x}{t-1}{i}} }{P\LPV{\vect y_t}{\vect y_{1:t -1}}} \end{align*} \]

Sequential Importance Sampling

  1. Sample

    \[ \PF{\vect x}{t}{i} \sim \overrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i},\vect y_t} \]

  2. Compute and normalize weights

    \[ \begin{align*} \PF{\tilde W}{t}{i} &= \PF{W}{t-1}{i} \frac{ g_t\LPV{\vect y_t}{\PF{\vect x}{t}{i}}f\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{i}} }{\overrightarrow q_t\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{i},\vect y_t}}\\ \PF{W}{t}{i} &= \PF{\tilde W}{t}{i} / \sum_{i = 1}^N\PF{\tilde W}{t}{i} \end{align*} \]
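A minimal R sketch of one sequential importance sampling step, assuming a toy scalar AR(1) state, a single Bernoulli observation with a logit link, and the state equation used as the proposal so the proposal density cancels against \(f\); all names below are illustrative:

set.seed(1)
N <- 1000L; F. <- 0.9; sigma <- 0.5
x_prev <- rnorm(N)        # forward particles from time t - 1
W_prev <- rep(1 / N, N)   # their normalised weights
y_t <- 1L                 # observed outcome at time t

# 1. sample from the proposal; here q is f(. | x_{t-1})
x_new <- rnorm(N, mean = F. * x_prev, sd = sigma)

# 2. unnormalised weights; g * f / q reduces to g when q = f
g_t <- plogis(x_new)^y_t * (1 - plogis(x_new))^(1 - y_t)
W_tilde <- W_prev * g_t

# 3. normalise
W_new <- W_tilde / sum(W_tilde)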

Weight degeneracy

Weights, \(\PF{W}{t}{i}\)s, can degenerate.

Use auxiliary particle filter (Pitt and Shephard 1999) and resample with weights that are ideally

\[ \PF{\beta}{t}{i}\propto P\LPV{\vect y_t}{\PF{\vect x}{t-1}{i}}\PF{W}{t-1}{i} \]

Setting \(\PF{\beta}{t}{i} = \PF{W}{t-1}{i}\) yields the sequential importance resampling algorithm.

Different resampling approaches can be used. See Douc and Cappe (2005).

Auxiliary particle filter

  1. Sample indices \(j_1,\dots,j_N\) using \(\{\PF{\beta}{t}{i}\}_{i=1,\dots,N}\).

  2. Sample

    \[ \PF{\vect x}{t}{i} \sim \overrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{j_i},\vect y_t} \]

  3. Compute and normalize weights

    \[ \begin{align*} \PF{\tilde W}{t}{i} &= \frac{\PF{W}{t-1}{j_i}}{\PF{\beta}{t}{j_i}} \frac{ g_t\LPV{\vect y_t}{\PF{\vect x}{t}{i}}f\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{j_i}} }{\overrightarrow q_t\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{j_i},\vect y_t}}\\ \PF{W}{t}{i} &= \PF{\tilde W}{t}{i} / \sum_{i = 1}^N\PF{\tilde W}{t}{i} \end{align*} \]
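A minimal R sketch of the same toy scalar logit model with the resampling step added, using \(\PF{\beta}{t}{i} = \PF{W}{t-1}{i}\) so this is the sequential importance resampling special case; all names are illustrative:

set.seed(1)
N <- 1000L; F. <- 0.9; sigma <- 0.5; y_t <- 1L
x_prev <- rnorm(N)        # forward particles from time t - 1
W_prev <- rep(1 / N, N)   # their normalised weights

# 1. sample ancestor indices with the auxiliary weights
beta <- W_prev
idx  <- sample.int(N, size = N, replace = TRUE, prob = beta)

# 2. propagate from the resampled ancestors with q = f(. | x_{t-1})
x_new <- rnorm(N, mean = F. * x_prev[idx], sd = sigma)

# 3. reweight; W_{t-1} / beta cancels here and g * f / q reduces to g
g_t   <- plogis(x_new)^y_t * (1 - plogis(x_new))^(1 - y_t)
W_new <- g_t / sum(g_t)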

Smoothing

Need \(P\LPV{\vect X_t}{\vect y_{1:d}}\) and \(P\LPV{\vect X_{t-1:t}}{\vect y_{1:d}}\).

Use the generalized two-filter smoothing formula from Briers, Doucet, and Maskell (2009) and the smoother from Fearnhead, Wyncoll, and Tawn (2010).

Smoother from Briers et al. (2009)

Use the identity

\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} P\LPV{\vect y_{t:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &\propto \int f\LPV{\vect x_t}{\vect x_{t-1}} P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} P\LPV{\vect y_{t:d}}{\vect x_t} \end{align*} \]

Need to approximate \[ P\LPV{\vect y_{t:d}}{\vect x_t} = \int \prod_{k = t + 1}^d f\LPV{\vect x_k}{\vect x_{k-1}} \prod_{k =t}^d g\LPV{\vect y_k}{\vect x_k}\diff \vect x_{t + 1:d} \]

Backward filter

Let

\[ \gamma_1\LP{\vect x}, \quad \gamma_t(\vect x) = \int f\LPV{\vect x}{\vect x_{t-1}} \gamma_{t-1}\LP{\vect x_{t-1}}\diff\vect x_{t-1} \]

Then we introduce artificial probability densities

\[ \tilde P\LPV{\vect x_{t:d}}{\vect y_{t:d}} = \frac{ \gamma_t\LP{\vect x_t} \prod_{k = t + 1}^d f\LPV{\vect x_k}{\vect x_{k-1}} \prod_{k =t}^d g\LPV{\vect y_k}{\vect x_k} }{\tilde P\LP{\vect y_{t:d}}} \]

We can make a backward filter to sample \(\{\PB{\vect x}{t}{i}, \PB{W}{t}{i} \}_{i = 1, \dots, N}\) from \(\tilde P\LPV{\vect x_{t:d}}{\vect y_{t:d}}\).

Combining filters

We can show that

\[ \begin{align*} P\LPV{\vect y_{t:d}}{\vect x_t} &= \tilde P\LP{\vect y_{t:d}} \frac{\tilde P\LPV{\vect x_t}{\vect y_{t:d}}}{ \gamma_t\LP{\vect x_t}} \\ &\propto \frac{\tilde P\LPV{\vect x_t}{\vect y_{t:d}}}{ \gamma_t\LP{\vect x_t}} \approx \sum_{i = 1}^N \frac{\PB{W}{t}{i}}{\gamma_t(\vect x_t)} \delta_{\PB{\vect x}{t}{i}}(\vect x_t) \end{align*} \]

Combining filters

Now, we have the approximations we need

\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &\propto \int f\LPV{\vect x_t}{\vect x_{t-1}} P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} P\LPV{\vect y_{t:d}}{\vect x_t} \\ &\approx \sum_{j=1}^N \PS{W}{t}{j}\delta_{\PB{\vect x}{t}{j}}\LP{\vect x_t} \\ \PS{W}{t}{j} &\propto \sum_{i = 1}^N f\LPV{\PB{\vect x}{t}{j}}{\PF{\vect x}{t-1}{i}}\PF{W}{t-1}{i} \frac{\PB{W}{t}{j}}{\gamma_t\LP{\PB{\vect x}{t}{j}}} \end{align*} \]
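A minimal R sketch of the \(\bigO{N^2}\) reweighting for a scalar AR(1) state, taking \(\gamma_t\) as the stationary normal density; the two particle clouds and all other objects below are illustrative stand-ins:

set.seed(1)
N <- 500L; F. <- 0.9; sigma <- 0.5
x_fw <- rnorm(N); W_fw <- rep(1 / N, N)  # forward cloud at t - 1
x_bw <- rnorm(N); W_bw <- rep(1 / N, N)  # backward cloud at t

gamma_t <- function(x)                   # artificial density gamma_t
  dnorm(x, sd = sigma / sqrt(1 - F.^2))

# O(N^2): transition density for every (backward, forward) particle pair
f_mat <- outer(x_bw, x_fw, function(xb, xf)
  dnorm(xb, mean = F. * xf, sd = sigma))

W_sm <- drop(f_mat %*% W_fw) * W_bw / gamma_t(x_bw)
W_sm <- W_sm / sum(W_sm)                 # smoothed weights on the backward particles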

Comments

Smoothing step is \(\bigO{N^2}\) but independent of \(n_t = \lvert R_t \rvert\).

Can be reduced with approximate methods in Klaas et al. (2006). Yields \(\bigO{N\log N}\) run times. Alternatively, use rejection sampling if \(f\LPV{\vect x_t}{\vect x_{t-1}} / \gamma_t\LP{\vect x_t}\) is bounded.

Cannot handle singular components.

Smoother from Fearnhead et al. (2010)

Use the identity

\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} P\LPV{\vect y_{t:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} g_t\LPV{\vect y_t}{\vect x_t} P\LPV{\vect y_{t+1:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &\propto \int f\LPV{\vect x_t}{\vect x_{t-1}} P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} g_t\LPV{\vect y_t}{\vect x_t} \cdot \\ &\qquad\int P\LPV{\vect y_{t+1:d}}{\vect x_{t+1}} f\LPV{\vect x_{t + 1}}{\vect x_t}\diff \vect x_{t+1} \end{align*} \]

Smoother from Fearnhead et al. (2010)

  1. Sample index pairs \((i_k,j_k)_{k=1,\dots,N}\) with weights \(\PF{\beta}{t-1}{i}\) and \(\PB{\beta}{t+1}{j}\), respectively.
  2. Sample \(\PS{\vect x}{t}{k} \sim \overleftrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i_k}, \PB{\vect x}{t+1}{j_k}, \vect y_t}\).
  3. Compute and normalize weights

\[ \begin{align*} \PS{\tilde W}{t}{k} & =\frac{ f\LPV{\color{vermilion}\PS{\vect x}{t}{k}}{\color{blugre}\PF{\vect x}{t-1}{i_k}} \color{vermilion}g_t\LPV{\vect y_t}{\PS{\vect x}{t}{k}} \color{black}f\LPV{\color{cora}\PB{\vect x}{t+1}{j_k}}{ \color{vermilion}\PS{\vect x}{t}{k}} }{ \overleftrightarrow q_t\LPV{\PS{\vect x}{t}{k}}{\PF{\vect x}{t-1}{i_k}, \PB{\vect x}{t+1}{j_k}, \vect y_t} }\frac{ \color{cora}{\PB{W}{t+1}{j_k}} \color{blugre}{\PF{W}{t-1}{i_k}} }{ \color{cora}\gamma_{t+1}\LP{\PB{\vect x}{t+1}{j_k}} \color{black}\PB{\beta}{t+1}{j_k}\PF{\beta}{t-1}{i_k} } \\ \PS{W}{t}{k} &= \PS{\tilde W}{t}{k} / \sum_{l = 1}^N\PS{\tilde W}{t}{l} \end{align*} \]
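A minimal R sketch of the three steps for the same toy scalar logit model, drawing the smoothed particles from \(f\LPV{\cdot}{\PF{\vect x}{t-1}{i_k}}\) as a simple (not necessarily efficient) proposal so the \(f/\overleftrightarrow q_t\) factor cancels; \(\gamma_{t+1}\) is again taken as the stationary density and all names are illustrative:

set.seed(1)
N <- 500L; F. <- 0.9; sigma <- 0.5; y_t <- 1L
x_fw <- rnorm(N); W_fw <- rep(1 / N, N)  # forward cloud at t - 1
x_bw <- rnorm(N); W_bw <- rep(1 / N, N)  # backward cloud at t + 1
beta_fw <- W_fw; beta_bw <- W_bw         # resampling weights
gamma_t1 <- function(x) dnorm(x, sd = sigma / sqrt(1 - F.^2))

# 1. sample the index pairs
i_k <- sample.int(N, N, replace = TRUE, prob = beta_fw)
j_k <- sample.int(N, N, replace = TRUE, prob = beta_bw)

# 2. sample smoothed particles from f(. | forward ancestor)
x_sm <- rnorm(N, mean = F. * x_fw[i_k], sd = sigma)

# 3. compute and normalise the weights
g_t <- plogis(x_sm)^y_t * (1 - plogis(x_sm))^(1 - y_t)
W_tilde <- g_t * dnorm(x_bw[j_k], mean = F. * x_sm, sd = sigma) *
  W_bw[j_k] * W_fw[i_k] / (gamma_t1(x_bw[j_k]) * beta_bw[j_k] * beta_fw[i_k])
W_sm <- W_tilde / sum(W_tilde)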

Compare with

\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &= \color{blugre}\int \color{black}f\LPV{\color{vermilion}\vect x_t}{\color{blugre}\vect x_{t-1}} \color{blugre}P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} \color{vermilion}g_t\LPV{\vect y_t}{\vect x_t} \cdot \\ &\qquad\color{cora}\int P\LPV{\vect y_{t+1:d}}{\vect x_{t+1}} \color{black}f\LPV{\color{cora}\vect x_{t + 1}}{\color{vermilion}\vect x_t} \color{cora}\diff \vect x_{t+1} \end{align*} \]

Comments

The smoothing step is \(\bigO{N}\) in the number of particles, but each weight evaluation is \(\bigO{n_tp}\) where \(n_t = \lvert R_t \rvert\) and \(p\) is the dimension of \(\vect X_t\).

Can generalize to handle singular components.

Implementation

Implemented in the dynamichazard package.

Most parts are done in parallel with OpenMP or the thread library in C++.

Sampling is not done in parallel. This matters little when \(n_t = \lvert R_t \rvert\) is large, since the weight computations then dominate.

Can be done in parallel e.g., as in Zhou (2015) or using the software from L’Ecuyer et al. (2002).

M-step

Maximizing with respect to \(\vect \beta\) is expensive even for a moderate number of fixed effects

\[ \begin{align*} & E\LPV{ \sum_{t = 1}^d\sum_{i \in R_t} y_{it}\log\LP{h^{-1}\LP{\eta_{it}}} + \LP{1 - y_{it}}\log\LP{1 - h^{-1}\LP{\eta_{it}}} }{\vect \theta^{(k)}} \\ & \eta_{it}\LP{\vect x_t} = \vect\beta^\top\vect z_{it} + \vect x_t^\top\vect u_{it} \end{align*} \]

E.g., with \((n_t, N, d) = (1000, 250, 400)\), the problem amounts to a GLM with 100,000,000 observations.

M-step

Take only one iteratively re-weighted least squares iteration.

Do as Wood, Goude, and Shaw (2015) to compute a QR decomposition in parallel with low memory requirements.

Implemented in the bam function in the mgcv package.

Computation time may be reduced further by exploiting that the linear predictors only differ in the \(\vect x_t^\top\vect u_{it}\) term.
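A minimal R sketch of a single iteratively re-weighted least squares step for a logit model via a QR decomposition; this is a serial, in-memory stand-in for the parallel, low-memory scheme in Wood, Goude, and Shaw (2015), and the simulated data are purely illustrative:

set.seed(1)
n <- 10000L; p <- 5L
X <- cbind(1, matrix(rnorm(n * (p - 1L)), n))  # design matrix
y <- rbinom(n, 1L, plogis(drop(X %*% rnorm(p))))

beta_old <- rep(0, p)      # current estimate
eta <- drop(X %*% beta_old)
mu  <- plogis(eta)
w   <- mu * (1 - mu)       # IRLS weights
z   <- eta + (y - mu) / w  # working response

# weighted least squares step via QR of the row-scaled design
qr_obj   <- qr(X * sqrt(w))
beta_new <- qr.coef(qr_obj, sqrt(w) * z)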

Proposal distribution

Need up to three proposal distributions:

\[ \begin{align*} \PF{\vect x}{t}{i} &\sim \overrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i},\vect y_t} \\ \PB{\vect x}{t}{i} &\sim \overleftarrow q_t\LPV{\cdot}{\PB{\vect x}{t+1}{i},\vect y_t} \\ \PS{\vect x}{t}{k} &\sim \overleftrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i_k}, \PB{\vect x}{t+1}{j_k}, \vect y_t} \end{align*} \]

Focus on \(\overrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i},\vect y_t}\).

Proposal distribution

Use \(f\) as the proposal distribution.

This is the bootstrap filter from Gordon, Salmond, and Smith (1993).

Estimate the mode

\[ \vect \mu_t^{(i)} = \argmax_{\vect x_t} g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\PF{\vect x}{t-1}{i}} \]

and make a Taylor expansion around \(\vect \mu_t^{(i)}\).

Then use a multivariate normal distribution or multivariate t-distribution as suggested in Pitt and Shephard (1999) and used in Fearnhead, Wyncoll, and Tawn (2010).

This is \(\bigO{Nn_tp^2}\) where \(p\) is the dimension of \(\vect x_t\).
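A minimal R sketch of this proposal for one particle in a toy scalar logit model: find the mode with optim, use the negative Hessian at the mode for the scale, and draw from a scaled t-distribution. Everything below is illustrative and not the package's internals:

set.seed(1)
F. <- 0.9; sigma <- 0.5; y_t <- 1L; nu <- 8
x_prev <- 0.3  # one forward particle at time t - 1

log_target <- function(x)  # log of g_t(y_t | x) f(x | x_prev)
  dbinom(y_t, 1L, plogis(x), log = TRUE) +
  dnorm(x, mean = F. * x_prev, sd = sigma, log = TRUE)

opt <- optim(F. * x_prev, function(x) -log_target(x),
             method = "BFGS", hessian = TRUE)
mu_prop <- opt$par                      # mode of g * f
sd_prop <- sqrt(1 / drop(opt$hessian))  # scale from the negative Hessian

x_new <- mu_prop + sd_prop * rt(1L, df = nu)  # t-distributed proposal draw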

Proposal distribution

Instead make the expansion once using

\[ \bar{\vect x}_{t -1} = \sum_{i = 1}^N \PF{W}{t - 1}{i}\PF{\vect x}{t - 1}{i} \]

Reduces cost to \(\bigO{Nn_tp + n_tp^2}\).
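Continuing the illustrative scalar sketch, the shared expansion point is simply the weighted mean of the forward cloud, so the mode finding and Taylor expansion are done once and reused for every particle:

set.seed(1)
N <- 1000L; F. <- 0.9
x_prev <- rnorm(N); W_prev <- rep(1 / N, N)  # illustrative forward cloud

x_bar <- sum(W_prev * x_prev)  # weighted mean of the cloud
# the mode of g_t(y_t | x) f(x | x_bar) would now be found once, and the
# resulting expansion reused as the proposal for all N particles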

Example

frm_fixed
## Surv(tstart, tstop, y) ~ wz(r_req_nn) + wz(r_niq_nn) + wz(r_ltq_nn) + 
##     wz(r_actq_lctq) + wz(sigma) + wz(excess_ret) + wz(rel_size) + 
##     wz(dtd) + log_market_ret + r1y + sp_w_c(r_niq_nn, 4) + sp_w_c(sigma, 
##     4)
Fmat <- diag(c(0.948, 0.9793))
Q. <- matrix(c(0.05979 , 0.011014, 
               0.011014, 0.002504), byrow = TRUE, 2)

\[ \begin{align*} h\LP{E\LPV{y_{it}}{\vect z_t, \vect x_t, \vect u_{it}}} &= \vect\beta^\top\vect z_{it} + \vect x_t^\top\vect u_{it} \\ \vect x_t &\sim \mathcal N\LP{\mat F\vect x_{t-1}, \mat \Sigma} \end{align*} \]

Bootstrap filter

library(dynamichazard)
set.seed(58201195)
system.time(
  f1 <- PF_EM(
    fixed = frm_fixed, random = ~ 1 + wz(rel_size), 
    data = distress, model = "logit", type = "VAR", 
    Fmat = Fmat, Q = Q., fixed_effects = fix_e,
    control = PF_control(
      N_fw_n_bw = 1000L, N_smooth = 2500L, N_first = 3000L, 
      N_smooth_final = 500L, n_threads = 4L, n_max = 1L, nu = 8L,
      method = "bootstrap_filter", smoother = "Fearnhead_O_N"),
    by = 1L, max_T = max(distress$tstop), id = distress$gvkey,
    Q_0 = diag(1, 2)))
##    user  system elapsed 
##   667.9   151.0   218.6
logLik(f1)       # from particle filter
## 'log Lik.' -3592 (df=24)
logLik(base_fit) # glm model w/o random effects
## 'log Lik.' -3672 (df=17)

Bootstrap filter: Effective sample size

par(mar = c(5, 4, 1, 1))
plot(head(f1$effective_sample_size$smoothed_clouds, -1), type = "h", 
     ylab = "Effective sample size", xlab = "Time")

Bootstrap filter: Predicted state variables

par(mfcol = c(1, 2), mar = c(5, 4, 1, 1))
plot(f1, cov_index = 1, qlvls = c(.05, .95), qtype = "lines")
plot(f1, cov_index = 2, qlvls = c(.05, .95), qtype = "lines")

Auxiliary particle filter with different proposal

set.seed(58201195)
system.time(
  f2 <- PF_EM(
    fixed = frm_fixed, random = ~ 1 + wz(rel_size), 
    data = distress, model = "logit", type = "VAR", 
    Fmat = Fmat, Q = Q., fixed_effects = fix_e,
    control = PF_control(
      N_fw_n_bw = 1000L, N_smooth = 2500L, N_first = 3000L, 
      N_smooth_final = 500L, n_threads = 4L, n_max = 1L, nu = 8L,
      method = "AUX_normal_approx_w_cloud_mean", smoother = "Fearnhead_O_N"),
    by = 1L, max_T = max(distress$tstop), id = distress$gvkey,
    Q_0 = diag(1, 2)))
##    user  system elapsed 
##   840.3   177.9   272.2
logLik(f2)
## 'log Lik.' -3592 (df=24)

Auxiliary particle filter with different proposal

par(mar = c(5, 4, 1, 1))
plot(head(f2$effective_sample_size$smoothed_clouds - 
            f1$effective_sample_size$smoothed_clouds, -1), type = "h", 
     ylab = "Diff effective sample size", xlab = "Time")

Auxiliary particle filter with different proposal

par(mfcol = c(1, 2), mar = c(5, 4, 1, 1))
plot(f2, cov_index = 1, qlvls = c(.05, .95), qtype = "lines")
plot(f2, cov_index = 2, qlvls = c(.05, .95), qtype = "lines")

Different smoother

set.seed(58201195)
system.time(
  f3 <- PF_EM(
    fixed = frm_fixed, random = ~ 1 + wz(rel_size), 
    data = distress, model = "logit", type = "VAR", 
    Fmat = Fmat, Q = Q., fixed_effects = fix_e,
    control = PF_control(
      N_fw_n_bw = 500L, N_smooth = 1L, N_first = 3000L, n_threads = 4L, 
      n_max = 1L, nu = 8L,
      method = "AUX_normal_approx_w_cloud_mean", smoother = "Brier_O_N_square"),
    by = 1L, max_T = max(distress$tstop), id = distress$gvkey,
    Q_0 = diag(1, 2)))
##    user  system elapsed 
##   644.6   168.1   218.0
logLik(f3)
## 'log Lik.' -3594 (df=24)
The PF_forward_filter function can be used to get a more precise log-likelihood estimate.

Different smoother

par(mar = c(5, 4, 1, 1))
plot(head(f3$effective_sample_size$smoothed_clouds - 
            f2$effective_sample_size$smoothed_clouds, -1), type = "h", 
     ylab = "Diff effective sample size", xlab = "Time")

Auxiliary particle filter with different proposal

set.seed(58201195)
system.time(
  f4 <- PF_EM(
    fixed = frm_fixed, random = ~ 1 + wz(rel_size), 
    data = distress, model = "logit", type = "VAR", 
    Fmat = Fmat, Q = Q., fixed_effects = fix_e,
    control = PF_control(
      N_fw_n_bw = 1000L, N_smooth = 2500L, N_first = 3000L, 
      N_smooth_final = 500L, n_threads = 4L, n_max = 1L, nu = 8L,
      method = "AUX_normal_approx_w_particles", smoother = "Fearnhead_O_N"),
    by = 1L, max_T = max(distress$tstop), id = distress$gvkey,
    Q_0 = diag(1, 2)))
##    user  system elapsed 
##  3812.2   582.2  1175.0
logLik(f4)
## 'log Lik.' -3592 (df=24)

Auxiliary particle filter with different proposal

par(mar = c(5, 4, 1, 1))
plot(head(f4$effective_sample_size$smoothed_clouds - 
            f2$effective_sample_size$smoothed_clouds, -1), type = "h", 
     ylab = "Diff effective sample size", xlab = "Time")

Auxiliary particle filter with different proposal

sapply(f4$effective_sample_size, mean) - sapply(f2$effective_sample_size, mean)
##  forward_clouds backward_clouds smoothed_clouds 
##           1.917           2.223          -4.318

Next steps

Different \(y_{it}\) and distributions \(g_t\LPV{\vect y_t}{\vect x_t}\).

Other types of state models.

\(\text{VAR}(q)\) with \(q > 1\), \(\text{VARMA}(p,q)\), restricted models, etc.

Compute observed information matrix.

E.g., see Cappé, Moulines, and Ryden (2005, chap. 11) and Poyiadjis, Doucet, and Singh (2011).

Change particle filter and smoothers.

Different proposal distributions (e.g., unscented transformation and adaptive importance sampling), use resample-moves, use block sampling, and more.

“Nicer” API.

Better implementation of the smoother from Briers, Doucet, and Maskell (2009).

Thank you!

Slides are on github.com/boennecd/Talks and rpubs.com/boennecd/KU-PF-18.

More examples at github.com/boennecd/dynamichazard/tree/master/examples.

Doucet and Johansen (2011) and Givens and Hoeting (2012, chap. 6) have been used in the preparation of the slides.

References

Briers, Mark, Arnaud Doucet, and Simon Maskell. 2009. “Smoothing Algorithms for State–space Models.” Annals of the Institute of Statistical Mathematics 62 (1): 61. doi:10.1007/s10463-009-0236-2.

Campbell, John Y., Jens Hilscher, and Jan Szilagyi. 2008. “In Search of Distress Risk.” The Journal of Finance 63 (6): 2899–2939. doi:10.1111/j.1540-6261.2008.01416.x.

Cappé, Olivier, Eric Moulines, and Tobias Rydén. 2005. Inference in Hidden Markov Models. Springer-Verlag New York.

Chava, Sudheer, and Robert A. Jarrow. 2004. “Bankruptcy Prediction with Industry Effects.” doi:10.2139/ssrn.287474.

Douc, R., and O. Cappe. 2005. “Comparison of Resampling Schemes for Particle Filtering.” In ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005., 64–69. doi:10.1109/ISPA.2005.195385.

Doucet, Arnaud, and Adam M Johansen. 2011. “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later.” In Handbook of Nonlinear Filtering, edited by Dan Crisan and Boris Rozovskii, 741–67. Oxford: Oxford University Press.

Fearnhead, Paul, David Wyncoll, and Jonathan Tawn. 2010. “A Sequential Smoothing Algorithm with Linear Computational Cost.” Biometrika 97 (2). [Oxford University Press, Biometrika Trust]: 447–64. http://www.jstor.org/stable/25734097.

Givens, Geof H, and Jennifer A Hoeting. 2012. Computational Statistics. 2nd ed. Wiley.

Gordon, N. J., D. J. Salmond, and A. F. M. Smith. 1993. “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation.” IEE Proceedings F - Radar and Signal Processing 140 (2): 107–13. doi:10.1049/ip-f-2.1993.0015.

Klaas, Mike, Mark Briers, Nando de Freitas, Arnaud Doucet, Simon Maskell, and Dustin Lang. 2006. “Fast Particle Smoothing: If I Had a Million Particles.” In Proceedings of the 23rd International Conference on Machine Learning, 481–88. ICML ’06. New York, NY, USA: ACM. doi:10.1145/1143844.1143905.

L’Ecuyer, Pierre, Richard Simard, E. Jack Chen, and W. David Kelton. 2002. “An Object-Oriented Random-Number Package with Many Long Streams and Substreams.” Operations Research 50 (6). INFORMS: 1073–5. http://www.jstor.org/stable/3088626.

Pitt, Michael K., and Neil Shephard. 1999. “Filtering via Simulation: Auxiliary Particle Filters.” Journal of the American Statistical Association 94 (446). [American Statistical Association, Taylor & Francis, Ltd.]: 590–99. http://www.jstor.org/stable/2670179.

Poyiadjis, George, Arnaud Doucet, and Sumeetpal S. Singh. 2011. “Particle Approximations of the Score and Observed Information Matrix in State Space Models with Application to Parameter Estimation.” Biometrika 98 (1). Biometrika Trust: 65–80. http://www.jstor.org/stable/29777165.

Shumway, Tyler. 2001. “Forecasting Bankruptcy More Accurately: A Simple Hazard Model.” The Journal of Business 74 (1). The University of Chicago Press: 101–24. http://www.jstor.org/stable/10.1086/209665.

Wood, Simon N., Yannig Goude, and Simon Shaw. 2015. “Generalized Additive Models for Large Data Sets.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 64 (1): 139–55. doi:10.1111/rssc.12068.

Zhou, Yan. 2015. “VSMC: Parallel Sequential Monte Carlo in C++.” Journal of Statistical Software, Articles 62 (9): 1–49. doi:10.18637/jss.v062.i09.