\[ \begin{align*} y_{it} &\in\{0,1\} \\ h\LP{E\LPV{y_{it}}{\vect z_t}} &= \vect\beta^\top\vect z_{it} \end{align*} \]
where \(h\) is a link function and \(\vect z_{it}\) are known covariates.
\[ \begin{align*} h\LP{E\LPV{y_{it}}{\vect z_t, \vect x_t, \vect u_{it}}} &= \vect\beta^\top\vect z_{it} + \vect x_t^\top\vect u_{it} \\ \vect x_t &\sim \mathcal N\LP{\mat F\vect x_{t-1}, \mat \Sigma} \end{align*} \]
\(\vect u_{it}\) are known firm covariates.
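To fix ideas, below is a minimal R sketch that simulates from a model of this form with a logit link, a scalar VAR(1) state, and a random intercept; all dimensions and parameter values are hypothetical.

# minimal simulation sketch of the model above: logit link, scalar VAR(1)
# state, and a random intercept (all dimensions and parameters are hypothetical)
set.seed(1)
d <- 50L                 # number of time periods
n <- 200L                # firms at risk in each period
F. <- .95                # state transition (here a scalar)
Sigma <- .1              # state innovation variance
beta <- c(-3, 1)         # fixed effects

x <- numeric(d)          # latent states
x[1] <- rnorm(1L, sd = sqrt(Sigma / (1 - F.^2)))
for(t in 2:d)
  x[t] <- rnorm(1L, mean = F. * x[t - 1L], sd = sqrt(Sigma))

y <- matrix(NA_integer_, n, d)
for(t in 1:d){
  z <- cbind(1, rnorm(n))              # fixed effect covariates z_it
  u <- 1                               # random effect covariate u_it
  eta <- drop(z %*% beta) + x[t] * u   # linear predictor
  y[, t] <- rbinom(n, 1L, plogis(eta)) # inverse logit link h^{-1}
}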
\[ g_t\LPV{\vect Y_t}{\vect X_t} \qquad f\LPV{\vect X_t}{\vect X_{t-1}} \qquad f\LP{\vect X_1} \]
\[ P\LP{\vect y_{1:d}} = \int f\LP{\vect x_1} g_1\LPV{\vect y_1}{\vect x_1} \prod_{t = 2}^d g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\vect x_{t - 1}} \diff \vect x_{1:d} \]
\[ \begin{align*} Q\LPV{\vect\theta}{\vect\theta^{(k)}} &= E\LPV{\log P\LP{\vect y_{1:d}}}{\vect\theta^{(k)}} \\ \log P\LP{\vect y_{1:d}} &= \log f\LP{\vect X_1} + \sum_{t = 2}^d \log \varphi\LP{\vect X_t;\mat F\vect X_{t-1},\mat \Sigma} \\ &\quad + \sum_{t = 1}^d\sum_{i \in R_t} y_{it}\log\LP{h^{-1}\LP{\eta_{it}}} + \LP{1 - y_{it}}\log\LP{1 - h^{-1}\LP{\eta_{it}}} \\ \eta_{it}\LP{\vect x_t} &= \vect\beta^\top\vect z_{it} + \vect x_t^\top\vect u_{it} \end{align*} \]
where \(\varphi\) is a multivariate normal density function, \(R_t\) is the risk set at time \(t\), and \(h^{-1}\) is the inverse link function.
Need to evaluate
\[ E\LPV{\phi\LP{\vect X_{t-1:t}}}{\vect\theta^{(k)},\vect y_{1:d}} \qquad E\LPV{\psi\LP{\vect X_t}}{\vect\theta^{(k)},\vect y_{1:d}} \]
Particle filtering.
Two particle smoothers.
The implementation.
Examples.
We want to approximate a target density \(\tilde d(x) = \alpha d(x)\) where only \(d\) is known (the normalizing constant \(\alpha\) is unknown).
Draw \(x^{(1)}, \dots, x^{(N)}\) from a proposal density \(q\) and set normalized weights \(W^{(i)} \propto d(x^{(i)}) / q(x^{(i)})\). This yields a discrete approximation of the distribution
\[\widehat P(X) = \sum_{i=1}^N W^{(i)} \delta_{x^{(i)}}(X)\] where \(\delta\) is the Dirac delta function.
\[ E\LP{\phi(X)} = \int \phi(x)\tilde d(x)\diff x \approx \sum_{i = 1}^N W^{(i)} \phi\LP{x^{(i)}} \]
Use the effective sample size to judge the approximation
\[ESS = \LP{\sum_{i = 1}^N W^{(i)2}}^{-1}\]
Effective sample size is 29.59 with 50 samples.
Effective sample size is 17.51 with 50 samples.
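A minimal sketch of self-normalized importance sampling and the effective sample size; the unnormalized target and the standard normal proposal below are assumptions chosen for illustration.

# self-normalized importance sampling sketch; the unnormalized target d and
# the standard normal proposal q are assumptions for illustration
d_un <- function(x) exp(-x^4 / 4)        # unnormalized target density d
N <- 50L
xs <- rnorm(N)                           # draws from the proposal q
ws <- d_un(xs) / dnorm(xs)               # unnormalized weights d(x) / q(x)
Ws <- ws / sum(ws)                       # normalized weights W^(i)

sum(Ws * xs^2)                           # approximates E(phi(X)) with phi(x) = x^2
1 / sum(Ws^2)                            # effective sample size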
The proposal must cover the support of the target \[ d(x) > 0 \Rightarrow q(x) > 0 \]
and the weights should preferably be bounded \[ \frac{d(x)}{q(x)} < C < \infty \]
Need
\[ P\LPV{\vect X_1}{\vect y_1} = \frac{g_1\LPV{\vect y_1}{\vect x_1}f\LP{\vect x_1}}{P\LP{\vect y_1}} \]
Use importance sampling to get
\[ \widehat P\LPV{\vect X_1}{\vect y_1} = \sum_{i = 1}^N \PF{W}{1}{i}\delta_{\PF{\vect x}{1}{i}}\LP{\vect X_1} \]
“Forward” particle clouds \(\{\PF{W}{t}{i}, \PF{\vect x}{t}{i}\}_{i=1,\dots,N}\).
“Backward” particle clouds \(\{\PB{W}{t}{i}, \PB{\vect x}{t}{i}\}_{i=1,\dots,N}\).
“Smoothed” particle clouds \(\{\PS{W}{t}{i}, \PS{\vect x}{t}{i}\}_{i=1,\dots,N}\).
End goal is \[ E\LPV{\phi(\vect X_t)}{\vect y_{1:d}} \approx \sum_{i = 1}^N \PS{W}{t}{i} \phi\LP{\PS{\vect x}{t}{i}} \]
Also known as sequential Monte Carlo.
Use \(\widehat P\LPV{\vect X_{1:t-1}}{\vect y_{1:t-1}}\) in
\[ \begin{align*} P\LPV{\vect x_{1:t}}{\vect y_{1:t}} &= P\LPV{\vect x_{1:t-1}}{\vect y_{1:t-1}} \frac{ g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\vect x_{t-1}} }{P\LPV{\vect y_t}{\vect y_{1:t -1}}} \\ &\approx \sum_{i = 1}^N \PF{W}{t-1}{i} \delta_{\PF{\vect x}{1:{t-1}}{i}}\LP{\vect x_{1:t-1}} \frac{ g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\PF{\vect x}{t-1}{i}} }{P\LPV{\vect y_t}{\vect y_{1:t -1}}} \end{align*} \]
\[ \PF{\vect x}{t}{i} \sim \overrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i},\vect y_t} \]
\[ \begin{align*} \PF{\tilde W}{t}{i} &= \PF{W}{t-1}{i} \frac{ g_t\LPV{\vect y_t}{\PF{\vect x}{t}{i}}f\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{i}} }{\overrightarrow q_t\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{i},\vect y_t}}\\ \PF{W}{t}{i} &= \PF{\tilde W}{t}{i} / \sum_{j = 1}^N\PF{\tilde W}{t}{j} \end{align*} \]
The weights \(\PF{W}{t}{i}\) can degenerate.
Use the auxiliary particle filter (Pitt and Shephard 1999) and resample with weights that ideally are
\[ \PF{\beta}{t}{i}\propto P\LPV{\vect y_t}{\PF{\vect x}{t-1}{i}}\PF{W}{t-1}{i} \]
Setting \(\PF{\beta}{t}{i} = \PF{W}{t-1}{i}\) yields the sequential importance resampling algorithm.
Different resampling approaches can be used. See Douc and Cappe (2005).
Sample indices \(j_1,\dots,j_N\) using \(\{\PF{\beta}{t}{i}\}_{i=1,\dots,N}\).
\[ \PF{\vect x}{t}{i} \sim \overrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{j_i},\vect y_t} \]
\[ \begin{align*} \PF{\tilde W}{t}{i} &= \frac{\PF{W}{t-1}{j_i}}{\PF{\beta}{t}{j_i}} \frac{ g_t\LPV{\vect y_t}{\PF{\vect x}{t}{i}}f \LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{j_i}} }{\overrightarrow q_t\LPV{\PF{\vect x}{t}{i}}{\PF{\vect x}{t-1}{j_i},\vect y_t}}\\ \PF{W}{t}{i} &= \PF{\tilde W}{t}{i} / \sum_{j = 1}^N\PF{\tilde W}{t}{j} \end{align*} \]
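Below is a sketch of one filter step for a scalar state with the bootstrap choices \(\PF{\beta}{t}{i} = \PF{W}{t-1}{i}\) and \(\overrightarrow q_t = f\); the AR(1) transition and the logit observation density are assumptions for illustration.

# one particle filter step for a scalar state using beta = W_{t-1} and the
# state transition f as proposal; the transition and g_dens are assumptions
pf_step <- function(xs, Ws, y_t, F., Sigma, g_dens){
  N <- length(xs)
  beta <- Ws                                     # auxiliary resampling weights
  js <- sample.int(N, N, replace = TRUE, prob = beta)
  x_new <- rnorm(N, mean = F. * xs[js], sd = sqrt(Sigma))  # draw from f
  # with q = f the f / q ratio cancels and W_{t-1} / beta = 1
  w_tilde <- g_dens(y_t, x_new)
  list(xs = x_new, Ws = w_tilde / sum(w_tilde))
}

# e.g., a logit observation density for the n_t binary outcomes at time t
g_dens <- function(y_t, xs)
  sapply(xs, function(x) prod(dbinom(y_t, 1L, plogis(-3 + x))))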
Need \(P\LPV{\vect X_t}{\vect y_{1:d}}\) and \(P\LPV{\vect X_{t-1:t}}{\vect y_{1:d}}\).
Use the generalized two-filter smoothing formula from Briers, Doucet, and Maskell (2009) and the smoother from Fearnhead, Wyncoll, and Tawn (2010).
Use the identity
\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} P\LPV{\vect y_{t:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &\propto \int f\LPV{\vect x_t}{\vect x_{t-1}} P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} P\LPV{\vect y_{t:d}}{\vect x_t} \end{align*} \]
Need to approximate \[ P\LPV{\vect y_{t:d}}{\vect x_t} = \ \int \prod_{k = t + 1}^d f\LPV{\vect x_k}{\vect x_{k-1}} \prod_{k =t}^d g\LPV{\vect y_k}{\vect x_k}\diff \vect x_{t + 1:d} \]
Let
\[ \gamma_1\LP{\vect x}, \quad \gamma_t(\vect x) = \int f\LPV{\vect x}{\vect x_{t-1}} \gamma_{t-1}\LP{\vect x_{t-1}}\diff\vect x_{t-1} \]
Then we introduce the artificial probability densities
\[ \tilde P\LPV{\vect x_{t:d}}{\vect y_{t:d}} = \frac{ \gamma_t\LP{\vect x_t} \prod_{k = t + 1}^d f\LPV{\vect x_k}{\vect x_{k-1}} \prod_{k =t}^d g\LPV{\vect y_k}{\vect x_k} }{\tilde P\LP{\vect y_{t:d}}} \]
We can run a backward particle filter to sample \(\{\PB{\vect x}{t}{i}, \PB{W}{t}{i} \}_{i = 1, \dots, N}\) from \(\tilde P\LPV{\vect x_{t:d}}{\vect y_{t:d}}\).
We can show that
\[ \begin{align*} P\LPV{\vect y_{t:d}}{\vect x_t} &= \tilde P\LP{\vect y_{t:d}} \frac{\tilde P\LPV{\vect x_t}{\vect y_{t:d}}}{ \gamma_t\LP{\vect x_t}} \\ &\propto \frac{\tilde P\LPV{\vect x_t}{\vect y_{t:d}}}{ \gamma_t\LP{\vect x_t}} \approx \sum_{i = 1}^N \frac{\PB{W}{t}{i}}{\gamma_t(\vect x_t)} \delta_{\PB{\vect x}{t}{i}}(\vect x_t) \end{align*} \]
Now, we have the approximations we need
\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &\propto \int f\LPV{\vect x_t}{\vect x_{t-1}} P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} P\LPV{\vect y_{t:d}}{\vect x_t} \\ &\approx \sum_{j=1}^N \PS{W}{t}{j}\delta_{\PB{\vect x}{t}{j}}\LP{\vect x_t} \\ \PS{W}{t}{j} &\propto \sum_{i = 1}^N f\LPV{\PB{\vect x}{t}{j}}{\PF{\vect x}{t-1}{i}}\PF{W}{t-1}{i} \frac{\PB{W}{t}{j}}{\gamma_t\LP{\PB{\vect x}{t}{j}}} \end{align*} \]
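A sketch of the \(\bigO{N^2}\) combination above for a scalar state; the transition density f_dens and the artificial density gamma_dens are assumptions for illustration.

# O(N^2) smoothing weights from the forward cloud at time t - 1 and the
# backward cloud at time t, as in the formula above; f_dens and gamma_dens
# are assumptions for illustration
smooth_weights <- function(x_fw, W_fw, x_bw, W_bw, f_dens, gamma_dens){
  w_tilde <- sapply(seq_along(x_bw), function(j)
    sum(W_fw * f_dens(x_bw[j], x_fw)) * W_bw[j] / gamma_dens(x_bw[j]))
  w_tilde / sum(w_tilde)   # smoothed weights assigned to the backward particles
}

# e.g., a scalar AR(1) transition density and a standard normal gamma_t
f_dens     <- function(x_t, x_tm1) dnorm(x_t, mean = .95 * x_tm1, sd = .25)
gamma_dens <- function(x_t) dnorm(x_t)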
Use the identity
\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} P\LPV{\vect y_{t:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &= \frac{ P\LPV{\vect x_t}{\vect y_{1:t -1}} g_t\LPV{\vect y_t}{\vect x_t} P\LPV{\vect y_{t+1:d}}{\vect x_t}}{ P\LPV{\vect y_{t:d}}{\vect y_{1:t-1}}} \\ &\propto \int f\LPV{\vect x_t}{\vect x_{t-1}} P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} g_t\LPV{\vect y_t}{\vect x_t} \cdot \\ &\qquad\int P\LPV{\vect y_{t+1:d}}{\vect x_{t+1}} f\LPV{\vect x_{t + 1}}{\vect x_t}\diff \vect x_{t+1} \end{align*} \]
\[ \begin{align*} \PS{\tilde W}{t}{k} & =\frac{ f\LPV{\color{vermilion}\PS{\vect x}{t}{k}}{\color{blugre}\PF{\vect x}{t-1}{i_k}} \color{vermilion}g_t\LPV{\vect y_t}{\PS{\vect x}{t}{k}} \color{black}f\LPV{\color{cora}\PB{\vect x}{t+1}{j_k}}{ \color{vermilion}\PS{\vect x}{t}{k}} }{ \overleftrightarrow q_t\LPV{\PS{\vect x}{t}{k}}{\PF{\vect x}{t-1}{i_k}, \PB{\vect x}{t+1}{j_k}, \vect y_t} }\frac{ \color{cora}{\PB{W}{t+1}{j_k}} \color{blugre}{\PF{W}{t-1}{i_k}} }{ \color{cora}\gamma_{t+1}\LP{\PB{\vect x}{t+1}{j_k}} \color{black}\PB{\beta}{t+1}{j_k}\PF{\beta}{t-1}{i_k} } \\ \PS{W}{t}{k} &= \PS{\tilde W}{t}{k} / \sum_{l = 1}^N\PS{\tilde W}{t}{l} \end{align*} \]
Compare with
\[ \begin{align*} P\LPV{\vect x_t}{\vect y_{1:d}} &= \color{blugre}\int \color{black}f\LPV{\color{vermilion}\vect x_t}{\color{blugre}\vect x_{t-1}} \color{blugre}P\LPV{\vect x_{t-1}}{\vect y_{1:t -1}}\diff \vect x_{t-1} \color{vermilion}g_t\LPV{\vect y_t}{\vect x_t} \cdot \\ &\qquad\color{cora}\int P\LPV{\vect y_{t+1:d}}{\vect x_{t+1}} \color{black}f\LPV{\color{cora}\vect x_{t + 1}}{\color{vermilion}\vect x_t} \color{cora}\diff \vect x_{t+1} \end{align*} \]
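A corresponding sketch of one \(\bigO{N}\) smoothing step: sample the pairs \((i_k, j_k)\) with the forward and backward weights so the weight ratios in the formula cancel, propose smoothed particles, and weight them. The Gaussian proposal and the densities (which can be the ones from the sketches above) are assumptions for illustration.

# sketch of the O(N) smoother step; x_fw is the forward cloud at time t - 1
# and x_bw the backward cloud at time t + 1; sampling the pairs with the
# forward and backward weights makes the W / beta ratios equal one
smooth_O_N <- function(N_s, x_fw, W_fw, x_bw, W_bw, y_t,
                       f_dens, g_dens, gamma_dens, sd_prop = .25){
  i_k <- sample.int(length(x_fw), N_s, replace = TRUE, prob = W_fw)
  j_k <- sample.int(length(x_bw), N_s, replace = TRUE, prob = W_bw)
  mu <- (x_fw[i_k] + x_bw[j_k]) / 2          # center of the assumed proposal
  x_s <- rnorm(N_s, mean = mu, sd = sd_prop) # propose smoothed particles
  w_tilde <- f_dens(x_s, x_fw[i_k]) * g_dens(y_t, x_s) * f_dens(x_bw[j_k], x_s) /
    (dnorm(x_s, mean = mu, sd = sd_prop) * gamma_dens(x_bw[j_k]))
  list(xs = x_s, Ws = w_tilde / sum(w_tilde))
}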
The smoothing step is \(\bigO{N}\) in the number of particles, but each weight evaluation is \(\bigO{n_tp}\) where \(n_t = \lvert R_t \rvert\) and \(p\) is the dimension of \(\vect X_t\).
Can generalize to handle singular components.
Implemented in the dynamichazard package.
Most parts run in parallel using OpenMP or the C++ thread library.
Maximizing with respect to \(\vect \beta\) is expensive even for a moderate number of fixed effects
\[ \begin{align*} & E\LPV{ \sum_{t = 1}^d\sum_{i \in R_t} y_{it}\log\LP{h^{-1}\LP{\eta_{it}}} + \LP{1 - y_{it}}\log\LP{1 - h^{-1}\LP{\eta_{it}}} }{\vect \theta^{(k)}} \\ & \eta_{it}\LP{\vect x_t} = \vect\beta^\top\vect z_{it} + \vect x_t^\top\vect u_{it} \end{align*} \]
E.g., with \((n_t, N, d) = (1000, 250, 400)\) it amounts to a GLM with \(1000 \cdot 250 \cdot 400 = 100{,}000{,}000\) observations.
Take only one iteratively re-weighted least squares iteration.
Follow Wood, Goude, and Shaw (2015) and compute the QR decomposition in parallel with low memory requirements.
Implemented in the bam function in the mgcv package.
Computation time may be reduced further by using that the linear predictors only differ in the \(\vect x_t^\top\vect u_{it}\) term.
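As a hedged illustration of this approach (not the application below), a large logistic regression fitted with mgcv::bam on simulated data:

# illustrative only: fit a large logistic regression with mgcv::bam, which
# implements the parallel, low-memory QR approach; the simulated covariates
# and settings are assumptions
library(mgcv)
n_obs <- 1e6
dat <- data.frame(z1 = rnorm(n_obs), z2 = rnorm(n_obs))
dat$y <- rbinom(n_obs, 1L, plogis(-2 + dat$z1 - .5 * dat$z2))
fit <- bam(y ~ z1 + z2, family = binomial(), data = dat, nthreads = 4)
coef(fit)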
Need up to three proposal distributions:
\[ \begin{align*} \PF{\vect x}{t}{i} &\sim \overrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i},\vect y_t} \\ \PB{\vect x}{t}{i} &\sim \overleftarrow q_t\LPV{\cdot}{\PB{\vect x}{t+1}{i},\vect y_t} \\ \PS{\vect x}{t}{k} &\sim \overleftrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i_k}, \PB{\vect x}{t+1}{j_k}, \vect y_t} \end{align*} \]
Focus on \(\overrightarrow q_t\LPV{\cdot}{\PF{\vect x}{t-1}{i},\vect y_t}\).
Estimate the mode
\[ \vect \mu_t^{(i)} = \argmax_{\vect x_t} g_t\LPV{\vect y_t}{\vect x_t}f\LPV{\vect x_t}{\PF{\vect x}{t-1}{i}} \]
Is \(\bigO{Nn_tp^2}\) where \(p\) is the dimension of \(\vect x_t\).
Instead make the expansion once using
\[ \bar{\vect x}_{t -1} = \sum_{i = 1}^N \PF{W}{t - 1}{i}\PF{\vect x}{t - 1}{i} \]
Reduces cost to \(\bigO{Nn_tp + n_tp^2}\).
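A sketch of the single expansion for a scalar state: compute the weighted particle mean, locate the mode of \(g_t f\) once with optim, and use the resulting Gaussian as proposal; the logit observation density and parameters are assumptions for illustration.

# sketch of the single expansion: weighted particle mean, one mode search for
# g_t * f, and a Gaussian proposal at the mode (scalar state, logit
# observation density; all quantities are assumptions for illustration)
propose_params <- function(xs, Ws, y_t, F., Sigma){
  x_bar <- sum(Ws * xs)                     # weighted particle mean
  neg_log_dens <- function(x)
    -sum(dbinom(y_t, 1L, plogis(-3 + x), log = TRUE)) -
      dnorm(x, mean = F. * x_bar, sd = sqrt(Sigma), log = TRUE)
  opt <- optim(F. * x_bar, neg_log_dens, method = "BFGS", hessian = TRUE)
  # mean and variance of the Gaussian proposal used for every particle
  list(mu = opt$par, var = 1 / opt$hessian[1, 1])
}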
frm_fixed
## Surv(tstart, tstop, y) ~ wz(r_req_nn) + wz(r_niq_nn) + wz(r_ltq_nn) +
## wz(r_actq_lctq) + wz(sigma) + wz(excess_ret) + wz(rel_size) +
## wz(dtd) + log_market_ret + r1y + sp_w_c(r_niq_nn, 4) + sp_w_c(sigma,
## 4)
Fmat <- diag(c(0.948, 0.9793))
Q. <- matrix(c(0.05979, 0.011014,
               0.011014, 0.002504), byrow = TRUE, 2)

\[ \begin{align*} h\LP{E\LPV{y_{it}}{\vect z_t, \vect x_t, \vect u_{it}}} &= \vect\beta^\top\vect z_{it} + \vect x_t^\top\vect u_{it} \\ \vect x_t &\sim \mathcal N\LP{\mat F\vect x_{t-1}, \mat \Sigma} \end{align*} \]
library(dynamichazard)
set.seed(58201195)
system.time(
f1 <- PF_EM(
fixed = frm_fixed, random = ~ 1 + wz(rel_size),
data = distress, model = "logit", type = "VAR",
Fmat = Fmat, Q = Q., fixed_effects = fix_e,
control = PF_control(
N_fw_n_bw = 1000L, N_smooth = 2500L, N_first = 3000L,
N_smooth_final = 500L, n_threads = 4L, n_max = 1L, nu = 8L,
method = "bootstrap_filter", smoother = "Fearnhead_O_N"),
by = 1L, max_T = max(distress$tstop), id = distress$gvkey,
Q_0 = diag(1, 2)))
##    user  system elapsed
##   667.9   151.0   218.6
logLik(f1) # from particle filter
## 'log Lik.' -3592 (df=24)
logLik(base_fit) # glm model w/o random effects
## 'log Lik.' -3672 (df=17)
par(mar = c(5, 4, 1, 1))
plot(head(f1$effective_sample_size$smoothed_clouds, -1), type = "h",
ylab = "Effective sample size", xlab = "Time")par(mfcol = c(1, 2), mar = c(5, 4, 1, 1))
plot(f1, cov_index = 1, qlvls = c(.05, .95), qtype = "lines")
plot(f1, cov_index = 2, qlvls = c(.05, .95), qtype = "lines")

set.seed(58201195)
system.time(
f2 <- PF_EM(
fixed = frm_fixed, random = ~ 1 + wz(rel_size),
data = distress, model = "logit", type = "VAR",
Fmat = Fmat, Q = Q., fixed_effects = fix_e,
control = PF_control(
N_fw_n_bw = 1000L, N_smooth = 2500L, N_first = 3000L,
N_smooth_final = 500L, n_threads = 4L, n_max = 1L, nu = 8L,
method = "AUX_normal_approx_w_cloud_mean", smoother = "Fearnhead_O_N"),
by = 1L, max_T = max(distress$tstop), id = distress$gvkey,
Q_0 = diag(1, 2)))
##    user  system elapsed
##   840.3   177.9   272.2
logLik(f2)
## 'log Lik.' -3592 (df=24)
par(mar = c(5, 4, 1, 1))
plot(head(f2$effective_sample_size$smoothed_clouds -
f1$effective_sample_size$smoothed_clouds, -1), type = "h",
ylab = "Diff effective sample size", xlab = "Time")par(mfcol = c(1, 2), mar = c(5, 4, 1, 1))
plot(f2, cov_index = 1, qlvls = c(.05, .95), qtype = "lines")
plot(f2, cov_index = 2, qlvls = c(.05, .95), qtype = "lines")

set.seed(58201195)
system.time(
f3 <- PF_EM(
fixed = frm_fixed, random = ~ 1 + wz(rel_size),
data = distress, model = "logit", type = "VAR",
Fmat = Fmat, Q = Q., fixed_effects = fix_e,
control = PF_control(
N_fw_n_bw = 500L, N_smooth = 1L, N_first = 3000L, n_threads = 4L,
n_max = 1L, nu = 8L,
method = "AUX_normal_approx_w_cloud_mean", smoother = "Brier_O_N_square"),
by = 1L, max_T = max(distress$tstop), id = distress$gvkey,
Q_0 = diag(1, 2)))
##    user  system elapsed
##   644.6   168.1   218.0
logLik(f3)
## 'log Lik.' -3594 (df=24)
The PF_forward_filter function can be used to get a more precise log-likelihood estimate.
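For instance, the forward filter can be re-run on the returned fit with more particles; the particle-count argument names below are assumptions.

# re-run the forward filter with more particles to get a more precise
# log-likelihood estimate; the particle-count argument names are assumptions
fw <- PF_forward_filter(f3, N_fw = 10000L, N_first = 10000L)
logLik(fw)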
par(mar = c(5, 4, 1, 1))
plot(head(f3$effective_sample_size$smoothed_clouds -
f2$effective_sample_size$smoothed_clouds, -1), type = "h",
ylab = "Diff effective sample size", xlab = "Time")set.seed(58201195)
system.time(
f4 <- PF_EM(
fixed = frm_fixed, random = ~ 1 + wz(rel_size),
data = distress, model = "logit", type = "VAR",
Fmat = Fmat, Q = Q., fixed_effects = fix_e,
control = PF_control(
N_fw_n_bw = 1000L, N_smooth = 2500L, N_first = 3000L,
N_smooth_final = 500L, n_threads = 4L, n_max = 1L, nu = 8L,
method = "AUX_normal_approx_w_particles", smoother = "Fearnhead_O_N"),
by = 1L, max_T = max(distress$tstop), id = distress$gvkey,
Q_0 = diag(1, 2)))
##    user  system elapsed
##  3812.2   582.2  1175.0
logLik(f4)
## 'log Lik.' -3592 (df=24)
par(mar = c(5, 4, 1, 1))
plot(head(f4$effective_sample_size$smoothed_clouds -
f2$effective_sample_size$smoothed_clouds, -1), type = "h",
ylab = "Diff effective sample size", xlab = "Time")sapply(f4$effective_sample_size, mean) - sapply(f2$effective_sample_size, mean)## forward_clouds backward_clouds smoothed_clouds
## 1.917 2.223 -4.318
Different \(y_{it}\) and distributions \(g_t\LPV{\vect y_t}{\vect x_t}\).
Other types of state models.
\(\text{VAR}(q)\) with \(q > 1\), \(\text{VARMA}(p,q)\), restricted models, etc.
Compute the observed information matrix.
E.g., see Cappé, Moulines, and Ryden (2005, chap. 11) and Poyiadjis, Doucet, and Singh (2011).
Change the particle filter and smoothers.
Different proposal distributions (e.g., the unscented transformation and adaptive importance sampling), resample-moves, block sampling, and more.
A “nicer” API.
Better implementation of the smoother from Briers, Doucet, and Maskell (2009).
Slides are on github.com/boennecd/Talks and rpubs.com/boennecd/KU-PF-18.
More examples at github.com/boennecd/dynamichazard/tree/master/examples.
Doucet and Johansen (2011) and Givens and Hoeting (2012, chap. 6) have been used in preparation of the slides.
Briers, Mark, Arnaud Doucet, and Simon Maskell. 2009. “Smoothing Algorithms for State–space Models.” Annals of the Institute of Statistical Mathematics 62 (1): 61. doi:10.1007/s10463-009-0236-2.
Campbell, John Y., Jens Hilscher, and Jan Szilagyi. 2008. “In Search of Distress Risk.” The Journal of Finance 63 (6): 2899–2939. doi:10.1111/j.1540-6261.2008.01416.x.
Cappé, Olivier, Eric Moulines, and Tobias Ryden. 2005. Inference in Hidden Markov Models. Springer-Verlag New York.
Chava, Sudheer, and Robert A. Jarrow. 2004. “Bankruptcy Prediction with Industry Effects.” doi:10.2139/ssrn.287474.
Douc, R., and O. Cappe. 2005. “Comparison of Resampling Schemes for Particle Filtering.” In ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005., 64–69. doi:10.1109/ISPA.2005.195385.
Doucet, Arnaud, and Adam M Johansen. 2011. “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later.” In Handbook of Nonlinear Filtering, edited by Dan Crisan and Boris Rozovskii, 741–67. Oxford: Oxford University Press.
Fearnhead, Paul, David Wyncoll, and Jonathan Tawn. 2010. “A Sequential Smoothing Algorithm with Linear Computational Cost.” Biometrika 97 (2). [Oxford University Press, Biometrika Trust]: 447–64. http://www.jstor.org/stable/25734097.
Givens, Geof H, and Jennifer A Hoeting. 2012. Computational Statistics. 2nd ed. Wiley.
Gordon, N. J., D. J. Salmond, and A. F. M. Smith. 1993. “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation.” IEE Proceedings F - Radar and Signal Processing 140 (2): 107–13. doi:10.1049/ip-f-2.1993.0015.
Klaas, Mike, Mark Briers, Nando de Freitas, Arnaud Doucet, Simon Maskell, and Dustin Lang. 2006. “Fast Particle Smoothing: If I Had a Million Particles.” In Proceedings of the 23rd International Conference on Machine Learning, 481–88. ICML ’06. New York, NY, USA: ACM. doi:10.1145/1143844.1143905.
L’Ecuyer, Pierre, Richard Simard, E. Jack Chen, and W. David Kelton. 2002. “An Object-Oriented Random-Number Package with Many Long Streams and Substreams.” Operations Research 50 (6). INFORMS: 1073–5. http://www.jstor.org/stable/3088626.
Pitt, Michael K., and Neil Shephard. 1999. “Filtering via Simulation: Auxiliary Particle Filters.” Journal of the American Statistical Association 94 (446). [American Statistical Association, Taylor & Francis, Ltd.]: 590–99. http://www.jstor.org/stable/2670179.
Poyiadjis, George, Arnaud Doucet, and Sumeetpal S. Singh. 2011. “Particle Approximations of the Score and Observed Information Matrix in State Space Models with Application to Parameter Estimation.” Biometrika 98 (1). Biometrika Trust: 65–80. http://www.jstor.org/stable/29777165.
Shumway, Tyler. 2001. “Forecasting Bankruptcy More Accurately: A Simple Hazard Model.” The Journal of Business 74 (1). The University of Chicago Press: 101–24. http://www.jstor.org/stable/10.1086/209665.
Wood, Simon N., Yannig Goude, and Simon Shaw. 2015. “Generalized Additive Models for Large Data Sets.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 64 (1): 139–55. doi:10.1111/rssc.12068.
Zhou, Yan. 2015. “VSMC: Parallel Sequential Monte Carlo in C++.” Journal of Statistical Software, Articles 62 (9): 1–49. doi:10.18637/jss.v062.i09.
Comments
The smoothing step is \(\bigO{N^2}\) in the number of particles but its cost is independent of \(n_t = \lvert R_t \rvert\).
The cost can be reduced with the approximate methods in Klaas et al. (2006), which yield \(\bigO{N\log N}\) run times. Alternatively, rejection sampling can be used if \(f\LPV{\vect x_t}{\vect x_{t-1}} / \gamma_t\LP{\vect x_t}\) is bounded.
Cannot handle singular components.