Discrete state space models in survival analysis: a particle filter application

dummy slide

Talk overview

Motivation and model.

Generalized two-filter smoother and an extension.

Example.

Motivation

Longitudinal data set up to time \(d\) where individuals are censored or cease to exist.

Expect that slopes or the intercept may vary.

Want to extrapolate beyond time \(d\).

Methods are available in the R package dynamichazard.

Example: Firm defaults

Each month we have the firms that are listed in the prior month for a total of \(d\) months.

A typical model is a logistic regression conditional on having survived up to time \(t - 1\).

Firms may be censored (merged, split etc.) or default.

The intercept may be non-constant (unobservable macro effect).

Some of the slopes may vary.

Model

Observational equation

\[ y_{it} \sim g(\cdot \vert \eta_{it}), \qquad \eta_{it} = \bea^\top\vec{x}_{it} + (\mat{R}^\top\alf_t)^\top\vec{z}_{it} + o_{it} \]

\(t=1,\dots,d\), \(i\in \mathcal{R}_t\), \(g(\cdot \vert \eta_{it})\) is from the exponential family

\(\alf_t\) is an unobservable with dynamics

\[ \begin{split} \alf_t = \mat{F}(\thet) \alf_{t-1} + \mat{R}\epslon_t, \qquad & \epslon_t \sim N(\vec{0}, \mat{Q}) \\ & \alf_0 \sim N(\vec{a}_0, \mat{Q}_0) \end{split} \]

Example: conditional piece-wise constant exponential distribution

\(T_i\in[0,\infty)\) is the stop time and likelihood when \(T_i > t - 1\)

\[ \begin{split} y_{it} &= 1_{\{T_i \in (t-1, t]\}}, \quad o_{it} = \log(\min\{t, T_i\} - (t - 1)) \\ P(y_{it} \vert \eta_{it}) &= \exp(y_{it}\eta_{it} - \exp(\eta_{it})) \end{split} \]

with AR(2) process

\[ \begin{split} \vxi_t &= \mat{R}^\top \alf_t, \quad \qquad \alf_t = (\vxi_t^\top, \vxi_{t - 1}^\top)^\top \\ \mat{R} &= \begin{pmatrix} \mat{I}_r & \mat{0} \\ \mat{0} & \mat{0} \end{pmatrix}, \qquad \mat{F} = \begin{pmatrix} \Thet_1 & \Thet_2 \\ \mat{I}_r & \mat{0} \end{pmatrix} \end{split} \]

Particle filtering and smoothing

Idea of generalized two-filter smoother

  • Run a forward particle filter.
  • Run a backward particle filter.
  • Combine the two.

Foward particle filter

Idea: sequential importance sampling

Issue

Want smoothed approximation

\[P(\alf_{(t-1):t} \vert \vec{Y}_{1:d})\]

and not

\[P(\alf_{(t-1):t} \vert \vec{Y}_{1:t})\]

Generalized two sample filter

Need backward filter approximation

\[P(\alf_t \vert \vec{Y}_{t:d}) \propto \gamma_t(\alf_t)P(\vec{Y}_{t:d} \vert \alf_t )\]

where the artificial probability distributions is

\[\begin{split} \gamma_t(\alf_t) &= \int_{\mathbb{R}^q} f(\alf_t\vert\alf_{t-1}) \gamma_{t - 1}(\alf_{t-1})d\alf_{t-1} \\ & = N(\mat{F}(\thet)^{t}\vec{a}_0, \mat{Q}_0+\mat{F}(\thet)^{t\top}\mat{Q}\mat{F}(\thet)^{t}) \end{split}\]

Combining: Briers, Doucet, and Maskell (2009)

Idea: re-weight particles using

\[\hat{w}^{(i)}_t = \frac{\overleftarrow{w}_t^{(i)}}{\gamma_t(\overleftarrow{\alf}_t^{(i)})} \sum_{j = 1}^N f( \overleftarrow{\alf}_t^{(i)} \vert \alf_{t - 1}^{(j)})w_{t - 1}^{(j)}\]

Issue: \(\bigO{N^2}\)

Combining: Fearnhead, Wyncoll, and Tawn (2010)

Sample \(i_k\) using \(\overleftarrow{w}_{t + 1}^{(1)}, \dots \overleftarrow{w}_{t + 1}^{(N)}\).

Sample \(j_k\) using \(w_{t - 1}^{(1)}, \dots, w_{t - 1}^{(N)}\).

Sample \(\hat{\alf}_t^{(k)} \sim \hat{q}(\cdot \vert \overleftarrow{\alf}_{t + 1}^{(i_k)}, \alf_{t - 1}^{(j_k)})\).

Set

\[ \hat{w}_t^{(k)} \propto \frac{ f(\hat{\alf}_t^{(k)} \vert \alf_{t-1}^{(j_k)}) g(\vec{y}_t\vert\hat{\alf}_t^{(k)}) f(\overleftarrow{\alf}_{t + 1}^{(j_k)} \vert \hat{\alf}_t^{(k)}) }{ \hat{q}(\hat{\alf}_{t + 1}^{(k)} \vert \overleftarrow{\alf}_{t + 1}^{(i_k)}, \alf_{t - 1}^{(j_k)}) \gamma_{t + 1}(\overleftarrow{\alf}_{t + 1}^{(i_k)}) } \]

Example: Randomized clinical trial

Longitudinal data with 467 patients and 188 deaths with a cohort followed for around 20 months.

  • drug: factor for the used drug.
  • AZT: whether intolerant to AZT or AZT failed.
  • prevOI: had opportunistic AIDS infection at \(t=0\).

See e.g., Guo and Carlin (2004) for an analysis.

Constant hazard model

sur_fit <- survreg(
  Surv(stop, event) ~ AZT + gender + drug + prevOI, aids,
  dist= "exponential")
summary(sur_fit)$table[, c("Value", "Std. Error", "p")]
#R>                Value Std. Error                   p
#R>  (Intercept)  4.2346     0.2897             < 1E-16
#R>  AZTfailure  -0.1649     0.1629  0.3115384496036516
#R>  gendermale   0.3388     0.2451  0.1669604211575632
#R>  drugddI     -0.2080     0.1464  0.1552720002242361
#R>  prevOIAIDS  -1.2390     0.2264  0.0000000440474652

Cox proportional hazards models

cox_fit <- coxph(
  Surv(stop, event) ~ AZT + gender + drug + prevOI, aids)
summary(cox_fit)$coefficients[, c("coef", "z", "Pr(>|z|)")]
#R>                coef       z      Pr(>|z|)
#R>  AZTfailure  0.1575  0.9635 0.33529247356
#R>  gendermale -0.3419 -1.3928 0.16367711695
#R>  drugddI     0.2170  1.4818 0.13839710893
#R>  prevOIAIDS  1.2920  5.6924 0.00000001252

Cumulative hazard function

Blue: constant hazard model. Black: non-parametric baseline hazard from cox model.

\(\bigO{N^2}\) smoother

set.seed(33706242)
system.time(
  fit_Brier <- PF_EM(
    Surv(stop, event) ~ ddFixed(AZT) + ddFixed(gender) + ddFixed(drug) +
      ddFixed(prevOI),
    aids, model = "exponential", by = .5, max_T = 20, Q = .33, Q_0 = 1,
    id = aids$patient,
    control = PF_control(
      n_threads = 4, N_fw_n_bw = 1000, N_first = 1000,
      N_smooth = 1,                  # Does not matter with Brier_O_N_square
      smoother = "Brier_O_N_square", # Select smoother
      eps = .001, n_max = 100)))
#R>     user  system elapsed 
#R>   2406.9   369.1   893.1

\(\bigO{N^2}\) smoother

Outer crosses are 90% confidence limits, inner crosses are the mean, dashed line is the log of the baseline hazard in the exponential distributed model.

\(\bigO{N^2}\) smoother

mean(fit_Brier$effective_sample_size$smoothed_clouds)
drop(sqrt(fit_Brier$Q))
fit_Brier$fixed_effects
fit_Brier$n_iter
#R>  [1] 576.3
#R>  [1] 0.1373
#R>  [1]  0.1574 -0.3055  0.2236  1.3215
#R>  [1] 55

\(\bigO{N^2}\) smoother

\(\bigO{N}\) smoother

set.seed(33706242)
system.time(
  fit_Fearn <- PF_EM(
    Surv(stop, event) ~ ddFixed(AZT) + ddFixed(gender) + ddFixed(drug) +
      ddFixed(prevOI),
    aids, model = "exponential", by = .5, max_T = 20, Q = .33, Q_0 = 1,
    id = aids$patient,
    control = PF_control(
      n_threads = 4, N_fw_n_bw = 250, N_first = 1000,
      N_smooth = 5000,            # Now it matters
      smoother = "Fearnhead_O_N", # Select smoother
      eps = .001, n_max = 100)))
#R>     user  system elapsed 
#R>   1605.0   754.8   704.4

\(\bigO{N}\) smoother

Blue: \(\bigO{N^2}\) smoother. Black: \(\bigO{N}\) smoother.

\(\bigO{N}\) smoother

mean(fit_Fearn$effective_sample_size$smoothed_clouds)
drop(sqrt(fit_Fearn$Q))
fit_Fearn$n_iter
#R>  [1] 2533
#R>  [1] 0.138
#R>  [1] 48

Two random effects

set.seed(33706242)
system.time(
  fit_Fearn_two <- PF_EM(
    Surv(stop, event) ~ ddFixed(AZT) + ddFixed(gender) + drug + ddFixed(prevOI),
    aids, model = "exponential", by = .5, max_T = 20, 
    Q = diag(.33, 2), Q_0 = diag(1, 2),
    id = aids$patient,
    control = PF_control(
      n_threads = 4, N_fw_n_bw = 250, N_first = 1000,
      N_smooth = 5000,  smoother = "Fearnhead_O_N", eps = .001, n_max = 100)))
#R>     user  system elapsed 
#R>     2916    1387    1480

Intercept

Drug dummy

Dashed line is the coefficient in the exponential distributed model.

Co-variance matrix

fit_Fearn_two$Q
#R>           [,1]     [,2]
#R>  [1,]  0.02292 -0.01339
#R>  [2,] -0.01339  0.02464
cov2cor(fit_Fearn_two$Q)
#R>          [,1]    [,2]
#R>  [1,]  1.0000 -0.5633
#R>  [2,] -0.5633  1.0000
mean(fit_Fearn_two$effective_sample_size$smoothed_clouds)
#R>  [1] 992

Comments

Ignored re-sampling, choice of proposal distribution and more. Different options are available in dynamichazard packages.

Only supports a random walk. More general models is available soon.

Code on github.com/boennecd/Talks and presentation on rpubs.com/boennecd/Nordstat2018.

References

Briers, Mark, Arnaud Doucet, and Simon Maskell. 2009. “Smoothing Algorithms for State–space Models.” Annals of the Institute of Statistical Mathematics 62 (1): 61. doi:10.1007/s10463-009-0236-2.

Fearnhead, Paul, David Wyncoll, and Jonathan Tawn. 2010. “A Sequential Smoothing Algorithm with Linear Computational Cost.” Biometrika 97 (2). [Oxford University Press, Biometrika Trust]: 447–64. http://www.jstor.org/stable/25734097.

Guo, Xu, and Bradley P. Carlin. 2004. “Separate and Joint Modeling of Longitudinal and Event Time Data Using Standard Computer Packages.” The American Statistician 58 (1). [American Statistical Association, Taylor & Francis, Ltd.]: 16–24. http://www.jstor.org/stable/27643494.