dynamichazard Dynamic Hazard Models using State Space Models

Benjamin Christoffersen
28/07/2017

Presentation

  • Model
  • Hard disk failures
  • Estimation methods
  • End comments

Presentation: Focus

Survival analysis with time-varying coefficients

Continuous time model

Fast and scalable

Model

Individual \( i = 1,\dots,n \)

Failure times \( T_1,\dots,T_n\in \mathbb{R}_+ \)

Timer intervals \( t = 1, \dots, d \)

Binary outcomes \( y_{it} = \left\{ \begin{matrix} 1 & T_i \in (t-1, t] \\ 0 & \text{otherwise} \end{matrix}\right. \)

Censoring indicators \( D_{i1},\dots,D_{i2},\dots \) (one if censored)

Covariates \( \vec{x}_{i1}, \vec{x}_{i2}, \dots \)

Model

Observational model

\[ \begin{array}{c} \proppCond{Y_{it} = 1}{ \vec{\alpha}_t} = h(\vec{\alpha}_t^\top \vec{x}_{it}) \\ h(\eta) = 1 / (1 + \exp(-\eta)) \end{array} \]

State model

\[ \begin{array}{ll} \vec{\alpha}_{t + 1} = \mat{F}\vec{\alpha}_t + \mat{R}\vec{\eta}_t \qquad & \vec{\eta}_t \sim N(\vec{0}, \mat{Q}) \\ & \vec{\alpha}_{0} \sim N(\vec{a}_0, \mat{Q}_0) \end{array} , \qquad t = 1,\dots, d \]

  1. and 2. order random walk

Hard disk failures

BackBlaze logo

Storage provider

More than 65000 hard disks Today

3 years of data (2013Q2‒2016Q3)

84366 observations and 623268 data rows

Vault image

Hard disk failures

hd_dat[
  1:10, c("serial_number", "model", "tstart", 
          "tstop", "fails", "smart_12")]
    serial_number        model tstart tstop fails smart_12
505      5XW004AJ ST31500541AS   30.0  40.0     0        0
506      5XW004AJ ST31500541AS   40.0  43.2     0       24
507      5XW004AJ ST31500541AS   43.2  56.9     0       25
508      5XW004Q0 ST31500541AS   40.6  51.0     0        0
509      5XW004Q0 ST31500541AS   51.0  53.7     0       54
510      5XW004Q0 ST31500541AS   53.7  54.1     0       56
511      5XW004Q0 ST31500541AS   54.1  54.4     0       57
512      5XW004Q0 ST31500541AS   54.4  54.5     0       58
513      5XW004Q0 ST31500541AS   54.5  54.7     0       59
514      5XW004Q0 ST31500541AS   54.7  57.2     0       61

Hard disk failures

Number of hard disks Number of failures
ST4000DM000 36286 1556
HMS5C4040BLE640 9333 40
HMS5C4040ALE640 7162 93
ST8000DM002 5155 17
HDS722020ALA330 4774 230
ST3000DM001 4698 1701

Top 6 hard disk versions by number of disks

Estimation methods: Log likelihood

\[ L(\mat{Q},\mat{Q}_0, \vec{a}_0) = p(\vec{\alpha}_0)\prod_{t = 1}^d \proppCond{\vec{\alpha}_t}{\vec{\alpha}_{t-1}} \prod_{i \in R_t} \proppCond{y_{it}}{\vec{\alpha}_t} \]

Risk set \( R_t = \Lbrace{i \in \{1,\dots,n\}: D_{it} = 0} \)

Estimation methods: Log likelihood

\[ \begin{aligned} \log L \Lparen{\mat{Q},\mat{Q}_0, \vec{a}_0} = & - \frac{1}{2} \Lparen{\vec{\alpha}_0 - \vec{a}_0}^\top \mat{Q}^{-1}_0\Lparen{\vec{\alpha}_0 - \vec{a}_0} \\ & - \frac{1}{2} \sum_{t = 1}^d \Lparen{\vec{\alpha}_t - \mat{F}\vec{\alpha}_{t - 1}}^\top\mat{R}^\top\mat{Q}^{-1}\mat{R}\Lparen{\vec{\alpha}_t - \mat{F}\vec{\alpha}_{t - 1}} \\ & - \frac{1}{2} \log \deter{\mat{Q}_0} - \frac{1}{2d} \log \deter{\mat{Q}} \\ & + \sum_{t = 1}^d \sum_{i \in R_t} l_{it}({\vec{\alpha}_t}) + \dots \end{aligned} \]

Estimation methods: Log likelihood

E-step

Find smoothed estimates of \( \vec{\alpha}_0,\dots, \vec{\alpha}_d \) and smoothed covariance matrices given \( \mat{Q} \), \( \mat{Q}_0 \) and \( \vec{a}_0 \)

M-step

Update \( \mat{Q} \) and \( \vec{a}_0 \)

References

Due to Fahrmeir (1992) and Fahrmeir (1994). Like Shumway and Stoffer (1982)

Estimation methods: E-step

Filtering

  • Extended Kalman Filter (EKF)
  • Unscented Kalman filter
  • Sequential approximation of posterior modes
  • Mode estimation

Smoothing

Rauch-Tung-Striebel algorithm

Estimation methods: Kalman Filter (KF)

\[ \emNotee{\vec{a}}{t}{s} = \expecpCond{\vec{\alpha}_t}{\vec{y}_1,\dots,\vec{y}_s}, \qquad \emNotee{\mat{V}}{t}{s} = \expecpCond{\mat{V}_t}{\vec{y}_1,\dots,\vec{y}_s} \]

Prediction step

Compute \( \emNotee{\vec{a}}{t}{t - 1} \) & \( \emNotee{\mat{V}}{t}{t - 1} \) with \( \emNotee{\vec{a}}{t - 1}{t - 1} \) and \( \emNotee{\mat{V}}{t - 1}{t - 1} \)

Correction step

Compute \( \emNotee{\vec{a}}{t}{t} \) & \( \emNotee{\mat{V}}{t}{t} \) with \( \emNotee{\vec{a}}{t}{t-1} \), \( \emNotee{\mat{V}}{t}{t - 1} \) and \( \vec{y}_t \)

Estimation methods: Extended Kalman Filter

Same prediction step as KF

Make 1. order Taylor expansion around \( \emNotee{\vec{a}}{t}{t - 1} \)

Same as one Fisher scoring step in:

\[ \begin{array}{c} \emNotee{\vec{a}}{t}{t} = \argminu{\vec{\alpha}} - \log \proppCond{\vec{\alpha}}{\emNotee{\vec{a}}{t}{t-1}, \emNotee{\mat{V}}{t}{t-1}} -\sum_{i \in R_t} \log\proppCond{y_{it}}{\vec{\alpha}} \\ % \vec{\alpha} \sim N\Lparen{\emNotee{\vec{a}}{t}{t-1}, \emNotee{\mat{V}}{t}{t-1}} \end{array} \]

Essentially L2 penalized generalized linear models

Estimation methods: mode approximation

Mode approximation: Minimize directly using Newton Raphson

EKF with extra iteration: More iterations using working responses as glm

Example: Only factors

library(dynamichazard)

frm <- Surv(tstart, tstop, fails) ~ -1 + model

system.time(
  ddfit <- ddhazard(
    formula = frm, data = hd_dat,
    id = hd_dat$serial_number,
    Q_0 = diag(1, 17), Q = diag(.01, 17),
    by = 1, max_T = 60,
    control = list(
      method = "EKF",
      eps = .005,
      NR_eps = .00001)))
   user  system elapsed 
   17.0     1.2    10.9 

Example: Only factors

plot of chunk plot_only_factors

Example: Only factors

plot of chunk plot_only_factors_1

See this blog post for explanation

Example: Slope

Storage pod

Example: Slope

frm <- Surv(tstart, tstop, fails) ~ -1 + model + smart_12 : model_large

Example: Slope

plot of chunk est_w_slope_est_n_plot

Example: Slope

  • First failure: Replace and rebuilt
  • Extra test if another fails doing rebuilt
  • Removal due to extra tests are not recorded!

Example: Slope

Failure status

Example: Slope

plot of chunk est_w_slope_manual_check_n_xtra_plot

Other options

  • S3 methods for plots, residuals, predict etc.
  • Second order random walk
  • Bootstrap
  • Fixed effects
  • Continuous time observational model

Slides at rpubs.com/boennecd/EMS17 and code at github/boennecd/Talks

Demo

dynamichazard::ddhazard_app()

References

[1] L. Fahrmeir. “Dynamic Modelling and Penalized Likelihood Estimation for Discrete Time Survival Data”. In: Biometrika 81.2 (1994), pp. 317-330.

[2] L. Fahrmeir. “Posterior Mode Estimation by Extended Kalman Filtering for Multivariate Dynamic Generalized Linear Models”. In: Journal of the American Statistical Association 87.418 (1992), pp. 501-509.

[3] R. H. Shumway and D. S. Stoffer. “An Approach to Time Series Smoothing and Forecasting Using the Em Algorithm”. In: Journal of Time Series Analysis 3.4 (1982), pp. 253-264. ISSN: 1467-9892. DOI: 10.1111/j.1467-9892.1982.tb00349.x. .