Technical Note of recurrentR

Introduction

The package recurrentR implements the statistical inference publushed in the following papers:

TODO: Describe some background of non-parametric analysis of recurrent event data here.

This technical note unifies the mathematical notations in these three paper and describes the implemented mathematical formulas closely.

Data

Without loss of generality, we assume that there is a dataset of recurrent event data containing \(n\) instances. Each instance, \(i\), includes the following fields:

The field \(X_i(t)\) is required for model of (C.-Y. Y. Huang, Qin, and Wang 2010). The user could omit this field for model (M-C., Qin, and Chiang 2001) and (C.-Y. Huang and Wang 2004).

In recurrentR, the data is stored in a S4-class object: recurrent-data. The following is the structure of recurrent-data with 100 instances, named obj:

Formal class 'recurrent-data' [package "recurrentR"] with 6 slots
  ..@ W  : num [1:100, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ y  : num [1:100] 9.1729 8.8428 10 10 0.0597 ...
  ..@ t  :List of 100
  .. .. [list output truncated]
  ..@ X  :List of 100
  .. .. [list output truncated]
  ..@ T_0: num 10
  ..@ D  : logi [1:100] TRUE TRUE FALSE FALSE TRUE TRUE ...

The name of the slot is consistent to the variable name described above. For example, for instance 1:

The user could create the object with the following function:

str(create_recurrent_data)
obj <- create_recurrent_data()
obj

Usage

(M-C., Qin, and Chiang 2001)

For each instance \(i\), the occurrence of recurrent event follows a inhomogenous poisson process with the following intensity:

\[\lambda_i(t) = \lambda_0(t) z_i exp(W_i \gamma)\]

where:

In recurrentR:

library(recurrentR)
Wang2001(obj)

(C.-Y. Huang and Wang 2004)

The intensity is the same:

\[\lambda_i(t) = \lambda_0(t) z_i exp(W_i \gamma)\]

where:

Moreover, the hazard function of the censor time is modeled as

\[h_i(t) = h_0(t) z_i exp(W_i \alpha)\]

where:

Conditional on \((W_i, z_i)\), \(N_i(.)\) and \(y_i\) are independent.

library(recurrentR)
Huang2004(obj)

(C.-Y. Y. Huang, Qin, and Wang 2010)

The intensity is:

\[\lambda_i(t) = \lambda_0(t) z_i exp(X_i(t) \beta + \gamma W_i)\]

where:

Conditional on \((W_i, z_i, X_i)\), \(N_i(.)\) and \(y_i\) are independent.

library(recurrentR)
Huang2010(obj)

Implementation Details

(M-C., Qin, and Chiang 2001)

The inference are all included in the output of Wang2001.

library(recurrentR)
result <- Wang2001(obj)

Recall that \[\lambda_i(t) = \lambda_0(t) z_i exp(W_i \gamma)\] and \[\Lambda_0(t) = \int_0^t \lambda_0(u) du\]. The nonparametric maximal likelihood estimator \(\hat{\Lambda}_0(t)\) is:

\[\hat{\Lambda}_0(t) = \prod_{s_{(l)} > t}(1 - \frac{d_{(l)}}{R_{(l)}})\]

where:

The user can obtain \(\hat{\Lambda}_0(t)\):

str(result$Lambda.hat)
result$Lambda.hat(rexp(10))

The \(\hat{\gamma}\) is estimated by solving the following equation:

\[\frac{1}{n} \sum_{i=1}^n w_i \bar{W}_i^T ( \frac{m_i}{\hat{\Lambda}_0(y_i)} - exp(\bar{W}_i \bar{\gamma}) = 0 \in \mathbb{R}^{1 \times (q+1)},\]

where:

(M-C., Qin, and Chiang 2001) provides the best \(w_i\) to estimate \(\gamma\), but it involves \(\gamma\) which might produce potential instability. In recurrentR, we let \(w_i = 1\). If the instance has \(\hat{\Lambda}_0(y_i) = 0\), we will let \(\frac{m_i}{\hat{\Lambda}_0(y_i)} = 0\) as usual convention.

Let \[V(\bar{\gamma}) = \frac{1}{n} \sum_{i=1}^n {\bar{W}_i^T ( \frac{m_i}{\hat{\Lambda}_0(y_i)} - exp(\bar{W}_i \bar{\gamma})},\]

Then \[\frac{dV}{d\bar{\gamma}}(\bar{\gamma}) = \frac{-1}{n} \sum_{i=1}^n{\bar{W}_i^T \bar{W}_i exp(\bar{W}_i \bar{\gamma})}\]

The recurrentR solves \(V(\bar{\gamma}) = 0\) with the function nleqslv from the package nleqslv. The \(\hat{\bar{\gamma}}\) could be accessed by:

result$gamma.bar.hat

The \(\hat{\gamma}\) is:

result$gamma.hat

recurrentR provides both bootstrap and asymptotic variance estimator. The user could choose one by passing the argument method in function Huang2001.

result <- Wang2001(obj, method = "asymptotic")
result <- Wang2001(obj, method = "bootstrap")
result$gamma.bar.hat.var
result$gamma.hat.var
str(result$Lambda.hat.var)

The default method is (TODO)

To calculate the asymptotic variance of \(\hat{\gamma}\) and \(\hat{\Lambda}_0(t)\), we need the following formulas given \(\hat{\bar{\gamma}}\) and \(\hat{\Lambda}_0(t)\):

According to (M-C., Qin, and Chiang 2001), the asymptotic variacne of \(\hat{\gamma}\) is \(\frac{1}{n} \sum_{i=1}^n{\hat{f}_i(\hat{\gamma})}\). The asymptotic variance of \(\hat{\Lambda}_0(t)\) is \(\frac{\hat{\Lambda}_0(t)}{n} \sum_{i=1}^n{\hat{b}_i(t)^2}\)

(C.-Y. Huang and Wang 2004)

The estimator and asymptotic variance related to \(\Lambda_0\) and \(\gamma\) are the same as the one in (M-C., Qin, and Chiang 2001). To obtain the estimator of \(\alpha\) and \(H_0(t) = \int_0^t h(u) du\), we need the estimator of random effect \(z_i\) first:

\[\hat{z}_i = \frac{m_i}{\hat{\Lambda}_0(y_i) exp(W_i \hat{\gamma)}}.\]

Let \[U(\alpha) = \frac{1}{n} \sum_{i=1}^n {D_i W_i^T \frac{\sum_{j=1}^n{W_j^T \hat{z}_j exp(W_j \hat{\alpha}) I(y_j \geq y_i)}}{\sum_{j=1}^n{\hat{z}_j exp(W_j \hat{\alpha})I(y_j \geq y_i)}} },\]

Then \(\hat{\alpha}\) is the one satisfies \(U(\hat{\alpha}) = 0\).

Moreover, Let \[\Gamma(\alpha) = \frac{dU}{d\alpha}(\alpha) = \frac{1}{n} \sum_{i=1}^n{D_i(-\frac{\sum_{j=1}^n{W_j^2 \hat{z}_j exp( W_j \alpha ) I(y_j \geq y_i)}}{\sum_{j=1}^n{\hat{z}_j exp( W_j \alpha ) I(y_j \geq y_i) }} + \frac{(\sum_{j=1}^n{W_j \hat{z}_j exp( W_j \alpha ) I(y_j \geq y_i) })^2}{(\sum_{j=1}^n{\hat{z}_j exp( W_j \alpha ) I(y_j \geq y_i)})^2})},\]

Then we can solve \(\hat{\alpha}\) with nleqslv again. Note that \(a^2\) is the convention of \(a^T a\) if \(a\) is a vector.

With \(\hat{\alpha}\), the \(\hat{H}_0(t)\) will be:

\[\hat{H}_0(t) = \sum_{i=1}^n{D_i I(y_i \leq t) \frac{1}{\sum_{j=1}^n{\hat{z}_j exp(W_j \alpha) I(y_j \geq y_i)}}}.\]

result <- Huang2004(obj)
result$alpha.hat
str(result$H0.hat)

To evaluate the asymptotic variance, we need:

According to (C.-Y. Huang and Wang 2004), the estimator of asymptotic variance of \(\alpha\) will be:

\[\frac{1}{n} \Gamma(\hat{\alpha})^{-1} \hat{\Sigma}(\hat{\alpha}) \Gamma(\hat{\alpha})^{-1}.\]

For the asymptotic variance of \(\hat{H}_0(t)\), we need

Then the estimator of asymptotic variance of \(\hat{H}_0(t)\) is the sample variance of \(\frac{1}{n} \phi_i(t)\).

Huang2010

Recall that the intensity is:

\[\lambda_i(t) = z_i \lambda_0(t) exp(X_i(t) \beta + W_i \gamma)\]

The estimator of \(\hat{\beta}\) does not involve \(W_i\) and \(\gamma\).

The derivative of logged pairwise pseudolikelihood is:

\[g_{i,j}(\beta) = \sum_{k=1}^{m_i}{ \sum_{l=1}^{m_j}{ I(t_{i,k} \leq y_{i,j}) I(t_{j,l} \leq y_{i,j}) \rho_{i,j}(t_{i,k}, t_{j,l}) \frac{- exp(\rho_{i,j}(t_{i,k}, t_{j,l}) \beta)}{1 + exp(\rho_{i,j}(t_{i,k}, t_{j,l}) \beta)} } },\]

where

Let \[S(\beta) = \frac{1}{\left(\begin{array}{c} n \\ 2 \end{array}\right)} \sum_{i < j}{g_{i,j}(\beta)},\]

Then \[\frac{dS}{d\beta}(\beta) = \sum_{i<j} {\frac{dg_{i,j}}{d\beta}(\beta)} ,\]

where:

The \(\hat{\beta}\) is the one satisfies \(S(\beta) = 0\).

To obtain the asymptotic variance, we need:

Recall that in (M-C., Qin, and Chiang 2001), the \(\hat{\Lambda}_0(t)\) is based on:

\[\hat{\Lambda}_0(t) = \prod_{s_{(l)} > t}(1 - \frac{d_{(l)}}{R_{(l)}})\]

where:

To correct the effect of time-dependent covariates \(X(t)\) and \(\beta\), we let

\[\hat{\Lambda}_0(t, \beta) = \prod_{s_{(l)} > t}(1 - \frac{d_{(l)}(\beta)}{R_{(l)}(\beta)}),\]

where:

Note that \(d_{(l)}(0) = d_{(l)}\) and \(R_{(l)}(0) = R_{(l)}\).

The asymptotic variance of \(\hat{\Lambda}_0(t, \beta)\) will be

\[4 \Lambda_0(t)^2 E\left[\kappa_{1,2} (t, \beta) \kappa_{1, 3}( t, \beta)\right].\]

The estimator of \(\hat{\gamma}\) is the root of the following equation:

\[\frac{1}{n} \sum_{i=1}^n{\bar{W}_i \left[\frac{m_i}{\sum_{s_{(l)} \leq y_i}{exp(X_i(s_{(l)}) \hat{\beta})(\hat{\Lambda}_0(s_{(l)}) - \hat{\Lambda}_0(s_{(l-1)})})} - exp(\bar{W}_i \bar{\gamma})\right]},\]

where:

The asymptotic variance of \(\hat{\gamma}\) will be:

\[E(\frac{d\xi}{d\gamma})^{-1} \Sigma E(\frac{d\xi}{d\gamma})^{-1}\].

To obtain the definition of \(\kappa\), we need:

Then

\[\hat{\kappa}_{i,j}(t) = \hat{\phi}_{i,j}(t) + \frac{\hat{\psi}_i(t) + \hat{\psi}_j(t)}{2}\]

The estimator of asymptotic variance of \(\hat{\Lambda}_0(t)\) will be

\[4 \hat{\Lambda_0}(t)^2 \left(\frac{2}{n(n-1)(n-2)} \sum_{i=1}^n{ \sum_{j \neq i, k \neq i, j < k}{ \hat{\kappa}_{i,j}(t)\hat{\kappa}_{i,k}(t) } }\right)\]

The asymptotic variance of \(\hat{\gamma}\) involves:

Therefore, the estimator of asymptotic variance of \(\hat{\bar{\gamma}}\) is:

\[TODO\]

Reference

Huang, C.-Y. Y., J. Qin, and M.-C. C. Wang. 2010. “Semiparametric analysis for recurrent event data with time-dependent covariates and informative censoring.” Biometrics 66 (1) (mar 12): 39–49. doi:10.1111/j.1541-0420.2009.01266.x. http://dx.doi.org/10.1111/j.1541-0420.2009.01266.x.

Huang, Chiung-Yu, and Mei-Cheng Wang. 2004. “Joint Modeling and Estimation for Recurrent Event Processes and Failure Time Data.” Journal of the American Statistical Association 99: 1153–1165. http://EconPapers.repec.org/RePEc:bes:jnlasa:v:99:y:2004:p:1153-1165.

M-C., Wang, J. Qin, and C.-T. Chiang. 2001. “Analyzing Recurrent Event Data With Informative Censoring.” Journal of the American Statistical Association 96: 1057–1065. http://EconPapers.repec.org/RePEc:bes:jnlasa:v:96:y:2001:m:september:p:1057-1065.