Multiply robust estimation in causal survival analysis with treatment noncompliance

Introduction

This vignette demonstrates the use of the R package mrPStrata to assess principal survival causal effects using the multiply robust estimator that developed in Cheng et al (2023). The mrPStrata package is available at https://github.com/chaochengstat/mrPStrata and can be installed as follows

devtools::install_github("chaochengstat/mrPStrata")

Definition: Principal Survival Causal Effect

Let $X$ denote a vector of baseline covariates, $Z\in{0,1}$ denote the treatment assignment status (1, treated; 0, control), $S\in{0,1}$ denote the actual treatment receipt status, and $T$ denote a failure outcome of interest. Usually, $T$ is partially observed due to right censoring at time $C$, and we only observe $\{U,\delta\}$ instead of $U$, where $U=\{T,C\}$ is the observed failure time and $\delta=\mathbb{I}(T\leq C)$ is a censoring indicator. Let $T(z)$ and $S(z)$ be the potential values of $T$ and $S$, respectively, under treatment assignment status $Z=z$. We can define four principal strata based on the joint potential values of $S$, denoted by $G=(S(1),S(0))$:

Principal Stratum ($G$)	group name	abbreviation
$(1,1)$	always takers	a
$(1,0)$	compliers	c
$(0,0)$	never takers	n
$(0,1)$	defiers	d

We assume standard monotonicity to rule out the defiers stratum. For a given principal strata $g\in\{a,c,n\}$, we aim to estimate the following principal survival causal effect (PSCE): \[ \text{PSCE}_g(u) =\mathbb{P}(T(1)\geq u|g) - \mathbb{P}(T(0)\geq u|g) =: \mathcal S_{1,g}(u) - \mathcal S_{0,g}(u) \] where $\mathcal S_{z,g}(u)$ is the counterfactual survival probability of the outcome at a given time $u$ under treatment assignment $Z=z$, within principal stratum $g$. In this package, we shall follow the results in Cheng et al (2023), which leverage the principal ignorability assumption along with the monotonicity to identify PSCE.

Multiply robust estimation

In Cheng et al (2023), we develop a multiply robust estimator of the principal survival causal effects: \[ \widehat{\text{PSCE}}_g^{\text{mr}}(u) = \widehat{\mathcal S}_{1,g}^{\text{mr}}(u) - \widehat{\mathcal S}_{0,g}^{\text{mr}}(u) \] We specify the following parametric models for the four nuisance functions $\mathbb{P}(Z=1|X)$, $\mathbb{P}(S=1|Z,X)$, $\mathbb{P}(C\geq t|Z,S,X)$, and $\mathbb{P}(T\geq t|Z,S,X)$:

Index	Nuisance function	name	parametric model
1	$\mathbb{P}(Z=1\|X)$	propensity score	logistic regression
2	$\mathbb{P}(S=1\|Z,X)$	principal score	logistic regression
3	$\mathbb{P}(C\geq t\|Z,S,X)$	conditional survival function of the censoring time	Cox proportional hazard model
4	$\mathbb{P}(T\geq t\|Z,S,X)$	conditional survival function of the failure time	Cox proportional hazard model

It should be noted that, in the principal score model, we fit two logistic regressions adjusting for $X$, conditioned on subsamples with $Z=1$ and $Z=0$, respectively. Similarly, for the conditional survival functions of censoring time and failure time, we fit four Cox models adjusting for $X$, conditioned on subsamples with $(Z,S) \in \{(1,1),(1,0),(0,1),(0,0)\}$, respectively.

The multiply robust estimator is consistent under three types of misspecifications of the parametric models. Specifically, we show $\widehat{\mathcal S}_{z,g}^{\text{mr}}(u)$ (and $\widehat{\text{PSCE}}_g^{\text{mr}}(u)$) is consistent under either one of the following three scenarios regarding correct ($\checkmark$) specification of parametric working models:

Nuisance model	Scenario 1	Scenario 2	Scenario 3
$\mathbb{P}(Z=1\|X)$	$\checkmark$	$\checkmark$
$\mathbb{P}(S=1\|Z,X)$	$\checkmark$		$\checkmark$
$\mathbb{P}(C\geq t\|Z,S,X)$	$\checkmark$
$\mathbb{P}(T\geq t\|Z,S,X)$		$\checkmark$	$\checkmark$

Basic Syntax

The data-fitting function is mrPStrata:

mrPStrata(times,data,Xpi_names,Xe_names,Xc_names,Xt_names,Z_name,S_name,U_name,delta_name,B)

We require input the following arguments:

times: a vector of time when the principal survival causal effects (PSCEs) are of interest
data: the dataset
Xpi_names: names of the covariates for the propensity score model $\mathbb{P}(Z=1|X)$
Xe_name: names of the covariates for the principal score model $\mathbb{P}(S=1|Z,X)$
Xc_names: names of the covariates for the censoring model $\mathbb{P}(C\geq t|Z,S,X)$
Xt_names: names of the covariates for the outcome model $\mathbb{P}(T\geq t|Z,S,X)$
Z_name: treatment assignment
S_name: treatment received status
U_name: observed failure time
delta_name: censoring time
B: number of iterations for the bootstrap confidence interval (default: 100)

The output of mrPStrata includes the point estimate and 95% confidence interval for $\{\text{PSCE}_g(u),\mathcal S_{z,g}(u)\}$ for all $z\in\{0,1\}$ and $g\in\{a,c,n\}$. Let res denote an output from mrPStrata(), then one can check the PSCE among the alway takers startum (and its corresponding counterfactual survival probabilities) by running

print(res$Always_Takers)

Similarly, one can check the compliers and never takers strata by running print(res$Compliers) and print(res$Never_Takers), respectively. In this package, we also build the plot.psce function to visualize the results, where one can simply run plot.psce(res) to have a direct comparison for the principal causal effect across different strata.

An Illustrative Example

Please library the mrPStrata package if needed.

library("mrPStrata")

Data illustration

In this example, we shall use a simulated dataset with $n=2000$ observations.

attach(sim_data)
head(sim_data)
#>            U delta z s X1         X2          X3         X4         X5
#> 1  1.1335182     0 0 0  1 -0.5945615 -0.40542519 -0.6464966 -0.8356304
#> 2  3.7248385     0 1 1  1  0.5997737 -0.01451717 -0.6402716 -0.9997893
#> 3  5.4627463     1 0 0  0  0.6356863  0.53683525 -0.5959029 -0.7118079
#> 4 12.0053375     1 0 0  1 -0.1404592 -0.16356531 -0.9802712 -0.9732464
#> 5  0.1378012     1 0 1  0  1.0622730 -0.20100149  0.1284239 -0.9595984
#> 6  0.9024529     1 0 1  1  1.4707985  0.49934679  1.1632484 -0.7506528

In the sim_data dataset, we include five baseline covariates X1, X2, X3, X4 and X5, the treatment assignment status z, the treatment received status s, the observed failure time U, and the censoring indicator delta

Implement the multiply robust estimators

Below we obtain the multiply robust estimator of the PSCEs at time points $u=\{1,2,3,4,5,6,7,8\}$. For illustrative purposes, we set B=50 for 50 bootstrap replications.

res = mrPStrata(times=c(1,2,3,4,5,6,7,8),
                data = sim_data,
                Xpi_names = c("X1","X2","X3","X4","X5"),
                Xe_names = c("X1","X2","X3","X4","X5"),
                Xc_names = c("X1","X2","X3","X4","X5"),
                Xt_names = c("X1","X2","X3","X4","X5"),
                Z_name = "z",
                S_name = "s",
                U_name ="U",
                delta_name = "delta",
                B=50)

Below, one can extract the results among the compliers stratum by running

print(res$Compliers)
#> $S_c_1
#>   time      point    CI_lower   CI_upper
#> 1    1 0.54556434 0.496524219 0.60646557
#> 2    2 0.30619816 0.263107429 0.35441785
#> 3    3 0.17289257 0.139106918 0.21664142
#> 4    4 0.10036545 0.066090638 0.13037886
#> 5    5 0.06742639 0.042585755 0.08839895
#> 6    6 0.05386264 0.028984821 0.07619549
#> 7    7 0.03524099 0.010836964 0.05614135
#> 8    8 0.02386368 0.009338725 0.03949698
#> 
#> $S_c_0
#>   time     point   CI_lower  CI_upper
#> 1    1 0.7108038 0.65035549 0.7661776
#> 2    2 0.5191557 0.47211421 0.5625380
#> 3    3 0.4026167 0.35533732 0.4369692
#> 4    4 0.3194536 0.27323395 0.3582388
#> 5    5 0.2361361 0.20572037 0.2651410
#> 6    6 0.1896117 0.16102060 0.2239844
#> 7    7 0.1517459 0.12434052 0.1816480
#> 8    8 0.1055906 0.08131249 0.1362517
#> 
#> $PSCE_c
#>   time       point   CI_lower    CI_upper
#> 1    1 -0.16523942 -0.2328166 -0.09662093
#> 2    2 -0.21295752 -0.2847917 -0.16388868
#> 3    3 -0.22972415 -0.2793199 -0.16507932
#> 4    4 -0.21908815 -0.2679018 -0.15290810
#> 5    5 -0.16870976 -0.2069352 -0.12828986
#> 6    6 -0.13574906 -0.1745459 -0.08925248
#> 7    7 -0.11650494 -0.1557285 -0.08402863
#> 8    8 -0.08172697 -0.1144102 -0.05253458

Similarly, one can try print(res$Always_Takers) and print(res$Never_Takers) to find the results among other strata.

Visualize the results

One can run the followng code to visualize the principal survival causal effects across the three principal strata:

plot.psce(res)

Sensitivity analysis

The analysis of the PSCE relies on two critical assumptions (i) principal ignorability and (ii) monotonicity, where (i) assumes sufficient covariates being collected so that there are no unmeasured confounding between the participants’ compliance behavior and their outcome, and (ii) rules out the defiers stratum. In Cheng et al (2023), we develop a sensitivity analysis framework to evaluate robustness of the PSCE estimation when either assumption (i) or (ii) is violated. Specificially, we develop bias-corrected multiply robust estimator that leverages sensitivity parameters to correct the bias due to departure of the assumption.

Sensitivity analysis for the principal ignorability

Under violation of principal ignorability, the bias-corrected estimator is referred to as $\widehat{\text{PSCE}}_g^{\text{mr-pi}}(u) = \widehat{\mathcal S}_{1,g}^{\text{mr-pi}}(u) - \widehat{\mathcal S}_{0,g}^{\text{mr-pi}}(u)$, which depends on four sensitivity parameters $\{\xi_1,\xi_0,\eta_1,\eta_0\}$ embedded in two sensitivity functions: \[\begin{align*} \epsilon_1(t,X) & := \frac{\mathbb{P}(T(1)\geq t|G=c,X)}{\mathbb{P}(T(1)\geq t|G=a,X)}= \exp\left[\xi_1\times \left(\frac{t}{t_{\text{max}}}\right)^{\eta_1}\right], \\ \epsilon_0(t,X) & := \frac{\mathbb{P}(T(0)\geq t|G=c,X)}{\mathbb{P}(T(0)\geq t|G=n,X)}= \exp\left[\xi_0\times \left(\frac{t}{t_{\text{max}}}\right)^{\eta_0}\right], \end{align*}\] where $t_{\text{max}}$ is maximum time that the PSCEs are of interest (i.e., the largest value in the times argument).

To obtain $\widehat{\text{PSCE}}_g^{\text{mr-pi}}(u)$, we can use the mrPStrata_PI_SA function:

mrPStrata_PI_SA(times,data,Xpi_names,Xe_names,Xc_names,Xt_names,Z_name,S_name,U_name,delta_name,xi_0,xi_1,eta0,eta1,B)

The arguments used in mrPStrata_PI_SA is analogous to these in mrPStrata, where the only difference is that mrPStrata_PI_SA allows us to specify the values of $\xi_0$, $\xi_1$, $\eta_0$, $\eta_1$ in the arguments xi0, xi1, eta0, and eta1, respectively. After obtaining the estimation, one can still use print(res$Always_Takers), print(res$Compliers), and print(res$Never_Takers) to print the results, and use plot.psce(res) to visualize the results.

Below is an illustrative example with the simulated dataset sim_data. Assuming $\xi_0=0.05$, $\xi_1=0.05$, $\eta_0=1$, $\eta_1=1$, the bias-corrected estimation of PSCEs is

res = mrPStrata_PI_SA(times=c(1,2,3,4,5,6,7,8),
                      data = sim_data,
                      Xpi_names = c("X1","X2","X3","X4","X5"),
                      Xe_names = c("X1","X2","X3","X4","X5"),
                      Xc_names = c("X1","X2","X3","X4","X5"),
                      Xt_names = c("X1","X2","X3","X4","X5"),
                      Z_name = "z",
                      S_name = "s",
                      U_name ="U",
                      delta_name = "delta",
                      xi0 = 0.05,
                      xi1 = 0.05,
                      eta0=1,
                      eta1=1,
                      B=50)

The plot of the bias-corrected PSCE estimation is

plot.psce(res)

Sensitivity analysis for the monotonicity

Defiers exist when monotonicity is violated. Under depature from monotinicity, the bias-corrected estimator is referred to as $\widehat{\text{PSCE}}_g^{\text{mr-mo}}(u) = \widehat{\mathcal S}_{1,g}^{\text{mr-mo}}(u) - \widehat{\mathcal S}_{0,g}^{\text{mr-mo}}(u)$, which depend on the sensitivity parameter $\zeta$, which quantifies the degree of depature from the monotinicity assumption \[ \zeta :=\frac{\mathbb{P}(G=d|X)}{\mathbb{P}(G=c|X)} \] To obtain $\widehat{\text{PSCE}}_g^{\text{mr-mo}}(u)$, we can use the mrPStrata_MO_SA function:

mrPStrata_MO_SA(times,data,Xpi_names,Xe_names,Xc_names,Xt_names,Z_name,S_name,U_name,delta_name,zeta,B)

The arguments used in mrPStrata_MO_SA is analogous to these in mrPStrata, where the only difference is that mrPStrata_MO_SA allows us to specify the values of $\zeta$ in the argument zeta. After obtaining the estimation, one can still use print(res$Always_Takers), print(res$Compliers), print(res$Never_Takers), and print(res$Defiers) to print out the results, and use plot.psce(res) to visualize the results.

Below is an illustrative example with the simulated dataset sim_data. Assuming $\zeta=0.02$, the bias-corrected estimation of PSCEs is

res = mrPStrata_MO_SA(times=c(1,2,3,4,5,6,7,8),
                      data = sim_data,
                      Xpi_names = c("X1","X2","X3","X4","X5"),
                      Xe_names = c("X1","X2","X3","X4","X5"),
                      Xc_names = c("X1","X2","X3","X4","X5"),
                      Xt_names = c("X1","X2","X3","X4","X5"),
                      Z_name = "z",
                      S_name = "s",
                      U_name ="U",
                      delta_name = "delta",
                      zeta=0.02,
                      B=50)