This vignette demonstrates the use of the R-package
survmed (available at this github page) to
obtain bias-corrected estimates of the mediation effect measures in
survival mediation analysis, under a main study/validation study design.
We follow the methodology in Cheng et al (2023, arXiv:2304.04868).
Suppose we have an exposure (\(A\)), a mediator (\(M\)), a survival outcome \(\widetilde T\), and a set of baseline covariates (\(W\)). We are interested in assessing the natural indirect effect \(\text{NIE}_{a,a^*|w}(t)\) and the natural direct effect \(\text{NDE}_{a,a^*|w}(t)\). Here, \(\text{NIE}_{a,a^*|w}(t)\) measures that, at a given time \(t\), how much the log-hazard of the outcome would change if we set the exposure to a fixed level \(a\) and change the mediator from the level it would be observed under \(a^*\) versus \(a\). On the other hand, \(\text{NDE}_{a,a^*|w}(t)\) measures how much the log-hazard of the outcome at time \(t\) would change if we change the exposure from level \(a^*\) to \(a\) and fix the mediator to its natural value under exposure level \(a^*\). Meanwhile, we are also interested in estimating the total effect and mediation proportion: \[\begin{align*} \text{TE}_{a,a^*|w}(t) &=\text{NIE}_{a,a^*|w}(t)+\text{NDE}_{a,a^*|w}(t)\\ \text{MP}_{a,a^*|w}(t)& =\text{NIE}_{a,a^*|w}(t)/\text{TE}_{a,a^*|w}(t) \end{align*}\] All effects are conditioning on covariates level at \(W=w\).
We have two sets of expressions to calculate the mediation effect measures – the approximate expression versus the exact expression. The exact expression provides precise formulas to calculate the mediation effects, but they have complex forms. When the outcome is rare, one can consider using the approximate expression with concise formulas. Importantly, the mediation effect measures under the approximate expression are constant over time \(t\), i.e., \(\tau_{a,a^*|w}(t)=\tau_{a,a^*|w}\) for all \(\tau\in\{\text{NIE},\text{NDE},\text{TE},\text{MP}\}\).
In this work, we assume that the exposure \(A\) is mismeasured in the study and therefore may bias our survival mediation results.
Recall that \(W\), \(A\), \(A^*\), \(M\), \(T\), and \(\Delta\) represent the baseline covariates, true exposure, mismeasured exposure, mediator, observed failure time, and failure indicator, respectively. In the main study (samples with \(I=0\)), we observe \(\{W,A^*,M,T,\Delta\}\), where the true exposure is unavailable. In the validation study (samples with \(I=1\)), we observe \(\{W,A,A^*,M,T\}\), which can be used to estimate the exposure measurement error process.
The data-fitting functions are mediate_approx and
mediate_exact, where the first performs mediation analysis
based on approximate expressions and the second performs mediation
analysis based on exact expressions. For each function, the output
includes four estimators, including an unadjusted estimator that ignores
measurement error, and three bias-corrected estimators (based on ORC1,
ORC2, and RRC approaches).
We can call mediate_approx by
mediate_approx(data,Iname,Tname,Dname,Aname,Asname,Mname,Wname,Time_sep,a0,a1,cw)
The following arguments are required:
data: a data.frame data setIname: name of the indicator on the main/validation
study (I=1 for validation study sample, I=0
for main study samples)Tname: name of observed failure timeDname: name of the failure indicatorAname: name of the true exposureAsname: name of the mismeasured exposureMname: name of the mediatorWname: names of the covariatesTime_sep: a vector of time split points for RRC
methoda0: baseline exposure level in the mediation effect
measuresa1: active exposure level in the mediation effect
measurescw: levels of the covariatesSimilarly, one can call mediate_exact by
mediate_exact(data,Iname,Tname,Dname,Aname,Asname,Mname,Wname,Time_sep,a0,a1,cw)
Besides the arguments in mediate_approx,
mediate_exact requires two additional arguments:
t.list: a list of time points for the mediation effect
measuresR: number of bootstrap numbers (default 50)Please library the survmed and survival
packages at first.
We shall use the example data example_data for
illustration. First, let’s attach the dataset.
attach(example_data)
head(example_data)
#> A M Time Event As W I
#> 1 NA -0.530 50.0 0 0.779 0.281 0
#> 2 NA 1.369 50.0 0 -0.548 -0.281 0
#> 3 NA -0.039 2.8 0 0.199 -0.608 0
#> 4 NA -0.653 45.4 0 0.549 0.044 0
#> 5 NA -0.150 13.7 0 0.076 0.390 0
#> 6 NA 1.113 50.0 0 -0.524 -0.390 0Here, A is the true exposure, M is the
mediator, Time is the observed failure time,
Event is the failure indicator, As is the
mismeasured exposure, W is a baseline covariate,
I is the indicator for main study samples
(I=0) and validation study samples (I=1).
Notice that A is unavailable in the main study samples.
Next, let’s specify the arguments used in
mediate_approx.
## 1. specify the exposure, mediator, outcome, and covariates
Iname="I" # name of the indicator of main/validation study samples
Tname="Time" # name of observed failure time
Dname="Event" # name of failure indicator
Aname="A" # name of true exposure
Asname = "As" # name of mismeasured exposure
Mname = "M" # name of mediator
Wname="W" # name of covariates
## 2. specify split points for RRC method
Time_sep=c(0,15,25,35)
## 3. specify the conditional values in the mediation effect measures
a0=0; a1=1; cw=0Finally, run the following code to perform the survival mediation analysis
mediate_approx(example_data,Iname,Tname,Dname,Aname,
Asname,Mname,Wname,Time_sep,a0,a1,cw)
#> $Unadj
#> NIE NDE TE MP
#> point 0.06384189 0.3095698 0.3734117 0.1709692
#> CI.lower 0.04737591 0.2366232 0.2995538 0.1200621
#> CI.upper 0.08030788 0.3825165 0.4472697 0.2218762
#>
#> $ORC1
#> NIE NDE TE MP
#> point 0.08047974 0.3980839 0.4785636 0.1681694
#> CI.lower 0.05505711 0.3042505 0.3829710 0.1113261
#> CI.upper 0.10590237 0.4919173 0.5741562 0.2250127
#>
#> $ORC2
#> NIE NDE TE MP
#> point 0.07922588 0.4022638 0.4814897 0.1645433
#> CI.lower 0.05369454 0.3075076 0.3854000 0.1075630
#> CI.upper 0.10475723 0.4970200 0.5775794 0.2215236
#>
#> $RRC
#> NIE NDE TE MP
#> point 0.07943697 0.3936375 0.4730745 0.1679164
#> CI.lower 0.05406432 0.3006845 0.3786795 0.1103540
#> CI.upper 0.10480962 0.4865904 0.5674694 0.2254789Similarly, one can run mediate_exact to assess mediation
based on the exact expressions. Since the mediation effect is now a
function of time, one should additional specify a grid of time points
(t.list) where their mediation effects are of interest.
Here, we estimate \(\tau_{a,a^*|w}(t)\)
with \(t\in\{10,15,20,25,30,35,40\}\):
## 1. specify t.list
t.list=c(10,15,20,25,30,35,40)
## 2. perform the mediation analysis based on the exact expression
res=mediate_exact(example_data,Iname,Tname,Dname,Aname,
Asname,Mname,Wname,t.list,Time_sep,a0,a1,cw,R=20)
## 3. print the RRC estimates
## One can obtain the unadjusted, ORC1, ORC2 estimates by printing
## res$Unadjusted, res$ORC1, and res$IRC2, respectively.
res$RRC
#> NIE_rrc NDE_rrc TE_rrc
#> 10 0.079 (0.060,0.096) 0.394 (0.312,0.464) 0.473 (0.372,0.541)
#> 15 0.079 (0.060,0.096) 0.394 (0.312,0.464) 0.473 (0.372,0.541)
#> 20 0.079 (0.060,0.096) 0.394 (0.312,0.464) 0.473 (0.372,0.541)
#> 25 0.079 (0.060,0.096) 0.393 (0.312,0.464) 0.473 (0.372,0.540)
#> 30 0.079 (0.060,0.096) 0.393 (0.312,0.464) 0.472 (0.372,0.540)
#> 35 0.079 (0.060,0.096) 0.392 (0.312,0.463) 0.471 (0.372,0.539)
#> 40 0.079 (0.059,0.095) 0.391 (0.312,0.462) 0.470 (0.372,0.538)
#> MP_rrc
#> 10 0.168 (0.129,0.209)
#> 15 0.168 (0.129,0.209)
#> 20 0.168 (0.129,0.209)
#> 25 0.168 (0.129,0.209)
#> 30 0.168 (0.129,0.209)
#> 35 0.168 (0.129,0.209)
#> 40 0.168 (0.129,0.209)Each row represents the mediation effect at a given time point. In
this example, we set the number of boostrap replications at
R=20 for illustrative purposes. One should specify a large
R (e.g., 500) in real data analysis.
The R package can be applied to settings beyond correcting measurement errors in survival mediation analysis, including (i) performing standard survival mediation analysis without exposure measurement error (this is equivalent to implementing an uncorrected estimator) and (ii) performing bias-corrected coefficient estimation in Cox regression models under a main study/validation study design. We provide example syntax for each application separately.
We first consider the first application. If the true exposure \(A\) is available in the study, one can run the standard mediation analysis (i.e., the unadjusted estimator) by running
mediate_unadj_approx(data,Tname,Dname,Aname,Mname,Wname,a0,a1,cw)
or
mediate_unadj_exact(data,Tname,Dname,Aname,Mname,Wname,t.list,a0,a1,cw,R)
Here, mediate_unadj_approx and
mediate_unadj_exact present the mediation analysis based on
the approximate and exact expressions, respectively. The following
arguments are required:
data: a data.frame data setTname: name of observed failure timeDname: name of the failure indicatorAname: name of the true exposureMname: name of the mediatorWname: names of the covariatesTime_sep: a vector of time split points for RRC
methoda0: baseline exposure level in the mediation effect
measuresa1: active exposure level in the mediation effect
measurescw: levels of the covariatest.list: a list of time points for the mediation effect
measuresR: number of bootstrap numbers (default 50)Below, we illustrate the use of mediate_unadj_approx
based on the example data example_data:
## 1. attach dataset
attach(data2)
#> The following objects are masked from example_data:
#>
#> A, Event, M, Time, W
## 2. specify the exposure, mediator, outcome, and covariates
Tname="Time" # name of observed failure time
Dname="Event" # name of failure indicator
Aname="A" # name of true exposure
Mname = "M" # name of mediator
Wname="W" # name of covariates
## 3. Perform the mediation analysis under approximate expressions
mediate_unadj_approx(data2,Tname,Dname,Aname,Mname,Wname,a0=0,a1=1,cw=0)
#> NIE NDE TE MP
#> point 0.07759801 0.2196715 0.2972695 0.2610359
#> CI.lower 0.06152528 0.1605380 0.2380269 0.1915496
#> CI.upper 0.09367074 0.2788049 0.3565120 0.3305222Next, we provide example syntax for the second application.
Specifically, we consider a Cox model with baseline hazard \(\lambda(t|W)=\beta_1 A + \beta_2 W\), where
\(A\) is a univariate variable that is
subject to measurement error and \(W\)
is a vector of other covariates that are free of measurement error.
Based on the methodology in Cheng et al (2023, arXiv:2304.04868), we can
consider using ordinary/risk-set regression calibration to correct the
bias in estimating \(\beta=[\beta_1,\beta_2]\). The ordinary
regression calibration (ORC) runs a standard Cox model with \(A\) in replacement of \(E[A|A^*,W]\), and the risk-set regression
calibration (RRC) runs a standard Cox model with \(A\) in replacement of \(E[A|A^*,W,T>t]\), where \(E[A|A^*,W]\) and \(E[A|A^*,W,T>t]\) can be estimated based
on an external validation data set. We have created functions
cox_orc.f and cox_rrc.f to help implement ORC
and RRC methods to correct the bias in the Cox model \(\lambda(t|W)=\beta_1 A + \beta_2 W\). We
can call cox_orc.f by
cox_orc.f(data,Iname,Tname,Dname,Aname,Asname,Wname)
The following arguments are required:
data: a data.frame data setIname: name of the indicator on the main/validation
study (I=1 for validation study sample, I=0
for main study samples)Tname: name of observed failure timeDname: name of the failure indicatorAname: name of the true measurement of \(A\)Asname: name of the mismeasured measurement of \(A\)Wname: names of other covariates that are free of
measurement error (i.e., \(W\)).Similarly, one can call cox_rrc.f by
cox_rrc.f(data,Time_sep,Iname,Tname,Dname,Aname,Asname,Wname)
Besides the arguments in cox_orc.f,
cox_rrc.f additionally requires specifying
Time_sep, which is a vector of time split points to
estimate the time-varying regression model \(E[A|A^*,W,T>t]\). Below, we illustrate
the use of cox_orc.f and cox_rrc.f based on
the example data example_data:
## 1. specify the exposure, mediator, outcome, and covariates
Iname="I" # name of the indicator of main/validation study samples
Tname="Time" # name of observed failure time
Dname="Event" # name of failure indicator
Aname="A" # name of true exposure
Asname = "As" # name of mismeasured exposure
Wname="W" # name of covariates
Time_sep=c(0,15,25,35) # specify split points for RRC method
## 2. Perform the ORC estimator
cox_orc.f(example_data,Iname,Tname,Dname,Aname,Asname,Wname)
#> A W
#> point 0.4618011 0.17937640
#> SE 0.0471361 0.03556693
#> CI.lower 0.3694143 0.10966521
#> CI.upper 0.5541878 0.24908759
## 2. Perform the RRC estimator
cox_rrc.f(example_data,Time_sep,Iname,Tname,Dname,Aname,Asname,Wname)
#> A W
#> point 0.45113542 0.17546122
#> SE 0.04615565 0.03575939
#> CI.lower 0.36067034 0.10537282
#> CI.upper 0.54160049 0.24554963