Survival Mediation Analysis with Exposure Measurement Error

Introduction

This vignette demonstrates the use of the R-package survmed (available at this github page) to obtain bias-corrected estimates of the mediation effect measures in survival mediation analysis, under a main study/validation study design. We follow the methodology in Cheng et al (2023, arXiv:2304.04868).

Survival Mediation Analysis

Suppose we have an exposure (\(A\)), a mediator (\(M\)), a survival outcome \(\widetilde T\), and a set of baseline covariates (\(W\)). We are interested in assessing the natural indirect effect \(\text{NIE}_{a,a^*|w}(t)\) and the natural direct effect \(\text{NDE}_{a,a^*|w}(t)\). Here, \(\text{NIE}_{a,a^*|w}(t)\) measures that, at a given time \(t\), how much the log-hazard of the outcome would change if we set the exposure to a fixed level \(a\) and change the mediator from the level it would be observed under \(a^*\) versus \(a\). On the other hand, \(\text{NDE}_{a,a^*|w}(t)\) measures how much the log-hazard of the outcome at time \(t\) would change if we change the exposure from level \(a^*\) to \(a\) and fix the mediator to its natural value under exposure level \(a^*\). Meanwhile, we are also interested in estimating the total effect and mediation proportion: \[\begin{align*} \text{TE}_{a,a^*|w}(t) &=\text{NIE}_{a,a^*|w}(t)+\text{NDE}_{a,a^*|w}(t)\\ \text{MP}_{a,a^*|w}(t)& =\text{NIE}_{a,a^*|w}(t)/\text{TE}_{a,a^*|w}(t) \end{align*}\] All effects are conditioning on covariates level at \(W=w\).

We have two sets of expressions to calculate the mediation effect measures – the approximate expression versus the exact expression. The exact expression provides precise formulas to calculate the mediation effects, but they have complex forms. When the outcome is rare, one can consider using the approximate expression with concise formulas. Importantly, the mediation effect measures under the approximate expression are constant over time \(t\), i.e., \(\tau_{a,a^*|w}(t)=\tau_{a,a^*|w}\) for all \(\tau\in\{\text{NIE},\text{NDE},\text{TE},\text{MP}\}\).

In this work, we assume that the exposure \(A\) is mismeasured in the study and therefore may bias our survival mediation results.

Data structure

Recall that \(W\), \(A\), \(A^*\), \(M\), \(T\), and \(\Delta\) represent the baseline covariates, true exposure, mismeasured exposure, mediator, observed failure time, and failure indicator, respectively. In the main study (samples with \(I=0\)), we observe \(\{W,A^*,M,T,\Delta\}\), where the true exposure is unavailable. In the validation study (samples with \(I=1\)), we observe \(\{W,A,A^*,M,T\}\), which can be used to estimate the exposure measurement error process.

Basic syntax

The data-fitting functions are mediate_approx and mediate_exact, where the first performs mediation analysis based on approximate expressions and the second performs mediation analysis based on exact expressions. For each function, the output includes four estimators, including an unadjusted estimator that ignores measurement error, and three bias-corrected estimators (based on ORC1, ORC2, and RRC approaches).

We can call mediate_approx by

mediate_approx(data,Iname,Tname,Dname,Aname,Asname,Mname,Wname,Time_sep,a0,a1,cw)

The following arguments are required:

data: a data.frame data set
Iname: name of the indicator on the main/validation study (I=1 for validation study sample, I=0 for main study samples)
Tname: name of observed failure time
Dname: name of the failure indicator
Aname: name of the true exposure
Asname: name of the mismeasured exposure
Mname: name of the mediator
Wname: names of the covariates
Time_sep: a vector of time split points for RRC method
a0: baseline exposure level in the mediation effect measures
a1: active exposure level in the mediation effect measures
cw: levels of the covariates

Similarly, one can call mediate_exact by

mediate_exact(data,Iname,Tname,Dname,Aname,Asname,Mname,Wname,Time_sep,a0,a1,cw)

Besides the arguments in mediate_approx, mediate_exact requires two additional arguments:

t.list: a list of time points for the mediation effect measures
R: number of bootstrap numbers (default 50)

Illustrative example based on the approximate expression

Please library the survmed and survival packages at first.

library(survmed)
library(survival)

Attach dataset

We shall use the example data example_data for illustration. First, let’s attach the dataset.

attach(example_data)
head(example_data)
#>    A      M Time Event     As      W I
#> 1 NA -0.530 50.0     0  0.779  0.281 0
#> 2 NA  1.369 50.0     0 -0.548 -0.281 0
#> 3 NA -0.039  2.8     0  0.199 -0.608 0
#> 4 NA -0.653 45.4     0  0.549  0.044 0
#> 5 NA -0.150 13.7     0  0.076  0.390 0
#> 6 NA  1.113 50.0     0 -0.524 -0.390 0

Here, A is the true exposure, M is the mediator, Time is the observed failure time, Event is the failure indicator, As is the mismeasured exposure, W is a baseline covariate, I is the indicator for main study samples (I=0) and validation study samples (I=1). Notice that A is unavailable in the main study samples.

Specify the arguments

Next, let’s specify the arguments used in mediate_approx.

## 1. specify the exposure, mediator, outcome, and covariates
Iname="I" # name of the indicator of main/validation study samples
Tname="Time" # name of observed failure time
Dname="Event" # name of failure indicator
Aname="A" # name of true exposure
Asname = "As" # name of mismeasured exposure
Mname = "M" # name of mediator
Wname="W" # name of covariates

## 2. specify split points for RRC method
Time_sep=c(0,15,25,35)

## 3. specify the conditional values in the mediation effect measures
a0=0; a1=1; cw=0

Perform the mediation analysis

Finally, run the following code to perform the survival mediation analysis

mediate_approx(example_data,Iname,Tname,Dname,Aname,
               Asname,Mname,Wname,Time_sep,a0,a1,cw)
#> $Unadj
#>                 NIE       NDE        TE        MP
#> point    0.06384189 0.3095698 0.3734117 0.1709692
#> CI.lower 0.04737591 0.2366232 0.2995538 0.1200621
#> CI.upper 0.08030788 0.3825165 0.4472697 0.2218762
#> 
#> $ORC1
#>                 NIE       NDE        TE        MP
#> point    0.08047974 0.3980839 0.4785636 0.1681694
#> CI.lower 0.05505711 0.3042505 0.3829710 0.1113261
#> CI.upper 0.10590237 0.4919173 0.5741562 0.2250127
#> 
#> $ORC2
#>                 NIE       NDE        TE        MP
#> point    0.07922588 0.4022638 0.4814897 0.1645433
#> CI.lower 0.05369454 0.3075076 0.3854000 0.1075630
#> CI.upper 0.10475723 0.4970200 0.5775794 0.2215236
#> 
#> $RRC
#>                 NIE       NDE        TE        MP
#> point    0.07943697 0.3936375 0.4730745 0.1679164
#> CI.lower 0.05406432 0.3006845 0.3786795 0.1103540
#> CI.upper 0.10480962 0.4865904 0.5674694 0.2254789

Illustrative example based on the exact expression

Similarly, one can run mediate_exact to assess mediation based on the exact expressions. Since the mediation effect is now a function of time, one should additional specify a grid of time points (t.list) where their mediation effects are of interest. Here, we estimate \(\tau_{a,a^*|w}(t)\) with \(t\in\{10,15,20,25,30,35,40\}\):

## 1. specify t.list
t.list=c(10,15,20,25,30,35,40)

## 2. perform the mediation analysis based on the exact expression
res=mediate_exact(example_data,Iname,Tname,Dname,Aname,
               Asname,Mname,Wname,t.list,Time_sep,a0,a1,cw,R=20)

## 3. print the RRC estimates 
## One can obtain the unadjusted, ORC1, ORC2 estimates by printing 
## res$Unadjusted, res$ORC1, and res$IRC2, respectively.
res$RRC
#>                NIE_rrc             NDE_rrc              TE_rrc
#> 10 0.079 (0.060,0.096) 0.394 (0.312,0.464) 0.473 (0.372,0.541)
#> 15 0.079 (0.060,0.096) 0.394 (0.312,0.464) 0.473 (0.372,0.541)
#> 20 0.079 (0.060,0.096) 0.394 (0.312,0.464) 0.473 (0.372,0.541)
#> 25 0.079 (0.060,0.096) 0.393 (0.312,0.464) 0.473 (0.372,0.540)
#> 30 0.079 (0.060,0.096) 0.393 (0.312,0.464) 0.472 (0.372,0.540)
#> 35 0.079 (0.060,0.096) 0.392 (0.312,0.463) 0.471 (0.372,0.539)
#> 40 0.079 (0.059,0.095) 0.391 (0.312,0.462) 0.470 (0.372,0.538)
#>                 MP_rrc
#> 10 0.168 (0.129,0.209)
#> 15 0.168 (0.129,0.209)
#> 20 0.168 (0.129,0.209)
#> 25 0.168 (0.129,0.209)
#> 30 0.168 (0.129,0.209)
#> 35 0.168 (0.129,0.209)
#> 40 0.168 (0.129,0.209)

Each row represents the mediation effect at a given time point. In this example, we set the number of boostrap replications at R=20 for illustrative purposes. One should specify a large R (e.g., 500) in real data analysis.

Other applications

The R package can be applied to settings beyond correcting measurement errors in survival mediation analysis, including (i) performing standard survival mediation analysis without exposure measurement error (this is equivalent to implementing an uncorrected estimator) and (ii) performing bias-corrected coefficient estimation in Cox regression models under a main study/validation study design. We provide example syntax for each application separately.

We first consider the first application. If the true exposure \(A\) is available in the study, one can run the standard mediation analysis (i.e., the unadjusted estimator) by running

mediate_unadj_approx(data,Tname,Dname,Aname,Mname,Wname,a0,a1,cw)

or

mediate_unadj_exact(data,Tname,Dname,Aname,Mname,Wname,t.list,a0,a1,cw,R)

Here, mediate_unadj_approx and mediate_unadj_exact present the mediation analysis based on the approximate and exact expressions, respectively. The following arguments are required:

data: a data.frame data set
Tname: name of observed failure time
Dname: name of the failure indicator
Aname: name of the true exposure
Mname: name of the mediator
Wname: names of the covariates
Time_sep: a vector of time split points for RRC method
a0: baseline exposure level in the mediation effect measures
a1: active exposure level in the mediation effect measures
cw: levels of the covariates
t.list: a list of time points for the mediation effect measures
R: number of bootstrap numbers (default 50)

Below, we illustrate the use of mediate_unadj_approx based on the example data example_data:

## 1. attach dataset
attach(data2)
#> The following objects are masked from example_data:
#> 
#>     A, Event, M, Time, W
## 2. specify the exposure, mediator, outcome, and covariates
Tname="Time" # name of observed failure time
Dname="Event" # name of failure indicator
Aname="A" # name of true exposure
Mname = "M"  # name of mediator
Wname="W" # name of covariates
## 3. Perform the mediation analysis under approximate expressions 
mediate_unadj_approx(data2,Tname,Dname,Aname,Mname,Wname,a0=0,a1=1,cw=0)
#>                 NIE       NDE        TE        MP
#> point    0.07759801 0.2196715 0.2972695 0.2610359
#> CI.lower 0.06152528 0.1605380 0.2380269 0.1915496
#> CI.upper 0.09367074 0.2788049 0.3565120 0.3305222

Next, we provide example syntax for the second application. Specifically, we consider a Cox model with baseline hazard \(\lambda(t|W)=\beta_1 A + \beta_2 W\), where \(A\) is a univariate variable that is subject to measurement error and \(W\) is a vector of other covariates that are free of measurement error. Based on the methodology in Cheng et al (2023, arXiv:2304.04868), we can consider using ordinary/risk-set regression calibration to correct the bias in estimating \(\beta=[\beta_1,\beta_2]\). The ordinary regression calibration (ORC) runs a standard Cox model with \(A\) in replacement of \(E[A|A^*,W]\), and the risk-set regression calibration (RRC) runs a standard Cox model with \(A\) in replacement of \(E[A|A^*,W,T>t]\), where \(E[A|A^*,W]\) and \(E[A|A^*,W,T>t]\) can be estimated based on an external validation data set. We have created functions cox_orc.f and cox_rrc.f to help implement ORC and RRC methods to correct the bias in the Cox model \(\lambda(t|W)=\beta_1 A + \beta_2 W\). We can call cox_orc.f by

cox_orc.f(data,Iname,Tname,Dname,Aname,Asname,Wname)

The following arguments are required:

data: a data.frame data set
Iname: name of the indicator on the main/validation study (I=1 for validation study sample, I=0 for main study samples)
Tname: name of observed failure time
Dname: name of the failure indicator
Aname: name of the true measurement of \(A\)
Asname: name of the mismeasured measurement of \(A\)
Wname: names of other covariates that are free of measurement error (i.e., \(W\)).

Similarly, one can call cox_rrc.f by

cox_rrc.f(data,Time_sep,Iname,Tname,Dname,Aname,Asname,Wname)

Besides the arguments in cox_orc.f, cox_rrc.f additionally requires specifying Time_sep, which is a vector of time split points to estimate the time-varying regression model \(E[A|A^*,W,T>t]\). Below, we illustrate the use of cox_orc.f and cox_rrc.f based on the example data example_data:

## 1. specify the exposure, mediator, outcome, and covariates
Iname="I" # name of the indicator of main/validation study samples
Tname="Time" # name of observed failure time
Dname="Event" # name of failure indicator
Aname="A" # name of true exposure
Asname = "As" # name of mismeasured exposure
Wname="W" # name of covariates
Time_sep=c(0,15,25,35) # specify split points for RRC method

## 2. Perform the ORC estimator
cox_orc.f(example_data,Iname,Tname,Dname,Aname,Asname,Wname)
#>                  A          W
#> point    0.4618011 0.17937640
#> SE       0.0471361 0.03556693
#> CI.lower 0.3694143 0.10966521
#> CI.upper 0.5541878 0.24908759

## 2. Perform the RRC estimator
cox_rrc.f(example_data,Time_sep,Iname,Tname,Dname,Aname,Asname,Wname)
#>                   A          W
#> point    0.45113542 0.17546122
#> SE       0.04615565 0.03575939
#> CI.lower 0.36067034 0.10537282
#> CI.upper 0.54160049 0.24554963