Estimation for Natural Mediation Effect in Longitudinal Data

Introduction

An exposure often acts on an outcome of interest directly and/or indirectly through the mediation of some intermediate variables. Identifying and quantifying these two types of effects is an important step in further understanding the causal mechanism. Overall, we might be interested in an effect question: what would be the effect of an exposure on the outcome if the mediator acts as if the exposure was absent? The medltmle focuses on the Natural Effect parameter, where the overall effect of the exposure on the outcome can be decomposed into a Natural Direct Effect and Natural Indirect Effect. In general, medltmle addresses the need for causal mediation through conditional mediator distributions in general time-varying exposure and mediator settings with survival and non-survival outcomes.

The challenges in longitudinal mediation analysis are address by considering 3 different versions of the parameter:

Stochastic intervention on the mediator fully conditional on the past.
Data-dependent stochastic intervention on the mediator.
Data-dependent stochastic mediation effect with an instrumental variable.

Efficient influence curve under a locally saturated semiparametric model and robustness properties for all three are shown in a technical report. The medltmle supports three estimators:

Nested non-targeted substitution estimator
Inverse Probability Weighted estimator
Targeted Maximum Likelihood estimator

Stochastic intervention on the mediator fully conditional on the past

We’ll start by examining a simple simulated data set. In this case, we will generate data for \(n=400\) samples and \(2\) time points of the form: \[O=(W1,W2,C1,A1,LA1,Z1,LZ1,Y1,C2,A2,LA2,Z2,LZ2,Y2)\] Note that \(W\) variables are baseline covariates, \(C\) is a censoring mechanism, \(A\) is a time-varying exposure, \(LA\) and \(LZ\) are time-varying covariates, \(Z\) is a time-varying mediator, and finally \(Y\) is a survival outcome. This data structure allows for confounders of the exposure-outcome relation and exposure-induced confounders of the mediator-outcome relation, both within time and across time. The medltmle package additionally allows the user to define nodes \(D\) for non-survival outcome when treating death as a censoring variable, but we do not recommend it. In addition, medltmle allows for time-dependency in the baseline covariates.

We can expect the generated data-generating mechanism in a bit more detail. Note that \(L_0\) encodes two baseline (binary) covariates, \(W_1\) and \(W_2\). Our intervention is includes both a censoring indicator and a binary exposure \(A_t\); \(LA_t\) encodes covariates at time \(t\) that are directly affected by \(A_t\) and may influence \(Z_t\) and \(L_t\). \(Z_t\) is a binary mediator, whereas \(L_t\) includes a time varying covariate \(LZ_t\) and a death indicator \(Y_t\).

suppressMessages(library(medltmle))

set.seed(2)
end.time=2

data<-GenerateData(n=400, end.time=2)
head(data)

##   W1 W2        C_1 A_1 LA_1 Z_1 LZ_1 Y_1        C_2 A_2 LA_2 Z_2 LZ_2 Y_2
## 1  0  0   censored  NA   NA  NA   NA  NA       <NA>  NA   NA  NA   NA  NA
## 2  1  1 uncensored   1    1   1    1   0 uncensored   1    1   1    0   0
## 3  0  1 uncensored   1    0   1    1   0 uncensored   1    1   0    0   0
## 4  0  1 uncensored   1    1   1    1   0   censored  NA   NA  NA   NA  NA
## 5  1  0   censored  NA   NA  NA   NA  NA       <NA>  NA   NA  NA   NA  NA
## 6  1  0 uncensored   1    0   1    1   0   censored  NA   NA  NA   NA  NA

Since this is a simulated data set, we know the data-generating distribution. With function make.sim.spec we just let medltmle know what the correct set of covariates is. Note however, that this function is specific to the simulated data set, and is therefore not for general use. The medltmle allows you to specify a list of covariates you want to use for each part of the likelihood. If set to NULL, you can specify if you want to condition on the entire past, or impose Markov assumptions of a certain order (default is one time step in the past).

spec<-make.sim.spec(2)

With correct models specified for each, we run medltmle for the non-data dependent parameter with exposure set to \((1,0)\), \((0,0)\), and \((1,1)\). In this particular setting, we are estimating our parameter using general linear models.

result_10 <- suppressMessages(medltmle(data=data,
                           Anodes=names(data)[grep('^A',names(data))],
                           Cnodes=names(data)[grep('^C',names(data))],
                           Znodes=names(data)[grep('^Z',names(data))],
                           Lnodes=names(data)[grep('^L',names(data))],
                           Ynodes=names(data)[grep('^Y',names(data))],
                           survivalOutcome = T,
                           QLform=spec$QL.c,
                           QZform=spec$QZ.c,
                           gform=spec$g.c,
                           qzform=spec$qz.c,
                           qLform=spec$qL.c,
                           abar=rep(1,end.time),
                           abar.prime=rep(0,end.time),
                           CSE=TRUE,
                           time.end=end.time
                           ))

## Mon Dec  2 12:45:44 2019 EstimateLYnodes node  1 
## Mon Dec  2 12:45:44 2019 EstimateLYnodes node  2 
## Mon Dec  2 12:45:44 2019 EstimateLYnodes node  3 
## Mon Dec  2 12:45:44 2019 EstimateLYnodes node  4 
## Mon Dec  2 12:45:44 2019 EstimateLYnodes node  1 
## Mon Dec  2 12:45:44 2019 EstimateLYnodes node  2 
## Mon Dec  2 12:45:44 2019 EstimateLYnodes node  3 
## Mon Dec  2 12:45:44 2019 EstimateLYnodes node  4 
## [1] "2019-12-02 12:45:44 PST"
## fixing:  2.106805 
## fixing:  2.825206 
## fixing:  2.682846 
## fixing:  2.337713

result_00 <- suppressMessages(medltmle(data=data,
                           Anodes=names(data)[grep('^A',names(data))],
                           Cnodes=names(data)[grep('^C',names(data))],
                           Znodes=names(data)[grep('^Z',names(data))],
                           Lnodes=names(data)[grep('^L',names(data))],
                           Ynodes=names(data)[grep('^Y',names(data))],
                           survivalOutcome = T,
                           QLform=spec$QL.c,
                           QZform=spec$QZ.c,
                           gform=spec$g.c,
                           qzform=spec$qz.c,
                           qLform=spec$qL.c,
                           abar=rep(0,end.time),
                           abar.prime=rep(0,end.time),
                           CSE=TRUE,
                           time.end=end.time
                           ))

## Mon Dec  2 12:45:45 2019 EstimateLYnodes node  1 
## Mon Dec  2 12:45:45 2019 EstimateLYnodes node  2 
## Mon Dec  2 12:45:45 2019 EstimateLYnodes node  3 
## Mon Dec  2 12:45:45 2019 EstimateLYnodes node  4 
## [1] "2019-12-02 12:45:45 PST"
## fixing:  4.072394 
## fixing:  4.072397 
## fixing:  4.0724

result_11 <- suppressMessages(medltmle(data=data,
                           Anodes=names(data)[grep('^A',names(data))],
                           Cnodes=names(data)[grep('^C',names(data))],
                           Znodes=names(data)[grep('^Z',names(data))],
                           Lnodes=names(data)[grep('^L',names(data))],
                           Ynodes=names(data)[grep('^Y',names(data))],
                           survivalOutcome = T,
                           QLform=spec$QL.c,
                           QZform=spec$QZ.c,
                           gform=spec$g.c,
                           qzform=spec$qz.c,
                           qLform=spec$qL.c,
                           abar=rep(1,end.time),
                           abar.prime=rep(1,end.time),
                           CSE=TRUE,
                           time.end=end.time
                           ))

## Mon Dec  2 12:45:45 2019 EstimateLYnodes node  1 
## Mon Dec  2 12:45:45 2019 EstimateLYnodes node  2 
## Mon Dec  2 12:45:45 2019 EstimateLYnodes node  3 
## Mon Dec  2 12:45:45 2019 EstimateLYnodes node  4 
## [1] "2019-12-02 12:45:45 PST"
## fixing:  2.331347 
## fixing:  2.331345 
## fixing:  2.331346 
## fixing:  5.384206

result_10$estimates

##        tmle        iptw 
## 0.005729184 0.014033848

Note that Natural Direct Effect} is defined as: \[E[Y_{2}(1,\overline{\Gamma^0})] - E[Y_{2}(0,\overline{\Gamma^0})]\] whereas the Natural Indirect Effect is defined as: \[E[Y_{2}(1, \overline{\Gamma^1})] - E[Y_{2}(1,\overline{\Gamma^0})]\] finally, overall effect is just: \[E[Y_{2}(1)] - E[Y_{2}(0)] = E[Y_{2}(1, \overline{\Gamma^1})] - E[Y_{2}(0,\overline{\Gamma^0})]\] We can summarize the results using summary_medltmle function.

res<-summary_medltmle(nie1=result_11,nie2=result_10,nde1=result_10,nde2=result_00)

res$NDE

##               NDE     SE NDE    CI lower   CI upper
## tmle -0.004270837 0.01283323 -0.02942396 0.02088229
## iptw  0.004391043 0.01293076 -0.02095325 0.02973534

res$NIE

##              NIE       SE NIE      CI lower    CI upper
## tmle 0.007561666 0.0014892329  0.0046427694 0.010480562
## iptw 0.001083652 0.0007445051 -0.0003755779 0.002542882

res$NE

##               NE      SE NE    CI lower   CI upper
## tmle 0.003290829 0.01267467 -0.02155151 0.02813317
## iptw 0.005474695 0.01327889 -0.02055193 0.03150132

Estimation for Natural Mediation Effect in Longitudinal Data

Ivana Malenica

Introduction

Stochastic intervention on the mediator fully conditional on the past

Session Information

References