This tutorial provides a gentle introduction to the Causal Roadmap and its applications in pharmaco-epidemiologic research. It is designed for a broad audience, including learners from both academia and industry. We systematically walk through each step of the Causal Roadmap—from explicitly formulating a research question, to translating it into a formal causal estimand, to identifying and estimating that estimand from observed data, and finally to drawing valid inferences and interpreting results. Each step is illustrated using a working example from a pharmaco-epidemiology setting, accompanied by interactive, built-in code to facilitate hands-on learning. The structure and content of this tutorial closely follow the Introduction to Causal Inference and the Causal Roadmap course (http://www.ucbbiostat.com/) by Petersen and Balzer.
Adopting the Causal Roadmap in our approach to causal inference research enables us to clearly state a scientific question and select an analytic approach that matches the question being asked, while ensuring a systematic assessment of whether that question can feasibly be answered from the data we observe (identifiability). Head-to-head comparison of analysis methods then lets us select the best approach.
We will now formally introduce the Causal Roadmap, but first let us go over some notation!
The Causal Roadmap is a framework that provides a systematic process for moving from a research question to estimation and interpretation, guiding investigators in designing and analysing their studies a priori. The framework has the following steps:

Step 0: State the research question and specify the hypothetical experiment that would answer it.
Step 1: Specify a causal model representing our background knowledge.
Step 2: Define the causal estimand (the target causal quantity).
Step 3: Describe the observed data and their link to the causal model.
Step 4: Assess identifiability, i.e., whether the causal estimand can be expressed as a function of the observed-data distribution.
Step 5: Choose a statistical model and estimator, and estimate the statistical parameter.
Step 6: Interpret the results.
We shall now delve into each of these steps in detail!
The hypothetical experiment defined in Step 0 can be viewed as a target trial.
Observational studies aim to emulate this trial by aligning eligibility criteria, treatment assignment, follow-up, outcome definitions, and estimands.
The Causal Roadmap provides the formal structure for conducting such emulations transparently.
Causal modeling formalizes our knowledge, however limited. It lets us specify which variables affect each other, examine the role of unmeasured factors, and consider the functional form of the relationships between variables.
In this tutorial, we shall focus on structural causal models and corresponding causal graphs (Pearl 2000). However, we do note that there are many other causal frameworks.
Figure 1 below shows a simple causal graph, with the corresponding structural causal model as follows:
We make no assumptions about the background factors \((U_W, U_A, U_Y)\) or about the functional forms \((f_W, f_A, f_Y)\).
If you believed there was no unmeasured confounding, a possible causal model and graph (Figure 2) would be:
Here we assume that the background factors are all independent, but we still make no assumptions about the functional forms \((f_W, f_A, f_Y)\).
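To make this concrete, here is a minimal R sketch that simulates data from one such structural causal model with independent background factors. The specific functional forms and coefficients below are hypothetical choices for illustration only; the causal model itself leaves them unrestricted.

set.seed(123)
n <- 1000
# Independent background factors (a hypothetical choice of distributions)
U.W <- runif(n); U.A <- runif(n); U.Y <- runif(n)
# Hypothetical structural equations f_W, f_A, f_Y
W <- as.numeric(U.W < 0.5)                      # baseline covariate
A <- as.numeric(U.A < plogis(-1 + 0.8*W))       # exposure affected by W
Y <- as.numeric(U.Y < plogis(-2 + 0.5*A + W))   # outcome affected by A and W
head(data.frame(W, A, Y))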
However, it is important to note that wishing for something does not make it true.
An estimand precisely defines the treatment effect of interest by specifying all attributes of the causal question.
| Attribute | Specification |
|---|---|
| Population | Eligible individuals meeting study inclusion criteria |
| Treatment Strategies | Intervention A versus comparator B |
| Endpoint | Binary or time-to-event outcome within a fixed horizon |
| Intercurrent Events | Addressed via treatment-policy or hypothetical strategy |
| Summary Measure | Risk difference, risk ratio, or mean difference |
Explicit estimand specification ensures alignment between the scientific question, identification assumptions, and estimation strategy.
A treatment-policy estimand contrasts outcomes under initial treatment assignment regardless of subsequent treatment changes (i.e., intention-to-treat estimates as presented in this workshop).
A hypothetical estimand contrasts outcomes under a counterfactual world in which intercurrent events (e.g., switching or discontinuation) do not occur.
The choice between these estimands reflects different scientific questions and determines how intercurrent events are handled during analysis.
Intercurrent events are post-treatment events that affect the interpretation or existence of the outcome, such as treatment switching, discontinuation, or death.
Handling of intercurrent events must be specified at the estimand stage, not deferred to estimation. Common strategies include the treatment-policy, hypothetical, composite, while-on-treatment, and principal-stratum strategies.
This choice determines the causal question being answered.
In many applications, outcomes occur over time and are subject to censoring.
Rather than targeting hazard ratios, the Causal Roadmap naturally accommodates risk-based estimands, such as cumulative incidence at a fixed time horizon.
For example, the causal risk difference at 90 days compares the probability of experiencing the event by day 90 under each treatment strategy.
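In counterfactual notation, writing \(T_a\) for the event time under treatment strategy \(a\), this estimand can be written as \[ \Psi = \mathbb{P}(T_1 \le 90) - \mathbb{P}(T_0 \le 90). \]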
The observed data structure must specify which events terminate follow-up and how they relate to the estimand.
Censoring may occur due to administrative end of follow-up, loss to follow-up, or treatment switching.
Whether censoring is causal or administrative depends on the estimand and must be addressed through design or analysis.
This process involves linking the causal effect to a parameter that is estimable from the observed data. It requires the following assumptions:

1. Consistency: \(Y = Y_a\) whenever \(A = a\).
2. Exchangeability (no unmeasured confounding): \(Y_a \perp A \mid W\), as would hold under randomization within levels of \(W\).
3. Positivity: \(\mathbb{P}(A = a \mid W = w) > 0\) for all relevant \(a\) and \(w\).
With these assumptions, we can express our causal target parameter, which is a function of counterfactuals, in terms of the observed data, i.e., \[ \begin{aligned} \mathbb{E}(Y_a) &= \mathbb{E}\big[ \mathbb{E}(Y_a \mid W) \big] \\ &= \mathbb{E}\big[ \mathbb{E}(Y_a \mid A=a, W) \big] && \text{under randomization (exchangeability)} \\ &= \mathbb{E}\big[ \mathbb{E}(Y \mid A=a, W) \big] && \text{under consistency.} \end{aligned} \]
Again, wishing for something does not make it true.
Under the above assumptions, we obtain the G-computation identifiability result (Robins 1986): \[ \mathbb{E}(Y_a) = \mathbb{E}\big[ \mathbb{E}(Y \mid A=a, W) \big]. \]
What if the assumptions are not all met? For example, one might be worried about unmeasured confounders, or that the data structure does not assure temporality. Possible options include: returning to earlier steps of the Roadmap to refine the question or the data collection, conducting sensitivity analyses, or reporting an associational rather than causal interpretation.
Assessing assumptions in real-world data: the plausibility of identification assumptions depends on the data source. For example, claims data may offer rich information on diagnoses and procedures but limited clinical detail. Explicitly discussing data limitations strengthens interpretation and guides sensitivity analyses.
Having established the causal estimand, the observed-data structure, and the assumptions under which the estimand is identified, we now turn to estimation. At this stage of the Causal Roadmap, the scientific question has been translated into a well-defined statistical target, and the remaining tasks concern how best to estimate this target from finite data, assess precision, and diagnose potential threats such as model misspecification or practical violations of assumptions. The choice of estimator should therefore be guided by the estimand and identification results, rather than driving them.
For the working example, one simple parametric choice for the outcome regression is \[ \text{logit}\big( \mathbb{E}(Y \mid A, W) \big) = \beta_0 + \beta_1 A + \beta_2 W_1 + \beta_3 W_2 + \beta_4 W_3 + \beta_5 W_4. \]
Why start with g-computation? G-computation (outcome modeling and standardization) is often the most intuitive entry point for causal estimation because it does not interpret a regression coefficient as a causal effect. Instead, it implements the identification formula by estimating the conditional mean outcome \(E[Y \mid A, W]\), predicting counterfactual outcomes under \(A=1\) and \(A=0\) for each individual, and then standardizing (averaging) those predictions over the empirical distribution of \(W\). The resulting contrast is a marginal (population-level) estimand, such as a risk difference.
Algorithm (standardization / g-computation):

1. Fit a regression of the outcome on treatment and covariates to estimate \(\mathbb{E}(Y \mid A, W)\).
2. Using the fitted model, predict each individual's outcome under \(A=1\) and under \(A=0\).
3. Average the predicted outcomes under each treatment level and take their contrast (e.g., the risk difference).
library(readr)                          # for read_csv()
data <- read_csv("CausalWorkshop.csv")  # load the workshop dataset
## Rows: 200 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (6): W1, W2, W3, W4, A, Y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 6
## W1 W2 W3 W4 A Y
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.704 0.312 0.441 0 0.0467
## 2 0 0.696 0.438 1.38 0 0.0771
## 3 0 0.394 -0.538 -1.44 0 0.177
## 4 0 0.642 0.767 1.25 0 0.0330
## 5 1 0.426 -0.203 -0.146 0 0.105
## 6 1 0.458 1.60 0.709 0 0.0228
tail(data)
## # A tibble: 6 × 6
## W1 W2 W3 W4 A Y
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0.856 1.17 2.10 1 0.00278
## 2 0 0.434 0.545 -0.238 0 0.0469
## 3 1 0.243 -1.80 -0.769 0 0.0675
## 4 0 0.629 -0.229 0.0750 1 0.109
## 5 0 0.0913 -0.100 0.761 0 0.117
## 6 1 0.543 -1.48 -0.794 0 0.0531
dim(data)
## [1] 200 6
set.seed(1)
# Step 1: fit the outcome regression E(Y | A, W)
outcome.regression <- glm(Y ~ A + W1+W2+W3+W4, family='binomial', data=data)
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
Key point: Regression coefficients (for example, an odds ratio from a logistic model) are typically conditional measures given \(W\). The Roadmap target in this workshop is a marginal contrast (for example, a risk difference). Standardization (g-computation) converts an outcome regression into a marginal estimand by averaging predicted outcomes over the covariate distribution.
# Step 2: create counterfactual copies of the data, setting A=1 and A=0 for everyone
data.A1 <- data.A0 <- data
data.A1$A <- 1
data.A0$A <- 0
colMeans(data.A1)
## W1 W2 W3 W4 A Y
## 0.52000000 0.51459003 0.02187934 -0.01958053 1.00000000 0.06244803
colMeans(data.A0)
## W1 W2 W3 W4 A Y
## 0.52000000 0.51459003 0.02187934 -0.01958053 0.00000000 0.06244803
# Predict each individual's outcome under A=1 and under A=0
predict.outcome.A1 <- predict(outcome.regression, newdata=data.A1, type='response')
predict.outcome.A0 <- predict(outcome.regression, newdata=data.A0, type='response')
mean(predict.outcome.A1)
## [1] 0.03877997
mean(predict.outcome.A0)
## [1] 0.07258282
# Step 3: average and contrast the predictions (simple substitution / g-computation estimate)
Simple.Subs <- mean(predict.outcome.A1 - predict.outcome.A0)
Simple.Subs
## [1] -0.03380285
G-computation can fail if the outcome regression is misspecified, and its predictions can rely on extrapolation where covariate overlap between treatment groups is poor. These failure modes motivate (i) overlap diagnostics, (ii) flexible nuisance estimation (for example, SuperLearner), and (iii) doubly robust estimators such as TMLE.
Preview: One way to reduce reliance on parametric assumptions is to estimate \(Q(A,W)\) with flexible learners (for example, generalized additive models, random forests, boosting) or an ensemble via SuperLearner. In the next sections, we use SuperLearner first for nuisance estimation and then integrate it with TMLE, which targets the causal estimand while retaining a basis for statistical inference.
Intuition (pseudo-population): IPTW treats confounding as a form of biased sampling. Units who received a treatment that was unlikely given their covariates receive larger weights, and units who received an expected treatment receive smaller weights. In the resulting weighted pseudo-population, treatment is approximately independent of measured covariates (if the propensity score model is correct), mimicking a randomized experiment.
# Fit the propensity score model P(A=1 | W)
pscore.regression <- glm(A ~ W1+W2+W3+W4, family='binomial', data=data)
summary(pscore.regression)
##
## Call:
## glm(formula = A ~ W1 + W2 + W3 + W4, family = "binomial", data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.3445 0.3821 -3.519 0.000433 ***
## W1 0.4637 0.3130 1.482 0.138399
## W2 0.5113 0.5629 0.908 0.363744
## W3 0.4483 0.3208 1.397 0.162360
## W4 -0.2676 0.3095 -0.865 0.387227
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 247.64 on 199 degrees of freedom
## Residual deviance: 241.99 on 195 degrees of freedom
## AIC: 251.99
##
## Number of Fisher Scoring iterations: 4
Identification link: Under consistency, exchangeability given \(W\), and positivity, we can write \[ E\{Y(1)\} = E\left[\frac{\mathbb{I}(A=1)Y}{e(W)}\right], \qquad E\{Y(0)\} = E\left[\frac{\mathbb{I}(A=0)Y}{1-e(W)}\right], \] where \(e(W)=P(A=1\mid W)\). IPTW replaces the expectation with a sample average and replaces \(e(W)\) with an estimate.
# Predicted propensity scores e(W) = P(A=1 | W)
predict.prob.A1 <- predict(pscore.regression, type='response')
summary(predict.prob.A1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1555 0.2521 0.3034 0.3100 0.3658 0.5145
predict.prob.A0 <- 1 - predict.prob.A1
# Weights are nonzero only for the treatment arm actually received
wt1 <- as.numeric(data$A==1)/predict.prob.A1
wt0 <- as.numeric(data$A==0)/predict.prob.A0
head(data.frame(cbind(A=data$A, 1/predict.prob.A1, wt1, wt0)))
## A V2 wt1 wt0
## 1 0 2.646982 0 1.607171
## 2 0 4.197342 0 1.312760
## 3 0 3.718918 0 1.367793
## 4 0 3.735871 0 1.365514
## 5 0 3.044577 0 1.489099
## 6 0 2.128806 0 1.885892
mean(wt1*data$Y)
## [1] 0.03973564
mean(wt0*data$Y)
## [1] 0.07234258
# IPTW estimate of the risk difference
IPW <- mean(wt1*data$Y) - mean(wt0*data$Y)
IPW
## [1] -0.03260694
mean( (wt1-wt0)*data$Y)
## [1] -0.03260694
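As a quick check of the pseudo-population intuition, we can verify (a sketch using the weights computed above) that the weighted covariate means are similar across the two arms:

# Weighted covariate means by arm; similar values suggest measured covariates are balanced
sapply(data[, c('W1','W2','W3','W4')],
       function(w) c(treated = sum(wt1*w)/sum(wt1),
                     control = sum(wt0*w)/sum(wt0)))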
Unstabilized weights can be variable in finite samples. A common alternative is to use stabilized weights, which multiply by the marginal probability of observed treatment: \[ SW_i = \begin{cases} \frac{P(A=1)}{e(W_i)}, & A_i=1 \\ \frac{P(A=0)}{1-e(W_i)}, & A_i=0. \end{cases} \] Stabilized weights often reduce variance and improve numerical stability without changing the large-sample target under correct specification.
# Stabilized weights: numerator is the marginal probability of the observed treatment
pA <- mean(data$A)
sw1 <- as.numeric(data$A==1) * pA / predict.prob.A1
sw0 <- as.numeric(data$A==0) * (1-pA) / (1 - predict.prob.A1)
sw <- sw1 + sw0
summary(sw)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6025 0.8991 0.9700 1.0013 1.0937 1.9930
Implementation note: IPTW can be implemented either by directly computing weighted means of \(Y\) within treatment groups (as shown here) or by fitting a weighted regression of \(Y\) on \(A\) using robust standard errors. These approaches target the same marginal contrast when implemented consistently.
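For example, here is a minimal sketch of the weighted-regression route, using the stabilized weights computed above; the sandwich and lmtest packages are an assumed choice for the robust standard errors:

library(sandwich)   # robust (sandwich) variance estimators
library(lmtest)     # coeftest() for inference with a custom vcov
msm <- lm(Y ~ A, data = data, weights = sw)       # weighted regression of Y on A
coeftest(msm, vcov = vcovHC(msm, type = 'HC0'))   # coefficient on A estimates the risk difference

A further practical concern is extreme weights. One common remedy, illustrated below, is to truncate (winsorize) the weights at chosen percentiles.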
# Truncate (winsorize) the unstabilized weights at the 1st and 99th percentiles
w <- wt1 + wt0
lo <- quantile(w, 0.01)
hi <- quantile(w, 0.99)
w_trunc <- pmin(pmax(w, lo), hi)
summary(w_trunc)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.212 1.359 1.525 1.993 2.455 4.885
Why doubly robust estimators? Both g-computation and IPTW rely on strong modeling assumptions: g-computation requires a correct outcome model, while IPTW requires a correct treatment model. Doubly robust estimators combine both approaches and are consistent if either the outcome model or the treatment model is correctly specified. This property is particularly valuable in real-world data settings, where all models are approximations.
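As a preview of the doubly robust idea, here is a minimal sketch of the augmented IPW (A-IPW) estimator, reusing the parametric outcome-model predictions and propensity scores computed above:

# A-IPW: augment the outcome-model predictions with an inverse-probability-weighted residual
aipw.A1 <- mean( as.numeric(data$A==1)*(data$Y - predict.outcome.A1)/predict.prob.A1 +
                 predict.outcome.A1 )
aipw.A0 <- mean( as.numeric(data$A==0)*(data$Y - predict.outcome.A0)/predict.prob.A0 +
                 predict.outcome.A0 )
aipw.A1 - aipw.A0   # doubly robust estimate of the risk difference

This estimator is consistent if either the outcome model behind predict.outcome.* or the propensity model behind predict.prob.* is correctly specified.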
library('SuperLearner')
# Candidate learners for the ensemble
SL.library <- c('SL.glm', 'SL.step.interaction', 'SL.gam')
SL.outcome.regression <- suppressWarnings(SuperLearner(Y=data$Y,
X=subset(data, select=-Y),
SL.library=SL.library,
family='binomial'))
SL.outcome.regression
##
## Call:
## SuperLearner(Y = data$Y, X = subset(data, select = -Y), family = "binomial",
## SL.library = SL.library)
##
##
## Risk Coef
## SL.glm_All 0.002745489 0.0000000
## SL.step.interaction_All 0.001407506 0.1134315
## SL.gam_All 0.001164333 0.8865685
SL.predict.outcome <- predict(SL.outcome.regression,
newdata=subset(data, select=-Y))$pred
head(SL.predict.outcome)
## [,1]
## [1,] 0.06057923
## [2,] 0.03497205
## [3,] 0.09365561
## [4,] 0.02889946
## [5,] 0.08963042
## [6,] 0.01211903
SL.predict.outcome.A1 <- predict(SL.outcome.regression,
newdata=subset(data.A1, select=-Y))$pred
head(SL.predict.outcome.A1)
## [,1]
## [1,] 0.03789361
## [2,] 0.02165598
## [3,] 0.05940978
## [4,] 0.01781923
## [5,] 0.05674338
## [6,] 0.00741711
SL.predict.outcome.A0 <- predict(SL.outcome.regression, newdata=subset(data.A0, select=-Y))$pred
# simple subst estimator
mean(SL.predict.outcome.A1) - mean(SL.predict.outcome.A0)
## [1] -0.02523562
SL.pscore <- SuperLearner(Y=data$A, X=subset(data, select=-c(A,Y)),
SL.library=SL.library, family='binomial')
SL.pscore
##
## Call:
## SuperLearner(Y = data$A, X = subset(data, select = -c(A, Y)), family = "binomial",
## SL.library = SL.library)
##
##
## Risk Coef
## SL.glm_All 0.2193296 0.0000000
## SL.step.interaction_All 0.2073331 0.2987384
## SL.gam_All 0.2048886 0.7012616
SL.predict.prob.A1 <- SL.pscore$SL.predict
summary(SL.predict.prob.A1 - predict(SL.pscore, newdata=subset(data,select=-c(A,Y)))$pred)
## V1
## Min. :0
## 1st Qu.:0
## Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
summary(SL.predict.prob.A1)
## V1
## Min. :0.1560
## 1st Qu.:0.2378
## Median :0.2748
## Mean :0.3100
## 3rd Qu.:0.3501
## Max. :0.9270
SL.predict.prob.A0 <- 1 - SL.predict.prob.A1
Note: as you run this code you might encounter the warning “non-integer #successes in a binomial glm!”. This simply means that our outcome Y is not binary, although it is bounded between 0 and 1. The warning is expected here and can be safely ignored.
\[H(A,W)= \frac{\mathbb{I}(A=1)}{\mathbb{P}(A=1\mid W)}- \frac{\mathbb{I}(A=0)}{\mathbb{P}(A=0\mid W)}\]
Here is code to evaluate the “clever covariate”:
H.AW <- (data$A==1)/SL.predict.prob.A1 - (data$A==0)/SL.predict.prob.A0
summary(H.AW)
## V1
## Min. :-2.187049
## 1st Qu.:-1.407167
## Median :-1.301617
## Mean : 0.006881
## 3rd Qu.: 1.988445
## Max. : 5.879351
H.1W <- 1/SL.predict.prob.A1
H.0W <- -1/SL.predict.prob.A0
tail(data.frame(A=data$A, H.AW, H.1W, H.0W))
## A H.AW H.1W H.0W
## 195 1 2.475845 2.475845 -1.677578
## 196 0 -1.394130 3.537231 -1.394130
## 197 0 -1.326580 4.062038 -1.326580
## 198 1 5.371727 5.371727 -1.228743
## 199 0 -1.184828 6.410444 -1.184828
## 200 0 -1.339595 3.944684 -1.339595
3. Run a logistic regression of the outcome on this covariate, using the logit of the initial estimator of \(\mathbb{E}(Y\mid A,W)\) as an offset, where \(\text{logit}(x) = \log[x/(1-x)]\).
logitUpdate <- glm( data$Y ~ -1 +offset( qlogis(SL.predict.outcome)) +
H.AW, family='binomial')
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
epsilon <- logitUpdate$coef
epsilon
## H.AW
## 0.004130973
target.predict.outcome.A1 <- plogis( qlogis(SL.predict.outcome.A1)+
epsilon*H.1W)
target.predict.outcome.A0 <- plogis( qlogis(SL.predict.outcome.A0)+
epsilon*H.0W)
TMLE <- mean( target.predict.outcome.A1 - target.predict.outcome.A0)
TMLE
## [1] -0.02419015
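For statistical inference, a standard approach (sketched here; packages such as tmle automate this) uses the estimated efficient influence curve to obtain a standard error and a Wald-type confidence interval:

# Influence-curve-based inference for the TMLE (a sketch)
Qbar.star.AW <- plogis(qlogis(SL.predict.outcome) + epsilon*H.AW)  # updated fit at the observed A
IC <- as.vector( H.AW*(data$Y - Qbar.star.AW) +
                 target.predict.outcome.A1 - target.predict.outcome.A0 - TMLE )
se <- sqrt(var(IC)/nrow(data))
TMLE + c(-1.96, 1.96)*se   # approximate 95% confidence interval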
The final step of the Causal Roadmap is to interpret the findings. At this stage, we evaluate whether and to what extent the underlying assumptions have been met in order to determine the strength of interpretation.
Findings support a statistical interpretation if (1) the statistical estimator has negligible bias and its variance is well estimated.
Findings support a causal interpretation if (1) holds and (2) the non-testable identifiability assumptions hold.
Findings can be interpreted as the effect of implementing the intervention in the real world if (1) and (2) hold and (3) the intervention is feasible and applicable to the real-world population.
Findings can be interpreted as if we had emulated a randomized trial if (1)–(3) hold and the exposure could have been randomized in that population.
If there are concerns about the causal assumptions (e.g., temporal ordering is unclear, or there may be unmeasured confounding), the results can be interpreted as associational. In this case, the estimand \(\mathbb{E}\big[\mathbb{E}(Y\mid A=1,W)-\mathbb{E}(Y\mid A=0,W)\big]\) can be interpreted as the average difference in expected outcomes between treated and untreated individuals with the same covariate values \(W\), standardized over the covariate distribution.
If the authors believe causal assumptions are met, the parameter can be interpreted as the population average treatment effect \(\mathbb{E}\big[Y_1-Y_0\big]\).
Use TMLE with Super Learner as part of a toolbox. Recall that fancy estimation tools cannot replace careful thinking throughout the rest of the Roadmap.
Remember to formally derive adjustment sets and the statistical parameter.
Doubly robust estimators (e.g., TMLE or A-IPW) can incorporate machine learning while maintaining a basis for valid statistical inference. This helps us avoid errors of “statistical model neglect”, which occur when we rely on unsubstantiated (parametric) assumptions during estimation. However, this is not without conditions.
Practical positivity violations can happen. They can result from poor support for the exposures of interest and can lead to bias and/or underestimated variance. Some solutions include: truncating extreme weights (as illustrated earlier), restricting the target population to regions of covariate support, or redefining the estimand.
Run a simulation study mimicking key patterns of the observed data (for example, sample size, confounding structure, missing-data mechanisms, practical positivity violations, sparsity of the exposure and/or outcome, and dependence structure), and use the results to guide analyses.
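Here is a minimal sketch of such a simulation study; the data-generating values are hypothetical and chosen only to mimic broad features such as confounding and a rare outcome:

set.seed(2024)
sim.once <- function(n=200){
  W <- rnorm(n)
  A <- rbinom(n, 1, plogis(-1 + 0.7*W))            # confounded exposure
  Y <- rbinom(n, 1, plogis(-3 + 0.4*A + 0.9*W))    # rare outcome
  fit <- glm(Y ~ A + W, family='binomial')
  rd <- mean(plogis(coef(fit)[1] + coef(fit)[2] + coef(fit)[3]*W)) -
        mean(plogis(coef(fit)[1] + coef(fit)[3]*W))  # g-computation risk difference
  c(unadjusted = mean(Y[A==1]) - mean(Y[A==0]), gcomp = rd)
}
rowMeans(replicate(500, sim.once()))  # average estimates across simulated datasets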
You have also survived a high-speed tour through the Roadmap, and hopefully you can appreciate some of its strengths.
If you are interested in learning about more advanced settings, here are some links to other resources.