Introduction

This tutorial provides a gentle introduction to the Causal Roadmap and its applications in pharmaco-epidemiologic research. It is designed for a broad audience, including learners from both academia and industry. We systematically walk through each step of the Causal Roadmap—from explicitly formulating a research question, to translating it into a formal causal estimand, to identifying and estimating that estimand from observed data, and finally to drawing valid inferences and interpreting results. Each step is illustrated using a working example from a pharmaco-epidemiology setting, accompanied by interactive, built-in code to facilitate hands-on learning. The structure and content of this tutorial closely follow the Introduction to Causal Inference and the Causal Roadmap course (http://www.ucbbiostat.com/) by Petersen and Balzer.

Why venture down a new path?

Adopting the Causal Roadmap in our approach to causal inference research enables us to clearly state a scientific question, to select an analytic approach that matches the question being asked, and to systematically assess whether that question can feasibly be answered from the data we observe (identifiability). Head-to-head comparison of analysis methods then lets us select the best approach.

We will now formally introduce the Causal Roadmap, but first let us go over some notation!

Notation

Motivation

The Causal Roadmap

The Causal Roadmap is a framework that provides a systematic process for moving from a research question to estimation and interpretation, guiding investigators in how to design and analyse their studies a priori. The framework has the following steps:

  • Step 0: State the question
  • Step 1: Define the causal model
  • Step 2: Define the causal parameter of interest
  • Step 3: Specify the observed data and statistical model
  • Step 4: Assess identifiability
  • Step 5: Choose and apply the estimator
  • Step 6: Quantify statistical uncertainty
  • Step 7: Interpret findings

We shall now delve into each of these steps in detail!

Step 0: State the question

Target Trial Emulation

The hypothetical experiment defined in Step 0 can be viewed as a target trial.

Observational studies aim to emulate this trial by aligning eligibility criteria, treatment assignment, follow-up, outcome definitions, and estimands.

The Causal Roadmap provides the formal structure for conducting such emulations transparently.

Step 1: Define the causal model

Step 2: Define the causal parameter of interest

Estimand Specification (ICH E9(R1) Framework)

An estimand precisely defines the treatment effect of interest by specifying all attributes of the causal question.

  • Population: eligible individuals meeting study inclusion criteria
  • Treatment strategies: intervention A versus comparator B
  • Endpoint: binary or time-to-event outcome within a fixed horizon
  • Intercurrent events: addressed via a treatment-policy or hypothetical strategy
  • Summary measure: risk difference, risk ratio, or mean difference

Explicit estimand specification ensures alignment between the scientific question, identification assumptions, and estimation strategy.

Treatment-Policy versus Hypothetical Estimands

A treatment-policy estimand contrasts outcomes under initial treatment assignment regardless of subsequent treatment changes (i.e., intention-to-treat estimates as presented in this workshop).

A hypothetical estimand contrasts outcomes under a counterfactual world in which intercurrent events (e.g., switching or discontinuation) do not occur.

The choice between these estimands reflects different scientific questions and determines how intercurrent events are handled during analysis.

Intercurrent Events

Intercurrent events are post-treatment events that affect the interpretation or existence of the outcome, such as treatment switching, discontinuation, or death.

Handling of intercurrent events must be specified at the estimand stage, not deferred to estimation. Common strategies include:

  • Treatment-policy: ignore the intercurrent event
  • Hypothetical: censor or reweight to eliminate its occurrence
  • Composite: redefine the outcome to include the event

This choice determines the causal question being answered.

Time-to-Event Outcomes and Risk-Based Estimands

In many applications, outcomes occur over time and are subject to censoring.

Rather than targeting hazard ratios, the Causal Roadmap naturally accommodates risk-based estimands, such as cumulative incidence at a fixed time horizon.

For example, the causal risk difference at 90 days compares the probability of experiencing the event by day 90 under each treatment strategy.
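
As a concrete sketch (using a hypothetical data frame surv.data with follow-up time time, event indicator status, and treatment A, which is not part of this workshop's dataset), an unadjusted 90-day risk contrast could be computed with the survival package:

library(survival)
# Kaplan-Meier fit by arm; surv.data, time, status are assumed names for illustration
km <- survfit(Surv(time, status) ~ A, data = surv.data)
km.90 <- summary(km, times = 90)
risk.90 <- 1 - km.90$surv  # cumulative incidence at day 90, one entry per arm
risk.90[2] - risk.90[1]    # unadjusted 90-day risk difference (A=1 minus A=0)

A confounder-adjusted version would combine this risk-based estimand with the estimators introduced in Step 5.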

Step 4: Assess identifiability

Assessing Assumptions in Real-World Data

The plausibility of identification assumptions depends on the data source. For example, claims data may offer rich information on diagnoses and procedures but limited clinical detail. Explicitly discussing data limitations strengthens interpretation and guides sensitivity analyses.

Having established the causal estimand, the observed-data structure, and the assumptions under which the estimand is identified, we now turn to estimation. At this stage of the Causal Roadmap, the scientific question has been translated into a well-defined statistical target, and the remaining tasks concern how best to estimate this target from finite data, assess precision, and diagnose potential threats such as model misspecification or practical violations of assumptions. The choice of estimator should therefore be guided by the estimand and identification results, rather than driving them.

Step 5: Choose and apply the estimator

In our working example, a main-terms logistic regression for the outcome takes the form

\[ \text{logit}\big( \mathbb{E}(Y \mid A, W) \big) = \beta_0 + \beta_1 A + \beta_2 W_1 + \beta_3 W_2 + \beta_4 W_3 + \beta_5 W_4. \]

Simple substitution estimator (g-computation)

Why start with g-computation? G-computation (outcome modeling and standardization) is often the most intuitive entry point for causal estimation because it does not interpret a regression coefficient as a causal effect. Instead, it implements the identification formula by estimating the conditional mean outcome \(E[Y \mid A, W]\), predicting counterfactual outcomes under \(A=1\) and \(A=0\) for each individual, and then standardizing (averaging) those predictions over the empirical distribution of \(W\). The resulting contrast is a marginal (population-level) estimand, such as a risk difference.

  • To get some intuition behind this estimator, let us think of causal inference as a problem of missing information where we know the outcome under the observed exposure but are missing the outcome under the other exposure condition.
  • We therefore use parametric regression to estimate outcomes for all under both exposed and unexposed conditions after controlling for the measured confounders.
  • We then average and compare predicted outcomes. The algorithm is as follows. First we shall load in our simulated dataset “CausalWorkshop.csv” and set the random seed.

Algorithm (standardization / g-computation):

  1. Fit an outcome model \(Q(A,W) = E[Y \mid A, W]\).
  2. Create two counterfactual datasets by setting \(A=1\) for everyone and \(A=0\) for everyone.
  3. Predict outcomes under each intervention and average predictions over the observed \(W\) distribution: \[ E[Y^a] = E_W\{E[Y \mid A=a, W]\}. \]
library(readr)
data <-  read_csv("CausalWorkshop.csv")
## Rows: 200 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (6): W1, W2, W3, W4, A, Y
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 6
##      W1    W2     W3     W4     A      Y
##   <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
## 1     1 0.704  0.312  0.441     0 0.0467
## 2     0 0.696  0.438  1.38      0 0.0771
## 3     0 0.394 -0.538 -1.44      0 0.177 
## 4     0 0.642  0.767  1.25      0 0.0330
## 5     1 0.426 -0.203 -0.146     0 0.105 
## 6     1 0.458  1.60   0.709     0 0.0228
tail(data)
## # A tibble: 6 × 6
##      W1     W2     W3      W4     A       Y
##   <dbl>  <dbl>  <dbl>   <dbl> <dbl>   <dbl>
## 1     0 0.856   1.17   2.10       1 0.00278
## 2     0 0.434   0.545 -0.238      0 0.0469 
## 3     1 0.243  -1.80  -0.769      0 0.0675 
## 4     0 0.629  -0.229  0.0750     1 0.109  
## 5     0 0.0913 -0.100  0.761      0 0.117  
## 6     1 0.543  -1.48  -0.794      0 0.0531
dim(data)
## [1] 200   6
set.seed(1)
  1. We then estimate the mean outcome Y as a function of the exposure (treatment) A and measured confounders W. In this example we run a main-terms logistic regression.
outcome.regression <- glm(Y ~ A + W1+W2+W3+W4, family='binomial', data=data)
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

Key point: Regression coefficients (for example, an odds ratio from a logistic model) are typically conditional measures given \(W\). The Roadmap target in this workshop is a marginal contrast (for example, a risk difference). Standardization (g-computation) converts an outcome regression into a marginal estimand by averaging predicted outcomes over the covariate distribution.

  2. We use the estimates from step 1 above to predict outcomes for each unit while “setting” the exposure to different values, e.g., A=1 and A=0.
data.A1 <- data.A0 <- data
data.A1$A <- 1
data.A0$A <- 0
colMeans(data.A1)
##          W1          W2          W3          W4           A           Y 
##  0.52000000  0.51459003  0.02187934 -0.01958053  1.00000000  0.06244803
colMeans(data.A0)
##          W1          W2          W3          W4           A           Y 
##  0.52000000  0.51459003  0.02187934 -0.01958053  0.00000000  0.06244803
predict.outcome.A1 <- predict( outcome.regression, newdata=data.A1, 
                               type='response')
predict.outcome.A0 <- predict(outcome.regression, newdata=data.A0, 
                              type='response')
  3. Average predictions to estimate the marginal risks in the population under exposure and no exposure. To compare estimates, take the difference in means.
mean(predict.outcome.A1)
## [1] 0.03877997
mean(predict.outcome.A0)
## [1] 0.07258282
Simple.Subs <- mean(predict.outcome.A1 - predict.outcome.A0)
Simple.Subs
## [1] -0.03380285
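
To see the conditional-versus-marginal distinction from the key point above in our example, we can contrast the conditional odds ratio implied by the fitted coefficient on A with the marginal risk difference we just computed:

# Conditional odds ratio for A given W, from the logistic model coefficient
exp(coef(outcome.regression)["A"])
# Marginal risk difference from standardization (computed above)
Simple.Subs

The two numbers answer different questions: the first is a covariate-conditional association on the odds-ratio scale, the second a population-level contrast on the risk-difference scale.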

When can g-computation fail?

  • Outcome model misspecification: If \(E[Y \mid A, W]\) is misspecified (wrong functional form, missing interactions), the standardized estimate can be biased.
  • Practical positivity violations: If some covariate strata rarely receive one treatment, g-computation must extrapolate to regions with little or no support.
  • Unmeasured confounding: No outcome-modeling approach can correct for confounders that are not measured.

These failure modes motivate (i) overlap diagnostics, (ii) flexible nuisance estimation (for example, SuperLearner), and (iii) doubly robust estimators such as TMLE.
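
As a first-pass support check in our working example (W1 appears to be binary in this dataset), one can tabulate treatment within covariate strata:

# Treatment counts within levels of W1; empty or near-empty cells
# would signal practical positivity problems
table(W1 = data$W1, A = data$A)

For continuous covariates, overlap is better assessed via the estimated propensity score, as discussed in the IPTW diagnostics below.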

Preview: One way to reduce reliance on parametric assumptions is to estimate \(Q(A,W)\) with flexible learners (for example, generalized additive models, random forests, boosting) or an ensemble via SuperLearner. In the next sections, we use SuperLearner first for nuisance estimation and then integrate it with TMLE, which targets the causal estimand while retaining a basis for statistical inference.

IPTW: Inverse Probability of Treatment Weighting estimator

Intuition (pseudo-population): IPTW treats confounding as a form of biased sampling. Units who received a treatment that was unlikely given their covariates receive larger weights, and units who received an expected treatment receive smaller weights. In the resulting weighted pseudo-population, treatment is approximately independent of measured covariates (if the propensity score model is correct), mimicking a randomized experiment.

  • The intuition behind this estimation approach is to think of confounding as a problem of biased sampling, where certain exposure–covariate subgroups are overrepresented relative to what we would observe in a randomized trial, while others are underrepresented.
  • We apply weights to up-weight under-represented units and down-weight over-represented units
  • We then average and compare weighted outcomes. The algorithm is as follows:
  1. Estimate the probability of being exposed/treated, \(\mathbb{P}(A=1\mid W)\), as a function of the measured confounders W. This is often referred to as the propensity score. We can estimate the propensity score by running a main-terms logistic regression, as illustrated below.
pscore.regression <- glm(A~ W1+W2+W3+W4, family='binomial',
                         data=data)
summary(pscore.regression)
## 
## Call:
## glm(formula = A ~ W1 + W2 + W3 + W4, family = "binomial", data = data)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.3445     0.3821  -3.519 0.000433 ***
## W1            0.4637     0.3130   1.482 0.138399    
## W2            0.5113     0.5629   0.908 0.363744    
## W3            0.4483     0.3208   1.397 0.162360    
## W4           -0.2676     0.3095  -0.865 0.387227    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 247.64  on 199  degrees of freedom
## Residual deviance: 241.99  on 195  degrees of freedom
## AIC: 251.99
## 
## Number of Fisher Scoring iterations: 4

Identification link: Under consistency, exchangeability given \(W\), and positivity, we can write \[ E\{Y(1)\} = E\left[\frac{\mathbb{I}(A=1)Y}{e(W)}\right], \qquad E\{Y(0)\} = E\left[\frac{\mathbb{I}(A=0)Y}{1-e(W)}\right], \] where \(e(W)=P(A=1\mid W)\). IPTW replaces the expectation with a sample average and replaces \(e(W)\) with an estimate.

  2. We then use the estimates from step 1 above to calculate weights for the exposed/treated, \(1/\mathbb{P}(A=1\mid W)\), and for the unexposed/untreated, \(1/\mathbb{P}(A=0\mid W)\).
predict.prob.A1 <- predict(pscore.regression, type='response')
summary(predict.prob.A1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1555  0.2521  0.3034  0.3100  0.3658  0.5145
predict.prob.A0 <- 1 - predict.prob.A1
wt1 <- as.numeric( data$A==1)/predict.prob.A1
wt0 <- as.numeric( data$A==0)/predict.prob.A0
head(data.frame(cbind(A=data$A, 1/predict.prob.A1, wt1, wt0)))
##   A       V2 wt1      wt0
## 1 0 2.646982   0 1.607171
## 2 0 4.197342   0 1.312760
## 3 0 3.718918   0 1.367793
## 4 0 3.735871   0 1.365514
## 5 0 3.044577   0 1.489099
## 6 0 2.128806   0 1.885892
  3. We then apply the weights and average the weighted outcomes to estimate the marginal risks in the population under A=1 and A=0. To compare estimates, we take the difference in weighted means.
mean(wt1*data$Y)
## [1] 0.03973564
mean(wt0*data$Y)
## [1] 0.07234258
IPW <- mean(wt1*data$Y) - mean(wt0*data$Y)
IPW
## [1] -0.03260694
mean( (wt1-wt0)*data$Y)
## [1] -0.03260694
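
To illustrate the pseudo-population idea, a minimal sketch checks whether the weighted covariate means are similar across arms; good balance suggests the propensity score model is doing its job (wtd.mean is a small helper defined here for illustration):

# Weighted covariate means by arm; similar rows indicate balance
W.cols <- c('W1', 'W2', 'W3', 'W4')
wtd.mean <- function(w) sapply(data[, W.cols], function(x) sum(w*x)/sum(w))
rbind(A1 = wtd.mean(wt1), A0 = wtd.mean(wt0))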

Stabilized weights (optional but common in practice)

Unstabilized weights can be variable in finite samples. A common alternative is to use stabilized weights, which multiply by the marginal probability of observed treatment: \[ SW_i = \begin{cases} \frac{P(A=1)}{e(W_i)}, & A_i=1 \\ \frac{P(A=0)}{1-e(W_i)}, & A_i=0. \end{cases} \] Stabilized weights often reduce variance and improve numerical stability without changing the large-sample target under correct specification.

pA <- mean(data$A)
sw1 <- as.numeric(data$A==1) * pA / predict.prob.A1
sw0 <- as.numeric(data$A==0) * (1-pA) / (1 - predict.prob.A1)
sw  <- sw1 + sw0
summary(sw)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6025  0.8991  0.9700  1.0013  1.0937  1.9930
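
With stabilized weights, a common (Hájek-type) point estimate normalizes by the sum of the weights within each arm; a minimal sketch:

# Hajek-type IPTW estimate with stabilized weights
risk1.sw <- sum(sw1*data$Y)/sum(sw1)
risk0.sw <- sum(sw0*data$Y)/sum(sw0)
risk1.sw - risk0.sw

Because the normalization cancels the stabilizing constants, this estimate targets the same marginal risk difference as the unstabilized version.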

Implementation note: IPTW can be implemented either by directly computing weighted means of \(Y\) within treatment groups (as shown here) or by fitting a weighted regression of \(Y\) on \(A\) using robust standard errors. These approaches target the same marginal contrast when implemented consistently.
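
As a sketch of the weighted-regression route (assuming the sandwich and lmtest packages are available), we can fit a weighted linear model of Y on A and read off the marginal risk difference from the coefficient on A:

library(sandwich)
library(lmtest)
# Weighted regression of Y on A in the IPTW pseudo-population;
# the coefficient on A is a (Hajek-normalized) weighted risk difference
msm.fit <- lm(Y ~ A, data = data, weights = wt1 + wt0)
coeftest(msm.fit, vcov = vcovHC(msm.fit, type = 'HC0'))

Note that these robust standard errors treat the weights as known; ignoring the estimation of the propensity score is typically conservative in this setting.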

Diagnostics and practical tips (IPTW)

  • Overlap: Inspect the distribution of \(e(W)\) by treatment group; a quick sketch follows the truncation example below. Limited overlap indicates practical positivity problems and increases variance.
  • Weight tails: Summarize weights (mean, sd, extreme quantiles). A heavy right tail indicates instability and sensitivity to a small number of observations.
  • Truncation: Truncating weights (for example, at the 1st and 99th percentiles) can reduce variance at the cost of potential bias. In finite samples, truncation can improve mean squared error, but it should be reported transparently as a sensitivity analysis.
w <- wt1 + wt0
lo <- quantile(w, 0.01)
hi <- quantile(w, 0.99)
w_trunc <- pmin(pmax(w, lo), hi)
summary(w_trunc)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.212   1.359   1.525   1.993   2.455   4.885
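
For the overlap diagnostic, a quick look at the estimated propensity scores by treatment group (using base R):

# Propensity score distribution by treatment group; poor overlap shows up
# as separated distributions or scores piling up near 0 or 1
by(predict.prob.A1, data$A, summary)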

TMLE: Targeted Maximum Likelihood Estimation

Why doubly robust estimators?

Both g-computation and IPTW rely on strong modeling assumptions: g-computation requires a correct outcome model, while IPTW requires a correct treatment model. Doubly robust estimators combine both approaches and are consistent if either the outcome model or the treatment model is correctly specified. This property is particularly valuable in real-world data settings, where all models are approximations.

  • For intuition here, we again think of causal inference as a problem of missing information.
  • We predict the outcomes for all units under both the exposed and unexposed conditions, using a flexible estimation approach (e.g., Super Learner) to avoid assuming a regression model that is not known, or using parametric knowledge when it is available.
  • We incorporate information on the covariate-exposure relationship to improve the initial estimator. Why TMLE?
    • Again, we use a flexible estimation approach, or parametric knowledge if available.
    • We get a second chance to control for confounding; hence this method is doubly robust.
    • We are able to hone our estimator towards the parameter of interest.
    • This estimator is asymptotically linear, and therefore we can obtain normal-curve inference.
  • Finally, we average and compare the targeted predictions under exposure and no exposure.

What is Super Learner?

  • Super Learner is a supervised machine learning algorithm that offers a flexible, data-adaptive approach to learning complex relationships from data.
  • The algorithm uses cross-validation (sample splitting) to evaluate the performance of a library of candidate estimators.
    • The library should be diverse, including simple algorithms (e.g., expert-informed parametric regressions) and more adaptive algorithms (e.g., penalized regressions, stepwise regression, adaptive splines).
    • Performance is measured by a loss function, e.g., squared prediction error.
  • Cross-validation allows us to compare algorithms based on how they perform on independent data. We partition the data into “folds”, fit each algorithm on the training set, and evaluate its performance (called the “risk”) on the validation set. We rotate through the folds and average the cross-validated risk estimates across folds to obtain one measure of performance for each algorithm.
  • We could choose the algorithm with the best performance (e.g., the lowest cross-validated MSE).
  • Instead, Super Learner builds the best combination of algorithm-specific predictions. We now illustrate below how to fit a Super Learner.
library('SuperLearner')
SL.library <- c('SL.glm', 'SL.step.interaction', 'SL.gam')
SL.outcome.regression <- suppressWarnings(SuperLearner(Y=data$Y, 
                                      X=subset(data, select=-Y),
                                      SL.library=SL.library, 
                                      family='binomial'))
SL.outcome.regression
## 
## Call:  
## SuperLearner(Y = data$Y, X = subset(data, select = -Y), family = "binomial",  
##     SL.library = SL.library) 
## 
## 
##                                Risk      Coef
## SL.glm_All              0.002745489 0.0000000
## SL.step.interaction_All 0.001407506 0.1134315
## SL.gam_All              0.001164333 0.8865685
SL.predict.outcome <- predict(SL.outcome.regression, 
                              newdata=subset(data, select=-Y))$pred
head(SL.predict.outcome)
##            [,1]
## [1,] 0.06057923
## [2,] 0.03497205
## [3,] 0.09365561
## [4,] 0.02889946
## [5,] 0.08963042
## [6,] 0.01211903

Why do I need to target?

  • We could use Super Learner to predict the outcomes for each unit while “setting” the exposure to different levels and then average and contrast the predictions.
SL.predict.outcome.A1 <- predict(SL.outcome.regression, 
                                 newdata=subset(data.A1, select=-Y))$pred
head(SL.predict.outcome.A1)
##            [,1]
## [1,] 0.03789361
## [2,] 0.02165598
## [3,] 0.05940978
## [4,] 0.01781923
## [5,] 0.05674338
## [6,] 0.00741711
SL.predict.outcome.A0 <- predict(SL.outcome.regression, newdata=subset(data.A0, select=-Y))$pred

# simple substitution (g-computation) estimator using the Super Learner fit
mean(SL.predict.outcome.A1) - mean(SL.predict.outcome.A0)
## [1] -0.02523562
  • But Super Learner is tuned to estimate \(\mathbb{E}(Y\mid A,W)\), not our parameter of interest. It therefore makes the wrong bias-variance trade-off for that parameter and, specifically, incurs too much bias.
  • There is also no reliable way to obtain statistical inference (i.e., to create 95% confidence intervals) from this approach.

What is targeting?

  • Targeting involves using information in the estimated propensity score \(\mathbb{P}(A=1\mid W)\) to update the initial (Super Learner) estimator of \(\mathbb{E}(Y\mid A,W)\).
  • It involves running a univariate regression of the outcome Y on a “clever covariate”, with the initial estimator as an offset. Why “clever”? Because it ensures that the targeting step moves the initial estimator in a direction that removes bias.
  • We then use the estimated coefficient to update our initial predictions of the outcome under the exposure and no exposure.

How do I target? (One approach)

  1. Use Super Learner to estimate the propensity score \(\mathbb{P}(A=1\mid W)\)
SL.pscore <- SuperLearner(Y=data$A, X=subset(data, select=-c(A,Y)),
                          SL.library=SL.library, family='binomial')
SL.pscore
## 
## Call:  
## SuperLearner(Y = data$A, X = subset(data, select = -c(A, Y)), family = "binomial",  
##     SL.library = SL.library) 
## 
## 
##                              Risk      Coef
## SL.glm_All              0.2193296 0.0000000
## SL.step.interaction_All 0.2073331 0.2987384
## SL.gam_All              0.2048886 0.7012616
SL.predict.prob.A1 <- SL.pscore$SL.predict
summary(SL.predict.prob.A1 - predict(SL.pscore, newdata=subset(data,select=-c(A,Y)))$pred)
##        V1   
##  Min.   :0  
##  1st Qu.:0  
##  Median :0  
##  Mean   :0  
##  3rd Qu.:0  
##  Max.   :0
summary(SL.predict.prob.A1)
##        V1        
##  Min.   :0.1560  
##  1st Qu.:0.2378  
##  Median :0.2748  
##  Mean   :0.3100  
##  3rd Qu.:0.3501  
##  Max.   :0.9270
SL.predict.prob.A0 <- 1 - SL.predict.prob.A1

Note: As you run this code, you might encounter the warning “non-integer #successes in a binomial glm!”. This simply means that our outcome Y is not binary, although it is bounded between 0 and 1. This is okay and can be ignored.

  2. Calculate the “clever covariate”

\[H(A,W)= \frac{\mathbb{I}(A=1)}{\mathbb{P}(A=1\mid W)}- \frac{\mathbb{I}(A=0)}{\mathbb{P}(A=0\mid W)}\]

Here’s code to evaluate the “clever covariate”:

H.AW <- (data$A==1)/SL.predict.prob.A1 - (data$A==0)/SL.predict.prob.A0
summary(H.AW)
##        V1           
##  Min.   :-2.187049  
##  1st Qu.:-1.407167  
##  Median :-1.301617  
##  Mean   : 0.006881  
##  3rd Qu.: 1.988445  
##  Max.   : 5.879351
H.1W <- 1/SL.predict.prob.A1
H.0W <- -1/SL.predict.prob.A0
tail(data.frame(A=data$A, H.AW, H.1W, H.0W))
##     A      H.AW     H.1W      H.0W
## 195 1  2.475845 2.475845 -1.677578
## 196 0 -1.394130 3.537231 -1.394130
## 197 0 -1.326580 4.062038 -1.326580
## 198 1  5.371727 5.371727 -1.228743
## 199 0 -1.184828 6.410444 -1.184828
## 200 0 -1.339595 3.944684 -1.339595

  3. Run a logistic regression of the outcome on this covariate, using the logit of the initial estimator \(\mathbb{E}(Y\mid A,W)\) as an offset, where logit(x) = log[x/(1-x)].

logitUpdate <- glm( data$Y ~ -1 +offset( qlogis(SL.predict.outcome)) +
                      H.AW, family='binomial')
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
epsilon <- logitUpdate$coef
epsilon
##        H.AW 
## 0.004130973
  4. Plug in the estimated coefficient \(\epsilon\) to yield our targeted estimator \(\mathbb{E}^*(Y\mid A,W)\), and use this targeted estimator to predict outcomes for all units under A=1 and A=0.
target.predict.outcome.A1 <- plogis( qlogis(SL.predict.outcome.A1)+
                                       epsilon*H.1W)
target.predict.outcome.A0 <- plogis( qlogis(SL.predict.outcome.A0)+
                                      epsilon*H.0W)
  5. Average the predictions to estimate the marginal risks in the population under exposure and no exposure. Compare the estimates by taking the difference.
TMLE <- mean( target.predict.outcome.A1 - target.predict.outcome.A0)
TMLE
## [1] -0.02419015
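
For comparison, packaged implementations of TMLE are available; a minimal sketch using the tmle package (assuming it is installed), which wraps the same initial estimation and targeting steps:

library(tmle)
# Point-treatment TMLE with Super Learner for both the outcome regression
# and the propensity score
tmle.fit <- tmle(Y = data$Y, A = data$A,
                 W = data[, c('W1', 'W2', 'W3', 'W4')],
                 Q.SL.library = SL.library,
                 g.SL.library = SL.library,
                 family = 'binomial')
tmle.fit$estimates$ATE  # additive effect (risk difference) with variance, CI, p-value

Small numerical differences from our hand-rolled version are expected because of cross-validation splits and other implementation details.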

Estimator Properties

  • We have discussed three estimators and gone through their implementation. We shall now go over the properties of each and some points of consideration.
  • Simple substitution estimator
    • Relies on consistently estimating the mean outcome \(\mathbb{E}(Y\mid A,W)\). Sometimes we have a lot of knowledge about the relationship between the outcome Y and the exposure and covariates (A,W), but other times our knowledge is limited, and assuming a parametric regression model can result in bias and misleading inferences.
  • IPTW
    • Relies on consistently estimating the propensity score \(\mathbb{P}(A=1\mid W)\). While sometimes we have a lot of knowledge about how the exposure was assigned, other times our knowledge is limited and assuming a parametric regression model can result in bias and misleading inference.
    • This estimator is unstable under positivity violations. When covariate groups have only a few exposed or unexposed observations, the weights can blow up! When there are covariate groups with no exposed or unexposed observations, the weights will not blow up, but the estimator will likely be biased and its variance underestimated.
  • TMLE
    • This estimator is doubly robust, i.e., it yields a consistent estimate if either the conditional mean \(\mathbb{E}(Y\mid A,W)\) or the propensity score \(\mathbb{P}(A=1\mid W)\) is consistently estimated. We get two chances to get it right!
    • It is also semi-parametric efficient which means it achieves the lowest asymptotic variance (most precision) among a large class of estimators if both the conditional mean and propensity score are consistently estimated at reasonable rates.
    • This estimator has formal theory to support valid statistical inference under mild conditions even when using machine learning.
    • Being a substitution (plug-in) estimator, it is relatively robust under positivity violations, strong confounding, and rare outcomes.
    • There is readily available software to implement this estimator, e.g., the ltmle and lmtp packages in R, among others.
  • We have now come to the end of our estimation exercise, assuming we have selected an estimation approach and estimated our parameter of interest. We now move back to the general Roadmap.

Step 6: Statistical uncertainty
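
Because TMLE is asymptotically linear (see the estimator properties above), its standard error can be estimated from the efficient influence curve; a minimal sketch using the objects computed in Step 5 (assuming they are still in the workspace):

# Targeted predictions at the observed exposures
target.predict.outcome <- plogis(qlogis(SL.predict.outcome) + epsilon*H.AW)
# Estimated efficient influence curve for the risk difference
IC <- as.vector(H.AW*(data$Y - target.predict.outcome) +
                  target.predict.outcome.A1 - target.predict.outcome.A0 - TMLE)
# Standard error and 95% confidence interval from the normal approximation
se <- sqrt(var(IC)/nrow(data))
c(estimate = TMLE, lower = TMLE - 1.96*se, upper = TMLE + 1.96*se)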

Step 7: Interpret findings

Summary and Discussion

Caution: Use your tools well.