1. Background

My goal with this document is to:

  1. Illustrate how to use inverse-probability-of-treatment weights to estimate the effect of a time-varying treatment regimen.

  2. Along the way, illustrate how to use dplyr to simulate data in R.

At this writing, the document does not demonstrate the parametric regression version of IPTW for time-varying treatments, i.e., marginal structural models. As a note on terminology: by my reading of Chapter 23 of the Fitzmaurice et al. Longitudinal Data Analysis text and Chapters 12 and 21 of Hernán & Robins' What If?, the term marginal structural model tends to be reserved for the parametric approach to estimation using inverse-probability-of-treatment weights (full citations below). Marginal structural models are so called because they model the marginal mean of the counterfactual values, and models for counterfactuals are often referred to as structural (p. 565).

A regression model may be preferable when the number of time-covariate combinations is large, because it smooths over sparse strata. An advantage of the non-parametric approach used here is that it makes no regression modelling assumptions, and it keeps the focus on the weighting concept rather than on additional technical modelling detail.
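To make the contrast concrete, below is a minimal sketch of what the parametric alternative could look like for the first weight component; fit_A0, p_A0_hat, and p_A0_got are illustrative names, not objects used elsewhere in this document.

#Sketch only (not pursued in this document): a parametric model for P(A0=1|L0),
#using glm() in place of the stratum-specific frequencies computed in Section 4.3.
#Assumes a data frame `data` with 0/1 columns A0 and L0, like the one simulated in Section 4.
fit_A0 = glm(A0 ~ L0, family = binomial(), data = data)
p_A0_hat = predict(fit_A0, type = "response") #predicted P(A0=1|L0) for each subject
p_A0_got = ifelse(data$A0 == 1, p_A0_hat, 1 - p_A0_hat) #probability of the exposure actually received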

Whether estimated parametrically or not, the general idea of the weights is the same: at each time point, the weight is a running product of the inverse probability of exposure given exposure and covariate history at that time, multiplied by the corresponding quantity for the time before, and so on back to the first time point.
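For the two-time-point example simulated below (exposures A0 and A1, covariates L0 and L1), the un-stabilized weight for a subject observed to have values a0, l0, a1, l1 works out to:

W = 1 / [ P(A0=a0 | L0=l0) * P(A1=a1 | A0=a0, L1=l1) ]

(The second factor conditions on L1 and A0 only, not on L0, for DAG-based reasons taken up in the aside just below.)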

As an aside, a question: because of the independencies encoded by a DAG, I think that at each time point only the immediate parents of exposure need to be included in the conditional probability. Do you agree? For example, consider the DAG below:

A -> B -> C -> E

E is independent of both B and A conditional on C, so P(E|C) = P(E|C,B,A).

Reference: the Product Decomposition Rule on p. 29 of Pearl J, Glymour M, Jewell N. Causal Inference in Statistics: A Primer. Wiley; 2016.
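As a quick numeric check of this claim, here is a minimal base-R simulation sketch of the chain A -> B -> C -> E; the coefficients are arbitrary, chosen only for illustration.

set.seed(1)
n = 100000
A = rbinom(n, 1, 0.5)
B = rbinom(n, 1, plogis(-0.5 + 1*A))
C = rbinom(n, 1, plogis(-0.5 + 1*B))
E = rbinom(n, 1, plogis(-0.5 + 1*C))
#If P(E|C) = P(E|C,B,A), the mean of E among C==1 should be roughly constant
#across strata of A and B (up to sampling noise):
tapply(E[C == 1], list(A = A[C == 1], B = B[C == 1]), mean)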

Another consideration is the stability of the weights. If the probability of exposure in a given stratum is very small, then one divided by a very small number can get very large, which can make the estimate “unstable.” Stabilized weights have thus been proposed. The formula for the stabilized weights uses the same denominator but a new numerator: the running product of the probability of exposure given prior exposure only, without conditioning on prior covariates.
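In the same two-time-point notation, the stabilized weight is:

SW = [ P(A0=a0) * P(A1=a1 | A0=a0) ] / [ P(A0=a0 | L0=l0) * P(A1=a1 | A0=a0, L1=l1) ]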

The demo implements both stabilized and un-stabilized weights.

References

Robins JM, Hernan MA. Estimation of the causal effects of time-varying exposures. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, eds. Longitudinal Data Analysis. Chapman & Hall/CRC; 2009:156. https://cdn1.sph.harvard.edu/wp-content/uploads/sites/343/2013/03/abc.pdf.

Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020. https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2020/02/ci_hernanrobins_21feb20.pdf.

2. Load packages

If a package is not already installed, install it with install.packages("package_name") before loading it.

#Load packages. These should already be installed on your computer using install.packages("package_name")
library(tidyverse) #See http://r4ds.had.co.nz/ for background.
library(ggdag) #For drawing a DAG

3. Draw the DAG to be emulated

The goal with the simulation is to generate a dataset consistent with the DAG below. Recall, a DAG illustrating time-varying confounding should have at least one variable (here, L1) that is both an intermediate on a path between the exposure (A0) and the outcome and a confounder on another path between the exposure (at a different time, A1) and the outcome.

The goal is to assess the effect of A1=A0=1 vs A1=A0=0 on D1.

The plot below uses ggdag, which creates DAGs using ggplot-like syntax. See the package's introductory vignette.

dag <- ggdag::dagify(
  A0 ~ L0,
  L1 ~ A0,
  A1 ~ L1 + A0,
  D1 ~ L0 + L1 + A0 + A1,
  exposure = c("A0", "A1"),
  outcome = "D1"
)
ggdag_classic(dag, layout = "circle") + theme_dag_blank()

Side-note: this DAG is not exactly consistent with the DAG implied by the data from Dana’s example SAS and R code. I removed a couple of the edges.

L1 is an intermediate on a path between A0 and D1 and a confounder on a path between A1 and D1. It’s tough to see in this layout, though. A PowerPoint drawing is easier to read:

The target effect is that depicted by a DAG in which nothing causes the exposure at either time point (A0 or A1), like this:
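For reference, that target DAG can also be sketched with ggdag by deleting the arrows into A0 and A1 (L0 -> A0, A0 -> A1, L1 -> A1) from the dagify() call above; dag_target is an illustrative name.

dag_target = ggdag::dagify(
  L1 ~ A0, #A0 still affects the intermediate L1
  D1 ~ L0 + L1 + A0 + A1, #and the outcome keeps all four parents
  exposure = c("A0", "A1"),
  outcome = "D1"
)
ggdag_classic(dag_target, layout = "circle") + theme_dag_blank()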

As an aside, Robins and Hernán refer to this as a non-dynamic regimen (p. 561) because nothing affects subsequent exposure; it is set experimentally. Alternatively, it could be of interest for the treatment plan to depend on covariates or exposure at a prior time. Treatment would still be set by the investigator, but the probability might change depending on conditions up to that point. Robins and Hernán refer to this as a dynamic regimen (p. 566). These terms are also described at length in the latter portion of the What If? book.

4. The simulation

4.1. Create the dataset corresponding to the first DAG.

sample_size = 100000
id_column = tibble(id = 1:sample_size) #This creates a dataset of 100,000 observations and 1 variable (id).

#Set the "seed" to make the simulated data the same every time it is run.
set.seed(2020)
data = id_column %>%
  #Randomly generate variables.
  mutate(
    #They are all Bernoulli (1,0), so use rbinom()
    
    #L0 has no parents, so define it based on a probability
    L0 = rbinom(
      n=n(), #the number of observations in the dataset
      size = 1, #the number of trials per observation. We are making a 1,0 variable, so this is 1
      prob = .5),
    

    #A0 depends on L0 but nothing else
    #Again use the structure of the rbinom() function but this time
    #set the probability using the logistic function
    A0 = rbinom(
      n=n(),
      size=1,
      prob = plogis(-1+.5*L0) #plogis is the expit function. see ?plogis
    ),
    
    #Define L1 based on A0 (per the DAG above, L0 is not a parent of L1)
    L1 = rbinom(
      n=n(),
      size=1,
      prob = plogis(-.1 + 1*A0)
    ),
    
    #Define A1 based on A0 and L1
    A1 = rbinom(
      n=n(),
      size=1,
      prob = plogis(.1 + .5*A0 + .5*L1)
    ),
    
    #Define D1 based on all four of the others
    D1 = rbinom(
      n=n(),
      size = 1,
      prob = plogis(-1 + .1*(A1+A0) + 1*L1 + 1*L0)
    )
    
  )

Check the distributions.

table(data$A0)
## 
##     0     1 
## 67617 32383
table(data$L0)
## 
##     0     1 
## 50572 49428
table(data$L1)
## 
##     0     1 
## 44935 55065
table(data$D1)
## 
##     0     1 
## 46734 53266

4.2. Calculate the crude risk difference.

Crude RD = P(D1=1|A1=1, A0=1) - P(D1=1|A1=0, A0=0)

P_D1_g_A0_A1 = data %>% #reading: probability of D1 given A0 and A1
  group_by(A0, A1) %>%
  #Recall, P(A|B) = P(A,B)/P(B) -- "joint over marginal" -- so this does that with frequencies.
  #sum() counts the number of times D1=1 in each stratum of A0, A1.
  #The sum() won't work in general, but it works here because D1 is coded 0/1.
  #n() is the number of obs in each stratum of A0, A1.
  summarise(D1_perc = sum(D1)/n())

#Crude risk difference?
#filter to A1=A0=1 or A1=A0=0  and pull out the value
P_D1_g_11 = P_D1_g_A0_A1 %>% filter(A1==1 & A0==1) %>% dplyr::select(D1_perc) %>% pull()
P_D1_g_00 = P_D1_g_A0_A1 %>% filter(A1==0 & A0==0) %>% dplyr::select(D1_perc) %>% pull()
RD_crude = P_D1_g_11 - P_D1_g_00
RD_crude
## [1] 0.1498431

4.3. Calculate the weights.

At each time point, calculate the probability of exposure given prior exposure and prior covariates. Then, multiply those exposure probabilities together. The (un-stabilized) weight is the inverse of that product.

First, calculate P(A0=1|L0)

P_A0 = data %>%
  group_by(L0) %>%
  #sum() counts the number of times A0=1 in each stratum of L0.
  #This won't work in general, but it works here because A0 is dichotomous, coded 0/1.
  #n() is the number of obs in each stratum of L0.
  summarise(P_A0 = sum(A0)/n())
P_A0
## # A tibble: 2 x 2
##      L0  P_A0
##   <int> <dbl>
## 1     0 0.266
## 2     1 0.383

Then, calculate P(A1=1|L1, A0).

Note, we calculate P(A1=1|L1, A0) rather than P(A1=1|L1, A0, L0) because L0 is not an immediate parent of A1 per the DAG above. By the DAG's independencies, P(A1=1|L1, A0) = P(A1=1|L1, A0, L0).

P_A1 = data %>% 
  group_by(L1, A0) %>%
  summarise(P_A1 = sum(A1)/n())
P_A1
## # A tibble: 4 x 3
## # Groups:   L1 [2]
##      L1    A0  P_A1
##   <int> <int> <dbl>
## 1     0     0 0.530
## 2     0     1 0.647
## 3     1     0 0.643
## 4     1     1 0.752

Values for the numerator part of the stabilized weights

For the stabilized weights, we need to know the probability of exposure given exposure history only. That will go in the numerator of the weight. A0 has no previous exposure, so the first part of the product is simply the marginal probability of A0: P(A0=1)

P_A0_st_wt = data %>%
  summarise(P_A0_st_wt = sum(A0)/n()) %>%
  pull() #grab the value with pull() to add it into the bigger dataset in the step below
P_A0_st_wt
## [1] 0.32383

The second part of the product is P(A1|A0), a small dataframe of two values: P(A1=1|A0=1) and P(A1=1|A0=0).

P_A1_st_wt = data %>%  
  group_by(A0) %>% 
  summarise(P_A1_st_wt = sum(A1)/n()) 
P_A1_st_wt
## # A tibble: 2 x 2
##      A0 P_A1_st_wt
##   <int>      <dbl>
## 1     0      0.584
## 2     1      0.721

Join these probabilities back with the main dataset and calculate weights

data_joined = data %>%
  mutate(
    P_A0_st_wt = P_A0_st_wt #Append the marginal probability of A0 for the stabilized weights
  ) %>%
  left_join(P_A0, by = "L0") %>% #this adds the conditional probability of A0 given the value of L0
  left_join(P_A1, by = c("L1", "A0")) %>% #add probability of A1 conditional on L1, A0
  left_join(P_A1_st_wt, by = c("A0")) %>%
  
  #now, calculate the probability that a subject's exposure was as observed -
  # the probability they got what they got
  #if exposed, assign the probability of exposure. 
  #if unexposed assign 1-the probability of exposure
  mutate(
    P_A0_got = case_when(
      A0==1 ~ P_A0,
      A0==0 ~ 1-P_A0),
    
    P_A1_got = case_when(
      A1==1 ~ P_A1,
      A1==0 ~ 1-P_A1),
    
    #The probability that a subject's *regimen* (A0=a0, A1=a1) is as observed:
    #to calculate it, multiply the probabilities from the two time points together.
    P_A0_A1_got = P_A0_got*P_A1_got,
    
    #calculate the unstabilized weights: 1 over the probability of the regimen
    IPTW = 1/P_A0_A1_got,
    
    #work on the numerator for the stabilized weights.
    #again, we need the probability that they got what they got.
    P_A0_st_wt_got = case_when(
      A0==1 ~ P_A0_st_wt, #if A0 is 1, assign the marginal probability of A0
      A0==0 ~ 1-P_A0_st_wt), #if A0 is 0, assign 1 minus the marginal probability of A0
    
    P_A1_st_wt_got = case_when(
      A1==1 ~ P_A1_st_wt, #if A1 is 1, assign that subject's probability of A1
      A1==0 ~ 1-P_A1_st_wt),  #if A1 is 0, assign 1 minus that subject's probability of A1
    
    #now, here is the numerator for the stabilized weights: the probability of the exposure
    #history (not conditional on any covariates other than prior exposure)
    #up to that point for that subject
    IPTW_st_num = P_A0_st_wt_got*P_A1_st_wt_got,
    
    #for the denominator, we can use P_A0_A1_got
    IPTW_st = IPTW_st_num/ P_A0_A1_got
  )

#Print the data to check
data_joined
## # A tibble: 100,000 x 18
##       id    L0    A0    L1    A1    D1 P_A0_st_wt  P_A0  P_A1 P_A1_st_wt
##    <int> <int> <int> <int> <int> <int>      <dbl> <dbl> <dbl>      <dbl>
##  1     1     1     0     1     0     0      0.324 0.383 0.643      0.584
##  2     2     0     0     0     1     1      0.324 0.266 0.530      0.584
##  3     3     1     0     1     1     1      0.324 0.383 0.643      0.584
##  4     4     0     0     0     0     0      0.324 0.266 0.530      0.584
##  5     5     0     0     1     1     1      0.324 0.266 0.643      0.584
##  6     6     0     0     1     1     1      0.324 0.266 0.643      0.584
##  7     7     0     1     1     1     1      0.324 0.266 0.752      0.721
##  8     8     0     0     1     1     0      0.324 0.266 0.643      0.584
##  9     9     0     0     1     1     0      0.324 0.266 0.643      0.584
## 10    10     1     1     1     0     0      0.324 0.383 0.752      0.721
## # ... with 99,990 more rows, and 8 more variables: P_A0_got <dbl>,
## #   P_A1_got <dbl>, P_A0_A1_got <dbl>, IPTW <dbl>, P_A0_st_wt_got <dbl>,
## #   P_A1_st_wt_got <dbl>, IPTW_st_num <dbl>, IPTW_st <dbl>

Examine distribution of weights

summary(data_joined$IPTW)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.120   2.572   3.445   3.999   4.039  15.107
data_joined %>%  ggplot(aes(IPTW)) + geom_histogram() 

summary(data_joined$IPTW_st)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6690  0.8366  0.9940  1.0000  1.1663  1.3648
data_joined %>%  ggplot(aes(IPTW_st)) + geom_histogram() 

4.4. Calculate the weighted risk difference, weighted by the inverse probability of the regimen.

#Now, calculate the risk difference weighted by the IPTW
IPTW_num = data_joined %>% #numerator of the risk calculation
  filter(D1==1) %>% #for the numerator of the risk calculation, limit to outcome=1
  group_by(A1, A0) %>% #stratify by exposure status
  summarise(
    num_risk_IPTW = sum(IPTW*D1), #multiply the weights by the outcome and take the sum (num = numerator)
    num_risk_IPTW_st = sum(IPTW_st*D1) #stabilized-weight version
  )

A dataset of the weighted risks - both stabilized and unstabilized - by levels of A0, A1:

IPTW = data_joined %>%
  group_by(A1, A0) %>%
  #sum the weights in the denominator of the risk calculations
  summarise(
    denom_risk_IPTW = sum(IPTW),
    denom_risk_IPTW_st = sum(IPTW_st) #stabilized version
  ) %>%
  #bring in the numerator of the risk calculation
  left_join(IPTW_num, by = c("A0", "A1")) %>%
  #calculate the weighted risk across levels of A0 and A1
  mutate(
    risk_IPTW = num_risk_IPTW/denom_risk_IPTW,
    risk_IPTW_st = num_risk_IPTW_st/denom_risk_IPTW_st
  )

These are the weighted risks by values of A1, A0.

IPTW
## # A tibble: 4 x 8
## # Groups:   A1 [2]
##      A1    A0 denom_risk_IPTW denom_risk_IPTW~ num_risk_IPTW num_risk_IPTW_st
##   <int> <int>           <dbl>            <dbl>         <dbl>            <dbl>
## 1     0     0         100009.           28154.        49297.           13878.
## 2     0     1          99855.            9021.        57529.            5197.
## 3     1     0          99995.           39464.        51701.           20404.
## 4     1     1         100066.           23364.        59120.           13804.
## # ... with 2 more variables: risk_IPTW <dbl>, risk_IPTW_st <dbl>

So, to calculate the weighted risk difference, we just pick which risks we want to subtract. We are interested in A1=A0=1 vs A1=A0=0.

Un-stabilized weighted risks and their difference

risk_IPTW_11 = IPTW %>% filter(A1==1 & A0==1) %>% dplyr::select(risk_IPTW) %>% pull()
risk_IPTW_00 = IPTW %>% filter(A1==0 & A0==0) %>% dplyr::select(risk_IPTW) %>% pull()
RD_IPTW = risk_IPTW_11 - risk_IPTW_00
RD_IPTW
## [1] 0.09788112

Stabilized weighted risks and their difference

risk_IPTW_st_11 = IPTW %>% filter(A1==1 & A0==1) %>% dplyr::select(risk_IPTW_st) %>% pull()
risk_IPTW_st_00 = IPTW %>% filter(A1==0 & A0==0) %>% dplyr::select(risk_IPTW_st) %>% pull()
RD_IPTW_st = risk_IPTW_st_11 - risk_IPTW_st_00
RD_IPTW_st
## [1] 0.09788112

Compare crude with adjusted

RD_crude
## [1] 0.1498431
RD_IPTW_st
## [1] 0.09788112
RD_IPTW
## [1] 0.09788112

Fin.

Note, this demo does not include a presentation of variance or standard errors; I hope to add one later. The textbooks cited above include some discussion.
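In the meantime, one common option is the nonparametric bootstrap: resample subjects with replacement, re-run the entire weighting pipeline on each resample, and take the standard deviation of the resulting risk differences as a standard error. A minimal sketch, assuming the steps in Sections 4.3-4.4 are wrapped in a hypothetical helper estimate_rd() (not defined in this document) that takes a data frame and returns the weighted risk difference:

#Sketch only. estimate_rd() is a hypothetical wrapper around Sections 4.3-4.4
#(weight estimation through the weighted risk difference); it is not defined here.
n_boot = 500
boot_rd = replicate(n_boot, {
  resample = data[sample(nrow(data), replace = TRUE), ]
  estimate_rd(resample)
})
sd(boot_rd) #bootstrap standard error of the weighted risk difference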