Key Definitions

Visualize Some DAGs

library(ggdag)
library(tidyverse)
theme_set(theme_dag())

# Randomized Experiment (vii)
dagify(Y ~ D) %>% ggdag()

# Three-variable DAG (acyclic): a causes x and y, and x causes y
dagify(y ~ x, x ~ a, y ~ a) %>% ggdag()

ggdag is more specifically concerned with structural causal models (SCMs): DAGs that portray causal assumptions about a set of variables. Beyond being a useful way to conceptualize the problem we’re working on, causal DAGs let us lean on the well-developed links between graphical causal paths and statistical associations. They are mathematically grounded, but they are also consistent and easy to understand. Thus, when we’re assessing the causal effect of an exposure on an outcome, drawing our assumptions in the form of a DAG can help us pick the right model without having to know much about the math behind it.

Another way to think about DAGs is as non-parametric structural equation models (SEMs): we explicitly lay out the paths between variables, but in a DAG it doesn’t matter what form the relationship between two variables takes, only its direction. The rules underpinning DAGs are the same whether the relationship is simple and linear or a more complicated function.
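The link between graphical paths and statistical associations can be sketched with d-separation queries. The snippet below is a minimal illustration, assuming the dagitty package is installed; the `chain` and `collider` graphs and their node names are hypothetical examples, not part of the original text.

```r
library(dagitty)

# Hypothetical chain: x affects y only through m
chain <- dagitty("dag { x -> m -> y }")

# Hypothetical collider: x and y both cause k
collider <- dagitty("dag { x -> k <- y }")

# In the chain, x and y are connected until we condition on m
chain_open <- dconnected(chain, "x", "y")
chain_blocked <- dseparated(chain, "x", "y", "m")

# In the collider, x and y are separated until we condition on k
collider_blocked <- dseparated(collider, "x", "y")
collider_open <- dconnected(collider, "x", "y", "k")

print(c(
  chain_open = chain_open,
  chain_blocked = chain_blocked,
  collider_blocked = collider_blocked,
  collider_open = collider_open
))
```

None of these queries refers to a functional form: the answers depend only on which arrows are present and which direction they point.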

Healthcare Example

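Let’s say we’re looking at the relationship between smoking and cardiac arrest. We might assume that smoking causes changes in cholesterol, which causes cardiac arrest. We also assume that weight affects cholesterol, and that an unmeasured unhealthy lifestyle influences both smoking and weight: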

smoking_ca_dag <- dagify(cardiacarrest ~ cholesterol,
       cholesterol ~ smoking + weight,
       smoking ~ unhealthy,
       weight ~ unhealthy,
       labels = c("cardiacarrest" = "Cardiac\n Arrest", 
                  "smoking" = "Smoking",
                  "cholesterol" = "Cholesterol",
                  "unhealthy" = "Unhealthy\n Lifestyle",
                  "weight" = "Weight"),
       latent = "unhealthy",
       exposure = "smoking",
       outcome = "cardiacarrest")

ggdag(smoking_ca_dag, text = FALSE, use_labels = "label")
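Once the assumptions are drawn, the graph can be queried for which variables to adjust for. The following is a rough sketch, assuming the dagitty package is installed alongside ggdag; it rebuilds the same DAG without labels under the hypothetical name `smoking_dag` so that it runs on its own.

```r
library(ggdag)
library(dagitty)

# Same structure as the healthcare example, without plot labels
smoking_dag <- dagify(
  cardiacarrest ~ cholesterol,
  cholesterol ~ smoking + weight,
  smoking ~ unhealthy,
  weight ~ unhealthy,
  latent = "unhealthy",
  exposure = "smoking",
  outcome = "cardiacarrest"
)

# Minimal sets of observed variables that block every backdoor path
adj_sets <- dagitty::adjustmentSets(smoking_dag)
print(adj_sets)

# All paths between exposure and outcome, and whether each is open
# when nothing is conditioned on
all_paths <- dagitty::paths(smoking_dag)
print(all_paths$paths[all_paths$open])
```

Because `unhealthy` is declared latent, it cannot appear in an adjustment set, so the backdoor path through it has to be closed with an observed variable. `ggdag_adjustment_set(smoking_dag)` draws the same result as a plot.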

Modified Causal Graph

Consider the modified graph \(G^-\), in which the causal link from the treatment \(D\) to the outcome \(Y\) is deleted. To be used for checking a “proper conditioning set”, this graph must also add an edge from \(D\) to each child of \(Y\).

# Initial CDAG
dagify(
  xd ~ xy,
  d ~ xd,
  y ~ d,
  y ~ xy,
  b ~ y, 
  labels = c(
    "d" = "Treatment",
    "y" = "Response",
    "b" = "Post\n Treatment"
  )
) %>% 
  ggdag(use_labels = "label")
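The modified graph for this CDAG can be written out and queried directly. This is only a sketch of one reading of the construction: it assumes the dagitty package is installed, and it assumes that a candidate conditioning set is acceptable when it d-separates `d` and `y` in \(G^-\). The object name `g_minus` is hypothetical.

```r
library(dagitty)

# Modified graph G^-: the d -> y edge is removed and d -> b is added,
# because b is the only child of y in the initial CDAG
g_minus <- dagitty("dag {
  xy -> xd
  xd -> d
  xy -> y
  y -> b
  d -> b
}")

# ASSUMPTION: a set is acceptable if it d-separates d and y in G^-
ok_empty <- dseparated(g_minus, "d", "y", c())
ok_xy <- dseparated(g_minus, "d", "y", "xy")
ok_xd <- dseparated(g_minus, "d", "y", "xd")
ok_xy_b <- dseparated(g_minus, "d", "y", c("xy", "b"))

print(c(empty = ok_empty, xy = ok_xy, xd = ok_xd, xy_and_b = ok_xy_b))
```

Under this reading, the added `d -> b` edge turns the post-treatment variable `b` into a collider between `d` and `y`, which is what rules it out of a conditioning set.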

Demo script

Draw Samples

# Set sample size
n <- 1000

# Treatment effect
tau <- 3

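# Covariates: random draws from binomial distributions
xy <- rbinom(n, 1, 0.5)
xd <- rbinom(n, 1, 0.5 * xy + 0.25)

# Treatment: assignment probability depends on xd
D <- rbinom(n, 1, 0.5 * xd + 0.25)

# Outcome model: true treatment effect tau, confounded through xy
y <- tau * D - xy + rnorm(n)

# Post-treatment variable: a child of y
b <- rbinom(n, 1, 0.9 - 0.7 * (y < 0))

# Naive estimate of tau: unadjusted difference in means
tau.naive <- mean(y[D == 1]) - mean(y[D == 0])
print(tau.naive)
## [1] 2.818068

# Estimate adjusting for xy: conditional average treatment effects (CATEs)
# within each xy stratum, weighted by the stratum shares
tau.xy <- mean(xy == 1) * (mean(y[D == 1 & xy == 1]) - mean(y[D == 0 & xy == 1])) +
  mean(xy == 0) * (mean(y[D == 1 & xy == 0]) - mean(y[D == 0 & xy == 0]))
print(tau.xy)
## [1] 3.052023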
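The same adjustment can be sketched with a linear regression, which also makes it easy to see what happens when the post-treatment variable `b` is added as a control. The block below re-runs the simulation so that it is self-contained; the seed, the larger sample size, and the names `tau_adj` and `tau_with_b` are assumptions for illustration and are not part of the original script.

```r
set.seed(42)  # assumed seed, chosen only for reproducibility
n <- 10000    # assumed larger sample than the demo, to reduce noise
tau <- 3

# Same data-generating process as the demo script
xy <- rbinom(n, 1, 0.5)
xd <- rbinom(n, 1, 0.5 * xy + 0.25)
D <- rbinom(n, 1, 0.5 * xd + 0.25)
y <- tau * D - xy + rnorm(n)
b <- rbinom(n, 1, 0.9 - 0.7 * (y < 0))

# Regression adjustment for the confounder xy
tau_adj <- coef(lm(y ~ D + xy))[["D"]]

# Same regression with the post-treatment variable b added
tau_with_b <- coef(lm(y ~ D + xy + b))[["D"]]

print(c(adjusted = tau_adj, with_b = tau_with_b))
```

Since `b` is generated from `y`, controlling for it absorbs part of the treatment effect, so the coefficient on `D` in the second regression should fall below the one that adjusts for `xy` alone.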