Key Definitions

Compositional form of joint densities: \(f(x_1, x_2, ..., x_p) = f(x_1)f(x_2|x_1)f(x_3|x_1, x_2) \cdots f(x_p|x_1, ..., x_{p-1})\)
Conditional independence: if \(f(x_1|x_2, x_3) = f(x_1|x_2)\), then \(X_1\) is independent of \(X_3\) given \(X_2\) (once \(X_2\) is known, \(X_3\) carries no further information about \(X_1\))
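A small simulation can make conditional independence concrete. This is a sketch under assumed settings: a hypothetical linear-Gaussian chain \(X_1 \to X_2 \to X_3\) with arbitrary coefficients of 0.8. In this chain \(X_1\) and \(X_3\) are correlated marginally. Once \(X_2\) is included in the regression, however, the coefficient on \(X_1\) is approximately 0. For jointly Gaussian variables, a zero partial regression coefficient is equivalent to conditional independence.

```r
# Hypothetical linear-Gaussian chain X1 -> X2 -> X3 (coefficients are arbitrary)
set.seed(1)
n <- 10000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)
x3 <- 0.8 * x2 + rnorm(n)

# Marginally, X1 and X3 are dependent (population correlation is about 0.45)
marginal_cor <- cor(x1, x3)

# Given X2, X1 adds no information about X3, so its coefficient is close to 0
coef_x1 <- unname(coef(lm(x3 ~ x1 + x2))["x1"])

print(c(marginal_cor = marginal_cor, coef_x1 = coef_x1))
```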
A graph \(G = (V, E)\) consists of two sets \(V\) and \(E\). The elements of \(V\) are called the vertices and the elements of \(E\) the edges of \(G\). Each edge is a pair of vertices. For instance, the sets \(V = \{1, 2, 3, 4, 5\}\) and \(E = \{\{1, 2\}, \{2, 3\}, \{3, 4\}, \{4, 5\}\}\) define a graph with 5 vertices and 4 edges.
Adjacency: If \(uv \in E(G)\), then \(u\) and \(v\) are said to be adjacent, in which case we also say that \(u\) is connected to \(v\) or that \(u\) is a neighbour of \(v\). If \(uv \notin E(G)\), then \(u\) and \(v\) are nonadjacent (not connected, non-neighbours).
Neighbourhood: The neighbourhood of a vertex \(v \in V(G)\), denoted \(N(v)\), is the set of vertices adjacent to \(v\), i.e. \(N(v) = \{u \in V(G) \mid vu \in E(G)\}\). The closed neighbourhood of \(v\) is defined as \(N[v] = N(v) \cup \{v\}\).
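Here is a minimal base-R sketch of \(N(v)\) and \(N[v]\) for the 5-vertex path graph defined above. The edge-list representation and the helper names `neighbourhood` and `closed_neighbourhood` are illustrative choices, not part of any package.

```r
# Path graph from the definition above: V = {1,...,5}, E = {{1,2},{2,3},{3,4},{4,5}}
V <- 1:5
E <- list(c(1, 2), c(2, 3), c(3, 4), c(4, 5))

# Open neighbourhood N(v): every vertex that shares an edge with v
neighbourhood <- function(v, E) {
  nb <- unlist(lapply(E, function(e) if (v %in% e) setdiff(e, v)))
  sort(unique(nb))
}

# Closed neighbourhood N[v] = N(v) together with v itself
closed_neighbourhood <- function(v, E) sort(union(neighbourhood(v, E), v))

neighbourhood(3, E)          # 2 4
closed_neighbourhood(3, E)   # 2 3 4
sapply(V, function(v) length(neighbourhood(v, E)))   # degrees: 1 2 2 2 1
```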
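Diameter: The diameter of a connected graph, denoted \(\operatorname{diam}(G)\), is \(\max_{a, b \in V(G)} \operatorname{dist}(a, b)\), where \(\operatorname{dist}(a, b)\) is the number of edges on a shortest path between \(a\) and \(b\).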
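The diameter can be computed by running a breadth-first search from every vertex and taking the largest shortest-path distance. The sketch below does this in base R for the same path graph. It assumes the graph is connected, and the helpers `bfs_dist` and `diameter` are hypothetical names.

```r
# Path graph 1-2-3-4-5 (same example as the graph definition)
V <- 1:5
E <- list(c(1, 2), c(2, 3), c(3, 4), c(4, 5))

# Breadth-first search: number of edges on a shortest path from s to each vertex
bfs_dist <- function(s, V, E) {
  dist <- setNames(rep(Inf, length(V)), V)
  dist[as.character(s)] <- 0
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (e in E) {
      if (v %in% e) {
        u <- setdiff(e, v)
        if (is.infinite(dist[as.character(u)])) {
          dist[as.character(u)] <- dist[as.character(v)] + 1
          queue <- c(queue, u)
        }
      }
    }
  }
  dist
}

# diam(G) = maximum of dist(a, b) over all pairs; assumes G is connected
diameter <- function(V, E) max(sapply(V, function(s) max(bfs_dist(s, V, E))))

diameter(V, E)   # 4: vertices 1 and 5 are four edges apart
```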
Directed graphs: Draw a node for each variable and an arrow from \(X_j\) into \(X_i\) if \(X_j\) appears in the conditional distribution of \(X_i\) (i.e. \(X_j\) is a parent of \(X_i\)).
Joint distribution: \(f(x_1, x_2, ..., x_p) = \prod_{j=1}^{p} f(x_j \mid \text{parents}(x_j))\)
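As a check on the factorization, consider a hypothetical binary DAG \(Z \to D\), \(Z \to Y\), \(D \to Y\). All probabilities in it are made up. The sketch builds the joint distribution as a product of each node's probability given its parents. It then confirms that the result sums to 1 and reproduces the marginal \(P(Z = 1)\).

```r
# Hypothetical binary DAG Z -> D, Z -> Y, D -> Y; probabilities are illustrative only
p_z <- 0.3                                       # P(Z = 1)
p_d <- function(z) c(0.2, 0.7)[z + 1]            # P(D = 1 | Z = z)
p_y <- function(d, z) 0.1 + 0.4 * d + 0.3 * z    # P(Y = 1 | D = d, Z = z)

bern <- function(x, p) ifelse(x == 1, p, 1 - p)  # P(X = x) for X ~ Bernoulli(p)

# f(z, d, y) = f(z) f(d | z) f(y | d, z): one factor per node given its parents
grid <- expand.grid(z = 0:1, d = 0:1, y = 0:1)
grid$joint <- with(grid, bern(z, p_z) * bern(d, p_d(z)) * bern(y, p_y(d, z)))

sum(grid$joint)                 # 1: a valid joint distribution
sum(grid$joint[grid$z == 1])    # 0.3: recovers P(Z = 1)
```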
Markov property: \(X_j\) is independent of all of its non-descendants given its parents, \(\text{parents}(X_j)\)
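The Markov property can be checked against a graph with the dagitty package, which ggdag is built on. The DAG below (\(a \to b \to c\), \(a \to d\)) is a hypothetical example. Node \(c\) has parent \(b\) and non-descendants \(\{a, b, d\}\), so the Markov property implies \(c \perp \{a, d\} \mid b\). Without conditioning, \(c\) and \(d\) remain dependent through \(a\).

```r
library(dagitty)

# Hypothetical DAG: a -> b -> c and a -> d
g <- dagitty("dag {
  a -> b -> c
  a -> d
}")

# Conditional independencies implied by the Markov property on g
impliedConditionalIndependencies(g)

dseparated(g, "c", c("a", "d"), "b")   # TRUE: c is independent of its non-descendants given its parent b
dseparated(g, "c", "d")                # FALSE: without conditioning, c <- b <- a -> d is open
```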
Causal Graphical Models (assumptions):
- There is a directed acyclic graph (DAG) \(G\) representing the causal relationships among the variables
- Satisfies the Markov property (MP): the joint distribution of the variables obeys the Markov property on \(G\)
- Faithfulness: the joint distribution has no conditional independence relations other than those implied by the MP
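Faithfulness fails when causal paths cancel exactly. In the sketch below, a hypothetical linear SCM on \(X \to M \to Y\), \(X \to Y\) has coefficients chosen so that the indirect effect (+1) and the direct effect (−1) cancel. As a result \(X\) and \(Y\) are uncorrelated, even though they are adjacent in the DAG and the MP does not imply their independence. Conditioning on \(M\) reveals the direct effect again.

```r
# Hypothetical SCM on X -> M -> Y, X -> Y with cancelling paths: 1 * 1 + (-1) = 0
set.seed(2)
n <- 10000
x <- rnorm(n)
m <- x + rnorm(n)
y <- m - x + rnorm(n)

# X and Y are (near) uncorrelated despite the edge X -> Y: an unfaithful distribution
r_xy <- cor(x, y)

# Holding M fixed, the direct effect of X on Y (about -1) shows up
b_x <- unname(coef(lm(y ~ x + m))["x"])

print(c(cor_xy = r_xy, direct_effect = b_x))
```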
Colliders: A collider is a node/variable \(Z\) on an undirected path between two other variables \(X_j\) and \(X_i\) in a DAG, such that both edges of the path that meet at \(Z\) point into it (\(\cdots \to Z \leftarrow \cdots\))
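Colliders behave differently from other nodes when we condition on them. In the simulated collider \(X \to Z \leftarrow Y\) below, \(X\) and \(Y\) are independent by construction (a hypothetical setup). Conditioning on \(Z\) makes them dependent: given \(Z\), a larger \(X\) implies a smaller \(Y\). Because \(E(Y \mid X, Z) = (Z - X)/2\) here, the regression coefficient on \(X\) is about −0.5.

```r
# Hypothetical collider X -> Z <- Y, with X and Y independent
set.seed(3)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
z <- x + y + rnorm(n)

r_xy <- cor(x, y)                            # near 0: the path X -> Z <- Y is blocked
b_x_given_z <- unname(coef(lm(y ~ x + z))["x"])  # near -0.5: conditioning on Z opens it

print(c(cor_xy = r_xy, coef_x_given_z = b_x_given_z))
```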
Conditioning Set: To check whether conditioning on a set of nodes/variables \(S\) renders two variables \(D\) and \(Y\) conditionally independent:
1. List all undirected paths between \(D\) and \(Y\)
2. Check whether each path is blocked by \(S\): a path is blocked if it contains a non-collider that is in \(S\), or a collider that is not in \(S\) and has no descendants in \(S\)
“d-separated”: If every path is blocked, then \(D\) and \(Y\) are conditionally independent given \(S\), and we say that they are “d-separated” by \(S\) (or that \(S\) “d-separates” them). The \(d\) refers to “directed”.
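dagitty's `dseparated(x, X, Y, Z)` applies these blocking rules for us. The graph below is a hypothetical example with a confounder \(X\) (\(D \leftarrow X \to Y\)) and a collider \(C\) (\(D \to C \leftarrow Y\)). It omits a direct edge \(D \to Y\) so that only the two non-causal paths remain.

```r
library(dagitty)

# Hypothetical DAG with a fork through X and a collider at C
g <- dagitty("dag {
  X -> D
  X -> Y
  D -> C
  Y -> C
}")

dseparated(g, "D", "Y")                # FALSE: D <- X -> Y is open
dseparated(g, "D", "Y", "X")           # TRUE: X blocks the fork, and C blocks D -> C <- Y
dseparated(g, "D", "Y", c("X", "C"))   # FALSE: conditioning on the collider C opens D -> C <- Y
```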
Regression adjustment: Delete the causal link(s) from \(D\) to \(Y\) to obtain a new graph \(G^-\), then determine whether \(Y\) and \(D\) are d-separated by the conditioning set \(S\) in \(G^-\)
Average Causal Effects: Conditioning on a set \(S\) that d-separates \(Y\) and \(D\) in the \(G^-\) graph allows us to interpret \(f(y | d, s)\) causally (as the effect of \(D\) on \(Y\)), from which we can, for example, estimate the conditional average treatment effect (CATE):
- \(\tau_s = E(Y|D = 1, S = s) - E(Y|D = 0, S = s)\)
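With a discrete \(S\), each \(\tau_s\) can be estimated by a difference in group means within the stratum \(S = s\). The sketch uses hypothetical simulated data. A binary \(S\) drives both treatment assignment and the outcome, and the true effect is 1 when \(S = 0\) and 3 when \(S = 1\). Here \(S\) is the only path between \(D\) and \(Y\) in \(G^-\), so it d-separates them.

```r
# Hypothetical data: S confounds D and Y; true CATE is 1 (S = 0) and 3 (S = 1)
set.seed(4)
n <- 20000
s <- rbinom(n, 1, 0.5)
d <- rbinom(n, 1, 0.3 + 0.4 * s)
y <- (1 + 2 * s) * d + s + rnorm(n)

# tau_s = E(Y | D = 1, S = s) - E(Y | D = 0, S = s), estimated with group means
cate <- sapply(c(0, 1), function(v) {
  mean(y[d == 1 & s == v]) - mean(y[d == 0 & s == v])
})
names(cate) <- c("s=0", "s=1")
cate   # approximately 1 and 3
```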

Visualize Some DAGs

A DAG displays assumptions about the relationships between variables (often called nodes in the context of graphs). These assumptions take the form of lines (or edges) going from one node to another. The edges are directed, meaning that each has a single arrowhead pointing from cause to effect. Here’s a simple DAG where we assume that \(D\) affects \(Y\):

library(ggdag)
library(tidyverse)
theme_set(theme_dag())

# Randomized Experiment (vii)
dagify(Y ~ D) %>% ggdag()

# Three-variable DAG: a affects both x and y, and x affects y (no directed cycles)
dagify(y ~ x + a, x ~ a) %>% ggdag()
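Writing formulas does not guarantee that a graph is acyclic. For example, `y ~ x, x ~ a, a ~ y` closes the loop \(x \to y \to a \to x\). dagitty's `isAcyclic()` can flag this. The sketch assumes that `dagify()` builds the cyclic object without raising an error; it only checks the graphs and draws nothing.

```r
library(ggdag)
library(dagitty)

# y ~ x, x ~ a, a ~ y closes the loop x -> y -> a -> x
cyclic <- dagify(y ~ x, x ~ a, a ~ y)
isAcyclic(cyclic)    # FALSE

# Letting a cause y (instead of y causing a) removes the cycle
acyclic <- dagify(y ~ x + a, x ~ a)
isAcyclic(acyclic)   # TRUE
```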

ggdag is more specifically concerned with structural causal models (SCMs): DAGs that portray causal assumptions about a set of variables. Beyond being useful conceptions of the problem we’re working on (which they are), SCMs also let us lean on the well-developed links between graphical causal paths and statistical associations. Causal DAGs are mathematically grounded, but they are also consistent and easy to understand. Thus, when we’re assessing the causal effect between an exposure and an outcome, drawing our assumptions in the form of a DAG can help us pick the right model without having to know much about the math behind it.

Another way to think about DAGs is as non-parametric structural equation models (SEMs): we explicitly lay out paths between variables, but in a DAG it doesn’t matter what functional form the relationship between two variables takes, only its direction. The rules underpinning DAGs are the same whether the relationship is a simple linear one or a more complicated function.

Healthcare Example

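Let’s say we’re looking at the relationship between smoking and cardiac arrest. We might assume that smoking causes changes in cholesterol, which causes cardiac arrest; that weight also affects cholesterol; and that an unobserved (latent) unhealthy lifestyle drives both smoking and weight: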

smoking_ca_dag <- dagify(cardiacarrest ~ cholesterol,
       cholesterol ~ smoking + weight,
       smoking ~ unhealthy,
       weight ~ unhealthy,
       labels = c("cardiacarrest" = "Cardiac\n Arrest", 
                  "smoking" = "Smoking",
                  "cholesterol" = "Cholesterol",
                  "unhealthy" = "Unhealthy\n Lifestyle",
                  "weight" = "Weight"),
       latent = "unhealthy",
       exposure = "smoking",
       outcome = "cardiacarrest")

ggdag(smoking_ca_dag, text = FALSE, use_labels = "label")
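dagitty's `adjustmentSets()` can report what to condition on to estimate the effect of smoking on cardiac arrest in this DAG. The only backdoor path is smoking ← unhealthy → weight → cholesterol → cardiac arrest. Because `unhealthy` is declared latent, the path has to be blocked at weight. Cholesterol is a mediator and is therefore excluded. The sketch rebuilds the same DAG without display labels so that it runs on its own. `ggdag_adjustment_set()` would draw the same result.

```r
library(ggdag)
library(dagitty)

# Same DAG as above, without the display labels
smoking_ca_dag <- dagify(cardiacarrest ~ cholesterol,
                         cholesterol ~ smoking + weight,
                         smoking ~ unhealthy,
                         weight ~ unhealthy,
                         latent = "unhealthy",
                         exposure = "smoking",
                         outcome = "cardiacarrest")

# Minimal sets that block all backdoor paths from smoking to cardiacarrest
adjustmentSets(smoking_ca_dag)   # { weight }
```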

Modified Causal Graph

Consider the graph \(G^-\) in which the causal link from the treatment \(D\) to the outcome \(Y\) is deleted. To check for a “proper conditioning set”, \(G^-\) must also include an edge from \(D\) to each child of \(Y\), since \(D\) still affects those post-treatment variables through \(Y\).

# Initial causal DAG
dagify(
  xd ~ xy,
  d ~ xd,
  y ~ d + xy,
  b ~ y,
  labels = c(
    "xy" = "X_Y",
    "xd" = "X_D",
    "d" = "Treatment",
    "y" = "Response",
    "b" = "Post\n Treatment"
  )
) %>%
  ggdag(use_labels = "label")
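The sketch below writes out \(G^-\) for this example by hand. It removes \(d \to y\) and adds \(d \to b\), following the rule stated above. It then uses dagitty's `dseparated()` to test candidate conditioning sets. Either confounder, \(x_y\) or \(x_d\), blocks the backdoor path. Adding the post-treatment variable \(b\) opens \(d \to b \leftarrow y\), so \(\{x_y, b\}\) is not a valid set.

```r
library(dagitty)

# G^-: d -> y removed, d -> b added because b is a child of y
g_minus <- dagitty("dag {
  xy -> xd
  xd -> d
  xy -> y
  y -> b
  d -> b
}")

dseparated(g_minus, "d", "y")                 # FALSE: d <- xd <- xy -> y is open
dseparated(g_minus, "d", "y", "xy")           # TRUE: xy is a valid conditioning set
dseparated(g_minus, "d", "y", "xd")           # TRUE: so is xd
dseparated(g_minus, "d", "y", c("xy", "b"))   # FALSE: conditioning on b opens d -> b <- y
```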

Demo script

Draw Samples

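# Set sample size (no seed is set, so the printed estimates below vary from run to run)
n <- 1000

# Treatment effect
tau <- 3

# Confounders: xy ~ Bernoulli(0.5); xd depends on xy
xy <- rbinom(n, 1, 0.5)
xd <- rbinom(n, 1, 0.5 * xy + 0.25)

# Treatment: D depends on xd
D <- rbinom(n, 1, 0.5 * xd + 0.25)

# Outcome y (true effect tau, confounded by xy) and post-treatment variable b,
# which is more likely to be 1 when y >= 0
y <- tau * D - xy + rnorm(n)
b <- rbinom(n, 1, 0.9 - 0.7 * (y < 0))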

# Naive estimate of tau: a raw difference in means, biased by the backdoor path through xy
tau.naive <- mean(y[D == 1]) - mean(y[D == 0])
print(tau.naive)

## [1] 2.818068

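# Adjust for xy: estimate the CATE within xy = 1 and xy = 0,
# then average them weighted by the share of observations in each stratum
tau.xy <- mean(xy == 1) * (mean(y[D == 1 & xy == 1]) - mean(y[D == 0 & xy == 1])) +
  mean(xy == 0) * (mean(y[D == 1 & xy == 0]) - mean(y[D == 0 & xy == 0]))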
print(tau.xy)

## [1] 3.052023
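The same adjustment can be done by linear regression. This is valid here because the simulated outcome is linear in \(D\) and \(x_y\). The sketch reruns the demo with an added seed and a larger \(n\), both of which are assumptions made so the comparison is stable. Adjusting for \(x_y\) recovers \(\tau = 3\). Also adjusting for the post-treatment variable \(b\) pulls the estimate away from \(\tau\), which matches the d-separation check on \(G^-\).

```r
# Re-simulates the demo with a fixed seed and n = 20000 (both assumptions)
set.seed(5)
n <- 20000
tau <- 3
xy <- rbinom(n, 1, 0.5)
xd <- rbinom(n, 1, 0.5 * xy + 0.25)
D <- rbinom(n, 1, 0.5 * xd + 0.25)
y <- tau * D - xy + rnorm(n)
b <- rbinom(n, 1, 0.9 - 0.7 * (y < 0))

# Regression adjustment for the valid set {xy}: coefficient on D is close to tau
coef_xy <- unname(coef(lm(y ~ D + xy))["D"])

# Adding the post-treatment variable b biases the coefficient on D downward
coef_b <- unname(coef(lm(y ~ D + xy + b))["D"])

print(c(adjusted_xy = coef_xy, adjusted_xy_b = coef_b))
```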

Causal Graphs

Sam Kuhn