library(cfid)
plot_graphviz <- function(g){
el <- get_edgelist(g)
nodes_ <- unique(unlist(el))
net <- bnlearn::empty.graph(nodes=nodes_)
bnlearn::arcs(net) <- el
bnlearn::graphviz.plot(net)
}
dsep <- function(g, u, v, z){
# u and v are variables to d-separate
# z is d-separating set of length 0 or more.
el <- get_edgelist(g)
nodes_ <- unique(unlist(el))
net <- bnlearn::empty.graph(nodes=nodes_)
bnlearn::arcs(net) <- el
return(bnlearn::dsep(net, u, v, z))
}
Consider the following graph. The node names are inspired by the DeepSCM Morpho-MNIST example by Pawlowski et al., with variables T for thickness, I for intensity, and X for an image.
T: Thickness I: Intensity X: Image
g <- cfid::dag("T -> I -> X <- T")
plot_graphviz(g)
#> Loading required namespace: Rgraphviz
Let T = 1 be “very thick.” Let I = 1 be “very intense.” Let X = 1 be “dense image.”
The query of interest is: “Given observed thickness (T=1) and intensity (I=1), what would the image look like if it were not intense (I=0)?”
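Written out, and assuming the usual notation in which X_{I=0} denotes X under the intervention that sets I to 0, the query is a conditional counterfactual probability. By the definition of conditional probability it is a ratio, and the conjunction built in the code that follows corresponds to the event in the numerator (there with X_{I=0} fixed at the value 0):

```latex
P(X_{I=0} = x \mid T = 1, I = 1)
  = \frac{P(X_{I=0} = x,\; T = 1,\; I = 1)}{P(T = 1,\; I = 1)}
```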
g <- cfid::dag("T -> X <- I T -> I")
v1 <- cf("X", obs=0, int=c(I = 0))
v2 <- cf("T", 1)
v3 <- cf("I", 1)
gamma <- conj(v1, v2, v3)
pw_graph <- cfid:::pwg(g, gamma)
plot_graphviz(pw_graph)
cf_graph <- cfid:::make_cg(g, gamma)
plot_graphviz(cf_graph)
For this query, the CF graph matches the SWIG. The T has a subscript, but I suspect that is just because T and T_{i} are exchangeable.
In general, however, the CF graph and the SWIG are not the same, because the CF graph prunes all variables that are not essential to the query.
For example, if I add a node “D”, it is ignored if it is not relevant to the query.
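The pruning can be illustrated without cfid. The following is a minimal base-R sketch, assuming that the pruning amounts to keeping only the query variables and their ancestors, as in the standard counterfactual-graph construction. The helper `ancestors_of()` and the edge-list encoding are hypothetical names of my own, not part of cfid, and the sketch ignores the merging of nodes across worlds.

```r
# Edge list for T -> X <- I, T -> I -> D as a two-column character matrix.
edges <- matrix(
  c("T", "X",
    "I", "X",
    "T", "I",
    "I", "D"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("from", "to"))
)

# Hypothetical helper: return the given nodes plus all of their ancestors,
# found by repeatedly walking edges backwards from the current frontier.
ancestors_of <- function(edges, nodes) {
  found <- unique(nodes)
  frontier <- found
  while (length(frontier) > 0) {
    parents <- edges[edges[, "to"] %in% frontier, "from"]
    frontier <- setdiff(parents, found)
    found <- c(found, frontier)
  }
  found
}

# Variables mentioned in the query: X (under the intervention on I), T, and I.
query_vars <- c("X", "T", "I")
keep <- ancestors_of(edges, query_vars)

# Keep only edges whose endpoints both survive the pruning.
pruned <- edges[edges[, "from"] %in% keep & edges[, "to"] %in% keep, , drop = FALSE]
print(pruned)
```

D is a descendant of I but not an ancestor of any query variable, so it drops out. Adding "D" to `query_vars` keeps it, which mirrors what happens when D is added to the query.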
g <- cfid::dag("T -> X <- I T -> I -> D")
v1 <- cf("X", obs=0, int=c(I = 0))
v2 <- cf("T", 1)
v3 <- cf("I", 1)
gamma <- conj(v1, v2, v3)
pw_graph <- cfid:::pwg(g, gamma)
plot_graphviz(pw_graph)
cf_graph <- cfid:::make_cg(g, gamma)
plot_graphviz(cf_graph)
To keep D in the CF graph, I have to add it to the query.
v4 <- cf("D", 1)
gamma <- conj(v1, v2, v3, v4)
cf_graph <- cfid:::make_cg(g, gamma)
plot_graphviz(cf_graph)
Question: Under what conditions are the single-world CF graph and the SWIG the same?
Now I attempt to draw a counterfactual graph for a two-world query: “Given observed intensity (I=1), thickness (T=1), and image (X=1), what would the image look like if it were not intense (I=0)?”
g <- cfid::dag("T -> X <- I T -> I")
v1 <- cf("X", obs=0, int=c(I = 0))
v2 <- cf("T", 1)
v3 <- cf("I", 1)
v4 <- cf("X", 1)
gamma <- conj(v1, v2, v3, v4)
pw_graph <- cfid:::pwg(g, gamma)
plot_graphviz(pw_graph)
cf_graph <- cfid:::make_cg(g, gamma)
plot_graphviz(cf_graph)
dsep(cf_graph, "I", "X_{i}", "T_{i}")
#> [1] TRUE
Both the SWIG and the CF graph were invented for identification purposes.
Eli’s innovation in Pyro was to implement an intervention operator with SWIG semantics instead of simple ideal-intervention semantics. The difference is slight, but it gave the SWIG a new use case: it opened up probabilistic inference for single-world queries without needing an SCM.
One might want to extend this to the CF graph by implementing an effect handler with analogous semantics for multiverse counterfactuals. But there are problems.
If single-world inference in Pyro or multiverse inference in Causal Pyro proves cumbersome, it may be instructive to build SWIGs and CF graphs in pgmpy and rely on simple PGM inference.
The CF graph enables d-separation. D-separation has three use cases: identification, validation through conditional independence tests, and constraint-based causal discovery. On a CF graph, the latter two would require observed counterfactuals. Of course, one could simulate counterfactuals from a simulation model, but that is an edge case and too advanced for pedagogy.
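For contrast, on an ordinary observational DAG the d-separation statements that feed validation and discovery can be read off directly with bnlearn. This is a minimal sketch, assuming the graph T -> X <- I, T -> I -> D used earlier; the model string is my own encoding of that graph for `bnlearn::model2network()`.

```r
library(bnlearn)

# Observational DAG: T -> X <- I, T -> I -> D.
net <- model2network("[T][I|T][X|T:I][D|I]")

# D and T are connected through the open chain T -> I -> D.
marginal <- dsep(net, "D", "T")

# Conditioning on I blocks every path between D and T.
conditional <- dsep(net, "D", "T", "I")

print(c(marginal = marginal, conditional = conditional))
```

The second statement implies that D is independent of T given I, which could be checked against observed data. The analogous check on a CF graph involves counterfactual variables such as X_{i}, which are not observed.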