On the non-ID usefulness of counterfactual graphs

Consider the following graph (node names are inspired by the DeepSCM Morpho-MNIST example by Pawlowski et al., with variables T for thickness, I for intensity, and X for an image).

Plotting a parallel world graph.

Let T = 1 be “very thick.” Let I = 1 be “very intense.” Let X = 1 be “dense image.”

I am interested in the query: “Given observed thickness (T=1) and intensity (I=1), what would the image look like if it were not intense (I=0)?”
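In symbols, the query can be written roughly as follows. The notation X_{I=0} for "the image under an intervention setting I to 0" and the placeholder value x are my own shorthand, not something taken from cfid. The numerator is the kind of conjunction that gamma encodes in the code that follows.

```latex
P(X_{I=0} = x \mid T = 1, I = 1)
  = \frac{P(X_{I=0} = x,\; T = 1,\; I = 1)}{P(T = 1,\; I = 1)}
```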

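library(cfid)  # provides cf() and conj()

# Causal graph: T -> X <- I and T -> I
g <- cfid::dag("T -> X <- I  T -> I")
# Counterfactual X under the intervention I = 0
v1 <- cf("X", obs=0, int=c(I = 0))
# Observed thickness and intensity
v2 <- cf("T", 1)
v3 <- cf("I", 1)
gamma <- conj(v1, v2, v3)
# Parallel worlds graph (pwg is an internal, unexported cfid function)
pw_graph <- cfid:::pwg(g, gamma)
plot_graphviz(pw_graph)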

cf_graph <- cfid:::make_cg(g, gamma)
plot_graphviz(cf_graph)

In this example, the CF graph is the same as the SWIG. T carries a subscript, but I suspect that is only because T and T_{i} are exchangeable.

In general, however, the CF graph and the SWIG are not the same, because the CF graph prunes all variables that are not essential to the query.

For example, if I add a node “D”, it is ignored if it is not relevant to the query.

g <- cfid::dag("T -> X <- I  T -> I -> D")
v1 <- cf("X", obs=0, int=c(I = 0))
v2 <- cf("T", 1)
v3 <- cf("I", 1)
gamma <- conj(v1, v2, v3)
pw_graph <- cfid:::pwg(g, gamma)
plot_graphviz(pw_graph)

cf_graph <- cfid:::make_cg(g, gamma)
plot_graphviz(cf_graph)

To keep D in the CF graph, I have to add it to the query.

v4 <- cf("D", 1)
gamma <- conj(v1, v2, v3, v4)
cf_graph <- cfid:::make_cg(g, gamma)
plot_graphviz(cf_graph)
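The pruning behavior can be sketched in base R as an ancestor computation. This is only an illustration of the "keep the query variables and their ancestors" step, under the assumption that this is what drives the pruning. It works on the original DAG rather than the parallel worlds graph, and the helper ancestors_of is hypothetical, not part of cfid.

```r
# Toy DAG from the example, stored as a node -> parents list.
parents <- list(
  T = character(0),
  I = "T",
  X = c("T", "I"),
  D = "I"
)

# Return the given nodes together with all of their ancestors.
ancestors_of <- function(nodes, parents) {
  seen <- character(0)
  frontier <- nodes
  while (length(frontier) > 0) {
    node <- frontier[1]
    frontier <- frontier[-1]
    if (!(node %in% seen)) {
      seen <- c(seen, node)
      frontier <- c(frontier, parents[[node]])
    }
  }
  sort(seen)
}

# Query over X, T, I: D is not an ancestor of any of them, so it drops out.
kept_without_d <- ancestors_of(c("X", "T", "I"), parents)
# Adding D to the query keeps it.
kept_with_d <- ancestors_of(c("X", "T", "I", "D"), parents)
print(kept_without_d)
print(kept_with_d)
```

D is a descendant of I and not an ancestor of any query variable, which is why it only survives once it is itself part of the query.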

Question: Under what conditions are the single world CF graph and the SWIG the same?

Now I attempt to draw a counterfactual graph for a two-world query: “Given observed intensity (I=1), thickness (T=1), and image (X=1), what would the image look like if it were not intense (I=0)?”

g <- cfid::dag("T -> X <- I  T -> I")
v1 <- cf("X", obs=0, int=c(I = 0))
v2 <- cf("T", 1)
v3 <- cf("I", 1)
v4 <- cf("X", 1)
gamma <- conj(v1, v2, v3, v4)
pw_graph <- cfid:::pwg(g, gamma)
plot_graphviz(pw_graph)

cf_graph <- cfid:::make_cg(g, gamma)
plot_graphviz(cf_graph)

dsep(cf_graph, "I", "X_{i}", "T_{i}")
#> [1] TRUE
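Read as a conditional independence statement, and assuming that d-separation in the CF graph implies independence of the corresponding counterfactual variables, the check above says the following (the probability notation is my own gloss, not output from cfid):

```latex
I \perp X_{i} \mid T_{i}
\quad\Longrightarrow\quad
P(X_{i} \mid T_{i}, I) = P(X_{i} \mid T_{i})
```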

How the CF graph is useful (beyond ID)

Both the SWIG and the CF graph were invented for identification purposes.

Eli’s innovation in Pyro was to implement an intervention operator with SWIG semantics instead of simple ideal-intervention semantics. The difference is slight, but it gave the SWIG a new use case: it opened up probabilistic inference for single-world queries without needing an SCM.
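As a rough illustration of the difference, here is a base R sketch of node-splitting (SWIG-style) intervention on the toy graph. It is not Pyro code. The function simulate, its argument do_i, and every probability in it are invented for illustration. The point is only that the natural value of I is still sampled and recorded, while the fixed value is what propagates to X.

```r
set.seed(1)

# Hypothetical mechanisms for T -> I, T -> X <- I (all numbers are made up).
simulate <- function(n, do_i = NULL) {
  thick <- rbinom(n, 1, 0.5)
  # Random half of the split node: natural intensity is always sampled.
  i_natural <- rbinom(n, 1, ifelse(thick == 1, 0.8, 0.3))
  # Fixed half of the split node: the value that propagates to children.
  i_used <- if (is.null(do_i)) i_natural else rep(do_i, n)
  x <- rbinom(n, 1, 0.1 + 0.4 * thick + 0.4 * i_used)
  data.frame(T = thick, I = i_natural, X_i = x)
}

draws <- simulate(10000, do_i = 0)

# Natural I is still in the sample, so we can condition on T = 1, I = 1
# while X_i was generated with I fixed to 0.
subset_obs <- draws[draws$T == 1 & draws$I == 1, ]
p_dense <- mean(subset_obs$X_i)
print(p_dense)
```

If the intervention instead overwrote the recorded I (ideal-intervention bookkeeping), every row would have I = 0 and the conditioning set T = 1, I = 1 would be empty.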

One might want to do the same for the CF graph by implementing an effect handler that plays the same role for multiverse counterfactuals. But there are two problems.

First, a probabilistic inference use case already exists via abduction-action-prediction on the SCM. Therefore, probabilistic inference on the CF graph is merely an enhancement.
Second, the conversion is more complicated than it would be for SWIGs. It would require sampling from an SCM to train a model with default distributions. That said, Pyro is well-suited for this task.
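For reference, abduction-action-prediction on an SCM can be sketched in base R with rejection sampling. The structural equations f_t, f_i, f_x and all of their thresholds are hypothetical, chosen only so that the two-world query from earlier (observe T=1, I=1, X=1, then ask about X under I=0) has a simple answer.

```r
set.seed(2)
n <- 200000

# Exogenous noise, one independent uniform draw per variable.
u_t <- runif(n)
u_i <- runif(n)
u_x <- runif(n)

# Hypothetical structural equations for T -> I, T -> X <- I.
f_t <- function(u) as.integer(u < 0.5)
f_i <- function(t, u) as.integer(u < ifelse(t == 1, 0.8, 0.3))
f_x <- function(t, i, u) as.integer(u < 0.1 + 0.4 * t + 0.4 * i)

# Factual world.
t_obs <- f_t(u_t)
i_obs <- f_i(t_obs, u_i)
x_obs <- f_x(t_obs, i_obs, u_x)

# Abduction: keep only noise draws consistent with the evidence.
keep <- t_obs == 1 & i_obs == 1 & x_obs == 1

# Action and prediction: set I = 0 and reuse the retained noise.
x_cf <- f_x(t_obs[keep], 0, u_x[keep])
p_cf <- mean(x_cf)
print(p_cf)
```

Under these made-up equations the evidence X=1 restricts u_x to values below 0.9, and the counterfactual image is dense only when u_x is below 0.5, so the estimate should land near 0.5 / 0.9 ≈ 0.556.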

If single-world inference in Pyro or multiverse inference in Causal Pyro proves cumbersome, it may be instructive to build SWIGs and CF graphs in pgmpy and rely on simple PGM inference.

Ideas for simulation data