Suppose we have a causal model with the following causal DAG.

library(bnlearn)
net <- model2network("[U][X|U][M|X][Y|M:U]")
graphviz.plot(net)

My objective is to estimate \(P(Y|do(X=x))\).

Case 1: Monte Carlo simulation with graph mutilation when parameters are known

Further, assume I know a priori the parametric form of the causal Markov kernels and their parameters. In other words, we have a fully generative causal model.

A generative causal model enables us to simulate the effect of the intervention \(do(X=x)\) via graph mutilation, using the following Monte Carlo procedure.

  1. Use graph mutilation to set \(X=x\). This produces the following mutilated graph.
net2 <- mutilated(net, list(X=1.0))
graphviz.plot(net2)

  2. Forward simulate \(Y\).
  3. Take the average of the samples of \(Y\). (A minimal end-to-end sketch of the full procedure follows below.)
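
To make the procedure concrete, here is a minimal end-to-end sketch. The conditional probability tables below are hypothetical stand-ins for the known parameters; everything else uses standard bnlearn functions (custom.fit, mutilated, rbn).

# Hypothetical CPTs standing in for the known causal Markov kernels.
lv <- c("0", "1")
cptU <- matrix(c(0.5, 0.5), ncol = 2, dimnames = list(NULL, lv))
cptX <- matrix(c(0.8, 0.2, 0.3, 0.7), ncol = 2, dimnames = list(X = lv, U = lv))
cptM <- matrix(c(0.9, 0.1, 0.2, 0.8), ncol = 2, dimnames = list(M = lv, X = lv))
cptY <- array(c(0.9, 0.1, 0.4, 0.6, 0.7, 0.3, 0.1, 0.9), dim = c(2, 2, 2),
              dimnames = list(Y = lv, M = lv, U = lv))
fitted <- custom.fit(net, dist = list(U = cptU, X = cptX, M = cptM, Y = cptY))

# Step 1: mutilate the graph to set X = "1".
fitted2 <- mutilated(fitted, evidence = list(X = "1"))

# Steps 2 and 3: forward simulate from the mutilated network and average.
sims <- rbn(fitted2, n = 10000)
mean(sims$Y == "1")  # Monte Carlo estimate of P(Y = 1 | do(X = 1))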

Case 2: Monte Carlo simulation with graph mutilation when parameters are unknown but fully observed data is available

Suppose instead that the parameters are not known a priori, but training data is available from which to estimate them. Assume that every variable is observed in this data set, and that the data set is large enough that the parameter estimates have converged to the true parameter values.

In this case, we can simply fit the parameters on the training data and then repeat the causal effect estimation from Case 1, as sketched below.
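
As a sketch, assume train is a hypothetical data frame of fully observed samples of U, X, M, and Y:

# Fit the parameters by maximum likelihood from fully observed data,
# then reuse the Case 1 procedure on the fitted network.
fitted <- bn.fit(net, data = train)
fitted2 <- mutilated(fitted, evidence = list(X = "1"))
sims <- rbn(fitted2, n = 10000)
mean(sims$Y == "1")  # Monte Carlo estimate of P(Y = 1 | do(X = 1))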

Case 3: Do-calculus-based Monte Carlo simulation with graph mutilation

Suppose the data does not cover all of the nodes in the DAG; in our example, suppose \(U\) is unobserved.

Lemma: If an intervention query is nonparametrically identifiable via the do-calculus, then Monte Carlo estimation on the mutilated graph is a consistent estimator of the intervention query.

Proof sketch.

The rules of the do-calculus express relationships between conditional probability distributions in the original causal DAG and in various mutilated DAGs, including the mutilated DAG created by simply applying the intervention. Any estimand derived via the do-calculus is a consequence of these relationships. Therefore, any such estimand can be obtained by mathematical operations on conditional probability distributions defined on the mutilated DAG.

Using our example, let \(G\) represent the DAG and \(G_{\bar{X}}\) represent the graph mutilated to represent an intervention on \(X\). Let \(P_{G}(\cdot)\) and \(P_{G_{\bar{X}}}(\cdot)\) refer to the probability distributions encoded by the causal models represented by \(G\) and \(G_{\bar{X}}\), respectively. Modeling ideal interventions with graph mutilation assumes that \(P_G(Y|do(X=x)) = P_{G_{\bar{X}}}(Y|X=x)\).

Consider the distribution \(P_{G_{\bar{X}}}(Y|M=m)\). How can we relate this to its counterpart in \(G\), \(P_{G}(Y|M=m)\)? The problem is that there is a backdoor path between \(Y\) and \(M\) in \(G\), namely \(M \leftarrow X \leftarrow U \rightarrow Y\). We can block that path by conditioning on \(X\), which yields the backdoor adjustment \[P_{G_{\bar{X}}}(Y|M=m) = \int_{x'} P_{G}(Y|M=m, X=x')P_G(X=x')dx'\]
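
As a sanity check, we can verify this identity numerically with the hypothetical fitted and fitted2 networks from the Case 1 sketch, comparing a simulation on the mutilated graph against the adjustment sum estimated from observational simulations:

# LHS: P_{G_xbar}(Y = 1 | M = 1), simulated on the mutilated graph.
sim_mut <- rbn(fitted2, n = 2e5)
lhs <- mean(sim_mut$Y[sim_mut$M == "1"] == "1")

# RHS: sum over x' of P_G(Y = 1 | M = 1, X = x') P_G(X = x'),
# estimated from observational simulations of the unmutilated network.
sim_obs <- rbn(fitted, n = 2e5)
p_x1 <- mean(sim_obs$X == "1")
rhs <- mean(sim_obs$Y[sim_obs$M == "1" & sim_obs$X == "0"] == "1") * (1 - p_x1) +
  mean(sim_obs$Y[sim_obs$M == "1" & sim_obs$X == "1"] == "1") * p_x1
c(lhs = lhs, rhs = rhs)  # should agree up to Monte Carlo error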

Going back to \(P_{G}(Y|do(X=x)) = P_{G_{\bar{X}}}(Y|X=x)\), we expand the right-hand side:

\[\begin{align*} P_{G_{\bar{X}}}(Y|X=x) &=\int_v P_{G_{\bar{X}}}(Y|M=v, X=x)P_{G_{\bar{X}}}(M=v|X=x)dv \\ &=\int_v P_{G_{\bar{X}}}(Y|M=v)P_{G_{\bar{X}}}(M=v|X=x)dv \end{align*}\]

The second step holds because \(Y \perp X \mid M\) in \(G_{\bar{X}}\): mutilation removes the edge \(U \rightarrow X\), so the only path from \(X\) to \(Y\) is through \(M\).

The kernel \(P(M=v|X=x)\) is unaffected when graph mutilation on \(G\) creates \(G_{\bar{X}}\), since \(M\)'s parent set does not change; hence \(P_{G_{\bar{X}}}(M=v|X=x) = P_{G}(M=v|X=x)\).
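
With the hypothetical fitted networks from the Case 1 sketch, this is easy to confirm directly:

# Mutilation replaces X's distribution but leaves M's CPT untouched.
all.equal(coef(fitted$M), coef(fitted2$M))  # TRUE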

\[\begin{align*} P_{G_{\bar{X}}}(Y|X=x) &=\int_v P_{G_{\bar{X}}}(Y|M=v, X=x)P_{G_{\bar{X}}}(M=v|X=x)dv \\ &=\int_v P_{G_{\bar{X}}}(Y|M=v)P_{G_{\bar{X}}}(M=v|X=x)dv \\ &=\int_v P_{G_{\bar{X}}}(Y|M=v)P_{G}(M=v|X=x)dv \end{align*}\]

Finally, using the backdoor identity above, we substitute \(\int_{x'} P_{G}(Y|M=v, X=x')P_G(X=x')dx'\) for \(P_{G_{\bar{X}}}(Y|M=v)\):

\[\begin{align*} P_{G_{\bar{X}}}(Y|X=x) &=\int_v P_{G_{\bar{X}}}(Y|M=v, X=x)P_{G_{\bar{X}}}(M=v|X=x)dv \\ &=\int_v P_{G_{\bar{X}}}(Y|M=v)P_{G_{\bar{X}}}(M=v|X=x)dv \\ &=\int_v P_{G_{\bar{X}}}(Y|M=v)P_{G}(M=v|X=x)dv \\ &=\int_v \int_{x'} P_{G}(Y|M=v, X=x')P_G(X=x') P_{G}(M=v|X=x)\,dx'\,dv \end{align*}\]

The right-hand side is exactly the front-door adjustment formula, an estimand for \(P_G(Y|do(X=x))\) derived via the do-calculus.

In the causal inference literature, the common practice is to construct an estimator that targets this estimand directly. For example, one might model \(P_{G}(Y|M=v, X=x')\), \(P_G(X=x')\), and \(P_{G}(M=v|X=x)\) directly from data, then numerically integrate.
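
Here is a minimal sketch of such a plug-in estimator for the binary case, assuming obs is a hypothetical data frame of observations of X, M, and Y only (U is unobserved); the integrals reduce to sums over empirical conditional probability tables:

# Plug-in front-door estimate of P(Y = 1 | do(X = x)) from observational data.
front_door <- function(obs, x) {
  p_m_x <- prop.table(table(obs$M, obs$X), margin = 2)                # P(M | X)
  p_x <- prop.table(table(obs$X))                                     # P(X)
  p_y_mx <- prop.table(table(obs$Y, obs$M, obs$X), margin = c(2, 3))  # P(Y | M, X)
  est <- 0
  for (m in rownames(p_m_x)) {
    inner <- sum(sapply(names(p_x), function(xp) p_y_mx["1", m, xp] * p_x[[xp]]))
    est <- est + p_m_x[m, x] * inner
  }
  unname(est)
}

# Compare against the mutilated-graph Monte Carlo estimate, using observational
# samples from the hypothetical Case 1 network with U dropped.
obs <- rbn(fitted, n = 2e5)[, c("X", "M", "Y")]
front_door(obs, x = "1")  # should be close to mean(sims$Y == "1") from Case 1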

However, as generative modelers, we prefer to use our Monte Carlo graph mutilation estimation procedure.

Note that, given the correct parameters, Monte Carlo estimation converges to the exact interventional distribution as the number of samples grows. So this work has three main contributions:

  1. We illustrate Bayesian approaches to parameter learning. For the first time, we connect standard Bayesian model criticism techniques to the concepts of causal reasoning on a DAG.
  2. We illustrate how putting causal constraints on the parameter values improves learning and predictive performance.
  3. We show the consequences of Monte Carlo estimation of an intervention distribution when we don’t have do-calculus identifiability from partial data: Monte Carlo estimation of \(P(Y|do(X=x))\) from a generative model trained on partial data requires do-calculus identifiability, otherwise the estimator will fail to converge. Further, the causal constraint we derived in the do-calculus-valid case cannot be relied on to “fix” this problem.