Identification of Causal Effects

Outline

The key concept in this post is causality. I use the definition of causal effect proposed by the author: given two disjoint sets of variables, $X$ and $Y$, the causal effect of $X$ on $Y$, denoted either as $P(y|\hat x)$ or $P(y|do(x))$, is a function from $X$ to the space of probability distributions on $Y$.

The difference $\mathbb{E}(Y|do(x'))-\mathbb{E}(Y|do(x''))$ is sometimes taken as the definition of “causal effect”.

For many reasons, intervening directly on an explanatory variable is often unrealistic. So the author looks for ways to eliminate the $do()$ operator without losing the intuition behind the intervention. Two approaches are proposed: the adjustment formula, based on the back-door criterion, and the front-door adjustment, based on the front-door criterion.

More generally, if there is some way to decide in advance whether a given causal model lends itself to such an elimination procedure, we can apply the procedure and find ourselves in possession of the causal effect without having to lift a finger to intervene. Otherwise, we would at least know that the assumptions embedded in the model are not sufficient to uncover the causal effect from observational data, and that there is no escape from running an interventional experiment of some kind.

After this brief overview of the identification of causal effects, the rest of the post is organized as follows:

Difference between $p(y_i|\hat x_i)$ and $p(y_i|x_i)$
Adjustment Formula approach and its connection to the linear regression model used in econometrics.
Front-door Adjustment approach.
Do calculus and identifying models.

1. Difference between $p(y_i|\hat x_i)$ and $p(y_i|x_i)$

$P(y|\hat x)$ is just shorthand for $P(y|do(x))$, which is completely DIFFERENT from $P(y|x_i=x)$. $P(y|x_i=x)$ is used when we merely observe that $x_i$ takes the value $x$, so the causal diagram remains unchanged. As noted above, under some assumptions we can use observational data to uncover the causal effect without intervention. In the simplest case, when nothing confounds $X$ and $Y$, $p(y|x) = p(y|\hat x)$.

However, when the causal effect is denoted as $p(y_i|do(x_i))$ or $p(y_i|\hat x_i)$, the diagram changes: all the arrows pointing into $X_i$ are removed.

The direct consequence of this difference is the way the conditional joint distribution decomposes. Take the diagrams below as an example:

Turning the Sprinkler On

The decomposition for the first diagram is: \[ P(x_1,x_3,x_4,x_5|x_2) =P(x_1|x_2)P(x_3|x_1)P(x_4|x_2,x_3)P(x_5|x_4) \] The decomposition for the second diagram is: \[ P(x_1,x_3,x_4,x_5|\hat x_2) =P(x_1)P(x_3|x_1)P(x_4|x_2,x_3)P(x_5|x_4) \] The difference between the first terms of the two equations is caused by the change in the diagram. In the first diagram, the connection between $X_1$ and $X_2$ still exists, so when we condition on different levels of $X_2$, the distribution of $X_1$ may vary. In the second diagram, the do operation removes the connection between $X_1$ and $X_2$, so they are independent.
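To make the contrast concrete, here is a minimal sketch that enumerates a five-variable network shaped like the sprinkler diagram. The probability tables and the helper names (`bern`, `joint`, `prob_x1_dry`) are made up for illustration and do not come from the original example. Only the factorization follows the two decompositions above.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
# X1: dry season (1) or not (0); X2: sprinkler on; X3: rain;
# X4: pavement wet; X5: pavement slippery. All variables are binary.
P_X1 = {1: 0.6, 0: 0.4}
P_X2_GIVEN_X1 = {1: 0.7, 0: 0.1}   # P(x2=1 | x1)
P_X3_GIVEN_X1 = {1: 0.1, 0: 0.8}   # P(x3=1 | x1)
P_X4_GIVEN_X2_X3 = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.01}
P_X5_GIVEN_X4 = {1: 0.8, 0: 0.05}


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x1, x2, x3, x4, x5, do_x2=None):
    """Joint probability; under do(x2) the factor P(x2 | x1) is dropped."""
    if do_x2 is None:
        f2 = bern(P_X2_GIVEN_X1[x1], x2)
    else:
        f2 = 1.0 if x2 == do_x2 else 0.0
    return (P_X1[x1] * f2 * bern(P_X3_GIVEN_X1[x1], x3)
            * bern(P_X4_GIVEN_X2_X3[(x2, x3)], x4)
            * bern(P_X5_GIVEN_X4[x4], x5))


def prob_x1_dry(x2_value, intervene):
    """P(x1=1 | x2) if intervene is False, P(x1=1 | do(x2)) if True."""
    do = x2_value if intervene else None
    num = den = 0.0
    for x1, x3, x4, x5 in product([0, 1], repeat=4):
        p = joint(x1, x2_value, x3, x4, x5, do_x2=do)
        den += p
        if x1 == 1:
            num += p
    return num / den


seeing = prob_x1_dry(1, intervene=False)
doing = prob_x1_dry(1, intervene=True)
print(f"P(x1=dry | x2=on)     = {seeing:.3f}")
print(f"P(x1=dry | do(x2=on)) = {doing:.3f}")
```

With these made-up numbers, seeing the sprinkler on raises the probability of the dry season, while turning it on leaves that probability at its prior value $P(x_1)$.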

Note: I’m not going to use the $pa_j$ (parent set) theory to explain why some variables are omitted relative to the full decomposition given by the chain rule of probability. The omissions can be accounted for by conditional independence, based on the properties of collider, fork, and inverted-fork paths.

2. Adjustment Formula

Introduction

Since running an experiment is sometimes unrealistic in economic research, the most familiar method of predicting the effect of an “intervention” is to “control” for confounders using the adjustment formula.

Confounders are variables that affect the independent variable $X$ and the dependent variable $Y$ simultaneously. By “control”, we mean dividing our observations according to the levels of the confounding variable set $Z$. We then measure the average causal effect of an intervention by first estimating its effect within each stratum of the deconfounder, and then computing a weighted average over those strata, where each stratum is weighted according to its prevalence in the population.

If, for example, the deconfounder is gender, we first estimate the causal effect in the girls group and in the boys group. Then we average the two: if the population is half boys and half girls, the weight for each group is one half. The mathematical details are shown below:

By the definition of conditional probability, the joint distribution of $y_i$ and the deconfounder $z_i$ given $x_i$ factorizes as follows:

\[ p(y_i| x_i, z_i) = \frac{p(y_i, z_i| x_i)}{p( z_i| x_i)} \Rightarrow p(y_i, z_i| x_i) = p(y_i| x_i, z_i)*p( z_i| x_i) \tag{1} \]

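The conditional mean $\mathbb{E}(y_i| x_i)$ can be expanded over the strata of $z_i$ as follows: \[ \begin{align} \mathbb{E}(y_i| x_i) & = \int y_i*p(y_i|x_i) dy_i \\ & = \int y_i*\left[\int_{ z_i} p(y_i, z_i| x_i)d z_i\right]dy_i\\ & = \int y_i*\left[\int_{ z_i}p(y_i| x_i, z_i)*p( z_i| x_i)d z_i\right]dy_i\\ & = \int_{ z_i} \left[\int y_i*p(y_i| x_i, z_i)dy_i\right] * p( z_i| x_i)d z_i \end{align}\tag{2} \]
The result $\int_{ z_i} [\int y_i*p(y_i| x_i, z_i)dy_i] * p( z_i| x_i)d z_i$ is a weighted average of the stratum-specific means $\mathbb{E}(y_i| x_i, z_i)$. Note that (2) describes the observational mean, so the weights are $p( z_i| x_i)$. The adjustment formula for the average causal effect $\mathbb{E}(y_i|do(x_i))$ keeps the same stratum-specific means but weights each stratum by its prevalence in the population, $p(z_i)$, instead.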
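The gender example can be worked through numerically. The sketch below assumes a binary deconfounder $z$, a binary treatment $x$, and a binary outcome $y$, with made-up probabilities. The function names `naive` and `adjusted` are hypothetical. It compares weighting the strata by $p(z|x)$ with weighting them by their prevalence $p(z)$.

```python
# Hypothetical numbers for a binary confounder z (e.g. gender), a binary
# treatment x and a binary outcome y, with arrows z -> x, z -> y, x -> y.
P_Z = {0: 0.5, 1: 0.5}                      # population is half and half
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(x=1 | z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.3,  # P(y=1 | x, z), keyed by (x, z)
                 (0, 1): 0.5, (1, 1): 0.7}


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]


def naive(x):
    """P(y=1 | x): each stratum is weighted by P(z | x)."""
    p_x = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] / p_x
               for z in P_Z)


def adjusted(x):
    """Adjustment formula: each stratum is weighted by its prevalence P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


naive_effect = naive(1) - naive(0)
adjusted_effect = adjusted(1) - adjusted(0)
print(f"naive difference    = {naive_effect:.2f}")
print(f"adjusted difference = {adjusted_effect:.2f}")
```

In this toy setup the effect of $x$ is 0.2 within each stratum. The adjusted difference recovers that value, while the naive difference is inflated by the confounder.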

Connection to linear regression

The adjustment formula becomes astonishingly simple when applied to linear regression. If the confounder set $Z$ satisfies the back-door criterion, then adding $Z$ to the regression gives us the regression coefficient of $Y$ on $X$ already adjusted for $Z$.

\[ \begin{align} \mathbb{E}(y_i| x_i) &= \mathbb{E}[\mathbb{E}[y_i| x_i, z_i]|x_i]\\ & = \mathbb{E}\left[\int y_i*p(y_i| x_i, z_i)dy_i \,\middle|\, x_i\right] \end{align}\tag{3} \]

Suppose the linear regression function has the form $y_i=\alpha+\beta x_i + \gamma z_i + \epsilon_i$ with $\epsilon_i|x_i,z_i \sim N(0,\sigma^2)$. Then \[ \begin{align} & \ \ \ \ \ \ y_i|x_i= x_i,z_i= z_i=\alpha+\beta x_i + \gamma z_i + \epsilon_i| x_i, z_i\\ & \Rightarrow y_i| x_i, z_i \sim N(\alpha+\beta x_i+\gamma z_i,\sigma^2)\\ & \Rightarrow \mathbb{E}(y_i| x_i, z_i) = \alpha+\beta x_i+\gamma z_i\\ & \Rightarrow \mathbb{E}(y_i| x_i, z_i) | x_i = \alpha+\beta x_i+ \gamma z_i| x_i \end{align} \tag{4} \] Given $x_i$, the only random quantity left in $\mathbb{E}(y_i| x_i, z_i)$ is $z_i$, so the outer expectation in (3) is taken with respect to $p( z_i| x_i)$. Based on (4), (3) can be written as \[ \begin{align} & \ \ \ \ \ \ \mathbb{E}\left[\int y_i*p(y_i| x_i, z_i)dy_i \,\middle|\, x_i\right]\\ &=\int_{ z_i}\left[\int y_i*p(y_i| x_i, z_i)dy_i\right]*p( z_i| x_i)d z_i \end{align} \tag{5} \]

Thus we can skip the cumbersome procedure of regressing $Y$ on $X$ for each level of $Z$ and computing the weighted average of the regression coefficients. The linear regression model already does all the averaging for us.
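A small simulation can illustrate the point. It assumes a linear model in which a single confounder $z$ drives both $x$ and $y$. The coefficient values, sample size, and variable names are hypothetical, and the closed-form slopes stand in for what a regression package would report.

```python
import random

random.seed(0)
ALPHA, BETA, GAMMA = 1.0, 2.0, 3.0  # hypothetical true coefficients
N = 20000

zs, xs, ys = [], [], []
for _ in range(N):
    z = random.gauss(0.0, 1.0)       # confounder
    x = z + random.gauss(0.0, 1.0)   # z -> x
    y = ALPHA + BETA * x + GAMMA * z + random.gauss(0.0, 1.0)  # x -> y, z -> y
    zs.append(z)
    xs.append(x)
    ys.append(y)


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


# Regression of y on x alone: slope = cov(x, y) / var(x).
slope_unadjusted = cov(xs, ys) / cov(xs, xs)

# Regression of y on x and z: solve the 2x2 normal equations for the x slope.
sxx, szz, sxz = cov(xs, xs), cov(zs, zs), cov(xs, zs)
sxy, szy = cov(xs, ys), cov(zs, ys)
slope_adjusted = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

print(f"slope without z: {slope_unadjusted:.2f}")
print(f"slope with z:    {slope_adjusted:.2f}")
```

Under these assumptions the slope from the regression that includes $z$ should land close to the true $\beta$, while the slope that ignores $z$ absorbs part of the confounder's influence.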

In conclusion, when we have a diagram showing that $Z$ is the confounder of $X$ and $Y$, the coefficient on $X$ in the adjusted linear regression model conveys the causal effect of $X$ on $Y$. Without such a diagram, the coefficient only measures the association between $X$ and $Y$.

3. The Front-door Adjustment

The adjustment formula is a way of emulating an intervention based on the back-door criterion. In reality, however, because of missing data we cannot always block every back-door path, which means we can’t adjust the linear regression for all the confounders. In this case, a more applicable method is available, called “front-door adjustment.”

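With $Z$ an observed variable that mediates the effect of $X$ on $Y$ and satisfies the front-door criterion, the post-intervention distribution is: \[ p(y|\hat x) = \sum _zp(z|x)\sum_{x'}p(y|x',z)p(x') \]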
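The front-door formula can be checked on a toy model. The sketch below assumes a hidden binary confounder $u$ with arrows $u \to x$, $x \to z$, $z \to y$, $u \to y$. All probabilities and helper names (`marg`, `front_door`, `truth`) are invented for illustration. The front-door estimate uses only the observational distribution of $(x, z, y)$, while `truth` uses the full model including $u$.

```python
from itertools import product

# Hypothetical model: u -> x, x -> z, z -> y, u -> y, with u unobserved.
P_U1 = 0.5
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}
P_Z1_GIVEN_X = {0: 0.1, 1: 0.9}
P_Y1_GIVEN_ZU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # (z, u)


def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


# Observational joint P(x, z, y), obtained by summing out the hidden u.
obs = {}
for u, x, z, y in product([0, 1], repeat=4):
    prob = (bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
            * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_ZU[(z, u)], y))
    obs[(x, z, y)] = obs.get((x, z, y), 0.0) + prob


def marg(x=None, z=None, y=None):
    """Marginal of the observational joint; None means 'summed over'."""
    return sum(v for (xi, zi, yi), v in obs.items()
               if x in (None, xi) and z in (None, zi) and y in (None, yi))


def front_door(x, y=1):
    """sum_z P(z|x) sum_x' P(y|x',z) P(x'), from observational data only."""
    total = 0.0
    for z in (0, 1):
        inner = sum(marg(x=xp, z=z, y=y) / marg(x=xp, z=z) * marg(x=xp)
                    for xp in (0, 1))
        total += marg(x=x, z=z) / marg(x=x) * inner
    return total


def truth(x, y=1):
    """P(y | do(x)) computed from the full model, including the hidden u."""
    return sum(bern(P_U1, u) * bern(P_Z1_GIVEN_X[x], z)
               * bern(P_Y1_GIVEN_ZU[(z, u)], y)
               for u, z in product([0, 1], repeat=2))


print(f"front-door estimate of P(y=1 | do(x=1)): {front_door(1):.3f}")
print(f"true P(y=1 | do(x=1)):                   {truth(1):.3f}")
print(f"observational P(y=1 | x=1):              {marg(x=1, y=1) / marg(x=1):.3f}")
```

In this example the front-door estimate agrees with the interventional value even though $u$ is never used, whereas the plain conditional probability does not.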

4. Do calculus and identifying models

Rules used to manipulate do-expression

Whether we use the adjustment formula or the front-door adjustment, we follow the same pattern of analysis. We begin with the target formula $P(y|do(x))$ and try to eliminate the do-operator from it, leaving only classical probability expressions, like $P(Y|X)$ or $P(Y,Z,W)$. We cannot, of course, manipulate our target expression arbitrarily; the operations must conform to what $do(X)$ means as a physical intervention. Thus we must pass the expression through a sequence of legitimate manipulations, each licensed by the axioms and the assumptions of our model.

Here are the three legitimate transformations on do-expressions:

Rule 1. Insertion/deletion of observations: \[ \text{if } (y \perp z|x,w) \text{ in } G_{\bar x} \ \Rightarrow \ P(y|\hat x, z,w) = P(y|\hat x,w) \] Here $G_{\bar x}$ is the diagram with all arrows leading into $X$ deleted. The stated equation holds provided that the variable set $W$ (together with $X$) blocks all the paths from $Z$ to $Y$ after we have deleted all the arrows leading into $X$.

Rule 2. Action/observation exchange: \[ \text{if } (y \perp z|x,w) \text{ in } G_{\bar x \underline z} \ \Rightarrow \ P(y|\hat x,\hat z, w) = P(y|\hat x,z,w) \] This is familiar to us from our back-door discussion. $G_{\bar x \underline z}$ is the diagram with all arrows into $X$ and all arrows out of $Z$ deleted, so any path left between $Z$ and $Y$ is a back-door path. If $W$ blocks all of those paths, then conditional on $W$, $do(z)$ is equivalent to seeing $z$.

Rule 3. Insertion/deletion of actions:

\[ \text{if } (y \perp z|x,w) \text{ in } G_{\overline x \overline {z(w)}} \ \Rightarrow \ P(y|\hat x,\hat z,w) = P(y|\hat x, w) \] Here $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\bar x}$. In its simplest form, Rule 3 says that we can remove $do(X)$ from $P(Y | do(X))$ in any case where there are no causal paths from $X$ to $Y$. By causal paths, I mean paths from $X$ to $Y$ with only forward-directed arrows.

Causal effect of smoking on lung cancer using manipulation

Causal Diagram for Smoking and Cancer

\[ \begin{align} P(c|do(s)) & = \sum_t P(c|do(s),t)P(t|do(s))\\ & = \sum_t P(c|do(s),do(t))P(t|do(s)) \tag{Rule 2}\\ & = \sum_t P(c|do(s),do(t))P(t|s) \tag{Rule 2}\\ & = \sum_t P(c|do(t))P(t|s) \tag{Rule 3}\\ & = \sum_{s'}\sum_t P(c|do(t),s')P(s'|do(t))P(t|s) \\ & = \sum_{s'}\sum_t P(c|t,s')P(s'|do(t))P(t|s) \tag{Rule 2}\\ & = \sum_{s'}\sum_t P(c|t,s')P(s')P(t|s) \tag{Rule 3}\\ \end{align} \]

The steps for applying the legitimate transformations are:

1. Looking at the result each rule produces, guess which rule may apply.
2. Based on the guess, draw the corresponding diagram and check whether the independence condition is satisfied.
3. If yes, transform the do-expression. If not, repeat steps 1 and 2.
4. If no rule can be applied, the model may not be identifiable.

For example, if we want to convert the observation $t$ into the action $do(t)$ in the expression $P(c|do(s),t)$, then Rule 2 should be used. Next, draw the diagram $G_{\bar s \underline t}$, shown below:

$G_{\bar s \underline t}$

Based on the diagram, the independence condition $(c\perp t|s)$ holds in $G_{\bar s \underline t}$. So $P(c|do(s),t)=P(c|do(s),do(t))$.
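The check in this example can also be done mechanically. The sketch below assumes the usual smoking graph with an unobserved confounder `u` (edges `u -> s`, `u -> c`, `s -> t`, `t -> c`), which is my reading of the diagram. The helpers `mutilate` and `d_separated` are hypothetical names, and the test uses the standard moralized-ancestral-graph method for d-separation.

```python
def ancestors(edges, nodes):
    """Return `nodes` plus all of their ancestors in the DAG `edges`."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        child = frontier.pop()
        for parent, c in edges:
            if c == child and parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


def d_separated(edges, xs, ys, zs):
    """Test (xs independent of ys given zs) via the moralized ancestral graph."""
    keep = ancestors(edges, set(xs) | set(ys) | set(zs))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Moralize: drop directions and connect parents that share a child.
    undirected = {frozenset(e) for e in sub}
    for node in keep:
        parents = [a for a, b in sub if b == node]
        undirected |= {frozenset((p, q))
                       for p in parents for q in parents if p != q}
    # Search from xs without passing through the conditioning set zs.
    seen, frontier = set(xs), list(xs)
    while frontier:
        cur = frontier.pop()
        for edge in undirected:
            if cur in edge:
                (other,) = edge - {cur}
                if other not in zs and other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return not (seen & set(ys))


def mutilate(edges, cut_incoming=(), cut_outgoing=()):
    """Delete arrows into `cut_incoming` nodes and out of `cut_outgoing` nodes."""
    return [(a, b) for a, b in edges
            if b not in cut_incoming and a not in cut_outgoing]


# Assumed smoking graph: u is the unobserved confounder of s and c.
G = [("u", "s"), ("u", "c"), ("s", "t"), ("t", "c")]
G_s_bar_t_under = mutilate(G, cut_incoming={"s"}, cut_outgoing={"t"})
rule2_applies = d_separated(G_s_bar_t_under, {"t"}, {"c"}, {"s"})
print("Rule 2 condition (c independent of t given s) holds:", rule2_applies)
```

The same two helpers can be reused for the other steps of the derivation by changing which arrows are cut and which independence is tested.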

Identifying Models

Below are the diagrams in which the causal effect of $X$ on $Y$ is identifiable. Such models are called “identifying” because their structures communicate a sufficient number of assumptions to permit the identification of the target quantity $P(Y|\hat X)$. The dashed curve is called a confounding arc; it represents the existence in the diagram of a back-door path that contains only unobserved variables.

Identifying Models

Model (a) is the simplest and least common model. In nature, it’s hard to find a pair of variables without any confounders. But when that is the case, the causal effect is $P(Y|\hat X)=P(Y|X)$.

Model (b) is similar to the first one. The confounder of the pair ($Z$, $Y$) doesn’t affect the causal effect $P(Y|\hat X)$, which is still equal to $P(Y|X)$.

To uncover the causal effect in models (c) and (d), the adjustment formula can be used by controlling for the observed $Z$. Whatever the unobserved confounders are, the back-door paths through them are always blocked by $Z$. So $P(Y|\hat X)=\sum_z P(Y|X,Z)P(Z)$.

Model (e) is the same as the smoking-cancer example above, which can be solved by the front-door adjustment.
