In this post, I show that correlation is not transitive, and include the R code for a simulation so that you can try it for yourself.
Regression is an important tool in data science, not only for prediction but also for elucidating causality between variables. OLS regression alone is insufficient to establish causality: if a model suffers from endogeneity, i.e. an explanatory variable is correlated with the error term, then the OLS estimates are biased. Hence, when endogeneity is suspected, OLS should be complemented with IV regression and the Durbin-Wu-Hausman test, to determine which of the two estimators is appropriate for the model at hand.
Mathematically, suppose we have a linear model of the form:
\[y_i = \beta_0 + \beta_1x_i + u_i\] Suppose that the explanatory variable \(x_i\) is correlated with the error term \(u_i\). Then the OLS estimates will be biased. Of course, we don’t know this a priori, because the error term \(u_i\) is unobserved. We can test for it by comparing the OLS estimates with those from an IV regression, which is what the Durbin-Wu-Hausman test does.
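As a sketch of what that comparison looks like in R (the data frame `df` and its columns are hypothetical; the AER package is assumed, and z is a candidate instrument, defined next):

```r
library(AER)  # assumed; provides ivreg() and IV diagnostics

# Hypothetical data frame df with outcome y, suspect regressor x,
# and candidate instrument z
ols = lm(y ~ x, data = df)         # biased if cov(x, u) != 0
iv  = ivreg(y ~ x | z, data = df)  # two-stage least squares, z instruments x

# With diagnostics = TRUE the summary includes the Wu-Hausman test;
# rejecting its null hypothesis indicates x is endogenous, so IV is preferred
summary(iv, diagnostics = TRUE)
```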
IV regression uses instrumental variables. A variable z is an instrument for x if it satisfies the following properties:

- Relevance: z is correlated with the endogenous explanatory variable x.
- Exogeneity: z is uncorrelated with the error term u.
This seems counterintuitive. Surely if z is correlated with x, and x is correlated with u, then z must be correlated with u. We would expect correlation to be transitive, just like equality:
If z = x and x = u, then z = u.
So it seems intuitive to say: if z is correlated with x and x is correlated with u, then z is correlated with u. Mathematically:
\[cov(z,x)\neq 0\ \text{ and }\ cov(x,u)\neq 0 \Rightarrow cov(z,u)\neq 0\] However, for correlation, this is not true. In fact, IV regression is possible precisely because correlation is NOT transitive. If it were, then an instrumental variable z correlated with x, which in turn is correlated with u, would itself be correlated with u. But then z would not satisfy the exogeneity condition for being an instrumental variable, which is a contradiction:
\[cov(z,x)\neq 0\ \text{ and }\ cov(x,u)\neq 0 \Rightarrow cov(z,u)\neq0 \quad\Rightarrow\Leftarrow\]
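To see concretely that correlation is not transitive, here is a minimal numeric counterexample (the vectors are invented for illustration and are unrelated to the simulation later in the post):

```r
# Three vectors chosen so that the pairwise correlations break transitivity
u = c(-1, 0, 1)
x = c( 0, -2, 2)
z = c( 1, -2, 1)

cor(x, u)  # 0.5  : x is correlated with u
cor(z, x)  # 0.87 : z is correlated with x...
cor(z, u)  # 0    : ...yet z is exactly uncorrelated with u
```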
It is therefore interesting to see under what conditions we can have both endogeneity and a valid instrumental variable z:
\[cov(z,x)\neq 0\ \text{ and }\ cov(x,u)\neq 0\ \text{ and }\ cov(z,u)=0\] More precisely, given variables x and u that satisfy:
\[cov(x,u)\neq 0...(1)\]
find variable z that satisfies the conditions for being an instrument:
\[cov(z,x)\neq0\ldots(2)\] \[cov(z,u)=0\ldots(3)\]
These conditions can be expressed through the equivalent linear equations:
\[x=\alpha_0+\alpha_1u+\varepsilon_x...(4)\] \[z=\gamma_0+\gamma_1x+\varepsilon_z...(5)\]
Without loss of generality, we can assume that equation (4) satisfies the ceteris paribus condition, i.e. \(cov(u,\varepsilon_x)=0\): the only restriction on x and u is condition (1), which does not require \(cov(u,\varepsilon_x)\neq0\). In contrast, we cannot assume that equation (5) satisfies \(cov(x,\varepsilon_z)=0\), because z is subject to more restrictions, namely (1), (2) and (3), which together may imply a restriction on \(cov(x,\varepsilon_z)\).
In addition, for conditions (1) and (2) to hold, we must necessarily have:
\[\alpha_1\neq 0\ \text{ and }\ \gamma_1\neq0\] Indeed, from (4) and \(cov(u,\varepsilon_x)=0\) we get \(cov(x,u)=\alpha_1\sigma_u^2\), which is nonzero only if \(\alpha_1\neq0\). Substitute x from (4) into (5): \[z=\gamma_0+\gamma_1(\alpha_0+\alpha_1u+\varepsilon_x)+\varepsilon_z\]
\[\therefore z=\gamma_0+\gamma_1\alpha_0+\gamma_1\alpha_1u+\gamma_1\varepsilon_x+\varepsilon_z...(6)\] Given \(cov(z,u)=0\):
\[cov(\gamma_0+\gamma_1\alpha_0+\gamma_1\alpha_1u+\gamma_1\varepsilon_x+\varepsilon_z,u)=0\] \[\therefore\gamma_1\alpha_1\sigma_u^2+\gamma_1cov(\varepsilon_x,u)+cov(\varepsilon_z,u)=0\] Since \(cov(u,\varepsilon_x)=0\):
\[cov(\varepsilon_z,u)=-\gamma_1\alpha_1\sigma_u^2...(7)\] Therefore \(cov(\varepsilon_z,u)\neq 0\), since \(\gamma_1\alpha_1\neq0\); this is consistent with our earlier refusal to assume that \(\varepsilon_z\) is unrestricted. We can write the relation between \(\varepsilon_z\) and u as:
\[\varepsilon_z=\delta_0+\delta_1u+\mu...(8)\] where \(\delta_1\neq 0\) and \(\mu\) is a stochastic error term independent of u. From (8):
\[cov(\varepsilon_z,u) = \delta_1\sigma_u^2\]
Equating with (7) we get:
\[\delta_1=-\gamma_1\alpha_1\] Substituting \(\delta_1\) into (8) yields:
\[\varepsilon_z=\delta_0-\gamma_1\alpha_1u+\mu...(9)\] Substitute \(\varepsilon_z\) from (9) into (6):
\[z=\gamma_0+\gamma_1\alpha_0+\gamma_1\alpha_1u+\gamma_1\varepsilon_x+(\delta_0-\gamma_1\alpha_1u+\mu)\] The terms in u cancel, so: \[\therefore z=\gamma_0+\gamma_1\alpha_0+\delta_0+\gamma_1\varepsilon_x+\mu...(10)\] Equation (10) is the data generating process (DGP) for z. In this equation, z does not depend on u, and hence \(cov(z,u)=0\). However, x does not appear in (10) either; so how is z correlated with x? The answer is that z is generated from the same error term \(\varepsilon_x\) that generates x from u.
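To make this explicit, we can compute \(cov(z,x)\) directly from (10) and (4), using \(cov(u,\varepsilon_x)=0\) and the independence of \(\mu\) (the constants drop out of the covariance):

\[\begin{aligned} cov(z,x) &= cov(\gamma_1\varepsilon_x+\mu,\ \alpha_1u+\varepsilon_x)\\ &= \gamma_1\alpha_1\,cov(\varepsilon_x,u)+\gamma_1\sigma_{\varepsilon_x}^2+cov(\mu,\alpha_1u+\varepsilon_x)\\ &= \gamma_1\sigma_{\varepsilon_x}^2\neq 0 \end{aligned}\]

since \(\gamma_1\neq0\).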
We test this with a simulation. The R code is included.
# Packages
library(ggplot2)    # scatterplots
library(gridExtra)  # arranging several plots in a grid

# Parameters
sd.ex = 2   # standard deviation of ex, the error term of x
sd.mu = 0.5 # standard deviation of mu, the error term of z
n = 100     # sample size

# The unobserved error term u
set.seed(0)
u = sort(runif(n, 0, 10))

# DGP for x, following equation (4) with alpha_0 = alpha_1 = 1
set.seed(0)
ex = rnorm(n, sd=sd.ex)
x = 1 + u + ex

# DGP for z, following equation (10) with the constant collapsed to 2
# and gamma_1 = 1. Note: resetting the seed makes mu proportional to ex
# rather than an independent draw; a single set.seed call at the top
# would avoid this.
set.seed(0)
mu = rnorm(n, sd=sd.mu)
z = 2 + ex + mu
# Scatterplots of x vs u, z vs x, and z vs u, each with a fitted OLS line
g1 = ggplot(mapping = aes(y=x, x=u)) + geom_point(size=1) + geom_smooth(method = "lm")
g2 = ggplot(mapping = aes(y=z, x=x)) + geom_point(size=1) + geom_smooth(method = "lm")
g3 = ggplot(mapping = aes(y=z, x=u)) + geom_point(size=1) + geom_smooth(method = "lm")
grid.arrange(g1, g2, g3, ncol=2)
# Correlations
cor(x,u)
## [1] 0.8381772
cor(z,x)
## [1] 0.5509224
cor(z,u)
## [1] 0.00660519
We can see that \(r_{x,u}\) and \(r_{z,x}\) are significantly different from zero, while \(r_{z,u}\) is almost equal to zero.
Result: The simulation supports the proof.
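As a closing illustration (not part of the derivation above), the simulated variables can be fed into an actual IV regression. The outcome y and its coefficients are invented for this example, and the AER package is assumed:

```r
library(AER)  # assumed; provides ivreg()

# A hypothetical outcome whose error term is the endogenous u
y = 1 + 2*x + u

coef(lm(y ~ x))         # OLS slope is biased, because cov(x, u) != 0
coef(ivreg(y ~ x | z))  # IV slope, using z as instrument, should be close to 2
```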
I hope you enjoyed this demonstration and that it is a useful contribution to our community.