Causality, Correlation and Risk

Author

Dr. Heiko Frings, Instat GmbH, Bielefeld

Published

April 1, 2024

Causality

Yes, correlation is not causality. Nevertheless, correlation can be an indicator of causality. But what the hell is causality? The answer to this question is anything but trivial.

Unlike correlation, causality naturally has a direction, an arrow from cause to effect. This arrow is an arrow in time. The effect follows the cause. The direction is obviously very important from a practical point of view. This insight is somewhat trivial, but still worth mentioning because it is sometimes overlooked in data analysis. It emphasizes that a pure correlation analysis is not sufficient, especially when it comes to risk analysis.

One can see a certain irony in the fact that the modern concept of causality initially always connoted determinism, whereas in the more recent past it was predominantly used in conjunction with stochastic analyses.

The philosophical discussion about causality has a long history. Physics, as a model of an exact empirical science, has repeatedly raised the problem of causality and stimulated philosophical discussion. Hume’s criticism of induction is certainly an important milestone. According to Kant, causality is “a priori”, i.e. it applies before any experience and makes experience itself possible in the first place. Quantum mechanics finally broke the link between causality and determinism.1

Even before that, the concept of causality was considered in a probabilistic context especially in connection with thermodynamics (Mach, Boltzmann).2

See Michael Stöltzner, “Vienna Indeterminism: Mach, Boltzmann, Exner”, Synthese Vol. 119, No. 1/2 (1999), pp. 85–111.

While, on the one hand, the development of physics, notably Newtonian mechanics, strongly influenced the concept of causality and probably also popularized it, leading, for example, to the idea of Laplace’s demon, it must be noted that, on closer inspection, the concept of causality is not a necessary component of Newtonian mechanics. On the contrary: There is no causal order of actio and reactio in Newton’s third axiom. One of the most influential critics of the concept of causality was Bertrand Russell:

“In the following paper I wish, first, to maintain that the word cause is so inextricably bound up with misleading associations as to make its complete extrusion from the philosophical vocabulary desirable; secondly, to inquire what principle, if any, is employed in science in place of the supposed law of causality which philosophers imagine to be employed; thirdly, to exhibit certain confusions, especially in regard to teleology and determinism, which appear to me to be connected with erroneous notions as to causality. All philosophers, of every school, imagine that causation is one of the fundamental axioms or postulates of science, yet, oddly enough, in advanced sciences such as gravitational astronomy, the word cause never occurs.”3

Increasingly, the use of the concept of causality in other, less precise sciences was also examined more closely, which led to a focus on probabilistic aspects.

The basic idea of counterfactual causal theories is that the meaning of causal statements can be explained by counterfactual conditionals of the form “If C had not occurred, E would not have occurred”. Most counterfactual analyses have focused on claims of the form “event C caused event E” that describe “singular” or “token” or “actual” causality. Such analyses have become popular since the development of possible world semantics for counterfactual statements in the 1970s. This possible worlds semantics was developed mainly by Saul Kripke.4 The best-known counterfactual analysis of causality is the theory of David Lewis (1973b), including further elaborations of this theory.

In the current discussion concerning causality, the exact natural sciences have tended to take a back seat. Discussions now tend to focus on disciplines such as medicine, psychology, sociology and history. There are probably two main reasons for this:

(1) In disciplines with multifactorial mechanisms of action and less precise concepts, the clarification of causality has a much greater practical relevance.

(2) Interest has shifted strongly to the concept of causality in the context of probability.

Both multifactoriality and probabilistic nature are also essential attributes of any entrepreneurial risk model. It therefore seems necessary for a risk analyst to take a closer look at the topic of causality.

Correlation

Correlation is first of all simply an observable connection between different entities under consideration.

Such a correlation can of course exist between quantitative, comparative as well as merely qualitative characteristics.

In terms of probability theory, there is a positive correlation between two events A and B if the probability of the joint event A∩B is greater than the product of the individual event probabilities p(A) and p(B):

p(A∩B) > p(A)p(B)
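
For example, when rolling a fair die, the events A = “the result is at most 3” and B = “the result is odd” are positively correlated: p(A∩B) = p({1, 3}) = 1/3, which is greater than p(A)p(B) = 1/2 · 1/2 = 1/4.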

For comparative and quantitative characteristics, the well-known rank and linear correlation coefficients can be used to measure the strength of the correlation between two sets of comparative features or two quantitative variables, respectively. Correlation coefficients are certainly very useful tools to condense the information provided by the data. But this unavoidably also means that part of the information is lost or not used.

Correlation is frequently used as an abbreviation for correlation coefficient, which can easily lead to the misunderstanding that a correlation coefficient equal to zero or close to zero means that there is no correlation. The fact that this is not necessarily the case is an example of the general phenomenon that correlations with identical correlation coefficients can nevertheless be very different from one another. This is one of the reasons why techniques like copulas are becoming more and more popular.
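
A minimal sketch in R illustrates this point: below, y is a deterministic function of x, yet both the linear (Pearson) and the rank (Spearman) correlation coefficients are close to zero, because the relationship is non-monotonic.

set.seed(123)
x <- rnorm(10000)
y <- x^2                          # y is completely determined by x
cor(x, y)                         # Pearson coefficient: close to 0
cor(x, y, method = "spearman")    # rank coefficient: also close to 0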

Figure (from Wikipedia): point sets with their respective Pearson correlation coefficients.

According to some critics, a causal dependency is nothing more than a correlation with a correlation coefficient of 1 and, conversely, a correlation with a correlation coefficient of 1 simply means that there is a deterministic causal dependency.

Reichenbach’s Principle of Common Cause

It was the philosopher and physicist Hans Reichenbach (1891–1953) who extended this somewhat narrow view of correlation and causality by introducing the common cause principle5:

Suppose that two events A and B are positively correlated. Suppose, moreover, that neither event is a cause of the other. Then, Reichenbach’s Common Cause Principle (RCCP) states that A and B will have a common cause C that renders them conditionally independent.

The RCCP states that probabilistic correlations are ultimately derived from causal relationships. That is, if p(A∩B) > p(A)p(B), it is either because one of these events causes the other, or the inequality can be derived from the inequalities p(A|C) > p(A|¬C) and p(B|C) > p(B|¬C), where C is a common cause of A and B. The principle is important because it establishes a link between causal structure and probabilistic correlations, which allows conclusions to be drawn about causal relationships from empirically observable correlations.

Subsequent analysis has revealed that this principle is not universally true. There are cases where significant correlations between variables cannot be attributed to a causal relationship. Laws of coexistence in physics, which can lead to correlations without a straightforward common cause, are one class of examples.

But the principle is at least a fruitful heuristic tool.
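
The principle can be made concrete with a small simulation sketch (the probabilities below are merely illustrative assumptions): two events A and B that are both effects of a common cause C are positively correlated, but become independent once we condition on C.

set.seed(1)
n <- 100000
cause <- rbinom(n, 1, 0.5)                       # common cause C
a <- rbinom(n, 1, ifelse(cause == 1, 0.8, 0.2))  # A depends only on C
b <- rbinom(n, 1, ifelse(cause == 1, 0.7, 0.3))  # B depends only on C
# marginally, p(A∩B) > p(A)p(B):
mean(a & b) - mean(a) * mean(b)                  # clearly positive (about 0.06)
# conditionally on C, the correlation vanishes:
mean(a[cause == 1] & b[cause == 1]) - mean(a[cause == 1]) * mean(b[cause == 1])  # ~ 0
mean(a[cause == 0] & b[cause == 0]) - mean(a[cause == 0]) * mean(b[cause == 0])  # ~ 0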

Causal Diagrams

The common cause structure used above to illustrate the RCCP is a simple example of a causal diagram.

A causal diagram is a visual representation of the cause-effect relationships between variables in a system of interest.

The geneticist Sewall Wright, in 1921, was the first to use directed graphs (path diagrams) to represent probabilistic cause and effect relationships among a set of variables.6  He developed path diagrams and path analysis, which later went on to be used in the social sciences in methods such as structural equation modelling in the 1970s. 

Path diagrams also led to Bayesian networks in the 1980s, with philosopher and AI researcher Judea Pearl being one of the leading developers. And soon after, causal path diagrams and probabilistic DAGs were merged into a formal theory of causal diagrams.7

A causal diagram contains only two things:

  1. The variables each represented by a node on the diagram

  2. The causal relationships, each represented by an arrow from the cause variable to the caused variable

In mathematical terms, a causal diagram is a directed acyclic graph (DAG). A graph is a pair (V, E) of a set V (vertices) and a subset E (edges) of V × V. A graph is directed if the edges are ordered pairs. A path in a graph is a sequence of edges joining a sequence of vertices in which all vertices and edges are distinct. A cycle is a path in which only the first and the last vertices are equal.

An example from the R package ggdag8

suppressMessages(library(ggdag))
library(ggplot2)
theme_set(theme_dag())
smoking_ca_dag <- dagify(cardiacarrest ~ cholesterol,
  cholesterol ~ smoking + weight,
  smoking ~ unhealthy,
  weight ~ unhealthy,
  labels = c(
    "cardiacarrest" = "Cardiac\n Arrest",
    "smoking" = "Smoking",
    "cholesterol" = "Cholesterol",
    "unhealthy" = "Unhealthy\n Lifestyle",
    "weight" = "Weight"
  ),
  latent = "unhealthy",        # unobserved (latent) variable
  exposure = "smoking",        # exposure/treatment of interest
  outcome = "cardiacarrest"    # outcome of interest
)

ggdag(smoking_ca_dag, text = FALSE, use_labels = "label") 

Counterfactuals and the Do Operator

Judea Pearl drew on David Lewis’ ideas about counterfactuals and combined them with the techniques of Sewall Wright, who used directed graphs to represent probabilistic causal relationships.

According to Pearl and others, a variable X is said to have a causal influence on Y if a change in X leads to changes in (the distribution of) Y. This position is very helpful in practice, although not everyone subscribes to it.9

Interventions and counterfactual events are simulated by a mathematical operator called do(x). This operator simulates physical interventions by deleting certain functions from the model and replacing them with a constant X = x, while the rest of the model remains unchanged. The resulting model is labelled M_x.

The post-intervention distribution, i.e. the distribution resulting from the action do(X = x), is then given by the equation

P_M(y | do(x)) = P_{M_x}(y)

In other terms: In the context of model M, the post-intervention distribution of outcome Y is defined as the probability that the model M_x assigns to each outcome Y = y. Using this distribution, which can be readily calculated from any fully specified model M, we can evaluate the effectiveness of the treatment by comparing aspects of this distribution at different values of x.

The following is a very simple example with some simulated data:

suppressMessages(library(ggdag))
library(ggplot2)
dag2 <- dagify(y ~ x,
  z ~ y,
  labels = c(
    "x" = "Normal Distribution (0,1)",
    "y" = "~ x * Normal Distribution (0,1)",
    "z" = "~ y + 5 * Normal Distribution (0,0.1)"
  )
)

ggdag(dag2, text = FALSE, use_labels = "label")

Have a look at the relations:

set.seed(1)
n <- 5000
x <- rnorm(n, 0, 1)
y <- x * rnorm(n, 0, 1)         # y is driven by x, with x-dependent spread
z <- y + 5 * rnorm(n, 0, 0.1)   # z is y plus moderate noise
par(mfrow = c(2, 2))
plot(x, y, col = "darkred")
plot(x, z, col = "darkred")
plot(y, z, col = "darkred")
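
Note a remarkable feature of this toy model, checked numerically in the small sketch below: x is a direct cause of y, yet the linear correlation between x and y is close to zero, since y = x·ε with ε independent of x and E[ε] = 0. So causation does not even imply linear correlation.

cor(x, y)   # close to 0, although x causes y
cor(y, z)   # strongly positive, since z is y plus small noise
cor(x, z)   # again close to 0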

Now we define some interventions:

# baseline: simulate the structural model without any intervention
noIntervention <- function(n = 5000) {
  x <- rnorm(n, 0, 1)
  y <- x * rnorm(n, 0, 1)
  z <- y + 5 * rnorm(n, 0, 0.1)
  cbind(x, y, z)
}

# do(Z = z): delete the structural equation for z and fix its value;
# the mechanisms generating x and y remain unchanged
intervention_z <- function(z, n = 5000) {
  x <- rnorm(n, 0, 1)
  y <- x * rnorm(n, 0, 1)
  cbind(x, y, z)
}

# do(X = x): x is set externally, the downstream mechanisms stay intact
intervention_x <- function(x, n = 5000) {
  y <- x * rnorm(n, 0, 1)
  z <- y + 5 * rnorm(n, 0, 0.1)
  cbind(x, y, z)
}

# do(Y = y): cuts the arrow from x to y
intervention_y <- function(y, n = 5000) {
  x <- rnorm(n, 0, 1)
  z <- y + 5 * rnorm(n, 0, 0.1)
  cbind(x, y, z)
}

set.seed(1)
dat  <- noIntervention()
datz <- intervention_z(z = 2)
datx <- intervention_x(x = 2)
par(mfrow = c(2, 2))
hist(rowSums(dat),  col = "darkred", breaks = 50)
hist(rowSums(datz), col = "darkred", breaks = 50)
hist(rowSums(datx), col = "darkred", breaks = 50)
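
The histograms show how the interventions shift the overall distribution. A further small sketch makes the key property of the do-operator explicit: setting z by intervention leaves the distribution of its cause y untouched, whereas merely observing z near the same value is informative about y, because z ≈ y in this model.

# do(Z = 2): y keeps its original distribution, centred at 0
mean(datz[, "y"])
# observing z close to 2 instead shifts y, since z = y + small noise
sel <- abs(dat[, "z"] - 2) < 0.25
mean(dat[sel, "y"])   # clearly shifted towards 2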

Causality & Risk

Causal relationships help us to understand the dynamics of cause and effect in complex systems, and by identifying causal factors we can predict how changes in one variable will affect others. This is not possible by analysing correlations alone. In risk analysis, understanding these relationships allows us to accurately assess potential risks.

In this way, modelling causal relationships improves predictive models. By incorporating causality, we can create more accurate models that take into account both direct and indirect effects. These models can improve risk predictions and inform prevention measures. Causal analysis thus enables the establishment of effective and efficient mitigation measures.

By addressing root causes, we can reduce the likelihood and impact of negative events. For example, identifying the causal link between poor cybersecurity practices and data breaches helps organizations improve their security protocols.

Causal Inference from Data

Causal inference from data has three aspects:

  • Uncovering the causal structure from the data

  • Identification of the causal effect for a given causal structure (see the sketch after this list)

  • Estimation of an identifiable causal effect from the data
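
As a small sketch of the identification step, the dagitty package (on which ggdag is built) can compute valid backdoor adjustment sets for a given causal structure, here for the smoking DAG defined above:

library(dagitty)
# which covariate sets identify the total effect of smoking on cardiac arrest?
adjustmentSets(smoking_ca_dag, exposure = "smoking", outcome = "cardiacarrest")
# the backdoor path via the latent unhealthy-lifestyle variable
# can be blocked by adjusting for weight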

There are lots of software tools for causal analysis. These include some interesting code libraries for Python and R. The following list displays only some of them.

Python Libraries

  • CausalPy is a Python library specifically designed for causal inference and discovery.

    • It offers a comprehensive set of tools for estimating causal effects and identifying causal relationships in both observational and experimental data.

    • Developed by the consultancy PyMC Labs, CausalPy is a valuable resource for causal analysis.

  • DoWhy is another Python library for causal inference.

    • It supports explicit modelling and testing of causal assumptions.

    • DoWhy combines causal graphical models and potential outcomes frameworks, making it a powerful choice for causal analysis

  • causal-learn

    • It is a Python translation and extension of the Tetrad Java code.

    • It offers implementations of up-to-date causal discovery methods as well as simple and intuitive APIs.

  • causalnex:

    • causalnex is a Python library for causal discovery.

    • It focuses on discovering causal relationships from data, especially in scenarios where the underlying causal structure is unknown.

    • If you’re dealing with complex data and want to uncover causal links, causalnex is a valuable resource.

  • pgmpy:

    • pgmpy is a Python library that focuses on probabilistic graphical models (PGMs) and Bayesian networks.

    • It allows you to model and analyse causal relationships using graphical representations.

    • If you’re interested in Bayesian networks, pgmpy is worth exploring.

  • causalml:

    • causalml is a Python package that provides tools for causal machine learning.

    • It includes methods for estimating treatment effects, propensity score matching, and more.

    • If you’re working with machine learning models and want to incorporate causal analysis, check out causalml.

R Libraries

  • causaleffect

    • Provides functions for identification and transportation of causal effects.

    • Implements the conditional causal effect identification algorithm (IDC) by Shpitser and Pearl (2006).

  • CausalImpact

    • This R package implements an approach to estimating the causal effect of a designed intervention on a time series.

  • BCDAG

    • A collection of functions for structure learning of causal networks and estimation of joint causal effects from observational Gaussian data.

    • The main algorithm is a Markov chain Monte Carlo scheme for posterior inference of causal structures, parameters and causal effects between variables.

Footnotes

  1. Peter Mittelstaedt, Philosophische Probleme der modernen Physik, Bibliographisches Institut, Mannheim, 1972.↩︎

  2. Michael Stöltzner, Causality, Realism and the Two Strands of Boltzmann’s Legacy (1896–1936), Dissertation, Universität Bielefeld, 2003.↩︎

  3. Bertrand Russell, On the Notion of Cause, Proceedings of the Aristotelian Society, New Series, Vol. 13 (1912–1913), pp. 1–26, Oxford University Press.↩︎

  4. Kripke, Saul, Naming and Necessity, Harvard University Press, 1980, p. 22. // Kripke, Saul, “Semantical Analysis of Modal Logic I: Normal Propositional Calculi”, Zeitschrift für mathematische Logik und Grundlagen der Mathematik 9 (1963): 67–96.↩︎

  5. Hans Reichenbach, The Direction of Time, published posthumously in 1956. // Wolfgang Spohn, On Reichenbach’s Principle of Common Cause, Universität Bielefeld, 1998. Link↩︎

  6. Wright S. Correlation and causation. Journal of Agricultural Research. 1921;20: 557–585.↩︎

  7. Pearl, J., Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge: Cambridge University Press; 2009. doi:10.1017/CBO9780511803161↩︎

  8. Malcolm Barrett, ggdag: Analyze and Create Elegant Directed Acyclic Graphs. R package version 0.2.10. ‘ggdag’ is built on top of ‘dagitty’, an R package that uses the ‘DAGitty’ web tool (http://dagitty.net) for creating and analyzing DAGs. License: MIT. https://github.com/r-causal/ggdag, https://r-causal.github.io/ggdag/↩︎

  9. Cartwright, N. (2007). Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge University Press.↩︎