- In data science, correlation and causation are often confused.
- Just because two things are related doesn’t mean one causes the other.
- Misinterpreting this can lead to poor decisions in business, health, and policy.
6/8/2025
This is an example of a spurious correlation: two things appear related but are not causally connected.
This 3D plot shows how three numeric variables can be related to one another.
The Pearson correlation coefficient \(r\) is calculated as:
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum (x_i - \bar{x})^2} \sqrt{\sum (y_i - \bar{y})^2}} \]
A correlation coefficient close to \(-1\) or \(1\) suggests a strong linear relationship; a value near \(0\) suggests little or no linear relationship.
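The Pearson formula can be checked by hand with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pearson_r` and the sample data (`hours`, `scores`, `xs`, `squares`) are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (math.sqrt(ss_x) * math.sqrt(ss_y))

# Hypothetical data: roughly linear, so r should be close to 1.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(f"linear data:    r = {pearson_r(hours, scores):.3f}")

# A perfect but non-linear (quadratic) relationship gives r = 0 here,
# because r only measures the linear part of a relationship.
xs = [-2, -1, 0, 1, 2]
squares = [x ** 2 for x in xs]
print(f"quadratic data: r = {pearson_r(xs, squares):.3f}")
```

The second case is a reminder that a value of \(r\) near zero rules out a linear relationship only, not a relationship of any kind.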
Sometimes, variables show correlation by coincidence, not causation.
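Coincidental correlation is easy to reproduce in a simulation. The sketch below compares one short random series against many other random series generated independently of it, and keeps the strongest correlation found. All names and parameter values (`strongest_chance_correlation`, 8 points, 2000 candidates, the seed) are illustrative assumptions, not taken from any dataset.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient (same formula as above)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def strongest_chance_correlation(n_points=8, n_candidates=2000, seed=0):
    """Return the largest-magnitude r between a random target series and
    many unrelated random candidate series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate is drawn independently of the target,
        # so any correlation with it is pure coincidence.
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson_r(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best

best_r = strongest_chance_correlation()
print(f"strongest |r| among unrelated series: {abs(best_r):.2f}")
```

With few data points and many variables to compare, some pair will almost always look strongly correlated, which is one way spurious correlations arise.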
\[ \text{Causal Effect: } \Delta Y = Y_1 - Y_0 \]
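One way to see why \(\Delta Y\) differs from a correlation is a small simulation. The sketch below assumes \(Y_1\) is a unit's outcome with a treatment and \(Y_0\) its outcome without it (a potential-outcomes reading of the formula, which is an assumption here). The confounder `health`, the effect size of 2, and all other numbers are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(n_units=20000, true_effect=2.0, seed=1):
    rng = random.Random(seed)
    units = []
    for _ in range(n_units):
        health = rng.gauss(0, 1)                 # hypothetical confounder
        y0 = 50 + 10 * health + rng.gauss(0, 1)  # outcome without treatment
        y1 = y0 + true_effect                    # outcome with treatment
        units.append((health, y0, y1))

    # Average of the individual effects Y_1 - Y_0.
    # Only a simulation can observe both outcomes for the same unit.
    average_effect = mean([y1 - y0 for _, y0, y1 in units])

    # Confounded assignment: healthier units end up treated,
    # so the two groups differ before any treatment is applied.
    treated = [y1 for h, _, y1 in units if h > 0]
    control = [y0 for h, y0, _ in units if h <= 0]
    confounded_diff = mean(treated) - mean(control)

    # Randomized assignment: a coin flip decides who is treated.
    treated, control = [], []
    for _, y0, y1 in units:
        if rng.random() < 0.5:
            treated.append(y1)
        else:
            control.append(y0)
    randomized_diff = mean(treated) - mean(control)

    return average_effect, confounded_diff, randomized_diff

average_effect, confounded_diff, randomized_diff = simulate()
print(f"average causal effect:         {average_effect:.2f}")
print(f"confounded group difference:   {confounded_diff:.2f}")
print(f"randomized group difference:   {randomized_diff:.2f}")
```

In this setup the confounded comparison overstates the effect because healthier units both receive the treatment and have better outcomes anyway, while the randomized comparison lands close to the true value.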