Granger-Causality is a statistical method that tests whether past observations of one time series improve predictions of another. The procedure fits a prediction model for the target series based solely on its own past values and compares it to a second model that also incorporates past values of the predictor series. If a standard statistical test shows that the second model forecasts significantly better, one concludes that the predictor series Granger-causes the target series. This result does not demonstrate genuine cause and effect in the philosophical sense; it shows only that the history of the predictor series has predictive power for the future of the target series beyond what the target's own history already captures.
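In symbols, the comparison can be written as follows. This is a sketch of the usual textbook formulation rather than something spelled out in the analysis itself: the symbols $y_t$ (target series), $x_t$ (predictor series), $p$ (lag length), and $T$ (number of observations left after dropping the first $p$) are notational assumptions introduced here.

```latex
\begin{align*}
\text{Restricted:}   \quad y_t &= \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i} + \varepsilon_t \\
\text{Unrestricted:} \quad y_t &= \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i}
                                  + \sum_{i=1}^{p} \beta_i\, x_{t-i} + u_t \\
H_0 &: \beta_1 = \dots = \beta_p = 0 \\
F &= \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
\end{align*}
```

Here $\mathrm{RSS}_r$ and $\mathrm{RSS}_u$ are the residual sums of squares of the restricted and unrestricted models. Under $H_0$ and the usual linear-model assumptions, $F$ is compared against an F distribution with $p$ and $T - 2p - 1$ degrees of freedom.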
The age-old puzzle “Which came first, the chicken or the egg?” poses a deceptively simple question that has fascinated philosophers, naturalists, and curious thinkers for centuries. On the one hand, every chicken hatches from an egg; on the other, every egg is laid by a chicken. At its heart lies a fundamental issue of causality: how can we rigorously determine the direction of influence between two interdependent processes?
The file ‘ChickEgg-2.csv’ contains annual time-series data on chicken populations and egg production from 1930 to 1983. Each column is converted to its own time-series object to help visualize the series. The y-axis uses a log10 scale so that equal proportional changes in chicken counts and egg production span equal vertical distances, and a constant growth rate appears as a straight line. By compressing high values and expanding low ones, the scale makes it easier to compare the growth rates of two series whose magnitudes differ by roughly two orders.
chickegg = read.csv("ChickEgg-2.csv", header = TRUE)
head(chickegg)
## chicken egg
## 1 468491 3581
## 2 449743 3532
## 3 436815 3327
## 4 444523 3255
## 5 433937 3156
## 6 389958 3081
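library(tidyr)    # pivot_longer()
library(ggplot2)  # ggplot(), geom_line(), scale_y_log10()
# Annual series starting in 1930
chick.ts = ts(chickegg$chicken, start = c(1930,1), frequency = 1)
egg.ts = ts(chickegg$egg, start = c(1930,1), frequency = 1)
# One date per observation, used only to label the x-axis by year
Date = seq(as.Date("1930/1/1"), by = "year", length.out = length(chick.ts))
# Reshape to long format and plot both series on a log10 y-axis
cbind.data.frame(year = lubridate::year(Date), chick.ts, egg.ts) |>
  pivot_longer(cols = c(chick.ts, egg.ts),
               names_to = "series",
               values_to = "value") |>
  ggplot(aes(x = year, y = value, col = series)) +
  geom_line(linewidth = 1) + scale_y_log10()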
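As a quick check on why a log10 axis suits this comparison, the sketch below uses made-up numbers (not values from the data) to show that a doubling covers the same distance on a log10 scale regardless of the starting level.

```r
# Illustrative values only; these are not taken from ChickEgg-2.csv.
small <- c(100, 200)        # a doubling at a low level
large <- c(100000, 200000)  # a doubling at a high level

# On the raw scale the two doublings differ by a factor of 1000 ...
raw_gap <- c(diff(small), diff(large))

# ... but on the log10 scale both equal log10(2).
log_gap <- c(diff(log10(small)), diff(log10(large)))

print(raw_gap)
print(log_gap)
```

This is why series with very different magnitudes, such as chicken counts and egg production here, can be compared by slope on a shared log10 axis.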
Before running the Granger-Causality test, we need to establish the forecasting framework that underlies the procedure. The basic idea is to compare two autoregressive models for egg production: one that uses only past egg counts and one that also uses lagged chicken counts. Under the null hypothesis, adding chicken history does not improve the ability to predict egg output. Under the alternative, past chicken levels carry information about future egg production. By estimating both models with a lag length of three years and conducting an F-test on the additional coefficients, we can assess whether chicken abundance Granger-causes egg production.
lmtest::grangertest(egg.ts ~ chick.ts, order = 3, data = chickegg)
## Granger causality test
##
## Model 1: egg.ts ~ Lags(egg.ts, 1:3) + Lags(chick.ts, 1:3)
## Model 2: egg.ts ~ Lags(egg.ts, 1:3)
## Res.Df Df F Pr(>F)
## 1 44
## 2 47 -3 0.59 0.62
The first test yields an F statistic of 0.59 with a p-value of 0.62. Because the p-value is well above 0.05, we fail to reject the null hypothesis. We therefore find no evidence that past chicken counts improve predictions of egg output beyond what egg history already provides. This leads us to the reverse test:
lmtest::grangertest(chick.ts ~ egg.ts, order = 3, data = chickegg)
## Granger causality test
##
## Model 1: chick.ts ~ Lags(chick.ts, 1:3) + Lags(egg.ts, 1:3)
## Model 2: chick.ts ~ Lags(chick.ts, 1:3)
## Res.Df Df F Pr(>F)
## 1 44
## 2 47 -3 5.4 0.003 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Running the reverse test returns a p-value of 0.003, which allows us to reject the null hypothesis at the 1% level. This means past egg production carries significant information for forecasting future chicken abundance. Taken together, the two tests indicate that egg production Granger-causes chicken abundance, while there is no evidence that chicken abundance Granger-causes egg production.
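The reported p-values can be cross-checked from the F statistics and the degrees of freedom shown in the two tables (3 in the numerator, 44 in the denominator). The sketch below uses base R's pf(); because it starts from the rounded F values as printed, the results should agree with the reported p-values only approximately. The variable names are chosen here for illustration.

```r
# F statistics as printed (rounded) in the two test tables
f_chick_to_egg <- 0.59
f_egg_to_chick <- 5.4

# Upper-tail probability under an F distribution with 3 and 44 degrees of freedom
p_chick_to_egg <- pf(f_chick_to_egg, df1 = 3, df2 = 44, lower.tail = FALSE)
p_egg_to_chick <- pf(f_egg_to_chick, df1 = 3, df2 = 44, lower.tail = FALSE)

print(round(c(chick_to_egg = p_chick_to_egg, egg_to_chick = p_egg_to_chick), 3))
```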
Our analysis of the annual chicken and egg series from 1930 to 1983 using Granger-Causality tests reveals a clear asymmetry in predictive power: past egg production helps forecast future chicken populations, whereas past chicken counts show no detectable improvement in forecasts of egg output. By fitting nested autoregressive models with a three-year lag and applying an F-test to the additional coefficients, we have shown that egg production “Granger-causes” chicken abundance at the 1% significance level.
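The nested-model comparison described above can also be sketched by hand with base R, which makes the restricted and unrestricted regressions explicit. The helper granger_f() is a hypothetical name introduced for this illustration, and the data are simulated (the predictor is built to lead the target by one step), so the numbers are not the chicken and egg results.

```r
# Hypothetical helper: F-test of whether lags of x help predict y,
# given p lags of y. Uses only base R (embed, lm, anova).
granger_f <- function(y, x, p) {
  y <- as.numeric(y)
  x <- as.numeric(x)
  # embed() returns the current value in column 1 and lags 1..p in columns 2..(p + 1)
  ey <- embed(y, p + 1)
  ex <- embed(x, p + 1)
  target <- ey[, 1]
  ylags  <- ey[, -1, drop = FALSE]
  xlags  <- ex[, -1, drop = FALSE]
  restricted   <- lm(target ~ ylags)          # own history only
  unrestricted <- lm(target ~ ylags + xlags)  # own history plus predictor history
  anova(restricted, unrestricted)             # F-test on the added coefficients
}

# Simulated example (assumed data): y depends on its own lag and on lagged x
set.seed(1)
n <- 200
x <- rnorm(n)
e <- rnorm(n)
y <- numeric(n)
for (i in 2:n) y[i] <- 0.5 * y[i - 1] + 0.8 * x[i - 1] + e[i]

res <- granger_f(y, x, p = 3)
print(res)
```

Applied to the series in this analysis, a call such as granger_f(chick.ts, egg.ts, p = 3) would be expected to give the same F statistic as the grangertest() call for chick.ts ~ egg.ts, since both compare the same pair of nested regressions, though that equivalence is stated here as an expectation rather than a verified result.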
These findings illustrate how a formal test of predictive precedence can shed light on a long-standing origin question, turning the classic “chicken or egg” puzzle into an empirical exercise in information flow. Although Granger-Causality stops short of proving true biological causation, it provides a rigorous way to identify which series leads the other in a predictive sense. Future work could extend this approach to finer-scale data, explore potential common drivers, or incorporate structural models to probe underlying mechanisms of reproduction and population dynamics.