Universidad Privada Boliviana / Universidad del Pacífico (Lima, Perú)
Author
Prof. J. Dávalos (Ph.D.)
Superconsistency and Cointegration
Spurious regression
Spurious regressions arise when we relate non-stationary variables that have no structural link but that each depend on time, for example through a deterministic trend.
Consult the link for a variety of examples of spurious relationships. Many of the time series there are time dependent and probably exhibit a unit root, which makes them difference stationary: link
By now you are well versed in the deterministic and stochastic trends associated with a unit-root process such as a random walk with drift (constant).
As in the examples from the previous link, consider two completely unrelated variables, \(X_t\) and \(Z_t\), each following a unit-root process with drift. We can write them as:
\(Z_t = Z_0 + a_z t +\sum_{j=0}^{t-1}\varepsilon_{t-j}\)
\(X_t = X_0 + a_x t +\sum_{j=0}^{t-1}\eta_{t-j}\)
where \(\{\varepsilon_t\}\) and \(\{\eta_t\}\) are two independent WN processes, \(a_z\) and \(a_x\) are the variable-specific drifts, and \(X_0\) and \(Z_0\) are variable-specific initial conditions.
A naive OLS regression between them leads to:
\(X_t = \alpha_0 + \alpha_1 Z_t + u_t\)
The OLS estimator of the slope parameter, with \(x_t = X_t - \bar X\) and \(z_t = Z_t - \bar Z\), is:
\(\hat \alpha_1 = \frac{\sum_{t=1}^{T} x_t z_t}{\sum_{t=1}^{T} z_t^2}\)
In large samples the deterministic trends dominate, so \(E(\hat \alpha_1) \approx \frac{a_x}{a_z}\). The relationship is artificially given by the ratio of the drifts, irrespective of any structural relationship or economic meaning.
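A sketch of why the drift ratio appears, assuming \(X_0 = Z_0 = 0\) and \(a_z \neq 0\). The symbols \(S^{\eta}_t\) and \(S^{\varepsilon}_t\) are notation introduced here for the two cumulated noise terms, so that \(X_t = a_x t + S^{\eta}_t\) and \(Z_t = a_z t + S^{\varepsilon}_t\); a tilde denotes deviations from the sample mean:

\[
\hat\alpha_1 = \frac{\sum_t x_t z_t}{\sum_t z_t^2}
= \frac{a_x a_z \sum_t \tilde t^{\,2} + a_x \sum_t \tilde t\,\tilde S^{\varepsilon}_t + a_z \sum_t \tilde t\,\tilde S^{\eta}_t + \sum_t \tilde S^{\eta}_t \tilde S^{\varepsilon}_t}
{a_z^2 \sum_t \tilde t^{\,2} + 2 a_z \sum_t \tilde t\,\tilde S^{\varepsilon}_t + \sum_t (\tilde S^{\varepsilon}_t)^2}
\]

Since \(\sum_t \tilde t^{\,2} = T(T^2-1)/12\) grows like \(T^3\), while the terms involving the random walks grow more slowly (at most like \(T^{5/2}\) in probability), the first term dominates both the numerator and the denominator:

\[
\hat\alpha_1 \;\longrightarrow\; \frac{a_x a_z}{a_z^2} = \frac{a_x}{a_z} \quad \text{as } T \rightarrow \infty
\]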
If Z has no drift (\(a_z = 0\)), the drift ratio \(a_x/a_z\) is unbounded and the estimated slope diverges as the sample grows (a strongly spurious relationship).
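The drift-ratio result can be checked with a small simulation. The following is a minimal Stata sketch, not part of the original notes: the sample size, the seed and the drift values (0.5 and 0.2) are arbitrary assumptions chosen for illustration, and \(X_0 = Z_0 = 0\).

```stata
* Sketch: spurious regression between two independent random walks with drift
clear all
set seed 12345
set obs 500
generate t = _n
tsset t

* Hypothetical drifts (illustrative assumptions, not from the notes)
scalar a_x = 0.5
scalar a_z = 0.2

* Two independent white-noise shocks
generate eta = rnormal()
generate eps = rnormal()

* X_t = a_x*t + running sum of eta; Z_t = a_z*t + running sum of eps
generate X = a_x*t + sum(eta)
generate Z = a_z*t + sum(eps)

* Naive OLS: compare the estimated slope with the drift ratio
regress X Z
display "drift ratio a_x/a_z = " a_x/a_z
```

Although the two series are unrelated by construction, one would expect a slope near the drift ratio together with a very large t-statistic and a high R-squared.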
Exercise 1
Find the expectation of the estimated intercept, \(E(\hat \alpha_0)\). For the sake of simplicity, assume \(X_0 = Z_0 = 0\). In which case will this constant be large? Explain.
Integration order I(d)
The integration order of a variable is denoted I(d).
It only applies to DS (difference stationary) processes, not to trend stationary ones.
Let \(X_t\) be a DS variable. It is said to be integrated of order \(d\), \(I(d)\), if its d-th difference \(\Delta^dX_t\) is \(I(0)\).
In our previous session, you probably found that the log of US GDP, an I(1) variable, became stationary once first-differenced: \(\Delta \log GDP\) is \(I(0)\).
Traditional econometric estimation (OLS) requires \(I(0)\) variables and WN errors for statistical inference to hold.
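In practice, the integration order is found by applying a unit-root test to the level and then to successive differences. A minimal Stata sketch on a simulated series follows; the drift (0.1), the sample size and the choice of 4 lags are assumptions made only for illustration, and with real data the lag order should be selected with the general-to-specific approach.

```stata
* Sketch: checking that a simulated random walk with drift is I(1)
clear all
set seed 321
set obs 300
generate t = _n
tsset t

* Hypothetical I(1) series: random walk with drift 0.1
generate X = 0.1*t + sum(rnormal())

* Level: ADF with a trend term; the unit-root null is typically not rejected
dfuller X, trend lags(4)

* First difference: the unit-root null is typically rejected, so X is I(1)
dfuller D.X, lags(4)
```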
Cointegration - (super)consistency
In the presence of I(d) variables (d>0), traditional statistical inference still holds as long as the residual of the relationship between them is well behaved, i.e. it is stationary and does not exhibit a unit root. Let \(Y_t\) and \(X_t\) share the integration order I(1), so that both exhibit a unit root and are difference stationary:
\(Y_t = X_t \beta + \varepsilon_t\)
This relationship is not spurious if and only if \(\varepsilon_t \sim I(0)\). In that case \(Y_t\) and \(X_t\) (integrated variables) are said to be cointegrated.
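The difference between a cointegrated and a spurious relationship can be seen in the residuals. The following Stata sketch uses simulated data only: the variable names, the true slope (1.5), the sample size and the 2 lags are assumptions chosen for illustration. Because the residuals are estimated, the critical values printed by dfuller are only indicative here.

```stata
* Sketch: residuals of a cointegrated pair vs. an unrelated pair
clear all
set seed 777
set obs 400
generate t = _n
tsset t

generate X = sum(rnormal())            // I(1) regressor
generate W = sum(rnormal())            // independent I(1) series
generate Y_coint = 1.5*X + rnormal()   // shares the trend of X, I(0) error

* Cointegrated pair: residuals should look stationary
regress Y_coint X
predict e_coint, residuals
dfuller e_coint, noconstant lags(2)

* Unrelated pair: residuals typically keep a unit root
regress W X
predict e_spur, residuals
dfuller e_spur, noconstant lags(2)
```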
Example 1
Interest rates are often I(1) unit-root processes that may eventually move together. Their spread (difference) is just some sort of residual. As you can see, it does not behave as a stationary process. This relationship is spurious despite economic theory linking both segments of the interest rate curve.
Example 2
Cryptocurrencies tend to move according to the overall sentiment in their market. Consider Bitcoin and Ethereum prices (both unit-root processes). Their residual is more likely to be stationary, i.e. they are likely to be cointegrated.
Let’s recall that the OLS estimator in a cross-sectional model
\(Y =X\beta + \varepsilon\) was said to be consistent, i.e. it converges in probability to the true value as the sample size goes to infinity:
\(plim_{n\rightarrow\infty} \hat\beta = \beta + plim_{n\rightarrow\infty} (X'X)^{-1}X'\varepsilon = \beta\)
The term \((X'X)^{-1}X'\varepsilon\) goes to 0 at a speed of \(\sqrt{n}\), i.e. it is of order \(1/\sqrt{n}\).
In a TS setup with cointegrated series, the OLS estimator converges to the true value faster.
In a cross section or with stationary I(0) TS, the variance of the OLS estimator, a measure of its precision, shrinks at rate \(1/T\) (the larger the sample T, the better).
In cointegrated regressions, the variance shrinks at rate \(1/T^2\). This is even better: this is super-consistency.
This is a blessing considering that ARMA processes required a large T (and stationarity) before starting any modeling. As a rough rule of thumb that ignores constants, a cointegrated regression with T=10 is about as precise as a regression on stationary series with T=100, since \(1/10^2 = 1/100\).
So before first-differencing (FD) your I(1) unit-root variables, check for cointegration.
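The two convergence rates can be compared with a small Monte Carlo experiment. The following is a minimal Stata sketch under stated assumptions: the program name cointsim, the true slope (2), the sample sizes (50 and 500) and the number of replications (500) are all hypothetical choices made for illustration.

```stata
* Sketch: precision of the OLS slope, cointegrated vs. stationary regressor
clear all
set seed 2024

capture program drop cointsim
program define cointsim, rclass
    args T
    drop _all
    set obs `T'
    generate e = rnormal()
    generate v = rnormal()
    * I(1) regressor (random walk) and a cointegrated Y with true slope 2
    generate x1 = sum(v)
    generate y1 = 2*x1 + e
    regress y1 x1
    return scalar b_coint = _b[x1]
    * I(0) regressor (white noise) with the same true slope 2
    generate y0 = 2*v + e
    regress y0 v
    return scalar b_stat = _b[v]
end

foreach T in 50 500 {
    simulate b_coint=r(b_coint) b_stat=r(b_stat), reps(500) nodots: cointsim `T'
    display "T = `T'"
    summarize b_coint b_stat
}
```

When T is multiplied by 10, one would expect the standard deviation of b_coint to fall by a factor of roughly 10 and that of b_stat by a factor of roughly \(\sqrt{10}\), which corresponds to variances shrinking at rates \(1/T^2\) and \(1/T\).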
Some examples in economics and finance
The permanent income hypothesis describes how agents spread their consumption out over their lifetime based on their expected income. Consumption and income tend to be cointegrated.
Purchasing power parity is a theory that relates the prices of a basket of goods across different countries. Nominal exchange rates and domestic and foreign prices tend to be cointegrated.
The present value model of stock prices implies a long-run relationship between stock prices and their dividends or earnings. Stock prices and dividends or earnings tend to be cointegrated.
Some examples beyond economics
Joint mortality models imply a long-run relationship between mortality rates across different demographics. Male and female mortality rates are cointegrated.
Comorbidities of different types of cancers (colon, pancreas, lungs, etc.) and trends in medical welfare tend to be cointegrated.
Testing Cointegration: Engle - Granger test
Robert Engle III (the ARCH guy) and Clive Granger (the guy behind the ARCH guy) developed a cointegration test.
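It only requires running an ADF test on the residuals of the suspected cointegrating relationship to test for the presence of a unit root. Because those residuals are estimated, the test statistic should be compared with Engle-Granger critical values (for example, MacKinnon's), which are more negative than the standard Dickey-Fuller ones.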
Example
In this example, we test for a cointegrating relationship between personal income and consumption using quarterly data from the US.
Note that you first need to run the ADF regression manually in order to identify the best possible lag order. This is the general-to-specific approach.
Once the lag order is chosen (10 lags here), run the ADF test on the residuals.
The Engle-Granger test does not reject the null of a unit root in the residuals (no cointegration), so we find no evidence that the series are cointegrated.
clear all
webuse balance2
des
* OLS to generate the residuals
reg c y
predict resid, r
* Engle Granger -> ADF over the residuals
* Do not forget to implement the general to specific approach
* in order to choose your lags
dfuller resid, lags(10) regress
* we do not reject the null
* now we check that the residuals are well behaved
* Manual ADF residuals:
reg d.resid l.resid l.d.resid l2.d.resid ///
    l3.d.resid l4.d.resid l5.d.resid l6.d.resid ///
    l7.d.resid l8.d.resid l9.d.resid l10.d.resid
predict residadf, r
corrgram residadf, lags(12)
Consider the relationship between gold and silver prices. They seem correlated. If there is a meaningful long-run relationship, you could consider diversifying your investments by not having great exposure to both metals at once. Cointegration would mean that a negative shock in one market would affect the other, and vice versa. Use the data available in the Kaggle competition to analyse these commodities (link).
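A possible starting point for this analysis, following the same Engle-Granger steps as in the consumption and income example. This is only a sketch: the file name gold_silver.csv and the variable names gold and silver are hypothetical and must be adapted to the actual Kaggle data, the rows are assumed to be sorted by date, and the 4 lags are a placeholder to be replaced by the lag order found with the general-to-specific approach.

```stata
* Sketch: Engle-Granger steps for gold and silver prices
* (hypothetical file and variable names, adapt them to the Kaggle data)
clear all
import delimited "gold_silver.csv", clear

* Assumes rows are already sorted by date; use a simple time index
generate t = _n
tsset t
generate lgold = ln(gold)
generate lsilver = ln(silver)

* Step 1: check that both series are I(1)
dfuller lgold, lags(4)
dfuller D.lgold, lags(4)
dfuller lsilver, lags(4)
dfuller D.lsilver, lags(4)

* Step 2: long-run regression and its residuals
regress lgold lsilver
predict resid_gs, residuals

* Step 3: ADF on the residuals (compare with Engle-Granger critical values)
dfuller resid_gs, lags(4) regress
```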