Introduction

A major issue in time series analysis is non-stationary data, or data that has a unit root. Non-stationary means that the moments (mean, standard deviation, skewness, kurtosis and serial correlation) are not stable across different samples. A unit root means that in the equation:

\[y_t = \rho y_{t-1} + \varepsilon_t\]

\(\rho\) is estimated as unity (one). If this is the case, any shock (from \(\varepsilon_t\)) will be permanent and \(y\) will wander in an unconstrained manner. In that case, there is no reason why the mean (or any other descriptive statistic that we calculate for \(y\)) will be stable across different samples.

# Simulate a random walk with drift and compare the two halves of the sample
set.seed(124)
ry <- rnorm(100)
y <- cumsum(1 + ry)    # cumulative sum of a drift of 1 plus noise
plot(y, type = 'l', main = "Random walk")
abline(v = 50, lty = 2)
amean <- round(mean(y[1:50]), 0)
bmean <- round(mean(y[51:100]), 0)
text(20, 90, labels = "A", cex = 1.2)
text(80, 90, labels = "B", cex = 1.2)
text(25, 40, labels = paste("Mean part A equals ", amean))
text(75, 40, labels = paste("Mean part B equals ", bmean))

Spurious regression

The major problem with non-stationary data is that it can produce spurious regression results. Recall that the equation for correlation is:

\[corr_{y,x} = \frac{\sum_{t = 1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sqrt{\sum_{t = 1}^{T} (y_t - \bar{y})^2 \sum_{t = 1}^{T} (x_t - \bar{x})^2}}\]

Imagine two series with a unit root and drift, like the one in the diagram above: they are almost certain to have a strong positive correlation, no matter what they represent. Make sure that you are clear why this is the case.
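To see this in action, here is a minimal sketch (the seed is an arbitrary choice) that simulates two independent random walks with drift and shows that they nonetheless appear strongly related:

# Two independent random walks with drift: unrelated by construction,
# yet highly correlated because both trend upwards
set.seed(42)
x1 <- cumsum(1 + rnorm(100))
x2 <- cumsum(1 + rnorm(100))
cor(x1, x2)            # close to one despite independence
summary(lm(x2 ~ x1))   # a highly "significant" but spurious regression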

Returns

We usually deal with this by calculating returns so that we have stationary data. We can see this from a plot of ry:

plot(ry, type = 'l', main = "Plot of stationary series")
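For a price series, returns would usually be calculated as simple or log differences. A small sketch, using a made-up price series purely for illustration:

price <- exp(cumsum(rnorm(100, 0, 0.01)))    # a stand-in price series
simple_ret <- diff(price) / head(price, -1)  # simple returns
log_ret <- diff(log(price))                  # log returns, approximately equal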

Cointegration

Dealing only with returns, first differences or stationary data is usually fine, but there are times when there may be interesting information in the levels. Engle and Granger (1987) applied this idea to US income and consumption. It is clear from the figure below that neither income nor consumption is stationary. However, consumption depends on income, so the two series should move together; that is, they should be cointegrated.

da <- read.csv('../../Data/PICon.csv')
da$DATE <- as.Date(da$DATE, format = "%Y-%m-%d")
das <- subset(da, da$DATE >= as.Date("2000-01-01"))
plot(das$DATE, das$PCE, type = 'l', main = "US personal income and consumption",
     xlab = 'Date', ylab = 'USD', lty = 2, ylim = c(min(das$PCE, das$PI),
                                                    max(das$PCE,das$PI))) 
lines(das$DATE, das$PI, type = 'l', col = 'red')
legend('topleft', inset = 0.01, legend = c('Consumption', 'Income'),
       col = c('black', 'red'), lty = c(2,1))

The Engle-Granger method

If there is cointegration between two series, the residuals from a regression of one on the other should be stationary. Here, that is the regression of (log) PCE on (log) PI. The Engle-Granger method therefore involves three steps (a compact sketch follows the list):

  1. Run a regression of the two variables

  2. Test whether the residuals from the regression have a unit root

  3. If there is no unit root, use the residuals as a variable in the difference equation.
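In outline, with generic unit-root series y and x (hypothetical names) and the urca package loaded, the three steps might be sketched as:

eq_lr <- lm(y ~ x)                        # step 1: long-run (levels) regression
summary(ur.df(eq_lr$residuals, "none"))   # step 2: ADF test on the residuals
# step 3 (if the unit root is rejected): use the lagged residual in an ECM
ecm <- lm(diff(y) ~ diff(x) + head(eq_lr$residuals, -1))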

The first regression gives the long-run equilibrium

\[y_t = \delta_0 + \delta_1 x_t + u_t\]

so that

\[\hat{u}_t = y_t - \hat{\delta}_0 - \hat{\delta}_1 x_t\] is the error-correction term

This will give the following error correction model:

\[\Delta y_t = \phi_0 + \sum_{j = 1}\phi_j \Delta y_{t - j} + \sum_{h = 0} \theta_h \Delta x_{t-h} + \alpha \hat{u}_{t-1} + \varepsilon_t\] where \(\Delta y_t\) is the dependent variable (personal spending in this case); \(\phi_0\) is the intercept, constant or trend; the other \(\phi\)s are coefficients on lagged values of the dependent variable; the \(\theta\)s are coefficients on the current and lagged values of the independent variable (personal income here); and \(\alpha\) is the coefficient on \(\hat{u}_{t-1}\), the lagged deviation from the long-run equilibrium. For the system to return to equilibrium, \(\alpha\) must be negative.

Engle-Granger is a rather crude and simplistic method with some limitations.

urca and vars

There are two R packages that can be used for testing unit roots and cointegration.

  • urca has a number of functions for testing for unit roots. These include the Augmented Dickey-Fuller test. The urca package is on CRAN.

  • vars has functions for cointegration and other time series techniques. The vars package is on CRAN.

You can find out more about using these packages on the Bernhard Pfaff site. There is also a vignette that walks through the vars package.
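Both can be installed from CRAN in the usual way and then loaded:

install.packages(c("urca", "vars"))   # one-off installation
library(urca)
library(vars)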

summary(eq1 <- lm(log(da$PCE) ~ log(da$PI)))
## 
## Call:
## lm(formula = log(da$PCE) ~ log(da$PI))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30186 -0.01387  0.00054  0.01679  0.05979 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.3576418  0.0069563  -51.41   <2e-16 ***
## log(da$PI)   1.0131451  0.0008306 1219.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02866 on 780 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
## F-statistic: 1.488e+06 on 1 and 780 DF,  p-value: < 2.2e-16
plot(da$DATE[da$DATE <= as.Date('2020-01-01')],
     eq1$residuals[da$DATE <= as.Date('2020-01-01')],
     type = 'l', xlab = 'Date', ylab = 'Residuals')

Unit root tests

We can use the urca package for unit root tests. The documentation discusses a number of different options. The Augmented Dickey-Fuller (ADF) test is the most popular. It is based on testing the null that \(\rho\) is equal to one in the following equation:

\[y_t = \rho y_{t-1} + \varepsilon_t\] The usual test is then

\[\Delta y_t = (\rho - 1)y_{t-1} + \varepsilon_t\] or

\[\Delta y_t = \delta y_{t-1} + \varepsilon_t\]

where \(\delta\) is equal to \(\rho - 1\), so that if \(\delta = 0\), then \(\rho = 1\).
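To make the link between the equation and the test concrete, here is a minimal sketch of the basic Dickey-Fuller regression run by hand on a simulated pure random walk (the seed is arbitrary):

set.seed(1)
rw <- cumsum(rnorm(200))     # a pure random walk with no drift
drw <- diff(rw)              # the dependent variable, the first difference
lrw <- rw[-length(rw)]       # the regressor, the lagged level
summary(lm(drw ~ lrw - 1))   # the t-statistic on lrw is the DF statistic, to be
                             # compared with the non-standard critical values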

In practice, it is usual to add lags of the dependent variable \((y)\) to remove serial correlation and to test three forms:

  1. Test for a unit root (as above)

\[\Delta y_t = \delta y_{t-1} + \varepsilon_t\]

  2. Test for a unit root with a constant (mean does not equal zero)

\[\Delta y_t = \alpha_0 + \delta y_{t-1} + \varepsilon_t\]

  3. Test for a unit root with a constant and time trend

\[\Delta y_t = \alpha_0 + \alpha_1 t + \delta y_{t-1} + \varepsilon_t\]

The critical values are not standard and change for each version of the test. From looking at the residuals, it appears that we are dealing with no drift and no time trend.
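In the urca package, these three forms correspond to the type argument of ur.df. A sketch, using the drifting random walk y simulated earlier:

summary(ur.df(y, type = "none"))    # form 1: no constant, no trend
summary(ur.df(y, type = "drift"))   # form 2: with a constant
summary(ur.df(y, type = "trend"))   # form 3: with a constant and a time trend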

Test the residuals

require(urca)
summary(ur.df(eq1$residuals))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.252646 -0.003360  0.000370  0.003931  0.094149 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.09859    0.01770  -5.570 3.51e-08 ***
## z.diff.lag -0.25396    0.03464  -7.332 5.71e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01369 on 778 degrees of freedom
## Multiple R-squared:  0.1266, Adjusted R-squared:  0.1244 
## F-statistic:  56.4 on 2 and 778 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -5.57 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

We can reject the null of a unit root in the residuals. This suggests that there is cointegration between US Personal Income and Consumption.

The error correction model

The error correction model can be specified.

# Create the percentage change in income and consumption
da$DPI <- c(NA, da$PI[2:length(da$PI)] / da$PI[1:(length(da$PI) - 1)] - 1)
da$DPCE <- c(NA, da$PCE[2:length(da$PCE)] / da$PCE[1:(length(da$PCE) - 1)] - 1)
# Create the lagged long-run residual (the error-correction term)
da$LLRE <- c(NA, eq1$residuals[1:(length(eq1$residuals) - 1)])
# Create the lagged changes in income and consumption
da$LDPI <- c(NA, da$DPI[1:(length(da$DPI) - 1)])
da$LDPCE <- c(NA, da$DPCE[1:(length(da$DPCE) - 1)])
# Check the constructed variables
head(da)
##         DATE    PI   PCE         DPI         DPCE       LLRE        LDPI
## 1 1959-01-01 391.8 306.1          NA           NA         NA          NA
## 2 1959-02-01 393.7 309.6 0.004849413  0.011434172 0.03231622          NA
## 3 1959-03-01 396.5 312.7 0.007112014  0.010012920 0.03878423 0.004849413
## 4 1959-04-01 399.9 312.2 0.008575032 -0.001598977 0.04156735 0.007112014
## 5 1959-05-01 402.4 316.1 0.006251563  0.012491992 0.03131638 0.008575032
## 6 1959-06-01 404.8 318.2 0.005964215  0.006643467 0.03741697 0.006251563
##          LDPCE
## 1           NA
## 2           NA
## 3  0.011434172
## 4  0.010012920
## 5 -0.001598977
## 6  0.012491992
head(eq1$residuals)
##          1          2          3          4          5          6 
## 0.03231622 0.03878423 0.04156735 0.03131638 0.03741697 0.03801380
summary(eq2 <- lm(DPCE ~ DPI + LDPCE + LDPI + LLRE, data = da))
## 
## Call:
## lm(formula = DPCE ~ DPI + LDPCE + LDPI + LLRE, data = da)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.123371 -0.002630  0.000008  0.003118  0.058894 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0046064  0.0004308  10.692  < 2e-16 ***
## DPI          0.0389133  0.0267569   1.454   0.1463    
## LDPCE        0.0648587  0.0343478   1.888   0.0594 .  
## LDPI         0.0325459  0.0280296   1.161   0.2459    
## LLRE        -0.0839947  0.0104646  -8.027 3.71e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007892 on 775 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.097,  Adjusted R-squared:  0.09234 
## F-statistic: 20.81 on 4 and 775 DF,  p-value: 2.601e-16
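The coefficient on LLRE is about -0.084, implying that roughly 8 per cent of any deviation from the long-run equilibrium is removed each month. A small sketch of the implied half-life, assuming the deviation decays geometrically at rate \(1 + \alpha\):

alpha <- coef(eq2)["LLRE"]   # about -0.084
log(0.5) / log(1 + alpha)    # half-life of a deviation: roughly 8 months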

Cointegration and pairs

The error correction model pioneered by Engle and Granger can be used to identify pairs of securities that move together. For example, consider the spread between US 2-year and 10-year bond yields.

da1 <- read.csv('../../EC387/Pairs/Data/Yield.csv', skip = 11, stringsAsFactors = FALSE) 
da1[,1] <- as.Date(da1[, 1], format = "%Y-%m-%d")
da1[, 3] <- as.numeric(da1[, 3])
da1[, 2] <- as.numeric(da1[, 2])
colnames(da1) <- c('Date', 'BY10', 'BY2')
plot(da1$Date, da1$BY10, type = 'l',  main = "US bond yields", 
     xlab = "Date", ylab = 'Yield', ylim = c(0.0, 4.0))
lines(da1$Date, da1$BY2, col = 'red', lty = 2)
legend('topright', inset = 0.01, legend = c("10-year", "2-year"), 
       col = c('black', 'red'), lty = c(1,2))

The difference between the two can be seen here:

plot(da1$Date, da1$BY10 - da1$BY2, type = 'l',
     main = 'Spread between 10 and 2 year yield', xlab = 'Date', ylab = 'Yield')

summary(eq3 <- lm(da1$BY10 ~ da1$BY2 + seq(1:length(da1$BY10))))
## 
## Call:
## lm(formula = da1$BY10 ~ da1$BY2 + seq(1:length(da1$BY10)))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.81274 -0.22249 -0.05199  0.30128  0.79111 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              1.7319740  0.0349414   49.57   <2e-16 ***
## da1$BY2                  2.1947874  0.0544909   40.28   <2e-16 ***
## seq(1:length(da1$BY10)) -0.0003093  0.0000279  -11.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.356 on 1248 degrees of freedom
##   (53 observations deleted due to missingness)
## Multiple R-squared:  0.6363, Adjusted R-squared:  0.6357 
## F-statistic:  1092 on 2 and 1248 DF,  p-value: < 2.2e-16

The next step is to test whether these residuals are stationary.

summary(ur.df(eq3$residuals, 'none'))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.167615 -0.028723 -0.001475  0.027048  0.211414 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## z.lag.1    -0.006232   0.003681  -1.693   0.0907 .
## z.diff.lag -0.070625   0.028310  -2.495   0.0127 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04608 on 1247 degrees of freedom
## Multiple R-squared:  0.007808,   Adjusted R-squared:  0.006217 
## F-statistic: 4.907 on 2 and 1247 DF,  p-value: 0.00754
## 
## 
## Value of test-statistic is: -1.6929 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

It is not possible to reject the null of a unit root. This undermines the idea that there is a long-run relationship between the 10-year and 2-year yields.

Johansen test

The Engle-Granger method has several weaknesses: it has no systematic way of selecting the dependent variable; it relies on two-step estimation, which means that any error in the first step will carry over to the second; and it becomes increasingly difficult to apply when there are more than two variables (which is fine for pairs trading but less useful for many economic theories).

The Johansen technique is a more general process that can accommodate a wider variety of models (Johansen and Juselius 1990). It is based on the estimation of a Vector Autoregression (VAR) type model.

\[\boldsymbol{y_{t}} = \boldsymbol{A} \boldsymbol{y_{t-1}} + \boldsymbol{\varepsilon_t}\]

where \(\boldsymbol{y_t}\) is an \((n \times 1)\) vector of variables and \(\boldsymbol{A}\) is an \((n \times n)\) matrix of parameters to be estimated. Just as in the Dickey-Fuller case, we can re-arrange by subtracting the vector \(\boldsymbol{y_{t-1}}\) from each side to give:

\[\Delta \boldsymbol{y_t} = (\boldsymbol{A} - \boldsymbol{I}) \boldsymbol{y_{t-1}} + \boldsymbol{\varepsilon_t}\] where \(\boldsymbol{I}\) is the identity matrix. We then test the rank of the matrix \(\boldsymbol{A - I}\): a rank of zero means there is no cointegration, full rank means the variables are stationary in levels, and a rank of \(r\) between the two means there are \(r\) cointegrating vectors.

Example 1

This is a toy example that is inspired by QuantStart.

First, load the required packages and set the seed so that the random numbers can be reproduced.

library("quantmod")
library("tseries")
library("urca")
set.seed(123)

Now we will simulate a cointegrated system by creating a random walk process z as well as three variables (p, q and r) that share this common random walk.

z <- rep(0, 10000)
for (i in 2:10000) z[i] <- z[i-1] + rnorm(1)
# p, q and r each load on the common random walk z plus idiosyncratic noise
p <- 0.3*z + rnorm(10000)
q <- 0.6*z + rnorm(10000)
r <- 0.8*z + rnorm(10000)

Exercise 1.0

Test that p, q and r have a unit root.
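One way to start (a sketch using ur.df from the urca package):

summary(ur.df(p, type = "none"))   # expect: cannot reject the null of a unit root
summary(ur.df(q, type = "none"))
summary(ur.df(r, type = "none"))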

Plot

We can look at the data that have been created.

plot(p, type = 'l', main = 'Plot p, q, r',
     ylim = c(min(p, q, r), max(p, q, r)))
lines(q, col = 'red', lty = 1)
lines(r, col = 'blue', lty = 3)

Now use the ca.jo function to carry out the test.

jotest <- ca.jo(data.frame(p, q, r), type = "trace", K = 2, ecdet = "none", spec = "longrun")
summary(jotest)
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: trace statistic , with linear trend 
## 
## Eigenvalues (lambda):
## [1] 0.338903321 0.330365610 0.001431603
## 
## Values of teststatistic and critical values of test:
## 
##             test 10pct  5pct  1pct
## r <= 2 |   14.32  6.50  8.18 11.65
## r <= 1 | 4023.76 15.66 17.95 23.52
## r = 0  | 8161.48 28.71 31.52 37.22
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##           p.l2        q.l2     r.l2
## p.l2  1.000000  1.00000000 1.000000
## q.l2  1.791324 -0.52269002 1.941449
## r.l2 -1.717271  0.01589134 2.750312
## 
## Weights W:
## (This is the loading matrix)
## 
##           p.l2         q.l2          r.l2
## p.d -0.1381095 -0.771055116 -0.0003442470
## q.d -0.2615348  0.404161806 -0.0006863351
## r.d  0.2439540 -0.006556227 -0.0009068179

The trace test suggests that there are two cointegrating vectors. These cointegrating vectors can now be examined.

s <- 1.000*p + 1.791324*q - 1.717271*r
plot(s, type = "l")

This looks stationary, but we can test it formally.

summary(ur.df(s))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7540 -1.7870  0.0349  1.8312 11.1542 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -1.0253281  0.0143290  -71.56   <2e-16 ***
## z.diff.lag -0.0008041  0.0100023   -0.08    0.936    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.7 on 9996 degrees of freedom
## Multiple R-squared:  0.5131, Adjusted R-squared:  0.513 
## F-statistic:  5266 on 2 and 9996 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -71.5562 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

The null of a unit root is clearly rejected.

Exchange rate example

Here is a constructed exchange rate example.

First we create two random walks, for home prices and overseas prices, and make the exchange rate a function of the difference between the two, as if relative PPP holds and all the variables are in logs.

p <- rep(0, 1000)
for(i in 2:length(p)) p[i] = p[i-1] + rnorm(1)
ps <- rep(0, 1000)
for(i in 2:length(ps)) ps[i] = ps[i-1] + rnorm(1)
e <- p - ps + rnorm(length(p))
plot(p, type = 'l', main = 'Plot of p, ps and e', 
     ylim = c(min(p, ps, e), max(p, ps, e)))
lines(ps, lty = 2, col = 'red')
lines(e, lty = 3, col = 'blue')
legend('topleft', inset = 0.02, legend = 
         c('p (home price)', 'ps (overseas price)', 'e (nominal exchange rate)'), 
       lty = c(1, 2, 3), col = c('black', 'red', 'blue'), cex = 0.7)

Now check that these are unit-root variables (this does not strictly need to be done, as the Johansen test will identify it).

summary(ur.df(p))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5847 -0.6096  0.0591  0.7000  2.5789 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## z.lag.1    -0.000950   0.001788  -0.531    0.595
## z.diff.lag -0.001787   0.031720  -0.056    0.955
## 
## Residual standard error: 0.9903 on 996 degrees of freedom
## Multiple R-squared:  0.0002894,  Adjusted R-squared:  -0.001718 
## F-statistic: 0.1442 on 2 and 996 DF,  p-value: 0.8658
## 
## 
## Value of test-statistic is: -0.5313 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
summary(ur.df(ps))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2532 -0.7059 -0.0633  0.6384  3.3699 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## z.lag.1    -0.002093   0.002350  -0.890    0.373
## z.diff.lag -0.025246   0.031664  -0.797    0.425
## 
## Residual standard error: 1.016 on 996 degrees of freedom
## Multiple R-squared:  0.001498,   Adjusted R-squared:  -0.0005068 
## F-statistic: 0.7473 on 2 and 996 DF,  p-value: 0.4739
## 
## 
## Value of test-statistic is: -0.8904 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
summary(ur.df(e))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2782 -1.1290  0.0122  1.3393  5.6245 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.0008372  0.0020400  -0.410    0.682    
## z.diff.lag -0.1990108  0.0309833  -6.423 2.06e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.837 on 996 degrees of freedom
## Multiple R-squared:  0.04022,    Adjusted R-squared:  0.0383 
## F-statistic: 20.87 on 2 and 996 DF,  p-value: 1.32e-09
## 
## 
## Value of test-statistic is: -0.4104 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

Now carry out the Johansen test.

summary(ca.jo(data.frame(p, ps, e), type = 'trace', ecdet = 'none'))
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: trace statistic , with linear trend 
## 
## Eigenvalues (lambda):
## [1] 0.400291780 0.006913555 0.002912503
## 
## Values of teststatistic and critical values of test:
## 
##            test 10pct  5pct  1pct
## r <= 2 |   2.91  6.50  8.18 11.65
## r <= 1 |   9.83 15.66 17.95 23.52
## r = 0  | 520.12 28.71 31.52 37.22
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##             p.l2        ps.l2      e.l2
## p.l2   1.0000000  1.000000000 1.0000000
## ps.l2 -0.9996278 -2.295756188 0.8902483
## e.l2  -0.9995624  0.005164786 0.1162132
## 
## Weights W:
## (This is the loading matrix)
## 
##             p.l2        ps.l2         e.l2
## p.d  -0.05370175 -0.002114855 -0.003607123
## ps.d -0.12847242  0.004119232 -0.001868676
## e.d   1.21972053 -0.005577312 -0.001961220

These results indicate that there is one cointegrating vector (which is the case by design). The test statistic rejects the null hypothesis that the rank of the matrix is zero, but we cannot reject the null that \(r \le 1\).
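Rather than reading the eigenvectors off the printed summary, they can also be pulled from the returned object; a sketch (the V slot holds the cointegration relations shown above):

jo <- ca.jo(data.frame(p, ps, e), type = 'trace', ecdet = 'none')
jo@V[, 1]   # first cointegrating vector, approximately (1, -1, -1)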

The cointegrating vector is \(p - ps - e = 0\), or \(e = p - ps\). We can plot that relationship:

plot(p - ps - e, type = 'l', main = 'Cointegrating relationship')

This appears to be a stationary series:

summary(ur.df(p - ps - e, type = 'none'))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7428 -0.6406 -0.1167  0.5121  2.6150 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -1.14182    0.04489 -25.434  < 2e-16 ***
## z.diff.lag  0.11940    0.03136   3.808 0.000149 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8955 on 996 degrees of freedom
## Multiple R-squared:  0.5176, Adjusted R-squared:  0.5166 
## F-statistic: 534.4 on 2 and 996 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -25.4343 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

The test clearly rejects the null of a unit root.

The cajorls function from the urca package will return the OLS regression of the restricted VECM if the ca.jo object and the rank are provided.

nomexecc <- ca.jo(data.frame(e, p, ps), type = 'trace', spec = 'longrun')
summary(nomexecc)
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: trace statistic , with linear trend 
## 
## Eigenvalues (lambda):
## [1] 0.400291780 0.006913555 0.002912503
## 
## Values of teststatistic and critical values of test:
## 
##            test 10pct  5pct  1pct
## r <= 2 |   2.91  6.50  8.18 11.65
## r <= 1 |   9.83 15.66 17.95 23.52
## r = 0  | 520.12 28.71 31.52 37.22
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##            e.l2      p.l2    ps.l2
## e.l2   1.000000    1.0000 1.000000
## p.l2  -1.000438  193.6189 8.604875
## ps.l2  1.000065 -444.5017 7.660475
## 
## Weights W:
## (This is the loading matrix)
## 
##             e.l2          p.l2         ps.l2
## e.d  -1.21918673 -2.880562e-05 -0.0002279196
## p.d   0.05367825 -1.092277e-05 -0.0004191952
## ps.d  0.12841620  2.127495e-05 -0.0002171648
summary(cajools(nomexecc))
## Response e.d :
## 
## Call:
## lm(formula = e.d ~ constant + e.dl1 + p.dl1 + ps.dl1 + e.l2 + 
##     p.l2 + ps.l2 - 1, data = data.mat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9682 -1.0794 -0.0032  1.0545  5.2369 
## 
## Coefficients:
##          Estimate Std. Error t value Pr(>|t|)    
## constant  0.28322    0.12975   2.183   0.0293 *  
## e.dl1    -1.05523    0.05676 -18.590   <2e-16 ***
## p.dl1     1.03504    0.07436  13.919   <2e-16 ***
## ps.dl1   -0.97706    0.07365 -13.265   <2e-16 ***
## e.l2     -1.21944    0.08110 -15.036   <2e-16 ***
## p.l2      1.21218    0.08128  14.914   <2e-16 ***
## ps.l2    -1.20821    0.08144 -14.836   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.611 on 991 degrees of freedom
## Multiple R-squared:  0.2647, Adjusted R-squared:  0.2595 
## F-statistic: 50.98 on 7 and 991 DF,  p-value: < 2.2e-16
## 
## 
## Response p.d :
## 
## Call:
## lm(formula = p.d ~ constant + e.dl1 + p.dl1 + ps.dl1 + e.l2 + 
##     p.l2 + ps.l2 - 1, data = data.mat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7000 -0.6522  0.0229  0.6805  2.5547 
## 
## Coefficients:
##          Estimate Std. Error t value Pr(>|t|)  
## constant  0.13333    0.07970   1.673   0.0947 .
## e.dl1     0.03015    0.03487   0.865   0.3874  
## p.dl1    -0.03520    0.04568  -0.771   0.4411  
## ps.dl1    0.05586    0.04524   1.235   0.2172  
## e.l2      0.05325    0.04982   1.069   0.2854  
## p.l2     -0.05942    0.04993  -1.190   0.2342  
## ps.l2     0.05533    0.05002   1.106   0.2690  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9899 on 991 degrees of freedom
## Multiple R-squared:  0.006126,   Adjusted R-squared:  -0.0008944 
## F-statistic: 0.8726 on 7 and 991 DF,  p-value: 0.5276
## 
## 
## Response ps.d :
## 
## Call:
## lm(formula = ps.d ~ constant + e.dl1 + p.dl1 + ps.dl1 + e.l2 + 
##     p.l2 + ps.l2 - 1, data = data.mat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2395 -0.6488 -0.0155  0.6613  3.4827 
## 
## Coefficients:
##          Estimate Std. Error t value Pr(>|t|)  
## constant -0.17103    0.08149  -2.099   0.0361 *
## e.dl1     0.06106    0.03565   1.713   0.0871 .
## p.dl1    -0.05466    0.04671  -1.170   0.2421  
## ps.dl1    0.02157    0.04626   0.466   0.6411  
## e.l2      0.12822    0.05094   2.517   0.0120 *
## p.l2     -0.12622    0.05105  -2.473   0.0136 *
## ps.l2     0.11730    0.05115   2.293   0.0220 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.012 on 991 degrees of freedom
## Multiple R-squared:  0.01324,    Adjusted R-squared:  0.006272 
## F-statistic:   1.9 on 7 and 991 DF,  p-value: 0.06639
cajorls(nomexecc, r = 1)$rlm
## 
## Call:
## lm(formula = substitute(form1), data = data.mat)
## 
## Coefficients:
##           e.d       p.d       ps.d    
## ect1      -1.21919   0.05368   0.12842
## constant   0.03895   0.02023  -0.01434
## e.dl1     -1.05523   0.03039   0.06133
## p.dl1      1.04011  -0.03173  -0.05662
## ps.dl1    -0.98421   0.05429   0.02817
cajorls(nomexecc, r = 1)$beta
##            ect1
## e.l2   1.000000
## p.l2  -1.000438
## ps.l2  1.000065

Exercise 2.0

  • Compare the change in the exchange rate with the change forecast by the error correction model (a possible starting point is sketched after this list).

  • Use the PPP data (on My Studies) to assess Purchasing Power Parity for the Indian Rupee vs US Dollar Exchange rate.
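For the first part, a possible starting point (a sketch, assuming the nomexecc object and the series e from above; the column name "e.d" follows the naming in the output above):

fit <- cajorls(nomexecc, r = 1)$rlm   # the restricted VECM as an lm object
de_actual <- diff(e)[-1]              # actual change in e (one obs lost to the lag)
de_fitted <- fitted(fit)[, "e.d"]     # change in e fitted by the ECM
plot(de_actual, type = 'l', main = 'Actual and ECM-fitted change in e')
lines(de_fitted, col = 'red')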

Bibliography

Engle, Robert F., and Clive W. J. Granger. 1987. “Co-Integration and Error Correction: Representation, Estimation, and Testing.” Econometrica 55 (2): 251–76.
Johansen, Soren, and Katarina Juselius. 1990. “Maximum Likelihood Estimation and Inference on Cointegration - with Applications to the Demand for Money.” Oxford Bulletin of Economics and Statistics 52 (2): 169–210.