(a) Create a single time series plot with two log prices

library(Quandl)
Quandl.api_key("KmxULt3z1Vz1neVxGioB")

library("vars")
library("urca")
library("gdata")
library("stargazer")

allwti<-Quandl("FRED/MCOILWTICO", type="zoo")
allgas<-Quandl("FRED/GASREGCOVM", type="zoo")
oil<-window(allwti,start= 1995 + 0/12, end= 2017 + 2/12)
gas<-window(allgas,start=1995 + 0/12, end=2017 + 2/12)
loil<-log(oil)
lgas<-log(gas)
dloil<-diff(loil,1)
dlgas<-diff(lgas,1)
par(mfrow=c(2,1), cex=0.5, mar=c(2,2,3,1))
plot(loil, xlab="Year", ylim=c(-1,5), main="Log(oil) and Log(gas)")
lines(lgas, xlab="Year",  lty="dashed",col="red")
plot(dloil, xlab="Year", ylab="", main="dLog(oil) and dLog(gas)")
lines(dlgas, xlab="Year", ylab="", lty="dashed",col="red")

Since the data of US regular conventional Gas Price are until march 2017, we trim WTI either until march 2017.

We plot both series, and also plot their first differences.

(b) Perform unit root tests to determin whether log(gas) and log(oil) are I(0) or I(1)

Now we do unit root tests to test for stationarity.

summary(ur.ers(loil, type="P-test", lag.max=8, model="trend") )
## 
## ############################################### 
## # Elliot, Rothenberg and Stock Unit Root Test # 
## ############################################### 
## 
## Test of type P-test 
## detrending of series with intercept and trend 
## 
## Value of test-statistic is: 9.6438 
## 
## Critical values of P-test are:
##                 1pct 5pct 10pct
## critical values 3.96 5.62  6.89
summary(ur.ers(lgas, type="P-test", lag.max=8, model="trend") )
## 
## ############################################### 
## # Elliot, Rothenberg and Stock Unit Root Test # 
## ############################################### 
## 
## Test of type P-test 
## detrending of series with intercept and trend 
## 
## Value of test-statistic is: 8.3139 
## 
## Critical values of P-test are:
##                 1pct 5pct 10pct
## critical values 3.96 5.62  6.89
summary(ur.ers(dloil, type="P-test", lag.max=8, model="trend") )
## 
## ############################################### 
## # Elliot, Rothenberg and Stock Unit Root Test # 
## ############################################### 
## 
## Test of type P-test 
## detrending of series with intercept and trend 
## 
## Value of test-statistic is: 0.8329 
## 
## Critical values of P-test are:
##                 1pct 5pct 10pct
## critical values 3.96 5.62  6.89
summary(ur.ers(dlgas, type="P-test", lag.max=8, model="trend") )
## 
## ############################################### 
## # Elliot, Rothenberg and Stock Unit Root Test # 
## ############################################### 
## 
## Test of type P-test 
## detrending of series with intercept and trend 
## 
## Value of test-statistic is: 0.4757 
## 
## Critical values of P-test are:
##                 1pct 5pct 10pct
## critical values 3.96 5.62  6.89

From the summary statistics, we can see that the log(gas) and log(oil) are not stationary and have a unit root. The first differences of both do not reject the null hypothesis, so the first differences are consistent with being stationary.

(c) Determin the number of lags to include in cointegration analysis using Schwarz information criterion. Run the Johansen’s trace and maximum eigenvalue cointegration tests for (log(gas), log(oil)) using the sample from January 1995 to December 2010. Use time series plots from (a) as a guide to determine the specification of the deterministic components in the cointegration test (i.e. whether to use Case 2, Case 3, of Case 4 cointegration test).

noil<-window(allwti,start= 1995 + 0/12, end= 2010 + 11/12)
ngas<-window(allgas,start=1995 + 0/12, end=2010 + 11/12)
lnoil<-log(noil)
lngas<-log(ngas)
y<-cbind(lnoil, lngas)
colnames(y) <- c("lnoil","lngas")
y <- na.trim(y)
VAR.SC <- VARselect(y, lag.max=10, type="const")
nlags <- VAR.SC$selection["SC(n)"]
nlags
## SC(n) 
##     2

According to SC, two lags are determined.

y.CA <- ca.jo(y, ecdet="const", type="trace", K=nlags, spec="transitory")
summary(y.CA)
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: trace statistic , without linear trend and constant in cointegration 
## 
## Eigenvalues (lambda):
## [1] 2.219532e-01 1.088453e-02 2.499666e-18
## 
## Values of teststatistic and critical values of test:
## 
##           test 10pct  5pct  1pct
## r <= 1 |  2.08  7.52  9.24 12.97
## r = 0  | 49.76 17.85 19.96 24.60
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##           lnoil.l1   lngas.l1   constant
## lnoil.l1  1.000000  1.0000000  1.0000000
## lngas.l1 -1.597557 -0.3523755 -0.5200684
## constant -2.750321 -3.8984030 -2.9333378
## 
## Weights W:
## (This is the loading matrix)
## 
##           lnoil.l1     lngas.l1      constant
## lnoil.d 0.04302644 -0.012575530 -1.133703e-16
## lngas.d 0.24104491 -0.005055413 -7.689967e-16
y.CA <- ca.jo(y, ecdet="const", type="eigen", K=nlags, spec="transitory")
summary(y.CA)
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration 
## 
## Eigenvalues (lambda):
## [1] 2.219532e-01 1.088453e-02 2.499666e-18
## 
## Values of teststatistic and critical values of test:
## 
##           test 10pct  5pct  1pct
## r <= 1 |  2.08  7.52  9.24 12.97
## r = 0  | 47.68 13.75 15.67 20.20
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##           lnoil.l1   lngas.l1   constant
## lnoil.l1  1.000000  1.0000000  1.0000000
## lngas.l1 -1.597557 -0.3523755 -0.5200684
## constant -2.750321 -3.8984030 -2.9333378
## 
## Weights W:
## (This is the loading matrix)
## 
##           lnoil.l1     lngas.l1      constant
## lnoil.d 0.04302644 -0.012575530 -1.133703e-16
## lngas.d 0.24104491 -0.005055413 -7.689967e-16

The value of the trace test statistic 49.76 is greater than the critical value for H0=0. Looking at the eigen test, the test statistic 47.68 is greater than the critical value for h0=0. According to both tests, they suggest that the data of Gas and Oil are in fact cointegrated.

lttest(y.CA, r=1)
## LR-test for no linear trend
## 
## H0: H*2(r<=1)
## H1: H2(r<=1)
## 
## Test statistic is distributed as chi-square
## with 1 degress of freedom
##         test statistic p-value
## LR test           1.13    0.29

The results of lttest indicates that the cointegration relationship is Case 2.

When we use the time series plots from (a), we can assume that it looks like Case 4 because of the existence of the similar trend and significant gap between two time series.

(d) Estate a bivariate VEC model

Since there is no cointegrating relationship, we estimate an unrestricted bivariate VEC model.

The aducstment parameters are 0.04303 and 0.24104.

unres.vec <- cajorls(y.CA, r=1)
unres.vec
## $rlm
## 
## Call:
## lm(formula = substitute(form1), data = data.mat)
## 
## Coefficients:
##            lnoil.d   lngas.d 
## ect1        0.04303   0.24104
## lnoil.dl1   0.24114   0.13740
## lngas.dl1  -0.01889   0.31063
## 
## 
## $beta
##               ect1
## lnoil.l1  1.000000
## lngas.l1 -1.597557
## constant -2.750321

(e) Are the adjustment parameters a1 and a2 in the estimated VEC model statistically significant? Are their signs consistent with error correction mechanism that moves the system back to the long run equilibrium, whenever there is a disruption and z_(t-1)=0?

summary(unres.vec$rlm)
## Response lnoil.d :
## 
## Call:
## lm(formula = lnoil.d ~ ect1 + lnoil.dl1 + lngas.dl1 - 1, data = data.mat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27369 -0.04779  0.01256  0.06549  0.22540 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)  
## ect1       0.04303    0.06828   0.630    0.529  
## lnoil.dl1  0.24114    0.10675   2.259    0.025 *
## lngas.dl1 -0.01889    0.13564  -0.139    0.889  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08402 on 187 degrees of freedom
## Multiple R-squared:  0.06273,    Adjusted R-squared:  0.0477 
## F-statistic: 4.172 on 3 and 187 DF,  p-value: 0.006895
## 
## 
## Response lngas.d :
## 
## Call:
## lm(formula = lngas.d ~ ect1 + lnoil.dl1 + lngas.dl1 - 1, data = data.mat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21201 -0.02562  0.00405  0.03528  0.14106 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## ect1       0.24104    0.04286   5.624 6.72e-08 ***
## lnoil.dl1  0.13740    0.06701   2.050 0.041717 *  
## lngas.dl1  0.31063    0.08514   3.648 0.000342 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05274 on 187 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3491 
## F-statistic: 34.97 on 3 and 187 DF,  p-value: < 2.2e-16

a2 is statistically significant at 0.001 level but a1 is not. Since a1 and a2 is bigger than 0, the signs indicate that there is no the long run equilibrium.

(f) Reestimate the VEC model with a restriction a2= 0.

rest.mat <- matrix(c(1,0), c(2,1))
summary(alrtest(y.CA, A=rest.mat, r=1) )
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Estimation and testing under linear restrictions on beta 
## 
## The VECM has been estimated subject to: 
## beta=H*phi and/or alpha=A*psi
## 
##      [,1]
## [1,]    1
## [2,]    0
## 
## Eigenvalues of restricted VAR (lambda):
## [1] 0.0963 0.0000 0.0000
## 
## The value of the likelihood ratio test statistic:
## 28.46 distributed as chi square with 1 df.
## The p-value of the test statistic is: 0 
## 
## Eigenvectors, normalised to first column
## of the restricted VAR:
## 
##                [,1]
## RK.lnoil.l1  1.0000
## RK.lngas.l1 -1.5603
## RK.constant -2.7847
## 
## Weights W of the restricted VAR:
## 
##         [,1]
## [1,] -0.2321
## [2,]  0.0000

(g) What is the intuition for imposing the restriction in (f)? Hint: Think about how under this restriction log(gas) and log(oil) will change over time in response to a disruption such that z_(t-1)>0?.

For the long un equilibrium, we need to have at least one zero for a1 or a2. When a2 is zero but a1 is not, we can assume that gas price moves randomly and oil price follows it.

(h) Create and plot sequence of one month ahead forecasts for the period 2011M1-2017M4.Obtain the RMSE for the forecast of the gas price.

library(forecast)

yall <- cbind(loil,lgas)
firstM <- 1995
lastM <- 2010 + (11/12)

yall.p1 <- window(yall, end=lastM)
yall.p2 <- window(yall, start=lastM+1/12)

y.VAR.f1 <-data.frame()
y.VAR.f2 <-data.frame()

for(i in 1:length(yall.p2[,2]))
{
  y <- window(yall, end = lastM + (i-1)/12)
  y.CA <- ca.jo(y, ecdet="const", type="eigen", K=2, spec="transitory")
  y.VAR <- vec2var(y.CA, r=1)
  y.VAR.updt <- predict(y.VAR, n.ahead=1)
  y.VAR.f1 <-rbind(y.VAR.f1, as.ts(y.VAR.updt$fcst$loil))
  y.VAR.f2 <-rbind(y.VAR.f2, as.ts(y.VAR.updt$fcst$lgas))
  
} 

y.VAR.f1 <- ts(y.VAR.f1, start=2011, frequency=12)
y.VAR.f2 <- ts(y.VAR.f2, start=2011, frequency=12)


lgas <-as.ts(yall[,2])
lgas.p1 <-as.ts(yall.p1[,2])
lgas.p2 <-as.ts(yall.p2[,2])


plot(lgas, type="o",pch=16, lty="solid", main="One month ahead Forecasts - lgas", col="red", xlim=c(2010, 2017))
lines(y.VAR.f2[,1], type="o", pch=16, lty="dashed")

accuracy(lgas.p2, y.VAR.f2)
##                   ME       RMSE        MAE      MPE     MAPE      ACF1
## Test set -0.01908698 0.05083083 0.03776906 -2.18051 4.138812 0.1506658
##          Theil's U
## Test set        NA

Therefore, RMSE is 0.05083083.

(i) Create a correlogram for dl(gas). Use it to identify a suitable AR or MA or ARMA model. Estimate this AR/MA/ARMA model.

According to the results of ACF and PACF, we will use ARMA(2,6) for forecasts.

par(mfrow=c(2,1), cex=1, mar=c(3,4,3,3))
Acf(dlgas, type="correlation", lag=48, main="ACF - dlgas")
Acf(dlgas, type="partial", lag=48, main="PACF - dlgas")

arma_dlgas <- Arima(dlgas, order=c(2, 0, 6))
tsdiag(arma_dlgas, gof.lag=24)

(j) Create and plot sequence of one month ahead forecasts for the period 2011M1-2017M4 using the model from (i).Obtain the RMSE for the forecast of the gas price.

library(forecast)
dlgas2 <- window(dlgas, start=2011 + 0/12)
fcst_dlgas <-zoo()
lastM2 <- 2010+11/12
for(i in 1: length(dlgas2))
{
  y <- window(dlgas, end = lastM2+(i-1)/12)
  fcst_dlgas.updt <- arima(y, order=c(2,0,6))
  fcst_dlgas <- c(fcst_dlgas, forecast(fcst_dlgas.updt, 1)$mean)
}  
fcst_dlgas <-as.ts(fcst_dlgas)

plot(fcst_dlgas, type="o", pch=16, xlim=c(2010, 2017), ylim=c(-0.41, 0.2), main="One month ahead Forecasts - dlgas", lty="solid")
lines(dlgas, type="l", pch=16, lty="dashed", col = "blue")
lines(dlgas2, type="o", pch=16, lty="dashed", col = "red")

accuracy(fcst_dlgas, dlgas2)
##                    ME       RMSE        MAE MPE MAPE
## Test set -0.009691859 0.05387919 0.04103562 Inf  Inf

Therefore, RMSE is 0.05387919.

(k) Create a correlogram for dl(oil). Use it to identify a suitable AR or MA or ARMA model. Estimate this AR/MA/ARMA model.

The former one’s RMSE is 0.05083083 and the later one’s RMSE is 0.05387919. Therefore, the former one has better RMSE.

(l) Based on your results in (k), is there any advantage to using the VEC model to forecast the price of the gas, instead of using a simple univariate AR/MA/ARMA model?

Yes, there is. In VEC model, there is the error correction term and it reduces the deviation. If there are cointegrated variables, VEC model provides better results, smaller RMSE.