Time Series 2024 Project Topic 1

Forecasting financial instruments prices with VECM and ARIMA models
Author

Anindita Basu & Vikram Bahadur

Published

September 5, 2024

1 Importing the data

First, let’s load necessary libraries:

library(xts)
library(lmtest)
library(tidyverse)
library(vars)
library(quantmod)
library(tseries)
library(forecast)

Let’s also load the additional function:

source("../code/functions/testdf.R")
source("../code/functions/utils.R")
source("../code/functions/testdfModified.R")

We will work with data containing the prices of ten financial instruments: y1, y2, y3, y4, y5, y6, y7, y8, y9, y10.

Now, we have to import the data:

# Start date 01/06/21
# End date 27/03/22
# total dataset 300 rows
# Training set from 01/06/21 to 07/03/22, 280 rows 
# Forecasting set from 08/03/22 to 27/03/22, 20 rows

TS2 <- read.csv("../data/TSA_2024_project_data_1.csv")

The structure of the data:

TS2 %>% glimpse()
Rows: 300
Columns: 11
$ date <chr> "2021-06-01", "2021-06-02", "2021-06-03", "2021-06-04", "2021-06-…
$ y1   <dbl> 176.04, 173.37, 173.42, 178.85, 175.39, 173.89, 167.80, 162.47, 1…
$ y2   <dbl> 173.57, 173.57, 173.57, 174.52, 174.46, 174.54, 174.57, 175.37, 1…
$ y3   <dbl> 173.57, 173.57, 175.70, 177.89, 181.10, 184.44, 187.00, 188.86, 1…
$ y4   <dbl> 188.66, 188.46, 188.57, 188.78, 188.11, 187.61, 188.22, 188.68, 1…
$ y5   <dbl> 183.66, 183.52, 183.75, 182.79, 185.46, 190.18, 196.22, 201.14, 2…
$ y6   <dbl> 178.78, 181.10, 187.25, 190.82, 187.30, 195.26, 196.95, 207.64, 2…
$ y7   <dbl> 168.46, 168.41, 168.47, 170.08, 170.15, 170.33, 170.32, 171.42, 1…
$ y8   <dbl> 173.57, 173.57, 173.57, 173.57, 172.94, 171.78, 173.37, 174.22, 1…
$ y9   <dbl> 173.57, 173.57, 173.57, 173.18, 174.50, 176.99, 179.91, 182.35, 1…
$ y10  <dbl> 188.35, 188.37, 189.36, 190.46, 192.73, 194.14, 195.44, 196.21, 1…
head(TS2)
date y1 y2 y3 y4 y5 y6 y7 y8 y9 y10
2021-06-01 176.04 173.57 173.57 188.66 183.66 178.78 168.46 173.57 173.57 188.35
2021-06-02 173.37 173.57 173.57 188.46 183.52 181.10 168.41 173.57 173.57 188.37
2021-06-03 173.42 173.57 175.70 188.57 183.75 187.25 168.47 173.57 173.57 189.36
2021-06-04 178.85 174.52 177.89 188.78 182.79 190.82 170.08 173.57 173.18 190.46
2021-06-05 175.39 174.46 181.10 188.11 185.46 187.30 170.15 172.94 174.50 192.73
2021-06-06 173.89 174.54 184.44 187.61 190.18 195.26 170.33 171.78 176.99 194.14
tail(TS2)
date y1 y2 y3 y4 y5 y6 y7 y8 y9 y10
295 2022-03-22 268.68 196.26 289.20 174.51 36.93 97.94 204.83 145.53 100.31 246.30
296 2022-03-23 266.55 196.34 288.83 174.58 36.87 87.23 205.03 145.76 100.22 245.85
297 2022-03-24 266.38 196.25 287.98 174.18 35.33 78.05 205.08 144.81 99.39 245.64
298 2022-03-25 272.44 197.49 287.54 174.47 34.93 85.77 206.84 144.90 99.22 245.77
299 2022-03-26 267.64 198.03 286.01 173.46 35.01 95.43 208.08 143.75 99.22 244.87
300 2022-03-27 264.95 198.61 285.43 174.40 33.51 95.25 208.76 145.46 98.49 244.08

We have to correct the type of the date variable:

TS2$date <- as.Date(TS2$date, format = "%Y-%m-%d")
TS2 %>% glimpse()
Rows: 300
Columns: 11
$ date <date> 2021-06-01, 2021-06-02, 2021-06-03, 2021-06-04, 2021-06-05, 2021…
$ y1   <dbl> 176.04, 173.37, 173.42, 178.85, 175.39, 173.89, 167.80, 162.47, 1…
$ y2   <dbl> 173.57, 173.57, 173.57, 174.52, 174.46, 174.54, 174.57, 175.37, 1…
$ y3   <dbl> 173.57, 173.57, 175.70, 177.89, 181.10, 184.44, 187.00, 188.86, 1…
$ y4   <dbl> 188.66, 188.46, 188.57, 188.78, 188.11, 187.61, 188.22, 188.68, 1…
$ y5   <dbl> 183.66, 183.52, 183.75, 182.79, 185.46, 190.18, 196.22, 201.14, 2…
$ y6   <dbl> 178.78, 181.10, 187.25, 190.82, 187.30, 195.26, 196.95, 207.64, 2…
$ y7   <dbl> 168.46, 168.41, 168.47, 170.08, 170.15, 170.33, 170.32, 171.42, 1…
$ y8   <dbl> 173.57, 173.57, 173.57, 173.57, 172.94, 171.78, 173.37, 174.22, 1…
$ y9   <dbl> 173.57, 173.57, 173.57, 173.18, 174.50, 176.99, 179.91, 182.35, 1…
$ y10  <dbl> 188.35, 188.37, 189.36, 190.46, 192.73, 194.14, 195.44, 196.21, 1…

Let’s also transform the data.frame into an xts object

TS2 <- xts(TS2[, -1], order.by = TS2$date)
TS280 <- TS2[index(TS2) < as.Date("2022-03-08"), ]

We will work with all 10 variables y1, y2, ..., y10. Let's create their first differences:

TS280$dy1 <- diff.xts(TS280$y1)
TS280$dy2 <- diff.xts(TS280$y2)
TS280$dy3 <- diff.xts(TS280$y3)
TS280$dy4 <- diff.xts(TS280$y4)
TS280$dy5 <- diff.xts(TS280$y5)
TS280$dy6 <- diff.xts(TS280$y6)
TS280$dy7 <- diff.xts(TS280$y7)
TS280$dy8 <- diff.xts(TS280$y8)
TS280$dy9 <- diff.xts(TS280$y9)
TS280$dy10 <- diff.xts(TS280$y10)
head(TS280)
               y1     y2     y3     y4     y5     y6     y7     y8     y9
2021-06-01 176.04 173.57 173.57 188.66 183.66 178.78 168.46 173.57 173.57
2021-06-02 173.37 173.57 173.57 188.46 183.52 181.10 168.41 173.57 173.57
2021-06-03 173.42 173.57 175.70 188.57 183.75 187.25 168.47 173.57 173.57
2021-06-04 178.85 174.52 177.89 188.78 182.79 190.82 170.08 173.57 173.18
2021-06-05 175.39 174.46 181.10 188.11 185.46 187.30 170.15 172.94 174.50
2021-06-06 173.89 174.54 184.44 187.61 190.18 195.26 170.33 171.78 176.99
              y10   dy1   dy2  dy3   dy4   dy5   dy6   dy7   dy8   dy9 dy10
2021-06-01 188.35    NA    NA   NA    NA    NA    NA    NA    NA    NA   NA
2021-06-02 188.37 -2.67  0.00 0.00 -0.20 -0.14  2.32 -0.05  0.00  0.00 0.02
2021-06-03 189.36  0.05  0.00 2.13  0.11  0.23  6.15  0.06  0.00  0.00 0.99
2021-06-04 190.46  5.43  0.95 2.19  0.21 -0.96  3.57  1.61  0.00 -0.39 1.10
2021-06-05 192.73 -3.46 -0.06 3.21 -0.67  2.67 -3.52  0.07 -0.63  1.32 2.27
2021-06-06 194.14 -1.50  0.08 3.34 -0.50  4.72  7.96  0.18 -1.16  2.49 1.41

Next, we plot all ten price series on one graph:

plot(TS280[, 1:10],
     col = c("black", "blue", "green", "yellow", "brown", "red", "orange", "khaki", "skyblue", "purple"),
     major.ticks = "years", 
     grid.ticks.on = "years",
     grid.ticks.lty = 3,
     main = "main",
     legend.loc = "topleft")

Visualization of all price variables.

2 Finding the cointegrating pair

2.1 Testing integration order

In this step, we perform the ADF test on each variable and on its first difference. For every variable, the test fails to reject the null of non-stationarity in levels, while it rejects it for the first differences, so each series appears to be integrated of order one, I(1).
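These per-variable tables are produced with the sourced testdf() helper, whose code is not echoed here. A rough, hedged stand-in using tseries and lmtest (the function choices and the simple Dickey-Fuller regression below are illustrative assumptions, not the helper's actual code) could look like this:

# Illustrative stand-in for the sourced testdf() helper:
# ADF tests on the level and the first difference of y1, plus a
# Breusch-Godfrey test on the residuals of a plain Dickey-Fuller regression.
adf.test(as.numeric(TS280$y1), k = 0)             # levels, no augmentations
adf.test(as.numeric(na.omit(TS280$dy1)), k = 0)   # first differences

d <- as.data.frame(TS280)
df_reg <- lm(diff(d$y1) ~ head(d$y1, -1))         # Delta y1_t regressed on y1_{t-1}
bgtest(df_reg, order = 1)                         # autocorrelation of DF-regression residuals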

  • Let’s apply the ADF test for the y1 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -1.612473 0.4564815 4.5254137 0.0333950
1 -1.810867 0.3826206 0.0010027 0.9747394
2 -1.788709 0.3908699 0.0013712 0.9704610
3 -1.543457 0.4821758 0.0416617 0.8382660

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -14.690558 0.01 0.0027080 0.9584978
1 -11.220359 0.01 0.0044057 0.9470789
2 -10.819399 0.01 0.0392761 0.8429027
3 -8.547158 0.01 0.0001623 0.9898360

  • Let’s apply the ADF test for the y2 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 0.8659506 0.9900000 119.0880563 0.0000000
1 -0.5511737 0.8515967 11.5634066 0.0006726
2 -1.1720047 0.6204651 0.0238451 0.8772798
3 -1.1295237 0.6362805 0.0010299 0.9743985

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -7.562637 0.01 11.1625768 0.0008346
1 -4.988917 0.01 0.0410216 0.8394962
2 -4.956121 0.01 0.0007462 0.9782074
3 -4.773155 0.01 0.0000002 0.9996025

  • Let’s apply the ADF test for the y3 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -0.2015725 0.9304983 209.3657823 0.0000000
1 -1.9384662 0.3351162 9.9240848 0.0016313
2 -1.3632683 0.5492588 0.0017696 0.9664456
3 -1.3354933 0.5595993 0.0035390 0.9525620

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -4.123160 0.01 10.8047950 0.0010124
1 -5.094278 0.01 0.0075211 0.9308907
2 -4.959386 0.01 0.0046342 0.9457261
3 -4.144990 0.01 0.0120286 0.9126670

  • Let’s apply the ADF test for the y4 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -1.447157 0.5180277 7.6607276 0.0056436
1 -1.474528 0.5078375 0.0021985 0.9626025
2 -1.503331 0.4971142 0.0005155 0.9818865
3 -1.551921 0.4790246 0.0295377 0.8635434

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -14.021324 0.01 0.0015227 0.9688733
1 -10.510680 0.01 0.0002192 0.9881867
2 -8.495650 0.01 0.0257672 0.8724702
3 -5.710124 0.01 0.4717194 0.4921972

  • Let’s apply the ADF test for the y5 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -0.4634981 0.8842378 204.6114803 0.0000000
1 -2.0778473 0.2832255 27.7168181 0.0000001
2 -1.3675042 0.5476818 1.1042529 0.2933347
3 -1.6141238 0.4558669 0.0454257 0.8312232

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -4.568147 0.01 29.1327008 0.0000001
1 -6.551754 0.01 1.0459685 0.3064374
2 -5.150499 0.01 0.0568243 0.8115875
3 -5.215452 0.01 0.0005999 0.9804591

  • Let’s apply the ADF test for the y6 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -1.541854 0.4827726 0.3768067 0.5393172
1 -1.381901 0.5423220 0.0000028 0.9986577
2 -1.368216 0.5474167 0.0008816 0.9763125
3 -1.347570 0.5551033 0.0001197 0.9912712

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -17.535836 0.01 0.0001988 0.9887502
1 -11.932475 0.01 0.0009958 0.9748254
2 -9.689908 0.01 0.0002580 0.9871838
3 -8.193554 0.01 0.0006183 0.9801619

  • Let’s apply the ADF test for the y7 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 0.8268819 0.9900000 114.3208399 0.0000000
1 -0.5611149 0.8478956 12.2393024 0.0004679
2 -1.2165169 0.6038935 0.0135719 0.9072575
3 -1.1935022 0.6124617 0.0005247 0.9817255

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -7.750137 0.01 11.8212132 0.0005856
1 -5.008263 0.01 0.0289482 0.8648986
2 -4.937986 0.01 0.0003131 0.9858821
3 -4.811167 0.01 0.0001403 0.9905510

  • Let’s apply the ADF test for the y8 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -1.431945 0.5236908 13.4724760 0.0002421
1 -1.488012 0.5028175 0.0002090 0.9884658
2 -1.491126 0.5016582 0.0000157 0.9968406
3 -1.534106 0.4856570 0.1528621 0.6958151

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -13.270918 0.01 0.0005735 0.9808937
1 -10.396324 0.01 0.0000715 0.9932540
2 -8.099815 0.01 0.1402874 0.7079955
3 -5.386180 0.01 0.9022017 0.3421920

  • Let’s apply the ADF test for the y9 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -0.4605766 0.8853255 205.2272260 0.0000000
1 -2.0872618 0.2797205 29.9385175 0.0000000
2 -1.3408348 0.5576107 1.3903570 0.2383448
3 -1.6236822 0.4523083 0.0413206 0.8389203

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -4.541503 0.01 31.3781304 0.0000000
1 -6.622375 0.01 1.3276430 0.2492250
2 -5.115429 0.01 0.0532809 0.8174495
3 -5.214835 0.01 0.0006054 0.9803698

  • Let’s apply the ADF test for the y10 variable and its first differences.
Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -0.2448094 0.9244931 159.3485111 0.0000000
1 -1.3686879 0.5472412 6.6095975 0.0101431
2 -1.5941724 0.4632947 0.2248960 0.6353345
3 -1.2816226 0.5796550 0.0401467 0.8411941

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -5.967757 0.01 6.0503095 0.0139039
1 -4.507198 0.01 0.2194413 0.6394659
2 -5.115771 0.01 0.0321375 0.8577262
3 -4.457715 0.01 0.0015244 0.9688561

2.2 Finding cointegrated pair

We follow Engle and Granger's two-step method (all series were found to be integrated of the same order in the previous step):

  • Step 1: estimate the cointegrating vector with a standard OLS regression \(y_t = \beta_0 + \beta_1 x_t + u_t\).
  • Step 2: test whether the residuals \(u_t\) obtained in the first step are stationary.

The procedure is applied to every pair of series; a minimal sketch of the idea is given below, followed by the results for all 45 pairs.
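The pairwise search itself runs in a hidden chunk. A minimal sketch, assuming tseries::adf.test() as the residual test and with the loop, object names and the 0.05 cut-off chosen purely for illustration:

# Engle-Granger steps 1 and 2 for every pair (i, j): regress y_i on y_j by OLS,
# then run an ADF test on the residuals of that regression.
eg_results <- data.frame()
var_names <- paste0("y", 1:10)
for (i in 1:9) {
  for (j in (i + 1):10) {
    ols <- lm(as.numeric(TS280[, var_names[i]]) ~ as.numeric(TS280[, var_names[j]]))
    p_adf <- adf.test(residuals(ols), k = 0)$p.value
    eg_results <- rbind(eg_results,
                        data.frame(pair = paste0("(", var_names[i], ", ", var_names[j], ")"),
                                   p_adf = p_adf,
                                   cointegrated = p_adf < 0.05))
  }
}
eg_results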

pair p_adf result cointegration_vector
(y1, y2) 0.33939 Not cointegrated
(y1, y3) 0.26743 Not cointegrated
(y1, y4) 0.06483 Not cointegrated
(y1, y5) 0.10939 Not cointegrated
(y1, y6) 0.20572 Not cointegrated
(y1, y7) 0.33993 Not cointegrated
(y1, y8) 0.12226 Not cointegrated
(y1, y9) 0.10933 Not cointegrated
(y1, y10) 0.26898 Not cointegrated
(y2, y3) 0.72717 Not cointegrated
(y2, y4) 0.94807 Not cointegrated
(y2, y5) 0.97572 Not cointegrated
(y2, y6) 0.37713 Not cointegrated
(y2, y7) 0.01000 Cointegrated [1, 68.1851311723423, 0.625236077108806]
(y2, y8) 0.95193 Not cointegrated
(y2, y9) 0.97568 Not cointegrated
(y2, y10) 0.72547 Not cointegrated
(y3, y4) 0.92518 Not cointegrated
(y3, y5) 0.63933 Not cointegrated
(y3, y6) 0.04209 Cointegrated [1, 458.357113699047, -1.09976516009988]
(y3, y7) 0.24327 Not cointegrated
(y3, y8) 0.93852 Not cointegrated
(y3, y9) 0.63780 Not cointegrated
(y3, y10) 0.01000 Cointegrated [1, -203.015788110552, 1.99754785215493]
(y4, y5) 0.37860 Not cointegrated
(y4, y6) 0.35362 Not cointegrated
(y4, y7) 0.37412 Not cointegrated
(y4, y8) 0.01000 Cointegrated [1, 101.667786218057, 0.500832238386179]
(y4, y9) 0.37904 Not cointegrated
(y4, y10) 0.68387 Not cointegrated
(y5, y6) 0.36053 Not cointegrated
(y5, y7) 0.71763 Not cointegrated
(y5, y8) 0.58779 Not cointegrated
(y5, y9) 0.01000 Cointegrated [1, -163.627642701986, 2.00036509575433]
(y5, y10) 0.60186 Not cointegrated
(y6, y7) 0.09845 Not cointegrated
(y6, y8) 0.33174 Not cointegrated
(y6, y9) 0.19019 Not cointegrated
(y6, y10) 0.02405 Cointegrated [1, 374.824324478322, -0.789528588779833]
(y7, y8) 0.94874 Not cointegrated
(y7, y9) 0.97387 Not cointegrated
(y7, y10) 0.71908 Not cointegrated
(y8, y9) 0.40423 Not cointegrated
(y8, y10) 0.72315 Not cointegrated
(y9, y10) 0.60126 Not cointegrated

The list of cointegrating pairs we found is:

pair p_adf aic bic
(y2, y7) 0.01000 -375.6940 -364.78962
(y3, y6) 0.04209 2752.2384 2763.14272
(y3, y10) 0.01000 288.0713 298.97563
(y4, y8) 0.01000 -110.0208 -99.11645
(y5, y9) 0.01000 -461.1635 -450.25918
(y6, y10) 0.02405 2465.8013 2476.70562

We selected the pair (y4, y8) to move forward.

2.3. Let’s examine the selected model summary:

model.coint <- lm(y4 ~ y8, data = TS280)
summary(model.coint)

Call:
lm(formula = y4 ~ y8, data = TS280)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47565 -0.14376  0.00293  0.13099  0.50487 

Coefficients:
               Estimate  Std. Error t value Pr(>|t|)    
(Intercept) 101.6677862   0.1159222   877.0   <2e-16 ***
y8            0.5008322   0.0007632   656.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1974 on 278 degrees of freedom
Multiple R-squared:  0.9994,    Adjusted R-squared:  0.9994 
F-statistic: 4.306e+05 on 1 and 278 DF,  p-value: < 2.2e-16

We can observe that y8 is highly significant in the OLS regression of the first Engle-Granger step: if y8 increases by 1 unit, y4 increases by about 0.5008 units in the long run.
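In equation form, the estimated long-run relation from this first step is:

\[ \widehat{y4}_t = 101.6678 + 0.5008 \, y8_t \]

with the deviation from this relation given by the residual \(\hat{u}_t\).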

Next, we have to test the stationarity of the residuals. The residuals of the y4-on-y8 model turn out to be stationary, as the null of non-stationarity is rejected:

Stationarity Test Data
augmentations adf p_adf bgodfrey p_bg
0 -14.78343 0.01 0.0045200 0.9463977
1 -11.29182 0.01 0.0053064 0.9419294
2 -10.88541 0.01 0.0152850 0.9016062
3 -8.64154 0.01 0.0054240 0.9412908

The ADF test with no augmentations can be used (the BG test shows no residual autocorrelation); its result is that non-stationarity of the residuals is STRONGLY REJECTED, so the residuals are stationary, which means that y4 and y8 are cointegrated.

3 Estimating VECM

3.1 Johansen cointegration test

We have already performed a univariate cointegration test of Engle and Granger. Our conclusions are that y4 and y8 variables are cointegrated of order ~CI(1,1).

As an alternative, we will now perform a multivariate test of Johansen. We will assume the K=6 lag structure.

Johansen trace test:
johan.test.trace <- 
  ca.jo(TS280[, c("y4", "y8")], # data 
        ecdet = "const", # "none" for no intercept in cointegrating equation, 
        # "const" for constant term in cointegrating equation and 
        # "trend" for trend variable in cointegrating equation
        type = "trace",  # type of the test: trace or eigen
        K = 6,           # lag order of the series (levels) in the VAR
    ) 
summary(johan.test.trace) 

###################### 
# Johansen-Procedure # 
###################### 

Test type: trace statistic , without linear trend and constant in cointegration 

Eigenvalues (lambda):
[1] 1.554181e-01 1.037403e-02 3.122502e-17

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  2.86  7.52  9.24 12.97
r = 0  | 49.14 17.85 19.96 24.60

Eigenvectors, normalised to first column:
(These are the cointegration relations)

                y4.l6       y8.l6    constant
y4.l6       1.0000000   1.0000000   1.0000000
y8.l6      -0.5004317  -0.5965232  -0.6146598
constant -101.7277663 -88.0195945 -81.1869793

Weights W:
(This is the loading matrix)

         y4.l6      y8.l6               constant
y4.d -2.133772 0.03181788 -0.0000000000003024740
y8.d -2.378166 0.07274095 -0.0000000000003385324

Let’s find interpretation of the test results:

cbind(summary(johan.test.trace)@teststat, summary(johan.test.trace)@cval)
                   10pct  5pct  1pct
r <= 1 |  2.857331  7.52  9.24 12.97
r = 0  | 49.139638 17.85 19.96 24.60

Let’s recall that if the test statistic is SMALLER than the critical value, we CANNOT reject the null. If the test statistic is LARGER than the critical value, we REJECT the null.

We start with testing the hypothesis that r = 0. The test statistic is greater than the critical value, hence the null is rejected (at the 5% level).

Next, we test that r <= 1. In this case, the test statistic is lower than the critical value, hence the null is NOT rejected.

We can therefore say there is 1 cointegrating vector for the pair y4 and y8.

The cointegrating vector is the first eigenvector (the first column):

summary(johan.test.trace)@V
                y4.l6       y8.l6    constant
y4.l6       1.0000000   1.0000000   1.0000000
y8.l6      -0.5004317  -0.5965232  -0.6146598
constant -101.7277663 -88.0195945 -81.1869793

Weights W:

summary(johan.test.trace)@W
         y4.l6      y8.l6               constant
y4.d -2.133772 0.03181788 -0.0000000000003024740
y8.d -2.378166 0.07274095 -0.0000000000003385324

Let’s apply the alternative variant of the test: Maximal Eigen Value Test

johan.test.eigen <- 
  ca.jo(TS280[, c("y4", "y8")], # data 
        ecdet = "const", # "none" for no intercept in cointegrating equation, 
        # "const" for constant term in cointegrating equation and 
        # "trend" for trend variable in cointegrating equation
        type = "eigen",  # type of the test: trace or eigen
        K = 6,           # lag order of the series (levels) in the VAR
) 
summary(johan.test.eigen) 

###################### 
# Johansen-Procedure # 
###################### 

Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration 

Eigenvalues (lambda):
[1] 1.554181e-01 1.037403e-02 3.122502e-17

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  2.86  7.52  9.24 12.97
r = 0  | 46.28 13.75 15.67 20.20

Eigenvectors, normalised to first column:
(These are the cointegration relations)

                y4.l6       y8.l6    constant
y4.l6       1.0000000   1.0000000   1.0000000
y8.l6      -0.5004317  -0.5965232  -0.6146598
constant -101.7277663 -88.0195945 -81.1869793

Weights W:
(This is the loading matrix)

         y4.l6      y8.l6               constant
y4.d -2.133772 0.03181788 -0.0000000000003024740
y8.d -2.378166 0.07274095 -0.0000000000003385324

The conclusions are the same: 1 cointegrating vector, as in the trace test. The variant of the Johansen test has no impact on the parameters of the cointegrating vector.

3.2 The VECM model

The trace and maximal eigenvalue tests give the same cointegrating vector, so either of the two specifications can be used to estimate the VECM.

Estimation of VECM Model

TS280.vec4 <- cajorls(johan.test.eigen, # defined specification
                        r = 1) # number of cointegrating vectors
TS280.vec4 %>% head(3)
$rlm

Call:
lm(formula = substitute(form1), data = data.mat)

Coefficients:
        y4.d     y8.d   
ect1    -2.1338  -2.3782
y4.dl1  -1.0154  -0.2652
y8.dl1   0.6428   0.3681
y4.dl2  -1.1390  -0.5001
y8.dl2   0.5728   0.2340
y4.dl3  -1.4837  -0.8777
y8.dl3   0.7555   0.4572
y4.dl4  -1.5764  -1.1756
y8.dl4   0.9628   0.9257
y4.dl5  -1.8125  -1.6470
y8.dl5   0.8059   0.6391


$beta
                 ect1
y4.l6       1.0000000
y8.l6      -0.5004317
constant -101.7277663
summary(TS280.vec4)
     Length Class  Mode   
rlm  12     mlm    list   
beta  3     -none- numeric

To print results we may refer to the element called $rlm and produce its summary with significance tests.

str(TS280.vec4)
List of 2
 $ rlm :List of 12
  ..$ coefficients : num [1:11, 1:2] -2.134 -1.015 0.643 -1.139 0.573 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
  .. .. ..$ : chr [1:2] "y4.d" "y8.d"
  ..$ residuals    : num [1:274, 1:2] 0.722 -0.108 -0.767 0.313 -0.363 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:274] "2021-06-01" "2021-06-02" "2021-06-03" "2021-06-04" ...
  .. .. ..$ : chr [1:2] "y4.d" "y8.d"
  ..$ effects      : num [1:274, 1:2] 1.314 -1.983 -2.856 0.157 0.46 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:274] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
  .. .. ..$ : chr [1:2] "y4.d" "y8.d"
  ..$ rank         : int 11
  ..$ fitted.values: num [1:274, 1:2] -0.1124 0.56786 0.26744 -0.40324 -0.00748 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:274] "2021-06-01" "2021-06-02" "2021-06-03" "2021-06-04" ...
  .. .. ..$ : chr [1:2] "y4.d" "y8.d"
  ..$ assign       : int [1:11] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ qr           :List of 5
  .. ..$ qr   : num [1:274, 1:11] -3.26651 -0.03909 -0.00542 0.05887 -0.04972 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:274] "2021-06-01" "2021-06-02" "2021-06-03" "2021-06-04" ...
  .. .. .. ..$ : chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
  .. .. ..- attr(*, "assign")= int [1:11] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ qraux: num [1:11] 1.02 1.04 1 1.07 1.15 ...
  .. ..$ pivot: int [1:11] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ tol  : num 0.0000001
  .. ..$ rank : int 11
  .. ..- attr(*, "class")= chr "qr"
  ..$ df.residual  : int 263
  ..$ xlevels      : Named list()
  ..$ call         : language lm(formula = substitute(form1), data = data.mat)
  ..$ terms        :Classes 'terms', 'formula'  language z@Z0 ~ ect1 + y4.dl1 + y8.dl1 + y4.dl2 + y8.dl2 + y4.dl3 + y8.dl3 + y4.dl4 +      y8.dl4 + y4.dl5 + y8.dl5 - 1
  .. .. ..- attr(*, "variables")= language list(z@Z0, ect1, y4.dl1, y8.dl1, y4.dl2, y8.dl2, y4.dl3, y8.dl3, y4.dl4,      y8.dl4, y4.dl5, y8.dl5)
  .. .. ..- attr(*, "factors")= int [1:12, 1:11] 0 1 0 0 0 0 0 0 0 0 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:12] "z@Z0" "ect1" "y4.dl1" "y8.dl1" ...
  .. .. .. .. ..$ : chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
  .. .. ..- attr(*, "term.labels")= chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
  .. .. ..- attr(*, "order")= int [1:11] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..- attr(*, "intercept")= int 0
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: 0x7fec381470c8> 
  .. .. ..- attr(*, "predvars")= language list(z@Z0, ect1, y4.dl1, y8.dl1, y4.dl2, y8.dl2, y4.dl3, y8.dl3, y4.dl4,      y8.dl4, y4.dl5, y8.dl5)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:12] "nmatrix.2" "numeric" "numeric" "numeric" ...
  .. .. .. ..- attr(*, "names")= chr [1:12] "z@Z0" "ect1" "y4.dl1" "y8.dl1" ...
  ..$ model        :'data.frame':   274 obs. of  12 variables:
  .. ..$ z@Z0  : num [1:274, 1:2] 0.61 0.46 -0.5 -0.09 -0.37 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : chr [1:2] "y4.d" "y8.d"
  .. ..$ ect1  : num [1:274] 0.0723 -0.1277 -0.0177 0.1923 -0.1624 ...
  .. ..$ y4.dl1: num [1:274] -0.5 0.61 0.46 -0.5 -0.09 ...
  .. ..$ y8.dl1: num [1:274] -1.16 1.59 0.85 -1.77 -0.78 ...
  .. ..$ y4.dl2: num [1:274] -0.67 -0.5 0.61 0.46 -0.5 ...
  .. ..$ y8.dl2: num [1:274] -0.63 -1.16 1.59 0.85 -1.77 ...
  .. ..$ y4.dl3: num [1:274] 0.21 -0.67 -0.5 0.61 0.46 ...
  .. ..$ y8.dl3: num [1:274] 0 -0.63 -1.16 1.59 0.85 ...
  .. ..$ y4.dl4: num [1:274] 0.11 0.21 -0.67 -0.5 0.61 ...
  .. ..$ y8.dl4: num [1:274] 0 0 -0.63 -1.16 1.59 ...
  .. ..$ y4.dl5: num [1:274] -0.2 0.11 0.21 -0.67 -0.5 ...
  .. ..$ y8.dl5: num [1:274] 0 0 0 -0.63 -1.16 ...
  .. ..- attr(*, "terms")=Classes 'terms', 'formula'  language z@Z0 ~ ect1 + y4.dl1 + y8.dl1 + y4.dl2 + y8.dl2 + y4.dl3 + y8.dl3 + y4.dl4 +      y8.dl4 + y4.dl5 + y8.dl5 - 1
  .. .. .. ..- attr(*, "variables")= language list(z@Z0, ect1, y4.dl1, y8.dl1, y4.dl2, y8.dl2, y4.dl3, y8.dl3, y4.dl4,      y8.dl4, y4.dl5, y8.dl5)
  .. .. .. ..- attr(*, "factors")= int [1:12, 1:11] 0 1 0 0 0 0 0 0 0 0 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:12] "z@Z0" "ect1" "y4.dl1" "y8.dl1" ...
  .. .. .. .. .. ..$ : chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
  .. .. .. ..- attr(*, "term.labels")= chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
  .. .. .. ..- attr(*, "order")= int [1:11] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..- attr(*, "intercept")= int 0
  .. .. .. ..- attr(*, "response")= int 1
  .. .. .. ..- attr(*, ".Environment")=<environment: 0x7fec381470c8> 
  .. .. .. ..- attr(*, "predvars")= language list(z@Z0, ect1, y4.dl1, y8.dl1, y4.dl2, y8.dl2, y4.dl3, y8.dl3, y4.dl4,      y8.dl4, y4.dl5, y8.dl5)
  .. .. .. ..- attr(*, "dataClasses")= Named chr [1:12] "nmatrix.2" "numeric" "numeric" "numeric" ...
  .. .. .. .. ..- attr(*, "names")= chr [1:12] "z@Z0" "ect1" "y4.dl1" "y8.dl1" ...
  ..- attr(*, "class")= chr [1:2] "mlm" "lm"
 $ beta: num [1:3, 1] 1 -0.5 -101.7
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "y4.l6" "y8.l6" "constant"
  .. ..$ : chr "ect1"
summary(TS280.vec4$rlm)
Response y4.d :

Call:
lm(formula = y4.d ~ ect1 + y4.dl1 + y8.dl1 + y4.dl2 + y8.dl2 + 
    y4.dl3 + y8.dl3 + y4.dl4 + y8.dl4 + y4.dl5 + y8.dl5 - 1, 
    data = data.mat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.91359 -0.45138 -0.01138  0.38916  1.71334 

Coefficients:
       Estimate Std. Error t value     Pr(>|t|)    
ect1    -2.1338     0.5040  -4.234 0.0000317547 ***
y4.dl1  -1.0154     0.2105  -4.823 0.0000023981 ***
y8.dl1   0.6428     0.1087   5.912 0.0000000104 ***
y4.dl2  -1.1390     0.2813  -4.049 0.0000676006 ***
y8.dl2   0.5728     0.1456   3.934     0.000107 ***
y4.dl3  -1.4837     0.3386  -4.381 0.0000170490 ***
y8.dl3   0.7555     0.1746   4.326 0.0000215442 ***
y4.dl4  -1.5764     0.4049  -3.893     0.000125 ***
y8.dl4   0.9628     0.2075   4.640 0.0000054915 ***
y4.dl5  -1.8125     0.4557  -3.978 0.0000900790 ***
y8.dl5   0.8059     0.2345   3.437     0.000684 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6668 on 263 degrees of freedom
Multiple R-squared:  0.2513,    Adjusted R-squared:   0.22 
F-statistic: 8.025 on 11 and 263 DF,  p-value: 0.00000000000472


Response y8.d :

Call:
lm(formula = y8.d ~ ect1 + y4.dl1 + y8.dl1 + y4.dl2 + y8.dl2 + 
    y4.dl3 + y8.dl3 + y4.dl4 + y8.dl4 + y4.dl5 + y8.dl5 - 1, 
    data = data.mat)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7887 -0.7525 -0.0852  0.6830  3.4196 

Coefficients:
       Estimate Std. Error t value Pr(>|t|)  
ect1    -2.3782     0.9760  -2.437   0.0155 *
y4.dl1  -0.2652     0.4078  -0.650   0.5161  
y8.dl1   0.3681     0.2106   1.748   0.0816 .
y4.dl2  -0.5001     0.5447  -0.918   0.3594  
y8.dl2   0.2340     0.2820   0.830   0.4075  
y4.dl3  -0.8777     0.6558  -1.338   0.1820  
y8.dl3   0.4572     0.3382   1.352   0.1775  
y4.dl4  -1.1756     0.7842  -1.499   0.1350  
y8.dl4   0.9257     0.4018   2.304   0.0220 *
y4.dl5  -1.6470     0.8825  -1.866   0.0631 .
y8.dl5   0.6391     0.4541   1.407   0.1606  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.291 on 263 degrees of freedom
Multiple R-squared:  0.1914,    Adjusted R-squared:  0.1576 
F-statistic: 5.659 on 11 and 263 DF,  p-value: 0.00000003609

3.3 Interpreting all the parameters of the VECM model

The F-test in the \(\Delta y4\) equation is a joint significance test with the null hypothesis \[H_0 : \alpha_{11}=\gamma_{11}=\gamma_{12}=\gamma_{13}=\gamma_{14}=\gamma_{15}=0 \] The p-value of this joint test is below 0.05, so the null hypothesis is rejected and the coefficients are jointly significant. We can also observe that the p-value of each individual coefficient is below 0.05, so every parameter of the \(\Delta y4\) equation is highly significant. For the error-correction term alone, \[H_0 : \alpha_{11}=0 \] the p-value is below 0.05 and the estimated coefficient of ect1 is non-zero, hence it is highly significant and the error correction mechanism operates in this equation.
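Writing out the estimated \(\Delta y4\) equation with the coefficients from the table above (the middle lags are abbreviated with dots):

\[ \widehat{\Delta y4}_t = -2.134 \, \text{ect1} - 1.015 \, \Delta y4_{t-1} + 0.643 \, \Delta y8_{t-1} + \dots - 1.813 \, \Delta y4_{t-5} + 0.806 \, \Delta y8_{t-5} \]

where, following the y4.l6/y8.l6 labels in the Johansen output, \(\text{ect1} = y4_{t-6} - 0.5004 \, y8_{t-6} - 101.728\).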

3.4 Commenting on the signs of the estimated parameters

The sign of the estimated ect1 coefficient in the \(\Delta y4\) equation is negative, which is consistent with adjustment back towards the long-run equilibrium.

The F-test in the \(\Delta y8\) equation is the analogous joint significance test with the null hypothesis \[H_0 : \alpha_{21}=\gamma_{21}=\gamma_{22}=\gamma_{23}=\gamma_{24}=\gamma_{25}=0 \] Its p-value is below 0.05, so the null hypothesis is rejected and the coefficients are jointly significant.

Individually, however, most coefficients of the \(\Delta y8\) equation are not significant at the 5% level (only ect1 and y8.dl4 have p-values below 0.05).

3.5 Commenting on whether the Error Correction Mechanism works

For the error-correction term of the \(\Delta y8\) equation, \[H_0 : \alpha_{21}=0 \] the p-value (0.0155) is below 0.05, so the coefficient is significantly different from zero and the error correction mechanism also operates here. Note, however, that for this equation the sign of the ect1 coefficient would be expected to be positive, while the estimate is negative.

The VECM model is given below, with p = 5 lagged differences (one less than the K = 6 lags of the VAR in levels):

\[\Delta\boldsymbol{y}_t = \Pi\boldsymbol{y}_{t-1} + \Gamma_1\Delta\boldsymbol{y}_{t-1} + ... + \Gamma_5\Delta\boldsymbol{y}_{t-5} + \boldsymbol{\varepsilon}_t\]

The ect1 elements in both equations contain the adjustment coefficients (\(\alpha_{11}\) and \(\alpha_{21}\)), which are the result of the decomposition of the \(\Pi\) matrix:

\[ \boldsymbol{\Pi}=\alpha\beta'=\left[ \begin{array}{ccc} \alpha_{11} \\ \alpha_{21} \\ \end{array} \right] \left[ \begin{array}{cccc} \beta_{11} \quad \beta_{21} \\ \end{array} \right] \]

We can extract the cointegrating vector in the following way:

TS280.vec4$beta
                 ect1
y4.l6       1.0000000
y8.l6      -0.5004317
constant -101.7277663
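As a quick sanity check, the \(\Pi\) matrix can be recovered numerically from the estimated loadings and the cointegrating vector reported above (constant column ignored); the snippet below simply multiplies the reported numbers:

# Recompute Pi = alpha %*% t(beta) from the reported loadings and
# cointegrating vector (dropping the constant for the 2x2 part)
alpha <- matrix(c(-2.133772, -2.378166), ncol = 1,
                dimnames = list(c("y4", "y8"), "ect1"))
beta  <- matrix(c(1, -0.5004317), ncol = 1,
                dimnames = list(c("y4", "y8"), "ect1"))
alpha %*% t(beta)
# approximately: y4 row (-2.1338, 1.0678), y8 row (-2.3782, 1.1901)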

3.6 We can reparametrize the VECM into a VAR model:

TS280.vec4.asVAR <- vec2var(johan.test.eigen, r = 1)

Let's see the result:

TS280.vec4.asVAR

Coefficient matrix of lagged endogenous variables:

A1:
         y4.l1     y8.l1
y4 -0.01542249 0.6427725
y8 -0.26515511 1.3681022


A2:
        y4.l2       y8.l2
y4 -0.1235365 -0.07001025
y8 -0.2349052 -0.13414844


A3:
        y4.l3     y8.l3
y4 -0.3447433 0.1827072
y8 -0.3776032 0.2232662


A4:
         y4.l4     y8.l4
y4 -0.09267368 0.2073016
y8 -0.29793204 0.4684647


A5:
        y4.l5      y8.l5
y4 -0.2361584 -0.1568306
y8 -0.4714176 -0.2866279


A6:
        y4.l6     y8.l6
y4 -0.3212379 0.2618668
y8 -0.7311532 0.5510530


Coefficient matrix of deterministic regressor(s).

   constant
y4 217.0639
y8 241.9255

3.7 Based on the reparametrized model, we can calculate and plot Impulse Response Functions:

We can observe how a shock to y4 affects both variables: after about 20 periods the response of the shock dies out, and shocks, whether to y4 or to y8, have a larger impact on y4.
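The IRF plot itself comes from a chunk that is not echoed; it can be reproduced from the reparametrized model with the vars package (the 20-period horizon is assumed to match the description above):

# Impulse response functions from the VAR representation of the VECM
TS280.irf <- irf(TS280.vec4.asVAR, n.ahead = 20)
plot(TS280.irf)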

3.8 We can also perform the Forecast Error Variance Decomposition:

We can observe how much of the forecast error variance of y4 is explained by shocks to y4 and to y8 over horizons up to s = 19 steps ahead.
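Again, the decomposition can be generated directly from the reparametrized model (the object name and the 20-step horizon are illustrative):

# Forecast error variance decomposition for y4 and y8
TS280.fevd <- fevd(TS280.vec4.asVAR, n.ahead = 20)
plot(TS280.fevd)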

3.9 We check if model residuals are autocorrelated.

Residuals can be extracted from the VAR reparametrized model.

tail(residuals(TS280.vec4.asVAR))
       resids of y4 resids of y8
[269,]   -0.4542077   -1.0890049
[270,]    0.9016549    1.3776238
[271,]    0.1696227    0.2910503
[272,]    1.0347593    2.1668718
[273,]    0.4234520    0.1753923
[274,]   -0.8491263   -1.4732752
serial.test(TS280.vec4.asVAR)

    Portmanteau Test (asymptotic)

data:  Residuals of VAR object TS280.vec4.asVAR
Chi-squared = 46.071, df = 42, p-value = 0.3075

The null hypothesis is that the residuals of the VAR model are not autocorrelated. The p-value is greater than both 0.05 and 0.1, so we fail to reject the null: there is no autocorrelation in the residuals of the y4 and y8 equations of the VAR(6). We chose lag p = 6 for the VAR because for p < 6 the residuals of y4 and y8 were still autocorrelated.

3.10 You can see the ACF and PACF functions by plotting the results of the serial.test()

plot(serial.test(TS280.vec4.asVAR))

3.11 Histograms of residuals for the VAR model

# Histogram of y8 residuals with a normal density overlay
# (column 2 of the residual matrix corresponds to "resids of y8")
TS280.vec4.asVAR %>%
  residuals() %>%
  as_tibble() %>%
  ggplot(aes(`resids of y8`)) +
  geom_histogram(aes(y = ..density..),
                 colour = "red", 
                 fill = "green") +
  stat_function(fun = dnorm, 
                args = list(mean = mean(residuals(TS280.vec4.asVAR)[, 2]), 
                            sd = sd(residuals(TS280.vec4.asVAR)[, 2]))) +
  theme_bw() + 
  labs(
    title = "Density of y8 residuals", 
    y = "", x = "",
    caption = "source: own calculations"
  )

# Histogram of y4 residuals with a normal density overlay
# (column 1 of the residual matrix corresponds to "resids of y4")
TS280.vec4.asVAR %>%
  residuals() %>%
  as_tibble() %>%
  ggplot(aes(`resids of y4`)) +
  geom_histogram(aes(y = ..density..),
                 colour = "blue", 
                 fill = "pink") +
  stat_function(fun = dnorm, 
                args = list(mean = mean(residuals(TS280.vec4.asVAR)[, 1]), 
                            sd = sd(residuals(TS280.vec4.asVAR)[, 1]))) +
  theme_bw() + 
  labs(
    title = "Density of y4 residuals", 
    y = "", x = "",
    caption = "source: own calculations"
  )

3.12 We have checked normality of residuals by applying the Jarque-Bera (JB) test.

normality.test(TS280.vec4.asVAR)
$JB

    JB-Test (multivariate)

data:  Residuals of VAR object TS280.vec4.asVAR
Chi-squared = 2.9791, df = 4, p-value = 0.5613


$Skewness

    Skewness only (multivariate)

data:  Residuals of VAR object TS280.vec4.asVAR
Chi-squared = 2.9743, df = 2, p-value = 0.226


$Kurtosis

    Kurtosis only (multivariate)

data:  Residuals of VAR object TS280.vec4.asVAR
Chi-squared = 0.0048029, df = 2, p-value = 0.9976

The p-value of the JB normality test is greater than 0.05, so the null of normality is not rejected and the residuals can be treated as normally distributed. The separate skewness and kurtosis tests also fail to reject their nulls (no skewness, no excess kurtosis). Hence the residuals of the VAR(6) model are normally distributed.

3.13 Forecasting based on the VECM

Now we will calculate 20-day forecasts for the y4 and y8 variables.

Forecasting based on the VECM:

TS.vec4.fore <- 
  predict(
    vec2var(
      johan.test.eigen, 
      r = 1),     # no of cointegrating vectors 
    n.ahead = 20, # forecast horizon
    ci = 0.95)    # confidence level for intervals

VEC forecasts for y4:

TS.vec4.fore$fcst$y4
          fcst    lower    upper       CI
 [1,] 175.0847 173.8043 176.3652 1.280446
 [2,] 175.1386 173.1088 177.1685 2.029872
 [3,] 174.9821 172.3809 177.5832 2.601130
 [4,] 174.6606 171.5812 177.7400 3.079364
 [5,] 174.5546 170.8565 178.2528 3.698153
 [6,] 174.5923 170.3856 178.7990 4.206710
 [7,] 174.5348 169.8859 179.1838 4.648943
 [8,] 174.4753 169.4254 179.5251 5.049848
 [9,] 174.4933 169.0326 179.9540 5.460669
[10,] 174.5463 168.7312 180.3614 5.815102
[11,] 174.5239 168.3825 180.6654 6.141446
[12,] 174.5132 168.0636 180.9627 6.449523
[13,] 174.5297 167.7747 181.2847 6.754967
[14,] 174.5517 167.5153 181.5881 7.036406
[15,] 174.5376 167.2315 181.8438 7.306146
[16,] 174.5346 166.9678 182.1015 7.566867
[17,] 174.5406 166.7179 182.3633 7.822702
[18,] 174.5461 166.4796 182.6126 8.066507
[19,] 174.5379 166.2340 182.8417 8.303838
[20,] 174.5378 166.0025 183.0730 8.535244

VEC forecasts for y8:

TS.vec4.fore$fcst$y8
          fcst    lower    upper        CI
 [1,] 146.6000 144.1202 149.0798  2.479794
 [2,] 146.8210 142.8750 150.7671  3.946044
 [3,] 146.3962 141.3365 151.4560  5.059788
 [4,] 145.7701 139.7690 151.7712  6.001057
 [5,] 145.4950 138.2447 152.7454  7.250348
 [6,] 145.6225 137.3334 153.9116  8.289066
 [7,] 145.5023 136.3134 154.6911  9.188837
 [8,] 145.3861 135.3907 155.3814  9.995338
 [9,] 145.4009 134.5821 156.2197 10.818809
[10,] 145.5081 133.9749 157.0412 11.533145
[11,] 145.4634 133.2730 157.6539 12.190470
[12,] 145.4455 132.6339 158.2572 12.811663
[13,] 145.4778 132.0522 158.9034 13.425582
[14,] 145.5221 131.5306 159.5137 13.991590
[15,] 145.4939 130.9612 160.0265 14.532631
[16,] 145.4872 130.4311 160.5433 15.056139
[17,] 145.4997 129.9304 161.0691 15.569362
[18,] 145.5110 129.4522 161.5699 16.058831
[19,] 145.4952 128.9605 162.0298 16.534609
[20,] 145.4942 128.4955 162.4929 16.998660
tail(index(TS2), 20)
 [1] "2022-03-08" "2022-03-09" "2022-03-10" "2022-03-11" "2022-03-12"
 [6] "2022-03-13" "2022-03-14" "2022-03-15" "2022-03-16" "2022-03-17"
[11] "2022-03-18" "2022-03-19" "2022-03-20" "2022-03-21" "2022-03-22"
[16] "2022-03-23" "2022-03-24" "2022-03-25" "2022-03-26" "2022-03-27"
y4_forecast <- xts(TS.vec4.fore$fcst$y4[,-4], 
                    # we exclude the last column with CI
                    tail(index(TS2), 20))

For y4 forecast:

names(y4_forecast) <- c("y4_fore", "y4_lower", "y4_upper")

For y8 forecasts:

y8_forecast <- xts(TS.vec4.fore$fcst$y8[, -4],
                    # we exclude the last column with CI
                    tail(index(TS2), 20))
names(y8_forecast) <- c("y8_fore", "y8_lower", "y8_upper")

Now, we can merge the data together:

TS <- merge(TS2, 
                 y4_forecast,
                 y8_forecast)

Let's compare the forecasted and real data on the plot.

plot(TS[index(TS) > as.Date("2022-03-07"), c("y4", "y4_fore",
                        "y4_lower", "y4_upper")], 
  
     main = "20 days forecast of energy y4",
     col = c("black", "blue", "red", "red"))
plot(TS[index(TS) > as.Date("2022-03-07"), c("y8", "y8_fore",
                        "y8_lower", "y8_upper")], 

     main = "20 days forecast of energy y8",
     col = c("black", "blue", "red", "red"))

Here we get the forecast prices of y4 and y8 over 20 time periods; the actual observations lie within the upper and lower bounds of the forecast interval. The blue line represents the forecast, the black line the actual observations, and the red lines the interval bounds.

We calculated forecast accuracy measures (MAE, MSE, MAPE, AMAPE) for the forecasts based on the VECM model; they will later be compared with the accuracy of the ARIMA-based forecasts.
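calculate_errors() is sourced from ../code/functions/utils.R and is not shown in this report. A minimal sketch of what such a helper might compute, assuming the usual definitions of MAE, MSE and MAPE and, for AMAPE, the absolute error scaled by the sum of actual and forecast values:

# Illustrative stand-in for the sourced calculate_errors() helper:
# appends one row of MAE / MSE / MAPE / AMAPE for a given forecast.
calculate_errors_sketch <- function(err_mat, actual, forecast, label) {
  a  <- as.numeric(actual)
  f  <- as.numeric(forecast)
  ok <- !is.na(a) & !is.na(f)          # keep only dates where both series exist
  a  <- a[ok]; f <- f[ok]
  row <- data.frame(MAE   = mean(abs(a - f)),
                    MSE   = mean((a - f)^2),
                    MAPE  = mean(abs((a - f) / a)),
                    AMAPE = mean(abs((a - f) / (a + f))),
                    row.names = label)
  rbind(err_mat, row)
}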

errors_matrix <- data.frame()
errors_matrix <- calculate_errors(errors_matrix, TS$y4, TS$y4_fore,"y4_vecm")
errors_matrix <- calculate_errors(errors_matrix, TS$y8, TS$y8_fore, "y8_vecm")
errors_matrix[c("y4_vecm", "y8_vecm"),]
MAE MSE MAPE AMAPE
y4_vecm 0.9862111 1.399098 0.0056067 0.0028136
y8_vecm 1.9383977 5.230081 0.0130852 0.0065978

3.14 Interpretation of Errors:

  • MAE for y4: 0.986
    • On average, the absolute error in predictions for y4 is 0.986 units. This indicates the typical size of the error in your predictions.
  • MSE for y4: 1.399
    • The mean squared error for y4 is 1.399. This value is higher than the MAE, suggesting that there are some larger errors that are being squared, thus increasing the overall error value.
  • MAPE for y4: 0.0056 (or 0.56%)
    • The mean absolute percentage error for y4 is 0.56%. This indicates that, on average, the predictions for y4 deviate from the actual values by 0.56% of the actual values.
  • AMAPE for y4: 0.0028 (or 0.28%)
    • The adjusted mean absolute percentage error for y4 is 0.28%. This metric provides a different perspective on the error, accounting for the scale of the actual and predicted values.
  • MAE for y8: 1.938
    • On average, the absolute error in predictions for y8 is 1.938 units, which is significantly higher than that for y4.
  • MSE for y8: 5.230
    • The mean squared error for y8 is 5.230. This is much higher than the MSE for y4, suggesting the presence of larger errors in the predictions for y8.
  • MAPE for y8: 0.0131 (or 1.31%)
    • The mean absolute percentage error for y8 is 1.31%. This indicates that, on average, the predictions for y8 deviate from the actual values by 1.31% of the actual values.
  • AMAPE for y8: 0.0066 (or 0.66%)
    • The adjusted mean absolute percentage error for y8 is 0.66%. Again, this provides a sense of the prediction error relative to the combined scale of actual and predicted values.
  • The errors for y4 are generally lower than those for y8, indicating better prediction accuracy for y4.
  • Both MSE and MAE are higher for y8 than for y4, with the difference in MSE being particularly pronounced, suggesting that the predictions for y8 have larger errors.
  • The MAPE and AMAPE for y4 are significantly lower than for y8, indicating more accurate percentage-based predictions for y4.

Overall, these metrics suggest that the model’s predictions for y4 are more accurate and consistent than those for y8.

4 ARIMA Modeling for y4

We will apply the Box-Jenkins procedure, following the steps below.

4.1 Step 1: Identification

Plotting actual prices y4

tibble(df = TS280$y4) %>%
  ggplot(aes(zoo::index(TS280), df)) +
  geom_line() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Actual Y4 Price",
    subtitle = paste0("Number of observations: ", length(TS280$y4)),
    caption = "source: TSA 2024",
    x="",
    y=""
  )

Plotting log transformed y4 prices

y4_log <- log(TS280$y4)
y4_log_return <- periodReturn(y4_log, period="daily", type="log")

## Plotting Log Transformed y4 Price
tibble(df2 = y4_log) %>%
  ggplot(aes(zoo::index(TS280), df2)) +
  geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Log Transformed y4 Price",
    subtitle = paste0("Number of observations: ", length(TS280$y4)),
    caption = "source: TSA 2024",
    x="",
    y=""
  )

Plotting returns of the log-transformed y4 prices

tibble(df3 = y4_log_return) %>%
  ggplot(aes(zoo::index(TS280), df3)) +
  geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Log Transformed y4 return",
    subtitle = paste0("Number of observations: ", length(TS280$y4)),
    caption = "source: TSA 2024",
    x="",
    y=""
  )

Data Training_Sample ADF_Test PP_Test BG_Test
y4 01/06/2021~07/03/2022 -1.416 ( 0.822 ) -5.546 ( 0.799 ) 7.661 ( 0.006 )
y4 log 01/06/2021~07/03/2022 -1.383 ( 0.836 ) -5.218 ( 0.818 ) 7.566 ( 0.006 )
y4 log return 01/06/2021~07/03/2022 -6.072 ( 0.010 ) -258.965 ( 0.010 ) 0.006 ( 0.938 )

The result shows that the y4 log return series is stationary.
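The table above is assembled with the same helpers used later for the residual diagnostics (adf.test, pp.test, bgtest plus the sourced stationary_test_data()); for the y4 log-return row the call pattern would mirror the one echoed in Section 5, with the object names below chosen for illustration:

# Same pattern as the residual tests shown later (Section 5):
adf.y4_log_ret <- adf.test(as.numeric(y4_log_return))
pp.y4_log_ret  <- pp.test(as.numeric(y4_log_return))
bg.y4_log_ret  <- bgtest(adfTest(as.numeric(y4_log_return), lags = 0, type = "c")@test$lm$residuals ~ 1,
                         order = 1)

stationary_test_data(data.frame(), "y4 log return", "01/06/2021~07/03/2022",
                     adf.y4_log_ret, pp.y4_log_ret, bg.y4_log_ret)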

Plotting ACF and PACF for y4 log return

plot_acf_pacf(y4_log_return)

The ACF and PACF initially suggest an MA process with 4 lags and no AR terms.
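plot_acf_pacf() is another helper sourced from the functions folder; a minimal stand-in with base R graphics would be something like:

# Minimal stand-in for the sourced plot_acf_pacf() helper
plot_acf_pacf_sketch <- function(x, max_lag = 36) {
  op <- par(mfrow = c(1, 2))
  acf(as.numeric(na.omit(x)),  lag.max = max_lag, main = "ACF")
  pacf(as.numeric(na.omit(x)), lag.max = max_lag, main = "PACF")
  par(op)
}
plot_acf_pacf_sketch(y4_log_return)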

4.2 Steps 2 & 3: Estimation and Diagnostics

We estimate all combinations of ARIMA models with AR order p from 0 to 4 and MA order q from 0 to 4, which lets us compare the candidate models by their AIC and BIC values.

y4_models_detail <- data.frame()
for (p in 0:4) {
  for (q in 0:4) {
    if(! (p==0 & q==0)){
    cur_model <- Arima(y4_log_return, order=c(p,1,q))
    y4_models_detail <- model_summary(y4_models_detail, paste("c(",p,",1,",q,")",sep=""), cur_model)
  }
  }
}
y4_models_detail
Model_Order Box_Ljung_P_Value AIC_Value BIC_Value
c(0,1,1) 0.0000024002 -3167.718 -3160.455
c(0,1,2) 0.0000645641 -3171.441 -3160.548
c(0,1,3) 0.0000783646 -3169.927 -3155.402
c(0,1,4) 0.0000962717 -3169.074 -3150.918
c(1,1,0) 0.0000000000 -3086.493 -3079.231
c(1,1,1) 0.0000809521 -3171.878 -3160.985
c(1,1,2) 0.0000996379 -3171.641 -3157.117
c(1,1,3) 0.0001841259 -3173.950 -3155.794
c(1,1,4) 0.0043451063 -3179.723 -3157.936
c(2,1,0) 0.0000000042 -3112.840 -3101.946
c(2,1,1) 0.0000833983 -3170.062 -3155.537
c(2,1,2) 0.0000729963 -3170.344 -3152.188
c(2,1,3) 0.0410809836 -3189.728 -3167.941
c(2,1,4) 0.0361307906 -3187.808 -3162.389
c(3,1,0) 0.0035289437 -3164.130 -3149.605
c(3,1,1) 0.0056519871 -3162.404 -3144.248
c(3,1,2) 0.0001019411 -3169.528 -3147.740
c(3,1,3) 0.0342542222 -3187.839 -3162.421
c(3,1,4) 0.0327545941 -3185.995 -3156.946
c(4,1,0) 0.0057443401 -3162.382 -3144.226
c(4,1,1) 0.0133993041 -3187.632 -3165.844
c(4,1,2) 0.0359111741 -3188.836 -3163.418
c(4,1,3) 0.2018730120 -3191.288 -3162.238
c(4,1,4) 0.1960197635 -3190.004 -3157.324

Find the model with the minimum AIC value

y4_aic_ind <- which.min(y4_models_detail$AIC_Value)
min_row <- y4_models_detail[y4_aic_ind, ]
Model_Order Box_Ljung_P_Value AIC_Value BIC_Value
23 c(4,1,3) 0.2018730120 -3191.288 -3162.238

Find the model with the minimum BIC value

y4_bic_ind <- which.min(y4_models_detail$BIC_Value)
min_row <- y4_models_detail[y4_bic_ind, ]
Model_Order Box_Ljung_P_Value AIC_Value BIC_Value
13 c(2,1,3) 0.0410809836 -3189.728 -3167.941

Analyzing ARIMA(2,1,3) for y4 log return series

Plotting ACF and PACF of y4 log return residuals

#plot acf and pacf for residuals
plot_acf_pacf(y4_residuals_213)

The ACF plot shows significant spikes at certain lags (e.g. 4, 6 and 10), indicating that the residuals are correlated at those lags and suggesting the presence of autocorrelation.

The PACF plot indicates a direct correlation between the residuals at specific lags (e.g. 3, 4 and 5) after removing the influence of the previous lags.

Performing ADF, PP and BG tests for the y4 ARIMA(2,1,3) residuals

y4_residuals_acf_test_213
Data Training_Sample ADF_Test PP_Test BG_Test
Model (2,1,3) residuals 01/06/2021~07/03/2022 -6.727 ( 0.010 ) -315.415 ( 0.010 ) 0.000 ( 0.995 )

The ADF and PP tests indicate the residuals are stationary, and the BG test shows no significant autocorrelation in the residuals.

Performing Ljung-Box test for autocorrelation in residuals

# Ljung-Box test for autocorrelation in residuals
Box.test(y4_residuals_213, type = "Ljung-Box", lag = 10)

    Box-Ljung test

data:  y4_residuals_213
X-squared = 18.936, df = 10, p-value = 0.04108
coeftest(y4_model_213)

z test of coefficients:

     Estimate Std. Error  z value        Pr(>|z|)    
ar1 -1.100783   0.082448 -13.3512       < 2.2e-16 ***
ar2 -0.584616   0.122054  -4.7898 0.0000016693550 ***
ma1  0.351149   0.055841   6.2884 0.0000000003208 ***
ma2 -0.447449   0.077203  -5.7958 0.0000000068013 ***
ma3 -0.845642   0.079798 -10.5972       < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All the parameters are significant, but the Ljung-Box p-value at 10 lags is below 0.05, which suggests there is still significant autocorrelation in the residuals.

Analyzing the next suggested model, ARIMA(4,1,3)

Plotting ACF and PACF of y4 log return residuals

#plot acf and pacf for residuals
plot_acf_pacf(y4_residuals_413)

The ACF looks much better, but a few lags still show autocorrelation in the residuals.

The PACF looks almost the same, still showing direct correlation of the residuals at some lags.

Performing ADF,PP and BG test

y4_residuals_acf_test_413
Data Training_Sample ADF_Test PP_Test BG_Test
Model (4,1,3) residuals 01/06/2021~07/03/2022 -6.603 ( 0.010 ) -286.156 ( 0.010 ) 0.000 ( 0.999 )

The ADF and PP tests indicate the residuals are stationary, and the BG test shows no significant autocorrelation in the residuals.

Performing the Ljung-Box test for autocorrelation in residuals

# Ljung-Box test for autocorrelation in residuals
# as https://robjhyndman.com/hyndsight/ljung-box-test/ suitable value for lag is 10
Box.test(y4_residuals_413, type = "Ljung-Box", lag = 10)

    Box-Ljung test

data:  y4_residuals_413
X-squared = 13.406, df = 10, p-value = 0.2019
coeftest(y4_model_413)

z test of coefficients:

     Estimate Std. Error z value     Pr(>|z|)    
ar1 -0.292483   0.218034 -1.3415      0.17977    
ar2 -0.195212   0.127033 -1.5367      0.12436    
ar3  0.055036   0.063143  0.8716      0.38343    
ar4  0.302114   0.061494  4.9129 0.0000008974 ***
ma1 -0.515952   0.233695 -2.2078      0.02726 *  
ma2 -0.166332   0.165961 -1.0022      0.31623    
ma3 -0.317706   0.151713 -2.0941      0.03625 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Not all parameters are significant, but the Ljung-Box p-value at 10 lags is above 0.05, which suggests there is no significant autocorrelation in the residuals.

As ar1, ar2, ar3 and ma2 are not significant, we can try removing them from the model.

Plotting ACF/PACF after adjusting the model

# fixed: ar1, ar2, ar3 and ma2 restricted to 0; ar4, ma1 and ma3 estimated
y4_model_413_modified <- Arima(y4_log_return, order = c(4,1,3),
                      fixed = c(0, 0, 0, NA, NA, 0, NA))

y4_residuals_413_modified = resid(y4_model_413_modified)
#plot acf and pacf for residuals
plot_acf_pacf(y4_residuals_413_modified)

Here we can see no improvement in either graph.

Performing the Ljung-Box test for autocorrelation in residuals of the adjusted model

Box.test(y4_residuals_413_modified, type = "Ljung-Box", lag = 10)

    Box-Ljung test

data:  y4_residuals_413_modified
X-squared = 18.41, df = 10, p-value = 0.04842
coeftest(y4_model_413_modified)

z test of coefficients:

     Estimate Std. Error  z value     Pr(>|z|)    
ar4  0.309674   0.058382   5.3042 0.0000001131 ***
ma1 -0.875057   0.045161 -19.3764    < 2.2e-16 ***
ma3 -0.124445   0.042774  -2.9094     0.003622 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Ljung-Box p-value for the adjusted model is 0.04842 < .05, so we reject the null that the
# residuals are not autocorrelated; the unrestricted ARIMA(4,1,3) had a p-value of 0.2019 > .05,
# so its residuals can be treated as white noise and that model remains the accepted one.

The p-value < .05 suggests there is significant autocorrelation in the residuals of the adjusted model.

So, out of the three models discussed, we select the unrestricted ARIMA(4,1,3).

4.3 Forecasting Results

We are forecasting 20 days ahead

y4_model_413.forecast <- forecast(y4_model_413, h = 20)

Converting the log-return forecasts into predicted prices

last_observed_price <- tail(TS280$y4,1) 
# Convert log returns to actual price forecasts
forecasted_prices <- numeric(length(y4_model_413.forecast$mean))
forecasted_prices[1] <- last_observed_price * exp(y4_model_413.forecast$mean[1])

for (i in 2:length(forecasted_prices)) {
  forecasted_prices[i] <- forecasted_prices[i-1] * exp(y4_model_413.forecast$mean[i])
}

y4_arima_forecast <- xts(forecasted_prices, tail(index(TS2), 20))

names(y4_arima_forecast) <- c("y4_arima_forecast")

TS <- merge(TS,y4_arima_forecast)
plot(TS[index(TS) > as.Date("2022-02-28"), c("y4", "y4_arima_forecast")], 
  main = "20 days forecast of y4 prices via ARIMA",
  col = c("black", "blue"))

4.4 Forecasting Errors

Using the utility function from utils.R, we calculate the forecasting errors:

errors_matrix <- calculate_errors(errors_matrix, TS$y4, TS$y4_arima_forecast,"y4_arima")
errors_matrix["y4_arima",]
MAE MSE MAPE AMAPE
y4_arima 0.7744276 0.7865475 0.0044117 0.0022082

5 ARIMA Modeling for y8

We will apply the Box-Jenkins procedure, following the steps below.

5.1 Step 1: Identification

Plotting actual prices y8

tibble(df = TS280$y8) %>%
  ggplot(aes(zoo::index(TS280), df)) +
  geom_line() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Actual y8 Price",
    subtitle = paste0("Number of observations: ", length(TS280$y8)),
    caption = "source: TSA 2024",
    x="",
    y=""
  )

Plotting log transformed y8 prices

y8_log <- log(TS280$y8)
y8_log_return <- periodReturn(y8_log, period="daily", type="log")

## Plotting Log Transformed y8 Price
tibble(df2 = y8_log) %>%
  ggplot(aes(zoo::index(TS280), df2)) +
  geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Log Transformed y8 Price",
    subtitle = paste0("Number of observations: ", length(TS280$y8)),
    caption = "source: TSA 2024",
    x="",
    y=""
  )

Plotting returns of the log-transformed y8 prices

tibble(df3 = y8_log_return) %>%
  ggplot(aes(zoo::index(TS280), df3)) +
  geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Log Transformed y8 return",
    subtitle = paste0("Number of observations: ", length(TS280$y8)),
    caption = "source: TSA 2024",
    x="",
    y=""
  )

Data Training_Sample ADF_Test PP_Test BG_Test
y8 01/06/2021~07/03/2022 -1.393 ( 0.832 ) -5.054 ( 0.827 ) 13.472 ( 0.000 )
y8 log 01/06/2021~07/03/2022 -1.320 ( 0.862 ) -4.434 ( 0.862 ) 12.676 ( 0.000 )
y8 log return 01/06/2021~07/03/2022 -5.666 ( 0.010 ) -248.913 ( 0.010 ) 0.014 ( 0.905 )

The result shows that the y8 log return series is stationary.

Plotting ACF and PACF for y8 log return

plot_acf_pacf(y8_log_return)

The ACF and PACF initially suggest an MA process with 4 lags and no AR terms.

5.2 Steps 2 & 3: Estimation and Diagnostics

We estimate all combinations of ARIMA models with p from 0 to 4 and q from 0 to 4, which lets us compare the candidate models by their AIC and BIC values.

y8_models_detail <- data.frame()
for (p in 0:4) {
  for (q in 0:4) {
    if(! (p==0 & q==0)){
    cur_model <- Arima(y8_log_return, order=c(p,1,q))
    y8_models_detail <- model_summary(y8_models_detail, paste("c(",p,",1,",q,")",sep=""), cur_model)
  }
  }
}
y8_models_detail
Model_Order Box_Ljung_P_Value AIC_Value BIC_Value
c(0,1,1) 0.0000000092 -2706.433 -2699.170
c(0,1,2) 0.0000010745 -2713.190 -2702.296
c(0,1,3) 0.0000011382 -2711.869 -2697.344
c(0,1,4) 0.0000005582 -2713.028 -2694.871
c(1,1,0) 0.0000000000 -2633.347 -2626.084
c(1,1,1) 0.0000013150 -2713.903 -2703.009
c(1,1,2) 0.0000021914 -2714.322 -2699.797
c(1,1,3) 0.0000003253 -2718.892 -2700.736
c(1,1,4) 0.0003613828 -2731.881 -2710.094
c(2,1,0) 0.0000000005 -2661.344 -2650.450
c(2,1,1) 0.0000012741 -2712.055 -2697.530
c(2,1,2) 0.0000004917 -2712.182 -2694.026
c(2,1,3) 0.0098301576 -2744.184 -2722.397
c(2,1,4) 0.0077626699 -2742.485 -2717.067
c(3,1,0) 0.0000233862 -2714.681 -2700.156
c(3,1,1) 0.0000538262 -2714.220 -2696.064
c(3,1,2) 0.0000004744 -2711.300 -2689.512
c(3,1,3) 0.0062143801 -2742.702 -2717.283
c(3,1,4) 0.0057056707 -2741.207 -2712.157
c(4,1,0) 0.0000773089 -2714.258 -2696.102
c(4,1,1) 0.0000683140 -2712.308 -2690.521
c(4,1,2) 0.0004914174 -2737.300 -2711.882
c(4,1,3) 0.0768105638 -2748.239 -2719.190
c(4,1,4) 0.1467492582 -2750.963 -2718.283

Find the model with the minimum AIC value

y8_aic_ind <- which.min(y8_models_detail$AIC_Value)
# Extract the row with the minimum value
min_row <- y8_models_detail[y8_aic_ind, ]
print(min_row)
   Model_Order Box_Ljung_P_Value AIC_Value BIC_Value
24    c(4,1,4)      0.1467492582 -2750.963 -2718.283

Find the model with the minimum BIC value

y8_bic_ind <- which.min(y8_models_detail$BIC_Value)
# Extract the row with the minimum value
min_row <- y8_models_detail[y8_bic_ind, ]
Model_Order Box_Ljung_P_Value AIC_Value BIC_Value
13 c(2,1,3) 0.0098301576 -2744.184 -2722.397

Analyzing ARIMA(2,1,3) for y8 log return series

Plotting ACF and PACF of y8 log return residuals

#plot acf and pacf for residuals
plot_acf_pacf(y8_residuals_213)

The ACF plot shows significant spikes at certain lags (e.g. 4, 6 and 10), indicating that the residuals are correlated at those lags and suggesting the presence of autocorrelation.

The PACF plot indicates a direct correlation between the residuals at specific lags (e.g. 3, 4 and 5) after removing the influence of the previous lags.

Performing ADF, PP and BG tests for the y8 ARIMA(2,1,3) residuals

y8_residuals_acf_test_213 <- data.frame()
adf.y8_residuals_213 <- adf.test(y8_residuals_213)  
pp.y8_residuals_213 <- pp.test(y8_residuals_213)
bg.y8_residuals_213 <- bgtest(adfTest(y8_residuals_213, lags = 0, type = "c")@test$lm$residuals~1, order = 1)

y8_residuals_acf_test_213 <- stationary_test_data(y8_residuals_acf_test_213, "Model (2,1,3) residuals", "01/06/2021~07/03/2022", adf.y8_residuals_213, pp.y8_residuals_213, bg.y8_residuals_213)
y8_residuals_acf_test_213
Data Training_Sample ADF_Test PP_Test BG_Test
Model (2,1,3) residuals 01/06/2021~07/03/2022 -6.445 ( 0.010 ) -320.998 ( 0.010 ) 0.000 ( 0.994 )

The ADF and PP tests indicate the residuals are stationary, and the BG test shows no significant autocorrelation in the residuals.

Performing Ljung-Box test for autocorrelation in residuals

# Ljung-Box test for autocorrelation in residuals
Box.test(y8_residuals_213, type = "Ljung-Box", lag = 10)

    Box-Ljung test

data:  y8_residuals_213
X-squared = 23.259, df = 10, p-value = 0.00983
coeftest(y8_model_213)

z test of coefficients:

     Estimate Std. Error  z value        Pr(>|z|)    
ar1 -1.010503   0.091022 -11.1018       < 2.2e-16 ***
ar2 -0.496163   0.075159  -6.6015 0.0000000000407 ***
ma1  0.345178   0.062780   5.4982 0.0000000383680 ***
ma2 -0.423352   0.049341  -8.5802       < 2.2e-16 ***
ma3 -0.838186   0.043293 -19.3608       < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All the parameters are significant, but the Ljung-Box p-value at 10 lags is below 0.05, which suggests there is still significant autocorrelation in the residuals.

Analyzing the next suggested model, ARIMA(4,1,4)

Plotting ACF and PACF of y8 log return residuals

#plot acf and pacf for residuals
plot_acf_pacf(y8_residuals_414)

The ACF looks much better, but a few lags still show autocorrelation in the residuals.

The PACF looks almost the same, still showing direct correlation of the residuals at some lags.

Performing ADF,PP and BG test

y8_residuals_acf_test_414
Data Training_Sample ADF_Test PP_Test BG_Test
Model (4,1,4) residuals 01/06/2021~07/03/2022 -6.740 ( 0.010 ) -295.548 ( 0.010 ) 0.000 ( 0.999 )

The ADF and PP tests indicate the residuals are stationary, and the BG test shows no significant autocorrelation in the residuals.

Performing Ljung-Box test for autocorrelation in residuals

# Ljung-Box test for autocorrelation in residuals
# as https://robjhyndman.com/hyndsight/ljung-box-test/ suitable value for lag is 10
Box.test(y8_residuals_414, type = "Ljung-Box", lag = 10)

    Box-Ljung test

data:  y8_residuals_414
X-squared = 14.615, df = 10, p-value = 0.1467
coeftest(y8_model_414)

z test of coefficients:

     Estimate Std. Error z value           Pr(>|z|)    
ar1 -0.439789   0.141437 -3.1094          0.0018744 ** 
ar2 -0.067526   0.139424 -0.4843          0.6281552    
ar3  0.306262   0.093868  3.2627          0.0011036 ** 
ar4  0.259716   0.067569  3.8437          0.0001212 ***
ma1 -0.267176   0.141293 -1.8909          0.0586318 .  
ma2 -0.433310   0.098828 -4.3845 0.0000116255458050 ***
ma3 -0.605809   0.083460 -7.2587 0.0000000000003908 ***
ma4  0.306296   0.122201  2.5065          0.0121934 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Not all parameters are significant, but the Ljung-Box p-value at 10 lags is above 0.05, which suggests there is no significant autocorrelation in the residuals.

As ar2 is not significant and ma1 is only marginally significant, we can try removing them from the model.

Plotting ACF/PACF after adjusting the model

# fixed: ar2 and ma1 restricted to 0; ar1, ar3, ar4, ma2, ma3 and ma4 estimated
y8_model_414_modified <- Arima(y8_log_return, order = c(4,1,4),
                      fixed = c(NA, 0, NA, NA, 0, NA, NA, NA))

y8_residuals_414_modified = resid(y8_model_414_modified)
#plot acf and pacf for residuals
plot_acf_pacf(y8_residuals_414_modified)

Here we can see no improvement in either graph.

Performing Ljung-Box test for autocorrelation in residuals for adjusted model

Box.test(y8_residuals_414_modified, type = "Ljung-Box", lag = 10)

    Box-Ljung test

data:  y8_residuals_414_modified
X-squared = 19.602, df = 10, p-value = 0.03325
coeftest(y8_model_414_modified)

z test of coefficients:

     Estimate Std. Error  z value    Pr(>|z|)    
ar1 -0.641702   0.057928 -11.0776   < 2.2e-16 ***
ar3  0.339821   0.072116   4.7121 0.000002451 ***
ar4  0.173336   0.071294   2.4313     0.01504 *  
ma2 -0.563643   0.053600 -10.5157   < 2.2e-16 ***
ma3 -0.691093   0.042166 -16.3900   < 2.2e-16 ***
ma4  0.266736   0.059032   4.5185 0.000006229 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Ljung-Box p-value for the adjusted model is 0.03325 < .05, so we reject the null that the
# residuals are not autocorrelated; the unrestricted ARIMA(4,1,4) had a p-value of 0.1467 > .05,
# so its residuals can be treated as white noise and that model remains the accepted one.

The p-value < .05 suggests there is significant autocorrelation in the residuals of the adjusted model.

So, out of the three models discussed, we select the unrestricted ARIMA(4,1,4).

5.3 Forecasting Results

We are forecasting for 20 days ahead

y8_model_414.forecast <- forecast(y8_model_414, h = 20)

Converting the log-return forecasts into predicted prices

last_observed_price <- tail(TS280$y8,1) 
# Convert log returns to actual price forecasts
forecasted_prices <- numeric(length(y8_model_414.forecast$mean))

forecasted_prices[1] <- last_observed_price * 
  exp(y8_model_414.forecast$mean[1])

for (i in 2:length(forecasted_prices)) {
  forecasted_prices[i] <- forecasted_prices[i-1] * exp(y8_model_414.forecast$mean[i])
}

y8_arima_forecast <- xts(forecasted_prices, tail(index(TS2), 20))

names(y8_arima_forecast) <- c("y8_arima_forecast")

TS <- merge(TS,y8_arima_forecast)
plot(TS[index(TS) > as.Date("2022-02-28"), c("y8", "y8_arima_forecast")], 
  main = "20 days forecast of y8 prices via ARIMA",
  col = c("black", "blue"))

5.4 Forecasting Errors

We calculate the ex-post forecast errors for the y8 ARIMA forecasts over the out-of-sample period (the y4 errors were computed in Section 4.4).
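The call producing the row below is not echoed; it presumably mirrors the y4 one from Section 4.4:

errors_matrix <- calculate_errors(errors_matrix, TS$y8, TS$y8_arima_forecast, "y8_arima")
errors_matrix["y8_arima", ]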

MAE MSE MAPE AMAPE
y8_arima 1.505458 2.798017 0.0102116 0.0051174

6 VECM vs ARIMA forecasts

The data frame used to capture all the respective forecast errors (the error matrix) is shown below.

MAE MSE MAPE AMAPE
y4_vecm 0.9862111 1.3990977 0.0056067 0.0028136
y8_vecm 1.9383977 5.2300808 0.0130852 0.0065978
y4_arima 0.7744276 0.7865475 0.0044117 0.0022082
y8_arima 1.5054579 2.7980167 0.0102116 0.0051174

As a multi-column bar chart it looks like this (the reshaping of the error matrix into long format is sketched below):
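df_long is prepared in a hidden chunk; reshaping errors_matrix into long format, for example with tidyr (the column names Index, Variable and Value are taken from the ggplot call below), could be done as:

# Long format of the error matrix for plotting: one row per (model, metric) pair
df_long <- errors_matrix %>%
  tibble::rownames_to_column("Index") %>%
  tidyr::pivot_longer(cols = -Index, names_to = "Variable", values_to = "Value")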

ggplot(df_long, aes(x = Value, y = Index, fill = Variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Multi-Column Plot with Models as Columns", x = "Errors", y = "Models") +
  theme_minimal()

  • MAE: The ARIMA forecast of y4 shows the smaller absolute error on average. This looks very attractive, but it can partly be explained by the shape of the out-of-sample forecast: roughly half of the ARIMA forecast points lie below the actual values and the rest above, so the two groups offset each other, whereas the VECM forecast for y4 tracks the individual data points more closely and therefore still looks promising. The same explanation applies when comparing the smaller value of y8 ARIMA (1.5054579) with y8 VECM (1.9383977).

  • MSE: Squaring gives more weight to larger errors, so the picture described for the MAE is confirmed here. y4 via ARIMA (0.7865475) is lower than y4 via VECM (1.3990977), and the same applies to the smaller value of y8 via ARIMA (2.7980167) compared with y8 via VECM (5.2300808).

  • MAPE: Lower values signify a smaller percentage error on average. Ideally it would be smaller for y4 (VECM) than for y4 (ARIMA), but in our case it is not (y4 ARIMA 0.0044117 < y4 VECM 0.0056067). The most likely reason is that we forecast 20 points ahead, which is larger than the sum of the ARIMA orders p + q (4 + 3 = 7); forecasting fewer than 7 points ahead, and re-estimating the model after extending the in-sample data with past forecast points, should work better. The same explanation applies to y8 (ARIMA) vs y8 (VECM).

  • AMAPE: A lower value implies the measure adjusts well for the bias in MAPE, particularly when actual and forecast values are far apart. In our case y4 (ARIMA) 0.0022082 is lower than y4 (VECM) 0.0028136, and the same holds for y8 (ARIMA) 0.0051174 < y8 (VECM) 0.0065978.