Time Series 2024 Project Topic 1
1 Importing the data
First, let's load the necessary libraries:
library(xts)
library(lmtest)
library(tidyverse)
library(vars)
library(quantmod)
library(tseries)
library(forecast)
Let's also load the additional functions:
source("../code/functions/testdf.R")
source("../code/functions/utils.R")
source("../code/functions/testdfModified.R")
We will work with data on the prices of ten financial instruments: y1, y2, y3, y4, y5, y6, y7, y8, y9, y10.
Now, we have to import the data:
# Start date 01/06/21
# End date 27/03/22
# total dataset 300 rows
# Training set from 01/06/21 to 07/03/22, 280 rows
# Forecasting set from 08/03/22 to 27/03/22, 20 rows
TS2 <- read.csv("../data/TSA_2024_project_data_1.csv")
The structure of the data:
TS2 %>% glimpse()
Rows: 300
Columns: 11
$ date <chr> "2021-06-01", "2021-06-02", "2021-06-03", "2021-06-04", "2021-06-…
$ y1 <dbl> 176.04, 173.37, 173.42, 178.85, 175.39, 173.89, 167.80, 162.47, 1…
$ y2 <dbl> 173.57, 173.57, 173.57, 174.52, 174.46, 174.54, 174.57, 175.37, 1…
$ y3 <dbl> 173.57, 173.57, 175.70, 177.89, 181.10, 184.44, 187.00, 188.86, 1…
$ y4 <dbl> 188.66, 188.46, 188.57, 188.78, 188.11, 187.61, 188.22, 188.68, 1…
$ y5 <dbl> 183.66, 183.52, 183.75, 182.79, 185.46, 190.18, 196.22, 201.14, 2…
$ y6 <dbl> 178.78, 181.10, 187.25, 190.82, 187.30, 195.26, 196.95, 207.64, 2…
$ y7 <dbl> 168.46, 168.41, 168.47, 170.08, 170.15, 170.33, 170.32, 171.42, 1…
$ y8 <dbl> 173.57, 173.57, 173.57, 173.57, 172.94, 171.78, 173.37, 174.22, 1…
$ y9 <dbl> 173.57, 173.57, 173.57, 173.18, 174.50, 176.99, 179.91, 182.35, 1…
$ y10 <dbl> 188.35, 188.37, 189.36, 190.46, 192.73, 194.14, 195.44, 196.21, 1…
head(TS2)
| date | y1 | y2 | y3 | y4 | y5 | y6 | y7 | y8 | y9 | y10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2021-06-01 | 176.04 | 173.57 | 173.57 | 188.66 | 183.66 | 178.78 | 168.46 | 173.57 | 173.57 | 188.35 |
| 2021-06-02 | 173.37 | 173.57 | 173.57 | 188.46 | 183.52 | 181.10 | 168.41 | 173.57 | 173.57 | 188.37 |
| 2021-06-03 | 173.42 | 173.57 | 175.70 | 188.57 | 183.75 | 187.25 | 168.47 | 173.57 | 173.57 | 189.36 |
| 2021-06-04 | 178.85 | 174.52 | 177.89 | 188.78 | 182.79 | 190.82 | 170.08 | 173.57 | 173.18 | 190.46 |
| 2021-06-05 | 175.39 | 174.46 | 181.10 | 188.11 | 185.46 | 187.30 | 170.15 | 172.94 | 174.50 | 192.73 |
| 2021-06-06 | 173.89 | 174.54 | 184.44 | 187.61 | 190.18 | 195.26 | 170.33 | 171.78 | 176.99 | 194.14 |
tail(TS2)
| | date | y1 | y2 | y3 | y4 | y5 | y6 | y7 | y8 | y9 | y10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 295 | 2022-03-22 | 268.68 | 196.26 | 289.20 | 174.51 | 36.93 | 97.94 | 204.83 | 145.53 | 100.31 | 246.30 |
| 296 | 2022-03-23 | 266.55 | 196.34 | 288.83 | 174.58 | 36.87 | 87.23 | 205.03 | 145.76 | 100.22 | 245.85 |
| 297 | 2022-03-24 | 266.38 | 196.25 | 287.98 | 174.18 | 35.33 | 78.05 | 205.08 | 144.81 | 99.39 | 245.64 |
| 298 | 2022-03-25 | 272.44 | 197.49 | 287.54 | 174.47 | 34.93 | 85.77 | 206.84 | 144.90 | 99.22 | 245.77 |
| 299 | 2022-03-26 | 267.64 | 198.03 | 286.01 | 173.46 | 35.01 | 95.43 | 208.08 | 143.75 | 99.22 | 244.87 |
| 300 | 2022-03-27 | 264.95 | 198.61 | 285.43 | 174.40 | 33.51 | 95.25 | 208.76 | 145.46 | 98.49 | 244.08 |
We have to correct the type of the date variable:
TS2$date <- as.Date(TS2$date, format = "%Y-%m-%d")
TS2 %>% glimpse()
Rows: 300
Columns: 11
$ date <date> 2021-06-01, 2021-06-02, 2021-06-03, 2021-06-04, 2021-06-05, 2021…
$ y1 <dbl> 176.04, 173.37, 173.42, 178.85, 175.39, 173.89, 167.80, 162.47, 1…
$ y2 <dbl> 173.57, 173.57, 173.57, 174.52, 174.46, 174.54, 174.57, 175.37, 1…
$ y3 <dbl> 173.57, 173.57, 175.70, 177.89, 181.10, 184.44, 187.00, 188.86, 1…
$ y4 <dbl> 188.66, 188.46, 188.57, 188.78, 188.11, 187.61, 188.22, 188.68, 1…
$ y5 <dbl> 183.66, 183.52, 183.75, 182.79, 185.46, 190.18, 196.22, 201.14, 2…
$ y6 <dbl> 178.78, 181.10, 187.25, 190.82, 187.30, 195.26, 196.95, 207.64, 2…
$ y7 <dbl> 168.46, 168.41, 168.47, 170.08, 170.15, 170.33, 170.32, 171.42, 1…
$ y8 <dbl> 173.57, 173.57, 173.57, 173.57, 172.94, 171.78, 173.37, 174.22, 1…
$ y9 <dbl> 173.57, 173.57, 173.57, 173.18, 174.50, 176.99, 179.91, 182.35, 1…
$ y10 <dbl> 188.35, 188.37, 189.36, 190.46, 192.73, 194.14, 195.44, 196.21, 1…
Let’s also transform the data.frame into an xts object
TS2 <- xts(TS2[, -1], order.by = TS2$date)
TS280 <- TS2[index(TS2) < as.Date("2022-03-08"), ]
We will work with the 10 variables y1, y2, ..., y10. Let's create their first differences.
TS280$dy1 <- diff.xts(TS280$y1)
TS280$dy2 <- diff.xts(TS280$y2)
TS280$dy3 <- diff.xts(TS280$y3)
TS280$dy4 <- diff.xts(TS280$y4)
TS280$dy5 <- diff.xts(TS280$y5)
TS280$dy6 <- diff.xts(TS280$y6)
TS280$dy7 <- diff.xts(TS280$y7)
TS280$dy8 <- diff.xts(TS280$y8)
TS280$dy9 <- diff.xts(TS280$y9)
TS280$dy10 <- diff.xts(TS280$y10)
head(TS280)
 y1 y2 y3 y4 y5 y6 y7 y8 y9
2021-06-01 176.04 173.57 173.57 188.66 183.66 178.78 168.46 173.57 173.57
2021-06-02 173.37 173.57 173.57 188.46 183.52 181.10 168.41 173.57 173.57
2021-06-03 173.42 173.57 175.70 188.57 183.75 187.25 168.47 173.57 173.57
2021-06-04 178.85 174.52 177.89 188.78 182.79 190.82 170.08 173.57 173.18
2021-06-05 175.39 174.46 181.10 188.11 185.46 187.30 170.15 172.94 174.50
2021-06-06 173.89 174.54 184.44 187.61 190.18 195.26 170.33 171.78 176.99
y10 dy1 dy2 dy3 dy4 dy5 dy6 dy7 dy8 dy9 dy10
2021-06-01 188.35 NA NA NA NA NA NA NA NA NA NA
2021-06-02 188.37 -2.67 0.00 0.00 -0.20 -0.14 2.32 -0.05 0.00 0.00 0.02
2021-06-03 189.36 0.05 0.00 2.13 0.11 0.23 6.15 0.06 0.00 0.00 0.99
2021-06-04 190.46 5.43 0.95 2.19 0.21 -0.96 3.57 1.61 0.00 -0.39 1.10
2021-06-05 192.73 -3.46 -0.06 3.21 -0.67 2.67 -3.52 0.07 -0.63 1.32 2.27
2021-06-06 194.14 -1.50 0.08 3.34 -0.50 4.72 7.96 0.18 -1.16 2.49 1.41
Next, we plot all ten variables on one graph:
plot(TS280[, 1:10],
col = c("black", "blue", "green", "yellow", "brown", "red", "orange", "khaki", "skyblue", "purple"),
major.ticks = "years",
grid.ticks.on = "years",
grid.ticks.lty = 3,
main = "main",
legend.loc = "topleft")2 Finding the cointegrating pair
2.1 Testing integration order
In this step, we perform the ADF test on each variable and on its first difference. All variables fail to reject the null hypothesis of non-stationarity, while their first differences reject it, so each series appears to be integrated of order one, I(1).
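For illustration, the check for a single series can be reproduced with standard functions from the tseries package (a minimal sketch; the tables below were produced with the testdf() helper sourced earlier, whose output also reports the Breusch-Godfrey statistics):
adf.test(as.numeric(TS280$y1))            # ADF test on the level of y1 (H0: non-stationary)
adf.test(as.numeric(na.omit(TS280$dy1)))  # ADF test on the first difference of y1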
- Let’s apply the ADF test for the y1 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -1.612473 | 0.4564815 | 4.5254137 | 0.0333950 |
| 1 | -1.810867 | 0.3826206 | 0.0010027 | 0.9747394 |
| 2 | -1.788709 | 0.3908699 | 0.0013712 | 0.9704610 |
| 3 | -1.543457 | 0.4821758 | 0.0416617 | 0.8382660 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -14.690558 | 0.01 | 0.0027080 | 0.9584978 |
| 1 | -11.220359 | 0.01 | 0.0044057 | 0.9470789 |
| 2 | -10.819399 | 0.01 | 0.0392761 | 0.8429027 |
| 3 | -8.547158 | 0.01 | 0.0001623 | 0.9898360 |
- Let’s apply the ADF test for the y2 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | 0.8659506 | 0.9900000 | 119.0880563 | 0.0000000 |
| 1 | -0.5511737 | 0.8515967 | 11.5634066 | 0.0006726 |
| 2 | -1.1720047 | 0.6204651 | 0.0238451 | 0.8772798 |
| 3 | -1.1295237 | 0.6362805 | 0.0010299 | 0.9743985 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -7.562637 | 0.01 | 11.1625768 | 0.0008346 |
| 1 | -4.988917 | 0.01 | 0.0410216 | 0.8394962 |
| 2 | -4.956121 | 0.01 | 0.0007462 | 0.9782074 |
| 3 | -4.773155 | 0.01 | 0.0000002 | 0.9996025 |
- Let’s apply the ADF test for the y3 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -0.2015725 | 0.9304983 | 209.3657823 | 0.0000000 |
| 1 | -1.9384662 | 0.3351162 | 9.9240848 | 0.0016313 |
| 2 | -1.3632683 | 0.5492588 | 0.0017696 | 0.9664456 |
| 3 | -1.3354933 | 0.5595993 | 0.0035390 | 0.9525620 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -4.123160 | 0.01 | 10.8047950 | 0.0010124 |
| 1 | -5.094278 | 0.01 | 0.0075211 | 0.9308907 |
| 2 | -4.959386 | 0.01 | 0.0046342 | 0.9457261 |
| 3 | -4.144990 | 0.01 | 0.0120286 | 0.9126670 |
- Let’s apply the ADF test for the y4 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -1.447157 | 0.5180277 | 7.6607276 | 0.0056436 |
| 1 | -1.474528 | 0.5078375 | 0.0021985 | 0.9626025 |
| 2 | -1.503331 | 0.4971142 | 0.0005155 | 0.9818865 |
| 3 | -1.551921 | 0.4790246 | 0.0295377 | 0.8635434 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -14.021324 | 0.01 | 0.0015227 | 0.9688733 |
| 1 | -10.510680 | 0.01 | 0.0002192 | 0.9881867 |
| 2 | -8.495650 | 0.01 | 0.0257672 | 0.8724702 |
| 3 | -5.710124 | 0.01 | 0.4717194 | 0.4921972 |
- Let’s apply the ADF test for the y5 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -0.4634981 | 0.8842378 | 204.6114803 | 0.0000000 |
| 1 | -2.0778473 | 0.2832255 | 27.7168181 | 0.0000001 |
| 2 | -1.3675042 | 0.5476818 | 1.1042529 | 0.2933347 |
| 3 | -1.6141238 | 0.4558669 | 0.0454257 | 0.8312232 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -4.568147 | 0.01 | 29.1327008 | 0.0000001 |
| 1 | -6.551754 | 0.01 | 1.0459685 | 0.3064374 |
| 2 | -5.150499 | 0.01 | 0.0568243 | 0.8115875 |
| 3 | -5.215452 | 0.01 | 0.0005999 | 0.9804591 |
- Let’s apply the ADF test for the y6 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -1.541854 | 0.4827726 | 0.3768067 | 0.5393172 |
| 1 | -1.381901 | 0.5423220 | 0.0000028 | 0.9986577 |
| 2 | -1.368216 | 0.5474167 | 0.0008816 | 0.9763125 |
| 3 | -1.347570 | 0.5551033 | 0.0001197 | 0.9912712 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -17.535836 | 0.01 | 0.0001988 | 0.9887502 |
| 1 | -11.932475 | 0.01 | 0.0009958 | 0.9748254 |
| 2 | -9.689908 | 0.01 | 0.0002580 | 0.9871838 |
| 3 | -8.193554 | 0.01 | 0.0006183 | 0.9801619 |
- Let’s apply the ADF test for the y7 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | 0.8268819 | 0.9900000 | 114.3208399 | 0.0000000 |
| 1 | -0.5611149 | 0.8478956 | 12.2393024 | 0.0004679 |
| 2 | -1.2165169 | 0.6038935 | 0.0135719 | 0.9072575 |
| 3 | -1.1935022 | 0.6124617 | 0.0005247 | 0.9817255 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -7.750137 | 0.01 | 11.8212132 | 0.0005856 |
| 1 | -5.008263 | 0.01 | 0.0289482 | 0.8648986 |
| 2 | -4.937986 | 0.01 | 0.0003131 | 0.9858821 |
| 3 | -4.811167 | 0.01 | 0.0001403 | 0.9905510 |
- Let’s apply the ADF test for the y8 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -1.431945 | 0.5236908 | 13.4724760 | 0.0002421 |
| 1 | -1.488012 | 0.5028175 | 0.0002090 | 0.9884658 |
| 2 | -1.491126 | 0.5016582 | 0.0000157 | 0.9968406 |
| 3 | -1.534106 | 0.4856570 | 0.1528621 | 0.6958151 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -13.270918 | 0.01 | 0.0005735 | 0.9808937 |
| 1 | -10.396324 | 0.01 | 0.0000715 | 0.9932540 |
| 2 | -8.099815 | 0.01 | 0.1402874 | 0.7079955 |
| 3 | -5.386180 | 0.01 | 0.9022017 | 0.3421920 |
- Let’s apply the ADF test for the y9 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -0.4605766 | 0.8853255 | 205.2272260 | 0.0000000 |
| 1 | -2.0872618 | 0.2797205 | 29.9385175 | 0.0000000 |
| 2 | -1.3408348 | 0.5576107 | 1.3903570 | 0.2383448 |
| 3 | -1.6236822 | 0.4523083 | 0.0413206 | 0.8389203 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -4.541503 | 0.01 | 31.3781304 | 0.0000000 |
| 1 | -6.622375 | 0.01 | 1.3276430 | 0.2492250 |
| 2 | -5.115429 | 0.01 | 0.0532809 | 0.8174495 |
| 3 | -5.214835 | 0.01 | 0.0006054 | 0.9803698 |
- Let’s apply the ADF test for the y10 variable and its first differences.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -0.2448094 | 0.9244931 | 159.3485111 | 0.0000000 |
| 1 | -1.3686879 | 0.5472412 | 6.6095975 | 0.0101431 |
| 2 | -1.5941724 | 0.4632947 | 0.2248960 | 0.6353345 |
| 3 | -1.2816226 | 0.5796550 | 0.0401467 | 0.8411941 |
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -5.967757 | 0.01 | 6.0503095 | 0.0139039 |
| 1 | -4.507198 | 0.01 | 0.2194413 | 0.6394659 |
| 2 | -5.115771 | 0.01 | 0.0321375 | 0.8577262 |
| 3 | -4.457715 | 0.01 | 0.0015244 | 0.9688561 |
2.2 Finding cointegrated pair
We follow the Engle-Granger two-step method (all time series are integrated of the same order, as established in the previous step):
- Step 1: Estimate the cointegrating vector using a standard OLS regression \(y_t = \beta_0 + \beta_1 x_t + u_t\).
- Step 2: Test whether the residuals \(u_t\) obtained in the previous step are stationary.
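As an illustration, the two steps for a single pair (here y1 and y2) can be sketched as follows, using the same lm() and tseries::adf.test() calls as elsewhere in this report; the full table below results from looping this logic over all 45 pairs:
eg_step1 <- lm(y1 ~ y2, data = TS280)   # Step 1: cointegrating regression of y1 on y2
adf.test(residuals(eg_step1))           # Step 2: ADF test on the Step-1 residuals
coef(eg_step1)                          # intercept and slope reported in the cointegration_vector column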
| pair | p_adf | result | cointegration_vector |
|---|---|---|---|
| (y1, y2) | 0.33939 | Not Cointegrated | |
| (y1, y3) | 0.26743 | Not Cointegrated | |
| (y1, y4) | 0.06483 | Not Cointegrated | |
| (y1, y5) | 0.10939 | Not Cointegrated | |
| (y1, y6) | 0.20572 | Not Cointegrated | |
| (y1, y7) | 0.33993 | Not Cointegrated | |
| (y1, y8) | 0.12226 | Not Cointegrated | |
| (y1, y9) | 0.10933 | Not Cointegrated | |
| (y1, y10) | 0.26898 | Not Cointegrated | |
| (y2, y3) | 0.72717 | Not Cointegrated | |
| (y2, y4) | 0.94807 | Not Cointegrated | |
| (y2, y5) | 0.97572 | Not Cointegrated | |
| (y2, y6) | 0.37713 | Not Cointegrated | |
| (y2, y7) | 0.01000 | Cointegrated | [1, 68.1851311723423, 0.625236077108806] |
| (y2, y8) | 0.95193 | Not Cointegrated | |
| (y2, y9) | 0.97568 | Not Cointegrated | |
| (y2, y10) | 0.72547 | Not Cointegrated | |
| (y3, y4) | 0.92518 | Not Cointegrated | |
| (y3, y5) | 0.63933 | Not Cointegrated | |
| (y3, y6) | 0.04209 | Cointegrated | [1, 458.357113699047, -1.09976516009988] |
| (y3, y7) | 0.24327 | Not Cointegrated | |
| (y3, y8) | 0.93852 | Not Cointegrated | |
| (y3, y9) | 0.63780 | Not Cointegrated | |
| (y3, y10) | 0.01000 | Cointegrated | [1, -203.015788110552, 1.99754785215493] |
| (y4, y5) | 0.37860 | Not Cointegrated | |
| (y4, y6) | 0.35362 | Not Cointegrated | |
| (y4, y7) | 0.37412 | Not Cointegrated | |
| (y4, y8) | 0.01000 | Cointegrated | [1, 101.667786218057, 0.500832238386179] |
| (y4, y9) | 0.37904 | Not Cointegrated | |
| (y4, y10) | 0.68387 | Not Cointegrated | |
| (y5, y6) | 0.36053 | Not Cointegrated | |
| (y5, y7) | 0.71763 | Not Cointegrated | |
| (y5, y8) | 0.58779 | Not Cointegrated | |
| (y5, y9) | 0.01000 | Cointegrated | [1, -163.627642701986, 2.00036509575433] |
| (y5, y10) | 0.60186 | Not Cointegrated | |
| (y6, y7) | 0.09845 | Not Cointegrated | |
| (y6, y8) | 0.33174 | Not Cointegrated | |
| (y6, y9) | 0.19019 | Not Cointegrated | |
| (y6, y10) | 0.02405 | Cointegrated | [1, 374.824324478322, -0.789528588779833] |
| (y7, y8) | 0.94874 | Not Cointegrated | |
| (y7, y9) | 0.97387 | Not Cointegrated | |
| (y7, y10) | 0.71908 | Not Cointegrated | |
| (y8, y9) | 0.40423 | Not Cointegrated | |
| (y8, y10) | 0.72315 | Not Cointegrated | |
| (y9, y10) | 0.60126 | Not Cointegrated |
The cointegrating pairs we found are listed below, together with the AIC and BIC of each cointegrating regression:
| pair | p_adf | aic | bic |
|---|---|---|---|
| (y2, y7) | 0.01000 | -375.6940 | -364.78962 |
| (y3, y6) | 0.04209 | 2752.2384 | 2763.14272 |
| (y3, y10) | 0.01000 | 288.0713 | 298.97563 |
| (y4, y8) | 0.01000 | -110.0208 | -99.11645 |
| (y5, y9) | 0.01000 | -461.1635 | -450.25918 |
| (y6, y10) | 0.02405 | 2465.8013 | 2476.70562 |
We selected the pair (y4, y8) to move forward.
2.3 Let's examine the selected model summary:
model.coint <- lm(y4 ~ y8, data = TS280)
summary(model.coint)
Call:
lm(formula = y4 ~ y8, data = TS280)
Residuals:
Min 1Q Median 3Q Max
-0.47565 -0.14376 0.00293 0.13099 0.50487
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 101.6677862 0.1159222 877.0 <2e-16 ***
y8 0.5008322 0.0007632 656.2 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1974 on 278 degrees of freedom
Multiple R-squared: 0.9994, Adjusted R-squared: 0.9994
F-statistic: 4.306e+05 on 1 and 278 DF, p-value: < 2.2e-16
We can observe that y8 is highly significant in the OLS regression of Engle-Granger's first step: if y8 increases by 1 unit, y4 increases by about 0.5008 units in the long run.
Next, we have to test the stationarity of the residuals. We find that the residuals of the model (y4 regressed on y8) are stationary, as the null hypothesis of non-stationarity is rejected.
| augmentations | adf | p_adf | bgodfrey | p_bg |
|---|---|---|---|---|
| 0 | -14.78343 | 0.01 | 0.0045200 | 0.9463977 |
| 1 | -11.29182 | 0.01 | 0.0053064 | 0.9419294 |
| 2 | -10.88541 | 0.01 | 0.0152850 | 0.9016062 |
| 3 | -8.64154 | 0.01 | 0.0054240 | 0.9412908 |
The ADF test with no augmentations can be used (no residual autocorrelation); its result is that non-stationarity of the residuals is STRONGLY REJECTED, so the residuals are stationary, which means that y4 and y8 are cointegrated.
3 Estimating VECM
3.1 Johansen cointegration test
We have already performed a univariate cointegration test of Engle and Granger. Our conclusions are that y4 and y8 variables are cointegrated of order ~CI(1,1).
As an alternative, we will now perform a multivariate test of Johansen. We will assume the K=6 lag structure.
Johansen Trace test
johan.test.trace <-
ca.jo(TS280[, c("y4", "y8")], # data
ecdet = "const", # "none" for no intercept in cointegrating equation,
# "const" for constant term in cointegrating equation and
# "trend" for trend variable in cointegrating equation
type = "trace", # type of the test: trace or eigen
K = 6, # lag order of the series (levels) in the VAR
)
summary(johan.test.trace)
######################
# Johansen-Procedure #
######################
Test type: trace statistic , without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 1.554181e-01 1.037403e-02 3.122502e-17
Values of teststatistic and critical values of test:
test 10pct 5pct 1pct
r <= 1 | 2.86 7.52 9.24 12.97
r = 0 | 49.14 17.85 19.96 24.60
Eigenvectors, normalised to first column:
(These are the cointegration relations)
y4.l6 y8.l6 constant
y4.l6 1.0000000 1.0000000 1.0000000
y8.l6 -0.5004317 -0.5965232 -0.6146598
constant -101.7277663 -88.0195945 -81.1869793
Weights W:
(This is the loading matrix)
y4.l6 y8.l6 constant
y4.d -2.133772 0.03181788 -0.0000000000003024740
y8.d -2.378166 0.07274095 -0.0000000000003385324
Let’s find interpretation of the test results:
cbind(summary(johan.test.trace)@teststat, summary(johan.test.trace)@cval)
 10pct 5pct 1pct
r <= 1 | 2.857331 7.52 9.24 12.97
r = 0 | 49.139638 17.85 19.96 24.60
Let’s recall that if the test statistic is SMALLER than the critical value, we CANNOT reject the null. If the test statistic is LARGER than the critical value, we REJECT the null.
We start with testing the hypothesis that r = 0. The test statistic is greater than the critical value, hence the null is rejected (at the 5% level).
Next, we test that r <= 1. In this case, the test statistic is lower than the critical value, hence the null is NOT rejected.
We can say there is 1 cointegrating vector for the pair (y4, y8).
The cointegrating vector is the first eigenvector (the first column):
summary(johan.test.trace)@V
 y4.l6 y8.l6 constant
y4.l6 1.0000000 1.0000000 1.0000000
y8.l6 -0.5004317 -0.5965232 -0.6146598
constant -101.7277663 -88.0195945 -81.1869793
Weights W:
summary(johan.test.trace)@W
 y4.l6 y8.l6 constant
y4.d -2.133772 0.03181788 -0.0000000000003024740
y8.d -2.378166 0.07274095 -0.0000000000003385324
Let’s apply the alternative variant of the test: Maximal Eigen Value Test
johan.test.eigen <-
ca.jo(TS280[, c("y4", "y8")], # data
ecdet = "const", # "none" for no intercept in cointegrating equation,
# "const" for constant term in cointegrating equation and
# "trend" for trend variable in cointegrating equation
type = "eigen", # type of the test: trace or eigen
K = 6, # lag order of the series (levels) in the VAR
)
summary(johan.test.eigen)
######################
# Johansen-Procedure #
######################
Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 1.554181e-01 1.037403e-02 3.122502e-17
Values of teststatistic and critical values of test:
test 10pct 5pct 1pct
r <= 1 | 2.86 7.52 9.24 12.97
r = 0 | 46.28 13.75 15.67 20.20
Eigenvectors, normalised to first column:
(These are the cointegration relations)
y4.l6 y8.l6 constant
y4.l6 1.0000000 1.0000000 1.0000000
y8.l6 -0.5004317 -0.5965232 -0.6146598
constant -101.7277663 -88.0195945 -81.1869793
Weights W:
(This is the loading matrix)
y4.l6 y8.l6 constant
y4.d -2.133772 0.03181788 -0.0000000000003024740
y8.d -2.378166 0.07274095 -0.0000000000003385324
The conclusions are the same: 1 cointegrating vector, as in the trace test. The variant of the Johansen test has no impact on the parameters of the cointegrating vector.
3.2 The VECM model
The trace and maximal eigenvalue tests give the same cointegrating vector, so either can be used to estimate the VECM.
Estimation of VECM Model
TS280.vec4 <- cajorls(johan.test.eigen, # defined specification
r = 1) # number of cointegrating vectors
TS280.vec4 %>% head(3)
$rlm
Call:
lm(formula = substitute(form1), data = data.mat)
Coefficients:
y4.d y8.d
ect1 -2.1338 -2.3782
y4.dl1 -1.0154 -0.2652
y8.dl1 0.6428 0.3681
y4.dl2 -1.1390 -0.5001
y8.dl2 0.5728 0.2340
y4.dl3 -1.4837 -0.8777
y8.dl3 0.7555 0.4572
y4.dl4 -1.5764 -1.1756
y8.dl4 0.9628 0.9257
y4.dl5 -1.8125 -1.6470
y8.dl5 0.8059 0.6391
$beta
ect1
y4.l6 1.0000000
y8.l6 -0.5004317
constant -101.7277663
summary(TS280.vec4)
 Length Class Mode
rlm 12 mlm list
beta 3 -none- numeric
To print results we may refer to the element called $rlm and produce its summary with significance tests.
str(TS280.vec4)
List of 2
$ rlm :List of 12
..$ coefficients : num [1:11, 1:2] -2.134 -1.015 0.643 -1.139 0.573 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
.. .. ..$ : chr [1:2] "y4.d" "y8.d"
..$ residuals : num [1:274, 1:2] 0.722 -0.108 -0.767 0.313 -0.363 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:274] "2021-06-01" "2021-06-02" "2021-06-03" "2021-06-04" ...
.. .. ..$ : chr [1:2] "y4.d" "y8.d"
..$ effects : num [1:274, 1:2] 1.314 -1.983 -2.856 0.157 0.46 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:274] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
.. .. ..$ : chr [1:2] "y4.d" "y8.d"
..$ rank : int 11
..$ fitted.values: num [1:274, 1:2] -0.1124 0.56786 0.26744 -0.40324 -0.00748 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:274] "2021-06-01" "2021-06-02" "2021-06-03" "2021-06-04" ...
.. .. ..$ : chr [1:2] "y4.d" "y8.d"
..$ assign : int [1:11] 1 2 3 4 5 6 7 8 9 10 ...
..$ qr :List of 5
.. ..$ qr : num [1:274, 1:11] -3.26651 -0.03909 -0.00542 0.05887 -0.04972 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:274] "2021-06-01" "2021-06-02" "2021-06-03" "2021-06-04" ...
.. .. .. ..$ : chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
.. .. ..- attr(*, "assign")= int [1:11] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ qraux: num [1:11] 1.02 1.04 1 1.07 1.15 ...
.. ..$ pivot: int [1:11] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ tol : num 0.0000001
.. ..$ rank : int 11
.. ..- attr(*, "class")= chr "qr"
..$ df.residual : int 263
..$ xlevels : Named list()
..$ call : language lm(formula = substitute(form1), data = data.mat)
..$ terms :Classes 'terms', 'formula' language z@Z0 ~ ect1 + y4.dl1 + y8.dl1 + y4.dl2 + y8.dl2 + y4.dl3 + y8.dl3 + y4.dl4 + y8.dl4 + y4.dl5 + y8.dl5 - 1
.. .. ..- attr(*, "variables")= language list(z@Z0, ect1, y4.dl1, y8.dl1, y4.dl2, y8.dl2, y4.dl3, y8.dl3, y4.dl4, y8.dl4, y4.dl5, y8.dl5)
.. .. ..- attr(*, "factors")= int [1:12, 1:11] 0 1 0 0 0 0 0 0 0 0 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:12] "z@Z0" "ect1" "y4.dl1" "y8.dl1" ...
.. .. .. .. ..$ : chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
.. .. ..- attr(*, "term.labels")= chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
.. .. ..- attr(*, "order")= int [1:11] 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..- attr(*, "intercept")= int 0
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: 0x7fec381470c8>
.. .. ..- attr(*, "predvars")= language list(z@Z0, ect1, y4.dl1, y8.dl1, y4.dl2, y8.dl2, y4.dl3, y8.dl3, y4.dl4, y8.dl4, y4.dl5, y8.dl5)
.. .. ..- attr(*, "dataClasses")= Named chr [1:12] "nmatrix.2" "numeric" "numeric" "numeric" ...
.. .. .. ..- attr(*, "names")= chr [1:12] "z@Z0" "ect1" "y4.dl1" "y8.dl1" ...
..$ model :'data.frame': 274 obs. of 12 variables:
.. ..$ z@Z0 : num [1:274, 1:2] 0.61 0.46 -0.5 -0.09 -0.37 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : chr [1:2] "y4.d" "y8.d"
.. ..$ ect1 : num [1:274] 0.0723 -0.1277 -0.0177 0.1923 -0.1624 ...
.. ..$ y4.dl1: num [1:274] -0.5 0.61 0.46 -0.5 -0.09 ...
.. ..$ y8.dl1: num [1:274] -1.16 1.59 0.85 -1.77 -0.78 ...
.. ..$ y4.dl2: num [1:274] -0.67 -0.5 0.61 0.46 -0.5 ...
.. ..$ y8.dl2: num [1:274] -0.63 -1.16 1.59 0.85 -1.77 ...
.. ..$ y4.dl3: num [1:274] 0.21 -0.67 -0.5 0.61 0.46 ...
.. ..$ y8.dl3: num [1:274] 0 -0.63 -1.16 1.59 0.85 ...
.. ..$ y4.dl4: num [1:274] 0.11 0.21 -0.67 -0.5 0.61 ...
.. ..$ y8.dl4: num [1:274] 0 0 -0.63 -1.16 1.59 ...
.. ..$ y4.dl5: num [1:274] -0.2 0.11 0.21 -0.67 -0.5 ...
.. ..$ y8.dl5: num [1:274] 0 0 0 -0.63 -1.16 ...
.. ..- attr(*, "terms")=Classes 'terms', 'formula' language z@Z0 ~ ect1 + y4.dl1 + y8.dl1 + y4.dl2 + y8.dl2 + y4.dl3 + y8.dl3 + y4.dl4 + y8.dl4 + y4.dl5 + y8.dl5 - 1
.. .. .. ..- attr(*, "variables")= language list(z@Z0, ect1, y4.dl1, y8.dl1, y4.dl2, y8.dl2, y4.dl3, y8.dl3, y4.dl4, y8.dl4, y4.dl5, y8.dl5)
.. .. .. ..- attr(*, "factors")= int [1:12, 1:11] 0 1 0 0 0 0 0 0 0 0 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:12] "z@Z0" "ect1" "y4.dl1" "y8.dl1" ...
.. .. .. .. .. ..$ : chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
.. .. .. ..- attr(*, "term.labels")= chr [1:11] "ect1" "y4.dl1" "y8.dl1" "y4.dl2" ...
.. .. .. ..- attr(*, "order")= int [1:11] 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..- attr(*, "intercept")= int 0
.. .. .. ..- attr(*, "response")= int 1
.. .. .. ..- attr(*, ".Environment")=<environment: 0x7fec381470c8>
.. .. .. ..- attr(*, "predvars")= language list(z@Z0, ect1, y4.dl1, y8.dl1, y4.dl2, y8.dl2, y4.dl3, y8.dl3, y4.dl4, y8.dl4, y4.dl5, y8.dl5)
.. .. .. ..- attr(*, "dataClasses")= Named chr [1:12] "nmatrix.2" "numeric" "numeric" "numeric" ...
.. .. .. .. ..- attr(*, "names")= chr [1:12] "z@Z0" "ect1" "y4.dl1" "y8.dl1" ...
..- attr(*, "class")= chr [1:2] "mlm" "lm"
$ beta: num [1:3, 1] 1 -0.5 -101.7
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "y4.l6" "y8.l6" "constant"
.. ..$ : chr "ect1"
summary(TS280.vec4$rlm)
Response y4.d :
Call:
lm(formula = y4.d ~ ect1 + y4.dl1 + y8.dl1 + y4.dl2 + y8.dl2 +
y4.dl3 + y8.dl3 + y4.dl4 + y8.dl4 + y4.dl5 + y8.dl5 - 1,
data = data.mat)
Residuals:
Min 1Q Median 3Q Max
-1.91359 -0.45138 -0.01138 0.38916 1.71334
Coefficients:
Estimate Std. Error t value Pr(>|t|)
ect1 -2.1338 0.5040 -4.234 0.0000317547 ***
y4.dl1 -1.0154 0.2105 -4.823 0.0000023981 ***
y8.dl1 0.6428 0.1087 5.912 0.0000000104 ***
y4.dl2 -1.1390 0.2813 -4.049 0.0000676006 ***
y8.dl2 0.5728 0.1456 3.934 0.000107 ***
y4.dl3 -1.4837 0.3386 -4.381 0.0000170490 ***
y8.dl3 0.7555 0.1746 4.326 0.0000215442 ***
y4.dl4 -1.5764 0.4049 -3.893 0.000125 ***
y8.dl4 0.9628 0.2075 4.640 0.0000054915 ***
y4.dl5 -1.8125 0.4557 -3.978 0.0000900790 ***
y8.dl5 0.8059 0.2345 3.437 0.000684 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6668 on 263 degrees of freedom
Multiple R-squared: 0.2513, Adjusted R-squared: 0.22
F-statistic: 8.025 on 11 and 263 DF, p-value: 0.00000000000472
Response y8.d :
Call:
lm(formula = y8.d ~ ect1 + y4.dl1 + y8.dl1 + y4.dl2 + y8.dl2 +
y4.dl3 + y8.dl3 + y4.dl4 + y8.dl4 + y4.dl5 + y8.dl5 - 1,
data = data.mat)
Residuals:
Min 1Q Median 3Q Max
-3.7887 -0.7525 -0.0852 0.6830 3.4196
Coefficients:
Estimate Std. Error t value Pr(>|t|)
ect1 -2.3782 0.9760 -2.437 0.0155 *
y4.dl1 -0.2652 0.4078 -0.650 0.5161
y8.dl1 0.3681 0.2106 1.748 0.0816 .
y4.dl2 -0.5001 0.5447 -0.918 0.3594
y8.dl2 0.2340 0.2820 0.830 0.4075
y4.dl3 -0.8777 0.6558 -1.338 0.1820
y8.dl3 0.4572 0.3382 1.352 0.1775
y4.dl4 -1.1756 0.7842 -1.499 0.1350
y8.dl4 0.9257 0.4018 2.304 0.0220 *
y4.dl5 -1.6470 0.8825 -1.866 0.0631 .
y8.dl5 0.6391 0.4541 1.407 0.1606
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.291 on 263 degrees of freedom
Multiple R-squared: 0.1914, Adjusted R-squared: 0.1576
F-statistic: 5.659 on 11 and 263 DF, p-value: 0.00000003609
3.3 Interpreting all the parameters of the VECM model
This is the joint significance F-test for the y4 equation, where the null hypothesis is:
\[H_0 : \alpha_{11}=\gamma_{11}=\gamma_{12} =\gamma_{13}=\gamma_{14}=\gamma_{15}=0 \]
The p-value of the joint significance test is below 0.05, so the null hypothesis is rejected and the parameters are jointly significant. We can also observe that the p-value of each individual parameter is below 0.05, so each parameter is highly significant in the y4 equation.
\[H_0 : \alpha_{11}=0 \]
Here the p-value is below 0.05, so the coefficient on ect1 is non-zero and highly significant, and thus the error correction mechanism exists.
3.4 Commenting on the signs of the estimated parameters
The sign of the estimated ect1 parameter is negative in the y4 equation (y4 on y8).
This is the joint significance F-test for the y8 equation, where the null hypothesis is:
\[H_0 : \alpha_{21}=\gamma_{21}=\gamma_{22} =\gamma_{23}=\gamma_{24}=\gamma_{25}=0 \]
The p-value of the joint significance test is below 0.05, so the null hypothesis is rejected and the parameters are jointly significant.
However, the individual p-values of most coefficients are above 0.05, so most parameters are not individually significant in the y8 equation.
3.5 Commenting on whether the error correction mechanism works
\[H_0 : \alpha_{21}=0 \]
Here the p-value (0.0155) is below 0.05, so the coefficient is non-zero and significant, and the error correction mechanism exists. The sign of the ect1 coefficient in the y8 equation is expected to be positive, but it comes out negative.
The VECM model (with p = 5 lagged differences, corresponding to K = 6 lags in levels) is given by:
\[\Delta\boldsymbol{y}_t = \Pi\boldsymbol{y}_{t-1} + \Gamma_1\Delta\boldsymbol{y}_{t-1} + ... + \Gamma_5\Delta\boldsymbol{y}_{t-5} + \boldsymbol{\varepsilon}_t\]
The ect1 elements in both equations contain the adjustment coefficients (\(\alpha_{11}\) and \(\alpha_{21}\)), which are the result of the decomposition of the \(\Pi\) matrix:
\[ \boldsymbol{\Pi}=\alpha\beta'=\left[ \begin{array}{ccc} \alpha_{11} \\ \alpha_{21} \\ \end{array} \right] \left[ \begin{array}{cccc} \beta_{11} \quad \beta_{21} \\ \end{array} \right] \]
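As a worked illustration with the estimates reported above (first cointegrating relation and the ect1 loadings; the constant is part of \(\beta\) because ecdet = "const"):
\[ \hat{\alpha}=\left[ \begin{array}{c} -2.134 \\ -2.378 \\ \end{array} \right], \qquad \hat{\beta}'=\left[ \begin{array}{ccc} 1 & -0.5004 & -101.73 \\ \end{array} \right] \]
so the estimated error correction term is \(ect1_t = y4_{t-6} - 0.5004\, y8_{t-6} - 101.73\).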
We can extract the cointegrating vector in the following way:
TS280.vec4$beta
 ect1
y4.l6 1.0000000
y8.l6 -0.5004317
constant -101.7277663
3.6 We can reparametrize the VECM into a VAR model:
TS280.vec4.asVAR <- vec2var(johan.test.eigen, r = 1)
Let's see the result:
TS280.vec4.asVAR
Coefficient matrix of lagged endogenous variables:
A1:
y4.l1 y8.l1
y4 -0.01542249 0.6427725
y8 -0.26515511 1.3681022
A2:
y4.l2 y8.l2
y4 -0.1235365 -0.07001025
y8 -0.2349052 -0.13414844
A3:
y4.l3 y8.l3
y4 -0.3447433 0.1827072
y8 -0.3776032 0.2232662
A4:
y4.l4 y8.l4
y4 -0.09267368 0.2073016
y8 -0.29793204 0.4684647
A5:
y4.l5 y8.l5
y4 -0.2361584 -0.1568306
y8 -0.4714176 -0.2866279
A6:
y4.l6 y8.l6
y4 -0.3212379 0.2618668
y8 -0.7311532 0.5510530
Coefficient matrix of deterministic regressor(s).
constant
y4 217.0639
y8 241.9255
3.7 Based on the reparametrized model, we can calculate and plot Impulse Response Functions:
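A minimal sketch of how these impulse response functions could be computed and plotted with the vars package (the 20-period horizon is an assumption matching the comment below):
plot(irf(TS280.vec4.asVAR, n.ahead = 20))   # IRFs of y4 and y8 to shocks in each variable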
We can observe the impact of a shock to y4 on both variables; after about 20 periods the response to the shock dies out, and shocks (whether to y4 or to y8) have a stronger impact on y4.
3.8 We can also perform the Forecast Error Variance Decomposition:
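A minimal sketch of how the decomposition could be computed and plotted (the horizon is an assumption matching the comment below):
plot(fevd(TS280.vec4.asVAR, n.ahead = 20))   # share of forecast error variance explained by each shock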
We can observe how much of the forecast error variance of y4 is explained by shocks to each of the variables y4 and y8 at every horizon up to s = 19 steps ahead.
3.10 You can see the ACF and PACF functions by plotting the results of the serial.test()
plot(serial.test(TS280.vec4.asVAR))
3.11 Histograms of residuals for the VAR model
TS280.vec4.asVAR %>%
residuals() %>%
as_tibble() %>%
ggplot(aes(`resids of y8`)) +
geom_histogram(aes(y =..density..),
colour = "red",
fill = "green") +
stat_function(fun = dnorm,
args = list(mean = mean(residuals(TS280.vec4.asVAR)[, "resids of y8"]),
sd = sd(residuals(TS280.vec4.asVAR)[, "resids of y8"]))) +
theme_bw() +
labs(
title = "Density of y8 residuals",
y = "", x = "",
caption = "source: own calculations"
)
TS280.vec4.asVAR %>%
residuals() %>%
as_tibble() %>%
ggplot(aes(`resids of y4`)) +
geom_histogram(aes(y =..density..),
colour = "blue",
fill = "pink") +
stat_function(fun = dnorm,
args = list(mean = mean(residuals(TS280.vec4.asVAR)[, "resids of y4"]),
sd = sd(residuals(TS280.vec4.asVAR)[, "resids of y4"]))) +
theme_bw() +
labs(
title = "Density of y4 residuals",
y = "", x = "",
caption = "source: own calculations"
)
3.12 We have checked normality of residuals by applying the Jarque-Bera (JB) test.
normality.test(TS280.vec4.asVAR)
$JB
JB-Test (multivariate)
data: Residuals of VAR object TS280.vec4.asVAR
Chi-squared = 2.9791, df = 4, p-value = 0.5613
$Skewness
Skewness only (multivariate)
data: Residuals of VAR object TS280.vec4.asVAR
Chi-squared = 2.9743, df = 2, p-value = 0.226
$Kurtosis
Kurtosis only (multivariate)
data: Residuals of VAR object TS280.vec4.asVAR
Chi-squared = 0.0048029, df = 2, p-value = 0.9976
The p-value of the JB normality test is greater than 0.05, so we fail to reject the null of normality and the residuals can be treated as normally distributed. The null hypotheses of no skewness and of no excess kurtosis are likewise not rejected. Hence the residuals of the VAR(6) representation are normally distributed.
3.13 Forecasting based on the VECM
Now we will calculate 20-day forecasts for the y4 and y8 variables.
Forecasting the model based on VECM
TS.vec4.fore <-
predict(
vec2var(
johan.test.eigen,
r = 1), # no of cointegrating vectors
n.ahead = 20, # forecast horizon
ci = 0.95) # confidence level for intervals
VEC forecasts for y4:
TS.vec4.fore$fcst$y4
 fcst lower upper CI
[1,] 175.0847 173.8043 176.3652 1.280446
[2,] 175.1386 173.1088 177.1685 2.029872
[3,] 174.9821 172.3809 177.5832 2.601130
[4,] 174.6606 171.5812 177.7400 3.079364
[5,] 174.5546 170.8565 178.2528 3.698153
[6,] 174.5923 170.3856 178.7990 4.206710
[7,] 174.5348 169.8859 179.1838 4.648943
[8,] 174.4753 169.4254 179.5251 5.049848
[9,] 174.4933 169.0326 179.9540 5.460669
[10,] 174.5463 168.7312 180.3614 5.815102
[11,] 174.5239 168.3825 180.6654 6.141446
[12,] 174.5132 168.0636 180.9627 6.449523
[13,] 174.5297 167.7747 181.2847 6.754967
[14,] 174.5517 167.5153 181.5881 7.036406
[15,] 174.5376 167.2315 181.8438 7.306146
[16,] 174.5346 166.9678 182.1015 7.566867
[17,] 174.5406 166.7179 182.3633 7.822702
[18,] 174.5461 166.4796 182.6126 8.066507
[19,] 174.5379 166.2340 182.8417 8.303838
[20,] 174.5378 166.0025 183.0730 8.535244
VEC forecasts for y8:
TS.vec4.fore$fcst$y8
 fcst lower upper CI
[1,] 146.6000 144.1202 149.0798 2.479794
[2,] 146.8210 142.8750 150.7671 3.946044
[3,] 146.3962 141.3365 151.4560 5.059788
[4,] 145.7701 139.7690 151.7712 6.001057
[5,] 145.4950 138.2447 152.7454 7.250348
[6,] 145.6225 137.3334 153.9116 8.289066
[7,] 145.5023 136.3134 154.6911 9.188837
[8,] 145.3861 135.3907 155.3814 9.995338
[9,] 145.4009 134.5821 156.2197 10.818809
[10,] 145.5081 133.9749 157.0412 11.533145
[11,] 145.4634 133.2730 157.6539 12.190470
[12,] 145.4455 132.6339 158.2572 12.811663
[13,] 145.4778 132.0522 158.9034 13.425582
[14,] 145.5221 131.5306 159.5137 13.991590
[15,] 145.4939 130.9612 160.0265 14.532631
[16,] 145.4872 130.4311 160.5433 15.056139
[17,] 145.4997 129.9304 161.0691 15.569362
[18,] 145.5110 129.4522 161.5699 16.058831
[19,] 145.4952 128.9605 162.0298 16.534609
[20,] 145.4942 128.4955 162.4929 16.998660
tail(index(TS2), 20)
 [1] "2022-03-08" "2022-03-09" "2022-03-10" "2022-03-11" "2022-03-12"
[6] "2022-03-13" "2022-03-14" "2022-03-15" "2022-03-16" "2022-03-17"
[11] "2022-03-18" "2022-03-19" "2022-03-20" "2022-03-21" "2022-03-22"
[16] "2022-03-23" "2022-03-24" "2022-03-25" "2022-03-26" "2022-03-27"
y4_forecast <- xts(TS.vec4.fore$fcst$y4[,-4],
# we exclude the last column with CI
tail(index(TS2), 20))
For the y4 forecast:
names(y4_forecast) <- c("y4_fore", "y4_lower", "y4_upper")
For the y8 forecast:
y8_forecast <- xts(TS.vec4.fore$fcst$y8[, -4],
# we exclude the last column with CI
tail(index(TS2), 20))
names(y8_forecast) <- c("y8_fore", "y8_lower", "y8_upper")
Now, we can merge the data together:
TS <- merge(TS2,
y4_forecast,
y8_forecast)
Let's compare the forecasted and real data on the plot.
plot(TS[index(TS) > as.Date("2022-03-07"), c("y4", "y4_fore",
"y4_lower", "y4_upper")],
main = "20 days forecast of energy y4",
col = c("black", "blue", "red", "red"))
plot(TS[index(TS) > as.Date("2022-03-07"), c("y8", "y8_fore",
"y8_lower", "y8_upper")],
main = "20 days forecast of energy y8",
col = c("black", "blue", "red", "red"))Here we get the forecasting price of y8 and y4 on 20 time periods and the original and projected observation are lying within the upper and lower boundaries. HERE the blue represents the projected/forecast and the blue represents the original observation.
We have Calculated forecast accuracy measures (MAE, MSE, MAPE, AMAPE) based on VECM model and compared it with the accuracy of the forecast based on VECM model
errors_matrix <- data.frame()
errors_matrix <- calculate_errors(errors_matrix, TS$y4, TS$y4_fore,"y4_vecm")
errors_matrix <- calculate_errors(errors_matrix, TS$y8, TS$y8_fore, "y8_vecm")
errors_matrix[c("y4_vecm", "y8_vecm"),]
| | MAE | MSE | MAPE | AMAPE |
|---|---|---|---|---|
| y4_vecm | 0.9862111 | 1.399098 | 0.0056067 | 0.0028136 |
| y8_vecm | 1.9383977 | 5.230081 | 0.0130852 | 0.0065978 |
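For reference, the accuracy measures are assumed to follow the standard definitions below (the calculate_errors() helper in utils.R may differ in detail), where \(y_t\) is the actual value and \(\hat{y}_t\) the forecast over the \(n = 20\) out-of-sample days:
\[ MAE=\frac{1}{n}\sum_t |y_t-\hat{y}_t|, \quad MSE=\frac{1}{n}\sum_t (y_t-\hat{y}_t)^2, \quad MAPE=\frac{1}{n}\sum_t \left|\frac{y_t-\hat{y}_t}{y_t}\right|, \quad AMAPE=\frac{1}{n}\sum_t \frac{|y_t-\hat{y}_t|}{y_t+\hat{y}_t} \]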
3.14 Interpretation of Errors:
- MAE for y4: 0.986. On average, the absolute error of the predictions for y4 is 0.986 units, indicating the typical size of the prediction error.
- MSE for y4: 1.399. The mean squared error for y4 is higher than the MAE, suggesting that some larger errors are being squared and inflate the overall error value.
- MAPE for y4: 0.0056 (or 0.56%). On average, the predictions for y4 deviate from the actual values by 0.56% of the actual values.
- AMAPE for y4: 0.0028 (or 0.28%). The adjusted mean absolute percentage error provides a different perspective on the error, accounting for the scale of both the actual and the predicted values.
- MAE for y8: 1.938. On average, the absolute error of the predictions for y8 is 1.938 units, significantly higher than for y4.
- MSE for y8: 5.230. This is much higher than the MSE for y4, suggesting the presence of larger errors in the predictions for y8.
- MAPE for y8: 0.0131 (or 1.31%). On average, the predictions for y8 deviate from the actual values by 1.31% of the actual values.
- AMAPE for y8: 0.0066 (or 0.66%). Again, this gives a sense of the prediction error relative to the combined scale of the actual and predicted values.
- The errors for y4 are generally lower than those for y8, indicating better prediction accuracy for y4.
- Both MSE and MAE are higher for y8 than for y4, with the difference in MSE particularly pronounced, suggesting that the predictions for y8 contain larger errors.
- The MAPE and AMAPE for y4 are significantly lower than for y8, indicating more accurate percentage-based predictions for y4.
Overall, these metrics suggest that the model’s predictions for y4 are more accurate and consistent than those for y8.
4 ARIMA Modeling for y4
We will apply the Box-Jenkins procedure, following the steps below.
4.1 Step 1: Identification
Plotting actual prices y4
tibble(df = TS280$y4) %>%
ggplot(aes(zoo::index(TS280), df)) +
geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
labs(
title = "Actual Y4 Price",
subtitle = paste0("Number of observations: ", length(TS280$y4)),
caption = "source: TSA 2024",
x="",
y=""
)
Plotting log transformed y4 prices
y4_log <- log(TS280$y4)
y4_log_return <- periodReturn(y4_log, period="daily", type="log")
## Plotting Log Transformed y4 Price
tibble(df2 = y4_log) %>%
ggplot(aes(zoo::index(TS280), df2)) +
geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
labs(
title = "Log Transformed y4 Price",
subtitle = paste0("Number of observations: ", length(TS280$y4)),
caption = "source: TSA 2024",
x="",
y=""
)
Plotting the returns of the log-transformed y4 prices
tibble(df3 = y4_log_return) %>%
ggplot(aes(zoo::index(TS280), df3)) +
geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
labs(
title = "Log Transformed y4 return",
subtitle = paste0("Number of observations: ", length(TS280$y4)),
caption = "source: TSA 2024",
x="",
y=""
)
| Data | Training_Sample | ADF_Test | PP_Test | BG_Test |
|---|---|---|---|---|
| y4 | 01/06/2021~07/03/2022 | -1.416 ( 0.822 ) | -5.546 ( 0.799 ) | 7.661 ( 0.006 ) |
| y4 log | 01/06/2021~07/03/2022 | -1.383 ( 0.836 ) | -5.218 ( 0.818 ) | 7.566 ( 0.006 ) |
| y4 log return | 01/06/2021~07/03/2022 | -6.072 ( 0.010 ) | -258.965 ( 0.010 ) | 0.006 ( 0.938 ) |
The results show that the y4 log return series is stationary.
Plotting ACF and PACF for y4 log return
plot_acf_pacf(y4_log_return)
The ACF and PACF initially suggest an MA process with 4 lags and no AR terms.
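The plot_acf_pacf() helper comes from utils.R; a minimal equivalent using base R (an assumed sketch, not the actual helper) is:
par(mfrow = c(1, 2))                                      # show ACF and PACF side by side
acf(as.numeric(y4_log_return), lag.max = 36, main = "ACF")
pacf(as.numeric(y4_log_return), lag.max = 36, main = "PACF")
par(mfrow = c(1, 1))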
4.2 step 2 & 3: Estimation and Diagnostics
We will consider all possible ARIMA specifications with AR order p from 0 to 4 and MA order q from 0 to 4, and compare the candidate models by their AIC and BIC.
y4_models_detail <- data.frame()
for (p in 0:4) {
for (q in 0:4) {
if(! (p==0 & q==0)){
cur_model <- Arima(y4_log_return, order=c(p,1,q))
y4_models_detail <- model_summary(y4_models_detail, paste("c(",p,",1,",q,")",sep=""), cur_model)
}
}
}
y4_models_detail
| Model_Order | Box_Ljung_P_Value | AIC_Value | BIC_Value |
|---|---|---|---|
| c(0,1,1) | 0.0000024002 | -3167.718 | -3160.455 |
| c(0,1,2) | 0.0000645641 | -3171.441 | -3160.548 |
| c(0,1,3) | 0.0000783646 | -3169.927 | -3155.402 |
| c(0,1,4) | 0.0000962717 | -3169.074 | -3150.918 |
| c(1,1,0) | 0.0000000000 | -3086.493 | -3079.231 |
| c(1,1,1) | 0.0000809521 | -3171.878 | -3160.985 |
| c(1,1,2) | 0.0000996379 | -3171.641 | -3157.117 |
| c(1,1,3) | 0.0001841259 | -3173.950 | -3155.794 |
| c(1,1,4) | 0.0043451063 | -3179.723 | -3157.936 |
| c(2,1,0) | 0.0000000042 | -3112.840 | -3101.946 |
| c(2,1,1) | 0.0000833983 | -3170.062 | -3155.537 |
| c(2,1,2) | 0.0000729963 | -3170.344 | -3152.188 |
| c(2,1,3) | 0.0410809836 | -3189.728 | -3167.941 |
| c(2,1,4) | 0.0361307906 | -3187.808 | -3162.389 |
| c(3,1,0) | 0.0035289437 | -3164.130 | -3149.605 |
| c(3,1,1) | 0.0056519871 | -3162.404 | -3144.248 |
| c(3,1,2) | 0.0001019411 | -3169.528 | -3147.740 |
| c(3,1,3) | 0.0342542222 | -3187.839 | -3162.421 |
| c(3,1,4) | 0.0327545941 | -3185.995 | -3156.946 |
| c(4,1,0) | 0.0057443401 | -3162.382 | -3144.226 |
| c(4,1,1) | 0.0133993041 | -3187.632 | -3165.844 |
| c(4,1,2) | 0.0359111741 | -3188.836 | -3163.418 |
| c(4,1,3) | 0.2018730120 | -3191.288 | -3162.238 |
| c(4,1,4) | 0.1960197635 | -3190.004 | -3157.324 |
Find the model with the minimum AIC value
y4_aic_ind <- which.min(y4_models_detail$AIC_Value)
min_row <- y4_models_detail[y4_aic_ind, ]
| | Model_Order | Box_Ljung_P_Value | AIC_Value | BIC_Value |
|---|---|---|---|---|
| 23 | c(4,1,3) | 0.2018730120 | -3191.288 | -3162.238 |
Find the model with the minimum BIC value
y4_bic_ind <- which.min(y4_models_detail$BIC_Value)
min_row <- y4_models_detail[y4_bic_ind, ]
| | Model_Order | Box_Ljung_P_Value | AIC_Value | BIC_Value |
|---|---|---|---|---|
| 13 | c(2,1,3) | 0.0410809836 | -3189.728 | -3167.941 |
Analyzing ARIMA(2,1,3) for y4 log return series
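The objects y4_model_213 and y4_residuals_213 used below are assumed to be created by fitting this specification, e.g.:
y4_model_213 <- Arima(y4_log_return, order = c(2, 1, 3))   # fit ARIMA(2,1,3) to the log returns
y4_residuals_213 <- resid(y4_model_213)                    # residuals for the diagnostics below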
Plotting ACF and PACF of y4 log return residuals
#plot acf and pacf for residuals
plot_acf_pacf(y4_residuals_213)
The ACF plot indicates that the residuals are correlated at certain lags (e.g., 4, 6 and 10), suggesting the presence of autocorrelation.
The PACF plot indicates a direct correlation of the residuals at specific lags (e.g., 3, 4 and 5), after removing the influence of the previous lags.
Performing ADF/BG test for y4 log returns residuals
y4_residuals_acf_test_213
| Data | Training_Sample | ADF_Test | PP_Test | BG_Test |
|---|---|---|---|---|
| Model (2,1,3) residuals | 01/06/2021~07/03/2022 | -6.727 ( 0.010 ) | -315.415 ( 0.010 ) | 0.000 ( 0.995 ) |
The ADF and PP tests indicate that the residuals are stationary, and the BG test shows no significant autocorrelation.
Performing Ljung-Box test for autocorrelation in residuals
# Ljung-Box test for autocorrelation in residuals
Box.test(y4_residuals_213, type = "Ljung-Box", lag = 10)
Box-Ljung test
data: y4_residuals_213
X-squared = 18.936, df = 10, p-value = 0.04108
coeftest(y4_model_213)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -1.100783 0.082448 -13.3512 < 2.2e-16 ***
ar2 -0.584616 0.122054 -4.7898 0.0000016693550 ***
ma1 0.351149 0.055841 6.2884 0.0000000003208 ***
ma2 -0.447449 0.077203 -5.7958 0.0000000068013 ***
ma3 -0.845642 0.079798 -10.5972 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All the parameters are significant, but the Ljung-Box p-value at 10 lags is below 0.05, which suggests there is still significant autocorrelation in the residuals.
Analyzing the next suggested model, ARIMA(4,1,3)
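Analogously, the objects used below are assumed to be defined as:
y4_model_413 <- Arima(y4_log_return, order = c(4, 1, 3))   # fit ARIMA(4,1,3) to the log returns
y4_residuals_413 <- resid(y4_model_413)                    # residuals for the diagnostics below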
Plotting ACF and PACF of y4 log return residuals
#plot acf and pacf for residuals
plot_acf_pacf(y4_residuals_413)
The ACF looks much better, but a few lags still show autocorrelation in the residuals.
The PACF looks almost the same, still showing direct correlation in the residuals.
Performing ADF,PP and BG test
y4_residuals_acf_test_413
| Data | Training_Sample | ADF_Test | PP_Test | BG_Test |
|---|---|---|---|---|
| Model (4,1,3) residuals | 01/06/2021~07/03/2022 | -6.603 ( 0.010 ) | -286.156 ( 0.010 ) | 0.000 ( 0.999 ) |
The ADF and PP tests indicate that the residuals are stationary, and the BG test shows no significant autocorrelation.
Performing Ljung-Box test for autocorrelation in residuals
# Ljung-Box test for autocorrelation in residuals
# as https://robjhyndman.com/hyndsight/ljung-box-test/ suitable value for lag is 10
Box.test(y4_residuals_413, type = "Ljung-Box", lag = 10)
Box-Ljung test
data: y4_residuals_413
X-squared = 13.406, df = 10, p-value = 0.2019
coeftest(y4_model_413)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -0.292483 0.218034 -1.3415 0.17977
ar2 -0.195212 0.127033 -1.5367 0.12436
ar3 0.055036 0.063143 0.8716 0.38343
ar4 0.302114 0.061494 4.9129 0.0000008974 ***
ma1 -0.515952 0.233695 -2.2078 0.02726 *
ma2 -0.166332 0.165961 -1.0022 0.31623
ma3 -0.317706 0.151713 -2.0941 0.03625 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Not all the parameters are significant, but the Ljung-Box p-value at 10 lags is above 0.05, suggesting no significant autocorrelation in the residuals.
As ar1, ar2, ar3 and ma2 are not significant, we can remove them from the model.
Plotting ACF/PACF after adjusting the model
y4_model_413_modified <- Arima(y4_log_return, order=c(4,1,3),
fixed = c(0, 0, 0, NA, NA,0, NA))
y4_residuals_413_modified <- resid(y4_model_413_modified)
# plot acf and pacf for residuals
plot_acf_pacf(y4_residuals_413_modified)
Here we can see no improvement in either graph.
Performing Ljung-Box test for autocorrelation in residuals of the adjusted model
Box.test(y4_residuals_413_modified, type = "Ljung-Box", lag = 10)
Box-Ljung test
data: y4_residuals_413_modified
X-squared = 18.41, df = 10, p-value = 0.04842
coeftest(y4_model_413_modified)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar4 0.309674 0.058382 5.3042 0.0000001131 ***
ma1 -0.875057 0.045161 -19.3764 < 2.2e-16 ***
ma3 -0.124445 0.042774 -2.9094 0.003622 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Ljung-Box p-value of the adjusted model is 0.04842 < 0.05, so we reject the null hypothesis that the residuals are not autocorrelated; the residuals show significant autocorrelation. The earlier, unrestricted ARIMA(4,1,3) model has a p-value of 0.2019 > 0.05, so for that model we fail to reject the null and its residuals can be treated as white noise.
So, out of the three models discussed, we select ARIMA(4,1,3) (the unmodified one).
4.3 Forecasting Results
We are forecasting 20 days ahead
y4_model_413.forecast <- forecast(y4_model_413, h = 20)
Converting the forecast log returns into predicted prices:
last_observed_price <- tail(TS280$y4,1)
# Convert log returns to actual price forecasts
forecasted_prices <- numeric(length(y4_model_413.forecast$mean))
forecasted_prices[1] <- last_observed_price * exp(y4_model_413.forecast$mean[1])
for (i in 2:length(forecasted_prices)) {
forecasted_prices[i] <- forecasted_prices[i-1] * exp(y4_model_413.forecast$mean[i])
}
y4_arima_forecast <- xts(forecasted_prices, tail(index(TS2), 20))
names(y4_arima_forecast) <- c("y4_arima_forecast")
TS <- merge(TS, y4_arima_forecast)
plot(TS[index(TS) > as.Date("2022-02-28"), c("y4", "y4_arima_forecast")],
main = "20 days forecast of y4 prices via ARIMA",
col = c("black", "blue"))4.4 Forecasting Errors
By usint the utility function from util.R, we are calculating forecasting errors
errors_matrix <- calculate_errors(errors_matrix, TS$y4, TS$y4_arima_forecast,"y4_arima")
errors_matrix["y4_arima",]| MAE | MSE | MAPE | AMAPE | |
|---|---|---|---|---|
| y4_arima | 0.7744276 | 0.7865475 | 0.0044117 | 0.0022082 |
5 ARIMA Modeling for y8
We will apply the Box-Jenkins procedure, following the steps below.
5.1 Step 1: Identification
Plotting actual prices y8
tibble(df = TS280$y8) %>%
ggplot(aes(zoo::index(TS280), df)) +
geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
labs(
title = "Actual y8 Price",
subtitle = paste0("Number of observations: ", length(TS280$y8)),
caption = "source: TSA 2024",
x="",
y=""
)
Plotting log transformed y8 prices
y8_log <- log(TS280$y8)
y8_log_return <- periodReturn(y8_log, period="daily", type="log")
## Plotting Log Transformed y8 Price
tibble(df2 = y8_log) %>%
ggplot(aes(zoo::index(TS280), df2)) +
geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
labs(
title = "Log Transformed y8 Price",
subtitle = paste0("Number of observations: ", length(TS280$y8)),
caption = "source: TSA 2024",
x="",
y=""
)
Plotting the returns of the log-transformed y8 prices
tibble(df3 = y8_log_return) %>%
ggplot(aes(zoo::index(TS280), df3)) +
geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
labs(
title = "Log Transformed y8 return",
subtitle = paste0("Number of observations: ", length(TS280$y8)),
caption = "source: TSA 2024",
x="",
y=""
)
| Data | Training_Sample | ADF_Test | PP_Test | BG_Test |
|---|---|---|---|---|
| y8 | 01/06/2021~07/03/2022 | -1.393 ( 0.832 ) | -5.054 ( 0.827 ) | 13.472 ( 0.000 ) |
| y8 log | 01/06/2021~07/03/2022 | -1.320 ( 0.862 ) | -4.434 ( 0.862 ) | 12.676 ( 0.000 ) |
| y8 log return | 01/06/2021~07/03/2022 | -5.666 ( 0.010 ) | -248.913 ( 0.010 ) | 0.014 ( 0.905 ) |
The results show that the y8 log return series is stationary.
Plotting ACF and PACF for y8 log return
plot_acf_pacf(y8_log_return)
The ACF and PACF initially suggest an MA process with 4 lags and no AR terms.
5.2 step 2 & 3: Estimation and Diagnostics
We will consider all possible ARIMA specifications with AR order p from 0 to 4 and MA order q from 0 to 4, and compare the candidate models by their AIC and BIC.
y8_models_detail <- data.frame()
for (p in 0:4) {
for (q in 0:4) {
if(! (p==0 & q==0)){
cur_model <- Arima(y8_log_return, order=c(p,1,q))
y8_models_detail <- model_summary(y8_models_detail, paste("c(",p,",1,",q,")",sep=""), cur_model)
}
}
}
y8_models_detail
| Model_Order | Box_Ljung_P_Value | AIC_Value | BIC_Value |
|---|---|---|---|
| c(0,1,1) | 0.0000000092 | -2706.433 | -2699.170 |
| c(0,1,2) | 0.0000010745 | -2713.190 | -2702.296 |
| c(0,1,3) | 0.0000011382 | -2711.869 | -2697.344 |
| c(0,1,4) | 0.0000005582 | -2713.028 | -2694.871 |
| c(1,1,0) | 0.0000000000 | -2633.347 | -2626.084 |
| c(1,1,1) | 0.0000013150 | -2713.903 | -2703.009 |
| c(1,1,2) | 0.0000021914 | -2714.322 | -2699.797 |
| c(1,1,3) | 0.0000003253 | -2718.892 | -2700.736 |
| c(1,1,4) | 0.0003613828 | -2731.881 | -2710.094 |
| c(2,1,0) | 0.0000000005 | -2661.344 | -2650.450 |
| c(2,1,1) | 0.0000012741 | -2712.055 | -2697.530 |
| c(2,1,2) | 0.0000004917 | -2712.182 | -2694.026 |
| c(2,1,3) | 0.0098301576 | -2744.184 | -2722.397 |
| c(2,1,4) | 0.0077626699 | -2742.485 | -2717.067 |
| c(3,1,0) | 0.0000233862 | -2714.681 | -2700.156 |
| c(3,1,1) | 0.0000538262 | -2714.220 | -2696.064 |
| c(3,1,2) | 0.0000004744 | -2711.300 | -2689.512 |
| c(3,1,3) | 0.0062143801 | -2742.702 | -2717.283 |
| c(3,1,4) | 0.0057056707 | -2741.207 | -2712.157 |
| c(4,1,0) | 0.0000773089 | -2714.258 | -2696.102 |
| c(4,1,1) | 0.0000683140 | -2712.308 | -2690.521 |
| c(4,1,2) | 0.0004914174 | -2737.300 | -2711.882 |
| c(4,1,3) | 0.0768105638 | -2748.239 | -2719.190 |
| c(4,1,4) | 0.1467492582 | -2750.963 | -2718.283 |
Find the model with the minimum AIC value
y8_aic_ind <- which.min(y8_models_detail$AIC_Value)
# Extract the row with the minimum value
min_row <- y8_models_detail[y8_aic_ind, ]
print(min_row)
 Model_Order Box_Ljung_P_Value AIC_Value BIC_Value
24 c(4,1,4) 0.1467492582 -2750.963 -2718.283
| | Model_Order | Box_Ljung_P_Value | AIC_Value | BIC_Value |
|---|---|---|---|---|
| 24 | c(4,1,4) | 0.1467492582 | -2750.963 | -2718.283 |
Find the model with the minimum BIC value
y8_bic_ind <- which.min(y8_models_detail$BIC_Value)
# Extract the row with the minimum value
min_row <- y8_models_detail[y8_bic_ind, ]
| | Model_Order | Box_Ljung_P_Value | AIC_Value | BIC_Value |
|---|---|---|---|---|
| 13 | c(2,1,3) | 0.0098301576 | -2744.184 | -2722.397 |
Analyzing ARIMA(2,1,3) for y8 log return series
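The objects y8_model_213 and y8_residuals_213 used below are assumed to be created by fitting this specification, e.g.:
y8_model_213 <- Arima(y8_log_return, order = c(2, 1, 3))   # fit ARIMA(2,1,3) to the log returns
y8_residuals_213 <- resid(y8_model_213)                    # residuals for the diagnostics below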
Plotting ACF and PACF of y8 log return residuals
#plot acf and pacf for residuals
plot_acf_pacf(y8_residuals_213)
The ACF plot indicates that the residuals are correlated at certain lags (e.g., 4, 6 and 10), suggesting the presence of autocorrelation.
The PACF plot indicates a direct correlation of the residuals at specific lags (e.g., 3, 4 and 5), after removing the influence of the previous lags.
Performing ADF/BG test for y8 log returned residuals
y8_residuals_acf_test_213 <- data.frame()
adf.y8_residuals_213 <- adf.test(y8_residuals_213)
pp.y8_residuals_213 <- pp.test(y8_residuals_213)
bg.y8_residuals_213 <- bgtest(adfTest(y8_residuals_213, lags = 0, type = "c")@test$lm$residuals~1, order = 1)
y8_residuals_acf_test_213 <- stationary_test_data(y8_residuals_acf_test_213, "Model (2,1,3) residuals", "01/06/2021~07/03/2022", adf.y8_residuals_213, pp.y8_residuals_213, bg.y8_residuals_213)
y8_residuals_acf_test_213
| Data | Training_Sample | ADF_Test | PP_Test | BG_Test |
|---|---|---|---|---|
| Model (2,1,3) residuals | 01/06/2021~07/03/2022 | -6.445 ( 0.010 ) | -320.998 ( 0.010 ) | 0.000 ( 0.994 ) |
The ADF and PP tests indicate that the residuals are stationary, and the BG test shows no significant autocorrelation.
Performing Ljung-Box test for autocorrelation in residuals
# Ljung-Box test for autocorrelation in residuals
Box.test(y8_residuals_213, type = "Ljung-Box", lag = 10)
Box-Ljung test
data: y8_residuals_213
X-squared = 23.259, df = 10, p-value = 0.00983
coeftest(y8_model_213)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -1.010503 0.091022 -11.1018 < 2.2e-16 ***
ar2 -0.496163 0.075159 -6.6015 0.0000000000407 ***
ma1 0.345178 0.062780 5.4982 0.0000000383680 ***
ma2 -0.423352 0.049341 -8.5802 < 2.2e-16 ***
ma3 -0.838186 0.043293 -19.3608 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All the parameters are significant, but the Ljung-Box p-value at 10 lags is below 0.05, which suggests there is still significant autocorrelation in the residuals.
Analyzing the next suggested model, ARIMA(4,1,4)
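Analogously, the objects used below are assumed to be defined as:
y8_model_414 <- Arima(y8_log_return, order = c(4, 1, 4))   # fit ARIMA(4,1,4) to the log returns
y8_residuals_414 <- resid(y8_model_414)                    # residuals for the diagnostics below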
Plotting ACF and PACF of y8 log return residuals
#plot acf and pacf for residuals
plot_acf_pacf(y8_residuals_414)
The ACF looks much better, but a few lags still show autocorrelation in the residuals.
The PACF looks almost the same, still showing direct correlation in the residuals.
Performing ADF,PP and BG test
y8_residuals_acf_test_414
| Data | Training_Sample | ADF_Test | PP_Test | BG_Test |
|---|---|---|---|---|
| Model (4,1,4) residuals | 01/06/2021~07/03/2022 | -6.740 ( 0.010 ) | -295.548 ( 0.010 ) | 0.000 ( 0.999 ) |
The ADF and PP tests indicate that the residuals are stationary, and the BG test shows no significant autocorrelation.
Performing Ljung-Box test for autocorrelation in residuals
# Ljung-Box test for autocorrelation in residuals
# as https://robjhyndman.com/hyndsight/ljung-box-test/ suitable value for lag is 10
Box.test(y8_residuals_414, type = "Ljung-Box", lag = 10)
Box-Ljung test
data: y8_residuals_414
X-squared = 14.615, df = 10, p-value = 0.1467
coeftest(y8_model_414)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -0.439789 0.141437 -3.1094 0.0018744 **
ar2 -0.067526 0.139424 -0.4843 0.6281552
ar3 0.306262 0.093868 3.2627 0.0011036 **
ar4 0.259716 0.067569 3.8437 0.0001212 ***
ma1 -0.267176 0.141293 -1.8909 0.0586318 .
ma2 -0.433310 0.098828 -4.3845 0.0000116255458050 ***
ma3 -0.605809 0.083460 -7.2587 0.0000000000003908 ***
ma4 0.306296 0.122201 2.5065 0.0121934 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Not all the parameters are significant, but the Ljung-Box p-value at 10 lags is above 0.05, suggesting no significant autocorrelation in the residuals.
As ar2 is not significant and ma1 is only marginally significant, we can remove them from the model.
Plotting ACF/PACF after adjusting the model
y8_model_414_modified <- Arima(y8_log_return, order=c(4,1,4),
fixed = c(NA, 0, NA, NA, 0, NA, NA, NA))
y8_residuals_414_modified <- resid(y8_model_414_modified)
# plot acf and pacf for residuals
plot_acf_pacf(y8_residuals_414_modified)
Here we can see no improvement in either graph.
Performing Ljung-Box test for autocorrelation in residuals for adjusted model
Box.test(y8_residuals_414_modified, type = "Ljung-Box", lag = 10)
Box-Ljung test
data: y8_residuals_414_modified
X-squared = 19.602, df = 10, p-value = 0.03325
coeftest(y8_model_414_modified)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar1 -0.641702 0.057928 -11.0776 < 2.2e-16 ***
ar3 0.339821 0.072116 4.7121 0.000002451 ***
ar4 0.173336 0.071294 2.4313 0.01504 *
ma2 -0.563643 0.053600 -10.5157 < 2.2e-16 ***
ma3 -0.691093 0.042166 -16.3900 < 2.2e-16 ***
ma4 0.266736 0.059032 4.5185 0.000006229 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Ljung-Box p-value of the adjusted model is 0.03325 < 0.05, so we reject the null hypothesis that the residuals are not autocorrelated; the residuals show significant autocorrelation. The earlier, unrestricted ARIMA(4,1,4) model has a p-value of 0.1467 > 0.05, so for that model we fail to reject the null and its residuals can be treated as white noise.
So, out of the three models discussed, we select ARIMA(4,1,4) (the unmodified one).
5.3 Forecasting Results
We are forecasting for 20 days ahead
y8_model_414.forecast <- forecast(y8_model_414, h = 20)
Converting the forecast log returns into predicted prices:
last_observed_price <- tail(TS280$y8,1)
# Convert log returns to actual price forecasts
forecasted_prices <- numeric(length(y8_model_414.forecast$mean))
forecasted_prices[1] <- last_observed_price *
exp(y8_model_414.forecast$mean[1])
for (i in 2:length(forecasted_prices)) {
forecasted_prices[i] <- forecasted_prices[i-1] * exp(y8_model_414.forecast$mean[i])
}
y8_arima_forecast <- xts(forecasted_prices, tail(index(TS2), 20))
names(y8_arima_forecast) <- c("y8_arima_forecast")
TS <- merge(TS, y8_arima_forecast)
plot(TS[index(TS) > as.Date("2022-02-28"), c("y8", "y8_arima_forecast")],
main = "20 days forecast of y8 prices via ARIMA",
col = c("black", "blue"))5.4 Forecasting Erros
calculating ex-post forecast errors for both series on the out-of-sample period.
| MAE | MSE | MAPE | AMAPE | |
|---|---|---|---|---|
| y8_arima | 1.505458 | 2.798017 | 0.0102116 | 0.0051174 |
6 VECM Vs ARIMAs forecasts
The data frame below collects all the respective forecast errors (the error matrix):
| MAE | MSE | MAPE | AMAPE | |
|---|---|---|---|---|
| y4_vecm | 0.9862111 | 1.3990977 | 0.0056067 | 0.0028136 |
| y8_vecm | 1.9383977 | 5.2300808 | 0.0130852 | 0.0065978 |
| y4_arima | 0.7744276 | 0.7865475 | 0.0044117 | 0.0022082 |
| y8_arima | 1.5054579 | 2.7980167 | 0.0102116 | 0.0051174 |
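The df_long object used in the plot below is assumed to be the error matrix reshaped into long format, e.g.:
df_long <- errors_matrix %>%
  tibble::rownames_to_column("Index") %>%                                  # model names become a column
  tidyr::pivot_longer(-Index, names_to = "Variable", values_to = "Value")  # one row per model/measure pair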
As a multi-column bar chart it looks like this:
ggplot(df_long, aes(x = Value, y = Index, fill = Variable)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Multi-Column Plot with Models as Columns", x = "Errors", y = "Models") +
theme_minimal()
MAE: The ARIMA forecast of y4 shows the smaller absolute errors on average. This looks very attractive, but, as visible from the out-of-sample plot, it can be explained by the forecast splitting into two parts, with roughly half of the points below the actual values and the rest above, so the deviations partly offset each other. In fact, the out-of-sample VECM forecast of y4 lies closer to the actual values point by point and therefore shows the more promising result. The same explanation applies when comparing the smaller y8 ARIMA value (1.5054579) with y8 VECM (1.9383977).
MSE: Squared errors give more weight to larger errors, which reinforces the MAE comparison above. The y4 ARIMA MSE (0.7865475) is lower than the y4 VECM MSE (1.3990977), and the same applies to the smaller y8 ARIMA MSE (2.7980167) compared with the y8 VECM MSE (5.2300808).
MAPE: Lower values mean a smaller percentage error on average. Ideally it would be smaller for y4 (VECM) than for y4 (ARIMA), but in our case it is not (y4 ARIMA 0.0044117 < y4 VECM 0.0056067). A likely reason is that we forecast 20 data points ahead, which is larger than the sum of the ARIMA orders p + q (4 + 3 = 7); forecasting fewer than 7 points ahead and re-estimating the model after extending the in-sample data set with the past forecast points should work better. The same explanation applies to y8 (ARIMA) versus y8 (VECM).
AMAPE: Lower values indicate that the measure, which adjusts MAPE for bias when actual and forecast values are far apart, is small. In our case y4 (ARIMA) 0.0022082 is lower than y4 (VECM) 0.0028136, and likewise y8 (ARIMA) 0.0051174 < y8 (VECM) 0.0065978.