library(tseries)
library(forecast)
library(gridExtra)
data <- read.csv("data.csv")
attach(data)
head(data,5)
y1 y2 y3 y4 y5
1 0.9209901 2.2157946 0.6819222 -0.0510231 1.181397
2 -0.7553562 1.4160652 1.6202822 -0.2182747 2.080426
3 -1.8629512 -0.1971389 0.8598497 0.2574356 1.440632
4 -1.2798546 -0.1321779 2.5016393 -0.4767962 2.452831
5 -1.8911987 -1.5814481 2.6356750 0.9732679 3.845812
str(data)
'data.frame': 404 obs. of 5 variables:
$ y1: num 0.921 -0.755 -1.863 -1.28 -1.891 ...
$ y2: num 2.216 1.416 -0.197 -0.132 -1.581 ...
$ y3: num 0.682 1.62 0.86 2.502 2.636 ...
$ y4: num -0.051 -0.218 0.257 -0.477 0.973 ...
$ y5: num 1.18 2.08 1.44 2.45 3.85 ...
Split the data in an in-sample period, and an out of sample period there are 404 observations, save the last four to make predictions on Given that you know them, you can calculate the forecast error measures
y_simdata_insample <- data[1:400,1:5]
y1 <- y_simdata_insample[1:400,1]
y2 <- y_simdata_insample[1:400,2]
y3 <- y_simdata_insample[1:400,3]
y4 <- y_simdata_insample[1:400,4]
y5 <- y_simdata_insample[1:400,5]
We can now fit the model using the data y1, y2, y3, y4 and y5
y1_True_401_404 <- data[401:404,1]
y2_True_401_404 <- data[401:404,2]
y3_True_401_404 <- data[401:404,3]
y4_True_401_404 <- data[401:404,4]
y5_True_401_404 <- data[401:404,5]
It is convenient to set here what data to use. This way, you do not have to change between y1 and y2 etc in every place in the code, when you analyze a different dataset
y_data <- y1
layout(matrix(c(1, 1, 1, 1,
2, 2, 3, 3), nrow=2, byrow=TRUE))
ts.plot(y_data); acf(y_data); pacf(y_data)
par(mfrow = c(1,1))
Ljung Box test for autocorrelation/serial dependency This test is done on original data, no parameters are estimated, thus: fitdf = 0
LBQ_obj <- Box.test(y_data, lag = 1, type = c( "Ljung-Box"), fitdf = 0)
LBQ_obj
Box-Ljung test
data: y_data
X-squared = 141.99, df = 1, p-value < 2.2e-16
The Box-Ljung test is a statistical test used to assess whether a given set of data exhibits autocorrelation. Autocorrelation refers to the relationship between values in a time series dataset, where the value of a data point is dependent on the values of previous data points. In this case, the test was performed on the dataset labeled as “y_data.” The test statistic, denoted as X-squared, has a value of 141.99. The degrees of freedom (df) for this test are 1, indicating that only one lag was considered in the autocorrelation analysis.
The reported p-value, which is 2.2e-16 (a very small value), far much less than signgificant alpha or 0.05 suggests strong evidence against the null hypothesis of no autocorrelation. In other words, the data in “y_data” is likely to exhibit significant autocorrelation. It’s important to note that the Box-Ljung test is commonly used in time series analysis to evaluate the presence of autocorrelation, but it does not provide information about the specific nature or pattern of autocorrelation.
adf_obj <- adf.test(y_data)
adf_obj
Augmented Dickey-Fuller Test
data: y_data
Dickey-Fuller = -4.8936, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary
The reported p-value from the results above is 0.01, which is less than the conventional significance level of 0.05. This indicates that there is sufficient evidence to reject the null hypothesis of non-stationarity in favor of the alternative hypothesis of stationarity. Therefore, based on the ADF test, the series is likely to be stationary.
m1 <- arima(y_data, order = c(0, 0, 0 ),seasonal = list(order = c(0, 0, 0), period = NA),
include.mean = FALSE)
summary(m1)
Call:
arima(x = y_data, order = c(0, 0, 0), seasonal = list(order = c(0, 0, 0), period = NA),
include.mean = FALSE)
sigma^2 estimated as 1.69: log likelihood = -672.47, aic = 1346.93
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.1680615 1.299822 1.051932 100 100 1.13203 0.5935726
Resids_m1 = m1$residuals
layout(matrix(c(1, 1, 1, 1,
2, 2, 3, 3), nrow=2, byrow=TRUE))
ts.plot(Resids_m1); acf(Resids_m1); pacf(Resids_m1)
par(mfrow = c(1,1))
summary(m1$residuals)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.8946 -0.7719 0.1301 0.1681 1.0246 3.8327
The prediction are set to four. In other words, we are getting predictionf for y_401, y_402, y_403 and y_404
y_pred <- predict(m1, n.ahead = 4)
y_pred
$pred
Time Series:
Start = 401
End = 404
Frequency = 1
[1] 0 0 0 0
$se
Time Series:
Start = 401
End = 404
Frequency = 1
[1] 1.299822 1.299822 1.299822 1.299822
y_pred$pred
Time Series:
Start = 401
End = 404
Frequency = 1
[1] 0 0 0 0
ts.plot(y_pred$pred)
data_1 <- read.csv("C:\\Users\\user\\Downloads\\Crude oil data.csv")
attach(data_1)
head(data_1,5)
Date Price
1 21/08/2019 55.68
2 20/08/2019 56.34
3 19/08/2019 56.21
4 16/08/2019 54.87
5 15/08/2019 54.47
str(data)
'data.frame': 404 obs. of 5 variables:
$ y1: num 0.921 -0.755 -1.863 -1.28 -1.891 ...
$ y2: num 2.216 1.416 -0.197 -0.132 -1.581 ...
$ y3: num 0.682 1.62 0.86 2.502 2.636 ...
$ y4: num -0.051 -0.218 0.257 -0.477 0.973 ...
$ y5: num 1.18 2.08 1.44 2.45 3.85 ...
x_simdata_insample <- data_1[1:4900,1:2]
x1 <- x_simdata_insample[1:4900,1]
x2 <- x_simdata_insample[1:4900,2]
We can now fit the model using the data x1, x2
x1_True_4901_5000 <- data_1[4901:5000,1]
x2_True_4901_5000 <- data_1[4901:5000,2]
It is convenient to set here what data to use. This way, you do not have to change between y1 and y2 etc in every place in the code, when you analyze a different dataset
x_data <- x2
head(x_data,5)
[1] 55.68 56.34 56.21 54.87 54.47
layout(matrix(c(1, 1, 1, 1,
2, 2, 3, 3), nrow=2, byrow=TRUE))
ts.plot(x_data); acf(x_data); pacf(x_data)
par(mfrow = c(1,1))
Ljung Box test for autocorrelation/serial dependency This test is done on original data, no parameters are estimated, thus: fitdf = 0
LBQ_obj <- Box.test(x_data, lag = 1, type = c( "Ljung-Box"), fitdf = 0)
LBQ_obj
Box-Ljung test
data: x_data
X-squared = 4887.2, df = 1, p-value < 2.2e-16
The reported p-value, which is 2.2e-16 (a very small value), far much less than signgificant alpha or 0.05 suggests strong evidence against the null hypothesis of no autocorrelation. In other words, the data in “x_data” is likely to exhibit significant autocorrelation. It’s important to note that the Box-Ljung test is commonly used in time series analysis to evaluate the presence of autocorrelation, but it does not provide information about the specific nature or pattern of autocorrelation.
adf_obj <- adf.test(x_data)
adf_obj
Augmented Dickey-Fuller Test
data: x_data
Dickey-Fuller = -2.1417, Lag order = 16, p-value = 0.5183
alternative hypothesis: stationary
The reported p-value from the results above is 0.5183, which is more than the conventional significance level of 0.05. This indicates that there is insufficient evidence to reject the null hypothesis of non-stationarity in favor of the alternative hypothesis of stationarity. Therefore, based on the ADF test, the series is likely not to be stationary.
adf_obj <- adf.test(diff(x_data))
adf_obj
Augmented Dickey-Fuller Test
data: diff(x_data)
Dickey-Fuller = -16.107, Lag order = 16, p-value = 0.01
alternative hypothesis: stationary
After the first difference, the series was found to be stationary
m2 <- arima(x_data, order = c(0, 1, 0 ),seasonal = list(order = c(0, 1, 0), period = NA),
include.mean = FALSE)
summary(m2)
Call:
arima(x = x_data, order = c(0, 1, 0), seasonal = list(order = c(0, 1, 0), period = NA),
include.mean = FALSE)
sigma^2 estimated as 4.149: log likelihood = -10434.7, aic = 20871.39
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.0002159848 2.036549 1.434585 -0.002821044 2.468067 1.45523
ACF1
Training set -0.5107874
Resids_m2 = m2$residuals
layout(matrix(c(1, 1, 1, 1,
2, 2, 3, 3), nrow=2, byrow=TRUE))
ts.plot(Resids_m2); acf(Resids_m2); pacf(Resids_m2)
par(mfrow = c(1,1))
summary(m2$residuals)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-30.680000 -1.060000 -0.030000 -0.000216 1.040000 14.790000
The prediction are set to four. In other words, we are getting predictionf for y_4900, y_5000
x_pred <- predict(m2, n.ahead = 100)
x_pred
$pred
Time Series:
Start = 4901
End = 5000
Frequency = 1
[1] 29.65 29.30 28.95 28.60 28.25 27.90 27.55 27.20 26.85 26.50 26.15 25.80
[13] 25.45 25.10 24.75 24.40 24.05 23.70 23.35 23.00 22.65 22.30 21.95 21.60
[25] 21.25 20.90 20.55 20.20 19.85 19.50 19.15 18.80 18.45 18.10 17.75 17.40
[37] 17.05 16.70 16.35 16.00 15.65 15.30 14.95 14.60 14.25 13.90 13.55 13.20
[49] 12.85 12.50 12.15 11.80 11.45 11.10 10.75 10.40 10.05 9.70 9.35 9.00
[61] 8.65 8.30 7.95 7.60 7.25 6.90 6.55 6.20 5.85 5.50 5.15 4.80
[73] 4.45 4.10 3.75 3.40 3.05 2.70 2.35 2.00 1.65 1.30 0.95 0.60
[85] 0.25 -0.10 -0.45 -0.80 -1.15 -1.50 -1.85 -2.20 -2.55 -2.90 -3.25 -3.60
[97] -3.95 -4.30 -4.65 -5.00
$se
Time Series:
Start = 4901
End = 5000
Frequency = 1
[1] 2.036965 4.554791 7.621624 11.156915 15.106534 19.431404
[7] 24.101691 29.093674 34.387921 39.968132 45.820386 51.932612
[13] 58.294213 64.895788 71.728927 78.786043 86.060249 93.545258
[19] 101.235298 109.125045 117.209572 125.484296 133.944946 142.587525
[25] 151.408282 160.403690 169.570424 178.905341 188.405465 198.067972
[31] 207.890180 217.869532 228.003595 238.290040 248.726645 259.311279
[37] 270.041901 280.916552 291.933349 303.090481 314.386207 325.818845
[43] 337.386776 349.088436 360.922314 372.886949 384.980927 397.202881
[49] 409.551484 422.025450 434.623531 447.344519 460.187235 473.150537
[55] 486.233315 499.434486 512.752999 526.187828 539.737976 553.402468
[61] 567.180358 581.070717 595.072645 609.185257 623.407694 637.739113
[67] 652.178693 666.725628 681.379133 696.138438 711.002791 725.971453
[73] 741.043703 756.218835 771.496154 786.874983 802.354656 817.934520
[79] 833.613934 849.392271 865.268915 881.243260 897.314713 913.482690
[85] 929.746619 946.105936 962.560090 979.108536 995.750739 1012.486176
[91] 1029.314329 1046.234691 1063.246761 1080.350048 1097.544068 1114.828345
[97] 1132.202409 1149.665800 1167.218062 1184.858749
x_pred$pred
Time Series:
Start = 4901
End = 5000
Frequency = 1
[1] 29.65 29.30 28.95 28.60 28.25 27.90 27.55 27.20 26.85 26.50 26.15 25.80
[13] 25.45 25.10 24.75 24.40 24.05 23.70 23.35 23.00 22.65 22.30 21.95 21.60
[25] 21.25 20.90 20.55 20.20 19.85 19.50 19.15 18.80 18.45 18.10 17.75 17.40
[37] 17.05 16.70 16.35 16.00 15.65 15.30 14.95 14.60 14.25 13.90 13.55 13.20
[49] 12.85 12.50 12.15 11.80 11.45 11.10 10.75 10.40 10.05 9.70 9.35 9.00
[61] 8.65 8.30 7.95 7.60 7.25 6.90 6.55 6.20 5.85 5.50 5.15 4.80
[73] 4.45 4.10 3.75 3.40 3.05 2.70 2.35 2.00 1.65 1.30 0.95 0.60
[85] 0.25 -0.10 -0.45 -0.80 -1.15 -1.50 -1.85 -2.20 -2.55 -2.90 -3.25 -3.60
[97] -3.95 -4.30 -4.65 -5.00
ts.plot(x_pred$pred)