The objective of this study is to:
check the plot of original weekly “Brent Crude Oil Price” data and also the log transformed data.
initially identify the model based on ACF and PACF plot of the log transformed 1st differenced dataset of “Brent Crude Oil Price”.
estimate ARIMA(p,d,q) model and diagnose the adequacy based on AIC, BIC and Q-statistic.
The data was collected from Quandl where the main data source is Federal Reserve Economic Data website. For this analysis, weekly “Brent Crude Oil Price” data had been considered and the weekly data is from May 1987 to January 2016. For convenience, I assume cop = Crude Oil Price.
cop<- read.csv(url("http://research.stlouisfed.org/fred2/data/WCOILBRENTEU.csv"))
str(cop)
## 'data.frame': 1500 obs. of 2 variables:
## $ DATE : Factor w/ 1500 levels "1987-05-15","1987-05-22",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ VALUE: num 18.6 18.5 18.6 18.7 18.8 ...
The following table suggests that the dataset is weekly:
head(cop)
## DATE VALUE
## 1 1987-05-15 18.58
## 2 1987-05-22 18.54
## 3 1987-05-29 18.60
## 4 1987-06-05 18.70
## 5 1987-06-12 18.75
## 6 1987-06-19 19.01
Now, if we plot the quarterly data against time, the following plot shows an upward trend over time.
plot(cop, xlab="Weekly Time period (May 1987-January 2016)", ylab="Dollars per Barrel Not Seasonally Adjusted", main= "Weekly COP (Crude Oil Price)")
According to the instruction in the problem, I took log of the weekly data (defined as “lcop”“) and then took a difference of the log transformed”cop“” which had been defined as “dlcop”.
lcop<- log(cop[,2])
dlcop<- diff(lcop)
plot(dlcop, xlab="Weekly Time Period (May 1987-January 2016)", ylab="Log difference of COP", main= "Log difference of Weekly COP")
From the plot of the differenced dataset, we can see that the trend had been removed. Furthermore, there is no cyclical or seasonal effect is visible. Therefore, from the plot, we can assume that the data is stationary.
\(Note\) : Usually, a dataset is differenced to make the dataset stationary (as without making the dataset stationary, running an AR(p) or MA(q) model will not give any proper meaning). Therefore, after differencing, whether the data is stationary or not, it is important to do the Stationarity test. I conducted the Augmented Dickey-Fuller test which suggests that the differenced log value of Crude Oil Price (dlcop) is stationary (\(p-value = 0.01 ; lag = 11\))
The specific order of AR(p) and MA(q) are tried to be found in this stage. The auto correlation function (ACF) and partial autocorrelation function (PACF) are constructed for the differenced dataset.
acf(dlcop, lag=20, xlab="Weekly Lag", ylab="ACF", main="Sample ACF")
pacf(dlcop, lag=20, xlab="Weekly Lag", ylab="PACF", main="Sample PACF")
Based on the PACF, we can suggest that spike on lag 1 and lag 3 are significant as they are outside of the bound.Hence, we can have an assumption that AR(1) or AR(3) might be a fit. In sample PACF, the spikes are showing decay with an oscialiation process.
The sample ACF shows the spike on lag 1 and lag 3 are significant. Therefore, we can assume that MA(1) or MA(3) can be a better fit for the dataset. In sample ACF, the spikes are showing decay with an oscialiation process.
Based on the observation of sample PACF, AR(1), AR(2), AR(3) had been checked.
arima(dlcop, order=c(1,0,0))
##
## Call:
## arima(x = dlcop, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.1854 0.0004
## s.e. 0.0254 0.0013
##
## sigma^2 estimated as 0.001727: log likelihood = 2641.04, aic = -5276.09
ar1<-arima(dlcop, order=c(1,0,0))
BIC(ar1)
## [1] -5260.152
ar1$coef/sqrt(diag(ar1$var.coef))
## ar1 intercept
## 7.3080692 0.2785137
(1-pnorm(abs(ar1$coef)/sqrt(diag(ar1$var.coef))))*2
## ar1 intercept
## 2.711165e-13 7.806181e-01
tsdiag(ar1, gof.lag=20)
arima(dlcop, order=c(2,0,0))
##
## Call:
## arima(x = dlcop, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 0.1862 -0.0043 0.0004
## s.e. 0.0258 0.0259 0.0013
##
## sigma^2 estimated as 0.001726: log likelihood = 2641.06, aic = -5274.12
ar2<-arima(dlcop, order=c(2,0,0))
BIC(ar2)
## [1] -5252.867
ar2$coef/sqrt(diag(ar2$var.coef))
## ar1 ar2 intercept
## 7.2118835 -0.1657320 0.2795364
(1-pnorm(abs(ar2$coef)/sqrt(diag(ar2$var.coef))))*2
## ar1 ar2 intercept
## 5.517808e-13 8.683679e-01 7.798332e-01
tsdiag(ar2, gof.lag=20)
arima(dlcop, order=c(3,0,0))
##
## Call:
## arima(x = dlcop, order = c(3, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 intercept
## 0.1866 -0.0215 0.0907 0.0004
## s.e. 0.0257 0.0263 0.0258 0.0014
##
## sigma^2 estimated as 0.001712: log likelihood = 2647.21, aic = -5284.41
ar3<-arima(dlcop, order=c(3,0,0))
BIC(ar3)
## [1] -5257.849
ar3$coef/sqrt(diag(ar3$var.coef))
## ar1 ar2 ar3 intercept
## 7.2576925 -0.8171781 3.5140275 0.2555733
(1-pnorm(abs(ar3$coef)/sqrt(diag(ar3$var.coef))))*2
## ar1 ar2 ar3 intercept
## 3.936851e-13 4.138266e-01 4.413672e-04 7.982803e-01
tsdiag(ar3, gof.lag=20)
arima(dlcop, order=c(3,0,0), fixed=c(NA,0,NA,NA))
## Warning in arima(dlcop, order = c(3, 0, 0), fixed = c(NA, 0, NA, NA)): some
## AR parameters were fixed: setting transform.pars = FALSE
##
## Call:
## arima(x = dlcop, order = c(3, 0, 0), fixed = c(NA, 0, NA, NA))
##
## Coefficients:
## ar1 ar2 ar3 intercept
## 0.1827 0 0.0867 0.0004
## s.e. 0.0253 0 0.0254 0.0015
##
## sigma^2 estimated as 0.001713: log likelihood = 2646.87, aic = -5285.74
ar3R<-arima(dlcop, order=c(3,0,0), fixed=c(NA,0,NA,NA))
## Warning in arima(dlcop, order = c(3, 0, 0), fixed = c(NA, 0, NA, NA)): some
## AR parameters were fixed: setting transform.pars = FALSE
BIC(ar3R)
## [1] -5264.493
ar3R$coef/sqrt(diag(ar3R$var.coef))
## Warning in ar3R$coef/sqrt(diag(ar3R$var.coef)): longer object length is not
## a multiple of shorter object length
## ar1 ar2 ar3 intercept
## 7.22687818 0.00000000 59.29053529 0.01451317
(1-pnorm(abs(ar3R$coef)/sqrt(diag(ar3R$var.coef))))*2
## Warning in abs(ar3R$coef)/sqrt(diag(ar3R$var.coef)): longer object length
## is not a multiple of shorter object length
## ar1 ar2 ar3 intercept
## 4.942713e-13 1.000000e+00 0.000000e+00 9.884206e-01
tsdiag(ar3R, gof.lag=20)
Analyzing the Standarized Residual plot, ACF of REsiduals plot, and p-values for Ljung-Box Statistic plot. In case of AR(3) and Restricted AR(3), ACF residual is closer to zero. Moreover, p-value for Ljung -Box stastistic is higher than 0.6. This, along with these measures and AIC and BIC criterion, AR(3) is the best model among all the AR(p) models.
Based on the observation of sample ACF, MA(1) and MA(3) had been checked.
arima(dlcop, order=c(0,0,1))
##
## Call:
## arima(x = dlcop, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 0.1906 0.0004
## s.e. 0.0260 0.0013
##
## sigma^2 estimated as 0.001726: log likelihood = 2641.29, aic = -5276.59
ma1<-arima(dlcop, order=c(0,0,1))
BIC(ma1)
## [1] -5260.649
ma1$coef/sqrt(diag(ma1$var.coef))
## ma1 intercept
## 7.3347489 0.2865553
(1-pnorm(abs(ma1$coef)/sqrt(diag(ma1$var.coef))))*2
## ma1 intercept
## 2.220446e-13 7.744529e-01
tsdiag(ma1, gof.lag=20)
arima(dlcop, order=c(0,0,3))
##
## Call:
## arima(x = dlcop, order = c(0, 0, 3))
##
## Coefficients:
## ma1 ma2 ma3 intercept
## 0.1908 0.0121 0.1017 0.0004
## s.e. 0.0257 0.0263 0.0262 0.0014
##
## sigma^2 estimated as 0.001709: log likelihood = 2648.72, aic = -5287.44
ma3<-arima(dlcop, order=c(0,0,3))
BIC(ma3)
## [1] -5260.881
ma3$coef/sqrt(diag(ma3$var.coef))
## ma1 ma2 ma3 intercept
## 7.4266525 0.4612297 3.8861315 0.2644201
(1-pnorm(abs(ma3$coef)/sqrt(diag(ma3$var.coef))))*2
## ma1 ma2 ma3 intercept
## 1.114664e-13 6.446338e-01 1.018543e-04 7.914562e-01
tsdiag(ma3, gof.lag=20)
arima(dlcop, order=c(0,0,3), fixed=c(NA,0,NA,NA))
##
## Call:
## arima(x = dlcop, order = c(0, 0, 3), fixed = c(NA, 0, NA, NA))
##
## Coefficients:
## ma1 ma2 ma3 intercept
## 0.1887 0 0.0995 0.0004
## s.e. 0.0250 0 0.0257 0.0014
##
## sigma^2 estimated as 0.001709: log likelihood = 2648.62, aic = -5289.23
ma3R<-arima(dlcop, order=c(0,0,3), fixed=c(NA,0,NA,NA))
BIC(ma3R)
## [1] -5267.983
ma3R$coef/sqrt(diag(ma3R$var.coef))
## Warning in ma3R$coef/sqrt(diag(ma3R$var.coef)): longer object length is not
## a multiple of shorter object length
## ma1 ma2 ma3 intercept
## 7.55044131 0.00000000 72.31855339 0.01472545
(1-pnorm(abs(ma3R$coef)/sqrt(diag(ma3R$var.coef))))*2
## Warning in abs(ma3R$coef)/sqrt(diag(ma3R$var.coef)): longer object length
## is not a multiple of shorter object length
## ma1 ma2 ma3 intercept
## 4.329870e-14 1.000000e+00 0.000000e+00 9.882512e-01
tsdiag(ma3R, gof.lag=20)
From the above tables,we can see that MA(3) shows significant coefficient with 1% level of significance and AIC of -5287.44 and BIC of -5260.881. We have also considered Restricted MA(3) which suggests a lower AIC (-5289.23) and lower BIC(-5267.983). In case of MA(3) and Restricted MA(3), ACF residual is closer to zero. Moreover, p-value for Ljung -Box stastistic is higher than 0.6. This indicats Restricted MA(2) or MA(2) are the better models among all the MA(p) models.
Based on the ACF and PACF, I checked the ARMA(p,q) model for p and q equals to 1,2 and 3.
arima(dlcop, order=c(1,0,1))
##
## Call:
## arima(x = dlcop, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## -0.2695 0.4533 0.0003
## s.e. 0.2132 0.2004 0.0012
##
## sigma^2 estimated as 0.001725: log likelihood = 2641.64, aic = -5275.28
ar1ma1<-arima(dlcop, order=c(1,0,1))
BIC(ar1ma1)
## [1] -5254.026
ar1ma1$coef/sqrt(diag(ar1ma1$var.coef))
## ar1 ma1 intercept
## -1.2639133 2.2613880 0.2733312
(1-pnorm(abs(ar1ma1$coef)/sqrt(diag(ar1ma1$var.coef))))*2
## ar1 ma1 intercept
## 0.20626114 0.02373524 0.78459860
tsdiag(ar1ma1, gof.lag=20)
arima(dlcop, order=c(1,0,2))
##
## Call:
## arima(x = dlcop, order = c(1, 0, 2))
##
## Coefficients:
## ar1 ma1 ma2 intercept
## 0.7628 -0.5712 -0.1122 0.0004
## s.e. 0.1812 0.1830 0.0524 0.0014
##
## sigma^2 estimated as 0.001723: log likelihood = 2642.78, aic = -5275.56
ar1ma2<-arima(dlcop, order=c(1,0,2))
BIC(ar1ma2)
## [1] -5248.993
ar1ma2$coef/sqrt(diag(ar1ma2$var.coef))
## ar1 ma1 ma2 intercept
## 4.210204 -3.121585 -2.142477 0.259995
(1-pnorm(abs(ar1ma2$coef)/sqrt(diag(ar1ma2$var.coef))))*2
## ar1 ma1 ma2 intercept
## 2.551406e-05 1.798804e-03 3.215510e-02 7.948676e-01
tsdiag(ar1ma2, gof.lag=20)
arima(dlcop, order=c(1,0,3))
##
## Call:
## arima(x = dlcop, order = c(1, 0, 3))
##
## Coefficients:
## ar1 ma1 ma2 ma3 intercept
## -0.0287 0.2191 0.0175 0.1014 0.0004
## s.e. 0.2651 0.2639 0.0560 0.0264 0.0014
##
## sigma^2 estimated as 0.001709: log likelihood = 2648.73, aic = -5285.45
ar1ma3<-arima(dlcop, order=c(1,0,3))
BIC(ar1ma3)
## [1] -5253.579
ar1ma3$coef/sqrt(diag(ar1ma3$var.coef))
## ar1 ma1 ma2 ma3 intercept
## -0.1080853 0.8303760 0.3116848 3.8417013 0.2716545
(1-pnorm(abs(ar1ma3$coef)/sqrt(diag(ar1ma3$var.coef))))*2
## ar1 ma1 ma2 ma3 intercept
## 0.9139280479 0.4063262129 0.7552800707 0.0001221845 0.7858876649
tsdiag(ar1ma3, gof.lag=20)
arima(dlcop, order=c(2,0,1))
##
## Call:
## arima(x = dlcop, order = c(2, 0, 1))
##
## Coefficients:
## ar1 ar2 ma1 intercept
## -0.5328 0.0880 0.7303 0.0004
## s.e. 0.1032 0.0362 0.0990 0.0013
##
## sigma^2 estimated as 0.001719: log likelihood = 2644.52, aic = -5279.04
ar2ma1<-arima(dlcop, order=c(2,0,1))
BIC(ar2ma1)
## [1] -5252.474
ar2ma1$coef/sqrt(diag(ar2ma1$var.coef))
## ar1 ar2 ma1 intercept
## -5.1639939 2.4335151 7.3769934 0.3033692
(1-pnorm(abs(ar2ma1$coef)/sqrt(diag(ar2ma1$var.coef))))*2
## ar1 ar2 ma1 intercept
## 2.417356e-07 1.495301e-02 1.618705e-13 7.616085e-01
tsdiag(ar2ma1, gof.lag=20)
arima(dlcop, order=c(3,0,1))
##
## Call:
## arima(x = dlcop, order = c(3, 0, 1))
##
## Coefficients:
## ar1 ar2 ar3 ma1 intercept
## -0.2815 0.0664 0.0861 0.4725 0.0004
## s.e. 0.2152 0.0498 0.0286 0.2157 0.0014
##
## sigma^2 estimated as 0.001709: log likelihood = 2648.52, aic = -5285.05
ar3ma1<-arima(dlcop, order=c(3,0,1))
BIC(ar3ma1)
## [1] -5253.174
ar3ma1$coef/sqrt(diag(ar3ma1$var.coef))
## ar1 ar2 ar3 ma1 intercept
## -1.3076471 1.3346598 3.0086368 2.1904664 0.2824619
(1-pnorm(abs(ar3ma1$coef)/sqrt(diag(ar3ma1$var.coef))))*2
## ar1 ar2 ar3 ma1 intercept
## 0.190993038 0.181987703 0.002624226 0.028490428 0.777589392
tsdiag(ar3ma1, gof.lag=20)
arima(dlcop, order=c(2,0,2))
##
## Call:
## arima(x = dlcop, order = c(2, 0, 2))
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept
## 0.0291 0.3669 0.1672 -0.3648 0.0004
## s.e. 0.2073 0.0906 0.2090 0.1109 0.0014
##
## sigma^2 estimated as 0.001712: log likelihood = 2647.54, aic = -5283.09
ar2ma2<-arima(dlcop, order=c(2,0,2))
BIC(ar2ma2)
## [1] -5251.214
ar2ma2$coef/sqrt(diag(ar2ma2$var.coef))
## ar1 ar2 ma1 ma2 intercept
## 0.1404141 4.0509608 0.8000632 -3.2895654 0.2587292
(1-pnorm(abs(ar2ma2$coef)/sqrt(diag(ar2ma2$var.coef))))*2
## ar1 ar2 ma1 ma2 intercept
## 8.883328e-01 5.100775e-05 4.236742e-01 1.003422e-03 7.958442e-01
tsdiag(ar2ma2, gof.lag=20)
arima(dlcop, order=c(2,0,3))
##
## Call:
## arima(x = dlcop, order = c(2, 0, 3))
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 intercept
## -0.0330 0.0408 0.2234 -0.0220 0.0932 0.0004
## s.e. 0.2501 0.2300 0.2491 0.2291 0.0539 0.0014
##
## sigma^2 estimated as 0.001709: log likelihood = 2648.74, aic = -5283.49
ar2ma3<-arima(dlcop, order=c(2,0,3))
BIC(ar2ma3)
## [1] -5246.299
ar2ma3$coef/sqrt(diag(ar2ma3$var.coef))
## ar1 ar2 ma1 ma2 ma3 intercept
## -0.13181285 0.17745900 0.89714225 -0.09605911 1.73133435 0.27202272
(1-pnorm(abs(ar2ma3$coef)/sqrt(diag(ar2ma3$var.coef))))*2
## ar1 ar2 ma1 ma2 ma3 intercept
## 0.89513232 0.85914787 0.36964301 0.92347362 0.08339215 0.78560455
tsdiag(ar2ma3, gof.lag=20)
arima(dlcop, order=c(3,0,2))
##
## Call:
## arima(x = dlcop, order = c(3, 0, 2))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 intercept
## -0.2089 0.1184 0.0762 0.3999 -0.0666 0.0004
## s.e. 0.2757 0.1813 0.0451 0.2761 0.2145 0.0014
##
## sigma^2 estimated as 0.001709: log likelihood = 2648.57, aic = -5283.14
ar3ma2<-arima(dlcop, order=c(3,0,2))
BIC(ar3ma2)
## [1] -5245.953
ar3ma2$coef/sqrt(diag(ar3ma2$var.coef))
## ar1 ar2 ar3 ma1 ma2 intercept
## -0.7578626 0.6531604 1.6923120 1.4481365 -0.3106969 0.2829736
(1-pnorm(abs(ar3ma2$coef)/sqrt(diag(ar3ma2$var.coef))))*2
## ar1 ar2 ar3 ma1 ma2 intercept
## 0.4485333 0.5136529 0.0905865 0.1475789 0.7560311 0.7771971
tsdiag(ar3ma2, gof.lag=20)
arima(dlcop, order=c(3,0,3))
##
## Call:
## arima(x = dlcop, order = c(3, 0, 3))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 intercept
## 0.303 -0.0556 -0.1716 -0.1128 0.0094 0.2787 0.0004
## s.e. 0.346 0.2600 0.1754 0.3444 0.2377 0.1892 0.0014
##
## sigma^2 estimated as 0.001708: log likelihood = 2649.02, aic = -5282.05
ar3ma3<-arima(dlcop, order=c(3,0,3))
BIC(ar3ma3)
## [1] -5239.547
ar3ma3$coef/sqrt(diag(ar3ma3$var.coef))
## ar1 ar2 ar3 ma1 ma2 ma3
## 0.87595101 -0.21379882 -0.97822982 -0.32761611 0.03960915 1.47274334
## intercept
## 0.27167642
(1-pnorm(abs(ar3ma3$coef)/sqrt(diag(ar3ma3$var.coef))))*2
## ar1 ar2 ar3 ma1 ma2 ma3 intercept
## 0.3810567 0.8307039 0.3279607 0.7432019 0.9684047 0.1408203 0.7858708
tsdiag(ar3ma3, gof.lag=20)
arima(dlcop, order=c(3,0,1))
##
## Call:
## arima(x = dlcop, order = c(3, 0, 1))
##
## Coefficients:
## ar1 ar2 ar3 ma1 intercept
## -0.2815 0.0664 0.0861 0.4725 0.0004
## s.e. 0.2152 0.0498 0.0286 0.2157 0.0014
##
## sigma^2 estimated as 0.001709: log likelihood = 2648.52, aic = -5285.05
Rar3ma1<-arima(dlcop, order=c(3,0,1), fixed=c(0,0,NA,NA,NA))
## Warning in arima(dlcop, order = c(3, 0, 1), fixed = c(0, 0, NA, NA, NA)):
## some AR parameters were fixed: setting transform.pars = FALSE
BIC(Rar3ma1)
## [1] -5266.464
Rar3ma1$coef/sqrt(diag(Rar3ma1$var.coef))
## Warning in Rar3ma1$coef/sqrt(diag(Rar3ma1$var.coef)): longer object length
## is not a multiple of shorter object length
## ar1 ar2 ar3 ma1 intercept
## 0.00000000 0.00000000 66.88167583 7.29666990 0.01456911
(1-pnorm(abs(Rar3ma1$coef)/sqrt(diag(Rar3ma1$var.coef))))*2
## Warning in abs(Rar3ma1$coef)/sqrt(diag(Rar3ma1$var.coef)): longer object
## length is not a multiple of shorter object length
## ar1 ar2 ar3 ma1 intercept
## 1.000000e+00 1.000000e+00 0.000000e+00 2.948752e-13 9.883759e-01
tsdiag(Rar3ma1, gof.lag=20)
From the above tables,we can see that ARMA(2,2) and Restricted ARMA(3,1) shows significant coefficient with with 5% and 10% level of significance. ARMA(2,2) shows AIC of -5283.09 and BIC of -5251.24. Restricted ARMA(3,1) shows AIC of -5285.05 and BIC of -5266.464.For these models, ACF residual is closer to zero. Moreover, p-value for Ljung -Box stastistic is higher than 0.6. Therefore, ARMA(2,2) is the best fit when we do not impose restriction. Otherwise, with restriction, Restricted ARMA(3,1) can also be suggested.
4.1.6 Comments on the AR(p) models