Introduction
For this model development we will follow the Box-Jenkins methodology to build a time series model for the log-change in the price of Brent Crude Oil.
Data
I will import the time series data for the Weekly Price for Brent Crude Oil from the Quandl website.
Before I get started with loading the data I need to take care of a few house keeping issues. I will load/require the necessary packages that will be used during this model development.
require(forecast)
require(Quandl)
require(ggplot2)
require(dygraphs)
require(xtable)
require(DT)
Next I will load the data.
Quandl.api_key('Ltw-PAye5rkz6MwzLNx-')
WCOILBRENTEU <- Quandl("FRED/WCOILBRENTEU", type="zoo")
A quick inspection shows that this data on Weekly Price for Brent Crude Oil is at weekly frequency and that the data is available for the period 1987 May to 2016 January.
str(WCOILBRENTEU)
## 'zoo' series from 1987-05-15 to 2016-01-29
## Data: num [1:1499] 18.6 18.5 18.6 18.7 18.8 ...
## Index: Date[1:1499], format: "1987-05-15" "1987-05-22" "1987-05-29" "1987-06-05" ...
A plot of the original time series data will allow us to quickly determine if any transformations of the data are necessary.
ar1.model <- arima(diff_log_WCOILBRENTEU, order = c(1,0,0))
ar2.model <- arima(diff_log_WCOILBRENTEU, order = c(2,0,0))
ar3.model <- arima(diff_log_WCOILBRENTEU, order = c(3,0,0))
# Testing AR(1) Model
tsdiag(ar1.model, gof.lag = 12)
# Testing AR(2) Model
tsdiag(ar2.model, gof.lag = 12)
# Testing AR(3) Model
tsdiag(ar3.model, gof.lag = 12)
According to the above diagnostics, the AR(1) and AR(2) models are a poor fit since there is some autocorrelation between the residuals but more importantly the p-values on the Ljung-Box statistics for both the AR(1) and AR(2) models are all small indicating some pattern in the residuals.
The AR(3) may be an option but I will look at the MA models next and then reevaluate based on the results.
Moving Average (MA) Model Exploration
ma1.model <- arima(diff_log_WCOILBRENTEU, order = c(0,0,1))
ma2.model <- arima(diff_log_WCOILBRENTEU, order = c(0,0,2))
ma3.model <- arima(diff_log_WCOILBRENTEU, order = c(0,0,3))
# Testing MA(1) Model
tsdiag(ma1.model, gof.lag = 12)
# Testing MA(2) Model
tsdiag(ma2.model, gof.lag = 12)
# Testing MA(3) Model
tsdiag(ma3.model, gof.lag = 12)
A similar pattern emerges as we saw in the AR(p) model explorations. Again, there is some small autocorrelation between the residuals in the MA(1) and MA(2) models but more importantly the p-values on the Ljung-Box statistics for both the MA(1) and MA(2) models are all small indicating patterns in the residuals.
The MA(3) model seems to be the best fit for this time series data set. Based on the diagnostic information between the AR(3) and MA(3) models I am inclined to favor the MA(3) model based on the slightly higher p-values for the Ljung-Box statistic.
A comparison of the Akaike Information Criterion (AIC) and the Schwarz-Bayesian information criterion (BIC) may help to bring more clear insight into which is the better model. I will construct a chart to help present these results.
In addition to the AR(p) and MA(q) models where (p,q)={1,2,3}, I will model a series of ARMA(p,q) models where p={1,2,3} and q=3. These ARMA(p,q) models will be included in the comparison chart in order to be thorough in the analysis.
arma1_3.model <- arima(diff_log_WCOILBRENTEU, order = c(1,0,3))
arma2_3.model <- arima(diff_log_WCOILBRENTEU, order = c(2,0,3))
arma3_3.model <- arima(diff_log_WCOILBRENTEU, order = c(3,0,3))
# Testing ARMA(1,3) Model
tsdiag(arma1_3.model, gof.lag = 12)
# Testing ARMA(2,3) Model
tsdiag(arma2_3.model, gof.lag = 12)
# Testing ARMA(3,3) Model
tsdiag(arma3_3.model, gof.lag = 12)
Conclusion
The table shown below details a summary of the AIC and BIC results from the analysis presented in this report. According to all the diagnostics from above and the summary of the table below we can conclude that modeling this time series data set with an MA(3) model will provide the best results.