Model Development of Weekly Price for Brent Crude Oil


Introduction

For this model development we will follow the Box-Jenkins methodology to build a time series model for the log-change in the price of Brent Crude Oil.

Data

I will import the time series data for the Weekly Price for Brent Crude Oil from the Quandl website.
Before I get started with loading the data I need to take care of a few house keeping issues. I will load/require the necessary packages that will be used during this model development.

require(forecast)
require(Quandl)
require(ggplot2)
require(dygraphs)
require(xtable)
require(DT)


Next I will load the data.

Quandl.api_key('Ltw-PAye5rkz6MwzLNx-')
WCOILBRENTEU <- Quandl("FRED/WCOILBRENTEU", type="zoo")


A quick inspection shows that this data on Weekly Price for Brent Crude Oil is at weekly frequency and that the data is available for the period 1987 May to 2016 January.

str(WCOILBRENTEU)
## 'zoo' series from 1987-05-15 to 2016-01-29
##   Data: num [1:1499] 18.6 18.5 18.6 18.7 18.8 ...
##   Index:  Date[1:1499], format: "1987-05-15" "1987-05-22" "1987-05-29" "1987-06-05" ...


A plot of the original time series data will allow us to quickly determine if any transformations of the data are necessary.


We can see from the above plot that the original data follows an exponential trend and is non-stationary. Applying a log change transformation to the data will remove this exponential trend and allow us to determine if this transformation will be sufficiently stationary to satisfy the necessary conditions for time series weak stationarity.

The following plot shows a time series for the log change in Weekly Price for Brent Crude Oil.


From the above plot we can see that the exponential trend is gone and the application of the log change transformation has rendered the series time invariant with respect to the mean and variance. In addition, the covariance between \(x_t\) and \(x_{t-\tau}\) only depend on the lag \(\tau\), where \(\tau\) is a finite integer. The time series is now weakly stationary where \(E(x_t)=\mu\) and \(y_\tau=cov(x_t, x_{t-\tau})\).

Model Identification, Estimation & Checking for Adequacy

The next step will be to identify an appropriate model by looking at a plot of the autocorrelation (ACF) function and partial autocorrelation function (PACF) for \(y_t\).


There does not appear to be an obvious favor toward AR, MA or ARMA. The third lag in both the ACF and PACF are significant. I do not see a clear drop to zero in either ACF or PACF but I am inclined to favor an MA(3) model for this data set.

I will quickly model up to three lags on an AR model, and three lags on an MA model. I will determine if an ARMA model is viable to explore based on the results of the explorations on the AR/MA models.

Autoregressive (AR) Model Exploration

ar1.model <- arima(diff_log_WCOILBRENTEU, order = c(1,0,0))
ar2.model <- arima(diff_log_WCOILBRENTEU, order = c(2,0,0))
ar3.model <- arima(diff_log_WCOILBRENTEU, order = c(3,0,0))


# Testing AR(1) Model
tsdiag(ar1.model, gof.lag = 12)

# Testing AR(2) Model
tsdiag(ar2.model, gof.lag = 12)

# Testing AR(3) Model
tsdiag(ar3.model, gof.lag = 12)


According to the above diagnostics, the AR(1) and AR(2) models are a poor fit since there is some autocorrelation between the residuals but more importantly the p-values on the Ljung-Box statistics for both the AR(1) and AR(2) models are all small indicating some pattern in the residuals.

The AR(3) may be an option but I will look at the MA models next and then reevaluate based on the results.

Moving Average (MA) Model Exploration

ma1.model <- arima(diff_log_WCOILBRENTEU, order = c(0,0,1))
ma2.model <- arima(diff_log_WCOILBRENTEU, order = c(0,0,2))
ma3.model <- arima(diff_log_WCOILBRENTEU, order = c(0,0,3))
# Testing MA(1) Model
tsdiag(ma1.model, gof.lag = 12)

# Testing MA(2) Model
tsdiag(ma2.model, gof.lag = 12)

# Testing MA(3) Model
tsdiag(ma3.model, gof.lag = 12)


A similar pattern emerges as we saw in the AR(p) model explorations. Again, there is some small autocorrelation between the residuals in the MA(1) and MA(2) models but more importantly the p-values on the Ljung-Box statistics for both the MA(1) and MA(2) models are all small indicating patterns in the residuals.

The MA(3) model seems to be the best fit for this time series data set. Based on the diagnostic information between the AR(3) and MA(3) models I am inclined to favor the MA(3) model based on the slightly higher p-values for the Ljung-Box statistic.

A comparison of the Akaike Information Criterion (AIC) and the Schwarz-Bayesian information criterion (BIC) may help to bring more clear insight into which is the better model. I will construct a chart to help present these results.

In addition to the AR(p) and MA(q) models where (p,q)={1,2,3}, I will model a series of ARMA(p,q) models where p={1,2,3} and q=3. These ARMA(p,q) models will be included in the comparison chart in order to be thorough in the analysis.

arma1_3.model <- arima(diff_log_WCOILBRENTEU, order = c(1,0,3))
arma2_3.model <- arima(diff_log_WCOILBRENTEU, order = c(2,0,3))
arma3_3.model <- arima(diff_log_WCOILBRENTEU, order = c(3,0,3))


# Testing ARMA(1,3) Model
tsdiag(arma1_3.model, gof.lag = 12)

# Testing ARMA(2,3) Model
tsdiag(arma2_3.model, gof.lag = 12)

# Testing ARMA(3,3) Model
tsdiag(arma3_3.model, gof.lag = 12)



Conclusion

The table shown below details a summary of the AIC and BIC results from the analysis presented in this report. According to all the diagnostics from above and the summary of the table below we can conclude that modeling this time series data set with an MA(3) model will provide the best results.