Setup

# Clear All variables
rm(list = ls(all = TRUE))

# Import Library
library(readxl)
library(forecast)
library(strucchange)
library(lmtest)
library(vars)

I. Introduction

For this project, we analyzed sunflower oil and palm oil prices imported from FRED St. Louis federal website from January 2003 to January 2019. We first explored the data using a monthly time frame. The mean price for sunflower oil was \(939.35\) while the average for palm oil equalled \(731.68\) over this time period. Both data sets appears to show a slight upward trend with two spikes, one larger spike in 2008 and one smaller spike in 2011. The oil prices appear to follow each other rather well, thus implying that there may be one series helping cause the other. Sunflower oil prices appear to be slightly more volatile which would make sense given that the quantity of palm oil produced and sold is much larger, giving the price more stability the sunflower oil prices.

We checked if there existed significant structural breaks as we had encountered on the previous project of Police Call in Oregon. Fortunately, we did not encounter a situation when this issue breaks the data; CUMSUM plot confirmed this conclusion.

II. Results

(a) Time Series Plot of Sunflower and Palm Oil

# Import and separate Sunflower Oil and Palm Oil
setwd("~/Downloads")
sf <- read_xlsx("sf.xlsx")
pm <- read_xlsx("pm.xlsx")
sf_ts <- ts(sf$s_price, start = 2003, frequency = 12)
pm_ts <- ts(pm$p_price, start = 2003, frequency = 12)
plot(sf_ts, col = "blue", ylab = "Price ($)", ylim = c(300, 2100), 
    xlab = "Year")
lines(pm_ts, col = "red")
legend("topright", legend = c("Sunflower Oil Prices", "Palm Oil Prices"), 
    fill = c("blue", "red"), text.col = c("blue", "red"), bty = "o")
grid()

Sunflower Oil Time Series (2003-2019)

tsdisplay(sf_ts, main = "Sunflower Oil Prices")

Palm Oil Time Series (2013-2019)

tsdisplay(pm_ts, main = "Palm Oil Prices")

At first glance, both datasets demonstrate persistence that resemble an AR model of order 2. We can see a large increase around the years of 2007 to 2011 which might post a challenge to our models.

(b) Model Trend+Seasonality+Cyclical Components

fitsf <- auto.arima(sf_ts)
summary(fitsf)  #ARIMA(0,1,1)
## Series: sf_ts 
## ARIMA(0,1,1) 
## 
## Coefficients:
##          ma1
##       0.5436
## s.e.  0.0599
## 
## sigma^2 estimated as 3289:  log likelihood=-1049.56
## AIC=2103.11   AICc=2103.17   BIC=2109.63
## 
## Training set error measures:
##                     ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 0.2350056 57.05333 36.22178 -0.02264966 3.677318 0.1619093
##                    ACF1
## Training set 0.05042802
fitpm <- auto.arima(pm_ts)
summary(fitpm)  #ARIMA(1,1,0)
## Series: pm_ts 
## ARIMA(1,1,0) 
## 
## Coefficients:
##          ar1
##       0.3896
## s.e.  0.0664
## 
## sigma^2 estimated as 2104:  log likelihood=-1006.59
## AIC=2017.17   AICc=2017.24   BIC=2023.69
## 
## Training set error measures:
##                     ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 0.5052126 45.63499 32.29416 0.031984 4.375842 0.1896681
##                    ACF1
## Training set 0.01512611

Using auto.arima to find the best model to fit data, we find that an ARIMA(0,1,1) fits the sunflower oil prices. This is an MA(1) with the first difference of the data taken. An ARIMA(1,1,0) fits the palm oil prices which is an AR(1) with the first difference of the data taken. Even though both models seem like an AR model, the computer concludes that for sunflower oil prices, after differentiation, is better fitted with MA. This is likely because the sunflower oil dataset exhibits more frequent mean reversion than that of the palm oil prices. For the palm oil prices, we obtain an AR model of order 1 with 1 degree of differentiation. Differentiation is applied to both models likely due to the preference over lower orders.

plot(stl(sf_ts, s.window = "periodic"), main = "STL Sunflower Oil Prices")

plot(stl(pm_ts, s.window = "periodic"), main = "STL Palm Oil Prices")

To examine cycles and seasonality of the data more closely, we use a stl function. It shows that both datasets have some sort of yearly seasonality at first glance. However, if we look at the y-values, seasonality variance accounts for less than 10 percent of the average data price. In reality, there is not much seasonality and that is why an ARIMA model accounting for cycles and differentiation is enough for both.

(c) Residuals vs Fitted for Sunflower and Palm Oils

plot(y = fitsf$res, x = fitsf$fit, pch = 16, main = "Residuals vs Fitted Values Sunflower Oils", 
    xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "blue", lty = 2)
grid()

plot(y = fitpm$res, x = fitpm$fit, pch = 16, main = "Residuals vs Fitted Values Palm Oils", 
    xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "blue", lty = 2)
grid()

Compared to palm oil prices, sunflower oil prices have more clustered residuals with respect to fitted values. Palm oil prices are more randomly scattered, albeit not perfectly. This outcome may affect the plot such that residuals of sunflower oil prices may exhibit strongly volatility.

(d) ACF and PACF of Residuals

tsdisplay(fitsf$residuals, main = "Residuals of Sunflower Oil Prices")

Box.test(fitsf$residuals)
## 
##  Box-Pierce test
## 
## data:  fitsf$residuals
## X-squared = 0.4908, df = 1, p-value = 0.4836
tsdisplay(fitpm$residuals, main = "Residuals of Palm Oil Prices")

Box.test(fitpm$residuals)
## 
##  Box-Pierce test
## 
## data:  fitpm$residuals
## X-squared = 0.044158, df = 1, p-value = 0.8336

For the sunflower oil prices, the ACF and the PACF shows a few spikes over the bands, however, by running a Box-Pierce test, the residuals are found to be consistent with white noise. For the palm oil prices, the ACF and PACF also show a few spikes over the bands but again, a Box-Pierce test shows that the residuals are consistent with white noise.

(e) Plot the respective CUSUM and interpret the plot.

plot(efp(fitsf$res ~ 1, type = "Rec-CUSUM"), main = "Recusrive CUMSUM Test of Sunflower Oil")

plot(efp(fitpm$res ~ 1, type = "Rec-CUSUM"), main = "Recursive CUMSUM Test of Palm Oil")

For both models, there doesn’t seem to be any structural break.

(f) Plot the respective Recursive Residuals and interpret the plot

y1 = recresid(fitsf$res ~ 1)
plot(y1, pch = 16, ylab = "Recursive Residuals", type = "l")

y2 = recresid(fitpm$res ~ 1)
plot(y2, pch = 16, ylab = "Recursive Residuals", type = "l")

For both models, the recursive residuals generally stay flat around 0. But we do observe some outliers between 2007 to 2011. This can be explained by the Financial Crisis that took place during this time which raised commodity prices across the board.

(g) Diagnostic Statistics

accuracy(fitsf)
##                     ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 0.2350056 57.05333 36.22178 -0.02264966 3.677318 0.1619093
##                    ACF1
## Training set 0.05042802
accuracy(fitpm)
##                     ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 0.5052126 45.63499 32.29416 0.031984 4.375842 0.1896681
##                    ACF1
## Training set 0.01512611

Although we calculated the accuracy statistics, these statistics only gain meaning when compared against other models. Independently, these values do not give us any insight into the fit of the models.

(h) 12-Step Ahead Forecast

plot(forecast(fitsf, h = 12), shadecols = "oldstyle")

plot(forecast(fitpm, h = 12), shadecols = "oldstyle")

Since both models are low order ARIMA model, we see that the 12-step ahead forecasts quickly revert back to their respective unconditional means.

(i) VAR Model

ccf(sf_ts, pm_ts, ylab = "Cross-Correlation Function", main = "Sunflower and Palm Oil CCF")

y = cbind(sf_ts, pm_ts)
oil = data.frame(y)
VARselect(oil)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      8      2      2      8 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 1.561358e+01 1.524769e+01 1.527536e+01 1.526158e+01 1.524791e+01
## HQ(n)  1.565624e+01 1.531878e+01 1.537489e+01 1.538954e+01 1.540431e+01
## SC(n)  1.571881e+01 1.542307e+01 1.552090e+01 1.557727e+01 1.563375e+01
## FPE(n) 6.038029e+06 4.187937e+06 4.305636e+06 4.247066e+06 4.189954e+06
##                   6            7            8            9           10
## AIC(n) 1.521568e+01 1.520765e+01 1.516089e+01 1.520028e+01 1.523255e+01
## HQ(n)  1.540052e+01 1.542093e+01 1.540260e+01 1.547042e+01 1.553113e+01
## SC(n)  1.567168e+01 1.573380e+01 1.575718e+01 1.586673e+01 1.596916e+01
## FPE(n) 4.057852e+06 4.026435e+06 3.843758e+06 3.999902e+06 4.133288e+06
var1 = VAR(oil, p = 2)

plot(fevd(var1, n.ahead = 5))

We choose to fit a VAR model of order 2 to minimize BIC. In the VAR model for sunflower oil prices, the coefficients of both lagged values of palm oil prices are not statistically significant. Therefore, we do not expect the VAR model to be much different from the autoregressive model in this case. However, in the VAR model for palm oil prices, both lagged values of sunflower oil prices seem to have influence. Specifically, the first lag of sunflower prices positively influence the palm oil price, while the second lag of sunflower prices negatively influences it.

(j) Impulse Response Function

plot(irf(var1))

The IRF plots are consistent with our interpretation of the VAR models above. Price increase in sunflower oil positively influences both sunflower oil prices and palm oil prices and eventually decays its effect on its influence. Whereas price increase in palm oil price only affects future palm oil prices and stagnates overtime.

(k) Granger-Causality Test

grangertest(sf_ts ~ pm_ts, order = 2)
grangertest(pm_ts ~ sf_ts, order = 2)

According to the tests, the sunflower oil prices granger causes the palm oil prices. However, the palm oil prices do not granger-cause the sunflower oil prices. This result confirms our analysis using the VAR model. As sunflower oil prices cause palm oil prices, we would expect lags of the sunflower prices to be significant in the model for palm oils prices, and likewise, since palm oil prices do not cause sunflowers oil prices we would expect to see lags of palm oil not to be significant in the model for sunflower oil. This is what we saw above.

(l) 12-Step Ahead Forecast and Model Comparison

plot(predict(var1, n = 12))

test.var1 = accuracy(var1$varresult[[1]])
test1 = accuracy(fitsf)
var1$varresult[1]
## $sf_ts
## 
## Call:
## lm(formula = y ~ -1 + ., data = datamat)
## 
## Coefficients:
## sf_ts.l1  pm_ts.l1  sf_ts.l2  pm_ts.l2     const  
##  1.50306  -0.06266  -0.59398   0.13735  31.12798
test.var2 = accuracy(var1$varresult[[2]])
test2 = accuracy(fitpm)

dm.test(test.var1, test1, h = 12)
## 
##  Diebold-Mariano Test
## 
## data:  test.var1test1
## DM = -29667000, Forecast horizon = 12, Loss function power = 2,
## p-value < 2.2e-16
## alternative hypothesis: two.sided
dm.test(test.var2, test2, h = 12)
## 
##  Diebold-Mariano Test
## 
## data:  e1e2
## DM = -0.83404, Forecast horizon = 1, Loss function power = 2,
## p-value = 0.4362
## alternative hypothesis: two.sided

According to error metrics, the VAR and ARIMA models are very comparable and produce similar error values. When looking at RSME, the VAR models outperform the ARIMA models but when looking at MAE, the ARIMA model outperforms the VAR model. Looking at plots of the forecasts, the ARIMA models quickly revert to the mean and show a flat trend 12 steps ahead while the VAR models forecasts an increasing trend over the next year.

III. Conclusions and Future Work

Our analysis of the data underwent multiple interactions. Our first attempt at analysis led us to calculate the change in prices and use this data to conduct our analysis. We originally thought to use the change in the prices in order to make our data covariance stationary. However, through our analysis we discovered that a granger causality test was inconclusive as neither oil price changes granger caused one another. We then switched back to prices with the goal of being able to model granger causality and were successful with this attempt. Sunflower oil prices were found to granger cause palm oil prices. This is difficult to explain as much more palm oil is produced in the world versus the sunflower oil and thus we would expect the price of palm oil to influence sunflower oil. Further analysis of the market would be needed to help understand our statistics results.

In the future, we could split our data into prediction and validation samples to evaluate the accuracy of our out-of-sample forecasts. This would provide a more useful comparison between the models (ARIMA vs. VAR) we chose in this project. To implement, we would run a for-loop to sum up the squared differences between the forecast values n-steps ahead and the observed values. By comparing the sums, we can see which model provides a better fit.

In addition, we also noticed that there was some heteroskedasticity in the residual plots of the ARIMA models, and the PACF of squared residuals showed a few significant spikes as well. For future work, we could try to fit an ARCH or GARCH model to account for the changing volatility.

IV. References

The sunflower oil prices were found here:https://fred.stlouisfed.org/series/PSUNOUSDM.

The palm oil prices were found here: https://fred.stlouisfed.org/series/PPOILUSDM