In this analysis we use a hybrid forecasting model that combines an ARIMA model and a Neural Network Autoregression model, both with automated parameter selection. The two component models are weighted equally to produce the hybrid forecasts.
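The equal weighting described above amounts to a point-by-point average of the component forecasts. The following is a minimal sketch of that idea only; the function name `combine_equal_weights` and the numbers are illustrative assumptions, not output from the models in this analysis.

```python
def combine_equal_weights(*forecasts):
    """Average several equal-length forecast paths point by point."""
    if not forecasts:
        raise ValueError("need at least one forecast path")
    horizon = len(forecasts[0])
    if any(len(path) != horizon for path in forecasts):
        raise ValueError("all forecast paths must cover the same horizon")
    # zip(*forecasts) groups the forecasts made for the same future month
    return [sum(values) / len(forecasts) for values in zip(*forecasts)]


# Hypothetical 3-month forecast paths (made-up numbers for illustration)
arima_path = [0.50, 0.52, 0.54]
nnar_path = [0.54, 0.52, 0.50]
hybrid_path = combine_equal_weights(arima_path, nnar_path)
```

With two components, each hybrid value is simply the midpoint of the two component forecasts for that month.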
The Autoregressive Integrated Moving Average (ARIMA) model is a classical approach to time-series forecasting that aims to describe the autocorrelations in the data. An ARIMA model is specified by three parameters: the number of lagged observations (p), the number of times the observations are differenced (d), and the order of the moving-average term, that is, the number of lagged forecast errors (q).
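The general forecasting equation for the model, where \(y_t\) denotes the series after it has been differenced d times and \(e_t\) denotes the forecast error at time t, is as follows:
\(\hat{y}_t=\mu+\phi_1y_{t-1}+\dots+\phi_py_{t-p}-\theta_1e_{t-1}-\dots-\theta_qe_{t-q}\)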
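To make the equation concrete, here is a small sketch that evaluates a one-step forecast directly from it. The function `arma_one_step` and all coefficient values are hypothetical; in practice the coefficients are estimated from the data, and the inputs are the already-differenced series and its past forecast errors.

```python
def arma_one_step(mu, phi, theta, y_hist, e_hist):
    """Evaluate mu + sum(phi_i * y_{t-i}) - sum(theta_j * e_{t-j}).

    y_hist and e_hist are ordered oldest to newest, so y_hist[-1] is y_{t-1}.
    """
    p, q = len(phi), len(theta)
    if len(y_hist) < p or len(e_hist) < q:
        raise ValueError("not enough history for the requested orders")
    ar_part = sum(phi[i] * y_hist[-1 - i] for i in range(p))
    ma_part = sum(theta[j] * e_hist[-1 - j] for j in range(q))
    return mu + ar_part - ma_part


# Hypothetical model with p = 2 and q = 1 (coefficients are made up)
forecast = arma_one_step(
    mu=0.1,
    phi=[0.5, 0.2],      # phi_1, phi_2
    theta=[0.3],         # theta_1
    y_hist=[1.0, 2.0],   # y_{t-2}, y_{t-1}
    e_hist=[0.1],        # e_{t-1}
)
# 0.1 + (0.5 * 2.0 + 0.2 * 1.0) - (0.3 * 0.1) = 1.27
```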
Similar to the ARIMA model, the Neural Network Autoregression model uses the recent lags of the time series to generate forecasts. These lagged values are fed into a single-layer feed-forward network, pass through one hidden layer, and are then fed out as our forecasts. The outputs of one layer are the inputs of the next layer. A visual representation of this can be seen in the image below.
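The flow from lagged inputs through a hidden layer to a forecast can be sketched as a single forward pass. This is an illustration under assumptions: sigmoid hidden units and a linear output node are a common choice for this kind of network but are not confirmed by this report, and every weight below is made up rather than fitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def nnar_forward(lags, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: lagged inputs -> hidden layer -> single linear output."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        activation = sum(w * x for w, x in zip(weights, lags)) + bias
        hidden.append(sigmoid(activation))
    # The hidden-layer outputs are the inputs of the output layer
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: 3 lagged inputs, 2 hidden nodes (weights are illustrative)
recent_lags = [0.52, 0.50, 0.47]
one_step_forecast = nnar_forward(
    recent_lags,
    hidden_weights=[[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]],
    hidden_biases=[0.0, 0.1],
    output_weights=[0.6, 0.4],
    output_bias=0.05,
)
```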
This hybrid model has been shown to provide very accurate results when forecasting over short horizons of a few months. The Cross-Validated Accuracy table below reports the Mean Absolute Error (MAE) when forecasting 12 months ahead at a time. With a rolling window of 72 months, the expected 1-month forecast error is $0.066.
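The rolling-window evaluation described above can be sketched as follows: fit on a fixed-length window, forecast the next few periods, slide the window forward by one period, and average the absolute errors separately for each forecast horizon. The function names are hypothetical, and a naive "repeat the last value" forecaster stands in for the hybrid model, so the numbers it produces say nothing about the results in this report.

```python
def rolling_mae_by_horizon(series, window, horizon, forecaster):
    """Mean absolute error per horizon step using a fixed-size rolling window."""
    if len(series) < window + horizon:
        raise ValueError("series too short for the given window and horizon")
    abs_errors = [[] for _ in range(horizon)]
    for start in range(len(series) - window - horizon + 1):
        train = series[start:start + window]
        actual = series[start + window:start + window + horizon]
        predicted = forecaster(train, horizon)
        for h in range(horizon):
            abs_errors[h].append(abs(actual[h] - predicted[h]))
    return [sum(errors) / len(errors) for errors in abs_errors]


def naive_forecaster(train, horizon):
    """Stand-in model (an assumption for this sketch): repeat the last value."""
    return [train[-1]] * horizon


# Toy series rising by 1 each period; the analysis itself uses window=72, horizon=12
toy_series = [float(t) for t in range(10)]
mae = rolling_mae_by_horizon(toy_series, window=3, horizon=2, forecaster=naive_forecaster)
```

The first element of the result corresponds to the 1-step-ahead error, which is the quantity quoted in the text for the hybrid model.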
As can be seen below, the forecast from the hybrid model tends to flatten out over time. This is because the hybrid model uses a 6-year window to make forecasts, so forecasts further into the future must use earlier forecasts as inputs. The flat line represents the average (expected) value of the time series over the last 6 years; in reality, the actual values would likely oscillate around it.
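The flattening effect can be reproduced with a much simpler model than the hybrid. The sketch below uses a hypothetical AR(1) model (an assumption chosen for brevity, with made-up coefficients) and feeds each forecast back in as the input for the next step. The path converges to the long-run mean `constant / (1 - phi)`.

```python
def recursive_ar1_forecast(last_value, constant, phi, horizon):
    """Multi-step forecast where each step uses the previous forecast as input."""
    path = []
    current = last_value
    for _ in range(horizon):
        current = constant + phi * current
        path.append(current)
    return path


# Hypothetical coefficients: the long-run mean is 0.1 / (1 - 0.8) = 0.5
path = recursive_ar1_forecast(last_value=1.0, constant=0.1, phi=0.8, horizon=60)
long_run_mean = 0.1 / (1 - 0.8)
gaps = [abs(value - long_run_mean) for value in path]
# Each gap is 0.8 times the previous one, so the path flattens toward the mean
```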
Below is the correlation matrix for the NGL indexes, in which we can see that they all have a strong positive correlation with one another. This likely means that the indexes tend to move together and are affected collectively by overall market effects.
Below we remove FEI Propane and Mont Belvieu Propane from the correlation plot and add in their differentials (e.g. FEI-MtBelvieu).
We can see that all indexes have a moderately positive correlation with the differential feature. This is likely because they were correlated with the underlying indexes to begin with, while the differentials do not move with the market as closely as the individual indexes do.
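As a rough illustration of how a differential feature and its correlations are computed, the sketch below builds a differential from two price series and measures its Pearson correlation with a third. All three series are made-up numbers, and the variable names are hypothetical stand-ins for the indexes discussed above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up monthly prices for illustration only
fei_propane = [0.90, 1.00, 1.15, 1.05, 1.20, 1.30]
mont_belvieu_propane = [0.70, 0.78, 0.88, 0.84, 0.93, 0.98]
other_index = [0.80, 0.86, 0.97, 0.95, 1.02, 1.10]

# Differential feature: FEI minus Mont Belvieu, month by month
differential = [a - b for a, b in zip(fei_propane, mont_belvieu_propane)]

corr_with_index = pearson(other_index, fei_propane)
corr_with_differential = pearson(other_index, differential)
```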
In the biplot below we can see that Propane FEI is most closely related to Normal Butane FEI. We can also see that there are four distinct groups within the first two principal components. This is significant because the first two principal components explain roughly 80% of the variance in the data. It is therefore reasonable, to a certain degree, to draw conclusions about how the variables are related.
This section uses Granger causality tests, which determine statistically whether one variable’s time series is useful in forecasting FEI Propane monthly prices. From the Granger causality test below, we reject the null hypothesis that Normal Butane Mont Belvieu does not Granger-cause FEI Propane at roughly the 98.3% confidence level (p-value = 0.01722).
This means that lagged values of Normal Butane Mont Belvieu are useful for predicting FEI Propane. It should be noted that no other index was statistically significant in this test.
## $Granger
##
## Granger causality H0: NormalButane.MontBelvieu do not Granger-cause
## Propane.FEI
##
## data: VAR object var1
## F-Test = 5.7579, df1 = 1, df2 = 228, p-value = 0.01722
##
##
## $Instant
##
## H0: No instantaneous causality between: NormalButane.MontBelvieu and
## Propane.FEI
##
## data: VAR object var1
## Chi-squared = 6.5442, df = 1, p-value = 0.01052
Lastly, in this section we test for variables that are cointegrated with Propane FEI. Cointegration tests matter because they check whether correlated time series share a statistically significant long-run relationship rather than being correlated by chance.
For conciseness we do not show the output of every test, but we found that the following variables are highly cointegrated with FEI Propane:
## Propane-FEI[i] = 0.9560 Propane-ARA[i] + 0.1568 + R[i], R[i] = 0.7701 R[i-1] + eps[i], eps ~ N(0, 0.0597^2)
## (0.0169) (0.0361) (0.0599)
##
## R[130] = 0.0040 (t = 0.045)
##
## Unit Root Tests of Residuals
## Statistic p-value
## Augmented Dickey Fuller (ADF) -1.887 0.53698
## Phillips-Perron (PP) -31.242 0.00794
## Pantula, Gonzales-Farias and Fuller (PGFF) 0.730 0.00719
## Elliott, Rothenberg and Stock DF-GLS (ERSD) -1.406 0.41717
## Johansen's Trace Test (JOT) -19.890 0.06897
## Schmidt and Phillips Rho (SPR) -35.811 0.00854
##
## Variances
## SD(diff(Propane-ARA)) = 0.124282
## SD(diff(Propane-FEI)) = 0.123816
## SD(diff(residuals)) = 0.064306
## SD(residuals) = 0.087602
## SD(innovations) = 0.059742
##
## Half life = 2.653882
## R[last] = 0.003967 (t=0.05)
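The half-life reported in the output above can be related to the AR(1) coefficient of the residuals, R[i] = 0.7701 R[i-1] + eps[i]. The sketch below applies the standard AR(1) half-life formula, ln(0.5) / ln(rho); that this is the formula behind the reported figure is an assumption, and the function name `half_life` is hypothetical.

```python
import math


def half_life(rho):
    """Periods needed for a deviation to decay by half under R[i] = rho * R[i-1]."""
    if not 0.0 < rho < 1.0:
        raise ValueError("half-life is only defined here for 0 < rho < 1")
    return math.log(0.5) / math.log(rho)


rho = 0.7701  # residual AR(1) coefficient from the output above, rounded to 4 decimals
print(round(half_life(rho), 2))  # about 2.65 periods
```

The result agrees with the reported half-life of 2.653882 up to the rounding of the coefficient. Since the data are monthly, a deviation from the fitted relationship would be expected to shrink by half in a little under three months.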