I. Introduction

In this assignment we use a vector error correction (VEC) model to forecast gas prices over 2011M1 to 2017M3, and we compare this forecast with one generated from an \(ARMA(2,1)\) model.

II. The Data

We download the monthly time series of crude oil prices (FRED series MCOILWTICO, WTI) and of US regular conventional gas prices (FRED series GASREGCOVM) from FRED through Quandl, using the following code. The sample period is 1995M1 to 2017M3.

library(Quandl)  # Quandl() client; returns zoo objects with type = "zoo"
data.Crude.Oil.Prices <- Quandl("FRED/MCOILWTICO", type = 'zoo')
data.Conventional.Gas.Prices <- Quandl("FRED/GASREGCOVM", type = 'zoo')

For convenience, we include time series plots of both the original series and their logarithms over the entire sample period.

As an aside, the original series do not appear to co-move, because gas prices (in dollars per gallon) are much lower than oil prices (in dollars per barrel). Once we put both prices on a comparable scale by taking logarithms, however, oil and gas prices clearly appear to move together. For the remainder of the analysis we use the logarithms of oil and gas prices unless otherwise specified.
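A minimal sketch of why the log scale helps, using made-up prices rather than the FRED data: equal percentage moves produce equal log changes, whatever the units of the two series.

```r
# Hypothetical prices (not from the data): oil in $/barrel, gas in $/gallon
oil <- c(50, 55, 49.5)
gas <- c(2.00, 2.20, 1.98)

# In levels, the dollar changes differ by more than an order of magnitude
diff(oil)
diff(gas)

# Both series rise 10% and then fall 10%, so their log changes coincide
diff(log(oil))
diff(log(gas))
```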

III. Data Manipulation

Before we estimate a bivariate VEC model, we perform unit root tests and a cointegration analysis on the logarithms of oil and gas prices. The unit root tests determine whether each series is approximately weakly stationary or has a unit root. The cointegration analysis then examines whether the two series share a common stochastic trend, that is, whether some linear combination of them is stationary.

A. Unit Root Test

We run two different tests on each time series to examine whether a unit root is present. The first is the KPSS test, whose null hypothesis is that the time series is stationary: level stationary or, with type = "tau" as used below, trend stationary. Recall that a trend stationary time series has a mean that is non-constant and non-zero, but deterministic. The alternative hypothesis is that the time series is difference stationary (has a unit root). In short, this test assumes that the series is trend stationary and asks whether there is enough evidence to reject this hypothesis in favor of difference stationarity.

The second test is the Elliott, Rothenberg and Stock (ERS) test. Its null hypothesis is that the time series is difference stationary (has a unit root), which implies that its mean wanders stochastically, so the series is not weakly stationary. Its alternative hypothesis is that the series is stationary around a deterministic component, which depends on the specification used: a constant (level stationary) or a linear trend (trend stationary). A level stationary time series has a non-zero but constant mean, while a trend stationary time series has a mean that is non-constant and non-zero, but deterministic. In short, this test assumes that the series is difference stationary and asks whether there is enough evidence to reject this hypothesis in favor of trend stationarity, which is the alternative under the model = "trend" specification used here.
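The practical difference between the two hypotheses is how long a shock lasts. The sketch below uses hypothetical values (a drift of 0.01 and a single unit shock at t = 50, not taken from the data) to show that a shock dies out in a trend stationary series but permanently shifts a difference stationary one.

```r
n <- 200
drift <- 0.01
shock <- numeric(n)
shock[50] <- 1                                   # one unit shock at t = 50 (hypothetical)
tt <- seq_len(n)

trend.stationary      <- drift * tt + shock      # deterministic trend + transitory shock
difference.stationary <- cumsum(drift + shock)   # random walk with drift

# Deviation from the deterministic path drift * t at the end of the sample
trend.stationary[n] - drift * n         # 0: the shock has died out
difference.stationary[n] - drift * n    # 1 (up to rounding): the shock persists

# Differencing the random walk recovers drift + shock
all.equal(diff(difference.stationary), (drift + shock)[-1])
```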

The code for both tests is given below, followed by the results and the interpretation of the statistics of each test.

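library(urca)  # ur.kpss() and ur.ers()

# KPSS: H0 = trend stationary (type = "tau"); y1 = log oil, y2 = log gas
KPSS.test1 <- ur.kpss(y1, type = "tau", lags = "short")
KPSS.test2 <- ur.kpss(y2, type = "tau", lags = "short")

# ERS P-test: H0 = unit root, detrending with intercept and trend
ERS.test1 <- ur.ers(y1, type = "P-test", model = "trend")
ERS.test2 <- ur.ers(y2, type = "P-test", model = "trend")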

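For intuition about what ur.kpss computes with type = "tau", here is a base-R sketch of the KPSS statistic: detrend the series, cumulate the residuals, and scale by a Bartlett (Newey-West) long-run variance. We assume the "short" lag rule is trunc(4 * (n/100)^(1/4)); with the 267 monthly observations from 1995M1 to 2017M3 this gives 5 lags, consistent with the "5 lags" in the output below. The helpers short_lags() and kpss_tau() are hypothetical, the data are simulated, and urca's implementation may differ in details.

```r
# Lag truncation assumed for lags = "short": trunc(4 * (n / 100)^(1/4))
short_lags <- function(n) trunc(4 * (n / 100)^0.25)

# KPSS statistic with intercept and linear trend (the "tau" case); hypothetical helper
kpss_tau <- function(y, l = short_lags(length(y))) {
  n <- length(y)
  tt <- seq_len(n)
  e <- residuals(lm(y ~ tt))   # residuals after removing intercept + trend
  S <- cumsum(e)               # partial sums of the residuals
  # Long-run variance with Bartlett weights 1 - j / (l + 1)
  s2 <- sum(e^2) / n
  for (j in seq_len(l)) {
    s2 <- s2 + 2 * (1 - j / (l + 1)) * sum(e[(j + 1):n] * e[1:(n - j)]) / n
  }
  sum(S^2) / (n^2 * s2)
}

set.seed(123)
n <- 267                                            # 1995M1 to 2017M3
tt <- seq_len(n)
ts.series <- 0.002 * tt + rnorm(n, sd = 0.05)       # trend stationary
rw.series <- cumsum(0.002 + rnorm(n, sd = 0.05))    # random walk with drift

short_lags(n)          # 5
kpss_tau(ts.series)    # usually below the 5% critical value of 0.146
kpss_tau(rw.series)    # usually far above it
```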
KPSS test - Oil Prices


####################### 
# KPSS Unit Root Test # 
####################### 

Test is of type: tau with 5 lags. 

Value of test-statistic is: 0.5563 

Critical value for a significance level of: 
                10pct  5pct 2.5pct  1pct
critical values 0.119 0.146  0.176 0.216

The KPSS test statistic of 0.5563 exceeds even the 1% critical value of 0.216, so we reject the null hypothesis of trend stationarity. This suggests that the logarithm of oil prices has a unit root and is non-stationary.

KPSS test - Gas Prices


####################### 
# KPSS Unit Root Test # 
####################### 

Test is of type: tau with 5 lags. 

Value of test-statistic is: 0.4928 

Critical value for a significance level of: 
                10pct  5pct 2.5pct  1pct
critical values 0.119 0.146  0.176 0.216

The KPSS test statistic of 0.4928 exceeds even the 1% critical value of 0.216, so we reject the null hypothesis of trend stationarity. This suggests that the logarithm of gas prices has a unit root and is non-stationary.

ERS test - Oil Prices


############################################### 
# Elliot, Rothenberg and Stock Unit Root Test # 
############################################### 

Test of type P-test 
detrending of series with intercept and trend 

Value of test-statistic is: 9.6438 

Critical values of P-test are:
                1pct 5pct 10pct
critical values 3.96 5.62  6.89

Consistent with the KPSS test, the ERS test indicates that the logarithm of oil prices has a unit root and is thus non-stationary. The ERS P-test rejects the null for small values of the statistic; here the statistic of 9.6438 is larger than even the 10% critical value of 6.89, so we fail to reject the null hypothesis of a unit root.

ERS test - Gas Prices


############################################### 
# Elliot, Rothenberg and Stock Unit Root Test # 
############################################### 

Test of type P-test 
detrending of series with intercept and trend 

Value of test-statistic is: 8.3139 

Critical values of P-test are:
                1pct 5pct 10pct
critical values 3.96 5.62  6.89

Consistent with the KPSS test, the ERS test indicates that the logarithm of gas prices has a unit root and is thus non-stationary. The statistic of 8.3139 is larger than even the 10% critical value of 6.89, so we fail to reject the null hypothesis of a unit root.

In short, the logarithms of both oil and gas prices have a unit root. The standard remedy for a unit root is to transform the data by taking the first difference. If the first difference of a series is approximately weakly stationary, the original series is said to be integrated of order one, I(1). If the first difference is still not weakly stationary, higher-order differences are taken until the series is approximately weakly stationary; a series that must be differenced d times is I(d).
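A quick way to see what the order of integration means, using simulated data only: cumulating white noise once gives an I(1) series that one difference turns back into the shocks, while cumulating it twice gives an I(2) series that needs two differences.

```r
set.seed(1)
e  <- rnorm(100)           # white noise shocks
i1 <- cumsum(e)            # I(1): one difference recovers the shocks
i2 <- cumsum(cumsum(e))    # I(2): two differences are needed

all.equal(diff(i1), e[-1])                         # TRUE
all.equal(diff(i2, differences = 2), e[-(1:2)])    # TRUE
```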

Following this approach, we take the first difference of the logarithms of oil and gas prices and re-run the unit root tests. If the tests show that the differenced series have no unit root, we conclude that the log price series are I(1) processes. The code for both tests is given below, followed by the results and their interpretation.

y1.diff <- diff(y1, differences = 1)
y2.diff <- diff(y2, differences = 1)

KPSS.test3 <- ur.kpss(y1.diff, type = "tau", lags = "short")
KPSS.test4 <- ur.kpss(y2.diff, type = "tau", lags = "short")

ERS.test3 <- ur.ers(y1.diff, type = "P-test", model = "trend")
ERS.test4 <- ur.ers(y2.diff, type = "P-test", model = "trend")

KPSS test - First Difference of Oil Prices


####################### 
# KPSS Unit Root Test # 
####################### 

Test is of type: tau with 5 lags. 

Value of test-statistic is: 0.0494 

Critical value for a significance level of: 
                10pct  5pct 2.5pct  1pct
critical values 0.119 0.146  0.176 0.216

The KPSS test statistic of 0.0494 is well below even the 10% critical value of 0.119, so we fail to reject the null hypothesis of stationarity. This suggests that the first difference of the logarithm of oil prices is stationary, and hence that the logarithm of oil prices is an I(1) process.

KPSS test - First Difference of Gas Prices


####################### 
# KPSS Unit Root Test # 
####################### 

Test is of type: tau with 5 lags. 

Value of test-statistic is: 0.0403 

Critical value for a significance level of: 
                10pct  5pct 2.5pct  1pct
critical values 0.119 0.146  0.176 0.216

The KPSS test statistic of 0.0403 is well below even the 10% critical value of 0.119, so we fail to reject the null hypothesis of stationarity. This suggests that the first difference of the logarithm of gas prices is stationary, and hence that the logarithm of gas prices is an I(1) process.

ERS test - First Difference of Oil Prices


############################################### 
# Elliot, Rothenberg and Stock Unit Root Test # 
############################################### 

Test of type P-test 
detrending of series with intercept and trend 

Value of test-statistic is: 0.848 

Critical values of P-test are:
                1pct 5pct 10pct
critical values 3.96 5.62  6.89

Consistent with the KPSS test, the ERS test indicates that the logarithm of oil prices is an I(1) process. The ERS statistic of 0.8480 is well below even the 1% critical value of 3.96, so we reject the null hypothesis of a unit root and treat the first difference of the logarithm of oil prices as approximately weakly stationary.

ERS test - First Difference of Gas Prices


############################################### 
# Elliot, Rothenberg and Stock Unit Root Test # 
############################################### 

Test of type P-test 
detrending of series with intercept and trend 

Value of test-statistic is: 0.4788 

Critical values of P-test are:
                1pct 5pct 10pct
critical values 3.96 5.62  6.89

Consistent with the KPSS test, the ERS test indicates that the logarithm of gas prices is an I(1) process. The ERS statistic of 0.4788 is well below even the 1% critical value of 3.96, so we reject the null hypothesis of a unit root and treat the first difference of the logarithm of gas prices as approximately weakly stationary.

In short, the unit root tests indicate that the logarithms of both oil and gas prices are I(1) processes: after taking the first difference, both series are approximately weakly stationary.
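The decision rules used in this section can be collected in one place. This sketch re-applies them at the 5% level to the eight test statistics reported above; the point to remember is that KPSS rejects for large statistics, while the ERS P-test rejects for small ones.

```r
# Test statistics reported above (y1 = log oil, y2 = log gas)
ur.results <- data.frame(
  series = rep(c("log oil", "log gas", "diff log oil", "diff log gas"), each = 2),
  test   = rep(c("KPSS", "ERS"), times = 4),
  stat   = c(0.5563, 9.6438, 0.4928, 8.3139, 0.0494, 0.8480, 0.0403, 0.4788),
  stringsAsFactors = FALSE
)
cv5 <- c(KPSS = 0.146, ERS = 5.62)   # 5% critical values from the output

# KPSS (H0: stationary) rejects for LARGE values;
# ERS P-test (H0: unit root) rejects for SMALL values.
is.kpss <- ur.results$test == "KPSS"
crit <- cv5[ur.results$test]
ur.results$reject.H0 <- ifelse(is.kpss, ur.results$stat > crit, ur.results$stat < crit)
ur.results$unit.root <- ifelse(is.kpss, ur.results$reject.H0, !ur.results$reject.H0)
ur.results
```

Both tests agree: a unit root in the two log price levels and none in their first differences.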

B. Cointegration Analysis

Because both series are I(1), we must also check whether they are cointegrated, that is, whether some linear combination of the two log price series is stationary even though each series on its own has a unit root. Cointegration does not change the conclusion that the differenced series are approximately weakly stationary, but it does change how they should be modelled: if the series are cointegrated, a VAR fitted to the first differences alone omits the long-run relationship between the levels and is misspecified. The importance of this test will become more evident in section IV. We run the test over the sample period 1995M1 to 2010M12.
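A small simulated example of cointegration, with a hypothetical coefficient b = 1.6: two series that inherit the same random-walk trend are each I(1), but the combination x1 - b * x2 cancels the trend and leaves only stationary noise. Differencing each series separately would discard this relationship between the levels.

```r
set.seed(2)
n <- 200
common.trend <- cumsum(rnorm(n))   # shared I(1) stochastic trend
u1 <- rnorm(n, sd = 0.5)           # stationary deviations
u2 <- rnorm(n, sd = 0.5)
b  <- 1.6                          # hypothetical cointegrating coefficient

x1 <- b * common.trend + u1        # I(1)
x2 <- common.trend + u2            # I(1)

z <- x1 - b * x2                   # the common trend cancels out
all.equal(z, u1 - b * u2)          # TRUE: z is stationary noise
```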

The first step in the cointegration analysis is to determine the number of lags to include. To do this we use the VARselect function in R with the Schwarz (BIC) information criterion, reported as SC(n). The code is given below. The selected lag order, which is 2, is stored automatically in the nlags variable and is used both in the cointegration tests and in the VEC model.

library(vars)  # VARselect(); also attaches urca for ca.jo() and cajorls()

y <- cbind(y1, y2)
y <- window(y, start = 1995, end = 2010 + 11/12)

# determine number of lags to be included in cointegration test and in VEC model
y.VAR.IC <- VARselect(y, type = "const")
nlags <- y.VAR.IC$selection["SC(n)"]
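For reference, the Schwarz criterion that VARselect reports as SC(n) is, as we understand the vars documentation, the expression below, where n is the candidate lag order, K = 2 is the number of variables, T is the number of usable observations, and \(\tilde{\Sigma}_u(n)\) is the residual covariance matrix of the fitted VAR(n). The lag order that minimizes it is selected; the penalty grows with the \(nK^2\) slope coefficients.

```latex
\mathrm{SC}(n) = \ln \det\!\big(\tilde{\Sigma}_u(n)\big) + \frac{\ln T}{T}\, n K^2,
\qquad
\tilde{\Sigma}_u(n) = \frac{1}{T} \sum_{t=1}^{T} \hat{u}_t \hat{u}_t'
```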

Having determined the number of lags, we run both Johansen’s trace and maximum eigenvalue cointegration tests on the logarithms of gas and oil prices. Each test has several specifications that differ in the deterministic terms they allow, usually described as four “cases”:

  • Case 1: No drift and zero mean (no deterministic terms)
  • Case 2: No drift and non-zero mean (restricted constant)
  • Case 3: Drift and has non-zero mean (unrestricted constant)
  • Case 4: Drift and linear trend (restricted trend)
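Written out for our bivariate system with one lagged difference, the four cases differ only in where the deterministic terms enter. This is the standard textbook formulation; as we understand urca's ca.jo, ecdet = "const" corresponds to Case 2 and ecdet = "trend" to Case 4.

```latex
\begin{aligned}
\text{Case 1:}\quad \Delta y_t &= \alpha\beta' y_{t-1} + \Gamma_1 \Delta y_{t-1} + \varepsilon_t \\
\text{Case 2:}\quad \Delta y_t &= \alpha\,(\beta' y_{t-1} + \rho_0) + \Gamma_1 \Delta y_{t-1} + \varepsilon_t \\
\text{Case 3:}\quad \Delta y_t &= \alpha\beta' y_{t-1} + \mu_0 + \Gamma_1 \Delta y_{t-1} + \varepsilon_t \\
\text{Case 4:}\quad \Delta y_t &= \alpha\,(\beta' y_{t-1} + \rho_1 t) + \mu_0 + \Gamma_1 \Delta y_{t-1} + \varepsilon_t
\end{aligned}
```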

From the time series plots in section II, we judge that our series follow Case 4. The drift can be seen from the fact that both series lie well away from the dashed horizontal black line at 0. If the series fluctuated around this line we might say that there is no drift; since neither series comes close to fluctuating around it, we conclude that both series have drift.

A linear trend can also be roughly seen in both series. If no trend were present, we would expect both series to be much flatter and to fluctuate around a horizontal line. Since both series tend to move upward over the sample, we conclude that a linear trend is present.

Case 4 implies that the ecdet argument in the cointegration tests below is set to "trend". It is also important to note that Johansen’s methodology estimates a VAR(p) model for \(y_t\) in levels, not in differences. The code used to run these cointegration tests is given below, followed by the results of each test and our comments.

y.CA.trace <- ca.jo(y, ecdet="trend", type="trace", K=nlags, spec="transitory")
y.CA.eigen <- ca.jo(y, ecdet="trend", type="eigen", K=nlags, spec="transitory")
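The link between the VAR in levels and the VEC model can be made explicit. With the lag order of 2 selected above (K = 2 in ca.jo), the VAR(2) in levels rearranges into an error correction form with one lagged difference, which is why the regressions reported below contain y1.dl1 and y2.dl1. Deterministic terms are omitted for brevity; the rearrangement follows the "transitory" specification, in which the levels enter at lag 1. With rank r = 1, \(\alpha\) and \(\beta\) are 2 × 1 vectors and \(\beta' y_{t-1}\) is the error correction term.

```latex
\begin{aligned}
y_t &= A_1 y_{t-1} + A_2 y_{t-2} + \varepsilon_t \\
\Delta y_t &= (A_1 - I)\, y_{t-1} + A_2 y_{t-2} + \varepsilon_t \\
&= (A_1 + A_2 - I)\, y_{t-1} - A_2 \Delta y_{t-1} + \varepsilon_t \\
&= \Pi\, y_{t-1} + \Gamma_1 \Delta y_{t-1} + \varepsilon_t,
\qquad \Pi = A_1 + A_2 - I = \alpha\beta', \quad \Gamma_1 = -A_2
\end{aligned}
```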

Johansen’s cointegration trace test


###################### 
# Johansen-Procedure # 
###################### 

Test type: trace statistic , without linear trend and constant in cointegration 

Eigenvalues (lambda):
[1] 2.219532e-01 1.088453e-02 2.499666e-18

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  2.08  7.52  9.24 12.97
r = 0  | 49.76 17.85 19.96 24.60

Eigenvectors, normalised to first column:
(These are the cointegration relations)

             y1.l1      y2.l1   constant
y1.l1     1.000000  1.0000000  1.0000000
y2.l1    -1.597557 -0.3523755 -0.5200684
constant -2.750321 -3.8984030 -2.9333378

Weights W:
(This is the loading matrix)

          y1.l1        y2.l1      constant
y1.d 0.04302644 -0.012575530 -1.133703e-16
y2.d 0.24104491 -0.005055413 -7.689967e-16

From this test we see that the null hypothesis \(r = 0\) is rejected easily, even at the 1% level: the test statistic is 49.76 and the 1% critical value is 24.60. However, we fail to reject the null hypothesis \(r \le 1\), since the test statistic of 2.08 is below even the 10% critical value of 7.52. Because we reject \(r = 0\) but fail to reject \(r \le 1\), the trace test suggests that the cointegration rank \(r\) is 1.

Johansen’s cointegration maximum eigenvalue test


###################### 
# Johansen-Procedure # 
###################### 

Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration 

Eigenvalues (lambda):
[1] 2.219532e-01 1.088453e-02 2.499666e-18

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  2.08  7.52  9.24 12.97
r = 0  | 47.68 13.75 15.67 20.20

Eigenvectors, normalised to first column:
(These are the cointegration relations)

             y1.l1      y2.l1   constant
y1.l1     1.000000  1.0000000  1.0000000
y2.l1    -1.597557 -0.3523755 -0.5200684
constant -2.750321 -3.8984030 -2.9333378

Weights W:
(This is the loading matrix)

          y1.l1        y2.l1      constant
y1.d 0.04302644 -0.012575530 -1.133703e-16
y2.d 0.24104491 -0.005055413 -7.689967e-16

As with the trace test, the null hypothesis \(r = 0\) is rejected easily, even at the 1% level: the test statistic is 47.68 and the 1% critical value is 20.20. However, we fail to reject the null hypothesis \(r \le 1\), since the test statistic of 2.08 is below even the 10% critical value of 7.52. Because we reject \(r = 0\) but fail to reject \(r \le 1\), the maximum eigenvalue test also suggests that \(r\) is 1.
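Both statistics are simple functions of the eigenvalues printed above: \(LR_{max}(r) = -T \ln(1 - \lambda_{r+1})\) and \(LR_{trace}(r) = -T \sum_{i>r} \ln(1 - \lambda_i)\). Assuming an effective sample of T = 190 (the 192 months from 1995M1 to 2010M12 minus K = 2 presample observations), the sketch below reproduces the reported values.

```r
lambda <- c(2.219532e-01, 1.088453e-02)   # eigenvalues from the output above
T.eff  <- 192 - 2                         # assumed: 192 months minus K = 2 lags

# Element 1 tests H0: r = 0, element 2 tests H0: r <= 1
lr.max   <- -T.eff * log(1 - lambda)      # uses only the next eigenvalue
lr.trace <- rev(cumsum(rev(lr.max)))      # sums over all remaining eigenvalues

round(lr.max, 2)     # 47.68  2.08
round(lr.trace, 2)   # 49.76  2.08
```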

In sum, both tests indicate one cointegrating relation between the logarithms of oil and gas prices: each series is I(1), but a linear combination of the two is stationary. A VAR in first differences alone would discard this long-run relationship, which gives us grounds to employ a VEC model instead of a VAR model, as discussed in detail in section IV.

IV. Estimating a Bivariate VEC Model

A. Defining a Vector Error Correction Model

In this section we aim to describe the intuition behind using a vector error correction (VEC) model. However, before we can describe the VEC model in depth we must first discuss a VAR model.

A vector autoregression (VAR) model is an econometric model used to capture the linear interdependencies among multiple time series. VAR models are the multivariate generalization of univariate autoregressive (AR) models. They are therefore useful for forecasting, and they can produce better forecasts than an AR model because they draw on information from the other series in the system. In other words, the other time series in the VAR model may provide information for forecasting a variable beyond what the variable's own history provides (see also Granger causality).

However, a VAR model estimated by standard OLS requires all variables to be weakly stationary. Stationarity is an important property of a time series, and our forecasting techniques are based on it: to forecast a time series properly, we need the series (or a transformation of it) to be stationary. There are different notions of stationarity, the most common being strict (strong) and weak stationarity. A time series is strictly stationary if the joint distribution of its observations does not change over time; that is, the process that generates the data does not change. For a strictly stationary series, all population moments that exist are constant over time, so estimates of them (mean, variance, and so on) computed over different periods should be statistically indistinguishable. A time series is weakly stationary if its mean and autocovariances are time invariant, that is, constant over time. In practice, weak stationarity is the property we can assess, while strict stationarity is mostly a theoretical concept.
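In symbols, the two definitions read as follows, with \(\mu\) a constant mean and \(\gamma(h)\) an autocovariance that depends only on the lag \(h\), not on \(t\):

```latex
\begin{aligned}
&\text{Strict: } (y_{t_1}, \dots, y_{t_k}) \overset{d}{=} (y_{t_1+h}, \dots, y_{t_k+h})
  \quad \text{for all } k,\ t_1, \dots, t_k,\ h \\
&\text{Weak: } \mathrm{E}[y_t] = \mu, \qquad \mathrm{Cov}(y_t, y_{t-h}) = \gamma(h)
  \quad \text{for all } t,\ h
\end{aligned}
```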

However, many economic time series, including the price series studied here, are non-stationary. Non-stationarity can often be “fixed” by transforming the data so that the transformed series is approximately weakly stationary. This makes it possible to forecast non-stationary series: we forecast the transformed (stationary) series and then “untransform” the forecast, using the inverse of the transformation, to obtain a forecast of the original (non-stationary) series.

More specifically, in the context of this problem, if not all of the variables in a VAR model are weakly stationary, then estimating the VAR system by OLS and relying on standard inference is inappropriate, because the result can be a spurious regression. A spurious regression arises when integrated variables that are in fact unrelated appear to have statistically significant coefficients.
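A short simulation in the spirit of the classic spurious regression experiment shows how severe the problem is: regressing one random walk on another, independent one rejects the true null of no relationship far more often than the nominal 5%, while the same regression in first differences behaves as it should. The sample size and number of replications are arbitrary choices, and spurious_p() is a hypothetical helper.

```r
set.seed(42)
# p-value of the slope when regressing one random walk on an independent one
spurious_p <- function(n = 200, differenced = FALSE) {
  x <- cumsum(rnorm(n))
  y <- cumsum(rnorm(n))
  if (differenced) {
    x <- diff(x)
    y <- diff(y)
  }
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
}

levels.reject <- mean(replicate(500, spurious_p()) < 0.05)
diffs.reject  <- mean(replicate(500, spurious_p(differenced = TRUE)) < 0.05)
levels.reject   # typically far above the nominal 0.05
diffs.reject    # close to 0.05
```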

To solve the problem of using a VAR model when at least one of the variables is not weakly stationary, we use a vector error correction (VEC) model. Intuitively, a VEC model is a restricted VAR model with cointegration restrictions built into its specification, so that non-stationary, cointegrated time series can be included in the model without the spurious regression problem.

As we saw in section III, the two log price series are not only I(1) but also cointegrated. If they were I(1) but not cointegrated, we would simply take the first difference of both series to obtain approximately weakly stationary data, and a VAR model in differences could be used. Because they are also cointegrated, a VAR in first differences alone would be misspecified: it omits the error correction term that ties the levels of the two series together. We therefore use a VEC model, which adds that term to the VAR in differences.

B. Estimating a bivariate VEC model

Using the results from section III we conclude that we should use a VEC model as opposed to a VAR model. Specifically, the cointegration tests suggest that we should specify \(r = 1\) in our VEC model, in order to prevent the problem of spurious correlations and mis-specified coefficients. The bivariate VEC model is estimated over the sample period from 1995M1 to 2010M12 and is given below:

y.VEC <- cajorls(y.CA.eigen, r = 1)

We now want to determine whether the adjustment parameters, \(\alpha_1\) and \(\alpha_2\), are statistically significant in the VEC model, and whether their signs are consistent with an error correction mechanism that moves the system back to its long-run equilibrium after a disruption. To examine this, we show the coefficient estimates of the VEC model and their test statistics; the adjustment parameters are the coefficients on ect1.

Response y1.d :

Call:
lm(formula = y1.d ~ ect1 + constant + y1.dl1 + y2.dl1 - 1, data = data.mat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.28518 -0.05350  0.00407  0.05837  0.21570 

Coefficients:
          Estimate Std. Error t value Pr(>|t|)  
ect1      0.073557   0.064048   1.148   0.2523  
constant -0.197205   0.177369  -1.112   0.2676  
y1.dl1    0.212477   0.106215   2.000   0.0469 *
y2.dl1    0.005519   0.135883   0.041   0.9676  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0838 on 186 degrees of freedom
Multiple R-squared:  0.07275,   Adjusted R-squared:  0.05281 
F-statistic: 3.648 on 4 and 186 DF,  p-value: 0.00691


Response y2.d :

Call:
lm(formula = y2.d ~ ect1 + constant + y1.dl1 + y2.dl1 - 1, data = data.mat)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.212101 -0.025861  0.003027  0.030801  0.126767 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
ect1      0.24207    0.03990   6.067 7.16e-09 ***
constant -0.66769    0.11050  -6.042 8.11e-09 ***
y1.dl1    0.12761    0.06617   1.928 0.055321 .  
y2.dl1    0.32742    0.08465   3.868 0.000152 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0522 on 186 degrees of freedom
Multiple R-squared:  0.3757,    Adjusted R-squared:  0.3623 
F-statistic: 27.99 on 4 and 186 DF,  p-value: < 2.2e-16

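Using the first cointegrating vector printed above (normalized on oil prices), the error correction term is \(ect1_{t-1} = y_{1,t-1} - 1.598\,y_{2,t-1} - 2.750\), so a positive value means oil prices are high relative to gas prices. Error correction then requires \(\alpha_1 < 0\) (oil prices fall) and/or \(\alpha_2 > 0\) (gas prices rise). The estimate \(\alpha_2 = 0.242\) is positive and statistically significant at the 0.1% level, so gas prices adjust toward the long-run equilibrium after a disruption. The estimate \(\alpha_1 = 0.074\) has the opposite sign, but it is small and not statistically significant (p = 0.25), so there is no evidence that oil prices respond to deviations from the equilibrium.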
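To see why the signs matter, the sketch below iterates a shock-free VEC system with hypothetical values close to, but not equal to, the estimates: \(\beta = (1, -1.6)'\) and \(\alpha = (0, 0.24)'\). Each period the equilibrium error \(\beta' y\) is multiplied by \(1 + \beta'\alpha = 0.616\), so it dies out; flipping the sign of \(\alpha_2\) would make the factor 1.384 and the error would grow instead.

```r
# Hypothetical values close to (but not equal to) the estimates above
beta  <- c(1, -1.6)   # cointegrating vector, normalised on y1 (log oil)
alpha <- c(0, 0.24)   # loadings: only y2 (log gas) responds

y <- c(1, 0)          # start one unit away from equilibrium: beta'y = 1
z <- numeric(6)
for (i in 1:6) {
  y <- y + alpha * sum(beta * y)   # Delta y_t = alpha (beta' y_{t-1}), no shocks
  z[i] <- sum(beta * y)            # equilibrium error after period i
}
z                        # shrinks geometrically toward 0
1 + sum(beta * alpha)    # 0.616: per-period decay factor
```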

We now re-estimate the VEC model with the restriction that \(\alpha_2 = 0\). This restriction is shown below, along with the summary of the VEC model’s coefficient estimates.

# alrtest() imposes alpha = A %*% psi; A = (1, 0)' leaves alpha1 free and sets alpha2 = 0
restricted.alpha <- matrix(c(1,0), c(2,1))

y.CA.restricted.alpha <- alrtest(y.CA.eigen, A = restricted.alpha, r = 1)
y.VEC.restricted <- cajorls(y.CA.restricted.alpha, r = 1)

Response y1.d :

Call:
lm(formula = y1.d ~ ect1 + constant + y1.dl1 + y2.dl1 - 1, data = data.mat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.28523 -0.05631  0.00818  0.05999  0.21571 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)   
ect1     -0.03144    0.07126  -0.441  0.65953   
constant  0.09237    0.19498   0.474  0.63624   
y1.dl1    0.28482    0.10626   2.680  0.00801 **
y2.dl1   -0.07055    0.13314  -0.530  0.59683   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08405 on 186 degrees of freedom
Multiple R-squared:  0.06715,   Adjusted R-squared:  0.04709 
F-statistic: 3.347 on 4 and 186 DF,  p-value: 0.01128


Response y2.d :

Call:
lm(formula = y2.d ~ ect1 + constant + y1.dl1 + y2.dl1 - 1, data = data.mat)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.229057 -0.026117  0.000566  0.032729  0.160288 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
ect1      0.19514    0.04628   4.216 3.87e-05 ***
constant -0.53140    0.12663  -4.196 4.20e-05 ***
y1.dl1    0.17653    0.06901   2.558  0.01132 *  
y2.dl1    0.25005    0.08647   2.892  0.00429 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05459 on 186 degrees of freedom
Multiple R-squared:  0.3174,    Adjusted R-squared:  0.3028 
F-statistic: 21.63 on 4 and 186 DF,  p-value: 1.148e-14

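The restriction \(\alpha_2 = 0\) states that gas prices do not respond to deviations from the long-run equilibrium, so that all of the adjustment is done by oil prices. The intuition we had in mind is that innovations in oil prices affect gas prices, but innovations in gas prices do not affect oil prices. Note, however, that this intuition corresponds to oil prices being weakly exogenous, that is, \(\alpha_1 = 0\) (A = matrix(c(0,1), c(2,1)) in alrtest). That is also the restriction suggested by the unrestricted estimates above, in which \(\alpha_1\) is insignificant and \(\alpha_2\) is highly significant.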

C. Forecasting using the Estimated VEC model

In this section we create and plot a sequence of one-month-ahead forecasts of gas prices for the period 2011M1 to 2017M3 using the bivariate VEC model estimated in part B of this section, and we report the RMSE of the gas price forecast. For the forecast we use the originally estimated unrestricted model, not the restricted one. For convenience, the plot shows the gas price itself rather than its logarithm.

As the plot above shows, the forecast from the VEC model tracks the actual gas price closely. To quantify the accuracy of the forecast, we calculate its RMSE, which is 4.93%.

V. Estimating an ARMA model

A. The ARMA Model

In the last section we showed that the VEC model forecasts gas prices fairly accurately using past gas and oil prices. We now compare the VEC model with a univariate AR, MA, or ARMA model. To choose the model orders, we create a correlogram (also known as an autocorrelation plot) for the first difference of the logarithm of the gas price: specifically, plots of the sample autocorrelation function (ACF) and partial autocorrelation function (PACF). The plots and the code used to produce them are given below.

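library(forecast)  # Acf() and Arima()

# estimation sample only: first differences of log gas prices through 2010M12
y2.diff.estimation <- window(y2.diff, end = 2010 + 11/12)

par(mfrow = c(2, 1))
Acf(y2.diff.estimation, type = 'correlation')
Acf(y2.diff.estimation, type = 'partial')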

The ACF shows a significant spike at lag 1, which suggests one MA component, and the PACF shows significant spikes at the first two lags, which suggests two AR components. We therefore estimate an \(ARMA(2,1)\) model, that is, an ARIMA(2,0,1) model for the differenced series.
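Reading the correlogram amounts to checking which sample autocorrelations fall outside the approximate 95% band \(\pm 1.96/\sqrt{T}\). The helper below (a hypothetical name, built on base R's acf()) returns those lags; with the 191 monthly differences in the estimation sample, the band is about ±0.142.

```r
# Hypothetical helper: lags whose sample ACF or PACF lies outside +/- 1.96/sqrt(T)
significant_lags <- function(x, type = c("correlation", "partial"), max.lag = 24) {
  type <- match.arg(type)
  x <- as.numeric(x)                 # integer lags, even for monthly ts input
  a <- acf(x, lag.max = max.lag, type = type, plot = FALSE)
  r <- drop(a$acf)
  k <- drop(a$lag)
  band <- qnorm(0.975) / sqrt(length(x))
  keep <- k > 0                      # drop lag 0 of the ACF
  k[keep][abs(r[keep]) > band]
}

qnorm(0.975) / sqrt(191)   # about 0.142, the band for the estimation sample

# Usage with the document's object (requires the data):
# significant_lags(y2.diff.estimation, "correlation")
# significant_lags(y2.diff.estimation, "partial")
```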

model <- Arima(y2.diff.estimation, order = c(2,0,1))

B. Forecasting using the Estimated ARMA model

In this section we create and plot a sequence of one-month-ahead forecasts of gas prices for the period 2011M1 to 2017M3 using the \(ARMA(2,1)\) model estimated in part A of this section, and we report the RMSE of the forecast. These forecasts are of the first difference of the logarithm of the gas price; they are converted back to price levels in section VI.

The above forecast appears to perform relatively well at forecasting the first differences of the logarithm of gas prices. To quantify how well it does, we again calculate the RMSE, which is 5.11%.
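One way to produce one-month-ahead forecasts from coefficients estimated only on the 1995M2 to 2010M12 differences is to re-run the Kalman filter over the full sample with the coefficients held fixed, so that each fitted value uses only data available the month before. The sketch below does this with base R's arima() on simulated data of the same length as ours (191 estimation and 75 hold-out differences); it is an assumption about the approach, not necessarily how the forecasts above were produced. The forecast package's Arima(x, model = fit) serves the same purpose.

```r
set.seed(2011)
# Simulated stand-in for the differenced log gas price (hypothetical parameters)
x <- arima.sim(model = list(ar = c(0.4, -0.2), ma = 0.3), n = 266, sd = 0.05)
n.est <- 191                          # 1995M2 to 2010M12
hold.out <- (n.est + 1):length(x)     # 2011M1 to 2017M3: 75 months

# Estimate ARMA(2,1) (= ARIMA(2,0,1)) on the estimation window only
fit <- arima(x[1:n.est], order = c(2, 0, 1))

# Filter the full sample with all coefficients fixed at the estimates
refit <- arima(x, order = c(2, 0, 1), fixed = coef(fit), transform.pars = FALSE)

# One-step-ahead forecast = observation minus its one-step innovation
one.step <- x - residuals(refit)
rmse <- sqrt(mean((x[hold.out] - one.step[hold.out])^2))
rmse
```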

VI. Comparing Forecasts

In this section we compare the forecasts of the VEC model and the ARMA model. At first glance, a direct comparison of the RMSEs suggests that the ARMA model performs nearly as well as the VEC model. However, the ARMA model forecasts first differences, not gas price levels. We therefore transform the ARMA forecasts back into price levels so that we can compare them directly with the VEC forecasts. The results are shown in the figure below: the black line is the actual gas price, the red line is the VEC forecast, and the blue line is the ARMA forecast.

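From the graph above, the VEC forecasts track the actual gas price somewhat more closely than the ARMA forecasts. The RMSE of the VEC model is 4.93%, compared with 5.11% for the ARMA model. Assuming the RMSE is computed on the log scale, the ARMA figure is unaffected by the transformation to levels, because each one-month-ahead level forecast is built from the previous month's actual price. The VEC model is therefore modestly more accurate, and we prefer it to a simple ARMA model for forecasting gas price levels.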
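Converting a forecast of the log difference into a price level only requires adding it to the last observed log price and exponentiating. The sketch below, with simulated prices and forecasts, also shows that the resulting log-level forecast error is identical to the error in the forecast difference, so the log-scale RMSE is the same either way.

```r
set.seed(7)
# Hypothetical monthly log gas prices and one-step forecasts of their changes
log.p <- cumsum(c(log(2), rnorm(20, sd = 0.05)))
d.hat <- rnorm(20, sd = 0.02)

# Level forecast for months 2..21: previous actual log price + forecast change
log.p.hat <- head(log.p, -1) + d.hat
p.hat <- exp(log.p.hat)                       # forecast in dollars per gallon

err.levels <- tail(log.p, -1) - log.p.hat     # errors in log levels
err.diffs  <- diff(log.p) - d.hat             # errors in log differences
all.equal(err.levels, err.diffs)              # TRUE
```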

VII. Conclusion

We have shown that the VEC model outperforms an ARMA model in forecasting gas price levels over 2011M1 to 2017M3, with an RMSE of 4.93% compared with 5.11%.