Introduction

After completing our AT2 project, we realised that fuel prices in Sydney are influenced principally by changes in international petrol prices. As (Commission, 2012) mentions, “For petrol, diesel and automotive LPG the largest component of the price you pay is represented by the international benchmark price”.

Hence, international fuel prices affect a chain of sectors. First, they are a cost that fuel wholesalers and retailers must take into account when choosing a buying period. Second, consumers and companies organise their purchases according to changes in retail fuel prices. See Appendix 0.1 for details.

Therefore, because an outlook of international fuel prices is a useful insight for these sectors, the main goal of this project is to address the following research questions:

  1. Is it possible to forecast international fuel prices using time series methods, given that we have daily prices?

  2. What kind of seasonality do international petrol prices have?

The “Singapore Gasoline Mogas 95 Unleaded” index from Bloomberg was used as a proxy for the international petrol price. The data was extended to June 18, 2020, to include the most recent period.

Taking into account that the data probably has complex seasonal patterns, the project was divided into two sections. First, the data was explored. Second, advanced methods were evaluated to see how they perform with daily data.

Problems with daily data: complex seasonality

The main issue with daily data is that it often exhibits more than one seasonal pattern. Likewise, since years have different numbers of days, the seasonal periods of daily data are often non-integer (11.1 Complex Seasonality Forecasting, n.d.).

On the other hand, the Mogas 95 data contains weekdays only.

Therefore, a baseline of two seasonal patterns, weekly and yearly, was used. Because only weekdays are present, frequencies of 5 and 262 days were used for methods that allow only integer values, and 5 and 262.1 for methods that accept non-integer values (“Forecasting with Daily Data,” 2013).

Exploratory Data Analysis

The first step was to plot the entire period of the data to get an overall picture of the series. The figure below shows two drops in the data, one in the last part of 2018 and the other in March 2020 because of the Covid lockdown. No missing values were found.

Second, the data was decomposed into the two seasonal patterns described above.

# msts() transforms the data into a multiple seasonal time series, allowing non-integer periods.
y <- msts(intpetrolprice$`FG95M1 Index`, seasonal.periods = c(5, 262.1),
          start = decimal_date(as.Date("2017-01-02")))

# mstl() decomposes the series into trend and the two seasonal components.
y %>% mstl(iterate = 5) %>%   # number of STL iterations
  autoplot() + xlab("year") +
  labs(title = "Seasonal Patterns in Mogas 95 Index") +
  theme_light()

We can see that the trend operates on a broad scale, and it shows the drop in prices caused by Covid.

On the other hand, the two seasonal patterns have relatively narrow ranges, and the yearly seasonal pattern is clearer.

Evaluating Models

Since forecasts must be judged on new data, the models were evaluated using train and test sets. Because the data is daily, the forecast horizon was 14 days.

The prediction intervals in the forecast plots are shown as shaded regions, with the light purple indicating the 95% interval and the darker purple the 80% interval.

In addition, the checkresiduals() and accuracy() functions were used to explore model performance. The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) metrics were used to find the best-performing model.
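
A minimal sketch of this evaluation workflow (the intpetrolprice data frame and msts specification are those used above; naive() is only a placeholder for the models evaluated later, and the exact split used in the report may differ):

library(forecast)
library(lubridate)

# hold out the last 14 observations as a test set (this split is reused by the models below)
train <- msts(head(intpetrolprice$`FG95M1 Index`, -14), seasonal.periods = c(5, 262.1),
              start = decimal_date(as.Date("2017-01-02")))
test  <- tail(intpetrolprice$`FG95M1 Index`, 14)

# fit a model on the training set (naive() is just a placeholder here),
# forecast 14 days ahead, then inspect the residuals and the accuracy measures
fc <- naive(train, h = 14)
checkresiduals(fc)
accuracy(fc, test)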

Context with ARIMA model

The ARIMA model aims to describe the autocorrelations in the data (Forecasting in R DataCamp, n.d.).

Although the ARIMA model has some issues with daily data, because it only accounts for one type of seasonality (always symmetric and an integer value), it was useful to explore because the advanced models widely use this algorithm.

Since the Mogas 95 data is non-stationary (it has a trend and seasonalities), violating the main assumption of the ARIMA model, some methods were used to explore the autocorrelation in the data. Because these methods only allow one seasonality, the yearly seasonality was used.

First, the ACF and PACF correlograms were analysed. The first figure below shows a large and positive r1 in the ACF plot, a usual pattern for non-stationary series. It also suggests a probable trend in the data (ACF values well above the 95% limits). The PACF plot shows that the most significant correlation is at lag 1, although other lags are also significant.

After differencing the data, the trend in the ACF plot disappears, but the series still shows some significant correlations.
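
A minimal sketch of these checks, where mogas95 is the yearly-frequency ts object used in this section:

# correlograms of the raw series
ggAcf(mogas95)
ggPacf(mogas95)

# correlogram of the first-differenced series
ggAcf(diff(mogas95))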

Second, the Ljung-Box test was explored. In line with the results above, the test gives a p-value < 0.05; therefore, we cannot conclude that the series behaves like white noise.
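
The reported output can be reproduced with a call like the following (a sketch using the base Box.test() function, with 10 lags to match the reported degrees of freedom):

Box.test(mogas95, lag = 10, type = "Ljung-Box")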

## 
##  Box-Ljung test
## 
## data:  mogas95
## X-squared = 8248.5, df = 10, p-value < 2.2e-16

Finally, the KPSS test shows that the differenced data performs better than the raw data: its test statistic (0.1229) is below all the critical values, while the statistic for the raw data (2.795) is well above them. So the differenced data could plausibly be stationary.
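
A minimal sketch of these tests, assuming the ur.kpss() function from the urca package (whose summary output is shown below), applied to the raw and the first-differenced series:

library(urca)

summary(ur.kpss(mogas95))        # raw series
summary(ur.kpss(diff(mogas95)))  # first-differenced series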

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 2.795 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.1229 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Hence, given the subjectivity involved in selecting which differences to apply in an ARIMA model, the auto.arima() function was used in the advanced models.

“The auto.arima() function in R uses a variation of the Hyndman-Khandakar algorithm (Hyndman & Khandakar, 2008), which combines unit root tests, minimisation of the AICc and MLE to obtain an ARIMA model” (8.7 ARIMA Modelling in R Forecasting, n.d.). The main components of ARIMA(p,d,q) are the autoregressive order (lags), the degree of differencing, and the moving average order, respectively.

Dynamic Harmonic Regression with multiple seasonal periods

This model is based on a dynamic harmonic regression with an ARMA error structure. In other words, it is a regression model that includes Fourier terms. Fourier terms are a series of sine and cosine terms that approximate a periodic function; hence, they act as a proxy for the seasonal patterns (5.4 Some Useful Predictors Forecasting, n.d.).

The Fourier terms are produced with the fourier() function, where the dataset and K = c(seasonality1, seasonality2) need to be specified. K controls how many sine and cosine pairs are used for each seasonal period; the higher the order, the more “waves” are allowed.

The first approach to finding K for each seasonal period was to choose the values that minimise the AICc.

y <- msts(intpetrolprice$`FG95M1 Index`, seasonal.periods = c(5, 262.1),
          start = decimal_date(as.Date("2017-01-02")))

# Searching for the best K by minimising the AICc
my_aic_df <- matrix(ncol = 2, nrow = 20)

for (i in 1:2) {        # K for the weekly period (at most 2, i.e. half the period)
  for (j in 1:20) {     # K for the yearly period
    fn <- fourier(y, K = c(i, j))
    FourierFit <- auto.arima(y, seasonal = FALSE, xreg = fn,
                             stepwise = FALSE, approximation = FALSE)
    my_aic_df[j, i] <- FourierFit$aicc
  }
}

which(my_aic_df == min(my_aic_df), arr.ind = TRUE)
# K = c(2, 1)

The result was K = c(2,1). However, because the AICc was computed on all the data, K = c(2,10) was also considered when evaluating the model.

One problem with this method is that it forces the seasonal patterns to repeat periodically without changing over time. Therefore, it could be an issue for some kinds of complex seasonality (e.g. a different number of days per year).
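
A minimal sketch of how such a model can be fitted and evaluated on the train/test split sketched earlier (shown for K = c(2,1); the K = c(2,10) case is analogous, and the exact auto.arima() settings used in the report may differ). The residual checks and accuracy tables for both models follow:

# Fourier terms for the training period and for the 14-day forecast horizon
fn_train  <- fourier(train, K = c(2, 1))
fn_future <- fourier(train, K = c(2, 1), h = 14)

# regression on the Fourier terms with ARMA errors
fit_harmonic <- auto.arima(train, seasonal = FALSE, xreg = fn_train)
fc_harmonic  <- forecast(fit_harmonic, xreg = fn_future)

checkresiduals(fc_harmonic)
accuracy(fc_harmonic, test)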

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(5,1,0) errors
## Q* = 187.32, df = 167, p-value = 0.1344
## 
## Model df: 11.   Total lags used: 178
##                       ME     RMSE      MAE        MPE      MAPE       MASE
## Training set -0.02369237 1.530491 1.040648 -0.1077347  1.957034 0.06076427
## Test set      5.20055817 5.530403 5.200558 11.6867658 11.686766 0.30366484
##                      ACF1 Theil's U
## Training set -0.002260478        NA
## Test set      0.461653876   3.16863

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(0,1,0) errors
## Q* = 194.1, df = 154, p-value = 0.01581
## 
## Model df: 24.   Total lags used: 178
##                       ME     RMSE      MAE        MPE      MAPE       MASE
## Training set -0.03326608 1.518904 1.034365 -0.1607712  1.943875 0.06039739
## Test set      8.93027152 9.475334 8.930272 20.0929743 20.092974 0.52144584
##                      ACF1 Theil's U
## Training set -0.007248655        NA
## Test set      0.652809193  5.425554

For the model with K=c(2,1), the function chose an ARIMA(5,1,0) model. In other words, it set the autoregressive order to 5, used one order of differencing to make the series stationary, and a moving average order of 0. For K=c(2,10), the algorithm only selected one order of differencing.

For both models, the residual plots suggest that there may still be some information not captured by the model: the ACF plots show some significant correlations in the residual series. The histograms appear approximately normal; hence, we assume the residual variance is constant.

However, the model with K=c(2,1) had the better performance. Its Ljung-Box p-value is above 0.05, so the residuals are not distinguishable from white noise. It also achieved smaller RMSE and MAE, and the observed test values lie within the 95% prediction interval of its forecast.

STLF model

The STLF model is based on decomposing the time series into two parts, the seasonal components and the seasonally adjusted data. The forecasts of the seasonally adjusted data are then “reseasonalised” by adding the forecast of the seasonal component (6.8 Forecasting with Decomposition Forecasting, n.d.).

The stlf() function automatically forecasts the seasonal component with a seasonal naïve method, while four types of models can be used for the seasonally adjusted data: “ets”, “arima”, “naive” and “rwdrift”. All of them were evaluated.

For the prediction intervals, the uncertainty in the forecasts of the seasonal component is ignored because it is much smaller than that for the seasonally adjusted data.
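
A minimal sketch of the four variants, fitted on the train/test split sketched earlier (default decomposition settings assumed); their residual checks and accuracy tables follow:

# STL decomposition with different forecasting methods for the seasonally adjusted series
fc_stl_naive <- stlf(train, method = "naive",   h = 14)
fc_stl_arima <- stlf(train, method = "arima",   h = 14)
fc_stl_rwd   <- stlf(train, method = "rwdrift", h = 14)
fc_stl_ets   <- stlf(train, method = "ets",     h = 14)

checkresiduals(fc_stl_ets)   # repeated for each variant
accuracy(fc_stl_ets, test)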

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  Random walk
## Q* = 185.25, df = 178, p-value = 0.3393
## 
## Model df: 0.   Total lags used: 178
##                       ME      RMSE        MAE        MPE      MAPE      MASE
## Training set -0.03086174  1.615334  0.9991897 -0.1771199  1.958087 0.0583435
## Test set     11.70245781 12.173415 11.7024578 26.4180316 26.418032 0.6833161
##                     ACF1 Theil's U
## Training set -0.01579243        NA
## Test set      0.60155863  6.953702

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ARIMA(1,1,3)
## Q* = 152.8, df = 174, p-value = 0.8749
## 
## Model df: 4.   Total lags used: 178
##                       ME      RMSE        MAE       MPE      MAPE       MASE
## Training set -0.02729954  1.595184  0.9986491 -0.153516  1.941868 0.05831194
## Test set     11.60848466 12.065833 11.6084847 26.210099 26.210099 0.67782889
##                       ACF1 Theil's U
## Training set -0.0007088909        NA
## Test set      0.6059203317  6.886756

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  Random walk with drift
## Q* = 185.25, df = 177, p-value = 0.3202
## 
## Model df: 1.   Total lags used: 178
##                        ME      RMSE        MAE        MPE      MAPE       MASE
## Training set 3.842231e-15  1.615039  0.9991902 -0.1281428  1.957029 0.05834353
## Test set     1.193392e+01 12.422944 11.9339209 26.9397728 26.939773 0.69683138
##                     ACF1 Theil's U
## Training set -0.01579243        NA
## Test set      0.61219614  7.096381

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,Ad,N)
## Q* = 169.28, df = 173, p-value = 0.5657
## 
## Model df: 5.   Total lags used: 178
##                       ME      RMSE        MAE        MPE      MAPE       MASE
## Training set -0.02280054  1.601922  0.9987353 -0.1168805  1.938583 0.05831697
## Test set     11.48085671 11.928328 11.4808567 25.9220005 25.922001 0.67037659
##                     ACF1 Theil's U
## Training set 0.000984285        NA
## Test set     0.591639902  6.811562

Checking the residuals, all the models have high p-values in the Ljung-Box test, so the residuals could be white noise. Also, looking at the residual histograms, we can treat the residual variance as constant. However, all the models show some autocorrelations outside the threshold limits.

The ETS model has the lowest MAE and RMSE; nonetheless, they are higher than those of the Harmonic Regression model.

Finally, looking at the forecasts of the ETS model, the observed test values fall outside its prediction intervals.

TBATS model

“TBATS is a model developed by De Livera, Hyndman, & Snyder (2011) that uses a combination of Fourier terms with an exponential smoothing state space model and a Box-Cox transformation, in a completely automated manner” (11.1 Complex Seasonality Forecasting, n.d.).

A TBATS model differs from the above models because in TBATS, the seasonality is allowed to change slowly over time.
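
A minimal sketch on the same train/test split (default tbats() settings assumed); the selected model, residual check and accuracy table are shown below:

fit_tbats <- tbats(train)
fc_tbats  <- forecast(fit_tbats, h = 14)

checkresiduals(fc_tbats)
accuracy(fc_tbats, test)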

## 
##  Ljung-Box test
## 
## data:  Residuals from BATS(0.999, {0,0}, 0.8, -)
## Q* = 188.57, df = 172, p-value = 0.1836
## 
## Model df: 6.   Total lags used: 178
##                       ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -0.02595158 1.547656 1.044512 -0.115451  1.961758 0.0609899
## Test set      5.17066763 5.534752 5.170668 11.606296 11.606296 0.3019195
##                     ACF1 Theil's U
## Training set 0.002964118        NA
## Test set     0.475929004   3.17034

The algorithm chose a BATS model with a Box-Cox parameter of 0.999, an autoregressive order of 0, a moving average order of 0, and a damping parameter of 0.8 for the trend. Similarly to the models above, some significant correlation appears in the residuals, but the Ljung-Box test shows a p-value higher than 0.05; hence, the residuals are not distinguishable from a white noise series.

The RMSE and MAE measures are very similar to those of the Harmonic Regression model with K=c(2,1).

On the other hand, the forecast figure shows that the observed values remain within the 95% prediction interval.

Regression model using the residuals

Since the residuals are the difference between the actual values and the fitted values from the regression, combining the forecast from the regression model with a forecast of the residuals of that same regression should give a good forecast (Forecasting Daily Data with Multiple Seasonality in R, n.d.).

First, the tslm() function was used to fit the regression, with the daily Mogas 95 price as the dependent variable and the trend and seasonality as predictors. Because the function permits only one (integer) seasonal pattern, the yearly seasonality was used.

The residuals were modelled with auto.arima(). Only accuracy was evaluated because the combined model is not an object accepted by checkresiduals() or by the forecast plot.

# ts object for the regression (integer yearly frequency)
tsw <- ts(intpetrolprice$`FG95M1 Index`, start = decimal_date(as.Date("2017-01-02")), frequency = 262)

train <- subset(tsw, end = length(tsw) - 14)
test  <- subset(tsw, start = length(tsw) - 13)

# regression with trend and yearly seasonality, fitted on the training set
model.ses <- tslm(train ~ trend + season)
model.ses.fc <- forecast(model.ses, h = 14)
regressionF <- as.numeric(model.ses.fc$mean)

# auto.arima for the residuals of the regression
model.ses2 <- auto.arima(model.ses$residuals)
model.ses.fc2 <- forecast(model.ses2, h = 14)
residualsF <- as.numeric(model.ses.fc2$mean)

# combined forecast = regression forecast + residual forecast
forecastR <- regressionF + residualsF
accuracy(forecastR, test)
##                 ME     RMSE      MAE       MPE     MAPE      ACF1 Theil's U
## Test set -6.483608 6.694978 6.483608 -14.96775 14.96775 0.4218343  3.695724

The RMSE and MAE were higher than those of the TBATS and Harmonic with K=c(2,1) models but lower than those of the STLF model.

Prophet model

The last model evaluated was Prophet, open-source software released by Facebook’s Core Data Science team.

“Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well” (“Seasonality, Holiday Effects, and Regressors,” n.d.).

Because the Mogas 95 data has a drift in the trend and complex seasonality, some of the default parameters were changed. Likewise, since the data covers weekdays only, the periods of the seasonal patterns were adjusted. See the code below.

##--read data--##
library(readr)
library(dplyr)
library(lubridate)
library(prophet)

intpetrolprice <- read_csv("Intpetrolprice2.csv",
  col_types = cols(Date = col_date(format = "%d/%m/%Y")))

# complete the calendar so every day appears (days without a price get NA)
intpetrolprice <- data.frame(Date = seq(min(intpetrolprice$Date), max(intpetrolprice$Date), by = "days")) %>%
  left_join(intpetrolprice)

##--renaming the columns to prophet style--##
df <- intpetrolprice[, c(1, 3)]
colnames(df) <- c("ds", "y")
df$ds <- as.Date(df$ds)

##--keeping weekdays only--##
df2 <- df %>%
  mutate(ds = as.Date(ds)) %>%
  filter(wday(ds) > 1 & wday(ds) < 7)

##--changing some parameters--##

# n.changepoints (the number of dates at which the trend is allowed to change, 25 or 30) and
# changepoint.range (the proportion of the history in which changepoints are placed) were
# adjusted because of the drift in March 2020. changepoint.prior.scale was increased to make
# the trend more flexible, and seasonality.prior.scale was increased likewise for the
# seasonal components.

m <- prophet(yearly.seasonality = FALSE, daily.seasonality = FALSE, weekly.seasonality = FALSE,
             changepoint.range = 0.95, changepoint.prior.scale = 10,
             seasonality.prior.scale = 10, n.changepoints = 25)

# weekly and yearly seasonality, with periods expressed in weekdays
m <- add_seasonality(m, name = 'weekly_weekdays', period = 5, fourier.order = 2)
m <- add_seasonality(m, name = 'yearly_weekdays', period = 262.1, fourier.order = 1)
m <- fit.prophet(m, df2)

# future dates for the forecast: 21 calendar days, then filtered to weekdays
future <- make_future_dataframe(m, periods = 21, freq = "day")
future2 <- future %>%
  filter(wday(ds) > 1 & wday(ds) < 7)

# forecast
forecast2 <- predict(m, future2)

The figure below shows the actual values and the fitted values produced by the model, using all the data. In the periods before and after March 2020, the model performs relatively well. However, near this period, it is not very accurate.

Even so, when the accuracy measures were evaluated on the test set, the model had the lowest errors. However, we need to remember that this result does not include predictions for March.
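
A minimal sketch of how this test-set comparison could be produced, assuming the model is refit with the same settings on all but the last 14 weekdays and scored on the held-out dates:

# hold out the last 14 weekday observations
train_df <- head(df2, nrow(df2) - 14)
test_df  <- tail(df2, 14)

m_cv <- prophet(yearly.seasonality = FALSE, daily.seasonality = FALSE, weekly.seasonality = FALSE,
                changepoint.range = 0.95, changepoint.prior.scale = 10,
                seasonality.prior.scale = 10, n.changepoints = 25)
m_cv <- add_seasonality(m_cv, name = 'weekly_weekdays', period = 5, fourier.order = 2)
m_cv <- add_seasonality(m_cv, name = 'yearly_weekdays', period = 262.1, fourier.order = 1)
m_cv <- fit.prophet(m_cv, train_df)

# forecast the held-out weekday dates and compare them with the observed prices
fc_cv <- predict(m_cv, test_df["ds"])
accuracy(fc_cv$yhat, test_df$y)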

##                ME     RMSE      MAE       MPE     MAPE
## Test set -1.80336 3.170624 2.336436 -4.160082 5.329758

Measuring the accuracy of the models with Cross Validation

Because a single split into train and test sets is not representative of different forecast periods, a more thorough evaluation with cross-validation was carried out to measure the accuracy of the models.

The method used was a manual CV with two types of rolling partitions (20 splits each) and a 14-day forecast horizon. To avoid overfitting to the test set, one type of rolling partition shifted the training-set length by 14 days between splits; purely for comparison, the other shifted it by one day. See the examples below.

Figure 1: Example of CV method with 14 days of difference in the train set

Figure 2: Example of CV method with one day of difference in the train set

Integer and non-integer values for the seasonal patterns were applied.

For the Dynamic Harmonic Regression model, K=c(2,1) and K=c(2,10) were evaluated; for the STLF model, the ETS and naïve variants.

In addition, four parameter combinations were evaluated for the Prophet model: n.changepoints = 25 or 30, and a Fourier order for the yearly seasonal pattern of 1 or 10. In each type of rolling partition, the best-performing combination was used to compare Prophet with the rest of the models.

Since each CV split gives different results, the errors were averaged across splits to obtain an overall measure of accuracy (the same approach used by tsCV() in the forecast package).

The code below shows an example using the TBATS model.

ymsts <- msts(intpetrolprice$`FG95M1 Index`, seasonal.periods = c(5, 262),
              start = decimal_date(as.Date("2017-01-02")))   # change to c(5, 262.1) for non-integer periods

values1 <- list()
for (i in 1:20) {
  nTest  <- 14 * i                      # window with one day of difference: 14 + (i - 1)
  nTrain <- length(ymsts) - nTest
  train <- window(ymsts, start = decimal_date(as.Date("2017-01-02")),
                  end = c(decimal_date(as.Date("2017-01-02")), nTrain))
  test  <- window(ymsts, start = c(decimal_date(as.Date("2017-01-02")), nTrain + 1),
                  end   = c(decimal_date(as.Date("2017-01-02")), nTrain + 14))

  s  <- tbats(train, use.arma.errors = TRUE, use.damped.trend = TRUE,
              use.box.cox = TRUE, use.trend = TRUE, trace = TRUE)
  sp <- forecast(s, h = 14)

  values1[[i]] <- accuracy(sp, test)    # 2 x 8 matrix of training/test accuracy measures
}
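
The split-level results are then averaged into the overall measures reported below; a minimal sketch using the accuracy matrices stored in values1:

# average the test-set RMSE and MAE over the 20 CV splits
RMSEt <- mean(sapply(values1, function(a) a["Test set", "RMSE"]))
MAEt  <- mean(sapply(values1, function(a) a["Test set", "MAE"]))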

The tables below show the results for the RMSE and MAE for the rolling partition with 14 days of difference in the train set.

Table 1: Accuracy Measures with integer seasonal patterns
Accuracy   TBATS     ARIMA     STLFnaive  STLFets   HARMONIC  HARMONIC2  PROPHET
RMSEt      5.256507  6.812795  8.158937   8.284470  5.271499  5.716373   7.422731
MAEt       4.491241  6.129821  6.999154   7.118815  4.502606  4.933633   6.535623

Table 2: Accuracy Measures with non-integer seasonal patterns
Accuracy   TBATS     ARIMA     STLFnaive  STLFets   HARMONIC  HARMONIC2  PROPHET
RMSEt      5.256507  6.812795  8.158937   8.284470  5.283851  5.712219   7.422731
MAEt       4.491241  6.129821  6.999154   7.118815  4.515843  4.924988   6.535623

Similarly, the tables below show the same accuracy measures for the rolling partition with one day of difference in the train set.

Table 3: Accuracy Measures with integer seasonal patterns
Accuracy   TBATS     ARIMA     STLFnaive  STLFets   HARMONIC  HARMONIC2  PROPHET
RMSEt      4.228899  6.828176  7.211693   6.918112  4.417431  6.014763   6.246217
MAEt       3.735329  6.338764  6.258412   6.046320  3.923374  5.179874   5.449851

Table 4: Accuracy Measures with non-integer seasonal patterns
Accuracy   TBATS     ARIMA     STLFnaive  STLFets   HARMONIC  HARMONIC2  PROPHET
RMSEt      4.346924  6.828176  7.211693   6.918112  4.368480  5.984515   6.246217
MAEt       3.840317  6.338764  6.258412   6.046320  3.880844  5.143970   5.449851

Since only the TBATS and Dynamic Harmonic models permit non-integer values for the seasonal patterns, they were the only ones whose results changed when those values were applied. However, the changes were relatively small.

The results show that, as in the previous section, the STLF model has the worst performance. In contrast to its single-split result, Prophet does not perform very well under CV, probably because its forecasts are less accurate in the period near March 2020.

The TBATS model has the best performance for the two types of rolling partitions, with errors of roughly 5 and 4 cents per litre, respectively. Dynamic Harmonic Regression with K=c(2,1) was the second best.

Forecast with the Best Model

The plot below shows the forecast of Mogas 95 for 14 days using the TBATS model fitted to all the data.
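
A minimal sketch of this final forecast, refitting TBATS on the full series with default settings:

y <- msts(intpetrolprice$`FG95M1 Index`, seasonal.periods = c(5, 262.1),
          start = decimal_date(as.Date("2017-01-02")))

final_fit <- tbats(y)
final_fc  <- forecast(final_fit, h = 14)

autoplot(final_fc) + xlab("year") + theme_light()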

Conclusion

Regarding the research questions, it is possible to forecast the Mogas 95 Index with a time series model. However, because this data exhibited complex seasonality and a drop in the trend near the end of the series, the predictions were less accurate.

After measuring the accuracy of the models with train-test splits under CV, the two best-performing models were the two that allow non-integer values for the seasonal patterns: TBATS and Dynamic Harmonic Regression. Although the Prophet model also allows non-integer periods, it did not perform well in the CV splits because it did not handle the drift in the trend in March 2020.

The series showed a clearer yearly seasonal pattern than a weekly one. However, the residuals of the models showed some significant correlations, so other seasonalities could exist. Thus, if fuel retailers or wholesalers knew these seasonal periods, a TBATS model or a combination with a Dynamic Harmonic Regression could be an option for forecasting international Mogas 95 prices.

Appendix

0.1 Chain Process in Retail Fuel Prices

Figure 3: Commission, A. C. and C. (2012)

References

11.1 Complex seasonality. Forecasting: Principles and Practice. (n.d.). Retrieved from https://Otexts.com/fpp2/

5.4 Some useful predictors. Forecasting: Principles and Practice. (n.d.). Retrieved from https://Otexts.com/fpp2/

6.8 Forecasting with decomposition. Forecasting: Principles and Practice. (n.d.). Retrieved from https://Otexts.com/fpp2/

8.7 ARIMA modelling in R. Forecasting: Principles and Practice. (n.d.). Retrieved from https://Otexts.com/fpp2/

Commission, A. C. and C. (2012). About fuel prices. Retrieved from https://www.accc.gov.au/consumers/petrol-diesel-lpg/about-fuel-prices

Forecasting Daily Data with Multiple Seasonality in R. (n.d.). Retrieved from http://www.dbenson.co.uk/Rparts/subpages/forecastR/

Forecasting in R. DataCamp. (n.d.). Retrieved from https://learn.datacamp.com/courses/forecasting-in-r

Forecasting with daily data. (2013). Retrieved from https://www.r-bloggers.com/forecasting-with-daily-data/

Seasonality, Holiday Effects, and Regressors. (n.d.). Retrieved from http://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html