Introduction

One of the salient features of palm oil is price variability and volatility. The price instability and hence uncertainty pose a significant challenge to decision makers in coming up with proper production and marketing plans to minimise risk. Price forecast therefore, is vital to facilitate efficient decisions as there is a time lag between making decisions and the time at which the outputs flowing from these decisions reach the market place. An efficient decision is when the forecast prices and quantities correspond to realised market prices and quantities.

The purpose of this study is to predict the price of Crude Palm Oil (CPO) using Facebook Prophet. Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. One useful feture of the Prophet time series model by Facebook is the ability to identify changepoints, or periods of significant structural change in a time series. The model is suited for CPO data which is well-known to exhibit monthly seasonality pattern along with the change of weather and CPO production.

Data Wrangling

# data wrangling
library(tidyverse) 
library(lubridate) # date manipulation
library(zoo)

# time series 
library(forecast) # time series library
library(MLmetrics) # calculate error
library(padr)
library(prophet)

# read the data
a <- read_csv("Crude Palm Oil Futures Historical Data.csv")
b <- read_csv("test.csv")
cpo <- rbind(b, a)
rm(a, b)

head(cpo)

Prophet only accept a dataframe with two columns: ds and y. So, we change the date column to date-time type, then only choose the columns of interest – Date as ds and Price as y.

cpo <- cpo %>%
  mutate(ds = mdy(Date)) %>%
  arrange(ds) %>%
  rename(y = Price) %>%
  select(ds, y)

Altough Prophet can accept a dataset with missing value, let’s first check for NAs.

colSums((is.na(cpo)))

## ds  y 
##  0  0

Data summary.

summary(cpo)

##        ds                   y         
##  Min.   :2011-12-12   Min.   :   0.0  
##  1st Qu.:2014-02-28   1st Qu.: 493.1  
##  Median :2016-09-29   Median : 545.3  
##  Mean   :2016-10-26   Mean   : 601.0  
##  3rd Qu.:2019-05-28   3rd Qu.: 614.2  
##  Max.   :2022-01-20   Max.   :1258.3

The observation periode is from 2011-12-12 to 2022-01-20 with 2,729 observations. Meanwhile, the mean price of CPO is 601. However, CPO market is closed during weekends and holidays. We need to do padding, to ensure that all sequences have the same length.

cpo <- cpo %>%
  pad()

## pad applied on the interval: day

After padding, there are 964 missing values in the data.

colSums((is.na(cpo)))

##  ds   y 
##   0 964

I did several tuning and apparently, this dataset works best if we fill the missing value.

cpo <- cpo %>% 
  mutate(y = na.fill(y, fill = "extend"))

As this is a financial time series, the series is converetd into logarithmic format to smooth out the data and ensure that returns on a percentage basis are being accounted for by the model.

cpo$y <- log(cpo$y)

Replace inf value with NA, then do forward fill imputation.

cpo <- do.call(data.frame,lapply(cpo, function(x) replace(x, is.infinite(x),NA)))

cpo <- cpo %>% 
  mutate(y = na.fill(y, fill = "extend"))

Exploratory Data Analysis

Plot the CPO price index.

cpo %>%
  plot(type = 'l', main = "CPO Price Index")

CPO price fluctuates a lot for the past 10 years, but the COVID-19 pandemic brought supply constraint that since has increased the CPO price rapidly to a new high.

To further analyze the CPO price, we need to decompose the time series. First, we create a multi-seasonal time series object. Set the seasonality using seasonality.periods to quarterly and yearly.

cpo_msts <- msts(data = cpo$y,seasonal.periods = c(30*3,30*12))

Then, we decompose the data for last three years period to capture the multiple seasonality using mstl() which estimates the seasonal components iteratively using STL.

cpo_msts %>% 
  tail(30*36) %>% 
  mstl() %>% 
  autoplot() +
    labs(title = "Multiple seasonal decomposition")

Notice that the quarterly seasonality change slowly overtime, while the yearly seasonality for the last three years remains constant, with peak CPO price during earlier months in a year, with lowest price during mid year.

Fitting Prophet Model

First, we fit the model without parameter tuning.

cpo_prophet <- prophet(cpo)

## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

summary(cpo_prophet)

##                         Length Class      Mode     
## growth                     1   -none-     character
## changepoints              25   POSIXct    numeric  
## n.changepoints             1   -none-     numeric  
## changepoint.range          1   -none-     numeric  
## yearly.seasonality         1   -none-     character
## weekly.seasonality         1   -none-     character
## daily.seasonality          1   -none-     character
## holidays                   0   -none-     NULL     
## seasonality.mode           1   -none-     character
## seasonality.prior.scale    1   -none-     numeric  
## changepoint.prior.scale    1   -none-     numeric  
## holidays.prior.scale       1   -none-     numeric  
## mcmc.samples               1   -none-     numeric  
## interval.width             1   -none-     numeric  
## uncertainty.samples        1   -none-     numeric  
## specified.changepoints     1   -none-     logical  
## start                      1   POSIXct    numeric  
## y.scale                    1   -none-     numeric  
## logistic.floor             1   -none-     logical  
## t.scale                    1   -none-     numeric  
## changepoints.t            25   -none-     numeric  
## seasonalities              2   -none-     list     
## extra_regressors           0   -none-     list     
## country_holidays           0   -none-     NULL     
## stan.fit                   4   -none-     list     
## params                     6   -none-     list     
## history                    5   data.frame list     
## history.dates           3693   POSIXct    numeric  
## train.holiday.names        0   -none-     NULL     
## train.component.cols       4   data.frame list     
## component.modes            2   -none-     list     
## fit.kwargs                 0   -none-     list

Changepoints are moments in the data where the data shifts direction. By default, Prophet specifies 25 potential changepoints which are uniformly placed in the first 80% of the time series. The changepoints are as such:

cpo_prophet$changepoints

##  [1] "2012-04-08 GMT" "2012-08-04 GMT" "2012-11-30 GMT" "2013-03-28 GMT"
##  [5] "2013-07-25 GMT" "2013-11-20 GMT" "2014-03-18 GMT" "2014-07-14 GMT"
##  [9] "2014-11-09 GMT" "2015-03-07 GMT" "2015-07-03 GMT" "2015-10-29 GMT"
## [13] "2016-02-25 GMT" "2016-06-22 GMT" "2016-10-18 GMT" "2017-02-13 GMT"
## [17] "2017-06-11 GMT" "2017-10-07 GMT" "2018-02-02 GMT" "2018-05-31 GMT"
## [21] "2018-09-27 GMT" "2019-01-23 GMT" "2019-05-21 GMT" "2019-09-16 GMT"
## [25] "2020-01-12 GMT"

Since changepoints are only placed in the first 80% of the series, the last changepoint is in January 2020. Increasing the range of changepoints placement may be beneficial for the model since COVID disruption of CPO price happens only in the last years in the series.

Also, there are two seasonalities in the model in ‘additive’ mode. Prophet will by default fit weekly and yearly seasonalities, if the time series is more than two cycles long.

cpo_prophet$seasonalities

## $yearly
## $yearly$period
## [1] 365.25
## 
## $yearly$fourier.order
## [1] 10
## 
## $yearly$prior.scale
## [1] 10
## 
## $yearly$mode
## [1] "additive"
## 
## $yearly$condition.name
## NULL
## 
## 
## $weekly
## $weekly$period
## [1] 7
## 
## $weekly$fourier.order
## [1] 3
## 
## $weekly$prior.scale
## [1] 10
## 
## $weekly$mode
## [1] "additive"
## 
## $weekly$condition.name
## NULL

Using the helper method make_future_dataframe, we create a suitable dataframe that extends into the future a specified number of days. By default it will also include the dates from the history, so we will see the model fit as well. We create a dataframe with 120 periods.

future <- make_future_dataframe(cpo_prophet, periods = 120)

Then, we predict the CPO price.

forecast <- predict(cpo_prophet, future)

Then, we plot the forecast result. From the plot, the fitted value doesn’t follow the variance in the historical data well, the model seem to be underfit. Increasing the model’s trend flexibility might help to help the model to fit the data better.

plot(cpo_prophet, forecast)

Hyperparameter Tuning

Let’s fix some of the key problems the above model has:

Misses the downturn

Prophet was unable to incorporate the downturn in new COVID cases after the new year, since the range of data points considered when identifying changepoints is the first 80% data in the time series. To solve this, we can incorporate 100% of the data when identifying changepoints. In other situation, it may be good to keep the changepoint range at 80% or lower to ensure that themodel doesn’t overfit. But in the case of COVID anomaly, we will allow the adjustment to 100%.

Strength of changepoints

If the trend changes are being underfit (not enough flexibility), you can adjust the strength of the sparse prior. We can allow detection of more changepoints.

Seasonality

By default, Prophet will fit weekly and yearly seasonalities, if the time series is more than two cycles long. However, our earlier exploratory data analysis shows that the seasonality of the series are quarterly and yearly. We can disable the weekly seasonalities and add quarterly seasonality to the model.

Let’s tune the parameter.

cpo_prophet_1 <- prophet(changepoint.range = 1, # incorporate 100% of data
                         changepoint.prior.scale = 0.2, 
                         # strength of changepoints, higher value = more flexible trend
                         weekly.seasonality = F) # turn off default weekly seasonality
cpo_prophet_1 <- add_seasonality(cpo_prophet_1, 
                                 name='quarterly', 
                                 period=91.3125, # quarterly period
                                 fourier.order=10) # determines how quickly the seasonality can change  
cpo_prophet_1 <- fit.prophet(cpo_prophet_1, cpo) # model fitting

## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

Do another forecast on the new model.

forecast <- predict(cpo_prophet_1, future)

Plot the forecast result.

plot(cpo_prophet_1, forecast)

This model’s fitted values have a higher variance than the original model after increasing the trend flexibility, and after using 100% of the data for changepoints placement, it can predict the price anomly caused by COVID-19 after 2020 better. This model, with 4 adjustments, is better than the original one.

Then, we can see the Prophet forecast components:

prophet_plot_components(cpo_prophet_1, forecast)

The overall trend since 2011-2019 were constant, but it has been increasing since 2020 due to imbalance between supply and demand of CPO. While the yearly seasonality shows that price tends to be constant during earlier months in a year, then it peaks on August and September. CPO price tends to be the lowest during the last months in a year.

Model Validation

Prophet includes functionality for time series cross validation to measure forecast error using historical data. This is done by selecting cutoff points in the history, and for each of them fitting the model using data only up to that cutoff point. We can then compare the forecasted values to the actual values.

This cross validation procedure can be done automatically for a range of historical cutoffs. We specify the forecast horizon (horizon), and then optionally the size of the initial training period (initial) and the spacing between cutoff dates (period). By default, the initial training period is set to three times the horizon, and cutoffs are made every half a horizon.

Here we do cross-validation to assess prediction performance on a horizon of 365 days, starting with 1,095 days of training data in the first cutoff and then making predictions every 180 days. On this 11 year time series, this corresponds to 13 total forecasts.

Using a initial history of 3 years, we make a forecast between 2015-02-21 to 2021-01-20.

validation <- prophet::cross_validation(cpo_prophet_1, initial = 1095, period = 180, horizon = 365, unit = 'days')

## Making 13 forecasts with cutoffs between 2015-02-21 and 2021-01-20

validation

The y column stands for actual data, while yhat is the predicted value. Note that initially y and yhat are quite close in value, but the further the time horizon the farther yhat from y.

Then, we check the performance of the model after cross validation. The statistics computed are mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), and coverage of the yhat_lower and yhat_upper estimates. These are computed on a rolling window of the predictions after sorting by horizon (ds minus cutoff). A rolling window of 0.1 means that 10% of the predictions will be included in each window.

performance_metrics(validation, metrics = NULL, rolling_window = 0.1)

Cross validation performance metrics can be visualized using plot_cross_validation_metric, here shown for MAPE. Dots show the absolute percent error for each prediction. The blue line shows the MAPE, where the mean is taken over a rolling window of the dots.

We see for this forecast that errors around 2.5-5% are typical for predictions 250 days into the future, and that errors increase up to around 6% for predictions afterwards.

plot_cross_validation_metric(validation, 'mape', rolling_window = 0.1)

The model after hyperparamater tuning shows an adequate ability to predict the actual data, with MAPE around 2.5-6%.

dyplot.prophet(cpo_prophet_1, forecast)

## Warning: `select_()` was deprecated in dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

Conclusion

Using 10 years daily data of CPO (Crude Palm Oil) price since 2012 until early 2022, we fitted a Prophet model to forecast CPO price. Our early exploratory data analysis indicates that the data has multiple seasonality, precisely quarterly and yearly seasonality. This is in line with the domain knowledge that CPO price changes according to the weather and seasons in a year. Rainy season and drought brought by El Nino and La Nina during mid year will constraints CPO supply, which in turn will increase the price during mid-year.

The original Prophet model does a decent job at predicting actual values, but the performance worsen post-pandemic after a drastic structural break, therefore a parameter tuning is needed.

The hyperparameter tuning done to the model are as follow:

Change the changepoints range from 80% first data to 100% of the data due to COVID-19 situation.
Increase the trend flexibility from the default 0.05 to 0.2 to prevent underfitting.
Turn off the default weekly seasonality, and add quarterly seasonality as indicated by earlier data exploration.

As a result, the model’s performance significantly increased, especially for post-pandemic data.

For data validation, we use a 365 days horizon and the first 3 years data for training. Then, we make prediction every 180 days. With a rolling window of 10%, the model’s performance is decent, with 2.5-6% MAPE. With a slight tweaking, we successfully fitted a model that is able to predict CPO price quite well.

Crude Palm Oil Price Prediction using Prophet

Dyana

1/21/2022