3. Simple Moving Average (SMA)

We use the data set on coffee prices. The data set in included in the TSstudio package, in R.

The data set contains two time series (hence, the mts format, for multiple time series).

For SMA, we will use the time series on the prices of Robusta coffee, in the first column of the data set.

library(TSstudio)
data("Coffee_Prices")
ts_info(Coffee_Prices)
##  The Coffee_Prices series is a mts object with 2 variables and 701 observations
##  Frequency: 12 
##  Start time: 1960 1 
##  End time: 2018 5
head(Coffee_Prices)
##        Robusta Arabica
## [1,] 0.6968643  0.9409
## [2,] 0.6887074  0.9469
## [3,] 0.6887074  0.9281
## [4,] 0.6845187  0.9303
## [5,] 0.6906915  0.9200
## [6,] 0.6968643  0.9123
robusta <- Coffee_Prices[,1]; head(robusta)
## [1] 0.6968643 0.6887074 0.6887074 0.6845187 0.6906915 0.6968643
library(plotly)
ts_plot(robusta,
        title = "The Robusta Coffee Monthly Prices",
        Ytitle = "Price in USD",
        Xtitle = "Year")

The SMA-model is based on simple arithmetic computations, and there’s no need to use sophisticated software.

The file can be downloaded here.

4. Weighted Moving Average

The same data set is used as in 3. SMA above.

5. Simple Exponential Smoothing (SES)

The parameter α (0 < α < 1) indicates the rate of decay.

In the SES model, the forecast for T+1, in a series of T observations, is:

T+1 = α∗YT + (1-α)∗ŶT

In this formula:

Since it is assumed that the level of the model does not change over time, the forecast for h (the planning horizon) periods ahead ŶT+h is the same as the forecast for ŶT+1.

The interesting thing to note about this formula is that it describes a so-called recursive model. Why is that?

In the model, the forecasted value for T+1 is a function of the actual and forecasted values at period T. The forecasted value at period T, in turn, is a function of the the actual and forecasted values at period T-1. The latter value is a function of the actual and forecasted values at T-2, and so on and so forth.

By expanding the formula above, it can be shown that:

T+1 = α∗YT + α∗(1-α)∗YT-1 + α∗(1-α)2∗YT-2 + …

Since α is a positive number between 0 and 1, (1-α) is also a positive number between 0 and 1. The weight of an observation at any period is equal to the weight of an observation one period later times (1-α). For example, if α=.8 then the weight of the observation at lag 3, is 0.2 times the weight of the observation at lag 2.

The rate of decay at various levels of α can be visualized using the weight decay Excel-file that can be downloaded here

Forecasting with SES

So far, we have used the monthly prices of Robusta coffee.

Now, we want to focus on predicting the quarterly prices of Arabica coffee.

The prices of Arabica coffee are stored in the Coffee Prices data set, which come with the TSstudio package in R.

Let’s load, describe and visualize the data.

library(TSstudio)
data("Coffee_Prices")
head(Coffee_Prices)
##        Robusta Arabica
## [1,] 0.6968643  0.9409
## [2,] 0.6887074  0.9469
## [3,] 0.6887074  0.9281
## [4,] 0.6845187  0.9303
## [5,] 0.6906915  0.9200
## [6,] 0.6968643  0.9123
arabica <- Coffee_Prices[,2]; head(arabica)
## [1] 0.9409 0.9469 0.9281 0.9303 0.9200 0.9123
ts_info(Coffee_Prices)
##  The Coffee_Prices series is a mts object with 2 variables and 701 observations
##  Frequency: 12 
##  Start time: 1960 1 
##  End time: 2018 5
ts_plot(Coffee_Prices)

From the graph, it’s clear that the prices of Robusta and Arabica are highly correlated, and at least until the early 1980s the two time series were almost identical.

In the late 1980s, there was a break in trend, and ever since the prices started to diverge with prices of Arabica well above the prices of Robusta. Moreover, the peaks and troughs of the series do not coincide, since 2005. The peaks in Arabica prices are very high, and more volatile.

However, our interest is in quarterly prices of Arabica.

We prepare the data using the aggregate() function of the stats package. Since we have monthly data, with a frequency of 12 (there are twelve months in a year), we will set the frequency at 4 instead.

The aggregate() function groups our data, in slices of 3 months and then computes a summary statistic, in the FUN= argument. For price levels, we can use the mean of the monthly prices in each quarter.

arabicaQ <- aggregate(arabica, nfrequency=4, FUN = "mean")
head(arabica,12)
##  [1] 0.9409 0.9469 0.9281 0.9303 0.9200 0.9123 0.9160 0.9292 0.9226 0.9237
## [11] 0.9116 0.9006
head(arabicaQ,4)
## [1] 0.9386333 0.9208667 0.9226000 0.9119667
ts_plot(arabicaQ,
        title = "The Arabica Coffee Quarterly Prices",
        Ytitle = "Price in USD",
        Xtitle = "Year")

Recall that the SES model applies to a series without trend and seasonality.

In order to justify the model selection, we can decompose the time series into components, and visualize the results.

ts_decompose(arabicaQ)

Since the trend has the same shape as the time series, we can conclude that there is no clear upward or downward trend in the data apart from the cyclical variations that occur since the late 1970s.

After accounting for the trend and the cycle, what remains is seasonality and random variation. It seems that there is some seasonal (quarterly) variation in the data, but for a correct interpretation, we have to look at the scales. The seasonal fluctuation is in the range from -0.05 to +0.05, which is very low compared to the level of the series (around 4, since 2010).

We can visualize the seasonal component using a ggsubseries() plot.

From the graph below, we see that indeed the price levels differ only marginally between the quarters.

library(forecast)
ggsubseriesplot(arabicaQ)

We will continue using the SES-model for forecasting.

For estimating and evaluating the model, we will split the data set into a training set and a test set.

When splitting the data, we have to decide on the length of the test set.

If our forecasting horizon is 2 years (8 quarters), then it makes sense to use a test set of the same length.

Alternatively, we can use test sets of different lengths (1 year and 2 years), which would allow us to assess the forecasting accuracy at different forecasting horizons.

arabicaQ_split <- ts_split(arabicaQ, sample.out = 8)
train <- arabicaQ_split$train
test  <- arabicaQ_split$test
ts_info(train)
##  The train series is a ts object with 1 variable and 225 observations
##  Frequency: 4 
##  Start time: 1960 1 
##  End time: 2016 1
ts_info(test)
##  The test series is a ts object with 1 variable and 8 observations
##  Frequency: 4 
##  Start time: 2016 2 
##  End time: 2018 1

We can combine the training and test sets (which are both defined as single time series) in one data set, as multiple time series, and plot them in one graph. By using the slider, we can zoom in on the most recent years, and have a better idea of the forecasting challenge that lies ahead of us!

tr1 <- as.data.frame(train)  # define as data frames
te1 <- as.data.frame(test)
names(tr1) <- "train"        # label the variables in the data frames
names(te1) <- "test"
tr1$d <- 1:nrow(tr1)         # add an index (d), from 1 to 209 for the training set
te1$d <- (1+nrow(tr1)):(nrow(tr1)+nrow(te1)) # id, from 210 to 233 for the test set
trte <- merge(tr1,te1,by="d",all=TRUE)       # merge the data 
trte_mts <- ts(trte[,c("train","test")],start=c(1960,1),frequency=4) # define as multiple time series
ts_info(trte_mts)
##  The trte_mts series is a mts object with 2 variables and 233 observations
##  Frequency: 4 
##  Start time: 1960 1 
##  End time: 2018 1
ts_plot(trte_mts,type="single",Xgrid=TRUE,Ygrid=TRUE,title="Arabica Training and Test Set",Xtitle="Year",Ytitle="Average Quarterly Price",slider=TRUE)

The SES model can be estimated using the ses() function of the forecast package in R.

In using this function, three arguments are of importance:

  • The initial argument can be set at either “simple” or “optimal”. Setting this argument has to do with the recursive nature of the model. Forecasts for any period are based on forecasted and actual values in the previous period. But for the first observation in the series, that information is not available. In order to start up the process, we have to set the initial state at some value. In the simple method, the initial state is based on simple calculations using the first observations. In the optimal method, a complex optimization method is used. The initial state is shown as l (for lambda) in the output. In long time series and/or in time series with large values for α, the choice hardly matters.
  • The alpha argument, sets the value for α. If the argument is not used, then α will be estimated.
  • The argument h sets the forecasting horizon.

Leaving out the alpha argument suggests that an “optimal” value will be estimated. However, other values of alpha work better!

Since our main aim is to produce accurate forecasts, we will first look at what the ses() function suggests as an estimate of α, and then - in the next step - compare the accuracy of the forecast to alternative models.

The model estimation is straightforward:

library(forecast)
fc_ses_default <- ses(train, h = 8, initial = "optimal")
fc_ses_default$model
## Simple exponential smoothing 
## 
## Call:
##  ses(y = train, h = 8, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 0.6328 
## 
##   sigma:  0.4233
## 
##      AIC     AICc      BIC 
## 835.7267 835.8353 845.9750

The output shows that α as estimated by ses() is very close to 1.

Note that, since the forecast one period ahead (T+1) is almost completely determined by the last observation in period T, this model is identical to the naive model!

The initial state is estimated. We would expect the initial state to be close the price of Arabic coffee at the start of the series in 1960 (around 0.90). But the output shows that the initial state is estimated as 0.63, which probably takes into consideration the long run upward trend in the coffee prices.

To be on the safe side, we can estimate the model with initial = “simple”. Indeed, the initial state, is close to the first observations (actually, it’s identical to the first observation).

library(forecast)
fc_ses_simple <- ses(train, h = 24, initial = "simple")
fc_ses_simple$model
## Simple exponential smoothing 
## 
## Call:
##  ses(y = train, h = 24, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 1 
## 
##   Initial states:
##     l = 0.9386 
## 
##   sigma:  0.4209
cat("First 4 observations:",arabicaQ[1:4],"\n")
## First 4 observations: 0.9386333 0.9208667 0.9226 0.9119667
cat("Initial state",fc_ses_simple$model$par[2],"\n")
## Initial state 0.9386333

We plot the actual and forecasted data using the test_forecast() function from the TSstudio package in R.

But, as expected, the implications for the forecasts are negligible, given the high α and the length (almost 60 years of data) of the series.

Now that we have estimated the model, we can visualize the actual observations in both the training and the test set, and show the 24 months forecasts. The forecasts of the SES model, as explained, are on a horizontal line.

test_forecast(actual = arabicaQ,
              forecast.obj = fc_ses_default,
              test = test) %>%
  layout(title = "Arabica Coffee Prices Forecast versus Actual",
         xaxis = list(range = c(2005, max(time(arabicaQ)))),
         yaxis = list(range = c(2, 7)))

What we see from the graph is that the forecasts using the default SES model (which is almost identical to the naive model) are systematically different from the actual values in the test set.

This is despite the fact that the red line (the fitted values) are very close to the actual values in the training set. But that is exactly what we would expect in an SES model with α≈1: the fitted values are identical to the actual values with a lag of 1 period!

So,let’s have a closer look at the accuracy of our models!

For the evaluation of our model, we need the following:

  • A test set, containing data that have not been used in estimating the model. In our case, we have set apart the last 24 observations of the data set, to see if these can be accurately predicted from the estimated model.
  • A statistic that enables us to assess how close our forecasts from the training set are to the actual values in the test set.
  • A benchmark. Normally, we would compare our model to a simple model. The idea is that using complex models only pays off if they systematically generate superior forecasts.

In our example, the default SES model with an estimated α≈1, is almost identical to the simple naive model, that forecasts the value at T+1 as the last observation at T. That is, using the naive model as the benchmark, would lead us to start using the naive model for forecasting.

More formally, we can assess the accuracy of our forecasts using a variety of measures.

Below, we first train the naive model, with a forecasting horizon

fc_naive <- naive(train,h=24)
accuracy(fc_naive, test)
##                     ME      RMSE       MAE        MPE     MAPE      MASE
## Training set 0.0105803 0.4218174 0.2506363 -0.2608367 8.689122 0.3840583
## Test set     0.1216767 0.3189048 0.2632316  2.8297994 7.457427 0.4033586
##                   ACF1 Theil's U
## Training set 0.2495545        NA
## Test set     0.6548135  1.603047
accuracy(fc_ses_default,test)
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set 0.01189372 0.4213831 0.2508906 -0.1149186 8.795616 0.3844480
## Test set     0.12167690 0.3189049 0.2632316  2.8298067 7.457427 0.4033586
##                   ACF1 Theil's U
## Training set 0.2488819        NA
## Test set     0.6548135  1.603047

6. Holt-model

The Holt model is an expanded version of the SES model. The model is based on double smoothing, with the use of two rather than one parameter. In addition to the α that has the same function as in SES, the Holt model introduces a second parameter β. The β parameter is intended to smooth the slope of the linear trend.

Like in SES, the default settings will estimate values for α and β, but these values are not necessarily the best for forecasting! So, we have to go through all the steps of the forecasting process:

Alternative models can be scanned in a grid search, with a range of values for both smoothing parameters and their impact on a relevant measure of accuracy, like MAPE.

Although we have introduced Holt as the model for a linear trend, the parameters of the model allow us to deal with short run forecasts for models with nonlinear trends, or more complex times series with breaks in trends.

The example is based on demand for a product that was affected by the Covid-19 pandemic. After 2020, demand went down sharply, but picked up even before the pandemic ended, at an accelerated pace.

Let’s read and visualize the time series.

demand <- read.csv("demand.csv")
demand <- ts(demand, end=c(2023,4), frequency=4)
ts_info(demand)
##  The demand series is a ts object with 1 variable and 48 observations
##  Frequency: 4 
##  Start time: 2012 1 
##  End time: 2023 4
ts_plot(demand, Xgrid=TRUE,Ygrid=TRUE,title="Demand for Product",Xtitle="Year",Ytitle="Demand in Units",slider=TRUE)

We will split the time series in a training set and a test set. If our forecasting horizon is two years (eight quarters), then it makes sense to set the length of the test set at eight quarters.

Using the default settings, in the holt() function of the forecast package, we would get the following.

The optimal initial values of the level and the slope are computed by the model. Alternatively we can use a simple method based on the first observations.

demand_par <- ts_split(demand, sample.out = 8)
traind <- demand_par$train
testd  <- demand_par$test
fc_demand_holt <- holt(traind,  h = 8, initial = "optimal") 
fc_demand_holt$model
## Holt's method 
## 
## Call:
##  holt(y = traind, h = 8, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.9985 
##     beta  = 0.0941 
## 
##   Initial states:
##     l = 14603.0487 
##     b = 161.9411 
## 
##   sigma:  80.9869
## 
##      AIC     AICc      BIC 
## 504.8838 506.6485 513.3282
accuracy(fc_demand_holt, testd)
##                      ME       RMSE      MAE         MPE      MAPE       MASE
## Training set   10.99164   76.83093  61.7626  0.05241543 0.3460477 0.08790283
## Test set     -665.87355 1141.19097 854.8069 -3.29699460 4.0874667 1.21659297
##                   ACF1 Theil's U
## Training set 0.0192117        NA
## Test set     0.2975704   1.08055
test_forecast(demand, forecast.obj = fc_demand_holt, test = testd)

Both from the accuracy measures and from the graph, it is obvious that this model performs poorly, for the simple reason that the sharp decrease in demand occurs within the quarters of the test set.

Even though there’s a strong linear trend in the training data, the β is quite low. That does not deny the existence of the strong linear trend, and just confirms that the variations in the slope of the linear trend are minor!

The MAPE tells what we can see from the graph: even though the fitted values (the red line) match the actual observations very well, the forecasted values in the 8 quarters of the test set are very poor. The MAPE in the test set is around 8 times the MAPE in the training set!

Since the forecasts in the test set turn out to be inaccurate forecasts of the actual observations, we have little confidence in the capability of this model to generate accurate forecasts. It makes sense to look for better models.

Given the uncertainty in the data caused by a unique event (COVID-19), it is probably better to choose a shorter forecasting horizon, e.g., 4 quarters rather than 8 quarters. In addition, we can try different values for β, since it is likely that the changes in the slope of the curve are strongly linked to recent changes in the slope.

Let’s put these two changes into practice:

In addition, we will use the option of dampening the trend, which prevents sudden peaks or troughs in recent periods to overestimate or underestimate forecasts.

For a good comparison, we will estimate a model with the default settings (4.a) as well.

# Training and test set; test set consists of 4 quarters, equal to the forecast horizon
demand_par_4 <- ts_split(demand, sample.out = 4)
traind4 <- demand_par_4$train
testd4  <- demand_par_4$test
# Default settings
fc_demand_holt_4_a <- holt(traind4,  h = 4, initial = "optimal") 
fc_demand_holt_4_a$model
## Holt's method 
## 
## Call:
##  holt(y = traind4, h = 4, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.4323 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 14644.2311 
##     b = 151.4496 
## 
##   sigma:  391.4726
## 
##      AIC     AICc      BIC 
## 697.6633 699.2422 706.5842
test_forecast(demand, forecast.obj = fc_demand_holt_4_a, test = testd4)
accuracy(fc_demand_holt_4_a, testd4)
##                         ME      RMSE      MAE         MPE      MAPE     MASE
## Training set   -0.05613861  373.2545  151.563 -0.01178639 0.7849943 0.215217
## Test set     1306.14875103 1414.0990 1306.149  5.61325289 5.6132529 1.854709
##                   ACF1 Theil's U
## Training set 0.1035931        NA
## Test set     0.1770183  2.379496
# Our own settings for beta, and a dampened trend
fc_demand_holt_4_b <- holt(traind4,  h = 4, initial = "optimal", beta=.70, damped=TRUE) 
fc_demand_holt_4_b$model
## Damped Holt's method 
## 
## Call:
##  holt(y = traind4, h = 4, damped = TRUE, initial = "optimal",  
## 
##  Call:
##      beta = 0.7) 
## 
##   Smoothing parameters:
##     alpha = 0.7 
##     beta  = 0.7 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 14642.7038 
##     b = 172.036 
## 
##   sigma:  512.9011
## 
##     AIC    AICc     BIC 
## 720.324 721.903 729.245
accuracy(fc_demand_holt_4_b, testd4)
##                    ME     RMSE      MAE       MPE     MAPE     MASE        ACF1
## Training set  59.3026 482.8805 194.9087 0.2980881 1.000961 0.276767 -0.22306674
## Test set     419.7282 493.1221 419.7282 1.7940209 1.794021 0.596007  0.04825039
##              Theil's U
## Training set        NA
## Test set     0.8351803
test_forecast(demand, forecast.obj = fc_demand_holt_4_b, test = testd4)

The results indicate that model 4.b performs quite well. The MAPE is much lower than in model 4.a. The MAPE in the test set is still higher than in the training set, but, firstly, this is common since the data in the training set are used in estimating the model, and, secondly, especially for this time series a unique event (COVID-19) has an impact in the latest observations, including the observations in the data set.

Probably, the MAPE would be even lower for the test set if we would have left out the option to dampen the trend. However, our ultimate aim is to forecast future demand rather than the test set! The dampened trend, in this example, produces a more conservative forecast of future demand, implicitly assuming (or reasoning) that the exponential increase of catch-up demand after COVID-19 will slow down.

We see that it’s not recommended to use the default settings! The best ways to assess the validity of the forecasting model are to:

  1. Use training and test sets of an appropriate length.

Whatever is appropriate depends on:

  1. Use a grid search for the parameter values that minimize the accuracy statistics in the test set.

Often, parameter values in a range (e.g., .30 < α < .70) tend to perform robustly and satisfactorily, and, hence, there’s no need to hunt for the very best parameter with four decimal precision (e.g., α = 0.6487). The optimal parameter is bound to change slightly as time goes by, and approximations (α = 0.65) that are kept constant for some time often produce robust forecasts.

  1. Visualize patterns in the forecasts. In our last model, there is a pattern in the forecasts, in the sense that the differences between the actual observations and the forecasts in the test set are widening. Preferably, the differences should be random, rather than systematic. The pattern in our example, is partly inspired by our choice of a dampened trend. If we expect the accelerating trend to be permanent, we can opt for a different model, like an exponential trend.

7. Holt-Winters model

The Holt-Winters model is an extended version of the Holt model (which is in its turn an extension of the SES model).

The Holt-Winters model is able to handle time series with both a trend and a seasonal component. For forecasting the seasonal component, the Holt-Winters model introduces a third parameter (γ, pronounced as gamma).

With the introduction of a third parameter, the mathematical formulas become even more complex, but it’s important to realize that the principle behind these parameters remains the same.

There are several packages in R that have functions for forecasting with the popular Holt-Winters model. In this section, we will mainly use the HoltWinters() function from the forecast package. Alternatively, we can make use of the hw() function of the stats package, which has the options to use exponential trends, and dampened trends.

We will use quarterly demand for a product, in the years 2003-2022. The data can be found in demand_hw.csv (download here).

demand_hw <- read.csv("demand_hw.csv")
head(demand_hw)
##   demand_hw
## 1    6891.8
## 2    4929.3
## 3    4653.7
## 4    6063.8
## 5    7233.1
## 6    4774.0

Since the data set only contains the data and not the time index, we have to define the data as a time series with the starting quarter (2003, Q1) as follows.

demand_hw_ts <- ts(demand_hw, start=c(2003,1),frequency=4)

We can inspect and plot the time series, and see if there is a seasonal (quarterly) component.

ts_info(demand_hw_ts)
##  The demand_hw_ts series is a ts object with 1 variable and 79 observations
##  Frequency: 4 
##  Start time: 2003 1 
##  End time: 2022 3
ts_plot(demand_hw_ts)
plot(decompose(demand_hw_ts))

ts_seasonal(demand_hw_ts)

The plots suggest an upward trend, which is accelerating as of 2010.

The seasonal plot confirms that there’s a strong and systematic seasonal component. In each of the years 2003-2021, demand in Q1 is around two thousand units higher than in Q2 and Q3. Demand in Q4 is in between the peaks in Q1 and the troughs in Q2/Q3.

This information justifies the use of the Holt-Winters model.

Since we want to train the model, and assess its performance with data that we have at our disposal, we partition the data set in a training and test set. In case of seasonal data, we want to see to it that the test set includes all seasons. Since we have quarterly data, the minimum length of the test set should therefore be 4. If our forecasting horizon is one year, we can pick 4 as the length of the training set, but longer periods (8, or even 12 quarters) are also good choices.

Using both short and longer test sets, may shed light on the forecasting accuracy of various models in the short, medium and longer run!

For the sake of ease, we will stick to a test set length and forecasting horizon of h=4.

d_hw_par <- ts_split(demand_hw_ts, 4)
train_hw <- d_hw_par$train
test_hw  <- d_hw_par$test

Next, we use the training set to estimate the model, and use the model to fit and forecast the observations in the training and test set.

The output shows the estimations of the parameters α, β and γ. The sprintf() function is used to force R to print the output in 4 decimals.

d_hw <- HoltWinters(train_hw)
cat("alpha =",sprintf("%.4f",d_hw$alpha),"\n")
## alpha = 0.1863
cat("beta  =",sprintf("%.4f",d_hw$beta),"\n")
## beta  = 0.0917
cat("gamma =",sprintf("%.4f",d_hw$gamma),"\n")
## gamma = 0.2025

We can use the forecast() function to forecast values for the test set (4 quarters, 2022Q4 to 2023Q3), and beyond (for example the next 8 quarters on which we don’t have data, 2023Q4 to 2025Q3).

The forecasts come with both point estimates and 80% and 95% confidence intervals. You will notice that the confidence intervals are in quite a broad range, and the range increases with the number of quarters forecasted ahead.

In many cases, the ranges are so broad that they hardly give guidance about what the future may bring!

fc1 <- forecast(d_hw, 12)
plot_forecast(fc1, title="Fit & Eight Quarter Forecast", Xtitle = "Year", Ytitle = "Demand")