Overview

Forecasting models generally fall under one of three umbrellas:

Explanatory models use external variables to predict a dependent variable
Time series models use past variable values to predict future values
Dynamic models combine both explanatory and time series models

Time series models are often used in analyzing characteristics of organization performance such as attrition, supply, and cost to assist leaders in decision making. This tutorial will provide a foundational introduction to moving averages and time series decomposition. These two foundations are critical to understanding a practical approach to using the exponential smoothing method of time series models.

Replication Requirements

This tutorial leverages a variety of data sets to illustrate unique time series features. The data sets are all provided by the fpp2, and zoo packages. Furthermore, these packages provide various functions for computing and visualizing basic time series components.

library(tidyverse)      # data manipulation and visualization
library(lubridate)      # easily work with dates and times
library(fpp2)           # working with time series data
library(zoo)            # working with time series data

Moving Averages

Exponential smoothing (discussed later) derives its foundation from the theory of moving averages. A moving average of order \(m\) can be written as

\[ \begin{aligned} \hat{T}_{t} = \frac{1}{m} \sum_{j=-k}^k y_{t+j}, \end{aligned} \] where \(m=2k+1\). That is, the estimate of the trend-cycle at time \(t\) is obtained by averaging values of the time series within \(k\) periods of \(t\). Observations that are nearby in time are also likely to be close in value, and the average eliminates some of the randomness in the data, leaving a smooth trend-cycle component. We call this an \(m\)-MA meaning a moving average of order \(m\).

To compute moving averages, we can leverage the rollmean function from the zoo package. Here, we focus on the personal savings rate (psavert) variable in the economics data frame. Using mutate and rollmean, I compute the 13, 25, …, 121 month moving average values and add this data back to the data frame. Note that we need to explicitly add the fill argument to ensure that values that cannot be computed (due to lack of data) are marked as missing.

savings <- economics %>%
  select(date, srate = psavert) %>%
  mutate(srate_ma01 = rollmean(srate, k = 13, fill = NA),
         srate_ma02 = rollmean(srate, k = 25, fill = NA),
         srate_ma03 = rollmean(srate, k = 37, fill = NA),
         srate_ma05 = rollmean(srate, k = 61, fill = NA),
         srate_ma10 = rollmean(srate, k = 121, fill = NA))

savings

Now we can go ahead and plot these values and compare the actual data to the different moving average smoothers.

savings %>%
  gather(metric, value, srate:srate_ma10) %>%
  ggplot(aes(date, value, color = metric)) +
  geom_line()

You may notice that as the number of points used for the average increases, the curve becomes smoother and smoother. Choosing a value for k is a balance between eliminating noise while still capturing the data’s true structure. For this set, the 10 year moving average (\(k = 121\)) eliminates most of the pattern and is probably too much smoothing, while a 1 year moving average (\(k = 13\)) offers little more than just looking at the data itself.

Trailing Moving Averages

For purposes of forecasting, we study trailing moving averages, where the window of k periods is placed over the most recent available k values of the series. For example, if we have data up to time period t, we can predict the value for t+1 by averaging over k periods prior to t+1. If we want to use the 5 most recent time periods to predict for t+1 then our function looks like:

\[ \begin{aligned} \hat{y}_{t+1} = \dfrac{y_{t-4} + y_{t-3} + y_{t-2} + y_{t-1} + y_{t}}{5} \end{aligned} \]

So, if we wanted to predict the next month’s savings rate based on the previous year’s average, we can use rollmean with the align = "right" argument to compute a trailing moving average. We can see that if we wanted to predict what the savings rate would be for 2015-05-01 based on the the last 12 months, our prediction would be 5.6% (the 12-month average for 2015-04-01).

savings_tma <- economics %>%
  select(date, srate = psavert) %>%
  mutate(srate_tma = rollmean(srate, k = 12, fill = NA, align = "right"))

tail(savings_tma, 12)

We can visualize how the 12-month trailing moving average predicts future savings rates with the following plot. It’s easy to see that trailing moving averages have a delayed reaction to changes in patterns and trends.

savings_tma %>%
  gather(metric, value, -date) %>%
  ggplot(aes(date, value, color = metric)) +
  geom_line()

Moving Averages of Moving Averages

The concept of simple moving averages can be extended to taking moving averages of moving averages. This technique is often employed with an even number of data points so that the final product is symmetric around each point. For example, let’s look at the built-in data set elecsales provided by the fpp2 package. For our first example we convert to a data frame. This data frame is even numbered with 20 rows.

elecsales.df <- data.frame(year = time(elecsales), sales = elecsales)

nrow(elecsales.df)

## [1] 20

An even-numbered moving average is unbalanced, and for our purposes, the unbalancing will be in favor of more recent observations. For example, to calculate a 4-MA, the equation is as follows:

\[ \begin{aligned} \hat{y_t} = \dfrac{y_{t-1} + y_{t} + y_{t+1} + y_{t+2}}{4} \end{aligned} \]

To make the moving average symmetric (and therefore more accurate), we then take a 2-MA of the 4-MA to create a 2 x 4-MA. For the 2-MA step, we average the current and previous moving averages, thus resulting in an overall estimate of:

\[ \begin{aligned} \hat{y_t} = \dfrac{1}{8}y_{t-2} + \dfrac{1}{4}y_{t-1} + \dfrac{1}{4}y_{t} + \dfrac{1}{4}y_{t+1} + \dfrac{1}{8}y_{t+2} \end{aligned} \]

This two-step process can be performed easily with the ma function by setting order = 4 and centre = TRUE.

elecsales.df %>%
  mutate(ma4 = ma(sales, order = 4, centre = TRUE)) %>%
  head()

To compare this moving average to a regular moving average we can plot the two outputs:

elecsales.df %>%
  mutate(ma2 = rollmean(sales, k = 2, fill = NA),
         ma2x4 = ma(sales, order = 4, centre = TRUE)) %>%
  gather(ma, value, ma2:ma2x4) %>%
  ggplot(aes(x = year)) + 
  geom_point(aes(y = sales)) +
  geom_line(aes(y = value, color = ma))

This 2 x 4-MA process produces the best fit yet. It massages out some of the noise while maintaining the overall trend of the data. Other combinations of moving averages are possible, such as 3 x 3-MA. To maintain symmetry, if your first moving average is an even number of points, the follow-up MA should also contain an even number. Likewise, if your first MA uses an odd number of points, the follow-up should use an odd number of points. Just keep in mind that moving averages of moving averages will lose information as you do not retain as many data points.

Weighted Moving Averages

A moving average of a moving average can be thought of as a symmetric MA that has different weights on each nearby observation. For example, the 2x4-MA discussed above is equivalent to a weighted 5-MA with weights given by \([1/8,1/4,1/4,1/4,1/8]\). In general, a weighted m-MA can be written as

\[ \begin{aligned} \hat{T}_t = \sum^k_{j=-k} a_j y_{t+j} \end{aligned} \]

where \(k=(m-1)/2\) and the weights are given by \([a_{-k}, \dots, a_k]\). It is important that the weights all sum to one and that they are symmetric so that \(a_j = a_{-j}\). This simple m-MA is a special case where all the weights are equal to \(1 / m\). A major advantage of weighted moving averages is that they yield a smoother estimate of the trend-cycle. Instead of observations entering and leaving the calculation at full weight, their weights are slowly increased and then slowly decreased resulting in a smoother curve.

Exercises

Using the economics data set provided by the ggplot2 package:

Compute and plot the 1, 3, and 5 year moving average for the personal consumption expenditures.
Forecast the personal consumption expenditure for 2015-05-01 using a 1, 3, and 5 year trailing moving average.
Compute and plot a 2x12 weighted smoothing average.

Time Series Decomposition

Up to this point, we have used data frames of date and numeric variables to product plots. Now, it is worth taking a moment to understand the different components of time series data - a prerequisite to any exponential smoothing forecasting. Information on converting data to time series formats can be found here.

We can describe the common time series components using this quarterly cement production data.

autoplot(fpp2::qcement) + ylab("Quarterly Cement Production (1956-2014)")

The trend is the long-term increase or decrease in the data. There is an increasing trend in the cement data.
The seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. The quarterly cement data above shows seasonality likely induced by the change in weather and its impact on being able to pour cement.
The cycle occurs when the data exhibit rises and falls that are not of a fixed period. These fluctuations are usually due to economic conditions and are often related to the “business cycle”. We can see a few cycles in our cement data in the early ’80s, ’90s, ’00s, and around 2008 - all these date ranges are around economic depressions that occurred.

The decompose() and forecast::stl() functions split the time series into clearly visible components. This decomposition can be either multiplicative or additive in nature.

tsData <- EuStockMarkets[, 1] 
decomposedRes <- decompose(tsData, type="mult") # use type = "additive" for additive components
plot (decomposedRes)

Exercises

Data set books contains the daily sales of paperback and hardcover books at the same store. Plot the series and discuss the main features of the data.

Exponential Smoothing

Exponential smoothing applies to time series data and uses weighted averages of past observations to predict future values, with weights decreasing exponentially as the observations get older. The more recent the observations, the higher the associated weight. There are a multitude of different methods of exponential smoothing depending on the data’s types of time series components.

Simple Exponential Smoothing (SES)

The simplest of the exponentially smoothing methods is naturally called “simple exponential smoothing” (SES). This method is suitable for forecasting data with no trend or seasonal pattern. Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past — the smallest weights are associated with the oldest observations:

\[ \begin{aligned} \begin{equation}\hat{y}_{T+1|T} = \alpha y_T +\alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2}+ \cdots, \end{equation} \end{aligned} \]

where \(0≤α≤1\) is the smoothing parameter. The one-step-ahead forecast for time \(T+1\) is a weighted average of all the observations in the series \(y_1,\dots,y_T\). The rate at which the weights decrease is controlled by the parameter \(α\).

For any \(α\) between 0 and 1, the weights attached to the observations decrease exponentially as we go back in time, hence the name “exponential smoothing”. If \(α\) is small (i.e., close to 0), more weight is given to observations from the more distant past. If \(α\) is large (i.e., close to 1), more weight is given to the more recent observations.

Check out the oil production in Saudi Arabia from 1996 to 2007, shown below.

oildata <- window(oil,start=1996,end=2007)
plot(oildata, ylab="Oil (millions of tonnes)",xlab="Year")

This simple exponentional smoothing method can be applied with the ses() function of the forecast package. Set initial to either "optimal" or "simple" and h to the number of desired periods for forecasting. The output includes a point forecast for each period along with intervals for 80% and 95% confidences.

ses(oildata, alpha=0.2, initial="simple", h=3)

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2008       481.4481 442.3387 520.5575 421.6354 541.2607
## 2009       481.4481 441.5641 521.3320 420.4509 542.4453
## 2010       481.4481 440.8044 522.0917 419.2889 543.6072

See the plot of these forecasts alongside the actual data.

fit1 <- ses(oildata, alpha=0.2, initial="simple", h=3)
fit2 <- ses(oildata, alpha=0.6, initial="simple", h=3)
fit3 <- ses(oildata, h=3)
plot(fit1, plot.conf=FALSE, ylab="Oil (millions of tonnes)",
  xlab="Year", main="", fcol="white", type="o")
lines(fitted(fit1), col="blue", type="o")
lines(fitted(fit2), col="red", type="o")
lines(fitted(fit3), col="green", type="o")
lines(fit1$mean, col="blue", type="o")
lines(fit2$mean, col="red", type="o")
lines(fit3$mean, col="green", type="o")
legend("topleft",lty=1, col=c(1,"blue","red","green"), 
  c("data", expression(alpha == 0.2), expression(alpha == 0.6),
  expression(alpha == 0.89)),pch=1)

In this example, simple exponential smoothing is applied to forecast oil production in Saudi Arabia. The black line above is a plot of the data over the period 1996–2007, which shows a changing level over time but no obvious trending behavior.

The influence of \(α\) on the smoothing process is clearly visible. The larger the \(α\) the greater the adjustment that takes place in the next forecast in the direction of the previous data point; smaller \(α\) leads to less adjustment.

Holt’s Linear Trend Method

Adding a trend to the forecast requires the holt() function of the forecast package and adds a \(\beta^*\) variable to represent the trend level of the forecast. \(\beta^*\) is the smoothing parameter for the trend for values \(0≤\beta^*≤1\). As shown, the forecast is no longer flat.

livestock2 <- window(livestock,start=1970,end=2000)
fit1 <- ses(livestock2)
fit2 <- holt(livestock2)
fit3 <- holt(livestock2,exponential=TRUE)
fit4 <- holt(livestock2,damped=TRUE)
fit5 <- holt(livestock2,exponential=TRUE,damped=TRUE)

fit2

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2001       419.2502 401.4138 437.0866 391.9718 446.5286
## 2002       424.2786 399.3216 449.2357 386.1101 462.4472
## 2003       429.3071 398.8500 459.7642 382.7270 475.8872
## 2004       434.3355 399.2289 469.4422 380.6446 488.0265
## 2005       439.3640 400.1545 478.5734 379.3983 499.3296
## 2006       444.3924 401.4698 487.3150 378.7480 510.0368
## 2007       449.4209 403.0811 495.7606 378.5503 520.2914
## 2008       454.4493 404.9269 503.9717 378.7113 530.1873
## 2009       459.4777 406.9647 511.9908 379.1659 539.7895
## 2010       464.5062 409.1633 519.8491 379.8665 549.1458

plot(fit3, type="o", ylab="Livestock, sheep in Asia (millions)", 
  flwd=1, plot.conf=FALSE)
lines(window(livestock,start=2001),type="o")
lines(fit1$mean,col=2)
lines(fit2$mean,col=3)
lines(fit4$mean,col=5)
lines(fit5$mean,col=6)
legend("topleft", lty=1, pch=1, col=1:6,
    c("Data","SES","Holt's","Exponential",
      "Additive Damped","Multiplicative Damped"))

The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Even more extreme are the forecasts generated by the exponential trend method which include exponential growth or decline.

Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons. Motivated by this observation, Gardner and McKenzie (1985) introduced a parameter that “dampens” the trend to a flat line some time in the future. Methods that include a damped trend have proven to be very successful and are arguably the most popular individual methods when forecasts are required automatically for many series.

Holt-Winters Seasonal Method

Holt (1957) and Winters (1960) extended Holt’s method to capture seasonality. We use \(m\) to denote the period of the seasonality, i.e., the number of seasons in a year. For example, for quarterly data \(m=4\), and for monthly data \(m=12\). The hw() function of the forecast package adds in the seasonality component of forecasting, which can also be multiplicative or additive in nature. This function utilizes the same \(α\) and \(\beta^*\) variables as the Holt Method and adds a final \(\gamma\) parameter for the season.

aust <- window(austourists,start=2005)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")

fit1

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2016 Q1       75.94826 73.65908 78.23744 72.44726 79.44925
## 2016 Q2       51.40294 49.04240 53.76347 47.79281 55.01306
## 2016 Q3       63.78886 61.35813 66.21959 60.07137 67.50635
## 2016 Q4       68.26542 65.76554 70.76531 64.44218 72.08866
## 2017 Q1       78.75684 75.77117 81.74252 74.19065 83.32304
## 2017 Q2       54.21152 51.16776 57.25528 49.55649 58.86655
## 2017 Q3       66.59744 63.49592 69.69896 61.85408 71.34081
## 2017 Q4       71.07401 67.91504 74.23297 66.24278 75.90523

plot(fit2,ylab="International visitor night in Australia (millions)",
     plot.conf=FALSE, type="o", fcol="white", xlab="Year")
lines(fitted(fit1), col="red", lty=2)
lines(fitted(fit2), col="green", lty=2)
lines(fit1$mean, type="o", col="red")
lines(fit2$mean, type="o", col="green")
legend("topleft",lty=1, pch=1, col=1:3, 
  c("data","Holt Winters' Additive","Holt Winters' Multiplicative"))

ETS Modeling

Assigning roles to each of the time series parameters can be quite subjective. The ets() function from the forecast package objectifies this process by estimating the \(α\), \(\beta^*\), and \(\gamma\) parameters that best fit the data. We now employ the ets() function to forecast tourists visitor nights in Australia by international arrivals over the period 2011–2012. We let the ets() function select the model; in this case, the function chooses a multiplicative error, additive trend, and multiplicative seasonal component to forecast future tourist visitor night numbers.

vndata <- window(austourists, start=2005)
fit <- ets(vndata)
summary(fit)

## ETS(M,A,M) 
## 
## Call:
##  ets(y = vndata) 
## 
##   Smoothing parameters:
##     alpha = 0.47 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 32.3107 
##     b = 0.6903 
##     s=1.0237 0.9623 0.7671 1.2469
## 
##   sigma:  0.0328
## 
##      AIC     AICc      BIC 
## 220.0729 225.3670 236.1306 
## 
## Training set error measures:
##                      ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 0.04344974 1.565227 1.255809 -0.03986347 2.710219 0.4115083
##                    ACF1
## Training set -0.1101862

plot(forecast(fit,h=8),
  ylab="International visitor night in Australia (millions)")

Exercises

Data set books contains the daily sales of paperback and hardcover books at the same store. Let ses() select the optimal value of \(α\). Use this value to generate forecasts for the next four days.
Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
For this exercise, use the monthly Australian short-term overseas visitors data, May 1985–April 2005. (Data set: visitors.) Forecast the next two years using Holt-Winters’ multiplicative method.

Additional Forecasting Techniques

For additional/more advanced forecasting techniques, see below topics:

ARIMA
Dynamic Regression Models
Vector Autoregressions
Neural Network Models
Forecasting Hierarchical or Grouped Time Series

Introductory Time Series Forecasting

James Okamoto

August 18, 2017

Overview

Replication Requirements

Moving Averages

Trailing Moving Averages

Moving Averages of Moving Averages

Weighted Moving Averages

Exercises

Time Series Decomposition

Exercises

Exponential Smoothing

Simple Exponential Smoothing (SES)

Holt’s Linear Trend Method

Holt-Winters Seasonal Method

ETS Modeling

Exercises

Additional Forecasting Techniques

References