Forecasting models generally fall under one of three umbrellas:
Time series models are often used in analyzing characteristics of organization performance such as attrition, supply, and cost to assist leaders in decision making. This tutorial will provide a foundational introduction to moving averages and time series decomposition. These two foundations are critical to understanding a practical approach to using the exponential smoothing method of time series models.
This tutorial leverages a variety of data sets to illustrate unique time series features. The data sets are all provided by the fpp2, and zoo packages. Furthermore, these packages provide various functions for computing and visualizing basic time series components.
library(tidyverse) # data manipulation and visualization
library(lubridate) # easily work with dates and times
library(fpp2) # working with time series data
library(zoo) # working with time series data
Exponential smoothing (discussed later) derives its foundation from the theory of moving averages. A moving average of order \(m\) can be written as
\[ \begin{aligned} \hat{T}_{t} = \frac{1}{m} \sum_{j=-k}^k y_{t+j}, \end{aligned} \] where \(m=2k+1\). That is, the estimate of the trend-cycle at time \(t\) is obtained by averaging values of the time series within \(k\) periods of \(t\). Observations that are nearby in time are also likely to be close in value, and the average eliminates some of the randomness in the data, leaving a smooth trend-cycle component. We call this an \(m\)-MA meaning a moving average of order \(m\).
To compute moving averages, we can leverage the rollmean function from the zoo package. Here, we focus on the personal savings rate (psavert) variable in the economics data frame. Using mutate and rollmean, I compute the 13, 25, …, 121 month moving average values and add this data back to the data frame. Note that we need to explicitly add the fill argument to ensure that values that cannot be computed (due to lack of data) are marked as missing.
savings <- economics %>%
select(date, srate = psavert) %>%
mutate(srate_ma01 = rollmean(srate, k = 13, fill = NA),
srate_ma02 = rollmean(srate, k = 25, fill = NA),
srate_ma03 = rollmean(srate, k = 37, fill = NA),
srate_ma05 = rollmean(srate, k = 61, fill = NA),
srate_ma10 = rollmean(srate, k = 121, fill = NA))
savings
Now we can go ahead and plot these values and compare the actual data to the different moving average smoothers.
savings %>%
gather(metric, value, srate:srate_ma10) %>%
ggplot(aes(date, value, color = metric)) +
geom_line()
You may notice that as the number of points used for the average increases, the curve becomes smoother and smoother. Choosing a value for k is a balance between eliminating noise while still capturing the data’s true structure. For this set, the 10 year moving average (\(k = 121\)) eliminates most of the pattern and is probably too much smoothing, while a 1 year moving average (\(k = 13\)) offers little more than just looking at the data itself.
For purposes of forecasting, we study trailing moving averages, where the window of k periods is placed over the most recent available k values of the series. For example, if we have data up to time period t, we can predict the value for t+1 by averaging over k periods prior to t+1. If we want to use the 5 most recent time periods to predict for t+1 then our function looks like:
\[ \begin{aligned} \hat{y}_{t+1} = \dfrac{y_{t-4} + y_{t-3} + y_{t-2} + y_{t-1} + y_{t}}{5} \end{aligned} \]
So, if we wanted to predict the next month’s savings rate based on the previous year’s average, we can use rollmean with the align = "right" argument to compute a trailing moving average. We can see that if we wanted to predict what the savings rate would be for 2015-05-01 based on the the last 12 months, our prediction would be 5.6% (the 12-month average for 2015-04-01).
savings_tma <- economics %>%
select(date, srate = psavert) %>%
mutate(srate_tma = rollmean(srate, k = 12, fill = NA, align = "right"))
tail(savings_tma, 12)
We can visualize how the 12-month trailing moving average predicts future savings rates with the following plot. It’s easy to see that trailing moving averages have a delayed reaction to changes in patterns and trends.
savings_tma %>%
gather(metric, value, -date) %>%
ggplot(aes(date, value, color = metric)) +
geom_line()
The concept of simple moving averages can be extended to taking moving averages of moving averages. This technique is often employed with an even number of data points so that the final product is symmetric around each point. For example, let’s look at the built-in data set elecsales provided by the fpp2 package. For our first example we convert to a data frame. This data frame is even numbered with 20 rows.
elecsales.df <- data.frame(year = time(elecsales), sales = elecsales)
nrow(elecsales.df)
## [1] 20
An even-numbered moving average is unbalanced, and for our purposes, the unbalancing will be in favor of more recent observations. For example, to calculate a 4-MA, the equation is as follows:
\[ \begin{aligned} \hat{y_t} = \dfrac{y_{t-1} + y_{t} + y_{t+1} + y_{t+2}}{4} \end{aligned} \]
To make the moving average symmetric (and therefore more accurate), we then take a 2-MA of the 4-MA to create a 2 x 4-MA. For the 2-MA step, we average the current and previous moving averages, thus resulting in an overall estimate of:
\[ \begin{aligned} \hat{y_t} = \dfrac{1}{8}y_{t-2} + \dfrac{1}{4}y_{t-1} + \dfrac{1}{4}y_{t} + \dfrac{1}{4}y_{t+1} + \dfrac{1}{8}y_{t+2} \end{aligned} \]
This two-step process can be performed easily with the ma function by setting order = 4 and centre = TRUE.
elecsales.df %>%
mutate(ma4 = ma(sales, order = 4, centre = TRUE)) %>%
head()
To compare this moving average to a regular moving average we can plot the two outputs:
elecsales.df %>%
mutate(ma2 = rollmean(sales, k = 2, fill = NA),
ma2x4 = ma(sales, order = 4, centre = TRUE)) %>%
gather(ma, value, ma2:ma2x4) %>%
ggplot(aes(x = year)) +
geom_point(aes(y = sales)) +
geom_line(aes(y = value, color = ma))
This 2 x 4-MA process produces the best fit yet. It massages out some of the noise while maintaining the overall trend of the data. Other combinations of moving averages are possible, such as 3 x 3-MA. To maintain symmetry, if your first moving average is an even number of points, the follow-up MA should also contain an even number. Likewise, if your first MA uses an odd number of points, the follow-up should use an odd number of points. Just keep in mind that moving averages of moving averages will lose information as you do not retain as many data points.
A moving average of a moving average can be thought of as a symmetric MA that has different weights on each nearby observation. For example, the 2x4-MA discussed above is equivalent to a weighted 5-MA with weights given by \([1/8,1/4,1/4,1/4,1/8]\). In general, a weighted m-MA can be written as
\[ \begin{aligned} \hat{T}_t = \sum^k_{j=-k} a_j y_{t+j} \end{aligned} \]
where \(k=(m-1)/2\) and the weights are given by \([a_{-k}, \dots, a_k]\). It is important that the weights all sum to one and that they are symmetric so that \(a_j = a_{-j}\). This simple m-MA is a special case where all the weights are equal to \(1 / m\). A major advantage of weighted moving averages is that they yield a smoother estimate of the trend-cycle. Instead of observations entering and leaving the calculation at full weight, their weights are slowly increased and then slowly decreased resulting in a smoother curve.
Using the economics data set provided by the ggplot2 package:
Up to this point, we have used data frames of date and numeric variables to product plots. Now, it is worth taking a moment to understand the different components of time series data - a prerequisite to any exponential smoothing forecasting. Information on converting data to time series formats can be found here.
We can describe the common time series components using this quarterly cement production data.
autoplot(fpp2::qcement) + ylab("Quarterly Cement Production (1956-2014)")
The trend is the long-term increase or decrease in the data. There is an increasing trend in the cement data.
The seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. The quarterly cement data above shows seasonality likely induced by the change in weather and its impact on being able to pour cement.
The cycle occurs when the data exhibit rises and falls that are not of a fixed period. These fluctuations are usually due to economic conditions and are often related to the “business cycle”. We can see a few cycles in our cement data in the early ’80s, ’90s, ’00s, and around 2008 - all these date ranges are around economic depressions that occurred.
The decompose() and forecast::stl() functions split the time series into clearly visible components. This decomposition can be either multiplicative or additive in nature.
tsData <- EuStockMarkets[, 1]
decomposedRes <- decompose(tsData, type="mult") # use type = "additive" for additive components
plot (decomposedRes)
books contains the daily sales of paperback and hardcover books at the same store. Plot the series and discuss the main features of the data.Exponential smoothing applies to time series data and uses weighted averages of past observations to predict future values, with weights decreasing exponentially as the observations get older. The more recent the observations, the higher the associated weight. There are a multitude of different methods of exponential smoothing depending on the data’s types of time series components.
The simplest of the exponentially smoothing methods is naturally called “simple exponential smoothing” (SES). This method is suitable for forecasting data with no trend or seasonal pattern. Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past — the smallest weights are associated with the oldest observations:
\[ \begin{aligned} \begin{equation}\hat{y}_{T+1|T} = \alpha y_T +\alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2}+ \cdots, \end{equation} \end{aligned} \]
where \(0≤α≤1\) is the smoothing parameter. The one-step-ahead forecast for time \(T+1\) is a weighted average of all the observations in the series \(y_1,\dots,y_T\). The rate at which the weights decrease is controlled by the parameter \(α\).
For any \(α\) between 0 and 1, the weights attached to the observations decrease exponentially as we go back in time, hence the name “exponential smoothing”. If \(α\) is small (i.e., close to 0), more weight is given to observations from the more distant past. If \(α\) is large (i.e., close to 1), more weight is given to the more recent observations.
Check out the oil production in Saudi Arabia from 1996 to 2007, shown below.
oildata <- window(oil,start=1996,end=2007)
plot(oildata, ylab="Oil (millions of tonnes)",xlab="Year")
This simple exponentional smoothing method can be applied with the ses() function of the forecast package. Set initial to either "optimal" or "simple" and h to the number of desired periods for forecasting. The output includes a point forecast for each period along with intervals for 80% and 95% confidences.
ses(oildata, alpha=0.2, initial="simple", h=3)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2008 481.4481 442.3387 520.5575 421.6354 541.2607
## 2009 481.4481 441.5641 521.3320 420.4509 542.4453
## 2010 481.4481 440.8044 522.0917 419.2889 543.6072
See the plot of these forecasts alongside the actual data.
fit1 <- ses(oildata, alpha=0.2, initial="simple", h=3)
fit2 <- ses(oildata, alpha=0.6, initial="simple", h=3)
fit3 <- ses(oildata, h=3)
plot(fit1, plot.conf=FALSE, ylab="Oil (millions of tonnes)",
xlab="Year", main="", fcol="white", type="o")
lines(fitted(fit1), col="blue", type="o")
lines(fitted(fit2), col="red", type="o")
lines(fitted(fit3), col="green", type="o")
lines(fit1$mean, col="blue", type="o")
lines(fit2$mean, col="red", type="o")
lines(fit3$mean, col="green", type="o")
legend("topleft",lty=1, col=c(1,"blue","red","green"),
c("data", expression(alpha == 0.2), expression(alpha == 0.6),
expression(alpha == 0.89)),pch=1)
In this example, simple exponential smoothing is applied to forecast oil production in Saudi Arabia. The black line above is a plot of the data over the period 1996–2007, which shows a changing level over time but no obvious trending behavior.
The influence of \(α\) on the smoothing process is clearly visible. The larger the \(α\) the greater the adjustment that takes place in the next forecast in the direction of the previous data point; smaller \(α\) leads to less adjustment.
Adding a trend to the forecast requires the holt() function of the forecast package and adds a \(\beta^*\) variable to represent the trend level of the forecast. \(\beta^*\) is the smoothing parameter for the trend for values \(0≤\beta^*≤1\). As shown, the forecast is no longer flat.
livestock2 <- window(livestock,start=1970,end=2000)
fit1 <- ses(livestock2)
fit2 <- holt(livestock2)
fit3 <- holt(livestock2,exponential=TRUE)
fit4 <- holt(livestock2,damped=TRUE)
fit5 <- holt(livestock2,exponential=TRUE,damped=TRUE)
fit2
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2001 419.2502 401.4138 437.0866 391.9718 446.5286
## 2002 424.2786 399.3216 449.2357 386.1101 462.4472
## 2003 429.3071 398.8500 459.7642 382.7270 475.8872
## 2004 434.3355 399.2289 469.4422 380.6446 488.0265
## 2005 439.3640 400.1545 478.5734 379.3983 499.3296
## 2006 444.3924 401.4698 487.3150 378.7480 510.0368
## 2007 449.4209 403.0811 495.7606 378.5503 520.2914
## 2008 454.4493 404.9269 503.9717 378.7113 530.1873
## 2009 459.4777 406.9647 511.9908 379.1659 539.7895
## 2010 464.5062 409.1633 519.8491 379.8665 549.1458
plot(fit3, type="o", ylab="Livestock, sheep in Asia (millions)",
flwd=1, plot.conf=FALSE)
lines(window(livestock,start=2001),type="o")
lines(fit1$mean,col=2)
lines(fit2$mean,col=3)
lines(fit4$mean,col=5)
lines(fit5$mean,col=6)
legend("topleft", lty=1, pch=1, col=1:6,
c("Data","SES","Holt's","Exponential",
"Additive Damped","Multiplicative Damped"))
The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Even more extreme are the forecasts generated by the exponential trend method which include exponential growth or decline.
Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons. Motivated by this observation, Gardner and McKenzie (1985) introduced a parameter that “dampens” the trend to a flat line some time in the future. Methods that include a damped trend have proven to be very successful and are arguably the most popular individual methods when forecasts are required automatically for many series.
Holt (1957) and Winters (1960) extended Holt’s method to capture seasonality. We use \(m\) to denote the period of the seasonality, i.e., the number of seasons in a year. For example, for quarterly data \(m=4\), and for monthly data \(m=12\). The hw() function of the forecast package adds in the seasonality component of forecasting, which can also be multiplicative or additive in nature. This function utilizes the same \(α\) and \(\beta^*\) variables as the Holt Method and adds a final \(\gamma\) parameter for the season.
aust <- window(austourists,start=2005)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
fit1
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 Q1 75.94826 73.65908 78.23744 72.44726 79.44925
## 2016 Q2 51.40294 49.04240 53.76347 47.79281 55.01306
## 2016 Q3 63.78886 61.35813 66.21959 60.07137 67.50635
## 2016 Q4 68.26542 65.76554 70.76531 64.44218 72.08866
## 2017 Q1 78.75684 75.77117 81.74252 74.19065 83.32304
## 2017 Q2 54.21152 51.16776 57.25528 49.55649 58.86655
## 2017 Q3 66.59744 63.49592 69.69896 61.85408 71.34081
## 2017 Q4 71.07401 67.91504 74.23297 66.24278 75.90523
plot(fit2,ylab="International visitor night in Australia (millions)",
plot.conf=FALSE, type="o", fcol="white", xlab="Year")
lines(fitted(fit1), col="red", lty=2)
lines(fitted(fit2), col="green", lty=2)
lines(fit1$mean, type="o", col="red")
lines(fit2$mean, type="o", col="green")
legend("topleft",lty=1, pch=1, col=1:3,
c("data","Holt Winters' Additive","Holt Winters' Multiplicative"))
Assigning roles to each of the time series parameters can be quite subjective. The ets() function from the forecast package objectifies this process by estimating the \(α\), \(\beta^*\), and \(\gamma\) parameters that best fit the data. We now employ the ets() function to forecast tourists visitor nights in Australia by international arrivals over the period 2011–2012. We let the ets() function select the model; in this case, the function chooses a multiplicative error, additive trend, and multiplicative seasonal component to forecast future tourist visitor night numbers.
vndata <- window(austourists, start=2005)
fit <- ets(vndata)
summary(fit)
## ETS(M,A,M)
##
## Call:
## ets(y = vndata)
##
## Smoothing parameters:
## alpha = 0.47
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 32.3107
## b = 0.6903
## s=1.0237 0.9623 0.7671 1.2469
##
## sigma: 0.0328
##
## AIC AICc BIC
## 220.0729 225.3670 236.1306
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04344974 1.565227 1.255809 -0.03986347 2.710219 0.4115083
## ACF1
## Training set -0.1101862
plot(forecast(fit,h=8),
ylab="International visitor night in Australia (millions)")
books contains the daily sales of paperback and hardcover books at the same store. Let ses() select the optimal value of \(α\). Use this value to generate forecasts for the next four days.paperback and hardback series and compute four-day forecasts in each case.For additional/more advanced forecasting techniques, see below topics: