September 17, 2019

Agenda

  1. Overview
  2. Simple methods
  3. Trend methods
  4. Seasonal methods
  5. Taxonomy of Exponential Smoothing Methods
  6. Innovations State Space Models
  7. Estimation and Model Selection
  8. Forecasting with ETS models

Overview

Overview

What’s exponential smoothing?

  • When making predictions, weight recent observations more than older observations
  • Specifically, decrease weights geometrically / exponentially when moving backward
  • Window function (a.k.a. apodization function or tapering function)
    • zero-valued outside of some chosen interval
    • normally symmetric around middle of interval
    • typically near maximum in the middle and tapering away from there
  • Easily learned and applied (e.g. adopted by the signal-processing community in the 1940s)

Simple Exponential Smoothing

Simple Exponential Smoothing

SES Methods

Time series \(y_1,y_2,\dots,y_T\).

Random walk forecast

\[ \begin{aligned} \hat{y}_{T+h|T} = y_T \end{aligned} \]

Average forecast

\[ \begin{aligned} \hat{y}_{T+h|T} = \frac1T\sum_{t=1}^T y_t \end{aligned} \]

Consider a happy medium that privileges recent information…

Simple Exponential Smoothing

SES Methods

Forecast equation \[ \begin{aligned} \hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2}+ \cdots \end{aligned} \] where \(0 \le \alpha \le1\)

Weighted moving average with weights that decrease exponentially

  • for \(y_{T}\) and \(\alpha = .2\), the weighting is \(.2\)
  • for \(y_{T-1}\), the weighting is \(.2 * (1 - .2) = .16\)
  • for \(y_{T-2}\), \(.2 * (1 - .2)^2 = .128\)
  • etc. (the first few weights are computed below)
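
A quick sketch of these weights in R, computed directly from \(\alpha(1-\alpha)^j\):

alpha <- 0.2
j <- 0:4
round(alpha * (1 - alpha)^j, 4)     # 0.2000 0.1600 0.1280 0.1024 0.0819
sum(alpha * (1 - alpha)^(0:1000))   # the weights sum to (essentially) 1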

Simple Exponential Smoothing

Component Form

Component form: forecast equation \[ \begin{align*} \hat{y}_{t+h|t} = \ell_{t} \end{align*} \]

Component form: smoothing equation \[ \begin{align*} \ell_{t} = \alpha y_{t} + (1 - \alpha)\ell_{t-1} \end{align*} \]

\(\ell_t\) is level / smoothed value of series at time \(t\)

Substituting gives \(\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1}\)
Iterate to recover the exponentially weighted moving average form
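
A minimal sketch of this recursion in plain R, for illustration only (not the ses() implementation in the forecast package):

ses_by_hand <- function(y, alpha, l0) {
  level <- l0
  fitted <- numeric(length(y))
  for (t in seq_along(y)) {
    fitted[t] <- level                            # one-step forecast y-hat_{t|t-1}
    level <- alpha * y[t] + (1 - alpha) * level   # smoothing equation
  }
  list(fitted = fitted, forecast = level)         # every h-step forecast equals the last level
}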

Simple Exponential Smoothing

Weighted Average Form

Weighted average form \[ \begin{align*} \hat{y}_{T+1|T}=\sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j}+(1-\alpha)^T \ell_{0} \end{align*} \]

Simple Exponential Smoothing

Optimization

To choose values for \(\alpha\) and \(\ell_0\), minimize the SSE: \[ \begin{aligned} \text{SSE}=\sum_{t=1}^T(y_t - \hat{y}_{t|t-1})^2. \end{aligned} \]

Unlike regression, there is no closed-form solution, so numerical optimization is required
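
A minimal sketch of that optimization, using optim() to search over \(\alpha\) and \(\ell_0\); the oil series comes from the fpp2 package used in the example that follows:

library(fpp2)                                  # provides the oil series and ses()
y <- as.numeric(window(oil, start = 1996))

ses_sse <- function(par, y) {                  # par = c(alpha, l0)
  alpha <- par[1]; level <- par[2]
  sse <- 0
  for (t in seq_along(y)) {
    sse <- sse + (y[t] - level)^2              # one-step-ahead error
    level <- alpha * y[t] + (1 - alpha) * level
  }
  sse
}

opt <- optim(c(0.5, y[1]), ses_sse, y = y,
             method = "L-BFGS-B", lower = c(1e-4, -Inf), upper = c(1, Inf))
opt$par                                        # compare with the estimates from ses(y)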

Simple Exponential Smoothing

Example: Oil production

library(fpp2)   # loads the forecast and ggplot2 packages plus the oil series
oildata <- window(oil, start=1996)
fc <- ses(oildata, h=5)
autoplot(fc) +
  autolayer(fitted(fc), series="Fitted") +
  ylab("Oil (millions of tonnes)") + xlab("Year")

Simple Exponential Smoothing

Example: Oil production

Trend Methods

Trend Methods

Holt’s Linear Trend

Component form: forecast \[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} \end{align*} \]

Component form: level \[ \begin{align*} \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \end{align*} \]

Component form: trend \[ \begin{align*} b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1} \end{align*} \]

Trend Methods

Holt’s Linear Trend (cont’d)

  • Two smoothing parameters \(\alpha\) and \(\beta^*\) \((0\le\alpha,\beta^*\le1)\)
  • \(\ell_t\) level: weighted average between \(y_t\) and one-step ahead forecast for time \(t\), \((\ell_{t-1} + b_{t-1} = \hat{y}_{t|t-1})\)
  • \(b_t\) slope: weighted average of \((\ell_{t} - \ell_{t-1})\) and \(b_{t-1}\), current and previous estimate of slope
  • Select \(\alpha, \beta^*, \ell_0, b_0\) to minimize SSE (see the sketch below)
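
A minimal sketch of Holt's recursions in plain R, to make the update order explicit (illustration only, not the holt() internals):

holt_by_hand <- function(y, alpha, beta, l0, b0, h = 5) {
  level <- l0; trend <- b0
  for (t in seq_along(y)) {
    prev_level <- level
    level <- alpha * y[t] + (1 - alpha) * (level + trend)       # level equation
    trend <- beta * (level - prev_level) + (1 - beta) * trend   # trend equation
  }
  level + (1:h) * trend                                         # y-hat_{T+h|T} = l_T + h * b_T
}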

Trend Methods

Example: Holt’s Method

window(ausair, start=1990, end=2004) %>%
  holt(h=5, PI=FALSE) %>%
  autoplot()

Trend Methods

Damped Trend Method

Component form \[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)\phi b_{t-1} \end{align*} \]

  • Damping parameter: \(0<\phi<1\)
  • If \(\phi=1\), identical to Holt’s linear trend
  • As \(h\rightarrow\infty\), \(\hat{y}_{T+h|T}\rightarrow \ell_T+\phi b_T/(1-\phi)\)
  • Short-run forecasts are trended; long-run forecasts are constant (illustrated numerically below)
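
A quick numerical check of the damping: the multiplier \(\phi+\phi^2+\dots+\phi^h\) levels off at \(\phi/(1-\phi)\), so the trend contribution flattens out:

phi <- 0.9
h <- 1:50
multiplier <- cumsum(phi^h)   # phi + phi^2 + ... + phi^h
tail(multiplier, 1)           # about 8.95, approaching phi / (1 - phi) = 9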

Seasonal Methods

Seasonal Methods

Holt-Winters Additive Method

Holt’s method extended to capture seasonality

Component form \[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m} \end{align*} \]

  • \(k\) is the integer part of \((h-1)/m\), which ensures the seasonal estimates used for forecasting come from the final year of the sample
  • Parameters: \(0\le \alpha\le 1\), \(0\le \beta^*\le 1\), \(0\le \gamma\le 1-\alpha\) and \(m\) for period of seasonality (e.g. \(m=4\) for quarterly data)

Seasonal Methods

Holt-Winters Additive Method

  • Seasonal component usually expressed as \(s_{t} = \gamma^* (y_{t}-\ell_{t}) + (1-\gamma^*)s_{t-m}\)
  • Substitute for \(\ell_t\): \(s_{t} = \gamma^* (1-\alpha) (y_{t}-\ell_{t-1}-b_{t-1}) + [1-\gamma^*(1-\alpha)]s_{t-m}\)
  • Set \(\gamma=\gamma^*(1-\alpha)\)
  • Usual parameter restriction is \(0\le\gamma^*\le1\) or \(0\le\gamma\le(1-\alpha)\)

Seasonal Methods

Holt-Winters Multiplicative Method

When seasonal variations change in proportion to the level of the series

Component form

\[ \begin{align*} \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t})s_{t+h-m(k+1)}\\ \ell_{t} &= \alpha \frac{y_{t}}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t}-\ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + b_{t-1})} + (1 - \gamma)s_{t-m} \end{align*} \]

  • For additive method, \(s_t\) is in absolute terms: within each year \(\sum_i s_i \approx 0\)
  • For multiplicative method, \(s_t\) is in relative terms: within each year \(\sum_i s_i \approx m\)

Seasonal Methods

Example: Visitor Nights

aust <- window(austourists, start=2005)
fit1 <- hw(aust, seasonal="additive")
fit2 <- hw(aust, seasonal="multiplicative")
tmp <- cbind(Data=aust,
  "HW additive forecasts" = fit1[["mean"]],
  "HW multiplicative forecasts" = fit2[["mean"]])

autoplot(tmp) + xlab("Year") +
  ylab("International visitor night in Australia (millions)") +
  scale_color_manual(name="",
    values=c('#000000','#1b9e77','#d95f02'),
    breaks=c("Data","HW additive forecasts","HW multiplicative forecasts"))

Seasonal Methods

Example: Visitor Nights

Seasonal Methods

Example: Visitor nights estimated components

addstates <- fit1$model$states[,1:3]
multstates <- fit2$model$states[,1:3]
colnames(addstates) <- colnames(multstates) <-
  c("level","slope","season")
p1 <- autoplot(addstates, facets=TRUE) + xlab("Year") +
  ylab("") + ggtitle("Additive states")
p2 <- autoplot(multstates, facets=TRUE) + xlab("Year") +
  ylab("") + ggtitle("Multiplicative states")
gridExtra::grid.arrange(p1,p2,ncol=2)

Seasonal Methods

Example: Visitor nights estimated components

Seasonal Methods

Holt-Winters Damped Method

\[ \begin{align*} \hat{y}_{t+h|t} &= [\ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t}]s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} / s_{t-m}) + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + \phi b_{t-1})} + (1 - \gamma)s_{t-m} \end{align*} \]

Often the single most accurate forecasting method for seasonal data
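
A short usage sketch with hw(), which accepts both the damped and seasonal arguments (austourists comes from the fpp2 package, as in the visitor-nights example above):

library(fpp2)
aust <- window(austourists, start = 2005)
fc <- hw(aust, damped = TRUE, seasonal = "multiplicative", h = 8)
autoplot(fc)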

Seasonal Methods

Example: Damped trend

air <- window(ausair, start = 1990, end = 2004)
fit3 <- holt(air, h = 15)
fit4 <- holt(air, damped = TRUE, phi = .9, h = 15)
autoplot(air) +
  autolayer(fit3, series = 'Holt\'s method', PI = FALSE) +
  autolayer(fit4, series = 'Damped Holt\'s method', PI = FALSE) +
  ggtitle('Forecasts from Holt\'s method') +
  xlab('Year') +
  ylab('Air passengers in Australia (millions)') +
  guides(color = guide_legend(title = 'Forecast'))

Seasonal Methods

Example: Damped trend

Taxonomy of Exponential Smoothing Methods

Taxonomy

Exponential Smoothing Methods

                             SEASONAL COMPONENT
TREND COMPONENT              None (N)        Additive (A)     Multiplicative (M)
None (N)                     (N, N)          (N, A)           (N, M)
Additive (A)                 (A, N)          (A, A)           (A, M)
Additive Damped (A\(_d\))    (A\(_d\), N)    (A\(_d\), A)     (A\(_d\), M)

Taxonomy

Variations Explained

Shorthand representation of the methods:

  • (N, N): Simple exponential smoothing
  • (A, N): Holt’s linear method
  • (A\(_d\), N): Additive damped trend method
  • (A, A): Additive Holt-Winters’ method
  • (A, M): Multiplicative Holt-Winters’ method
  • (A\(_d\), M): Holt-Winters’ damped method

Taxonomy

R Functions

Simple Exponential Smoothing: No Trend
> ses(y)

Holt’s Method: Linear Trend
> holt(y)

Damped Trend Method
> holt(y, damped=TRUE)

Holt-Winters methods
> hw(y, damped=TRUE, seasonal="additive")
> hw(y, damped=FALSE, seasonal="additive")
> hw(y, damped=TRUE, seasonal="multiplicative")
> hw(y, damped=FALSE, seasonal="multiplicative")

Innovations State Space Models

Innovations State Space Models

Exponential Smoothing Models vs. Methods

  1. Methods: Apply algorithms to obtain point forecasts
  2. Models: Apply stochastic generating process to forecast an entire distribution.
    • Generate point forecasts and forecast intervals
    • Beneficial for model selection

Innovations State Space Models

Overview

  1. Each model contains two equations:
    • Observation equation for observed data components.
    • State equation for unobserved changes in level, trend, and seasonal components.
  2. Each method corresponds to two models, distinguished by additive or multiplicative errors.

  3. The Error, Trend, Seasonal (ETS) labels differentiate models by error, trend, and seasonal type.

Innovations State Space Models

ETS Structure

18 variations of ETS models exist and are classified as follows:

  1. Error: \(\{A, M\}\)
    • Additive Errors
    • Multiplicative Errors
  2. Trend: \(\{N, A, A_d\}\)

  3. Seasonal: \(\{N, A, M\}\)

Innovations State Space Models

ETS Model Classification Examples

  • \((A, N, N)\): Simple exponential smoothing; additive errors
  • \((M, N, N)\): Simple exponential smoothing; multiplicative errors
  • \((A, A, N)\): Holt’s linear method; additive errors
  • \((M, A, N)\): Holt’s linear method; multiplicative errors
  • \((M, A, M)\): Multiplicative Holt-Winters’ method; multiplicative errors (see the sketch below)
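
These triplets map directly onto the model string accepted by ets() in the forecast package; a minimal sketch, reusing the data sets from earlier examples:

library(fpp2)
oildata <- window(oil, start = 1996)
aust <- window(austourists, start = 2005)
ets(oildata, model = "ANN")   # ETS(A,N,N): SES with additive errors
ets(aust, model = "MAM")      # ETS(M,A,M): multiplicative Holt-Winters, multiplicative errors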

Estimation and Model Selection

Estimation and Model Selection

Estimating ETS models

  • Smoothing parameters \(\alpha\), \(\beta\), \(\gamma\) and \(\phi\), and the initial states \(\ell_0\), \(b_0\), \(s_0,s_{-1},\dots,s_{-m+1}\) are estimated by maximizing the “likelihood”, i.e. the probability of the data arising from the specified model.
  • Constraints are placed on the parameters so that the smoothing equations remain weighted averages and to prevent numerical difficulties in estimation.
  • We will estimate models with the ets() function in the forecast package (see the sketch below).
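
A brief sketch of the constraint options, assuming the bounds argument of forecast::ets:

library(fpp2)
aust <- window(austourists, start = 2005)
fit_usual <- ets(aust, bounds = "usual")        # traditional 0-1 style parameter bounds
fit_admis <- ets(aust, bounds = "admissible")   # admissible (forecastable) parameter region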

Estimation and Model Selection

Selecting ETS models

Akaike’s Information Criterion (AIC):

\[ \text{AIC} = -2\log(\text{L}) + 2k \]

Corrected Akaike’s Information Criterion (AIC\(_c\)):

\[ \text{AIC}_{\text{c}} = \text{AIC} + \frac{2(k+1)(k+2)}{T-k} \]

Schwarz’s Bayesian Information Criterion (BIC):

\[ \text{BIC} = \text{AIC} + k(\log(T)-2) \]
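
A minimal sketch of comparing candidate models on these criteria; the fitted ets object stores them as its aic, aicc, and bic components:

library(fpp2)
aust <- window(austourists, start = 2005)
fit_auto <- ets(aust)                  # automatic selection (minimizes AICc by default)
fit_aaa  <- ets(aust, model = "AAA")   # force additive error, trend, and seasonality
data.frame(model = c("auto", "AAA"),
           AIC  = c(fit_auto$aic,  fit_aaa$aic),
           AICc = c(fit_auto$aicc, fit_aaa$aicc),
           BIC  = c(fit_auto$bic,  fit_aaa$bic))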

Estimation and Model Selection

Automatic Forecasting: Tourist Example

Automatic forecasting using the ets() function:

  • Automatically selects a model when arguments are left at their defaults.

  • Produces forecasts from the selected model.

  • Obtains forecast intervals from the underlying state space model.

Tourist example from text:

aust <- window(austourists, start=2005) 
fit <- ets(aust) 
summary(fit)

Estimation and Model Selection

Tourist Example: Summary

## ETS(M,A,M) 
## 
## Call:
##  ets(y = aust) 
## 
##   Smoothing parameters:
##     alpha = 0.1908 
##     beta  = 0.0392 
##     gamma = 2e-04 
## 
##   Initial states:
##     l = 32.3679 
##     b = 0.9281 
##     s = 1.0218 0.9628 0.7683 1.2471
## 
##   sigma:  0.0383
## 
##      AIC     AICc      BIC 
## 224.8628 230.1569 240.9205 
## 
## Training set error measures:
##                      ME     RMSE     MAE        MPE     MAPE     MASE
## Training set 0.04836907 1.670893 1.24954 -0.1845609 2.692849 0.409454
##                   ACF1
## Training set 0.2005962

Estimation and Model Selection

Tourist Example: Summary (Continued)

Model evaluation:

##           AIC     AICc      BIC
## [1,] 224.8628 230.1569 240.9205

Training set error measures:

##                      ME     RMSE     MAE        MPE     MAPE     MASE
## Training set 0.04836907 1.670893 1.24954 -0.1845609 2.692849 0.409454
##                   ACF1
## Training set 0.2005962

Estimation and Model Selection

Tourist Example: autoplot(fit)

Estimation and Model Selection

Tourist Example: Residuals Plot

cbind('Residuals' = residuals(fit), 
      'Forecast errors' = residuals(fit, type='response')) %>% 
  autoplot(facet=TRUE) + xlab("Year") + ylab("")

Estimation and Model Selection

Tourist Example: Residuals Plot

Forecasting with ETS models

Forecasting with ETS models

Prediction Intervals

Models only! Prediction intervals cannot be generated from the methods alone.

  • The prediction intervals differ between models with additive and multiplicative errors.
  • A more general approach is to simulate future sample paths, conditional on the last estimate of the states, and obtain prediction intervals from the percentiles of the simulated paths.
  • Options are available in R through the forecast() function in the forecast package (see the sketch below).
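
A short sketch of those options, assuming the simulate, bootstrap, and npaths arguments of forecast() applied to an ets fit:

library(fpp2)
aust <- window(austourists, start = 2005)
fit <- ets(aust)
fc_default   <- forecast(fit, h = 8)                                  # analytic intervals
fc_simulated <- forecast(fit, h = 8, simulate = TRUE, npaths = 5000)  # simulated sample paths
fc_bootstrap <- forecast(fit, h = 8, bootstrap = TRUE)                # resample past errors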

Forecasting with ETS models

Tourist Example: Forecast

fit %>% forecast(h=8) %>%
  autoplot() +
  ylab("International visitor night in Australia (millions)")

Forecasting with ETS models

Tourist Example: Forecast

Conclusion

Sources