Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older.
In other words, the more recent the observation the higher the associated weight.
This framework generates reliable forecasts quickly and for a wide spectrum of time series, which is a great advantage and of major importance for applications in industry.
Simple exponential smoothing, the simplest method in this framework, is suitable for forecasting data with no trend or seasonal pattern, although the mean of the data may be changing slowly over time.
The naïve method assumes that the most current observation is the only important one and all previous observations provide no information for the future (\(\hat{y}_{T+h|T} = y_T,\) for \(h=1,2,\dots\)). This can be thought of as a weighted average where all the weight is given to the last observation.
The average method assumes that all observations are of equal importance and they are given equal weight when generating forecasts (\(\hat{y}_{T+h|T} = \frac1T \sum_{t=1}^T y_t,\) for \(h=1,2,\dots\)).
library(fpp)   # provides the oil data set and the forecasting functions used below
oildata <- window(oil, start=1996, end=2007)
plot(oildata, ylab="Oil (millions of tonnes)", xlab="Year")
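To see the two extremes in practice, benchmark forecasts can be produced directly; the short sketch below is purely illustrative (not part of the original example) and uses the naive() and meanf() functions from the forecast package on the oil data loaded above.
# Illustrative sketch: the two extreme weighting schemes as benchmark forecasts
fc_naive <- naive(oildata, h=3)   # all weight on the last observation
fc_mean  <- meanf(oildata, h=3)   # equal weight on every observation
fc_naive$mean
fc_mean$mean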
We often want something between these two extremes. For example it may be sensible to attach larger weights to more recent observations than to observations from the distant past. This is exactly the concept behind simple exponential smoothing. Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past — the smallest weights are associated with the oldest observations:
\[\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2}+ \cdots,\]
where \(0 \le \alpha \le 1\) is the smoothing parameter. The one-step-ahead forecast for time \(T+1\) is a weighted average of all the observations in the series \(y_1,\dots,y_T\). The rate at which the weights decrease is controlled by the parameter \(\alpha\).
Note that, provided the series is not very short, the sum of the weights will be approximately equal to one even for a small value of \(\alpha\).
For any \(\alpha\) between 0 and 1, the weights attached to lagged observations decrease exponentially. If \(\alpha\) is small (i.e., close to 0), more weight is given to observations from the more distant past. If \(\alpha\) is large (i.e., close to 1), more weight is given to the more recent observations.
At the extreme case where \(\alpha=1\), \(\hat{y}_{T+1|T}=y_T\) and forecasts are equal to the naive forecasts.
The forecast at time \(t+1\) is equal to a weighted average between the most recent observation \(y_t\) and the most recent forecast \(\hat{y}_{t|t-1}\), \[\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1} \] for \(t=1,\dots,T\), where \(0 \le \alpha \le 1\) is the smoothing parameter.
The process has to start somewhere, so we let the first forecast of \(y_1\) be denoted by \(\ell_0\). Then
\[ \begin{align*} \hat{y}_{2|1} &= \alpha y_1 + (1-\alpha) \ell_0\\ \hat{y}_{3|2} &= \alpha y_2 + (1-\alpha) \hat{y}_{2|1}\\ &\;\;\vdots\\ \hat{y}_{T+1|T} &= \alpha y_T + (1-\alpha) \hat{y}_{T|T-1} \end{align*} \]
Then substituting each equation into the following equation, we obtain \[ \begin{align*} \hat{y}_{3|2} &= \alpha y_2 + (1-\alpha) \left[\alpha y_1 + (1-\alpha) \ell_0\right]\\ \nonumber &= \alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \ell_0\\ \hat{y}_{4|3} &= \alpha y_3 + (1-\alpha) [\alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \ell_0]\\ &= \alpha y_3 + \alpha(1-\alpha) y_2 + \alpha(1-\alpha)^2 y_1 + (1-\alpha)^3 \ell_0\\ &\vdots \end{align*} \] \[ \begin{align*} \hat{y}_{T+1|T} &= \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T \ell_{0}. \end{align*} \]
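As a check on this expansion, the weighted-average form can be computed directly; the sketch below is illustrative and assumes \(\ell_0=y_1\) (the "simple" initialisation used later), so exact agreement with ses() may depend on the package version.
# Illustrative sketch: the weighted-average form computed by hand,
# assuming l0 = y[1] (as with initial="simple")
alpha <- 0.2
T <- length(oildata)
w <- alpha * (1 - alpha)^(0:(T-1))       # weights on y_T, y_{T-1}, ..., y_1
sum(w) + (1 - alpha)^T                   # all weights (including that on l0) sum to 1
fc <- sum(w * rev(oildata)) + (1 - alpha)^T * oildata[1]
fc
ses(oildata, alpha=0.2, initial="simple", h=1)$mean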
For simple exponential smoothing the only component included is the level, \(\ell_t\). (Other methods considered later in this chapter may also include a trend \(b_t\) and seasonal component \(s_t\).)
Component form representations of exponential smoothing methods comprise a forecast equation and a smoothing equation for each of the components included in the method: \[ \begin{align*} \text{Forecast equation}&&\hat{y}_{t+1|t} &= \ell_{t}\\ \text{Smoothing equation}&&\ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1}, \end{align*} \] where \(\ell_{t}\) is the level (or the smoothed value) of the series at time \(t\).
The forecast equation shows that the forecasted value at time \(t+1\) is the estimated level at time \(t\). The smoothing equation for the level (usually referred to as the level equation) gives the estimated level of the series at each period \(t\).
Applying the forecast equation for time \(T\) gives, \(\hat{y}_{T+1|T} = \ell_{T}\), the most recent estimated level.
If we replace \(\ell_{t}\) by \(\hat{y}_{t+1|t}\) and \(\ell_{t-1}\) by \(\hat{y}_{t|t-1}\) in the smoothing equation, we will recover the weighted average form of simple exponential smoothing.
The third form of simple exponential smoothing is obtained by re-arranging the level equation in the component form to get what we refer to as the error correction form \[ \begin{align*} \ell_{t} &= \ell_{t-1}+\alpha( y_{t}-\ell_{t-1})\\ &= \ell_{t-1}+\alpha e_{t} \end{align*} \] where \(e_{t}=y_{t}-\ell_{t-1}=y_{t}-\hat{y}_{t|t-1}\) for \(t=1,\dots,T\).
That is, \(e_{t}\) is the one-step within-sample forecast error at time \(t\).
The within-sample forecast errors lead to the adjustment/correction of the estimated level throughout the smoothing process for \(t=1,\dots,T\).
For example, if the error at time \(t\) is negative, then \(\hat{y}_{t|t-1} > y_t\) and so the level at time \(t-1\) has been over-estimated. The new level \(\ell_t\) is then the previous level \(\ell_{t-1}\) adjusted downwards. The closer \(\alpha\) is to one the rougher the estimate of the level (large adjustments take place). The smaller the \(\alpha\) the “smoother” the level (small adjustments take place).
Simple exponential smoothing has a flat forecast function, and therefore for longer forecast horizons, \[\hat{y}_{T+h|T}=\hat{y}_{T+1|T}=\ell_T, \qquad h=2,3,\dots.\] Remember these forecasts will only be suitable if the time series has no trend or seasonal component.
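The component form translates into a short loop; the following sketch is illustrative and assumes \(\ell_0=y_1\): the level is updated through the sample and the forecasts beyond time \(T\) are all equal to the final level.
# Illustrative sketch of the component-form recursion with l0 = y[1] assumed
alpha <- 0.2
level <- oildata[1]                       # assumed initial level l0
for (t in seq_along(oildata)) {
  level <- alpha * oildata[t] + (1 - alpha) * level
}
rep(level, 3)                             # flat forecasts for h = 1, 2, 3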
For simple exponential smoothing we need to specify an initial value for the level, \(\ell_0\). Hence \(\ell_0\) plays a role in all forecasts generated by the process. In general, the weight attached to \(\ell_0\) is small.
However, in the case that \(\alpha\) is small and/or the time series is relatively short, the weight may be large enough to have a noticeable effect on the resulting forecasts. Therefore, selecting suitable initial values can be quite important.
A common approach is to set \(\ell_0=y_1\) (recall that \(\ell_0=\hat{y}_{1|0}\)).
An alternative approach is to use optimization to estimate the value of \(\ell_0\) rather than set it to some value. Even if optimization is used, selecting appropriate initial values can assist the speed and precision of the optimization process.
For every exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter (\(\alpha\)).
There are cases where the smoothing parameters may be chosen subjectively: the forecaster specifies their values based on previous experience.
However, a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.
Similarly to a regression model, the unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the sum of squared errors (SSE).
The errors are specified as \(e_t=y_t - \hat{y}_{t|t-1}\) for \(t=1,\dots,T\) (the one-step-ahead within-sample forecast errors).
\[\text{SSE}=\sum_{t=1}^T(y_t - \hat{y}_{t|t-1})^2=\sum_{t=1}^Te_t^2.\]
Unlike the regression case, this is a non-linear minimization problem, and we need to use an optimization tool to solve it.
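The following sketch illustrates the idea (it is not the internal routine used by ses()): estimate \(\alpha\) and \(\ell_0\) by minimising the SSE with optim(), then compare the results with the values reported by ses(oildata).
# Illustrative sketch: minimise the sum of squared one-step errors over
# alpha and the initial level l0
sse <- function(par, y) {
  alpha <- par[1]; level <- par[2]
  e <- numeric(length(y))
  for (t in seq_along(y)) {
    e[t]  <- y[t] - level               # one-step within-sample forecast error
    level <- level + alpha * e[t]       # error-correction update of the level
  }
  sum(e^2)
}
opt <- optim(c(0.5, oildata[1]), sse, y=oildata, method="L-BFGS-B",
             lower=c(1e-4, -Inf), upper=c(0.9999, Inf))
opt$par   # compare with the alpha and initial level reported by ses(oildata)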
fit1 <- ses(oildata, alpha=0.2, initial="simple", h=3)
fit2 <- ses(oildata, alpha=0.6, initial="simple", h=3)
fit3 <- ses(oildata, h=3)
# PI=FALSE suppresses the prediction intervals (the argument was called
# plot.conf in older versions of the forecast package)
plot(fit1, PI=FALSE, ylab="Oil (millions of tonnes)",
     xlab="Year", main="", fcol="white", type="o")
lines(fitted(fit1), col="blue", type="o")
lines(fitted(fit2), col="red", type="o")
lines(fitted(fit3), col="green", type="o")
lines(fit1$mean, col="blue", type="o")
lines(fit2$mean, col="red", type="o")
lines(fit3$mean, col="green", type="o")
legend("topleft",lty=1, col=c(1,"blue","red","green"),
c("data", expression(alpha == 0.2), expression(alpha == 0.6),
expression(alpha == 0.89)),pch=1)
Holt extended simple exponential smoothing to allow forecasting of data with a trend. Holt's linear trend method involves a forecast equation and two smoothing equations (one for the level and one for the trend):
\[ \begin{align*} \text{Forecast equation}&& \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level equation}&& \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend equation}&& b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1} \end{align*} \]
where \(\ell_t\) denotes an estimate of the level of the series at time \(t\), \(b_t\) denotes an estimate of the trend (slope) of the series at time \(t\), \(\alpha\) is the smoothing parameter for the level, \(0\le\alpha\le1\), and \(\beta^*\) is the smoothing parameter for the trend, \(0\le\beta^*\le1\) (we denote this as \(\beta^*\) instead of \(\beta\) to distinguish it from a related parameter used in the state space models introduced later).
As with simple exponential smoothing, the level equation here shows that \(\ell_t\) is a weighted average of observation \(y_t\) and the within-sample one-step-ahead forecast for time \(t\), here given by \(\ell_{t-1} + b_{t-1}\). The trend equation shows that \(b_t\) is a weighted average of the estimated trend at time \(t\) based on \(\ell_{t} - \ell_{t-1}\) and \(b_{t-1}\), the previous estimate of the trend.
The forecast function is no longer flat but trending. The \(h\)-step-ahead forecast is equal to the last estimated level plus \(h\) times the last estimated trend value. Hence the forecasts are a linear function of \(h\).
The error correction form of the level and the trend equations show the adjustments in terms of the within-sample one-step forecast errors: \[ \begin{align*} \ell_{t} &= \ell_{t-1} + b_{t-1}+\alpha e_t\\ b_{t} &= b_{t-1} + \alpha \beta^*e_t \end{align*} \] where \(e_t=y_{t} -(\ell_{t-1} + b_{t-1})=y_{t}-\hat{y}_{t|t-1}\).
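As an illustration of how these recursions operate, the sketch below applies the level and trend equations in a loop, assuming the "simple" initial values \(\ell_0=y_1\) and \(b_0=y_2-y_1\); exact agreement with holt() may depend on how the package indexes the initial states.
# Illustrative sketch of Holt's recursions with simple initial values assumed
library(fpp)
y <- window(ausair, start=1990, end=2004)
alpha <- 0.8; beta <- 0.2
l <- y[1]; b <- y[2] - y[1]
for (t in seq_along(y)) {
  lnew <- alpha * y[t] + (1 - alpha) * (l + b)   # level equation
  b    <- beta * (lnew - l) + (1 - beta) * b     # trend equation
  l    <- lnew
}
l + (1:5) * b   # forecasts: last level plus h times last trend, h = 1,...,5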
# Run in R
library(fpp)
air <- window(ausair,start=1990,end=2004)
fit1 <- holt(air, alpha=0.8, beta=0.2, initial="simple", h=5)
fit2 <- holt(air,alpha=0.8,beta=0.2,initial="simple",exponential=TRUE,h=5)
# Results for first model:
fit1$model$state
fitted(fit1)
fit1$mean
# Damped Holt's method requires optimal initialization, so initial="simple"
# is not used here
fit3 <- holt(air, alpha=0.8, beta=0.2, damped=TRUE, h=5)
plot(fit2, type="o", ylab="Air passengers in Australia (millions)",
     xlab="Year", fcol="white", PI=FALSE)
lines(fitted(fit1), col="blue")
lines(fitted(fit2), col="red")
lines(fitted(fit3), col="green")
lines(fit1$mean, col="blue", type="o")
lines(fit2$mean, col="red", type="o")
lines(fit3$mean, col="green", type="o")
legend("topleft", lty=1, col=c("black","blue","red","green"),
c("Data","Holt's lin trend","Expon trend","Additive damped trend"))
A variation on Holt's linear trend method, the exponential trend method, is obtained by allowing the level and the slope to be multiplied rather than added:
\[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t} b_{t}^h\\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} b_{t-1})\\ b_{t} &= \beta^*\frac{\ell_{t}}{ \ell_{t-1}} + (1 -\beta^*)b_{t-1} \end{align*} \]
where \(b_t\) now represents an estimated growth rate (in relative terms rather than absolute) which is multiplied rather than added to the estimated level. The trend in the forecast function is now exponential rather than linear, so that the forecasts project a constant growth rate rather than a constant slope.
The error correction form is
\[ \begin{align*} \ell_{t} &= \ell_{t-1} b_{t-1}+\alpha e_{t}\\ b_{t} &= b_{t-1}+ \alpha \beta^*\frac{e_{t}}{\ell_{t-1}} \end{align*} \] where \(e_t=y_{t} -(\ell_{t-1}b_{t-1})=y_{t}-\hat{y}_{t|t-1}\).
The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Even more extreme are the forecasts generated by the exponential trend method which include exponential growth or decline.
Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons. Motivated by this, a damping parameter can be introduced so that the trend approaches a flat line some time in the future.
In conjunction with the smoothing parameters \(\alpha\) and \(\beta^*\) (with values between 0 and 1 as in Holt’s method), this method also includes a damping parameter \(0< \phi <1\):
\[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + (\phi+\phi^2 + \dots + \phi^{h}) b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)\phi b_{t-1}. \end{align*} \]
If \(\phi=1\) the method is identical to Holt’s linear method. For values between \(0\) and \(1\), \(\phi\) dampens the trend so that it approaches a constant some time in the future.
In fact the forecasts converge to \(\ell_T+\phi b_T/(1-\phi)\) as \(h\rightarrow\infty\) for any value \(0< \phi <1\). The effect of this is that short-run forecasts are trended while long-run forecasts are constant.
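A quick numeric check of this limit, using arbitrary illustrative values of \(\phi\) and \(b_T\):
# Illustrative check: the damped trend contribution (phi + ... + phi^h) * b
# converges to phi * b / (1 - phi) as h grows
phi <- 0.9; b <- 2
h <- 1:100
trend_contrib <- cumsum(phi^h) * b
tail(trend_contrib, 1)     # close to the limit
phi * b / (1 - phi)        # the limiting trend contribution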
Motivated by the improvements in forecasting performance seen in the additive damped trend case, this method introduces a damping parameter to the exponential trend method:
\[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t}b_{t}^{(\phi+\phi^2 + \dots + \phi^{h})} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1} b^\phi_{t-1}\\ b_{t} &= \beta^*\frac{\ell_{t}}{\ell_{t-1}} + (1 -\beta^*)b_{t-1}^{\phi}. \end{align*} \]
Compared with Holt's linear method, this method produces forecasts that are even more conservative than those of the additive damped trend method.
The error correction form of the additive damped trend smoothing equations is
\[ \begin{align*} \ell_{t} &= \ell_{t-1} + \phi b_{t-1} + \alpha e_{t} \\ b_{t} &= \phi b_{t-1}+ \alpha \beta^*e_{t}. \end{align*} \]
The error correction form of the multiplicative damped trend smoothing equations is
\[ \begin{align*} \ell_{t} &= \ell_{t-1} b^\phi_{t-1}+\alpha e_{t}\\ b_{t} &= b^\phi_{t-1}+ \alpha \beta^*\frac{e_{t}}{\ell_{t-1}}. \end{align*} \]
# Run in R
library(fpp)
livestock2 <- window(livestock,start=1970,end=2000)
fit1 <- ses(livestock2)
fit2 <- holt(livestock2)
fit3 <- holt(livestock2,exponential=TRUE)
fit4 <- holt(livestock2,damped=TRUE)
fit5 <- holt(livestock2,exponential=TRUE,damped=TRUE)
# Results for first model:
fit1$model
accuracy(fit1) # training set
accuracy(fit1,livestock) # test set
plot(fit2$model$state)
plot(fit4$model$state)
plot(fit3, type="o", ylab="Livestock, sheep in Asia (millions)",
flwd=1, plot.conf=FALSE)
lines(window(livestock,start=2001),type="o")
lines(fit1$mean,col=2)
lines(fit2$mean,col=3)
lines(fit4$mean,col=5)
lines(fit5$mean,col=6)
legend("topleft", lty=1, pch=1, col=1:6,
c("Data","SES","Holt's","Exponential",
"Additive Damped","Multiplicative Damped"))
The Holt-Winters seasonal method consists of the forecast equation and three smoothing equations: one for the level \(\ell_t\), one for the trend \(b_t\), and one for the seasonal component denoted by \(s_t\), with corresponding smoothing parameters \(\alpha\), \(\beta^*\) and \(\gamma\). We use \(m\) to denote the period of the seasonality, i.e., the number of seasons in a year.
There are two variations to this method that differ in the nature of the seasonal component. The additive method is preferred when the seasonal variations are roughly constant through the series, while the multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series.
The component form for the additive method is:
\[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t-m+h_m^+} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m}, \end{align*} \]
where \(h_m^+=\lfloor(h-1)\mod~m\rfloor+1\), which ensures that the estimates of the seasonal indices used for forecasting come from the final year of the sample. (The notation \(\lfloor u \rfloor\) means the largest integer not greater than \(u\).)
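A quick check of this index for quarterly data (m = 4, purely illustrative) shows how it cycles through the seasons of the final year:
# Illustrative check of h_m^+ = ((h-1) mod m) + 1 for quarterly data
m <- 4
h <- 1:10
(h - 1) %% m + 1     # cycles 1,2,3,4,1,2,... as h increases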
The level equation shows a weighted average between the seasonally adjusted observation \((y_{t} - s_{t-m})\) and the non-seasonal forecast \((\ell_{t-1}+b_{t-1})\) for time \(t\). The trend equation is identical to Holt’s linear method.
The seasonal equation shows a weighted average between the current seasonal index, \((y_{t}-\ell_{t-1}-b_{t-1})\), and the seasonal index of the same season last year (i.e., \(m\) time periods ago).
The error correction form of the smoothing equations is:
\[ \begin{align*} \ell_{t} &= \ell_{t-1} + b_{t-1}+\alpha e_{t}\\ b_{t} &= b_{t-1} + \alpha \beta^*e_{t}\\ s_{t} &= s_{t-m}+\gamma e_t. \end{align*} \]
where \(e_t=y_{t} - (\ell_{t-1} + b_{t-1}+s_{t-m})=y_{t} - \hat{y}_{t|t-1}\) are the one-step training forecast errors.
The component form for the multiplicative method is:
\[ \begin{align*} \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t})s_{t-m+h_{m}^{+}}. \\ \ell_{t} &= \alpha \frac{y_{t}}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t}-\ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + b_{t-1})} + (1 - \gamma)s_{t-m} \end{align*} \]
and the error correction representation is:
\[ \begin{align*} \ell_{t} &= \ell_{t-1} + b_{t-1}+\alpha\frac{e_{t}}{s_{t-m}}\\ b_{t} &= b_{t-1}+\alpha\beta^*\frac{e_{t}}{s_{t-m}}\\ s_{t} &= s_{t-m}+\gamma \frac{e_{t}}{(\ell_{t-1} + b_{t-1})} \end{align*} \]
where \(e_t=y_t-(\ell_{t-1}+b_{t-1})s_{t-m}\).
aust <- window(austourists,start=2005)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
plot(fit2,ylab="International visitor night in Australia (millions)",
plot.conf=FALSE, type="o", fcol="white", xlab="Year")
lines(fitted(fit1), col="red", lty=2)
lines(fitted(fit2), col="green", lty=2)
lines(fit1$mean, type="o", col="red")
lines(fit2$mean, type="o", col="green")
legend("topleft",lty=1, pch=1, col=1:3,
c("data","Holt Winters' Additive","Holt Winters' Multiplicative"))
states <- cbind(fit1$model$states[,1:3],fit2$model$states[,1:3])
colnames(states) <- c("level","slope","seasonal","level","slope","seasonal")
plot(states, xlab="Year")
fit1$model$state[,1:3]
fitted(fit1)
fit1$mean
A method that is often the single most accurate forecasting method for seasonal data is the Holt-Winters method with a damped trend and multiplicative seasonality:
\[ \begin{align*} \hat{y}_{t+h|t} &= [\ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t}]s_{t-m+h_m^+}. \\ \ell_{t} &= \alpha(y_{t} / s_{t-m}) + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + \phi b_{t-1})} + (1 - \gamma)s_{t-m} \end{align*} \]
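Continuing the austourists example above, this method can be fitted with hw() by adding damped=TRUE; the short sketch below is illustrative, with all parameters estimated from the data.
# Illustrative sketch: damped trend with multiplicative seasonality for the
# aust series defined earlier
fit3 <- hw(aust, seasonal="multiplicative", damped=TRUE, h=8)
plot(fit3, ylab="International visitor night in Australia (millions)")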
The following table gives commonly used initial values for the states of each method, where the label (Trend, Seasonal) uses N = none, A = additive, A\(_d\) = additive damped, M = multiplicative and M\(_d\) = multiplicative damped:

| Method | Initial values |
|---|---|
| (N,N) | \(\ell_0=y_1\) |
| (A,N), (A\(_d\),N) | \(\ell_0=y_1\), \(b_0=y_2-y_1\) |
| (M,N), (M\(_d\),N) | \(\ell_0=y_1\), \(b_0=y_2/y_1\) |
| (A,A), (A\(_d\),A) | \(\ell_0=\frac{1}{m}\left(y_1+\dots+y_m\right)\), \(b_0=\frac{1}{m}\left[\frac{y_{m+1}-y_{1}}{m}+\dots+\frac{y_{m+m}-y_{m}}{m}\right]\), \(s_0= y_m - \ell_0,~s_{-1} = y_{m-1} - \ell_0,~\dots,~s_{-m+1} = y_1 -\ell_0\) |
| (A,M), (A\(_d\),M) | \(\ell_0=\frac{1}{m}\left(y_1+\dots+y_m\right)\), \(b_0=\frac{1}{m}\left[\frac{y_{m+1}-y_{1}}{m}+\dots+\frac{y_{m+m}-y_{m}}{m}\right]\), \(s_0 = y_m/\ell_0,~s_{-1} = y_{m-1}/\ell_0,~\dots,~s_{-m+1} = y_1 /\ell_0\) |
The exponential smoothing methods presented so far are algorithms that generate point forecasts.
There are statistical models that generate the same point forecasts, but can also generate prediction (or forecast) intervals. A statistical model is a stochastic (or random) data generating process that can produce an entire forecast distribution.
Each model consists of a measurement equation that describes the observed data and some transition equations that describe how the unobserved components or states (level, trend, seasonal) change over time. Hence these are referred to as state space models.
Each state space model can be labeled as ETS (Error, Trend, Seasonal). The possibilities for each component are: Error ={A,M}, Trend ={N,A,A\(_d\),M,M\(_d\)} and Seasonal ={N,A,M}, where N =none, A = additive, M = multiplicative, and \(_d\)=damped.
Therefore, in total there exist 30 such state space models: 15 with additive errors and 15 with multiplicative errors.
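As a quick tally of this taxonomy (purely illustrative):
# Illustrative tally: 2 error types x 5 trend types x 3 seasonal types = 30
errors  <- c("A", "M")
trends  <- c("N", "A", "Ad", "M", "Md")
seasons <- c("N", "A", "M")
nrow(expand.grid(Error=errors, Trend=trends, Seasonal=seasons))   # 30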
The models can be identified using information criteria and estimated in R using the ets() function in the forecast package.
oildata <- window(oil,start=1996,end=2007)
fit <- ets(oildata, model="ANN")
plot(forecast(fit, h=3), ylab="Oil (millions of tonnes)")
fit$par
## alpha l
## 0.8919431 447.4881561
vndata <- window(austourists, start=2005)
fit <- ets(vndata)
summary(fit)
## ETS(M,N,M)
##
## Call:
## ets(y = vndata)
##
## Smoothing parameters:
## alpha = 0.7673
## gamma = 1e-04
##
## Initial states:
## l = 32.9088
## s=1.0317 0.9438 0.7617 1.2628
##
## sigma: 0.0412
##
## AIC AICc BIC
## 113.4198 120.4198 121.6661
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.7381341 1.64236 1.313633 1.781574 3.251206 0.4618879
## ACF1
## Training set -0.3173166
plot(fit)
plot(forecast(fit,h=8),
ylab="International visitor night in Australia (millions)")
fit <- ets(vndata, model="MMM", damped=TRUE)
summary(fit)
## ETS(M,Md,M)
##
## Call:
## ets(y = vndata, model = "MMM", damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.4428
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9609
##
## Initial states:
## l = 32.3794
## b = 1.026
## s=1.0297 0.9449 0.7615 1.2639
##
## sigma: 0.0323
##
## AIC AICc BIC
## 108.6564 125.5795 120.4370
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.06299026 1.320812 1.014136 -0.2239887 2.5147 0.3565816
## ACF1
## Training set -0.1208134
plot(fit)
plot(forecast(fit,h=8),
ylab="International visitor night in Australia (millions)")