Textbook: Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia.

What is Exponential Smoothing

Exponential smoothing forecasts future observations using weighted averages of past observations, with the weights decaying exponentially as the observations recede further into the past.

Where naive forecasting places 100% of the weight on the most recent observation and a moving average places equal weight on the last k observations, exponential smoothing allows for weighted averages in which greater weight is placed on recent observations and lesser weight on older observations.
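To make the contrast concrete, here is a minimal sketch (not from the original text) using the oil series from the fpp2 package, which is also used in the SES example later in this section:

library(fpp2)                       # loads the forecast package and example data
oildata <- window(oil, start=1996)
naive(oildata, h=5)$mean            # 100% weight on the most recent observation
meanf(oildata, h=5)$mean            # equal weight on all observations
ses(oildata, h=5)$mean              # exponentially decaying weights in between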

Exponential Smoothing Methods

  1. Simple Exponential Smoothing - recent observations get higher weight, older observations less weight. Suitable for data with no clear trend or seasonal pattern.

  2. Holt’s Linear Trend method - extends simple exponential smoothing to deal with data with a trend.

  3. Holt-Winters Seasonal method - extends Holt’s method to capture seasonality.

  4. State Space ETS models

Simple Exponential Smoothing

The simplest of the exponential smoothing methods is called “simple exponential smoothing” (SES). SES is suitable for data with no trend or seasonal pattern.

For exponential smoothing, we weight recent observations more heavily than older observations. The weight of each observation is determined through a smoothing parameter, which we will denote \(\alpha\). For a series observed up to time t, we calculate the predicted value \(\hat y_{t+1}\), based on \(y_1\) through \(y_t\), as follows:

\(\hat y_{t+1}=\alpha y_t + \alpha (1-\alpha)y_{t-1} +...+\alpha (1-\alpha)^{t-1}y_1\)

The above equation is called the weighted average form, where \(\alpha\) is the smoothing parameter and lies between 0 and 1. There is also the component form, which uses the following pair of equations:

Forecast Equation

\(\hat y_{t+1}=\ell_t\)

Smoothing Equation

\(\ell_t= \alpha y_t + (1-\alpha)\ell_{t-1}\)

In both equations we can see that the most weight is placed on the most recent observation. When \(\alpha\) is closer to 0, we consider this slow learning because the algorithm gives historical data more weight. When \(\alpha\) is closer to 1, we consider this fast learning because the algorithm gives more weight to the most recent observation and therefore, recent changes in the data will have a bigger impact on forecasted values.
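To see the component form in action, here is a minimal hand-rolled sketch (a hypothetical helper, not from the text) that mimics what ses() does internally once \(\alpha\) and the initial level are fixed:

ses_by_hand <- function(y, alpha, l0) {
  l <- l0
  for (t in seq_along(y)) {
    l <- alpha * y[t] + (1 - alpha) * l  # smoothing equation: l_t = alpha*y_t + (1-alpha)*l_{t-1}
  }
  l                                      # forecast equation: y-hat_{t+1} = l_t
}
# With initial="simple", ses() starts the recursion at l0 = y[1], so
# ses_by_hand(oildata, 0.2, oildata[1]) should be close to the one-step
# forecast from ses(oildata, alpha=0.2, initial="simple").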

The table below shows the weights attached to the most recent observations for four different values of \(\alpha\) when forecasting using simple exponential smoothing (the weight on \(y_{t-j}\) is \(\alpha(1-\alpha)^j\)). Note that the sum of the weights, even for a small value of \(\alpha\), will be approximately one for any reasonable sample size.

Observation    α=0.2    α=0.4    α=0.6    α=0.8
\(y_t\)        0.2000   0.4000   0.6000   0.8000
\(y_{t-1}\)    0.1600   0.2400   0.2400   0.1600
\(y_{t-2}\)    0.1280   0.1440   0.0960   0.0320
\(y_{t-3}\)    0.1024   0.0864   0.0384   0.0064
\(y_{t-4}\)    0.0819   0.0518   0.0154   0.0013

The decay is more clearly understood from a chart of the weights.
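A minimal base-R sketch (not from the original text) that computes and plots the weight \(\alpha(1-\alpha)^j\) placed on the observation j steps in the past:

alphas <- c(0.2, 0.4, 0.6, 0.8)
lags <- 0:9
weights <- sapply(alphas, function(a) a * (1 - a)^lags)  # weight on y_{t-j}
matplot(lags, weights, type="o", pch=1, lty=1, col=1:4,
        xlab="Lag j (steps into the past)", ylab="Weight",
        main="Exponential decay of SES weights")
legend("topright", lty=1, pch=1, col=1:4,
       legend=paste("alpha =", alphas))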

Example

We will use SES to forecast oil production in Saudi Arabia. We will be using different values of \(\alpha\) to compare the forecasts.

oildata <- window(oil, start=1996)
# plot using different values of alpha
fit1 <- ses(oildata, alpha=0.2, initial="simple", h=5)
fit2 <- ses(oildata, alpha=0.6, initial="simple", h=5)
fit3 <- ses(oildata, initial="simple", h=5)  # alpha estimated from the data
plot(fit1, PI=FALSE, ylab="Oil (millions of tonnes)",
  xlab="Year", main="SES using different values of alpha", fcol="white", type="o")
lines(fitted(fit1), col="blue", type="o")
lines(fitted(fit2), col="red", type="o")
lines(fitted(fit3), col="darkgreen", type="o")
lines(fit1$mean, col="blue", type="o")
lines(fit2$mean, col="red", type="o")
lines(fit3$mean, col="darkgreen", type="o")
legend("topleft", lty=1, col=c(1,"blue","red","darkgreen"),
  c("data", expression(alpha == 0.2), expression(alpha == 0.6),
  expression(alpha == 0.83)), pch=1)

summary(fit3)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = oildata, h = 5, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.8346 
## 
##   Initial states:
##     l = 445.3641 
## 
##   sigma:  28.1238
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE        ACF1
## Training set 6.478009 28.12383 22.20654 1.114821 4.598887 0.9235081 -0.03462133
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2014       542.6841 506.6420 578.7263 487.5624 597.8058
## 2015       542.6841 495.7381 589.6301 470.8864 614.4819
## 2016       542.6841 486.9273 598.4409 457.4114 627.9568
## 2017       542.6841 479.3302 606.0380 445.7927 639.5755
## 2018       542.6841 472.5513 612.8169 435.4253 649.9430
checkresiduals(fit3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 4.073, df = 3, p-value = 0.2537
## 
## Model df: 2.   Total lags used: 5

The SES method produces flat point forecasts. The prediction intervals show that there is considerable uncertainty in the future values of oil production over the five-year forecast period, so interpreting the point forecasts without accounting for that large uncertainty can be very misleading.

Holt’s Linear Trend method

Holt (1957) extended simple exponential smoothing to allow the forecasting of data with a trend. This method involves a forecast equation and two smoothing equations (one for the level and one for the trend):

\[\begin{align*} \text{Forecast equation}&& \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level equation}&& \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend equation}&& b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1} \end{align*}\]

\(\ell_t\) denotes an estimate of the level of the series at time t

\(b_t\) denotes an estimate of the trend (slope) of the series at time t

\(\alpha\) is the smoothing parameter for the level, 0≤α≤1

\(\beta^*\) is the smoothing parameter for the trend, 0≤\(\beta^*\)≤1 (we denote this as \(\beta^*\) instead of \(\beta\) to distinguish it from the ETS parameterisation, where \(\beta=\alpha\beta^*\)). The trend equation shows that \(b_t\) is a weighted average of \(\ell_{t} - \ell_{t-1}\), the estimated trend at time t, and \(b_{t-1}\), the previous estimate of the trend. The special case \(\beta^*=0\) is known as SES with drift.
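A minimal hand-rolled sketch of the three equations (a hypothetical helper, not from the text); note that holt() estimates \(\alpha\), \(\beta^*\), and the initial states rather than taking them as given:

holt_by_hand <- function(y, alpha, beta_star, l0, b0, h = 5) {
  l <- l0; b <- b0
  for (t in seq_along(y)) {
    l_new <- alpha * y[t] + (1 - alpha) * (l + b)           # level equation
    b     <- beta_star * (l_new - l) + (1 - beta_star) * b  # trend equation
    l     <- l_new
  }
  l + (1:h) * b  # forecast equation: y-hat_{t+h|t} = l_t + h*b_t
}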

Example

We will work on the Australian air passengers data, which has a trend component.

air<-window(ausair, start=1990)
fit1 <- holt(air, beta=0.2, h=5)
fit2 <- holt(air,  beta=0.6,  h=5)
fit3 <- holt(air,  beta=0.9,  h=5)
plot(fit1, ylab="Australia air passengers (millions)",
  xlab="Year", main="Holt's Linear Trend Method with different beta", fcol="white")
#lines(fitted(fit1), col="navyblue")
#lines(fitted(fit2), col="red")
#lines(fitted(fit3), col="darkgreen")
lines(fit1$mean, col="navyblue")
lines(fit2$mean, col="red")
lines(fit3$mean, col="darkgreen")
legend("topleft",lty=1, col=c(1,"navyblue","red","darkgreen"), 
  c("data", expression(beta == 0.2), expression(beta == 0.6),
  expression(beta == 0.9)),pch=1)

summary(fit2)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = air, h = 5, beta = 0.6) 
## 
##   Smoothing parameters:
##     alpha = 0.7129 
##     beta  = 0.6 
## 
##   Initial states:
##     l = 14.5764 
##     b = 3.6022 
## 
##   sigma:  2.8209
## 
##      AIC     AICc      BIC 
## 148.6596 150.4778 153.8429 
## 
## Error measures:
##                       ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.09892936 2.603583 1.684423 -0.4897047 3.851842 0.7331688
##                    ACF1
## Training set -0.1434201
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2017       73.92483 70.30968 77.53997 68.39594  79.45371
## 2018       75.92433 69.95806 81.89059 66.79971  85.04894
## 2019       77.92383 68.79046 87.05720 63.95554  91.89211
## 2020       79.92333 67.04134 92.80531 60.22203  99.62462
## 2021       81.92283 64.81764 99.02801 55.76270 108.08295
checkresiduals(fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 4.3362, df = 3, p-value = 0.2274
## 
## Model df: 4.   Total lags used: 7

Exponential trend method

A variation from Holt’s linear trend method is achieved by allowing the level and the slope to be multiplied rather than added:

\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} b_{t}^h\\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} b_{t-1})\\ b_{t} &= \beta^*\frac{\ell_{t}}{ \ell_{t-1}} + (1 -\beta^*)b_{t-1} \end{align*}\]

where \(b_t\) now represents an estimated growth rate (in relative terms rather than absolute) which is multiplied rather than added to the estimated level. The trend in the forecast function is now exponential rather than linear, so that the forecasts project a constant growth rate rather than a constant slope.

Damped Trend

The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons. Motivated by this observation, Gardner & McKenzie (1985) introduced a parameter that “dampens” the trend to a flat line some time in the future.

Additive damped trend

In conjunction with the smoothing parameters \(\alpha\) and \(\beta^*\) (with values between 0 and 1, as in Holt’s method), this method also includes a damping parameter 0<\(\phi\)<1.

\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + (\phi+\phi^2 + \dots + \phi^{h}) b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)\phi b_{t-1}. \end{align*}\]

If \(\phi=1\) the method is identical to Holt’s linear method. For values between 0 and 1, \(\phi\) dampens the trend so that it approaches a constant some time in the future.
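To see why the trend flattens, note that the geometric sum in the forecast equation converges as the horizon grows, so the forecasts approach a constant:

\[\lim_{h\to\infty}\hat{y}_{T+h|T} = \ell_{T} + (\phi + \phi^2 + \phi^3 + \dots)\,b_{T} = \ell_{T} + \frac{\phi}{1-\phi}\,b_{T}\]

For example, with \(\phi=0.9\) the long-run forecast levels off at \(\ell_T + 9b_T\).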

Multiplicative damped trend

This method introduces a damping parameter to the exponential trend method:

\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t}b_{t}^{(\phi+\phi^2 + \dots + \phi^{h})} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1} b^\phi_{t-1}\\ b_{t} &= \beta^*\frac{\ell_{t}}{\ell_{t-1}} + (1 -\beta^*)b_{t-1}^{\phi}. \end{align*}\]

This method will produce even more conservative forecasts than the additive damped trend method when compared to Holt’s linear method.

Example

We will again work on the Australian air passengers data, which has a trend component.

#Fit different models using the different methods
fit1 <- ses(air, h=15) # N,N
fit2 <- holt(air, h=15) # A,N
fit3 <- holt(air,exponential=TRUE, h=15) # M,N
fit4 <- holt(air,damped=TRUE, h=15) #A_d, N
fit5 <- holt(air,exponential=TRUE,damped=TRUE, h=15) #M_d, N
#Plot
plot(fit3, type="o", main="Compare forecasts using different methods", ylab="Australia air passengers (millions)", flwd=1)
lines(window(air,start=1990),type="o")
lines(fit1$mean,col=2)
lines(fit2$mean,col=3)
lines(fit4$mean,col=5)
lines(fit5$mean,col=6)
legend("topleft", lty=1, pch=1, col=1:6,
    c("Data","SES","Holt's","Exponential",
      "Additive Damped","Multiplicative Damped"))

# requires magrittr (for the %>% pipe); knitr and kableExtra are called via ::
data.frame(Method = c("SES", "Holt", "Exponential", "Additive damped", "Multiplicative damped"),
           RMSE = c(accuracy(fit1)[2],
                    accuracy(fit2)[2],
                    accuracy(fit3)[2],
                    accuracy(fit4)[2],
                    accuracy(fit5)[2])) %>% 
  knitr::kable(caption = "RMSE for each variation of the method") %>%
  kableExtra::kable_styling(bootstrap_options = "striped", full_width = FALSE)
RMSE for each variation of the method

Method                   RMSE
SES                      3.039165
Holt                     2.182343
Exponential              2.377641
Additive damped          2.285529
Multiplicative damped    2.235318

Holt-Winters’ Seasonal Method

The Holt-Winters method extends Holt’s method to capture seasonality.

It comprises a forecast equation and three smoothing equations:

  • one for the level \(\ell_t\)

  • one for the trend \(b_t\)

  • one for the seasonal component \(s_t\)

with corresponding smoothing parameters \(\alpha\), \(\beta^*\), and \(\gamma\). We use \(m\) to denote the period of the seasonality (e.g., \(m=4\) for quarterly data).

There are two variations to this method that differ in the nature of the seasonal component.

First, the additive method is preferred when the seasonal variations are roughly constant through the series.

Holt Seasonal Additive

With this method, the seasonal component is expressed in absolute terms in the scale of the observed series, and in the level equation the series is seasonally adjusted by subtracting the seasonal component. Within each year, the seasonal component will add up to approximately zero.
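For reference, the component form of the additive method (as given in Hyndman & Athanasopoulos, 2018), where \(k\) is the integer part of \((h-1)/m\):

\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m} \end{align*}\]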

Second, the multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series.

Holt Seasonal Multiplicative

With the multiplicative method, the seasonal component is expressed in relative terms (percentages), and the series is seasonally adjusted by dividing through by the seasonal component. Within each year, the seasonal component will sum to approximately \(m\).
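The corresponding component form for the multiplicative method (again following Hyndman & Athanasopoulos, 2018):

\[\begin{align*} \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t})s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha \frac{y_{t}}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma \frac{y_{t}}{\ell_{t-1} + b_{t-1}} + (1-\gamma)s_{t-m} \end{align*}\]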

aust <- window(austourists,start=2005)
fit1 <- hw(aust,seasonal="additive")
fit2 <- hw(aust,seasonal="multiplicative")
autoplot(aust) +
autolayer(fit1, series="HW additive forecasts", PI=FALSE) +
autolayer(fit2, series="HW multiplicative forecasts", PI=FALSE) +
xlab("Year") +
ylab("Visitor nights (millions)") +
ggtitle("International visitors nights in Australia") +
guides(colour=guide_legend(title="Forecast"))

summary(fit1)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = aust, seasonal = "additive") 
## 
##   Smoothing parameters:
##     alpha = 0.3063 
##     beta  = 1e-04 
##     gamma = 0.4263 
## 
##   Initial states:
##     l = 32.2597 
##     b = 0.7014 
##     s = 1.3106 -1.6935 -9.3132 9.6962
## 
##   sigma:  1.9494
## 
##      AIC     AICc      BIC 
## 234.4171 239.7112 250.4748 
## 
## Error measures:
##                       ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.008115785 1.763305 1.374062 -0.2860248 2.973922 0.4502579
##                     ACF1
## Training set -0.06272507
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2016 Q1       76.09837 73.60011 78.59664 72.27761 79.91914
## 2016 Q2       51.60333 48.99039 54.21626 47.60718 55.59947
## 2016 Q3       63.96867 61.24582 66.69153 59.80443 68.13292
## 2016 Q4       68.37170 65.54313 71.20027 64.04578 72.69762
## 2017 Q1       78.90404 75.53440 82.27369 73.75061 84.05747
## 2017 Q2       54.40899 50.95325 57.86473 49.12389 59.69409
## 2017 Q3       66.77434 63.23454 70.31414 61.36069 72.18799
## 2017 Q4       71.17737 67.55541 74.79933 65.63806 76.71667
summary(fit2)
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = aust, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.4406 
##     beta  = 0.0134 
##     gamma = 0.0023 
## 
##   Initial states:
##     l = 32.4875 
##     b = 0.6974 
##     s = 1.0237 0.9618 0.7704 1.2442
## 
##   sigma:  0.0367
## 
##      AIC     AICc      BIC 
## 221.1313 226.4254 237.1890 
## 
## Error measures:
##                      ME     RMSE     MAE           MPE    MAPE      MASE
## Training set 0.09206228 1.575631 1.25496 -0.0006505533 2.70539 0.4112302
##                     ACF1
## Training set -0.07955726
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2016 Q1       80.08894 76.31865 83.85922 74.32278 85.85509
## 2016 Q2       50.15482 47.56655 52.74309 46.19640 54.11324
## 2016 Q3       63.34322 59.80143 66.88502 57.92652 68.75993
## 2016 Q4       68.17810 64.08399 72.27221 61.91670 74.43950
## 2017 Q1       83.80112 78.43079 89.17146 75.58790 92.01434
## 2017 Q2       52.45291 48.88795 56.01787 47.00077 57.90504
## 2017 Q3       66.21274 61.46194 70.96353 58.94702 73.47845
## 2017 Q4       71.23205 65.85721 76.60690 63.01194 79.45217
fitted(fit1)
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2005 42.65723 24.21081 32.66618 36.37206
## 2006 45.53781 27.51872 36.21481 40.33713
## 2007 49.17134 32.18209 39.31365 43.50887
## 2008 49.89746 32.85434 39.71474 43.48280
## 2009 53.65576 35.82730 43.37632 45.34947
## 2010 56.83932 37.31366 45.42533 47.91400
## 2011 60.42249 37.71212 47.59100 49.91681
## 2012 63.19432 40.58637 49.32571 53.47546
## 2013 65.75739 44.06492 54.19500 55.53369
## 2014 68.24659 43.48545 54.81621 58.70628
## 2015 69.05311 47.59377 59.24376 64.22408
fitted(fit2)
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2005 41.28689 26.36042 32.62011 35.43591
## 2006 44.91855 28.43989 36.70598 39.63640
## 2007 50.24528 31.40864 39.84476 42.20533
## 2008 50.30134 31.91456 40.44309 44.16408
## 2009 53.76967 34.34469 43.00433 46.12632
## 2010 56.35302 36.28628 45.13587 48.57174
## 2011 58.97855 37.27806 47.78102 51.14259
## 2012 62.76659 39.02701 49.48525 54.86171
## 2013 67.26005 42.02854 52.46503 56.18995
## 2014 69.83611 42.43956 53.92965 58.43127
## 2015 72.59030 45.62125 58.77277 64.38368

Accuracy measures for the Holt Seasonal Additive:

round(accuracy(fit1),2)
##                ME RMSE  MAE   MPE MAPE MASE  ACF1
## Training set 0.01 1.76 1.37 -0.29 2.97 0.45 -0.06

Accuracy measures for the Holt Seasonal Multiplicative:

round(accuracy(fit2),2)
##                ME RMSE  MAE MPE MAPE MASE  ACF1
## Training set 0.09 1.58 1.25   0 2.71 0.41 -0.08
# extract the level, slope, and first seasonal state from each fitted model
Additive_States <- fit1$model$states[,1:3]
Multiplicative_States <- fit2$model$states[,1:3]

colnames(Additive_States) <- c("level", "slope", "Season")

colnames(Multiplicative_States) <- c("level", "slope", "Season")
Additive_States_plot <- autoplot(Additive_States, facets=TRUE, color="blue") + labs(title="Additive states") 
Multiplicative_States_plot <- autoplot(Multiplicative_States,facets=TRUE, color="red") + labs(title="Multiplicative States")
grid.arrange(Additive_States_plot, Multiplicative_States_plot, nrow=1)  # grid.arrange() from gridExtra

The small value of \(\gamma\) for the multiplicative model means that the seasonal component hardly changes over time. The small value of \(\beta\) for the additive model means the slope component hardly changes over time (check the vertical scale). The increasing size of the seasonal component for the additive model suggests that the model is less appropriate than the multiplicative model.

Holt-Winters’ Damped Method

As stated previously, Gardner & McKenzie (1985) introduced a parameter that “dampens” the trend to a flat line some time in the future. A method that often provides accurate and robust forecasts for seasonal data is the Holt-Winters method with a damped trend and multiplicative seasonality.

Holt Seasonal Damped Method

# hold out the final 35 observations of the hyndsight data and forecast them
fc <- hw(subset(hyndsight, end=length(hyndsight)-35),
         damped = TRUE, seasonal="multiplicative", h=35)
autoplot(hyndsight) +
  autolayer(fc, series="HW multi damped", PI=FALSE)+
  guides(colour=guide_legend(title="Daily forecasts"))

Exponential Smoothing and Innovation State Space Model (ISSM)

Exponential smoothing (ETS, which stands for Error, Trend, and Seasonality) is a family of very successful forecasting methods based on the key property that forecasts are weighted combinations of past observations (Hyndman et al., 2008). For example, in simple exponential smoothing, the forecast \(\hat{z}_{T+1}\) for time step \(T+1\) is written as

\[\hat{z}_{T+1} = \hat{z}_T + \alpha (z_T - \hat{z}_T) = \alpha\cdot z_T + (1 - \alpha)\cdot \hat{z}_T.\]

General exponential smoothing methods consider extensions of simple ETS to include time series patterns such as (linear) trend and various periodic seasonal effects. All ETS methods fall under the category of forecasting methods, as the predictions are point forecasts (a single value is predicted for each future time step). A statistical model, on the other hand, describes the underlying data generation process and has the advantage that it can produce an entire probability distribution for each future time step. Innovation state space models (ISSMs) are an example of such models, with considerable flexibility in representing commonly occurring time series patterns, and they underlie the exponential smoothing methods.

The idea behind ISSMs is to maintain a latent state vector \(l_{t}\) with recent information about level, trend, and seasonality factors. The state vector \(l_{t}\) evolves over time, adding a small innovation (i.e., Gaussian noise) at each time step. The observations are then a linear combination of the components of the current state.

Mathematically, an ISSM is specified by two equations:

  • The state transition equation: \[l_{t} = F_t l_{t-1} + g_{t}\epsilon_t,\quad \epsilon_t\sim \mathcal{N}(0,1).\] Note that the innovation strength is controlled by \(g_t\).

  • The observation equation:

\[z_t = a_{t}^{\top}l_{t-1} + b_t + \nu_t, \quad \nu_t \sim \mathcal{N}(0, \sigma_t^2)\]

Here we allow for an additional term \(b_t\), which can model any deterministic component (exogenous variables). This describes a fairly generic model: the user encodes specific time series patterns through the coefficients \(F_t\) and \(a_t\), so these are problem dependent.

The innovation vector \(g_t\) contains the parameters to be learned (the innovation strengths). Moreover, the initial state \(l_0\) has to be specified. We do so by specifying a Gaussian prior distribution \(P(l_0)\), whose parameters (means, standard deviations) are learned from data as well.

The parameters of the ISSM are typically learned using the maximum likelihood principle. This requires computing the log-likelihood of the given observations, i.e., the probability of the data under the model, \(P(z_1, \ldots, z_t)\).
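To make the two equations concrete, here is a minimal R sketch (all names hypothetical, not from the original text) that simulates the simplest ISSM, a local level model, where \(F_t = 1\), \(a_t = 1\), \(b_t = 0\), and \(g_t = \alpha\):

set.seed(123)
n      <- 100
alpha  <- 0.3    # innovation strength g_t
sigma  <- 1      # observation noise scale
l_prev <- 10     # initial state l_0 (in practice drawn from the prior P(l_0))
z      <- numeric(n)
for (t in 1:n) {
  z[t]   <- l_prev + rnorm(1, sd = sigma)  # observation equation: z_t = l_{t-1} + nu_t
  l_prev <- l_prev + alpha * rnorm(1)      # state transition:    l_t = l_{t-1} + g_t*eps_t
}
plot.ts(z, main = "Simulated local-level ISSM", ylab = "z")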


Specifying the model type

When specifying the model type, you always specify the error, then the trend, then the seasonality (hence “ets”). The options you can specify for each component are as follows:

  • error: additive (“A”), multiplicative (“M”), unknown (“Z”)

  • trend: none (“N”), additive (“A”), multiplicative (“M”), unknown (“Z”)

  • seasonality: none (“N”), additive (“A”), multiplicative (“M”), unknown (“Z”)

Consequently, if we wanted to apply a Holt’s model where the error and trend were additive and no seasonality exists we would select model = "AAN".

If you want to apply a Holt-Winters model where there is additive error, an exponential (multiplicative) trend, and additive seasonality, you would select model = "AMA".

If you are uncertain of the type of component then you use “Z”. So if you were uncertain of the components or if you want the model to select the best option, you could use model = "ZZZ" and the “optimal” model will be selected.

Example

We will use the qcement dataset (quarterly Australian Portland cement production) within the fpp2 package for illustration. The first step is to split the qcement dataset into a train and a test set to compare performance. The data has seasonality and a trend; however, it is unclear whether the seasonality is additive or multiplicative.

# create training and validation sets from the qcement data
qcement.train <- window(qcement, end = c(2012, 4))
qcement.test <- window(qcement, start = c(2013, 1))
qcement.hw <- ets(qcement.train, model = "AAA")  # additive error, additive trend, and additive seasonality
summary(qcement.hw)
## ETS(A,A,A) 
## 
## Call:
##  ets(y = qcement.train, model = "AAA") 
## 
##   Smoothing parameters:
##     alpha = 0.6418 
##     beta  = 1e-04 
##     gamma = 0.1988 
## 
##   Initial states:
##     l = 0.4511 
##     b = 0.0075 
##     s = 0.0049 0.0307 9e-04 -0.0365
## 
##   sigma:  0.0854
## 
##      AIC     AICc      BIC 
## 126.0419 126.8676 156.9060 
## 
## Training set error measures:
##                       ME       RMSE       MAE          MPE     MAPE      MASE
## Training set 0.001463693 0.08393279 0.0597683 -0.003454533 3.922727 0.5912949
##                    ACF1
## Training set 0.02150539

If we assess our additive model, we see that \(\alpha = 0.6418\), \(\beta = 0.0001\), and \(\gamma = 0.1988\).

The important thing to understand about the ets() model is how to select the model = parameter. In total we have 36 model options to choose from.

With the ets() function, the default estimation method is maximum likelihood rather than minimum sum of squares.

autoplot(qcement.hw)

autoplot(forecast(qcement.hw))

If we check our residuals, we see that residuals grow larger over time. This may suggest that a multiplicative error rate may be more appropriate.

checkresiduals(qcement.hw)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,A,A)
## Q* = 20.288, df = 3, p-value = 0.0001479
## 
## Model df: 8.   Total lags used: 11
# cbind('Residuals' = residuals(fit),
#       'Forecast errors' = residuals(fit,type='response')) %>%
#   autoplot(facet=TRUE) + xlab("Year") + ylab("")

Comparing predictive accuracy

To compare the predictive accuracy of our models, let’s look at four different models. From the output below, the second model (multiplicative error, additive trend and seasonality) produces the lowest RMSE and MAPE on the test data set.

# additive error, trend and seasonality
qcement.hw1 <- ets(qcement.train, model = "AAA")
# forecast the next 5 quarters
qcement.f1 <- forecast(qcement.hw1, h = 5)

# check accuracy
accuracy(qcement.f1, qcement.test)
##                       ME       RMSE        MAE          MPE     MAPE      MASE
## Training set 0.001463693 0.08393279 0.05976830 -0.003454533 3.922727 0.5912949
## Test set     0.031362775 0.07144211 0.06791904  1.115342984 2.899446 0.6719311
##                     ACF1 Theil's U
## Training set  0.02150539        NA
## Test set     -0.31290496 0.2112428
# multiplicative error, additive trend and seasonality
qcement.hw2 <- ets(qcement.train, model = "MAA")
qcement.f2 <- forecast(qcement.hw2, h = 5)
accuracy(qcement.f2, qcement.test)
##                        ME       RMSE        MAE        MPE     MAPE      MASE
## Training set -0.002700842 0.08387446 0.05912640 -0.3689856 3.811958 0.5849446
## Test set      0.018046198 0.06438073 0.05763575  0.5614035 2.493544 0.5701974
##                      ACF1 Theil's U
## Training set  0.001490172        NA
## Test set     -0.344335948 0.1722734
# additive error and trend and multiplicative seasonality
qcement.hw3 <- ets(qcement.train, model = "AAM", restrict = FALSE)
qcement.f3 <- forecast(qcement.hw3, h = 5)
accuracy(qcement.f3, qcement.test)
##                       ME       RMSE        MAE       MPE     MAPE      MASE
## Training set 0.007542179 0.07818508 0.05640602 0.3985457 3.693964 0.5580315
## Test set     0.046793468 0.09456029 0.09241462 1.6559245 3.882433 0.9142688
##                     ACF1 Theil's U
## Training set -0.01243122        NA
## Test set     -0.05070111 0.2737314
# multiplicative error, additive trend, and multiplicative seasonality
qcement.hw4 <- ets(qcement.train, model = "MAM")
qcement.f4 <- forecast(qcement.hw4, h = 5)
accuracy(qcement.f4, qcement.test)
##                        ME       RMSE        MAE         MPE     MAPE      MASE
## Training set 0.0009645886 0.07807481 0.05595930 -0.02540554 3.657772 0.5536120
## Test set     0.0164979314 0.08297241 0.07162921  0.37381187 3.064455 0.7086363
##                     ACF1 Theil's U
## Training set -0.03038457        NA
## Test set     -0.06185361 0.1930288

If we were to compare this to an unspecified model where we let ets select the optimal model, we see that ets selects a model specification of multiplicative error, additive trend, and multiplicative seasonality (“MAM”). This is equivalent to our fourth model above. This model is assumed “optimal” because it minimizes RMSE, AIC, and BIC on the training data set, but does not necessarily minimize prediction errors on the test set.

qcement.hw5 <- ets(qcement.train, model = "ZZZ")
summary(qcement.hw5)
## ETS(M,A,M) 
## 
## Call:
##  ets(y = qcement.train, model = "ZZZ") 
## 
##   Smoothing parameters:
##     alpha = 0.7664 
##     beta  = 0.0131 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 0.49 
##     b = 0.0064 
##     s = 1.0298 1.0488 1.0151 0.9063
## 
##   sigma:  0.0477
## 
##        AIC       AICc        BIC 
##  0.3161397  1.1418277 31.1802503 
## 
## Training set error measures:
##                        ME       RMSE       MAE         MPE     MAPE     MASE
## Training set 0.0009645886 0.07807481 0.0559593 -0.02540554 3.657772 0.553612
##                     ACF1
## Training set -0.03038457

We can optimize the \(\gamma\) parameter in our Holt-Winters model (if we don’t want to use the triple z option). Here, we use the additive error, trend and seasonality model that minimized our prediction errors above and identify the \(\gamma\) parameter that minimizes forecast errors. In this case we see that \(\gamma = 0.21\) minimizes the error rate.

gamma <- seq(0.01, 0.85, 0.01)
RMSE <- numeric(length(gamma))

for(i in seq_along(gamma)) {
  hw.expo <- ets(qcement.train, model = "AAA", gamma = gamma[i])
  future <- forecast(hw.expo, h = 5)
  RMSE[i] <- accuracy(future, qcement.test)[2, 2]  # test-set RMSE
}

error <- tibble(gamma, RMSE)  # tibble() replaces the deprecated data_frame()
minimum <- filter(error, RMSE == min(RMSE))
ggplot(error, aes(gamma, RMSE)) +
  geom_line() +
  geom_point(data = minimum, color = "blue", size = 2) +
  ggtitle("gamma's impact on forecast errors",
          subtitle = "gamma = 0.21 minimizes RMSE")

If we update our model with this “optimal” \(\gamma\) parameter, we see that we bring our test-set MAPE down from 2.90% to 2.88%. This is a small improvement, but often small improvements can have large business implications.

# previous model with additive error, trend and seasonality
accuracy(qcement.f1, qcement.test)
##                       ME       RMSE        MAE          MPE     MAPE      MASE
## Training set 0.001463693 0.08393279 0.05976830 -0.003454533 3.922727 0.5912949
## Test set     0.031362775 0.07144211 0.06791904  1.115342984 2.899446 0.6719311
##                     ACF1 Theil's U
## Training set  0.02150539        NA
## Test set     -0.31290496 0.2112428
# new model with optimal gamma parameter
qcement.hw6 <- ets(qcement.train, model = "AAA", gamma = 0.21)
qcement.f6 <- forecast(qcement.hw6, h = 5, level =c(80, 95))
accuracy(qcement.f6, qcement.test)
##                        ME       RMSE        MAE        MPE     MAPE      MASE
## Training set -0.001312025 0.08377557 0.05905971 -0.2684606 3.834134 0.5842847
## Test set      0.033492771 0.07148708 0.06775269  1.2096488 2.881680 0.6702854
##                     ACF1 Theil's U
## Training set  0.04832198        NA
## Test set     -0.35877010 0.2202448

With this new optimal model we can get our predicted values:

qcement.f6
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2013 Q1       2.134650 2.025352 2.243947 1.967494 2.301806
## 2013 Q2       2.427828 2.299602 2.556055 2.231723 2.623934
## 2013 Q3       2.601989 2.457284 2.746694 2.380681 2.823296
## 2013 Q4       2.505001 2.345506 2.664496 2.261075 2.748927
## 2014 Q1       2.171068 1.987914 2.354223 1.890958 2.451179
autoplot(qcement.f6)