if (!require("fma")) install.packages("fma")
if (!require("expsmooth")) install.packages("expsmooth")
library(fma)
library(expsmooth)

Exercise 7.1

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

head(books)
## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
##   Paperback Hardcover
## 1       199       139
## 2       172       128
## 3       111       172
## 4       209       139
## 5       161       191
## 6       119       168

Exercise 7.1.a

Plot the series and discuss the main features of the data.

Approach

A Time Plot is used to plot the series. The Time Plot is a line graph that plots each observed value against the time of the observation, with a single line connecting each observation across the entire period.

Results

plot(books, main="Daily Book Sales", xlab="Day")

Interpretation

There are only 30 days of data available, but some patterns are clearly visible in the Time Plot. The main features of the data are an upward trend and a recurring fluctuation of roughly every three days.

Exercise 7.1.b

Use simple exponential smoothing with the ses function (setting initial="simple") and explore different values of \(\alpha\) for the paperback series. Record the within-sample SSE for the one-step forecasts. Plot SSE against \(\alpha\) and find which value of \(\alpha\) works best. What is the effect of \(\alpha\) on the forecasts?

Approach

The naive forecasting method assumes the most recent observation is the only important one and forecasts values equal to the last observed value of a series. The average method assumes that all observations are equally important and forecasts values equal to the simple average of the observed data. The Simple Exponential Smoothing (SES) method sits between these extremes: it attaches weights that decrease exponentially with the age of the observation, so that an observation \(t\) periods in the past receives weight \(\alpha(1-\alpha)^t\) with \(\alpha\in(0,1)\). A large \(\alpha\) gives more weight to recent observations, while a small \(\alpha\) spreads more weight onto observations from the distant past: \(\alpha_1 > \alpha_2\) implies \(\alpha_1(1-\alpha_1)^t < \alpha_2(1-\alpha_2)^t\) for all sufficiently large \(t\).
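
To make the crossover concrete, the weights \(\alpha(1-\alpha)^t\) can be tabulated for a small and a large \(\alpha\) (a quick illustrative snippet, not part of the exercise code): the larger \(\alpha\) dominates at the most recent lags, while the smaller \(\alpha\) retains more weight at distant lags.

# Illustrative only: SES weights alpha*(1-alpha)^t on an observation t periods in the past
t <- 0:5
round(rbind("alpha = 0.2" = 0.2 * (1 - 0.2)^t,
            "alpha = 0.8" = 0.8 * (1 - 0.8)^t), 4)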

Results

a <- seq(0, 1, 0.1)
sse <- numeric(length(a))
for(i in seq_along(a)){
  fit <- ses(books[, 1], alpha=a[i], initial="simple", h=1)
  sse[i] <- sum(fit$residuals^2)  # within-sample SSE of the one-step forecasts
}
plot(a, sse, xlab=expression(alpha), ylab=expression(sum(e^2)),
     main="Paperback Simple Exponential Smoothing")

alphas <- c(0.2, 0.5, 0.8)
l <- character(3)
for(i in 1:3){
  fit <- ses(books[, 1], alpha=alphas[i], initial="simple", h=1)
  if (i==1) plot(fit, ylab="Paperback Sales", PI=FALSE, xlab="Day", fcol=NA)  # unsmoothed data
  lines(fitted(fit), col=i+1)               # fitted (smoothed) values
  points(31, fit$mean, col=i+1, pch=20)     # one-step forecast for day 31
  l[i] <- paste("alpha =", round(fit$model$par[1], 1))
}
legend("topleft", lty=1, col=1:4, legend=c("Unsmoothed", l))

Interpretation

The SSE-\(\alpha\) plot shows that, for the paperback series, the SSE reaches its minimum near \(\alpha=0.2\). Plotting the unsmoothed data against the fitted values for three different values of \(\alpha\) suggests that as \(\alpha\) increases beyond this near-optimal value, the fitted series becomes more variable and follows the noise in the data more closely. This is consistent with the SSE-\(\alpha\) plot: values of \(\alpha\) far from the minimum produce larger one-step errors and therefore more variable forecasts.

Exercise 7.1.c

Now let ses select the optimal value of \(\alpha\). Use this value to generate forecasts for the next four days. Compare your results with those from part (b).

Approach

The alpha parameter of the ses() function is the smoothing parameter for the level. If NULL, it will be estimated from the observed data using non-linear minimization. This is more robust and objective than choosing alpha subjectively.

Results

fit <- ses(books[, 1], alpha=NULL, initial="simple", h=4)
fit$model$par
##       alpha           l 
##   0.2125115 199.0000000

Interpretation

The optimal value of \(\alpha=0.2125115\) calculated by ses() is close to the value of \(\alpha\approx0.2\) found in the SSE-\(\alpha\) plot.

Exercise 7.1.d

Repeat but with initial="optimal". How much difference does an optimal initial level make?

Approach

The initial parameter of the ses() function is used for selecting initial state values. If optimal, the initial values and the smoothing parameters are optimized using ets. If simple, the initial values are obtained using simple calculations on the first few observations.

Results

fit1 <- ses(books[, 1], alpha=NULL, initial="simple", h=4)
fit1$model$par
##       alpha           l 
##   0.2125115 199.0000000
fit2 <- ses(books[, 1], alpha=NULL, initial="optimal", h=4)
fit2$model$par
##       alpha           l 
##   0.1685384 170.8257359

Interpretation

Selecting suitable initial values is important when \(\alpha\) is small and/or the time series is relatively short, because the weight attached to the initial level, \((1-\alpha)^T\) after \(T\) observations, decays slowly and can still have a noticeable effect on the forecasts. Since the \(\alpha\) that minimizes the SSE for this series is small, the initialization matters here: with initial="optimal" the estimated initial level drops from 199 to about 170.8 and \(\alpha\) decreases slightly from 0.213 to 0.169.
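
To quantify how much difference the initialization makes, the two sets of four-day point forecasts can be compared directly. This is an illustrative snippet rather than part of the original solution; it refits both models so it can stand alone.

library(forecast); library(fma)   # forecast provides ses(); fma provides the books data
fit_simple  <- ses(books[, 1], initial="simple",  h=4)
fit_optimal <- ses(books[, 1], initial="optimal", h=4)
round(fit_optimal$mean - fit_simple$mean, 2)  # difference in forecast paperback sales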

Exercise 7.1.e

Repeat steps (b)-(d) with the hardcover series.

Approach

Steps (b)-(d) call for exploring different values of \(\alpha\) with simple exponential smoothing, letting ses select the optimal value of \(\alpha\), and repeating with initial="optimal", this time for the hardcover series.

  • The Simple Exponential Smoothing (SES) method attaches weights that decrease exponentially with the age of the observation, so that an observation \(t\) periods in the past receives weight \(\alpha(1-\alpha)^t\).
  • The alpha parameter of the ses() function is the smoothing parameter for the level. If NULL, it will be estimated from the observed data using non-linear minimization. This is more robust and objective than choosing alpha subjectively.
  • The initial parameter of the ses() function is used for selecting initial state values. If optimal, the initial values and the smoothing parameters are optimized using ets. If simple, the initial values are obtained using simple calculations on the first few observations.

Results

a <- seq(0, 1, 0.1)
sse <- numeric(length(a))
for(i in seq_along(a)){
  fit <- ses(books[, 2], alpha=a[i], initial="simple", h=1)
  sse[i] <- sum(fit$residuals^2)  # within-sample SSE of the one-step forecasts
}
plot(a, sse, xlab=expression(alpha), ylab=expression(sum(e^2)),
     main="Hardcover Simple Exponential Smoothing")

fit1 <- ses(books[, 2], alpha=NULL, initial="simple", h=4)
fit1$model$par
##       alpha           l 
##   0.3473305 139.0000000
fit2 <- ses(books[, 2], alpha=NULL, initial="optimal", h=4)
fit2$model$par
##       alpha           l 
##   0.3282696 149.2836097

Interpretation

The SSE-\(\alpha\) plot shows that, for the hardcover series, the SSE reaches its minimum near \(\alpha=0.35\). The optimal value of \(\alpha=0.3473305\) calculated by ses() is close to that value. As before, selecting suitable initial values is important when \(\alpha\) is small and/or the time series is relatively short, because the weight attached to the initial level decays slowly and can noticeably affect the forecasts. Here the optimization with initial="optimal" raises the initial level from 139 to about 149.3 and decreases \(\alpha\) slightly from 0.347 to 0.328.
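
The same check for the hardcover series (again an illustrative snippet, refitting both models so it stands alone):

library(forecast); library(fma)   # forecast provides ses(); fma provides the books data
fit_simple  <- ses(books[, 2], initial="simple",  h=4)
fit_optimal <- ses(books[, 2], initial="optimal", h=4)
round(fit_optimal$mean - fit_simple$mean, 2)  # difference in forecast hardcover sales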

Exercise 7.3

For this exercise, use the quarterly UK passenger vehicle production data from 1977:1–2005:1 (data set ukcars).

Exercise 7.3.a

Plot the data and describe the main features of the series.

Approach

A Decomposition Plot shows the observed values together with the estimated trend-cycle, seasonal, and remainder (random) components of the time series.

Results

plot(decompose(ukcars))

Interpretation

Plotting the components computed by classical additive decomposition shows that the data have a broadly upward trend over the sample, clear seasonal fluctuations, and fairly random-looking residuals.

Exercise 7.3.b

Decompose the series using STL and obtain the seasonally adjusted data.

Approach

Seasonally adjusted data are obtained by removing the seasonal component from the original data. This is achieved by first performing a Seasonal and Trend decomposition using Loess (STL) and then applying the seasadj function. STL is a versatile and robust method for decomposing a time series: the stl function splits a series into seasonal, trend, and irregular components using loess. The trend window parameter (t.window) controls the wiggliness of the trend component and the seasonal window parameter (s.window) controls the variation in the seasonal component.

Results

plot(ukcars, col="grey", ylab="Production (Thousands of Cars)", xlab="Quarter", 
     main="Seasonally Adjusted Plastics")
ukcarsSeasAdj <- seasadj(stl(ukcars, t.window=NULL, s.window='periodic', robust=TRUE))
lines(ukcarsSeasAdj, col="red", ylab="Seasonally Adjusted")

Interpretation

Seasonally adjusting the data is straightforward once the time series has been decomposed using STL. The stl function’s t.window parameter is set to NULL and its s.window parameter is set to periodic so that the spans (in lags) of the loess windows for seasonal extraction are calculated using the defaults. The resulting plot shows the seasonally adjusted data as a red line. The seasonally adjusted series still contains the trend-cycle and remainder components, which is why it moves upward in a jagged pattern.

Exercise 7.3.c

Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. Then reseasonalize the forecasts. Record the parameters of the method and report the RMSE of the one-step forecasts from your method.

Approach

Holt (1957) extended simple exponential smoothing to allow forecasting of data with a trend by using a forecast equation and two smoothing equations (one for the level, with parameter \(\alpha\), and one for the trend, with parameter \(\beta\)). The forecasts generated by Holt’s method display a constant trend indefinitely into the future. Gardner and McKenzie (1985) introduced a parameter \(\phi \in (0,1)\) that “dampens” the trend so that it approaches a flat line some time in the future; if \(\phi=1\) the method is identical to Holt’s linear method. Forecasts are “reseasonalized” by adding the seasonal component back onto the forecasts of the seasonally adjusted series (one way to do this literally is sketched below); in the results that follow, the comparison is made by applying the same method to the original, unadjusted series. For STL, the t.window parameter is set to NULL and the s.window parameter to periodic so that the spans (in lags) of the loess windows for trend and seasonal extraction use the defaults. The holt() function with the damped=TRUE parameter forecasts using an additive damped trend method.
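
One way to reseasonalize literally is sketched here: forecast the seasonally adjusted series, then add the final year of the STL seasonal component back onto the forecasts. This is an illustrative sketch only (object names are ad hoc), shown for completeness alongside the comparison in the Results.

library(forecast); library(expsmooth)   # forecast provides holt() and seasadj(); expsmooth provides ukcars
fit_stl   <- stl(ukcars, s.window="periodic", robust=TRUE)
fc_adj    <- holt(seasadj(fit_stl), damped=TRUE, h=8)     # forecast the seasonally adjusted series
last_year <- tail(fit_stl$time.series[, "seasonal"], 4)   # seasonal pattern of the final year
fc_adj$mean + rep(last_year, length.out=8)                # reseasonalized point forecasts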

Results

fit <- stl(ukcars, t.window=NULL, s.window='periodic', robust=TRUE)
fcast <- holt(seasadj(fit), alpha=NULL, beta=NULL, damped=TRUE, initial='optimal', h=8)
round(fcast$model$par, 6)
##      alpha       beta        phi          l          b 
##   0.566601   0.000286   0.911712 346.086474  -9.758286
sqrt(fcast$model$mse)
## [1] 25.20318
fcast <- holt(ukcars, alpha=NULL, beta=NULL, damped=TRUE, initial='optimal', h=8)
round(fcast$model$par, 6)
##      alpha       beta        phi          l          b 
##   0.167283   0.031862   0.858071 363.184201 -20.710113
sqrt(fcast$model$mse)
## [1] 41.08101

Interpretation

Forecasting two years of the series using an additive damped trend method yields noticeably different results for the seasonally adjusted series versus the original, unadjusted series. For the seasonally adjusted fit, \(\alpha\) is larger, meaning more weight is given to recent observations; \(\beta\) is smaller, meaning the slope changes less over time; and \(\phi\) is larger, meaning the trend is dampened less. The unadjusted fit has a larger initial level \(l\) (363.2 versus 346.1) and a more steeply negative initial slope \(b\) (-20.7 versus -9.8). The RMSE of the seasonally adjusted fit (25.2) is roughly 40% smaller than that of the unadjusted fit (41.1), largely because the unadjusted series still contains seasonal variation that this non-seasonal method cannot capture.

Exercise 7.3.d

Forecast the next two years of the series using Holt’s linear method applied to the seasonally adjusted data. Then reseasonalize the forecasts. Record the parameters of the method and report the RMSE of the one-step forecasts from your method.

Approach

Holt (1957) extended simple exponential smoothing to allow forecasting of data with a trend by using a forecast equation and two smoothing equations (one for the level, with parameter \(\alpha\), and one for the trend, with parameter \(\beta\)). The forecasts generated by Holt’s method display a constant trend indefinitely into the future. Gardner and McKenzie (1985) introduced a parameter \(\phi \in (0,1)\) that “dampens” the trend so that it approaches a flat line some time in the future; if \(\phi=1\) the method is identical to Holt’s linear method. As in part (c), forecasts are “reseasonalized” by adding the seasonal component back onto the forecasts of the seasonally adjusted series, and the results below approximate this comparison by applying the same method to the original, unadjusted series. For STL, the t.window parameter is set to NULL and the s.window parameter to periodic so that the spans (in lags) of the loess windows for trend and seasonal extraction use the defaults. The holt() function with the damped=FALSE parameter forecasts using Holt’s linear trend method.

Results

fit <- stl(ukcars, t.window=NULL, s.window='periodic', robust=TRUE)
fcast <- holt(seasadj(fit), alpha=NULL, beta=NULL, damped=FALSE, initial='optimal', h=8)
round(fcast$model$par, 6)
##      alpha       beta          l          b 
##   0.601205   0.000100 343.385402   0.661666
sqrt(fcast$model$mse)
## [1] 25.39072
fcast <- holt(ukcars, alpha=NULL, beta=NULL, damped=FALSE, initial='optimal', h=8)
round(fcast$model$par, 6)
##      alpha       beta          l          b 
##   0.250529   0.021652 341.400896  -5.668736
sqrt(fcast$model$mse)
## [1] 42.17556

Interpretation

Forecasting two years of the series using Holt’s linear method likewise yields different results for the seasonally adjusted series versus the original, unadjusted series. For the seasonally adjusted fit, \(\alpha\) is larger, meaning more weight is given to recent observations, and \(\beta\) is smaller, meaning the slope changes less over time. The two initial levels \(l\) are about the same (343.4 versus 341.4) even though the \(\alpha\) values differ considerably, while the seasonally adjusted initial slope \(b\) is larger (0.66 versus -5.67). The RMSE of the seasonally adjusted fit (25.4) is roughly 40% smaller than that of the unadjusted fit (42.2).

Exercise 7.3.e

Now use ets() to choose a seasonal model for the data.

Approach

Exponential Smoothing (ETS) state space models are specified by three components: Error, Trend, and Seasonal. Models with multiplicative errors are useful when the data are strictly positive, but are not numerically stable when the data contain zeros or negative values. The possibilities for each component are:

POSSIBILITY                        ERROR   TREND   SEASONAL
\(N\) (None)                       NO      YES     YES
\(A\) (Additive)                   YES     YES     YES
\(A_d\) (Additive damped)          NO      YES     NO
\(M\) (Multiplicative)             YES     YES     YES
\(M_d\) (Multiplicative damped)    NO      YES     NO

There are therefore \((2)(5)(3)=30\) possible state space models. Four common state space models are:

  • ETS(A,N,N): Simple Exponential Smoothing with additive errors
  • ETS(M,N,N): Simple Exponential Smoothing with multiplicative errors
  • ETS(A,A,N): Holt’s linear method with additive errors
  • ETS(M,A,N): Holt’s linear method with multiplicative errors

If only the time series is specified in the ets() function, then all other arguments take their default values and an appropriate model is selected automatically. The model parameter takes a three-letter code in which “N” stands for none, “A” for additive, and “M” for multiplicative. The letter “Z” can be used to let the function select that component automatically according to the chosen information criterion (AICc, AIC, or BIC). The default value of “ZZZ” results in all components being selected using the information criterion.
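
As a minimal illustration of the automatic selection (not the code used in the Results below, which restricts the trend to be additive), calling ets() with only the series is equivalent to model="ZZZ":

library(forecast); library(expsmooth)   # forecast provides ets(); expsmooth provides ukcars
fit_auto <- ets(ukcars)   # same as ets(ukcars, model="ZZZ"); components chosen by AICc by default
summary(fit_auto)         # prints the selected ETS(.,.,.) form and its estimated parameters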

Results

fit <- ets(ukcars, model="ZAZ", damped=NULL)
fcast <- forecast(fit, h=8)
fcast$model
## ETS(A,Ad,A) 
## 
## Call:
##  ets(y = ukcars, model = "ZAZ", damped = NULL) 
## 
##   Smoothing parameters:
##     alpha = 0.5656 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9221 
## 
##   Initial states:
##     l = 344.5017 
##     b = -7.4634 
##     s=-0.64 -45.5093 20.8332 25.3161
## 
##   sigma:  25.1687
## 
##      AIC     AICc      BIC 
## 1283.181 1285.338 1310.455

Interpretation

Holding the trend additive while letting the function choose the error and seasonal components, the model parameter is set to "ZAZ", with damped=NULL so that the need for damping is also decided automatically. The letter “Z” lets the function select that component according to the information criterion, by default the AICc (the Akaike information criterion corrected for small sample sizes, which guards against overfitting). The model chosen by the function is ETS(A,Ad,A): additive error, additive damped trend, and additive seasonality. This is a sensible choice because, for these data, the absolute (additive) differences in the values are of more interest than the (multiplicative) percentage changes.

Exercise 7.3.f

Compare the RMSE of the fitted model with the RMSE of the model you obtained using an STL decomposition with Holt’s method. Which gives the better in-sample fits?

Approach

The model parameter in the ets() function takes a three-letter code; "AAA" specifies additive error, additive trend, and additive seasonality, and with damped=TRUE the fitted model is ETS(A,Ad,A). Holt (1957) extended simple exponential smoothing to allow forecasting of data with a trend by using a forecast equation and two smoothing equations (one for the level, with parameter \(\alpha\), and one for the trend, with parameter \(\beta\)). For the STL-based alternative, the t.window parameter is set to NULL and the s.window parameter to periodic so that the spans (in lags) of the loess windows for trend and seasonal extraction use the defaults, and the holt() function with damped=FALSE forecasts the seasonally adjusted series using Holt’s linear trend method. The two in-sample RMSEs are then compared.

Results

fit <- ets(ukcars, model="AAA", damped=TRUE)
fcast1 <- forecast(fit, h=8)
sqrt(fcast1$model$mse)
## [1] 25.16871
fit <- stl(ukcars, t.window=NULL, s.window='periodic', robust=TRUE)
fcast2 <- holt(seasadj(fit), alpha=NULL, beta=NULL, damped=FALSE, initial='optimal', h=8)
sqrt(fcast2$model$mse)
## [1] 25.39072

Interpretation

Both models give approximately the same RMSE. The RMSE of the ets() model is slightly smaller, but only by about 0.22 (roughly 0.9%), which does not appear to be a practically meaningful difference.
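
An equivalent way to read off the in-sample RMSE is accuracy(); the snippet below is a sketch that refits both models so it stands alone.

library(forecast); library(expsmooth)   # forecast provides ets(), holt(), accuracy(); expsmooth provides ukcars
fit_ets <- ets(ukcars, model="AAA", damped=TRUE)
fit_stl <- stl(ukcars, s.window="periodic", robust=TRUE)
accuracy(forecast(fit_ets, h=8))["Training set", "RMSE"]                      # ETS(A,Ad,A) in-sample RMSE
accuracy(holt(seasadj(fit_stl), damped=FALSE, h=8))["Training set", "RMSE"]   # STL + Holt in-sample RMSE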

Exercise 7.3.g

Compare the forecasts from the two approaches. Which seems most reasonable?

Approach

The model parameter in the ets() function takes a three-letter code; "AAA" specifies additive error, additive trend, and additive seasonality, and with damped=TRUE the fitted model is ETS(A,Ad,A). Holt (1957) extended simple exponential smoothing to allow forecasting of data with a trend by using a forecast equation and two smoothing equations (one for the level, with parameter \(\alpha\), and one for the trend, with parameter \(\beta\)). For the STL-based alternative, the t.window parameter is set to NULL and the s.window parameter to periodic so that the spans (in lags) of the loess windows for trend and seasonal extraction use the defaults, and the holt() function with damped=FALSE forecasts the seasonally adjusted series using Holt’s linear trend method. A Time Plot of each set of forecasts is used for the comparison.

Results

fit <- stl(ukcars, t.window=NULL, s.window='periodic', robust=TRUE)
fcast <- holt(seasadj(fit), alpha=NULL, beta=NULL, damped=FALSE, initial='optimal', h=8)
plot(fcast)

fit <- ets(ukcars, model="AAA", damped=TRUE)
fcast <- forecast(fit, h=8)
plot(fcast)

Interpretation

Holt’s linear method applied to the seasonally adjusted series produces forecasts that resemble a random walk with drift: a straight line with no seasonal pattern. The ETS(A,Ad,A) forecasts appear far more reasonable because they reproduce the seasonality that is clearly present in the historical data.
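
For a numerical side-by-side view of the two sets of point forecasts, the sketch below refits both models (so the snippet stands alone) and binds the forecasts into one table; column names are ad hoc.

library(forecast); library(expsmooth)   # forecast provides holt(), ets(); expsmooth provides ukcars
fc_stl <- holt(seasadj(stl(ukcars, s.window="periodic", robust=TRUE)), damped=FALSE, h=8)
fc_ets <- forecast(ets(ukcars, model="AAA", damped=TRUE), h=8)
cbind(STL_Holt=fc_stl$mean, ETS_AAdA=fc_ets$mean)   # point forecasts for the next eight quarters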

References

https://www.otexts.org/fpp/

https://rpubs.com/josezuniga/358601

https://rpubs.com/josezuniga/358602

https://www.statmethods.net/advgraphs/parameters.html

https://stats.idre.ucla.edu/r/codefragments/greek_letters/

https://cran.r-project.org/web/packages/forecast/forecast.pdf

https://www.reddit.com/r/RStudio/comments/6424cz/rstudio_plot/

https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/stl

https://www.rdocumentation.org/packages/forecast/versions/8.1/topics/ses

https://www.rdocumentation.org/packages/forecast/versions/8.1/topics/ets

https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/decompose

https://github.com/mldataanalysis/Time-Series-Solutions/blob/master/Exercise%207(2)%20-%20Answers.R