Exponential Smoothing

library(ggplot2)
library(tidyverse)
library(fpp3)

#Question 8.1 ##Example a

glimpse(aus_livestock)

## Rows: 29,364
## Columns: 4
## Key: Animal, State [54]
## $ Month  <mth> 1976 Jul, 1976 Aug, 1976 Sep, 1976 Oct, 1976 Nov, 1976 Dec, 197…
## $ Animal <fct> "Bulls, bullocks and steers", "Bulls, bullocks and steers", "Bu…
## $ State  <fct> Australian Capital Territory, Australian Capital Territory, Aus…
## $ Count  <dbl> 2300, 2100, 2100, 1900, 2100, 1800, 1800, 1900, 2700, 2300, 250…

pigs_slaughtered <- aus_livestock |> 
  filter(Animal == "Pigs", State == "Victoria")

pigs_slaughtered

fit <- pigs_slaughtered |>
  model(ANN = ETS(Count ~ error("A") + trend("N") + season("N")))

report(fit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

components(fit) |>
  left_join(fitted(fit), by = c("Animal", "Month", "State"))

fit |>
  forecast(h = 4) |>
  autoplot(pigs_slaughtered) +
  labs(y = "Number of Pigs Slaughtered", title = "SES Forecast: Pigs Slaughtered in Victoria")

##Example b

fc <- fit |> forecast(h = 4)

# Standard deviation of residuals
s <- fit |> residuals() |> pull(.resid) |> sd()

# predicted value one step ahead the last observed point
y_hat <- fc$.mean[1]

lower = y_hat - 1.96 * s
upper = y_hat + 1.96 * s

# high/low function
fc |> hilo(level = 95)

The manual 95% prediction interval, calculated as the first forecast value plus or minus 1.96 times the standard deviation of the residuals, produces an interval that is very similar to the one produced by R using the hilo() function. The slight difference arises because R uses the model’s analytical variance formula which accounts for accumulated uncertainty in the smoothed level, whereas the manual approach treats the residual standard deviation as a fixed estimate of forecast error. For a one-step-ahead forecast the difference is small, but R’s interval would grow relatively wider for longer forecast horizons.

#Question 8.5 ##Example a

# filtering Country data for Grenada
grenada_economy <- global_economy |>
  filter(Country == "Grenada", !is.na(Exports))
  
  
grenada_economy |>
  autoplot(Exports) +
  labs(y = "% of GDP", x = "Year", title = "Grenada: Exports (% of GDP)")

##Example b

fit_ann <- grenada_economy |>
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))

report(fit_ann)

## Series: Exports 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9585012 
## 
##   Initial states:
##      l[0]
##  37.79623
## 
##   sigma^2:  45.3141
## 
##      AIC     AICc      BIC 
## 304.0482 304.7149 309.1148

fit_ann |>
  forecast(h = 5) |>
  autoplot(grenada_economy, level = c(80, 95)) +
  labs(y = "% of GDP", x = "Year", title = "Grenada Exports: ETS(A,N,N) Forecast") +
  coord_cartesian(xlim = c(1960, 2025))

##Example c

# ANN
fit_ann |> accuracy() |> select(.model, RMSE)

##Example d

# AAN
fit_aan <- grenada_economy |>
  model(AAN = ETS(Exports ~ error("A") + trend("A") + season("N")))

report(fit_aan)

## Series: Exports 
## Model: ETS(A,A,N) 
##   Smoothing parameters:
##     alpha = 0.9515183 
##     beta  = 0.0001000297 
## 
##   Initial states:
##      l[0]      b[0]
##  36.73794 0.4642376
## 
##   sigma^2:  47.5886
## 
##      AIC     AICc      BIC 
## 307.8445 309.6092 316.2889

fit_aan |> accuracy() |> select(.model, RMSE)

Comparing the two models, the ETS(A,N,N) model produces an RMSE of 6.561 while the ETS(A,A,N) model produces a marginally lower RMSE of 6.544. However, this negligible difference does not justify the additional parameter introduced by the trended model. Notably, the estimated beta parameter in the ETS(A,A,N) model is essentially zero (beta is approximately 0.0001), indicating that the trend component contributes almost nothing to the model. The AAN model effectively collapses into a simple exponential smoothing model. This is consistent with the visual inspection of Grenada’s export data, which shows high volatility with no clear sustained trend. The ETS(A,N,N) model is therefore preferred, as it achieves near-identical forecast accuracy with one fewer parameter.

##Example e

# Compare forecasts visually
grenada_economy |>
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
  ) |>
  forecast(h = 5) |>
  autoplot(grenada_economy, level = NULL) +
  labs(y = "% of GDP", x = "Year", title = "Grenada Exports: ANN vs AAN Forecasts") +
  coord_cartesian(xlim = c(1960, 2025))

The forecasts from both models are nearly identical, which is consistent with the estimated beta parameter being essentially zero in the ETS(A,A,N) model. The ANN model produces a flat forecast at approximately 55% of GDP, while the AAN model shows a very slight upward trend — though the difference is negligible. Given the high volatility and lack of a sustained trend in Grenada’s export data, the flat forecast from the ETS(A,N,N) model is the more credible of the two. Adding a trend component that is barely distinguishable from zero does not improve the forecast in any meaningful way. Therefore, the ETS(A,N,N) model is preferred as the better forecasting method for this data.

##Example f

#Manual 95% PI using RMSE vs R's intervals
fc_ann <- fit_ann |> forecast(h = 5)
fc_aan <- fit_aan |> forecast(h = 5)

rmse_ann <- fit_ann |> accuracy() |> pull(RMSE)
rmse_aan <- fit_aan |> accuracy() |> pull(RMSE)

# First forecast point estimates
y_ann <- fc_ann$.mean[1]
y_aan <- fc_aan$.mean[1]

# R's 95% PI
fc_ann |> hilo(level = 95) |> slice(1)

fc_aan |> hilo(level = 95) |> slice(1)

The manual 95% prediction intervals, calculated using the RMSE and assuming normal errors, are [43.05, 68.77] for the ETS(A,N,N) model and [43.58, 69.24] for the ETS(A,A,N) model. R’s intervals, from the fitted distribution, are [42.86, 69.14] and [42.82, 69.58], respectively. The intervals are very similar but not identical — R’s intervals are slightly wider because they use the model’s estimated error variance rather than the training RMSE, and account for parameter uncertainty. Both approaches confirm that the two models produce comparable forecast uncertainty, consistent with the near-zero trend parameter in the AAN model.

#Question 8.6

china_economy <- global_economy |>
  filter(Country == "China")

# Plot the data first
china_economy |>
  autoplot(GDP) +
  labs(y = "GDP (USD)", x = "Year", title = "China: GDP")

# Fit multiple models to compare
fit_china <- china_economy |>
  model(
    AAN       = ETS(GDP ~ error("A") + trend("A") + season("N")),
    AAN_damped = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    AAN_bc    = ETS(box_cox(GDP, 0.3) ~ error("A") + trend("A") + season("N")),
    AAN_bc_damped = ETS(box_cox(GDP, 0.3) ~ error("A") + trend("Ad") + season("N"))
  )

# Forecast with large h to see differences clearly
fit_china |>
  forecast(h = 20) |>
  autoplot(china_economy, level = NULL) +
  labs(y = "GDP (USD)", x = "Year", title = "China GDP: ETS Model Comparison")

China’s GDP shows a classic exponential growth pattern, remaining relatively flat until the mid-1990s before accelerating sharply through the 2000s. Given this strongly curved, non-linear growth, the choice of ETS model and transformation has a significant impact on the forecasts. The standard ETS(A,A,N) model (red) projects strong but linear growth, extrapolating the recent trend forward. The damped trend model (purple) softens this by gradually flattening the trend over the forecast horizon, producing a more conservative outlook — this is often more realistic as economies rarely sustain peak growth rates indefinitely. The Box-Cox transformation (green) surprises here, actually producing the most aggressive forecast, as it fits the exponential curvature of the historical data and projects that curve forward. The damped Box-Cox model (cyan) moderates this somewhat, combining the curvature adjustment with trend dampening. For a series like China’s GDP with clear exponential growth, the Box-Cox transformation is arguably most appropriate as it accounts for the multiplicative nature of economic growth, while adding a damped trend provides a sensible guard against over-extrapolation into the long-run forecast horizon.

#Question 8.7

aus_production |>
  autoplot(Gas) +
  labs(y = "Petajoules", x = "Year", title = "Australian Gas Production")

fit_gas <- aus_production |>
  model(
    additive = ETS(Gas ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
    damped = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )

fit_gas |>
  forecast(h = 20) |>
  autoplot(aus_production |> filter(year(Quarter) >= 2000), level = NULL) +
  labs(y = "Petajoules", x = "Year", title = "Australian Gas Production: ETS Forecasts")

The Australian Gas production series displays a clear pattern of increasing seasonal amplitude with the size of the quarterly fluctuations growing proportionally and the overall level of production increasing. This is precisely why multiplicative seasonality is necessary. An additive model assumes constant seasonal swings regardless of the series level, which would be inappropriate here. Looking at the zoomed forecast plot, all three models produce very similar forecasts, with the additive (red) and multiplicative (blue) models nearly indistinguishable. The damped model (green) shows a slightly more conservative forecast, with the seasonal peaks and troughs gradually flattening over the horizon. However, given the strong and consistent upward trend in gas production throughout the historical data, dampening the trend may actually underforecast. The undamped multiplicative model appears most appropriate here as it best reflects the sustained growth pattern observed in the data.

#Question 8.8 ##Example a

myseries <- aus_retail |>
  filter(`Series ID` == "A3349872X")

myseries |>
  autoplot(Turnover) +
  labs(y = "Turnover ($ Million)", x = "Year", 
       title = "NSW: Other Retailing Turnover")

Multiplicative seasonality is necessary for this series because the seasonal fluctuations grow proportionally with the level of turnover over time. In the early 1980s, the December spike (driven by Christmas retail spending) is relatively small in absolute terms, whereas by 2020 the same seasonal peak reaches over $1,300 million. The amplitude of the seasonal swings is not constant — it scales with the overall level of the series. An additive seasonal model would assume fixed seasonal effects regardless of the level, which would underestimate peaks and overestimate troughs as turnover grows. Multiplicative seasonality correctly captures this proportional relationship between the seasonal pattern and the trend level.

##Example b

fit_retail <- myseries |>
  model(
    hw_mult  = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    hw_damped = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )

fit_retail |>
  forecast(h = 36) |>
  autoplot(myseries |> filter(year(Month) >= 2010), level = NULL) +
  labs(y = "Turnover ($ Million)", x = "Year",
       title = "NSW Retail: Holt-Winters Multiplicative Forecasts")

Both the standard Holt-Winters multiplicative model and the damped trend variant capture the strong seasonal pattern well, correctly scaling the December peaks proportionally with the trend level. The undamped model (cyan) projects a steeper upward trend, forecasting turnover reaching around $1,600 million by the end of the horizon. The damped model (red) is more conservative, gradually tempering the trend growth over time. Given the consistent and strong upward trend observed throughout the historical data, the undamped model may be more appropriate here — the series shows little sign of the trend slowing, so dampening could lead to underforecasting.

##Example c

fit_retail |> accuracy() |> select(.model, RMSE)

The standard Holt-Winters multiplicative model hw_mult produces a slightly lower RMSE of 22.86 compared to the damped model’s RMSE of 23.22. While the difference is small, combined with the visual evidence from the forecast plot showing a strong and sustained upward trend in the data, the undamped Holt-Winters multiplicative model is preferred. The damped trend does not improve one-step forecast accuracy here, and given the consistent historical growth pattern, dampening the trend is unnecessary and likely to underforecast over longer horizons.

##Example d

augment(fit_retail |> select(hw_mult)) |>
  ggplot(aes(x = Month, y = .innov)) +
  geom_line() +
  labs(title = "Innovation Residuals", x = "Month", y = "Residuals")

augment(fit_retail |> select(hw_mult)) |>
  ACF(.innov) |>
  autoplot() +
  labs(title = "ACF of Residuals")

augment(fit_retail |> select(hw_mult)) |>
  ggplot(aes(x = .innov)) +
  geom_histogram(bins = 30) +
  labs(title = "Histogram of Residuals", x = "Residuals")

# Formal white noise test
fit_retail |>
  select(hw_mult) |>
  augment() |>
  features(.innov, ljung_box, lag = 24, dof = 3)

The residual plots show mostly encouraging signs — the innovation residuals are centered around zero with no obvious pattern or trend over time, and the histogram is approximately bell-shaped. However, the ACF plot shows several significant spikes exceeding the blue bounds, particularly around lags 6 and 12, suggesting some remaining autocorrelation in the residuals. This is confirmed by the Ljung-Box test, which returns a p-value of essentially zero (p < 0.001), strongly rejecting the null hypothesis of white noise. While the model captures the broad trend and seasonal pattern well, there is still some structure in the residuals that the model has not fully accounted for. The forecasts are still useful, but the prediction intervals may not be reliable due to this remaining autocorrelation.

##Example e

# Training data to end of 2010
myseries_train <- myseries |>
  filter(year(Month) <= 2010)

# Fit HW multiplicative and seasonal naive on training set
fit_test <- myseries_train |>
  model(
    hw_mult = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    snaive  = SNAIVE(Turnover)
  )

# Forecast over test period
fc_test <- fit_test |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

# Compare test set RMSE
fc_test |> accuracy(myseries) |> select(.model, RMSE)

Training the models to the end of 2010 and forecasting over the remaining test period, the Holt-Winters multiplicative model achieves a test set RMSE of 165.41, substantially outperforming the seasonal naïve benchmark which produces an RMSE of 283.27 — a improvement of roughly 42%. This is expected given the strong upward trend in the NSW retail turnover series. The seasonal naïve method only repeats last year’s values, so it fails to account for the continued growth in turnover over the test period, leading to systematic underforecasting. The Holt-Winters multiplicative model, by contrast, captures both the trend and the proportionally growing seasonal pattern, producing much more accurate forecasts over the long test horizon. Therefore, the Holt-Winters multiplicative method clearly beats the seasonal naïve approach for this series.

#Question 8.9

# Find optimal Box-Cox lambda
lambda <- myseries_train |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

# STL + ETS on Box-Cox transformed series
fit_stl <- myseries_train |>
  model(
    stl_ets = decomposition_model(
      STL(box_cox(Turnover, lambda)),
      ETS(season_adjust)
    )
  )

# Forecast over test period
fc_stl <- fit_stl |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

# Compare test set RMSE
bind_rows(
  fc_test |> accuracy(myseries) |> select(.model, RMSE),
  fc_stl  |> accuracy(myseries) |> select(.model, RMSE)
)

The STL decomposition with ETS applied to the Box-Cox transformed series substantially outperforms both previous methods, achieving a test set RMSE of 89.32 compared to 165.41 for the Holt-Winters multiplicative model and 283.27 for the seasonal naïve approach. This represents a 46% improvement over the best previous method. The combination of three techniques works together effectively — the Box-Cox transformation stabilises the growing variance in the series, the STL decomposition robustly handles the complex seasonal pattern, and ETS then models the trend in the seasonally adjusted data. This approach is more flexible than Holt-Winters because STL allows the seasonal component to change over time and is less sensitive to outliers, while the Box-Cox transformation ensures the error variance is more consistent across the full range of the series. Overall, the STL-ETS pipeline with Box-Cox transformation is clearly the best forecasting method for this retail series.

Exponential Smoothing

Kerwin Charles