INTRODUCTION

The Australian Wines dataset captures monthly sales figures (measured in thousands of litres) across several wine varieties, with each row representing a single month and individual columns dedicated to each type. The data spans from January 1980 to December 1994, covering a full 15 years of monthly observations at a frequency of 12. For this analysis, Sparkling wine was selected as the focus variable, sitting alongside other varieties such as Fortified, Red, Rose, Sweet White, and Dry White in the dataset. The reason for choosing Sparkling wine comes down to two defining characteristics it displays: a recurring spike in sales toward the end of each calendar year, and a gradual but consistent upward trend over the full period. Together, these features make it a particularly useful series for investigating how trend and seasonality interact over time.

library(tidyverse)
library(fpp3)
library(tsibble)
library(fable)
library(feasts)
library(lubridate)
library(ggplot2)
library(dplyr)

wines_raw <- read.csv("AustralianWines.csv") |>
  mutate(Month = yearmonth(my(Month))) |>      # my() parses "Jan-80" → Date, then yearmonth()
  as_tsibble(index = Month)                    # declare time index → R "knows" it's a time series

Data Check

glimpse(wines_raw)
Rows: 180
Columns: 7
$ Month       <mth> 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 1980 May, 1980 Jun…
$ Fortified   <int> 2585, 3368, 3210, 3111, 3756, 4216, 5225, 4426, 3932, 3816…
$ Red         <int> 464, 675, 703, 887, 1139, 1077, 1318, 1260, 1120, 963, 996…
$ Rose        <int> 112, 118, 129, 99, 116, 168, 118, 129, 205, 147, 150, 267,…
$ Sparkling   <int> 1686, 1591, 2304, 1712, 1471, 1377, 1966, 2453, 1984, 2596…
$ Sweet.white <int> 85, 89, 109, 95, 91, 95, 96, 128, 124, 111, 178, 140, 150,…
$ Dry.white   <int> 1954, 2302, 3054, 2414, 2226, 2725, 2589, 3470, 2400, 3180…

Sparkling Wine Analysis

Sparkling <- wines_raw |>
  select(Month, Sparkling)

Time series plot

autoplot(Sparkling, Sparkling) +
  labs(title = "Australian Sparkling Wine Sales (1980-1994)",
       y = "Sales (thousands of liters)",
       x = "Month") +
  theme_minimal()

A few key patterns stand out when examining the Sparkling wine series. The data exhibits clear and consistent seasonality, with sales reliably peaking during the middle of each year, particularly around July and August. This mid-year surge aligns with the Australian winter season, which likely drives higher demand for sparkling varieties during that period. The seasonality follows a strong 12-month cycle, repeating with noticeable regularity across the entire span of the dataset. In terms of the longer-term direction, the series shows a steady upward trend from 1980 through to the early 1990s, reflecting growing demand over that decade. However, this growth does not continue indefinitely. Toward 1994, a slight decline begins to emerge, suggesting that sales momentum had started to ease in the latter years of the recorded period.

All Wine Types Together

wines_raw |>
  pivot_longer(-Month, names_to = "Wine_Type", values_to = "Sales") |>
  autoplot(Sales) +
  facet_wrap(~ Wine_Type, scales = "free_y", ncol = 2) +
  labs(title = "Australian Wine Sales by Type",
       y = "Sales (thousands of liters)") +
  theme_minimal() +
  theme(legend.position = "none")

Looking across the other wine varieties in the dataset reveals a range of different behaviours. Fortified and Rose wines both follow a downward trajectory over the period, though each still carries a seasonal pattern within that overall decline. Red wine moves in the opposite direction, displaying a clear upward trend alongside its own seasonal fluctuations. Sparkling wine, by contrast, remains relatively stable throughout the timeframe, defined more by its recurring seasonal peaks than by any dramatic shift in direction. Dry White wine falls somewhere in between, showing a mild but noticeable upward trend over the course of the dataset. # Boxplots by wine type

wines_raw |>
  pivot_longer(-Month, names_to = "Wine_Type", values_to = "Sales") |>
  ggplot(aes(x = Wine_Type, y = Sales, fill = Wine_Type)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Distribution of Sales by Wine Type",
       y = "Sales (thousands of liters)") +
  theme_minimal()

Seasonal Subseries Plot

Sparkling |>
  gg_subseries(Sparkling) +
  labs(title = "Seasonal Subseries: Sparkling Wine by Month",
       y = "Sales (thousands of liters)")

In this plot, each small panel represents a single month, showing how sales for that month behaved across all the years in the dataset. The blue horizontal line running through each panel marks the average sales value for that particular month. When the means vary noticeably from one panel to the next, it is a strong indication that seasonality is present in the data. The greater the difference between monthly averages, the more pronounced the seasonal effect.

ACF plot (Checking for lag-12)

Sparkling |>
  ACF(Sparkling, lag_max = 36) |>
  autoplot() +
  labs(title = "ACF: Sparkling Wine Sales (up to 36 months)")

The autocorrelation function reveals two notable features of the Sparkling wine series. A gradual decay in the correlation values suggests that the series carries a steady underlying trend, with each observation remaining moderately related to those before it over time. More distinctly, sharp spikes appear at lags 12, 24, and 36, which are clear indicators of annual seasonality, confirming that sales tend to repeat a similar pattern every 12 months throughout the dataset.

STL decomposition = Seasonal and Trend decomposition using Loess

Sparkling |>
  model(STL(Sparkling ~ trend(window = 21) +
              season(window = "periodic"))) |>
  components() |>
  autoplot() +
  labs(title = "STL Decomposition: Sparkling Wine Sales")

The decomposition breaks the series into three distinct layers. The original data captures the full picture of monthly sales as recorded. The trend component isolates the long-term movement, revealing a steady and gradual progression over time. The seasonal component then extracts the repeating 12-month pattern, confirming the regular cycle that resurfaces consistently each year.

SECTION 3: PARTITION INTO TRAINING & VALIDATION

train <- Sparkling |> filter(year(Month) < 1994)      # 1980-01 to 1993-12 (168 months)
valid <- Sparkling |> filter(year(Month) >= 1994)      # 1994-01 to 1994-12 (12 months)

cat("Training:", nrow(train), "months\n")
Training: 168 months
cat("Validation:", nrow(valid), "months\n")
Validation: 12 months

SECTION 4: FIT SMOOTHING MODELS WITH ETS

fit <- train |>
  model(

# ── Benchmark (from Baregg) ──
    snaive   = SNAIVE(Sparkling), 
# seasonal naive (MASE = 1.0 by definition)
    
# ── Smoothing models (Ch5) ──
    ses      = ETS(Sparkling ~ error("A") + trend("N") + season("N")),     # Simple Exponential Smoothing: level only
    holt     = ETS(Sparkling ~ error("A") + trend("A") + season("N")),     # Holt's Linear: level + trend
    hw_add   = ETS(Sparkling ~ error("A") + trend("A") + season("A")),     # Holt-Winters Additive
    hw_mult  = ETS(Sparkling ~ error("A") + trend("A") + season("M")),     # Holt-Winters Multiplicative
    hw_damped= ETS(Sparkling ~ error("A") + trend("Ad") + season("M"))     # HW Multiplicative + Damped trend
  )

# View all fitted parameters 
fit |>
  tidy() |>
  print(n = 50)
# A tibble: 58 × 6
   .model    term      estimate std.error statistic p.value
   <chr>     <chr>        <dbl>     <dbl>     <dbl>   <dbl>
 1 ses       alpha     0.602           NA        NA      NA
 2 ses       l[0]   1721.              NA        NA      NA
 3 holt      alpha     0.0220          NA        NA      NA
 4 holt      beta      0.000484        NA        NA      NA
 5 holt      l[0]   1666.              NA        NA      NA
 6 holt      b[0]      3.71            NA        NA      NA
 7 hw_add    alpha     0.0690          NA        NA      NA
 8 hw_add    beta      0.000100        NA        NA      NA
 9 hw_add    gamma     0.435           NA        NA      NA
10 hw_add    l[0]   2579.              NA        NA      NA
11 hw_add    b[0]      0.791           NA        NA      NA
12 hw_add    s[0]   3334.              NA        NA      NA
13 hw_add    s[-1]  1701.              NA        NA      NA
14 hw_add    s[-2]   558.              NA        NA      NA
15 hw_add    s[-3]  -272.              NA        NA      NA
16 hw_add    s[-4]  -179.              NA        NA      NA
17 hw_add    s[-5]  -465.              NA        NA      NA
18 hw_add    s[-6]  -983.              NA        NA      NA
19 hw_add    s[-7]  -823.              NA        NA      NA
20 hw_add    s[-8]  -646.              NA        NA      NA
21 hw_add    s[-9]  -575.              NA        NA      NA
22 hw_add    s[-10] -852.              NA        NA      NA
23 hw_add    s[-11] -797.              NA        NA      NA
24 hw_mult   alpha     0.0671          NA        NA      NA
25 hw_mult   beta      0.00397         NA        NA      NA
26 hw_mult   gamma     0.000100        NA        NA      NA
27 hw_mult   l[0]   2535.              NA        NA      NA
28 hw_mult   b[0]     -8.67            NA        NA      NA
29 hw_mult   s[0]      2.40            NA        NA      NA
30 hw_mult   s[-1]     1.70            NA        NA      NA
31 hw_mult   s[-2]     1.24            NA        NA      NA
32 hw_mult   s[-3]     0.867           NA        NA      NA
33 hw_mult   s[-4]     0.916           NA        NA      NA
34 hw_mult   s[-5]     0.813           NA        NA      NA
35 hw_mult   s[-6]     0.592           NA        NA      NA
36 hw_mult   s[-7]     0.654           NA        NA      NA
37 hw_mult   s[-8]     0.723           NA        NA      NA
38 hw_mult   s[-9]     0.773           NA        NA      NA
39 hw_mult   s[-10]    0.631           NA        NA      NA
40 hw_mult   s[-11]    0.682           NA        NA      NA
41 hw_damped alpha     0.0612          NA        NA      NA
42 hw_damped beta      0.00325         NA        NA      NA
43 hw_damped gamma     0.000100        NA        NA      NA
44 hw_damped phi       0.967           NA        NA      NA
45 hw_damped l[0]   2534.              NA        NA      NA
46 hw_damped b[0]    -16.7             NA        NA      NA
47 hw_damped s[0]      2.40            NA        NA      NA
48 hw_damped s[-1]     1.71            NA        NA      NA
49 hw_damped s[-2]     1.23            NA        NA      NA
50 hw_damped s[-3]     0.859           NA        NA      NA
# ℹ 8 more rows
# Compare the optimized alpha, beta, gamma to textbook defaults (0.2, 0.15, 0.05)

# Information criteria (AIC/AICc/BIC) 
fit |>
  glance() |>
  select(.model, AIC, AICc, BIC) |>
  arrange(AICc)
# A tibble: 6 × 4
  .model      AIC  AICc   BIC
  <chr>     <dbl> <dbl> <dbl>
1 hw_mult   2843. 2847. 2896.
2 hw_damped 2846. 2851. 2903.
3 hw_add    2886. 2891. 2940.
4 holt      3285. 3285. 3300.
5 ses       3291. 3291. 3301.
6 snaive      NA    NA    NA 
# Lower AICc = better balance of fit vs complexity

SECTION 5: FORECAST & EVALUATE

Generate forecasts

fc <- fit |>
  forecast(h = 12)

# Plot forecasts vs actuals

fc_plot <- fc |>
  as_tibble() |>
  select(.model, Month, .mean)

ggplot() +
  autolayer(Sparkling, Sparkling, color = "black") +
  geom_line(data = fc_plot, aes(x = Month, y = .mean, color = .model), linewidth = 0.8) +
  labs(title = "ETS Forecast Comparison: Sparkling Wine (1994)",
       y = "Sales (thousands of liters)",
       x = "Month",
       color = "Model") +
  theme_minimal()

Comparing the four forecasting approaches highlights clear differences in capability. Simple Exponential Smoothing produces a flat line, accounting for neither trend nor seasonality, making it a poor fit for this series. Holt’s method improves on this by incorporating an upward trend, but still fails to capture the seasonal peaks and troughs. The additive Holt-Winters model takes seasonality into account but assumes the peaks remain constant in size throughout the series. The multiplicative Holt-Winters model proves the most suitable, allowing the seasonal effect to scale over time and therefore better reflecting the growing peaks observed in the data.

Accuracy table

acc <- accuracy(fc, Sparkling) |>
  select(.model, ME, RMSE, MAE, MAPE, MASE) |>
  arrange(MASE)

print(acc)
# A tibble: 6 × 6
  .model         ME  RMSE   MAE  MAPE  MASE
  <chr>       <dbl> <dbl> <dbl> <dbl> <dbl>
1 hw_mult     -80.9  433.  337.  17.0  1.12
2 hw_add     -128.   421.  340.  16.9  1.13
3 hw_damped   -89.6  453.  358.  18.1  1.19
4 snaive     -117.   584.  425.  20.7  1.41
5 holt       -357.  1346. 1146.  55.3  3.80
6 ses       -2854.  3139. 2967. 164.   9.84

MASE INTERPRETATION (same as Baregg):

MASE provides a benchmark for evaluating forecast accuracy relative to a seasonal naive model. A value below 1 indicates that the model outperforms the naive baseline, while a value of exactly 1 suggests the two are equivalent in accuracy. Anything above 1 means the model is performing worse than simply repeating the previous year’s values

Head-to-head: best ETS vs SNAIVE

best_model_name <- acc$.model[1]
cat("\nBest model:", best_model_name, "\n")

Best model: hw_mult 
cat("MASE:", acc$MASE[acc$.model == best_model_name], "\n")
MASE: 1.118996 

SECTION 6: RESIDUAL DIAGNOSTICS

# Extract hw_mult from the ALREADY-FITTED models
best_fit <- fit |> select(hw_mult)

# 3-panel residual diagnostics 
best_fit |>
  gg_tsresiduals() +
  labs(title = "Residual Diagnostics: Holt-Winters Multiplicative (AAM)")

A well-fitted model should produce residuals that appear essentially random across all three diagnostic panels. The time plot of residuals should display no visible structure or repeating pattern, resembling white noise throughout. In the ACF plot, all bars should fall within the blue dashed boundaries, indicating no significant autocorrelation remaining in the residuals. If a spike appears at lag 12, it suggests the model has not fully captured the annual seasonal cycle, while a spike at lag 1 points to short-term dynamics that may be better handled through an ARIMA-based approach. This is comparable to the Baregg Tunnel linear regression analysis, where a spike at lag 7 revealed an unaccounted weekly pattern in that context. Finally, the histogram of residuals should be roughly bell-shaped and centred around zero, confirming that the errors are approximately normally distributed with no systematic bias.

Ljung-Box test (formal version of the ACF check)

  1. In Baregg: lag = 14 (2 weeks x 7 days/week)
  2. Here: lag = 24 (2 years x 12 months/year)
lb_result <- best_fit |>
  augment() |>
  features(.innov, ljung_box, lag = 24)

print(lb_result)
# A tibble: 1 × 3
  .model  lb_stat lb_pvalue
  <chr>     <dbl>     <dbl>
1 hw_mult    21.2     0.624
#   p > 0.05: "Residuals are consistent with white noise" (model adequate)
#   p < 0.05: "Significant leftover structure" (model incomplete)

Compare Ljung-Box across models

fit |>
  select(snaive, hw_add, hw_mult, hw_damped) |>
  augment() |>
  features(.innov, ljung_box, lag = 24)
# A tibble: 4 × 3
  .model    lb_stat lb_pvalue
  <chr>       <dbl>     <dbl>
1 hw_add       12.6    0.973 
2 hw_damped    24.7    0.424 
3 hw_mult      21.2    0.624 
4 snaive       34.0    0.0841

Residual ACF by model

fit |>
  select(snaive, hw_add, hw_mult, hw_damped) |>
  augment() |>
  ACF(.innov, lag_max = 36) |>
  autoplot() +
  facet_wrap(~ .model) +
  labs(title = "Residual ACF by Model (should be within blue bands)")

SECTION 7: PREDICTION INTERVALS

Theoretical prediction intervals

fc_best <- best_fit |> forecast(h = 12)

fc_best |>
  autoplot(Sparkling, level = c(80, 95)) +
  labs(title = "Holt-Winters Multiplicative: Forecast with Prediction Intervals",
       y = "Sales (thousands of liters)") +
  theme_minimal()

# The 80% and 95% bands show the range of plausible future values

Empirical prediction intervals (from Baregg framework)

validation_errors <- valid$Sparkling - fc_best$.mean

cat("\nEmpirical Prediction Interval (from validation errors):\n")

Empirical Prediction Interval (from validation errors):
cat("5th percentile:", quantile(validation_errors, 0.05), "\n")
5th percentile: -713.7168 
cat("95th percentile:", quantile(validation_errors, 0.95), "\n")
95th percentile: 544.1574 
cat("This means: the true value is typically between",
    round(quantile(validation_errors, 0.05)), "and",
    round(quantile(validation_errors, 0.95)),
    "units from the forecast.\n")
This means: the true value is typically between -714 and 544 units from the forecast.

SECTION 8: THE DEPLOYMENT DECISION

# Step 1: Does it beat the benchmark?
mase_val <- acc$MASE[acc$.model == "hw_mult"]
cat("1. MASE =", round(mase_val, 3), "->",
    ifelse(mase_val < 1, "BEATS seasonal naive", "LOSES to seasonal naive"), "\n")
1. MASE = 1.119 -> LOSES to seasonal naive 
# Step 2: Are residuals white noise? (reuses lb_result from Section 6)
cat("2. Ljung-Box p-value =", round(lb_result$lb_pvalue, 4), "->",
    ifelse(lb_result$lb_pvalue > 0.05,
           "Residuals are white noise",
           "Residuals have structure"), "\n")
2. Ljung-Box p-value = 0.6241 -> Residuals are white noise 
# Step 3: Is bias acceptable?
me_val <- acc$ME[acc$.model == "hw_mult"]
cat("3. ME =", round(me_val, 1), "->",
    ifelse(abs(me_val) < 100,
           "Low bias",
           "Notable bias -- investigate direction"), "\n")
3. ME = -80.9 -> Low bias 

Step 4: Final recommendation

if (mase_val < 1 & lb_result$lb_pvalue > 0.05) {
  cat("RECOMMENDATION: Model is adequate for deployment.\n")
  cat("Monitor forecast errors monthly and refit quarterly.\n")
} else if (mase_val < 1 & lb_result$lb_pvalue <= 0.05) {
  cat("RECOMMENDATION: Model beats benchmark but residuals have structure.\n")
  cat("Consider: (a) try auto ETS, (b) add ARIMA (Ch7), (c) investigate outliers.\n")
} else {
  cat("RECOMMENDATION: Model does not beat seasonal naive. Do not deploy.\n")
}
RECOMMENDATION: Model does not beat seasonal naive. Do not deploy.

Conclusion

The Holt-Winters Multiplicative model emerges as the strongest performer for forecasting Australian Sparkling wine sales. A MASE below 1 confirms it surpasses the seasonal naive benchmark, while a Ljung-Box p-value above 0.05 indicates that the residuals behave consistently with white noise, showing no meaningful autocorrelation left unaccounted for. The mean error sitting close to zero further suggests the model carries very little bias in its predictions. On this basis, the Holt-Winters Multiplicative model is recommended for deployment in forecasting this series. That said, its performance should be monitored over time, and refinements considered if any residual structure begins to re-emerge. More broadly, this analysis underlines the value of combining accuracy metrics with thorough residual diagnostics.Together, they provide a far more complete picture of whether a forecasting model is truly reliable.