Time Series Analysis of Monthly Beer Production in Australia

Author

Marek Holík

1 Introduction

A dataset with monthly beer production in Australia was selected for time series analysis. The dataset contains data on monthly beer production in million of liters from January 1956 to August 1995. The aim of this analysis is to identify trends, seasonality, and cyclical components, as well as to predict future beer production values. Dataset source: Kaggle - Time series datasets.

Used Libraries
library(ggplot2)
library(lubridate)
library(dplyr)
library(forecast)
library(splines)
library(tseries)
library(lmtest)
library(knitr)

2 Data Preparation and Exploration

Firstly we need to perform data preparation, which includes loading the data, transforming it into a suitable format, and performing basic exploratory data analysis. Dataset contains 476 rows and 2 columns.

Extract, Load and Transform Data
data <- read.csv("monthly-beer-prod.csv")
data$Month <- as.Date(paste0(data$Month, "-01"), format = "%Y-%m-%d")
data$Monthly.beer.production <- as.numeric(data$Monthly.beer.production)
data$Time <- seq(1, nrow(data))
data <- data[, c("Time", setdiff(names(data), "Time"))]
Exploratory Data Analysis
stats <- data.frame(
  Minimum = min(data$Monthly.beer.production),
  Maximum = max(data$Monthly.beer.production),
  Median = median(data$Monthly.beer.production),
  Mean = mean(data$Monthly.beer.production),
  SD = sd(data$Monthly.beer.production)
)

stats_text <- paste(
  sprintf("Min:%6.1f", stats$Minimum),
  sprintf("Max:%6.1f", stats$Maximum),
  sprintf("Median:%6.1f", stats$Median),
  sprintf("Mean:%6.1f", stats$Mean),
  sprintf("SD:%6.1f", stats$SD),
  sep = "\n")

ggplot(data, aes(x = Month, y = Monthly.beer.production)) +
  geom_line(color = "steelblue", linewidth = 1) +
  labs(
    title = "Monthly Beer Production in Australia",
    x = "Month",
    y = "Monthly Beer Production"
  ) +
  theme_minimal() + 
    annotate(
    "rect",
    xmin = min(data$Month), xmax = min(data$Month) + 150,  # ~5 months to the right from start
    ymin = max(data$Monthly.beer.production)*0.6, ymax = max(data$Monthly.beer.production)*1.05,
    alpha = 0.15, fill = "white"
  ) +
  annotate(
    "text",
    x = min(data$Month) + 10,  # slightly right from the left edge
    y = max(data$Monthly.beer.production)*1.05,  # at the top edge of rectangle
    label = stats_text,
    hjust = 0, vjust = 1,  # left align, top vertical align
    size = 4, color = "black", lineheight = 1.1
  )

Figure 1: Monthly Beer Production in Australia

As we can see from the plot Figure 1, the monthly beer production in Australia shows a clear upward trend over the years, with some seasonal fluctuations. We will confirm this by performing a stationarity tests.

2.1 Stationarity Tests

2.1.1 ADF (Augmented Dickey-Fuller) Test

The ADF test checks for the presence of a unit root in the time series, which indicates non-stationarity.

Null hypothesis (H₀): The time series has a unit root -> it is non-stationary (p-value > 0.05).
Alternative hypothesis (H₁): The time series does not have unit root -> is stationary (p-value < 0.05).

ADF test
adf_result <- adf.test(data$Monthly.beer.production)
#print(adf_result["p.value"])

ADF test result shows that the p-value is 0.01 which is less than 0.05. We reject the null hypothesis and conclude that the series is stationary, but as we can see from the plot, there is a clear trend in the data. This means that we need to perform some more tests to confirm the non-stationarity of the series.

2.1.2 KPSS (Kwiatkowski-Phillips-Schmidt-Shin) Test

The KPSS test checks for stationarity around a deterministic trend.

Null hypothesis (H₀): The time series is stationary around deterministic trend (p-value > 0.05). Alternative hypothesis (H₁): The time series is non-stationary around deterministic trend(p-value < 0.05).

KPSS test
kpss_result <- kpss.test(data$Monthly.beer.production, null = "Level")
#print(kpss_result["p.value"])

KPSS test result shows that the p-value is 0.01 which is less than 0.05. We reject the null hypothesis and conclude that the series is non-stationary. This confirms our initial observation from the plot that there is a trend in the data.

2.1.3 ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) Analysis

As we can see from the ACF plot, there is a significant autocorrelation at lag 12, which indicates a seasonal component in the data. The PACF plot is very slowly approaching zero, which suggests that the series is not stationary and has a trend component. This confirms our previous observations.

ACF and PACF Analysis
par(mfrow = c(1, 2))  # 1 row, 2 columns layout

acf(data$Monthly.beer.production, lag.max = 60, main = "ACF of Monthly Beer Production")
pacf(data$Monthly.beer.production, lag.max = 60, main = "PACF of Monthly Beer Production")

2.1.4 Decompose the time series to visualize trend and seasonality

Using the decompose function, we can break down the time series into its components: trend, seasonal, and random (residual) components. This will help us to visualize the underlying patterns in the data.

Decompose Time Series
ts_train <- ts(data$Monthly.beer.production, start = c(1956, 1), frequency = 12)
decomposed <- decompose(ts_train)
plot(decomposed)

3 Data Splitting, Training and Testing Sets

To perform time series analysis, we need to split the data into training and testing sets. The training set will be used to fit the model, testing set will be used to evaluate the model’s performance. We will use 80% of the data for training and 20% for testing.

Split Data into Training and Testing Sets
train_size <- floor(0.8 * nrow(data))
train_data <- data[1:train_size, ]
test_data <- data[(train_size + 1):nrow(data), ]

train_data$Set <- "Train"
test_data$Set <- "Test"
combined_data <- bind_rows(train_data, test_data)


x_range <- range(data$Month)
y_range <- range(data$Monthly.beer.production, na.rm = TRUE)

rects <- data.frame(
  xmin = c(x_range[1], x_range[1] + (x_range[2] - x_range[1]) * 0.8),
  xmax = c(x_range[1] + (x_range[2] - x_range[1]) * 0.8, x_range[2]),
  ymin = y_range[1],
  ymax = y_range[2],
  Set = c("Train Period", "Test Period")
)


ggplot(combined_data, aes(x = Month, y = Monthly.beer.production, color = Set)) +
  geom_line(linewidth = 0.5) +
  geom_rect(data = rects, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = Set), alpha = 0.15, inherit.aes = FALSE) +
  scale_color_manual(values = c("Train" = "blue", "Test" = "red")) +
  scale_fill_manual(values = c("Train Period" = "blue", "Test Period" = "red")) +
  labs(
    x = "Month",
    y = "Monthly Beer Production",
    title = "Training and Testing Data",
    color = "Data Set",
    fill = "Period"
  ) +
  theme_minimal()

4 Decomposition Model

In this section, we will decompose monthly beer production time series into trend, season and cyclical components. General approach is to fit a trend model, then estimate the seasonal component using sine and cosine terms, and finally analyze the residuals to identify cyclical patterns.

4.1 Best Trend Model Selection

Estimating the trend component is crucial for understanding the underlying patterns in the data. We will fit several trend models to the training data and select the best one based on AIC (Akaike Information Criterion).

Best Trend Model Selection
trend_lm1 <- lm(Monthly.beer.production ~ Time, data = train_data)
trend_lm2 <- lm(Monthly.beer.production ~ Time + I(Time^2), data = train_data)
trend_lm3 <- lm(Monthly.beer.production ~ Time + I(Time^2) + I(Time^3), data = train_data)
trend_lm4 <- lm(log(Monthly.beer.production) ~ Time, data = train_data)

For our analysis, we will select the quadratic trend model (trend_lm2) as it fits all data overall.

Plot Trend Models
aic_results <- AIC(trend_lm1, trend_lm2, trend_lm3, trend_lm4)
legend_labels <- c(
  paste0("Linear (AIC= ", round(aic_results["trend_lm1", "AIC"], 1), ")"),
  paste0("Quadratic (AIC= ", round(aic_results["trend_lm2", "AIC"], 1), ")"),
  paste0("Cubic (AIC= ", round(aic_results["trend_lm3", "AIC"], 1), ")"),
  paste0("Logarithmic (AIC= ", round(aic_results["trend_lm4", "AIC"], 1), ")")
)

plot(train_data$Month, train_data$Monthly.beer.production, type = "l", xlim = range(train_data$Month, test_data$Month),
     xlab = "Month", ylab = "Beer Production", main = "Trend Model Comparison")

lines(train_data$Month, predict(trend_lm1), col = "blue", lwd = 2)
lines(train_data$Month, predict(trend_lm2), col = "red", lwd = 2)
lines(train_data$Month, predict(trend_lm3), col = "green", lwd = 2)
lines(train_data$Month, exp(predict(trend_lm4)), col = "purple", lwd = 2)

lines(test_data$Month, test_data$Monthly.beer.production, col = "darkgrey", lwd = 1)
lines(test_data$Month, predict(trend_lm1, newdata = test_data), col = "blue", lwd = 2, lty=2)
lines(test_data$Month, predict(trend_lm2, newdata = test_data), col = "red", lwd = 2, lty=2)
lines(test_data$Month, predict(trend_lm3, newdata = test_data), col = "green", lwd = 2, lty=2)
lines(test_data$Month, exp(predict(trend_lm4, newdata = test_data)), col = "purple", lwd = 2, lty=2)

legend("topleft", legend = legend_labels,
       col = c("blue", "red", "green", "purple"), lty = 1, lwd = 2)

4.2 Residuals Analysis After Trend Model

After fitting the trend model, we will analyze the residuals to check for autocorrelation and seasonality. As we cam see from the ACF and PACF plots, the residuals after removing the trend show some autocorrelation, indicating that there is still some structure left in the data.

Trend Model Summary and Residuals Analysis
resid_trend <- residuals(trend_lm2) 
par(mfrow = c(1, 2))  
acf(resid_trend, lag.max = 60, main = "ACF of Residuals After Trend")
pacf(resid_trend, lag.max = 60, main = "PACF of Residuals After Trend")

4.3 Seasonal Component Estimation

To estimate the seasonal component there are 2 approaches. First uses indication functions, second will use sine and cosine terms. We will fit a linear model for both these cases to the residuals after removing the trend.

Seasonal Component Estimation
train_data$sin12 <- sin(2 * pi * train_data$Time / 12)
train_data$cos12 <- cos(2 * pi * train_data$Time / 12)
test_data$sin12 <- sin(2 * pi * test_data$Time / 12)
test_data$cos12 <- cos(2 * pi * test_data$Time / 12)
train_data$MonthNum <- as.factor(format(train_data$Month, "%m"))
test_data$MonthNum <- as.factor(format(test_data$Month, "%m"))
train_data$MonthNum <- as.factor((train_data$Time - 1) %% 12 + 1)
test_data$MonthNum <- as.factor((test_data$Time - 1) %% 12 + 1)

season_model <- lm(resid_trend ~ sin12 + cos12, data = train_data)

t_seq <- 1:12
season_vals <- coef(season_model)["sin12"] * sin(2 * pi * t_seq / 12) +
               coef(season_model)["cos12"] * cos(2 * pi * t_seq / 12)

model_TS <- lm(Monthly.beer.production ~ Time + I(Time^2) + sin12 + cos12, data = train_data)
model_TS_dummies <- lm(Monthly.beer.production ~ Time + I(Time^2) + MonthNum, data = train_data)

plot(train_data$Time, resid_trend, type = "l", col = "black", lwd = 1,
     xlim = range(train_data$Time, test_data$Time),
     ylim = range(resid_trend, fitted(season_model), na.rm = TRUE),
     xlab = "Time", ylab = "Residuals After Trend",
     main = "Residuals After Trend Model")
lines(train_data$Time, fitted(season_model), col = "blue", lwd = 2)
lines(test_data$Time, predict(season_model, newdata = test_data), col = "blue", lwd = 2, lty=2)

Seasonal Component Estimation
plot(model_TS$model$Time, model_TS$fitted.values, type = "l", col = "red", lwd = 3,
     xlim = range(train_data$Time, test_data$Time),
     ylim = range(train_data$Monthly.beer.production, test_data$Monthly.beer.production, na.rm = TRUE),
     xlab = "Time", ylab = "Fitted Values",
     main = "Fitted Values from Trend + Seasonal Model")

lines(train_data$Time, train_data$Monthly.beer.production, col = "black", lwd = 1)
lines(test_data$Time, test_data$Monthly.beer.production, col = "darkgrey", lwd = 1)
lines(test_data$Time, predict(model_TS, newdata = test_data), col = "red", lwd = 1, lty=2)

Seasonal Component Estimation
plot(train_data$Time, train_data$Monthly.beer.production, type = "l", col = "black", lwd = 1,
     xlim = range(train_data$Time, test_data$Time),
     ylim = range(train_data$Monthly.beer.production, test_data$Monthly.beer.production, na.rm = TRUE),
     xlab = "Time", ylab = "Monthly Beer Production", main = "Trend + Seasonal (Dummies) Fit")
lines(train_data$Time, fitted(model_TS_dummies), col = "blue", lwd = 2)
lines(test_data$Time, test_data$Monthly.beer.production, col = "darkgrey", lwd = 1)
lines(test_data$Time,  predict(model_TS_dummies, newdata = test_data), col = "blue", lwd = 2, lty = 2)

As we can see from the plots, both models capture the seasonal component well. The model with dummy variables fits the data slightly better than the model with sine and cosine. For final estimation, we will use dumy variables model.

4.4 Cyclical Component Estimation

To estimate the cyclical component, we will analyze the residuals after removing both the trend and seasonal components. We will calculate the spectrum of the residuals to identify the dominant frequencies and then add sine and cosine terms for this cycle to the model. As dummy variables model does not capture the cyclical component, we will use the model with sine and cosine terms for the cyclical component estimation.

Based on the spectral analysis, we determined the dominant cycle period = 4 months.

Cyclical Component Estimation
model_season <- lm(Monthly.beer.production ~ I(Time) + I(Time^2) + sin12 + cos12, data = train_data)
resid_season_removed <- residuals(model_season)

spec <- spectrum(resid_season_removed, log = "no", plot = TRUE, main = "Periodogram after Seasonality Removal")

spec_df <- data.frame(
  frequency = spec$freq,
  spectrum = spec$spec
)

top_peaks <- spec_df[order(-spec_df$spectrum), ][1:3, ]
top_peaks$amplitude <- sqrt(2 * top_peaks$spectrum)

peak_freq <- top_peaks$frequency[1]
T_cycle <- 1 / peak_freq


train_data$sin_cycle <- sin(2 * pi * train_data$Time / T_cycle)
train_data$cos_cycle <- cos(2 * pi * train_data$Time / T_cycle)
test_data$sin_cycle <- sin(2 * pi * test_data$Time / T_cycle)
test_data$cos_cycle <- cos(2 * pi * test_data$Time / T_cycle)

legend_text <- paste0(
  "Top frequencies & amplitudes:\n",
  sprintf("f = %.3f, A = %.2f\n", top_peaks$frequency, top_peaks$amplitude)
)

legend("topright", legend = legend_text, bty = "n", cex = 0.8, text.col = "black")

4.5 Full Model with Trend, Seasonal, and Cyclical Components

As we can see from Residuals of the full model, the ACF and PACF plots show small autocorrelation values, indicating that the residuals are approximately white noise. The Ljung-Box and Box-Pierce tests confirm that there is autocorrelation in the residuals, so our model does not fully capture the cyclical component. However, the model is still useful for capturing the trend and seasonal components of the data.

Full Model Summary and Residuals Analysis
full_model <- lm(Monthly.beer.production ~ Time + I(Time^2) + MonthNum + sin_cycle + cos_cycle, data = train_data)

residuals_full <- residuals(full_model)
lb <- Box.test(residuals_full, lag = 20, type = "Ljung-Box")
bp <- Box.test(residuals_full, lag = 20, type = "Box-Pierce")

par(mfrow = c(2, 2))

acf(residuals_full, main = "ACF reziduals")
pacf(residuals_full, main = "PACF residuals")

plot(train_data$Month, residuals_full, type = "l", col = "red",
     main = "Reziduals from model", xlab = "Month", ylab = "Reziduals")


plot(train_data$Month, train_data$Monthly.beer.production, type = "l", col = "black", lwd = 1,
     xlim = range(train_data$Month, test_data$Month),
     xlab = "Month", ylab = "Monthly Beer Production", main = "Fitted Values from Trend + Seasonal(Dummies) + Cyclical Model")
lines(train_data$Month, fitted(full_model), col = "blue", lwd = 2)
lines(test_data$Month, test_data$Monthly.beer.production, col = "darkgrey", lwd = 1)
lines(test_data$Month, predict(full_model, newdata = test_data), col = "blue", lwd = 2, lty=2)

4.6 Summary of Decompostion model

The final decomposition model for the monthly beer production in Australia can be interpreted as:

\[ \begin{aligned} \hat{Y}_t =\ & 101.82 + 0.231 \cdot t - 0.000181 \cdot t^2 \\ & - 8.85 \cdot \mathbf{1}_{\text{Month}=2} + 1.40 \cdot \mathbf{1}_{\text{Month}=3} - 12.15 \cdot \mathbf{1}_{\text{Month}=4} \\ & - 16.97 \cdot \mathbf{1}_{\text{Month}=5} - 30.28 \cdot \mathbf{1}_{\text{Month}=6} - 20.98 \cdot \mathbf{1}_{\text{Month}=7} \\ & - 14.79 \cdot \mathbf{1}_{\text{Month}=8} - 9.68 \cdot \mathbf{1}_{\text{Month}=9} + 7.18 \cdot \mathbf{1}_{\text{Month}=10} \\ & + 13.98 \cdot \mathbf{1}_{\text{Month}=11} + 28.09 \cdot \mathbf{1}_{\text{Month}=12} \\ & - 16.93 \cdot \sin\left(\frac{2\pi t}{4}\right) - 8.68 \cdot \cos\left(\frac{2\pi t}{4}\right) \end{aligned} \] This model captures the trend, seasonal, and cyclical components of the monthly beer production time series. The coefficients for the month dummies indicate the seasonal effects for each month, while the sine and cosine terms capture the cyclical component.

4.7 Variance of Components

Variance of Components
coefs <- coef(full_model)

trend_comp <- coefs["(Intercept)"] + 
              coefs["Time"] * train_data$Time + 
              coefs["I(Time^2)"] * (train_data$Time^2)


month_dummies <- model.matrix(~ factor(train_data$MonthNum) - 1)
colnames(month_dummies) <- paste0("MonthNum", 1:12)
season_comp <- rep(0, nrow(train_data))


month_vars <- paste0("MonthNum", 2:12)
for (m in month_vars) {
  if (m %in% names(coefs)) {
    season_comp <- season_comp + coefs[m] * month_dummies[, m]
  }
}

cycle_comp <- 0
if ("sin_cycle" %in% names(coefs) && "cos_cycle" %in% names(coefs)) {
  cycle_comp <- coefs["sin_cycle"] * train_data$sin_cycle + coefs["cos_cycle"] * train_data$cos_cycle
}

var_trend <- var(trend_comp)
var_season <- var(season_comp)
var_cycle <- var(cycle_comp)


intensity_df <- data.frame(
  Component = c("Trend", "Seasonality", "Cycle"),
  Variance = c(var_trend, var_season, var_cycle)
)
knitr::kable(intensity_df, caption = "Variance of Components", digits = 4, align = "c")
Variance of Components
Component Variance
Trend 318.7292
Seasonality 239.6483
Cycle 182.6448

The table shows that the trend component has the highest variability, which is expected given the long-term growth in beer production. The seasonal component has lower variability, but is still significant, indicating the presence of seasonal fluctuations in beer production. The cyclical component has the lowest variability, indicating that cyclical patterns are less pronounced compared to the trend and seasonality. The data makes sense, as the population is growing and seasonality is determined by the seasons, with more beer being consumed/produced in summer than in winter.

5 SARIMA Model

In this section, we will fit a SARIMA model to the training data. We will first perform a Box-Cox transformation to stabilize the variance, then apply differencing to make the series stationary.

Box-Cox Transformation and Differencing
#box cox 
log_data <- BoxCox(ts(train_data$Monthly.beer.production, start = c(1956, 1), frequency = 12), lambda = "auto")
diff1_log <- diff(log_data, differences = 1)
par(mfrow = c(2, 1))
layout(matrix(c(1, 1, 2, 3), nrow = 2, byrow = TRUE))
plot(diff1_log, type = "l", col = "black",
     xlab = "Time", ylab = "First Difference",
     main = "First Differenced Time Series")

acf(diff1_log, lag.max = 60, main = "ACF of First Differenced Log Data")
pacf(diff1_log, lag.max = 60, main = "PACF of First Differenced Log Data")

Box-Cox Transformation and Differencing
layout(matrix(c(1,1,2,3), nrow = 2, byrow = TRUE))

diff_seasonal_log <- diff(diff1_log, lag = 12)
plot(diff_seasonal_log, type = "l", col = "black",
     xlab = "Time", ylab = "Seasonally Differenced",
     main = "Seasonally Differenced Time Series")

acf(diff_seasonal_log, lag.max = 60, main = "ACF of Seasonally Differenced Log Data")
pacf(diff_seasonal_log, lag.max = 60, main = "PACF of Seasonally Differenced Log Data")

We used auto.arima function to automatically select the best SARIMA model based on AIC. We also fit an ARFIMA model to compare the results. The auto.arima function will help us to identify the best orders for the AR, I, and MA components, as well as the seasonal components.

Results of auto.arima function on differenced box Cox dataset indicate that the best model is SARIMA(0,0,3)(0,0,2)[12]. We performed same auto.arima function on train dataset and results are very similar, only difference is presence of differencing terms in both parts of model. Our results show that the best SARIMA model is SARIMA(0,1,3)(0,1,2)[12], which means that we have 0 AR terms, 1 differencing term, 3 MA terms, and 1 seasonal differencing term with a period of 12 months and 2 seasonal MA terms. The ARFIMA model also shows similar results, with a fractional differencing parameter of 0.5.

SARIMA Model Fitting
ts_train1 <- ts(train_data$Monthly.beer.production, start = c(1956, 1), frequency = 12)
auto_model <- auto.arima(ts_train1, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)
arfima_model <- arfima(ts_train1)

#summary(auto_model)
#summary(arfima_model)

5.1 Residuals Analysis

We will check the residuals of the SARIMA and ARFIMA models to ensure that they are white noise. We will use the checkresiduals function from the forecast package, which provides ACF plots and Ljung-Box tests for residuals.

SARIMA Model Summary and Residuals Analysis
checkresiduals(auto_model, lag = 20, test = FALSE)

ARFIMA Model Summary and Residuals Analysis
checkresiduals(arfima_model, test = FALSE)

As we can see from the ACF, SARIMA model residuals show almost zero autocorrelation, indicating that the model captures the time series well. The ARFIMA model residuals show some autocorrelation on lag 12, indicating that the model does not fully capture the seasonality. However, both models have residuals that are approximately white noise.

5.2 Forecasting with SARIMA Model

We will use the fitted SARIMA model to forecast rest of the test data. Values can be calculated using equation below.

SARIMA Model Forecasting
predicted_values <- fitted(auto_model)
sarima_forecast <- forecast(auto_model, h = nrow(test_data))

# Plot the fitted values
plot(train_data$Month, train_data$Monthly.beer.production, type = "l", col = "black", lwd = 1, 
     xlim= range(train_data$Month, test_data$Month),
     xlab = "Month", ylab = "Monthly Beer Production", main = "Fitted Values from SARIMA Model")

lines(train_data$Month, predicted_values, col = "red", lwd = 2)
lines(test_data$Month, sarima_forecast$mean , col = "purple", lwd = 2)
lines(test_data$Month, test_data$Monthly.beer.production, col = "darkgrey", lwd = 1)
legend("topleft", legend = c("Training Data", "Fitted Values", "SARIMA Forecast", "Test Data"),
       col = c("black", "red", "purple", "darkgrey"), lty = 1, lwd = 2)

Base SARIMA model can be expressed as:

\[ (1 - B)(1 - B^{12}) Y_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)(1 - \Theta_1 B^{12} - \Theta_2 B^{24}) \varepsilon_t \]

Where: \[ \begin{aligned} \theta_1 = -1.084,\quad \theta_2 = -0.071,\quad \theta_3 = 0.271,\quad \Theta_1 = -0.725,\quad \Theta_2 = -0.121 \end{aligned} \] Expanding the right-hand side, we get final expression:

\[ (1 - B)(1 - B^{12}) Y_t = (1 + 1.084 B + 0.071 B^2 - 0.271 B^3)(1 + 0.725 B^{12} + 0.121 B^{24}) \varepsilon_t \]

6 ETS Model

In this section, we will fit an ETS (Error, Trend, Seasonality) model to the training data. The ETS model is a state space model that can capture the trend and seasonality in the time series data.

Level represent the current value of the series, trend represents the change in the level over time, and seasonality represents the periodic fluctuations in the series.

SARIMA Model Forecasting Summary
par(mfrow = c(2, 2))

ts_train1 <- ts(train_data$Monthly.beer.production, start = c(1956, 1), frequency = 12)
ets_model <- ets(ts_train1)

components <- ets_model$states
ts.plot(components[,1], col = "blue", main = "Level")
ts.plot(components[,2], col = "red", main = "Trend")
ts.plot(components[,3], col = "darkgreen", main = "Seasonality")
plot(ets_model$residuals, type = "l", col = "black", main = "Residuals of ETS Model",
     xlab = "Time", ylab = "Residuals")

6.1 ETS Model Summary

The ETS model we fitted is ETS(M,Ad,M), which means it has a multiplicative error, additive trend, and multiplicative seasonality. The model can be expressed as:

\[ \begin{aligned} y_t &= \ell_{t-1} \cdot (1 + \phi b_{t-1}) \cdot s_{t-m} \cdot \varepsilon_t \\ \ell_t &= \ell_{t-1} \cdot (1 + \phi b_{t-1}) \cdot (1 + \alpha \cdot (\varepsilon_t - 1)) \\ b_t &= \phi b_{t-1} + \beta \cdot \ell_{t-1} \cdot (\varepsilon_t - 1) \\ s_t &= s_{t-m} \cdot (1 + \gamma \cdot (\varepsilon_t - 1)) \end{aligned} \]

\[ \begin{align*} \alpha &= 0.0554 & \text{(level smoothing)} \\ \beta &= 0.0073 & \text{(trend smoothing)} \\ \gamma &= 0.0001 & \text{(seasonal smoothing)} \\ \phi &= 0.976 & \text{(damping factor)} \\ \ell_0 &= 85.373 & \text{(initial level)} \\ b_0 &= 0.085 & \text{(initial trend)} \\ s_0 &= 1.248,\, s_1 = 1.150,\, s_2 = 1.099,\, \dots,\, s_{11} = 0.978 & \text{(seasonal indices)} \end{align*} \] \[ \begin{aligned} y_t &= \ell_{t-1} \cdot (1 + 0.976 \cdot b_{t-1}) \cdot s_{t-12} \cdot \varepsilon_t \\ \ell_t &= \ell_{t-1} \cdot (1 + 0.976 \cdot b_{t-1}) \cdot \left[1 + 0.0554 \cdot (\varepsilon_t - 1)\right] \\ b_t &= 0.976 \cdot b_{t-1} + 0.0073 \cdot \ell_{t-1} \cdot (\varepsilon_t - 1) \\ s_t &= s_{t-12} \cdot \left[1 + 0.0001 \cdot (\varepsilon_t - 1)\right] \end{aligned} \]

7 Model Comparison

Based on checkresiduals function for all models, we can conclude, that residuals of estimated models are near white noise.

Residuals SARIMA
checkresiduals(auto_model, test=FALSE)

Residuals ETS
checkresiduals(ets_model, test=FALSE)

Residuals Decompostion
checkresiduals(full_model, test=FALSE)

We reached same conclusion as before. Residuals of all tested models are near white noise.

8 Forecasting with Naive, SARIMA, ETS and Decomposition Models

We will use the fitted models to forecast the rest of the test data. We will plot the forecasts along with the actual values from the test set for comparison.

Forecasting with Naive, SARIMA, ETS and Decomposition Models
naive_model <- naive(ts_train1, h = length(test_data$Monthly.beer.production))
auto_forecast <- forecast(auto_model, h = length(test_data$Monthly.beer.production))
ets_forecast <- forecast(ets_model, h = length(test_data$Monthly.beer.production))

decomp_model <- lm(Monthly.beer.production ~ Time + I(Time^2) + MonthNum + sin_cycle + cos_cycle, data = train_data)
decomp_forecast <- predict(decomp_model, newdata = test_data)
Model Comparison Plots
par(mfrow = c(2, 2))
plot(ets_forecast, main = "ETS Forecast vs. Test Data", xlab = "Month", ylab = "Monthly Beer Production")
plot(naive_model, main = "Naive Forecast vs. Test Data", xlab = "Month", ylab = "Monthly Beer Production")
plot(auto_forecast, main = "SARIMA Forecast vs. Test Data", xlab = "Month", ylab = "Monthly Beer Production")
plot(train_data$Month, train_data$Monthly.beer.production, type = "l", col = "black", lwd = 1,
     xlim = range(train_data$Month, test_data$Month),
     xlab = "Month", ylab = "Monthly Beer Production", main = "Fitted Values from Trend + Seasonal(Dummies) + Cyclical Model")
lines(train_data$Month, fitted(decomp_model), col = "red", lwd = 2)
lines(test_data$Month, test_data$Monthly.beer.production, col = "darkgrey", lwd = 1)
lines(test_data$Month, decomp_forecast, col = "blue", lwd = 2)

As we can see on the plots, SARIMA and ETS model both capture the trend and seasonality almost the same, only difference is increasing amplitute in ETS model. Naive model return last value as forecast, which is not very useful for this data. Decomposition model captures all components very well.

9 Model Evaluation

As we can see from the results, the SARIMA and ETS models perform better than the Naive model in terms of MAE, RMSE, and MAPE. This mean that models are consistent in their predictions and capture the underlying patterns in the data effectively.

Model Evaluation
actual <- test_data$Monthly.beer.production

naive_pred <- naive_model$mean
sarima_pred <- auto_forecast$mean
ets_pred <- ets_forecast$mean
decomp_pred <- decomp_forecast

MAE <- function(true, pred) mean(abs(true - pred))
RMSE <- function(true, pred) sqrt(mean((true - pred)^2))
MAPE <- function(true, pred) mean(abs((true - pred) / true)) * 100

mae_naive <- MAE(actual, naive_pred)
rmse_naive <- RMSE(actual, naive_pred)
mape_naive <- MAPE(actual, naive_pred)

mae_sarima <- MAE(actual, sarima_pred)
rmse_sarima <- RMSE(actual, sarima_pred)
mape_sarima <- MAPE(actual, sarima_pred)

mae_ets <- MAE(actual, ets_pred)
rmse_ets <- RMSE(actual, ets_pred)
mape_ets <- MAPE(actual, ets_pred)

mae_decomp <- MAE(actual, decomp_pred)
rmse_decomp <- RMSE(actual, decomp_pred)
mape_decomp <- MAPE(actual, decomp_pred)


results <- data.frame(
  Model = c("Naive", "SARIMA(0,1,3)(0,1,1)[12]", "ETS(M,Ad,M)", "Decomposition"),
  MAE = c(mae_naive, mae_sarima, mae_ets, mae_decomp),
  RMSE = c(rmse_naive, rmse_sarima, rmse_ets, rmse_decomp),
  MAPE = c(mape_naive, mape_sarima, mape_ets, mape_decomp)
)

knitr::kable(results, caption = "Forecast Accuracy Metrics for Selected Models", digits = 2)
Forecast Accuracy Metrics for Selected Models
Model MAE RMSE MAPE
Naive 18.99 26.26 11.16
SARIMA(0,1,3)(0,1,1)[12] 10.13 12.87 6.61
ETS(M,Ad,M) 11.84 14.23 7.85
Decomposition 11.84 15.22 7.28