Forecast retail trade volume in Queensland, specifically within the hardware, building and garden supplies retailing sector, using ETS (Error, Trend, Seasonality) and ARIMA (AutoRegressive Integrated Moving Average) models.

The data is sourced from the Australian Bureau of Statistics (ABS), which offers monthly and quarterly estimates of turnover and volumes for retail businesses, encompassing both store and online sales.

Plot of Raw Retail Data

Statistical Features of the Raw Data

Trend: The data shows a clear upward trend from January 1990 to around early 2020. This suggests that turnover has generally been increasing over the years, likely due to factors such as business expansion, inflation, and general economic growth.

Seasonality: There appears to be some fluctuation within each year, which could indicate seasonality. This is common in retail due to seasonal buying patterns. Identifying and quantifying this seasonality could help in making more accurate predictions and understanding consumer behavior better.

Volatility and Variance: The variance in the data seems to increase as time progresses. This increasing variance, or heteroscedasticity, is typical in financial time series data and can complicate modeling if not addressed, possibly through transformations or variance-stabilizing techniques.

Impact of COVID-19:

The plot shows a sharp decline in turnover around the beginning of 2020, which coincides with the onset of the COVID-19 pandemic.

This significant drop could be due to lockdown measures, decreased consumer spending, and disruptions in supply chains.

Following the initial drop, there’s a rebound which could indicate a recovery phase as restrictions might have been lifted or adapted to over time. The behavior of the series in this phase would be crucial to analyze for understanding the resilience and response of the sector to the pandemic.

Transformations and Differencing

As we can see in the plot of the raw data, the variance increases as the level of the series increases. A Box-Cox transformation is useful when this arises, as it compresses the range of the series and stabilises the variance.
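A minimal sketch of the transformation, assuming the raw series is stored as a ts object named raw_ts (the transformed_ts name matches the ADF output further down); forecast::BoxCox.lambda() estimates the parameter automatically:

```r
library(forecast)

# Estimate the Box-Cox parameter lambda from the data
lambda <- BoxCox.lambda(raw_ts)

# Apply the transformation and inspect the stabilised series
transformed_ts <- BoxCox(raw_ts, lambda)
autoplot(transformed_ts)
```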


We can see by looking at the plot that the variance is now a lot more even through time.

Stationarity is a fundamental assumption in time series analysis, implying that the statistical properties of the data do not change over time. A stationary time series exhibits constant mean, variance, and autocovariance, regardless of when the observations were collected.

Stationarity is a crucial assumption to hold as it helps with modelling, statistical inference and forecasting accuracy.

To check whether the series is stationary, I will perform a unit-root test, the Augmented Dickey-Fuller (ADF) test.

Null Hypothesis H0: The series is non-stationary (it has a unit root)

Alternative Hypothesis H1: The series is stationary

Decision Rule

If p-value < 0.05, reject the null hypothesis, indicating the series is stationary. If p-value > 0.05, the series is non-stationary, and further transformations might be necessary.
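The test can be run with the tseries package on the transformed series:

```r
library(tseries)

# Augmented Dickey-Fuller test: H0 = unit root (non-stationary)
adf.test(transformed_ts)
```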

## 
##  Augmented Dickey-Fuller Test
## 
## data:  transformed_ts
## Dickey-Fuller = -3.4859, Lag order = 7, p-value = 0.04371
## alternative hypothesis: stationary

The reported p-value of 0.044 is only marginally below 0.05: at the 5% level we would just reject the null hypothesis, but at the 1% level we cannot. Given this borderline result, and since the ADF test does not account for the seasonality still present in the series, I treat the series as non-stationary.

As the series is non-stationary, I will apply differencing to try to achieve stationarity.
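A sketch of the differencing step, assuming monthly data: a seasonal difference at lag 12 followed by a first difference, producing the transformed_differenced series tested below:

```r
# Seasonal difference (lag 12 for monthly data), then a first difference
transformed_differenced <- diff(diff(transformed_ts, lag = 12))
autoplot(transformed_differenced)
```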

This seems to remove the effects of seasonality, and the series now looks much more like white noise.

Test for Stationarity:

Null Hypothesis H0: The series is non-stationary (it has a unit root)
Alternative Hypothesis H1: The series is stationary

Decision Rule

If p-value < 0.05, reject the null hypothesis, indicating the series is stationary. If p-value > 0.05, the series is non-stationary, and further transformations might be necessary.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  transformed_differenced
## Dickey-Fuller = -12.725, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

Since the p-value (reported as 0.01, the smallest value adf.test will print) is < 0.05, we reject the null hypothesis, indicating the series is stationary.

Residuals Diagnostics
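The output below can be reproduced with forecast::checkresiduals(), which runs the Ljung-Box test and also draws the time plot, ACF and histogram; here the differenced series itself is passed in and treated as a set of residuals (hence "Model df: 0"):

```r
# Ljung-Box test plus time plot, ACF and histogram of the series
checkresiduals(transformed_differenced, lag = 24)
```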

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 410.29, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

There are also still significant spikes in the ACF plot, suggesting some autocorrelation or seasonal effects remain that were not completely accounted for by the initial differencing. ARIMA modelling with seasonal components may be necessary.

Methodology for Short-Listing ARIMA and ETS Models

ARIMA (AutoRegressive Integrated Moving Average) and ETS (Error, Trend, Seasonality) models are two of the most common methodologies used in time series forecasting to predict future values based on past data. Choosing the right model involves understanding the data characteristics and employing a systematic approach to model selection. Here, I outline my methodology for short-listing potential ARIMA and ETS models suitable for a given time series.

Preliminary Data Analysis

Data Visualization: Plot the time series to identify any trends, seasonal patterns, and irregularities. This step helps in deciding which transformations to apply to the data and which models might be best to implement.

Transformations: Transform the data using a Box-Cox transformation to stabilise the variance.

Stationarity Check: Test for stationarity using the Augmented Dickey-Fuller (ADF) test. If the data is non-stationary, take first differences of the data until it is stationary.

Model Identification

ARIMA Model Identification:

ACF and PACF Plots: Analyze the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots to identify potential orders for the AR (p) and MA (q) components.

Differencing: Determine the degree of differencing (d) necessary to make the series stationary.

ETS Model Identification:

Error Type: Choose between additive and multiplicative error models based on the variance of the series’ components.

Trend Type: Determine if the trend is none, additive, or multiplicative.

Seasonal Type: Decide on additive or multiplicative seasonality based on how the seasonal effect changes over time.

Model Selection

Grid Search:

ARIMA: Conduct a grid search over possible combinations of p, d, q (and seasonal parameters) to shortlist candidate models.

ETS: Experiment with different combinations of error, trend, and seasonality.

AIC/AICc/BIC Criteria: Use information criteria such as Akaike Information Criterion (AIC), AICc and Bayesian Information Criterion (BIC) to compare models. Lower values generally suggest a better model fit.

Cross-Validation: Perform time series cross-validation to assess how well the models generalize to new data.
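As a sketch of what time series cross-validation looks like in practice, forecast::tsCV() computes rolling-origin forecast errors; the seasonal naive model here is only a placeholder for any candidate model:

```r
library(forecast)

# Rolling-origin cross-validation: one-step-ahead errors from each origin
cv_errors <- tsCV(transformed_ts,
                  forecastfunction = function(x, h) snaive(x, h = h),
                  h = 1)

# Cross-validated RMSE, ignoring origins with no forecast
sqrt(mean(cv_errors^2, na.rm = TRUE))
```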

Model Diagnostics

Residual Checks:

Evaluate the residuals of the fitted models. Residuals should resemble white noise (no pattern, constant variance, zero mean). Use an ACF plot of the residuals: if they do not look like white noise, a model modification is necessary.

Ljung-Box Test: Perform the Ljung-Box test on the residuals to check for autocorrelation.

Goodness-of-Fit: Review statistical measures like RMSE (Root Mean Squared Error) or MAE (Mean Absolute Error) to evaluate the fit of the model.

Forecast Accuracy: Compare the forecasted values against a hold-out sample or test set to assess the accuracy of each model.

Now I will implement the steps I have just outlined on the retail turnover dataset.

The data is split into training and test sets, with the last 24 months reserved for testing.
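A sketch of the split, assuming a monthly ts object; window() carves off the final 24 observations as the test set:

```r
n <- length(transformed_ts)

# Hold out the final 24 months for testing
train_data <- window(transformed_ts, end = time(transformed_ts)[n - 24])
test_data  <- window(transformed_ts, start = time(transformed_ts)[n - 23])
```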

ETS Model Selection
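The mable below is produced with the fable package; a sketch of the call, assuming the training data is also held as a tsibble named train_tsibble, keyed by State and Industry, with a Turnover measurement column:

```r
library(fable)
library(dplyr)

# Automatic ETS selection over error, trend and seasonal components
ets_fit <- train_tsibble %>%
  model(ETS_Model = ETS(Turnover))
ets_fit
```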

## # A mable: 1 x 3
## # Key:     State, Industry [1]
##   State      Industry                                     ETS_Model
##   <chr>      <chr>                                          <model>
## 1 Queensland Cafes, restaurants and catering services <ETS(M,Ad,M)>

ETS AIC Values

The Akaike Information Criterion (AIC) is a measure used for model selection. It evaluates the goodness of fit of a model while penalising for model complexity to avoid overfitting. A lower AIC value indicates a better model.
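The criteria reported below can be pulled from the fitted mable with fabletools::glance():

```r
# AIC, AICc and BIC for the selected ETS model
glance(ets_fit) %>%
  select(.model, AIC, AICc, BIC)
```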

The lowest AIC among the candidate ETS models belongs to the automatically selected (M,Ad,M) model: multiplicative error, damped additive trend, and multiplicative seasonality.

AIC of the ETS Model (M,Ad,M): 4797.21

AICc (corrected AIC for small sample sizes): 4798.58

BIC (Bayesian Information Criterion): 4867.62

ETS Forecast Plot
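A sketch of the forecast and plot, assuming the full data (including the test period) is available as a tsibble named retail_tsibble:

```r
# Forecast over the 24-month test horizon and plot against the data
ets_fc <- ets_fit %>%
  forecast(h = "2 years")
ets_fc %>% autoplot(retail_tsibble)
```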

ETS Forecast Accuracy
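The accuracy table below comes from fabletools::accuracy(), comparing the forecasts with the held-out observations:

```r
# Test-set accuracy of the ETS forecasts
accuracy(ets_fc, retail_tsibble)
```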

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS_Model Quee… Cafes, … Test  -88.7  126.  102. -19.8  21.9  6.92  4.34 0.989

ME (Mean Error): The negative mean error (-88.7) indicates that the forecasts are, on average, above the actual values, i.e. the model tends to overestimate.

RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error): These metrics give the average magnitude of the forecast errors; the lower they are, the closer the forecasts are to the actual outcomes. Here they are large relative to the series and, together with the negative ME, point to systematic overestimation rather than purely random misses.

MAPE (Mean Absolute Percentage Error): At 21.9%, this indicates that, on average, the forecasts deviate from the actual values by around 22%. A lower MAPE would be desirable for more accurate forecasts.

MASE (Mean Absolute Scaled Error): Since it’s much greater than one, it suggests that the model performs worse than a simple naive forecast. This is an indicator to reconsider model parameters or try different models or transformations.

ETS Residual Diagnostics
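A sketch of the residual extraction and test; augment() pulls the residuals out of the mable, and the object name matches the output that follows:

```r
# Extract residuals from the fitted model and drop missing values
residuals_ets_clean <- ets_fit %>%
  augment() %>%
  filter(!is.na(.resid))

# Ljung-Box test for autocorrelation up to lag 24
Box.test(residuals_ets_clean$.resid, lag = 24, type = "Ljung-Box")
```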

## 
##  Box-Ljung test
## 
## data:  residuals_ets_clean$.resid
## X-squared = 76.558, df = 24, p-value = 2.13e-07

Box-Ljung Test

Test Statistic: 76.558; Degrees of Freedom (df): 24; p-value: 2.13e-07

The Box-Ljung test checks for the absence of serial correlation up to a specified lag. The statistically significant test statistic suggests that correlations are still present in the residuals, indicating that the model has not fully captured all underlying patterns, such as seasonality or other cyclic behavior.

The null hypothesis is that the data are independently distributed (the residuals do not exhibit significant autocorrelation). The very small p-value (< 0.05) indicates that the null hypothesis can be rejected. This means there is significant autocorrelation in the residuals.

ACF Plot

The ACF plot shows the autocorrelations of a time series. For a well-fitted model, the residuals should ideally look like white noise: most of the autocorrelations should be within the confidence bands.

The ACF plot looks like a sinusoidal curve, with bars extending past the confidence bands in groups, indicating a seasonal pattern in the residuals.

The sinusoidal shape in the ACF plot, with significant autocorrelations, suggests that the model may not have adequately captured the seasonal component of the data.

ETS Forecast for Next 2 Years with 80% Prediction Intervals
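A sketch of the plot; fabletools draws 80% and 95% intervals by default, and level = 80 restricts it to the 80% band:

```r
# Two-year ETS forecast with 80% prediction intervals
ets_fit %>%
  forecast(h = "2 years") %>%
  autoplot(retail_tsibble, level = 80)
```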

ARIMA Model Selection

ARIMA(1,0,2)(0,1,1)[12] is the best fit for the data, given it has the smallest AIC.
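The fitted model below is the kind of result returned by forecast::auto.arima(), which searches over non-seasonal and seasonal orders; a sketch of the call on the training series:

```r
library(forecast)

# Automated search over ARIMA orders (selection is by information criteria)
arima_fit <- auto.arima(train_data)
summary(arima_fit)
```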

## Series: train_data 
## ARIMA(1,0,2)(0,1,1)[12] 
## 
## Coefficients:
##          ar1      ma1      ma2     sma1
##       0.5366  -0.5520  -0.2863  -0.8470
## s.e.  0.0790   0.0816   0.0539   0.0381
## 
## sigma^2 = 0.1183:  log likelihood = -164.55
## AIC=339.09   AICc=339.23   BIC=359.65

ARIMA Residuals Diagnostic
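The diagnostics below are produced by forecast::checkresiduals(), which runs the Ljung-Box test and draws the residual time plot, ACF and histogram discussed next:

```r
# Ljung-Box test plus residual time plot, ACF and histogram
checkresiduals(arima_fit, lag = 24)
```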

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,2)(0,1,1)[12]
## Q* = 8.1489, df = 20, p-value = 0.9908
## 
## Model df: 4.   Total lags used: 24

The plot of residuals over time does not show any obvious patterns or trends, which suggests that the model residuals behave like white noise — a sign of a good fit.

The autocorrelation function plot shows that all autocorrelations are within the confidence bounds, except perhaps at lags 12 and 36; spikes at these seasonal lags are consistent with the seasonal component of the model.

The histogram with the overlaid normal curve indicates that the residuals are approximately normally distributed, centered around zero, which is another good sign for the adequacy of the model.

ARIMA Forecast for Next 2 Years with 80% Prediction Intervals
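A sketch of the forecast call with an 80% interval:

```r
# Forecast 24 months ahead with an 80% prediction interval
arima_fc <- forecast(arima_fit, h = 24, level = 80)
autoplot(arima_fc)
```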

## 'data.frame':    24 obs. of  2 variables:
##  $ Month   : Date, format: "2021-01-01" "2021-02-01" ...
##  $ Turnover: num  1.018 -1.044 -0.593 0.734 0.098 ...

The forecast plot versus actual data indicates that the model captures the general trend and seasonality well. The forecast interval also adequately captures most of the actual values, although there are a few points where the actual values fall outside the prediction intervals. This could suggest occasional periods of higher volatility or model underfitting during these periods.

Overall, the model forecasts the time series well, though improvements might be needed to capture some peaks and troughs more accurately, possibly by adjusting the model complexity.

ARIMA Model Accuracy
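The table below comes from forecast::accuracy(), which compares the forecasts against the held-out test series:

```r
print("Forecast accuracy measures:")
accuracy(arima_fc, test_data)
```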

## [1] "Forecast accuracy measures:"
##                        ME      RMSE       MAE      MPE     MAPE      MASE
## Training set -0.009733373 0.3379057 0.2119574      NaN      Inf 0.7860685
## Test set      0.120575377 0.5330119 0.4309712 95.58837 146.8002 1.5983067
##                      ACF1 Theil's U
## Training set 0.0001940774        NA
## Test set     0.0445551285 0.7575433

The ARIMA model shows low ME, RMSE, and MAE values, although these are measured on the transformed scale rather than in original dollar terms.

MAPE is quite high, indicating potential issues with percentage errors. This is expected here because the transformed series takes values close to (and below) zero, so percentage errors blow up; the same issue produces the NaN and Inf entries on the training set.

The training-set MASE (0.79) and Theil's U (0.76) suggest the ARIMA model performs better than a naive forecast, although the test-set MASE (1.60) is above one, so the naive benchmark is not beaten out of sample.

The residuals’ autocorrelation is low, which indicates no significant pattern left in the residuals, a sign of a good model.

ARIMA and ETS Forecast Comparisons

The ARIMA model reports substantially lower ME, RMSE, and MAE values on the test set compared to the ETS model: its test MAE of approximately 0.43 is far lower than the ETS model's MAE of about 102. Note, however, that the ARIMA errors are computed on the transformed scale while the ETS errors are in original units, so the absolute magnitudes are not directly comparable.

Error Magnitude: On its own scale, the ARIMA model shows far smaller errors (RMSE and MAE), suggesting it tracks the actual values more closely than the ETS model.

Error Consistency: While the ETS model has a better MPE and MAPE, indicating less proportional bias in underestimation, its high MASE and poor ACF1 imply that its predictions are not consistent or reliable.

Proportional Error: The ARIMA model exhibits high MAPE, which can be critical in scenarios where proportional errors are more impactful than absolute errors. However, this could also be influenced by outliers or extreme values due to the economic impacts of events like COVID-19.

Given these metrics, the ARIMA model generally provides a more accurate and reliable forecast for the dataset than the ETS model, especially in terms of absolute errors and consistency. However, the high MAPE for ARIMA suggests a need for adjustments or consideration of external factors or outliers in the model.

The ARIMA model is the preferred method for forecasting in this scenario based on its comparative stability, lower error rates, and more reliable performance on the test set. This suggests that ARIMA’s approach to handling the data’s patterns (possibly through better handling of autocorrelation and non-stationarity) is more effective than the ETS model for this specific dataset.

Compare forecasts with the actual numbers from the ABS

I obtained the actual data from the ABS website and compared it with my forecasts to evaluate the performance of the models.
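A sketch of the comparison, assuming the actual ABS figures for the forecast horizon are in a vector named actuals and the two sets of point forecasts (back-transformed to the original scale) are in arima_point and ets_point; all three names are hypothetical:

```r
# Hypothetical helper: standard error metrics for point forecasts
metrics <- function(actual, forecast) {
  e <- actual - forecast
  c(MAE  = mean(abs(e)),
    MSE  = mean(e^2),
    RMSE = sqrt(mean(e^2)),
    MAPE = mean(abs(e / actual)) * 100)
}

rbind(ARIMA = metrics(actuals, arima_point),
      ETS   = metrics(actuals, ets_point))
```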

##   Model      MAE       MSE     RMSE     MAPE
## 1 ARIMA 29.10323 1091.0654 33.03128 6.914871
## 2   ETS 20.92979  712.2742 26.68847 4.992022

Accuracy of Predictions:

The ETS model has a lower MAE (20.93) compared to the ARIMA model (29.10). This indicates that, on average, the ETS model’s predictions are closer to the actual values than those of the ARIMA model.

The ETS model also has a lower MSE (712.27) compared to the ARIMA model (1091.07). This further confirms that the ETS model produces smaller errors on average, and the smaller MSE indicates that the ETS model is better at handling larger errors.

Similarly, the ETS model’s RMSE (26.69) is lower than that of the ARIMA model (33.03). This reinforces the conclusion that the ETS model provides better overall accuracy.

The ETS model has a lower MAPE (4.99%) compared to the ARIMA model (6.91%). This indicates that, in relative terms, the ETS model’s forecasts are more accurate on a percentage basis.

Based on these metrics, the ETS model outperforms the ARIMA model across all evaluated metrics (MAE, MSE, RMSE, and MAPE). The lower values for each of these metrics suggest that the ETS model provides more accurate and reliable forecasts for the given data. Therefore, the ETS model is the preferred choice for forecasting turnover in this scenario.

Benefits and Limitations

ARIMA Model

Benefits:

Flexibility: ARIMA models are highly flexible and can be tailored to a wide variety of time series data through the combination of autoregressive (AR), integrated (I), and moving average (MA) components. This flexibility allows for modeling complex patterns in data, including trends and seasonal variations.

Autocorrelation Handling: ARIMA models effectively handle autocorrelation within the data by incorporating lagged values of the series and lagged forecast errors.

Stationarity Enforcement: Through differencing (the ‘I’ component), ARIMA models enforce stationarity, which is a crucial assumption for many time series forecasting methods. This is beneficial when working with data that shows non-stationary behavior.

Limitations:

Complexity in Identification: Identifying the appropriate order of the ARIMA model (i.e., the values of p, d, q) can be complex and time-consuming. It often requires a thorough examination of ACF and PACF plots, along with iterative model fitting and evaluation.

Sensitivity to Data Preprocessing: The performance of ARIMA models is highly sensitive to the preprocessing steps, such as differencing and transformation. Incorrect differencing or failure to adequately transform the data can lead to poor model performance.

Interpretability of Seasonal Components: While ARIMA can handle seasonality through seasonal differencing, it might not capture complex seasonal patterns as effectively as models specifically designed for seasonality (e.g., ETS).

ETS Model

Benefits:

Explicit Handling of Components: ETS models explicitly account for error, trend, and seasonality, making them particularly effective for data with clear seasonal patterns and trends. This explicit decomposition allows for a more straightforward interpretation of the underlying components.

Automatic Component Selection: ETS models can automatically choose between different types of errors (additive or multiplicative), trends (none, additive, or multiplicative), and seasonal components (none, additive, or multiplicative). This makes the modeling process more straightforward and less reliant on manual selection.

Robust to Non-Stationarity: ETS models do not require the data to be stationary, which simplifies the modeling process and reduces the need for extensive preprocessing.

Limitations:

Overfitting Risks: ETS models, especially when using multiplicative components, may be prone to overfitting, particularly with limited data.

Less Effective with Irregular Patterns: While ETS models perform well with data showing clear seasonal and trend patterns, they may be less effective in capturing irregular or complex patterns not fitting into the predefined components.