This analysis uses the aus_retail dataset (from tsibbledata, which the
fpp3 package attaches) to forecast monthly department store turnover in
New South Wales. After preparing the data, I fit four basic forecasting
models (Mean, Naive, Seasonal Naive, and Drift) on data through 2017 and
evaluate them using 2018 as the test year. The goal is to compare their
forecast accuracy and determine which simple method best captures the
seasonal patterns in this retail time series.
1. Load All Packages
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.0 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 4.0.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.1 ✔ readr 2.1.5
## ✔ purrr 1.2.0 ✔ stringr 1.6.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(tsibble)
library(fable)
library(fabletools)
library(feasts)
library(ggplot2)
2. Clean and Wrangle Data
retail_clean <- aus_retail %>%
  filter(Industry == "Department stores", State == "New South Wales") %>%
  select(Month, State, Turnover) %>%
  arrange(Month)
retail_clean
## # A tsibble: 441 x 3 [1M]
## # Key: State [1]
## Month State Turnover
## <mth> <chr> <dbl>
## 1 1982 Apr New South Wales 178.
## 2 1982 May New South Wales 203.
## 3 1982 Jun New South Wales 176.
## 4 1982 Jul New South Wales 173.
## 5 1982 Aug New South Wales 170.
## 6 1982 Sep New South Wales 181.
## 7 1982 Oct New South Wales 174.
## 8 1982 Nov New South Wales 207.
## 9 1982 Dec New South Wales 347.
## 10 1983 Jan New South Wales 135.
## # ℹ 431 more rows
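3. Convert to a Tsibble
# aus_retail is already a tsibble, so retail_clean is one as well;
# this call only makes the Month index explicit.
retail_ts <- retail_clean %>%
  as_tsibble(index = Month)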
retail_ts
## # A tsibble: 441 x 3 [1M]
## # Key: State [1]
## Month State Turnover
## <mth> <chr> <dbl>
## 1 1982 Apr New South Wales 178.
## 2 1982 May New South Wales 203.
## 3 1982 Jun New South Wales 176.
## 4 1982 Jul New South Wales 173.
## 5 1982 Aug New South Wales 170.
## 6 1982 Sep New South Wales 181.
## 7 1982 Oct New South Wales 174.
## 8 1982 Nov New South Wales 207.
## 9 1982 Dec New South Wales 347.
## 10 1983 Jan New South Wales 135.
## # ℹ 431 more rows
4. Train/Test Split + Fit Four Models
# Training set: everything up to December 2017
retail_train <- retail_ts %>%
  filter(year(Month) <= 2017)
# Test set: the 12 months of 2018
retail_test <- retail_ts %>%
  filter(year(Month) == 2018)
fit_basic <- retail_train %>%
  model(
    Mean   = MEAN(Turnover),
    Naive  = NAIVE(Turnover),
    SNaive = SNAIVE(Turnover),
    Drift  = RW(Turnover ~ drift())
  )
fit_basic
## # A mable: 1 x 5
## # Key: State [1]
## State Mean Naive SNaive Drift
## <chr> <model> <model> <model> <model>
## 1 New South Wales <MEAN> <NAIVE> <SNAIVE> <RW w/ drift>
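To make the four benchmarks concrete, here is a minimal base R sketch of how each one produces its point forecasts. The toy series `y` is made up for illustration (it is not the aus_retail data), and the variable names are assumptions of this sketch rather than anything fable exposes.

```r
# Toy monthly series: two full years with a December peak (made-up values)
y <- c(100, 110, 105, 120, 130, 125, 140, 135, 150, 160, 170, 300,
       104, 114, 109, 124, 134, 129, 144, 139, 154, 164, 174, 310)
m <- 12          # seasonal period for monthly data
h <- 12          # forecast horizon in months
n <- length(y)

# Mean: every forecast equals the historical average
fc_mean <- rep(mean(y), h)

# Naive: every forecast equals the last observed value
fc_naive <- rep(y[n], h)

# Seasonal naive: repeat the last full seasonal cycle
last_cycle <- y[(n - m + 1):n]
fc_snaive <- rep(last_cycle, length.out = h)

# Drift: extend the straight line joining the first and last observations
slope <- (y[n] - y[1]) / (n - 1)
fc_drift <- y[n] + slope * seq_len(h)

round(cbind(fc_mean, fc_naive, fc_snaive, fc_drift), 1)
```

Because the toy series ends on its December peak, the Naive and Drift columns stay at or above that peak for the whole horizon, while the SNaive column reproduces the seasonal shape.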
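5. Forecast 12 Months Ahead
# forecast() returns full forecast distributions; the interval levels
# are chosen when plotting (or with hilo()), not in forecast() itself.
fc_basic <- fit_basic %>%
  forecast(h = "12 months")
# One combined plot
autoplot(fc_basic, retail_ts, level = c(80, 95)) +
  labs(
    title = "NSW Department Stores – Basic Forecasts",
    x = "Month",
    y = "Turnover (Million AUD)"
  )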

# One panel per model, each with its own y-axis scale
autoplot(fc_basic, retail_ts) +
  facet_grid(vars(.model), scales = "free_y") +
  labs(
    title = "NSW Department Stores – Basic Forecasts by Model",
    x = "Month",
    y = "Turnover (Million AUD)"
  )
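The shaded bands in these plots are prediction intervals. As a rough sketch of why their widths differ by method, the code below tabulates the textbook multipliers of the residual standard deviation for each benchmark at horizons 1 to 12. The formulas are the standard ones given for these benchmark methods and are stated here as an assumption; `n_train = 429` is derived from the 441 rows shown above minus the 12 test months.

```r
h <- 1:12        # forecast horizons in months
m <- 12          # seasonal period
n_train <- 429   # training observations (Apr 1982 to Dec 2017)

# Number of complete seasonal cycles before horizon h
k <- floor((h - 1) / m)

# Multiplier applied to the residual standard deviation at each horizon
mult <- data.frame(
  h      = h,
  Mean   = rep(sqrt(1 + 1 / n_train), length(h)),
  Naive  = sqrt(h),
  SNaive = sqrt(k + 1),
  Drift  = sqrt(h * (1 + h / (n_train - 1)))
)

round(mult, 2)
```

Under these formulas the Naive and Drift intervals widen with every step, whereas the Mean and SNaive multipliers stay constant across a 12-month horizon.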

6. Accuracy Table
# Compare the 2018 forecasts with the held-out observations
accuracy_table <- fc_basic %>%
  accuracy(retail_test) %>%
  select(.model, RMSE, MAE, MAPE)
accuracy_table
## # A tibble: 4 × 4
## .model RMSE MAE MAPE
## <chr> <dbl> <dbl> <dbl>
## 1 Drift 459. 440. 96.2
## 2 Mean 183. 129. 22.1
## 3 Naive 449. 429. 93.9
## 4 SNaive 18.3 13.2 2.65
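For reference, the three measures in this table can be written out by hand. The sketch below uses base R and made-up numbers (not the retail data); the helper names `rmse`, `mae`, and `mape` are hypothetical and only illustrate the definitions.

```r
# Root mean squared error: penalises large errors more heavily
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))

# Mean absolute error: average size of the errors, in the data's units
mae <- function(actual, forecast) mean(abs(actual - forecast))

# Mean absolute percentage error: scale-free, expressed in percent
mape <- function(actual, forecast) {
  mean(abs((actual - forecast) / actual)) * 100
}

# Made-up values for illustration only
actual   <- c(100, 200, 400)
forecast <- c(110, 190, 380)

c(RMSE = rmse(actual, forecast),
  MAE  = mae(actual, forecast),
  MAPE = mape(actual, forecast))
```

RMSE and MAE are in the units of the series (million AUD here), while MAPE is a percentage, which is why it is the easiest of the three to compare across series of different sizes.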
7. Reflection
1) The Seasonal Naive (SNaive) model was the top performer on RMSE, MAE,
and MAPE for the 2018 hold-out, with an RMSE of 18.3 and a MAPE of 2.65%
against an RMSE of 183 for the next-best model (Mean). This makes
intuitive sense because the monthly turnover series for department
stores in NSW exhibits clear, recurring seasonal patterns. SNaive
forecasts each month with the turnover from the same month of the
previous year, so when seasonality is the dominant feature it offers a
strong baseline.
2) The SNaive forecasts align closely with the historical seasonal peaks
and troughs: the largest sales months and the quieter months are both
mirrored. In contrast, the Mean model produces a flat line that cannot
capture any seasonal variation. The Naive and Drift models start from
the last training observation, December 2017, which is a seasonal peak,
so their forecasts likely sit well above turnover in most months of
2018; that would explain their MAPE values above 90%. Since the
dataset’s behaviour is more about a repeating annual cycle than a simple
straight line, SNaive naturally outperformed the others.
3) Even though SNaive performed the best among these simple models,
it still has limitations. It assumes that each season behaves exactly
the same from year to year, which may not hold if consumer behaviour,
promotions, or economic conditions change. I also noticed that the
prediction intervals for the random-walk methods widen with the horizon
(for SNaive the width only increases after each full seasonal cycle), a
reminder that the point forecasts carry real uncertainty. For future
analysis, I would explore more advanced models like ETS or ARIMA, which
can handle both trend and seasonality, or even a dynamic regression
model that includes external factors such as holidays or economic
indicators. These methods would likely give me more flexible and
realistic forecasts.
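As a possible starting point for that follow-up, the sketch below fits automatically selected ETS and ARIMA models next to SNaive on the same train/test split. It rebuilds the data so that it runs on its own; the object names `nsw_dept`, `train`, `test`, and `fit_adv` are assumptions of this sketch, and no results are shown because the models were not run as part of this report.

```r
library(fpp3)

# Same series as above: NSW department store turnover
nsw_dept <- aus_retail %>%
  filter(Industry == "Department stores", State == "New South Wales") %>%
  select(Month, State, Turnover)

# Same split: train through 2017, test on 2018
train <- nsw_dept %>% filter(year(Month) <= 2017)
test  <- nsw_dept %>% filter(year(Month) == 2018)

# ETS() and ARIMA() choose their own specification when none is given
fit_adv <- train %>%
  model(
    SNaive = SNAIVE(Turnover),
    ETS    = ETS(Turnover),
    ARIMA  = ARIMA(Turnover)
  )

# Score all three on the 2018 hold-out with the same measures
fit_adv %>%
  forecast(h = "12 months") %>%
  accuracy(test) %>%
  select(.model, RMSE, MAE, MAPE)
```

Keeping SNaive in the comparison gives a reference point: a more complex model is only worth adopting if it beats this baseline on the hold-out year.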