Project_Part

This project uses the aus_retail dataset (loaded with the fpp3 package) to forecast monthly department store turnover in New South Wales. After preparing the data, I fit four basic forecasting models (Mean, Naive, Seasonal Naive, and Drift) on the observations through 2017 and evaluate them using 2018 as the test year. The goal is to compare their forecast accuracy and determine which simple method best captures the seasonal patterns in this retail time series.

1. Load All Packages

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

## ✔ tibble      3.3.0     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     4.0.0

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.1     ✔ readr   2.1.5
## ✔ purrr   1.2.0     ✔ stringr 1.6.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

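# fpp3 already attaches most of these (see the startup message above);
# loading them again is harmless and makes the dependencies explicit
library(lubridate)
library(tsibble)
library(fable)
library(fabletools)
library(feasts)
library(ggplot2)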

2. Clean and Wrangle Data

# Keep only NSW department stores and the columns needed for forecasting
retail_clean <- aus_retail %>%
  filter(Industry == "Department stores", State == "New South Wales") %>%
  select(Month, State, Turnover) %>%
  arrange(Month)

retail_clean

## # A tsibble: 441 x 3 [1M]
## # Key:       State [1]
##       Month State           Turnover
##       <mth> <chr>              <dbl>
##  1 1982 Apr New South Wales     178.
##  2 1982 May New South Wales     203.
##  3 1982 Jun New South Wales     176.
##  4 1982 Jul New South Wales     173.
##  5 1982 Aug New South Wales     170.
##  6 1982 Sep New South Wales     181.
##  7 1982 Oct New South Wales     174.
##  8 1982 Nov New South Wales     207.
##  9 1982 Dec New South Wales     347.
## 10 1983 Jan New South Wales     135.
## # ℹ 431 more rows

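3. Convert to a Tsibble

# retail_clean is already a tsibble (see its header above), so this call
# mainly makes the time index explicit
retail_ts <- retail_clean %>%
  as_tsibble(index = Month)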

retail_ts

## # A tsibble: 441 x 3 [1M]
## # Key:       State [1]
##       Month State           Turnover
##       <mth> <chr>              <dbl>
##  1 1982 Apr New South Wales     178.
##  2 1982 May New South Wales     203.
##  3 1982 Jun New South Wales     176.
##  4 1982 Jul New South Wales     173.
##  5 1982 Aug New South Wales     170.
##  6 1982 Sep New South Wales     181.
##  7 1982 Oct New South Wales     174.
##  8 1982 Nov New South Wales     207.
##  9 1982 Dec New South Wales     347.
## 10 1983 Jan New South Wales     135.
## # ℹ 431 more rows

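4. Train/Test Split + Fit Four Models

# Training set: every month through December 2017
retail_train <- retail_ts %>%
  filter(year(Month) <= 2017)

# Test set: the 12 months of 2018
retail_test <- retail_ts %>%
  filter(year(Month) == 2018)

# Fit the four benchmark models on the training data only
fit_basic <- retail_train %>%
  model(
    Mean   = MEAN(Turnover),              # average of all training values
    Naive  = NAIVE(Turnover),             # last observed value
    SNaive = SNAIVE(Turnover),            # same month of the previous year
    Drift  = RW(Turnover ~ drift())       # last value plus average change
  )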

fit_basic

## # A mable: 1 x 5
## # Key:     State [1]
##   State              Mean   Naive   SNaive         Drift
##   <chr>           <model> <model>  <model>       <model>
## 1 New South Wales  <MEAN> <NAIVE> <SNAIVE> <RW w/ drift>
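
To make the four benchmarks concrete, the sketch below computes their point forecasts by hand in base R. The toy series `y` and its values are hypothetical, not taken from aus_retail. The formulas follow the standard textbook definitions of the mean, naive, seasonal naive, and drift methods, so fable's internals may differ in detail.

```r
# Hypothetical monthly series: 3 years (Jan to Dec) with a December peak
y <- c(100,  95, 105, 110, 108, 112, 115, 111, 109, 120, 140, 210,
       104,  99, 108, 115, 111, 118, 119, 116, 113, 126, 146, 220,
       109, 102, 113, 119, 117, 121, 124, 120, 118, 131, 152, 231)

n <- length(y)  # number of training observations
m <- 12         # seasonal period (monthly data)
h <- 12         # forecast horizon

# Mean: every forecast equals the average of the training data
fc_mean <- rep(mean(y), h)

# Naive: every forecast equals the last observed value
fc_naive <- rep(y[n], h)

# Seasonal naive: each forecast repeats the value from the same
# month in the last observed year
fc_snaive <- y[n - m + ((seq_len(h) - 1) %% m) + 1]

# Drift: last value plus h times the average historical change
fc_drift <- y[n] + seq_len(h) * (y[n] - y[1]) / (n - 1)

data.frame(h = seq_len(h), fc_mean, fc_naive, fc_snaive, fc_drift)
```

Because this toy series also ends on a December peak, the naive and drift forecasts stay near that peak for all 12 months, while the seasonal naive forecast reproduces the full yearly pattern.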

5. Forecast 12 Months Ahead

# Interval levels are chosen when plotting (or with hilo()), not in forecast()
fc_basic <- fit_basic %>%
  forecast(h = "12 months")

# One combined plot
autoplot(fc_basic, retail_ts, level = c(80, 95)) +
  labs(
    title = "NSW Department Stores – Basic Forecasts",
    x = "Month",
    y = "Turnover (Million AUD)"
  )

# Same forecasts, one panel per model
autoplot(fc_basic, retail_ts) +
  facet_grid(vars(.model), scales = "free_y") +
  labs(
    title = "NSW Department Stores – Basic Forecasts by Model",
    x = "Month",
    y = "Turnover (Million AUD)"
  )

6. Accuracy Table

accuracy_table <- fc_basic %>%
  accuracy(retail_test) %>%
  select(.model, RMSE, MAE, MAPE)

accuracy_table

## # A tibble: 4 × 4
##   .model  RMSE   MAE  MAPE
##   <chr>  <dbl> <dbl> <dbl>
## 1 Drift  459.  440.  96.2 
## 2 Mean   183.  129.  22.1 
## 3 Naive  449.  429.  93.9 
## 4 SNaive  18.3  13.2  2.65
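
As a rough illustration of what the three columns measure, the sketch below computes RMSE, MAE, and MAPE by hand in base R. The `actual` and `pred` vectors are hypothetical values chosen for easy arithmetic, not results from this analysis.

```r
# Hypothetical test-set values and forecasts
actual <- c(200, 180, 220, 400)
pred   <- c(190, 185, 210, 380)

# Forecast errors: actual minus forecast
e <- actual - pred

rmse <- sqrt(mean(e^2))              # root mean squared error
mae  <- mean(abs(e))                 # mean absolute error
mape <- mean(abs(e / actual)) * 100  # mean absolute percentage error

c(RMSE = rmse, MAE = mae, MAPE = mape)
# RMSE = 12.5, MAE = 11.25, MAPE is about 4.33
```

RMSE and MAE are in the units of the series (million AUD here), while MAPE is a percentage, which makes it easier to compare across series of different sizes.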

7. Reflection

1) The Seasonal Naive (SNaive) model was the clear top performer on the 2018 hold-out, with the lowest RMSE (18.3), MAE (13.2), and MAPE (2.65%). This makes intuitive sense because monthly department store turnover in NSW shows clear, recurring seasonal patterns. The SNaive method forecasts each month with the turnover from the same month of the previous year, so when seasonality is the dominant feature it offers a strong baseline.

2) The SNaive forecasts align closely with the historic seasonal peaks and troughs: both the largest sales months and the quieter months are mirrored. In contrast, the Mean model produces a flat line at the historical average and cannot capture any seasonal variation. The Naive and Drift models miss the seasonal swings as well, because they rely on the most recent value and a steady trend, respectively. The training data end in December 2017, typically the peak month of the year, so both forecasts start from that peak and most likely overshoot the rest of 2018, which would explain their MAPE values above 90%. Since the series is driven more by a repeating annual cycle than by a simple straight line, SNaive naturally outperformed the others.

3) Even though SNaive performed best among these simple models, it still has limitations. It assumes that each season behaves exactly as it did the year before, which may not hold if consumer behavior, promotions, or economic conditions change. I also noticed that the prediction intervals widen at longer horizons, most visibly for the Naive and Drift forecasts, which shows how quickly uncertainty grows with these simple methods. For future analysis, I would explore more advanced models such as ETS or ARIMA, which can handle both trend and seasonality, or a dynamic regression model that includes external factors such as holidays or economic indicators. These methods would likely give more flexible and realistic forecasts.
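
One possible starting point for that future analysis is sketched below: it refits SNaive alongside automatically selected ETS and ARIMA models using the same series and split as above. This is only a sketch, assuming the default model selection in fable's `ETS()` and `ARIMA()`. The object names `nsw_dept`, `train`, `test`, `fit_adv`, and `acc_adv` are introduced here for illustration, and no results are claimed in advance.

```r
library(fpp3)

# Same series and train/test split as in the sections above
nsw_dept <- aus_retail %>%
  filter(Industry == "Department stores", State == "New South Wales") %>%
  select(Month, State, Turnover)

train <- nsw_dept %>% filter(year(Month) <= 2017)
test  <- nsw_dept %>% filter(year(Month) == 2018)

# SNaive is kept as the baseline; ETS() and ARIMA() select a
# specification automatically when none is supplied
fit_adv <- train %>%
  model(
    SNaive = SNAIVE(Turnover),
    ETS    = ETS(Turnover),
    ARIMA  = ARIMA(Turnover)
  )

# Compare 12-month-ahead accuracy on the 2018 hold-out
acc_adv <- fit_adv %>%
  forecast(h = "12 months") %>%
  accuracy(test) %>%
  select(.model, RMSE, MAE, MAPE)

acc_adv
```

A single 12-month test year is a small sample, so any ranking from this comparison should be read with caution.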

Project_Part_1

Fairouz Maaayah

2025-11-10
