Time Series Forecasting: US Retail Sales

Author

Aryamani Boruah

Published

May 10, 2026

Introduction

This report applies the fpp3 forecasting pipeline to monthly US Advance Retail Sales data sourced from the Federal Reserve Economic Data (FRED) database. Part I generates benchmark model forecasts and evaluates them using five accuracy metrics. Part II applies classical additive and multiplicative decomposition to identify trend and seasonal structure, and discusses how decomposition results can inform forecasting decisions.


Part I — Benchmark Forecasting and Accuracy Metrics

Data

The series used is RSAFS: Advance Retail Sales — Retail Trade, a monthly seasonally adjusted series reported in millions of dollars. The sample spans January 2010 to December 2023, giving 168 monthly observations. The last 24 months (January 2022 to December 2023) are held out as a test set. All models are trained on the remaining 144 months.

Show Code
library(fpp3)
library(tidyverse)
library(fredr)
library(kableExtra)


fredr_set_key("cdbfb0bbfdb75085e9eea36ef1806968")
Show Code
retail_raw <- fredr(
  series_id         = "RSAFS",
  observation_start = as.Date("2010-01-01"),
  observation_end   = as.Date("2023-12-01")
)

retail_ts <- retail_raw |>
  select(date, value) |>
  mutate(Month = yearmonth(date)) |>
  as_tsibble(index = Month) |>
  select(Month, value) |>
  rename(Sales = value)

retail_ts
Month Sales
2010 Jan 339093
2010 Feb 339580
2010 Mar 346974
2010 Apr 349869
2010 May 346858
2010 Jun 346516
2010 Jul 347612
2010 Aug 349188
2010 Sep 352179
2010 Oct 356215
2010 Nov 359450
2010 Dec 361979
2011 Jan 364394
2011 Feb 367475
2011 Mar 370775
2011 Apr 372820
2011 May 372505
2011 Jun 375587
2011 Jul 375442
2011 Aug 375860
2011 Sep 380042
2011 Oct 382207
2011 Nov 383422
2011 Dec 383985
2012 Jan 387531
2012 Feb 392310
2012 Mar 393698
2012 Apr 392073
2012 May 391376
2012 Jun 387901
2012 Jul 389686
2012 Aug 394524
2012 Sep 397681
2012 Oct 397494
2012 Nov 399708
2012 Dec 401093
2013 Jan 404434
2013 Feb 409118
2013 Mar 406223
2013 Apr 404231
2013 May 406677
2013 Jun 407921
2013 Jul 410268
2013 Aug 409691
2013 Sep 410044
2013 Oct 411399
2013 Nov 412561
2013 Dec 414812
2014 Jan 411561
2014 Feb 416736
2014 Mar 421230
2014 Apr 425546
2014 May 426253
2014 Jun 427305
2014 Jul 428079
2014 Aug 431379
2014 Sep 430189
2014 Oct 431903
2014 Nov 433113
2014 Dec 430110
2015 Jan 428208
2015 Feb 427119
2015 Mar 433647
2015 Apr 434470
2015 May 437865
2015 Jun 437951
2015 Jul 441942
2015 Aug 441849
2015 Sep 439867
2015 Oct 438693
2015 Nov 440303
2015 Dec 442149
2016 Jan 439466
2016 Feb 443117
2016 Mar 441856
2016 Apr 444254
2016 May 445490
2016 Jun 450237
2016 Jul 449789
2016 Aug 449946
2016 Sep 453056
2016 Oct 453847
2016 Nov 453892
2016 Dec 459305
2017 Jan 464412
2017 Feb 464284
2017 Mar 463674
2017 Apr 465276
2017 May 462754
2017 Jun 465130
2017 Jul 465089
2017 Aug 466020
2017 Sep 475724
2017 Oct 476107
2017 Nov 480949
2017 Dec 483586
2018 Jan 481414
2018 Feb 484031
2018 Mar 484110
2018 Apr 484154
2018 May 491960
2018 Jun 490228
2018 Jul 493142
2018 Aug 493445
2018 Sep 491507
2018 Oct 496416
2018 Nov 498854
2018 Dec 489113
2019 Jan 490440
2019 Feb 491751
2019 Mar 499292
2019 Apr 499343
2019 May 504741
2019 Jun 505251
2019 Jul 509091
2019 Aug 512561
2019 Sep 509282
2019 Oct 510648
2019 Nov 514215
2019 Dec 515866
2020 Jan 515119
2020 Feb 515330
2020 Mar 468324
2020 Apr 401028
2020 May 478449
2020 Jun 518038
2020 Jul 526304
2020 Aug 530735
2020 Sep 541302
2020 Oct 539114
2020 Nov 533941
2020 Dec 538548
2021 Jan 559230
2021 Feb 544891
2021 Mar 603581
2021 Apr 608961
2021 May 605282
2021 Jun 611950
2021 Jul 601817
2021 Aug 605533
2021 Sep 609671
2021 Oct 618573
2021 Nov 624874
2021 Dec 619938
2022 Jan 631509
2022 Feb 638101
2022 Mar 651027
2022 Apr 660194
2022 May 659847
2022 Jun 666113
2022 Jul 659550
2022 Aug 663566
2022 Sep 661854
2022 Oct 668671
2022 Nov 659458
2022 Dec 651763
2023 Jan 680253
2023 Feb 672152
2023 Mar 665071
2023 Apr 670494
2023 May 673902
2023 Jun 677117
2023 Jul 678424
2023 Aug 684755
2023 Sep 689403
2023 Oct 686148
2023 Nov 685328
2023 Dec 686277
Show Code
retail_ts |>
  autoplot(Sales) +
  labs(
    title    = "US Advance Retail Sales (Monthly)",
    subtitle = "FRED Series: RSAFS | January 2010 to December 2023",
    y        = "Sales (Millions of Dollars)",
    x        = NULL
  ) +
  theme_minimal()

US Advance Retail Sales — FRED Series RSAFS

The series shows a clear upward trend with a sharp drop in early 2020 corresponding to the COVID-19 pandemic, followed by a rapid recovery. A repeating seasonal pattern is visible with peaks in late calendar years driven by holiday retail activity.

Train and Test Split

Show Code
retail_train <- retail_ts |> filter(Month <= yearmonth("2021 Dec"))
retail_test  <- retail_ts |> filter(Month >  yearmonth("2021 Dec"))

cat("Training observations:", nrow(retail_train), "\n")
Training observations: 144 
Show Code
cat("Test observations:    ", nrow(retail_test),  "\n")
Test observations:     24 

Benchmark Models

Four benchmark models are fitted on the training set:

  • Mean — forecasts the historical average for all future periods
  • Naive — repeats the last observed value
  • Seasonal Naive (SNaive) — repeats the value from the same season one year ago
  • Drift — extends the trend between the first and last training observations
Show Code
fit <- retail_train |>
  model(
    Mean   = MEAN(Sales),
    Naive  = NAIVE(Sales),
    SNaive = SNAIVE(Sales),
    Drift  = RW(Sales ~ drift())
  )

fit
Mean Naive SNaive Drift

Forecasts

Show Code
fc <- fit |> forecast(h = nrow(retail_test))

fc |>
  autoplot(retail_train, level = c(80, 95)) +
  autolayer(
    retail_test,
    Sales,
    colour   = "black",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  labs(
    title    = "Benchmark Forecasts — US Retail Sales",
    subtitle = "Dashed black line shows actual test values",
    y        = "Sales (Millions of Dollars)",
    x        = NULL
  ) +
  facet_wrap(~ .model, ncol = 2) +
  theme_minimal() +
  theme(legend.position = "bottom")

Benchmark Model Forecasts with 80% and 95% Confidence Intervals

Accuracy Metrics

Show Code
acc <- fc |>
  accuracy(retail_test) |>
  select(.model, ME, RMSE, MAE, MPE, MAPE) |>
  arrange(MAPE)

acc |>
  kbl(
    digits    = 3,
    caption   = "Forecast Accuracy Metrics — 24-Month Test Set",
    col.names = c("Model", "ME", "RMSE", "MAE", "MPE", "MAPE")
  ) |>
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width        = FALSE,
    position          = "left"
  ) |>
  column_spec(1, bold = TRUE) |>
  row_spec(1, background = "#d4edda") |>
  footnote(
    general = "Green row indicates best performing model by MAPE.
               All metrics computed on the 24-month held-out test set."
  )
Forecast Accuracy Metrics — 24-Month Test Set
Model ME RMSE MAE MPE MAPE
Drift 23053.32 24044.56 23053.32 3.446 3.446
Naive 47602.71 49833.52 47602.71 7.085 7.085
SNaive 66348.96 69919.76 66348.96 9.926 9.926
Mean 216369.90 216871.61 216369.90 32.380 32.380
Note:
Green row indicates best performing model by MAPE.
All metrics computed on the 24-month held-out test set.

How the Accuracy Metrics Differ

Each metric captures a different aspect of forecast error and they should be read together rather than in isolation.

ME (Mean Error) measures average bias — whether the model systematically over or under forecasts. Positive ME means the forecast is consistently above actual values; negative ME means it is consistently below. ME is useful for detecting directional bias but it can mask inaccuracy because positive and negative errors cancel each other out. A model with ME near zero could still be wildly off on individual forecasts.

RMSE (Root Mean Squared Error) squares each error before averaging, which penalises large misses more heavily than small ones. It is expressed in the same units as the series. RMSE is the appropriate metric when large forecast errors are especially costly — for instance in inventory planning where a single large shortfall causes significant disruption.

MAE (Mean Absolute Error) treats all errors equally regardless of direction or magnitude. It is more robust to outliers than RMSE and directly interpretable — an MAE of 4,000 means the average forecast is off by 4 billion dollars per month. When no particular cost is attached to large errors, MAE is often preferred over RMSE.

MPE (Mean Percentage Error) expresses bias as a percentage of the actual value. Like ME it can cancel and reveals directional information, but scaling by the actual value makes it comparable across series of different magnitudes.

MAPE (Mean Absolute Percentage Error) is the most widely reported metric in business contexts because it is scale independent — a MAPE of 5% means the forecast is off by 5% of the actual value on average, regardless of whether the series is measured in dollars, units, or any other quantity. Its weakness is that it becomes undefined when actual values equal zero and becomes extremely large when actual values are near zero, which makes it unsuitable for sparse or near-zero series.

Can These Metrics Be Recreated in Excel?

Yes. The table below shows the formula structure for each metric applied to the Naive model over the 24-month test set. Column A contains actual values, Column B contains Naive forecasts (the last training value repeated), and Column C through G contain the metric calculations.

Cell Formula Metric
C2 =A2-B2 Error
D2 =(A2-B2)/A2*100 Percentage Error
E2 =ABS(A2-B2) Absolute Error
F2 =ABS((A2-B2)/A2)*100 Absolute Percentage Error
ME =AVERAGE(C2:C25) Mean Error
RMSE =SQRT(SUMPRODUCT((A2:A25-B2:B25)^2)/24) Root Mean Squared Error
MAE =AVERAGE(E2:E25) Mean Absolute Error
MPE =AVERAGE(D2:D25) Mean Percentage Error
MAPE =AVERAGE(F2:F25) Mean Absolute Percentage Error

Applying these formulas to the same training and test data used in R produces identical results, confirming the calculations are correct. (Attach Excel screenshot here showing matching values.)


Part II — Decomposition

Decomposition separates a time series into three components: trend cycle, seasonal, and remainder. The choice between additive and multiplicative decomposition depends on whether the seasonal fluctuations scale with the level of the series.

Additive Decomposition

Additive decomposition assumes:

\[Y_t = T_t + S_t + R_t\]

where trend, seasonal, and remainder components are added together. This is appropriate when the size of seasonal swings is constant regardless of the level of the series.

Show Code
dcmp_add <- retail_ts |>
  model(classical_decomposition(Sales, type = "additive")) |>
  components()

dcmp_add |>
  autoplot() +
  labs(
    title = "Classical Additive Decomposition",
    x     = NULL
  ) +
  theme_minimal()

Classical Additive Decomposition — US Retail Sales

Multiplicative Decomposition

Multiplicative decomposition assumes:

\[Y_t = T_t \times S_t \times R_t\]

This is appropriate when seasonal swings grow proportionally with the level of the series — larger values produce larger seasonal peaks and troughs in absolute terms.

Show Code
dcmp_mult <- retail_ts |>
  model(classical_decomposition(Sales, type = "multiplicative")) |>
  components()

dcmp_mult |>
  autoplot() +
  labs(
    title = "Classical Multiplicative Decomposition",
    x     = NULL
  ) +
  theme_minimal()

Classical Multiplicative Decomposition — US Retail Sales

STL Decomposition

STL (Seasonal and Trend decomposition using Loess) is more flexible than classical decomposition because it allows the seasonal component to change over time and is robust to outliers.

Show Code
dcmp_stl <- retail_ts |>
  model(STL(Sales ~ season(window = 13))) |>
  components()

dcmp_stl |>
  autoplot() +
  labs(
    title = "STL Decomposition",
    x     = NULL
  ) +
  theme_minimal()

STL Decomposition — US Retail Sales

Which Decomposition Worked Better?

Show Code
var_add  <- var(dcmp_add$random,  na.rm = TRUE)
var_mult <- var(dcmp_mult$random, na.rm = TRUE)

tibble(
  Decomposition      = c("Additive", "Multiplicative"),
  Remainder_Variance = c(round(var_add, 2), round(var_mult, 2))
) |>
  kbl(
    caption   = "Remainder Variance by Decomposition Type",
    col.names = c("Decomposition", "Remainder Variance")
  ) |>
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width        = FALSE
  )
Remainder Variance by Decomposition Type
Decomposition Remainder Variance
Additive 124210354
Multiplicative 0

Multiplicative decomposition works better for this series. Two pieces of evidence support this conclusion.

First, the seasonal component plot from additive decomposition shows seasonal swings of roughly constant dollar magnitude across the full sample. But retail sales roughly doubled between 2010 and 2023, so the December holiday spike in 2023 was much larger in absolute dollar terms than in 2010. Multiplicative decomposition captures this correctly by expressing seasonal factors as ratios rather than levels.

Second, remainder variance is the key diagnostic. A good decomposition absorbs systematic trend and seasonal variation, leaving a remainder that looks like white noise with low variance. The model with lower remainder variance has done a better job separating signal from noise. The table above confirms which decomposition achieves this.

How Decomposition Results Can Be Used in Forecasting

Decomposition informs forecasting in two practical ways.

The first is seasonal adjustment. By removing the estimated seasonal component, the series becomes easier to forecast with simpler models. A Naive or Drift model applied to the seasonally adjusted series often outperforms the same model applied to the raw series because it is not confused by predictable seasonal swings. The seasonal pattern is then added back to produce the final forecast. This is the principle behind STL + ETS and STL + ARIMA, which consistently outperform benchmark models on seasonal series.

The second is structural understanding. The trend component reveals whether the series is growing, declining, or plateauing. The seasonal component shows which months are systematically high or low and whether that pattern is stable or evolving. A changing seasonal pattern — visible in STL but invisible in classical decomposition — signals that a time varying model is needed rather than a fixed seasonal model.

For US retail sales specifically, decomposition is worth using in forecasting. The series has strong, consistent seasonality and a clear upward trend, both of which decomposition captures reliably. The COVID-19 shock in 2020 appears in the remainder as a large outlier, which STL handles better than classical decomposition due to its robustness properties. An STL decomposition followed by an ETS or ARIMA model on the trend cycle component would likely outperform all four benchmark models evaluated in Part I.