library(fpp3)
library(quantmod)
library(knitr)

Discussion 3 — Accuracy Metrics + Decomposition (FRED Monthly: PAYNSA)
1. Data (FRED: PAYNSA)
fred_code <- "PAYNSA"
raw_xts <- getSymbols(fred_code, src = "FRED", auto.assign = FALSE)
y <- tibble(
date = as.Date(index(raw_xts)),
value = as.numeric(raw_xts[, 1])
) |>
mutate(Month = yearmonth(date)) |>
select(Month, value) |>
as_tsibble(index = Month) |>
filter(!is.na(value)) |>
fill_gaps()
# checks
tsibble::has_gaps(y)
# A tibble: 1 × 1
  .gaps
  <lgl>
1 FALSE

tsibble::interval(y)
<interval[1]>
[1] 1M
autoplot(y, value) +
labs(
title = paste("FRED series:", fred_code),
x = NULL,
y = "Thousands of persons"
)

2. Methods
2.1 Methods (repeatable “one-collapse” pipeline)
- I pulled the monthly FRED series (PAYNSA) using quantmod::getSymbols() to keep ingestion lightweight and reproducible without API-key setup.
- I immediately standardized the time index by converting the FRED date stamps to a true monthly index using yearmonth().
- I validated the series cadence with interval() and checked for missing periods with has_gaps().
- To guarantee compatibility with STL/fable (and to make the workflow reusable for other data sources), I enforced a regular monthly grid using fill_gaps().
- I split the series into training and test sets (training through December 2015; January 2016 onward held out) to evaluate out-of-sample performance.
- I fit benchmark models with model(), generated forecasts with forecast(h = nrow(test)), and computed accuracy on the test set using accuracy(fc, test).
- For decomposition, I compared additive STL on the original scale with a multiplicative-style STL via a log transform, and inspected the remainder to see whether structure remained.
- This “ingest → standardize index → verify cadence/gaps → regularize → model/evaluate” collapse pattern is intended to generalize to higher-frequency pipelines (e.g., multi-exchange BTC data) where alignment and missing timestamps are unavoidable; a sketch of the pattern as a reusable function follows this list.
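As a hedged sketch, the same pattern can be collapsed into one reusable helper. The function name fred_monthly and the UNRATE usage example are my own illustrations, not part of the original workflow:

# Minimal sketch of the "one-collapse" pattern as a reusable helper.
# `fred_monthly` is a hypothetical name; it packages the steps above.
fred_monthly <- function(fred_code) {
  raw_xts <- quantmod::getSymbols(fred_code, src = "FRED", auto.assign = FALSE)
  tibble(
    date  = as.Date(zoo::index(raw_xts)),
    value = as.numeric(raw_xts[, 1])
  ) |>
    mutate(Month = yearmonth(date)) |>   # standardize the time index
    select(Month, value) |>
    as_tsibble(index = Month) |>         # declare the temporal grid
    filter(!is.na(value)) |>
    fill_gaps()                          # regularize: make missing months explicit
}

# Usage, e.g. for another monthly FRED series:
# y2 <- fred_monthly("UNRATE")
# tsibble::has_gaps(y2); tsibble::interval(y2)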
3. Train/test split
train <- y |> filter_index(~ "2015 Dec")
test  <- y |> filter_index("2016 Jan" ~ .)

4. Benchmark models + forecasts
models <- train |>
  model(
    NAIVE  = NAIVE(value),
    MEAN   = MEAN(value),
    SNAIVE = SNAIVE(value),
    AVG    = MEAN(value),            # note: identical to MEAN, kept to match the assignment's labels
    WAVG   = TSLM(value ~ trend()),  # note: a linear-trend regression, not a true weighted average
    DRIFT  = RW(value ~ drift())
  )
models
# A mable: 1 x 6
NAIVE MEAN SNAIVE AVG WAVG DRIFT
<model> <model> <model> <model> <model> <model>
1 <NAIVE> <MEAN> <SNAIVE> <MEAN> <TSLM> <RW w/ drift>
fc <- models |> forecast(h = nrow(test))
autoplot(fc, train) +
autolayer(test, value, alpha = 0.7) +
facet_wrap(~ .model, ncol = 2) +
labs(
title = paste("Benchmark forecasts (with prediction intervals):", fred_code),
x = NULL,
y = "Thousands of persons"
)

models_small <- train |>
model(
MEAN = MEAN(value),
NAIVE = NAIVE(value),
SNAIVE = SNAIVE(value)
)
fc_small <- models_small |> forecast(h = nrow(test))
autoplot(fc_small, train, level = NULL) +
autolayer(test, value, colour = "black") +
labs(title = paste("Point forecasts (no intervals):", fred_code),
y = "Thousands of persons", x = NULL,
colour = "Series")5 5. Accuracy metrics (ME, MPE, RMSE, MAE, MAPE)
acc <- accuracy(fc, test) |>
select(.model, ME, MPE, RMSE, MAE, MAPE) |>
arrange(.model)
acc
# A tibble: 6 × 6
.model ME MPE RMSE MAE MAPE
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AVG 64499. 42.8 64792. 64499. 42.8
2 DRIFT -1070. -0.796 4211. 2545. 1.75
3 MEAN 64499. 42.8 64792. 64499. 42.8
4 NAIVE 6451. 4.12 8913. 7354. 4.79
5 SNAIVE 8672. 5.61 10558. 9152. 5.97
6 WAVG -2829. -1.96 4952. 2987. 2.07
kable(
acc,
digits = 3,
caption = "Forecast accuracy metrics on the test set (PAYNSA)"
)

| .model | ME | MPE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|
| AVG | 64499.127 | 42.760 | 64791.643 | 64499.127 | 42.760 |
| DRIFT | -1070.167 | -0.796 | 4211.412 | 2544.753 | 1.753 |
| MEAN | 64499.127 | 42.760 | 64791.643 | 64499.127 | 42.760 |
| NAIVE | 6451.300 | 4.124 | 8912.848 | 7353.517 | 4.785 |
| SNAIVE | 8671.717 | 5.612 | 10557.727 | 9151.933 | 5.970 |
| WAVG | -2829.454 | -1.964 | 4951.972 | 2987.427 | 2.067 |
5.1 Metric differences
Let the forecast error be \(e_t = y_t - \hat{y}_t\).
- ME: average signed error (bias in units).
- MPE: average signed percent error (bias in percent; unstable if actuals are small).
- MAE: average absolute error (typical miss size in units).
- RMSE: like MAE but penalizes large misses more (squares errors).
- MAPE: average absolute percent error (scale-free; problematic near 0).
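Writing these out, with \(h\) the number of test-period observations and \(e_t = y_t - \hat{y}_t\):

\[
\begin{aligned}
\mathrm{ME}   &= \frac{1}{h}\sum_{t=1}^{h} e_t, &
\mathrm{MPE}  &= \frac{100}{h}\sum_{t=1}^{h} \frac{e_t}{y_t},\\
\mathrm{MAE}  &= \frac{1}{h}\sum_{t=1}^{h} \lvert e_t\rvert, &
\mathrm{RMSE} &= \sqrt{\frac{1}{h}\sum_{t=1}^{h} e_t^{2}},\\
\mathrm{MAPE} &= \frac{100}{h}\sum_{t=1}^{h} \left\lvert \frac{e_t}{y_t} \right\rvert. &&
\end{aligned}
\]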
6. Decomposition (Additive vs multiplicative-style)
6.1 Additive STL
stl_add <- y |>
model(STL(value ~ season(window = "periodic")))
components(stl_add) |>
autoplot() +
labs(title = "STL decomposition (additive) — PAYNSA")6.2 6.2 Multiplicative-style STL (log transform)
y_pos <- y |> filter(value > 0)
stl_log <- y_pos |>
mutate(log_value = log(value)) |>
model(STL(log_value ~ season(window = "periodic")))
components(stl_log) |>
autoplot() +
labs(title = "STL decomposition on log(value) (multiplicative-style) — PAYNSA")6.3 6.3 Remainder diagnostics
rem_add <- components(stl_add) |>
as_tibble() |>
select(Month, remainder) |>
as_tsibble(index = Month)
autoplot(rem_add, remainder) +
labs(title = "Remainder (additive STL) — PAYNSA", x = NULL, y = "remainder")rem_log <- components(stl_log) |>
as_tibble() |>
select(Month, remainder) |>
as_tsibble(index = Month)
autoplot(rem_log, remainder) +
labs(title = "Remainder (log-STL) — PAYNSA", x = NULL, y = "remainder")6.4 6.4 Which decomposition worked better (and why)
- Additive STL is appropriate when seasonal swings are roughly constant in absolute size over time.
- Multiplicative-style (log) STL is appropriate when seasonal swings scale with the level (i.e., seasonal amplitude grows/shrinks as the series grows/shrinks).
How I decided: I compare the remainder plots. The “better” decomposition is the one whose remainder looks closer to random noise: no obvious leftover seasonality or trend, and more stable variance.
For PAYNSA, I would report the decomposition whose remainder is more homoscedastic (less changing spread) and shows less visible structure as the better fit. If the log-remainder looks more stable, I would prefer the multiplicative-style version; otherwise, additive is sufficient. A numeric cross-check of the two remainders is sketched below.
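As a minimal sketch of that cross-check, assuming rem_add and rem_log from Section 6.3 (note the remainders live on different scales, original units vs. log units, so compare structure rather than raw magnitude):

# Leftover autocorrelation in each remainder (ljung_box from feasts,
# loaded by fpp3); a small lb_pvalue suggests structure remains.
rem_add |> features(remainder, ljung_box, lag = 24)
rem_log |> features(remainder, ljung_box, lag = 24)

# Visual check of residual autocorrelation in each remainder:
rem_add |> ACF(remainder) |> autoplot() + labs(title = "ACF: additive-STL remainder")
rem_log |> ACF(remainder) |> autoplot() + labs(title = "ACF: log-STL remainder")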
6.5 How decomposition could be used for forecasting (and would I?)
Decomposition is mainly a diagnostic and preprocessing step:
- If decomposition reveals strong seasonality, I would favor models that explicitly handle it (e.g., SNAIVE, seasonal ETS/ARIMA, or regression with seasonal terms).
- If variance/seasonal amplitude grows with the level, I would consider a log (or Box–Cox) transform before modeling.
- In a fully manual workflow, you can deseasonalize → model the seasonally adjusted series → reseasonalize; fabletools automates this route via decomposition_model(), as sketched below. In tidyverts, many models already learn seasonality directly, so I would use decomposition primarily for understanding structure and choosing the right model/transform rather than as a required forecasting step.
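A hedged sketch of that route using fabletools::decomposition_model(), which models the seasonally adjusted series and re-adds the seasonal component (forecast with SNAIVE by default) automatically; the label stl_naive is my own:

# STL decomposition + NAIVE on the seasonally adjusted series;
# the seasonal component is re-added to the forecasts internally.
fit_dcmp <- train |>
  model(
    stl_naive = decomposition_model(
      STL(value ~ season(window = "periodic")),
      NAIVE(season_adjust)      # model the seasonally adjusted series
    )
  )

fit_dcmp |>
  forecast(h = nrow(test)) |>
  accuracy(test)                # compare against the Section 5 benchmarks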
7. Excel replication formulas
For the test period, create columns:
- actual \(y_t\)
- forecast \(\hat{y}_t\)
- error \(e_t = y_t - \hat{y}_t\)
- absolute error \(|e_t|\)
- squared error \(e_t^2\)
- percent error \(e_t / y_t\)
- absolute percent error \(|e_t| / y_t\)
If Actual is column B and Forecast is C:
- Error (D2): =B2-C2
- Abs error (E2): =ABS(D2)
- Squared error (F2): =D2^2
- Percent error (G2): =D2/B2
- Abs percent error (H2): =ABS(G2)
Aggregations:
- ME: =AVERAGE(D:D)
- MAE: =AVERAGE(E:E)
- RMSE: =SQRT(AVERAGE(F:F))
- MPE: =100*AVERAGE(G:G)
- MAPE: =100*AVERAGE(H:H)
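To verify the spreadsheet arithmetic against Section 5, here is a minimal R cross-check that recomputes the same metrics by hand for one model (NAIVE), assuming fc and test from the earlier sections:

# Recompute ME/MAE/RMSE/MPE/MAPE by hand for the NAIVE forecasts,
# mirroring the spreadsheet columns D–H above.
fc |>
  filter(.model == "NAIVE") |>
  as_tibble() |>
  select(Month, forecast = .mean) |>
  left_join(as_tibble(test), by = "Month") |>
  mutate(e = value - forecast) |>
  summarise(
    ME   = mean(e),
    MAE  = mean(abs(e)),
    RMSE = sqrt(mean(e^2)),
    MPE  = 100 * mean(e / value),
    MAPE = 100 * mean(abs(e) / value)
  )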
8. Bridge note: how this maps to the BTC engine (trend-line feature generation)
In a trading pipeline, each constituent feed (e.g., exchange prices, index components such as the BRTI constituents, spreads, funding rates) can be treated as its own time series on a shared time grid. Decomposition is useful because it turns each series into structured components (trend, seasonal, and remainder) that can be used as features. Concretely, you can extract a trend-line signal (level/slope/curvature), optionally remove predictable seasonality, and keep the remainder as a “shock/noise” feature. Stacking these per-feed features creates a time-aligned feature matrix \(X_t\) (one row per timestamp, columns for each feed’s trend/seasonal/remainder summaries). ACF/PACF (and cross-correlation across feeds) then help identify persistence/lag structure that informs the next modeling layer (forecasting the index, volatility bands, or a decision rule). The key enabling step is the collapse layer: enforce a single temporal grid, define a consistent missing-data policy, and then generate component-wise features on that aligned grid. A hypothetical sketch of the per-feed feature extraction follows.
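All names here (feed_features, the index column ts, the price column, and the example feeds) are my own illustrations, assuming each feed has already been collapsed onto a regular grid with a seasonal period STL can detect:

# Hypothetical per-feed feature extraction via STL components.
# Assumes `feed_ts` is a regular tsibble with index `ts` and a numeric `price`.
feed_features <- function(feed_ts, feed_name) {
  feed_ts |>
    model(STL(price ~ season(window = "periodic"))) |>
    components() |>
    as_tibble() |>
    transmute(
      ts,
      "{feed_name}_trend"     := trend,                # trend-line level
      "{feed_name}_slope"     := trend - lag(trend),   # first-difference slope
      "{feed_name}_remainder" := remainder             # "shock/noise" feature
    )
}

# Joining several feeds on the shared index yields the feature matrix X_t, e.g.:
# X <- purrr::reduce(
#   list(feed_features(coinbase, "cb"), feed_features(kraken, "kr")),
#   dplyr::left_join, by = "ts"
# )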