library(fpp3)
library(quantmod)
library(knitr)

Discussion 3 — Accuracy Metrics + Decomposition (FRED Monthly: PAYNSA)
1. Data (FRED: PAYNSA)
fred_code <- "PAYNSA"
raw_xts <- getSymbols(fred_code, src = "FRED", auto.assign = FALSE)
y <- tibble(
date = as.Date(index(raw_xts)),
value = as.numeric(raw_xts[, 1])
) |>
mutate(Month = yearmonth(date)) |>
select(Month, value) |>
as_tsibble(index = Month) |>
filter(!is.na(value)) |>
fill_gaps()
# checks
tsibble::has_gaps(y)
# A tibble: 1 × 1
  .gaps
  <lgl>
1 FALSE

tsibble::interval(y)
<interval[1]>
[1] 1M
autoplot(y, value) +
labs(
title = paste("FRED series:", fred_code),
x = NULL,
y = "Thousands of persons"
)

2. Methods
2.1 Methods (repeatable “one-collapse” pipeline)
- I pulled the monthly FRED series (PAYNSA) using quantmod::getSymbols() to keep ingestion lightweight and reproducible without API-key setup.
- I immediately standardized the time index by converting the FRED date stamps to a true monthly index using yearmonth().
- I validated the series cadence with interval() and checked for missing periods with has_gaps().
- To guarantee compatibility with STL/fable (and to make the workflow reusable for other data sources), I enforced a regular monthly grid using fill_gaps().
- I split the series into training and test sets (training through December 2015; January 2016 onward held out) to evaluate out-of-sample performance.
- I fit benchmark models with model(), generated forecasts with forecast(h = nrow(test)), and computed accuracy on the test set using accuracy(fc, test).
- For decomposition, I compared additive STL on the original scale with a multiplicative-style STL via a log transform, and inspected the remainder to see whether structure remained.
- This “ingest → standardize index → verify cadence/gaps → regularize → model/evaluate” collapse pattern is intended to generalize to higher-frequency pipelines (e.g., multi-exchange BTC data) where alignment and missing timestamps are unavoidable; a sketch of the pattern as a reusable function follows this list.
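As a hedged sketch, the same pattern can be collapsed into one reusable helper. The function name fred_monthly and the UNRATE usage example are my own illustrations, not part of the original workflow:

# Minimal sketch of the "one-collapse" pattern as a reusable helper.
# `fred_monthly` is a hypothetical name; it packages the steps above.
fred_monthly <- function(fred_code) {
  raw_xts <- quantmod::getSymbols(fred_code, src = "FRED", auto.assign = FALSE)
  tibble(
    date  = as.Date(zoo::index(raw_xts)),
    value = as.numeric(raw_xts[, 1])
  ) |>
    mutate(Month = yearmonth(date)) |>   # standardize the time index
    select(Month, value) |>
    as_tsibble(index = Month) |>         # declare the temporal grid
    filter(!is.na(value)) |>
    fill_gaps()                          # regularize: make missing months explicit
}

# Usage, e.g. for another monthly FRED series:
# y2 <- fred_monthly("UNRATE")
# tsibble::has_gaps(y2); tsibble::interval(y2)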
3. Train/test split
train <- y |> filter_index(~ "2015 Dec")
test  <- y |> filter_index("2016 Jan" ~ .)

4. Benchmark models + forecasts
models <- train |>
  model(
    NAIVE  = NAIVE(value),
    MEAN   = MEAN(value),
    SNAIVE = SNAIVE(value),
    AVG    = MEAN(value),            # note: identical to MEAN, kept to match the assignment's labels
    WAVG   = TSLM(value ~ trend()),  # note: a linear-trend regression, not a true weighted average
    DRIFT  = RW(value ~ drift())
  )
models
# A mable: 1 x 6
NAIVE MEAN SNAIVE AVG WAVG DRIFT
<model> <model> <model> <model> <model> <model>
1 <NAIVE> <MEAN> <SNAIVE> <MEAN> <TSLM> <RW w/ drift>
fc <- models |> forecast(h = nrow(test))
autoplot(fc, train) +
autolayer(test, value, alpha = 0.7) +
facet_wrap(~ .model, ncol = 2) +
labs(
title = paste("Benchmark forecasts (with prediction intervals):", fred_code),
x = NULL,
y = "Thousands of persons"
)

models_small <- train |>
model(
MEAN = MEAN(value),
NAIVE = NAIVE(value),
SNAIVE = SNAIVE(value)
)
fc_small <- models_small |> forecast(h = nrow(test))
autoplot(fc_small, train, level = NULL) +
autolayer(test, value, colour = "black") +
labs(title = paste("Point forecasts (no intervals):", fred_code),
y = "Thousands of persons", x = NULL,
colour = "Series")5 5. Accuracy metrics (ME, MPE, RMSE, MAE, MAPE)
acc <- accuracy(fc, test) |>
select(.model, ME, MPE, RMSE, MAE, MAPE) |>
arrange(.model)
acc
# A tibble: 6 × 6
.model ME MPE RMSE MAE MAPE
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AVG 64499. 42.8 64792. 64499. 42.8
2 DRIFT -1070. -0.796 4211. 2545. 1.75
3 MEAN 64499. 42.8 64792. 64499. 42.8
4 NAIVE 6451. 4.12 8913. 7354. 4.79
5 SNAIVE 8672. 5.61 10558. 9152. 5.97
6 WAVG -2829. -1.96 4952. 2987. 2.07
kable(
acc,
digits = 3,
caption = "Forecast accuracy metrics on the test set (PAYNSA)"
)

| .model | ME | MPE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|
| AVG | 64499.127 | 42.760 | 64791.643 | 64499.127 | 42.760 |
| DRIFT | -1070.167 | -0.796 | 4211.412 | 2544.753 | 1.753 |
| MEAN | 64499.127 | 42.760 | 64791.643 | 64499.127 | 42.760 |
| NAIVE | 6451.300 | 4.124 | 8912.848 | 7353.517 | 4.785 |
| SNAIVE | 8671.717 | 5.612 | 10557.727 | 9151.933 | 5.970 |
| WAVG | -2829.454 | -1.964 | 4951.972 | 2987.427 | 2.067 |
5.1 Metric differences
Let the forecast error be \(e_t = y_t - \hat{y}_t\).
- ME: average signed error (bias in units).
- MPE: average signed percent error (bias in percent; unstable if actuals are small).
- MAE: average absolute error (typical miss size in units).
- RMSE: like MAE but penalizes large misses more (squares errors).
- MAPE: average absolute percent error (scale-free; problematic near 0).
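Writing these out, with \(h\) the number of test-period observations and \(e_t = y_t - \hat{y}_t\):

\[
\begin{aligned}
\mathrm{ME}   &= \frac{1}{h}\sum_{t=1}^{h} e_t, &
\mathrm{MPE}  &= \frac{100}{h}\sum_{t=1}^{h} \frac{e_t}{y_t},\\
\mathrm{MAE}  &= \frac{1}{h}\sum_{t=1}^{h} \lvert e_t\rvert, &
\mathrm{RMSE} &= \sqrt{\frac{1}{h}\sum_{t=1}^{h} e_t^{2}},\\
\mathrm{MAPE} &= \frac{100}{h}\sum_{t=1}^{h} \left\lvert \frac{e_t}{y_t} \right\rvert. &&
\end{aligned}
\]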
6. Decomposition (Additive vs multiplicative-style)
6.1 Additive STL
stl_add <- y |>
model(STL(value ~ season(window = "periodic")))
components(stl_add) |>
autoplot() +
labs(title = "STL decomposition (additive) — PAYNSA")6.2 6.2 Multiplicative-style STL (log transform)
y_pos <- y |> filter(value > 0)
stl_log <- y_pos |>
mutate(log_value = log(value)) |>
model(STL(log_value ~ season(window = "periodic")))
components(stl_log) |>
autoplot() +
labs(title = "STL decomposition on log(value) (multiplicative-style) — PAYNSA")6.3 6.3 Remainder diagnostics
rem_add <- components(stl_add) |>
as_tibble() |>
select(Month, remainder) |>
as_tsibble(index = Month)
autoplot(rem_add, remainder) +
labs(title = "Remainder (additive STL) — PAYNSA", x = NULL, y = "remainder")rem_log <- components(stl_log) |>
as_tibble() |>
select(Month, remainder) |>
as_tsibble(index = Month)
autoplot(rem_log, remainder) +
labs(title = "Remainder (log-STL) — PAYNSA", x = NULL, y = "remainder")6.4 6.4 Which decomposition worked better (and why)
- Additive STL is appropriate when seasonal swings are roughly constant in absolute size over time.
- Multiplicative-style (log) STL is appropriate when seasonal swings scale with the level (i.e., seasonal amplitude grows/shrinks as the series grows/shrinks).
How I decided: I compare the remainder plots. The “better” decomposition is the one whose remainder looks closer to random noise: no obvious leftover seasonality or trend, and more stable variance.
For PAYNSA, I would report the decomposition whose remainder is more homoscedastic (less changing spread) and shows less visible structure as the better fit. If the log-remainder looks more stable, I would prefer the multiplicative-style version; otherwise, additive is sufficient. A numeric cross-check of the two remainders is sketched below.
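As a minimal sketch of that cross-check, assuming rem_add and rem_log from Section 6.3 (note the remainders live on different scales, original units vs. log units, so compare structure rather than raw magnitude):

# Leftover autocorrelation in each remainder (ljung_box from feasts,
# loaded by fpp3); a small lb_pvalue suggests structure remains.
rem_add |> features(remainder, ljung_box, lag = 24)
rem_log |> features(remainder, ljung_box, lag = 24)

# Visual check of residual autocorrelation in each remainder:
rem_add |> ACF(remainder) |> autoplot() + labs(title = "ACF: additive-STL remainder")
rem_log |> ACF(remainder) |> autoplot() + labs(title = "ACF: log-STL remainder")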
6.5 How decomposition could be used for forecasting (and would I?)
Decomposition is mainly a diagnostic and preprocessing step:
- If decomposition reveals strong seasonality, I would favor models that explicitly handle it (e.g., SNAIVE, seasonal ETS/ARIMA, or regression with seasonal terms).
- If variance/seasonal amplitude grows with the level, I would consider a log (or Box–Cox) transform before modeling.
- In a fully manual workflow, you can deseasonalize → model the seasonally adjusted series → reseasonalize; fabletools automates this route via decomposition_model(), as sketched below. In tidyverts, many models already learn seasonality directly, so I would use decomposition primarily for understanding structure and choosing the right model/transform rather than as a required forecasting step.
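A hedged sketch of that route using fabletools::decomposition_model(), which models the seasonally adjusted series and re-adds the seasonal component (forecast with SNAIVE by default) automatically; the label stl_naive is my own:

# STL decomposition + NAIVE on the seasonally adjusted series;
# the seasonal component is re-added to the forecasts internally.
fit_dcmp <- train |>
  model(
    stl_naive = decomposition_model(
      STL(value ~ season(window = "periodic")),
      NAIVE(season_adjust)      # model the seasonally adjusted series
    )
  )

fit_dcmp |>
  forecast(h = nrow(test)) |>
  accuracy(test)                # compare against the Section 5 benchmarks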
7. Excel replication formulas
For the test period, create columns:
- actual \(y_t\)
- forecast \(\hat{y}_t\)
- error \(e_t = y_t - \hat{y}_t\)
- absolute error \(|e_t|\)
- squared error \(e_t^2\)
- percent error \(e_t / y_t\)
- absolute percent error \(|e_t| / y_t\)
If Actual is column B and Forecast is C:
- Error (D2): =B2-C2
- Abs error (E2): =ABS(D2)
- Squared error (F2): =D2^2
- Percent error (G2): =D2/B2
- Abs percent error (H2): =ABS(G2)
Aggregations:
- ME: =AVERAGE(D:D)
- MAE: =AVERAGE(E:E)
- RMSE: =SQRT(AVERAGE(F:F))
- MPE: =100*AVERAGE(G:G)
- MAPE: =100*AVERAGE(H:H)
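To verify the spreadsheet arithmetic against Section 5, here is a minimal R cross-check that recomputes the same metrics by hand for one model (NAIVE), assuming fc and test from the earlier sections:

# Recompute ME/MAE/RMSE/MPE/MAPE by hand for the NAIVE forecasts,
# mirroring the spreadsheet columns D–H above.
fc |>
  filter(.model == "NAIVE") |>
  as_tibble() |>
  select(Month, forecast = .mean) |>
  left_join(as_tibble(test), by = "Month") |>
  mutate(e = value - forecast) |>
  summarise(
    ME   = mean(e),
    MAE  = mean(abs(e)),
    RMSE = sqrt(mean(e^2)),
    MPE  = 100 * mean(e / value),
    MAPE = 100 * mean(abs(e) / value)
  )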
8. Bridge note: how this maps to the BTC engine (trend-line feature generation)
In a trading pipeline, each constituent feed (e.g., exchange prices, index components such as the BRTI constituents, spreads, funding rates) can be treated as its own time series on a shared time grid. Decomposition is useful because it turns each series into structured components (trend, seasonal, and remainder) that can be used as features. Concretely, you can extract a trend-line signal (level/slope/curvature), optionally remove predictable seasonality, and keep the remainder as a “shock/noise” feature. Stacking these per-feed features creates a time-aligned feature matrix \(X_t\) (one row per timestamp, columns for each feed’s trend/seasonal/remainder summaries). ACF/PACF (and cross-correlation across feeds) then help identify persistence/lag structure that informs the next modeling layer (forecasting the index, volatility bands, or a decision rule). The key enabling step is the collapse layer: enforce a single temporal grid, define a consistent missing-data policy, and then generate component-wise features on that aligned grid. A hypothetical sketch of the per-feed feature extraction follows.
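All names here (feed_features, the index column ts, the price column, and the example feeds) are my own illustrations, assuming each feed has already been collapsed onto a regular grid with a seasonal period STL can detect:

# Hypothetical per-feed feature extraction via STL components.
# Assumes `feed_ts` is a regular tsibble with index `ts` and a numeric `price`.
feed_features <- function(feed_ts, feed_name) {
  feed_ts |>
    model(STL(price ~ season(window = "periodic"))) |>
    components() |>
    as_tibble() |>
    transmute(
      ts,
      "{feed_name}_trend"     := trend,                # trend-line level
      "{feed_name}_slope"     := trend - lag(trend),   # first-difference slope
      "{feed_name}_remainder" := remainder             # "shock/noise" feature
    )
}

# Joining several feeds on the shared index yields the feature matrix X_t, e.g.:
# X <- purrr::reduce(
#   list(feed_features(coinbase, "cb"), feed_features(kraken, "kr")),
#   dplyr::left_join, by = "ts"
# )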