Week 2 — Accuracy Metrics + Decomposition (FRED Monthly: PAYEMS/PAYNSA)

Author

Adrian Aziza

# Packages
library(fpp3)       # tsibble + fable + feasts + ggplot helpers
library(quantmod)   # FRED download (no API key needed)
library(zoo)        # as.yearmon and related date helpers (occasionally useful alongside quantmod output)

1. Overview

This report follows the tidyverts/fpp3 workflow to:

  1. Pull a monthly FRED time series,
  2. Fit baseline forecasting models (MEAN, NAIVE, SNAIVE, DRIFT),
  3. Compute accuracy metrics (ME, MPE, RMSE, MAE, MAPE) on a holdout test set,
  4. Generate forecast plots with prediction intervals,
  5. Conduct additive and multiplicative-style decomposition (STL on level vs STL on log-scale),
  6. Provide a clear mapping to recreating the same accuracy metrics in Excel.

For the forecasting + accuracy metrics, I use PAYEMS (Total Nonfarm Payrolls, seasonally adjusted). For decomposition, I use PAYNSA (Total Nonfarm Payrolls, not seasonally adjusted) so that the seasonal component is visible; PAYEMS is already seasonally adjusted and will typically show weaker seasonality.

Important implementation detail: FRED observations arrive as daily Date values rather than a monthly index. For fpp3/tidyverts seasonal tools, we convert to a true monthly index with yearmonth() and enforce regular spacing with fill_gaps().
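As a tiny illustration of the conversion (hypothetical dates, not FRED output): whatever day of the month the raw Date carries, yearmonth() collapses it to the calendar month.

yearmonth(as.Date(c("2024-01-01", "2024-01-31")))  # both map to 2024 Jan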

2. Part I — Accuracy metrics + forecasts (PAYEMS)

2.1 Pull monthly data from FRED

fred_code <- "PAYEMS"  # Total Nonfarm Payrolls (Seasonally Adjusted)

raw_xts <- getSymbols(fred_code, src = "FRED", auto.assign = FALSE)

y <- tibble(
  date  = as.Date(index(raw_xts)),
  value = as.numeric(raw_xts[, 1])
) |>
  mutate(month = yearmonth(date)) |>
  select(month, value) |>
  as_tsibble(index = month) |>
  filter(!is.na(value)) |>
  fill_gaps()

# quick sanity check (should show regular monthly interval)
y |> interval()
<interval[1]>
[1] 1M
autoplot(y, value) +
  labs(
    title = paste("FRED series:", fred_code),
    x = NULL,
    y = "Thousands of persons"
  )

2.2 Train/test split

I evaluate forecasts using a simple holdout: the last 12 months are the test set.

h <- 12

train <- y |> slice(1:(n() - h))
test  <- y |> slice((n() - h + 1):n())

train |> tail(3)
# A tsibble: 3 x 2 [1M]
     month  value
     <mth>  <dbl>
1 2024 Oct 158358
2 2024 Nov 158619
3 2024 Dec 158942
test  |> head(3)
# A tsibble: 3 x 2 [1M]
     month  value
     <mth>  <dbl>
1 2025 Jan 159053
2 2025 Feb 159155
3 2025 Mar 159275

2.3 Fit baseline models

Models included:

  • MEAN: mean forecast
  • NAIVE: random walk / last observation carried forward
  • SNAIVE: seasonal naive (uses last year's same month)
  • DRIFT: random walk with drift
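For reference, the point-forecast formulas behind these benchmarks, for a training sample \(y_1, \dots, y_T\) and horizon \(h\) (standard results, stated here only for clarity):

  • MEAN: \(\hat{y}_{T+h|T} = \bar{y} = (y_1 + \dots + y_T)/T\)
  • NAIVE: \(\hat{y}_{T+h|T} = y_T\)
  • SNAIVE (monthly data, \(h \le 12\)): \(\hat{y}_{T+h|T} = y_{T+h-12}\)
  • DRIFT: \(\hat{y}_{T+h|T} = y_T + h \cdot \frac{y_T - y_1}{T - 1}\)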

fit <- train |>
  model(
    MEAN   = MEAN(value),
    NAIVE  = NAIVE(value),
    SNAIVE = SNAIVE(value),
    DRIFT  = RW(value ~ drift())
  )

fit
# A mable: 1 x 4
     MEAN   NAIVE   SNAIVE         DRIFT
  <model> <model>  <model>       <model>
1  <MEAN> <NAIVE> <SNAIVE> <RW w/ drift>

2.4 Forecasts with prediction intervals

fc <- fit |> forecast(h = h)

# Forecast plot (screenshot: forecasts + CIs + test overlay)
autoplot(fc, train) +
  autolayer(test, value, alpha = 0.7) +
  labs(
    title = paste("Forecasts with prediction intervals:", fred_code),
    x = NULL,
    y = "Thousands of persons"
  )

2.5 Accuracy metrics table

Accuracy metrics computed on the holdout test set:

  • ME (Mean Error): average signed error (bias in original units)
  • MPE (Mean Percentage Error): average signed percent error (bias in percent)
  • RMSE: penalizes large errors more (squared error)
  • MAE: average absolute error (linear penalty)
  • MAPE: average absolute percent error

acc <- accuracy(fc, y) |>
  select(.model, .type, ME, MPE, RMSE, MAE, MAPE) |>
  arrange(.type, .model)

acc
# A tibble: 4 × 7
  .model .type     ME    MPE   RMSE    MAE   MAPE
  <chr>  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 DRIFT  Test   -354. -0.222   480.   354.  0.222
2 MEAN   Test  66760. 41.9   66760. 66760. 41.9  
3 NAIVE  Test    460.  0.288   484.   460.  0.288
4 SNAIVE Test   1441.  0.904  1499.  1441.  0.904
# Nicely formatted table (screenshot this)
knitr::kable(
  acc,
  digits = 3,
  caption = "Accuracy metrics for baseline models on PAYEMS (test set shown under .type = 'Test')"
)
Accuracy metrics for baseline models on PAYEMS (test set shown under .type = ‘Test’)
.model   .type         ME      MPE      RMSE        MAE    MAPE
DRIFT    Test    -353.908   -0.222   479.706    353.908   0.222
MEAN     Test   66760.219   41.882 66760.394  66760.219  41.882
NAIVE    Test     459.500    0.288   484.241    459.500   0.288
SNAIVE   Test    1441.083    0.904  1498.693   1441.083   0.904

2.6 How the measures differ (short explanation)

All metrics are functions of the forecast errors \(e_t = y_t - \hat{y}_t\), where \(\hat{y}_t\) is the forecast of the actual value \(y_t\).

  • ME is the average of \(e_t\). It captures directional bias in the original units.
  • MPE is the average of \(100\, e_t / y_t\). It captures bias as a percent of the actual.
  • MAE is the average of \(|e_t|\). It summarizes typical error magnitude in original units.
  • RMSE is \(\sqrt{\operatorname{mean}(e_t^2)}\). It penalizes large errors more than MAE.
  • MAPE is the average of \(100\, |e_t| / y_t\). It is scale-free but can be problematic if \(y_t\) is near zero.

For model comparison, smaller values of RMSE/MAE/MAPE generally indicate better predictive accuracy, while ME/MPE indicate systematic over- or under-forecasting.
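Before moving to Excel, a quick sanity check is to rebuild the same metrics by hand in R. The sketch below assumes the `fc` and `test` objects created above; point forecasts live in the fable's `.mean` column.

# Hand-rolled versions of the test-set metrics (should match accuracy())
errs <- fc |>
  as_tibble() |>
  select(.model, month, .mean) |>
  left_join(as_tibble(test), by = "month") |>
  mutate(e = value - .mean)           # e_t = actual - forecast

errs |>
  group_by(.model) |>
  summarise(
    ME   = mean(e),
    MPE  = mean(100 * e / value),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MAPE = mean(100 * abs(e) / value)
  )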

2.7 Recreating the metrics in Excel

To recreate the exact metrics in Excel for the test period:

Create columns:

  • Actual \(y_t\)
  • Forecast \(\hat{y}_t\)
  • Error \(e_t = y_t - \hat{y}_t\)
  • Absolute error \(|e_t|\)
  • Squared error \(e_t^2\)
  • Percent error \(e_t / y_t\)
  • Absolute percent error \(|e_t| / y_t\)

Excel formulas (assuming: Actual in column B, Forecast in C, Error in D):

  • Error (D2): =B2 - C2
  • Abs error (E2): =ABS(D2)
  • Squared error (F2): =D2^2
  • Percent error (G2): =D2/B2
  • Abs percent error (H2): =ABS(G2)

Then aggregate over the test rows only (for a 12-month test set in rows 2 to 13):

  • ME: =AVERAGE(D2:D13)
  • MAE: =AVERAGE(E2:E13)
  • RMSE: =SQRT(AVERAGE(F2:F13))
  • MPE: =100*AVERAGE(G2:G13)
  • MAPE: =100*AVERAGE(H2:H13)

To match R’s printed table, make sure you use the same test rows and apply the same rounding.
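To avoid copy/paste drift between R and Excel, one option (a sketch; the file name is arbitrary) is to export the test-period actuals and point forecasts from the objects above and build the workbook from that file:

# Write Actual/Forecast pairs for the test period to a CSV for the Excel workbook
fc |>
  as_tibble() |>
  transmute(.model, month = as.character(month), Forecast = .mean) |>
  left_join(
    test |> as_tibble() |> transmute(month = as.character(month), Actual = value),
    by = "month"
  ) |>
  write.csv("payems_test_set.csv", row.names = FALSE)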

3. Part II — Decomposition (PAYNSA)

PAYEMS is already seasonally adjusted, which often reduces visible seasonality. To demonstrate decomposition more clearly, I use the non-seasonally adjusted companion series PAYNSA.

3.1 Pull PAYNSA

fred_code_decomp <- "PAYNSA"  # Total Nonfarm Payrolls (Not Seasonally Adjusted)

raw_xts2 <- getSymbols(fred_code_decomp, src = "FRED", auto.assign = FALSE)

y2 <- tibble(
  date  = as.Date(index(raw_xts2)),
  value = as.numeric(raw_xts2[, 1])
) |>
  mutate(month = yearmonth(date)) |>
  select(month, value) |>
  as_tsibble(index = month) |>
  filter(!is.na(value)) |>
  fill_gaps()

# sanity check
y2 |> interval()
<interval[1]>
[1] 1M
autoplot(y2, value) +
  labs(
    title = paste("FRED series:", fred_code_decomp),
    x = NULL,
    y = "Thousands of persons"
  )

3.2 Additive STL decomposition

stl_add <- y2 |>
  model(STL(value ~ season(window = "periodic")))

# Components plot (screenshot this)
components(stl_add) |>
  autoplot() +
  labs(title = "STL decomposition (additive) — PAYNSA")

3.3 Multiplicative-style decomposition (log transform)

A common way to approximate multiplicative seasonality is to apply a log transform and then decompose additively on the log scale.

# PAYNSA should be strictly positive; filter defensively
y2_pos <- y2 |> filter(value > 0)

stl_log <- y2_pos |>
  mutate(log_value = log(value)) |>
  model(STL(log_value ~ season(window = "periodic")))

# Components plot (screenshot this)
components(stl_log) |>
  autoplot() +
  labs(title = "STL decomposition on log(value) (multiplicative-style) — PAYNSA")

3.4 Which worked better? How can you tell?

Two practical criteria:

  1. Stability of seasonal amplitude
  • If seasonal swings are roughly constant in absolute size over time, additive decomposition is appropriate.
  • If seasonal swings grow/shrink proportionally with the level, the log/multiplicative-style approach is more appropriate.
  2. Remainder (residual) behavior
  A “better” decomposition typically produces a remainder that looks more like noise (no obvious trend or seasonality left).
# Remainder diagnostics (additive)
rem_add <- components(stl_add) |>
  as_tibble() |>
  select(month, remainder) |>
  as_tsibble(index = month)

autoplot(rem_add, remainder) +
  labs(title = "Remainder (additive STL) — PAYNSA", x = NULL, y = "remainder")

# Remainder diagnostics (log-scale)
rem_log <- components(stl_log) |>
  as_tibble() |>
  select(month, remainder) |>
  as_tsibble(index = month)

autoplot(rem_log, remainder) +
  labs(title = "Remainder (log-STL) — PAYNSA", x = NULL, y = "remainder")

In practice, I would choose the decomposition whose remainder appears more stable (more homoscedastic) and whose seasonal component is more consistent with the series’ behavior (constant vs proportional seasonality).
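To put a number on that visual comparison, one option is the FPP3 “strength of seasonality” statistic, \(F_S = \max\left(0,\; 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(S_t + R_t)}\right)\). The sketch below assumes the stl_add and stl_log fits from above (STL names the relevant columns season_year and remainder); values closer to 1 indicate a cleaner seasonal/remainder split.

# Strength of seasonality from STL components; larger = remainder is small
# relative to the seasonal swings
seasonal_strength <- function(fit) {
  comp <- components(fit)
  max(0, 1 - var(comp$remainder, na.rm = TRUE) /
            var(comp$season_year + comp$remainder, na.rm = TRUE))
}

tibble(
  decomposition = c("additive (level)", "multiplicative-style (log)"),
  strength      = c(seasonal_strength(stl_add), seasonal_strength(stl_log))
)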

3.5 How might decomposition be used in forecasting?

Decomposition can be used to:

  • Remove seasonality (deseasonalize),
  • Forecast the adjusted series (trend/remainder),
  • Add back seasonality (reseasonalize).

However, because many forecasting models (e.g., seasonal naive, ETS, ARIMA variants, or regression models with seasonal terms) directly model seasonality, decomposition is often most useful for interpretation, diagnostics, and variance stabilization rather than as a required forecasting step.
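As a sketch of that decompose–forecast–recombine workflow (illustrative model choices, not a tuned forecast): fabletools::decomposition_model() lets STL split the series, forecasts the seasonally adjusted part with a chosen model (NAIVE here), and re-adds the seasonal component (SNAIVE by default) automatically.

# Forecast PAYNSA by forecasting the seasonally adjusted series and
# reseasonalizing in one step
y2 |>
  model(
    stl_dcmp = decomposition_model(
      STL(value ~ season(window = "periodic")),
      NAIVE(season_adjust)
    )
  ) |>
  forecast(h = 12) |>
  autoplot(y2 |> filter(year(month) >= 2018)) +
  labs(title = "PAYNSA forecast via STL decomposition (illustrative)",
       x = NULL, y = "Thousands of persons")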

4. Submission checklist (images to include)

For the submission requirements, include images of:

  1. The accuracy table (nicely formatted)
  2. The forecast plot (point forecasts + prediction intervals + test overlay)
  3. The additive STL components plot (PAYNSA)
  4. The log/multiplicative-style STL components plot (PAYNSA)
  5. The Excel sheet reproducing ME/MPE/RMSE/MAE/MAPE for the test period, showing values match R