ARIMA and Seasonal ARIMA (fpp3 style)

Author

AS

Published

March 26, 2026

ARIMA notation

ARIMA(p,d,q) has three non-seasonal components:

  • p: autoregressive order (how many lagged values of y_t)
  • d: differencing order (how many times differenced)
  • q: moving-average order (how many lagged errors)

For seasonal data, use:

ARIMA(p,d,q)(P,D,Q)[m]

  • P: seasonal AR order
  • D: seasonal differencing order
  • Q: seasonal MA order
  • m: seasonal period (12 for monthly, 4 for quarterly)

Example: ARIMA(0,1,1)(0,1,1)[12] is a classic monthly seasonal model.

What AR, MA, and differencing do

  • AR terms capture persistence in values: the series depends on its own past levels (or past differenced values).
  • MA terms capture persistence in shocks: the series depends on past forecast errors (surprises).
  • Differencing is needed because ARIMA works best on a roughly stationary series.
  • d = 1 removes non-seasonal trend-like nonstationarity.
  • D = 1 with period m removes seasonal nonstationarity.

In short:

  • AR: momentum in values
  • MA: echo of past shocks
  • Differencing: makes the series stable enough for AR/MA modeling

Using ACF and PACF to identify terms

These are practical starting rules (then confirm with diagnostics and forecast accuracy):

  • If PACF cuts off at lag p and ACF tails off, suggest AR(p).
  • If ACF cuts off at lag q and PACF tails off, suggest MA(q).
  • If both tail off, try mixed ARMA(p,q) candidates.

For seasonal structure (period m):

  • Seasonal spikes in ACF at m, 2m, ... suggest seasonal MA (Q).
  • Seasonal spikes in PACF at m, 2m, ... suggest seasonal AR (P).

Setup

library(fpp3)
library(quantmod)
library(tidyverse)
library(fredr)

remove(list=ls())

Example 1: Non-seasonal ARIMA from FRED

fredr_set_key("8a9ec1330374c1696f05cc8e526233b5") # replace with your own key please

?fredr

unrate <- fredr(
  series_id = "UNRATE"
) %>%
  transmute(
    Month = yearmonth(date),
    value = value
  ) %>%
  as_tsibble(index = Month)


unrate %>%
  autoplot(value) +
  labs(
    title = "UNRATE from FRED",
    subtitle = "Unemployment rate (monthly, weak seasonality)",
    y = "Percent",
    x = "Month"
  )

Fit a non-seasonal ARIMA in FPP3 style:

test_n <- 24
unrate_n <- nrow(unrate)
unrate_train <- unrate %>% slice_head(n = unrate_n - test_n)
unrate_test  <- unrate %>% slice_tail(n = test_n)

fit_arima <- unrate_train %>%
  model(arima = ARIMA(value))

fit_arima

Parameter explanation for this example:

  • p: non-seasonal AR lags selected by ARIMA()
  • d: non-seasonal differences selected by ARIMA()
  • q: non-seasonal MA lags selected by ARIMA()
fit_arima %>% report()
Series: value 
Model: ARIMA(1,1,1) 

Coefficients:
          ar1     ma1
      -0.7662  0.8210
s.e.   0.1070  0.0943

sigma^2 estimated as 0.1745:  log likelihood=-497.48
AIC=1000.96   AICc=1000.98   BIC=1015.41

Interpreting ARIMA(1,1,1) coefficients

ARIMA(1,1,1) means:

  • d = 1: model is fit to first differences, w_t = y_t - y_{t-1}
  • p = 1: one AR term on differenced series
  • q = 1: one MA term on differenced series

Model form on the differenced series:

w_t = c + phi_1 * w_{t-1} + e_t + theta_1 * e_{t-1}

  • phi_1 (ar1 coefficient): persistence of the differenced process
  • theta_1 (ma1 coefficient): correction from last period shock/error
  • c (constant; drift can be implied): average change per period
fit_111 <- unrate_train %>%
  model(arima_111 = ARIMA(value ~ pdq(1,1,1)))

fit_111 %>% report()
Series: value 
Model: ARIMA(1,1,1) 

Coefficients:
          ar1     ma1
      -0.7662  0.8210
s.e.   0.1070  0.0943

sigma^2 estimated as 0.1745:  log likelihood=-497.48
AIC=1000.96   AICc=1000.98   BIC=1015.41
tidy(fit_111)

Example 2: Seasonal ARIMA from FRED

# Strongly seasonal monthly series (NSA)

houst <- fredr_series_observations(series_id = "HOUSTNSA") %>%
  transmute(
    Month = yearmonth(date),
    value = value
  ) %>%
  as_tsibble(index = Month)

houst %>%
  autoplot(value) +
  labs(
    title = "HOUSTNSA from FRED",
    subtitle = "Housing starts (monthly, strong seasonality)",
    y = "Thousands of units",
    x = "Month"
  )

Fit one explicit seasonal ARIMA:

houst_n <- nrow(houst)
houst_train <- houst %>% slice_head(n = houst_n - test_n)
houst_test  <- houst %>% slice_tail(n = test_n)

fit_sarima <- houst_train %>%
  model(sarima = ARIMA(value ~ pdq(0,1,1) + PDQ(0,1,1)))

fit_sarima
fit_sarima %>% report()
Series: value 
Model: ARIMA(0,1,1)(0,1,1)[12] 

Coefficients:
          ma1     sma1
      -0.3173  -0.8649
s.e.   0.0341   0.0190

sigma^2 estimated as 113.3:  log likelihood=-2913.39
AIC=5832.78   AICc=5832.81   BIC=5846.71

Parameter explanation for this example:

  • Non-seasonal part pdq(0,1,1):
    • p = 0: no non-seasonal AR term
    • d = 1: first difference to remove trend
    • q = 1: one non-seasonal MA term
  • Seasonal part PDQ(0,1,1) with monthly seasonality:
    • P = 0: no seasonal AR term
    • D = 1: one seasonal difference (lag 12)
    • Q = 1: one seasonal MA term
    • [m = 12]: monthly seasonal period

Seasonal vs non-seasonal on the seasonal series

compare_models <- houst_train %>%
  model(
    arima_nonseasonal = ARIMA(value ~ pdq() + PDQ(0,0,0)),
    seasonal_arima = ARIMA(value ~ pdq(0,1,1) + PDQ(0,1,1))
  )

fc_compare <- compare_models %>% forecast(h = test_n)

acc_compare <- accuracy(fc_compare, houst_test) %>%
  select(.model, RMSE, MAE, MAPE) %>%
  arrange(RMSE)

acc_compare

Why can non-seasonal ARIMA do better here?

best_model <- acc_compare$.model[1]
full_seasonal_strength <- houst %>% features(value, feat_stl) %>% pull(seasonal_strength_year)
recent_seasonal_strength <- houst_train %>%
  slice_tail(n = 120) %>%
  features(value, feat_stl) %>%
  pull(seasonal_strength_year)

cat("Best model on this holdout:", best_model, "\n")
Best model on this holdout: seasonal_arima 
cat("Seasonal strength (full sample):", round(full_seasonal_strength, 3), "\n")
Seasonal strength (full sample): 0.873 
cat("Seasonal strength (last 10 years of train):", round(recent_seasonal_strength, 3), "\n")
Seasonal strength (last 10 years of train): 0.657 
if (best_model == "arima_nonseasonal") {
  cat("\nInterpretation:\n")
  cat("1) Seasonal ARIMA here is fixed at (0,1,1)(0,1,1)[12], while non-seasonal ARIMA is auto-selected.\n")
  cat("2) In recent years, level shifts/volatility can dominate seasonality, so seasonal differencing may over-difference.\n")
  cat("3) With a short test window (24 months), the simpler model can have lower forecast error by chance.\n")
} else {
  cat("\nInterpretation: seasonal structure in this series is helping forecast accuracy on this holdout.\n")
}

Interpretation: seasonal structure in this series is helping forecast accuracy on this holdout.
autoplot(fc_compare, houst) +
  labs(
    title = "HOUSTNSA: ARIMA vs Seasonal ARIMA",
    subtitle = "Seasonal ARIMA should usually perform better on this seasonal series",
    y = "Thousands of units",
    x = "Month"
  ) +
  theme(legend.position = "bottom")

Appendix: ARIMA vs ETS

Conceptual comparison

  • ARIMA/SARIMA models autocorrelation explicitly using AR and MA terms, after differencing.
  • ETS models level/trend/seasonality components with exponential smoothing recursions.
  • ARIMA is often strong when serial correlation structure is key.
  • ETS is often strong when evolving level/trend/seasonality dominates.
  • In practice, compare both on holdout accuracy.

Empirical comparison on the same seasonal FRED series

ets_vs_arima <- houst_train %>%
  model(
    seasonal_arima = ARIMA(value ~ pdq(0,1,1) + PDQ(0,1,1)),
    ets = ETS(value)
  )

fc_ets_vs_arima <- ets_vs_arima %>% forecast(h = test_n)

accuracy(fc_ets_vs_arima, houst_test) %>%
  select(.model, RMSE, MAE, MAPE) %>%
  arrange(RMSE)
autoplot(fc_ets_vs_arima, houst) +
  labs(
    title = "HOUSTNSA: Seasonal ARIMA vs ETS",
    subtitle = "Compare both families; choose by out-of-sample performance",
    y = "Thousands of units",
    x = "Month"
  ) +
  theme(legend.position = "bottom")