library(fpp3)
library(quantmod)
library(tidyverse)
library(fredr)
remove(list=ls())ARIMA and Seasonal ARIMA (fpp3 style)
ARIMA notation
ARIMA(p,d,q) has three non-seasonal components:
p: autoregressive order (how many lagged values ofy_t)d: differencing order (how many times differenced)q: moving-average order (how many lagged errors)
For seasonal data, use:
ARIMA(p,d,q)(P,D,Q)[m]
P: seasonal AR orderD: seasonal differencing orderQ: seasonal MA orderm: seasonal period (12for monthly,4for quarterly)
Example: ARIMA(0,1,1)(0,1,1)[12] is a classic monthly seasonal model.
What AR, MA, and differencing do
ARterms capture persistence in values: the series depends on its own past levels (or past differenced values).MAterms capture persistence in shocks: the series depends on past forecast errors (surprises).- Differencing is needed because ARIMA works best on a roughly stationary series.
d = 1removes non-seasonal trend-like nonstationarity.D = 1with periodmremoves seasonal nonstationarity.
In short:
- AR: momentum in values
- MA: echo of past shocks
- Differencing: makes the series stable enough for AR/MA modeling
Using ACF and PACF to identify terms
These are practical starting rules (then confirm with diagnostics and forecast accuracy):
- If
PACFcuts off at lagpandACFtails off, suggest AR(p). - If
ACFcuts off at lagqandPACFtails off, suggest MA(q). - If both tail off, try mixed ARMA(
p,q) candidates.
For seasonal structure (period m):
- Seasonal spikes in
ACFatm, 2m, ...suggest seasonal MA (Q). - Seasonal spikes in
PACFatm, 2m, ...suggest seasonal AR (P).
Setup
Example 1: Non-seasonal ARIMA from FRED
fredr_set_key("8a9ec1330374c1696f05cc8e526233b5") # replace with your own key please
?fredr
unrate <- fredr(
series_id = "UNRATE"
) %>%
transmute(
Month = yearmonth(date),
value = value
) %>%
as_tsibble(index = Month)
unrate %>%
autoplot(value) +
labs(
title = "UNRATE from FRED",
subtitle = "Unemployment rate (monthly, weak seasonality)",
y = "Percent",
x = "Month"
)Fit a non-seasonal ARIMA in FPP3 style:
test_n <- 24
unrate_n <- nrow(unrate)
unrate_train <- unrate %>% slice_head(n = unrate_n - test_n)
unrate_test <- unrate %>% slice_tail(n = test_n)
fit_arima <- unrate_train %>%
model(arima = ARIMA(value))
fit_arimaParameter explanation for this example:
p: non-seasonal AR lags selected byARIMA()d: non-seasonal differences selected byARIMA()q: non-seasonal MA lags selected byARIMA()
fit_arima %>% report()Series: value
Model: ARIMA(1,1,1)
Coefficients:
ar1 ma1
-0.7662 0.8210
s.e. 0.1070 0.0943
sigma^2 estimated as 0.1745: log likelihood=-497.48
AIC=1000.96 AICc=1000.98 BIC=1015.41
Interpreting ARIMA(1,1,1) coefficients
ARIMA(1,1,1) means:
d = 1: model is fit to first differences,w_t = y_t - y_{t-1}p = 1: one AR term on differenced seriesq = 1: one MA term on differenced series
Model form on the differenced series:
w_t = c + phi_1 * w_{t-1} + e_t + theta_1 * e_{t-1}
phi_1(ar1coefficient): persistence of the differenced processtheta_1(ma1coefficient): correction from last period shock/errorc(constant; drift can be implied): average change per period
fit_111 <- unrate_train %>%
model(arima_111 = ARIMA(value ~ pdq(1,1,1)))
fit_111 %>% report()Series: value
Model: ARIMA(1,1,1)
Coefficients:
ar1 ma1
-0.7662 0.8210
s.e. 0.1070 0.0943
sigma^2 estimated as 0.1745: log likelihood=-497.48
AIC=1000.96 AICc=1000.98 BIC=1015.41
tidy(fit_111)Example 2: Seasonal ARIMA from FRED
# Strongly seasonal monthly series (NSA)
houst <- fredr_series_observations(series_id = "HOUSTNSA") %>%
transmute(
Month = yearmonth(date),
value = value
) %>%
as_tsibble(index = Month)
houst %>%
autoplot(value) +
labs(
title = "HOUSTNSA from FRED",
subtitle = "Housing starts (monthly, strong seasonality)",
y = "Thousands of units",
x = "Month"
)Fit one explicit seasonal ARIMA:
houst_n <- nrow(houst)
houst_train <- houst %>% slice_head(n = houst_n - test_n)
houst_test <- houst %>% slice_tail(n = test_n)
fit_sarima <- houst_train %>%
model(sarima = ARIMA(value ~ pdq(0,1,1) + PDQ(0,1,1)))
fit_sarimafit_sarima %>% report()Series: value
Model: ARIMA(0,1,1)(0,1,1)[12]
Coefficients:
ma1 sma1
-0.3173 -0.8649
s.e. 0.0341 0.0190
sigma^2 estimated as 113.3: log likelihood=-2913.39
AIC=5832.78 AICc=5832.81 BIC=5846.71
Parameter explanation for this example:
- Non-seasonal part
pdq(0,1,1):p = 0: no non-seasonal AR termd = 1: first difference to remove trendq = 1: one non-seasonal MA term
- Seasonal part
PDQ(0,1,1)with monthly seasonality:P = 0: no seasonal AR termD = 1: one seasonal difference (lag 12)Q = 1: one seasonal MA term[m = 12]: monthly seasonal period
Seasonal vs non-seasonal on the seasonal series
compare_models <- houst_train %>%
model(
arima_nonseasonal = ARIMA(value ~ pdq() + PDQ(0,0,0)),
seasonal_arima = ARIMA(value ~ pdq(0,1,1) + PDQ(0,1,1))
)
fc_compare <- compare_models %>% forecast(h = test_n)
acc_compare <- accuracy(fc_compare, houst_test) %>%
select(.model, RMSE, MAE, MAPE) %>%
arrange(RMSE)
acc_compareWhy can non-seasonal ARIMA do better here?
best_model <- acc_compare$.model[1]
full_seasonal_strength <- houst %>% features(value, feat_stl) %>% pull(seasonal_strength_year)
recent_seasonal_strength <- houst_train %>%
slice_tail(n = 120) %>%
features(value, feat_stl) %>%
pull(seasonal_strength_year)
cat("Best model on this holdout:", best_model, "\n")Best model on this holdout: seasonal_arima
cat("Seasonal strength (full sample):", round(full_seasonal_strength, 3), "\n")Seasonal strength (full sample): 0.873
cat("Seasonal strength (last 10 years of train):", round(recent_seasonal_strength, 3), "\n")Seasonal strength (last 10 years of train): 0.657
if (best_model == "arima_nonseasonal") {
cat("\nInterpretation:\n")
cat("1) Seasonal ARIMA here is fixed at (0,1,1)(0,1,1)[12], while non-seasonal ARIMA is auto-selected.\n")
cat("2) In recent years, level shifts/volatility can dominate seasonality, so seasonal differencing may over-difference.\n")
cat("3) With a short test window (24 months), the simpler model can have lower forecast error by chance.\n")
} else {
cat("\nInterpretation: seasonal structure in this series is helping forecast accuracy on this holdout.\n")
}
Interpretation: seasonal structure in this series is helping forecast accuracy on this holdout.
autoplot(fc_compare, houst) +
labs(
title = "HOUSTNSA: ARIMA vs Seasonal ARIMA",
subtitle = "Seasonal ARIMA should usually perform better on this seasonal series",
y = "Thousands of units",
x = "Month"
) +
theme(legend.position = "bottom")Appendix: ARIMA vs ETS
Conceptual comparison
- ARIMA/SARIMA models autocorrelation explicitly using AR and MA terms, after differencing.
- ETS models level/trend/seasonality components with exponential smoothing recursions.
- ARIMA is often strong when serial correlation structure is key.
- ETS is often strong when evolving level/trend/seasonality dominates.
- In practice, compare both on holdout accuracy.
Empirical comparison on the same seasonal FRED series
ets_vs_arima <- houst_train %>%
model(
seasonal_arima = ARIMA(value ~ pdq(0,1,1) + PDQ(0,1,1)),
ets = ETS(value)
)
fc_ets_vs_arima <- ets_vs_arima %>% forecast(h = test_n)
accuracy(fc_ets_vs_arima, houst_test) %>%
select(.model, RMSE, MAE, MAPE) %>%
arrange(RMSE)autoplot(fc_ets_vs_arima, houst) +
labs(
title = "HOUSTNSA: Seasonal ARIMA vs ETS",
subtitle = "Compare both families; choose by out-of-sample performance",
y = "Thousands of units",
x = "Month"
) +
theme(legend.position = "bottom")