Discussion 9

Author

Aryamani Boruah

Setup

library(tidyverse)
library(fpp3)       # tsibble, fable, feasts
library(fredr)      # FRED API
library(tseries)    
library(lmtest)     


fredr_set_key("f95eac3f44d6965f475a2026b5216b1b")

Data: FRED Monthly Series (2000–2023)

I use two monthly FRED series:

  • UNRATE — U.S. civilian unemployment rate (%), seasonally adjusted
  • CPIAUCSL — Consumer Price Index for All Urban Consumers (index, seasonally adjusted)

Unemployment is the target (y_t); CPI is the exogenous predictor (X_t) in dynamic regression.

# Pull from FRED
unrate_raw <- fredr(series_id = "UNRATE",
                    observation_start = as.Date("2000-01-01"),
                    observation_end   = as.Date("2023-12-01"),
                    frequency = "m")

cpi_raw <- fredr(series_id = "CPIAUCSL",
                 observation_start = as.Date("2000-01-01"),
                 observation_end   = as.Date("2023-12-01"),
                 frequency = "m")

# Clean and join
df <- unrate_raw %>%
  select(date, unrate = value) %>%
  left_join(cpi_raw %>% select(date, cpi = value), by = "date") %>%
  mutate(
    yearmonth = yearmonth(date),
    # Log CPI is more linear and closer to stationary after differencing
    log_cpi = log(cpi)
  ) %>%
  select(yearmonth, unrate, cpi, log_cpi)

# Convert to tsibble
ts_data <- as_tsibble(df, index = yearmonth)

head(ts_data, 6)
# A tsibble: 6 x 4 [1M]
  yearmonth unrate   cpi log_cpi
      <mth>  <dbl> <dbl>   <dbl>
1  2000 Jan    4    169.    5.13
2  2000 Feb    4.1  170     5.14
3  2000 Mar    4    171     5.14
4  2000 Apr    3.8  171.    5.14
5  2000 May    4    171.    5.14
6  2000 Jun    4    172.    5.15
# Visualize both series
ts_data %>%
  pivot_longer(cols = c(unrate, log_cpi), names_to = "series", values_to = "value") %>%
  autoplot(value) +
  facet_wrap(~series, scales = "free_y", ncol = 1,
             labeller = labeller(series = c(unrate = "Unemployment Rate (%)",
                                            log_cpi = "Log CPI"))) +
  labs(title = "Monthly U.S. Unemployment Rate and Log CPI (2000–2023)",
       x = NULL, y = NULL) +
  theme_minimal() +
  theme(legend.position = "none")


Train/Test Split

train <- ts_data %>% filter(yearmonth < yearmonth("2019 Jan"))
test  <- ts_data %>% filter(yearmonth >= yearmonth("2019 Jan"))

cat("Train:", nrow(train), "obs | Test:", nrow(test), "obs\n")
Train: 228 obs | Test: 60 obs
cat("Train period:", format(min(train$yearmonth)), "to", format(max(train$yearmonth)), "\n")
Train period: 2000 Jan to 2018 Dec 
cat("Test period: ", format(min(test$yearmonth)),  "to", format(max(test$yearmonth)),  "\n")
Test period:  2019 Jan to 2023 Dec 

Part I: ETS vs ARIMA

Fit ETS and ARIMA on Training Data

models_basic <- train %>%
  model(
    ETS   = ETS(unrate),
    ARIMA = ARIMA(unrate)
  )

report(models_basic %>% select(ETS))
Series: unrate 
Model: ETS(M,Ad,N) 
  Smoothing parameters:
    alpha = 0.7426768 
    beta  = 0.1799101 
    phi   = 0.9542215 

  Initial states:
     l[0]         b[0]
 4.019116 -0.001947149

  sigma^2:  6e-04

     AIC     AICc      BIC 
366.0564 366.4365 386.6325 
report(models_basic %>% select(ARIMA))
Series: unrate 
Model: ARIMA(1,2,1)(0,0,2)[12] 

Coefficients:
          ar1      ma1     sma1     sma2
      -0.1751  -0.7663  -0.4384  -0.2366
s.e.   0.0763   0.0461   0.0639   0.0599

sigma^2 estimated as 0.01783:  log likelihood=132.65
AIC=-255.31   AICc=-255.04   BIC=-238.21

Forecast on Test Set

fc_basic <- models_basic %>%
  forecast(h = nrow(test))

# Plot forecasts vs actuals
fc_basic %>%
  autoplot(test, level = 95) +
  autolayer(train %>% tail(48), unrate, colour = "gray50") +
  labs(title = "ETS vs ARIMA: Unemployment Rate Forecast",
       subtitle = "95% prediction intervals shown",
       x = NULL, y = "Unemployment Rate (%)") +
  theme_minimal()

Performance Metrics: ETS vs ARIMA

acc_basic <- accuracy(fc_basic, test)
acc_basic %>%
  select(.model, RMSE, MAE, MAPE, MASE) %>%
  arrange(RMSE) %>%
  knitr::kable(digits = 4, caption = "Test Set Accuracy: ETS vs ARIMA")
Test Set Accuracy: ETS vs ARIMA
.model RMSE MAE MAPE MASE
ETS 2.5159 1.3477 19.8518 NaN
ARIMA 2.7967 1.7198 27.5813 NaN

Are ETS and ARIMA Equivalent?

Per Hyndman & Athanasopoulos (FPP3, §9.10), ETS and ARIMA are not equivalent in general, but they overlap for certain special cases:

  • Some exponential smoothing models (ETS) can be written as ARIMA models, but the converse is not always true.
  • ETS operates in the state-space framework with multiplicative error structures; ARIMA models linear dependence in differences.
  • For this unemployment series (which is roughly stationary with autocorrelation), an ARIMA(p,d,q) and ETS(A,N,N) (simple exponential smoothing) may produce similar fits, but more complex seasonal or error structures diverge.
# Show selected model structures
cat("ETS model:\n")
ETS model:
models_basic %>% select(ETS) %>% report()
Series: unrate 
Model: ETS(M,Ad,N) 
  Smoothing parameters:
    alpha = 0.7426768 
    beta  = 0.1799101 
    phi   = 0.9542215 

  Initial states:
     l[0]         b[0]
 4.019116 -0.001947149

  sigma^2:  6e-04

     AIC     AICc      BIC 
366.0564 366.4365 386.6325 
cat("\nARIMA model:\n")

ARIMA model:
models_basic %>% select(ARIMA) %>% report()
Series: unrate 
Model: ARIMA(1,2,1)(0,0,2)[12] 

Coefficients:
          ar1      ma1     sma1     sma2
      -0.1751  -0.7663  -0.4384  -0.2366
s.e.   0.0763   0.0461   0.0639   0.0599

sigma^2 estimated as 0.01783:  log likelihood=132.65
AIC=-255.31   AICc=-255.04   BIC=-238.21

Conclusion: The models produce comparable point forecasts on this series, but prediction intervals and the underlying generative assumptions differ. ARIMA tends to adapt faster to structural shifts; ETS is often more robust in stable seasonal series.


Part II: Dynamic Regression (ARIMAX)

Stationarity Check

Before using CPI as an external regressor, both series should be stationary to avoid spurious relationships.

# ADF tests on levels
cat("=== ADF Tests on Levels ===\n")
=== ADF Tests on Levels ===
cat("Unemployment rate:\n")
Unemployment rate:
print(adf.test(train$unrate))

    Augmented Dickey-Fuller Test

data:  train$unrate
Dickey-Fuller = -2.218, Lag order = 6, p-value = 0.4843
alternative hypothesis: stationary
cat("\nLog CPI:\n")

Log CPI:
print(adf.test(train$log_cpi))

    Augmented Dickey-Fuller Test

data:  train$log_cpi
Dickey-Fuller = -1.4828, Lag order = 6, p-value = 0.7932
alternative hypothesis: stationary
# First differences
train_diff <- train %>%
  mutate(
    d_unrate  = difference(unrate),
    d_log_cpi = difference(log_cpi)
  ) %>%
  filter(!is.na(d_unrate))

cat("\n=== ADF Tests on First Differences ===\n")

=== ADF Tests on First Differences ===
cat("D(Unemployment rate):\n")
D(Unemployment rate):
print(adf.test(train_diff$d_unrate))

    Augmented Dickey-Fuller Test

data:  train_diff$d_unrate
Dickey-Fuller = -2.8907, Lag order = 6, p-value = 0.2018
alternative hypothesis: stationary
cat("\nD(Log CPI):\n")

D(Log CPI):
print(adf.test(train_diff$d_log_cpi))

    Augmented Dickey-Fuller Test

data:  train_diff$d_log_cpi
Dickey-Fuller = -5.993, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Fit Dynamic Regression Models

The dynamic regression (ARIMAX) framework:

\[y_t = \beta_0 + \beta_1 X_t + u_t, \quad u_t \sim \text{ARIMA}(p, d, q)\]

where u_t captures serial dependence not explained by the regressor.

# Build xreg matrix for test set (same variable used for forecasting)
# IMPORTANT: the external regressor must be available for the forecast horizon too

models_all <- train %>%
  model(
    ETS           = ETS(unrate),
    ARIMA         = ARIMA(unrate),
    DynReg_CPI    = ARIMA(unrate ~ log_cpi),          # ARIMAX with log CPI
    DynReg_dCPI   = ARIMA(unrate ~ difference(log_cpi)) # ARIMAX with diff(log CPI)
  )

# Inspect dynamic regression model
report(models_all %>% select(DynReg_CPI))
Series: unrate 
Model: LM w/ ARIMA(1,2,1)(0,0,2)[12] errors 

Coefficients:
          ar1      ma1     sma1     sma2  log_cpi
      -0.1727  -0.7715  -0.4396  -0.2441  -3.4445
s.e.   0.0763   0.0456   0.0642   0.0602   2.3275

sigma^2 estimated as 0.01773:  log likelihood=133.74
AIC=-255.48   AICc=-255.1   BIC=-234.96
report(models_all %>% select(DynReg_dCPI))
Series: unrate 
Model: LM w/ ARIMA(5,1,0)(1,0,0)[12] errors 

Coefficients:
         ar1     ar2     ar3     ar4     ar5     sar1  difference(log_cpi)
      0.0545  0.2403  0.0752  0.1194  0.2563  -0.2273               3.3359
s.e.  0.0647  0.0650  0.0665  0.0640  0.0644   0.0685               2.6832

sigma^2 estimated as 0.02023:  log likelihood=122.47
AIC=-228.93   AICc=-228.27   BIC=-201.53

Forecast with Future CPI Values

For dynamic regression, forecasting requires the external regressor over the test horizon. We use observed test CPI values (a “perfect foresight” scenario for benchmark comparison).

# Forecast ETS and ARIMA (no external regressor needed)
fc_ets_arima <- models_all %>%
  select(ETS, ARIMA) %>%
  forecast(h = nrow(test))

# Forecast dynamic regression models using observed test CPI
fc_dynreg <- models_all %>%
  select(DynReg_CPI, DynReg_dCPI) %>%
  forecast(new_data = test)

# Combine all forecasts
fc_all <- bind_rows(fc_ets_arima, fc_dynreg)

# Plot all four models
fc_all %>%
  autoplot(test, level = NULL) +
  autolayer(train %>% tail(36), unrate, colour = "gray60", linetype = "dashed") +
  labs(title = "All Models: Unemployment Rate Forecast (2019–2023)",
       subtitle = "Dynamic regression uses observed test CPI (oracle scenario)",
       x = NULL, y = "Unemployment Rate (%)",
       colour = "Model") +
  theme_minimal()

Performance Comparison: All Models

acc_all <- accuracy(fc_all, test)

acc_all %>%
  select(.model, RMSE, MAE, MAPE, MASE) %>%
  arrange(RMSE) %>%
  knitr::kable(digits = 4,
               caption = "Test Set Accuracy: All Models (lower = better)")
Test Set Accuracy: All Models (lower = better)
.model RMSE MAE MAPE MASE
DynReg_dCPI 2.5154 1.3797 20.6976 NaN
ETS 2.5159 1.3477 19.8518 NaN
ARIMA 2.7967 1.7198 27.5813 NaN
DynReg_CPI 2.8877 1.9084 32.4959 NaN

Model Reconciliation

# Check residuals of best dynamic regression model
models_all %>%
  select(DynReg_CPI) %>%
  gg_tsresiduals() +
  labs(title = "Residual Diagnostics: DynReg_CPI")

Reconciliation logic:

  • ARIMAX should outperform plain ARIMA when the external regressor (CPI) has genuine predictive power for unemployment after accounting for serial correlation. If it doesn’t improve, it suggests: (a) CPI is already implicitly captured by the ARIMA lag structure, (b) the regressor-outcome relationship is nonlinear or lagged in a way the model doesn’t capture, or (c) the COVID-19 shock in 2020 creates an outlier period that overwhelms the regressor signal.

  • Differenced CPI vs level CPI: Using difference(log_cpi) enforces stationarity of the regressor and guards against spurious estimation. If level CPI outperforms, it may indicate genuine cointegration rather than spuriousness.

  • ETS vs ARIMA: For unemployment, which has strong AR(1) dynamics and rare unit roots, ARIMA typically beats ETS on short-horizon forecasts.


Part III: Granger Causality

I am gonna test whether past values of CPI improve forecasts of unemployment and vice versa, using first-differenced series to satisfy the stationarity requirement.

Stationarity of Differenced Series (Reconfirm)

# Use the full dataset for Granger (more observations = more power)
df_diff <- ts_data %>%
  as_tibble() %>%
  arrange(yearmonth) %>%
  mutate(
    d_unrate  = c(NA, diff(unrate)),
    d_log_cpi = c(NA, diff(log_cpi))
  ) %>%
  filter(!is.na(d_unrate))


print(adf.test(df_diff$d_unrate))

    Augmented Dickey-Fuller Test

data:  df_diff$d_unrate
Dickey-Fuller = -7.5525, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
print(adf.test(df_diff$d_log_cpi))

    Augmented Dickey-Fuller Test

data:  df_diff$d_log_cpi
Dickey-Fuller = -4.9761, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Granger Causality Tests

# Test 1: Does CPI Granger-cause Unemployment?
cat("Does D(log_CPI) Granger-cause D(Unemployment)?")
Does D(log_CPI) Granger-cause D(Unemployment)?
granger_cpi_to_unrate <- grangertest(d_unrate ~ d_log_cpi,
                                     order = 4,
                                     data = df_diff)
print(granger_cpi_to_unrate)
Granger causality test

Model 1: d_unrate ~ Lags(d_unrate, 1:4) + Lags(d_log_cpi, 1:4)
Model 2: d_unrate ~ Lags(d_unrate, 1:4)
  Res.Df Df      F Pr(>F)
1    274                 
2    278 -4 1.7854 0.1319
# Test 2: Does Unemployment Granger-cause CPI? (reverse direction)
cat(" Does D(Unemployment) Granger-cause D(log_CPI)?")
 Does D(Unemployment) Granger-cause D(log_CPI)?
granger_unrate_to_cpi <- grangertest(d_log_cpi ~ d_unrate,
                                     order = 4,
                                     data = df_diff)
print(granger_unrate_to_cpi)
Granger causality test

Model 1: d_log_cpi ~ Lags(d_log_cpi, 1:4) + Lags(d_unrate, 1:4)
Model 2: d_log_cpi ~ Lags(d_log_cpi, 1:4)
  Res.Df Df      F Pr(>F)
1    274                 
2    278 -4 0.1114 0.9785

Sensitivity: Multiple Lag Orders

# Test across lag orders 1–8 for robustness
granger_sensitivity <- map_df(1:8, function(k) {
  g1 <- grangertest(d_unrate ~ d_log_cpi, order = k, data = df_diff)
  g2 <- grangertest(d_log_cpi ~ d_unrate, order = k, data = df_diff)
  tibble(
    lag      = k,
    p_cpi_to_unrate  = g1$`Pr(>F)`[2],
    p_unrate_to_cpi  = g2$`Pr(>F)`[2]
  )
})

granger_sensitivity %>%
  knitr::kable(digits = 4,
               col.names = c("Lag", "p: CPI → UNRATE", "p: UNRATE → CPI"),
               caption = "Granger Causality p-values by Lag Order")
Granger Causality p-values by Lag Order
Lag p: CPI → UNRATE p: UNRATE → CPI
1 0.0191 0.9862
2 0.0342 0.9087
3 0.0831 0.9593
4 0.1319 0.9785
5 0.2150 0.9776
6 0.2812 0.9722
7 0.3691 0.9902
8 0.4596 0.9966
# Visual summary
granger_sensitivity %>%
  pivot_longer(-lag, names_to = "direction", values_to = "p_value") %>%
  mutate(direction = recode(direction,
    p_cpi_to_unrate = "CPI → Unemployment",
    p_unrate_to_cpi = "Unemployment → CPI")) %>%
  ggplot(aes(x = lag, y = p_value, colour = direction)) +
  geom_line() + geom_point() +
  geom_hline(yintercept = 0.05, linetype = "dashed", colour = "red") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(title = "Granger Causality: Sensitivity to Lag Order",
       subtitle = "Dashed red line = 5% significance threshold",
       x = "Lag Order", y = "p-value", colour = "Direction") +
  theme_minimal()

Interpretation

g1_p <- grangertest(d_unrate ~ d_log_cpi, order = 4, data = df_diff)$`Pr(>F)`[2]
g2_p <- grangertest(d_log_cpi ~ d_unrate, order = 4, data = df_diff)$`Pr(>F)`[2]

cat(sprintf("CPI → Unemployment (lag=4): p = %.4f %s\n", g1_p, ifelse(g1_p < 0.05, "[SIGNIFICANT]", "[not significant]")))
CPI → Unemployment (lag=4): p = 0.1319 [not significant]
cat(sprintf("Unemployment → CPI (lag=4): p = %.4f %s\n", g2_p, ifelse(g2_p < 0.05, "[SIGNIFICANT]", "[not significant]")))
Unemployment → CPI (lag=4): p = 0.9785 [not significant]

So Basically :

  1. Statistical vs true causality: Even if CPI Granger-causes unemployment, this doesn’t mean rising prices cause unemployment. Both variables respond to common business cycle drivers (recessions, supply shocks, Fed policy). The rooster crows before sunrise — it doesn’t cause it.

  2. Stationarity is mandatory: Running Granger tests on non-stationary levels would yield spurious significance. Both series trend over time (especially log CPI), and the shared trend — not genuine predictive content — would drive the test statistic.

  3. Bidirectional Granger causality (if found) is consistent with a feedback loop between inflation and labor markets — a Phillips Curve relationship. Causality runs in both directions economically (tight labor markets push up wages/prices; high inflation can reduce real demand and employment).

  4. Lag selection matters: The p-value sensitivity plot shows whether findings are robust across lag orders or only appear at specific lags (possibly spurious). Significant findings stable across lags 1–8 are more credible.


Summary: Model Comparison

acc_all %>%
  select(.model, RMSE, MAE, MAPE, MASE) %>%
  arrange(RMSE) %>%
  mutate(Rank = row_number()) %>%
  relocate(Rank) %>%
  knitr::kable(digits = 4,
               caption = "Final Model Rankings by RMSE (Test Set)")
Final Model Rankings by RMSE (Test Set)
Rank .model RMSE MAE MAPE MASE
1 DynReg_dCPI 2.5154 1.3797 20.6976 NaN
2 ETS 2.5159 1.3477 19.8518 NaN
3 ARIMA 2.7967 1.7198 27.5813 NaN
4 DynReg_CPI 2.8877 1.9084 32.4959 NaN

Preferred model: The dynamic regression model with log CPI (DynReg_CPI) is expected to produce the lowest RMSE when CPI is genuinely informative about future unemployment. If the plain ARIMA still wins, it confirms the external regressor adds no predictive content beyond what the lag structure already captures — which is the correct null benchmark.

The COVID shock (March 2020) is a major challenge: unemployment spiked from ~3.5% to ~15% in two months, which no time series model trained on pre-2019 data would predict. Any model evaluated on the 2019–2023 test period will have artificially high error metrics due to this outlier. An honest evaluation would either exclude that period or use rolling-origin cross-validation.