Module 3 Discussion: Stationarity, ACF/PACF, Decomposition (+ Bonus Modulation)

library(fpp3)
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr
── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
✔ tibble      3.3.1     ✔ tsibble     1.1.6
✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
✔ tidyr       1.3.2     ✔ feasts      0.4.2
✔ lubridate   1.9.4     ✔ fable       0.4.1
✔ ggplot2     4.0.1     
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()
library(tseries)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
library(ggtime)
Registered S3 methods overwritten by 'ggtime':
  method                  from      
  +.gg_tsensemble         feasts    
  autolayer.tbl_ts        fabletools
  autoplot.dcmp_ts        fabletools
  autoplot.tbl_ts         fabletools
  grid.draw.gg_tsensemble feasts    
  print.gg_tsensemble     feasts    

Attaching package: 'ggtime'
The following objects are masked from 'package:feasts':

    gg_arma, gg_irf, gg_lag, gg_season, gg_subseries, gg_tsdisplay,
    gg_tsresiduals
library(purrr)

0. Series selection (3 time series)

# 1) Monthly US employment (Retail Trade)
ts1 <- us_employment |>
  filter(Title == "Retail Trade") |>
  select(Month, value = Employed)

# 2) Quarterly beer production (Australia)
ts2 <- aus_production |>
  select(Quarter, value = Beer)

# 3) Monthly PBS scripts (ATC2 = A10)
ts3 <- PBS |>
  filter(ATC2 == "A10") |>
  summarise(value = sum(Scripts)) |>
  as_tsibble(index = Month)

series_list <- list(
  "US employment: Retail Trade (monthly)" = ts1,
  "Australia production: Beer (quarterly)" = ts2,
  "PBS: ATC2 A10 scripts (monthly)" = ts3
)

# Seasonal lag m for each series (needed for seasonal differencing)
seasonal_lags <- c(
  "US employment: Retail Trade (monthly)" = 12,
  "Australia production: Beer (quarterly)" = 4,
  "PBS: ATC2 A10 scripts (monthly)" = 12
)

1. Stationarity (Visual + ADF + KPSS)

Stationary means the statistical properties do not change over time: - Constant mean over time - Constant variance over time - Autocovariance depends only on lag (not on calendar time)

ADF null hypothesis (H0): unit root (non-stationary).
KPSS null hypothesis (H0): stationary.

run_tests <- function(x_tbl) {
  x <- as.numeric(x_tbl$value)
  x <- x[is.finite(x)]
  list(
    adf  = tseries::adf.test(x),
    kpss = tseries::kpss.test(x)
  )
}

for (nm in names(series_list)) {
  cat("\n\n##", nm, "\n")
  s <- series_list[[nm]]

  print(autoplot(s, value) + labs(title = paste("Original:", nm)))

  tst <- run_tests(s)
  cat("\nADF (H0: unit root / non-stationary): p =", tst$adf$p.value, "\n")
  cat("KPSS (H0: stationary): p =", tst$kpss$p.value, "\n")

  cat("ADF decision (5%):",
      ifelse(tst$adf$p.value < 0.05, "Reject H0 (evidence stationary)", "Fail to reject H0 (evidence non-stationary)"),
      "\n")
  cat("KPSS decision (5%):",
      ifelse(tst$kpss$p.value < 0.05, "Reject H0 (evidence non-stationary)", "Fail to reject H0 (evidence stationary)"),
      "\n")
}


## US employment: Retail Trade (monthly) 

Warning in tseries::kpss.test(x): p-value smaller than printed p-value

ADF (H0: unit root / non-stationary): p = 0.9735658 
KPSS (H0: stationary): p = 0.01 
ADF decision (5%): Fail to reject H0 (evidence non-stationary) 
KPSS decision (5%): Reject H0 (evidence non-stationary) 


## Australia production: Beer (quarterly) 

Warning in tseries::kpss.test(x): p-value smaller than printed p-value

ADF (H0: unit root / non-stationary): p = 0.8769772 
KPSS (H0: stationary): p = 0.01 
ADF decision (5%): Fail to reject H0 (evidence non-stationary) 
KPSS decision (5%): Reject H0 (evidence non-stationary) 


## PBS: ATC2 A10 scripts (monthly) 

Warning in tseries::adf.test(x): p-value smaller than printed p-value
Warning in tseries::adf.test(x): p-value smaller than printed p-value

ADF (H0: unit root / non-stationary): p = 0.01 
KPSS (H0: stationary): p = 0.01 
ADF decision (5%): Reject H0 (evidence stationary) 
KPSS decision (5%): Reject H0 (evidence non-stationary) 

2. Make stationary, then ACF/PACF + AR(p)/MA(q) interpretation

ACF measures correlation between (y_t) and (y_{t-k}).
PACF measures partial correlation between (y_t) and (y_{t-k}) controlling for intermediate lags.

ACF/PACF must be applied to a stationary series. Here, we choose differencing via unitroot_nsdiffs (seasonal) and unitroot_ndiffs (non-seasonal), then apply differencing before plotting ACF/PACF.

Interpretation cues: - PACF cutoff after lag p + ACF tails off → AR(p) - ACF cutoff after lag q + PACF tails off → MA(q) - Both tail off → ARMA(p,q) - Spikes at seasonal lag m (and multiples) → remaining seasonal dependence

for (nm in names(series_list)) {
  cat("\n\n##", nm, "\n")
  s <- series_list[[nm]]
  m <- seasonal_lags[[nm]]

  D <- s %>% features(value, unitroot_nsdiffs) %>% pull(nsdiffs)
  d <- s %>% features(value, unitroot_ndiffs) %>% pull(ndiffs)

  cat("Suggested seasonal differences D =", D, "(seasonal lag m =", m, ")\n")
  cat("Suggested non-seasonal differences d =", d, "\n")

  s_stat <- s
  if (D > 0) s_stat <- s_stat |> mutate(value = difference(value, lag = m))
  if (d > 0) s_stat <- s_stat |> mutate(value = difference(value, differences = d))
  s_stat <- s_stat |> filter(!is.na(value))

  print(ggtime::gg_tsdisplay(
    s_stat, value,
    plot_type = "partial"
  ) + labs(title = paste("Stationary version + ACF/PACF:", nm)))
}


## US employment: Retail Trade (monthly) 
Suggested seasonal differences D = 1 (seasonal lag m = 12 )
Suggested non-seasonal differences d = 1 



## Australia production: Beer (quarterly) 
Suggested seasonal differences D = 1 (seasonal lag m = 4 )
Suggested non-seasonal differences d = 1 



## PBS: ATC2 A10 scripts (monthly) 
Suggested seasonal differences D = 1 (seasonal lag m = 12 )
Suggested non-seasonal differences d = 1 

3. Decomposition (STL) on original series

for (nm in names(series_list)) {
  cat("\n\n##", nm, "\n")
  s <- series_list[[nm]]

  fit <- s |> model(stl = STL(value))
  comps <- fit |> components()

  print(autoplot(comps) + labs(title = paste("STL decomposition:", nm)))
}


## US employment: Retail Trade (monthly) 
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggtime package.
  Please report the issue to the authors.



## Australia production: Beer (quarterly) 



## PBS: ATC2 A10 scripts (monthly) 

BONUS. Modulation section (trend, seasonality, stationarity, AR(p)) — using the SAME three datasets

Robust handling note: components() for STL may name seasonal columns as seasonal, season_year, etc.
This bonus sums any columns that start with season into a single seasonal component. If none exist, seasonal is treated as 0.

set.seed(123)

build_modulated_tbl <- function(s, m) {
  idx_name <- tsibble::index_var(s)  # e.g., "Month" or "Quarter"

  fit <- s |> model(stl = STL(value))
  comps <- fit |> components()

  n <- nrow(comps)

  tr <- comps$trend
  re <- comps$remainder

  # Seasonal component may be named differently (e.g., "season_year")
  season_cols <- grep("^season", names(comps), value = TRUE)
  if (length(season_cols) == 0) {
    se <- rep(0, n)
  } else if (length(season_cols) == 1) {
    se <- comps[[season_cols[1]]]
  } else {
    se <- Reduce(`+`, lapply(season_cols, function(cc) comps[[cc]]))
  }

  # Noise scale from remainder
  noise_sd <- sd(re, na.rm = TRUE)
  if (!is.finite(noise_sd) || noise_sd == 0) noise_sd <- sd(comps$value, na.rm = TRUE)

  eps <- rnorm(n, 0, noise_sd)

  # Estimate AR(1) phi from remainder (fallback if it fails)
  phi <- 0.5
  ar_sd <- noise_sd
  re_clean <- re[is.finite(re)]
  if (length(re_clean) > 20) {
    ar_fit <- try(stats::arima(re_clean, order = c(1,0,0)), silent = TRUE)
    if (!inherits(ar_fit, "try-error")) {
      phi_hat <- as.numeric(ar_fit$coef[1])
      if (is.finite(phi_hat)) phi <- max(min(phi_hat, 0.99), -0.99)
      sigma_hat <- sqrt(ar_fit$sigma2)
      if (is.finite(sigma_hat) && sigma_hat > 0) ar_sd <- sigma_hat
    }
  }

  ar_series <- as.numeric(stats::arima.sim(n = n, list(ar = phi), sd = ar_sd))

  wide_tbl <- tibble(
    idx = comps[[idx_name]],
    `Trend only` = tr + eps,
    `Season only` = se + eps,
    `Trend + season` = tr + se + eps,
    `AR(1) only (stationary)` = ar_series
  ) |>
    rename(!!idx_name := idx)

  wide_tbl |>
    pivot_longer(-all_of(idx_name), names_to = "Series", values_to = "value") |>
    as_tsibble(index = !!idx_name, key = Series)
}

for (dataset_nm in names(series_list)) {
  cat("\n\n# BONUS dataset:", dataset_nm, "\n")
  s <- series_list[[dataset_nm]]
  m <- seasonal_lags[[dataset_nm]]

  mod_tbl <- build_modulated_tbl(s, m)

  # 1) Time plot
  print(
    autoplot(mod_tbl, value) +
      facet_wrap(~Series, scales = "free_y") +
      labs(title = paste("BONUS:", dataset_nm, "- Modulated series (original)"))
  )

  # 2) Stationarity tests on each modulated series
  for (ser_nm in unique(mod_tbl$Series)) {
    cat("\n\n###", ser_nm, "\n")
    x <- mod_tbl |> filter(Series == ser_nm) |> pull(value)

    adf  <- tseries::adf.test(x)
    kpss <- tseries::kpss.test(x)

    cat("ADF (H0: unit root / non-stationary): p =", adf$p.value, "\n")
    cat("KPSS (H0: stationary): p =", kpss$p.value, "\n")
  }

  # 3) ACF/PACF on originals
  for (ser_nm in unique(mod_tbl$Series)) {
    s_sub <- mod_tbl |> filter(Series == ser_nm)
    print(
      ggtime::gg_tsdisplay(s_sub, value, plot_type = "partial") +
        labs(title = paste("BONUS:", dataset_nm, "- ACF/PACF (original):", ser_nm))
    )
  }

  # 4) Suggest differencing per modulated series
  diff_params <- tibble(Series = unique(mod_tbl$Series)) |>
    mutate(
      D = map_dbl(Series, ~{
        tmp <- mod_tbl |> filter(Series == .x)
        tmp %>% features(value, unitroot_nsdiffs) %>% pull(nsdiffs)
      }),
      d = map_dbl(Series, ~{
        tmp <- mod_tbl |> filter(Series == .x)
        tmp %>% features(value, unitroot_ndiffs) %>% pull(ndiffs)
      })
    )

  print(diff_params)

  # Apply differencing separately per modulated series
  mod_stat_list <- map2(diff_params$Series, seq_len(nrow(diff_params)), ~{
    ser_nm <- .x
    i <- .y
    D_val <- diff_params$D[i]
    d_val <- diff_params$d[i]

    tmp <- mod_tbl |> filter(Series == ser_nm)
    if (D_val > 0) tmp <- tmp |> mutate(value = difference(value, lag = m))
    if (d_val > 0) tmp <- tmp |> mutate(value = difference(value, differences = d_val))
    tmp |> filter(!is.na(value))
  })

  mod_stat2 <- bind_rows(mod_stat_list)

  print(
    autoplot(mod_stat2, value) +
      facet_wrap(~Series, scales = "free_y") +
      labs(title = paste("BONUS:", dataset_nm, "- Modulated series after suggested differencing"))
  )

  for (ser_nm in unique(mod_stat2$Series)) {
    s_sub <- mod_stat2 |> filter(Series == ser_nm)
    print(
      ggtime::gg_tsdisplay(s_sub, value, plot_type = "partial") +
        labs(title = paste("BONUS:", dataset_nm, "- ACF/PACF (after differencing):", ser_nm))
    )
  }
}


# BONUS dataset: US employment: Retail Trade (monthly) 



### AR(1) only (stationary) 
Warning in tseries::adf.test(x): p-value smaller than printed p-value
Warning in tseries::kpss.test(x): p-value greater than printed p-value
ADF (H0: unit root / non-stationary): p = 0.01 
KPSS (H0: stationary): p = 0.1 


### Season only 
Warning in tseries::kpss.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.9717872 
KPSS (H0: stationary): p = 0.01 


### Trend + season 
Warning in tseries::kpss.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.9825512 
KPSS (H0: stationary): p = 0.01 


### Trend only 
Warning in tseries::kpss.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.6889482 
KPSS (H0: stationary): p = 0.01 

# A tibble: 4 × 3
  Series                      D     d
  <chr>                   <dbl> <dbl>
1 AR(1) only (stationary)     0     0
2 Season only                 1     1
3 Trend + season              1     1
4 Trend only                  0     2



# BONUS dataset: Australia production: Beer (quarterly) 



### AR(1) only (stationary) 
Warning in tseries::adf.test(x): p-value smaller than printed p-value
Warning in tseries::adf.test(x): p-value greater than printed p-value
ADF (H0: unit root / non-stationary): p = 0.01 
KPSS (H0: stationary): p = 0.1 


### Season only 
Warning in tseries::kpss.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.8968407 
KPSS (H0: stationary): p = 0.01 


### Trend + season 
Warning in tseries::kpss.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.9130321 
KPSS (H0: stationary): p = 0.01 


### Trend only 
Warning in tseries::kpss.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.9310831 
KPSS (H0: stationary): p = 0.01 

# A tibble: 4 × 3
  Series                      D     d
  <chr>                   <dbl> <dbl>
1 AR(1) only (stationary)     0     0
2 Season only                 1     1
3 Trend + season              1     1
4 Trend only                  0     2



# BONUS dataset: PBS: ATC2 A10 scripts (monthly) 



### AR(1) only (stationary) 
Warning in tseries::adf.test(x): p-value smaller than printed p-value
Warning in tseries::adf.test(x): p-value greater than printed p-value
ADF (H0: unit root / non-stationary): p = 0.01 
KPSS (H0: stationary): p = 0.1 


### Season only 
Warning in tseries::adf.test(x): p-value smaller than printed p-value
Warning in tseries::kpss.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.01 
KPSS (H0: stationary): p = 0.01 


### Trend + season 
Warning in tseries::adf.test(x): p-value smaller than printed p-value
Warning in tseries::adf.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.01 
KPSS (H0: stationary): p = 0.01 


### Trend only 
Warning in tseries::kpss.test(x): p-value smaller than printed p-value
ADF (H0: unit root / non-stationary): p = 0.1840766 
KPSS (H0: stationary): p = 0.01 

# A tibble: 4 × 3
  Series                      D     d
  <chr>                   <dbl> <dbl>
1 AR(1) only (stationary)     0     0
2 Season only                 1     1
3 Trend + season              1     1
4 Trend only                  0     1

Citation

This workflow follows standard practice: transform to stationarity before interpreting ACF/PACF, and use decomposition to diagnose trend and seasonality (Hyndman and Khandakar 2008).

References

Hyndman, Rob J., and Yeasmin Khandakar. 2008. “Automatic Time Series Forecasting: The Forecast Package for R.” Journal of Statistical Software 27 (3). https://doi.org/10.18637/jss.v027.i03.