Time Series Analysis: Stationarity, ACF/PACF, and Decomposition

Author

Jincheng Xie

Setup

library(fpp3)
library(fredr)
library(tidyverse)
library(patchwork)
library(knitr)
library(kableExtra)


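# Read the FRED API key from the FRED_API_KEY environment variable
# rather than hard-coding a secret in the script
fredr_set_key(Sys.getenv("FRED_API_KEY"))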

1. Stationarity Analysis

The analysis uses three monthly FRED series:

1. HOUST: New Privately-Owned Housing Units Started (Housing Starts)
2. UNRATE: Unemployment Rate
3. INDPRO: Industrial Production: Total Index

# Import data
series_ids <- c("HOUST", "UNRATE", "INDPRO")

raw_data <- map_dfr(series_ids, function(id) {
  fredr(series_id = id,
        observation_start = as.Date("1990-01-01"),
        observation_end   = as.Date("2023-12-01")) |>
    mutate(Series = id)
})

# Convert to tsibble
multi_ts <- raw_data |>
  mutate(Month = yearmonth(date)) |>
  select(Month, Series, value) |>
  as_tsibble(index = Month, key = Series)

# Visualize the time series
multi_ts |>
  autoplot(value) +
  facet_wrap(~Series, scales = "free_y", ncol = 1) +
  labs(title = "Time Series Plots", y = "Value") +
  theme_minimal()

Interpretation and Formal Tests

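Null Hypotheses:
* KPSS Test: \(H_0\): The series is stationary. A small p-value rejects \(H_0\) and indicates non-stationarity.
* ADF Test: \(H_0\): The series has a unit root (is non-stationary). A small p-value rejects \(H_0\) and indicates stationarity.

The two tests have opposite null hypotheses, so a small p-value means opposite things. Only the KPSS test is applied to the three series below.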
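To make the ADF null concrete, here is a minimal base-R sketch of the simplest Dickey-Fuller regression (with a constant and no lagged differences, so it is not the full augmented test). It runs on simulated data rather than the FRED series, and `df_tstat` is a hypothetical helper name, not a function from fpp3.

```r
# Dickey-Fuller regression with a constant: diff(y)_t = a + g * y_{t-1} + e_t.
# H0: g = 0 (unit root). Returns the t-statistic on g.
df_tstat <- function(y) {
  dy   <- diff(y)            # y_t - y_{t-1}
  ylag <- y[-length(y)]      # y_{t-1}
  fit  <- lm(dy ~ ylag)
  unname(coef(summary(fit))["ylag", "t value"])
}

set.seed(123)
n <- 500
stationary_ar1 <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # AR(1), no unit root
random_walk    <- cumsum(rnorm(n))                              # unit root by construction

df_ar1 <- df_tstat(stationary_ar1)
df_rw  <- df_tstat(random_walk)

# Compare with the Dickey-Fuller 5% critical value (about -2.86 with a constant),
# not the usual t-table: the statistic is not t-distributed under H0.
c(ar1 = df_ar1, random_walk = df_rw)
```

A statistic well below -2.86 rejects the unit root. The augmented version adds lagged differences to the regression to absorb serial correlation in the errors.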

# Perform KPSS Test
kpss_results <- multi_ts |>
  features(value, unitroot_kpss)

# Display results
kpss_results |>
  kable(caption = "KPSS Unit Root Test Results") |>
  kable_styling(full_width = FALSE)

KPSS Unit Root Test Results
Series	kpss_stat	kpss_pvalue
HOUST	0.9739958	0.0100000
INDPRO	5.2652691	0.0100000
UNRATE	0.4367011	0.0613357

Analysis:

HOUST: The reported KPSS p-value is 0.01. This is the lower bound of the reported range, so the actual p-value is at most 0.01. We reject the null hypothesis of stationarity at the 5% level. The result is consistent with the long cycles visible in the plot, although volatility alone would not imply non-stationarity.
INDPRO: The reported p-value is again 0.01 (< 0.05), with a much larger test statistic (5.27). We reject the null hypothesis. The series is non-stationary, which is expected given the strong upward trend observed in the plot.
UNRATE: The p-value is 0.061, slightly above 0.05. We fail to reject the null hypothesis at the 5% level, so the test does not rule out stationarity around a constant mean. The result is borderline, however: it would be rejected at the 10% level, and visually the series shows strong cyclical persistence. In practice, unemployment rates are often treated as non-stationary and differenced to handle these economic cycles.
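The p-values in the table are bounded because the statistic is compared against a short table of critical values. The sketch below is an assumption about how the reported values arise, not a quotation of the feasts source: it interpolates linearly between the standard level-stationarity critical values (0.347, 0.463, 0.574, 0.739) and clamps outside them. Under that assumption it reproduces the UNRATE p-value and the 0.01 floor for HOUST and INDPRO. `kpss_level_stat` and `kpss_pvalue` are hypothetical helper names, and the default lag rule is one common rule of thumb.

```r
# Hypothetical helper: level-stationarity KPSS statistic in base R.
kpss_level_stat <- function(x, lags = floor(4 * (length(x) / 100)^0.25)) {
  n <- length(x)
  e <- x - mean(x)        # residuals from a regression on a constant
  s <- cumsum(e)          # partial sums of the residuals
  lrv <- sum(e^2) / n     # long-run variance, starting from lag 0
  for (k in seq_len(lags)) {
    w <- 1 - k / (lags + 1)                                  # Bartlett weight
    lrv <- lrv + 2 * w * sum(e[(k + 1):n] * e[1:(n - k)]) / n
  }
  sum(s^2) / (n^2 * lrv)
}

# Hypothetical helper: p-value by linear interpolation between critical values.
# rule = 2 clamps the result to the range [0.01, 0.10].
kpss_pvalue <- function(stat) {
  approx(x = c(0.347, 0.463, 0.574, 0.739),
         y = c(0.10, 0.05, 0.025, 0.01),
         xout = stat, rule = 2)$y
}

# Statistics copied from the results table
table_stats <- c(HOUST = 0.9739958, INDPRO = 5.2652691, UNRATE = 0.4367011)
table_pvals <- setNames(kpss_pvalue(table_stats), names(table_stats))
round(table_pvals, 7)

# The statistic itself, on a simulated random walk (large values reject H0)
set.seed(1)
kpss_level_stat(cumsum(rnorm(300)))
```

Any statistic above 0.739 is reported as 0.01, which is why HOUST (0.97) and INDPRO (5.27) share a p-value despite very different statistics.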

2. ACF and PACF Analysis

# Check required differencing
diff_needs <- multi_ts |>
  features(value, list(unitroot_ndiffs, unitroot_nsdiffs))

diff_needs |>
  kable(caption = "Required Differencing (ndiffs = regular, nsdiffs = seasonal)") |>
  kable_styling(full_width = FALSE)

Required Differencing (ndiffs = regular, nsdiffs = seasonal)
Series	ndiffs	nsdiffs
HOUST	1	0
INDPRO	1	0
UNRATE	0	0

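The tests suggest one regular difference for HOUST and INDPRO, none for UNRATE, and no seasonal differencing for any series. The code below nevertheless takes a first difference of all three series before plotting the ACF/PACF. For UNRATE this goes beyond what ndiffs recommends and follows the cautious reading of its borderline KPSS result above.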
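As a small illustration of what the two kinds of differencing remove, the sketch below uses synthetic vectors (not the FRED data) in base R. A regular difference flattens a linear trend, while a lag-12 difference is needed to cancel a pattern that repeats every 12 months. All variable names are illustrative.

```r
months <- 1:48

# A pure linear trend and a pure 12-month repeating pattern (synthetic data)
trend_only    <- 10 + 0.5 * months
seasonal_only <- rep(c(3, 1, -2, -4, -1, 0, 2, 4, 1, -1, -2, -1), times = 4)

first_diff     <- diff(trend_only)              # y_t - y_{t-1}: constant slope remains
seasonal_diff  <- diff(seasonal_only, lag = 12) # y_t - y_{t-12}: pattern cancels
still_seasonal <- diff(seasonal_only)           # a regular difference keeps the pattern

range(first_diff)
range(seasonal_diff)
range(still_seasonal)
```

In the table above, ndiffs counts regular differences of the first kind and nsdiffs counts seasonal differences of the second kind.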

# Plot one series after first differencing, together with its ACF and PACF
plot_acf_pacf <- function(data, series_name) {
  ts_data <- data |> filter(Series == series_name)
  
  ts_data |>
    gg_tsdisplay(difference(value), plot_type = "partial", lag_max = 36) +
    labs(title = paste(series_name, "(First Differenced)"))
}

# Plot for each series
p1 <- plot_acf_pacf(multi_ts, "HOUST")

p2 <- plot_acf_pacf(multi_ts, "UNRATE")

p3 <- plot_acf_pacf(multi_ts, "INDPRO")

p1
p2
p3

Interpretation:

General Observation: The ACF and PACF plots show which lags of the differenced series remain correlated. This guides the choice of AR and MA orders and reveals any dependence left at seasonal lags.
INDPRO (First Differenced): The ACF shows significant spikes at lags 12, 24, and 36, so some dependence at the annual frequency remains after first differencing. This should be read with caution: nsdiffs is 0 for INDPRO, and FRED publishes the series in seasonally adjusted form. The spikes therefore point to residual seasonal autocorrelation rather than a strong seasonal pattern, and a seasonal AR or MA term is an alternative to seasonal differencing.
HOUST (First Differenced): Similar to INDPRO, the ACF for Housing Starts shows a significant spike at the seasonal lag 12. HOUST is also published as a seasonally adjusted annual rate and has nsdiffs = 0, so this is better described as leftover seasonal dependence than as the raw weather-driven annual pattern of construction activity.
UNRATE (First Differenced): The PACF cuts off after lag 1 or 2 while the ACF decays more gradually. This pattern suggests a low-order AR(p) process (likely AR(1) or AR(2)) for the monthly changes in the unemployment rate: this month's change is correlated with the changes in the immediately preceding months. Note that ndiffs is 0 for UNRATE, so this differencing is a modeling choice rather than a test result.
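The AR(p) signature described for UNRATE can be checked against theory. This is a minimal sketch using base R's `ARMAacf`, with an assumed coefficient of 0.7 chosen only for illustration: for an AR(1) process the ACF decays geometrically as 0.7^k, while the PACF is zero after lag 1. A simulated series shows the same pattern up to sampling noise.

```r
phi <- 0.7  # illustrative AR(1) coefficient, not estimated from the data

# Theoretical values
theo_acf  <- ARMAacf(ar = phi, lag.max = 5)               # lags 0 to 5
theo_pacf <- ARMAacf(ar = phi, lag.max = 5, pacf = TRUE)  # lags 1 to 5

# Sample values from a simulated AR(1) series
set.seed(42)
x <- arima.sim(list(ar = phi), n = 400)
samp_acf  <- acf(x, lag.max = 5, plot = FALSE)$acf[-1]    # drop lag 0
samp_pacf <- pacf(x, lag.max = 5, plot = FALSE)$acf

round(rbind(theo_acf = theo_acf[-1], samp_acf = samp_acf), 3)
round(rbind(theo_pacf = theo_pacf, samp_pacf = as.numeric(samp_pacf)), 3)
```

The reverse pattern (ACF cutting off, PACF decaying) would point to an MA process instead.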

3. Decomposition

We apply STL decomposition to analyze Trend, Seasonality, and Remainder.

# Apply STL Decomposition
decomp_models <- multi_ts |>
  model(STL(value ~ season(window = "periodic")))

# Visualize components for HOUST
components(decomp_models) |>
  filter(Series == "HOUST") |>
  autoplot() +
  labs(title = "STL Decomposition: Housing Starts (HOUST)")

# Visualize components for UNRATE
components(decomp_models) |>
  filter(Series == "UNRATE") |>
  autoplot() +
  labs(title = "STL Decomposition: Unemployment Rate (UNRATE)")

# Visualize components for INDPRO
components(decomp_models) |>
  filter(Series == "INDPRO") |>
  autoplot() +
  labs(title = "STL Decomposition: Industrial Production (INDPRO)")
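For readers without FRED access, the same idea can be sketched with base R's `stl()` on a synthetic monthly series (an assumption for illustration; the report itself uses `STL()` from fpp3). The sketch shows two properties that matter when reading the plots: the three components add back up to the original series, and with a periodic seasonal window the seasonal component is identical in every year.

```r
set.seed(1)
n <- 120
t_idx <- seq_len(n)

# Synthetic monthly series: linear trend + annual sine wave + noise
y <- ts(50 + 0.2 * t_idx + 5 * sin(2 * pi * t_idx / 12) + rnorm(n),
        frequency = 12, start = c(2010, 1))

fit   <- stl(y, s.window = "periodic")
parts <- fit$time.series            # columns: seasonal, trend, remainder

# Additivity: seasonal + trend + remainder reproduces the series
recon_error <- max(abs(rowSums(parts) - as.numeric(y)))

# Periodic window: the seasonal pattern of year 1 equals that of year 2
seasonal <- as.numeric(parts[, "seasonal"])
same_each_year <- isTRUE(all.equal(seasonal[1:12], seasonal[13:24]))

recon_error
same_each_year
```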

Trend:
- INDPRO: The decomposition reveals a clear, smooth upward trend over the decades, interrupted only by brief dips during economic recessions (e.g., 2008, 2020).
- HOUST: The trend is not linear. It shows a dramatic “boom and bust” cycle, specifically the massive structural collapse around 2008 (the Subprime Mortgage Crisis) and a slow recovery afterward.
- UNRATE: The trend shows cyclical behavior (counter-cyclical to the economy) rather than a straight line. There is a massive, sharp spike in 2020 due to the COVID-19 pandemic.
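Seasonality:
- All three series show a repeating wave-like pattern in the season_year row. Because the model uses season(window = "periodic"), STL forces the seasonal component to be identical in every year, so a repeating pattern appears by construction. What indicates the strength of the seasonality is the amplitude of that row relative to the trend and remainder.
- HOUST: The seasonal component peaks in warmer months and drops in winter, which is typical for construction data. Since HOUST is published as a seasonally adjusted annual rate, its amplitude should be checked against the y-axis scale before calling the effect strong.
- INDPRO & UNRATE: Both also show an annual pattern in the seasonal row. As these series are seasonally adjusted at source and nsdiffs is 0 for both, this is likely to be small residual seasonality.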
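One way to put a number on "how seasonal" a series is, rather than judging the season_year row by eye, is the seasonal strength measure F_S = max(0, 1 - Var(R) / Var(S + R)), where S and R are the seasonal and remainder components. The sketch below is a base-R illustration on synthetic data, and `seasonal_strength` is a hypothetical helper name. It shows that a periodic STL fit returns a repeating seasonal component even for pure noise, but with a strength close to zero.

```r
# Hypothetical helper: seasonal strength from a periodic STL fit,
# F_S = max(0, 1 - Var(remainder) / Var(seasonal + remainder)).
seasonal_strength <- function(y) {
  parts <- stl(y, s.window = "periodic")$time.series
  s <- parts[, "seasonal"]
  r <- parts[, "remainder"]
  max(0, 1 - var(r) / var(s + r))
}

set.seed(7)
n <- 240
t_idx <- seq_len(n)

seasonal_series <- ts(5 * sin(2 * pi * t_idx / 12) + rnorm(n), frequency = 12)
noise_series    <- ts(rnorm(n), frequency = 12)   # no seasonality at all

strength_seasonal <- seasonal_strength(seasonal_series)
strength_noise    <- seasonal_strength(noise_series)

c(seasonal = strength_seasonal, noise = strength_noise)
```

Values near 1 indicate strong seasonality and values near 0 indicate little or none. Applying the same calculation to the STL components of HOUST, UNRATE, and INDPRO would show whether the seasonal rows are substantial.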

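Overall:
- The decomposition shows that the mean changes over time (the trend component) and that a periodic component is present (the seasonal component).
- For HOUST and INDPRO this agrees with the KPSS results: the raw series are non-stationary. For UNRATE the KPSS test did not reject stationarity at the 5% level, so the decomposition is suggestive rather than conclusive. To model these series with methods that assume stationarity, the trend and any remaining seasonality must be removed first (e.g., via differencing or seasonal adjustment).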