library(fredr)
library(tidyverse)
library(tsibble)
library(fable)
library(feasts)
library(gridExtra)
fredr_set_key("f95eac3f44d6965f475a2026b5216b1b")
theme_set(theme_minimal())

Discussion 4
# Fetch Gold Volatility Index (GVZ) from FRED
gold_vol <- fredr(
series_id = "GVZCLS",
observation_start = as.Date("2008-01-01")
)
# Aggregate daily data to monthly averages
gold_vol_ts <- gold_vol |>
mutate(month = yearmonth(date)) |>
group_by(month) |>
summarise(value = mean(value, na.rm = TRUE)) |>
ungroup() |>
as_tsibble(index = month)
head(gold_vol_ts)

# A tsibble: 6 x 2 [1M]
month value
<mth> <dbl>
1 2008 Jun 25.1
2 2008 Jul 27.1
3 2008 Aug 28.5
4 2008 Sep 40.6
5 2008 Oct 55.1
6 2008 Nov 48.4
1 The Box-Cox Transformation in Time Series
1.1 What It Does
The Box-Cox transformation is a family of power transformations applied to a time series to stabilize variance — meaning it makes the variability of the series consistent across time rather than changing with the level of the series.
The transformation is defined as:
\[ w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0 \\ \frac{y_t^\lambda - 1}{\lambda} & \text{otherwise} \end{cases} \]
Where \(\lambda\) controls the strength of the transformation:
| \(\lambda\) | Transformation |
|---|---|
| 1 | No change (original data) |
| 0.5 | Square root |
| 0 | Natural log |
| -1 | Inverse (1/y) |
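The special cases in the table can be verified numerically. Below is a minimal sketch using box_cox() and inv_box_cox() from fabletools (loaded with fable/feasts); the vector y is just illustrative:

# Minimal sketch: check the special cases of the Box-Cox formula numerically
y <- c(10, 20, 40, 80)

box_cox(y, lambda = 1)                         # (y - 1)/1: same shape as y, shifted down by 1
all.equal(box_cox(y, lambda = 0), log(y))      # lambda = 0 reduces to the natural log
all.equal(box_cox(y, lambda = 0.5),
          (sqrt(y) - 1) / 0.5)                 # lambda = 0.5 is a rescaled square root
all.equal(inv_box_cox(box_cox(y, -1), -1), y)  # inv_box_cox() undoes the transformation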
1.2 Why It Matters for Time Series
In many real-world time series, when the level of the series rises, its variability rises too — a property called heteroscedasticity. Most forecasting models (ARIMA, ETS, TSLM) assume constant error variance. When that assumption is violated, prediction intervals become unreliable and parameter estimates can be distorted.
1.3 Example: CBOE Gold Volatility Index (GVZ)
The GVZ measures the implied volatility of gold prices — essentially how much the market expects gold to fluctuate. It is sourced from FRED (Series ID: GVZCLS) and aggregated from daily to monthly averages.
gold_vol_ts |>
autoplot(value) +
labs(
title = "CBOE Gold Volatility Index (GVZ) - Monthly Average",
subtitle = "2008 to present | Source: FRED (GVZCLS)",
y = "GVZ Index",
x = "Month"
  )

Notice that during high-volatility regimes (the 2008 financial crisis, the 2020 COVID shock), the series spikes sharply and the swings become much larger. This is textbook heteroscedasticity: variance grows with the level of the series.
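A rough numerical check of this (a sketch; the yearly grouping and the names level and spread are just illustrative) is to ask whether the spread of the series within each calendar year rises with its average level:

# Rough heteroscedasticity check (sketch): does the yearly spread track the yearly level?
gold_vol_ts |>
  as_tibble() |>
  mutate(year = lubridate::year(as.Date(month))) |>
  group_by(year) |>
  summarise(level = mean(value), spread = sd(value)) |>
  summarise(cor_level_spread = cor(level, spread, use = "complete.obs"))  # strongly positive => heteroscedastic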
# Find optimal lambda using Guerrero method
lambda <- gold_vol_ts |>
features(value, features = guerrero) |>
pull(lambda_guerrero)
paste("Optimal lambda:", round(lambda, 3))[1] "Optimal lambda: -0.799"
# Apply Box-Cox transformation
gold_bc <- gold_vol_ts |>
mutate(value_bc = box_cox(value, lambda))
# Plot before and after
p1 <- gold_vol_ts |>
autoplot(value) +
labs(title = "Before: Original GVZ", y = "GVZ Index")
p2 <- gold_bc |>
autoplot(value_bc) +
labs(title = paste0("After: Box-Cox Transformed (λ = ", round(lambda, 3), ")"),
y = "Transformed Values")
grid.arrange(p1, p2, ncol = 1)

After transformation, the large spikes are compressed and the variance is more consistent across the series, which is exactly what we need before modeling.
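Repeating the rough level-versus-spread check from Section 1.3 on the transformed series (again a sketch with illustrative names) gives a quick numerical confirmation:

# Same check on the Box-Cox scale: the level/spread correlation should move toward zero
# if the transformation stabilized the variance
gold_bc |>
  as_tibble() |>
  mutate(year = lubridate::year(as.Date(month))) |>
  group_by(year) |>
  summarise(level = mean(value_bc), spread = sd(value_bc)) |>
  summarise(cor_level_spread = cor(level, spread, use = "complete.obs"))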
1.4 Advantages of Box-Cox in Time Series
1. Stabilizes variance: The primary benefit. Models perform better when error variance is constant over time, and prediction intervals become more reliable.
2. Automatic lambda selection: The Guerrero method selects the optimal \(\lambda\) algorithmically, removing subjectivity.
3. Back-transformation is clean: Forecasts are back-transformed using inv_box_cox(), producing asymmetric prediction intervals that are wider on the upside — realistic for volatile series like GVZ.
4. Generalizes common transformations: Log (\(\lambda = 0\)) and square root (\(\lambda = 0.5\)) are special cases, making Box-Cox a flexible umbrella.
5. Improves residual behavior: Reducing heteroscedasticity often produces better-behaved residuals, improving the validity of statistical tests on model fit.
1.5 Disadvantages of Box-Cox in Time Series
1. Interpretation is lost: Transformed values have no direct real-world meaning. You must back-transform forecasts to interpret them in original units.
2. Does not address non-stationarity: Box-Cox stabilizes variance but does not remove trends or make the mean constant; differencing is still needed for that (see the quick check after this list).
3. Undefined for zeros or negatives: The transformation breaks down for non-positive values. GVZ is always positive so this is not an issue here, but it limits applicability elsewhere.
4. Lambda is estimated, not known: The optimal \(\lambda\) is estimated from training data and may not hold out-of-sample, especially after structural breaks.
5. May be unnecessary: If variance is already stable, applying Box-Cox adds complexity without benefit.
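On point 2, the unit-root features from feasts give a quick check (a sketch; unitroot_ndiffs() estimates how many first differences are needed to make a series stationary):

# Sketch: number of differences suggested before and after Box-Cox.
# If Box-Cox addressed stationarity, the transformed series would need none.
gold_vol_ts |>
  features(value, unitroot_ndiffs)

gold_bc |>
  features(value_bc, unitroot_ndiffs)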
2 Basic Forecasting Models and Accuracy Metrics
We split the GVZ series into training (first 80%) and test (last 20%) sets and apply three benchmark models.
n <- nrow(gold_vol_ts)
train_size <- floor(0.8 * n)
train <- gold_vol_ts |> slice(1:train_size)
test <- gold_vol_ts |> slice((train_size + 1):n)
paste("Training:", nrow(train), "months from",
      format(min(train$month)), "to", format(max(train$month)))

[1] "Training: 170 months from 2008 Jun to 2022 Jul"
paste("Test:", nrow(test), "months from",
      format(min(test$month)), "to", format(max(test$month)))

[1] "Test: 43 months from 2022 Aug to 2026 Feb"
# Fit three benchmark models
fit <- train |>
model(
Mean = MEAN(value),
Drift = RW(value ~ drift()),
SNaive = SNAIVE(value)
)
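For reference, these benchmarks produce the following point forecasts h steps ahead from a training series \(y_1, \dots, y_T\), with seasonal period \(m = 12\) for monthly data:

\[ \hat{y}_{T+h|T} = \begin{cases} \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t & \text{Mean} \\ y_T + h \, \dfrac{y_T - y_1}{T - 1} & \text{Drift} \\ y_{T+h-m(k+1)}, \quad k = \lfloor (h-1)/m \rfloor & \text{Seasonal naïve} \end{cases} \]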
# Forecast over test period
h <- nrow(test)
fc <- fit |> forecast(h = h)
# Plot forecasts vs actuals
fc |>
autoplot(train, level = NULL) +
autolayer(test, value, colour = "black", linewidth = 1) +
labs(
title = "GVZ Forecasts: Mean, Drift, Seasonal Naïve",
subtitle = "Black = actual test data",
y = "GVZ Index",
x = "Month",
colour = "Model"
  )

# Accuracy metrics on test set
acc <- fc |>
accuracy(test) |>
select(.model, RMSE, MAE, MAPE, ME, MPE) |>
arrange(RMSE)
knitr::kable(acc, digits = 3,
caption = "Test Set Accuracy Metrics (Lower absolute value = Better)")| .model | RMSE | MAE | MAPE | ME | MPE |
|---|---|---|---|---|---|
| Mean | 4.469 | 3.546 | 21.423 | -1.501 | -13.817 |
| Drift | 4.546 | 3.357 | 19.666 | -0.791 | -9.843 |
| SNaive | 4.865 | 3.455 | 19.336 | 0.010 | -4.836 |
2.1 Interpreting the Metrics
Magnitude-focused metrics measure the size of the errors regardless of direction:
- RMSE: Penalizes large errors heavily due to squaring. Sensitive to the kind of spikes GVZ exhibits.
- MAE: Average absolute deviation. More robust to outliers than RMSE.
- MAPE: Scale-free percentage error. Useful for comparing across series.
Bias-focused metrics reveal systematic over or under-prediction:
- ME: A positive ME means the model systematically under-predicts; negative means over-predicts.
- MPE: Percentage version of ME. The negative MPE values here indicate that the models over-predict on average in percentage terms; for a volatile, mean-reverting index like GVZ, forecasts tend to sit above the realized values during calm periods.
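Writing each test-set error as \(e_t = y_t - \hat{y}_t\), these metrics are defined as:

\[ \text{ME} = \text{mean}(e_t), \quad \text{RMSE} = \sqrt{\text{mean}(e_t^2)}, \quad \text{MAE} = \text{mean}(|e_t|), \quad \text{MPE} = \text{mean}\!\left(\frac{100\, e_t}{y_t}\right), \quad \text{MAPE} = \text{mean}\!\left(\frac{100\, |e_t|}{y_t}\right) \]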
For GVZ, none of the three models is expected to perform well over a long horizon because volatility indices are inherently mean-reverting and event-driven. However, the Mean model may perform reasonably since GVZ reverts toward a long-run average after spikes.
3 Does Transformation Improve Model Performance?
We now apply the Box-Cox transformation to the training set, refit the same models, back-transform the forecasts, and compare accuracy metrics against the original scale test set.
# Compute lambda on training set only (avoid data leakage)
lambda_train <- train |>
features(value, features = guerrero) |>
pull(lambda_guerrero)
paste("Lambda from training set:", round(lambda_train, 3))[1] "Lambda from training set: -0.784"
# Transform training set
train_bc <- train |>
mutate(value = box_cox(value, lambda_train))
# Fit same models on transformed data
fit_bc <- train_bc |>
model(
Mean_BC = MEAN(value),
Drift_BC = RW(value ~ drift()),
SNaive_BC = SNAIVE(value)
)
# Forecast and back-transform .mean to original scale
fc_bc <- fit_bc |>
forecast(h = h) |>
  mutate(.mean = inv_box_cox(.mean, lambda_train))

fc_bc |>
autoplot(train, level = NULL) +
autolayer(test, value, colour = "black", linewidth = 1) +
labs(
title = "After Box-Cox: Forecasts Back-Transformed to Original GVZ Scale",
subtitle = "Black = actual test data",
y = "GVZ Index",
x = "Month",
colour = "Model"
  )

# Original scale accuracy (no transformation)
acc_original <- fc |>
accuracy(test) |>
select(.model, RMSE, MAE, MAPE) |>
mutate(Transformation = "None")
# Box-Cox: manually compute accuracy against original test
# (rename test value column to avoid join conflict)
bc_errors <- fc_bc |>
as_tibble() |>
left_join(test |> as_tibble() |> rename(actual = value), by = "month") |>
mutate(error = actual - .mean) |>
group_by(.model) |>
summarise(
RMSE = sqrt(mean(error^2, na.rm = TRUE)),
MAE = mean(abs(error), na.rm = TRUE),
MAPE = mean(abs(error / actual), na.rm = TRUE) * 100
) |>
mutate(
Transformation = "Box-Cox",
.model = str_remove(.model, "_BC")
)
# Combined comparison
comparison <- bind_rows(acc_original, bc_errors) |>
arrange(.model, Transformation)
knitr::kable(comparison, digits = 3,
caption = "Accuracy Before and After Box-Cox Transformation")| .model | RMSE | MAE | MAPE | Transformation |
|---|---|---|---|---|
| Drift | 4.510 | 3.393 | 20.062 | Box-Cox |
| Drift | 4.546 | 3.357 | 19.666 | None |
| Mean | 4.214 | 2.829 | 15.807 | Box-Cox |
| Mean | 4.469 | 3.546 | 21.423 | None |
| SNaive | 4.865 | 3.455 | 19.336 | Box-Cox |
| SNaive | 4.865 | 3.455 | 19.336 | None |
3.1 Findings
Does transformation reduce errors? For GVZ, the results are mixed. The Mean model benefits most: averaging on the transformed scale and then back-transforming pulls the point forecast down toward the bulk of the data rather than toward the spike-inflated arithmetic mean, which noticeably reduces RMSE, MAE, and MAPE. The Seasonal Naïve forecasts are unchanged, since repeating last year's value and then undoing the transformation returns exactly the original value. The Drift model changes only slightly: its slope is re-estimated on the transformed scale and the back-transformed path bends a little, but the effect on the test-set errors is small.
What happens to interpretation? During modeling, all values are on the transformed scale and have no direct real-world meaning. Only after applying inv_box_cox() do the forecasts return to GVZ index units. Note that the manual step above back-transforms only the point forecast (.mean); when the full forecast distribution is back-transformed, the prediction intervals are asymmetric, wider above the point forecast than below, which honestly reflects that GVZ can spike sharply upward during crises but is bounded below by near-zero volatility.
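As a side note, fable can manage the transformation end to end when box_cox() is written inside the model formula: forecasts and their intervals then come back on the original scale automatically, with a bias-adjusted point forecast. A minimal sketch (the names fit_bc_auto, fc_bc_auto, and Mean_BC2 are illustrative):

# Sketch: let fable handle the Box-Cox transformation and back-transformation
fit_bc_auto <- train |>
  model(Mean_BC2 = MEAN(box_cox(value, lambda_train)))

fc_bc_auto <- fit_bc_auto |> forecast(h = h)

fc_bc_auto |> hilo(level = 95)   # intervals on the GVZ scale; the upper gap exceeds the lower gap
fc_bc_auto |> accuracy(test)     # accuracy computed directly on the original scale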
Key takeaway: For a volatile, event-driven series like GVZ, Box-Cox transformation is most valuable not for dramatically reducing point forecast errors, but for producing more honest prediction intervals that acknowledge upside risk. The stabilization of variance during model estimation also leads to more reliable standard errors and diagnostic tests.