library(fredr)
library(tidyverse)
library(tsibble)
library(fable)
library(feasts)
library(gridExtra)
fredr_set_key("f95eac3f44d6965f475a2026b5216b1b")
theme_set(theme_minimal())

Discussion 4
# Fetch Gold Volatility Index (GVZ) from FRED
gold_vol <- fredr(
series_id = "GVZCLS",
observation_start = as.Date("2008-01-01")
)
# Aggregate daily data to monthly averages
gold_vol_ts <- gold_vol |>
mutate(month = yearmonth(date)) |>
group_by(month) |>
summarise(value = mean(value, na.rm = TRUE)) |>
ungroup() |>
as_tsibble(index = month)
head(gold_vol_ts)

# A tsibble: 6 x 2 [1M]
month value
<mth> <dbl>
1 2008 Jun 25.1
2 2008 Jul 27.1
3 2008 Aug 28.5
4 2008 Sep 40.6
5 2008 Oct 55.1
6 2008 Nov 48.4
1 The Box-Cox Transformation in Time Series
1.1 What It Does
The Box-Cox transformation is a family of power transformations applied to a time series to stabilize variance — meaning it makes the variability of the series consistent across time rather than changing with the level of the series.
The transformation is defined as:
\[ w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0 \\ \frac{y_t^\lambda - 1}{\lambda} & \text{otherwise} \end{cases} \]
Where \(\lambda\) controls the strength of the transformation:
| \(\lambda\) | Transformation |
|---|---|
| 1 | No change (original data) |
| 0.5 | Square root |
| 0 | Natural log |
| -1 | Inverse (1/y) |
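The special cases in the table can be verified numerically. Below is a minimal sketch using box_cox() and inv_box_cox() from fabletools (loaded with fable/feasts); the vector y is just illustrative:

# Minimal sketch: check the special cases of the Box-Cox formula numerically
y <- c(10, 20, 40, 80)

box_cox(y, lambda = 1)                         # (y - 1)/1: same shape as y, shifted down by 1
all.equal(box_cox(y, lambda = 0), log(y))      # lambda = 0 reduces to the natural log
all.equal(box_cox(y, lambda = 0.5),
          (sqrt(y) - 1) / 0.5)                 # lambda = 0.5 is a rescaled square root
all.equal(inv_box_cox(box_cox(y, -1), -1), y)  # inv_box_cox() undoes the transformation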
1.2 Why It Matters for Time Series
In many real-world time series, when the level of the series rises, its variability rises too — a property called heteroscedasticity. Most forecasting models (ARIMA, ETS, TSLM) assume constant error variance. When that assumption is violated, prediction intervals become unreliable and parameter estimates can be distorted.
1.3 Example: CBOE Gold Volatility Index (GVZ)
The GVZ measures the implied volatility of gold prices — essentially how much the market expects gold to fluctuate. It is sourced from FRED (Series ID: GVZCLS) and aggregated from daily to monthly averages.
gold_vol_ts |>
autoplot(value) +
labs(
title = "CBOE Gold Volatility Index (GVZ) - Monthly Average",
subtitle = "2008 to present | Source: FRED (GVZCLS)",
y = "GVZ Index",
x = "Month"
  )

Notice that during high-volatility regimes (the 2008 financial crisis, the 2020 COVID shock), the series spikes sharply and the swings become much larger. This is textbook heteroscedasticity: variance grows with the level of the series.
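A rough numerical check of this (a sketch; the yearly grouping and the names level and spread are just illustrative) is to ask whether the spread of the series within each calendar year rises with its average level:

# Rough heteroscedasticity check (sketch): does the yearly spread track the yearly level?
gold_vol_ts |>
  as_tibble() |>
  mutate(year = lubridate::year(as.Date(month))) |>
  group_by(year) |>
  summarise(level = mean(value), spread = sd(value)) |>
  summarise(cor_level_spread = cor(level, spread, use = "complete.obs"))  # strongly positive => heteroscedastic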
# Find optimal lambda using Guerrero method
lambda <- gold_vol_ts |>
features(value, features = guerrero) |>
pull(lambda_guerrero)
paste("Optimal lambda:", round(lambda, 3))[1] "Optimal lambda: -0.799"
# Apply Box-Cox transformation
gold_bc <- gold_vol_ts |>
mutate(value_bc = box_cox(value, lambda))
# Plot before and after
p1 <- gold_vol_ts |>
autoplot(value) +
labs(title = "Before: Original GVZ", y = "GVZ Index")
p2 <- gold_bc |>
autoplot(value_bc) +
labs(title = paste0("After: Box-Cox Transformed (λ = ", round(lambda, 3), ")"),
y = "Transformed Values")
grid.arrange(p1, p2, ncol = 1)

After transformation, the large spikes are compressed and the variance is more consistent across the series, which is exactly what we need before modeling.
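Repeating the rough level-versus-spread check from Section 1.3 on the transformed series (again a sketch with illustrative names) gives a quick numerical confirmation:

# Same check on the Box-Cox scale: the level/spread correlation should move toward zero
# if the transformation stabilized the variance
gold_bc |>
  as_tibble() |>
  mutate(year = lubridate::year(as.Date(month))) |>
  group_by(year) |>
  summarise(level = mean(value_bc), spread = sd(value_bc)) |>
  summarise(cor_level_spread = cor(level, spread, use = "complete.obs"))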
1.4 Advantages of Box-Cox in Time Series
1. Stabilizes variance: The primary benefit. Models perform better when error variance is constant over time, and prediction intervals become more reliable.
2. Automatic lambda selection: The Guerrero method selects the optimal \(\lambda\) algorithmically, removing subjectivity.
3. Back-transformation is clean: Forecasts are back-transformed using inv_box_cox(), producing asymmetric prediction intervals that are wider on the upside — realistic for volatile series like GVZ.
4. Generalizes common transformations: Log (\(\lambda = 0\)) and square root (\(\lambda = 0.5\)) are special cases, making Box-Cox a flexible umbrella.
5. Improves residual behavior: Reducing heteroscedasticity often produces better-behaved residuals, improving the validity of statistical tests on model fit.
1.5 Disadvantages of Box-Cox in Time Series
1. Interpretation is lost: Transformed values have no direct real-world meaning. You must back-transform forecasts to interpret them in original units.
2. Does not address non-stationarity: Box-Cox stabilizes variance but does not remove trends or make the mean constant; differencing is still needed for that (see the quick check after this list).
3. Undefined for zeros or negatives: The transformation breaks down for non-positive values. GVZ is always positive so this is not an issue here, but it limits applicability elsewhere.
4. Lambda is estimated, not known: The optimal \(\lambda\) is estimated from training data and may not hold out-of-sample, especially after structural breaks.
5. May be unnecessary: If variance is already stable, applying Box-Cox adds complexity without benefit.
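On point 2, the unit-root features from feasts give a quick check (a sketch; unitroot_ndiffs() estimates how many first differences are needed to make a series stationary):

# Sketch: number of differences suggested before and after Box-Cox.
# If Box-Cox addressed stationarity, the transformed series would need none.
gold_vol_ts |>
  features(value, unitroot_ndiffs)

gold_bc |>
  features(value_bc, unitroot_ndiffs)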
2 Basic Forecasting Models and Accuracy Metrics
We split the GVZ series into training (first 80%) and test (last 20%) sets and apply three benchmark models.
n <- nrow(gold_vol_ts)
train_size <- floor(0.8 * n)
train <- gold_vol_ts |> slice(1:train_size)
test <- gold_vol_ts |> slice((train_size + 1):n)
paste("Training:", nrow(train), "months from",
      format(min(train$month)), "to", format(max(train$month)))

[1] "Training: 170 months from 2008 Jun to 2022 Jul"
paste("Test:", nrow(test), "months from",
      format(min(test$month)), "to", format(max(test$month)))

[1] "Test: 43 months from 2022 Aug to 2026 Feb"
# Fit three benchmark models
fit <- train |>
model(
Mean = MEAN(value),
Drift = RW(value ~ drift()),
SNaive = SNAIVE(value)
)
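For reference, these benchmarks produce the following point forecasts h steps ahead from a training series \(y_1, \dots, y_T\), with seasonal period \(m = 12\) for monthly data:

\[ \hat{y}_{T+h|T} = \begin{cases} \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t & \text{Mean} \\ y_T + h \, \dfrac{y_T - y_1}{T - 1} & \text{Drift} \\ y_{T+h-m(k+1)}, \quad k = \lfloor (h-1)/m \rfloor & \text{Seasonal naïve} \end{cases} \]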
# Forecast over test period
h <- nrow(test)
fc <- fit |> forecast(h = h)
# Plot forecasts vs actuals
fc |>
autoplot(train, level = NULL) +
autolayer(test, value, colour = "black", linewidth = 1) +
labs(
title = "GVZ Forecasts: Mean, Drift, Seasonal Naïve",
subtitle = "Black = actual test data",
y = "GVZ Index",
x = "Month",
colour = "Model"
  )

# Accuracy metrics on test set
acc <- fc |>
accuracy(test) |>
select(.model, RMSE, MAE, MAPE, ME, MPE) |>
arrange(RMSE)
knitr::kable(acc, digits = 3,
caption = "Test Set Accuracy Metrics (Lower absolute value = Better)")| .model | RMSE | MAE | MAPE | ME | MPE |
|---|---|---|---|---|---|
| Mean | 4.469 | 3.546 | 21.423 | -1.501 | -13.817 |
| Drift | 4.546 | 3.357 | 19.666 | -0.791 | -9.843 |
| SNaive | 4.865 | 3.455 | 19.336 | 0.010 | -4.836 |
2.1 Interpreting the Metrics
Magnitude-focused metrics measure the size of the errors regardless of direction:
- RMSE: Penalizes large errors heavily due to squaring. Sensitive to the kind of spikes GVZ exhibits.
- MAE: Average absolute deviation. More robust to outliers than RMSE.
- MAPE: Scale-free percentage error. Useful for comparing across series.
Bias-focused metrics reveal systematic over or under-prediction:
- ME: A positive ME means the model systematically under-predicts; negative means over-predicts.
- MPE: Percentage version of ME. The negative MPE values here indicate that the models over-predict on average in percentage terms; for a volatile, mean-reverting index like GVZ, forecasts tend to sit above the realized values during calm periods.
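Writing each test-set error as \(e_t = y_t - \hat{y}_t\), these metrics are defined as:

\[ \text{ME} = \text{mean}(e_t), \quad \text{RMSE} = \sqrt{\text{mean}(e_t^2)}, \quad \text{MAE} = \text{mean}(|e_t|), \quad \text{MPE} = \text{mean}\!\left(\frac{100\, e_t}{y_t}\right), \quad \text{MAPE} = \text{mean}\!\left(\frac{100\, |e_t|}{y_t}\right) \]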
For GVZ, none of the three models is expected to perform well over a long horizon because volatility indices are inherently mean-reverting and event-driven. However, the Mean model may perform reasonably since GVZ reverts toward a long-run average after spikes.
3 Does Transformation Improve Model Performance?
We now apply the Box-Cox transformation to the training set, refit the same models, back-transform the forecasts, and compare accuracy metrics against the original scale test set.
# Compute lambda on training set only (avoid data leakage)
lambda_train <- train |>
features(value, features = guerrero) |>
pull(lambda_guerrero)
paste("Lambda from training set:", round(lambda_train, 3))[1] "Lambda from training set: -0.784"
# Transform training set
train_bc <- train |>
mutate(value = box_cox(value, lambda_train))
# Fit same models on transformed data
fit_bc <- train_bc |>
model(
Mean_BC = MEAN(value),
Drift_BC = RW(value ~ drift()),
SNaive_BC = SNAIVE(value)
)
# Forecast and back-transform .mean to original scale
fc_bc <- fit_bc |>
forecast(h = h) |>
  mutate(.mean = inv_box_cox(.mean, lambda_train))

fc_bc |>
autoplot(train, level = NULL) +
autolayer(test, value, colour = "black", linewidth = 1) +
labs(
title = "After Box-Cox: Forecasts Back-Transformed to Original GVZ Scale",
subtitle = "Black = actual test data",
y = "GVZ Index",
x = "Month",
colour = "Model"
  )

# Original scale accuracy (no transformation)
acc_original <- fc |>
accuracy(test) |>
select(.model, RMSE, MAE, MAPE) |>
mutate(Transformation = "None")
# Box-Cox: manually compute accuracy against original test
# (rename test value column to avoid join conflict)
bc_errors <- fc_bc |>
as_tibble() |>
left_join(test |> as_tibble() |> rename(actual = value), by = "month") |>
mutate(error = actual - .mean) |>
group_by(.model) |>
summarise(
RMSE = sqrt(mean(error^2, na.rm = TRUE)),
MAE = mean(abs(error), na.rm = TRUE),
MAPE = mean(abs(error / actual), na.rm = TRUE) * 100
) |>
mutate(
Transformation = "Box-Cox",
.model = str_remove(.model, "_BC")
)
# Combined comparison
comparison <- bind_rows(acc_original, bc_errors) |>
arrange(.model, Transformation)
knitr::kable(comparison, digits = 3,
caption = "Accuracy Before and After Box-Cox Transformation")| .model | RMSE | MAE | MAPE | Transformation |
|---|---|---|---|---|
| Drift | 4.510 | 3.393 | 20.062 | Box-Cox |
| Drift | 4.546 | 3.357 | 19.666 | None |
| Mean | 4.214 | 2.829 | 15.807 | Box-Cox |
| Mean | 4.469 | 3.546 | 21.423 | None |
| SNaive | 4.865 | 3.455 | 19.336 | Box-Cox |
| SNaive | 4.865 | 3.455 | 19.336 | None |
3.1 Findings
Does transformation reduce errors? For GVZ, the results are mixed. The Mean model benefits most: averaging on the transformed scale and then back-transforming pulls the point forecast down toward the bulk of the data rather than toward the spike-inflated arithmetic mean, which noticeably reduces RMSE, MAE, and MAPE. The Seasonal Naïve forecasts are unchanged, since repeating last year's value and then undoing the transformation returns exactly the original value. The Drift model changes only slightly: its slope is re-estimated on the transformed scale and the back-transformed path bends a little, but the effect on the test-set errors is small.
What happens to interpretation? During modeling, all values are on the transformed scale and have no direct real-world meaning. Only after applying inv_box_cox() do the forecasts return to GVZ index units. Note that the manual step above back-transforms only the point forecast (.mean); when the full forecast distribution is back-transformed, the prediction intervals are asymmetric, wider above the point forecast than below, which honestly reflects that GVZ can spike sharply upward during crises but is bounded below by near-zero volatility.
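As a side note, fable can manage the transformation end to end when box_cox() is written inside the model formula: forecasts and their intervals then come back on the original scale automatically, with a bias-adjusted point forecast. A minimal sketch (the names fit_bc_auto, fc_bc_auto, and Mean_BC2 are illustrative):

# Sketch: let fable handle the Box-Cox transformation and back-transformation
fit_bc_auto <- train |>
  model(Mean_BC2 = MEAN(box_cox(value, lambda_train)))

fc_bc_auto <- fit_bc_auto |> forecast(h = h)

fc_bc_auto |> hilo(level = 95)   # intervals on the GVZ scale; the upper gap exceeds the lower gap
fc_bc_auto |> accuracy(test)     # accuracy computed directly on the original scale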
Key takeaway: For a volatile, event-driven series like GVZ, Box-Cox transformation is most valuable not for dramatically reducing point forecast errors, but for producing more honest prediction intervals that acknowledge upside risk. The stabilization of variance during model estimation also leads to more reliable standard errors and diagnostic tests.