Box-Cox Transformations in Time Series Analysis

Author

AS

Published

September 28, 2025

Setup

library(fpp3)
library(dplyr)
library(ggplot2)

# Load example data
data("aus_retail")
food_data <- aus_retail %>%
  filter(Industry == "Food retailing") %>%
  summarise(Turnover = sum(Turnover))

What is Box-Cox?

Box-Cox transformations stabilize variance in time series data. Many series show increasing variance as the level increases (the “fan-out” pattern).

The Problem: Non-Constant Variance

# Show the fan-out pattern
food_data %>%
  autoplot(Turnover) +
  labs(title = "Australian Food Retail: Notice Increasing Variance",
       subtitle = "Early years: small fluctuations. Recent years: large fluctuations",
       y = "Turnover ($Million AUD)") +
  theme_minimal()

The Box-Cox Formula

\[w_t = \begin{cases} \frac{y_t^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y_t) & \text{if } \lambda = 0 \end{cases}\]

Key Parameters:

  • \(y_t\) = original data
  • \(w_t\) = transformed data
  • \(\lambda\) = transformation strength
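
The log case in the formula above is not an arbitrary special case: it is the limit of the power form as \(\lambda \to 0\). A short derivation, using only the series expansion of the exponential function:

\[\frac{y_t^\lambda - 1}{\lambda} = \frac{e^{\lambda \log y_t} - 1}{\lambda} = \log(y_t) + \frac{\lambda \, (\log y_t)^2}{2} + \cdots \;\longrightarrow\; \log(y_t) \quad \text{as } \lambda \to 0\]

This is why values of λ close to zero behave almost like a log transformation.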

When to Use Box-Cox

Use when:

  • Variance increases with the level of the series
  • You see a “fan-out” pattern in plots
  • Residuals from models show heteroscedasticity
  • Forecasting models perform poorly due to changing variance

Don’t use when:

  • Variance is already stable
  • Data contains zeros or negative values (without adjustment)
  • You need to maintain original scale interpretation

Formula Intuition

Key Mathematical Insight

Important: When λ = 1, Box-Cox gives us: \(\frac{y^1 - 1}{1} = y - 1\)

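This is not the identity, but it is only a linear shift by -1: the shape of the series, and therefore its variance, is unchanged.

In practice, λ = 1 is therefore read as “no transformation needed”. A model fitted to \(y - 1\) behaves the same as one fitted to \(y\), apart from the constant offset.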
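
A quick numerical check of the λ = 1 case, as a minimal sketch. The helper `box_cox_manual()` is a hypothetical name introduced here for illustration; it writes out the Box-Cox formula in base R rather than calling `box_cox()`.

```r
# Hypothetical helper: the Box-Cox formula written out in base R
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(2, 5, 10, 50)
w <- box_cox_manual(y, lambda = 1)

w                            # every value is shifted down by exactly 1
all.equal(diff(w), diff(y))  # gaps between observations are unchanged
all.equal(sd(w), sd(y))      # so the spread is unchanged too
```

Because only the location moves, λ = 1 does nothing to stabilize variance.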

Common λ Values

library(knitr)

tibble(
  Lambda = c(2, 1, 0.5, 0, -0.5, -1),
  Formula = c("(y² - 1)/2", "(y - 1)/1", "(√y - 1)/0.5", "log(y)", "(1/√y - 1)/(-0.5)", "(1/y - 1)/(-1)"),
  `Simplified Form` = c("(y² - 1)/2", "y - 1", "2(√y - 1)", "log(y)", "-2(1/√y - 1)", "1 - 1/y"),
  Effect = c("Expansion", "Linear shift", "Mild compression", "Moderate compression", 
             "Strong compression", "Very strong compression"),
  `Use When` = c("Variance decreases with level", "Variance already stable", "Slight fan-out", "Clear fan-out", 
                 "Strong fan-out", "Extreme fan-out")
) %>%
  kable(caption = "Box-Cox Transformations: The Complete Picture")

Box-Cox Transformations: The Complete Picture

| Lambda | Formula            | Simplified Form | Effect                  | Use When                      |
|-------:|:-------------------|:----------------|:------------------------|:------------------------------|
|    2.0 | (y² - 1)/2         | (y² - 1)/2      | Expansion               | Variance decreases with level |
|    1.0 | (y - 1)/1          | y - 1           | Linear shift            | Variance already stable       |
|    0.5 | (√y - 1)/0.5       | 2(√y - 1)       | Mild compression        | Slight fan-out                |
|    0.0 | log(y)             | log(y)          | Moderate compression    | Clear fan-out                 |
|   -0.5 | (1/√y - 1)/(-0.5)  | -2(1/√y - 1)    | Strong compression      | Strong fan-out                |
|   -1.0 | (1/y - 1)/(-1)     | 1 - 1/y         | Very strong compression | Extreme fan-out               |

Visual Intuition

# Show how different lambdas "bend" the data
demo_data <- tibble(x = seq(1, 20, by = 1)) %>%
  mutate(
    `λ = 1 (y - 1)` = x - 1,
    `λ = 0.5 ((√y - 1)/0.5)` = 2 * (sqrt(x) - 1),
    `λ = 0 (Log)` = log(x)
  ) %>%
  pivot_longer(cols = -x, names_to = "Transformation", values_to = "y")

demo_data %>%
  ggplot(aes(x = x, y = y)) +
  geom_line(linewidth = 1, color = "steelblue") +
  facet_wrap(~ Transformation, scales = "free_y") +
  labs(title = "Box-Cox Formula Reality Check",
       subtitle = "λ = 1 gives linear shift (y - 1), not original scale",
       x = "Original Value", y = "Box-Cox Transformed Value") +
  theme_minimal()

Key Insight: Transformations with λ < 1 compress large values more than small ones, which stabilizes variance when it increases with level.
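
The compression can also be seen numerically. The sketch below compares the size of a one-unit step at the low end of the scale (1 to 2) with the same step at the high end (19 to 20) after transformation. `box_cox_manual()`, `gap_low()`, and `gap_high()` are hypothetical helpers written for this illustration.

```r
# Hypothetical helper: the Box-Cox formula written out in base R
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Size of a one-unit step after transformation, at low and high levels
gap_low  <- function(lambda) diff(box_cox_manual(c(1, 2), lambda))
gap_high <- function(lambda) diff(box_cox_manual(c(19, 20), lambda))

lambdas <- c(1, 0.5, 0)
data.frame(
  lambda       = lambdas,
  gap_1_to_2   = sapply(lambdas, gap_low),
  gap_19_to_20 = sapply(lambdas, gap_high)
)
```

With λ = 1 both gaps stay equal to 1. As λ decreases toward 0, the gap at the high end shrinks much faster than the gap at the low end, which is exactly what pulls a fan-out pattern back together.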

https://onlinestatbook.com/mobile/transformations/box-cox.html

Finding Optimal λ

Guerrero Method (Automatic)

# Find optimal lambda automatically
lambda_optimal <- food_data %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

cat("Optimal λ for this data:", round(lambda_optimal, 3))
Optimal λ for this data: 0.09
# Apply the transformation
food_transformed <- food_data %>%
  mutate(
    Original = Turnover,
    Transformed = box_cox(Turnover, lambda_optimal)
  )
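
For intuition about what the `guerrero` feature is optimizing, here is a simplified base R sketch of the idea: split the series into blocks of one seasonal period, then pick the λ for which the ratio sd / mean^(1 - λ) is most nearly constant across blocks. This is an illustration under stated assumptions, not the feasts implementation: the function name `guerrero_sketch()`, the search bounds, and the simulated series are all introduced here.

```r
# Simplified sketch of the Guerrero criterion (not the feasts implementation)
guerrero_sketch <- function(y, period = 12, lower = -1, upper = 2) {
  n_blocks <- floor(length(y) / period)
  y <- tail(y, n_blocks * period)      # keep the most recent complete blocks
  blocks <- matrix(y, nrow = period)   # one column per block (e.g. per year)
  block_mean <- colMeans(blocks)
  block_sd <- apply(blocks, 2, sd)

  # Coefficient of variation of sd / mean^(1 - lambda) across blocks
  cv <- function(lambda) {
    ratio <- block_sd / block_mean^(1 - lambda)
    sd(ratio) / mean(ratio)
  }
  optimize(cv, interval = c(lower, upper))$minimum
}

# Simulated monthly series whose noise is proportional to its level
set.seed(42)
level <- exp(seq(1, 5, length.out = 240))
y_sim <- level * exp(rnorm(240, sd = 0.1))

lambda_hat <- guerrero_sketch(y_sim)
lambda_hat   # should land near 0, i.e. close to a log transformation
```

A series with purely multiplicative noise is the textbook case for a log transformation, so an estimate near zero is the expected outcome for this simulation.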

Compare Before and After

# Plot original vs transformed
food_transformed %>%
  pivot_longer(cols = c(Original, Transformed), 
               names_to = "Type", values_to = "Value") %>%
  ggplot(aes(x = Month, y = Value)) +
  geom_line(color = "steelblue") +
  facet_wrap(~ Type, scales = "free_y", ncol = 1) +
  labs(title = "Original vs Box-Cox Transformed Data",
       subtitle = paste("λ =", round(lambda_optimal, 3)),
       y = "Value") +
  theme_minimal()

Handling Zero Values

x <- c(0, 1, 10, 100)

log(x)     # gives -Inf at 0
[1]     -Inf 0.000000 2.302585 4.605170
log1p(x)   # safe: log(1+x)
[1] 0.0000000 0.6931472 2.3978953 4.6151205

Problem: Box-Cox requires positive values. Common solutions:

Method 1: Add a Small Constant

# If data has zeros, add small constant
data_with_zeros <- tibble(
  Month = seq(as.Date("2020-01-01"), by = "month", length.out = 12),
  Value = c(0, 5, 10, 0, 15, 20, 25, 0, 30, 35, 40, 45)
)

# Add small constant before transformation
adjusted_data <- data_with_zeros %>%
  mutate(
    Original = Value,
    Adjusted = Value + 0.1,  # Add small constant
    Transformed = box_cox(Adjusted, 0.5)
  )

print("Original data with zeros:")
[1] "Original data with zeros:"
print(data_with_zeros$Value[1:4])
[1]  0  5 10  0
print("After adding 0.1:")
[1] "After adding 0.1:"
print(adjusted_data$Adjusted[1:4])
[1]  0.1  5.1 10.1  0.1
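
One caveat with this method is that the constant is arbitrary, and the transformed zeros depend heavily on it. The sketch below uses the log case (λ = 0), where the effect is easiest to see; the values of `y` and `constants` are made up for illustration.

```r
y <- c(0, 5, 10)
constants <- c(0.001, 0.1, 1)

# One column per constant, one row per element of y
shifted_logs <- sapply(constants, function(k) log(y + k))
colnames(shifted_logs) <- paste0("k = ", constants)

round(shifted_logs, 2)
```

The first row (the original zeros) changes substantially as the constant changes, while the rows for 5 and 10 barely move. It is worth checking that conclusions do not hinge on the constant chosen.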

Method 2: Use Modified Box-Cox

# Some implementations allow for shifted Box-Cox
# Formula: ((y + shift)^λ - 1) / λ

# Example with manual implementation
modified_boxcox <- function(y, lambda, shift = 1) {
  if (lambda == 0) {
    log(y + shift)
  } else {
    ((y + shift)^lambda - 1) / lambda
  }
}

# Apply with shift
adjusted_data <- adjusted_data %>%
  mutate(Modified_BoxCox = modified_boxcox(Original, 0.5, shift = 1))

print("Modified Box-Cox with shift:")
[1] "Modified Box-Cox with shift:"
print(adjusted_data$Modified_BoxCox[1:4])
[1] 0.000000 2.898979 4.633250 0.000000

Practical Implementation

In Forecasting Models

# Compare models with and without Box-Cox
models <- food_data %>%
  model(
    Original = ETS(Turnover),
    BoxCox = ETS(box_cox(Turnover, lambda_optimal)),
    Auto = ETS(Turnover)  # Same specification as Original: ETS() selects error/trend/season components automatically, but never a transformation
  )

# Compare performance
model_performance <- models %>%
  glance() %>%
  select(.model, AICc, sigma2) %>%
  arrange(AICc)

model_performance %>%
  kable(digits = 2, caption = "Model Comparison")
Model Comparison

| .model   |    AICc | sigma2 |
|:---------|--------:|-------:|
| BoxCox   |  -94.40 |      0 |
| Original | 6609.73 |      0 |
| Auto     | 6609.73 |      0 |

Interpreting the Results

Key Insights from the comparison:

  1. The AICc values are not directly comparable: -94.40 vs ~6610
    • AICc is computed from the likelihood of the modeled response, and the BoxCox model is fitted to the transformed series, which lives on a much smaller scale
    • A lower AICc indicates a better fit only among models fitted to the same response, so the gap here reflects the change of scale rather than better forecasts
  2. Original and Auto are identical: Both have AICc = 6609.73
    • Both are the same specification, ETS(Turnover)
    • ETS() selects the error, trend, and seasonal components automatically; it does not choose a transformation
  3. What Box-Cox can still offer:
    • Addresses the heteroscedasticity (changing variance) problem
    • Creates more stable residuals that ETS can model effectively
    • Whether this improves forecasts has to be judged on the original scale, for example with a hold-out test set or cross-validation

The lesson: ETS() will not apply a variance-stabilizing transformation on its own. If the series fans out, choose λ yourself (the Guerrero method targets variance stability specifically) and evaluate the result by forecast accuracy on the original scale, not by comparing AICc across different transformations.
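
A check that does not rely on AICc is to hold out the end of the series and score the forecasts on the original scale. The sketch below is one way to do this with fpp3; the split year (2014), the four-year horizon, and the object names `train`, `lambda_train`, `fits`, and `acc` are choices made for this illustration.

```r
library(fpp3)

food_data <- aus_retail %>%
  filter(Industry == "Food retailing") %>%
  summarise(Turnover = sum(Turnover))

# Hold out the last four years (2015-2018) as a test set
train <- food_data %>%
  filter(year(Month) <= 2014)

# Estimate lambda on the training data only, to avoid leaking test information
lambda_train <- train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

fits <- train %>%
  model(
    Original = ETS(Turnover),
    BoxCox   = ETS(box_cox(Turnover, lambda_train))
  )

# Forecasts are back-transformed, so both models are scored in the same units
acc <- fits %>%
  forecast(h = "4 years") %>%
  accuracy(food_data) %>%
  select(.model, RMSE, MAE, MAPE)

acc
```

Because the accuracy measures are computed on back-transformed forecasts, the two rows can be compared directly, unlike the AICc values.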

Back-Transformation

# Important: fable (attached by fpp3) back-transforms forecasts automatically
# when the transformation is written inside the model formula
forecasts <- models %>%
  forecast(h = 12)

# All forecasts are on original scale
forecasts %>%
  autoplot(food_data %>% filter(year(Month) >= 2015)) +
  labs(title = "Forecasts (Automatically Back-Transformed)",
       y = "Turnover ($Million AUD)") +
  theme_minimal()
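
For reference, the back-transformation inverts the Box-Cox formula: \(y_t = (\lambda w_t + 1)^{1/\lambda}\) for \(\lambda \neq 0\) and \(y_t = \exp(w_t)\) for \(\lambda = 0\). The base R sketch below checks the round trip. `box_cox_manual()` and `inv_box_cox_manual()` are hypothetical helpers written for this illustration; fabletools also provides `inv_box_cox()` for the same purpose.

```r
# Hypothetical helpers: forward and inverse Box-Cox in base R
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

inv_box_cox_manual <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 10, 100, 1000)
lambda <- 0.09   # the value estimated earlier for the food retail series

w <- box_cox_manual(y, lambda)
y_back <- inv_box_cox_manual(w, lambda)

all.equal(y_back, y)   # the round trip recovers the original values
```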

Quick Decision Guide

When to use Box-Cox:

  1. Plot your data - look for fan-out pattern
  2. Check residuals - heteroscedasticity indicates need for transformation
  3. Try simple values first - λ = 0 (log) or λ = 0.5 (square root) are easy to interpret and often adequate
  4. Use Guerrero method - for automatic optimal λ selection
  5. Validate results - ensure variance is more stable after transformation
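
Step 5 can be made concrete by comparing the spread within each year against the level of that year, before and after transforming. The sketch below uses a simulated monthly series with multiplicative noise and a log transformation; the series, the block length of 12, and the helper `block_stats()` are assumptions made for this illustration.

```r
# Simulated monthly series whose noise grows with its level
set.seed(123)
n_years <- 20
level <- exp(seq(1, 5, length.out = 12 * n_years))
y <- level * exp(rnorm(length(level), sd = 0.1))

# Mean and standard deviation within each block of `period` observations
block_stats <- function(x, period = 12) {
  blocks <- matrix(x, nrow = period)
  data.frame(mean = colMeans(blocks), sd = apply(blocks, 2, sd))
}

before <- block_stats(y)
after  <- block_stats(log(y))

# Spread of the yearly standard deviations: a large ratio signals fan-out
sd_ratio_before <- max(before$sd) / min(before$sd)
sd_ratio_after  <- max(after$sd) / min(after$sd)
c(before = sd_ratio_before, after = sd_ratio_after)

# Correlation between yearly level and yearly spread
c(before = cor(before$mean, before$sd),
  after  = cor(after$mean, after$sd))
```

If the transformation worked, the yearly standard deviations should be of similar size across the series and should no longer track the yearly means.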

R Code Pattern:

# Find optimal lambda
lambda <- your_data %>% 
  features(your_variable, features = guerrero) %>% 
  pull(lambda_guerrero)

# Apply in models
model <- your_data %>% 
  model(ETS(box_cox(your_variable, lambda)))

# Forecasts automatically back-transformed
forecast <- model %>% forecast(h = 12)

Summary

  • Purpose: Stabilize variance in time series
  • When: Data shows increasing variance with level
  • How: Apply power transformation with parameter λ
  • Zero handling: Add small constant or use shifted Box-Cox
  • In practice: Use guerrero feature for automatic λ selection
  • Models: Apply with box_cox() function in model formulas

Appendix

https://onlinestatbook.com/mobile/transformations/box-cox.html

http://www.econ.illinois.edu/~econ508/Papers/boxcox64.pdf

Kutner, M., Nachtsheim, C., Neter, J., and Li, W. (2004). Applied Linear Statistical Models, McGraw-Hill/Irwin, Homewood, IL. https://users.stat.ufl.edu/~winner/sta4211/ALSM_5Ed_Kutner.pdf