Moudle 4 Discussion Box_Cox

Author

You-Jin,Tsai

Part I Explain the Box-Cox method

Load Libraries

library(fpp3)
library(fredr)
library(tidyverse)
library(patchwork)
library(knitr)
library(kableExtra)
library(writexl)
library(dplyr)
Sys.setlocale("LC_TIME", "en_US.UTF-8")

Import Data

I selected the E-Commerce Retail Sales (series ID: ECOMSA) as the dataset for my analysis.
The time span covers 26 years, from 1999 to 2025, consisting of a total of 104 quarter observations.

remove(list=ls())

fredr_set_key("11c23965cf4414274d293d0c36ec7507") 

myts <- fredr(series_id = "ECOMSA",
              observation_start = as.Date("1999-10-01"),
              observation_end   = as.Date("2025-07-01")
              ) |>
  transmute(Quarter = yearquarter(date), value) |>
  as_tsibble(index = Quarter)

Compare before and after Box-Cox

# Find optimal lambda automatically
lambda_optimal <- myts %>%
  features(value, features = guerrero) %>%
  pull(lambda_guerrero)

cat("Optimal λ for this data:", round(lambda_optimal, 3))
Optimal λ for this data: 0.124
# Apply the transformation
transformed <- myts %>%
  mutate(
    Original = value,
    Transformed = box_cox(value, lambda_optimal)
  )
# Plot original vs transformed
transformed %>%
  pivot_longer(cols = c(Original, Transformed), 
               names_to = "Type", values_to = "Value") %>%
  ggplot(aes(x = Quarter, y = Value)) +
  geom_line(color = "red") +
  facet_wrap(~ Type, scales = "free_y", ncol = 1) +
  labs(title = "Original vs Box-Cox Transformed Data",
       subtitle = paste("λ =", round(lambda_optimal, 3)),
       y = "Value") +
  theme_minimal()

The Box‑Cox transformation is commonly used in time‑series analysis, and its main purpose is to stabilize variance so the data better satisfy the stationarity assumptions required by many forecasting models. When a series shows the pattern of “larger values coming with larger fluctuations,” Box‑Cox tends to work especially well.

  • Advantages: One of the biggest strengths of Box‑Cox is its flexibility. It can adjust more precisely to the characteristics of the data compared to a simple log transform. In my E‑Commerce Retail Sales example, the estimated lambda is 0.124, which means the optimal transformation is somewhere between the raw scale and the log scale. This helps improve the structure of the residuals and makes the model behave more consistently.

  • Disadvantages: The downside is that the transformed data become harder to interpret, since the model is no longer operating on the original scale. Also, Box‑Cox cannot be applied when the data contain zeros, so that’s something to watch out for.

Overall, looking at the E‑Commerce Retail Sales series, the data clearly show increasing variability as the level rises. After applying the Box‑Cox transformation, the variance becomes noticeably more stable, which supports the idea that this transformation is appropriate for this dataset.

Part II Forecast without transformation

Split Data

I split the dataset using an 80%–20% ratio, resulting in 84 quarters in the training data and 20 quarters in the test data.
The training period spans from 1999 Q4 to 2020 Q3, while the test period covers 2020 Q4 through 2025 Q3.

# Create training and test sets
train <- myts[1:84, ]
test  <- myts[85:nrow(myts), ]

Forecasting

# Fit forecasting models 

models <- train %>% model(
                          Drift  = RW(value ~ drift()),
                          SNaive = SNAIVE(value),
                          TSLM   = TSLM(value ~ trend() + season())
                          )

fc <- models %>%
  forecast(h = nrow(test)
           ) 
# Plot all forecasts 
fc %>% autoplot(train) + autolayer(test,value)+
  labs( title = "Forecast without transformation:Drift,SNaive,TSLM",
        y = "E-Commerce Retail Sales (Millions of Dollars)",
        x = "Quarter" 
        ) + facet_wrap(~ .model,
                       ncol = 1
                       ) 

Accuracy Metrics

Using the original data without any transformation, the forecasts from all three models show relatively large ME, MAE, and RMSE values. This mainly comes from the scale of the data itself, since the series is measured in dollar amounts. When the forecast deviates from the actual values, the errors naturally become large. In addition, all three models have positive ME values, which indicates that they tend to under‑predict the actual observations. Based on the current results, the forecasting performance ranks as Drift > SNaive > TSLM.

# 
acc <- accuracy(object = fc,  
                data = test   
                ) %>%
  select(.model, ME, MPE, RMSE, MAE, MAPE
         )

# Print with knitr::kable
kable(acc, caption = "Forecast Accuracy Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Forecast Accuracy Metrics
.model ME MPE RMSE MAE MAPE
Drift 27147.09 9.789764 30296.22 27147.09 9.789764
SNaive 83911.70 31.041400 91126.38 83911.70 31.041400
TSLM 116575.49 43.650453 117856.14 116575.49 43.650453

Part III Forecast with Box-Cox

Box-Cox transformation on train data

Since the transformation is applied only to the training data, the lambda value is different from the one calculated in Part I. So I reran the procedure to obtain the correct lambda for the training set.

#Find optimal lambda automatically (only on train data)
lambda_fc <- train %>%
  features(value, guerrero) %>% 
  pull(lambda_guerrero)
# Build model with transformed data
fit_bc <- train %>%
  model(
    Drift  = RW(box_cox(value, lambda_fc) ~ drift()),
    SNaive = SNAIVE(box_cox(value, lambda_fc)),
    TSLM   = TSLM(box_cox(value, lambda_fc) ~ trend() + season())
  )

Forecasting

# Forecast
fc_bc <- fit_bc %>% forecast(h = nrow(test))

# Plot all forecasts 
fc_bc %>%
  autoplot(train) +
  autolayer(test, value, colour = "black") +
  labs(
    title = "Forecast with Box-Cox Transformation: Drift, SNaive, TSLM",
    y = "E-Commerce Retail Sales (Millions of Dollars)",
    x = "Quarter"
  ) +
  facet_wrap(~ .model, ncol = 1)

Accuracy Metrics

From the table, it is clear that the models perform better after applying the transformation. The improvement is especially noticeable for the TSLM model, which becomes the best‑performing model after transformation. Its ME and RMSE are both lower than the Drift model, which was the best model before transformation.

In addition to TSLM, the SNaive model also benefits slightly from the transformation. However, its errors are still relatively large compared to the actual values, suggesting that SNaive may not be a suitable model for this dataset.

The Drift model is the only one whose performance becomes worse after the transformation. This is likely due to the amplification effect introduced by the inverse Box‑Cox transformation, which causes the upper bound of the forecast to rise exponentially.

acc_bc <- accuracy(object = fc_bc,
                data = test   
                ) %>%
  select(.model, ME, MPE, RMSE, MAE, MAPE
         )


# Tag the two table and merge them.
acc_combined <- bind_rows(
  acc %>% mutate(Transformation = "Original"),
  acc_bc %>% mutate(Transformation = "Box-Cox")
) %>%
  relocate(Transformation, .before = .model) 

# Draw it into a uniform table.
kable(acc_combined, caption = "Forecast Accuracy Comparison (Original vs Box-Cox)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  collapse_rows(columns = 1, valign = "top")
Forecast Accuracy Comparison (Original vs Box-Cox)
Transformation .model ME MPE RMSE MAE MAPE
Original Drift 27147.086 9.789764 30296.22 27147.09 9.789764
SNaive 83911.700 31.041400 91126.38 83911.70 31.041400
TSLM 116575.488 43.650453 117856.14 116575.49 43.650453
Box-Cox Drift -68163.001 -24.016975 84218.30 68163.00 24.016975
SNaive 76593.167 28.399053 83675.02 76593.17 28.399053
TSLM 5928.047 3.129912 23928.07 20356.56 7.897915