Box-Cox + Basic Forecasting Methods

Author

John Guarini

library(fpp3)
library(forecast)

1. Box-Cox in Time Series

The Box-Cox transformation is a class of power transformations for positive data: If λ ≠ 0: y^(λ) = (y^λ − 1) / λ If λ = 0: y^(0) = log(y) In time series analysis, Box-Cox is sometimes applied to remove variance changes that increase with the level (common in production/energy series, where the seasonal variation increases with the level). Strengths: variance stabilization, removal of level-dependent seasonality, log transformation as special case. Weaknesses: requires positive data (or shifting), potential bias in back-transformation, changed interpretation due to model estimation in transformed units. In the context of the Australian quarterly electricity production data, the Box-Cox transformation is applied to remove variance and level-dependent seasonality. However, due to the growth in electricity production over time, the seasonal variation could potentially increase in size, making a transformation potentially useful prior to modeling.

1.1 Data in Practice: Australian Quarterly Electricity Production

We use the classic Australian production dataset and focus on Electricity (quarterly).

elec <- tsibbledata::aus_production |>
  select(Quarter, Electricity) |>
  as_tsibble(index = Quarter)

elec
# A tsibble: 218 x 2 [1Q]
   Quarter Electricity
     <qtr>       <dbl>
 1 1956 Q1        3923
 2 1956 Q2        4436
 3 1956 Q3        4806
 4 1956 Q4        4418
 5 1957 Q1        4339
 6 1957 Q2        4811
 7 1957 Q3        5259
 8 1957 Q4        4735
 9 1958 Q1        4608
10 1958 Q2        5196
# ℹ 208 more rows

1.1.1 Plot

elec |>
  autoplot(Electricity) +
  labs(title = "Austrailian Quarterly Electricity Production",
       y= "Electricity (GWh)",
       x= "Quarter")

1.2 Train & Test Split

The last 8 quarters are used for test data.

n_test <- 8   # 2 years (8 quarters)

elec_train <- elec |>
  slice(1:(n() - n_test))

elec_test <- elec |>
  slice((n() - n_test + 1):n())

range(elec_train$Quarter)
<yearquarter[2]>
[1] "1956 Q1" "2008 Q2"
# Year starts on: January
range(elec_test$Quarter)
<yearquarter[2]>
[1] "2008 Q3" "2010 Q2"
# Year starts on: January

2. Basic Forecasting Models

We will begin to fit the data for:

  • AVG (mean)

  • DRIFT

  • sNAIVE

  • Linear Regression with time dummies (trend and season)

fits_orig <- elec_train |>
  model(
    AVG = MEAN(Electricity),
    DRIFT = RW(Electricity ~ drift()),
    sNAIVE= SNAIVE(Electricity),
    REG   = TSLM(Electricity ~ trend() + season())
  )

fits_orig
# A mable: 1 x 4
      AVG         DRIFT   sNAIVE     REG
  <model>       <model>  <model> <model>
1  <MEAN> <RW w/ drift> <SNAIVE>  <TSLM>

2.1 Forecast on the Test Horizon

fc_orig <- fits_orig |>
  forecast(h = n_test)

fc_orig |>
  autoplot(elec) +
  labs(title = "Forecasts (Original Scale)",
       y = "Electricity",
       x = "Quarter")

2.2 Accuracy Metrics (ME, MPE, MAE, RMSE)

acc_orig <- fc_orig |>
  accuracy(elec_test) |>
  select(.model, ME, MPE, MAE, RMSE)

acc_orig
# A tibble: 4 × 5
  .model     ME    MPE    MAE   RMSE
  <chr>   <dbl>  <dbl>  <dbl>  <dbl>
1 AVG    30592. 51.9   30592. 30659.
2 DRIFT   1083.  1.71   1531.  2670.
3 REG      164.  0.176  1739.  2155.
4 sNAIVE   880.  1.43   1623.  1991.

In terms of RMSE and MAE, the regression model (REG) was the best model on the original scale, and the AVG method was the worst. The sNAIVE model was the second-best model, indicating a strong quarterly seasonality in electricity production.

3. Transformations and Re-fit Models

We will:

  1. Transform y

  2. Fit the same models

  3. Forecast

  4. Back-transform forecasts to original units

  5. Compute accuracy metrics on original units

     run_transformed_models <- function(train_tbl, test_tbl, transform_fn, inv_fn, label){
    
      train_t <- train_tbl |>
        mutate(y_t = transform_fn(Electricity))
    
      fits <- train_t |>
        model(
          AVG    = MEAN(y_t),
          DRIFT  = RW(y_t ~ drift()),
          sNAIVE = SNAIVE(y_t),
          REG    = TSLM(y_t ~ trend() + season())
        )
    
      # forecast() returns a fable -> convert to tibble BEFORE mutate()
      fc_back <- fits |>
        forecast(h = nrow(test_tbl)) |>
        as_tibble() |>
        mutate(.mean = inv_fn(.mean)) |>
        select(Quarter, .model, .mean) |>
        rename(Electricity_fc = .mean)
    
      actuals <- test_tbl |>
        as_tibble() |>
        select(Quarter, Electricity) |>
        rename(Electricity_actual = Electricity)
    
      joined <- fc_back |>
        left_join(actuals, by = "Quarter") |>
        mutate(
          error = Electricity_fc - Electricity_actual,
          pe    = 100 * error / Electricity_actual,
          ae    = abs(error),
          se    = error^2
        )
    
      acc <- joined |>
        group_by(.model) |>
        summarise(
          ME   = mean(error, na.rm = TRUE),
          MPE  = mean(pe, na.rm = TRUE),
          MAE  = mean(ae, na.rm = TRUE),
          RMSE = sqrt(mean(se, na.rm = TRUE)),
          .groups = "drop"
        ) |>
        mutate(Transform = label) |>
        select(Transform, .model, ME, MPE, MAE, RMSE)
    
      list(acc = acc, forecasts = joined)
    }

3.1 Box-Cox Transformation

Box–Cox requires positive data. Electricity is positive here.

lambda <- BoxCox.lambda(elec_train$Electricity)
lambda
[1] 0.4632254
bc_results <- run_transformed_models(
  train_tbl = elec_train,
  test_tbl  = elec_test,
  transform_fn = function(x) BoxCox(x, lambda),
  inv_fn       = function(x) InvBoxCox(x, lambda),
  label = paste0("BoxCox(lambda=", round(lambda, 3), ")")
)

acc_bc <- bc_results$acc
acc_bc
# A tibble: 4 × 6
  Transform            .model      ME     MPE    MAE   RMSE
  <chr>                <chr>    <dbl>   <dbl>  <dbl>  <dbl>
1 BoxCox(lambda=0.463) AVG    -33793. -57.3   33793. 33854.
2 BoxCox(lambda=0.463) DRIFT    -329.  -0.414  1920.  2771.
3 BoxCox(lambda=0.463) REG      7799.  13.4    7799.  8211.
4 BoxCox(lambda=0.463) sNAIVE   -880.  -1.43   1623.  1991.

3.2 Log Transformation

log_results <- run_transformed_models(
  train_tbl = elec_train,
  test_tbl  = elec_test,
  transform_fn = function(x) log(x),
  inv_fn       = function(x) exp(x),
  label = "log"
)

acc_log <- log_results$acc
acc_log
# A tibble: 4 × 6
  Transform .model      ME    MPE    MAE   RMSE
  <chr>     <chr>    <dbl>  <dbl>  <dbl>  <dbl>
1 log       AVG    -36814. -62.5  36814. 36870.
2 log       DRIFT    1160.   2.14  3059.  3618.
3 log       REG     26833.  45.7  26833. 27222.
4 log       sNAIVE   -880.  -1.43  1623.  1991.

3.3 Square-root Transformation

sqrt_results <- run_transformed_models(
  train_tbl = elec_train,
  test_tbl  = elec_test,
  transform_fn = function(x) sqrt(x),
  inv_fn       = function(x) (x^2),
  label = "sqrt"
)

acc_sqrt <- sqrt_results$acc
acc_sqrt
# A tibble: 4 × 6
  Transform .model      ME     MPE    MAE   RMSE
  <chr>     <chr>    <dbl>   <dbl>  <dbl>  <dbl>
1 sqrt      AVG    -33561. -57.0   33561. 33622.
2 sqrt      DRIFT    -403.  -0.541  1870.  2750.
3 sqrt      REG      6979.  12.0    6979.  7417.
4 sqrt      sNAIVE   -880.  -1.43   1623.  1991.

Comparing Accuracies

acc_all <- bind_rows(
  acc_orig %>% mutate(Transform = "none") %>% select(Transform, .model, ME, MPE, MAE, RMSE),
  acc_bc,
  acc_log,
  acc_sqrt
) %>%
  arrange(.model, Transform)

acc_all
# A tibble: 16 × 6
   Transform            .model      ME     MPE    MAE   RMSE
   <chr>                <chr>    <dbl>   <dbl>  <dbl>  <dbl>
 1 BoxCox(lambda=0.463) AVG    -33793. -57.3   33793. 33854.
 2 log                  AVG    -36814. -62.5   36814. 36870.
 3 none                 AVG     30592.  51.9   30592. 30659.
 4 sqrt                 AVG    -33561. -57.0   33561. 33622.
 5 BoxCox(lambda=0.463) DRIFT    -329.  -0.414  1920.  2771.
 6 log                  DRIFT    1160.   2.14   3059.  3618.
 7 none                 DRIFT    1083.   1.71   1531.  2670.
 8 sqrt                 DRIFT    -403.  -0.541  1870.  2750.
 9 BoxCox(lambda=0.463) REG      7799.  13.4    7799.  8211.
10 log                  REG     26833.  45.7   26833. 27222.
11 none                 REG       164.   0.176  1739.  2155.
12 sqrt                 REG      6979.  12.0    6979.  7417.
13 BoxCox(lambda=0.463) sNAIVE   -880.  -1.43   1623.  1991.
14 log                  sNAIVE   -880.  -1.43   1623.  1991.
15 none                 sNAIVE    880.   1.43   1623.  1991.
16 sqrt                 sNAIVE   -880.  -1.43   1623.  1991.

Best Model

acc_all %>%
  group_by(.model) %>%
  slice_min(order_by = RMSE, n = 1, with_ties = FALSE) %>%
  ungroup()
# A tibble: 4 × 6
  Transform .model     ME    MPE    MAE   RMSE
  <chr>     <chr>   <dbl>  <dbl>  <dbl>  <dbl>
1 none      AVG    30592. 51.9   30592. 30659.
2 none      DRIFT   1083.  1.71   1531.  2670.
3 none      REG      164.  0.176  1739.  2155.
4 log       sNAIVE  -880. -1.43   1623.  1991.

The Box-Cox, log, and square-root transformations did not result in improved forecast accuracy compared to the original scale for most models. In fact, the regression model was significantly worse on the transformed scale. This indicates that the variance in the quarterly electricity series was already relatively stable, and further transformation was unnecessary.
On the transformed scale, the coefficients are no longer in the original units. Instead, they are in terms of the transformed units (e.g., log units approximate percentage changes). Additionally, forecasts need to be converted back into the original units, which can be biased.

Conclusion

In this analysis, basic benchmark forecasting techniques were used on the Australian quarterly electricity production data. Although transformations such as Box-Cox, log, and square root are common and useful in removing variance, they did not help in the forecasting results. The regression model including trend and seasonal components worked best on the original data. This shows that transformations should be used depending on the nature of the data.