Module 4 Discussion

Author

Ethan Wright

Box-Cox Discussion

What is the Box-Cox Transformation?

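The Box-Cox transformation is applied to a time series to stabilize its variance, so that the size of the fluctuations no longer depends on the level of the series. This is one step toward stationarity and makes the data more appropriate for modelling. The formula is as follows: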

\[ y(\lambda) = \frac{y^{\lambda} - 1}{\lambda} ; \quad \lambda \neq 0 \]

\[ y(\lambda) = \ln(y) ; \quad \lambda = 0 \]

\(\lambda\) is commonly estimated by maximum likelihood; the code in this post uses the Guerrero method instead. From the formulas, when \(\lambda = 1\) the series is only shifted down by 1, so its shape is unchanged, and when \(\lambda = 0\) the transformation is the natural log.
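The two special cases can be checked numerically. The following is a minimal base-R sketch, not the fpp3 implementation; the helper name `box_cox_manual` and the input values are made up for illustration.

```r
# Minimal base-R sketch of the Box-Cox transformation (hypothetical helper name)
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) {
    log(y)                      # natural log case
  } else {
    (y^lambda - 1) / lambda     # power case
  }
}

y <- c(1, 10, 100, 1000)        # illustrative positive values, not real data

shifted  <- box_cox_manual(y, 1)      # lambda = 1: y - 1, shape unchanged
logged   <- box_cox_manual(y, 0)      # lambda = 0: natural log
near_log <- box_cox_manual(y, 1e-6)   # tiny lambda: very close to log(y)
```

Comparing `near_log` with `logged` shows why the log is used at \(\lambda = 0\): the power case approaches it as \(\lambda\) shrinks.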

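In the context of the Australian electricity production time series, there is an upward trend and multiplicative seasonality. The Box-Cox method compresses larger values more than smaller ones, which makes the seasonal swings roughly constant over time. We use this because models like ARIMA, ETS, and TSLM generally work better when the variance is stable. The transformation does not remove the trend, so it does not make the series stationary on its own.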
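To see the compression effect in isolation, here is a small base-R sketch on a made-up multiplicative series (the levels and seasonal factors are assumptions, not the electricity data). It uses the log, which is the \(\lambda = 0\) case.

```r
# Hypothetical toy series: level doubles each year, seasonal factors stay fixed
level    <- rep(c(100, 200, 400), each = 4)
seasonal <- rep(c(0.8, 1.0, 1.3, 0.9), times = 3)
y    <- level * seasonal
year <- rep(1:3, each = 4)

# Seasonal range (max - min) within each year, on the raw and the log scale
raw_range <- tapply(y, year, function(v) max(v) - min(v))
log_range <- tapply(log(y), year, function(v) max(v) - min(v))

raw_range   # grows with the level of the series
log_range   # the same in every year
```

On the raw scale the seasonal swing doubles along with the level, while on the log scale it is constant, which is the behaviour a seasonal model with fixed seasonal effects expects.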

Basic Method using aus_production from fpp3

library(fpp3)
electric <- aus_production |>
  select(Quarter, Electricity)

autoplot(electric, Electricity)+
  labs(title = "Australian Electricity Production - Quarterly",
       y = "GWh")

We see a rising trend and multiplicative seasonality.

split_index <- floor(0.8 * nrow(electric))

train <- electric[1:split_index, ]
test  <- electric[(split_index + 1):nrow(electric), ]

snaive_model <- train |>
  model(SNAIVE(Electricity))

fcst <- snaive_model |>
  forecast(h = nrow(test))

electric |> 
  autoplot(Electricity)+
  autolayer(fcst, level = NULL)+
  labs(title = "Forecasts vs Actuals - SNAIVE",
       y="GWh")

accuracy(fcst, electric)|>
  select(.model, ME, RMSE, MAE, MPE)
# A tibble: 1 × 5
  .model                 ME  RMSE   MAE   MPE
  <chr>               <dbl> <dbl> <dbl> <dbl>
1 SNAIVE(Electricity) 6763. 7554. 6763.  12.1
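In the table above, ME equals MAE, which happens when every forecast error has the same sign. The sketch below computes the usual definitions of these metrics in base R on made-up numbers (the `actual` and `forecast` vectors are hypothetical), with the error taken as actual minus forecast.

```r
# Hypothetical actuals and forecasts, chosen so every forecast is too low
actual   <- c(100, 200, 400)
forecast <- c( 90, 180, 380)

e  <- actual - forecast          # forecast errors: 10, 20, 20
pe <- 100 * e / actual           # percentage errors: 10, 10, 5

metrics <- c(
  ME   = mean(e),                # mean error (signed)
  RMSE = sqrt(mean(e^2)),        # root mean squared error
  MAE  = mean(abs(e)),           # mean absolute error
  MPE  = mean(pe),               # mean percentage error (signed)
  MAPE = mean(abs(pe))           # mean absolute percentage error
)
round(metrics, 2)
```

Because all errors are positive here, ME matches MAE and MPE matches MAPE, the same pattern as the seasonal naive forecasts, which sit below the actual values on a rising series.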

Box-Cox Comparison

lambda <- train|>
  features(Electricity, features = guerrero)
lambda
# A tibble: 1 × 1
  lambda_guerrero
            <dbl>
1           0.387

The Guerrero method recommends \(\lambda \approx 0.387\).
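For intuition about where such a value comes from, the sketch below follows the general idea behind Guerrero's method: split the series into seasonal blocks and pick the \(\lambda\) that makes the ratio sd / mean^(1 - \(\lambda\)) as constant as possible across blocks. This is a simplified base-R illustration on a made-up series, and the `guerrero` feature in feasts may differ in its details; `guerrero_cv` is a hypothetical helper name.

```r
# Toy quarterly series (hypothetical): level doubles each year, fixed seasonal factors
level    <- rep(c(100, 200, 400, 800), each = 4)
seasonal <- rep(c(0.8, 1.0, 1.3, 0.9), times = 4)
y    <- level * seasonal
year <- rep(1:4, each = 4)

# Guerrero-style criterion: coefficient of variation of sd_i / mean_i^(1 - lambda)
guerrero_cv <- function(lambda, y, group) {
  m <- tapply(y, group, mean)    # yearly means
  s <- tapply(y, group, sd)      # yearly standard deviations
  ratio <- s / m^(1 - lambda)
  sd(ratio) / mean(ratio)
}

# Search a plausible range of lambda values for the minimum of the criterion
lambda_hat <- optimize(guerrero_cv, interval = c(-1, 2), y = y, group = year)$minimum
lambda_hat
```

In this toy series the spread is exactly proportional to the level, so the search lands very close to 0 (a log transformation). The electricity data are less extreme, which is consistent with a recommended value between 0 and 1.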

snaive_boxcox <- train |>
  model(SNAIVE(box_cox(Electricity, lambda)))
fcst_boxcox <- snaive_boxcox |>
  forecast(h = nrow(test))

electric |> 
  autoplot(Electricity)+
  autolayer(fcst_boxcox, level = NULL)+
  labs(title = "Forecasts vs Actuals - Box-Cox",
       y="GWh")

autoplot(electric, box_cox(Electricity, lambda)) +
  labs(title = paste0("Box-Cox Transformed (λ = ", round(lambda, 2), ")"),
       y = "Transformed GWh")

bind_rows(
  accuracy(fcst, electric),
  accuracy(fcst_boxcox, electric)
) |>
  select(.model, ME, RMSE, MAE, MPE, MAPE)
# A tibble: 2 × 6
  .model                                  ME  RMSE   MAE   MPE  MAPE
  <chr>                                <dbl> <dbl> <dbl> <dbl> <dbl>
1 SNAIVE(Electricity)                  6763. 7554. 6763.  12.1  12.1
2 SNAIVE(box_cox(Electricity, lambda)) 6617. 7392. 6617.  11.8  11.8

The seasonal naive model with the Box-Cox transformation was slightly better on every metric shown (for example, RMSE 7392 vs. 7554 and MAPE 11.8 vs. 12.1). One thing to keep in mind is that during modelling, GWh is no longer the correct label because the values are on the transformed scale. The transformation is reversed during forecasting, so the forecasts and the accuracy metrics are back on the original scale, GWh.
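The round trip between the two scales can be sketched in base R. The GWh values below are hypothetical, and this plain inverse ignores any bias adjustment a forecasting package may apply when back-transforming.

```r
lambda <- 0.387                       # value reported by the Guerrero feature above
gwh    <- c(4000, 30000, 60000)       # hypothetical GWh values for illustration

# Forward transform: the scale the model works on, no longer in GWh
transformed <- (gwh^lambda - 1) / lambda

# Inverse transform: solve the Box-Cox formula for y to return to GWh
restored <- (lambda * transformed + 1)^(1 / lambda)
```

Here `restored` matches `gwh` up to floating-point error, which is why the forecast plots and accuracy tables can still be read in GWh.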