Discussion 5

Author

Tin Vu

Explain the Box-Cox method (what it does, advantages and disadvantages). Keep it in the context of time series data.

The Box-Cox method is a family of power transformations, indexed by a parameter \(\lambda\), that is used on time series whose variance changes with the level of the series. It makes the series' variability more even across time so that models fit better. A \(\lambda\) of 1 leaves the shape of the series essentially unchanged, and a \(\lambda\) of 0 gives the natural log transformation. Values between 0 and 1 shrink large values more than small ones, which dampens the influence of large observations and evens out the spread.
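
For reference, the standard form of the transformation for a positive observation \(y_t\) is sketched below. The symbol \(w_t\) for the transformed value is notation assumed here, not taken from the post.

\[
w_t =
\begin{cases}
\log(y_t), & \lambda = 0, \\
\dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0.
\end{cases}
\]

With \(\lambda = 1\) this reduces to \(w_t = y_t - 1\), a constant shift, and as \(\lambda \to 0\) the power form approaches \(\log(y_t)\).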

Box-Cox is especially useful for time series with multiplicative seasonality, where the size of the seasonal swings changes with the level of the series. Industries such as retail or energy often produce series like this and may benefit from a Box-Cox transformation. The transformation makes the series look more additive with roughly constant variance, which helps forecasting models.

The main advantages of the Box-Cox method are that it stabilizes the variance and can make the trend and seasonality look more additive. In the log case (\(\lambda = 0\)), changes on the transformed scale can be read approximately as percentage changes on the original scale, which makes series with different units easier to compare.

The main disadvantages are that it requires positive data, since the log is undefined for zeros and negative values. A very low \(\lambda\) may also produce unnecessarily wide prediction intervals. Lastly, forecasts have to be back-transformed to the original scale before they can be interpreted.

library(fpp3)
library(tsibble)
library(feasts)
library(fable)
library(fabletools)
library(fredr)
library(ggplot2)
library(forecast)
library(tseries)
library(patchwork)
library(ggtime)
library(scales)
library(kableExtra)
library(dplyr)

#FRED API Key (read from the FRED_API_KEY environment variable instead of hardcoding it)
fredr_set_key(Sys.getenv("FRED_API_KEY"))
#Loading Data Set From FRED
raw_totalsales <- fredr(series_id = "TOTALNSA", 
      observation_start = as.Date("2000-01-01"),
      observation_end   = as.Date("2026-01-01")) |>
  transmute(Month = yearmonth(date), value) |>
  as_tsibble(index = Month)

#80/20 Data Split (observation 251)
raw_train <- raw_totalsales |> filter_index(~"2020 Nov")
raw_test  <- raw_totalsales |> filter_index("2020 Dec"~.)


#Plot Original
original_plot <- raw_train |> 
  autoplot(value) +
  labs(
    title = "Original Total Vehicle Sales (TOTALNSA)",
    y = "Thousands Of Units",
    x = "Month"
)

#Box-Cox: estimate lambda on the training data with the Guerrero method
lambda_totalsales <- raw_train |>
  features(value, features = guerrero) |>
  pull(lambda_guerrero)

cat("Optimal λ for this data (training):", round(lambda_totalsales, 3))
Optimal λ for this data (training): 0.866
totalsales_transformed <- raw_train |>
  mutate(
    value = box_cox(value, lambda_totalsales)
)

#Plot Transformed
transformed_plot <- totalsales_transformed |>
  autoplot(value) +
  labs(
    title = "Transformed Total Vehicle Sales (TOTALNSA)",
    y = "Box-Cox(value)",
    x = "Month"
)

original_plot / transformed_plot

Implement a basic method like linear regression with time dummies, DRIFT, AVG and/or SNAIVE on a time series of your choice in R. Calculate some accuracy metrics for the model of your choice in part 1. These could be bias-focused metrics (systematic tendency to over/under-predict) like ME, MPE or variance-focused metrics (spread of errors, regardless of direction) like MAE, RMSE. Calculate them 

#Original Accuracy Table
model_fits <- raw_train |>
  model(
    AVG    = MEAN(value),
    DRIFT  = RW(value ~ drift()),
    SNAIVE = SNAIVE(value),
    REG    = TSLM(value ~ trend() + season())
)

fc <- model_fits |> forecast(new_data = raw_test)

acc_tbl <- fc |>
  accuracy(raw_test) |>
  select(.model, ME, MPE, MAE, RMSE) |>
  arrange(RMSE)

acc_tbl 
# A tibble: 4 × 5
  .model     ME    MPE   MAE  RMSE
  <chr>   <dbl>  <dbl> <dbl> <dbl>
1 REG     -1.04 -0.846  101.  128.
2 AVG    -24.3  -3.37   134.  162.
3 DRIFT   78.8   4.60   145.  178.
4 SNAIVE  76.9   4.15   215.  290.
#Transformed Accuracy Table
model_fits_bc <- raw_train |>
  model(
    AVG    = MEAN(box_cox(value, lambda_totalsales)),
    DRIFT  = RW(box_cox(value, lambda_totalsales) ~ drift()),
    SNAIVE = SNAIVE(box_cox(value, lambda_totalsales)),
    REG    = TSLM(box_cox(value, lambda_totalsales) ~ trend() + season())
)

fc_transformed <- model_fits_bc |> forecast(new_data = raw_test)

transformed_acc_tbl <- fc_transformed |>
  accuracy(raw_test) |>
  select(.model, ME, MPE, MAE, RMSE) |>
  arrange(RMSE)

transformed_acc_tbl 
# A tibble: 4 × 5
  .model     ME    MPE   MAE  RMSE
  <chr>   <dbl>  <dbl> <dbl> <dbl>
1 REG     -1.43 -0.879  101.  128.
2 DRIFT   -1.21 -1.45   124.  155.
3 AVG    -24.3  -3.36   134.  162.
4 SNAIVE  72.4   3.81   215.  288.

The Box-Cox transformation did not meaningfully improve the best model, REG: MPE, MAE, and RMSE changed only slightly, and although ME moved the most (-1.04 to -1.43), that is still not a meaningful difference. It did drastically improve the DRIFT model: ME went from 78.8 to -1.2, MPE from 4.6 to -1.4, MAE from 145.0 to 124.5, and RMSE from 178.2 to 155.0. So for DRIFT the transformation removed most of the bias (ME and MPE moved close to zero) and also reduced the spread of the errors (MAE and RMSE). For REG, ME and MPE were already near zero before and after the transformation, so bias was not an issue, and the spread barely changed because MAE and RMSE were already the lowest of the four models. The small changes are consistent with \(\lambda = 0.866\), which is close to 1: the data do not need strong variance stabilization, so Box-Cox applies only a very mild transformation.
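
The four metrics in the tables can be written out as follows. This is a sketch that assumes \(e_t = y_t - \hat{y}_t\) is the forecast error over the \(T\) test observations (notation introduced here), so a positive ME means the forecasts are too low on average.

\[
\begin{aligned}
\text{ME} &= \frac{1}{T}\sum_{t=1}^{T} e_t, &
\text{MPE} &= \frac{100}{T}\sum_{t=1}^{T} \frac{e_t}{y_t}, \\
\text{MAE} &= \frac{1}{T}\sum_{t=1}^{T} \lvert e_t \rvert, &
\text{RMSE} &= \sqrt{\frac{1}{T}\sum_{t=1}^{T} e_t^{2}}.
\end{aligned}
\]

ME and MPE keep the sign of each error, so they measure bias. MAE and RMSE discard the sign, so they measure the size of the errors.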

Can you apply an appropriate transformation (or try a few) on your time series above and see if you can model the data better i.e. error metric reduces? Comment on your findings. What happens to the interpretation?

# LOG transform (Box-Cox lambda = 0)
model_fits_log <- raw_train |>
  model(
    AVG    = MEAN(box_cox(value, 0)),
    DRIFT  = RW(box_cox(value, 0) ~ drift()),
    SNAIVE = SNAIVE(box_cox(value, 0)),
    REG    = TSLM(box_cox(value, 0) ~ trend() + season())
)

fc_log <- model_fits_log |> forecast(new_data = raw_test)

log_acc_tbl <- fc_log |>
  accuracy(raw_test) |>
  select(.model, ME, MPE, MAE, RMSE) |>
  arrange(RMSE)

log_acc_tbl
# A tibble: 4 × 5
  .model      ME    MPE   MAE  RMSE
  <chr>    <dbl>  <dbl> <dbl> <dbl>
1 REG      -5.70  -1.21  101.  129.
2 AVG     -26.0   -3.50  134.  162.
3 SNAIVE   40.5    1.38  219.  286.
4 DRIFT  -484.   -37.9   525.  592.
# SQRT-like transform (Box-Cox lambda = 0.5)
model_fits_sqrt <- raw_train |>
  model(
    AVG    = MEAN(box_cox(value, 0.5)),
    DRIFT  = RW(box_cox(value, 0.5) ~ drift()),
    SNAIVE = SNAIVE(box_cox(value, 0.5)),
    REG    = TSLM(box_cox(value, 0.5) ~ trend() + season())
)

fc_sqrt <- model_fits_sqrt |> forecast(new_data = raw_test)

sqrt_acc_tbl <- fc_sqrt |>
  accuracy(raw_test) |>
  select(.model, ME, MPE, MAE, RMSE) |>
  arrange(RMSE)

sqrt_acc_tbl
# A tibble: 4 × 5
  .model      ME     MPE   MAE  RMSE
  <chr>    <dbl>   <dbl> <dbl> <dbl>
1 REG      -2.75  -0.987  101.  128.
2 AVG     -24.4   -3.38   134.  162.
3 SNAIVE   60.0    2.87   215.  285.
4 DRIFT  -208.   -17.1    255.  293.

The additional transformations I tried were different values of \(\lambda\) in the Box-Cox method. The Guerrero method originally suggested 0.866, which is close to 1 and means that little or no transformation is needed, so I also tried 0 and 0.5. I used REG, the best model for this data set, as the benchmark. With \(\lambda = 0\) (a log transformation), the RMSE is slightly worse at 128.7 compared to 128.1 with \(\lambda = 0.866\), while the MAE is slightly better at 101.1 compared to 101.3, so it is not clearly better. With \(\lambda = 0.5\) (a square-root-like transformation), the RMSE is 128.2 and the MAE is 101.1, which is not much different from the untransformed data. The best model for the data set is REG with the Guerrero \(\lambda = 0.866\), but any other value of \(\lambda\) changes its metrics only negligibly. For the DRIFT model, however, the other \(\lambda\) values make the RMSE drastically worse: \(\lambda = 0\) gives 591.5 and \(\lambda = 0.5\) gives 292.7, both far above the 155.0 obtained with \(\lambda = 0.866\).

The untransformed models are easy to interpret, since the errors and forecasts are in thousands of units, the same units as the original data. The transformed models are fitted on a transformed scale: with \(\lambda = 0\) the scale is logarithmic, so changes can be read approximately as percent changes, while for other values of \(\lambda\) the transformed units have no simple interpretation. Because the transformation is written inside each model formula, the forecasts are back-transformed to the original scale (fable does this automatically), so the accuracy tables for the transformed models are in thousands of units and can be compared directly with the accuracy table for the original data.
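
To make the move between the transformed and original scales concrete, here is a minimal base-R sketch of the Box-Cox transformation and its inverse. The helper names `box_cox_sketch` and `inv_box_cox_sketch` and the sample values are hypothetical, written only for illustration. They are not the fabletools implementation and assume strictly positive data.

```r
# Hypothetical helpers for illustration only; valid for strictly positive y.
box_cox_sketch <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation: maps transformed values back to the original scale.
inv_box_cox_sketch <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(900, 1200, 1500, 1800)  # made-up values, not the FRED series

for (lambda in c(0, 0.5, 0.866, 1)) {
  w <- box_cox_sketch(y, lambda)
  y_back <- inv_box_cox_sketch(w, lambda)
  # The round trip should recover y up to floating-point error.
  cat("lambda =", lambda, " max round-trip error:", max(abs(y_back - y)), "\n")
}
```

This only back-transforms point values. Back-transforming a full forecast distribution also involves deciding between the median and a bias-adjusted mean, which this sketch does not attempt.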