Kevin Rusu Module 4

Author

Kevin Rusu

1 Box-Cox Method

The Box-Cox transformation is used to stabilize the variance of a time series using a parameter called lambda (\(\lambda\)). Many time series have seasonal swings that get sometimes bigger as the level of the time series rises. For example. for some time series the seasonal swings in earlier time periods are small, but in later timer periods the ups and downs become much larger. We know this also as a non-constant variance (heteroskedasticity).

Interpreting the lambda output is important to know because it helps to decide how to transform the time series. For example, special cases include:

  • lambda = 1 -> no transformation needed

  • lambda = 0 -> log transformation

  • lambda = 0.5 -> square root needed for transformation

Using the Box-Cox method has some advantages like:

  • the simplicity because it just uses one parameter (lambda) to control the whole transformation

  • it covers common transformations (log, square root, inverse) in one framework

  • helps forecasting models that assume constant variance perform better

  • R handles the back-transformation automatically

Disadvantages are:

  • it only works with strictly positive data (it cannot handle zeros or negatives)

  • makes the interpretation harder because everything is on a transformed scale

  • only fixes variance (other tools needed like differencing for non-stationary)

  • back-transformed forecasts only gives medians instead of means

2 Implementation of Basic Model

2.1 Load Libraries

setwd("~/Library/Mobile Documents/com~apple~CloudDocs/Boston College/Spring 2026/Forecasting/Discussions/Module 4")

library(fpp3)
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr
── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
✔ tibble      3.3.0     ✔ tsibble     1.1.6
✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
✔ tidyr       1.3.1     ✔ feasts      0.4.2
✔ lubridate   1.9.4     ✔ fable       0.4.1
✔ ggplot2     4.0.0     
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()
library(fredr)

fredr_set_key(Sys.getenv("FRED_API_KEY"))

2.2 Pull Data + Plot

raw <- fredr(
  series_id = "TOTBUSSMSA",
  observation_start = as.Date("2010-01-01")
)

sales <- raw |>
  select(date, value) |>
  mutate(Month = yearmonth(date)) |>
  as_tsibble(index = Month) |>
  rename(Sales = value)

head(sales)
# A tsibble: 6 x 3 [1M]
  date         Sales    Month
  <date>       <dbl>    <mth>
1 2010-01-01 1041245 2010 Jan
2 2010-02-01 1044107 2010 Feb
3 2010-03-01 1064567 2010 Mar
4 2010-04-01 1076198 2010 Apr
5 2010-05-01 1069360 2010 May
6 2010-06-01 1069584 2010 Jun
tail(sales)
# A tsibble: 6 x 3 [1M]
  date         Sales    Month
  <date>       <dbl>    <mth>
1 2025-06-01 1928435 2025 Jun
2 2025-07-01 1947836 2025 Jul
3 2025-08-01 1948357 2025 Aug
4 2025-09-01 1946691 2025 Sep
5 2025-10-01 1943249 2025 Oct
6 2025-11-01 1955139 2025 Nov
sales
# A tsibble: 191 x 3 [1M]
   date         Sales    Month
   <date>       <dbl>    <mth>
 1 2010-01-01 1041245 2010 Jan
 2 2010-02-01 1044107 2010 Feb
 3 2010-03-01 1064567 2010 Mar
 4 2010-04-01 1076198 2010 Apr
 5 2010-05-01 1069360 2010 May
 6 2010-06-01 1069584 2010 Jun
 7 2010-07-01 1078106 2010 Jul
 8 2010-08-01 1083871 2010 Aug
 9 2010-09-01 1091817 2010 Sep
10 2010-10-01 1110618 2010 Oct
# ℹ 181 more rows
autoplot(sales, Sales) +
  labs(title = "Total Business Sales (Monthly)",
       y = "Millions of Dollars",
       x = "Month")

2.3 Decomposition

sales |>
  model(STL(Sales)) |>
  components() |>
  autoplot() +
  labs(title = "STL Decomposition of Total Business Sales")

2.4 Train-Test Split (2y forecasting)

train <- sales |> filter_index(. ~ "2023 Nov")
test <- sales |> filter_index("2023 Dec" ~ .)

fit <- train |>
  model(snaive = SNAIVE(Sales))

#24month forecast
fc <- fit |> forecast(h = 24)

#plotting forecast against actual
fc |> autoplot(sales) +
  labs(title = "SNAIVE Forecast vs Actual",
       y = "Millions of Dollars") +
  theme(legend.position = "bottom")

accuracy(fc, sales)
# A tibble: 1 × 10
  .model .type     ME   RMSE    MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>  <chr>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 snaive Test  59372. 71504. 61358.  3.10  3.21 0.763 0.627 0.750

Bias-focused:

  • ME = 559,371 -> SNAIVE tends to overcast on average

  • MPE = 3.1% -> same as above but in percentage, so SNAIVE is 3% to high on average

Variance-Focused:

  • MAE = 61,358 -> forecast is off by about $61 billion on average

3 Applying Transformation

lambda <- train |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)

lambda
[1] -0.7531239

3.1 SNAIVE on transformed data

#log transform
fit_log <- train |>
  model(snaive_log = SNAIVE(log(Sales)))

fc_log <- fit_log |> forecast(h = 24)

#transformation sqrt
fit_sqrt <- train |>
  model(snaive_sqrt = SNAIVE(box_cox(Sales, 0.5)))

fc_sqrt <- fit_sqrt |> forecast(h = 24)

# Compare all three
accuracy(fc, sales)     
# A tibble: 1 × 10
  .model .type     ME   RMSE    MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>  <chr>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 snaive Test  59372. 71504. 61358.  3.10  3.21 0.763 0.627 0.750
accuracy(fc_log, sales)  
# A tibble: 1 × 10
  .model     .type     ME   RMSE    MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>      <chr>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 snaive_log Test  50807. 63114. 53274.  2.65  2.78 0.662 0.554 0.728
accuracy(fc_sqrt, sales) 
# A tibble: 1 × 10
  .model      .type     ME   RMSE    MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>       <chr>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 snaive_sqrt Test  56031. 68206. 58203.  2.92  3.04 0.724 0.598 0.742

Overall, when comparing all the transformations, the log transformation improve the error metrics the best. Compared to the original SNAIVE model, our MAPE dropped from 3.21% to 2.78%. Other values, such as MAE dropped from 61,358 to 53,274 and ME dropped from 59,372 to 50,807.

Why does this happen?

The STL decomposition shows that the seasonal swings were getting bigger over time. By using the log transformation, we compress the larger values more, which makes the seasonal patterns more consistent.

Note: Lambda was -0.7, so we should have taken the inverse, but this gave me NaN values, so I tried using other transformations.