Module 10 Discussion

Author

Ethan Wright

Part 1

HTS vs GTS

A hierarchical time series is one that is unambiguously aggregated from parent to the lowest level child. For example, you have a country like the US. The US has 50 mutually exclusive states, which have cities, which have areas. There is only one way to aggregate this structure. A grouped time series is one that can be aggregated in many ways. This means that you can aggregate one variable into another and vice versa. An example would be sales data between regions and products. You can aggregate sales by region or by product. The interpretation would would change slightly.

Forecast method vs Reconciliation method

Forecasting is the method of predicting how a time series will move in the future. A few examples are ETS and ARIMA. A reconciliation is the aggregation of individual forecasts through different methods like top-down and bottom-up.

4 methods

Top-Down: It is the simplest method for total forecasts, but it can lose data the further down you go.

Bottom-Up: Forecasts from the bottom and aggregates up. You maintain the structure and accuracy at the bottom level, but you lose accuracy at the top since it doesn’t borrow information from the individual time series.

Middle-Out: Sits in the middle of TD and BU and gives a decent forecast in terms of accuracy. It is really useful when you are targeting the middle group of a data set, but is unreliable at either end.

Mint: The most rigorous method because it borrows data across levels by using the covariance matrix of forecast errors on all levels. It’s downfall is in the computational intensity and in small datasets because of unstable errors.

Part 2 (Option 2)

library(fpp3)
── Attaching packages ──────────────────────────────────────────── fpp3 1.0.3 ──
✔ tibble      3.3.1     ✔ tsibble     1.2.0
✔ dplyr       1.2.1     ✔ tsibbledata 0.4.1
✔ tidyr       1.3.2     ✔ ggtime      0.2.0
✔ lubridate   1.9.5     ✔ feasts      0.5.0
✔ ggplot2     4.0.2     ✔ fable       0.5.0
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats 1.0.1     ✔ readr   2.2.0
✔ purrr   1.2.1     ✔ stringr 1.6.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()     masks stats::filter()
✖ tsibble::interval() masks lubridate::interval()
✖ dplyr::lag()        masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyquant)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
── Attaching core tidyquant packages ─────────────────────── tidyquant 1.0.12 ──
✔ PerformanceAnalytics 2.1.0      ✔ TTR                  0.24.4
✔ quantmod             0.4.28     ✔ xts                  0.14.2── Conflicts ────────────────────────────────────────── tidyquant_conflicts() ──
✖ zoo::as.Date()                 masks base::as.Date()
✖ zoo::as.Date.numeric()         masks base::as.Date.numeric()
✖ dplyr::filter()                masks stats::filter()
✖ xts::first()                   masks dplyr::first()
✖ zoo::index()                   masks tsibble::index()
✖ tsibble::interval()            masks lubridate::interval()
✖ dplyr::lag()                   masks stats::lag()
✖ xts::last()                    masks dplyr::last()
✖ PerformanceAnalytics::legend() masks graphics::legend()
✖ quantmod::summary()            masks base::summary()
✖ tidyquant::VAR()               masks fable::VAR()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: 'tidyquant'

The following object is masked from 'package:fable':

    VAR
tickers <- c("NVDA", "AVGO", "HUN", "LYB")
sectors <- c("Tech", "Tech", "Chemicals", "Chemicals")

prices_raw <- tq_get(tickers,
                     from = "2020-04-01",
                     to = "2025-03-31",
                     get = "stock.prices")|>
  select(symbol, date, adjusted)

prices_monthly <- prices_raw |>
  mutate(month = yearmonth(date)) |>
  group_by(symbol, month) |>
  slice_max(date, n = 1) |>
  ungroup() |>
  select(symbol, month, adjusted)
initial_prices <- prices_monthly |>
  filter(month == min(month)) |>
  select(symbol, p0 = adjusted)

port_wide <- prices_monthly |>
  left_join(initial_prices, by = "symbol") |>
  mutate(shares = 10000 / p0,
         value  = adjusted * shares) |>
  select(symbol, month, value) |>
  pivot_wider(names_from = symbol, values_from = value) |>
  mutate(
    Technology = NVDA + AVGO,
    Chemicals  = LYB  + HUN,
    Total      = Technology + Chemicals
  )

port_wide
# A tibble: 60 × 8
      month   AVGO    HUN    LYB   NVDA Technology Chemicals  Total
      <mth>  <dbl>  <dbl>  <dbl>  <dbl>      <dbl>     <dbl>  <dbl>
 1 2020 Apr 10000  10000  10000  10000      20000     20000  40000 
 2 2020 May 10723. 10797. 11003. 12147.     22870.    21800. 44670.
 3 2020 Jun 11739. 10789. 11504. 13004.     24743.    22293. 47037.
 4 2020 Jul 11782. 11107. 10944. 14533.     26315.    22051. 48366.
 5 2020 Aug 12912. 12981. 11639. 18312.     31224.    24620. 55844.
 6 2020 Sep 13675. 13431. 12530. 18531.     32206.    25961. 58167.
 7 2020 Oct 13123. 14689. 12167. 17166.     30290.    26856. 57146.
 8 2020 Nov 15073. 14979. 15307. 18354.     33428.    30286. 63714.
 9 2020 Dec 16575. 15300. 16487. 17885.     34460.    31787. 66247.
10 2021 Jan 17053. 16079. 15426. 17796.     34849.    31505. 66354.
# ℹ 50 more rows
port_long <- port_wide|>
  pivot_longer(cols = c(NVDA, AVGO, LYB, HUN, Technology, Chemicals, Total),
               names_to = "series",
               values_to = "value")|>
  as_tsibble(index = month, key = series)

train <- port_long |> filter(month <= yearmonth(as.Date('2024-03-01')))
test  <- port_long |> filter(month >  yearmonth(as.Date('2024-03-01')))

train |> distinct(series) |> pull(series)
[1] "AVGO"       "Chemicals"  "HUN"        "LYB"        "NVDA"      
[6] "Technology" "Total"     
train |> count(series)
# A tibble: 7 × 2
  series         n
  <chr>      <int>
1 AVGO          48
2 Chemicals     48
3 HUN           48
4 LYB           48
5 NVDA          48
6 Technology    48
7 Total         48
fit <- train |>
  model(ets = ETS(value))

hist_props <- train |>
  as_tibble() |>
  pivot_wider(names_from = series, values_from = value) |>
  summarise(
    tech_prop = mean(Technology / Total),
    chem_prop = mean(Chemicals  / Total),
    nvda_prop = mean(NVDA / Technology),
    avgo_prop = mean(AVGO / Technology),
    lyb_prop  = mean(LYB  / Chemicals),
    hun_prop  = mean(HUN  / Chemicals)
  )

fc_bu <- fit |>
  forecast(h=12) |>
  filter(series %in% c("NVDA", "AVGO", "LYB", "HUN")) |>
  as_tibble()|>
  select(month, series, .mean) |>
  pivot_wider(names_from = series, values_from = .mean) |>
  mutate(
    Technology = NVDA + AVGO,
    Chemicals  = LYB  + HUN,
    Total      = Technology + Chemicals,
    method     = "Bottom-Up"
  )

fc_td <- fit |>
  forecast(h = 12) |>
  filter(series == "Total") |>
  as_tibble() |>
  select(month, Total = .mean) |>
  mutate(
    Technology = Total * hist_props$tech_prop,
    Chemicals  = Total * hist_props$chem_prop,
    NVDA       = Technology * hist_props$nvda_prop,
    AVGO       = Technology * hist_props$avgo_prop,
    LYB        = Chemicals  * hist_props$lyb_prop,
    HUN        = Chemicals  * hist_props$hun_prop,
    method     = "Top-Down"
  )

fc_mo <- fit |>
  forecast(h = 12) |>
  filter(series %in% c("Technology", "Chemicals")) |>
  as_tibble() |>
  select(month, series, .mean) |>
  pivot_wider(names_from = series, values_from = .mean) |>
  mutate(
    Total      = Technology + Chemicals,
    NVDA       = Technology * hist_props$nvda_prop,
    AVGO       = Technology * hist_props$avgo_prop,
    LYB        = Chemicals  * hist_props$lyb_prop,
    HUN        = Chemicals  * hist_props$hun_prop,
    method     = "Middle-Out"
  )
fit |>
  filter(series %in% c("Total", "Technology", "Chemicals")) |>
  forecast(h = 12) |>
  autoplot(port_long |> filter(series %in% c("Total", "Technology", "Chemicals")),
           level = c(80, 95)) +
  facet_wrap(~series, scales = "free_y", ncol = 1) +
  labs(title = "ETS Forecast with Confidence Intervals",
       subtitle = "Shaded = 80% and 95% prediction intervals",
       x = NULL, y = "Portfolio Value ($)") +
  theme_minimal()

actual_long <- test |>
  as_tibble() |>
  filter(series %in% c("Total", "Technology", "Chemicals")) |>
  select(month, series, value)

train_actual <- train |>
  as_tibble() |>
  filter(series %in% c("Total", "Technology", "Chemicals")) |>
  select(month, series, value)

bind_rows(
  fc_bu |> select(month, Total, Technology, Chemicals, method),
  fc_td |> select(month, Total, Technology, Chemicals, method),
  fc_mo |> select(month, Total, Technology, Chemicals, method)
) |>
  pivot_longer(cols = c(Total, Technology, Chemicals),
               names_to = "series", values_to = "forecast") |>
  left_join(actual_long, by = c("month", "series")) |>
  ggplot(aes(x = month)) +
  geom_line(data = train_actual, aes(y = value), colour = "black", linewidth = 0.8) +
  geom_line(aes(y = value), colour = "black", linetype = "dashed", linewidth = 0.8) +
  geom_line(aes(y = forecast, colour = method), linewidth = 0.8) +
  facet_wrap(~series, scales = "free_y", ncol = 1) +
  labs(title = "Reconciliation Forecasts vs Actuals",
       subtitle = "Black solid = Train | Black dashed = Actual test | Coloured = Reconciled forecast",
       x = NULL, y = "Portfolio Value ($)", colour = "Method") +
  theme_minimal()

actual_total <- test |>
  filter(series == "Total") |>
  as_tibble() |>
  select(month, actual = value)

eval <- bind_rows(
  fc_bu |> select(month, Total, method),
  fc_td |> select(month, Total, method),
  fc_mo |> select(month, Total, method)
) |>
  left_join(actual_total, by = "month") |>
  group_by(method) |>
  summarise(
    RMSE = sqrt(mean((actual - Total)^2)),
    MAE  = mean(abs(actual - Total)),
    MAPE = mean(abs((actual - Total) / actual)) * 100
  )

eval
# A tibble: 3 × 4
  method       RMSE    MAE  MAPE
  <chr>       <dbl>  <dbl> <dbl>
1 Bottom-Up  46760. 42711.  15.3
2 Middle-Out 46001. 41954.  15.0
3 Top-Down   38240. 34905.  12.5