Module 10 Discussion — Hierarchical Time Series & Mini Mutual Fund

Author

Aryamani Boruah

Part I: Hierarchical vs Grouped Time Series

HTS vs GTS

Hierarchical and grouped time series are both ways of organizing data that can be summed or aggregated, but the key difference is how the structure is defined. In a hierarchical series, there is exactly one path from the bottom to the top. Country rolls up to continent, continent rolls up to global, and each series has a single parent. In a grouped series, the same data can be sliced along multiple dimensions at once. The tourism example from the course notes makes this clear: you can aggregate by state or by purpose, and both are valid at the same time. That flexibility is what makes grouped structures harder to reconcile, because forecasts have to satisfy multiple summation constraints simultaneously rather than just one tree.
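
As a rough illustration of the structural difference, the base-R sketch below encodes both structures as summing matrices. The country codes, the state and purpose labels, and all numbers are made up for the example and are not taken from the course data.

```r
# Hierarchical: four countries roll up to two continents, then to Global.
S_hier <- rbind(
  Global = c(1, 1, 1, 1),
  Europe = c(1, 1, 0, 0),
  Asia   = c(0, 0, 1, 1),
  diag(4)                      # each bottom series maps to itself
)
colnames(S_hier) <- c("FR", "DE", "JP", "IN")
rownames(S_hier)[4:7] <- colnames(S_hier)

# Grouped: the same four cells are aggregated by state AND by purpose.
cells <- c("NSW.Business", "NSW.Holiday", "VIC.Business", "VIC.Holiday")
S_grp <- rbind(
  Total    = c(1, 1, 1, 1),
  NSW      = c(1, 1, 0, 0),
  VIC      = c(0, 0, 1, 1),
  Business = c(1, 0, 1, 0),
  Holiday  = c(0, 1, 0, 1),
  diag(4)
)
colnames(S_grp) <- cells
rownames(S_grp)[6:9] <- cells

# Hypothetical bottom-level values; every other series is a sum of these.
bottom_values <- c(10, 20, 30, 40)
all_hier <- drop(S_hier %*% bottom_values)
all_grp  <- drop(S_grp %*% bottom_values)

# Number of aggregates directly above each bottom series
parents_hier <- colSums(S_hier[2:3, ])   # one continent per country
parents_grp  <- colSums(S_grp[2:5, ])    # one state plus one purpose per cell
```

In the hierarchy each bottom series has a single parent at the level directly above it. In the grouped structure each cell feeds two aggregates at that level, which is where the additional summation constraints come from.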

Forecast Methods vs Reconciliation Methods

Forecast methods and reconciliation methods are separate steps that serve different purposes. A forecast method — ETS, ARIMA, or similar — generates predictions for a single series. Reconciliation comes after the forecasting step and adjusts predictions across all levels of the hierarchy so they are internally consistent. Without reconciliation, independently fitted models at each level will almost never add up correctly, resulting in contradictory numbers at different levels of the structure.
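
A small check makes the incoherence problem concrete. The three numbers below are hypothetical base forecasts, standing in for the output of three independently fitted models; they are not results from this analysis.

```r
# Hypothetical one-step forecasts from three separately fitted models
base <- c(Total = 105, A = 60, B = 40)

# The total implied by the lower-level forecasts
implied_total <- base[["A"]] + base[["B"]]

# A non-zero gap means the forecasts contradict each other
gap <- base[["Total"]] - implied_total
is_coherent <- isTRUE(all.equal(gap, 0))
```

Here the aggregate model and the two component models disagree by 5 units, and a reconciliation step is what decides how that disagreement gets resolved.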

The Four Reconciliation Methods

The four reconciliation methods differ in where they anchor the forecast and how much information they draw on.

Top-down starts at the aggregate level and pushes forecasts downward using historical proportions. It is simple, stable, and computationally cheap, but it tends to miss local patterns at the bottom level since all the signal comes from the total. When bottom-level series are unreliable or highly volatile, this is often the safest starting point.
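
The following sketch shows one common way to do the split, assuming "average proportions" (the mean over time of each series' share of the total). The history and the total forecast are invented for illustration.

```r
# Hypothetical history: rows are periods, columns are bottom series
past <- cbind(A = c(50, 55, 60), B = c(50, 45, 40))
past_total <- rowSums(past)

# Average historical share of each series in the total
props <- colMeans(past / past_total)

# Hypothetical forecast for the total, split using the fixed shares
total_fc  <- 110
bottom_fc <- total_fc * props
```

The bottom forecasts sum back to the total by construction. Note that A's share rises from 0.50 to 0.60 in this toy history, yet the split uses the average of 0.55, which is the kind of local pattern top-down can miss.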

Bottom-up goes the opposite direction: models are fit at the most disaggregated level and summed upward. This works well when individual series are reliable and carry meaningful signal, but noisy bottom-level data compounds errors as you aggregate. It is the default recommendation when confidence in detailed-level data is high.
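
A minimal sketch of the aggregation step, and of how bottom-level uncertainty carries upward, is given below. The series names, forecasts, and error standard deviations are all hypothetical.

```r
# Hypothetical bottom-level forecasts and their parent groups
bottom_fc <- c(A1 = 1200, A2 = 1100, B1 = 900, B2 = 800)
group_of  <- c(A1 = "A", A2 = "A", B1 = "B", B2 = "B")

# Bottom-up: higher levels are just sums of the bottom forecasts
group_fc <- vapply(split(bottom_fc, group_of), sum, numeric(1))
total_fc <- sum(bottom_fc)

# Hypothetical forecast-error standard deviations for each bottom series
bottom_sd <- c(100, 100, 50, 50)

# Error in the summed forecast under two extreme assumptions
sd_if_independent <- sqrt(sum(bottom_sd^2))  # errors uncorrelated
sd_if_correlated  <- sum(bottom_sd)          # errors perfectly correlated
```

When bottom-level errors move together, as stock returns often do, the aggregate error sits closer to the second figure than the first, so noise at the bottom does not average out.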

Middle-out picks an intermediate level as the anchor, forecasts there, then aggregates up and disaggregates down. It is a practical compromise when the middle level — often a business-meaningful grouping like sectors or regions — carries the most stable and interpretable signal. It avoids the full volatility of the bottom while still capturing more granularity than the total alone.
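
The two directions of middle-out can be sketched in a few lines of base R. The group forecasts and within-group shares below are hypothetical, and the fixed-share split is only one possible way to disaggregate.

```r
# Hypothetical forecasts made directly at the middle level
group_fc <- c(A = 2300, B = 1700)

# Upward step: the total is the sum of the middle-level forecasts
total_fc <- sum(group_fc)

# Downward step: split each group using assumed historical shares
share_in_group <- list(
  A = c(A1 = 0.6, A2 = 0.4),
  B = c(B1 = 0.5, B2 = 0.5)
)
bottom_fc <- unlist(lapply(
  names(group_fc),
  function(g) group_fc[[g]] * share_in_group[[g]]
))
```

The middle-level forecasts are left untouched, so any accuracy they have is passed on unchanged to that level of the reconciled output.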

MinT (Minimum Trace) is the most theoretically rigorous method. It starts from base forecasts for every series at every level, then combines them using weights derived from the covariance matrix of the base forecast errors, chosen to minimize the total variance (the trace of the covariance matrix) of the reconciled forecast errors. In large hierarchies where accuracy matters, MinT typically outperforms the others. However, it requires a well-estimated error covariance matrix and becomes computationally expensive as the hierarchy grows. With short series, that estimate can be unstable, and simpler methods sometimes outperform it in practice.
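
The combination step can be written as y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat, where S is the summing matrix, y_hat the base forecasts, and W the covariance matrix of base forecast errors. The sketch below applies that formula to a toy three-series hierarchy. The forecasts and the diagonal W are assumptions made for illustration; a full MinT would estimate W, including off-diagonal terms, from model residuals.

```r
# Summing matrix for Total = A + B
S <- rbind(Total = c(1, 1), A = c(1, 0), B = c(0, 1))

# Hypothetical incoherent base forecasts (105 != 60 + 40)
y_hat <- c(Total = 105, A = 60, B = 40)

# Assumed error covariance: the Total forecast is treated as least reliable
W     <- diag(c(4, 1, 1))
W_inv <- solve(W)

# Weights that map all base forecasts to reconciled bottom-level forecasts
G <- solve(t(S) %*% W_inv %*% S) %*% t(S) %*% W_inv

# Reconciled forecasts for every series in the hierarchy
y_tilde <- drop(S %*% G %*% y_hat)
```

Unlike the single-level methods, every base forecast contributes here: the reconciled total lands between the base total and the sum of the base bottom forecasts, pulled toward whichever the assumed variances mark as more reliable.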


Part II: Mini Mutual Fund — Option II

Setup

For this analysis I built a four-stock portfolio spanning two sectors. The technology sector includes Apple (AAPL) and Microsoft (MSFT), and the healthcare sector includes Johnson & Johnson (JNJ) and Pfizer (PFE). These are large, liquid companies with five full years of reliable data, and the two-stocks-per-sector structure creates a clean three-level hierarchy:

Total Portfolio
├── Tech      (AAPL + MSFT)
└── Healthcare (JNJ  + PFE)

Each stock received a $1,000 initial investment at its January 2019 month-end adjusted close, which normalizes starting values across positions. The data spans January 2019 through December 2023, split into a four-year training set (2019–2022) and a one-year test set (2023).

Libraries

library(tidyverse)
library(tsibble)
library(fable)
library(fabletools)
library(quantmod)
library(scales)
library(kableExtra)

Data Download & Portfolio Construction

tickers <- c("AAPL", "MSFT", "JNJ", "PFE")

getSymbols(tickers,
           src         = "yahoo",
           from        = "2019-01-01",
           to          = "2023-12-31",
           auto.assign = TRUE)
# Helper: monthly adjusted close for one ticker
monthly_adj <- function(sym) {
  x <- get(sym)
  m <- to.monthly(x, indexAt = "lastof", OHLC = FALSE)
  data.frame(
    date  = as.Date(index(Ad(m))),
    price = as.numeric(Ad(m))
  ) |> rename(!!sym := price)
}

prices <- monthly_adj("AAPL") |>
  left_join(monthly_adj("MSFT"), by = "date") |>
  left_join(monthly_adj("JNJ"),  by = "date") |>
  left_join(monthly_adj("PFE"),  by = "date")
# Equal $1,000 investment in each stock at first observation
initial    <- 1000
shares     <- sapply(tickers, \(t) initial / prices[[t]][1])

portfolio <- prices |>
  mutate(
    val_AAPL = AAPL * shares["AAPL"],
    val_MSFT = MSFT * shares["MSFT"],
    val_JNJ  = JNJ  * shares["JNJ"],
    val_PFE  = PFE  * shares["PFE"],
    month    = yearmonth(date)
  )

Hierarchical tsibble

# Long format at stock level
bottom <- portfolio |>
  select(month, val_AAPL, val_MSFT, val_JNJ, val_PFE) |>
  pivot_longer(-month, names_to = "stock", values_to = "value") |>
  mutate(
    stock  = str_remove(stock, "val_"),
    sector = if_else(stock %in% c("AAPL", "MSFT"), "Tech", "Healthcare")
  )

# Hierarchical tsibble with aggregate_key
hts <- bottom |>
  as_tsibble(index = month, key = c(sector, stock)) |>
  aggregate_key(sector / stock, value = sum(value))

# Train / test split
train <- hts |> filter(month <= yearmonth("2022 Dec"))
test  <- hts |> filter(month >  yearmonth("2022 Dec"))

ETS Fit on Training Data

fit <- train |>
  model(ets = ETS(value))

Fitted Values — Total Portfolio (Training Period)

fitted_total  <- fit |>
  augment() |>
  filter(is_aggregated(sector), is_aggregated(stock))

actual_total  <- hts |>
  filter(is_aggregated(sector), is_aggregated(stock))

ggplot() +
  geom_line(
    data = actual_total |> filter(month <= yearmonth("2022 Dec")),
    aes(x = month, y = value, color = "Actual"),
    linewidth = 0.95
  ) +
  geom_line(
    data = fitted_total |> filter(.model == "ets"),
    aes(x = month, y = .fitted, color = "ETS Fitted"),
    linetype = "dashed", linewidth = 0.85
  ) +
  scale_color_manual(values = c("Actual" = "#1f4e79", "ETS Fitted" = "#e36b1e")) +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "Total Portfolio — Actual vs ETS Fitted (Training Period)",
    subtitle = "Jan 2019 – Dec 2022 | $1,000 initial investment per stock",
    x = NULL, y = "Portfolio Value (USD)", color = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))

ETS fitted values on the training period. Because these are one-step-ahead in-sample fits, they follow the portfolio’s trend and the 2022 drawdown closely; this says little on its own about out-of-sample accuracy.

Reconciliation: Top-Down, Middle-Out, Bottom-Up

reconciled <- fit |>
  reconcile(
    top_down   = top_down(ets, method = "average_proportions"),
    middle_out = middle_out(ets),
    bottom_up  = bottom_up(ets)
  )

fc <- reconciled |> forecast(h = 12)

Year 5 Forecasts — Total Portfolio

fc_total      <- fc |> filter(is_aggregated(sector), is_aggregated(stock))
actual_test   <- actual_total |> filter(month > yearmonth("2022 Dec"))
actual_tail   <- actual_total |> filter(month >= yearmonth("2021 Jan"),
                                         month <= yearmonth("2022 Dec"))

ggplot() +
  geom_line(
    data = actual_tail,
    aes(x = month, y = value),
    color = "#1f4e79", linewidth = 0.9
  ) +
  geom_line(
    data = actual_test,
    aes(x = month, y = value, color = "Actual 2023"),
    linewidth = 1.1
  ) +
  geom_line(
    data = fc_total,
    aes(x = month, y = .mean, color = .model),
    linetype = "dashed", linewidth = 0.9
  ) +
  geom_vline(
    xintercept = as.numeric(yearmonth("2022 Dec")),
    linetype = "dotted", color = "grey50"
  ) +
  scale_color_manual(values = c(
    "Actual 2023" = "#1f4e79",
    "top_down"    = "#c0392b",
    "middle_out"  = "#27ae60",
    "bottom_up"   = "#8e44ad"
  )) +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "Year 5 Forecast vs Actual — Total Portfolio",
    subtitle = "Reconciliation methods compared | Test period: Jan–Dec 2023",
    x = NULL, y = "Portfolio Value (USD)", color = "Method"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))

All three methods are compared against the actual 2023 portfolio value. The 2021–2022 actuals, drawn as a solid line to the left of the dotted split marker, provide context for the year-5 predictions.

Year 5 Forecasts — By Sector

fc_sector     <- fc |> filter(!is_aggregated(sector), is_aggregated(stock))
actual_sector <- hts |> filter(!is_aggregated(sector), is_aggregated(stock))

ggplot() +
  geom_line(
    data = actual_sector |> filter(month >= yearmonth("2021 Jan"),
                                    month <= yearmonth("2022 Dec")),
    aes(x = month, y = value),
    color = "#1f4e79", linewidth = 0.9
  ) +
  geom_line(
    data = actual_sector |> filter(month > yearmonth("2022 Dec")),
    aes(x = month, y = value),
    color = "#1f4e79", linewidth = 1.1
  ) +
  geom_line(
    data = fc_sector,
    aes(x = month, y = .mean, color = .model),
    linetype = "dashed", linewidth = 0.85
  ) +
  geom_vline(
    xintercept = as.numeric(yearmonth("2022 Dec")),
    linetype = "dotted", color = "grey50"
  ) +
  facet_wrap(~sector, scales = "free_y", ncol = 1) +
  scale_color_manual(values = c(
    "top_down"   = "#c0392b",
    "middle_out" = "#27ae60",
    "bottom_up"  = "#8e44ad"
  )) +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "Year 5 Forecast vs Actual — By Sector",
    subtitle = "Tech (AAPL + MSFT) vs Healthcare (JNJ + PFE)",
    x = NULL, y = "Sector Value (USD)", color = "Method"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "bottom",
    plot.title  = element_text(face = "bold"),
    strip.text  = element_text(face = "bold", size = 11)
  )

Sector-level forecasts highlight the divergence between Tech’s 2023 recovery and Healthcare’s relative flatness — a difference top-down misses by design.

Year 5 Forecasts — Individual Stocks

fc_stock <- fc |> filter(!is_aggregated(stock))

ggplot() +
  geom_line(
    data = bottom |> filter(month >= yearmonth("2021 Jan")),
    aes(x = month, y = value),
    color = "#1f4e79", linewidth = 0.85
  ) +
  geom_line(
    data = fc_stock,
    aes(x = month, y = .mean, color = .model),
    linetype = "dashed", linewidth = 0.8
  ) +
  geom_vline(
    xintercept = as.numeric(yearmonth("2022 Dec")),
    linetype = "dotted", color = "grey50"
  ) +
  facet_wrap(~stock, scales = "free_y", ncol = 2) +
  scale_color_manual(values = c(
    "top_down"   = "#c0392b",
    "middle_out" = "#27ae60",
    "bottom_up"  = "#8e44ad"
  )) +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "Year 5 Forecast vs Actual — Individual Stocks",
    subtitle = "Bottom-level disaggregated forecasts",
    x = NULL, y = "Stock Value (USD)", color = "Method"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))

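Bottom-level disaggregated forecasts for each stock. Each panel has its own y-axis, so variability should not be compared visually across stocks. The sector accuracy table shows that bottom-up’s additional error is concentrated in Tech (AAPL and MSFT) rather than in Healthcare.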

Accuracy Evaluation

Total Portfolio Level

acc_total <- fc |>
  filter(is_aggregated(sector), is_aggregated(stock)) |>
  accuracy(test |> filter(is_aggregated(sector), is_aggregated(stock)))

acc_total |>
  select(Method = .model, RMSE, MAE, MAPE) |>
  mutate(across(where(is.numeric), \(x) round(x, 2))) |>
  arrange(RMSE) |>
  kable(caption = "Forecast Accuracy — Total Portfolio (Year 5)") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  row_spec(1, bold = TRUE, background = "#d4edda")
Forecast Accuracy — Total Portfolio (Year 5)
Method        RMSE      MAE   MAPE (%)
middle_out   819.62   676.24      6.52
ets          826.72   692.83      6.69
top_down     826.72   692.83      6.69
bottom_up   1310.99  1149.24     11.08
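
For readers unfamiliar with the three measures, the sketch below spells out the usual definitions in base R. The function names and the toy actual and forecast values are illustrative; they are not part of the analysis above, which relies on the `accuracy()` output.

```r
# Root mean squared error: penalizes large misses more heavily
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))

# Mean absolute error: average size of the miss, in the data's units
mae <- function(actual, forecast) mean(abs(actual - forecast))

# Mean absolute percentage error: average miss relative to the actual value
mape <- function(actual, forecast) {
  mean(abs((actual - forecast) / actual)) * 100
}

# Toy values to exercise the functions
actual_toy   <- c(100, 110, 120)
forecast_toy <- c(90, 115, 120)

toy_scores <- c(
  RMSE = rmse(actual_toy, forecast_toy),
  MAE  = mae(actual_toy, forecast_toy),
  MAPE = mape(actual_toy, forecast_toy)
)
```

RMSE and MAE are in dollars of portfolio value, while MAPE is scale-free, which is why MAPE is the easier measure to compare between the total and the sector tables.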

Sector Level

acc_sector <- fc |>
  filter(!is_aggregated(sector), is_aggregated(stock)) |>
  accuracy(test |> filter(!is_aggregated(sector), is_aggregated(stock)))

acc_sector |>
  select(Method = .model, Sector = sector, RMSE, MAE, MAPE) |>
  mutate(across(where(is.numeric), \(x) round(x, 2))) |>
  arrange(Sector, RMSE) |>
  kable(caption = "Forecast Accuracy — Sector Level (Year 5)") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Forecast Accuracy — Sector Level (Year 5)
Method      Sector        RMSE      MAE   MAPE (%)
ets         Healthcare   525.18   507.97     21.57
middle_out  Healthcare   525.18   507.97     21.57
bottom_up   Healthcare   539.16   522.41     22.18
top_down    Healthcare   811.78   773.25     32.93
ets         Tech        1262.04  1144.77     14.51
middle_out  Tech        1262.04  1144.77     14.51
top_down    Tech        1542.90  1409.08     17.84
bottom_up   Tech        1806.19  1667.41     21.16

Interpretation

Middle-out was the best-performing method at the total portfolio level, with an RMSE of 819.62, MAE of 676.24, and MAPE of 6.52%. Top-down and the base ETS forecasts were identical at the total level (RMSE 826.72, MAPE 6.69%). This is by construction: top-down keeps the aggregate ETS forecast as the total and uses the historical proportions only to split it among the series below. Bottom-up was the clear worst performer with an RMSE of 1310.99 and MAPE of 11.08%, roughly 60 to 70% more error than the other methods.
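
The relative gaps can be read straight off the total-level table. The short calculation below uses the RMSE values reported there and expresses each as a multiple of the best method; the variable names are only for illustration.

```r
# Total-level RMSE values copied from the accuracy table
rmse_total <- c(middle_out = 819.62, top_down = 826.72, bottom_up = 1310.99)

# Each method's RMSE as a multiple of the lowest RMSE
rmse_ratio <- round(rmse_total / min(rmse_total), 2)
```

On this measure top-down is within about 1% of middle-out, while bottom-up carries about 1.6 times the error.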

The sector-level results explain why. Middle-out anchors at the sector, so its sector forecasts are identical to the base ETS forecasts — both show RMSE of 525.18 for Healthcare and 1262.04 for Tech. Top-down performed poorly at the sector level (Healthcare RMSE 811.78, Tech RMSE 1542.90) because it distributes the total using fixed historical proportions and cannot capture the divergence between Tech’s strong 2023 recovery and Healthcare’s relative flatness. Bottom-up was worst for Tech specifically (RMSE 1806.19), where AAPL and MSFT had volatile individual trajectories that compounded when summed.

The takeaway is that the sector level carried the most reliable signal for this portfolio. Middle-out benefited from that directly by anchoring there, while top-down paid a penalty for ignoring sector-specific dynamics and bottom-up paid a penalty for relying entirely on the noisiest level of the hierarchy.


Data sourced from Yahoo Finance via quantmod. All values in USD. Initial investment of $1,000 per stock at January 2019 adjusted close.