Module 10 Discussion — Hierarchical Time Series & Mini Mutual Fund

Author

Aryamani Boruah

Part I: Hierarchical vs Grouped Time Series

HTS vs GTS

Hierarchical and grouped time series are both ways of organizing data that can be summed or aggregated, but the key difference is how the structure is defined. In a hierarchical series, there is exactly one path from the bottom to the top. Country rolls up to continent, continent rolls up to global, and each series has a single parent. In a grouped series, the same data can be sliced along multiple dimensions at once. The tourism example from the course notes makes this clear: you can aggregate by state or by purpose, and both are valid at the same time. That flexibility is what makes grouped structures harder to reconcile, because forecasts have to satisfy multiple summation constraints simultaneously rather than just one tree.
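
To make the distinction concrete, here is a minimal base R sketch with made-up numbers. The state and purpose labels, the country names, and every value are hypothetical and are not taken from the course data. It rolls the same four bottom-level cells up along two different dimensions and checks that both sets of subtotals reach the same total, which is the pair of constraints a grouped reconciliation has to satisfy at once.

```r
# Hypothetical bottom-level series: one value per state x purpose cell
trips <- data.frame(
  state   = c("NSW", "NSW", "VIC", "VIC"),
  purpose = c("Business", "Holiday", "Business", "Holiday"),
  value   = c(10, 30, 5, 15)
)

# Grouped structure: two equally valid ways to roll the same cells up
by_state   <- tapply(trips$value, trips$state, sum)    # NSW = 40, VIC = 20
by_purpose <- tapply(trips$value, trips$purpose, sum)  # Business = 15, Holiday = 45
total      <- sum(trips$value)                         # 60

# Both sets of subtotals must add to the same total
stopifnot(sum(by_state) == total, sum(by_purpose) == total)

# Hierarchical structure: every series has exactly one parent,
# so a simple child -> parent lookup is enough to describe it
parent <- c(France = "Europe", Japan = "Asia", Europe = "Global", Asia = "Global")
stopifnot(!anyDuplicated(names(parent)))
```

In the grouped case there is no single child-to-parent lookup, because each cell belongs to a state and to a purpose at the same time.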

Forecast Methods vs Reconciliation Methods

Forecast methods and reconciliation methods are separate steps that serve different purposes. A forecast method — ETS, ARIMA, or similar — generates predictions for a single series. Reconciliation comes after the forecasting step and adjusts predictions across all levels of the hierarchy so they are internally consistent. Without reconciliation, independently fitted models at each level will almost never add up correctly, resulting in contradictory numbers at different levels of the structure.
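
A small base R sketch of why independently produced forecasts fail to add up. The numbers are hypothetical, and the historical median stands in for a real forecast method such as ETS purely because it keeps the arithmetic visible; it is not a method used in this analysis.

```r
# Hypothetical history for two bottom-level series and their total
a     <- c(1, 2, 9)
b     <- c(9, 2, 1)
total <- a + b                 # 10, 4, 10

# "Base forecasts": the historical median of each series, fitted independently
f_a     <- median(a)           # 2
f_b     <- median(b)           # 2
f_total <- median(total)       # 10

# Incoherent: the bottom-level forecasts do not add up to the total forecast
gap <- f_total - (f_a + f_b)   # 6

# One simple reconciliation (bottom-up): replace the total with the sum
f_total_bu <- f_a + f_b        # 4
```

The history itself is perfectly coherent; the gap appears only because each series was forecast on its own. Reconciliation is the step that removes that gap.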

The Four Reconciliation Methods

The four reconciliation methods differ in where they anchor the forecast and how much information they draw on.

Top-down starts at the aggregate level and pushes forecasts downward using historical proportions. It is simple, stable, and computationally cheap, but it tends to miss local patterns at the bottom level since all the signal comes from the total. When bottom-level series are unreliable or highly volatile, this is often the safest starting point.
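
The proportional split can be sketched in a few lines of base R. The history and the total forecast below are hypothetical, and the rule shown is the average of historical shares; I am assuming this is the idea behind an "average proportions" option, not reproducing any package's internals.

```r
# Hypothetical history: rows are months, columns are bottom-level series
y_hist  <- cbind(A = c(20, 30, 40), B = c(80, 70, 60))
y_total <- rowSums(y_hist)               # 100, 100, 100

# Average historical proportions: mean over time of each series' share
props <- colMeans(y_hist / y_total)      # A = 0.3, B = 0.7

# Hypothetical forecast for the total, from a model fitted to the total only
fc_total <- 120

# Top-down: split the total forecast using the proportions
fc_bottom <- fc_total * props            # A = 36, B = 84
```

Series A's share was rising over the history (0.2, 0.3, 0.4), but the split uses only the average, which is the sense in which top-down can miss patterns at the bottom level.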

Bottom-up goes the opposite direction: models are fit at the most disaggregated level and summed upward. This works well when individual series are reliable and carry meaningful signal, but bottom-level series are often the noisiest and hardest to model, and every bottom-level forecast error is carried straight into the aggregates. It is the default recommendation when confidence in detailed-level data is high.
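
Bottom-up aggregation is commonly written with a summing matrix that maps the bottom-level series to every level of the hierarchy. The sketch below uses a generic two-group hierarchy with hypothetical names and forecast values.

```r
# Summing matrix for Total -> {A, B} -> {A1, A2, B1, B2}
# Each row is one series in the hierarchy; each column is a bottom-level series
S <- rbind(
  c(1, 1, 1, 1),   # Total
  c(1, 1, 0, 0),   # A
  c(0, 0, 1, 1),   # B
  diag(4)          # the bottom-level series themselves
)
bottom_names <- c("A1", "A2", "B1", "B2")
dimnames(S) <- list(c("Total", "A", "B", bottom_names), bottom_names)

# Hypothetical bottom-level forecasts
fc_bottom <- c(A1 = 10, A2 = 20, B1 = 5, B2 = 15)

# Bottom-up: every higher-level forecast is a sum of bottom-level forecasts
fc_all <- drop(S %*% fc_bottom)
fc_all[c("Total", "A", "B")]   # 50, 30, 20
```

Nothing is estimated at the upper levels, so any error in the bottom-level forecasts appears unchanged in the sums.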

Middle-out picks an intermediate level as the anchor, forecasts there, then aggregates up and disaggregates down. It is a practical compromise when the middle level — often a business-meaningful grouping like sectors or regions — carries the most stable and interpretable signal. It avoids the full volatility of the bottom while still capturing more granularity than the total alone.
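
A minimal base R sketch of the two directions, assuming the group-level forecasts and the within-group shares are already available. All names and numbers are hypothetical.

```r
# Hypothetical forecasts at the middle (group) level
fc_mid <- c(A = 30, B = 20)

# Upward step: the total is the sum of the middle-level forecasts
fc_total <- sum(fc_mid)                  # 50

# Downward step: split each group with hypothetical historical shares
share <- list(
  A = c(A1 = 0.40, A2 = 0.60),
  B = c(B1 = 0.25, B2 = 0.75)
)
fc_bottom <- unlist(lapply(names(fc_mid), function(g) fc_mid[[g]] * share[[g]]))
fc_bottom                                # A1 = 12, A2 = 18, B1 = 5, B2 = 15
```

The middle-level forecasts themselves are left untouched; only the levels above and below are derived from them.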

MinT (Minimum Trace) is the most theoretically rigorous method. It takes base forecasts from every level of the hierarchy, each fitted independently, and combines them using the covariance matrix of the base forecast errors. The resulting forecasts are coherent and, among unbiased reconciliations, minimize the sum of the reconciled forecast error variances (the trace of their covariance matrix). In large hierarchies where accuracy matters, MinT typically outperforms the others. However, it requires a well-estimated error covariance matrix and gets computationally expensive as the hierarchy grows. In small datasets, that estimation can be unstable, and simpler methods sometimes outperform it in practice.
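
The combination step has a closed form: with summing matrix S, base forecasts y_hat, and base forecast error covariance W, the reconciled forecasts are S (S' W^-1 S)^-1 S' W^-1 y_hat. The base R sketch below applies that formula to a three-series toy hierarchy. The base forecasts and the diagonal W are hypothetical; a real MinT fit would estimate W from in-sample residuals, which is the fragile part with short histories.

```r
# Toy hierarchy: Total = A + B
S <- rbind(c(1, 1), diag(2))

# Hypothetical, incoherent base forecasts for Total, A, B (40 + 50 != 100)
y_hat <- c(100, 40, 50)

# Assumed error covariance of the base forecasts (hypothetical, diagonal):
# the Total forecast is treated as less precise than A and B
W    <- diag(c(4, 1, 1))
Winv <- solve(W)

# Trace-minimizing combination weights and reconciled forecasts
G       <- solve(t(S) %*% Winv %*% S) %*% t(S) %*% Winv
y_tilde <- drop(S %*% G %*% y_hat)

# Coherence check: the reconciled Total equals reconciled A + B
stopifnot(isTRUE(all.equal(y_tilde[1], y_tilde[2] + y_tilde[3])))
```

Every base forecast is adjusted, and the series with the largest assumed error variance moves the most. Top-down and bottom-up, by contrast, trust a single level completely.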


Part II: Mini Mutual Fund — Option II

Setup

For this analysis I built a four-stock portfolio spanning two sectors. The technology sector includes Apple (AAPL) and Microsoft (MSFT), and the healthcare sector includes Johnson & Johnson (JNJ) and Pfizer (PFE). These are large, liquid companies with five full years of reliable data, and the two-stocks-per-sector structure creates a clean three-level hierarchy:

Total Portfolio
├── Tech      (AAPL + MSFT)
└── Healthcare (JNJ  + PFE)

Each stock received a $1,000 initial investment at its first monthly observation, the end-of-January 2019 adjusted close, normalizing starting values across positions. The data spans January 2019 through December 2023, split into a four-year training set (2019–2022) and a one-year test set (2023).

Libraries

library(tidyverse)
library(tsibble)
library(fable)
library(fabletools)
library(quantmod)
library(scales)

library(kableExtra)


Data Download & Portfolio Construction

tickers <- c("AAPL", "MSFT", "JNJ", "PFE")

getSymbols(tickers,
           src         = "yahoo",
           from        = "2019-01-01",
           to          = "2023-12-31",
           auto.assign = TRUE)
# Helper: monthly adjusted close for one ticker
monthly_adj <- function(sym) {
  x <- get(sym)
  m <- to.monthly(x, indexAt = "lastof", OHLC = FALSE)
  data.frame(
    date  = as.Date(index(Ad(m))),
    price = as.numeric(Ad(m))
  ) |> rename(!!sym := price)
}

prices <- monthly_adj("AAPL") |>
  left_join(monthly_adj("MSFT"), by = "date") |>
  left_join(monthly_adj("JNJ"),  by = "date") |>
  left_join(monthly_adj("PFE"),  by = "date")
# Equal $1,000 investment in each stock at first observation
initial    <- 1000
shares     <- sapply(tickers, \(t) initial / prices[[t]][1])

portfolio <- prices |>
  mutate(
    val_AAPL = AAPL * shares["AAPL"],
    val_MSFT = MSFT * shares["MSFT"],
    val_JNJ  = JNJ  * shares["JNJ"],
    val_PFE  = PFE  * shares["PFE"],
    month    = yearmonth(date)
  )

Hierarchical tsibble

# Long format at stock level
bottom <- portfolio |>
  select(month, val_AAPL, val_MSFT, val_JNJ, val_PFE) |>
  pivot_longer(-month, names_to = "stock", values_to = "value") |>
  mutate(
    stock  = str_remove(stock, "val_"),
    sector = if_else(stock %in% c("AAPL", "MSFT"), "Tech", "Healthcare")
  )

# Hierarchical tsibble with aggregate_key
hts <- bottom |>
  as_tsibble(index = month, key = c(sector, stock)) |>
  aggregate_key(sector / stock, value = sum(value))

# Train / test split
train <- hts |> filter(month <= yearmonth("2022 Dec"))
test  <- hts |> filter(month >  yearmonth("2022 Dec"))

ETS Fit on Training Data

fit <- train |>
  model(ets = ETS(value))

Fitted Values — Total Portfolio (Training Period)

fitted_total  <- fit |>
  augment() |>
  filter(is_aggregated(sector), is_aggregated(stock))

actual_total  <- hts |>
  filter(is_aggregated(sector), is_aggregated(stock))

ggplot() +
  geom_line(
    data = actual_total |> filter(month <= yearmonth("2022 Dec")),
    aes(x = month, y = value, color = "Actual"),
    linewidth = 0.95
  ) +
  geom_line(
    data = fitted_total |> filter(.model == "ets"),
    aes(x = month, y = .fitted, color = "ETS Fitted"),
    linetype = "dashed", linewidth = 0.85
  ) +
  scale_color_manual(values = c("Actual" = "#1f4e79", "ETS Fitted" = "#e36b1e")) +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "Total Portfolio — Actual vs ETS Fitted (Training Period)",
    subtitle = "Jan 2019 – Dec 2022 | $1,000 initial investment per stock",
    x = NULL, y = "Portfolio Value (USD)", color = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))

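ETS fitted values on the training period. The fitted line tracks the portfolio’s trend and the 2022 drawdown reasonably well, which is expected because fitted values are one-step-ahead in-sample predictions; it says little about accuracy 12 months out.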

Reconciliation: Top-Down, Middle-Out, Bottom-Up

reconciled <- fit |>
  reconcile(
    top_down   = top_down(ets, method = "average_proportions"),
    middle_out = middle_out(ets),
    bottom_up  = bottom_up(ets)
  )

fc <- reconciled |> forecast(h = 12)

Year 5 Forecasts — Total Portfolio

fc_total      <- fc |> filter(is_aggregated(sector), is_aggregated(stock))
actual_test   <- actual_total |> filter(month > yearmonth("2022 Dec"))
actual_tail   <- actual_total |> filter(month >= yearmonth("2021 Jan"),
                                         month <= yearmonth("2022 Dec"))

ggplot() +
  geom_line(
    data = actual_tail,
    aes(x = month, y = value),
    color = "#1f4e79", linewidth = 0.9
  ) +
  geom_line(
    data = actual_test,
    aes(x = month, y = value, color = "Actual 2023"),
    linewidth = 1.1
  ) +
  geom_line(
    data = fc_total,
    aes(x = month, y = .mean, color = .model),
    linetype = "dashed", linewidth = 0.9
  ) +
  geom_vline(
    xintercept = as.numeric(yearmonth("2022 Dec")),
    linetype = "dotted", color = "grey50"
  ) +
  scale_color_manual(values = c(
    "Actual 2023" = "#1f4e79",
    "ets"         = "grey40",   # base (unreconciled) ETS forecasts kept by reconcile()
    "top_down"    = "#c0392b",
    "middle_out"  = "#27ae60",
    "bottom_up"   = "#8e44ad"
  )) +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "Year 5 Forecast vs Actual — Total Portfolio",
    subtitle = "Reconciliation methods compared | Test period: Jan–Dec 2023",
    x = NULL, y = "Portfolio Value (USD)", color = "Method"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))

All three methods are compared against the actual 2023 portfolio value. The 2021–2022 history is drawn as a solid line to give context for the year-5 predictions.

Year 5 Forecasts — By Sector

fc_sector     <- fc |> filter(!is_aggregated(sector), is_aggregated(stock))
actual_sector <- hts |> filter(!is_aggregated(sector), is_aggregated(stock))

ggplot() +
  geom_line(
    data = actual_sector |> filter(month >= yearmonth("2021 Jan"),
                                    month <= yearmonth("2022 Dec")),
    aes(x = month, y = value),
    color = "#1f4e79", linewidth = 0.9
  ) +
  geom_line(
    data = actual_sector |> filter(month > yearmonth("2022 Dec")),
    aes(x = month, y = value),
    color = "#1f4e79", linewidth = 1.1
  ) +
  geom_line(
    data = fc_sector,
    aes(x = month, y = .mean, color = .model),
    linetype = "dashed", linewidth = 0.85
  ) +
  geom_vline(
    xintercept = as.numeric(yearmonth("2022 Dec")),
    linetype = "dotted", color = "grey50"
  ) +
  facet_wrap(~sector, scales = "free_y", ncol = 1) +
  scale_color_manual(values = c(
    "ets"        = "grey40",   # base (unreconciled) ETS forecasts kept by reconcile()
    "top_down"   = "#c0392b",
    "middle_out" = "#27ae60",
    "bottom_up"  = "#8e44ad"
  )) +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "Year 5 Forecast vs Actual — By Sector",
    subtitle = "Tech (AAPL + MSFT) vs Healthcare (JNJ + PFE)",
    x = NULL, y = "Sector Value (USD)", color = "Method"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "bottom",
    plot.title  = element_text(face = "bold"),
    strip.text  = element_text(face = "bold", size = 11)
  )

Sector-level forecasts highlight the divergence between Tech’s 2023 recovery and Healthcare’s relative flatness — a difference top-down misses by design.

Year 5 Forecasts — Individual Stocks

fc_stock <- fc |> filter(!is_aggregated(stock))

ggplot() +
  geom_line(
    data = bottom |> filter(month >= yearmonth("2021 Jan")),
    aes(x = month, y = value),
    color = "#1f4e79", linewidth = 0.85
  ) +
  geom_line(
    data = fc_stock,
    aes(x = month, y = .mean, color = .model),
    linetype = "dashed", linewidth = 0.8
  ) +
  geom_vline(
    xintercept = as.numeric(yearmonth("2022 Dec")),
    linetype = "dotted", color = "grey50"
  ) +
  facet_wrap(~stock, scales = "free_y", ncol = 2) +
  scale_color_manual(values = c(
    "ets"        = "grey40",   # base (unreconciled) ETS forecasts kept by reconcile()
    "top_down"   = "#c0392b",
    "middle_out" = "#27ae60",
    "bottom_up"  = "#8e44ad"
  )) +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title    = "Year 5 Forecast vs Actual — Individual Stocks",
    subtitle = "Bottom-level disaggregated forecasts",
    x = NULL, y = "Stock Value (USD)", color = "Method"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom", plot.title = element_text(face = "bold"))

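Bottom-level disaggregated forecasts for each stock. Bottom-up passes these stock-level ETS forecasts straight up the hierarchy, so any miss here carries into the sector and total forecasts. In the sector accuracy table the bottom-up gap is largest in Tech (RMSE 1806.19 vs 1262.04 for sector-level ETS) rather than Healthcare (539.16 vs 525.18), so the AAPL and MSFT forecasts are where bottom-up loses the most ground.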

Accuracy Evaluation

Total Portfolio Level

acc_total <- fc |>
  filter(is_aggregated(sector), is_aggregated(stock)) |>
  accuracy(test |> filter(is_aggregated(sector), is_aggregated(stock)))

acc_total |>
  select(Method = .model, RMSE, MAE, MAPE) |>
  mutate(across(where(is.numeric), \(x) round(x, 2))) |>
  arrange(RMSE) |>
  kable(caption = "Forecast Accuracy — Total Portfolio (Year 5)") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  row_spec(1, bold = TRUE, background = "#d4edda")
Forecast Accuracy — Total Portfolio (Year 5)
Method         RMSE      MAE   MAPE (%)
middle_out   819.62   676.24       6.52
ets          826.72   692.83       6.69
top_down     826.72   692.83       6.69
bottom_up   1310.99  1149.24      11.08
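
For readers who want to check the columns by hand, the three metrics follow the standard definitions sketched below in base R. The actual and forecast values are hypothetical and are not the portfolio data; I am assuming the table's MAPE is on a percent scale, which matches the magnitudes shown.

```r
# Hypothetical actual values and point forecasts over a four-month test window
actual   <- c(100, 110, 120, 130)
forecast <- c( 90, 115, 120, 140)

err  <- actual - forecast                  # 10, -5, 0, -10

rmse <- sqrt(mean(err^2))                  # root mean squared error: 7.5
mae  <- mean(abs(err))                     # mean absolute error: 6.25
mape <- mean(abs(err / actual)) * 100      # mean absolute percentage error, in percent
```

RMSE penalizes large misses more heavily than MAE, so a method with one bad month can rank worse on RMSE than on MAE. MAPE is scale-free, which makes it the easiest column to compare across sectors of different size.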

Sector Level

acc_sector <- fc |>
  filter(!is_aggregated(sector), is_aggregated(stock)) |>
  accuracy(test |> filter(!is_aggregated(sector), is_aggregated(stock)))

acc_sector |>
  select(Method = .model, Sector = sector, RMSE, MAE, MAPE) |>
  mutate(across(where(is.numeric), \(x) round(x, 2))) |>
  arrange(Sector, RMSE) |>
  kable(caption = "Forecast Accuracy — Sector Level (Year 5)") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Forecast Accuracy — Sector Level (Year 5)
Method       Sector          RMSE      MAE   MAPE (%)
ets          Healthcare    525.18   507.97      21.57
middle_out   Healthcare    525.18   507.97      21.57
bottom_up    Healthcare    539.16   522.41      22.18
top_down     Healthcare    811.78   773.25      32.93
ets          Tech         1262.04  1144.77      14.51
middle_out   Tech         1262.04  1144.77      14.51
top_down     Tech         1542.90  1409.08      17.84
bottom_up    Tech         1806.19  1667.41      21.16

Interpretation

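At the total portfolio level, middle-out had the lowest error for 2023 (RMSE 819.62, MAPE 6.52%), top-down was essentially tied (RMSE 826.72, MAPE 6.69%), and bottom-up was clearly worst (RMSE 1310.99, MAPE 11.08%). Top-down matches the base ETS row exactly because it leaves the total forecast unchanged. A few patterns help explain these results:

Why bottom-up struggles here: AAPL and especially PFE had meaningful idiosyncratic movements in 2023 that are hard for a smooth ETS model to anticipate 12 months out. Bottom-up relies entirely on the stock-level forecasts, so those misses carry straight into the sector and total forecasts. It was the worst of the three at the total level, and the sector table shows most of the gap comes from Tech (RMSE 1806.19 vs 1262.04 for sector-level ETS) rather than Healthcare (539.16 vs 525.18).

Why top-down is competitive at the top: The total portfolio series is the smoothest series in the hierarchy. It absorbs individual stock volatility through diversification, so ETS at the aggregate level has a cleaner signal to work with. The cost shows up below the top: top-down distributes the total using historical proportions rather than sector-specific dynamics, so it cannot capture the divergence between Tech’s 2023 recovery and Healthcare’s relative flatness. At the sector level it has the highest RMSE in Healthcare (811.78) and the second highest in Tech (1542.90).

Why middle-out wins here: The sector level carries more structure than the total but less noise than individual stocks. Tech and Healthcare moved quite differently in 2023, and a model anchored at the sector level can capture that split. Middle-out keeps the sector-level ETS forecasts unchanged, which is why its sector rows match the base ETS rows, and sums them to get the total. Its edge over top-down at the total level is small (RMSE 819.62 vs 826.72), so this is best read as a near tie at the top and a clear win at the sector level. When the middle level is the most business-meaningful layer of the hierarchy, which sectors often are, this method tends to strike the right balance between stability and detail.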

On MinT: This analysis covers only the three methods specified in the assignment. Based on the theory, MinT would likely improve on all three by leveraging the full error covariance structure across the hierarchy. That said, with only 48 months of training data the covariance estimation can be unstable, and the results from the peer analyses suggest simpler structure-based methods sometimes remain competitive on small portfolios like this one.


Data sourced from Yahoo Finance via quantmod. All values in USD. Initial investment of $1,000 per stock at January 2019 adjusted close.