Module 10 Discussion — Option 2: Hierarchical Forecasting of a Tiny Mutual Fund

Author

Adrian Aziza

Published

May 14, 2026

Part I — HTS vs GTS, Forecast vs Reconciliation, and the Four Methods

A hierarchical time series is a strict tree: each series has exactly one parent and aggregation flows up a single unambiguous path (Country → State → City; or for what we’re doing below, Mutual Fund → Sector → Stock). A grouped time series allows the same observations to be classified along multiple independent dimensions, so more than one valid aggregation path leads to the same total. The Australian tourism case is canonical: visits cross-classify by State and by Purpose, so Total = sum across states = sum across purposes — two simultaneous constraints over the same data. The structural consequence isn’t bookkeeping. Under GTS, top-down and middle-out fail because there is no unique parent-to-child path and no single meaningful “middle” level, so only bottom-up and MinT remain valid. HTS has fewer constraints and admits all four reconciliation methods; GTS has overlapping constraints and rules half of them out.
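The difference between the two structures can be made concrete with a summing matrix, in which each row records which bottom-level series add up to a given aggregate. The sketch below is illustrative only: the object names (`S_hts`, `S_gts`, `b`), the two-state, two-purpose grouping, and the numbers are assumptions made for the example, not data from this post.

```r
# Hierarchical case: the fund tree used later in this post.
leaves <- c("AAPL", "MSFT", "JNJ", "PFE")
S_hts <- rbind(
  c(1, 1, 1, 1),  # Total
  c(1, 1, 0, 0),  # Tech
  c(0, 0, 1, 1),  # Healthcare
  diag(4)         # each stock maps to itself
)
dimnames(S_hts) <- list(c("Total", "Tech", "Healthcare", leaves), leaves)

# Grouped case: a hypothetical 2 x 2 cross-classification (State x Purpose).
cells <- c("NSW.Holiday", "NSW.Business", "VIC.Holiday", "VIC.Business")
S_gts <- rbind(
  c(1, 1, 1, 1),  # Total
  c(1, 1, 0, 0),  # State: NSW
  c(0, 0, 1, 1),  # State: VIC
  c(1, 0, 1, 0),  # Purpose: Holiday
  c(0, 1, 0, 1),  # Purpose: Business
  diag(4)
)
dimnames(S_gts) <- list(
  c("Total", "NSW", "VIC", "Holiday", "Business", cells), cells
)

b <- c(10, 20, 30, 40)          # made-up bottom-level values
y_gts <- drop(S_gts %*% b)      # every series in the grouped structure

# Two different aggregation paths reach the same total.
by_state   <- y_gts[["NSW"]] + y_gts[["VIC"]]
by_purpose <- y_gts[["Holiday"]] + y_gts[["Business"]]
c(total = y_gts[["Total"]], by_state = by_state, by_purpose = by_purpose)
```

In the hierarchical matrix every stock sits under exactly one sector row. In the grouped matrix every cell sits under one state row and one purpose row, which is why there is no single parent to disaggregate through.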

Forecast methods and reconciliation methods do different jobs and stack on top of each other. A forecast method (ARIMA, ETS, Prophet, regression) produces an unreconciled forecast for each series independently. Because each model was fit in isolation, the resulting forecasts generally don’t add up: the children won’t sum to their parent, and the leaves won’t sum to the total. A reconciliation method takes those independent forecasts and adjusts them so the hierarchy is internally coherent. Reconciliation can also improve accuracy by borrowing strength across levels, but its first job is coherence. Without base forecasts there is nothing to reconcile; without reconciliation, nothing forces the levels to agree, and each level’s forecast stands alone.
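A small check makes the incoherence problem visible. The numbers below are invented base forecasts for the seven series in the fund tree (an assumption for illustration, not output from any fitted model); the code measures how far each parent is from the sum of its children.

```r
# Hypothetical base forecasts, one per series, each from its own model.
y_hat <- c(Total = 100, Tech = 62, Healthcare = 41,
           AAPL = 30, MSFT = 33, JNJ = 21, PFE = 19)

# Coherence gap: parent forecast minus the sum of its children.
gaps <- c(
  Total      = y_hat[["Total"]] - (y_hat[["Tech"]] + y_hat[["Healthcare"]]),
  Tech       = y_hat[["Tech"]] - (y_hat[["AAPL"]] + y_hat[["MSFT"]]),
  Healthcare = y_hat[["Healthcare"]] - (y_hat[["JNJ"]] + y_hat[["PFE"]])
)
gaps

# A forecast set is coherent only if every gap is zero.
is_coherent <- all(gaps == 0)
is_coherent
```

Any reconciliation method, whichever one is chosen, has to drive all three gaps to zero.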

Top-Down forecasts only the total, then disaggregates downward using a fixed set of proportions (historical proportions in the classic version). Always coherent and cheap, and the top forecast benefits from a smoother series, but the disaggregated forecasts inherit whichever proportion rule was picked and lose series-specific signal. It is an a priori commitment that signal lives at the top.

Bottom-Up forecasts every leaf and sums upward. Always coherent, preserves each leaf’s dynamics, but aggregating noisy bottom forecasts produces a noisy total — errors accumulate. The opposite a priori commitment: that signal lives at the leaves.

Middle-Out forecasts an intermediate level, then aggregates upward and disaggregates downward. A compromise that often wins when the middle level (sectors, regions) is the cleanest — but it still pre-commits to where the signal is, just at a less extreme position.
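The three single-level methods above can all be written as "pick some forecasts, map them to the leaves, then sum back up". A minimal sketch under stated assumptions: `S` is the summing matrix for the fund tree, each `G_*` matrix maps the seven base forecasts to the four leaves, and the base forecasts and proportions (`y_hat`, `p_top`, the 50/50 sector splits) are made-up values chosen only for illustration.

```r
series <- c("Total", "Tech", "Healthcare", "AAPL", "MSFT", "JNJ", "PFE")
S <- rbind(c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 0, 1, 1), diag(4))
rownames(S) <- series

# Hypothetical, deliberately incoherent base forecasts.
y_hat <- setNames(c(100, 62, 41, 30, 33, 21, 19), series)

# Bottom-up: keep the four leaf forecasts, ignore everything above them.
G_bu <- cbind(matrix(0, 4, 3), diag(4))

# Top-down: split the Total forecast by fixed proportions (assumed values).
p_top <- c(0.375, 0.25, 0.25, 0.125)
G_td <- cbind(p_top, matrix(0, 4, 6))

# Middle-out: split each sector forecast across its stocks (assumed 50/50).
G_mo <- matrix(0, 4, 7)
G_mo[1:2, 2] <- 0.5   # Tech -> AAPL, MSFT
G_mo[3:4, 3] <- 0.5   # Healthcare -> JNJ, PFE

# Reconciled forecasts for all seven series: S %*% G %*% y_hat.
apply_g <- function(G) drop(S %*% G %*% y_hat)
reconciled <- cbind(base = y_hat, bu = apply_g(G_bu),
                    td = apply_g(G_td), mo = apply_g(G_mo))
reconciled
```

Each method leaves exactly one level untouched: bottom-up keeps the leaf forecasts, top-down keeps the Total forecast, and middle-out keeps the sector forecasts. Every other level is overwritten by the sums or splits.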

MinT (Optimal Reconciliation) is structurally different. It forecasts at every level, then combines those forecasts with weights derived from the covariance of the in-sample base forecast errors. “Minimum trace” means minimizing the sum of the diagonal of the reconciled forecast error covariance matrix, that is, the total forecast-error variance summed across the hierarchy. Wickramasuriya, Athanasopoulos, and Hyndman (2019) proved this has a closed-form solution under unbiased reconciliation; the shrinkage variant (mint_shrink) stabilizes the covariance estimate when the number of series is large relative to the number of observations. The mechanism: when errors across levels aren’t perfectly correlated, the weighted combination can have lower variance than any single level’s forecast. This is the same principle as portfolio diversification, applied to forecast errors instead of returns. What distinguishes MinT from the other three is that it doesn’t commit a priori to where the signal lives. The weights are inferred from the data after the fact, not imposed before it. That is why it is preferred under GTS and why it is the usual choice when accuracy matters more than runtime.
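The closed form can be sketched in a few lines of base R. With summing matrix S, base forecasts y_hat, and base-error covariance W, the reconciled forecasts are S (S' W^-1 S)^-1 S' W^-1 y_hat. The helper name `mint_sketch`, the base forecasts, and the diagonal `W` below are assumptions for illustration; in particular, a real MinT fit estimates W from in-sample residuals (with shrinkage under mint_shrink) rather than taking made-up variances.

```r
series <- c("Total", "Tech", "Healthcare", "AAPL", "MSFT", "JNJ", "PFE")
S <- rbind(c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 0, 1, 1), diag(4))
rownames(S) <- series

# Hypothetical base forecasts and error variances (not estimated from data).
y_hat <- c(100, 62, 41, 30, 33, 21, 19)
W <- diag(c(16, 9, 9, 4, 4, 4, 4))   # diagonal W is a simplifying assumption

# Illustrative helper, not a library function.
mint_sketch <- function(S, W, y_hat) {
  W_inv <- solve(W)
  # G maps all base forecasts to reconciled leaf forecasts.
  G <- solve(t(S) %*% W_inv %*% S, t(S) %*% W_inv)
  drop(S %*% G %*% y_hat)
}

y_tilde <- mint_sketch(S, W, y_hat)
round(y_tilde, 2)

# The output is coherent by construction.
all.equal(y_tilde[["Total"]], y_tilde[["Tech"]] + y_tilde[["Healthcare"]])
```

Unlike the single-level methods, every base forecast gets a nonzero say in the result, and series with larger assumed error variance are trusted less. Forecasts that are already coherent pass through unchanged.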

The standard analogy for HTS is geographic: Country → State → City. For a mutual fund the analogy inverts. The companies are the countries, because each company is itself a foundational economic unit with its own internal dynamics. Sectors are the regions, aggregating companies with shared structural features. The mutual fund is the city — the small, constructed, tractable thing we’re computing. Above all of it sits the actual economy, which isn’t in the model and isn’t supposed to be. The fund is what makes the structure tractable.

Part II — Option 2: A Tiny Mutual Fund

Setup

We build a four-stock equal-dollar portfolio across two sectors:

Technology: AAPL, MSFT
Healthcare: JNJ, PFE

We pull daily adjusted closing prices covering a little over five years, collapse them to month-end values, normalize each stock to an equal starting dollar investment (so position values are directly comparable across stocks), and treat the portfolio as a strict hierarchical time series:

Total Portfolio
├── Tech
│   ├── AAPL
│   └── MSFT
└── Healthcare
    ├── JNJ
    └── PFE

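We train on the first 50 months (March 2021 through April 2025; the download starts one month before the five-year mark) and test on the last 12 months (May 2025 through April 2026, referred to as Year 5), forecasting the total portfolio value with top-down, middle-out, and bottom-up reconciliation (MinT included as a bonus baseline).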

library(tidyverse)
library(tidyquant)
library(fable)
library(fabletools)
library(tsibble)
library(lubridate)
library(feasts)
library(knitr)

theme_set(theme_minimal(base_size = 11))
set.seed(42)

1. Data acquisition

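Daily adjusted closes are downloaded and then collapsed to month-end values. The window is fixed by end_date (hard-coded to April 30, 2026, the most recent completed month at publication) and starts five years and one month earlier; change end_date to move the window.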

tickers <- c("AAPL", "MSFT", "JNJ", "PFE")
sector_map <- c(AAPL = "Tech", MSFT = "Tech",
                JNJ  = "Healthcare", PFE = "Healthcare")

end_date   <- as.Date("2026-04-30")
start_date <- end_date - years(5) - months(1)  # extra month to anchor first-of-month indexing

prices_daily <- tq_get(tickers,
                       from = start_date,
                       to   = end_date,
                       get  = "stock.prices")

# Collapse to month-end adjusted closes
prices_monthly <- prices_daily %>%
  group_by(symbol) %>%
  tq_transmute(select     = adjusted,
               mutate_fun = to.monthly,
               indexAt    = "lastof") %>%
  ungroup() %>%
  mutate(yearmonth = yearmonth(date),
         sector    = sector_map[symbol])

# Normalize: each stock starts at $10,000 (equal dollar investment)
prices_norm <- prices_monthly %>%
  group_by(symbol) %>%
  arrange(yearmonth, .by_group = TRUE) %>%
  mutate(value = 10000 * adjusted / first(adjusted)) %>%
  ungroup() %>%
  select(symbol, sector, yearmonth, value)

head(prices_norm)

# A tibble: 6 × 4
  symbol sector yearmonth  value
  <chr>  <chr>      <mth>  <dbl>
1 AAPL   Tech    2021 Mar 10000 
2 AAPL   Tech    2021 Apr 10762.
3 AAPL   Tech    2021 May 10219.
4 AAPL   Tech    2021 Jun 11231.
5 AAPL   Tech    2021 Jul 11961.
6 AAPL   Tech    2021 Aug 12470.
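The normalization step can be checked on a toy example. The tickers and prices below are invented (an assumption for illustration, not market data); the point is only that dividing by the first price and multiplying by 10,000 turns each price path into the dollar value of a buy-and-hold position.

```r
# Made-up adjusted closes for two hypothetical tickers over three months.
toy_prices <- data.frame(
  symbol   = rep(c("AAA", "BBB"), each = 3),
  month    = rep(1:3, times = 2),
  adjusted = c(50, 55, 60, 200, 190, 210)
)

# Rescale within each ticker so the first observation is worth $10,000.
toy_prices$value <- ave(toy_prices$adjusted, toy_prices$symbol,
                        FUN = function(p) 10000 * p / p[1])

# Portfolio value per month is the sum of the position values.
toy_portfolio <- tapply(toy_prices$value, toy_prices$month, sum)
toy_prices
toy_portfolio
```

The positions are equal only at the start. Because nothing is rebalanced, the weights drift with relative performance, which is why the hierarchy sums dollar values rather than averaging returns.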

2. Exploratory plot

Visualize the four stocks (equal-dollar normalized) over the full five years.

prices_norm %>%
  ggplot(aes(x = yearmonth, y = value, color = symbol)) +
  geom_line(linewidth = 0.7) +
  facet_wrap(~ sector) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title    = "Equal-dollar position values ($10,000 initial each)",
       subtitle = "Monthly adjusted-close, normalized to $10,000 start",
       x = NULL, y = "Position value", color = NULL) +
  theme(legend.position = "bottom")

3. Build the hierarchical tsibble

aggregate_key(sector / symbol, value = sum(value)) creates the Total / Sector / Symbol nesting and aggregates accordingly.

portfolio_ts <- prices_norm %>%
  as_tsibble(key = c(sector, symbol), index = yearmonth) %>%
  aggregate_key(sector / symbol, value = sum(value))

portfolio_ts %>%
  filter(is_aggregated(symbol)) %>%
  head(10)

# A tsibble: 10 x 4 [1M]
# Key:       sector, symbol [1]

   yearmonth sector       symbol        value
       <mth> <chr*>       <chr*>        <dbl>
 1  2021 Mar <aggregated> <aggregated> 40000 
 2  2021 Apr <aggregated> <aggregated> 42028.
 3  2021 May <aggregated> <aggregated> 41991.
 4  2021 Jun <aggregated> <aggregated> 43750.
 5  2021 Jul <aggregated> <aggregated> 46657.
 6  2021 Aug <aggregated> <aggregated> 48948.
 7  2021 Sep <aggregated> <aggregated> 45673.
 8  2021 Oct <aggregated> <aggregated> 48762.
 9  2021 Nov <aggregated> <aggregated> 52614.
10  2021 Dec <aggregated> <aggregated> 56315.
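What the nested aggregation produces can be reproduced by hand for a single month. This is a rough base-R sketch with invented position values (an assumption for illustration); it does not use or replicate the tsibble machinery, only the arithmetic.

```r
# Hypothetical leaf-level position values for one month.
toy_leaf <- data.frame(
  sector = c("Tech", "Tech", "Healthcare", "Healthcare"),
  symbol = c("AAPL", "MSFT", "JNJ", "PFE"),
  value  = c(11000, 10500, 9800, 10200)
)

# Sector level: sum the stocks within each sector.
toy_sector <- aggregate(value ~ sector, data = toy_leaf, FUN = sum)

# Total level: sum everything.
toy_total <- sum(toy_leaf$value)

# One series per node in the tree: 4 stocks + 2 sectors + 1 total.
n_series <- nrow(toy_leaf) + nrow(toy_sector) + 1
toy_sector
c(total = toy_total, n_series = n_series)
```

The count of seven matches the seven keys that appear once models are fitted to every node of the hierarchy.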

Plot the three aggregation levels:

# ggplot can't auto-scale aggregated vectors (agg_vec) — format() them to
# character first, but compute the level labels while is_aggregated() still works.
hierarchy_plot_data <- portfolio_ts %>%
  as_tibble() %>%
  mutate(
    level = case_when(
      is_aggregated(sector) & is_aggregated(symbol) ~ "Total Portfolio",
      is_aggregated(symbol)                          ~ "Sector level",
      TRUE                                           ~ "Stock level"
    ),
    level      = factor(level, levels = c("Total Portfolio", "Sector level", "Stock level")),
    sector_lbl = format(sector),
    symbol_lbl = format(symbol)
  )

hierarchy_plot_data %>%
  ggplot(aes(x = yearmonth, y = value,
             group = interaction(sector_lbl, symbol_lbl),
             color = sector_lbl)) +
  geom_line(linewidth = 0.6, alpha = 0.85) +
  facet_wrap(~ level, scales = "free_y", ncol = 1) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "Three levels of the portfolio hierarchy",
       x = NULL, y = NULL, color = "Sector") +
  theme(legend.position = "bottom")

4. Train / test split

# Year 5 cutoff: last 12 months are test
cutoff <- max(portfolio_ts$yearmonth) - 11

train <- portfolio_ts %>% filter(yearmonth <  cutoff)
test  <- portfolio_ts %>% filter(yearmonth >= cutoff)

range(train$yearmonth)

<yearmonth[2]>
[1] "2021 Mar" "2025 Apr"

range(test$yearmonth)

<yearmonth[2]>
[1] "2025 May" "2026 Apr"
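The printed ranges can be sanity-checked by counting months. The sketch below uses only the endpoints shown in the output above (March 2021 and April 2026); the variable names are illustrative.

```r
# Monthly grid spanning the full data window shown in the output.
months_all <- seq(as.Date("2021-03-01"), as.Date("2026-04-01"), by = "month")

n_total <- length(months_all)
n_test  <- 12                    # last 12 months held out
n_train <- n_total - n_test

train_end_month  <- months_all[n_train]
test_start_month <- months_all[n_train + 1]

c(n_total = n_total, n_train = n_train, n_test = n_test)
format(c(train_end_month, test_start_month), "%Y %b")
```

Because the download begins a month before the five-year mark and March 2021 counts as an observation, the window holds 62 months, which leaves 50 for training once the final 12 are held out.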

5. Fit and reconcile

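Base model is ETS at each level; the four reconciliation strategies wrap it. top_down(method = "forecast_proportions") splits the total using proportions taken from the base forecasts at the lower levels rather than from historical shares. middle_out(split = 1) uses the sector level as the middle (one level below top).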

fits <- train %>%
  model(base = ETS(value)) %>%
  reconcile(
    bu   = bottom_up(base),
    td   = top_down(base, method = "forecast_proportions"),
    mo   = middle_out(base, split = 1),
    mint = min_trace(base, method = "mint_shrink")
  )

fits

# A mable: 7 x 7
# Key:     sector, symbol [7]
  sector       symbol               base bu           td           mo          
  <chr*>       <chr*>            <model> <model>      <model>      <model>     
1 Healthcare   JNJ          <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
2 Healthcare   PFE          <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
3 Healthcare   <aggregated> <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
4 Tech         AAPL         <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
5 Tech         MSFT         <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
6 Tech         <aggregated> <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
7 <aggregated> <aggregated> <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
# ℹ 1 more variable: mint <model>
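Every selected model in the table is ETS(A,N,N) or ETS(M,N,N): an error term and a level, with no trend and no seasonal component. Point forecasts from such a model are the last estimated level repeated at every horizon. A minimal sketch of that behavior follows, assuming a fixed smoothing weight and a simple initialization; the helper `ses_flat` and the input series are hypothetical, and a real ETS fit estimates both the weight and the initial level.

```r
# Simple exponential smoothing with a fixed alpha (illustrative helper).
ses_flat <- function(y, alpha, h) {
  level <- y[1]                        # naive initialization (an assumption)
  for (t in seq_along(y)[-1]) {
    level <- alpha * y[t] + (1 - alpha) * level
  }
  rep(level, h)                        # same value at every horizon
}

toy_y   <- c(100, 104, 109, 115, 122)  # made-up, steadily rising series
fc_flat <- ses_flat(toy_y, alpha = 0.5, h = 3)
fc_flat
```

Even on a steadily rising input the forecast is a horizontal line that sits below the last observation, so a model of this form cannot follow a sustained upward move in the test window.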

6. Forecast year 5

fc <- fits %>% forecast(h = 12)

7. Fitted values vs. training data (Total level)

fitted_total <- fits %>%
  augment() %>%
  filter(is_aggregated(sector), is_aggregated(symbol))

train_total <- train %>%
  filter(is_aggregated(sector), is_aggregated(symbol))

ggplot() +
  geom_line(data = train_total,
            aes(x = yearmonth, y = value),
            color = "black", linewidth = 0.8) +
  geom_line(data = fitted_total,
            aes(x = yearmonth, y = .fitted, color = .model),
            linewidth = 0.6, alpha = 0.85) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title    = "Fitted vs. actual on training data — Total Portfolio",
       subtitle = "Black = actual; colored = fitted by reconciliation method",
       x = NULL, y = "Portfolio value", color = "Method") +
  theme(legend.position = "bottom")

8. Forecasts vs. actuals on test period (Total level)

# Slice to the Total level for plotting
fc_total <- fc %>%
  filter(is_aggregated(sector), is_aggregated(symbol))

train_total <- train %>%
  filter(is_aggregated(sector), is_aggregated(symbol))

test_total <- test %>%
  filter(is_aggregated(sector), is_aggregated(symbol))

# Unpack the 80% prediction interval into plain numeric columns for ggplot
fc_total_pi <- fc_total %>%
  hilo(level = 80) %>%
  as_tibble() %>%
  mutate(
    lower = `80%`$lower,
    upper = `80%`$upper,
    plot_date = as.Date(yearmonth),
    .method = factor(
      .model,
      levels = c("base", "bu", "mint", "mo", "td"),
      labels = c(
        "Base / unreconciled",
        "Bottom-up",
        "MinT",
        "Middle-out",
        "Top-down"
      )
    )
  )

train_total_tbl <- as_tibble(train_total) %>%
  mutate(plot_date = as.Date(yearmonth))

test_total_tbl <- as_tibble(test_total) %>%
  mutate(plot_date = as.Date(yearmonth))

test_start <- min(test_total_tbl$plot_date)
test_end   <- max(test_total_tbl$plot_date)

context_start <- test_start - 275  # about nine months of training lead-in (Date arithmetic is in days)

train_context_tbl <- train_total_tbl %>%
  filter(plot_date >= context_start)

y_rng <- range(
  train_context_tbl$value,
  test_total_tbl$value,
  fc_total_pi$lower,
  fc_total_pi$upper,
  na.rm = TRUE
)

y_pad <- diff(y_rng) * 0.08

test_band <- tibble(
  xmin = test_start,
  xmax = test_end,
  ymin = -Inf,
  ymax = Inf
)

ggplot() +
  geom_rect(
    data = test_band,
    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
    inherit.aes = FALSE,
    fill = "grey90",
    alpha = 0.5
  ) +
  geom_line(
    data = train_context_tbl,
    aes(x = plot_date, y = value),
    color = "grey55",
    linewidth = 0.75
  ) +
  geom_ribbon(
    data = fc_total_pi,
    aes(x = plot_date, ymin = lower, ymax = upper, fill = .method),
    alpha = 0.22,
    show.legend = FALSE
  ) +
  geom_line(
    data = fc_total_pi,
    aes(x = plot_date, y = .mean, color = .method),
    linewidth = 0.95,
    show.legend = FALSE
  ) +
  geom_line(
    data = test_total_tbl,
    aes(x = plot_date, y = value),
    color = "black",
    linewidth = 1.05,
    linetype = "longdash"
  ) +
  geom_point(
    data = test_total_tbl,
    aes(x = plot_date, y = value),
    color = "black",
    size = 1.5
  ) +
  facet_wrap(~ .method, ncol = 2) +
  scale_x_date(
    date_breaks = "3 months",
    date_labels = "%b\n%Y",
    expand = expansion(mult = c(0.01, 0.03))
  ) +
  scale_y_continuous(labels = scales::dollar_format()) +
  coord_cartesian(
    ylim = c(y_rng[1] - y_pad, y_rng[2] + y_pad)
  ) +
  labs(
    title = "Year-5 forecasts vs. actuals — Total Portfolio",
    subtitle = "Grey = final training lead-in; shaded region = Year-5 test window; colored band = 80% PI; black dashed = observed value.",
    x = NULL,
    y = "Portfolio value"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.minor = element_blank(),
    strip.text = element_text(face = "bold"),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(size = 10),
    legend.position = "none"
  )

The zoomed panels make the result clearer: all reconciliation methods generate nearly the same total-level forecast, and the main failure is not reconciliation but the shared ETS base forecast missing the Year-5 upward move.

9. Accuracy at the Total level

acc_total <- fc %>%
  accuracy(portfolio_ts,
           measures = list(RMSE = RMSE, MAE = MAE, MAPE = MAPE, MASE = MASE)) %>%
  filter(is_aggregated(sector), is_aggregated(symbol)) %>%
  select(.model, RMSE, MAE, MAPE, MASE) %>%
  arrange(MAPE)

acc_total %>% kable(digits = 2, caption = "Test-set accuracy at the Total Portfolio level")

Test-set accuracy at the Total Portfolio level
.model	RMSE	MAE	MAPE	MASE
base	10945.89	10202.11	15.58	2.17
td	10945.89	10202.11	15.58	2.17
mo	10962.28	10219.70	15.61	2.17
mint	10967.41	10225.20	15.62	2.17
bu	10998.52	10258.56	15.67	2.18
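For reference, the four measures in the table can be written out directly. This is a sketch of the standard definitions on made-up numbers, not the fabletools implementation. In particular the `lag` argument of the hypothetical `mase_sketch` defaults to 1 here for simplicity, whereas a library implementation may pick the benchmark lag from the data's seasonal period.

```r
rmse_sketch <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae_sketch  <- function(actual, pred) mean(abs(actual - pred))
mape_sketch <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))

# MASE: test-set MAE divided by the in-sample MAE of a naive lag-`lag` forecast.
mase_sketch <- function(actual, pred, train_y, lag = 1) {
  naive_mae <- mean(abs(diff(train_y, lag = lag)))
  mae_sketch(actual, pred) / naive_mae
}

# Hypothetical training history, test actuals, and flat forecasts.
toy_train  <- c(100, 110, 105, 115)
toy_actual <- c(120, 130)
toy_pred   <- c(115, 115)

c(RMSE = rmse_sketch(toy_actual, toy_pred),
  MAE  = mae_sketch(toy_actual, toy_pred),
  MAPE = mape_sketch(toy_actual, toy_pred),
  MASE = mase_sketch(toy_actual, toy_pred, toy_train))
```

A MASE above 1 means the test-set errors are larger on average than the in-sample errors of the naive benchmark, which is the situation the table reports for all five rows.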

10. Accuracy at all levels (bonus)

acc_all <- fc %>%
  accuracy(portfolio_ts,
           measures = list(RMSE = RMSE, MAE = MAE, MAPE = MAPE)) %>%
  mutate(level = case_when(
    is_aggregated(sector) & is_aggregated(symbol) ~ "Total",
    is_aggregated(symbol)                          ~ "Sector",
    TRUE                                           ~ "Stock"
  )) %>%
  group_by(level, .model) %>%
  summarise(RMSE = mean(RMSE), MAE = mean(MAE), MAPE = mean(MAPE),
            .groups = "drop") %>%
  arrange(level, MAPE)

acc_all %>% kable(digits = 2, caption = "Mean accuracy by hierarchy level")

Mean accuracy by hierarchy level
level	.model	RMSE	MAE	MAPE
Sector	td	5872.29	5118.50	15.43
Sector	base	5880.01	5126.28	15.46
Sector	mo	5880.01	5126.28	15.46
Sector	mint	5882.96	5130.22	15.46
Sector	bu	5897.87	5145.71	15.50
Stock	td	3164.27	2691.30	15.13
Stock	mint	3169.05	2695.65	15.15
Stock	mo	3167.79	2694.37	15.15
Stock	base	3175.68	2700.83	15.18
Stock	bu	3175.68	2700.83	15.18
Total	base	10945.89	10202.11	15.58
Total	td	10945.89	10202.11	15.58
Total	mo	10962.28	10219.70	15.61
Total	mint	10967.41	10225.20	15.62
Total	bu	10998.52	10258.56	15.67

11. Discussion

The four methods commit differently to where signal lives in the hierarchy. Top-down asserts that the total is the cleanest series and disaggregates down. Bottom-up asserts that the leaves carry the real dynamics and sums up. Middle-out splits the difference and trusts the sector level. All three pre-commit to a structural answer before seeing the data: they fix the weighting between levels in advance. MinT is the only method that defers that commitment. It forecasts at every level, then uses the in-sample residual covariance to weight the levels against each other, so the relative trust given to total, sectors, and stocks is inferred from the data rather than imposed before it. For an equal-dollar portfolio of four stocks across two sectors with 50 months of training data, this is the structurally right move: the hierarchy is small enough that estimating the residual covariance is stable (mint_shrink handles the rest), and there is no a priori reason to believe the total, the sectors, or the individual stocks are the cleanest signal. MinT should win.

The result

MinT placed fourth of five on total-portfolio MAPE. Top-down (td) and the unreconciled base model tied for first at 15.58, middle-out (mo) and MinT (mint) followed at 15.61 and 15.62, and bottom-up (bu) came last at 15.67. All five landed within 0.09 MAPE points of each other, which is itself the headline: on this twelve-month window, reconciliation choice was nearly irrelevant relative to the base forecast’s own accuracy ceiling. The td = base tie is mathematical, not coincidental. Top-down by construction does not alter the total forecast; it only redistributes the total downward across the leaves, so at the total level top-down and the unreconciled base ETS forecast are identical to every decimal.

The visual near-identity of the five forecast panels reinforces the same finding. Reconciliation only does visible work when the base forecasts at different levels disagree, and here ETS produced base forecasts at the total, sector, and stock levels that were already nearly coherent, so each method was adjusting numbers that already nearly summed correctly. None of the methods captured the Year-5 upward move because all of them start from the same ETS base forecasts, and ETS selected a model with no trend or seasonal component for every series (ETS(A,N,N) or ETS(M,N,N) in the mable above), so every point forecast is a flat line at the last estimated level. The limiting factor on this window is the base model, not the reconciliation choice layered on top.

The methods that pre-committed to a structural answer (top-down trusting the total, bottom-up trusting the leaves) happened to bracket MinT’s data-driven weighting rather than being dominated by it. That does not refute the structural argument. MinT’s case isn’t that it produces the lowest error on every dataset; it’s that it minimizes expected total squared error among unbiased linear reconciliations when the error covariance is estimated well, because it doesn’t gamble on where the signal will be. On a 12-month test fold against a four-stock portfolio, a single regime shift can hand the win to whichever method happened to bet on the level that wasn’t disrupted, and in this case the broad upward drift through Year 5 happened to favor, by a sliver, the top-down commitment that the total is the cleanest series. The right read of a MinT loss here is that the test window favored an a priori commitment that MinT, by construction, refuses to make, and that with differences this small, the choice between methods matters less than the choice of base forecast model.

References

Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819.

Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts. Chapter 11: Forecasting hierarchical and grouped time series.

Session info

sessionInfo()

R version 4.5.2 (2025-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] knitr_1.51                 feasts_0.4.2              
 [3] tsibble_1.1.6              fable_0.4.1               
 [5] fabletools_0.5.1           PerformanceAnalytics_2.1.0
 [7] quantmod_0.4.28            TTR_0.24.4                
 [9] xts_0.14.1                 zoo_1.8-15                
[11] tidyquant_1.0.12           lubridate_1.9.4           
[13] forcats_1.0.1              stringr_1.6.0             
[15] dplyr_1.1.4                purrr_1.2.1               
[17] readr_2.1.6                tidyr_1.3.2               
[19] tibble_3.3.1               ggplot2_4.0.1             
[21] tidyverse_2.0.0           

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1     timeDate_4052.112    farver_2.1.2        
 [4] S7_0.2.1             fastmap_1.2.0        lazyeval_0.2.3      
 [7] digest_0.6.39        rpart_4.1.24         timechange_0.3.0    
[10] lifecycle_1.0.5      ellipsis_0.3.2       survival_3.8-3      
[13] magrittr_2.0.4       compiler_4.5.2       rlang_1.1.7         
[16] tools_4.5.2          utf8_1.2.6           yaml_2.3.12         
[19] data.table_1.18.2.1  labeling_0.4.3       htmlwidgets_1.6.4   
[22] curl_7.0.0           RColorBrewer_1.1-3   withr_3.0.2         
[25] nnet_7.3-20          grid_4.5.2           timetk_2.9.1        
[28] future_1.70.0        progressr_0.18.0     globals_0.19.1      
[31] scales_1.4.0         MASS_7.3-65          cli_3.6.5           
[34] anytime_0.3.12       crayon_1.5.3         rmarkdown_2.30      
[37] generics_0.1.4       otel_0.2.0           rstudioapi_0.18.0   
[40] future.apply_1.20.2  tzdb_0.5.0           splines_4.5.2       
[43] parallel_4.5.2       vctrs_0.7.0          hardhat_1.4.3       
[46] Matrix_1.7-4         jsonlite_2.0.0       hms_1.1.4           
[49] RobStatTM_1.0.11     listenv_0.10.1       gower_1.0.2         
[52] recipes_1.3.2        glue_1.8.0           parallelly_1.46.1   
[55] codetools_0.2-20     distributional_0.6.0 rsample_1.3.2       
[58] stringi_1.8.7        gtable_0.3.6         quadprog_1.5-8      
[61] pillar_1.11.1        furrr_0.4.0          htmltools_0.5.9     
[64] ipred_0.9-15         lava_1.9.0           R6_2.6.1            
[67] evaluate_1.0.5       lattice_0.22-7       class_7.3-23        
[70] Rcpp_1.1.1           prodlim_2026.03.11   xfun_0.56           
[73] pkgconfig_2.0.3