Hierarchical time series (HTS) and grouped time series (GTS) both involve related time series that must add up consistently, but they differ in structure. In an HTS, the data follow a clear tree structure, where each lower-level series belongs to one parent and aggregates upward in one path, such as Country → State → City. In a GTS, the same observations can be grouped across multiple dimensions at once, such as by region and by product type, so there are several valid aggregation paths instead of one. This makes grouped structures more flexible, but also more complicated to reconcile.
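One way to see the structural difference is through the summing matrix, which maps the bottom-level series to every series in the structure. The base R sketch below is only an illustration: the Region/Product groups and the variable names `S_hts` and `S_gts` are assumptions made for the example, not something taken from a package.

```r
# Hierarchy: each stock belongs to exactly one sector.
stocks  <- c("AAPL", "MSFT", "JNJ", "PFE")
sectors <- c("Tech", "Tech", "Healthcare", "Healthcare")

S_hts <- rbind(
  rep(1, 4),                               # Total
  as.numeric(sectors == "Tech"),           # Tech
  as.numeric(sectors == "Healthcare"),     # Healthcare
  diag(4)                                  # the four stocks themselves
)
rownames(S_hts) <- c("Total", "Tech", "Healthcare", stocks)
colnames(S_hts) <- stocks

# Grouped structure: each bottom-level cell belongs to a region AND a product.
region  <- c("A", "A", "B", "B")
product <- c("X", "Y", "X", "Y")
cells   <- paste(region, product, sep = ".")

S_gts <- rbind(
  rep(1, 4),                               # Total
  as.numeric(region == "A"),               # Region A
  as.numeric(region == "B"),               # Region B
  as.numeric(product == "X"),              # Product X
  as.numeric(product == "Y"),              # Product Y
  diag(4)                                  # the four region-by-product cells
)
rownames(S_gts) <- c("Total", "Region A", "Region B",
                     "Product X", "Product Y", cells)
colnames(S_gts) <- cells

S_hts
S_gts
```

In the hierarchical matrix every bottom-level column appears in exactly one intermediate row (its sector). In the grouped matrix every column appears in two intermediate rows (its region and its product), which is why there is more than one aggregation path to the total.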
A forecasting method is different from a reconciliation method. Forecasting methods, such as ARIMA or ETS, are used to generate the initial or base forecasts for each series. Reconciliation methods are then used to adjust those forecasts so they are coherent across the structure, meaning all of the lower-level forecasts add up correctly to the higher-level totals. In other words, forecasting creates the predictions, while reconciliation makes sure those predictions satisfy the aggregation rules.
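The distinction can be written compactly. The symbols below are introduced here for illustration and follow common textbook notation: b_t is the vector of bottom-level series, y_t the vector of all series, S the summing matrix, ŷ the base forecasts for all series, and G a matrix that maps the base forecasts to the bottom level.

```latex
\begin{align}
  \mathbf{y}_t &= \mathbf{S}\,\mathbf{b}_t
    && \text{(coherence: every series is a sum of bottom-level series)} \\
  \tilde{\mathbf{y}}_{T+h|T} &= \mathbf{S}\,\mathbf{G}\,\hat{\mathbf{y}}_{T+h|T}
    && \text{(reconciliation of the base forecasts)}
\end{align}
```

The forecasting method determines ŷ, while the reconciliation method determines G. Because the reconciled forecasts are S times a bottom-level vector, they are coherent by construction.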
Each reconciliation approach has its own strengths and weaknesses. Top-down forecasts the total series first and then allocates that forecast down to the lower levels using proportions. It is simple and works best when the aggregate data are more stable than the detailed data, but it can miss important lower-level patterns. Bottom-up does the opposite: it forecasts the most detailed series first and sums upward, which preserves local information but can be noisy if the bottom-level data are weak. Middle-out combines the two by forecasting a middle level, then aggregating upward and disaggregating downward; this can work well when the middle level is the most stable, but it requires a single, well-defined middle level, so it only applies to true hierarchies. MinT (minimum trace) often performs best in practice because it combines the base forecasts from all levels and accounts for the correlations among their forecast errors, though it is also the most computationally demanding because it requires estimating the forecast error covariance matrix.
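The four approaches can all be expressed as a choice of matrix G applied to the base forecasts, followed by summing with S. The base R sketch below uses made-up base forecasts and assumed disaggregation proportions for a Total → 2 sectors → 4 stocks hierarchy. The last case is OLS reconciliation, which is the MinT formula with the error covariance replaced by the identity matrix; full MinT would plug in an estimated covariance instead.

```r
# Summing matrix: 7 series in total, 4 of them bottom-level.
S <- rbind(
  rep(1, 4),        # Total
  c(1, 1, 0, 0),    # Sector 1
  c(0, 0, 1, 1),    # Sector 2
  diag(4)           # four bottom-level series
)

# Hypothetical, deliberately incoherent base forecasts (made-up numbers),
# ordered as: Total, Sector 1, Sector 2, then the four bottom-level series.
y_hat <- c(105, 58, 44, 30, 25, 20, 22)

# Bottom-up: keep only the bottom-level forecasts and sum upward.
G_bu <- cbind(matrix(0, 4, 3), diag(4))
y_bu <- S %*% G_bu %*% y_hat

# Top-down: keep only the total forecast and split it by proportions.
# The proportions are assumed here (e.g. historical shares); they sum to 1.
p    <- c(0.30, 0.25, 0.20, 0.25)
G_td <- cbind(p, matrix(0, 4, 6))
y_td <- S %*% G_td %*% y_hat

# Middle-out: keep only the sector forecasts, split each within its sector.
# The within-sector shares are also assumed for illustration.
G_mo <- matrix(0, 4, 7)
G_mo[1:2, 2] <- c(0.55, 0.45)
G_mo[3:4, 3] <- c(0.50, 0.50)
y_mo <- S %*% G_mo %*% y_hat

# OLS reconciliation (MinT with identity covariance): uses all seven forecasts.
G_ols <- solve(t(S) %*% S) %*% t(S)
y_ols <- S %*% G_ols %*% y_hat

# TRUE when every series equals the sum of its bottom-level components.
is_coherent <- function(y) {
  isTRUE(all.equal(as.numeric(y), as.numeric(S %*% y[4:7])))
}

sapply(list(base = y_hat, bu = y_bu, td = y_td, mo = y_mo, ols = y_ols),
       is_coherent)
```

The base forecasts fail the coherence check while all four reconciled sets pass it. Bottom-up leaves the stock-level forecasts untouched, top-down leaves the total untouched, middle-out leaves the sector forecasts untouched, and the OLS version adjusts every level a little.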
Part 2:
remove(list = ls())

# OPTION II: Tiny mutual fund with hierarchical forecasting
library(tidyverse)
library(lubridate)
library(tidyquant)
library(tsibble)
library(fable)
library(feasts)
# Forecast the test period (12 months)
fc <- rec_fit %>%
  forecast(h = "12 months")

# Plot fitted values on training data
train_total <- train %>%
  filter(is_aggregated(Sector), is_aggregated(Stock))

fitted_total <- fitted(rec_fit) %>%
  filter(is_aggregated(Sector), is_aggregated(Stock))

autoplot(train_total, value) +
  autolayer(fitted_total, .fitted, colour = "blue") +
  labs(
    title = "Training Data and Fitted Values: Total Portfolio",
    y = "Portfolio Value",
    x = "Month"
  )
test_total <- test %>%
  filter(is_aggregated(Sector), is_aggregated(Stock))

fc_total <- fc %>%
  filter(is_aggregated(Sector), is_aggregated(Stock))

autoplot(train_total, value) +
  autolayer(test_total, value, colour = "black") +
  autolayer(fc_total, .mean) +
  labs(
    title = "Forecasts on Test Period: Total Portfolio",
    y = "Portfolio Value",
    x = "Month"
  )
# Accuracy on the test set
accuracy_total <- fc_total %>%
  accuracy(test_total) %>%
  select(.model, ME, RMSE, MAE, MAPE)

accuracy_total
For Part II, I created a small hierarchical mutual fund in R using four stocks from two sectors: AAPL and MSFT from Tech, and JNJ and PFE from Healthcare. I downloaded five years of monthly closing prices and assumed an equal initial investment in each stock. I then converted the prices into monthly portfolio values and organized the data into a hierarchy: Total Portfolio, Sector, and Stock. The first four years were used as training data, and the fifth year was used as the test period.
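The data preparation described above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the original code: simulated prices stand in for the downloaded ones so the example runs offline, and the 2020 start date, the $2,500 initial investment per stock, and the object names `prices`, `values`, and `hier` are all hypothetical.

```r
library(dplyr)
library(tsibble)
library(fabletools)

set.seed(1)

# 60 monthly dates (five years); the start date is an assumption.
month_dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 60)

# Simulated prices stand in for downloaded monthly closing prices.
prices <- tibble(
  Month  = yearmonth(rep(month_dates, times = 4)),
  Stock  = rep(c("AAPL", "MSFT", "JNJ", "PFE"), each = 60),
  Sector = rep(c("Tech", "Tech", "Healthcare", "Healthcare"), each = 60)
) %>%
  group_by(Stock) %>%
  mutate(price = 100 * exp(cumsum(rnorm(n(), mean = 0.005, sd = 0.05)))) %>%
  ungroup()

# Equal initial investment: buy 2500 / (first price) shares of each stock,
# then track the value of that holding over time.
values <- prices %>%
  group_by(Stock) %>%
  mutate(value = 2500 / price[1] * price) %>%
  ungroup() %>%
  select(Month, Sector, Stock, value) %>%
  as_tsibble(index = Month, key = c(Sector, Stock))

# Stock nested within Sector gives Total -> Sector -> Stock.
hier <- values %>%
  aggregate_key(Sector / Stock, value = sum(value))

# First four years for training, fifth year for testing.
train <- hier %>% filter(Month <  yearmonth("2024 Jan"))
test  <- hier %>% filter(Month >= yearmonth("2024 Jan"))
```

The `/` in `aggregate_key()` declares nesting, which is what makes this a hierarchy with seven series (one total, two sectors, four stocks). A grouped structure would use `*` between the keys instead.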
After building the hierarchy, I fit models and compared several forecasting and reconciliation approaches: ETS, ARIMA, bottom-up, top-down, and middle-out. The fitted values on the training data looked fairly good, especially for the total portfolio, but the more important comparison came from the test period. Based on the test-set accuracy measures for the total portfolio, middle-out performed best overall, with the lowest RMSE (424.86), MAE (363.38), and MAPE (4.76%). ARIMA and top-down tied with an RMSE of 480.21 and a MAPE of 5.24%; a tie like this is expected at the total level when the top-down forecasts are built on the ARIMA base forecasts, because top-down leaves the top-level forecast unchanged. Bottom-up was slightly worse, and ETS had the weakest performance by far.
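For reference, the accuracy measures quoted above can be computed by hand as in the base R sketch below. The function name `accuracy_measures` is hypothetical and the input numbers are made up purely to show the call; they are not the portfolio data.

```r
# Point-forecast accuracy measures for numeric vectors of equal length.
# Errors are defined as actual minus forecast.
accuracy_measures <- function(actual, forecast) {
  e <- actual - forecast
  c(
    ME   = mean(e),                        # mean error (bias)
    RMSE = sqrt(mean(e^2)),                # root mean squared error
    MAE  = mean(abs(e)),                   # mean absolute error
    MAPE = mean(abs(e / actual)) * 100     # mean absolute percentage error
  )
}

# Made-up example values.
res <- accuracy_measures(actual   = c(100, 110, 120),
                         forecast = c(98, 113, 115))
round(res, 2)
```

RMSE penalizes large misses more heavily than MAE, and MAPE is scale-free, so the three measures can rank methods differently even though they agreed here.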
What I found most interesting is that the methods sometimes looked similar in the forecast plots at the total portfolio level, but the accuracy results still showed meaningful differences. That makes sense because the total portfolio is already a highly aggregated series, so the biggest differences between reconciliation methods are often easier to see at lower levels of the hierarchy.