Hierarchical Time Series Forecasting

Author

AS

Published

April 17, 2026

I. Nesting vs Grouping/Crossed Structures


Why This Matters

Many datasets in econometrics, panel data, and hierarchical modeling involve structured observations — individuals within groups, or observations classified by multiple dimensions.

Understanding whether these structures are nested or crossed determines:

  • How we model random effects / fixed effects
  • How we cluster standard errors
  • How we interpret within- and between-group variation
  • Which reconciliation methods are valid when we forecast

1. Nested Data Structures

Definition

A nested structure means smaller units are fully contained within larger units.
Each lower-level unit belongs to exactly one higher-level unit.

\[ \text{Level 3: Country} \supset \text{Level 2: Region} \supset \text{Level 1: City} \]

Key Properties

Feature | Explanation
Belonging | Many-to-one: each lower-level unit has exactly one parent (strict tree)
Independence | Violated within groups: observations in the same group are correlated
Model implication | Hierarchical / multilevel models with random intercepts or slopes
Typical notation | a/b/c (e.g., country/region/city)

Real-World Examples

Example | Interpretation
Students within Schools | Each student belongs to exactly one school
Cities within Regions within Countries | Strict geographic hierarchy
Firm-Year Observations within Firms | Repeated observations over time nested within each firm (panel data)
Employees within Departments | Each employee reports to one department

Model Form

For an observation indexed by the within-group dimension \(i\) and the between-group dimension \(j\):

\[ Y_{ij} = \beta_0 + u_j + \beta_1 X_{ij} + e_{ij} \]

where \(u_j\) is the group-level (nested) random effect, which captures between-group variation, and \(e_{ij}\) is the residual, which captures within-group variation.
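
The split into between-group and within-group variation can be checked numerically. The following is a minimal base R sketch, not the author's code: the group count, effect sizes, and noise levels are all illustrative assumptions. It simulates data from the model form above and decomposes the total sum of squares.

```r
# Simulate a two-level nested design: 5 groups (j) with 20 observations (i) each.
# All parameter values below are illustrative assumptions.
set.seed(1)
n_groups <- 5
n_per    <- 20
group    <- rep(seq_len(n_groups), each = n_per)

u <- rnorm(n_groups, mean = 0, sd = 2)          # between-group effects u_j
x <- rnorm(n_groups * n_per)                    # covariate X_ij
e <- rnorm(n_groups * n_per, mean = 0, sd = 1)  # within-group noise e_ij
y <- 10 + u[group] + 0.5 * x + e                # Y_ij = b0 + u_j + b1 * X_ij + e_ij

# Split the total sum of squares of y into between- and within-group parts.
grand_mean  <- mean(y)
group_means <- tapply(y, group, mean)           # one mean per group
ss_between  <- sum(n_per * (group_means - grand_mean)^2)
ss_within   <- sum((y - group_means[group])^2)
ss_total    <- sum((y - grand_mean)^2)

c(between = ss_between, within = ss_within, total = ss_total)
```

The two parts always add up to the total, which is why the within/between distinction is a complete accounting of the variation in a nested design.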


2. Grouped or Crossed Data Structures

Definition

A grouped (crossed) structure means units are classified along multiple independent dimensions.
Each observation can belong to several groups simultaneously.

\[ \text{Example: } (\text{City} \times \text{Brand}) \Rightarrow \text{Sales observed for each brand in each city.} \]

Key Properties

Feature | Explanation
Belonging | Many-to-many relationship
Independence | Observations share multiple group memberships
Model implication | Crossed random effects (or two-way fixed effects)
Typical notation | a * b (e.g., city * brand)

Real-World Examples

Example | Interpretation
Sales by City × Brand | A brand appears in many cities; a city sells many brands
Student outcomes by School × Cohort-Year | Cohorts span schools; schools span cohorts
Trade flows by Industry × Country | Industries cross countries
Tourism by State × Purpose | Every state has every purpose category

Model Form

\[ Y_{ij} = \beta_0 + u_i^{(city)} + v_j^{(brand)} + e_{ij} \]

where \(u_i\) and \(v_j\) are crossed random effects.
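
To make the crossed form concrete, here is a small base R sketch under stated assumptions: the city and brand names, the effect sizes, and the intercept of 300 are made up, and the noise term is omitted so the arithmetic is exact. It builds a balanced City × Brand grid and recovers each effect from the marginal means.

```r
# A balanced 3 x 2 grid: every city is observed with every brand (crossed design).
# Effect sizes are made-up illustrative values; noise is omitted for clarity.
grid <- expand.grid(city = c("LA", "SF", "NYC"), brand = c("Nike", "Apple"),
                    stringsAsFactors = FALSE)
u_city  <- c(LA = 30, SF = -10, NYC = -20)   # city effects u_i (sum to zero)
v_brand <- c(Nike = -50, Apple = 50)         # brand effects v_j (sum to zero)
grid$sales <- 300 + u_city[grid$city] + v_brand[grid$brand]

# Every (city, brand) cell is populated: the signature of a crossed structure.
cell_counts <- table(grid$city, grid$brand)

# In a balanced grid, marginal means minus the grand mean recover each effect.
city_hat  <- tapply(grid$sales, grid$city,  mean) - mean(grid$sales)
brand_hat <- tapply(grid$sales, grid$brand, mean) - mean(grid$sales)

cell_counts
city_hat
brand_hat
```

With unbalanced or noisy data the marginal means are no longer exact estimates, which is one reason crossed random effects are usually fitted with a mixed-model routine rather than by hand.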


3. Mixed (Nested + Crossed) Structures

Sometimes, you have both:

\[ (\text{country/region/city}) \; \times \; (\text{brand/product}) \]

Interpretation

  • Cities are nested within regions and countries.
  • Products are nested within brand lines.
  • The two hierarchies cross each other → mixed structure.

Example

Country | Region | City | Brand | Product | Sales
USA | California | Los Angeles | Nike | Air Max | 300
USA | California | Los Angeles | Apple | iPhone | 500
USA | California | San Francisco | Nike | Air Max | 260
USA | California | San Francisco | Apple | iPhone | 470
USA | New York | New York City | Nike | Air Max | 340
USA | New York | New York City | Apple | iPhone | 390
UK | England | London | Nike | Air Max | 220
UK | England | London | Apple | iPhone | 410
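
Whether two classification columns are nested or crossed can be tested directly on the data. The sketch below re-enters the example rows above and applies a hypothetical helper, `is_nested()`, which is not part of any package: it returns TRUE when every child value maps to exactly one parent value.

```r
# The example rows above, keeping only the classification columns.
sales <- data.frame(
  Country = c("USA", "USA", "USA", "USA", "USA", "USA", "UK", "UK"),
  Region  = c("California", "California", "California", "California",
              "New York", "New York", "England", "England"),
  City    = c("Los Angeles", "Los Angeles", "San Francisco", "San Francisco",
              "New York City", "New York City", "London", "London"),
  Brand   = c("Nike", "Apple", "Nike", "Apple", "Nike", "Apple", "Nike", "Apple"),
  Product = c("Air Max", "iPhone", "Air Max", "iPhone",
              "Air Max", "iPhone", "Air Max", "iPhone"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: child is nested in parent when every child value
# appears under exactly one parent value.
is_nested <- function(child, parent) {
  parents_per_child <- tapply(parent, child, function(p) length(unique(p)))
  all(parents_per_child == 1)
}

is_nested(sales$City, sales$Region)      # TRUE: each city sits in one region
is_nested(sales$Region, sales$Country)   # TRUE: each region sits in one country
is_nested(sales$Product, sales$Brand)    # TRUE: each product has one brand
is_nested(sales$Brand, sales$City)       # FALSE: each brand is sold in many cities
```

A pair of columns that fails the check in both directions, such as City and Brand here, is crossed rather than nested.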

Model Form

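\[ Y_{ijkmbp} = \beta_0 + u^{(country)}_{j} + u^{(region)}_{k(j)} + u^{(city)}_{m(k,j)} + v^{(brand)}_{b} + v^{(product)}_{p(b)} + e_{ijkmbp} \]

Here \(Y_{ijkmbp}\) is the outcome (Sales) for observation \(i\) in country \(j\), region \(k\), city \(m\), brand \(b\), and product \(p\). The notation \(k(j)\) means that region \(k\) is nested within country \(j\), and likewise for \(m(k,j)\) and \(p(b)\).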


Putting It Together

Concept | Relationship | Example | Model Framework | Notation
Nesting | One unit contained in another (tree) | Students within schools; firm-year observations within firms | Multilevel / hierarchical; fixed or random effects (nested) | a/b/c
Grouping (Crossed) | Units classified by multiple factors (grid) | City × Brand sales; Industry × Country trade | Crossed random effects / two-way fixed effects | a * b
Mixed | Nested hierarchies crossed with another | (Country/Region/City) × (Brand/Product) | Mixed-effects | (a/b) * (c/d)

Rule of thumb: Nesting forms hierarchies; grouping forms grids. The econometric model — and later the forecast reconciliation — must reflect which structure your data actually follow.
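
The a/b/c and a * b notation mirrors R's model-formula operators, so base R can show what each one expands to. This is a small sketch using `terms()` from the stats package; `expand_terms()` is a hypothetical convenience wrapper, and the variable names are placeholders. Functions such as aggregate_key() use the same operators, but their internal handling is not assumed to be identical to formula expansion.

```r
# Hypothetical wrapper: list the terms that a one-sided formula expands to.
expand_terms <- function(f) attr(terms(f), "term.labels")

# Nesting: each level appears only in combination with its parents.
nested <- expand_terms(~ country / region / city)

# Crossing: each factor appears on its own and in combination.
crossed <- expand_terms(~ city * brand)

# Mixed: a nested geography crossed with brand.
mixed <- expand_terms(~ (country / region) * brand)

nested
crossed
mixed
```

The nested expansion never contains region or city as standalone terms, whereas the crossed expansion contains both factors alone. That difference is the "tree versus grid" distinction in formula form.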


II. Introduction to Hierarchical Time Series

Hierarchical time series are collections of related time series organized in a hierarchical structure. These structures naturally arise when data can be disaggregated by different categorical variables or attributes.

Key Concepts

Hierarchy: A nested structure where series at higher levels are the sum of series at lower levels. For example:

  • Total tourism → State → Region

Grouped Time Series: A generalization of hierarchies where series can be disaggregated in multiple ways that don’t nest cleanly. Series can be grouped by different attributes simultaneously.

Example: In the Australian tourism data:

  1. Hierarchy: State / Region (Regions nest strictly inside States).
  2. Grouped: State × Purpose and Region × Purpose — these cross-classifications overlap but do not nest.
    • By State + Purpose: “NSW Business”, “NSW Holiday”, …
    • By Region + Purpose: “Sydney Business”, “Melbourne Business”, …

Note: Why Is Purpose at the Bottom in State / Region / Purpose?

Purpose does not have to be at the bottom. The order in a hierarchy is a modeling choice tied to the decision workflow.

We place Purpose at the bottom here for four practical reasons:

  1. Geography is the strict nesting backbone in this dataset (State -> Region), so it is natural to keep those levels higher in the tree.
  2. Many planning and budgeting decisions are made by geographic units first, then decomposed by trip purpose.
  3. Purpose shares within each geography are often interpreted as composition (mix) rather than a separate top-level management structure.
  4. This ordering makes interpretation clear for stakeholders who consume state and region forecasts first.

When would Purpose be placed higher?

  1. If the primary policy question is purpose-led (national business travel vs holiday travel planning).
  2. If purpose-level aggregates are more stable and are the main forecast targets.
  3. If resource allocation is organized around purpose programs instead of geography.

The key point: there is no universally correct ordering. The hierarchy should reflect the operational decision path the forecasts are meant to support.

4. Why Use Hierarchical Forecasting?

  1. Coherence: Forecasts are mathematically consistent (lower-level forecasts sum to upper-level forecasts).
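  2. Information sharing: Reconciliation lets each level borrow signal from the others; for example, smooth aggregate series can stabilize noisy bottom-level forecasts.
  3. Flexibility: Forecast at any level while maintaining consistency.
  4. Accuracy: Reconciliation often improves forecast accuracy, though not for every series or every method, so it should be checked on a holdout.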
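
Coherence can be written compactly with a summing matrix. The notation below (\(\mathbf{S}\) for the summing matrix, \(\mathbf{b}_t\) for the vector of bottom-level series) follows the usual forecast reconciliation literature rather than anything defined earlier in this article, and the two-series hierarchy Total = A + B is only an illustrative assumption:

\[ \mathbf{y}_t = \mathbf{S}\,\mathbf{b}_t \quad\Longleftrightarrow\quad \begin{pmatrix} y_{\text{Total},t} \\ y_{A,t} \\ y_{B,t} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y_{A,t} \\ y_{B,t} \end{pmatrix} \]

A set of forecasts is coherent when it satisfies the same identity, that is, when it equals \(\mathbf{S}\) times some vector of bottom-level forecasts.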

5. Reconciliation Methods

After defining the structure, reconciliation is what enforces coherence across levels. The four methods differ in which base forecasts they use and how they enforce consistency.

1. Bottom-Up

  • Forecast only the bottom-level series; sum upward.
  • Pros: Simple, no information loss at the bottom level.
  • Cons: Ignores potentially useful information from aggregate levels.
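
Bottom-up can be sketched in a few lines of base R. The hierarchy, series names, and forecast values below are made up for illustration; the only idea taken from the text is "forecast the bottom level, then sum upward".

```r
# Toy hierarchy (hypothetical names): Total -> {A, B}; A -> {A1, A2}; B -> {B1}.
# Rows of S are all series; columns are the bottom-level series.
S <- rbind(
  Total = c(1, 1, 1),
  A     = c(1, 1, 0),
  B     = c(0, 0, 1),
  A1    = c(1, 0, 0),
  A2    = c(0, 1, 0),
  B1    = c(0, 0, 1)
)
colnames(S) <- c("A1", "A2", "B1")

# Made-up bottom-level base forecasts for a single horizon.
b_hat <- c(A1 = 40, A2 = 25, B1 = 35)

# Bottom-up: every series is the matching sum of bottom-level forecasts.
y_bu <- drop(S %*% b_hat)
y_bu
```

Because every row is built from the same bottom-level vector, the result is coherent by construction; any base forecasts made for Total, A, or B are simply never used.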

2. Top-Down

  • Forecast only the top-level series; disaggregate using proportions.
  • Pros: Can be more stable when bottom-level series are volatile.
  • Cons: Loses disaggregate-level information.
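
One common way to choose the proportions is to average each series' historical share of the total. The base R sketch below uses made-up history and a made-up top-level forecast; it illustrates that one variant only and is not the forecast-proportions method used later in this article.

```r
# Made-up history: 4 periods for two bottom-level series, A and B.
hist_bottom <- cbind(A = c(60, 66, 54, 60), B = c(40, 44, 36, 40))
hist_total  <- rowSums(hist_bottom)

# Average historical proportions: the mean over time of each series' share.
# Dividing a matrix by a vector of row totals divides each row by its total.
p <- colMeans(hist_bottom / hist_total)

# Disaggregate a made-up top-level forecast of 120 using those shares.
total_hat <- 120
bottom_td <- total_hat * p
bottom_td
```

The disaggregated forecasts inherit everything from the single top-level forecast, so any dynamics specific to A or B beyond their average share are lost.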

3. Middle-Out

  • Forecast at a chosen middle level; aggregate upward and disaggregate downward.
  • Meaningful only with three or more levels in a single tree (e.g., Total → State → Region).
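
A minimal base R sketch of middle-out follows, assuming a three-level toy tree with hypothetical names and made-up numbers. The within-state shares are taken as given here; in practice they would come from history or from forecasts.

```r
# Toy tree (hypothetical): Total -> {X, Y}; X -> {X1, X2}; Y -> {Y1}.
state_hat <- c(X = 80, Y = 50)            # base forecasts at the middle level

# Upward: the total is the sum of the middle-level forecasts.
total_hat <- sum(state_hat)

# Downward: split each state using assumed within-state shares
# (the shares of the children of each state sum to 1).
share  <- c(X1 = 0.75, X2 = 0.25, Y1 = 1)
parent <- c(X1 = "X", X2 = "X", Y1 = "Y")
region_hat <- state_hat[parent] * share
names(region_hat) <- names(share)

total_hat
region_hat
```

The upward step is bottom-up applied from the middle level, and the downward step is top-down applied within each middle-level node.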

4. Optimal Reconciliation (MinT)

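  • Uses the base forecasts from every level of the structure.
  • Combines them with weights chosen to minimize the total variance (the trace of the covariance matrix) of the reconciled forecast errors.
  • Variants differ in how that covariance matrix is approximated: OLS (identity), WLS (diagonal), and MinT with a shrinkage estimate of the full matrix.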
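
The projection behind these variants can be sketched in base R. The formula used is the one from Wickramasuriya et al. (2019), \(\tilde{\mathbf{y}} = \mathbf{S}(\mathbf{S}'\mathbf{W}^{-1}\mathbf{S})^{-1}\mathbf{S}'\mathbf{W}^{-1}\hat{\mathbf{y}}\); everything else is an assumption for illustration: the helper name `reconcile_with()`, the toy hierarchy, the base forecasts, and the variances placed in W.

```r
# Toy hierarchy (hypothetical names): Total = A + B.
S <- rbind(Total = c(1, 1), A = c(1, 0), B = c(0, 1))

# Made-up, incoherent base forecasts: 100 is not 60 + 45.
y_hat <- c(Total = 100, A = 60, B = 45)

# Hypothetical helper: y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat.
# W stands in for the covariance matrix of the base forecast errors.
reconcile_with <- function(S, y_hat, W = diag(nrow(S))) {
  W_inv <- solve(W)
  G <- solve(t(S) %*% W_inv %*% S) %*% t(S) %*% W_inv
  drop(S %*% G %*% y_hat)
}

y_ols <- reconcile_with(S, y_hat)                         # W = identity (OLS)
y_wls <- reconcile_with(S, y_hat, W = diag(c(4, 1, 1)))   # assumed variances (WLS)

rbind(base = y_hat, ols = y_ols, wls = y_wls)
```

Both reconciled vectors are coherent, and giving Total a larger assumed variance pulls the reconciled total closer to the sum of the bottom-level base forecasts. MinT with shrinkage follows the same formula with an estimated, non-diagonal W.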

Important: Which Methods Are Valid for HTS vs GTS?

Method | HTS (single tree) | GTS (crossed)
bottom_up() | YES | YES
top_down() | YES | NO (no unique parent path)
middle_out() | YES (needs 3+ levels) | NO (no single middle level)
min_trace() (MinT) | YES | YES (preferred for statistical efficiency)

Why Top-Down and Middle-Out fail under GTS: Top-Down assumes one parent-to-child disaggregation path; in a crossed structure, multiple parents contradict each other. Middle-Out assumes a single meaningful middle level; crossed dimensions create conflicting paths and no well-defined middle.

Practical rule: If the structure is GTS, use Bottom-Up or MinT. Avoid Top-Down and Middle-Out.
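
The "multiple parents" problem is visible in the summing matrix of a crossed structure. The base R sketch below builds that matrix for a reduced, illustrative case of two states and two purposes; the construction is an assumption made for exposition, not how any package stores the structure.

```r
# Reduced illustrative case: 2 states crossed with 2 purposes.
states   <- c("NSW", "VIC")
purposes <- c("Business", "Holiday")
bottom   <- expand.grid(State = states, Purpose = purposes,
                        stringsAsFactors = FALSE)
bottom_names <- paste(bottom$State, bottom$Purpose, sep = ":")

# One indicator row per state and per purpose: which bottom series it sums.
by_state   <- t(sapply(states,   function(s) as.numeric(bottom$State == s)))
by_purpose <- t(sapply(purposes, function(p) as.numeric(bottom$Purpose == p)))

# Stack Total, State rows, Purpose rows, and the bottom level itself.
S <- rbind(Total = rep(1, nrow(bottom)), by_state, by_purpose,
           diag(nrow(bottom)))
colnames(S) <- bottom_names
rownames(S)[(nrow(S) - nrow(bottom) + 1):nrow(S)] <- bottom_names

S
nrow(S)   # 1 + 2 + 2 + 4 series
```

Each bottom-level column has a 1 in one State row and in one Purpose row, so every bottom series has two immediate parents. A top-down split through State and a top-down split through Purpose would generally disagree, which is why only bottom-up and MinT apply.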


6. Example: Australian Tourism Data

1) Setup and Data Check

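library(fpp3)   # attaches tsibble, fable, feasts, dplyr, ggplot2, and related packages
library(dplyr)

remove(list = ls())

df <- tsibble::tourism

# Quick check of the panel dimensions
cat("Time periods:", n_distinct(df$Quarter), "quarters\n")
cat("States:", n_distinct(df$State), "\n")
cat("Regions:", n_distinct(df$Region), "\n")
cat("Purpose categories:", n_distinct(df$Purpose), "\n")

Time periods: 80 quarters
States: 8 
Regions: 76 
Purpose categories: 4 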

2) What Is Nested vs Crossed Here?

Geography is naturally nested:

  • Region is strictly nested within State.
  • So State / Region is a true hierarchical time series (HTS).

Purpose is a separate dimension that crosses geography:

  • State and Purpose form crossed views.
  • So State * Purpose is a grouped time series (GTS).

You can model State and Purpose as a hierarchy (State / Purpose) if the decision flow is “first by State, then split by Purpose.” But conceptually, State * Purpose is the more general crossed representation.

3) Declare Alternative Structures

Again, there is no universally correct ordering. The hierarchy should reflect the operational decision path the forecasts are meant to support.

# Hierarchical by State then Purpose (single tree)
tourism_hts_sp <- df %>%
  aggregate_key(State / Purpose, Trips = sum(Trips))

# Grouped by State and Purpose (crossed)
tourism_gts_sp <- df %>%
  aggregate_key(State * Purpose, Trips = sum(Trips))

# Hierarchical geography benchmark (strict nesting)
tourism_hts_sr <- df %>%
  aggregate_key(State / Region, Trips = sum(Trips))

4) Why Series Counts Differ Across Structures

n_state   <- n_distinct(df$State)
n_purpose <- n_distinct(df$Purpose)
n_region  <- n_distinct(df$Region)

# Unique series counts implied by each structure
series_hts_sp <- 1 + n_state + (n_state * n_purpose)                 # Total + State + State/Purpose
series_gts_sp <- 1 + n_state + n_purpose + (n_state * n_purpose)    # adds Purpose-only level
series_hts_sr <- 1 + n_state + n_region                              # Total + State + Region

counts_tbl <- tibble(
  structure = c("HTS: State / Purpose", "GTS: State * Purpose", "HTS: State / Region"),
  unique_series = c(series_hts_sp, series_gts_sp, series_hts_sr),
  unique_levels = c(
    "Total, State, State:Purpose",
    "Total, State, Purpose, State:Purpose",
    "Total, State, Region"
  )
)

counts_tbl

Interpretation:

  • State * Purpose has more unique aggregation views than State / Purpose because it adds a standalone Purpose-only level.
  • Comparing State * Purpose to State / Region is not an apples-to-apples count — Region has much higher cardinality than Purpose.
  • “More series” depends jointly on the structure and on the cardinality of each categorical variable.

5) Modeling Choice Justification

This dataset has genuine geographic nesting, but we can model it with either representation depending on the decision we need to support:

  • Use HTS if your reporting path is a single tree.
  • Use GTS if you need coherence across multiple crossed views simultaneously.

Base model classes (ETS, ARIMA, TSLM) are the same in both cases. What changes is the valid reconciliation rule.

fit_hts <- tourism_hts_sp %>%
  model(
    ets   = ETS(Trips),
    arima = ARIMA(Trips),
    tslm  = TSLM(Trips ~ trend() + season())
  )

fit_gts <- tourism_gts_sp %>%
  model(
    ets   = ETS(Trips),
    arima = ARIMA(Trips),
    tslm  = TSLM(Trips ~ trend() + season())
  )

6) Reconcile Correctly by Structure

# HTS (single tree): bottom-up, top-down, and MinT are all valid
recon_hts <- fit_hts %>%
  reconcile(
    bu    = bottom_up(ets),
    td_fp = top_down(ets, method = "forecast_proportions"),
    mint  = min_trace(ets, method = "mint_shrink")
  )

# GTS (crossed): only bottom-up and MinT are valid
recon_gts <- fit_gts %>%
  reconcile(
    bu   = bottom_up(ets),
    mint = min_trace(ets, method = "mint_shrink")
  )

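Note that top_down() is omitted for the GTS structure because a crossed structure has no unique parent-to-child path. middle_out() is not demonstrated here: in State / Purpose the only candidate middle level is State, so middle-out would amount to forecasting the State series, summing them up to Total, and splitting them down to State:Purpose.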

7) Forecast and Compare

fc_hts <- recon_hts %>% forecast(h = "2 years")
fc_gts <- recon_gts %>% forecast(h = "2 years")

# Keep only the two reconciled forecasts named in the subtitle
fc_gts %>%
  filter(
    State == "Queensland",
    is_aggregated(Purpose),
    .model %in% c("bu", "mint")
  ) %>%
  autoplot(tourism_gts_sp, level = NULL) +
  labs(
    title    = "GTS Reconciliation: Queensland Total",
    subtitle = "Bottom-Up vs MinT",
    y        = "Trips ('000)",
    x        = "Quarter"
  ) +
  theme_minimal()

8) Accuracy Comparison (Holdout)

library(knitr)

# Hold out last 8 quarters for evaluation
all_quarters   <- sort(unique(df$Quarter))
train_quarters <- head(all_quarters, -8)
test_quarters  <- tail(all_quarters, 8)

train <- df %>% filter(Quarter %in% train_quarters)
test  <- df %>% filter(Quarter %in% test_quarters)

# Rebuild structures on training data
train_hts_sp <- train %>% aggregate_key(State / Purpose, Trips = sum(Trips))
train_gts_sp <- train %>% aggregate_key(State * Purpose, Trips = sum(Trips))

# Aggregate the holdout to match each forecast structure so accuracy() can join on keys
test_hts_sp <- test %>% aggregate_key(State / Purpose, Trips = sum(Trips))
test_gts_sp <- test %>% aggregate_key(State * Purpose, Trips = sum(Trips))

# Fit and reconcile (ETS for like-for-like reconciliation comparison)
fit_hts_eval <- train_hts_sp %>%
  model(ets = ETS(Trips)) %>%
  reconcile(
    bu    = bottom_up(ets),
    td_fp = top_down(ets, method = "forecast_proportions"),
    mint  = min_trace(ets, method = "mint_shrink")
  )

fit_gts_eval <- train_gts_sp %>%
  model(ets = ETS(Trips)) %>%
  reconcile(
    bu   = bottom_up(ets),
    mint = min_trace(ets, method = "mint_shrink")
  )

# Forecast over the holdout horizon
fc_hts_eval <- fit_hts_eval %>% forecast(h = 8)
fc_gts_eval <- fit_gts_eval %>% forecast(h = 8)

# Accuracy metrics (each forecast is compared against the matching aggregated holdout)
acc_hts <- fc_hts_eval %>%
  accuracy(test_hts_sp, measures = list(RMSE = RMSE, MAE = MAE, MAPE = MAPE)) %>%
  mutate(Structure = "HTS: State / Purpose")

acc_gts <- fc_gts_eval %>%
  accuracy(test_gts_sp, measures = list(RMSE = RMSE, MAE = MAE, MAPE = MAPE)) %>%
  mutate(Structure = "GTS: State * Purpose")

# Summarize across all series to compare methods cleanly
acc_summary <- bind_rows(acc_hts, acc_gts) %>%
  group_by(Structure, .model) %>%
  summarise(
    RMSE = mean(RMSE, na.rm = TRUE),
    MAE  = mean(MAE,  na.rm = TRUE),
    MAPE = mean(MAPE, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(Structure, RMSE)

kable(
  acc_summary,
  digits  = 3,
  caption = "Holdout Accuracy (Last 8 Quarters): RMSE, MAE, and MAPE by Structure and Reconciliation Method"
)

Holdout Accuracy (Last 8 Quarters): RMSE, MAE, and MAPE by Structure and Reconciliation Method

Structure | .model | RMSE | MAE | MAPE
GTS: State * Purpose | ets | 214.257 | 177.205 | 13.605
GTS: State * Purpose | mint | 215.944 | 180.138 | 13.234
GTS: State * Purpose | bu | 245.711 | 211.112 | 13.944
HTS: State / Purpose | td_fp | 167.063 | 135.770 | 13.672
HTS: State / Purpose | ets | 183.159 | 151.876 | 14.285
HTS: State / Purpose | mint | 186.747 | 157.339 | 13.953
HTS: State / Purpose | bu | 206.718 | 177.931 | 14.495
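
For readers less familiar with the three measures, here is a base R sketch of how each one is computed for a single series. The actual and forecast values are made up and are unrelated to the tourism results above.

```r
# Made-up actuals and forecasts for one series over four periods.
actual   <- c(100, 120, 80, 100)
forecast <- c(110, 110, 90, 95)
err      <- actual - forecast

rmse <- sqrt(mean(err^2))                 # root mean squared error (data units)
mae  <- mean(abs(err))                    # mean absolute error (data units)
mape <- mean(abs(err / actual)) * 100     # mean absolute percentage error (%)

c(RMSE = rmse, MAE = mae, MAPE = mape)
```

RMSE and MAE are in the units of the data, so an average across series is dominated by large aggregates such as Total, while MAPE is scale-free. That is worth keeping in mind when reading averages taken over structures that contain different sets of series.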

Key Functions Reference

Function | Purpose | Valid For
aggregate_key() | Define HTS (/) or GTS (*) structure | Both
model() | Fit base models (ETS, ARIMA, TSLM, …) | Both
reconcile() | Enforce coherence across levels | Both
bottom_up() | Sum bottom-level forecasts upward | HTS + GTS
top_down() | Disaggregate top forecast downward | HTS only
middle_out() | Forecast a middle level, propagate both ways | HTS with 3+ levels
min_trace() | Variance-minimizing reconciliation | HTS + GTS

References

  • Hyndman, R.J., & Athanasopoulos, G. (2021). Forecasting: principles and practice (3rd ed.). OTexts: Melbourne, Australia. OTexts.com/fpp3
  • Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804-819.