Hierarchical Time Series Forecasting

Published: April 17, 2026

I. Nesting vs Grouping/Crossed Structures

Why This Matters

Many datasets in econometrics, panel data, and hierarchical modeling involve structured observations — individuals within groups, or observations classified by multiple dimensions.

Understanding whether these structures are nested or crossed determines:

How we model random effects / fixed effects
How we cluster standard errors
How we interpret within- and between-group variation
Which reconciliation methods are valid when we forecast

1. Nested Data Structures

Definition

A nested structure means smaller units are fully contained within larger units.
Each lower-level unit belongs to exactly one higher-level unit.

\[ \text{Level 3: Country} \supset \text{Level 2: Region} \supset \text{Level 1: City} \]

Key Properties

Feature	Explanation
Belonging	Many-to-one: each child has exactly one parent (strict tree)
Dependence	Observations within the same group are correlated, not independent
Model implication	Hierarchical / multilevel models with random intercepts or slopes
Typical notation	`a/b/c` (e.g., country/region/city)
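Nesting can be checked mechanically: a factor is nested in another if each of its levels co-occurs with exactly one parent level. Below is a minimal base-R sketch; `is_nested()` is a hypothetical helper written for illustration (not from any package), and the geography reuses the country/region/city example.

```r
# Hypothetical helper: TRUE if every level of `child` occurs with exactly one level of `parent`
is_nested <- function(child, parent) {
  parents_per_child <- tapply(parent, child, function(p) length(unique(p)))
  all(parents_per_child == 1)
}

geo <- data.frame(
  country = c("USA", "USA", "USA", "UK"),
  region  = c("California", "California", "New York", "England"),
  city    = c("Los Angeles", "San Francisco", "New York City", "London")
)

is_nested(geo$city, geo$region)     # TRUE: each city lies in one region
is_nested(geo$region, geo$country)  # TRUE: each region lies in one country
is_nested(geo$country, geo$region)  # FALSE: USA contains two regions

# Pitfall: a label reused under different parents breaks nesting
st   <- c("Illinois", "Missouri")
town <- c("Springfield", "Springfield")
is_nested(town, st)                         # FALSE: one label, two parents
is_nested(paste(st, town, sep = "/"), st)   # TRUE once labels carry the parent
```

The last two lines show why lme4's `(1 | region/city)` expands to `(1 | region) + (1 | region:city)`: the interaction builds parent-qualified labels, so the same city name under two regions is treated as two distinct groups.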

Real-World Examples

Example	Interpretation
Students within Schools	Each student belongs to exactly one school
Cities within Regions within Countries	Strict geographic hierarchy
Years within Firms	Time nested within individual firms (panel data)
Employees within Departments	Each employee reports to one department

Model Form

For observation \(i\) (the within-group unit, e.g., a student) in group \(j\) (the between-group unit, e.g., a school):

\[ Y_{ij} = \beta_0 + u_j + \beta_1 X_{ij} + e_{ij} \]

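where \(u_j\) is the group-level (nested) random intercept, which captures between-group variation, and \(e_{ij}\) is the residual, which captures within-group variation. Both are typically assumed normal with mean zero and variances \(\sigma_u^2\) and \(\sigma_e^2\).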
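To see the two variance components separately, the sketch below simulates data from this model in base R, recovers \(\beta_1\) with a within (group-demeaned) regression, and recovers \(\sigma_u^2\) and \(\sigma_e^2\) with the classic one-way ANOVA (method-of-moments) estimator. That estimator is swapped in so the example needs no packages; in practice one would fit the model directly, e.g. with `lme4::lmer(y ~ x + (1 | group))`. All parameter values are made up.

```r
set.seed(42)
J <- 200; n <- 20                    # number of groups, observations per group
beta0 <- 1; beta1 <- 0.5
sigma_u <- 2; sigma_e <- 1           # true between- and within-group SDs

group <- rep(seq_len(J), each = n)   # each observation belongs to exactly one group
u <- rnorm(J, 0, sigma_u)            # group random intercepts u_j
x <- rnorm(J * n)
y <- beta0 + u[group] + beta1 * x + rnorm(J * n, 0, sigma_e)

# Within (group-demeaned) regression sweeps out u_j and estimates beta1
x_dm <- x - ave(x, group)
y_dm <- y - ave(y, group)
beta1_hat <- sum(x_dm * y_dm) / sum(x_dm^2)

# One-way ANOVA (method-of-moments) estimates of the variance components
r     <- y - beta1_hat * x
r_bar <- ave(r, group)                                           # group mean, per observation
msw   <- sum((r - r_bar)^2) / (J * (n - 1))                      # within-group mean square
msb   <- n * sum((tapply(r, group, mean) - mean(r))^2) / (J - 1) # between-group mean square
sigma_e2_hat <- msw
sigma_u2_hat <- (msb - msw) / n
icc <- sigma_u2_hat / (sigma_u2_hat + sigma_e2_hat)             # share of variance between groups

round(c(beta1 = beta1_hat, sigma_u2 = sigma_u2_hat, sigma_e2 = sigma_e2_hat, icc = icc), 3)
```

The ratio \(\sigma_u^2 / (\sigma_u^2 + \sigma_e^2)\), the intraclass correlation, is the share of total variance that lies between groups; with the values above its true value is 0.8.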

2. Grouped or Crossed Data Structures

Definition

A grouped (crossed) structure means units are classified along multiple independent dimensions.
Each observation can belong to several groups simultaneously.

\[ \text{Example: } (\text{City} \times \text{Brand}) \Rightarrow \text{Sales observed for each brand in each city.} \]

Key Properties

Feature	Explanation
Belonging	Many-to-many relationship
Dependence	Observations are correlated through every group they share (e.g., the same city or the same brand)
Model implication	Crossed random effects (or two-way fixed effects)
Typical notation	`a * b` (e.g., city * brand)
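Crossing can be checked from the data by tabulating the two factors: in a fully crossed design every cell of the two-way table is non-empty. `is_fully_crossed()` below is a hypothetical base-R helper written for illustration, and the city and brand labels are made up.

```r
# Hypothetical helper: TRUE if every level of `a` co-occurs with every level of `b`
is_fully_crossed <- function(a, b) all(table(a, b) > 0)

sales <- expand.grid(
  city  = c("Los Angeles", "London", "Tokyo"),
  brand = c("Nike", "Apple"),
  stringsAsFactors = FALSE
)

table(sales$city, sales$brand)                 # 3 x 2 grid, every cell filled
is_fully_crossed(sales$city, sales$brand)      # TRUE

# Drop one city-brand cell: still many-to-many, but no longer a complete grid
partial <- sales[-1, ]
is_fully_crossed(partial$city, partial$brand)  # FALSE
```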

Real-World Examples

Example	Interpretation
Sales by City × Brand	A brand appears in many cities; a city sells many brands
Student outcomes by School × Cohort-Year	Cohorts span schools; schools span cohorts
Trade flows by Industry × Country	Each industry operates in many countries; each country hosts many industries
Tourism by State × Purpose	Every state has every purpose category

Model Form

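\[ Y_{ij} = \beta_0 + u_i^{(\text{city})} + v_j^{(\text{brand})} + e_{ij} \]

where \(i\) indexes cities and \(j\) indexes brands; \(u_i^{(\text{city})}\) and \(v_j^{(\text{brand})}\) are crossed random effects (neither factor is nested in the other), and \(e_{ij}\) is the residual.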
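To see the additive crossed form at work, the sketch below simulates city and brand effects on a complete grid and fits them as two-way fixed effects with base R's `lm()`. This swaps in the fixed-effects variant listed in the Key Properties table for simplicity; a crossed random-effects fit would instead use something like `lme4::lmer(y ~ 1 + (1 | city) + (1 | brand))`. All sizes and variances are made up.

```r
set.seed(1)
n_city <- 30; n_brand <- 10; n_obs <- 5
d <- expand.grid(city  = factor(seq_len(n_city)),
                 brand = factor(seq_len(n_brand)),
                 obs   = seq_len(n_obs))

u <- rnorm(n_city, 0, 1.0)    # city effects u_i
v <- rnorm(n_brand, 0, 0.5)   # brand effects v_j
d$y <- 10 + u[as.integer(d$city)] + v[as.integer(d$brand)] + rnorm(nrow(d), 0, 0.3)

# Two-way fixed effects: one dummy set per crossed factor, no interaction term
fit <- lm(y ~ city + brand, data = d)

# Brand coefficients are contrasts against brand 1, so they estimate v_j - v_1
est  <- coef(fit)[paste0("brand", 2:n_brand)]
true <- v[2:n_brand] - v[1]
round(max(abs(est - true)), 3)
```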

3. Mixed (Nested + Crossed) Structures

Sometimes, you have both:

\[ (\text{country/region/city}) \; \times \; (\text{brand/product}) \]

Interpretation

Cities are nested within regions and countries.
Products are nested within brand lines.
The two hierarchies cross each other → mixed structure.

Example

Country	Region	City	Brand	Product	Sales
USA	California	Los Angeles	Nike	Air Max	300
USA	California	Los Angeles	Apple	iPhone	500
USA	California	San Francisco	Nike	Air Max	260
USA	California	San Francisco	Apple	iPhone	470
USA	New York	New York City	Nike	Air Max	340
USA	New York	New York City	Apple	iPhone	390
UK	England	London	Nike	Air Max	220
UK	England	London	Apple	iPhone	410
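The example table can be reproduced as a plain data frame to show what the mixed structure provides: any node of the geography tree can be aggregated against any node of the product tree. This base-R sketch uses `aggregate()`; for time-indexed data, fable's `aggregate_key((country/region/city) * (brand/product), ...)` would build all such combinations at once.

```r
tbl <- data.frame(
  country = rep(c("USA", "UK"), times = c(6, 2)),
  region  = rep(c("California", "New York", "England"), times = c(4, 2, 2)),
  city    = rep(c("Los Angeles", "San Francisco", "New York City", "London"), each = 2),
  brand   = rep(c("Nike", "Apple"), times = 4),
  product = rep(c("Air Max", "iPhone"), times = 4),
  sales   = c(300, 500, 260, 470, 340, 390, 220, 410)
)

# Any node of the geography tree can be crossed with any node of the product tree
aggregate(sales ~ country + brand, data = tbl, FUN = sum)   # e.g. USA x Nike = 900
aggregate(sales ~ region + product, data = tbl, FUN = sum)  # e.g. California x Air Max = 560
sum(tbl$sales)                                              # grand total: 2890
```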

Model Form

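\[ Y_{jkmbp} = \beta_0 + u_{\text{country}_j} + u_{\text{region}_{k(j)}} + u_{\text{city}_{m(k,j)}} + v_{\text{brand}_b} + v_{\text{product}_{p(b)}} + e_{jkmbp} \]

Here \(Y_{jkmbp}\) is the outcome (Sales) for product \(p\) of brand \(b\) in city \(m\), region \(k\), and country \(j\). Indices in parentheses mark nesting: \(k(j)\) is region \(k\) within country \(j\), \(m(k,j)\) is city \(m\) within that region, and \(p(b)\) is product \(p\) within brand \(b\). The \(u\) chain (geography) and the \(v\) chain (product line) are crossed.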

Putting It Together

Concept	Relationship	Example	Model Framework	Notation
Nesting	One unit contained in another (tree)	Students within schools; Years within firms	Multilevel / hierarchical; fixed or random effects (nested)	`a/b/c`
Grouping (Crossed)	Units classified by multiple factors (grid)	City × Brand sales; Industry × Country trade	Crossed random effects / two-way fixed effects	`a * b`
Mixed	Nested hierarchies crossed with another	(Country/Region/City) × (Brand/Product)	Mixed-effects	`(a/b) * (c/d)`

Rule of thumb: Nesting forms hierarchies; grouping forms grids. The econometric model — and later the forecast reconciliation — must reflect which structure your data actually follow.

II. Introduction to Hierarchical Time Series

Hierarchical time series are collections of related time series organized in a hierarchical structure. These structures naturally arise when data can be disaggregated by different categorical variables or attributes.

Key Concepts

Hierarchy: A nested structure where series at higher levels are the sum of series at lower levels. For example:

Total tourism → State → Region

Grouped Time Series: A generalization of hierarchies where series can be disaggregated in multiple ways that don’t nest cleanly. Series can be grouped by different attributes simultaneously.

Example: In the Australian tourism data:

Hierarchy: State / Region (Regions nest strictly inside States).
Grouped: State × Purpose and Region × Purpose — these cross-classifications overlap but do not nest.
- By State × Purpose: “NSW Business”, “NSW Holiday”, …
- By Region × Purpose: “Sydney Business”, “Melbourne Business”, …
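The difference between a hierarchy and a grouped structure shows up in the summing matrix \(S\), which maps the bottom-level series \(b_t\) to every series via \(y_t = S b_t\) (the notation of Hyndman & Athanasopoulos, 2021). The base-R sketch below builds \(S\) for a toy hierarchy (Total → 2 States → 4 Regions) and a toy grouped structure (2 States × 2 Purposes); all labels and values are hypothetical.

```r
# Toy hierarchy: Total -> State (A, B) -> Region (two regions per State)
regions <- c("A1", "A2", "B1", "B2")
S_hts <- rbind(
  matrix(1, 1, 4),                       # Total
  kronecker(diag(2), matrix(1, 1, 2)),   # State A, State B
  diag(4)                                # bottom level: regions
)
dimnames(S_hts) <- list(c("Total", "A", "B", regions), regions)

# Toy grouped structure: State (A, B) crossed with Purpose (Bus, Hol)
cells <- c("A:Bus", "A:Hol", "B:Bus", "B:Hol")
S_gts <- rbind(
  matrix(1, 1, 4),                       # Total
  kronecker(diag(2), matrix(1, 1, 2)),   # State A, State B
  kronecker(matrix(1, 1, 2), diag(2)),   # Purpose Bus, Purpose Hol
  diag(4)                                # bottom level: State:Purpose cells
)
dimnames(S_gts) <- list(c("Total", "A", "B", "Bus", "Hol", cells), cells)

b <- c(10, 20, 5, 15)   # bottom-level values at one time point
S_hts %*% b             # Total 50, A 30, B 20, then the four regions
S_gts %*% b             # also Purpose totals: Bus 15, Hol 35

colSums(S_hts)   # 3 each: Total, one State, itself
colSums(S_gts)   # 4 each: Total, one State, one Purpose, itself
```

In the tree, each bottom series has one ancestor per level; in the grouped structure it has two parents at the same depth (its State and its Purpose). With the tourism dimensions (8 States × 4 Purposes) the same construction gives 1 + 8 + 4 + 32 = 45 rows.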

Why Is Purpose at the Bottom in State / Region / Purpose?

Purpose does not have to be at the bottom. The order in a hierarchy is a modeling choice tied to the decision workflow.

We place Purpose at the bottom here for four practical reasons:

Geography is the strict nesting backbone in this dataset (State → Region), so it is natural to keep those levels higher in the tree.
Many planning and budgeting decisions are made by geographic units first, then decomposed by trip purpose.
Purpose shares within each geography are often interpreted as composition (mix) rather than a separate top-level management structure.
This ordering makes interpretation clear for stakeholders who consume state and region forecasts first.

When would Purpose be placed higher?

If the primary policy question is purpose-led (national business travel vs holiday travel planning).
If purpose-level aggregates are more stable and are the main forecast targets.
If resource allocation is organized around purpose programs instead of geography.

The key point: there is no universally correct ordering. The hierarchy should reflect the operational decision path the forecasts are meant to support.

4. Why Use Hierarchical Forecasting?

Coherence: Forecasts are mathematically consistent (lower-level forecasts sum to upper-level forecasts).
Information sharing: Reconciled forecasts combine the smoother signal at aggregate levels with the detail available at disaggregate levels.
Flexibility: Forecast at any level while maintaining consistency.
Accuracy: Reconciliation often improves forecast accuracy.

5. Reconciliation Methods

After defining the structure, reconciliation is what enforces coherence across levels. The four methods differ in which base forecasts they use and how they enforce consistency.

1. Bottom-Up

Forecast only the bottom-level series; sum upward.
Pros: Simple, no information loss at the bottom level.
Cons: Ignores potentially useful information from aggregate levels.
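In matrix form, the methods in this section can be written as \(\tilde{y}_h = S G \hat{y}_h\), where \(G\) maps the base forecasts to bottom-level forecasts and \(S\) sums them back up. For bottom-up, \(G\) simply selects the bottom-level base forecasts. A base-R sketch on a toy Total → State → Region hierarchy with hypothetical, deliberately incoherent base forecasts:

```r
# Summing matrix for Total -> State (A, B) -> Region (A1, A2, B1, B2)
S <- rbind(
  c(1, 1, 1, 1),   # Total
  c(1, 1, 0, 0),   # State A
  c(0, 0, 1, 1),   # State B
  diag(4)          # regions
)

# Base forecasts for all 7 series (incoherent: 31 + 18 != 52)
y_hat <- c(52, 31, 18, 10, 20, 5, 15)

# Bottom-up: G keeps only the 4 bottom-level forecasts, S sums them back up
G_bu <- cbind(matrix(0, 4, 3), diag(4))
y_bu <- S %*% G_bu %*% y_hat
drop(y_bu)   # 50 30 20 10 20 5 15
```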

2. Top-Down

Forecast only the top-level series; disaggregate using proportions.
Pros: Can be more stable when bottom-level series are volatile.
Cons: Loses disaggregate-level information.
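Top-down fits the same \(S G \hat{y}_h\) template with \(G\) built from disaggregation proportions: the top forecast is split by a vector of shares \(p\) and summed back up. This sketch uses average historical proportions (the mean over time of each bottom series' share of the total), which, as we understand it, corresponds to fabletools' `top_down(method = "average_proportions")`; the code later in this document uses `"forecast_proportions"`, which derives the shares from base forecasts instead. History and forecasts are hypothetical.

```r
S <- rbind(c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 0, 1, 1), diag(4))

# Hypothetical history: rows are past periods, columns are the 4 bottom series
hist_bottom <- rbind(c(10, 20, 5, 15),
                     c(12, 18, 6, 14),
                     c(11, 22, 4, 13))

# Average historical proportions: mean over time of each series' share of the total
p <- colMeans(hist_bottom / rowSums(hist_bottom))

y_hat_total <- 52                 # base forecast of the top series
y_td <- S %*% (p * y_hat_total)   # split the top forecast, then sum back up
drop(y_td)
```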

3. Middle-Out

Forecast at a chosen middle level; aggregate upward and disaggregate downward.
Meaningful only with three or more levels in a single tree (e.g., Total → State → Region).
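Middle-out combines the two: forecast the chosen middle level, sum upward as in bottom-up, and split downward as in top-down. A base-R sketch with State as the middle level of the toy Total → State → Region hierarchy, using hypothetical within-State shares:

```r
S <- rbind(c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 0, 1, 1), diag(4))

y_hat_state <- c(A = 31, B = 18)                          # base forecasts at the middle level
parent      <- c(A1 = "A", A2 = "A", B1 = "B", B2 = "B")  # State of each region
p_within    <- c(A1 = 1/3, A2 = 2/3, B1 = 1/4, B2 = 3/4)  # shares within each State

bottom_mo <- p_within * y_hat_state[parent]   # disaggregate downward
y_mo <- S %*% bottom_mo                        # aggregate upward: Total = A + B
drop(y_mo)
```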

4. Optimal Reconciliation (MinT)

Uses all base forecasts at every level.
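Finds the linear combination of all base forecasts that minimizes the total error variance of the coherent forecasts (the trace of their error covariance matrix), subject to unbiasedness.
Variants differ in the error covariance matrix W they assume: OLS (identity), WLS (diagonal, scaled by structure or by residual variances), and MinT with a sample or shrinkage covariance estimate.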
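These variants share one closed form, \(\tilde{y}_h = S (S^\top W^{-1} S)^{-1} S^\top W^{-1} \hat{y}_h\) (Wickramasuriya et al., 2019), and differ only in \(W\). The base-R sketch below implements that formula directly on the toy hierarchy. The residual matrix is random stand-in data, and the shrinkage weight \(\lambda = 0.5\) is a fixed, hypothetical value, whereas `min_trace(method = "mint_shrink")` estimates it from the data. In fabletools the variants are selected with `method = "ols"`, `"wls_struct"`, `"wls_var"`, `"mint_cov"`, or `"mint_shrink"`.

```r
# Generic linear reconciliation: y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat
reconcile_w <- function(S, y_hat, W) {
  Winv <- solve(W)
  G <- solve(t(S) %*% Winv %*% S, t(S) %*% Winv)   # maps all n forecasts to m bottom series
  drop(S %*% G %*% y_hat)
}

S <- rbind(c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 0, 1, 1), diag(4))
y_hat <- c(52, 31, 18, 10, 20, 5, 15)               # incoherent base forecasts

y_ols <- reconcile_w(S, y_hat, diag(nrow(S)))       # OLS: W = identity
y_wls <- reconcile_w(S, y_hat, diag(rowSums(S)))    # WLS with structural scaling

# MinT-style: W from base-forecast residuals, shrunk toward its diagonal
set.seed(123)
E <- matrix(rnorm(40 * nrow(S)), nrow = 40)         # stand-in residuals, T = 40 periods
W_hat <- crossprod(E) / nrow(E)
lambda <- 0.5                                       # fixed here; estimated in practice
W_shr <- lambda * diag(diag(W_hat)) + (1 - lambda) * W_hat
y_mint <- reconcile_w(S, y_hat, W_shr)

# Coherent means every series equals the sum implied by the bottom level
is_coherent <- function(y) isTRUE(all.equal(y, drop(S %*% y[4:7])))
sapply(list(ols = y_ols, wls = y_wls, mint = y_mint), is_coherent)
```

Because every result has the form \(S\) times a bottom-level vector, it is coherent by construction; the choice of \(W\) only decides how the incoherence in the base forecasts is redistributed across levels.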

Which Methods Are Valid for HTS vs GTS?

Method	HTS (single tree)	GTS (crossed)
`bottom_up()`	YES	YES
`top_down()`	YES	NO — no unique parent path
`middle_out()`	YES (needs 3+ levels)	NO — no single middle level
`min_trace()` (MinT)	YES	YES (preferred for statistical efficiency)

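Why Top-Down and Middle-Out fail under GTS: Top-Down assumes one parent-to-child disaggregation path, but in a crossed structure each bottom series has several parents (e.g., its State total and its Purpose total), and splitting each parent's forecast generally gives conflicting values for the same series. Middle-Out assumes a single meaningful middle level; crossed dimensions offer several candidate middle levels (State or Purpose), which again lead to conflicting paths.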

Practical rule: If the structure is GTS, use Bottom-Up or MinT. Avoid Top-Down and Middle-Out.
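A toy example makes the failure concrete. In a hypothetical 2 × 2 State × Purpose structure, the bottom series A:Bus has two parents, State A and Purpose Bus. Splitting each parent's base forecast by historical shares gives two different answers for the same series (history and forecasts are made up):

```r
# Hypothetical 2 x 2 history: State (rows) crossed with Purpose (columns)
hist_tab <- matrix(c(40, 35, 60, 35), nrow = 2,
                   dimnames = list(State = c("A", "B"), Purpose = c("Bus", "Hol")))

# Base forecasts for the two sets of parents (incoherent: 180 vs 175)
f_state   <- c(A = 110, B = 70)
f_purpose <- c(Bus = 70, Hol = 105)

# Path 1: split each State forecast by that State's historical Purpose mix
via_state <- sweep(prop.table(hist_tab, 1), 1, f_state, `*`)
# Path 2: split each Purpose forecast by that Purpose's historical State mix
via_purpose <- sweep(prop.table(hist_tab, 2), 2, f_purpose, `*`)

via_state["A", "Bus"]     # 44
via_purpose["A", "Bus"]   # about 37.33
```

Neither path is more correct than the other, and the two splits do not even agree on the grand total (180 vs 175). Bottom-Up sidesteps the choice, and MinT resolves it by weighting all levels jointly.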

6. Example: Australian Tourism Data

1) Setup and Data Check

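library(fpp3)   # attaches tsibble, fable, feasts, dplyr, ggplot2, ...

df <- tsibble::tourism   # quarterly overnight trips ('000) by Region, State, and Purpose

# Data check: size of each dimension
cat("Time periods:", n_distinct(df$Quarter), "quarters\n")
cat("States:", n_distinct(df$State), "\n")
cat("Regions:", n_distinct(df$Region), "\n")
cat("Purpose categories:", n_distinct(df$Purpose), "\n")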

Time periods: 80 quarters

States: 8

Regions: 76

Purpose categories: 4

2) What Is Nested vs Crossed Here?

Geography is naturally nested:

Region is strictly nested within State.
So State / Region is a true hierarchical time series (HTS).

Purpose is a separate dimension that crosses geography:

State and Purpose form crossed views.
So State * Purpose is a grouped time series (GTS).

You can model State and Purpose as a hierarchy (State / Purpose) if the decision flow is “first by State, then split by Purpose.” But conceptually, State * Purpose is the more general crossed representation.

3) Declare Alternative Structures

Again, there is no universally correct ordering. The hierarchy should reflect the operational decision path the forecasts are meant to support.

# Hierarchical by State then Purpose (single tree)
tourism_hts_sp <- df %>%
  aggregate_key(State / Purpose, Trips = sum(Trips))

# Grouped by State and Purpose (crossed)
tourism_gts_sp <- df %>%
  aggregate_key(State * Purpose, Trips = sum(Trips))

# Hierarchical geography benchmark (strict nesting)
tourism_hts_sr <- df %>%
  aggregate_key(State / Region, Trips = sum(Trips))

4) Why Series Counts Differ Across Structures

n_state   <- n_distinct(df$State)
n_purpose <- n_distinct(df$Purpose)
n_region  <- n_distinct(df$Region)

# Unique series counts implied by each structure
series_hts_sp <- 1 + n_state + (n_state * n_purpose)                 # Total + State + State/Purpose
series_gts_sp <- 1 + n_state + n_purpose + (n_state * n_purpose)    # adds Purpose-only level
series_hts_sr <- 1 + n_state + n_region                              # Total + State + Region

counts_tbl <- tibble(
  structure = c("HTS: State / Purpose", "GTS: State * Purpose", "HTS: State / Region"),
  unique_series = c(series_hts_sp, series_gts_sp, series_hts_sr),
  unique_levels = c(
    "Total, State, State:Purpose",
    "Total, State, Purpose, State:Purpose",
    "Total, State, Region"
  )
)

counts_tbl

Interpretation:

State * Purpose has more unique aggregation views than State / Purpose (45 vs 41 series) because it adds a standalone Purpose-only level.
Comparing State * Purpose to State / Region (85 series) is not an apples-to-apples count: Region has much higher cardinality than Purpose.
“More series” depends jointly on the structure and on the cardinality of each categorical variable.
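The counts in `counts_tbl` follow a general rule: a single tree contributes one node for Total plus one per series at each level, and crossing multiplies the node counts of the crossed pieces (each factor is either aggregated or at one of its levels). A base-R sketch with a hypothetical helper `n_nodes()`, assuming every combination of categories is observed, as it is in `tourism`:

```r
# Hypothetical helper: nodes in one tree = 1 (Total) + number of series at each level
n_nodes <- function(...) 1 + sum(c(...))

n_state <- 8; n_purpose <- 4; n_region <- 76

n_nodes(n_state, n_state * n_purpose)            # State / Purpose:             41
n_nodes(n_state) * n_nodes(n_purpose)            # State * Purpose:             45
n_nodes(n_state, n_region)                       # State / Region:              85
n_nodes(n_state, n_region) * n_nodes(n_purpose)  # (State / Region) * Purpose: 425
```

The last line covers the mixed (State / Region) * Purpose structure: the 85-node geography tree crossed with Purpose's 5 nodes (Total plus 4 categories).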

5) Modeling Choice Justification

Only the geography (State / Region) is genuinely nested in this dataset, but State and Purpose can be declared under either representation, depending on the decision we need to support:

Use HTS if your reporting path is a single tree.
Use GTS if you need coherence across multiple crossed views simultaneously.

The base model classes (ETS, ARIMA, TSLM) are the same in both cases; what changes is which reconciliation methods are valid.

fit_hts <- tourism_hts_sp %>%
  model(
    ets   = ETS(Trips),
    arima = ARIMA(Trips),
    tslm  = TSLM(Trips ~ trend() + season())
  )

fit_gts <- tourism_gts_sp %>%
  model(
    ets   = ETS(Trips),
    arima = ARIMA(Trips),
    tslm  = TSLM(Trips ~ trend() + season())
  )

6) Reconcile Correctly by Structure

recon_hts <- fit_hts %>%
  reconcile(
    bu    = bottom_up(ets),
    td_fp = top_down(ets, method = "forecast_proportions"),
    mint  = min_trace(ets, method = "mint_shrink")
  )

recon_gts <- fit_gts %>%
  reconcile(
    bu   = bottom_up(ets),
    mint = min_trace(ets, method = "mint_shrink")
  )

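Note that top_down() is omitted for the GTS structure because a crossed structure has no unique parent path. middle_out() is not used either: State / Purpose has only three levels (Total → State → State:Purpose), so State is the only candidate middle level, and middle-out would reduce to forecasting the State series, summing them to Total, and splitting each State forecast across Purpose. It is left out to keep the comparison focused on Bottom-Up, Top-Down, and MinT.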

7) Forecast and Compare

fc_hts <- recon_hts %>% forecast(h = "2 years")
fc_gts <- recon_gts %>% forecast(h = "2 years")

fc_gts %>%
  filter(State == "Queensland", is_aggregated(Purpose), .model %in% c("bu", "mint")) %>%
  autoplot(tourism_gts_sp, level = NULL) +
  labs(
    title    = "GTS Reconciliation: Queensland Total",
    subtitle = "Bottom-Up vs MinT",
    y        = "Trips ('000)",
    x        = "Quarter"
  ) +
  theme_minimal()

8) Accuracy Comparison (Holdout)

library(knitr)

# Hold out last 8 quarters for evaluation
all_quarters   <- sort(unique(df$Quarter))
train_quarters <- head(all_quarters, -8)
test_quarters  <- tail(all_quarters, 8)

train <- df %>% filter(Quarter %in% train_quarters)
test  <- df %>% filter(Quarter %in% test_quarters)

# Rebuild structures on training data
train_hts_sp <- train %>% aggregate_key(State / Purpose, Trips = sum(Trips))
train_gts_sp <- train %>% aggregate_key(State * Purpose, Trips = sum(Trips))

# Aggregate the holdout to match each forecast structure so accuracy() can join on keys
test_hts_sp <- test %>% aggregate_key(State / Purpose, Trips = sum(Trips))
test_gts_sp <- test %>% aggregate_key(State * Purpose, Trips = sum(Trips))

# Fit and reconcile (ETS for like-for-like reconciliation comparison)
fit_hts_eval <- train_hts_sp %>%
  model(ets = ETS(Trips)) %>%
  reconcile(
    bu    = bottom_up(ets),
    td_fp = top_down(ets, method = "forecast_proportions"),
    mint  = min_trace(ets, method = "mint_shrink")
  )

fit_gts_eval <- train_gts_sp %>%
  model(ets = ETS(Trips)) %>%
  reconcile(
    bu   = bottom_up(ets),
    mint = min_trace(ets, method = "mint_shrink")
  )

# Forecast over the holdout horizon
fc_hts_eval <- fit_hts_eval %>% forecast(h = 8)
fc_gts_eval <- fit_gts_eval %>% forecast(h = 8)

# Accuracy metrics (each forecast is compared against the matching aggregated holdout)
acc_hts <- fc_hts_eval %>%
  accuracy(test_hts_sp, measures = list(RMSE = RMSE, MAE = MAE, MAPE = MAPE)) %>%
  mutate(Structure = "HTS: State / Purpose")

acc_gts <- fc_gts_eval %>%
  accuracy(test_gts_sp, measures = list(RMSE = RMSE, MAE = MAE, MAPE = MAPE)) %>%
  mutate(Structure = "GTS: State * Purpose")

# Average each metric across all series within a structure (HTS and GTS contain different series sets)
acc_summary <- bind_rows(acc_hts, acc_gts) %>%
  group_by(Structure, .model) %>%
  summarise(
    RMSE = mean(RMSE, na.rm = TRUE),
    MAE  = mean(MAE,  na.rm = TRUE),
    MAPE = mean(MAPE, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(Structure, RMSE)

kable(
  acc_summary,
  digits  = 3,
  caption = "Holdout Accuracy (Last 8 Quarters): RMSE, MAE, and MAPE by Structure and Reconciliation Method"
)

Holdout Accuracy (Last 8 Quarters): RMSE, MAE, and MAPE by Structure and Reconciliation Method
Structure	.model	RMSE	MAE	MAPE
GTS: State * Purpose	ets	214.257	177.205	13.605
GTS: State * Purpose	mint	215.944	180.138	13.234
GTS: State * Purpose	bu	245.711	211.112	13.944
HTS: State / Purpose	td_fp	167.063	135.770	13.672
HTS: State / Purpose	ets	183.159	151.876	14.285
HTS: State / Purpose	mint	186.747	157.339	13.953
HTS: State / Purpose	bu	206.718	177.931	14.495
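Within each structure, the table can also be read as skill relative to bottom-up, \(1 - \text{RMSE}/\text{RMSE}_{\text{bu}}\). The base-R sketch below computes it from the RMSE values reported in the table; nothing is re-estimated.

```r
# RMSE values copied from the holdout table
acc <- data.frame(
  structure = c("GTS", "GTS", "GTS", "HTS", "HTS", "HTS", "HTS"),
  method    = c("ets", "mint", "bu", "td_fp", "ets", "mint", "bu"),
  RMSE      = c(214.257, 215.944, 245.711, 167.063, 183.159, 186.747, 206.718)
)

# Skill relative to bottom-up within the same structure: 1 - RMSE / RMSE(bu)
bu_rmse <- with(acc[acc$method == "bu", ], setNames(RMSE, structure))
acc$skill_vs_bu <- round(1 - acc$RMSE / bu_rmse[acc$structure], 3)
acc
```

On these numbers, MinT's RMSE is about 12% below bottom-up in the GTS and about 10% below in the HTS, while forecast-proportion top-down is about 19% below bottom-up in the HTS.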

Recommended Practical Defaults

State clearly what structure you declared (/ vs *) before modeling.
Treat geography (State / Region) as the canonical nested benchmark.
Use State * Purpose when you need State totals and Purpose totals coherent at the same time.
Do not compare series counts across different variables (Purpose vs Region) as if they were equivalent.
For GTS, default to Bottom-Up or MinT; never Top-Down or Middle-Out.

Key Functions Reference

Function	Purpose	Valid For
`aggregate_key()`	Define HTS (`/`) or GTS (`*`) structure	Both
`model()`	Fit base models (ETS, ARIMA, TSLM, …)	Both
`reconcile()`	Enforce coherence across levels	Both
`bottom_up()`	Sum bottom-level forecasts upward	HTS + GTS
`top_down()`	Disaggregate top forecast downward	HTS only
`middle_out()`	Forecast a middle level, propagate both ways	HTS with 3+ levels
`min_trace()`	Variance-minimizing reconciliation	HTS + GTS

References

Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: principles and practice (3rd ed.). OTexts: Melbourne, Australia. OTexts.com/fpp3
Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819.