Hierarchical Time Series Forecasting
I. Nesting vs Grouping/Crossed Structures
Why This Matters
Many datasets in econometrics, panel data, and hierarchical modeling involve structured observations — individuals within groups, or observations classified by multiple dimensions.
Understanding whether these structures are nested or crossed determines:
- How we model random effects / fixed effects
- How we cluster standard errors
- How we interpret within- and between-group variation
- Which reconciliation methods are valid when we forecast
1. Nested Data Structures
Definition
A nested structure means smaller units are fully contained within larger units.
Each lower-level unit belongs to exactly one higher-level unit.
\[ \text{Level 3: Country} \supset \text{Level 2: Region} \supset \text{Level 1: City} \]
Key Properties
| Feature | Explanation |
|---|---|
| Belonging | Many-to-one hierarchy (strict tree): each child has exactly one parent |
| Independence | Observations within the same group are correlated |
| Model implication | Hierarchical / multilevel models with random intercepts or slopes |
| Typical notation | a/b/c (e.g., country/region/city) |
Real-World Examples
| Example | Interpretation |
|---|---|
| Students within Schools | Each student belongs to exactly one school |
| Cities within Regions within Countries | Strict geographic hierarchy |
| Years within Firms | Time nested within individual firms (panel data) |
| Employees within Departments | Each employee reports to one department |
Model Form
For an observation indexed by the within-group dimension \(i\) and the between-group dimension \(j\):
\[ Y_{ij} = \beta_0 + u_j + \beta_1 X_{ij} + e_{ij} \]
where \(u_j\) is the group-level (nested) random effect, which captures between-group variation, and \(e_{ij}\) is the residual, which captures within-group variation.
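One way to see what the group effect \(u_j\) does is the within transformation: subtracting each group's mean removes anything constant inside a group, so the slope \(\beta_1\) is identified from within-group variation alone. The sketch below simulates nested data from the model form above and checks that the demeaned regression gives the same slope as a regression with one dummy per group. The sample sizes, parameter values, and variable names are illustrative assumptions, and this is the fixed-effects counterpart of the random-intercept model, not a mixed-model fit.

```r
set.seed(42)

# Simulated nested data (all values are illustrative assumptions)
n_groups <- 30                                   # number of groups j
n_per    <- 10                                   # observations i per group
group    <- rep(seq_len(n_groups), each = n_per)
u        <- rnorm(n_groups, sd = 2)              # between-group effects u_j
x        <- rnorm(n_groups * n_per)
y        <- 1 + u[group] + 0.5 * x + rnorm(n_groups * n_per)

# Within transformation: subtract each group's mean, which removes u_j
x_within <- x - ave(x, group)
y_within <- y - ave(y, group)
beta_within <- sum(x_within * y_within) / sum(x_within^2)

# Same slope from a regression with one dummy per group
beta_dummy <- unname(coef(lm(y ~ x + factor(group)))["x"])

c(within = beta_within, dummy = beta_dummy)
```

The two estimates agree up to numerical precision, which is why group dummies and group demeaning are interchangeable ways of absorbing a nested effect.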
2. Grouped or Crossed Data Structures
Definition
A grouped (crossed) structure means units are classified along multiple independent dimensions.
Each observation can belong to several groups simultaneously.
\[ \text{Example: } (\text{City} \times \text{Brand}) \Rightarrow \text{Sales observed for each brand in each city.} \]
Key Properties
| Feature | Explanation |
|---|---|
| Belonging | Many-to-many relationship |
| Independence | Observations share multiple group memberships |
| Model implication | Crossed random effects (or two-way fixed effects) |
| Typical notation | a * b (e.g., city * brand) |
Real-World Examples
| Example | Interpretation |
|---|---|
| Sales by City × Brand | A brand appears in many cities; a city sells many brands |
| Student outcomes by School × Cohort-Year | Cohorts span schools; schools span cohorts |
| Trade flows by Industry × Country | Industries cross countries |
| Tourism by State × Purpose | Every state has every purpose category |
Model Form
\[ Y_{ij} = \beta_0 + u_i^{(city)} + v_j^{(brand)} + e_{ij} \]
where \(u_i\) and \(v_j\) are crossed random effects.
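As a rough illustration of the crossed form, the sketch below builds a small balanced City × Brand grid and fits additive city and brand effects with `lm()`. This is the two-way fixed-effects analogue of the model above, not a crossed random-effects fit (which would typically need a mixed-model package); the data values and the name `grid_df` are assumptions made for the example.

```r
# Illustrative City x Brand grid (values are assumptions for this sketch)
grid_df <- data.frame(
  city  = rep(c("Los Angeles", "San Francisco", "New York City", "London"), each = 2),
  brand = rep(c("Nike", "Apple"), times = 4),
  sales = c(300, 500, 260, 470, 340, 390, 220, 410)
)

# Every city appears with every brand, so the two factors are crossed
table(grid_df$city, grid_df$brand)

# Two-way fixed effects: additive city and brand effects
fit <- lm(sales ~ city + brand, data = grid_df)
brand_effect <- unname(coef(fit)["brandNike"])   # Nike relative to Apple

# In a balanced grid the brand effect equals the difference in brand means
brand_means <- tapply(grid_df$sales, grid_df$brand, mean)
c(from_lm = brand_effect,
  from_means = unname(brand_means["Nike"] - brand_means["Apple"]))
```

Because the grid is balanced, the brand coefficient does not depend on which cities are included in the comparison, which is the practical meaning of the two effects entering additively.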
3. Mixed (Nested + Crossed) Structures
Sometimes, you have both:
\[ (\text{country/region/city}) \; \times \; (\text{brand/product}) \]
Interpretation
- Cities are nested within regions and countries.
- Products are nested within brand lines.
- The two hierarchies cross each other → mixed structure.
Example
| Country | Region | City | Brand | Product | Sales |
|---|---|---|---|---|---|
| USA | California | Los Angeles | Nike | Air Max | 300 |
| USA | California | Los Angeles | Apple | iPhone | 500 |
| USA | California | San Francisco | Nike | Air Max | 260 |
| USA | California | San Francisco | Apple | iPhone | 470 |
| USA | New York | New York City | Nike | Air Max | 340 |
| USA | New York | New York City | Apple | iPhone | 390 |
| UK | England | London | Nike | Air Max | 220 |
| UK | England | London | Apple | iPhone | 410 |
Model Form
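\[ Y_{ijkmbp} = \beta_0 + u_j^{(country)} + u_{k(j)}^{(region)} + u_{m(jk)}^{(city)} + v_b^{(brand)} + v_{p(b)}^{(product)} + e_{ijkmbp} \]
where \(Y_{ijkmbp}\) is the outcome (Sales) for observation \(i\) in country \(j\), region \(k\), city \(m\), brand \(b\), and product \(p\). The notation \(k(j)\) reads “region \(k\) within country \(j\)”: parenthesized indices mark nesting, while the geography terms \(u\) and the product terms \(v\) enter additively because the two hierarchies cross.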
Putting It Together
| Concept | Relationship | Example | Model Framework | Notation |
|---|---|---|---|---|
| Nesting | One unit contained in another (tree) | Students within schools; Years within firms | Multilevel / hierarchical; fixed or random effects (nested) | a/b/c |
| Grouping (Crossed) | Units classified by multiple factors (grid) | City × Brand sales; Industry × Country trade | Crossed random effects / two-way fixed effects | a * b |
| Mixed | Nested hierarchies crossed with another | (Country/Region/City) × (Brand/Product) | Mixed-effects | (a/b) * (c/d) |
Rule of thumb: Nesting forms hierarchies; grouping forms grids. The econometric model — and later the forecast reconciliation — must reflect which structure your data actually follow.
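The rule of thumb can be checked directly on data. The sketch below re-enters the mixed-structure example table and tests each pair of variables; the helper names `is_nested()` and `is_fully_crossed()` are hypothetical, written for this illustration only.

```r
# Example table from the mixed-structure section, entered by hand
sales <- data.frame(
  Country = c("USA", "USA", "USA", "USA", "USA", "USA", "UK", "UK"),
  Region  = c("California", "California", "California", "California",
              "New York", "New York", "England", "England"),
  City    = rep(c("Los Angeles", "San Francisco", "New York City", "London"), each = 2),
  Brand   = rep(c("Nike", "Apple"), times = 4),
  Product = rep(c("Air Max", "iPhone"), times = 4),
  Sales   = c(300, 500, 260, 470, 340, 390, 220, 410)
)

# TRUE when every child value appears under exactly one parent value (tree)
is_nested <- function(child, parent) {
  parents_per_child <- tapply(parent, child, function(p) length(unique(p)))
  all(parents_per_child == 1)
}

# TRUE when every combination of the two factors is observed (grid)
is_fully_crossed <- function(a, b) {
  all(table(a, b) > 0)
}

is_nested(sales$City, sales$Region)        # geography nests
is_nested(sales$Product, sales$Brand)      # products nest within brands
is_nested(sales$City, sales$Brand)         # a city sells several brands
is_fully_crossed(sales$City, sales$Brand)  # every city sells every brand
```

Running a check like this before declaring a structure guards against treating a crossed pair as if it were a tree.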
II. Introduction to Hierarchical Time Series
Hierarchical time series are collections of related time series organized in a hierarchical structure. These structures naturally arise when data can be disaggregated by different categorical variables or attributes.
Key Concepts
Hierarchy: A nested structure where series at higher levels are the sum of series at lower levels. For example:
- Total tourism → State → Region
Grouped Time Series: A generalization of hierarchies where series can be disaggregated in multiple ways that don’t nest cleanly. Series can be grouped by different attributes simultaneously.
Example: In the Australian tourism data:
- Hierarchy: State / Region (Regions nest strictly inside States).
- Grouped: State × Purpose and Region × Purpose; these cross-classifications overlap but do not nest.
  - By State + Purpose: “NSW Business”, “NSW Holiday”, …
  - By Region + Purpose: “Sydney Business”, “Melbourne Business”, …
Why State / Region / Purpose?
Purpose does not have to be at the bottom. The order in a hierarchy is a modeling choice tied to the decision workflow.
We place Purpose at the bottom here for four practical reasons:
- Geography is the strict nesting backbone in this dataset (State -> Region), so it is natural to keep those levels higher in the tree.
- Many planning and budgeting decisions are made by geographic units first, then decomposed by trip purpose.
- Purpose shares within each geography are often interpreted as composition (mix) rather than a separate top-level management structure.
- This ordering makes interpretation clear for stakeholders who consume state and region forecasts first.
When would Purpose be placed higher?
- If the primary policy question is purpose-led (national business travel vs holiday travel planning).
- If purpose-level aggregates are more stable and are the main forecast targets.
- If resource allocation is organized around purpose programs instead of geography.
The key point: there is no universally correct ordering. The hierarchy should reflect the operational decision path the forecasts are meant to support.
4. Why Use Hierarchical Forecasting?
- Coherence: Forecasts are mathematically consistent (lower-level forecasts sum to upper-level forecasts).
- Information sharing: Signal in the aggregate series can inform the disaggregate forecasts, and vice versa.
- Flexibility: Forecast at any level while maintaining consistency.
- Accuracy: Reconciliation often improves forecast accuracy.
5. Reconciliation Methods
After defining the structure, reconciliation is what enforces coherence across levels. The four methods differ in which base forecasts they use and how they enforce consistency.
1. Bottom-Up
- Forecast only the bottom-level series; sum upward.
- Pros: Simple, no information loss at the bottom level.
- Cons: Ignores potentially useful information from aggregate levels.
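A minimal sketch of Bottom-Up on a toy two-series hierarchy, assuming hypothetical bottom-level forecasts for series `A` and `B`. The summing matrix `S` maps bottom-level values to every level of the tree, so multiplying by it produces coherent forecasts by construction.

```r
# Toy hierarchy (hypothetical): Total = A + B
S <- rbind(Total = c(1, 1),
           A     = c(1, 0),
           B     = c(0, 1))
colnames(S) <- c("A", "B")

# Assumed bottom-level base forecasts
bottom_fc <- c(A = 40, B = 60)

# Bottom-up: every level is a sum of bottom-level forecasts
bu_fc <- drop(S %*% bottom_fc)
bu_fc
```

Any forecast made directly for `Total` is never used here, which is the information loss listed under Cons.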
2. Top-Down
- Forecast only the top-level series; disaggregate using proportions.
- Pros: Can be more stable when bottom-level series are volatile.
- Cons: Loses disaggregate-level information.
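A minimal sketch of Top-Down on the same kind of toy hierarchy, assuming a hypothetical history for two bottom series and a single top-level forecast. The proportions here are each series' share of the historical total, which is one of several possible proportion rules and not necessarily the one used elsewhere in this tutorial.

```r
# Assumed history for two bottom-level series (rows are time periods)
history <- cbind(A = c(30, 50, 40),
                 B = c(70, 50, 60))

# Share of each series in the historical total
p <- colSums(history) / sum(history)

# Assumed top-level base forecast, split by the historical shares
total_fc <- 110
td_fc <- total_fc * p
td_fc
```

The bottom-level forecasts inherit all of their dynamics from the top series; only the fixed shares distinguish `A` from `B`.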
3. Middle-Out
- Forecast at a chosen middle level; aggregate upward and disaggregate downward.
- Meaningful only with three or more levels in a single tree (e.g., Total → State → Region).
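A minimal sketch of Middle-Out on a hypothetical three-level tree, assuming base forecasts exist only at the State level. The region names, history values, and the use of historical within-state shares for the downward step are all assumptions for illustration.

```r
# Hypothetical tree: Total -> State (S1, S2) -> Region (R1, R2 in S1; R3 in S2)
parent      <- c(R1 = "S1", R2 = "S1", R3 = "S2")
region_hist <- c(R1 = 10, R2 = 30, R3 = 60)   # assumed historical levels
state_fc    <- c(S1 = 50, S2 = 70)            # assumed middle-level forecasts

# Upward step: aggregate the middle level
total_fc <- sum(state_fc)

# Downward step: split each state forecast by historical within-state shares
share_in_state <- region_hist / ave(region_hist, parent, FUN = sum)
region_fc <- share_in_state * unname(state_fc[parent])

list(total = total_fc, states = state_fc, regions = region_fc)
```

The downward step relies on each region having exactly one parent state, which is why the method presupposes a single tree.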
4. Optimal Reconciliation (MinT)
- Uses all base forecasts at every level.
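- Finds the linear combination of base forecasts that minimizes the total reconciled forecast-error variance (the trace of the error covariance matrix), subject to coherence.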
- Variants: OLS, WLS, and MinT (minimum trace) with shrinkage.
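The simplest variant, OLS, can be sketched in a few lines of matrix algebra. The example assumes a toy hierarchy `Total = A + B` and hypothetical base forecasts that do not add up; it projects them onto the coherent subspace spanned by the summing matrix `S`. This is an illustration of the OLS case only, not the shrinkage estimator.

```r
# Toy hierarchy (hypothetical): Total = A + B
S <- rbind(Total = c(1, 1),
           A     = c(1, 0),
           B     = c(0, 1))

# Assumed base forecasts at every level; incoherent because 40 + 60 != 105
base_fc <- c(Total = 105, A = 40, B = 60)

# OLS reconciliation: S (S'S)^-1 S' applied to the base forecasts.
# General MinT replaces the implicit identity weight with W^-1,
# giving S (S' W^-1 S)^-1 S' W^-1, where W is the covariance of base forecast errors.
G <- solve(t(S) %*% S) %*% t(S)
recon_fc <- drop(S %*% G %*% base_fc)
recon_fc
```

The discrepancy of 5 between the Total forecast and the sum of the bottom forecasts is spread across all three series instead of being assigned entirely to one level.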
| Method | HTS (single tree) | GTS (crossed) |
|---|---|---|
| bottom_up() | YES | YES |
| top_down() | YES | NO (no unique parent path) |
| middle_out() | YES (needs 3+ levels) | NO (no single middle level) |
| min_trace() (MinT) | YES | YES (preferred for statistical efficiency) |
Why Top-Down and Middle-Out fail under GTS: Top-Down assumes one parent-to-child disaggregation path; in a crossed structure, multiple parents contradict each other. Middle-Out assumes a single meaningful middle level; crossed dimensions create conflicting paths and no well-defined middle.
Practical rule: If the structure is GTS, use Bottom-Up or MinT. Avoid Top-Down and Middle-Out.
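The conflict between parent paths can be made concrete with a hypothetical 2 × 2 grouped structure. The sketch below disaggregates the same Total forecast twice, once through State and once through Purpose, using shares taken from assumed base forecasts. It is a simplified proportions calculation written for this illustration, not the algorithm of any particular package.

```r
# Assumed base forecasts for a 2 x 2 State x Purpose structure
total_fc   <- 100
state_fc   <- c(NSW = 50, VIC = 50)
purpose_fc <- c(Business = 60, Holiday = 40)
cell_fc    <- matrix(c(20, 20,
                       30, 10),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("NSW", "VIC"), c("Business", "Holiday")))

# Path 1: Total -> State -> cell (rows scaled by state shares)
state_share  <- state_fc / sum(state_fc)
within_state <- cell_fc / rowSums(cell_fc)
via_state    <- total_fc * state_share * within_state

# Path 2: Total -> Purpose -> cell (columns scaled by purpose shares)
purpose_share  <- purpose_fc / sum(purpose_fc)
within_purpose <- sweep(cell_fc, 2, colSums(cell_fc), "/")
via_purpose    <- total_fc * sweep(within_purpose, 2, purpose_share, "*")

via_state
via_purpose
```

Both results sum to the Total forecast, yet the individual cells differ (for example, NSW Business is 25 through the State path and 24 through the Purpose path), so neither path can claim to be the top-down answer.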
6. Example: Australian Tourism Data
1) Setup and Data Check
Time periods: 80 quarters
States: 8
Regions: 76
Purpose categories: 4
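The setup code that produced the summary above is not shown. A minimal sketch that would reproduce those counts, assuming `df` is the unmodified quarterly `tourism` dataset shipped with the tsibble package, might look like this:

```r
library(tsibble)   # assumed source of the `tourism` dataset

df <- tourism      # assumption: `df` in this tutorial is tsibble::tourism

cat("Time periods:", length(unique(df$Quarter)), "quarters\n")
cat("States:", length(unique(df$State)), "\n")
cat("Regions:", length(unique(df$Region)), "\n")
cat("Purpose categories:", length(unique(df$Purpose)), "\n")

# Nesting check: each Region should map to exactly one State
states_per_region <- tapply(df$State, df$Region, function(s) length(unique(s)))
all(states_per_region == 1)
```

The final check confirms the premise of the next step, that Region is strictly nested within State.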
2) What Is Nested vs Crossed Here?
Geography is naturally nested:
- Region is strictly nested within State.
- So State / Region is a true hierarchical time series (HTS).
Purpose is a separate dimension that crosses geography:
- State and Purpose form crossed views.
- So State * Purpose is a grouped time series (GTS).
You can model State and Purpose as a hierarchy (State / Purpose) if the decision flow is “first by State, then split by Purpose.” But conceptually, State * Purpose is the more general crossed representation.
3) Declare Alternative Structures
Again, there is no universally correct ordering. The hierarchy should reflect the operational decision path the forecasts are meant to support.
# Hierarchical by State then Purpose (single tree)
tourism_hts_sp <- df %>%
  aggregate_key(State / Purpose, Trips = sum(Trips))
# Grouped by State and Purpose (crossed)
tourism_gts_sp <- df %>%
  aggregate_key(State * Purpose, Trips = sum(Trips))
# Hierarchical geography benchmark (strict nesting)
tourism_hts_sr <- df %>%
  aggregate_key(State / Region, Trips = sum(Trips))
4) Why Series Counts Differ Across Structures
n_state <- n_distinct(df$State)
n_purpose <- n_distinct(df$Purpose)
n_region <- n_distinct(df$Region)
# Unique series counts implied by each structure
series_hts_sp <- 1 + n_state + (n_state * n_purpose) # Total + State + State/Purpose
series_gts_sp <- 1 + n_state + n_purpose + (n_state * n_purpose) # adds Purpose-only level
series_hts_sr <- 1 + n_state + n_region # Total + State + Region
counts_tbl <- tibble(
  structure = c("HTS: State / Purpose", "GTS: State * Purpose", "HTS: State / Region"),
  unique_series = c(series_hts_sp, series_gts_sp, series_hts_sr),
  unique_levels = c(
    "Total, State, State:Purpose",
    "Total, State, Purpose, State:Purpose",
    "Total, State, Region"
  )
)
counts_tbl
Interpretation:
- State * Purpose has more unique aggregation views than State / Purpose because it adds a standalone Purpose-only level.
- Comparing State * Purpose to State / Region is not an apples-to-apples count: Region has much higher cardinality than Purpose.
- “More series” depends jointly on the structure and on the cardinality of each categorical variable.
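Plugging in the cardinalities from the data check (8 States, 4 Purposes, 76 Regions) makes the comparison concrete. The sketch below also writes the crossed count as a product, since each crossed factor contributes its own levels plus one "aggregated" level; the vector names are chosen for this example.

```r
# Cardinalities reported in the data check
n_state   <- 8
n_purpose <- 4
n_region  <- 76

series_counts <- c(
  hts_state_purpose = 1 + n_state + n_state * n_purpose,  # Total + State + State:Purpose
  gts_state_purpose = (n_state + 1) * (n_purpose + 1),    # each factor: levels + aggregated
  hts_state_region  = 1 + n_state + n_region              # Regions are unique across States
)
series_counts
```

The counts are 41, 45, and 85: crossing adds only the 4 Purpose-only series relative to the State / Purpose tree, while the State / Region tree is larger purely because there are 76 Regions.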
5) Modeling Choice Justification
This dataset has genuine geographic nesting, but we can model it with either representation depending on the decision we need to support:
- Use HTS if your reporting path is a single tree.
- Use GTS if you need coherence across multiple crossed views simultaneously.
Base model classes (ETS, ARIMA, TSLM) are the same in both cases. What changes is the valid reconciliation rule.
6) Reconcile Correctly by Structure
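# Base fits (assumed here, since the fitting step is not shown): one ETS model per series
fit_hts <- tourism_hts_sp %>% model(ets = ETS(Trips))
fit_gts <- tourism_gts_sp %>% model(ets = ETS(Trips))

# HTS: Bottom-Up, Top-Down, and MinT are all valid
recon_hts <- fit_hts %>%
  reconcile(
    bu = bottom_up(ets),
    td_fp = top_down(ets, method = "forecast_proportions"),
    mint = min_trace(ets, method = "mint_shrink")
  )

# GTS: only Bottom-Up and MinT
recon_gts <- fit_gts %>%
  reconcile(
    bu = bottom_up(ets),
    mint = min_trace(ets, method = "mint_shrink")
  )
Note that top_down() is omitted for the GTS structure. middle_out() is not used here either: State / Purpose has only one intermediate level (State) between Total and the bottom series, so State would be the only possible middle level.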
7) Forecast and Compare
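The code for this step is not shown. As a minimal sketch of what forecasting from a reconciled model could look like, the example below assumes the `tourism` data from tsibble, an ETS base model, Bottom-Up reconciliation only, and an 8-quarter horizon; object names are chosen for this illustration. It ends with a coherence check at the first forecast quarter.

```r
library(fpp3)  # attaches tsibble, fable, fabletools, and dplyr

# Assumptions: tsibble::tourism data, ETS base models, bottom-up reconciliation
recon_hts_bu <- tourism %>%
  aggregate_key(State / Purpose, Trips = sum(Trips)) %>%
  model(ets = ETS(Trips)) %>%
  reconcile(bu = bottom_up(ets))

fc_hts_bu <- recon_hts_bu %>% forecast(h = 8)

# Coherence check for the bottom-up forecasts in the first forecast quarter
fc_first <- fc_hts_bu %>%
  as_tibble() %>%
  filter(.model == "bu", Quarter == min(Quarter), is_aggregated(Purpose))

total_fc  <- fc_first %>% filter(is_aggregated(State)) %>% pull(.mean)
states_fc <- fc_first %>% filter(!is_aggregated(State)) %>% pull(.mean) %>% sum()

c(total = total_fc, sum_of_states = states_fc)
```

The unreconciled `ets` forecasts generally fail this check, which is a quick way to see what reconciliation changes.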
8) Accuracy Comparison (Holdout)
library(knitr)
# Hold out last 8 quarters for evaluation
all_quarters <- sort(unique(df$Quarter))
train_quarters <- head(all_quarters, -8)
test_quarters <- tail(all_quarters, 8)
train <- df %>% filter(Quarter %in% train_quarters)
test <- df %>% filter(Quarter %in% test_quarters)
# Rebuild structures on training data
train_hts_sp <- train %>% aggregate_key(State / Purpose, Trips = sum(Trips))
train_gts_sp <- train %>% aggregate_key(State * Purpose, Trips = sum(Trips))
# Aggregate the holdout to match each forecast structure so accuracy() can join on keys
test_hts_sp <- test %>% aggregate_key(State / Purpose, Trips = sum(Trips))
test_gts_sp <- test %>% aggregate_key(State * Purpose, Trips = sum(Trips))
# Fit and reconcile (ETS for like-for-like reconciliation comparison)
fit_hts_eval <- train_hts_sp %>%
  model(ets = ETS(Trips)) %>%
  reconcile(
    bu = bottom_up(ets),
    td_fp = top_down(ets, method = "forecast_proportions"),
    mint = min_trace(ets, method = "mint_shrink")
  )
fit_gts_eval <- train_gts_sp %>%
  model(ets = ETS(Trips)) %>%
  reconcile(
    bu = bottom_up(ets),
    mint = min_trace(ets, method = "mint_shrink")
  )
# Forecast over the holdout horizon
fc_hts_eval <- fit_hts_eval %>% forecast(h = 8)
fc_gts_eval <- fit_gts_eval %>% forecast(h = 8)
# Accuracy metrics (each forecast is compared against the matching aggregated holdout)
acc_hts <- fc_hts_eval %>%
  accuracy(test_hts_sp, measures = list(RMSE = RMSE, MAE = MAE, MAPE = MAPE)) %>%
  mutate(Structure = "HTS: State / Purpose")
acc_gts <- fc_gts_eval %>%
  accuracy(test_gts_sp, measures = list(RMSE = RMSE, MAE = MAE, MAPE = MAPE)) %>%
  mutate(Structure = "GTS: State * Purpose")
# Summarize across all series to compare methods cleanly
acc_summary <- bind_rows(acc_hts, acc_gts) %>%
  group_by(Structure, .model) %>%
  summarise(
    RMSE = mean(RMSE, na.rm = TRUE),
    MAE = mean(MAE, na.rm = TRUE),
    MAPE = mean(MAPE, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(Structure, RMSE)
kable(
  acc_summary,
  digits = 3,
  caption = "Holdout Accuracy (Last 8 Quarters): RMSE, MAE, and MAPE by Structure and Reconciliation Method"
)
| Structure | .model | RMSE | MAE | MAPE |
|---|---|---|---|---|
| GTS: State * Purpose | ets | 214.257 | 177.205 | 13.605 |
| GTS: State * Purpose | mint | 215.944 | 180.138 | 13.234 |
| GTS: State * Purpose | bu | 245.711 | 211.112 | 13.944 |
| HTS: State / Purpose | td_fp | 167.063 | 135.770 | 13.672 |
| HTS: State / Purpose | ets | 183.159 | 151.876 | 14.285 |
| HTS: State / Purpose | mint | 186.747 | 157.339 | 13.953 |
| HTS: State / Purpose | bu | 206.718 | 177.931 | 14.495 |
Recommended Practical Defaults
- State clearly what structure you declared (/ vs *) before modeling.
- Treat geography (State / Region) as the canonical nested benchmark.
- Use State * Purpose when you need State totals and Purpose totals coherent at the same time.
- Do not compare series counts across different variables (Purpose vs Region) as if they were equivalent.
- For GTS, default to Bottom-Up or MinT; never Top-Down or Middle-Out.
Key Functions Reference
| Function | Purpose | Valid For |
|---|---|---|
| aggregate_key() | Define HTS (/) or GTS (*) structure | Both |
| model() | Fit base models (ETS, ARIMA, TSLM, …) | Both |
| reconcile() | Enforce coherence across levels | Both |
| bottom_up() | Sum bottom-level forecasts upward | HTS + GTS |
| top_down() | Disaggregate top forecast downward | HTS only |
| middle_out() | Forecast a middle level, propagate both ways | HTS with 3+ levels |
| min_trace() | Variance-minimizing reconciliation | HTS + GTS |
References
- Hyndman, R.J., & Athanasopoulos, G. (2021). Forecasting: principles and practice (3rd ed.). OTexts: Melbourne, Australia. OTexts.com/fpp3
- Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804-819.