The HTS is hierarchical, which means that each value has an order. This data is like how at the highest level there is the world, then continents, counties, states, counties, and then towns. The world is made up of all these lower levels. While the GTS is grouping values based on different characteristics. The grouping could be done in different ways and is not strictly ordered the way that HTS is. The GTS has more complex constraints because of the many pathways through the data while the HTS has a single path from the lowest data to the highest.
The four reconciliation methods are bottom up, top down, middle out, and Optimal Reconciliation (MinT). Bottom Up uses the lowest level data to forecast, they sum the bottom to get the values. It is helpful as no data is not used but can miss information that could be gained from the higher levels like regional impact. The Top Down is the inverse where it uses data at the highest level. This is more stable but you miss out on the more granular data from the bottom levels. The Middle Out combined the two, top down and bottom up, this has moderate accuracy. The final method is MinT which uses all of the data at their own level to forecast, this is like an OLS.
#Part 2
# Load data
my_excel_data <- read_excel("Electricity_Sales2.xlsx")
my_excel_data <- my_excel_data %>%
mutate(Date = yearmonth(as.Date(Date)))
# fix format
df_long <- my_excel_data %>%
pivot_longer(
cols = -Date,
names_to = "State",
values_to = "Sales"
) %>%
filter(State != "United States") %>%
group_by(Date, State) %>%
summarise(Sales = sum(Sales), .groups = "drop") %>%
as_tsibble(key = State, index = Date)
# Create hierarchical
# Census region mapping
region_map <- tibble::tibble(
State = c(
"Connecticut","Maine","Massachusetts","New Hampshire","Rhode Island","Vermont",
"New Jersey","New York","Pennsylvania",
"Illinois","Indiana","Michigan","Ohio","Wisconsin",
"Iowa","Kansas","Minnesota","Missouri","Nebraska","North Dakota","South Dakota",
"Delaware","Florida","Georgia","Maryland","North Carolina","South Carolina","Virginia",
"District Of Columbia","West Virginia",
"Alabama","Kentucky","Mississippi","Tennessee",
"Arkansas","Louisiana","Oklahoma","Texas",
"Arizona","Colorado","Idaho","Montana","Nevada","New Mexico","Utah","Wyoming",
"Alaska","California","Hawaii","Oregon","Washington"
),
CensusRegion = c(
rep("Northeast", 9),
rep("Midwest", 12),
rep("South", 17),
rep("West", 13)
)
)
# Attach the region to each state
df_regions <- df_long %>%
left_join(region_map, by = "State")
electricity_heir <- df_regions %>%
aggregate_key(
UnitedStates = sum(Sales),
CensusRegion / State,
Sales = sum(Sales)
)
# View the structure
electricity_heir%>%
as_tibble() %>%
distinct(State, Sales) %>%
arrange(State, Sales) %>%
print(n = 50)
## # A tibble: 672 × 2
## State Sales
## <chr*> <dbl>
## 1 Alabama 152915
## 2 Alabama 157094
## 3 Alabama 162012
## 4 Alabama 165398
## 5 Alabama 168752
## 6 Alabama 169134
## 7 Alabama 174155
## 8 Alabama 186540
## 9 Alabama 190771
## 10 Alabama 194889
## 11 Alabama 214782
## 12 Alabama 216042
## 13 Alaska 11266
## 14 Alaska 11267
## 15 Alaska 11602
## 16 Alaska 11652
## 17 Alaska 11736
## 18 Alaska 11907
## 19 Alaska 12211
## 20 Alaska 12570
## 21 Alaska 13112
## 22 Alaska 13331
## 23 Alaska 13702
## 24 Alaska 14577
## 25 Arizona 122373
## 26 Arizona 124085
## 27 Arizona 128665
## 28 Arizona 131059
## 29 Arizona 131929
## 30 Arizona 139662
## 31 Arizona 145958
## 32 Arizona 155410
## 33 Arizona 178771
## 34 Arizona 187449
## 35 Arizona 214869
## 36 Arizona 216221
## 37 Arkansas 80848
## 38 Arkansas 82454
## 39 Arkansas 85954
## 40 Arkansas 87694
## 41 Arkansas 88925
## 42 Arkansas 89845
## 43 Arkansas 94454
## 44 Arkansas 100370
## 45 Arkansas 100728
## 46 Arkansas 104722
## 47 Arkansas 115834
## 48 Arkansas 119531
## 49 California 454540
## 50 California 454886
## # ℹ 622 more rows
# Fit ETS models to ALL levels of the hierarchy
electricity_fit <- electricity_heir %>%
model(base = ETS(Sales))
# This creates forecasts at every level, which may not be coherent
electricity_fit
## # A mable: 56 x 3
## # Key: CensusRegion, State [56]
## CensusRegion State base
## <chr*> <chr*> <model>
## 1 Midwest Illinois <ETS(M,N,N)>
## 2 Midwest Indiana <ETS(M,N,N)>
## 3 Midwest Iowa <ETS(M,N,N)>
## 4 Midwest Kansas <ETS(M,N,N)>
## 5 Midwest Michigan <ETS(M,N,N)>
## 6 Midwest Minnesota <ETS(A,N,N)>
## 7 Midwest Missouri <ETS(M,N,N)>
## 8 Midwest Nebraska <ETS(M,N,N)>
## 9 Midwest North Dakota <ETS(A,N,N)>
## 10 Midwest Ohio <ETS(M,N,N)>
## # ℹ 46 more rows
# Reconcile forecasts to ensure coherence
reconciled <- electricity_fit %>%
reconcile(
bu = bottom_up(base), # Bottom-up reconciliation
td = top_down(base) # Top-down reconciliation
)
reconciled
## # A mable: 56 x 5
## # Key: CensusRegion, State [56]
## CensusRegion State base bu td
## <chr*> <chr*> <model> <model> <model>
## 1 Midwest Illinois <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## 2 Midwest Indiana <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## 3 Midwest Iowa <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## 4 Midwest Kansas <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## 5 Midwest Michigan <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## 6 Midwest Minnesota <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
## 7 Midwest Missouri <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## 8 Midwest Nebraska <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## 9 Midwest North Dakota <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
## 10 Midwest Ohio <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## # ℹ 46 more rows
# Generate 2-year ahead forecasts
electricity_fc <- reconciled %>%
forecast(h = "2 years")
electricity_fc
## # A fable: 4,032 x 6 [1M]
## # Key: CensusRegion, State, .model [168]
## CensusRegion State .model Date
## <chr*> <chr*> <chr> <mth>
## 1 Midwest Illinois base 2026 Jan
## 2 Midwest Illinois base 2026 Feb
## 3 Midwest Illinois base 2026 Mar
## 4 Midwest Illinois base 2026 Apr
## 5 Midwest Illinois base 2026 May
## 6 Midwest Illinois base 2026 Jun
## 7 Midwest Illinois base 2026 Jul
## 8 Midwest Illinois base 2026 Aug
## 9 Midwest Illinois base 2026 Sep
## 10 Midwest Illinois base 2026 Oct
## # ℹ 4,022 more rows
## # ℹ 2 more variables: Sales <dist>, .mean <dbl>
# Plot
key_vars(electricity_fc)
## [1] "CensusRegion" "State" ".model"
index_var(electricity_fc)
## [1] "Date"
colnames(electricity_fc)
## [1] "CensusRegion" "State" ".model" "Date" "Sales"
## [6] ".mean"
electricity_fc %>%
filter(
is_aggregated(CensusRegion),
is_aggregated(State)
) %>%
autoplot(electricity_heir, level = 95) +
labs(
title = "Reconciled Forecast: U.S. Total Electricity Sales",
y = "Sales (Million KW)",
x = "Month"
) +
theme_minimal()
# Example: plot just the Regions
electricity_fc %>%
filter(
!is_aggregated(CensusRegion) & # keep actual regions
is_aggregated(State) # but no state-level lines
) %>%
autoplot(electricity_heir, level = 95) +
labs(
title = "Reconciled Forecasts: Electricity Sales by Census Region",
y = "Sales (Million KW)",
x = "Month"
) +
theme_minimal()
electricity_fc %>%
filter(State == "Massachusetts", !is_aggregated(Sales)) %>% # bottom-level series
autoplot(electricity_heir, level = 95) +
labs(
title = "Reconciled Forecasts: Massachusetts Electricity Sales",
y = "Sales (Million KW)",
x = "Month"
) +
theme_minimal()
//
When looking at the plots it appears that the top down method worked the best for this data. This was surprising given the difference across regions and states for the total electricity sales. The Plots show that the top down method does smooth more with a more narrow 95% band comapred to the bottom up approach and the base.