Part 1

The HTS is hierarchical, which means that each value has an order. This data is like how at the highest level there is the world, then continents, counties, states, counties, and then towns. The world is made up of all these lower levels. While the GTS is grouping values based on different characteristics. The grouping could be done in different ways and is not strictly ordered the way that HTS is. The GTS has more complex constraints because of the many pathways through the data while the HTS has a single path from the lowest data to the highest.

The four reconciliation methods are bottom up, top down, middle out, and Optimal Reconciliation (MinT). Bottom Up uses the lowest level data to forecast, they sum the bottom to get the values. It is helpful as no data is not used but can miss information that could be gained from the higher levels like regional impact. The Top Down is the inverse where it uses data at the highest level. This is more stable but you miss out on the more granular data from the bottom levels. The Middle Out combined the two, top down and bottom up, this has moderate accuracy. The final method is MinT which uses all of the data at their own level to forecast, this is like an OLS.

#Part 2

# Load  data
my_excel_data <- read_excel("Electricity_Sales2.xlsx")

my_excel_data <- my_excel_data %>%
  mutate(Date = yearmonth(as.Date(Date)))

# fix format
df_long <- my_excel_data %>%
  pivot_longer(
    cols = -Date,
    names_to = "State",
    values_to = "Sales"
  ) %>%
  filter(State != "United States") %>%
  group_by(Date, State) %>%
  summarise(Sales = sum(Sales), .groups = "drop") %>%
  as_tsibble(key = State, index = Date)

# Create hierarchical
# Census region mapping
region_map <- tibble::tibble(
  State = c(
    "Connecticut","Maine","Massachusetts","New Hampshire","Rhode Island","Vermont",
    "New Jersey","New York","Pennsylvania",
    
    "Illinois","Indiana","Michigan","Ohio","Wisconsin",
    "Iowa","Kansas","Minnesota","Missouri","Nebraska","North Dakota","South Dakota",
    
    "Delaware","Florida","Georgia","Maryland","North Carolina","South Carolina","Virginia",
    "District Of Columbia","West Virginia",
    
    "Alabama","Kentucky","Mississippi","Tennessee",
    "Arkansas","Louisiana","Oklahoma","Texas",
    
    "Arizona","Colorado","Idaho","Montana","Nevada","New Mexico","Utah","Wyoming",
    
    "Alaska","California","Hawaii","Oregon","Washington"
  ),
  
  CensusRegion = c(
    rep("Northeast", 9),
    rep("Midwest", 12),
    rep("South", 17),
    rep("West", 13)
  )
)


# Attach the region to each state
df_regions <- df_long %>%
  left_join(region_map, by = "State")


electricity_heir <- df_regions %>%
  aggregate_key(
    UnitedStates = sum(Sales),
    CensusRegion / State,
    Sales = sum(Sales)
  )


# View the structure
electricity_heir%>%
  as_tibble() %>%
  distinct(State, Sales) %>%
  arrange(State, Sales) %>%
  print(n = 50)

## # A tibble: 672 × 2
##    State       Sales
##    <chr*>      <dbl>
##  1 Alabama    152915
##  2 Alabama    157094
##  3 Alabama    162012
##  4 Alabama    165398
##  5 Alabama    168752
##  6 Alabama    169134
##  7 Alabama    174155
##  8 Alabama    186540
##  9 Alabama    190771
## 10 Alabama    194889
## 11 Alabama    214782
## 12 Alabama    216042
## 13 Alaska      11266
## 14 Alaska      11267
## 15 Alaska      11602
## 16 Alaska      11652
## 17 Alaska      11736
## 18 Alaska      11907
## 19 Alaska      12211
## 20 Alaska      12570
## 21 Alaska      13112
## 22 Alaska      13331
## 23 Alaska      13702
## 24 Alaska      14577
## 25 Arizona    122373
## 26 Arizona    124085
## 27 Arizona    128665
## 28 Arizona    131059
## 29 Arizona    131929
## 30 Arizona    139662
## 31 Arizona    145958
## 32 Arizona    155410
## 33 Arizona    178771
## 34 Arizona    187449
## 35 Arizona    214869
## 36 Arizona    216221
## 37 Arkansas    80848
## 38 Arkansas    82454
## 39 Arkansas    85954
## 40 Arkansas    87694
## 41 Arkansas    88925
## 42 Arkansas    89845
## 43 Arkansas    94454
## 44 Arkansas   100370
## 45 Arkansas   100728
## 46 Arkansas   104722
## 47 Arkansas   115834
## 48 Arkansas   119531
## 49 California 454540
## 50 California 454886
## # ℹ 622 more rows

# Fit ETS models to ALL levels of the hierarchy
electricity_fit <- electricity_heir %>%
  model(base = ETS(Sales))

# This creates forecasts at every level, which may not be coherent
electricity_fit

## # A mable: 56 x 3
## # Key:     CensusRegion, State [56]
##    CensusRegion State                base
##    <chr*>       <chr*>            <model>
##  1 Midwest      Illinois     <ETS(M,N,N)>
##  2 Midwest      Indiana      <ETS(M,N,N)>
##  3 Midwest      Iowa         <ETS(M,N,N)>
##  4 Midwest      Kansas       <ETS(M,N,N)>
##  5 Midwest      Michigan     <ETS(M,N,N)>
##  6 Midwest      Minnesota    <ETS(A,N,N)>
##  7 Midwest      Missouri     <ETS(M,N,N)>
##  8 Midwest      Nebraska     <ETS(M,N,N)>
##  9 Midwest      North Dakota <ETS(A,N,N)>
## 10 Midwest      Ohio         <ETS(M,N,N)>
## # ℹ 46 more rows

# Reconcile forecasts to ensure coherence
reconciled <- electricity_fit %>%
  reconcile(
    bu = bottom_up(base),           # Bottom-up reconciliation
    td = top_down(base)             # Top-down reconciliation  
  )

reconciled

## # A mable: 56 x 5
## # Key:     CensusRegion, State [56]
##    CensusRegion State                base bu           td          
##    <chr*>       <chr*>            <model> <model>      <model>     
##  1 Midwest      Illinois     <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
##  2 Midwest      Indiana      <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
##  3 Midwest      Iowa         <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
##  4 Midwest      Kansas       <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
##  5 Midwest      Michigan     <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
##  6 Midwest      Minnesota    <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
##  7 Midwest      Missouri     <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
##  8 Midwest      Nebraska     <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
##  9 Midwest      North Dakota <ETS(A,N,N)> <ETS(A,N,N)> <ETS(A,N,N)>
## 10 Midwest      Ohio         <ETS(M,N,N)> <ETS(M,N,N)> <ETS(M,N,N)>
## # ℹ 46 more rows

# Generate 2-year ahead forecasts
electricity_fc <- reconciled %>%
  forecast(h = "2 years")

electricity_fc

## # A fable: 4,032 x 6 [1M]
## # Key:     CensusRegion, State, .model [168]
##    CensusRegion State    .model     Date
##    <chr*>       <chr*>   <chr>     <mth>
##  1 Midwest      Illinois base   2026 Jan
##  2 Midwest      Illinois base   2026 Feb
##  3 Midwest      Illinois base   2026 Mar
##  4 Midwest      Illinois base   2026 Apr
##  5 Midwest      Illinois base   2026 May
##  6 Midwest      Illinois base   2026 Jun
##  7 Midwest      Illinois base   2026 Jul
##  8 Midwest      Illinois base   2026 Aug
##  9 Midwest      Illinois base   2026 Sep
## 10 Midwest      Illinois base   2026 Oct
## # ℹ 4,022 more rows
## # ℹ 2 more variables: Sales <dist>, .mean <dbl>

# Plot 
key_vars(electricity_fc)

## [1] "CensusRegion" "State"        ".model"

index_var(electricity_fc)

## [1] "Date"

colnames(electricity_fc)

## [1] "CensusRegion" "State"        ".model"       "Date"         "Sales"       
## [6] ".mean"

electricity_fc %>%
  filter(
    is_aggregated(CensusRegion),
    is_aggregated(State)
  ) %>%
  autoplot(electricity_heir, level = 95) +
  labs(
    title = "Reconciled Forecast: U.S. Total Electricity Sales",
    y = "Sales (Million KW)",
    x = "Month"
  ) +
  theme_minimal()

# Example: plot just the Regions
electricity_fc %>%
  filter(
    !is_aggregated(CensusRegion) &   # keep actual regions
    is_aggregated(State)             # but no state-level lines
  ) %>%
  autoplot(electricity_heir, level = 95) +
  labs(
    title = "Reconciled Forecasts: Electricity Sales by Census Region",
    y = "Sales (Million KW)",
    x = "Month"
  ) +
  theme_minimal()

electricity_fc %>%
  filter(State == "Massachusetts", !is_aggregated(Sales)) %>%  # bottom-level series
  autoplot(electricity_heir, level = 95) +
  labs(
    title = "Reconciled Forecasts: Massachusetts Electricity Sales",
    y = "Sales (Million KW)",
    x = "Month"
  ) +
  theme_minimal()

When looking at the plots it appears that the top down method worked the best for this data. This was surprising given the difference across regions and states for the total electricity sales. The Plots show that the top down method does smooth more with a more narrow 95% band comapred to the bottom up approach and the base.

Module 10

Hannah Farrell

2025-11-15

Module 10

Part 1