1. Select monthly time series

Monthly frequency, 100+ observations

The dataset chosen tracks the Henry Hub Natural Gas Spot Price and is measured in dollars per million BTU. It is not seasonally adjusted, and is released monthly by the U.S. Energy Information Administration(EIA). This series begins January 1997 through January 2026 with 349 monthly observations.

# Set fredr API key
fredr_set_key("bd76ff8fca2c1d0a902a2ccfce1e643a") #API key
api_key <- fredr_get_key()

# Get data - Henry Hub Natural Gas Spot Price

# Spikes at 2001, 2005, 2008, 2022
henryhub_data <- fredr(series_id = "MHHNGSP",
                       observation_start = as.Date("1997-01-01"),
                       observation_end = as.Date("2026-01-01"),
                       frequency = "m")

# Tsibble
gas_price <- henryhub_data |>
  select(date, value) |>
  mutate(date = yearmonth(date)) |>
  as_tsibble(index = date)

# Direct link to source (cite source)
# https://fred.stlouisfed.org/series/MHHNGSP

2. Training/Test Data Sets

The 80:20 train/test split allowed for 279 training observations and 70 test observations

# Create train/test datasets
train <- gas_price |> filter(date <= yearmonth("2020 Mar"))
test  <- gas_price |> filter(date > yearmonth("2020 Mar"))

# Check split
print(c(nrow(train),nrow(test))) # 279, approx. 80%, 70, approx. 20%
## [1] 279  70

3. Training Set

Explore & Decompose

The time series shows some extreme spikes, and these can be explained by a few events that occurred in the US and globally.Around 2000-2001, we see a big spike in price due to hydropower conditions being dire in the West which resulted in extremely high wholesale electricity prices. We see another spike about February 2003 due to extremely low inventory levels and colder-than-normal winter weather in the Northeast. The 2005 spike can be explained by hurricane season, or more specifically Hurricane Katrina. The 2008 market caused prices to be driven up because there was fear the demand would exceed supply; although, production was strong. The 2014 Polar Vortex caused “freeze-off” supply disruptions to gas coming from Canada and the US Rockies. In 2018 the British Columbia Pipeline rupture and fire completely shut down the flow for a period of time. TREND - We see several changes in trend throughout the entire dataset, which can can be explained mostly by extreme changes in demand due to weather. The Henry Hub continues to be a spot that acts as the baseline price, whether other spots have higher or lower prices. SEASONAL - Seasonality is present, but weak and dominated by external shocks. The seasonal plot does a good job of showing the specific events listed above, with Hurricane Katrina having the most drastic spike among the individual years. I zoomed in on the years surrounding this time period.This demonstrates the sharp spike in 2005 and the gradual return to baseline by 2007.

# Time plot (train set)
autoplot(train, value) +
  labs(title = "US Henry Hub Natural Gas Spot Price (1997 - 2026)",
       y = "Dollars per Million BTU",
       x = "Year")

# Seasonal plot (train set)
train |>
  gg_season(value, labels = "both") +
  labs(
    y     = "Dollars per Million BTU",
    title = "Seasonal Plot: Henry Hub Natural Gas Spot Price")
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Zoom in on the Katrina period (train set)
train |>
  filter(year(date) >= 2004, year(date) <= 2007) |>
  autoplot(value) +
  labs(y = "Dollars per Million BTU",
       title = "Henry Hub Natural Gas Spot Price: Katrina Period (2004-2007)")

# Multiplicative Decomposition plot
train |>
  model(
    classical_decomposition(value, type = "multiplicative")
  ) |>
  components() |>
  autoplot() +
  labs(
    title = "Multiplicative Decomposition of Natural Gas Spot Price")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

# STL decomposition
train |>
  model(
    STL(value ~ trend(window=21) +
          season(window = 'periodic'),
        robust = TRUE)) |>
  components() |>
  autoplot()

STL produces a more conservative seasonal estimate because it doesn’t allow extreme observations to distort the seasonal component, which is the appropriate behavior for a shock-driven series like natural gas prices.

4. Tranformations

Apply transformations to the training set

Visual inspection revealed variance instability around spike periods, the Guerrero method confirmed transformation was warranted with lambda = -0.503, the Box-Cox transformed series showed improved variance stability visually, and all subsequent modeling was conducted on the transformed data.

# Variance stability and stationarity tests (train set)
lambda <- train |>
  features(value, features = guerrero) |>
  pull(lambda_guerrero)

train |>
  autoplot(box_cox(value, lambda)) +
  labs(y = "",
       title = paste("Transformed Natural Gas Price with lambda =",
                     round(lambda,2)))

train_bc <- train |>
  mutate(value = box_cox(value, lambda))

# ACF (lag_max = 24)
train |>
  ACF(value, lag_max = 24) |>
  autoplot()

train_bc |>
  ACF(value, lag_max = 24) |>
  autoplot()

The Box-Cox transformation successfully stabilized variance as shown by the visual comparison. The ACF reveals significant autocorrelation structure in the transformed series.

5. Model & Forecast

Model and forecast using the training set

The NAIVE model serves as the baseline, forecasting each future value as equal to the most recent observed value. Fitted on the Box-Cox transformed training data.

The regression model decomposes price into trend and seasonal dummy components. Despite being the more sophisticated model, the near-zero R² (0.013) and insignificant F-statistic (p = 0.988) indicate that trend and seasonality explain virtually none of the variation in natural gas prices, which are dominated by unpredictable external shocks rather than systematic patterns.

# NAIVE model (train set)
naive_fit <- train_bc |>
  model(NAIVE(value)) 

report(naive_fit)          # sigma^2 = 0.0045, measuring forecast error variance
## Series: value 
## Model: NAIVE 
## 
## sigma^2: 0.0045
gg_tsresiduals(naive_fit)  # residuals appear as white noise with no significant ACF spikes,
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_rug()`).

                           # indicating the model leaves little systematic structure behind
augment(naive_fit)         # fitted values and residuals stored for further diagnostics
accuracy(naive_fit)        # training set accuracy metrics for reference
# TSLM model (train set)
tslm_fit <- train_bc |>
  model(
    TSLM(value ~ trend() + season()))

report(tslm_fit)  # full model summary including R² and F-statistic
## Series: value 
## Model: TSLM 
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.525549 -0.145892 -0.007757  0.185330  0.483598 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.9959853  0.0528146  18.858   <2e-16 ***
## trend()        -0.0002535  0.0001717  -1.477    0.141    
## season()year2  -0.0350478  0.0666349  -0.526    0.599    
## season()year3  -0.0510895  0.0666356  -0.767    0.444    
## season()year4  -0.0210664  0.0673571  -0.313    0.755    
## season()year5  -0.0076821  0.0673560  -0.114    0.909    
## season()year6   0.0035428  0.0673553   0.053    0.958    
## season()year7  -0.0131926  0.0673551  -0.196    0.845    
## season()year8  -0.0172804  0.0673553  -0.257    0.798    
## season()year9  -0.0206283  0.0673560  -0.306    0.760    
## season()year10 -0.0032848  0.0673571  -0.049    0.961    
## season()year11 -0.0011915  0.0673586  -0.018    0.986    
## season()year12  0.0102795  0.0673606   0.153    0.879    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2308 on 266 degrees of freedom
## Multiple R-squared: 0.01351, Adjusted R-squared: -0.031
## F-statistic: 0.3035 on 12 and 266 DF, p-value: 0.98848
tidy(tslm_fit)    # coefficient estimates for trend and seasonal dummy variables
glance(tslm_fit)  # model fit statistics in a single tidy row

6. Evaluate

# Apply the same Box-Cox transformation (lambda = -0.503) to the test set
# to ensure forecasts and actuals are on the same scale for valid comparison
test_bc <- test |>
  mutate(value = box_cox(value, lambda))

# NAIVE forecast extended into the test period
# The flat forecast with wide confidence intervals reflects the model's inability
# to anticipate the extreme 2021-2022 price surge
forecast(naive_fit, h = nrow(test)) |>
  autoplot(train_bc) +
  labs(title = "NAIVE Forecast vs Test Data")

# Overlay actual test values (red) against NAIVE forecast
# The red line dramatically exceeds the forecast bands, driven by 
# COVID-era supply disruptions and the 2021-2022 energy price spike
forecast(naive_fit, new_data = test_bc) |>
  autoplot(train_bc) +
  autolayer(test_bc, value, colour = "red") +
  labs(title = "NAIVE Forecast vs Actual Test values",
       y = "Box-Cox Transformed Price")

# TSLM forecast extended into the test period
# The seasonal wiggle pattern reflects the trend and seasonal dummy components
# but anchors too closely to the subdued price levels of the training period
forecast(tslm_fit, h = nrow(test)) |>
  autoplot(train_bc) +
  labs(title = "TSLM Forecast vs Test Data")

# Overlay actual test values (red) against TSLM forecast
# Despite poor training diagnostics, TSLM outperforms NAIVE on the test set
# (RMSE: 0.244 vs 0.425) as NAIVE is penalized heavily for anchoring to
# the last observed value during a period of extreme price movement
forecast(tslm_fit, new_data = test_bc) |>
  autoplot(train_bc) +
  autolayer(test_bc, value, colour = "red") +
  labs(title = "TSLM Forecast vs Actual Test Values",
       y = "Box-Cox Transformed Price")

# Combined accuracy comparison across both models
# TSLM outperforms NAIVE on all three test set metrics despite worse
# training diagnostics, highlighting the limitations of both approaches
# during the unpredictable 2020-2025 test period
bind_rows(
  accuracy(forecast(naive_fit, new_data = test_bc), test_bc),
  accuracy(forecast(tslm_fit, new_data = test_bc), test_bc)
) |>
  select(.model, RMSE, MAE, MAPE)