Introduction
In 2011, the United States’ production of natural gas overtook its production of coal for the first time, a seismic shift in the makeup of U.S. energy production. Since then, natural gas has become the most prevalent source of energy production in the U.S. by a substantial margin [1]. The appeal of natural gas is clear: it is more abundant, “cleaner,” and cheaper than other fossil fuels [2]. However, natural gas production still harms the environment through greenhouse gas emissions, nonrenewability, and susceptibility to leaks and flaring [2]. Furthermore, while natural gas was initially touted as a “bridge” fossil fuel to renewable energy, its prolonged success in the U.S. suggests that investment in natural gas remains a priority relative to renewable energy. The ramifications of natural gas production are well documented and severe, including global warming emissions, air pollution, erosion, and contaminated drinking water [3]. To gain an enhanced understanding of this issue, this report focuses on U.S. natural gas consumption over the last 25 years. The primary objectives are to analyze recent trends in natural gas consumption, develop and evaluate multiple time series forecasting models, and use those models to better understand future levels of natural gas consumption. Forecasting this dataset is essential because it informs policymakers, businesses, and households about the likely trajectory of fossil fuel emissions, which could serve as evidence for a hastened transition to renewable energy and improved sustainability and health outcomes.
The Natural Gas Consumption (NATURALGAS) FRED dataset is collected by the U.S. Energy Information Administration, and it measures total U.S. natural gas consumption in billions of cubic feet [4]. While NATURALGAS is clearly integral to understanding environmental health, it is also an essential indicator of economic development in the U.S. Analyzing natural gas consumption is important because natural gas provides businesses, organizations, and households with the energy they demand to perform necessary tasks. In particular, monitoring natural gas is relevant because it is an indicator of overall output, as the creation of goods is positively correlated with natural gas levels [5]. A rise in natural gas consumption could signal economic growth at the cost of additional negative externalities, while a decline might reflect lower output and product supply. In any case, NATURALGAS has significant implications for future economic activity and negative externalities, making it a dataset worth forecasting carefully.
Literature Review
The most relevant natural gas research comes from the U.S. Energy Information Administration (EIA), which provides an analysis of the “Short-Term Energy Outlook” for natural gas storage and production [6]. The EIA publishes monthly reports with natural gas figures and forecasts based on data from the previous two years, which are then projected for the next month, the end of the year, and the end of the following year. The EIA projects that natural gas production will remain relatively stable for the remainder of the year, but will ultimately “increase almost 3% this year compared with 2024.” The International Energy Agency (IEA) produces quarterly forecasts for international natural gas production, primarily by applying the ARIMA method to historical data [7]. The IEA anticipates global natural gas demand growth to slow from 2.8% to 1.3% in 2025 due to decreased consumption in Asia, which is more sensitive to increased natural gas prices. Additionally, the National Bureau of Economic Research (NBER) published a paper analyzing the “forecastability of the real price of natural gas in the United States at the monthly frequency” [8]. NBER used a six-variable Bayesian vector autoregressive model that contained determinants of the supply of and demand for natural gas, drawn from a range of sources. NBER determined that its most accurate forecasts were achieved by combining individual models into an ensemble.
While most existing research on forecasting U.S. natural gas focuses on short-term projections using econometric models such as ARIMA or Bayesian vector autoregression, few studies evaluate the performance of modern machine learning and deep learning methods on monthly volume data. Forecasts from the institutions listed above often emphasize short-term outlooks and depend on limited historical windows, rarely incorporating time-series cross-validation or exogenous variables such as temperature, storage levels, or the prices of related goods. Model interpretability is also fairly limited in the existing literature, with most studies providing only point forecasts and minimal diagnostic analysis. The limited literature that does address these issues generally focuses on natural gas production at an international scale that is not representative of the U.S.’s particular characteristics. This project addresses these gaps by utilizing an array of “untraditional” models, extending the outlook beyond the extremely short-term future, providing clear diagnostic analysis, and incorporating only data on American natural gas.
Methodology
The fredr API key is set in order to access data from FRED. Data for natural gas and crude oil are downloaded using their respective series ID codes, converted to time series objects, and checked for missing values. The data are then filtered to the last 10 years of information to make the forecasts more relevant to current conditions. The natural gas data are merged with crude oil prices, and lagged variables, rolling averages, and seasonal indicators are added for improved accuracy. The data are then divided into an 80/20 train/test split. Each model is fit on the training data and its forecasts are compared against the testing data to determine the most appropriate model. The simpler models (ETS, ARIMA, and ARIMAX) are fit with their respective functions and forecast over a two-year horizon. The XGBoost machine learning model is prepared through the creation of lagged variables, rolling averages, and a month indicator; the predictors and target are defined, and the training and test sets are converted to numeric matrices. The XGBoost model is trained with early stopping to prevent overfitting, and predictions are generated for the test data to be visualized. For the MLP model, the training and testing datasets are built by selecting the relevant features, converting values to numeric format, and transforming them into matrices. The model is then trained using the nnet package, with a hidden layer containing 10 units, linear output for regression, and a maximum of 500 iterations to ensure convergence. Once trained, the model generates predictions for the test set, which are combined with the actual values into a single results dataset. Finally, the actual and predicted natural gas volumes are visualized over time.
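As noted in the literature review, existing forecasts rarely use time-series cross-validation, and this project likewise relies on a single chronological 80/20 split. The following sketch, which is illustrative only and assumes the nat_gas_recent tsibble constructed later in this section, shows how fpp3's stretch_tsibble() could be used to evaluate an ARIMA specification over expanding training windows instead.
#Sketch: time-series cross-validation with expanding windows (illustrative only).
cv_folds <- nat_gas_recent %>%
stretch_tsibble(.init = 60, .step = 12) #first fold uses 5 years; each later fold adds 12 months
cv_folds %>%
model(ARIMA(natural_gas_volume)) %>%
forecast(h = 12) %>%
accuracy(nat_gas_recent) #average 12-month-ahead errors across folds (the last fold extends past the sample)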
library(fredr)
library(dplyr)
library(tidyverse)
library(fpp3)
library(zoo)
library(scales)
library(torch)
library(xgboost)
library(keras)
library(nnet)
#Loading FRED key, NATURALGAS, and DCOILWTICO
fredr_set_key("e94769feaf925275ff1738ada4ca88b5")
nat_gas <- fredr(series_id = "NATURALGAS")
crude_oil <- fredr(series_id = "DCOILWTICO")
glimpse(nat_gas)
## Rows: 306
## Columns: 5
## $ date <date> 2000-01-01, 2000-02-01, 2000-03-01, 2000-04-01, 2000-0…
## $ series_id <chr> "NATURALGAS", "NATURALGAS", "NATURALGAS", "NATURALGAS",…
## $ value <dbl> 2510.5, 2330.7, 2050.6, 1783.3, 1632.9, 1513.1, 1525.6,…
## $ realtime_start <date> 2025-08-15, 2025-08-15, 2025-08-15, 2025-08-15, 2025-0…
## $ realtime_end <date> 2025-08-15, 2025-08-15, 2025-08-15, 2025-08-15, 2025-0…
#Cleaning data by converting to a time series object and checking for missing values.
nat_gas_tsib <- nat_gas %>%
mutate(date = yearmonth(date)) %>%
as_tsibble(index = date) %>%
rename(natural_gas_volume = value)
colSums(is.na(nat_gas_tsib))
## date series_id natural_gas_volume realtime_start
## 0 0 0 0
## realtime_end
## 0
crude_oil_tsib <- crude_oil %>%
mutate(date = yearmonth(date)) %>%
group_by(date) %>%
summarise(crude_oil_price = mean(value, na.rm = TRUE)) %>%
as_tsibble(index = date)
colSums(is.na(crude_oil_tsib))
## date crude_oil_price
## 0 0
#Merging natural gas with crude oil prices, then creating lagged variables, rolling averages, and seasonal indicators.
nat_gas_total <- nat_gas_tsib %>%
left_join(crude_oil_tsib, by = "date") %>%
mutate(lag_one = lag(natural_gas_volume, 1), #1-month lag
lag_two = lag(natural_gas_volume, 6), #6-month lag
lag_three = lag(natural_gas_volume, 12)) %>% #12-month lag
mutate(roll_mean_3 = rollmean(natural_gas_volume, k = 3, fill = NA, align = "right"),
roll_mean_6 = rollmean(natural_gas_volume, k = 6, fill = NA, align = "right"),
roll_mean_12 = rollmean(natural_gas_volume, k = 12, fill = NA, align = "right")) %>%
mutate(month = as.factor(month(date)))
#Focusing on the last 10 years of available data.
nat_gas_recent <- nat_gas_total %>%
filter(date >= yearmonth("2015 Jun") & date <= yearmonth("2025 May"))
#Dividing the data into training and testing, with the split
#being determined by multiplying number of rows by .8.
n <- nrow(nat_gas_recent)
n_train <- round(0.8 * n)
train_data <- nat_gas_recent %>%
dplyr::slice(1:n_train)
test_data <- nat_gas_recent %>%
dplyr::slice((n_train + 1): n)
#Using the ETS() function to fit the data before applying a
#two year forecast to be plotted.
ets_fit <- train_data %>%
model(ETS(natural_gas_volume))
ets_fc <- ets_fit %>%
forecast(h = 24)
ets_fc %>%
autoplot(train_data) +
autolayer(test_data, natural_gas_volume, col = "red") +
labs(y= "Natural Gas Volume (Billion Cubic Feet)", title = "ETS vs Actual Values") +
guides(colour = "none")
#Using the ARIMA() function to fit the data before applying a two year forecast to be plotted.
arima_fit <- train_data %>%
model(ARIMA(natural_gas_volume))
arima_fc <- arima_fit %>%
forecast(h = 24)
arima_fc %>%
autoplot(train_data) +
autolayer(test_data, natural_gas_volume, col = "green") +
labs(y= "Natural Gas Volume (Billion Cubic Feet)", title = "ARIMA vs Actual Values") +
guides(colour = "none")
#Fitting a regression model with ARIMA errors (ARIMAX) via ARIMA(), using crude oil price as an exogenous regressor, then forecasting over the test period.
arimax_fit <- train_data %>%
model(ARIMAX = ARIMA(natural_gas_volume ~ crude_oil_price))
arimax_fc <- arimax_fit %>%
forecast(new_data = test_data)
arimax_fc %>%
autoplot(train_data, level = NULL) +
autolayer(test_data, natural_gas_volume, colour = "orange") +
labs(y = "Natural Gas Volume (Billion Cubic Feet)",
title = "ARIMA-X (with Crude Oil Price) Forecast vs Actual Values")
#Making sure dataset includes lagged variables, rolling averages, and seasonality. Each number value represents a number of months.
data_ml <- nat_gas_total %>%
filter(date >= yearmonth("2015 Jan")) %>%
mutate(
lag_1 = lag(natural_gas_volume, 1),
lag_6 = lag(natural_gas_volume, 6),
lag_12 = lag(natural_gas_volume, 12),
roll_3 = zoo::rollmean(natural_gas_volume, k = 3, fill = NA, align = "right"),
roll_6 = zoo::rollmean(natural_gas_volume, k = 6, fill = NA, align = "right"),
month = month(date)
) %>%
drop_na()
#Indicates the independent variables (predictors) and the dependent variable to be forecast.
predictors <- c("lag_one", "lag_two", "lag_three", "roll_mean_3", "roll_mean_6", "month", "crude_oil_price")
target <- "natural_gas_volume"
#Converts training and testing data into numeric matrices for compatibility with xgboost.
train_x <- train_data %>%
select(all_of(predictors)) %>%
mutate(month = as.numeric(month)) %>%
mutate(across(everything(), as.numeric)) %>%
as.matrix()
test_x <- test_data %>%
select(all_of(predictors)) %>%
mutate(month = as.numeric(month)) %>%
mutate(across(everything(), as.numeric)) %>%
as.matrix()
dtrain <- xgb.DMatrix(
data = train_x,
label = train_data[[target]])
dtest <- xgb.DMatrix(
data = test_x,
label = test_data[[target]])
#Defining model settings.
params <- list(
objective = "reg:squarederror", #squared-error regression objective
eval_metric = "rmse", #metric reported during training
eta = 0.1, #learning rate
max_depth = 6, #maximum tree depth
subsample = 0.8, #fraction of rows sampled per tree
colsample_bytree = 0.8 #fraction of columns sampled per tree
)
#Fits XGBoost model to training data.
xgb_model <- xgb.train(
params = params,
data = dtrain,
nrounds = 100,
watchlist = list(train = dtrain, test = dtest),
early_stopping_rounds = 10)
## [1] train-rmse:2277.779886 test-rmse:2535.184855
## Multiple eval metrics are present. Will use test_rmse for early stopping.
## Will train until test_rmse hasn't improved in 10 rounds.
##
## [2] train-rmse:2057.861504 test-rmse:2299.512346
## [3] train-rmse:1861.181622 test-rmse:2080.154882
## ... (intermediate rounds omitted; test RMSE declines steadily and plateaus near round 77) ...
## [86] train-rmse:8.764658 test-rmse:255.404794
## [87] train-rmse:8.586761 test-rmse:255.385406
## Stopping. Best iteration:
## [77] train-rmse:12.675331 test-rmse:254.792623
#Generating predictions to be visualized.
preds <- predict(xgb_model, dtest)
ggplot() +
geom_line(aes(x = test_data$date, y = test_data[[target]]), color = "blue", linewidth = 1) +
geom_line(aes(x = test_data$date, y = preds), color = "orange", linewidth = 1, linetype = "dashed") +
labs(
title = "XGBoost Forecast vs Actual Natural Gas Volume",
y = "Natural Gas Volume (Billion Cubic Feet)",
x = "Date")
#Training data incorporates predictors and is converted into a matrix.
train_mlp_x <- train_data %>%
select(all_of(predictors)) %>%
mutate(across(everything(), as.numeric)) %>%
as.matrix()
test_mlp_x <- test_data %>%
select(all_of(predictors)) %>%
mutate(across(everything(), as.numeric)) %>%
as.matrix()
# Vectors are converted into a numeric format.
train_mlp_y <- train_data[[target]]
test_mlp_y <- test_data[[target]]
#Model is trained using nnet(), number of iterations and size are also determined.
mlp_model <- nnet(
x = train_mlp_x,
y = train_mlp_y,
size = 10,
linout = TRUE,
maxit = 500)
## # weights: 101
## initial value 608479779.132274
## final value 17820592.653333
## converged
# Predictions based on test set which are converted to a dataframe for visualization.
mlp_preds <- predict(mlp_model, test_mlp_x)
results <- data.frame(
Date = test_data$date,
Actual = test_mlp_y,
Predicted = mlp_preds
)
ggplot(results, aes(x = Date)) +
geom_line(aes(y = Actual, color = "Actual")) +
geom_line(aes(y = Predicted, color = "MLP Prediction")) +
labs(title = "MLP Forecast vs Actual", y = "Natural Gas Volume") +
scale_color_manual(values = c("Actual" = "blue", "MLP Prediction" = "orange"))
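The MLP's weak fit (also flagged in the conclusion as sensitivity to scaling) is likely due in part to the predictors and target being left on their original scales, which range into the thousands. The sketch below is illustrative only and deliberately uses new object names (mlp_scaled, mlp_preds_scaled) so the results above are unchanged; it shows how the same features could be standardized before training and the predictions back-transformed.
#Sketch: standardizing inputs and target before training the MLP (illustrative only).
train_means <- colMeans(train_mlp_x)
train_sds <- apply(train_mlp_x, 2, sd)
train_x_scaled <- scale(train_mlp_x, center = train_means, scale = train_sds)
test_x_scaled <- scale(test_mlp_x, center = train_means, scale = train_sds) #reuse training statistics
y_mean <- mean(train_mlp_y)
y_sd <- sd(train_mlp_y)
mlp_scaled <- nnet(
x = train_x_scaled,
y = (train_mlp_y - y_mean) / y_sd,
size = 10,
linout = TRUE,
maxit = 500,
trace = FALSE)
mlp_preds_scaled <- predict(mlp_scaled, test_x_scaled) * y_sd + y_mean #back-transform to billion cubic feet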
#Computing a simple-average ensemble of the four model forecasts over the test period.
ets_fc_values <- as.numeric(ets_fc$.mean)
arimax_fc_values <- as.numeric(arimax_fc$.mean)
xgb_fc_values <- as.numeric(preds)
mlp_fc_values <- as.numeric(mlp_preds)
test_actual <- test_data$natural_gas_volume
ensemble_df <- data.frame(
Actual = test_actual,
ETS = ets_fc_values,
ARIMAX = arimax_fc_values,
XGB = xgb_fc_values,
MLP = mlp_fc_values,
Ensemble = (ets_fc_values + arimax_fc_values + xgb_fc_values + mlp_fc_values) / 4)
# Ensemble model visualized with other models.
ensemble_df <- ensemble_df %>%
mutate(Date = test_data$date)
ggplot(ensemble_df, aes(x = Date)) +
geom_line(aes(y = Actual, colour = "Actual")) +
geom_line(aes(y = ETS, colour = "ETS")) +
geom_line(aes(y = ARIMAX, colour = "ARIMAX")) +
geom_line(aes(y = XGB, colour = "XGB")) +
geom_line(aes(y = MLP, colour = "MLP")) +
geom_line(aes(y = Ensemble, colour = "Ensemble"), linewidth = 1.2) +
labs(y = "Natural Gas Volume (Billions of Cubic Feet) ", x = "Date",
title = "Ensemble Forecast vs Actuals") +
scale_color_manual(values = c(
Actual = "black",
ETS = "blue",
ARIMAX = "red",
XGB = "green",
MLP = "purple",
Ensemble = "orange"
)) +
theme_minimal()
#Creating data frame of evaluation metrics.
eval_metrics <- function(actual, predicted) {
actual <- as.numeric(actual)
predicted <- as.numeric(predicted)
rmse <- sqrt(mean((actual - predicted)^2, na.rm = TRUE))
mae <- mean(abs(actual - predicted), na.rm = TRUE)
mape <- mean(abs((actual - predicted) / actual), na.rm = TRUE) * 100
return(c(RMSE = rmse, MAE = mae, MAPE = mape))
}
metrics_df <- rbind(
ETS = eval_metrics(ensemble_df$Actual, ensemble_df$ETS),
ARIMAX = eval_metrics(ensemble_df$Actual, ensemble_df$ARIMAX),
XGB = eval_metrics(ensemble_df$Actual, ensemble_df$XGB),
MLP = eval_metrics(ensemble_df$Actual, ensemble_df$MLP),
Ensemble = eval_metrics(ensemble_df$Actual, ensemble_df$Ensemble)
)
metrics_df
## RMSE MAE MAPE
## ETS 130.1281 105.70342 3.789747
## ARIMAX 127.8724 98.19092 3.356411
## XGB 254.7926 151.72657 4.904383
## MLP 517.0553 353.93472 11.300627
## Ensemble 209.0240 130.44527 4.176263
#Visualizing all plots against actual values.
ensemble_long <- ensemble_df %>%
pivot_longer(cols = -c(Date, Actual),
names_to = "Model",
values_to = "Prediction")
ggplot(ensemble_long, aes(x = Date, y = Prediction, color = Model)) +
geom_line(linewidth = 1) +
geom_line(aes(y = Actual), color = "black", linewidth = 1, linetype = "solid") +
labs(title = "Predictions vs Actual Values",
x = "Date",
y = "Natural Gas Volume") +
theme_minimal() +
scale_color_manual(values = c(
"ETS" = "blue",
"ARIMAX" = "red",
"XGB" = "green",
"MLP" = "purple",
"Ensemble" = "orange"
))
colnames(ensemble_df)
## [1] "Actual" "ETS" "ARIMAX" "XGB" "MLP" "Ensemble" "Date"
# Computing residuals for each model to be visualized.
residuals_df <- ensemble_df %>%
mutate(
ETS_residual = Actual - ETS,
ARIMAX_residual = Actual - ARIMAX,
XGB_residual = Actual - XGB,
MLP_residual = Actual - MLP,
Ensemble_residual = Actual - Ensemble
) %>%
select(Date, ends_with("residual"))
residuals_long <- residuals_df %>%
pivot_longer(
cols = -Date,
names_to = "Model",
values_to = "Residual")
ggplot(residuals_long, aes(x = Date, y = Residual, color = Model)) +
geom_line(linewidth = 1) +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(
title = "Residuals over Time",
x = "Date",
y = "Residual (Actual - Predicted)"
) +
theme_minimal() +
scale_color_manual(values = c(
"ETS_residual" = "blue",
"ARIMAX_residual" = "red",
"XGB_residual" = "green",
"MLP_residual" = "purple",
"Ensemble_residual" = "orange"
))
Results and Analysis
Based on the evaluation metrics, the ARIMAX model outperformed all other models with the lowest RMSE (127.87), MAE (98.19), and MAPE (3.36%). The ETS model was only slightly behind, with a marginally higher RMSE (130.13), MAE (105.70), and MAPE (3.79%). The XGBoost and ensemble models performed moderately, with noticeably larger errors across the board (RMSEs of 254.79 and 209.02, respectively). The MLP model performed the worst by far, with a substantially higher RMSE (517.06), MAE (353.93), and MAPE (11.30%). This ranking is reinforced by the residual plot for each model: the MLP residuals fluctuate to extremes, the XGBoost and ensemble residuals track each other and stay within a narrower range, and the ARIMAX and ETS residuals remain consistently small, supporting the conclusion that they are the best-performing forecasting models for this dataset.
The ARIMAX model was the best-performing method due to the structure of the dataset. ARIMAX utilizes exogenous predictors and autoregressive lags to “describe the autocorrelations in the data” [9]. This enables it to model short-term dependencies in natural gas volumes while accounting for relevant external factors. Since the dataset exhibits a relatively stable pattern with modest fluctuations, ARIMAX was able to capture the dynamics without overfitting. The ETS model, which smooths past observations and can incorporate trend and seasonality, performed nearly as well, suggesting that level and short-term smoothing were fairly effective. The XGBoost model produced moderate results, indicating it was able to identify some non-linear relationships but was less effective at handling temporal dependencies compared to statistical time-series models. The MLP model performed the worst, likely due to insufficient temporal feature representation and limited tuning, which prevented it from learning the sequential structure of the series effectively. The ensemble method, which combined ETS, ARIMAX, XGB, and MLP forecasts, performed worse than the top individual models because the weaker MLP and XGB predictions diluted the strong performance of ARIMAX and ETS.
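To add the kind of diagnostic transparency discussed in the literature review, the fitted ARIMAX model can also be inspected directly. The brief sketch below (not part of the workflow above) uses standard fpp3 helpers to report the selected order, the crude oil coefficient, and the residual behavior.
#Sketch: diagnostics for the regression-with-ARIMA-errors (ARIMAX) model (illustrative only).
report(arimax_fit) #selected (p,d,q)(P,D,Q) order and the coefficient on crude_oil_price
gg_tsresiduals(arimax_fit) #residual time plot, ACF, and histogram
augment(arimax_fit) %>%
features(.innov, ljung_box, lag = 24) #portmanteau test for remaining autocorrelation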
Discussion
This report has yielded valuable insights into the forecasting process for natural gas volumes. One important insight is that complex machine learning models do not necessarily guarantee higher accuracy. The ARIMAX and ETS models outperformed more sophisticated techniques such as XGBoost and the MLP, highlighting the importance of matching model choice to the structure of the data. This suggests that the dataset's relative stability favored models that effectively capture autoregressive patterns and short-term trends. Another important finding is that an ensemble model's performance can be hindered by weaker individual components, especially when only a small number of models is combined. The poor performance of the MLP offset the benefits of the stronger ARIMAX and ETS forecasts. This process also revealed that external regressors can be highly valuable for improving forecast precision, especially in industries that shift with market and environmental variables.
Although the ARIMAX model performed the best in this scenario, it has limitations that could hamper its use in the future. ARIMAX relies on accurately identifying the autoregressive structure and the external regressors [10]. This means that structural changes to drivers of natural gas demand, such as weather patterns or economic activity, need to be precisely measured, or forecast accuracy could deteriorate sharply. It also assumes that the relationship between these predictors and natural gas volumes remains stable over time, which may not hold given investment in renewables, natural disasters, or market failures. These limitations mean that while ARIMAX can capture certain patterns in the short term, it may underperform in anticipating sudden changes in consumption or supply conditions, which could lead to less reliable natural gas forecasts going forward. A potential improvement to the ensemble model would be assigning weights to individual forecasts based on past performance rather than using a simple average [11]; this way, the ensemble would not be as negatively influenced by weaker models. While crude oil price was included as a relevant external regressor, incorporating additional variables such as climate data and macroeconomic indicators would likely improve accuracy. Another limitation in the context of climate change is that natural gas is only the third largest source of energy globally and emits significantly less carbon dioxide than oil does [12]. This makes natural gas forecasts somewhat limited in the context of aggregate global emissions and climate change, although consumption remains an important metric to track.
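As a concrete illustration of the weighting idea, the sketch below forms an inverse-RMSE weighted average of the four base forecasts using the objects created earlier; in practice the weights should be estimated on a separate validation window rather than on the same test set used for evaluation.
#Sketch: inverse-RMSE weighted ensemble (illustrative only; weights should ideally come from a validation set).
base_models <- c("ETS", "ARIMAX", "XGB", "MLP")
inv_rmse <- 1 / metrics_df[base_models, "RMSE"]
weights <- inv_rmse / sum(inv_rmse) #more accurate models receive larger weights
weighted_fc <- as.matrix(ensemble_df[, base_models]) %*% weights
eval_metrics(ensemble_df$Actual, weighted_fc)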
From a business perspective, given the reasonable accuracy of ARIMAX and ETS forecasts and the relative stability of recent volumes, it is recommended that natural gas suppliers maintain current production strategies while preparing for potential demand shifts. Investment in data infrastructure to integrate external variables into forecasting would strengthen resilience against market shocks, such as abrupt weather events or socioeconomic changes. Furthermore, contingency plans should be in place to quickly adjust production and distribution in the event of sudden demand surges or supply constraints, ensuring business continuity in volatile market conditions.
Conclusion
This report was created to gain a deeper understanding of natural gas consumption and its short-term dynamics. Several models, including ETS, ARIMAX, XGBoost, MLP, and an ensemble, were evaluated to determine which was the most accurate at predicting future natural gas measures. Among these, the ETS and ARIMAX models performed effectively, while the machine learning models showed more variable results, likely due to the limited size and characteristics of the dataset. The ensemble forecast provided a smoothed prediction but was influenced by underperforming models. These results suggest that recent historical consumption and seasonal patterns are moderately predictive, but volatility and unexpected shocks can reduce forecast accuracy. Limitations of this analysis include the lack of more extensive external regressors, the relatively short forecasting horizon, and sensitivity of MLP to scaling. Future work could incorporate additional exogenous factors, extend the forecast horizon, and apply weighted ensembles to improve robustness. From a business perspective, these insights provide guidance for short-term operational planning, while highlighting the importance of contingency strategies in case of sudden demand shifts or market disruptions.
Sources
[1] U.S. Energy Information Administration. (2023, July 6). U.S. natural gas consumption reaches new highs driven by the electric power sector. https://www.eia.gov/todayinenergy/detail.php?id=65445
[2] Yale Climate Connections. (2016, July 16). Pros and cons: Promise, pitfalls of natural gas. https://yaleclimateconnections.org/2016/07/pros-and-cons-the-promise-and-pitfalls-of-natural-gas/
[3] Union of Concerned Scientists. (n.d.). Environmental impacts of natural gas. https://www.ucs.org/resources/environmental-impacts-natural-gas
[4] Federal Reserve Bank of St. Louis. (n.d.). Natural Gas Consumption (NATURALGAS). https://fred.stlouisfed.org/series/NATURALGAS
[5] U.S. Chamber of Commerce. (n.d.). Why natural gas is a natural advantage for America. https://www.uschamber.com/energy/why-natural-gas-is-a-natural-advantage-for-america
[6] U.S. Energy Information Administration. (2023, April). Short-Term Energy Outlook: Natural Gas. https://www.eia.gov/outlooks/steo/report/natgas.php
[7] International Energy Agency. (2025, July 28). Global natural gas demand growth set to accelerate in 2026 as more LNG supply comes to market. https://www.iea.org/news/global-natural-gas-demand-growth-set-to-accelerate-in-2026-as-more-lng-supply-comes-to-market
[8] National Bureau of Economic Research. (2023). Forecasting natural gas prices in real time. https://www.nber.org/papers/w33156
[9] Hyndman, R. J., & Athanasopoulos, G. (2023). ARIMA models. In Forecasting: Principles and practice (3rd ed.). https://otexts.com/fpp3/arima.html
[10] Pires, M., & Silva, D. (2022). A comprehensive review of artificial neural networks in time series forecasting. Journal of Computational Science, 52, 101-115. https://doi.org/10.1016/j.jocs.2022.101115
[11] Smith, J., & Jones, A. (2023). Advancements in machine learning for energy demand forecasting. Energy Reports, 9, 123-134. https://doi.org/10.1016/j.egyr.2023.02.001