Natural gas currently accounts for 43.1% of all energy
produced in the United States. In particular, it is widely used to heat
homes during the winter season. With a market of that size, companies
that supply natural gas must be able to predict demand for their product
so that they can deliver it to homes reliably and avoid shortages while
remaining profitable. To do so, they rely on statistical models that
forecast future demand for natural gas.
This paper will demonstrate several of these forecasting methods,
compare their performance, and recommend the model that performs best
and should therefore be used when attempting to predict demand for
natural gas.
In particular, I will explore the
Naive, Drift, Seasonal Naive, and ETS models, using ACF plots of their
residuals as well as R-squared to evaluate their performance.
The dataset I will use for my analysis is an EIA dataset
that reports the total amount of natural gas consumed each month in
MMcf (million cubic feet). The original dataset ranges from January 2001
to December 2023; however, as per the official guidelines of this
project, I have restricted it to the five years from 2005 to 2009.
Fortunately, the data is pre-cleaned, so other than converting the
dataset into a tsibble object, this project does not require extensive
cleaning or data manipulation.
## # A tsibble: 10 x 4 [1M]
##    Date           EnergyConsumption     TIME  YEAR
##    <chr>                      <dbl>    <mth> <dbl>
##  1 January 2005             2561858 2005 Jan  2005
##  2 February 2005            2242986 2005 Feb  2005
##  3 March 2005               2205787 2005 Mar  2005
##  4 April 2005               1724877 2005 Apr  2005
##  5 May 2005                 1522613 2005 May  2005
##  6 June 2005                1534122 2005 Jun  2005
##  7 July 2005                1686609 2005 Jul  2005
##  8 August 2005              1695102 2005 Aug  2005
##  9 September 2005           1422495 2005 Sep  2005
## 10 October 2005             1428227 2005 Oct  2005
The graph above shows extremely pronounced seasonal
patterns, as well as a slight upward trend. The strong annual
seasonality is not surprising. As the weather gets colder, people start
heating their houses, and since most of them do so with natural gas,
demand spikes around the end of the year. This strong seasonality
should allow some of our models to generate accurate forecasts.
As might be expected, the naive method leaves considerable room for
improvement. A simple eye test of the forecast shows that while the
model came close at the 12-month mark, it missed every other month.
Additionally, the ACF plot of the residuals shows signs of serious
autocorrelation. Correlated residuals indicate that the data contain
information that should have been used in computing the forecast but
that the model did not capture.
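To illustrate why a naive fit leaves autocorrelated residuals on seasonal data, here is a minimal base R sketch. It uses a made-up 12-month cycle rather than the EIA data, and the names `month_index`, `naive_resid`, and `lag12` are purely illustrative. The residuals of a naive fit are simply the month-to-month differences, so a seasonal pattern in the series carries straight through to them.

```r
# Toy monthly series with a pure 12-month cycle (hypothetical, not the EIA data)
month_index <- 1:60
y <- 100 + 20 * cos(2 * pi * month_index / 12)

# Residuals of a naive (random-walk) fit are the one-step differences
naive_resid <- diff(y)

# Sample autocorrelations; element 1 is lag 0, so lag 12 is element 13
acf_vals <- acf(naive_resid, lag.max = 24, plot = FALSE)$acf
lag12 <- acf_vals[13]

# Approximate 95% significance bound drawn on ACF plots
bound <- 1.96 / sqrt(length(naive_resid))

# TRUE when the lag-12 spike lies outside the bound
lag12 > bound
```

A spike at the seasonal lag that sits well outside the bound is the kind of pattern that signals unused seasonal information.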
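# SSR: sum of squared residuals from the naive fit
squared_residuals <- (Naive$.resid)^2
SSR <- sum(squared_residuals, na.rm = TRUE)
# SST: total sum of squares around the mean of the series
total_sum <- (Data$EnergyConsumption - mean(Data$EnergyConsumption, na.rm = TRUE))
squared_sum <- total_sum^2
SST <- sum(squared_sum, na.rm = TRUE)
# R2: share of the total variation not left in the residuals
R2 <- 1 - (SSR / SST)
R2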
## [1] 0.4302152
Finally, I calculated the R-squared of the model to assess its
goodness of fit. With only 43% of the variation in natural gas demand
explained by the naive model, it is not performing well enough to be
used by large utility companies.
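For reference, the R-squared computed in the code corresponds to the usual definition below. The symbols \(e_t\) (the residual at time \(t\)), \(y_t\) (the observed consumption), and \(\bar{y}\) (its sample mean) are notation I am assuming here; they are not named in the analysis itself.

```latex
\[
R^2 = 1 - \frac{SSR}{SST},
\qquad
SSR = \sum_{t} e_t^{2},
\qquad
SST = \sum_{t} \left( y_t - \bar{y} \right)^{2}
\]
```

Under this definition, a model whose residuals are as variable as the raw series scores close to zero, and a model with small residuals scores close to one.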
The addition of the second term in the equation allows the
forecast to increase or decrease over time, where the rate of change
equals the average change observed in the historical data. In simple
terms, it is the same as connecting the first and last points of the
graph with a straight line and extending that line into the future.
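The drift forecast is usually written as follows. This is the standard textbook form, which I assume matches the equation discussed above; \(T\) is the number of observations and \(h\) the forecast horizon.

```latex
\[
\hat{y}_{T+h \mid T}
  = y_T + \frac{h}{T-1} \sum_{t=2}^{T} \left( y_t - y_{t-1} \right)
  = y_T + h \left( \frac{y_T - y_1}{T-1} \right)
\]
```

The first term, \(y_T\), is the naive forecast; the second term adds \(h\) times the average historical change, which reduces to the slope of the line through the first and last observations.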
Since the first and last points can be connected by a line
with a small negative slope, the drift forecast shows a slight downward
trend. Yet, like the plain naive forecast, the model only manages to
come close to the real values at the 12-month mark.
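The "straight line through the first and last points" interpretation can be checked with a minimal base R sketch. The values in `y` are made up for illustration and are not the EIA data; the variable names are likewise hypothetical.

```r
# Toy series (hypothetical values, not the EIA data)
y <- c(10, 12, 9, 14, 13, 11)
n <- length(y)
h <- 1:3  # forecast horizons

# Drift forecast: last value plus h times the average historical change
avg_change <- mean(diff(y))
drift_fc <- y[n] + h * avg_change

# Same horizons read off the straight line through (1, y[1]) and (n, y[n])
slope <- (y[n] - y[1]) / (n - 1)
line_fc <- y[1] + slope * (n + h - 1)

# TRUE: both constructions give identical forecasts
all.equal(drift_fc, line_fc)
```

Because only the first and last observations determine the slope, everything that happens in between, including the seasonal swings, has no influence on the drift forecast.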
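# ACF: fit the drift model and plot the ACF of its innovation residuals
drift <- Data %>%
  model(RW(EnergyConsumption ~ drift())) %>%
  augment()
drift %>%
  ACF(.innov) %>%
  autoplot()
# SSR: sum of squared residuals from the drift fit
squared_residuals2 <- (drift$.resid)^2
SSR2 <- sum(squared_residuals2, na.rm = TRUE)
# SST: total sum of squares around the mean of the series
total_sum <- (Data$EnergyConsumption - mean(Data$EnergyConsumption, na.rm = TRUE))
squared_sum <- total_sum^2
SST <- sum(squared_sum, na.rm = TRUE)
# R2: share of the total variation not left in the residuals
R22 <- 1 - (SSR2 / SST)
R22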
## [1] 0.4302256
The ACF and R-squared are highly similar to those of the
naive forecast. All of the evidence indicates that the drift model is
not well suited to forecasting the demand for natural gas.
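Since the SSR, SST, and R-squared steps are repeated for every model, they could be wrapped in a small helper along the following lines. The function name `r_squared` and the toy numbers are hypothetical and not part of the original analysis; this is only a sketch of the same calculation in base R.

```r
# Hypothetical helper: R-squared from a vector of residuals and the observed series
r_squared <- function(residuals, actual) {
  ssr <- sum(residuals^2, na.rm = TRUE)                               # unexplained variation
  sst <- sum((actual - mean(actual, na.rm = TRUE))^2, na.rm = TRUE)   # total variation
  1 - ssr / sst
}

# Toy check with made-up numbers
actual <- c(2, 4, 6, 8)
fitted <- c(2.5, 3.5, 6.5, 7.5)
r_squared(actual - fitted, actual)
```

With the objects used in this paper, a call such as `r_squared(drift$.resid, Data$EnergyConsumption)` would be expected to reproduce the value computed step by step.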
While the equation may seem complicated, it becomes much
more straightforward once we know that
\(m\) represents the seasonal period
and \(k\) is the number of complete
years that have passed within the forecast horizon.
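The seasonal naive forecast is usually written as follows. This is the standard textbook form, which I assume is the equation referred to above; \(T\) is the number of observations and \(h\) the forecast horizon.

```latex
\[
\hat{y}_{T+h \mid T} = y_{T+h-m(k+1)},
\qquad
k = \left\lfloor \frac{h-1}{m} \right\rfloor
\]
```

For monthly data, \(m = 12\), so every forecast within the first year ahead (\(k = 0\)) is simply the value observed in the same month one year earlier.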
Thanks to the high level of seasonality in the dataset, it immediately becomes clear that the Seasonal Naive model performs better than the naive and drift models. An eye test shows that the forecast is quite close to the actual natural gas demand figures.
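The indexing behind a seasonal naive forecast can be made concrete with a minimal base R sketch. It uses made-up quarterly values (so `m` is 4 rather than 12) to keep the example short; the data and variable names are hypothetical.

```r
# Two "years" of toy quarterly data (hypothetical, not the EIA data)
y <- c(5, 3, 2, 4, 6, 3, 1, 4)
m <- 4          # seasonal period
n <- length(y)
h <- 1:6        # forecast horizons

# k counts the complete seasonal cycles already covered by the horizon
k <- (h - 1) %/% m

# Each forecast copies the most recent observation from the same season
snaive_fc <- y[n + h - m * (k + 1)]
snaive_fc
# [1] 6 3 1 4 6 3
```

The first four forecasts repeat the last observed year, and horizons five and six wrap around to repeat it again, which is why the method adds nothing beyond the most recent seasonal cycle.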
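# ACF: fit the seasonal naive model and plot the ACF of its innovation residuals
SNAIVE <- Data %>%
  model(SNAIVE(EnergyConsumption)) %>%
  augment()
SNAIVE %>%
  ACF(.innov) %>%
  autoplot()
# SSR: sum of squared residuals from the seasonal naive fit
squared_residuals3 <- (SNAIVE$.resid)^2
SSR3 <- sum(squared_residuals3, na.rm = TRUE)
# SST: total sum of squares around the mean of the series
total_sum <- (Data$EnergyConsumption - mean(Data$EnergyConsumption, na.rm = TRUE))
squared_sum <- total_sum^2
SST <- sum(squared_sum, na.rm = TRUE)
# R2: share of the total variation not left in the residuals
R23 <- 1 - (SSR3 / SST)
R23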
## [1] 0.8797814
The ACF and R-squared improved drastically over the previous models.
Residual autocorrelation is no longer such a large issue, and the
R-squared has increased to 87.98%. This means that almost 88% of the
variation in natural gas demand can be explained by the values from the
previous season. While this is a good result, the model provides no
“added value,” so to speak; it only repeats what has happened in the
past. The final model may be able to help with that.
The final model I will use to forecast the demand for
natural gas is an exponential smoothing (ETS) model. Specifically, I
will be using the Holt-Winters Damped Method, a form of triple
exponential smoothing. I chose it because of its excellent performance
in predicting data with high levels of seasonality. As this dataset is
strongly seasonal, this model should be able to capture the remaining
variation that the seasonal naive forecast could not.
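As a rough sketch of how a damped Holt-Winters model can be specified with the fable package, consider the example below. It runs on a synthetic monthly series, not the EIA data, and the additive error and seasonal components (`error("A")`, `season("A")`) are my assumption, since the exact specification behind the reported results is not shown; a multiplicative variant would use `"M"` instead. The object names `toy`, `fit`, and `hw_damped` are hypothetical.

```r
library(fable)    # ETS(); attaches fabletools for model(), forecast(), augment()
library(tsibble)  # tsibble(), yearmonth()
library(dplyr)    # %>%

set.seed(1)

# Synthetic five-year monthly series with a winter peak (hypothetical data)
toy <- tsibble(
  TIME = yearmonth(seq(as.Date("2005-01-01"), by = "1 month", length.out = 60)),
  EnergyConsumption = 1.9e6 + 5e5 * cos(2 * pi * (0:59) / 12) + rnorm(60, sd = 5e4),
  index = TIME
)

# Damped Holt-Winters: damped additive trend ("Ad") plus a seasonal component
fit <- toy %>%
  model(hw_damped = ETS(EnergyConsumption ~ error("A") + trend("Ad") + season("A")))

# Twelve-month-ahead forecast
fc <- fit %>% forecast(h = 12)

# In-sample R-squared, computed the same way as for the other models
aug <- augment(fit)
SSR4 <- sum(aug$.resid^2, na.rm = TRUE)
SST4 <- sum((toy$EnergyConsumption - mean(toy$EnergyConsumption))^2)
R24 <- 1 - SSR4 / SST4
```

The damped trend flattens out as the horizon grows, which tends to be a conservative choice when the underlying trend is weak, as it appears to be here.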
The forecast looks even better than the Seasonal Naive
forecast. The model does a particularly fine job of adjusting in
periods of higher volatility.
## [1] 0.9435914
The ACF plot and the R-squared reaffirm the eye test
results from the graph. Autocorrelation of residuals is no longer an
issue, and the R-squared has increased to 94.36%. Overall, it is clear
that the Holt-Winters Damped Method model is the best performing.
After investigating four models, I have found that the
Holt-Winters Damped Method model is easily the best performing. With an
R-squared of 94%, it is the model I would recommend for estimating
future demand.