1 Introduction


      Natural gas currently accounts for 43.1% of all energy produced in the United States. In particular, it is widely used to heat homes during the winter. With a market of that size, companies that supply natural gas must be able to predict demand so that they can deliver gas to homes reliably, avoid shortages, and remain profitable. To do so, they use statistical models to forecast future demand for natural gas.
      This paper demonstrates several forecasting methods, compares their performance, and recommends the model that performs best for predicting natural gas demand.
      In particular, I will explore the Naive, Drift, Seasonal Naive, and ETS models, using ACF plots of their residuals as well as R-squared to evaluate their performance.

2 Data


      The dataset I use for my analysis comes from the EIA and reports the total amount of natural gas consumed each month in MMcf (million cubic feet). The original dataset ranges from January 2001 to December 2023; however, as per the official guidelines of this project, I have restricted it to the five years from 2005 to 2009. Luckily, the data is pre-cleaned, so other than converting the dataset into a tsibble object, this project does not require extensive cleaning or data manipulation.

## # A tsibble: 10 x 4 [1M]
##    Date           EnergyConsumption     TIME  YEAR
##    <chr>                      <dbl>    <mth> <dbl>
##  1 January 2005             2561858 2005 Jan  2005
##  2 February 2005            2242986 2005 Feb  2005
##  3 March 2005               2205787 2005 Mar  2005
##  4 April 2005               1724877 2005 Apr  2005
##  5 May 2005                 1522613 2005 May  2005
##  6 June 2005                1534122 2005 Jun  2005
##  7 July 2005                1686609 2005 Jul  2005
##  8 August 2005              1695102 2005 Aug  2005
##  9 September 2005           1422495 2005 Sep  2005
## 10 October 2005             1428227 2005 Oct  2005


      A time plot of the series shows extremely pronounced seasonal patterns, as well as a slight upward trend. The strong annual seasonality is not surprising. As the weather gets colder, people start heating their houses, and since most of them do so with natural gas, demand spikes around the end of the year. This strong seasonality should allow some of our models to generate accurate forecasts.

3 Naive Model


      The naive forecast repeats the value of the last observation over and over into the future, creating a flat line from the last observed point until the end of the forecast.
      Mathematically, this can be expressed as:

\(\hat{y}_{T+h|T} = y_{T}\)
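
      As a minimal sketch of this formula in base R, the snippet below applies it to the first ten observations printed in the Data section only (not the full 2005 to 2009 series). The helper name naive_forecast() is my own illustration and is not part of any package.

```r
# First ten monthly observations (MMcf) as printed in the Data section;
# the full 2005-2009 series is not reproduced here.
y <- c(2561858, 2242986, 2205787, 1724877, 1522613,
       1534122, 1686609, 1695102, 1422495, 1428227)

# Hypothetical helper: repeat the last observed value h times.
naive_forecast <- function(y, h) {
  rep(y[length(y)], h)
}

fc <- naive_forecast(y, h = 3)
print(fc)  # every forecast equals the last observation, 1428227
```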


      As expected, the naive method leaves considerable room for improvement. A simple eye test of the forecast shows that while the model came close at the 12-month mark, it missed every other month.

      Additionally, the ACF plot of the residuals shows signs of serious autocorrelation. Correlated residuals indicate that the data contain information that the model failed to use when computing the forecast.
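
      To illustrate what the ACF check measures, here is a rough base R sketch. It assumes the in-sample naive residuals are simply the first differences of the series and uses only the ten observations printed in the Data section, so the numbers are illustrative and far too few for a real diagnosis. The 1.96/sqrt(n) bound is the usual large-sample approximation for white noise.

```r
# First ten monthly observations (MMcf) as printed in the Data section.
y <- c(2561858, 2242986, 2205787, 1724877, 1522613,
       1534122, 1686609, 1695102, 1422495, 1428227)

# In-sample naive residuals: e_t = y_t - y_{t-1}.
e <- diff(y)

# Sample autocorrelations at lags 1 to 4 (the lag-0 value is dropped).
r <- acf(e, lag.max = 4, plot = FALSE)$acf[-1]

# Approximate 95% bounds under the white-noise assumption.
bound <- 1.96 / sqrt(length(e))

print(round(r, 3))
print(abs(r) > bound)  # TRUE marks lags outside the bounds
```

      Spikes outside the bounds at many lags, especially at seasonal lags, are what the ACF plot flags as unused information.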

# SSR

squared_residuals <- (Naive$.resid)^2

SSR <- sum(squared_residuals,na.rm=TRUE)

# SST

total_sum <- (Data$EnergyConsumption - mean(Data$EnergyConsumption, na.rm=TRUE))

squared_sum <- total_sum^2

SST <- sum(squared_sum,na.rm=TRUE)

#R2

R2 <- 1-(SSR/SST)

R2
## [1] 0.4302152


      Finally, I calculated the R-squared of the model to assess goodness of fit. With only 43% of the variation in natural gas demand explained by the naive model, it is not performing well enough to be used by large utility companies.
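
      Because the same SSR and SST steps are applied to every model, they could be wrapped in one small function. The sketch below is only an illustration: r_squared() is a hypothetical helper, and it is demonstrated on the ten observations printed in the Data section rather than on the full dataset, so its output will not match the figure reported above.

```r
# Hypothetical helper mirroring the SSR / SST steps used in this paper.
r_squared <- function(actual, residuals) {
  ssr <- sum(residuals^2, na.rm = TRUE)                              # sum of squared residuals
  sst <- sum((actual - mean(actual, na.rm = TRUE))^2, na.rm = TRUE)  # total sum of squares
  1 - ssr / sst
}

# First ten monthly observations (MMcf) as printed in the Data section.
y <- c(2561858, 2242986, 2205787, 1724877, 1522613,
       1534122, 1686609, 1695102, 1422495, 1428227)

# Naive in-sample residuals; the first one is undefined, hence NA.
naive_resid <- c(NA, diff(y))

r2 <- r_squared(y, naive_resid)
print(r2)
```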

4 Drift Model


      The drift model, or drift method, can be expressed mathematically as:
\(\hat{y}_{T+h|T} = y_{T} + h\left(\frac{y_{T}-y_{1}}{T-1}\right)\)


      The second term in the equation allows the forecast to increase or decrease over time, where the change per period equals the average change observed in the historical data. In simple terms, it is the same as drawing a straight line through the first and last observations and extending it into the future.
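
      The "line through the first and last points" idea can be sketched in a few lines of base R. The example assumes only the first ten observations printed in the Data section, and drift_forecast() is a hypothetical helper written for illustration.

```r
# First ten monthly observations (MMcf) as printed in the Data section.
y <- c(2561858, 2242986, 2205787, 1724877, 1522613,
       1534122, 1686609, 1695102, 1422495, 1428227)

# Hypothetical helper: extend the line joining the first and last points.
drift_forecast <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)  # average change per period
  y[n] + seq_len(h) * slope
}

fc <- drift_forecast(y, h = 3)
print(fc)  # 1302268 1176309 1050350
```

      On these ten points the slope is -125959 MMcf per month, so each forecast step falls by that amount from the last observation.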

      Since the first and last points can be connected by a line with a small negative slope, the drift forecast shows a slight downward trend. Yet, like the naive forecast, the model only manages to come close to the real values at the 12-month mark.

# ACF

drift <- Data %>%
  model(RW(EnergyConsumption ~ drift())) %>%
  augment()

drift %>%
  ACF(.innov)%>%
  autoplot()

# SSR

squared_residuals2 <- (drift$.resid)^2

SSR2 <- sum(squared_residuals2,na.rm=TRUE)

# SST

total_sum <- (Data$EnergyConsumption - mean(Data$EnergyConsumption, na.rm=TRUE))

squared_sum <- total_sum^2

SST <- sum(squared_sum,na.rm=TRUE)

#R2

R22 <- 1-(SSR2/SST)

R22
## [1] 0.4302256


      The ACF plot and R-squared are highly similar to those of the naive forecast. All of the evidence indicates that the drift model is not well suited to forecasting the demand for natural gas.

5 Seasonal Naive


      The seasonal naive method sets each forecast equal to the last observed value from the same season. This means that if the value in January is \(\alpha\), our forecast for January of next year will also be \(\alpha\).
      The mathematical expression of the seasonal naive model is as follows:
\(\hat{y}_{T+h|T} = y_{T+h-m(k+1)}\)


      While the equation may seem complicated, it becomes much more straightforward once we know that \(m\) is the seasonal period (12 for monthly data) and \(k\) is the integer part of \((h-1)/m\), that is, the number of complete years in the forecast period prior to time \(T+h\).
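
      The index arithmetic is easiest to see on a small example. The base R sketch below uses a made-up quarterly toy series (so \(m = 4\)), not the natural gas data, and snaive_forecast() is a hypothetical helper written for illustration.

```r
# Hypothetical helper implementing y_{T+h-m(k+1)} with k = floor((h-1)/m).
snaive_forecast <- function(y, h, m) {
  n <- length(y)
  stopifnot(n >= m)            # at least one full season is required
  horizons <- seq_len(h)
  k <- (horizons - 1) %/% m    # complete seasons already covered by the forecast
  y[n + horizons - m * (k + 1)]
}

# Made-up quarterly toy series: two "years" of four quarters each.
y_toy <- c(10, 20, 30, 40, 11, 21, 31, 41)

fc <- snaive_forecast(y_toy, h = 6, m = 4)
print(fc)  # 11 21 31 41 11 21
```

      The forecast cycles through the last observed year, and once \(h\) exceeds \(m\) the value of \(k\) increases by one so the index wraps back to the same final season.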


      Thanks to the strong seasonality in the dataset, it is immediately clear that the Seasonal Naive model performs better than the two previous models. An eye test shows that the forecast is quite close to the observed natural gas demand.


#ACF

SNAIVE <- Data %>%
  model(SNAIVE(EnergyConsumption)) %>%
  augment()

SNAIVE %>%
  ACF(.innov)%>%
  autoplot()

# SSR

squared_residuals3 <- (SNAIVE$.resid)^2

SSR3 <- sum(squared_residuals3,na.rm=TRUE)

# SST

total_sum <- (Data$EnergyConsumption - mean(Data$EnergyConsumption, na.rm=TRUE))

squared_sum <- total_sum^2

SST <- sum(squared_sum,na.rm=TRUE)

#R2

R23 <- 1-(SSR3/SST)

R23
## [1] 0.8797814


      The ACF plot and R-squared improved drastically relative to the previous models. Residual autocorrelation is no longer such a large issue, and the R-squared has increased to 87.98%. This means that almost 88% of the variation in natural gas demand can be explained by previous seasons. While this is a good result, the model provides no “added value,” so to speak; it only repeats what has happened in the past. The final model may be able to help with that.

6 ETS Model


      The final model I will use to forecast the demand for natural gas comes from the ETS (Error, Trend, Seasonal) family of exponential smoothing models. Specifically, I will be using the Holt-Winters damped method. I chose it because it performs well on data with strong seasonality. As this dataset is highly seasonal, the model should be able to capture the remaining variation that the seasonal naive forecast could not.


      The forecast looks even better than the Seasonal Naive forecast does. Especially in periods of higher volatility, the model does a fine job of adjusting.

## [1] 0.9435914


      The ACF plot and the R-squared reaffirm the eye test results from the graph. Autocorrelation of residuals is no longer an issue, and the R-squared has increased to 94.36%. Overall, the Holt-Winters damped method is clearly the best-performing model.

7 Conclusion


      After investigating four models, I have found that the Holt-Winters damped method is easily the best performing. With an R-squared of 94%, it is the model I would recommend for estimating future demand.