The data set used for this analysis is a time series of the monthly price of natural gas from January 1997 to August 2020, collected by the U.S. Energy Information Administration and publicly available for free download from datahub.io. For the purposes of this analysis, the full data set will be reduced to the 150 most recent observations, covering March 2008 to August 2020.
nat.gas.full <- read.csv("https://datahub.io/core/natural-gas/r/monthly.csv")
n.row = dim(nat.gas.full)[1]
nat.gas = nat.gas.full[(n.row-149):n.row, ]
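As a quick sanity check, the range of the date column can confirm the reduced window (this assumes the first column of the datahub file holds the month as a YYYY-MM string, so that range() returns the window boundaries):
## should print "2008-03" and "2020-08"
range(nat.gas[, 1])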
The purpose of this analysis is to compare the results of decomposing a time series using classical decomposition and STL (Seasonal-Trend decomposition using Loess), and to evaluate the accuracy of forecasting future values using the STL method.
First, a time series object must be created from the data set; decomposition and forecasting will be performed on this object. It will also be plotted to get a general picture of the pattern of the data.
## monthly series beginning March 2008 (column 2 holds the price)
nat.gas.ts = ts(nat.gas[,2], frequency = 12, start = c(2008, 3))
par(mar=c(2,2,2,2))
plot(nat.gas.ts, main="Natural Gas Prices Between March 2008 & August 2020", ylab="Monthly Price (in nominal dollars)", xlab="")
With the time series object created, it can now be decomposed into its component parts using two different methods: classical decomposition and the STL method. Each decomposition can then be plotted to visually compare how well each method captures the patterns of the original time series.
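Both methods fit the same additive model to the series:

$$Y_t = T_t + S_t + R_t$$

where $Y_t$ is the observed price, $T_t$ the trend, $S_t$ the seasonal component, and $R_t$ the remainder. The difference lies in how the components are estimated: classical decomposition (the default additive form of decompose()) estimates the trend with a centered moving average and assigns each calendar month a single fixed seasonal figure, while STL estimates the trend and seasonal components with loess smoothers and allows the seasonal pattern to evolve over time.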
cls.decomp = decompose(nat.gas.ts)
par(mar=c(2,2,2,2))
plot(cls.decomp, xlab="")
Classical Decomposition Method
stl.decomp=stl(nat.gas.ts, s.window = 12)
par(mar=c(2,2,2,2))
plot(stl.decomp)
STL Method
Based on the plots of the two decomposition methods, the STL method appears to do a better job than the classical method of capturing the pattern of the original time series. Therefore, the STL method will be used to forecast future natural gas prices, and the accuracy of its forecasts will be evaluated.
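As an aside, the forecast package also provides stlf(), which combines the decomposition and forecasting steps in a single call; a minimal sketch on the full series, assuming a reasonably recent version of the package (the arguments mirror those used in the training/testing code below):
## STL decomposition followed by a naive forecast of the seasonally
## adjusted series, re-seasonalized and projected 7 months ahead
library(forecast)
fc = stlf(nat.gas.ts, h = 7, s.window = 12, method = "naive")
plot(fc)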
In order to forecast future values and evaluate forecasting accuracy, the data set will be split into training and testing sets. Four training sets of different sample sizes (143, 107, 71, & 47 observations, respectively) will be used to construct separate time series, each of which will be used to conduct STL decomposition and forecasting of future values. In each case, quantitative accuracy measures will be calculated to determine which of the four sample sizes produces the best forecasts. The last 7 observations (i.e., the most recent 7 months, February through August 2020) will be used as the testing set in all cases.
library(forecast)
ini.data = nat.gas[,2]
n0 = length(ini.data)
## training sets of decreasing size (143, 107, 71, & 47 observations)
train.data01 = nat.gas[1:(n0-7), 2]
train.data02 = nat.gas[37:(n0-7), 2]
train.data03 = nat.gas[73:(n0-7), 2]
train.data04 = nat.gas[97:(n0-7), 2]
## testing set: the last 7 observations
test.data = nat.gas[(n0-6):n0,2]
## convert each training set to a monthly time series
train01.ts = ts(train.data01, frequency = 12, start = c(2008, 3))
train02.ts = ts(train.data02, frequency = 12, start = c(2011, 3))
train03.ts = ts(train.data03, frequency = 12, start = c(2014, 3))
train04.ts = ts(train.data04, frequency = 12, start = c(2016, 3))
stl01 = stl(train01.ts, s.window = 12)
stl02 = stl(train02.ts, s.window = 12)
stl03 = stl(train03.ts, s.window = 12)
stl04 = stl(train04.ts, s.window = 12)
## forecast 7 months ahead from each STL decomposition (naive method)
fcst01 = forecast(stl01,h=7, method="naive")
fcst02 = forecast(stl02,h=7, method="naive")
fcst03 = forecast(stl03,h=7, method="naive")
fcst04 = forecast(stl04,h=7, method="naive")
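The forecasts will be scored with two measures. With $y_t$ denoting an observed test value and $\hat{y}_t$ the corresponding forecast, the percentage error as computed below is taken relative to the forecast (note that MAPE is more conventionally defined relative to the observed value $y_t$):

$$PE_t = \frac{y_t - \hat{y}_t}{\hat{y}_t}, \qquad \text{MAPE} = \frac{1}{7}\sum_{t=1}^{7}\left|PE_t\right|, \qquad \text{MSE} = \frac{1}{7}\sum_{t=1}^{7}\left(y_t - \hat{y}_t\right)^2$$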
PE01=(test.data-fcst01$mean)/fcst01$mean
PE02=(test.data-fcst02$mean)/fcst02$mean
PE03=(test.data-fcst03$mean)/fcst03$mean
PE04=(test.data-fcst04$mean)/fcst04$mean
### mean absolute percentage error (MAPE)
MAPE1 = mean(abs(PE01))
MAPE2 = mean(abs(PE02))
MAPE3 = mean(abs(PE03))
MAPE4 = mean(abs(PE04))
### raw forecast errors
E1=test.data-fcst01$mean
E2=test.data-fcst02$mean
E3=test.data-fcst03$mean
E4=test.data-fcst04$mean
## mean squared error (MSE)
MSE1=mean(E1^2)
MSE2=mean(E2^2)
MSE3=mean(E3^2)
MSE4=mean(E4^2)
### collect the accuracy measures into a table
MSE=c(MSE1, MSE2, MSE3, MSE4)
MAPE=c(MAPE1, MAPE2, MAPE3, MAPE4)
accuracy=cbind(MSE=MSE, MAPE=MAPE)
row.names(accuracy)=c("n.143", "n.107", "n. 71", "n. 47")
library(knitr)
kable(accuracy, caption="Accuracy Measures for Forecasts Using Different Sample Sizes")
Table: Accuracy Measures for Forecasts Using Different Sample Sizes

|       | MSE       | MAPE      |
|-------|-----------|-----------|
| n.143 | 0.0499262 | 0.0836555 |
| n.107 | 0.0523423 | 0.0865743 |
| n. 71 | 0.0756898 | 0.1278248 |
| n. 47 | 0.1410745 | 0.2073388 |
Of the four sample sizes used to forecast natural gas prices for future months, the most accurate (based on both MSE & MAPE) was the largest training set of 143 months. As the sample size decreased, the error increased noticeably.
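As a cross-check on the hand-computed measures, the accuracy() function from the forecast package reports test-set error measures directly (its MAPE is expressed in percent and divides by the observed values rather than the forecasts, so it will differ slightly from the figures above):
## out-of-sample measures appear in the "Test set" row
accuracy(fcst01, test.data)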
plot(1:4, MSE, type="b", col="forestgreen", ylab="Error", xlab="",
ylim=c(0,.24),xlim = c(0.5,4.5), main="Error Curves", axes=FALSE)
labs=c("n=143", "n=107", "n=71", "n=47")
axis(1, at=1:4, labels=labs, pos=0)
axis(2)
lines(1:4, MAPE, type="b", col="royalblue2")
text(1:4, MAPE+0.03, as.character(round(MAPE,4)), col="royalblue2", cex=0.7)
text(1:4, MSE-0.03, as.character(round(MSE,4)), col="forestgreen", cex=0.7)
legend("topleft", c("MSE", "MAPE"), col=c("forestgreen","royalblue2"), lty=1, bty="n", cex=0.7)
Comparing Forecast Errors
Based on this plot of the forecast accuracy measures for each sample size, the forecasting error not only increased as the sample size decreased, but the increase also accelerated, suggesting faster-than-linear, possibly exponential, growth in error as the training window shrinks.
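One informal way to probe that impression is to redraw the error curves on a logarithmic scale, where exponential growth appears roughly linear; a minimal sketch reusing the MSE and MAPE vectors computed above:
## on a log scale, exponential growth in error would show up as an
## approximately straight line across the four sample sizes
plot(1:4, MSE, type="b", log="y", col="forestgreen",
     ylim=range(c(MSE, MAPE)), ylab="Error (log scale)", xlab="",
     main="Error Curves (log scale)", axes=FALSE)
axis(1, at=1:4, labels=c("n=143", "n=107", "n=71", "n=47"))
axis(2)
lines(1:4, MAPE, type="b", col="royalblue2")
legend("topleft", c("MSE", "MAPE"), col=c("forestgreen","royalblue2"), lty=1, bty="n", cex=0.7)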