Data description

The data set used for this analysis is the ausbeer data set. It records the total quarterly beer production in Australia, in megalitres. The data set contains 211 observations and 2 variables; summary output of the data set and its structure follows. The two variables are called QD and nd0. QD is an integer year variable that runs from 1956 to 2008, and nd0 is an integer giving the beer production for each observation. Each observation represents one quarter of a year. The minimum, mean, and maximum of nd0 are 213.0, 415.0, and 599.0, respectively. There appear to be no missing values.

'data.frame':   211 obs. of  2 variables:
 $ QD : int  1956 1956 1956 1956 1957 1957 1957 1957 1958 1958 ...
 $ nd0: int  284 213 227 308 262 228 236 320 272 233 ...
       QD            nd0       
 Min.   :1956   Min.   :213.0  
 1st Qu.:1969   1st Qu.:378.5  
 Median :1982   Median :423.0  
 Mean   :1982   Mean   :415.0  
 3rd Qu.:1995   3rd Qu.:465.5  
 Max.   :2008   Max.   :599.0  
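
The structure and summary above could be produced with a few base R calls; the sketch below assumes the data set has been read into a data frame named ausbeer_df (the file name and object name are illustrative).

ausbeer_df <- read.csv("ausbeer.csv")   # columns: QD (year), nd0 (megalitres)
str(ausbeer_df)                         # 211 obs. of 2 variables
summary(ausbeer_df)                     # min, quartiles, mean, max for each column
sum(is.na(ausbeer_df))                  # 0 would confirm there are no missing values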

Training and Testing Data

Next, the data was split into training and test data sets. Four training data sets were created, each of a different size: n = 48, 64, 150, and 200. The most recent 13 periods were kept for the test data set. However, the number of test observations actually used differed between the two methods, STL and Classic, owing to differences inherent in each method: specifically, the Classic method of decomposition comes up a few observations short after decomposing the series. Tabular output of these amounts follows.

Data Split      Observations
Train Set 1              200
Train Set 2              150
Train Set 3               64
Train Set 4               48
Test Set                  13
Original Set             211
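
A sketch of how this split might be carried out follows, assuming the full data set is held in the data frame ausbeer_df from above and that every training set ends at observation 200 (the index arithmetic is illustrative; the small overlap with the test window offsets the observations that classical decomposition drops at the end of the series).

train_end   <- 200
train_sizes <- c(200, 150, 64, 48)
train_sets  <- lapply(train_sizes, function(n)
  ausbeer_df[(train_end - n + 1):train_end, ])   # four nested training windows
test_set    <- tail(ausbeer_df, 13)              # most recent 13 quarters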

Time Series Object

A time series object was constructed for each training data set using the ts() function. Training sets 1 through 4 correspond, respectively, to n = 200, 150, 64, and 48 observations. The frequency was set to 4, since the observations are quarterly. For brevity, the printed ts object for n = 64 follows, along with a plot of the ts object for n = 200. Visual inspection of the plot suggests that the time series is additive: the seasonal fluctuations do not appear to amplify over time.

     Qtr1 Qtr2 Qtr3 Qtr4
1994  449  381  423  531
1995  426  408  416  520
1996  409  398  398  507
1997  432  398  406  526
1998  428  397  403  517
1999  435  383  424  521
2000  421  402  414  500
2001  451  380  416  492
2002  428  408  406  506
2003  435  380  421  490
2004  435  390  412  454
2005  416  403  408  482
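
A sketch of the ts construction and plot follows; frequency = 4 marks the data as quarterly, and the start argument must reflect where each training window begins (for the full n = 200 window this is assumed to be 1956 Q1).

ts_200 <- ts(train_sets[[1]]$nd0, frequency = 4, start = c(1956, 1))
plot(ts_200, ylab = "Beer production (megalitres)",
     main = "Quarterly beer production in Australia, n = 200")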

Classic Decomposition

First, classical decomposition was conducted. Each training set was decomposed into its trend, seasonal, and error components and then plotted. The plots depict these decompositions for n = 200, 150, 64, and 48, respectively.
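
A minimal sketch of this step for the n = 200 series follows; the additive model is chosen in line with the visual inspection above.

dec_200 <- decompose(ts_200, type = "additive")
plot(dec_200)   # panels for the observed series, trend, seasonal, and random (error) components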

Classic Decomposition Forecasting

Next, forecasting was conducted. Classical decomposition loses the first and last two observations of the trend when it decomposes the time series, so each forecast here covers 13 periods. The naive method was used for forecasting. Output of the forecast for n = 200 follows.

Forecasting with decomposing - naive
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2005 Q3 416.5306 385.5944 447.4668 369.2178 463.8434
2005 Q4 515.9133 472.1629 559.6637 449.0028 582.8237
2006 Q1 448.3342 394.7511 501.9172 366.3860 530.2824
2006 Q2 403.0000 341.1276 464.8724 308.3743 497.6257
2006 Q3 416.5306 347.3552 485.7061 310.7359 522.3253
2006 Q4 515.9133 440.1354 591.6912 400.0210 631.8056
2007 Q1 448.3342 366.4847 530.1837 323.1562 573.5122
2007 Q2 403.0000 315.4992 490.5008 269.1791 536.8209
2007 Q3 416.5306 323.7220 509.3392 274.5921 558.4691
2007 Q4 515.9133 418.0844 613.7421 366.2970 665.5296
2008 Q1 448.3342 345.7304 550.9379 291.4153 605.2531
2008 Q2 403.0000 295.8339 510.1661 239.1035 566.8965
2008 Q3 416.5306 304.9886 528.0727 245.9418 587.1194
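
One way such a forecast could be produced is sketched below (not necessarily the author's exact code): the series is seasonally adjusted, the adjusted series is forecast with the naive method, and the estimated seasonal pattern is added back to the point forecasts. The alignment of the repeated seasonal values with the forecast quarters is glossed over here.

library(forecast)
adj_200  <- seasadj(dec_200)                          # seasonally adjusted series
fc_adj   <- naive(adj_200, h = 13)                    # naive forecast, 13 quarters
seas_pat <- as.numeric(tail(dec_200$seasonal, 13))    # repeat the last seasonal cycle
fc_mean  <- as.numeric(fc_adj$mean) + seas_pat        # re-seasonalised point forecasts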

Classic Decomposition Accuracy Metrics

Next, accuracy metrics for each of the training data sets were computed and output in a table. The MAPE and MSE are depicted along the top, and the observation size for each training set is listed down the left side. The MAPE is expressed as a percentage.

Accuracy Metrics for Classic
  n     MAPE (%)        MSE
200     4.653308   507.0078
150     4.907893   619.3271
 64     4.376375   525.5023
 48     4.155283   410.6874
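
These metrics can be computed directly from the point forecasts and the held-out test values; a sketch for one training size follows, using the fc_mean and test_set objects defined earlier (names are illustrative).

test_vals <- test_set$nd0
mape <- mean(abs((test_vals - fc_mean) / test_vals)) * 100   # MAPE, in percent
mse  <- mean((test_vals - fc_mean)^2)                        # MSE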

STL Decomposition

Next, STL decomposition was conducted. Each training set was decomposed into its trend, seasonal, and error components and then plotted. The plots depict these decompositions for n = 200, 150, 64, and 48, respectively.
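
A minimal sketch of the STL step for the n = 200 series follows; s.window = "periodic" is one common choice and is an assumption here, not necessarily the author's setting.

stl_200 <- stl(ts_200, s.window = "periodic")
plot(stl_200)   # panels for the data, seasonal, trend, and remainder components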

STL Forecasting

Next, forecasting was conducted. Each forecast covers 11 periods, since STL does not drop observations from the training series the way classical decomposition does. The naive method was used for forecasting. Output of the forecast for n = 200 follows.

Forecasting with decomposing - naive, n=200
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2006 Q1 414.8311 384.0250 445.6373 367.7172 461.9451
2006 Q2 369.1828 325.6163 412.7492 302.5536 435.8119
2006 Q3 383.1352 329.7774 436.4930 301.5314 464.7389
2006 Q4 482.0000 420.3877 543.6123 387.7721 576.2279
2007 Q1 414.8311 345.9465 483.7158 309.4812 520.1811
2007 Q2 369.1828 293.7234 444.6421 253.7776 484.5879
2007 Q3 383.1352 301.6298 464.6406 258.4834 507.7870
2007 Q4 482.0000 394.8670 569.1330 348.7416 615.2584
2008 Q1 414.8311 322.4127 507.2496 273.4893 556.1730
2008 Q2 369.1828 271.7652 466.6004 220.1954 518.1701
2008 Q3 383.1352 280.9627 485.3076 226.8759 539.3945
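
A sketch of how this forecast could be produced follows; the forecast package's stlf() function decomposes the series with STL and forecasts the seasonally adjusted series, here with the naive method.

library(forecast)
fc_stl <- stlf(ts_200, method = "naive", h = 11)
fc_stl   # point forecasts with 80% and 95% intervals, as in the output above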

STL Accuracy Metrics

Next, accuracy metrics for each of the training data sets were computed and output in a table. The MAPE and MSE are depicted along the top, and the observation size for each training set is listed down the left side. The MAPE is expressed as a percentage.

Accuracy Metrics for STL
  n     MAPE (%)        MSE
200     3.737894   281.6836
150     5.151534   515.3770
 64     5.199174   535.5810
 48     3.549672   266.7371

Discussion

Two decomposition methods were explored in this analysis: STL and Classical. Each method broke the given time series into trend, seasonal, and error components, but each approached that task differently. Ultimately, that difference showed up not only in the values that were produced, but also in the mechanics of this analysis.

The data set used was the count of beer production in Australia. Each observation represented a quarterly measure taken from 1956 to 2008. There were 211 observations and 2 variables: a date and an amount. The highest and lowest production in this period was 599.0 ML and 213.0 ML, respectively. Before decomposition was carried out on the time series, training and testing data sets were derived from the initial data set. Each training data set ended at the same time period but began at a different one. The sizes of the training data sets were n = 200, 150, 64, and 48; the size of the testing data set was n = 13. This was done to offset the effects of the classical decomposition method. Once these data sets were constructed, forecasting was carried out on them using the naive method. Each decomposition method had a different forecasting horizon: STL used 11 periods and Classical used 13.

Afterwards, accuracy metrics were derived from the forecasted and test values. The MAPE and MSE were reported, with the MAPE expressed as a percentage. For the STL method the highest MAPE and MSE occurred at n = 64 and the lowest at n = 48; as the number of observations moved away from n = 64 in either direction, the MAPE and MSE decreased. For the Classical method the highest MAPE and MSE occurred at n = 150 and the lowest at n = 48; as the number of observations moved away from n = 150 in either direction, the MAPE and MSE decreased.

The differences in the accuracy metrics between the two methods may stem from several factors. A main factor may be that classical decomposition uses moving averages to disassemble the time series, whereas STL does not. The differences within each method may stem from several factors as well. A main factor may be that each observation size inherently alters the values on which the forecast is based. Since those values may originate from dissimilar parts of the trend line, the seasonality they produce may not culminate in the expected results. Specifically, forecasting on values with dissimilar trends may not yield much information about an optimal training size, at least when forecasting with the naive method.

Conclusion

An optimal training size for forecasting with STL and Classical decomposition was not found, nor was an optimal decomposition method. However, based on the MAPE and MSE reported, one might choose n = 48 for both STL and Classical decomposition with naive forecasting, and choose STL over Classical decomposition, since that combination minimized both metrics. Caution is advised, however: these results may be owed to peculiarities inherent in each method, the choice of forecasting method, and the choice of training size and training window location.