Data description

The data set used for this analysis is the ausbeer data set. It records the total quarterly beer production in Australia, in megalitres. The data set contains 211 observations and 2 variables; summary output of the data set and its structure follows. The two variables are called QD and nd0. QD is an integer year variable that runs from 1956 to 2008, and nd0 is an integer giving the beer production for each observation. Each observation represents one quarter of a year. The minimum, mean, and maximum of nd0 are 213.0, 415.0, and 599.0, respectively. There appear to be no missing values.

'data.frame':   211 obs. of  2 variables:
 $ QD : int  1956 1956 1956 1956 1957 1957 1957 1957 1958 1958 ...
 $ nd0: int  284 213 227 308 262 228 236 320 272 233 ...
       QD            nd0       
 Min.   :1956   Min.   :213.0  
 1st Qu.:1969   1st Qu.:378.5  
 Median :1982   Median :423.0  
 Mean   :1982   Mean   :415.0  
 3rd Qu.:1995   3rd Qu.:465.5  
 Max.   :2008   Max.   :599.0  
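
The structure and summary above could be produced with a few base R calls; the sketch below assumes the data set has been read into a data frame named ausbeer_df (the file name and object name are illustrative).

ausbeer_df <- read.csv("ausbeer.csv")   # columns: QD (year), nd0 (megalitres)
str(ausbeer_df)                         # 211 obs. of 2 variables
summary(ausbeer_df)                     # min, quartiles, mean, max for each column
sum(is.na(ausbeer_df))                  # 0 would confirm there are no missing values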

Training and Testing Data

Next, the data was split into training and test data sets. Four training data sets were created, each of a different size: n = 48, 64, 150, and 200. The most recent 13 periods were kept for the test data set. However, the number of test observations actually used differed between the two methods, STL and Classic, owing to differences inherent in each method: specifically, the Classic method of decomposition comes up a few observations short after decomposing the series. Tabular output of these amounts follows.

Data Split      Observations
Train Set 1              200
Train Set 2              150
Train Set 3               64
Train Set 4               48
Test Set                  13
Original Set             211
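
A sketch of how this split might be carried out follows, assuming the full data set is held in the data frame ausbeer_df from above and that every training set ends at observation 200 (the index arithmetic is illustrative; the small overlap with the test window offsets the observations that classical decomposition drops at the end of the series).

train_end   <- 200
train_sizes <- c(200, 150, 64, 48)
train_sets  <- lapply(train_sizes, function(n)
  ausbeer_df[(train_end - n + 1):train_end, ])   # four nested training windows
test_set    <- tail(ausbeer_df, 13)              # most recent 13 quarters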

Time Series Object

A time series object was constructed for each training data set using the ts() function. Training sets 1 through 4 correspond, respectively, to n = 200, 150, 64, and 48 observations. The frequency was set to 4, since the observations are quarterly. For brevity, the printed ts object for n = 64 follows, along with a plot of the ts object for n = 200. Visual inspection of the plot suggests that the time series is additive: the seasonal fluctuations do not appear to amplify over time.

     Qtr1 Qtr2 Qtr3 Qtr4
1994  449  381  423  531
1995  426  408  416  520
1996  409  398  398  507
1997  432  398  406  526
1998  428  397  403  517
1999  435  383  424  521
2000  421  402  414  500
2001  451  380  416  492
2002  428  408  406  506
2003  435  380  421  490
2004  435  390  412  454
2005  416  403  408  482
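
A sketch of the ts construction and plot follows; frequency = 4 marks the data as quarterly, and the start argument must reflect where each training window begins (for the full n = 200 window this is assumed to be 1956 Q1).

ts_200 <- ts(train_sets[[1]]$nd0, frequency = 4, start = c(1956, 1))
plot(ts_200, ylab = "Beer production (megalitres)",
     main = "Quarterly beer production in Australia, n = 200")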

Classic Decomposition

First, classical decomposition was conducted. Each training set was decomposed into its trend, seasonal, and error components and then plotted. The plots depict these decompositions for n = 200, 150, 64, and 48, respectively.
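
A minimal sketch of this step for the n = 200 series follows; the additive model is chosen in line with the visual inspection above.

dec_200 <- decompose(ts_200, type = "additive")
plot(dec_200)   # panels for the observed series, trend, seasonal, and random (error) components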

Classic Decomposition Forecasting

Next, forecasting was conducted. Classical decomposition loses the first and last two observations of the trend when it decomposes the time series, so each forecast here covers 13 periods. The naive method was used for forecasting. Output of the forecast for n = 200 follows.

Forecasting with decomposing - naive
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2005 Q3 416.5306 385.5944 447.4668 369.2178 463.8434
2005 Q4 515.9133 472.1629 559.6637 449.0028 582.8237
2006 Q1 448.3342 394.7511 501.9172 366.3860 530.2824
2006 Q2 403.0000 341.1276 464.8724 308.3743 497.6257
2006 Q3 416.5306 347.3552 485.7061 310.7359 522.3253
2006 Q4 515.9133 440.1354 591.6912 400.0210 631.8056
2007 Q1 448.3342 366.4847 530.1837 323.1562 573.5122
2007 Q2 403.0000 315.4992 490.5008 269.1791 536.8209
2007 Q3 416.5306 323.7220 509.3392 274.5921 558.4691
2007 Q4 515.9133 418.0844 613.7421 366.2970 665.5296
2008 Q1 448.3342 345.7304 550.9379 291.4153 605.2531
2008 Q2 403.0000 295.8339 510.1661 239.1035 566.8965
2008 Q3 416.5306 304.9886 528.0727 245.9418 587.1194
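
One way such a forecast could be produced is sketched below (not necessarily the author's exact code): the series is seasonally adjusted, the adjusted series is forecast with the naive method, and the estimated seasonal pattern is added back to the point forecasts. The alignment of the repeated seasonal values with the forecast quarters is glossed over here.

library(forecast)
adj_200  <- seasadj(dec_200)                          # seasonally adjusted series
fc_adj   <- naive(adj_200, h = 13)                    # naive forecast, 13 quarters
seas_pat <- as.numeric(tail(dec_200$seasonal, 13))    # repeat the last seasonal cycle
fc_mean  <- as.numeric(fc_adj$mean) + seas_pat        # re-seasonalised point forecasts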

Classic Decomposition Accuracy Metrics

Next, accuracy metrics for each of the training data sets were computed and output in a table. The MAPE and MSE are depicted along the top, and the observation size for each training set is listed down the left side. The MAPE is expressed as a percentage.

Accuracy Metrics for Classic
  n     MAPE (%)        MSE
200     4.653308   507.0078
150     4.907893   619.3271
 64     4.376375   525.5023
 48     4.155283   410.6874
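
These metrics can be computed directly from the point forecasts and the held-out test values; a sketch for one training size follows, using the fc_mean and test_set objects defined earlier (names are illustrative).

test_vals <- test_set$nd0
mape <- mean(abs((test_vals - fc_mean) / test_vals)) * 100   # MAPE, in percent
mse  <- mean((test_vals - fc_mean)^2)                        # MSE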

STL Decomposition

Next, STL decomposition was conducted. Each training set was decomposed into its trend, seasonal, and error components and then plotted. The plots depict these decompositions for n = 200, 150, 64, and 48, respectively.
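
A minimal sketch of the STL step for the n = 200 series follows; s.window = "periodic" is one common choice and is an assumption here, not necessarily the author's setting.

stl_200 <- stl(ts_200, s.window = "periodic")
plot(stl_200)   # panels for the data, seasonal, trend, and remainder components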

STL Forecasting

Next, forecasting was conducted. Each forecast covers 11 periods, since STL does not drop observations from the training series the way classical decomposition does. The naive method was used for forecasting. Output of the forecast for n = 200 follows.

Forecasting with decomposing - naive, n=200
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2006 Q1 414.8311 384.0250 445.6373 367.7172 461.9451
2006 Q2 369.1828 325.6163 412.7492 302.5536 435.8119
2006 Q3 383.1352 329.7774 436.4930 301.5314 464.7389
2006 Q4 482.0000 420.3877 543.6123 387.7721 576.2279
2007 Q1 414.8311 345.9465 483.7158 309.4812 520.1811
2007 Q2 369.1828 293.7234 444.6421 253.7776 484.5879
2007 Q3 383.1352 301.6298 464.6406 258.4834 507.7870
2007 Q4 482.0000 394.8670 569.1330 348.7416 615.2584
2008 Q1 414.8311 322.4127 507.2496 273.4893 556.1730
2008 Q2 369.1828 271.7652 466.6004 220.1954 518.1701
2008 Q3 383.1352 280.9627 485.3076 226.8759 539.3945
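
A sketch of how this forecast could be produced follows; the forecast package's stlf() function decomposes the series with STL and forecasts the seasonally adjusted series, here with the naive method.

library(forecast)
fc_stl <- stlf(ts_200, method = "naive", h = 11)
fc_stl   # point forecasts with 80% and 95% intervals, as in the output above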

STL Accuracy Metrics

Next, accuracy metrics for each of the training data sets were computed and output in a table. The MAPE and MSE are depicted along the top, and the observation size for each training set is listed down the left side. The MAPE is expressed as a percentage.

Accuracy Metrics for STL
  n     MAPE (%)        MSE
200     3.737894   281.6836
150     5.151534   515.3770
 64     5.199174   535.5810
 48     3.549672   266.7371

Discussion

Two decomposition methods were explored in this analysis: STL and Classical. Each method broke the given time series into trend, seasonal, and error components, but each approached that task differently. Ultimately, that difference showed up not only in the values that were produced, but also in the mechanics of this analysis.

The data set used was the count of beer production in Australia. Each observation represented a quarterly measure taken from 1956 to 2008. There were 211 observations and 2 variables: a date and an amount. The highest and lowest production in this period was 599.0 ML and 213.0 ML, respectively. Before decomposition was carried out on the time series, training and testing data sets were derived from the initial data set. Each training data set ended at the same time period but began at a different one. The sizes of the training data sets were n = 200, 150, 64, and 48; the size of the testing data set was n = 13. This was done to offset the effects of the classical decomposition method. Once these data sets were constructed, forecasting was carried out on them using the naive method. Each decomposition method had a different forecasting horizon: STL used 11 periods and Classical used 13.

Afterwards, accuracy metrics were derived from the forecasted and test values. The MAPE and MSE were reported, with the MAPE expressed as a percentage. For the STL method the highest MAPE and MSE occurred at n = 64 and the lowest at n = 48; as the number of observations moved away from n = 64 in either direction, the MAPE and MSE decreased. For the Classical method the highest MAPE and MSE occurred at n = 150 and the lowest at n = 48; as the number of observations moved away from n = 150 in either direction, the MAPE and MSE decreased.

The differences in the accuracy metrics between the two methods may stem from several factors. A main factor may be that classical decomposition uses moving averages to disassemble the time series, whereas STL does not. The differences within each method may stem from several factors as well. A main factor may be that each observation size inherently alters the values on which the forecast is based. Since those values may originate from dissimilar parts of the trend line, the seasonality they produce may not culminate in the expected results. Specifically, forecasting on values with dissimilar trends may not yield much information about an optimal training size, at least when forecasting with the naive method.

Conclusion

An optimal training size for forecasting with STL and Classical decomposition was not found, nor was an optimal decomposition method. However, based on the MAPE and MSE reported, one might choose n = 48 for both STL and Classical decomposition with naive forecasting, and choose STL over Classical decomposition, since that combination minimized both metrics. Caution is advised, however: these results may be owed to peculiarities inherent in each method, the choice of forecasting method, and the choice of training size and training window location.