1 Data Set Description

These data come from the United States Census Bureau and contain monthly sales of “hobby” products in the United States from 1992 to 2023. Examples of products considered “hobby” include books, sporting goods, and musical instruments. The data are not seasonally adjusted and are reported in millions of dollars. To keep the number of observations under 200 for the decomposition analysis, we limit the data to the 189 observations from 2008 to 2023.

Objective

The goal of this analysis is to demonstrate the decomposition of a time series. We also seek to demonstrate how a decomposition-based forecast model can be improved by using the LOESS algorithm through the stl function, and further improved by a proper choice of training data set size.

2 Defining Time Series Object

Since this is monthly data, frequency = 12 will be used to define the time series object.
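To make the role of the frequency concrete, the sketch below labels consecutive monthly observations with a (year, month) pair, which is roughly what a frequency-12 time series object keeps track of. It is written in Python purely for illustration and is not the code used in this analysis. The helper `monthly_index` is hypothetical, and the January 2008 start is an assumption, since the text above gives only the years.

```python
# Illustrative sketch only: mimic the calendar bookkeeping of a
# frequency-12 time series object using plain Python.
# ASSUMPTION: the 189 observations start in January 2008 (the report
# states only the year range, not the first month).

def monthly_index(start_year, start_month, n_obs, period=12):
    """Return (year, month) labels for n_obs consecutive monthly observations."""
    labels = []
    for i in range(n_obs):
        offset = (start_month - 1) + i          # months elapsed since January of start_year
        labels.append((start_year + offset // period, offset % period + 1))
    return labels

index = monthly_index(2008, 1, 189)
print(index[0], index[-1])
```

Under that assumed start, 189 monthly observations would run through September 2023, which is consistent with the stated 2008 to 2023 range.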

3 Forecasting with Decomposition

The classical and STL decompositions of the time series show no major functional differences in their graphs. The STL method is preferable to the classical method here because its trend graph is smoother, indicating that more of the random variability has been removed from the trend. This visualization makes a notable trend in the hobby industry clear: during the COVID-19 pandemic, hobby sales rose at a dramatic rate and stayed elevated afterwards. One possible explanation is that consumers increased their consumption of “hobby” products during the isolation of lockdown and kept the habit after restrictions were lifted.
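For readers who want to see what the classical method does under the hood, the following is a minimal sketch of a classical additive decomposition: a centered moving-average trend, per-month seasonal averages, and a remainder. It is plain Python for illustration, not the analysis code, and it does not implement STL, which replaces these fixed averages with LOESS smoothing. The function names and the synthetic series are hypothetical, and the moving average assumes an even period such as 12.

```python
import math

def centered_moving_average(y, period=12):
    """2 x period centered moving average; assumes an even period.
    Returns None where the window does not fit (first and last period/2 points)."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]      # period + 1 points
        # The two end points get half weight so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend

def classical_additive(y, period=12):
    """Split y into trend + seasonal + remainder (additive model)."""
    trend = centered_moving_average(y, period)
    # Average the detrended values for each position in the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    cycle = [s - mean_raw for s in raw]          # center so the cycle sums to zero
    seasonal = [cycle[t % period] for t in range(len(y))]
    remainder = [None if tr is None else obs - tr - s
                 for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder

# Synthetic monthly series (made-up numbers, NOT the Census data):
# a linear trend plus a 12-month sine pattern and no noise.
y = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
trend, seasonal, remainder = classical_additive(y)
```

Because the trend here is a fixed-width average, any noise inside the window passes straight into it, which is one reason a LOESS-based trend can look smoother on real data.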

4 Comparing Testing Results for Different Training Sizes

One question we wish to answer is how many observations to use in the training data set when building a time series model. Too few observations might not give enough information to capture the trend, while older observations might not follow modern trends and so add noise and inaccuracy. The following analysis calculates the mean squared error (MSE) and mean absolute percentage error (MAPE) for four STL-decomposed additive time series models, with the last 7 observations held out as the testing data set. The four training data sets contain 182, 138, 93, and 48 observations. The results show that, of these four options, a training size of 93 gives the lowest MSE and MAPE.
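The hold-out scheme described above can be sketched as follows. This is an illustrative Python outline, not the analysis code. The helper `split_windows` is hypothetical. It assumes that each training set consists of the most recent n observations immediately before the test set, which matches the idea of dropping older observations but is not stated explicitly in the text.

```python
def split_windows(series, test_size, train_sizes):
    """Hold out the last test_size points and build one training window per
    requested size, each ending right before the test set.
    ASSUMPTION: smaller training sets drop the OLDEST observations."""
    train_end = len(series) - test_size
    test = series[train_end:]
    windows = {}
    for n in train_sizes:
        if n > train_end:
            raise ValueError(f"training size {n} exceeds the {train_end} available points")
        windows[n] = series[train_end - n : train_end]
    return windows, test

# Placeholder values standing in for the 189 monthly observations.
series = list(range(189))
windows, test = split_windows(series, test_size=7, train_sizes=[182, 138, 93, 48])

for n, window in windows.items():
    print(n, "training points, positions", window[0], "to", window[-1])
```

Every window ends at the same position, so all four models are scored against the same 7 test observations and the comparison isolates the effect of training size.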

Error comparison between forecast results with different training sample sizes

n      MSE         MAPE
182    71652.23    0.0299069
138    62820.15    0.0272024
93     47187.66    0.0207978
48     59605.00    0.0246466
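
For reference, the two error measures reported above can be computed as in the sketch below. It is illustrative Python rather than the analysis code, and the example numbers are invented. MAPE is returned as a proportion (0.03 means 3%), which we assume is the scale used in the table, and it is only defined when no actual value is zero.

```python
def mse(actual, forecast):
    """Mean squared error: the average of the squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error as a proportion (0.03 means 3%).
    Assumes no actual value is zero."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Invented example values, NOT taken from the hobby sales data.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print("MSE :", mse(actual, forecast))
print("MAPE:", mape(actual, forecast))
```

MSE is expressed in squared data units (here, squared millions of dollars) and penalizes large misses heavily, while MAPE is scale-free, so reporting both gives a fuller picture of forecast accuracy.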