CUNY624HW3JoelPark

CUNY 624 Homework 3 Chapter 6

By Joel Park

library(fpp2)

## Loading required package: ggplot2

## Loading required package: forecast

## Loading required package: fma

## Loading required package: expsmooth

library(seasonal)

6.2 The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

According to the R documentation, plastics dataset is a dataset of monthly sales of product A for a plastics manufacturer.

# Time series plot for plastics
plastics

##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1030 1107 1165 1216 1208 1131  971  783
## 2  741  700  774  932 1099 1223 1290 1349 1341 1296 1066  901
## 3  896  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013

A. Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(plastics)

ggseasonplot(plastics)

ggsubseriesplot(plastics)

gglagplot(plastics)

Acf(plastics)

# Frequency
paste0("Frequency for 'plastics' dataset: ", frequency(plastics))

## [1] "Frequency for 'plastics' dataset: 12"

From our previous homework assignments, simply by visualizing the time series, we could determine that this time series of frequency 12 certainly has a seasonal component with an upward trend-cycle. The seasonal peaks appear to be generally late summer to early fall.

However, in chapter 6, we can start to decompose the time series into its three components. There are many ways to decompose the time series, which I will go over briefly here in this section. As of note, the information was obtained from the Forecasting Textbook in Chapter 6. Please note that \(S_t\) is the seasonal component, \(T_t\) is the trend-cycle component, and \(R_t\) is the remainder component.

\[y_t = S_t + T_t + R_t\]

The additive component typically works when the seasonal variance remains constant over the time series. Otherwise, if the seasonal variance fluctuates throughout the TS, a multiplicative decomposition would likely be a better fit.

\[y_t = S_t * T_t * R_t\]

Technically, via by mathematical transformation, a multiplicative decomposition and additive decomposition can be related by this equation.

\(y_t = S_t + T_t + R_t\) is equivalent to \(\log y_t = \log S_t + \log T_t + \log R_t\)

In order to find these separate components, there are several steps that have to be taken. The details of these steps were taken from Penn State’s 510 Applied Time Series Analysis Course.

“1. The first step is to estimate the trend. Two different approaches could be used for this (with many variations of each).

One approach is to estimate the trend with a smoothing procedure such as moving averages. With this approach no equation is used to describe trend. The second approach is to model the trend with a regression equation.

The second step is to “de-trend” the series. For an additive decomposition, this is done by subtracting the trend estimates from the series. For a multiplicative decomposition, this is done by dividing the series by the trend values.
Next, seasonal factors are estimated using the de-trended series. For monthly data, this entails estimating an effect for each month of the year. For quarterly data, this entails estimating an effect for each quarter. The simplest method for estimating these effects is to average the de-trended values for a specific season. For instance, to get a seasonal effect for January, we average the de-trended values for all Januarys in the series, and so on.

The seasonal effects are usually adjusted so that they average to 0 for an additive decomposition or they average to 1 for a multiplicative decomposition.

The final step is to determine the random (irregular) component.

For the additive model, \(random = series – trend – seasonal\). For the multiplicative model, \(random = \frac{series}{trend*seasonal}\)"

According to Hyndman, while classical decomposition methods have been traditionally used, they are not very much used currently for these reasons:

“1. The estimate of the trend-cycle is unavailable for the first few and last few observations. Consequently, there is also no estimate of the remainder component for the same time periods.

The trend-cycle estimate tends to over-smooth rapid rises and falls in the data.
Classical decomposition methods assume that the seasonal component repeats from year to year. For many series, this is a reasonable assumption, but for some longer series it is not. The classical decomposition methods are unable to capture these seasonal changes over time.
Occasionally, the values of the time series in a small number of periods may be particularly unusual. For example, the monthly air passenger traffic may be affected by an industrial dispute, making the traffic during the dispute different from usual. The classical method is not robust to these kinds of unusual values."

As a result, other methods were invented over time to help overcome these problems. The first one we will talk about is the “X11 Decomposition”. According to Hyndman, X11 Decomposition:

“This method is based on classical decomposition, but includes many extra steps and features in order to overcome the drawbacks of classical decomposition that were discussed in the previous section. In particular, trend-cycle estimates are available for all observations including the end points, and the seasonal component is allowed to vary slowly over time. X11 also has some sophisticated methods for handling trading day variation, holiday effects and the effects of known predictors. It handles both additive and multiplicative decomposition. The process is entirely automatic and tends to be highly robust to outliers and level shifts in the time series.

SEATS is another decomposition that stands for “Seasonal Extraction in ARIMA Time Series” The procedure works only with quarterly and monthly data. So seasonality of other kinds, such as daily data, or hourly data, or weekly data, require an alternative approach.

Lastly, ‘STL Decomposition’ is a versatile and robust method for decomposing time series. STL is an acronym for “Seasonal and Trend decomposition using Loess”, while Loess is a method for estimating nonlinear relationships.

STL has several advantages over the classical, SEATS and X11 decomposition methods. Unlike SEATS and X11, STL will handle any type of seasonality, not only monthly and quarterly data. The seasonal component is allowed to change over time, and the rate of change can be controlled by the user. The smoothness of the trend-cycle can also be controlled by the user.

It can be robust to outliers (i.e., the user can specify a robust decomposition), so that occasional unusual observations will not affect the estimates of the trend-cycle and seasonal components. They will, however, affect the remainder component. On the other hand, STL has some disadvantages. In particular, it does not handle trading day or calendar variation automatically, and it only provides facilities for additive decompositions."

For this particular question, I will use the STL decomposition to get a better understanding of the seasonality and trend-cycle for the above dataset plastics.

plastics %>%
  stl(t.window=13, s.window="periodic", robust=TRUE) %>%
  autoplot()

Again, the decomposition demonstrates a seasonality component with an upwards trend (though it does appear to downtrend slightly towards the end of the time series). In addition, the remainder component towards the end of the graph appears to be taller than the earlier part of the series. This could warrant a further investigation to further look and see if there are patterns that we are not taking into account for.

B. Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

As we note from the details above, let us perform the classical multiplicative decomposition.

Reference: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/decompose.html

multdecompose <- decompose(plastics, type=c("multiplicative"), filter=NULL)
multdecompose

## $x
##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1030 1107 1165 1216 1208 1131  971  783
## 2  741  700  774  932 1099 1223 1290 1349 1341 1296 1066  901
## 3  896  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
## 
## $seasonal
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 2 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 3 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 4 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 5 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
##         Aug       Sep       Oct       Nov       Dec
## 1 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 2 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 3 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 4 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 5 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 
## $trend
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1        NA        NA        NA        NA        NA        NA  976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500        NA
##         Aug       Sep       Oct       Nov       Dec
## 1  977.0417  977.0833  978.4167  982.7083  990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5        NA        NA        NA        NA        NA
## 
## $random
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1        NA        NA        NA        NA        NA        NA 1.0247887
## 2 0.9656005 0.9745267 0.9750081 0.9894824 1.0061175 1.0024895 1.0401641
## 3 1.0454117 0.9953920 1.0079773 1.0142083 0.9990100 0.9854384 0.9567618
## 4 1.0257400 0.9924762 0.9807020 0.9798704 0.9684851 0.9627557 0.9917766
## 5 0.9767392 1.0510964 1.0498039 1.0299302 1.0398787 1.0628077        NA
##         Aug       Sep       Oct       Nov       Dec
## 1 1.0157335 1.0040354 0.9724119 0.9961368 0.9489762
## 2 1.0230774 1.0040674 0.9962088 0.9735577 0.9721203
## 3 0.9969907 1.0132932 1.0314752 0.9910657 1.0258002
## 4 0.9776897 0.9920952 1.0133954 1.0527311 1.0665946
## 5        NA        NA        NA        NA        NA
## 
## $figure
##  [1] 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
##  [8] 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 
## $type
## [1] "multiplicative"
## 
## attr(,"class")
## [1] "decomposed.ts"

autoplot(multdecompose)

The above demonstrates the multiplicative decomposition. Overall, it demonstrates similar information as the STL decomposition as performed in the previous question (there is an upward trend-cycle with seasonality); however, there are some slight variations in the details of these plots between the two, particularly with the trend and remainder components.

C. Do the results support the graphical interpretation from part A?

Given from what we have discussed above, I agree that the results support the graphical interpretations from part A.

D. Compute and plot the seasonally adjusted data.

Reference: https://anomaly.io/seasonally-adjustement-in-r/

" Why Seasonally Adjust Data ? Many industries experience fluctuations in various metrics based on the time of year. This means that it is not possible to effectively assess performance by comparing data from one time of year to data from another. Furthermore, these seasonal fluctuations can sometimes be so large that they can mask important business trends hiding in the data."

In the Hyndman Forecasting Textbook, in chapter 6.1, “if the seasonal component is removed from the original data, the resulting values are the “seasonally adjusted” data. For an additive decomposition, the seasonally adjusted data are given by \(y_t - S_t\), and for multiplicative data, the seasonally adjusted values are obtained using \(\frac{y_t}{S_t}\). Seasonally adjusted series contain the remainder component as well as the trend-cycle. Therefore, they are not “smooth”, and “downturns” or “upturns” can be misleading. If the purpose is to look for turning points in a series, and interpret any changes in direction, then it is better to use the trend-cycle component rather than the seasonally adjusted data.”

Let us compute and plot the seasonally adjusted data. (Assuming additive time series. Interestingly, if you try to use the multiplicative decomposition and try to plot the seasonally adjusted data, it does a poor job adjusting for the seasons. However, when you assume additive time series, which is a reasonable assumption given the overall near constant nature of the seasonal variance, the trend-cycle becomes more obvious when you plot the time series.)

adddecompose <- decompose(plastics, type=c("additive"), filter=NULL)
adjust_plastics <- plastics - adddecompose$seasonal
autoplot(adjust_plastics) + ggtitle("Seasonally Adjusted Time Series - Plastics")

E. Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

Let R randomly choose a variable in the plastics dataset and assign it to value 500.

# Create a copy of the plastics data
plastics_copy <- plastics

set.seed(123)
data_vector <- c(1:length(plastics))
random_var <- sample(data_vector, size=1)
plastics_copy[random_var] <- 500

Now that we have reassigned a random variable to value 500 (with seed 123, observation point 18 has been randomly chosen). Let us recompute the seasonally adjusted data.

adddecompose <- decompose(plastics_copy, type=c("additive"), filter=NULL)
adjust_plastics <- plastics - adddecompose$seasonal
autoplot(adjust_plastics) + ggtitle("Seasonally Adjusted Time Series with Outlier at Point 18- Plastics")

From visualization, the overall trend appears to be preserved (increasing with a decrease towards the end of the time series), but we can see that the remainder component appears to have changed.

F. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Let us assign plastics (at point 60) to 500.

plastics_copy <- plastics
plastics_copy[60] <- 500

adddecompose <- decompose(plastics_copy, type=c("additive"), filter=NULL)
adjust_plastics <- plastics - adddecompose$seasonal
autoplot(adjust_plastics) + ggtitle("Seasonally Adjusted Time Series with Outlier at Point 60- Plastics")

If the outlier is at the end or beginning of the time series, it does not appear to make a (noticeable) impact on the seasonally adjusted time series. This may be because additive decomposition does not compute for the trend-cycle at the begining and at the end, and therefore, cannot calculate the remainder. (This is the limitation of the additive decomposition). As a result, no changes were going to be made if an outlier was placed at the very beginning or end.

6.3 Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Load in the retail time series.

retaildata <- readxl::read_excel("retail.xlsx", skip=1)

# Category: Turnover ;  Western Australia ;  Furniture, floor coverings, houseware and textile goods retailing - this was from my previous two homework assignments.
myts <- ts(retaildata[,"A3349661X"],
           frequency=12, start=c(1982,4))

# Print out myts
myts

##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov
## 1982                    19.2  21.9  19.9  19.3  19.6  19.9  18.0  19.0
## 1983  16.6  16.7  18.0  16.4  19.6  18.4  18.4  19.4  19.6  17.8  18.8
## 1984  19.7  20.1  20.9  16.2  21.0  18.8  17.5  19.4  18.3  22.8  22.4
## 1985  20.6  18.5  21.4  21.1  27.2  22.2  25.0  26.5  24.1  29.3  28.2
## 1986  28.0  25.8  28.4  27.7  29.7  26.5  29.0  27.9  28.7  31.1  29.6
## 1987  25.6  24.2  24.8  25.2  30.5  30.0  34.4  33.3  29.4  30.6  30.7
## 1988  26.3  24.8  26.3  27.0  32.0  31.8  33.1  33.5  33.3  35.8  37.7
## 1989  33.0  31.5  28.1  28.2  30.4  28.1  28.2  28.2  28.8  28.7  29.8
## 1990  28.9  27.4  28.9  26.5  29.6  27.9  29.4  26.6  27.4  30.3  30.9
## 1991  26.2  25.7  25.5  27.6  29.4  30.0  33.0  31.3  31.7  38.2  39.2
## 1992  31.3  30.8  32.6  32.5  33.3  36.3  37.9  36.5  34.2  41.2  38.7
## 1993  37.4  32.6  38.3  36.6  38.0  39.1  41.6  38.0  39.2  39.8  42.3
## 1994  38.4  38.5  40.0  40.0  43.2  42.2  40.7  39.3  44.0  48.7  52.0
## 1995  43.8  40.9  45.0  40.7  45.8  46.8  46.7  50.7  51.9  58.1  56.8
## 1996  44.0  42.3  41.0  43.0  43.6  40.5  48.7  54.7  46.3  57.9  54.7
## 1997  54.1  48.5  47.6  56.3  56.1  56.0  51.8  48.6  49.4  57.9  60.2
## 1998  65.3  53.5  57.6  59.0  66.3  68.6  58.5  55.1  55.8  67.9  63.2
## 1999  63.2  58.1  66.7  57.4  64.3  62.9  70.1  62.6  64.0  75.0  80.7
## 2000  68.1  68.2  70.7  54.7  57.7  77.6  57.1  57.2  58.2  60.7  68.6
## 2001  59.6  49.7  58.3  55.9  57.3  65.4  63.8  59.3  55.6  67.1  64.2
## 2002  65.0  55.2  58.4  55.7  62.1  65.1  57.5  59.3  57.5  71.8  73.6
## 2003  64.7  55.0  57.4  60.3  71.1  67.9  78.1  75.1  73.0  82.4  81.0
## 2004  76.3  63.1  70.2  73.4  73.8  90.9  94.1  83.7  88.3 101.8  96.7
## 2005  91.0  86.8  90.6  94.0  99.0  94.7 106.4 103.9  94.9 106.8 107.8
## 2006  96.5  92.3  98.3  97.3 106.6 103.4 116.5 113.6 118.3 117.3 120.4
## 2007 104.3  88.7  99.5  93.1 103.0 109.3  94.0  96.7  99.9  93.0  95.8
## 2008  87.7  79.0  82.3  81.4  84.1  81.9  84.4  80.8  75.5  82.2  81.6
## 2009  88.2  76.7  79.6  94.9 102.3 109.9 117.7 113.5 117.0 125.6 127.3
## 2010 115.4 104.1 102.8  98.7 105.5 119.2 115.9 109.6 112.2 134.7 134.9
## 2011 118.8 113.9 120.8 126.2 134.9 154.0 152.6 149.6 154.8 162.2 179.7
## 2012 138.3 141.8 146.3 129.3 151.3 159.3 146.1 149.9 165.6 151.4 155.9
## 2013 152.1 131.0 142.1 126.9 148.6 154.1 155.3 162.5 159.3 169.4 169.5
##        Dec
## 1982  23.0
## 1983  22.4
## 1984  23.5
## 1985  29.6
## 1986  34.1
## 1987  34.4
## 1988  38.5
## 1989  33.1
## 1990  30.6
## 1991  35.6
## 1992  42.6
## 1993  45.6
## 1994  54.3
## 1995  64.8
## 1996  53.0
## 1997  60.0
## 1998  71.5
## 1999  80.3
## 2000  71.3
## 2001  64.8
## 2002  66.2
## 2003  79.9
## 2004 102.0
## 2005 117.8
## 2006 126.3
## 2007 102.1
## 2008 100.9
## 2009 141.7
## 2010 149.8
## 2011 179.7
## 2012 155.5
## 2013 175.4

This is a monthly dataset looking at the retail sales for rurniture, floor coverings, houseware and textile goods retailing in Australia. (Monthly data can be used in X11 Decomposition.)

autoplot(myts) + ggtitle("Turnover ;  Western Australia ;  Furniture, floor coverings, houseware and textile goods retailing")

This is the overall visualization of the raw data from the time series.

Let us perform an “X11 Decomposition”.

# Using the seasonal library
myts %>% seas(x11="") -> fit
autoplot(fit) +
  ggtitle("X11 decomposition of Furniture, floor coverings, houseware and textile goods retailing")

While I was not surprised about the upward trend, what I did find more obvious is the changing seasonal variance throughout the time series. It’s more pronounced here in the X11 decomposition that in prior homework assignments. Furthermore, around the year 2000 - 2001, there appears to be a significant remainder that is quite pronounced. This coincides with the smaller seasonal variance. It is unclear what the significance of the remainder is, and perhaps there was an “event” that occurred during 2000-2001 that led to this finding.