CUNY DATA624 Homework 3

Question 6.2 The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.
Question 6.3 Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(fpp2)
library(seasonal)

Question 6.2 The `plastics` data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

a. Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(plastics) + 
  ggtitle("Sales of plastic product") +
  ylab("Sales") +
  xlab("Year")

From the autoplot, we can immediately tell there’s both an upward trend and clear seasonality, with sales peaking from mid to late summer. No cyclic behavior appears to be present.

require(gridExtra)

## Loading required package: gridExtra

grid.arrange(ggseasonplot(plastics),ggsubseriesplot(plastics),nrow=2)

We can confirm clear seasonality from the above plots, but also noticeable is the dropoff in year 5 during the fall months. This could be the start of a cycle, but we would need more data over a longer time frame to know.

grid.arrange(gglagplot(plastics),ggAcf(plastics),ncol=2)

Lags 1 and 12 appear to confirm clear seasonality that follows an obvious annual pattern, with sales low in the colder months and high in the summer months.

b. Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

plastics %>% decompose(type="multiplicative") %>%
  autoplot() + xlab("Year") +
  ggtitle("Classical multiplicative decomposition
    of sales of plastic product")
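The plot above shows the components but not how they are computed, so here is a minimal sketch of the classical multiplicative steps in base R. To keep it runnable without `fpp2`, it uses a made-up stand-in series `x` (hypothetical values, not the `plastics` data); the names `x`, `w`, `raw_idx` and `seas_idx` are mine. The final comparison against `decompose()` is only a sanity check.

```r
# Stand-in data (hypothetical, NOT the plastics values): five years of
# monthly sales with an upward trend and a multiplicative summer peak.
set.seed(1)
t_idx <- 1:60
x <- ts((1000 + 8 * t_idx) * (1 + 0.3 * sin(2 * pi * (t_idx - 4) / 12)) *
          exp(rnorm(60, sd = 0.02)),
        frequency = 12, start = c(1, 1))

# Step 1: trend-cycle from a centred 2x12 moving average.
w <- c(0.5, rep(1, 11), 0.5) / 12
trend <- stats::filter(x, w, sides = 2)   # NA for the first and last 6 months

# Step 2: detrend by division (multiplicative model).
detrended <- as.numeric(x) / as.numeric(trend)

# Step 3: average the detrended values month by month (ignoring the NA
# ends), then rescale so the twelve seasonal indices have mean 1.
month <- as.integer(cycle(x))
raw_idx <- tapply(detrended, month, mean, na.rm = TRUE)
seas_idx <- raw_idx / mean(raw_idx)

# Sanity check against decompose().
fit <- decompose(x, type = "multiplicative")
all.equal(as.numeric(seas_idx), as.numeric(fit$figure))
round(seas_idx, 3)
```

With `fpp2` loaded, replacing the stand-in with `x <- plastics` should give the seasonal indices for the homework data.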

c. Do the results support the graphical interpretation from part a?

The results of the classical multiplicative decomposition support the graphical interpretation in part a. Additionally, in the “trend” panel, you can even begin to see the downward dip at the end that was apparent in the ggsubseriesplot.

d. Compute and plot the seasonally adjusted data.

pl_sa <- plastics %>%
  decompose(type = "multiplicative") %>%
  seasadj()
autoplot(plastics, series="Data") +
  autolayer(pl_sa, series="Seasonally Adjusted") +
  xlab("Year") + ylab("Sales") +
  ggtitle("Sales of plastic product") +
  scale_colour_manual(values=c("gray","blue"),
                      breaks=c("Data","Seasonally Adjusted"))

In the seasonally adjusted series, the beginning of the downward trend at the end shows up even more prominently.
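For a multiplicative decomposition, the seasonally adjusted series is just the data divided by the seasonal component, which leaves trend times remainder. The sketch below checks that identity in base R on a hypothetical stand-in series `x` (not the `plastics` data). As far as I know `seasadj()` does the same division for a multiplicative `decompose()` fit, but treat that as an assumption.

```r
# Stand-in data (hypothetical, NOT the plastics values).
set.seed(1)
t_idx <- 1:60
x <- ts((1000 + 8 * t_idx) * (1 + 0.3 * sin(2 * pi * (t_idx - 4) / 12)) *
          exp(rnorm(60, sd = 0.02)),
        frequency = 12, start = c(1, 1))

fit <- decompose(x, type = "multiplicative")

# Seasonally adjusted data: divide out the seasonal component.
sa <- x / fit$seasonal

# Where a trend estimate exists, this equals trend * remainder, so any
# unusual observation stays in the adjusted series via the remainder.
ok <- !is.na(as.numeric(fit$trend))
all.equal(as.numeric(sa)[ok], as.numeric(fit$trend * fit$random)[ok])
```

This is why an outlier does not disappear after seasonal adjustment: only the repeating monthly pattern is removed.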

e. Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

pl_out <- plastics
pl_out[20] <- pl_out[20]+500
pl_out_sa <- pl_out %>%
  decompose(type = "multiplicative") %>%
  seasadj()

autoplot(plastics, series="Orig Data") +
  autolayer(pl_sa, series="Seasonally Adjusted") +
  autolayer(pl_out_sa, series="Seasonally Adjusted w/Outlier") +
  xlab("Year") + ylab("Sales") +
  ggtitle("Sales of plastic product") +
  scale_colour_manual(values=c("gray","blue","red"),
                      breaks=c("Orig Data",
                               "Seasonally Adjusted",
                               "Seasonally Adjusted w/Outlier"))

For the above graph, I added the outlier at month 20 (August of year 2). The outlier pulled the seasonally adjusted values upward very slightly in most months, but also introduced a significant dip every August. The table below shows the seasonally adjusted series with the outlier minus the original seasonally adjusted series. Outside of August, the outlier created a positive difference of about 8 to 14. In August, the outlier month itself (year 2) is about 294 higher, while every other August shows a negative difference of about 75 to 99.

pl_out_sa - pl_sa

##          Jan        Feb        Mar        Apr        May        Jun        Jul
## 1   9.081280   9.272209   9.534035   9.441534   9.500020   9.110139   9.793764
## 2   9.069042   9.312118   9.509463   9.799008  10.136429  10.064770  10.844597
## 3  10.966074  10.549300  10.873223  11.092225  11.104878  10.912416  10.953883
## 4  11.639215  11.453906  11.524388  11.659979  11.750510  11.702456  12.492303
## 5  12.606090  13.728723  13.834180  13.510435  13.539834  13.471814  13.543136
##          Aug        Sep        Oct        Nov        Dec
## 1 -74.808906   9.100944   8.679757   8.650694   8.230045
## 2 294.313571  10.102952   9.946034   9.497054   9.470333
## 3 -88.343412  11.097426  11.150916  10.423596  10.752664
## 4 -95.664349  12.084367  12.279054  12.499406  12.707694
## 5 -98.924935  11.511791  10.897661   9.969234  10.647555
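The pattern in the table can be summarised in three numbers: the shift at the outlier itself, the average shift in the same calendar month of other years, and the average shift in all other months. Here is a rough sketch of that summary in base R, run on a hypothetical stand-in series `x` (not the `plastics` data); the helpers `sa_mult()` and `outlier_shift()` are my own names, not functions from any package.

```r
# Stand-in data (hypothetical, NOT the plastics values).
set.seed(1)
t_idx <- 1:60
x <- ts((1000 + 8 * t_idx) * (1 + 0.3 * sin(2 * pi * (t_idx - 4) / 12)) *
          exp(rnorm(60, sd = 0.02)),
        frequency = 12, start = c(1, 1))

# Seasonally adjusted series from a classical multiplicative decomposition.
sa_mult <- function(y) y / decompose(y, type = "multiplicative")$seasonal

# Change in the seasonally adjusted series after adding `bump` at `pos`.
outlier_shift <- function(y, pos, bump = 500) {
  y_out <- y
  y_out[pos] <- y_out[pos] + bump
  as.numeric(sa_mult(y_out) - sa_mult(y))
}

pos <- 20                                   # August of year 2
d <- outlier_shift(x, pos)
month <- as.integer(cycle(x))
same_month <- month == month[pos]
others <- seq_along(d) != pos

shift_summary <- c(
  at_outlier             = d[pos],
  same_month_other_years = mean(d[same_month & others]),
  other_months           = mean(d[!same_month])
)
round(shift_summary, 1)
```

The signs should match the table above: a large positive value at the outlier, negative values in the other Augusts (the August seasonal index got inflated), and small positive values everywhere else.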

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

My apologies to the color blind, but I can’t think of a better contrasting color!

pl_out_end <- plastics
pl_out_end[56] <- pl_out_end[56]+500
pl_out_end_sa <- pl_out_end %>%
  decompose(type = "multiplicative") %>%
  seasadj()

autoplot(plastics, series="Orig Data") +
  autolayer(pl_sa, series="Seasonally Adjusted") +
  autolayer(pl_out_sa, series="Seasonally Adjusted w/Outlier") +
  autolayer(pl_out_end_sa, series="Seasonally Adjusted w/Outlier@End") +
  xlab("Year") + ylab("Sales") +
  ggtitle("Sales of plastic product") +
  scale_colour_manual(values=c("gray","blue","red","green"),
                      breaks=c("Orig Data",
                               "Seasonally Adjusted",
                               "Seasonally Adjusted w/Outlier",
                               "Seasonally Adjusted w/Outlier@End"))

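For this case, I added the outlier to the last August of the series (observation 56). Adding it at the end does make a difference, but it’s hard to see in the plot (and getting crowded!), which suggests that the differences are small. The table below confirms this: apart from the outlier month itself, which is about 403 higher, every value moves by less than 8 in either direction, and the other Augusts shift by only about 3 instead of the 75 to 99 seen with the mid-series outlier. A likely reason is that the centred moving average used by classical decomposition has no trend estimate for the last six months, so an outlier there is left out of the seasonal index for its own month. In general, an outlier near the end of the series changes the seasonally adjusted values of the other observations far less than one in the middle, although it stays almost entirely in the adjusted value for its own month.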

pl_out_end_sa - pl_sa

##          Jan        Feb        Mar        Apr        May        Jun        Jul
## 1  -2.660025   1.121901   4.944113   4.792428   4.968545   5.099367  -2.753049
## 2  -2.656440   1.126730   4.931371   4.973878   5.301389   5.633718  -3.048440
## 3  -3.212106   1.276424   5.638583   5.630302   5.807891   6.108185  -3.079161
## 4  -3.409277   1.385878   5.976261   5.918488   6.145559   6.550407  -3.511614
## 5  -3.692487   1.661122   7.174061   6.857761   7.081382   7.540799  -3.807006
##          Aug        Sep        Oct        Nov        Dec
## 1  -2.728955  -2.697641  -2.616240  -2.691827  -2.584502
## 2  -3.027435  -2.994650  -2.997920  -2.955188  -2.973992
## 3  -3.222681  -3.289425  -3.361094  -3.243499  -3.376686
## 4  -3.489741  -3.581967  -3.701136  -3.889427  -3.990629
## 5 403.334155  -3.412248  -3.284758  -3.102116  -3.343678
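One way to see why the position matters is to check which observations actually feed the August seasonal index. The sketch below does this in base R on a hypothetical stand-in series `x` (not the `plastics` data), assuming the usual centred 2x12 moving average for the trend; `idx_change()` is a helper name I made up.

```r
# Stand-in data (hypothetical, NOT the plastics values).
set.seed(1)
t_idx <- 1:60
x <- ts((1000 + 8 * t_idx) * (1 + 0.3 * sin(2 * pi * (t_idx - 4) / 12)) *
          exp(rnorm(60, sd = 0.02)),
        frequency = 12, start = c(1, 1))

# The centred 2x12 moving average has no value for the first/last 6 months.
w <- c(0.5, rep(1, 11), 0.5) / 12
trend <- stats::filter(x, w, sides = 2)
used <- !is.na(as.numeric(trend))
month <- as.integer(cycle(x))

range(which(used))          # 7 54
which(used & month == 8L)   # 8 20 32 44: observation 56 is not included

# Relative change in the August seasonal index when 500 is added at `pos`.
idx_change <- function(pos) {
  x_out <- x
  x_out[pos] <- x_out[pos] + 500
  f_out <- decompose(x_out, type = "multiplicative")$figure
  f_org <- decompose(x, type = "multiplicative")$figure
  f_out[8] / f_org[8] - 1
}
c(mid = idx_change(20), end = idx_change(56))
```

The outlier at observation 20 enters the August average directly, while the one at 56 only leaks into the trend estimates for a few months before it, so the August index barely moves.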

Question 6.3 Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349398A"],
  frequency=12, start=c(1982,4))

myts %>% seas(x11="") -> myts_sa
autoplot(myts_sa) +
  ggtitle("X11 decomposition of Turnover (food retail)")

The X11 decomposition doesn’t appear to reveal anything new. In fact, it shows how regular and stable the trend and seasonality are, as the remainder component shows only very small variation. The seasonally adjusted series from the X11 decomposition also shows virtually nothing irregular in the data. Perhaps it’s my fault for selecting such boring data.

autoplot(myts, series="Data") +
  autolayer(seasadj(myts_sa), series="Seasonally Adjusted") +
  autolayer(trendcycle(myts_sa), series="Trend") +
  xlab("Year") + ylab("Sales") +
  ggtitle("Turnover (food retail)") +
  scale_colour_manual(values=c("gray","blue","red"),
                      breaks=c("Data","Seasonally Adjusted","Trend"))
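Rather than judging the remainder panel by eye, a simple rule of thumb is to flag remainder values that sit more than a few robust standard deviations from the median. The sketch below is my own heuristic, not part of X11; `flag_irregular()` and its cutoff `k` are hypothetical, and the input `irr` is a made-up remainder series so the block runs on its own.

```r
# Flag positions whose remainder is more than k robust SDs (MAD-based)
# away from the median. A heuristic, not the X11 outlier procedure.
flag_irregular <- function(irr, k = 3) {
  v <- as.numeric(irr)
  dev <- abs(v - median(v, na.rm = TRUE))
  which(dev > k * mad(v, na.rm = TRUE))
}

# Made-up multiplicative remainder hovering around 1, with one spike.
irr <- ts(rep(c(0.99, 1.01, 1.00), 8), frequency = 12, start = c(1982, 4))
irr[10] <- 1.15

flagged <- flag_irregular(irr)
flagged              # 10
time(irr)[flagged]   # the date of the flagged observation
```

With the objects above, something like `flag_irregular(remainder(myts_sa))` should work, assuming `remainder()` from `forecast` accepts the `seas` fit the same way `seasadj()` and `trendcycle()` do.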
