Question HA 6.2
Question
The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.
- Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?
- Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.
- Do the results support the graphical interpretation from part a?
- Compute and plot the seasonally adjusted data.
- Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
- Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Answers
Part a
The graph shows clear seasonality. Sales are low in the winter months, climb through the spring and summer, and then decline in the fall. There is also a long-term increasing trend in the sales figures. With only five years of data, we cannot tell whether this increase is part of a larger cycle; more data would be needed to detect any cyclic behavior.
For the remaining parts we will calculate values using the same methodology as the text.
Part b
This data has monthly frequency, so \(m = 12\) and \(\hat{T}_n\) should be calculated as a 2x12-period moving average.
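For reference, the 2x12 moving average can be written out explicitly. This is a sketch of the standard centred form; the symbol \(y_n\) for the raw observation is an assumption, since the text does not name it.

\[
\hat{T}_n = \frac{1}{24}\,y_{n-6} + \frac{1}{12}\sum_{j=-5}^{5} y_{n+j} + \frac{1}{24}\,y_{n+6}
\]

Because the window reaches six months in each direction, \(\hat{T}_n\) is undefined for the first six and last six months of the series.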
Classic multiplicative decomposition detrends by dividing out the MA.
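A minimal sketch of the trend and detrending step, assuming the fpp2 package is attached (it provides `ma()` via forecast and the `plastics` data via fma). The name `detrend_plastic` matches the seasonal loop in this section, and the `ma()` call mirrors the one used in Part e.

```r
library(fpp2)  # assumed setup: attaches forecast (ma) and fma (plastics)

# Trend-cycle estimate: 2x12 centred moving average
Tn <- ma(plastics, order = 12, centre = TRUE)

# Multiplicative model, so the trend is removed by division
detrend_plastic <- plastics / Tn

# The first and last six months have no trend estimate
sum(is.na(Tn))
```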
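Classical decomposition estimates the seasonal component \(\hat{S}_n\) as the average detrended value for each position in the seasonal cycle (here, each calendar month). This component is then subtracted from the detrended series in the additive case, or divided out in the multiplicative case.
# One seasonal index per calendar month
Sn <- double(12)
for (i in seq_len(12)) {
  # Average the detrended values for month i, skipping the NA ends
  Sn[i] <- mean(detrend_plastic[cycle(detrend_plastic) == i],
                na.rm = TRUE)
}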
The raw values need to be adjusted so that their sum equals \(m\), which is 12. Again, this is from the textbook.
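The adjustment can be sketched as a rescaling of the twelve raw monthly means. The block below recomputes everything from scratch so that it runs on its own; it assumes fpp2 is attached and uses the same normalization line that appears in Part e.

```r
library(fpp2)  # assumed setup: provides ma() and the plastics data

Tn <- ma(plastics, order = 12, centre = TRUE)
detrend_plastic <- plastics / Tn

# Raw monthly means of the detrended series
Sn <- double(12)
for (i in seq_len(12)) {
  Sn[i] <- mean(detrend_plastic[cycle(detrend_plastic) == i], na.rm = TRUE)
}

# Rescale so the twelve indices sum to m = 12
Sn <- 12 * Sn / sum(Sn)
sum(Sn)
```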
The random component of the time series, \(\hat{R}_n\), is calculated in classical multiplicative decomposition as the raw values divided by the product of the trend-cycle and seasonal components.
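Written as a formula, with \(y_n\) denoting the raw observation (a symbol assumed here for illustration), the multiplicative model \(y_n = \hat{T}_n \hat{S}_n \hat{R}_n\) gives:

\[
\hat{R}_n = \frac{y_n}{\hat{T}_n\,\hat{S}_n}
\]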
We can check these results by comparing them to the output of the decompose function. Each comparison should output a simple TRUE/FALSE for whether the values are equal.
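One way this check could be written is sketched below. It assumes fpp2 is attached, rebuilds the three components by hand, and compares them with the `figure`, `trend`, and `random` elements returned by `decompose()`. Using `all.equal()` for a tolerance-based comparison is a choice made here, not necessarily the original code.

```r
library(fpp2)  # assumed setup: provides ma() and the plastics data

# Hand-computed components
Tn <- ma(plastics, order = 12, centre = TRUE)
detrend_plastic <- plastics / Tn
Sn <- double(12)
for (i in seq_len(12)) {
  Sn[i] <- mean(detrend_plastic[cycle(detrend_plastic) == i], na.rm = TRUE)
}
Sn <- 12 * Sn / sum(Sn)
# Expand the 12 indices to the full series before dividing
Rn <- plastics / (Tn * Sn[as.integer(cycle(plastics))])

# Reference decomposition from base R
dec <- decompose(plastics, type = "multiplicative")

isTRUE(all.equal(as.numeric(dec$figure), Sn))
isTRUE(all.equal(as.numeric(dec$trend), as.numeric(Tn)))
isTRUE(all.equal(as.numeric(dec$random), as.numeric(Rn)))
```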
## [1] TRUE
## [1] TRUE
## [1] TRUE
And we will output the components below: \(\hat{S}_n, \hat{T}_n\), and \(\hat{R}_n\).
## [1] 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## [8] 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1        NA        NA        NA        NA        NA        NA  976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500        NA
##         Aug       Sep       Oct       Nov       Dec
## 1  977.0417  977.0833  978.4167  982.7083  990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5        NA        NA        NA        NA        NA
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1        NA        NA        NA        NA        NA        NA 1.0247887
## 2 0.9656005 0.9745267 0.9750081 0.9894824 1.0061175 1.0024895 1.0401641
## 3 1.0454117 0.9953920 1.0079773 1.0142083 0.9990100 0.9854384 0.9567618
## 4 1.0257400 0.9924762 0.9807020 0.9798704 0.9684851 0.9627557 0.9917766
## 5 0.9767392 1.0510964 1.0498039 1.0299302 1.0398787 1.0628077        NA
##         Aug       Sep       Oct       Nov       Dec
## 1 1.0157335 1.0040354 0.9724119 0.9961368 0.9489762
## 2 1.0230774 1.0040674 0.9962088 0.9735577 0.9721203
## 3 0.9969907 1.0132932 1.0314752 0.9910657 1.0258002
## 4 0.9776897 0.9920952 1.0133954 1.0527311 1.0665946
## 5        NA        NA        NA        NA        NA
Part c
There is very clear seasonality in \(\hat{S}_n\): the indices range from about 0.71 in February to about 1.23 in September, matching the low winter and high late-summer sales seen in the plot. There is also a clear increasing trend in \(\hat{T}_n\), which rises from about 977 in the middle of year one to a peak of about 1382 around February and March of year five before declining. That late decline suggests that, instead of a pure upward trend, there may be a longer-term cycle. Overall, the classical decomposition supports the graphical interpretation from part a.
Part d
The seasonally adjusted data is \(\hat{T}_n\hat{R}_n\), which is plotted below.
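A sketch of how the seasonally adjusted series could be computed and plotted, assuming fpp2 is attached. Under the multiplicative model, dividing the raw series by the seasonal indices is equivalent to \(\hat{T}_n\hat{R}_n\) wherever the trend is defined. The name `seas_adj` and the plot labels are illustrative.

```r
library(fpp2)  # assumed setup: provides ma(), autoplot() and the plastics data

# Seasonal indices, computed as in Part b
Tn <- ma(plastics, order = 12, centre = TRUE)
detrend_plastic <- plastics / Tn
Sn <- double(12)
for (i in seq_len(12)) {
  Sn[i] <- mean(detrend_plastic[cycle(detrend_plastic) == i], na.rm = TRUE)
}
Sn <- 12 * Sn / sum(Sn)

# Divide each observation by the index for its month
seas_adj <- plastics / Sn[as.integer(cycle(plastics))]

autoplot(seas_adj) +
  ggtitle("Seasonally adjusted sales of product A") +
  ylab("Sales (thousands)")
```

Dividing the raw data by the indices has the practical advantage of giving a value for all 60 months, whereas the product of trend and remainder is missing for the first and last six.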
Part e
We will rework the above code into one chunk for this question and add an outlier.
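# Copy the series and add an outlier to observation 26 (February of year three)
plastics_2 <- plastics
plastics_2[[26]] <- plastics_2[[26]] + 500
# Trend-cycle via a 2x12 centred moving average, then detrend by division
Tn2 <- ma(plastics_2, order = 12, centre = TRUE)
detrend_plastics_2 <- plastics_2 / Tn2
# Average the detrended values by month and normalize the indices to sum to 12
Sn2 <- double(12)
for (i in seq_len(12)) {
  Sn2[i] <- mean(detrend_plastics_2[cycle(detrend_plastics_2) == i],
                 na.rm = TRUE)
}
Sn2 <- 12 * Sn2 / sum(Sn2)
# Compare the original (black) and outlier-affected (blue) seasonal indices
plot(Sn, type = 'l')
lines(Sn2, col = 'blue')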
Adding an outlier to February of year three, in the middle of the time series, bumps up the February seasonal index and also seems to affect the moving average around that date. The overall shape and scale of the decompositions remain similar, suggesting some level of tolerance to a single outlier.
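Since the question asks for the seasonally adjusted data, the comparison can also be made on that series directly. The sketch below assumes fpp2 is attached; `seasonal_indices()` and `seas_adjust()` are hypothetical helpers written for this illustration and are not part of any package.

```r
library(fpp2)  # assumed setup: provides ma(), autoplot() and the plastics data

# Hypothetical helper: classical multiplicative seasonal indices (monthly data)
seasonal_indices <- function(y) {
  detrended <- y / ma(y, order = 12, centre = TRUE)
  raw <- vapply(seq_len(12), function(i) {
    mean(detrended[cycle(detrended) == i], na.rm = TRUE)
  }, numeric(1))
  12 * raw / sum(raw)
}

# Hypothetical helper: divide each observation by its month's index
seas_adjust <- function(y) {
  y / seasonal_indices(y)[as.integer(cycle(y))]
}

# Same outlier as above: add 500 to observation 26
plastics_2 <- plastics
plastics_2[26] <- plastics_2[26] + 500

adj   <- seas_adjust(plastics)
adj_2 <- seas_adjust(plastics_2)

autoplot(adj, series = "Original") +
  autolayer(adj_2, series = "With outlier") +
  ylab("Seasonally adjusted sales (thousands)")
```

Plotting both series on one axis makes it easier to see where the outlier itself survives the adjustment and where the altered February index shifts the other years.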
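Part f
# Copy the series and add an outlier to observation 2 (February of year one)
plastics_3 <- plastics
plastics_3[[2]] <- plastics_3[[2]] + 500
# The trend must be computed from the modified series
Tn3 <- ma(plastics_3, order = 12, centre = TRUE)
detrend_plastics_3 <- plastics_3 / Tn3
# Average the detrended values by month and normalize the indices to sum to 12
Sn3 <- double(12)
for (i in seq_len(12)) {
  Sn3[i] <- mean(detrend_plastics_3[cycle(detrend_plastics_3) == i],
                 na.rm = TRUE)
}
Sn3 <- 12 * Sn3 / sum(Sn3)
# Compare the original (black) and outlier-affected (purple) seasonal indices
plot(Sn, type = 'l')
lines(Sn3, col = 'purple')
Here, the outlier was added early in the dataset, in February of year one. The 2x12 moving average is undefined for the first six months, so this observation has no detrended value and never enters the February seasonal index directly. Its only influence is indirect: it raises the trend estimates for July and August of year one (by 500/12 and 500/24 respectively), which slightly lowers those two detrended values and, after normalization, nudges the other indices. The effect on the seasonal indices should therefore be much smaller than for the outlier in the middle of the series, although the outlier itself passes almost unchanged into the seasonally adjusted data. The same argument applies to an outlier in the final six months, where the moving average is also undefined.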
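The question itself asks about an outlier near the end of the series. A minimal sketch of that case, assuming fpp2 is attached, checks which trend estimates change when 500 is added to observation 59. The name `plastics_4` and the choice of observation 59 are illustrative.

```r
library(fpp2)  # assumed setup: provides ma() and the plastics data

# Hypothetical late outlier: add 500 to observation 59 (November of year five)
plastics_4 <- plastics
plastics_4[59] <- plastics_4[59] + 500

Tn  <- ma(plastics,   order = 12, centre = TRUE)
Tn4 <- ma(plastics_4, order = 12, centre = TRUE)

# Positions where the trend estimate differs (NA positions are ignored)
changed <- which(abs(as.numeric(Tn4) - as.numeric(Tn)) > 1e-8)
changed
```

Because the moving average is undefined for the last six months, observation 59 has no detrended value of its own. It can influence the seasonal indices only through the few trend estimates whose window reaches it.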
Question HA 6.3
Question
Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?
Answer
First, reload the data again for this week.
library(httr)
url <- "https://otexts.com/fpp2/extrafiles/retail.xlsx"
GET(url, write_disk("retail.xlsx", overwrite=TRUE))
## Response [https://otexts.com/fpp2/extrafiles/retail.xlsx]
## Date: 2021-02-23 17:55
## Status: 200
## Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
## Size: 639 kB
## <ON DISK> /Users/jeffshamp/Documents/GitHub/cuny_msds/DATA_624/retail.xlsx
retail<- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retail[, "A3349721R"], frequency = 12, start = c(1982, 1))
We will first remind ourselves of what this series looks like. Looking back at HW1, the data is monthly, which is a use case for X11. The series also has a clear seasonal component whose size appears to increase slowly over time, and X11 allows the seasonal component to vary slowly over time. This seems like a good fit of method to data.
Now let us use X11 Decomposition on this dataset.
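library(fpp2)      # assumed to be loaded earlier; provides %>%, autoplot() and ggtitle()
library(seasonal)
# Fit an X11 decomposition; x11 = "" requests X11 with default settings
fit_myts <-
  myts %>%
  seas(x11 = "")
autoplot(fit_myts) +
  ggtitle("X11 Decomposition of Monthly Aussie Retail Data")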
The X11 decomposition suggests that the years before 2005 were less “predictable” (for lack of a better word) than those after 2005. The remainders for the years after 2005 appear to be smaller, which suggests the trend and seasonal components capture the data better there. There do not appear to be any major outliers in the data, but there are some “bumps” around 2003 and 2008 that are otherwise very hard to detect from the autoplot. Additionally, the seasonality appears to be less consistent early in the data (hence the wilder remainders). These features are much easier to see in the X11 decomposition.
Below, we show a classical multiplicative decomposition for comparison with X11. The classical decomposition misses the nuance in the seasonality and smooths over some of the bumps in the early 2000s, though they are still present.