Hanyue Kuang

2.The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years. 2a.Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

plastics
##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1030 1107 1165 1216 1208 1131  971  783
## 2  741  700  774  932 1099 1223 1290 1349 1341 1296 1066  901
## 3  896  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
plot(plastics)

 there is an upward trend and a clear seasonal pattern. 
 

2b.Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

plastics %>% decompose(type="multiplicative") %>% 
  autoplot() + xlab("Year") +
  ggtitle("Classical multiplicative decomposition 
    of plastics")

2c.Do the results support the graphical interpretation from part a?

  yes

2d.Compute and plot the seasonally adjusted data.

plastic.decomp<-decompose(plastics, type="multiplicative")
plastic.seas_adj<-plastic.decomp$x/plastic.decomp$seasonal
plot(plastic.seas_adj)

2e.Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

plastics2<-plastics
plastics2[5]<-plastics[5]+500
plastics2
##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1530 1107 1165 1216 1208 1131  971  783
## 2  741  700  774  932 1099 1223 1290 1349 1341 1296 1066  901
## 3  896  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
plastic.decomp2 <- decompose(plastics2, type="multiplicative")
plastic.seas_adj2 <-plastic.decomp2$x/plastic.decomp$seasonal
plot(plastic.seas_adj2)

 the outlier makes a sharp change

2f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?

plastics3<-plastics
plastics3[25]<-plastics[25]+500
plastics3
##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1030 1107 1165 1216 1208 1131  971  783
## 2  741  700  774  932 1099 1223 1290 1349 1341 1296 1066  901
## 3 1396  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
plastic.decomp3 <- decompose(plastics3, type="multiplicative")
plastic.seas_adj3 <-plastic.decomp3$x/plastic.decomp$seasonal
plot(plastic.seas_adj3)

plastics4<-plastics
plastics4[55]<-plastics[55]+500
plastics4
##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1030 1107 1165 1216 1208 1131  971  783
## 2  741  700  774  932 1099 1223 1290 1349 1341 1296 1066  901
## 3  896  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 2111 1608 1528 1420 1119 1013
plastic.decomp4 <- decompose(plastics4, type="multiplicative")
plastic.seas_adj4 <-plastic.decomp4$x/plastic.decomp$seasonal
plot(plastic.seas_adj4)

If the outlier is at the beginning of the time sequence, it will not largely influence the adjusted seasonality in general.
If the outlier is in the middle or the last, the whole seasonality will shrink and trend feature dim.

 4.a The trend is nice and smoothly upward shaped. The seasonal pattern does not show observable exportional variables.seasonality has minimal effect on the original plot
 4.b Yes. There is a obvious drop of 400 at that period in the plot of remainders.

5 This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 - February 2005). 5a.Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?

autoplot(cangas)

ggsubseriesplot(cangas)

ggseasonplot(cangas)

 respectively in month, the gas production increases.

5b.Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.

cangas %>%
  stl(t.window=NULL, s.window="periodic", robust=TRUE) %>%
  autoplot()

cangas %>%
  stl(t.window=NULL, s.window=100, robust=TRUE) %>%
  autoplot()

cangas %>%
  stl(t.window=NULL, s.window=7, robust=TRUE) %>%
  autoplot()

   s.window is the number of consecutive years to be used in estimating each value in the seasonal component. 
   If s.window is larger, the more data we will use and the shape of seasonality is more similar to "periodic". 
   If s.window is small, the less data we will use and thus more irregular shape.

5c.Compare the results with those obtained using SEATS and X11. How are they different?

cangas %>% seas() %>%
  autoplot() +
  ggtitle("SEATS decomposition of Canadian Gas Production")

cangas %>% seas(x11="") %>%
  autoplot() +
  ggtitle("X11 decomposition of Canadian Gas Production")

  the seasonality of both are not expected shaped. X11 performs better on trend and remainders

6 We will use the bricksq data (Australian quarterly clay brick production. 1956-1994) for this exercise. 6a.Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)

bricksq %>%
  stl(t.window=30, s.window="periodic", robust=TRUE) %>%
  autoplot()

bricksq %>%
  stl(t.window=30, s.window=7, robust=TRUE) %>%
  autoplot()

bricksq %>%
  stl(t.window=30, s.window=50, robust=TRUE) %>%
  autoplot()

6b.Compute and plot the seasonally adjusted data.

bricksq %>%
  stl(t.window=30, s.window="periodic", robust=FALSE) %>% seasadj() %>%
  autoplot()

6c.Use a naïve method to produce forecasts of the seasonally adjusted data.

ft_stl <- stl(bricksq, t.window=30, s.window="periodic", robust=FALSE)
ft_stl %>% seasadj() %>% naive() %>% autoplot() + ylab("New orders index") +
  ggtitle("Naive forecasts of seasonally adjusted data")

6d.Use stlf() to reseasonalise the results, giving forecasts for the original data.

ft_stlf <- stlf(bricksq, method='naive')
forecast(ft_stlf,h=8)
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       465.7338 440.1878 491.2798 426.6645 504.8030
## 1995 Q1       424.6739 388.5464 460.8014 369.4217 479.9261
## 1995 Q2       483.9926 439.7456 528.2395 416.3227 551.6624
## 1995 Q3       494.0000 442.9080 545.0920 415.8616 572.1384
## 1995 Q4       465.7338 408.6112 522.8563 378.3723 553.0952
## 1996 Q1       424.6739 362.0993 487.2485 328.9742 520.3736
## 1996 Q2       483.9926 416.4042 551.5809 380.6251 587.3600
## 1996 Q3       494.0000 421.7450 566.2550 383.4956 604.5044

6e.Do the residuals look uncorrelated?

checkresiduals(ft_stlf)
## Warning in checkresiduals(ft_stlf): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  Random walk
## Q* = 40.829, df = 8, p-value = 2.244e-06
## 
## Model df: 0.   Total lags used: 8
  at the quater of 4 and 8, there does show slight correlation. Howevere, this is acceptable for a naive forecast.

6f.Repeat with a robust STL decomposition. Does it make much difference?

ft_stl2 <- stl(bricksq, t.window=30, s.window="periodic", robust=TRUE)
ft_stl2 %>% seasadj() %>% naive() %>% autoplot() + ylab("New orders index") +
  ggtitle("Naive forecasts of seasonally adjusted data,r")

forecast(ft_stl2, h=8)
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       470.9904 436.6159 505.3649 418.4191 523.5617
## 1995 Q1       437.2186 389.5246 484.9127 364.2768 510.1605
## 1995 Q2       479.5628 421.5014 537.6241 390.7656 568.3600
## 1995 Q3       493.6721 426.8080 560.5362 391.4123 595.9319
## 1995 Q4       470.9904 396.3326 545.6482 356.8111 585.1696
## 1996 Q1       437.2186 355.4869 518.9504 312.2208 562.2165
## 1996 Q2       479.5628 391.3036 567.8219 344.5820 614.5435
## 1996 Q3       493.6721 399.3184 588.0258 349.3706 637.9736
   robust or not does not significantly influence the result.

6g.Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?

bricksq
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956  189  204  208  197
## 1957  187  214  227  223
## 1958  199  229  249  234
## 1959  208  253  267  255
## 1960  242  268  290  277
## 1961  241  253  265  236
## 1962  229  265  275  258
## 1963  231  263  308  313
## 1964  293  328  349  340
## 1965  309  349  366  340
## 1966  302  350  362  337
## 1967  326  358  359  357
## 1968  341  380  404  409
## 1969  383  417  454  428
## 1970  386  428  434  417
## 1971  385  433  453  436
## 1972  399  461  476  477
## 1973  452  461  534  516
## 1974  478  526  518  417
## 1975  340  437  459  449
## 1976  424  501  540  533
## 1977  457  513  522  478
## 1978  421  487  470  482
## 1979  458  526  573  563
## 1980  513  551  589  564
## 1981  519  581  581  578
## 1982  500  560  512  412
## 1983  303  409  420  413
## 1984  400  469  482  484
## 1985  447  507  533  503
## 1986  443  503  505  443
## 1987  415  485  495  458
## 1988  427  519  555  539
## 1989  511  572  570  526
## 1990  472  524  497  460
## 1991  373  436  424  430
## 1992  387  413  451  420
## 1993  394  462  476  443
## 1994  421  472  494
bricksq.train<-window(bricksq, end=c(1992,3))
bricksq.test<-window(bricksq, start=c(1992,4), end=c(1994,3))
bricksq.train
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956  189  204  208  197
## 1957  187  214  227  223
## 1958  199  229  249  234
## 1959  208  253  267  255
## 1960  242  268  290  277
## 1961  241  253  265  236
## 1962  229  265  275  258
## 1963  231  263  308  313
## 1964  293  328  349  340
## 1965  309  349  366  340
## 1966  302  350  362  337
## 1967  326  358  359  357
## 1968  341  380  404  409
## 1969  383  417  454  428
## 1970  386  428  434  417
## 1971  385  433  453  436
## 1972  399  461  476  477
## 1973  452  461  534  516
## 1974  478  526  518  417
## 1975  340  437  459  449
## 1976  424  501  540  533
## 1977  457  513  522  478
## 1978  421  487  470  482
## 1979  458  526  573  563
## 1980  513  551  589  564
## 1981  519  581  581  578
## 1982  500  560  512  412
## 1983  303  409  420  413
## 1984  400  469  482  484
## 1985  447  507  533  503
## 1986  443  503  505  443
## 1987  415  485  495  458
## 1988  427  519  555  539
## 1989  511  572  570  526
## 1990  472  524  497  460
## 1991  373  436  424  430
## 1992  387  413  451
bricksq.test
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1992                 420
## 1993  394  462  476  443
## 1994  421  472  494
bricksq_naive<-snaive(bricksq.train)
bricksq_stlf<-stlf(bricksq.train)
bricksq.fc_naive<-forecast(bricksq_naive,h=8)
bricksq.fc_stlf<-forecast(bricksq_stlf,h=8)

autoplot(bricksq, series = "Original data") +
  geom_line(size = 1) +
  autolayer(bricksq.fc_naive, PI = FALSE, size = 1,
            series = "stlf") +
  autolayer(bricksq.fc_stlf, PI = FALSE, size = 1,
            series = "snaive") +
  scale_color_manual(values = c("gray50", "blue", "red"),
                     breaks = c("Original data", "stlf", "snaive")) +
  scale_x_continuous(limits = c(1990, 1994.5)) +
  guides(colour = guide_legend(title = "Data")) +
  ggtitle("Forecast from stlf and snaive functions") +
  annotate(
    "rect",
    xmin=1992.75,xmax=1994.5,ymin=-Inf,ymax=Inf,
    fill="lightgreen",alpha = 0.3
  )
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## Warning: Removed 136 rows containing missing values (geom_path).

7Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate.Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(writing)

hist(writing)

  the data is normal distributed. There is no necessary to transform. So I would not use BoxCox.
stlf_writing <- stlf(writing, 
                     s.window = "periodic", 
                     robust = TRUE,
                     method = "naive")
autoplot(stlf_writing)

8 Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(fancy)

hist(fancy)

 the histgram suggests transformation should be included to improve
stlf_fancy <- stlf(fancy,
                   s.window = "periodic",
                   robust = TRUE,
                   lambda = BoxCox.lambda(fancy),
                   method = "rwdrift")

autoplot(stlf_fancy)