Assignment 1. Chapter 6

1.Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

A 3X5 MA will lead to
1/15(y + 2y + 3y + 3y + 3y + 2y + y) =1/15(Y1) + 2/15(Y2) +3/15(Y3) +3/15(Y4) +3/15(Y5) +2/15(Y6) +1/15(Y7) =0.067Y1 + 0.133Y2 + 0.2Y3 + 0.2Y4 + 0.2Y5 + 0.133Y6 + 0.067Y7

2.the plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

2a.Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

library(fpp2)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## ── Attaching packages ──────────────────────────────────────────────────────────────────── fpp2 2.4 ──

## ✓ ggplot2   3.3.2     ✓ fma       2.4  
## ✓ forecast  8.13      ✓ expsmooth 2.3

##

plastics

##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1030 1107 1165 1216 1208 1131  971  783
## 2  741  700  774  932 1099 1223 1290 1349 1341 1296 1066  901
## 3  896  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013

autoplot(plastics,xlab="Year",ylab="Sales")

There is seasonality that could be observed from the plots, and I could see a trend cycle as well,since it looks like the peak of sales generally occurs every summer.

2b.Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

decomp_model = decompose(plastics,type = "multiplicative")

autoplot(decomp_model)

2c.Do the results support the graphical interpretation from part a? The results do support the graphical interpretation from part a

2d.Compute and plot the seasonally adjusted data.

plot(seasadj(decomp_model))

2e.Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? 2f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?

#2e and 2F are answered together

plastics_otl1 = plastics
plastics_otl1[30] = plastics_otl1[30]+500
otl_model1 = decompose(plastics_otl1, type="multiplicative")
plot(seasadj(otl_model1), main = "outlier created in the middel")

plastics_otl2 = plastics
plastics_otl2[50] = plastics_otl2[50]+500
otl_model2 = decompose(plastics_otl2, type="multiplicative")
plot(seasadj(otl_model2, main = "outlier created in the end"))

The effect of turning an observation to an outlier is that the plots now appears to have a high peak at where the outlier was created. No matter if the outlier is created in the mideel or near the end of the plot, it only appears as a spike in the new plot, hence the plcement of it should not effect the entire time series .

3.Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
head(retaildata)

## # A tibble: 6 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # … with 184 more variables: A3349336V <dbl>, A3349337W <dbl>, A3349397X <dbl>,
## #   A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>, A3349790V <dbl>,
## #   A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>, A3349873A <dbl>,
## #   A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>, A3349789K <dbl>,
## #   A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>, A3349799R <dbl>,
## #   A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>, A3349416V <dbl>,
## #   A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>, A3349727C <dbl>,
## #   A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>, A3349349F <dbl>,
## #   A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>, A3349566A <dbl>,
## #   A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>, A3349561R <dbl>,
## #   A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>, A3349637X <dbl>,
## #   A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>, A3349719C <dbl>,
## #   A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>, A3349480L <dbl>,
## #   A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>, A3349481R <dbl>,
## #   A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>, A3349654A <dbl>,
## #   A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>, A3349656F <dbl>,
## #   A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>, A3349360V <dbl>,
## #   A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>, A3349575C <dbl>,
## #   A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>, A3349433W <dbl>,
## #   A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>, A3349815C <dbl>,
## #   A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>, A3349742A <dbl>,
## #   A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>, A3349824F <dbl>,
## #   A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>, A3349434X <dbl>,
## #   A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>, A3349908R <dbl>,
## #   A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>, A3349365F <dbl>,
## #   A3349746K <dbl>, …

retailts <- ts(retaildata[,"A3349335T"],
  frequency=12, start=c(1982,4))

X11decomp<-seas(retailts, x11 = "")

autoplot(X11decomp)

With the X11 decomposition, it does reveals outliers, as we can see fromt he plots, the outlier appears to be at around 1986 and 2010; the two very obvious outliers pop upwards.

4a.Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Based on the plots for the number of persons in the civilian labor force in Australia, it presents an overall increasing trend over time, and the seasonal plot says it contains certain amount of seasonal fluctuations.However, the growth does not appear to be a straight line, as an obvious drop shows at around 1992 ~ 1993; the remainder plot also reflected this drop as an outlier pops downward. We could acknowledge from such a situation that there was a large decrease in labor force during that time, and the causation might be national recession or the revolution of gender distribution needed by the labor market.

4b.Is the recession of 1991/1992 visible in the estimated components?

Like we mentioned in 4a, we noticed a large drop or dip around 1992, which also reflects an outlier pops downward on the remainder plot, during a recession, we expect a huge lost in the population of labor foce; hence the recessio of 1991/1992 is quite visible in the estimated components.

5.This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).

5a.Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?

autoplot(cangas)

ggsubseriesplot(cangas)

ggseasonplot(cangas)

The huge change of seasonality could be caused by the change of climates and temperature over seasons in Canada, such as not much gas productions are needed during winter since more people might be using heaters, instead more gas productions ,like AC, are more than ever needed during summer time. Therefore, we expect to see lots of flutuations bewteen different season corroponding to the certain amount of gas productions needed by people.

5b.Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.

stl_cangas = stl(cangas, s.window = 12)

autoplot(stl_cangas)

5c.Compare the results with those obtained using SEATS and X11. How are they different?

cangas_time = ts(cangas,frequency=12, start=c(1960,45))
x11_cangas = seas(cangas_time, x11="")

autoplot(x11_cangas)

Comparing to STL model, there are more variations could be observed before 1980 with SEATS and X11 based on the seasonal plot. Also from the reminder plots, we could observe more outliers before 1980 with SEAT and X11. Overall, we could conclude that for this particular case, with SEATS and X11 mode, we could see more fluctuations and variations of data than STL model, even though in the larger time fram, they share lots of similarities.

6.We will use the bricksq data (Australian quarterly clay brick production. 1956–1994) for this exercise.

6a.Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)

stl_bricksq1 = stl(bricksq, s.window = "periodic", t.window = 12)
autoplot(stl_bricksq1)

stl_bricksq2 = stl(bricksq, s.window = 40, t.window = 12)
autoplot(stl_bricksq2)

stl_bricksq3 = stl(bricksq, s.window = 12)
autoplot(stl_bricksq3)

6b.Compute and plot the seasonally adjusted data.

bricksq_seasonal = seasadj(stl_bricksq1)

autoplot(bricksq_seasonal, xlab = "Years", ylab = "Production amount of Brick", main = "Seasonal Adjusted amount of Brick")

6c.Use a naïve method to produce forecasts of the seasonally adjusted data.

bricksq_naive = naive(bricksq_seasonal)

autoplot(bricksq_naive)

6d.Use stlf() to reseasonalise the results, giving forecasts for the original data.

bricksq_stlf = stlf(bricksq, method = "naive")

fc_bricksq_stlf = forecast(bricksq_stlf)
fc_bricksq_stlf

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       465.7338 440.1878 491.2798 426.6645 504.8030
## 1995 Q1       424.6739 388.5464 460.8014 369.4217 479.9261
## 1995 Q2       483.9926 439.7456 528.2395 416.3227 551.6624
## 1995 Q3       494.0000 442.9080 545.0920 415.8616 572.1384
## 1995 Q4       465.7338 408.6112 522.8563 378.3723 553.0952
## 1996 Q1       424.6739 362.0993 487.2485 328.9742 520.3736
## 1996 Q2       483.9926 416.4042 551.5809 380.6251 587.3600
## 1996 Q3       494.0000 421.7450 566.2550 383.4956 604.5044

6e.Do the residuals look uncorrelated?

checkresiduals(bricksq_stlf$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

The residual doesn’t look completely uncorrelated, but the correlations do appear quite weak and slight.

6f.Repeat with a robust STL decomposition. Does it make much difference?

bricksq_robust_stl<-stl(bricksq,t.window=12, s.window="periodic", robust=TRUE)
bricksq_stl1<-seasadj(bricksq_robust_stl)
bricksq_stl2<-naive(bricksq_stl1)
fc_bricksq_robust_stl<-forecast(bricksq_stl2)

fc_bricksq_robust_stl

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       471.7973 442.1625 501.4322 426.4748 517.1199
## 1995 Q1       471.7973 429.8874 513.7073 407.7015 535.8931
## 1995 Q2       471.7973 420.4683 523.1264 393.2963 550.2983
## 1995 Q3       471.7973 412.5277 531.0670 381.1522 562.4425
## 1995 Q4       471.7973 405.5318 538.0628 370.4530 573.1417
## 1996 Q1       471.7973 399.2071 544.3876 360.7802 582.8145
## 1996 Q2       471.7973 393.3909 550.2037 351.8851 591.7096
## 1996 Q3       471.7973 387.9774 555.6173 343.6058 599.9889
## 1996 Q4       471.7973 382.8928 560.7018 335.8296 607.7650
## 1997 Q1       471.7973 378.0838 565.5109 328.4748 615.1199

checkresiduals(fc_bricksq_robust_stl$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

We do not see any obvious difference with using or not using robust.

6g.Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?

bricksq_testing = window(bricksq, start=c(1992,2), end=c(1994,4))# create two yr testing data

## Warning in window.default(x, ...): 'end' value not changed

bricksq_testing

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1992       413  451  420
## 1993  394  462  476  443
## 1994  421  472  494

bricksq_snaive = snaive(bricksq_testing)
bricksq_stlf<-stlf(bricksq_testing)

fc_bricksq_snaive<-forecast(bricksq_snaive,h=8)
fc_bricksq_stlf<-forecast(bricksq_stlf,h=8)

fc_bricksq_snaive

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4            443 407.1013 478.8987 388.0977 497.9023
## 1995 Q1            421 385.1013 456.8987 366.0977 475.9023
## 1995 Q2            472 436.1013 507.8987 417.0977 526.9023
## 1995 Q3            494 458.1013 529.8987 439.0977 548.9023
## 1995 Q4            443 392.2316 493.7684 365.3564 520.6436
## 1996 Q1            421 370.2316 471.7684 343.3564 498.6436
## 1996 Q2            472 421.2316 522.7684 394.3564 549.6436
## 1996 Q3            494 443.2316 544.7684 416.3564 571.6436

fc_bricksq_stlf

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       455.5662 442.0111 469.1213 434.8355 476.2969
## 1995 Q1       424.9487 405.7799 444.1175 395.6325 454.2649
## 1995 Q2       476.0621 452.5856 499.5386 440.1578 511.9663
## 1995 Q3       493.9996 466.8915 521.1077 452.5413 535.4579
## 1995 Q4       455.5662 425.2585 485.8739 409.2146 501.9178
## 1996 Q1       424.9487 391.7484 458.1490 374.1733 475.7241
## 1996 Q2       476.0621 440.2018 511.9224 421.2185 530.9057
## 1996 Q3       493.9996 455.6634 532.3358 435.3694 552.6298

In this particular testing set, stlf() appears to support more accuracy than snavie()

7.Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

writing

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1968  562.674  599.000  668.516  597.798  579.889  668.233  499.232  215.187
## 1969  634.712  639.283  712.182  621.557  621.000  675.989  501.322  220.286
## 1970  646.783  658.442  712.906  687.714  723.916  707.183  629.000  237.530
## 1971  676.155  748.183  810.681  729.363  701.108  790.079  594.621  230.716
## 1972  747.636  773.392  813.788  766.713  728.875  749.197  680.954  241.424
## 1973  795.337  788.421  889.968  797.393  751.000  821.255  691.605  290.655
## 1974  843.038  847.000  941.952  804.309  840.307  871.528  656.330  370.508
## 1975  778.139  856.075  938.833  813.023  783.417  828.110  657.311  310.032
## 1976  895.217  856.075  893.268  875.000  835.088  934.595  832.500  300.000
## 1977  875.024  992.968  976.804  968.697  871.675 1006.852  832.037  345.587
##           Sep      Oct      Nov      Dec
## 1968  555.813  586.935  546.136  571.111
## 1969  560.727  602.530  626.379  605.508
## 1970  613.296  730.444  734.925  651.812
## 1971  617.189  691.389  701.067  705.777
## 1972  680.234  708.326  694.238  772.071
## 1973  727.147  868.355  812.390  799.556
## 1974  742.000  847.152  731.675  898.527
## 1975  780.000  860.000  780.000  807.993
## 1976  791.443  900.000  781.729  880.000
## 1977  849.528  913.871  868.746  993.733

autoplot(writing)

boxplot(writing)

hist(writing)

writing_stlf = stlf(writing, s.window = "periodic", method="naive")

autoplot(writing_stlf)

8.Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

fancy

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1987   1664.81   2397.53   2840.71   3547.29   3752.96   3714.74   4349.61
## 1988   2499.81   5198.24   7225.14   4806.03   5900.88   4951.34   6179.12
## 1989   4717.02   5702.63   9957.58   5304.78   6492.43   6630.80   7349.62
## 1990   5921.10   5814.58  12421.25   6369.77   7609.12   7224.75   8121.22
## 1991   4826.64   6470.23   9638.77   8821.17   8722.37  10209.48  11276.55
## 1992   7615.03   9849.69  14558.40  11587.33   9332.56  13082.09  16732.78
## 1993  10243.24  11266.88  21826.84  17357.33  15997.79  18601.53  26155.15
##            Aug       Sep       Oct       Nov       Dec
## 1987   3566.34   5021.82   6423.48   7600.60  19756.21
## 1988   4752.15   5496.43   5835.10  12600.08  28541.72
## 1989   8176.62   8573.17   9690.50  15151.84  34061.01
## 1990   7979.25   8093.06   8476.70  17914.66  30114.41
## 1991  12552.22  11637.39  13606.89  21822.11  45060.69
## 1992  19888.61  23933.38  25391.35  36024.80  80721.71
## 1993  28586.52  30505.41  30821.33  46634.38 104660.67

 autoplot(fancy)

boxplot(fancy)

hist(fancy)

#without boxcox transformation
fancy_stlf = stlf(fancy, s.window = "periodic", method="rwdrift")

autoplot(fancy_stlf)

#with boxcox transformation
fancy_stlf2 = stlf(fancy, s.window = "periodic",lambda = BoxCox.lambda(fancy), method="rwdrift")

autoplot(fancy_stlf2)

Assignment 1. Chapter 6

Yuanye Gao

11/8/2020