1.Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
A 3X5 MA will lead to
1/15(y + 2y + 3y + 3y + 3y + 2y + y) =1/15(Y1) + 2/15(Y2) +3/15(Y3) +3/15(Y4) +3/15(Y5) +2/15(Y6) +1/15(Y7) =0.067Y1 + 0.133Y2 + 0.2Y3 + 0.2Y4 + 0.2Y5 + 0.133Y6 + 0.067Y7
2.the plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.
2a.Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ──────────────────────────────────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2 3.3.2 ✓ fma 2.4
## ✓ forecast 8.13 ✓ expsmooth 2.3
##
plastics
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
autoplot(plastics,xlab="Year",ylab="Sales")
There is seasonality that could be observed from the plots, and I could see a trend cycle as well,since it looks like the peak of sales generally occurs every summer.
2b.Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.
decomp_model = decompose(plastics,type = "multiplicative")
autoplot(decomp_model)
2c.Do the results support the graphical interpretation from part a? The results do support the graphical interpretation from part a
2d.Compute and plot the seasonally adjusted data.
plot(seasadj(decomp_model))
2e.Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? 2f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?
#2e and 2F are answered together
plastics_otl1 = plastics
plastics_otl1[30] = plastics_otl1[30]+500
otl_model1 = decompose(plastics_otl1, type="multiplicative")
plot(seasadj(otl_model1), main = "outlier created in the middel")
plastics_otl2 = plastics
plastics_otl2[50] = plastics_otl2[50]+500
otl_model2 = decompose(plastics_otl2, type="multiplicative")
plot(seasadj(otl_model2, main = "outlier created in the end"))
The effect of turning an observation to an outlier is that the plots now appears to have a high peak at where the outlier was created. No matter if the outlier is created in the mideel or near the end of the plot, it only appears as a spike in the new plot, hence the plcement of it should not effect the entire time series .
3.Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?
library(seasonal)
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
head(retaildata)
## # A tibble: 6 x 190
## `Series ID` A3349335T A3349627V A3349338X A3349398A A3349468W
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1982-04-01 00:00:00 303. 41.7 63.9 409. 65.8
## 2 1982-05-01 00:00:00 298. 43.1 64 405. 65.8
## 3 1982-06-01 00:00:00 298 40.3 62.7 401 62.3
## 4 1982-07-01 00:00:00 308. 40.9 65.6 414. 68.2
## 5 1982-08-01 00:00:00 299. 42.1 62.6 404. 66
## 6 1982-09-01 00:00:00 305. 42 64.4 412. 62.3
## # … with 184 more variables: A3349336V <dbl>, A3349337W <dbl>, A3349397X <dbl>,
## # A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>, A3349790V <dbl>,
## # A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>, A3349873A <dbl>,
## # A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>, A3349789K <dbl>,
## # A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>, A3349799R <dbl>,
## # A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>, A3349416V <dbl>,
## # A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>, A3349727C <dbl>,
## # A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>, A3349349F <dbl>,
## # A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>, A3349566A <dbl>,
## # A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>, A3349561R <dbl>,
## # A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>, A3349637X <dbl>,
## # A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>, A3349719C <dbl>,
## # A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>, A3349480L <dbl>,
## # A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>, A3349481R <dbl>,
## # A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>, A3349654A <dbl>,
## # A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>, A3349656F <dbl>,
## # A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>, A3349360V <dbl>,
## # A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>, A3349575C <dbl>,
## # A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>, A3349433W <dbl>,
## # A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>, A3349815C <dbl>,
## # A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>, A3349742A <dbl>,
## # A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>, A3349824F <dbl>,
## # A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>, A3349434X <dbl>,
## # A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>, A3349908R <dbl>,
## # A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>, A3349365F <dbl>,
## # A3349746K <dbl>, …
retailts <- ts(retaildata[,"A3349335T"],
frequency=12, start=c(1982,4))
X11decomp<-seas(retailts, x11 = "")
autoplot(X11decomp)
With the X11 decomposition, it does reveals outliers, as we can see fromt he plots, the outlier appears to be at around 1986 and 2010; the two very obvious outliers pop upwards.
4a.Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Based on the plots for the number of persons in the civilian labor force in Australia, it presents an overall increasing trend over time, and the seasonal plot says it contains certain amount of seasonal fluctuations.However, the growth does not appear to be a straight line, as an obvious drop shows at around 1992 ~ 1993; the remainder plot also reflected this drop as an outlier pops downward. We could acknowledge from such a situation that there was a large decrease in labor force during that time, and the causation might be national recession or the revolution of gender distribution needed by the labor market.
4b.Is the recession of 1991/1992 visible in the estimated components?
Like we mentioned in 4a, we noticed a large drop or dip around 1992, which also reflects an outlier pops downward on the remainder plot, during a recession, we expect a huge lost in the population of labor foce; hence the recessio of 1991/1992 is quite visible in the estimated components.
5.This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).
5a.Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?
autoplot(cangas)
ggsubseriesplot(cangas)
ggseasonplot(cangas)
The huge change of seasonality could be caused by the change of climates and temperature over seasons in Canada, such as not much gas productions are needed during winter since more people might be using heaters, instead more gas productions ,like AC, are more than ever needed during summer time. Therefore, we expect to see lots of flutuations bewteen different season corroponding to the certain amount of gas productions needed by people.
5b.Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.
stl_cangas = stl(cangas, s.window = 12)
autoplot(stl_cangas)
5c.Compare the results with those obtained using SEATS and X11. How are they different?
cangas_time = ts(cangas,frequency=12, start=c(1960,45))
x11_cangas = seas(cangas_time, x11="")
autoplot(x11_cangas)
Comparing to STL model, there are more variations could be observed before 1980 with SEATS and X11 based on the seasonal plot. Also from the reminder plots, we could observe more outliers before 1980 with SEAT and X11. Overall, we could conclude that for this particular case, with SEATS and X11 mode, we could see more fluctuations and variations of data than STL model, even though in the larger time fram, they share lots of similarities.
6.We will use the bricksq data (Australian quarterly clay brick production. 1956–1994) for this exercise.
6a.Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)
stl_bricksq1 = stl(bricksq, s.window = "periodic", t.window = 12)
autoplot(stl_bricksq1)
stl_bricksq2 = stl(bricksq, s.window = 40, t.window = 12)
autoplot(stl_bricksq2)
stl_bricksq3 = stl(bricksq, s.window = 12)
autoplot(stl_bricksq3)
6b.Compute and plot the seasonally adjusted data.
bricksq_seasonal = seasadj(stl_bricksq1)
autoplot(bricksq_seasonal, xlab = "Years", ylab = "Production amount of Brick", main = "Seasonal Adjusted amount of Brick")
6c.Use a naïve method to produce forecasts of the seasonally adjusted data.
bricksq_naive = naive(bricksq_seasonal)
autoplot(bricksq_naive)
6d.Use stlf() to reseasonalise the results, giving forecasts for the original data.
bricksq_stlf = stlf(bricksq, method = "naive")
fc_bricksq_stlf = forecast(bricksq_stlf)
fc_bricksq_stlf
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 465.7338 440.1878 491.2798 426.6645 504.8030
## 1995 Q1 424.6739 388.5464 460.8014 369.4217 479.9261
## 1995 Q2 483.9926 439.7456 528.2395 416.3227 551.6624
## 1995 Q3 494.0000 442.9080 545.0920 415.8616 572.1384
## 1995 Q4 465.7338 408.6112 522.8563 378.3723 553.0952
## 1996 Q1 424.6739 362.0993 487.2485 328.9742 520.3736
## 1996 Q2 483.9926 416.4042 551.5809 380.6251 587.3600
## 1996 Q3 494.0000 421.7450 566.2550 383.4956 604.5044
6e.Do the residuals look uncorrelated?
checkresiduals(bricksq_stlf$residuals)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
The residual doesn’t look completely uncorrelated, but the correlations do appear quite weak and slight.
6f.Repeat with a robust STL decomposition. Does it make much difference?
bricksq_robust_stl<-stl(bricksq,t.window=12, s.window="periodic", robust=TRUE)
bricksq_stl1<-seasadj(bricksq_robust_stl)
bricksq_stl2<-naive(bricksq_stl1)
fc_bricksq_robust_stl<-forecast(bricksq_stl2)
fc_bricksq_robust_stl
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 471.7973 442.1625 501.4322 426.4748 517.1199
## 1995 Q1 471.7973 429.8874 513.7073 407.7015 535.8931
## 1995 Q2 471.7973 420.4683 523.1264 393.2963 550.2983
## 1995 Q3 471.7973 412.5277 531.0670 381.1522 562.4425
## 1995 Q4 471.7973 405.5318 538.0628 370.4530 573.1417
## 1996 Q1 471.7973 399.2071 544.3876 360.7802 582.8145
## 1996 Q2 471.7973 393.3909 550.2037 351.8851 591.7096
## 1996 Q3 471.7973 387.9774 555.6173 343.6058 599.9889
## 1996 Q4 471.7973 382.8928 560.7018 335.8296 607.7650
## 1997 Q1 471.7973 378.0838 565.5109 328.4748 615.1199
checkresiduals(fc_bricksq_robust_stl$residuals)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
We do not see any obvious difference with using or not using robust.
6g.Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?
bricksq_testing = window(bricksq, start=c(1992,2), end=c(1994,4))# create two yr testing data
## Warning in window.default(x, ...): 'end' value not changed
bricksq_testing
## Qtr1 Qtr2 Qtr3 Qtr4
## 1992 413 451 420
## 1993 394 462 476 443
## 1994 421 472 494
bricksq_snaive = snaive(bricksq_testing)
bricksq_stlf<-stlf(bricksq_testing)
fc_bricksq_snaive<-forecast(bricksq_snaive,h=8)
fc_bricksq_stlf<-forecast(bricksq_stlf,h=8)
fc_bricksq_snaive
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 443 407.1013 478.8987 388.0977 497.9023
## 1995 Q1 421 385.1013 456.8987 366.0977 475.9023
## 1995 Q2 472 436.1013 507.8987 417.0977 526.9023
## 1995 Q3 494 458.1013 529.8987 439.0977 548.9023
## 1995 Q4 443 392.2316 493.7684 365.3564 520.6436
## 1996 Q1 421 370.2316 471.7684 343.3564 498.6436
## 1996 Q2 472 421.2316 522.7684 394.3564 549.6436
## 1996 Q3 494 443.2316 544.7684 416.3564 571.6436
fc_bricksq_stlf
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 455.5662 442.0111 469.1213 434.8355 476.2969
## 1995 Q1 424.9487 405.7799 444.1175 395.6325 454.2649
## 1995 Q2 476.0621 452.5856 499.5386 440.1578 511.9663
## 1995 Q3 493.9996 466.8915 521.1077 452.5413 535.4579
## 1995 Q4 455.5662 425.2585 485.8739 409.2146 501.9178
## 1996 Q1 424.9487 391.7484 458.1490 374.1733 475.7241
## 1996 Q2 476.0621 440.2018 511.9224 421.2185 530.9057
## 1996 Q3 493.9996 455.6634 532.3358 435.3694 552.6298
In this particular testing set, stlf() appears to support more accuracy than snavie()
7.Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.
writing
## Jan Feb Mar Apr May Jun Jul Aug
## 1968 562.674 599.000 668.516 597.798 579.889 668.233 499.232 215.187
## 1969 634.712 639.283 712.182 621.557 621.000 675.989 501.322 220.286
## 1970 646.783 658.442 712.906 687.714 723.916 707.183 629.000 237.530
## 1971 676.155 748.183 810.681 729.363 701.108 790.079 594.621 230.716
## 1972 747.636 773.392 813.788 766.713 728.875 749.197 680.954 241.424
## 1973 795.337 788.421 889.968 797.393 751.000 821.255 691.605 290.655
## 1974 843.038 847.000 941.952 804.309 840.307 871.528 656.330 370.508
## 1975 778.139 856.075 938.833 813.023 783.417 828.110 657.311 310.032
## 1976 895.217 856.075 893.268 875.000 835.088 934.595 832.500 300.000
## 1977 875.024 992.968 976.804 968.697 871.675 1006.852 832.037 345.587
## Sep Oct Nov Dec
## 1968 555.813 586.935 546.136 571.111
## 1969 560.727 602.530 626.379 605.508
## 1970 613.296 730.444 734.925 651.812
## 1971 617.189 691.389 701.067 705.777
## 1972 680.234 708.326 694.238 772.071
## 1973 727.147 868.355 812.390 799.556
## 1974 742.000 847.152 731.675 898.527
## 1975 780.000 860.000 780.000 807.993
## 1976 791.443 900.000 781.729 880.000
## 1977 849.528 913.871 868.746 993.733
autoplot(writing)
boxplot(writing)
hist(writing)
writing_stlf = stlf(writing, s.window = "periodic", method="naive")
autoplot(writing_stlf)
8.Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.
fancy
## Jan Feb Mar Apr May Jun Jul
## 1987 1664.81 2397.53 2840.71 3547.29 3752.96 3714.74 4349.61
## 1988 2499.81 5198.24 7225.14 4806.03 5900.88 4951.34 6179.12
## 1989 4717.02 5702.63 9957.58 5304.78 6492.43 6630.80 7349.62
## 1990 5921.10 5814.58 12421.25 6369.77 7609.12 7224.75 8121.22
## 1991 4826.64 6470.23 9638.77 8821.17 8722.37 10209.48 11276.55
## 1992 7615.03 9849.69 14558.40 11587.33 9332.56 13082.09 16732.78
## 1993 10243.24 11266.88 21826.84 17357.33 15997.79 18601.53 26155.15
## Aug Sep Oct Nov Dec
## 1987 3566.34 5021.82 6423.48 7600.60 19756.21
## 1988 4752.15 5496.43 5835.10 12600.08 28541.72
## 1989 8176.62 8573.17 9690.50 15151.84 34061.01
## 1990 7979.25 8093.06 8476.70 17914.66 30114.41
## 1991 12552.22 11637.39 13606.89 21822.11 45060.69
## 1992 19888.61 23933.38 25391.35 36024.80 80721.71
## 1993 28586.52 30505.41 30821.33 46634.38 104660.67
autoplot(fancy)
boxplot(fancy)
hist(fancy)
#without boxcox transformation
fancy_stlf = stlf(fancy, s.window = "periodic", method="rwdrift")
autoplot(fancy_stlf)
#with boxcox transformation
fancy_stlf2 = stlf(fancy, s.window = "periodic",lambda = BoxCox.lambda(fancy), method="rwdrift")
autoplot(fancy_stlf2)