Hanyue Kuang
2.The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years. 2a.Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?
plastics
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
plot(plastics)
there is an upward trend and a clear seasonal pattern.
2b.Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.
plastics %>% decompose(type="multiplicative") %>%
autoplot() + xlab("Year") +
ggtitle("Classical multiplicative decomposition
of plastics")
2c.Do the results support the graphical interpretation from part a?
yes
2d.Compute and plot the seasonally adjusted data.
plastic.decomp<-decompose(plastics, type="multiplicative")
plastic.seas_adj<-plastic.decomp$x/plastic.decomp$seasonal
plot(plastic.seas_adj)
2e.Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
plastics2<-plastics
plastics2[5]<-plastics[5]+500
plastics2
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1530 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
plastic.decomp2 <- decompose(plastics2, type="multiplicative")
plastic.seas_adj2 <-plastic.decomp2$x/plastic.decomp$seasonal
plot(plastic.seas_adj2)
the outlier makes a sharp change
2f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?
plastics3<-plastics
plastics3[25]<-plastics[25]+500
plastics3
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 1396 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
plastic.decomp3 <- decompose(plastics3, type="multiplicative")
plastic.seas_adj3 <-plastic.decomp3$x/plastic.decomp$seasonal
plot(plastic.seas_adj3)
plastics4<-plastics
plastics4[55]<-plastics[55]+500
plastics4
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 2111 1608 1528 1420 1119 1013
plastic.decomp4 <- decompose(plastics4, type="multiplicative")
plastic.seas_adj4 <-plastic.decomp4$x/plastic.decomp$seasonal
plot(plastic.seas_adj4)
If the outlier is at the beginning of the time sequence, it will not largely influence the adjusted seasonality in general.
If the outlier is in the middle or the last, the whole seasonality will shrink and trend feature dim.
4.a The trend is nice and smoothly upward shaped. The seasonal pattern does not show observable exportional variables.seasonality has minimal effect on the original plot
4.b Yes. There is a obvious drop of 400 at that period in the plot of remainders.
5 This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 - February 2005). 5a.Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?
autoplot(cangas)
ggsubseriesplot(cangas)
ggseasonplot(cangas)
respectively in month, the gas production increases.
5b.Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.
cangas %>%
stl(t.window=NULL, s.window="periodic", robust=TRUE) %>%
autoplot()
cangas %>%
stl(t.window=NULL, s.window=100, robust=TRUE) %>%
autoplot()
cangas %>%
stl(t.window=NULL, s.window=7, robust=TRUE) %>%
autoplot()
s.window is the number of consecutive years to be used in estimating each value in the seasonal component.
If s.window is larger, the more data we will use and the shape of seasonality is more similar to "periodic".
If s.window is small, the less data we will use and thus more irregular shape.
5c.Compare the results with those obtained using SEATS and X11. How are they different?
cangas %>% seas() %>%
autoplot() +
ggtitle("SEATS decomposition of Canadian Gas Production")
cangas %>% seas(x11="") %>%
autoplot() +
ggtitle("X11 decomposition of Canadian Gas Production")
the seasonality of both are not expected shaped. X11 performs better on trend and remainders
6 We will use the bricksq data (Australian quarterly clay brick production. 1956-1994) for this exercise. 6a.Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)
bricksq %>%
stl(t.window=30, s.window="periodic", robust=TRUE) %>%
autoplot()
bricksq %>%
stl(t.window=30, s.window=7, robust=TRUE) %>%
autoplot()
bricksq %>%
stl(t.window=30, s.window=50, robust=TRUE) %>%
autoplot()
6b.Compute and plot the seasonally adjusted data.
bricksq %>%
stl(t.window=30, s.window="periodic", robust=FALSE) %>% seasadj() %>%
autoplot()
6c.Use a naïve method to produce forecasts of the seasonally adjusted data.
ft_stl <- stl(bricksq, t.window=30, s.window="periodic", robust=FALSE)
ft_stl %>% seasadj() %>% naive() %>% autoplot() + ylab("New orders index") +
ggtitle("Naive forecasts of seasonally adjusted data")
6d.Use stlf() to reseasonalise the results, giving forecasts for the original data.
ft_stlf <- stlf(bricksq, method='naive')
forecast(ft_stlf,h=8)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 465.7338 440.1878 491.2798 426.6645 504.8030
## 1995 Q1 424.6739 388.5464 460.8014 369.4217 479.9261
## 1995 Q2 483.9926 439.7456 528.2395 416.3227 551.6624
## 1995 Q3 494.0000 442.9080 545.0920 415.8616 572.1384
## 1995 Q4 465.7338 408.6112 522.8563 378.3723 553.0952
## 1996 Q1 424.6739 362.0993 487.2485 328.9742 520.3736
## 1996 Q2 483.9926 416.4042 551.5809 380.6251 587.3600
## 1996 Q3 494.0000 421.7450 566.2550 383.4956 604.5044
6e.Do the residuals look uncorrelated?
checkresiduals(ft_stlf)
## Warning in checkresiduals(ft_stlf): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + Random walk
## Q* = 40.829, df = 8, p-value = 2.244e-06
##
## Model df: 0. Total lags used: 8
at the quater of 4 and 8, there does show slight correlation. Howevere, this is acceptable for a naive forecast.
6f.Repeat with a robust STL decomposition. Does it make much difference?
ft_stl2 <- stl(bricksq, t.window=30, s.window="periodic", robust=TRUE)
ft_stl2 %>% seasadj() %>% naive() %>% autoplot() + ylab("New orders index") +
ggtitle("Naive forecasts of seasonally adjusted data,r")
forecast(ft_stl2, h=8)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 470.9904 436.6159 505.3649 418.4191 523.5617
## 1995 Q1 437.2186 389.5246 484.9127 364.2768 510.1605
## 1995 Q2 479.5628 421.5014 537.6241 390.7656 568.3600
## 1995 Q3 493.6721 426.8080 560.5362 391.4123 595.9319
## 1995 Q4 470.9904 396.3326 545.6482 356.8111 585.1696
## 1996 Q1 437.2186 355.4869 518.9504 312.2208 562.2165
## 1996 Q2 479.5628 391.3036 567.8219 344.5820 614.5435
## 1996 Q3 493.6721 399.3184 588.0258 349.3706 637.9736
robust or not does not significantly influence the result.
6g.Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?
bricksq
## Qtr1 Qtr2 Qtr3 Qtr4
## 1956 189 204 208 197
## 1957 187 214 227 223
## 1958 199 229 249 234
## 1959 208 253 267 255
## 1960 242 268 290 277
## 1961 241 253 265 236
## 1962 229 265 275 258
## 1963 231 263 308 313
## 1964 293 328 349 340
## 1965 309 349 366 340
## 1966 302 350 362 337
## 1967 326 358 359 357
## 1968 341 380 404 409
## 1969 383 417 454 428
## 1970 386 428 434 417
## 1971 385 433 453 436
## 1972 399 461 476 477
## 1973 452 461 534 516
## 1974 478 526 518 417
## 1975 340 437 459 449
## 1976 424 501 540 533
## 1977 457 513 522 478
## 1978 421 487 470 482
## 1979 458 526 573 563
## 1980 513 551 589 564
## 1981 519 581 581 578
## 1982 500 560 512 412
## 1983 303 409 420 413
## 1984 400 469 482 484
## 1985 447 507 533 503
## 1986 443 503 505 443
## 1987 415 485 495 458
## 1988 427 519 555 539
## 1989 511 572 570 526
## 1990 472 524 497 460
## 1991 373 436 424 430
## 1992 387 413 451 420
## 1993 394 462 476 443
## 1994 421 472 494
bricksq.train<-window(bricksq, end=c(1992,3))
bricksq.test<-window(bricksq, start=c(1992,4), end=c(1994,3))
bricksq.train
## Qtr1 Qtr2 Qtr3 Qtr4
## 1956 189 204 208 197
## 1957 187 214 227 223
## 1958 199 229 249 234
## 1959 208 253 267 255
## 1960 242 268 290 277
## 1961 241 253 265 236
## 1962 229 265 275 258
## 1963 231 263 308 313
## 1964 293 328 349 340
## 1965 309 349 366 340
## 1966 302 350 362 337
## 1967 326 358 359 357
## 1968 341 380 404 409
## 1969 383 417 454 428
## 1970 386 428 434 417
## 1971 385 433 453 436
## 1972 399 461 476 477
## 1973 452 461 534 516
## 1974 478 526 518 417
## 1975 340 437 459 449
## 1976 424 501 540 533
## 1977 457 513 522 478
## 1978 421 487 470 482
## 1979 458 526 573 563
## 1980 513 551 589 564
## 1981 519 581 581 578
## 1982 500 560 512 412
## 1983 303 409 420 413
## 1984 400 469 482 484
## 1985 447 507 533 503
## 1986 443 503 505 443
## 1987 415 485 495 458
## 1988 427 519 555 539
## 1989 511 572 570 526
## 1990 472 524 497 460
## 1991 373 436 424 430
## 1992 387 413 451
bricksq.test
## Qtr1 Qtr2 Qtr3 Qtr4
## 1992 420
## 1993 394 462 476 443
## 1994 421 472 494
bricksq_naive<-snaive(bricksq.train)
bricksq_stlf<-stlf(bricksq.train)
bricksq.fc_naive<-forecast(bricksq_naive,h=8)
bricksq.fc_stlf<-forecast(bricksq_stlf,h=8)
autoplot(bricksq, series = "Original data") +
geom_line(size = 1) +
autolayer(bricksq.fc_naive, PI = FALSE, size = 1,
series = "stlf") +
autolayer(bricksq.fc_stlf, PI = FALSE, size = 1,
series = "snaive") +
scale_color_manual(values = c("gray50", "blue", "red"),
breaks = c("Original data", "stlf", "snaive")) +
scale_x_continuous(limits = c(1990, 1994.5)) +
guides(colour = guide_legend(title = "Data")) +
ggtitle("Forecast from stlf and snaive functions") +
annotate(
"rect",
xmin=1992.75,xmax=1994.5,ymin=-Inf,ymax=Inf,
fill="lightgreen",alpha = 0.3
)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## Warning: Removed 136 rows containing missing values (geom_path).
7Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate.Use the lambda argument if you think a Box-Cox transformation is required.
autoplot(writing)
hist(writing)
the data is normal distributed. There is no necessary to transform. So I would not use BoxCox.
stlf_writing <- stlf(writing,
s.window = "periodic",
robust = TRUE,
method = "naive")
autoplot(stlf_writing)
8 Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.
autoplot(fancy)
hist(fancy)
the histgram suggests transformation should be included to improve
stlf_fancy <- stlf(fancy,
s.window = "periodic",
robust = TRUE,
lambda = BoxCox.lambda(fancy),
method = "rwdrift")
autoplot(stlf_fancy)