DSCI 609 Assignment #4 Time Series Decomposition

Question 1

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

Read the data into R and decompose the series using X11. Does it reveal any outliers, or unusual features?

retaildata <- read_excel("retail.xlsx", skip=1)

#Select one of the time series
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))

#Decompose using X11
fit <- seas(myts, x11 = "")
autoplot(fit) + ggtitle("X11 decomposition of Australian retail data")

According to the time plot, the retail data has a clear upward trend and strong seasonality. The seasonality is increasing over time.There is no outliers observed. However, after the X11 decomposition, the seasonal component is decreasing as the trend increases.This observation is different from that in the time plot. Therefore, this is an unusual feature.

Question 2

This exercise uses the cangas data (monthly Canadian gas production in billions of cubic meters, January 1960 - February 2005).

Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?

autoplot(cangas) + xlab("Month") + ylab("Gas Production (billions of cubic metres)") + ggtitle("Canadian Gas Production (Jan 1960 - Feb 2005)")

ggsubseriesplot(cangas, main = NULL) + xlab("Month") + ylab("Gas Production (billions of cubic metres)")

ggseasonplot(cangas, main = NULL) + xlab("Month") + ylab("Gas Production (billions of cubic metres)")

According to the time plot, the cangas data has a clear increasing trend and strong seasonality. This observation is verified in the subseries plot and the seasonal plot as well. In general, the gas production decreases in summer and increases in winter.The seasonality increases dramatically from 1975 to 1990.This results from the larger differences in gas production between summer and winter in those years, shown as the blue and green lines in the seasonal plot.

Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.

fit_stl <- stl(cangas, s.window = 13, robust = TRUE)
autoplot(fit_stl) + xlab("Month") + ggtitle("STL Decomposition of Canadian Gas Production")

autoplot(cangas, series="Data") +
  autolayer(trendcycle(fit_stl), series="Trend") +
  autolayer(seasadj(fit_stl), series="Seasonally Adjusted") +
  xlab("Year") + ylab("New orders index") +
  ggtitle("STL Decomposition of Canadian Gas Production") +
  scale_colour_manual(values=c("gray","blue","red"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

The results of the STL decomposition are shown above. The trend component adequately represent the trend in the original data. The seasonal component increases from 1975 to 1985, and then decreases. This observation is consistent with the time plot. Besides, the remainder component is around zero.

Compare the results with those obtained using SEATS and X11. How are they different?

fit_seats <- seas(cangas)

p1 <- autoplot(fit_seats) + xlab("Month") + ggtitle("SEATS Decomposition of Canadian Gas Production")

p2 <- autoplot(cangas, series="Data") +
  autolayer(trendcycle(fit_seats), series="Trend") +
  autolayer(seasadj(fit_seats), series="Seasonally Adjusted") +
  xlab("Year") + ylab("New orders index") +
  ggtitle("SEATS Decomposition of Canadian Gas Production") +
  scale_colour_manual(values=c("gray","blue","red"),
             breaks=c("Data","Seasonally Adjusted","Trend"))


fit_x11 <-seas(cangas, x11 = "")

p3 <- autoplot(fit_x11) + xlab("Month") + ggtitle("X11 Decomposition of Canadian Gas Production")

p4 <- autoplot(cangas, series="Data") +
  autolayer(trendcycle(fit_x11), series="Trend") +
  autolayer(seasadj(fit_x11), series="Seasonally Adjusted") +
  xlab("Year") + ylab("New orders index") +
  ggtitle("X11 Decomposition of Canadian Gas Production") +
  scale_colour_manual(values=c("gray","blue","red"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

grid.arrange(p1, p2, p3, p4, ncol = 2)

The results of SEATS and X11 decomposition are shown above. The decomposed trend and seasonal components are similar. The changes of seasonality are different from the original data. This is because the seas() function assumes the multiplicative decomposition; while the stl() function assumes the additive decomposition. The differences of seasonally adjusted time series are very minimal between these two method.

In addition, the remainder component of the SEATS decomposition is larger than that of the X11 decomposition, where both remainders are around one. The remainder component of the STL decomposition is smaller. Thus, we can conclude that the STL decomposition fits the cangas data better.

Question 3

This exercise uses the oil data (Annual Oil Production of Saudi Arabia 1965-2013).

Decompose the series using different approaches described in the lecture.
Write a paragraph describing the results of each decomposition.

autoplot(oil) + xlab("Year") + ylab("Oil Production (millions of tonnes)") + ggtitle("Oil Production in Saudi Arabia (1965-2013)")

According to the time plot, this time series does not have a seasonal pattern. Therefore, the decomposition results will contain a trend-cycle component and a remainder component. In this case, the additive decomposition is appropriate. Advanced decomposition methods, such as X11, SEATS and STL, cannot be used.

autoplot(oil, series = "Data") +
  autolayer(ma(oil, 5), series = "5-MA") +
  xlab("Year") + ylab("Oil Production (millions of tonnes)") +
  ggtitle("Oil Production in Saudi Arabia (1965-2013)") +
  scale_colour_manual(values=c("black","blue"),
             breaks=c("Data","5-MA"))

## Warning: Removed 4 rows containing missing values (geom_path).

The result of moving average smoothing is shown above. The 5-MA has estimated the trend well.

Question 4

The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(plastics) + xlab("Month") + ylab("Sales") + ggtitle("Sales of Plastic Product")

p1 <- ggAcf(window(plastics), main = NULL)
p2 <- ggsubseriesplot(plastics, main = NULL) + ylab("sales")

grid.arrange(p1, p2, ncol = 2, top = "Sales of Plastic Product")

According to the time plot, the time series has an increasing trend with strong seasonality over time.The scallop shape in the correlogram also confirms the existence of strong seasonality with the frequency of 12 months. In the subseries plot, the sales increases since February, reaches the peak in August and September, and then decreases. In addition, sales from January to August has a clear increasing trend over time; however, sales from September to December start decreasing since the fifth month.

Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

plastics %>%
  decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") + ylab("Sales") + ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product")

Do the results support the graphical interpretation from part a?
The results of classical decomposition support the graphical interpretation from part a. The seasonal component is consistent as trend increases.
Compute and plot the seasonally adjusted data.

decompose_plastics <- decompose(plastics, type="multiplicative")
adjust_plastics <- plastics/decompose_plastics$seasonal

autoplot(plastics, series = "Data") + ylab("Sales") + xlab("Month") +
  autolayer(adjust_plastics, series = "Seasonally Adjusted") +
  ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product") +
  scale_color_manual(values = c("Data" = "black", "Seasonally Adjusted" = "blue"))

The seasonally adjusted time series has a good representation of the trend in the original data.

Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

plastics2 <- plastics

#Make the 30th observation to be an outlier (in the middle of time series)
plastics2[30] <- plastics2[30]+500

autoplot(plastics2) + xlab("Month") + ylab("sales") +
  ggtitle("Sales of Plastic Products with Outlier")

decompose_plastics2 <- decompose(plastics2, type="multiplicative")
adjust_plastics2 <- plastics2/decompose_plastics2$seasonal

plastics2 %>%
  decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") + ylab("Sales") + ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product with Outlier")

autoplot(plastics2, series = "Data") + 
  autolayer(adjust_plastics2, series = "Seasonally Adjusted") + 
  xlab("Month") + ylab("Sales") +
  ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product with Outlier") +
  scale_color_manual(values = c("Data" = "black", "Seasonally Adjusted" = "blue"))

According to the decomposition results, the outlier has a significant impact on the seasonal component, but does not influence the trend component.The seasonally adjusted time series has a dramatic change, resulting from the outlier. This is attributed to the large remainder of the outlier.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

plastics3 <- plastics

plastics3[60] <- plastics3[60]+500

autoplot(plastics3)

decompose_plastics3 <- decompose(plastics3, type="multiplicative")
adjust_plastics3 <- plastics3/decompose_plastics3$seasonal

plastics3 %>%
  decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") + ylab("Sales") + ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product with Outlier at the end")

autoplot(plastics3, series = "Data") + 
  autolayer(adjust_plastics3, series = "Seasonally Adjusted") + 
  scale_color_manual(values = c("Data" = "black", "Seasonally Adjusted" = "blue"))

According to the decomposition results, the outlier at the end does not influence the seasonal component as much as the outlier in the middle. However, the trend component is smoother. In addition, the end point of the seasonally adjusted time series is even larger than the original data.

In conclusion, the outliers in the middle of time series will have a significant impact on the seasonal component. The outliers at the end will result in a smoother trend.

Question 5

This exercise uses the stfl() function. (A short-cut approach to decompose a time series using STL, forecast the seasonally adjusted series, and return the reseasonalised forecasts. The stlf() function uses mstl() to carry out the decomposition, so there are default values for s.window and t.window. Note: Forecasts using seasonally adjusted data need to be “reseasonalised” by adding in the forecasts of the seasonal component. The stfl() function will return “reseasonalised” data.)

Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

p1 <- autoplot(fancy, main = NULL) + xlab("Month") + ylab("Sales") 
p2 <- ggsubseriesplot(fancy, main = NULL) + xlab("Month") + ylab("Sales")

grid.arrange(p1, p2, ncol = 2, top = "Sales for a Souvenir Shop")

According to the time plot, there is a clear upward trend and strong seasonality. Increasing seasonality is observed as trend increases. As shown in the subseries plot, the sales increase significantly from November to December, because of the holiday season. Besides, sales in March is higher, because of the surfing event.

fit <- stlf(fancy, t.window = 13, s.window = "periodic", robust = TRUE, method = "rwdrift")

p1 <- autoplot(fit, main = "Forecast without Box-Cox Transformation") + xlab("Month") + ylab("Sales")


fit2 <- stlf(fancy, t.window = 13, s.window = "periodic", robust = TRUE,lambda = BoxCox.lambda(fancy) )

p2 <- autoplot(fit2, main = "Forecast with Box-Cox Transformation") + xlab("Month") + ylab("Sales")

grid.arrange(p1, p2, ncol = 2)

Since the original time series have both trend and seasonality, the Drift method will fit better than the Naive method. The results of forecasts with and without Box-Cox transformation are shown above. The forecast with transformation is more reasonable than the one without transformation. This is because the original time series has increasing seasonality over time. What’s more, the forecast with Box-Cox transformation has smaller prediction intervals. Conclusively, forecasting with drift method and Box-Cox transformation fits this time series better.

Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

p1 <- autoplot(writing, main = NULL) + xlab("Month") + ylab("Sales") 
p2 <- ggsubseriesplot(writing, main = NULL) + xlab("Month") + ylab("Sales")

grid.arrange(p1, p2, ncol = 2, top = "Sales of Printing and Writing Paper (Jan 1963 - Dec 1972)")

According to the time plot, the writing data has an increasing trend over time with strong seasonality. The seasonality is varying over time. The sales of paper drops dramatically in every Aug.

fit <- stlf(writing, t.window = 13, s.window = "periodic", robust = TRUE, method = "rwdrift")

p1 <- autoplot(fit, main = "Forecast without Box-Cox Transformation") + xlab("Month") + ylab("Sales")


fit2 <- stlf(writing, t.window = 13, s.window = "periodic", robust = TRUE,lambda = BoxCox.lambda(fancy) )

p2 <- autoplot(fit2, main = "Forecast with Box-Cox Transformation") + xlab("Month") + ylab("Sales")

grid.arrange(p1, p2, ncol = 2)

Since the original time series have both trend and seasonality, the Drift method will fit better than the Naive method. The results of forecasts with and without Box-Cox transformation are shown above. The forecast with transformation is more reasonable than the one without transformation. This is because the seasonality varies over time in the original data. What’s more, the forecast with Box-Cox transformation has smaller prediction intervals. Conclusively, forecasting with drift method and Box-Cox transformation fits this time series better.