Question 1

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

retaildata <- read_excel("retail.xlsx", skip=1)

#Select one of the time series
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))

#Decompose using X11
fit <- seas(myts, x11 = "")
autoplot(fit) + ggtitle("X11 decomposition of Australian retail data")

According to the time plot, the retail data has a clear upward trend and strong seasonality. The seasonality is increasing over time.There is no outliers observed. However, after the X11 decomposition, the seasonal component is decreasing as the trend increases.This observation is different from that in the time plot. Therefore, this is an unusual feature.

Question 2

This exercise uses the cangas data (monthly Canadian gas production in billions of cubic meters, January 1960 - February 2005).

autoplot(cangas) + xlab("Month") + ylab("Gas Production (billions of cubic metres)") + ggtitle("Canadian Gas Production (Jan 1960 - Feb 2005)")

ggsubseriesplot(cangas, main = NULL) + xlab("Month") + ylab("Gas Production (billions of cubic metres)")

ggseasonplot(cangas, main = NULL) + xlab("Month") + ylab("Gas Production (billions of cubic metres)")

According to the time plot, the cangas data has a clear increasing trend and strong seasonality. This observation is verified in the subseries plot and the seasonal plot as well. In general, the gas production decreases in summer and increases in winter.The seasonality increases dramatically from 1975 to 1990.This results from the larger differences in gas production between summer and winter in those years, shown as the blue and green lines in the seasonal plot.

fit_stl <- stl(cangas, s.window = 13, robust = TRUE)
autoplot(fit_stl) + xlab("Month") + ggtitle("STL Decomposition of Canadian Gas Production")

autoplot(cangas, series="Data") +
  autolayer(trendcycle(fit_stl), series="Trend") +
  autolayer(seasadj(fit_stl), series="Seasonally Adjusted") +
  xlab("Year") + ylab("New orders index") +
  ggtitle("STL Decomposition of Canadian Gas Production") +
  scale_colour_manual(values=c("gray","blue","red"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

The results of the STL decomposition are shown above. The trend component adequately represent the trend in the original data. The seasonal component increases from 1975 to 1985, and then decreases. This observation is consistent with the time plot. Besides, the remainder component is around zero.

fit_seats <- seas(cangas)

p1 <- autoplot(fit_seats) + xlab("Month") + ggtitle("SEATS Decomposition of Canadian Gas Production")

p2 <- autoplot(cangas, series="Data") +
  autolayer(trendcycle(fit_seats), series="Trend") +
  autolayer(seasadj(fit_seats), series="Seasonally Adjusted") +
  xlab("Year") + ylab("New orders index") +
  ggtitle("SEATS Decomposition of Canadian Gas Production") +
  scale_colour_manual(values=c("gray","blue","red"),
             breaks=c("Data","Seasonally Adjusted","Trend"))


fit_x11 <-seas(cangas, x11 = "")

p3 <- autoplot(fit_x11) + xlab("Month") + ggtitle("X11 Decomposition of Canadian Gas Production")

p4 <- autoplot(cangas, series="Data") +
  autolayer(trendcycle(fit_x11), series="Trend") +
  autolayer(seasadj(fit_x11), series="Seasonally Adjusted") +
  xlab("Year") + ylab("New orders index") +
  ggtitle("X11 Decomposition of Canadian Gas Production") +
  scale_colour_manual(values=c("gray","blue","red"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

grid.arrange(p1, p2, p3, p4, ncol = 2)

The results of SEATS and X11 decomposition are shown above. The decomposed trend and seasonal components are similar. The changes of seasonality are different from the original data. This is because the seas() function assumes the multiplicative decomposition; while the stl() function assumes the additive decomposition. The differences of seasonally adjusted time series are very minimal between these two method.

In addition, the remainder component of the SEATS decomposition is larger than that of the X11 decomposition, where both remainders are around one. The remainder component of the STL decomposition is smaller. Thus, we can conclude that the STL decomposition fits the cangas data better.

Question 3

This exercise uses the oil data (Annual Oil Production of Saudi Arabia 1965-2013).

autoplot(oil) + xlab("Year") + ylab("Oil Production (millions of tonnes)") + ggtitle("Oil Production in Saudi Arabia (1965-2013)")

According to the time plot, this time series does not have a seasonal pattern. Therefore, the decomposition results will contain a trend-cycle component and a remainder component. In this case, the additive decomposition is appropriate. Advanced decomposition methods, such as X11, SEATS and STL, cannot be used.

autoplot(oil, series = "Data") +
  autolayer(ma(oil, 5), series = "5-MA") +
  xlab("Year") + ylab("Oil Production (millions of tonnes)") +
  ggtitle("Oil Production in Saudi Arabia (1965-2013)") +
  scale_colour_manual(values=c("black","blue"),
             breaks=c("Data","5-MA"))
## Warning: Removed 4 rows containing missing values (geom_path).

The result of moving average smoothing is shown above. The 5-MA has estimated the trend well.

Question 4

The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

autoplot(plastics) + xlab("Month") + ylab("Sales") + ggtitle("Sales of Plastic Product")

p1 <- ggAcf(window(plastics), main = NULL)
p2 <- ggsubseriesplot(plastics, main = NULL) + ylab("sales")

grid.arrange(p1, p2, ncol = 2, top = "Sales of Plastic Product")

According to the time plot, the time series has an increasing trend with strong seasonality over time.The scallop shape in the correlogram also confirms the existence of strong seasonality with the frequency of 12 months. In the subseries plot, the sales increases since February, reaches the peak in August and September, and then decreases. In addition, sales from January to August has a clear increasing trend over time; however, sales from September to December start decreasing since the fifth month.

plastics %>%
  decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") + ylab("Sales") + ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product")

decompose_plastics <- decompose(plastics, type="multiplicative")
adjust_plastics <- plastics/decompose_plastics$seasonal

autoplot(plastics, series = "Data") + ylab("Sales") + xlab("Month") +
  autolayer(adjust_plastics, series = "Seasonally Adjusted") +
  ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product") +
  scale_color_manual(values = c("Data" = "black", "Seasonally Adjusted" = "blue"))

The seasonally adjusted time series has a good representation of the trend in the original data.

plastics2 <- plastics

#Make the 30th observation to be an outlier (in the middle of time series)
plastics2[30] <- plastics2[30]+500

autoplot(plastics2) + xlab("Month") + ylab("sales") +
  ggtitle("Sales of Plastic Products with Outlier")

decompose_plastics2 <- decompose(plastics2, type="multiplicative")
adjust_plastics2 <- plastics2/decompose_plastics2$seasonal

plastics2 %>%
  decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") + ylab("Sales") + ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product with Outlier")

autoplot(plastics2, series = "Data") + 
  autolayer(adjust_plastics2, series = "Seasonally Adjusted") + 
  xlab("Month") + ylab("Sales") +
  ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product with Outlier") +
  scale_color_manual(values = c("Data" = "black", "Seasonally Adjusted" = "blue"))

According to the decomposition results, the outlier has a significant impact on the seasonal component, but does not influence the trend component.The seasonally adjusted time series has a dramatic change, resulting from the outlier. This is attributed to the large remainder of the outlier.

plastics3 <- plastics

plastics3[60] <- plastics3[60]+500

autoplot(plastics3)

decompose_plastics3 <- decompose(plastics3, type="multiplicative")
adjust_plastics3 <- plastics3/decompose_plastics3$seasonal

plastics3 %>%
  decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") + ylab("Sales") + ggtitle("Classical Multiplicative Decomposition of Sales of Plastic Product with Outlier at the end")

autoplot(plastics3, series = "Data") + 
  autolayer(adjust_plastics3, series = "Seasonally Adjusted") + 
  scale_color_manual(values = c("Data" = "black", "Seasonally Adjusted" = "blue"))

According to the decomposition results, the outlier at the end does not influence the seasonal component as much as the outlier in the middle. However, the trend component is smoother. In addition, the end point of the seasonally adjusted time series is even larger than the original data.

In conclusion, the outliers in the middle of time series will have a significant impact on the seasonal component. The outliers at the end will result in a smoother trend.

Question 5

This exercise uses the stfl() function. (A short-cut approach to decompose a time series using STL, forecast the seasonally adjusted series, and return the reseasonalised forecasts. The stlf() function uses mstl() to carry out the decomposition, so there are default values for s.window and t.window. Note: Forecasts using seasonally adjusted data need to be “reseasonalised” by adding in the forecasts of the seasonal component. The stfl() function will return “reseasonalised” data.)

p1 <- autoplot(fancy, main = NULL) + xlab("Month") + ylab("Sales") 
p2 <- ggsubseriesplot(fancy, main = NULL) + xlab("Month") + ylab("Sales")

grid.arrange(p1, p2, ncol = 2, top = "Sales for a Souvenir Shop")

According to the time plot, there is a clear upward trend and strong seasonality. Increasing seasonality is observed as trend increases. As shown in the subseries plot, the sales increase significantly from November to December, because of the holiday season. Besides, sales in March is higher, because of the surfing event.

fit <- stlf(fancy, t.window = 13, s.window = "periodic", robust = TRUE, method = "rwdrift")

p1 <- autoplot(fit, main = "Forecast without Box-Cox Transformation") + xlab("Month") + ylab("Sales")


fit2 <- stlf(fancy, t.window = 13, s.window = "periodic", robust = TRUE,lambda = BoxCox.lambda(fancy) )

p2 <- autoplot(fit2, main = "Forecast with Box-Cox Transformation") + xlab("Month") + ylab("Sales")

grid.arrange(p1, p2, ncol = 2)

Since the original time series have both trend and seasonality, the Drift method will fit better than the Naive method. The results of forecasts with and without Box-Cox transformation are shown above. The forecast with transformation is more reasonable than the one without transformation. This is because the original time series has increasing seasonality over time. What’s more, the forecast with Box-Cox transformation has smaller prediction intervals. Conclusively, forecasting with drift method and Box-Cox transformation fits this time series better.

p1 <- autoplot(writing, main = NULL) + xlab("Month") + ylab("Sales") 
p2 <- ggsubseriesplot(writing, main = NULL) + xlab("Month") + ylab("Sales")

grid.arrange(p1, p2, ncol = 2, top = "Sales of Printing and Writing Paper (Jan 1963 - Dec 1972)")

According to the time plot, the writing data has an increasing trend over time with strong seasonality. The seasonality is varying over time. The sales of paper drops dramatically in every Aug.

fit <- stlf(writing, t.window = 13, s.window = "periodic", robust = TRUE, method = "rwdrift")

p1 <- autoplot(fit, main = "Forecast without Box-Cox Transformation") + xlab("Month") + ylab("Sales")


fit2 <- stlf(writing, t.window = 13, s.window = "periodic", robust = TRUE,lambda = BoxCox.lambda(fancy) )

p2 <- autoplot(fit2, main = "Forecast with Box-Cox Transformation") + xlab("Month") + ylab("Sales")

grid.arrange(p1, p2, ncol = 2)

Since the original time series have both trend and seasonality, the Drift method will fit better than the Naive method. The results of forecasts with and without Box-Cox transformation are shown above. The forecast with transformation is more reasonable than the one without transformation. This is because the seasonality varies over time in the original data. What’s more, the forecast with Box-Cox transformation has smaller prediction intervals. Conclusively, forecasting with drift method and Box-Cox transformation fits this time series better.