library(fpp2)
library(ggplot2)

Homework-2 Exercises

3.1 Box-Cox Transformations

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

  • usnetelec
  • usgdp
  • mcopper
  • enplanements
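For reference, the Box-Cox family that `BoxCox()` applies is \(w_t = \log(y_t)\) when \(\lambda = 0\) and \(w_t = (y_t^\lambda - 1)/\lambda\) otherwise. The base-R sketch below writes out that formula and its inverse, assuming strictly positive data. `box_cox()` and `inv_box_cox()` are hypothetical helper names, not functions from fpp2 or forecast; the package's own versions are `BoxCox()` and `InvBoxCox()`.

```r
# Hypothetical helpers illustrating the Box-Cox formula (assumes y > 0)
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation: maps transformed values back to the original scale
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 100, 1000)
w <- box_cox(y, 0.5)
all.equal(inv_box_cox(w, 0.5), y)   # the round trip recovers the data
```

On a positive series such as `usnetelec`, `as.numeric(box_cox(usnetelec, 0.5))` should agree with `as.numeric(BoxCox(usnetelec, 0.5))`. The `as.numeric()` calls drop the `ts` attributes before comparing.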

a. Box-Cox for usnetelec

autoplot(usnetelec)

(lambda <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
autoplot(BoxCox(usnetelec, lambda))

  • Even though the suggested lambda is 0.5167714, the transformed series looks essentially the same as the original, so the transformation does not visibly reduce the variation. This is not surprising, because the original series shows no evident increase or decrease in variation over time. Therefore, no Box-Cox transformation is required for this series.
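`BoxCox.lambda()` uses Guerrero's method by default. As an illustrative robustness check that is not part of the exercise, the sketch below compares that estimate with the alternative `method = "loglik"`. According to the forecast documentation, this alternative maximises the profile log-likelihood of a linear model fitted to the series. Comparing the two values gives a rough sense of how firmly the data pin down \(\lambda\).

```r
library(fpp2)

# Default estimator (Guerrero's method), as used above
lambda_guerrero <- BoxCox.lambda(usnetelec, method = "guerrero")
# Alternative: maximise the profile log-likelihood of a linear trend model
lambda_loglik <- BoxCox.lambda(usnetelec, method = "loglik")

c(guerrero = lambda_guerrero, loglik = lambda_loglik)
```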

b. Box-Cox for usgdp

autoplot(usgdp)

(lambda <- BoxCox.lambda(usgdp))
## [1] 0.366352
autoplot(BoxCox(usgdp, lambda))

  • In this example, the Box-Cox transformation with lambda = 0.366352 is helpful. It removes the curvature in the original series, so the transformed data can be described by a straight-line (linear) trend model.
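One crude way to check that the transformation removes the curvature is to fit a straight-line trend to both the raw and the transformed series and compare the R² values. The sketch below uses base `lm()` with a simple time index. `trend_index` is a name introduced here for illustration. A residual plot of each fit would be a more thorough check.

```r
library(fpp2)

lambda <- BoxCox.lambda(usgdp)
trend_index <- seq_along(usgdp)   # 1, 2, ..., n

# Straight-line trend fitted to the raw and to the Box-Cox transformed series
fit_raw <- lm(as.numeric(usgdp) ~ trend_index)
fit_bc  <- lm(as.numeric(BoxCox(usgdp, lambda)) ~ trend_index)

# Share of variation explained by a straight line on each scale
r2 <- c(raw = summary(fit_raw)$r.squared, box_cox = summary(fit_bc)$r.squared)
r2
```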

c. Box-Cox for mcopper

autoplot(mcopper)

(lambda <- BoxCox.lambda(mcopper))
## [1] 0.1919047
autoplot(BoxCox(mcopper, lambda))

  • For this series, in my opinion, the Box-Cox transformation (lambda = 0.1919047) did not make any noticeable improvement.

d. Box-Cox for enplanements

autoplot(enplanements)

(lambda <- BoxCox.lambda(enplanements))
## [1] -0.2269461
autoplot(BoxCox(enplanements, lambda))

  • The Box-Cox transformation slightly stabilizes the highs and lows of the seasonal pattern, so, in my opinion, it has some variance-reducing effect on the data.
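One rough way to put a number on the "slight improvement" is to check whether the size of the seasonal swings grows with the level of the series. The sketch below splits `enplanements` into consecutive 12-month blocks with `aggregate()` and computes each block's mean and range. It then correlates the two, before and after the transformation. A correlation closer to zero after transforming would suggest that the swings no longer scale with the level. This block-range measure is an assumption of the sketch, not a standard test, and `swing()` is a hypothetical helper.

```r
library(fpp2)

# Hypothetical helper: size of the seasonal swing (max - min) in each
# consecutive 12-month block
swing <- function(x) aggregate(x, nfrequency = 1, FUN = function(v) diff(range(v)))

lambda <- BoxCox.lambda(enplanements)
block_mean     <- aggregate(enplanements, nfrequency = 1, FUN = mean)
block_range    <- swing(enplanements)
block_range_bc <- swing(BoxCox(enplanements, lambda))

# Correlation between block level and swing size: values near 1 mean the
# swings grow with the level of the series
level_cor <- c(
  raw     = cor(as.numeric(block_mean), as.numeric(block_range)),
  box_cox = cor(as.numeric(block_mean), as.numeric(block_range_bc))
)
level_cor
```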

3.2 cangas data

Why is a Box-Cox transformation unhelpful for the cangas data?

autoplot(cangas)

(lambda <- BoxCox.lambda(cangas))
## [1] 0.5767759
autoplot(BoxCox(cangas, lambda))

  • For the cangas data, a Box-Cox transformation might be useful if the data were split into three regions and each region were transformed separately. For the series as a whole, the transformation is not useful, because the middle portion of the data varies much more wildly than the lower and upper portions. In other words, the variation does not increase or decrease uniformly with the level of the series. A single Box-Cox transformation can only correct variation that changes steadily with the level.
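The three-region argument can be checked by estimating \(\lambda\) separately on each region and comparing each region's mean level with its spread. The breakpoints below (end of 1974 and end of 1989) are hypothetical and were chosen by eye from the plot. The standard deviation of the 12-month differences serves as an assumed, rough measure of variability. If the middle region turns out to be more variable than the higher-level late region, no single power transformation can equalise all three.

```r
library(fpp2)

# Hypothetical breakpoints, chosen by eye from the plot above
segments <- list(
  early  = window(cangas, end = c(1974, 12)),
  middle = window(cangas, start = c(1975, 1), end = c(1989, 12)),
  late   = window(cangas, start = c(1990, 1))
)

level   <- sapply(segments, mean)                                # average level per region
spread  <- sapply(segments, function(s) sd(diff(s, lag = 12)))  # rough variability
lambdas <- sapply(segments, BoxCox.lambda)                       # lambda estimated per region

rbind(level, spread, lambdas)
```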

3.3 Retail data

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[,"A3349398A"], frequency=12, start=c(1982,4))
autoplot(myts)

(lambda <- BoxCox.lambda(myts))
## [1] 0.1231563
autoplot(BoxCox(myts, lambda))

autoplot(BoxCox(myts, 0))

  • Although the logarithmic transformation (\(\lambda = 0\)) is an improvement, the transformation with the estimated lambda (\(\lambda = 0.1231563\)) is slightly better, because it also straightens the trend more.
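The two candidate transformations are closely related: as \(\lambda \to 0\), \((y^\lambda - 1)/\lambda \to \log y\), so the log is the limiting member of the Box-Cox family. The sketch below uses a few illustrative values rather than the retail data. It measures how far the Box-Cox transform lies from the log for several values of \(\lambda\), including the estimated 0.1231563. At that \(\lambda\) the gap is still noticeable for larger values, so the two transformed series are not simply rescaled copies of each other.

```r
# Illustrative positive values (not the retail data)
y <- c(50, 500, 5000)
lambdas <- c(0.5, 0.1231563, 0.01, 0.001)

# Largest absolute gap between the Box-Cox transform and log(y) for each lambda
gap <- sapply(lambdas, function(l) max(abs((y^l - 1) / l - log(y))))
names(gap) <- lambdas
gap   # shrinks towards 0 as lambda approaches 0
```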

3.8 Retail data: training and test sets

For your retail time series (from Exercise 3 in Section 2.10):

a. Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
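In addition to the plot, the split can be checked programmatically: the training and test parts should be contiguous and should together reproduce the full series. The sketch below uses the built-in `AirPassengers` series as a stand-in, because `retail.xlsx` may not be available. To check the retail split, replace `y` with `myts` and use the dates above.

```r
# AirPassengers (base R) stands in for myts; use your own series and dates
y <- AirPassengers
y.train <- window(y, end = c(1958, 12))
y.test  <- window(y, start = 1959)

stopifnot(
  # together the two parts reproduce the full series, in order
  isTRUE(all.equal(c(as.numeric(y.train), as.numeric(y.test)), as.numeric(y))),
  # the test set starts exactly one period after the training set ends
  isTRUE(all.equal(tsp(y.test)[1] - tsp(y.train)[2], 1 / frequency(y)))
)
```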

c. Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
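To build intuition for what `snaive()` computes: each forecast simply repeats the last observed value from the same season. The sketch below rebuilds the seasonal naïve forecasts by hand and compares them with `snaive()`. `AirPassengers` again stands in for the retail series, and `h = 2 * m` mirrors `snaive()`'s default horizon of two seasonal periods.

```r
library(forecast)

train <- window(AirPassengers, end = c(1958, 12))
m <- frequency(train)   # 12 for monthly data
h <- 2 * m              # snaive's default horizon

# Seasonal naive: yhat[T+h] = y[T+h-m*(k+1)] with k = floor((h-1)/m),
# i.e. the last observed season repeated as often as needed
manual <- rep(tail(as.numeric(train), m), length.out = h)

fc_air <- snaive(train, h = h)
all.equal(as.numeric(fc_air$mean), manual)
```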

d. Compare the accuracy of your forecasts against the actual values stored in myts.test.

fc2 <- meanf(myts.train)               # mean method
fc3 <- rwf(myts.train, drift = TRUE)   # drift method (random walk with drift)
autoplot(myts) +
  autolayer(fc2, series="Mean", PI=FALSE) +
  autolayer(fc3, series="Drift", PI=FALSE) +
  autolayer(fc, series="Seasonal naïve", PI=FALSE) +
  guides(colour=guide_legend(title="Forecast"))
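The three functions use different default horizons. `snaive()` forecasts `2 * frequency(y)` steps (24 months for monthly data), while `meanf()` and `rwf()` default to `h = 10`. As a result, the test-set rows of the accuracy tables cover different stretches of the test period. The sketch below shows a like-for-like comparison that passes `h = length(test)` to every method. `compare_methods()` is a hypothetical helper, and `AirPassengers` stands in for the retail series.

```r
library(forecast)

# Hypothetical helper: fit each benchmark with the same horizon and
# return the test-set RMSE and MAPE of each
compare_methods <- function(train, test) {
  h <- length(test)
  fcs <- list(
    mean   = meanf(train, h = h),
    drift  = rwf(train, h = h, drift = TRUE),
    snaive = snaive(train, h = h)
  )
  t(sapply(fcs, function(f) accuracy(f, test)["Test set", c("RMSE", "MAPE")]))
}

air.train <- window(AirPassengers, end = c(1958, 12))
air.test  <- window(AirPassengers, start = 1959)
compare_methods(air.train, air.test)
```

With the retail data loaded, the equivalent call would be `compare_methods(myts.train, myts.test)`.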

accuracy(fc,myts.test)
##                     ME      RMSE       MAE      MPE     MAPE     MASE
## Training set  73.94114  88.31208  75.13514 6.068915 6.134838 1.000000
## Test set     115.00000 127.92727 115.00000 4.459712 4.459712 1.530576
##                   ACF1 Theil's U
## Training set 0.6312891        NA
## Test set     0.2653013 0.7267171
accuracy(fc2,myts.test)
##                        ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 1.103832e-13  638.0607  543.1396 -32.20817 58.11126  7.228836
## Test set     1.213556e+03 1216.2132 1213.5559  48.35666 48.35666 16.151644
##                   ACF1 Theil's U
## Training set  0.971582        NA
## Test set     -0.223985   10.0224
accuracy(fc3,myts.test)
##                         ME     RMSE       MAE         MPE      MAPE
## Training set  4.527233e-14 111.7486  76.00016  -0.4853011  5.767758
## Test set     -5.263812e+02 532.1483 526.38116 -21.1167167 21.116717
##                  MASE       ACF1 Theil's U
## Training set 1.011513 -0.3205577        NA
## Test set     7.005792 -0.3054077  4.527626
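
  • Based on the plot and the accuracy tables, the seasonal naïve forecast is not very accurate (test-set MASE of 1.53). It is still much better than the mean and drift methods, though. Its test-set RMSE is 127.9, compared with 1216.2 and 532.1, and its test-set MAPE is 4.46%, compared with 48.36% and 21.12%.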
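The test-set accuracy measures can be reproduced by hand. Doing so also explains why the seasonal naïve training-set MASE is exactly 1. For seasonal data, `accuracy()` scales errors by the in-sample MAE of the seasonal naïve method, and the seasonal naïve training residuals are exactly those seasonal differences. The sketch below uses `AirPassengers` as a stand-in and computes RMSE, MAE, MAPE and MASE on the test set.

```r
library(forecast)

train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = 1959)
fc_air <- snaive(train, h = length(test))

e <- as.numeric(test) - as.numeric(fc_air$mean)   # test-set errors
m <- frequency(train)

# MASE scale: in-sample MAE of the seasonal naive method on the training data
mase_scale <- mean(abs(diff(as.numeric(train), lag = m)))

manual <- c(
  RMSE = sqrt(mean(e^2)),
  MAE  = mean(abs(e)),
  MAPE = 100 * mean(abs(e / as.numeric(test))),
  MASE = mean(abs(e)) / mase_scale
)
manual
accuracy(fc_air, test)["Test set", names(manual)]
```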

e. Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 671.41, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

  • No. Based on the graphs above, the residuals appear to be nearly normally distributed, with a slight right skew. The forecasts also appear to be biased, because the mean of the residuals is clearly not zero (the training-set ME is 73.94). In addition, the ACF plot and the Ljung-Box test show that the residuals are significantly correlated: Q* = 671.41 on 24 degrees of freedom, with a p-value below 2.2e-16.
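The Ljung-Box test reported by `checkresiduals()` can also be run directly with base R's `Box.test()`. The sketch below again uses the `AirPassengers` stand-in. It drops the first season of residuals, which are `NA` for the seasonal naïve method, and reports the residual mean as a quick bias check. It then runs the test on 24 lags with `fitdf = 0`, matching the "Model df: 0. Total lags used: 24" line above.

```r
library(forecast)

train <- window(AirPassengers, end = c(1958, 12))
res <- na.omit(residuals(snaive(train)))   # the first m residuals are NA

mean(res)   # a mean clearly away from zero suggests biased forecasts

# Ljung-Box test on 24 lags, no estimated parameters
Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
```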

f. How sensitive are the accuracy measures to the training/test split?

  • The test set is taken from the end of the series, and the series shows strong growth in both trend and variability. The accuracy measures are therefore highly sensitive to where the training/test split is placed, because moving the split changes the level and spread of the data being forecast. Based on the accuracy tables, the scale-dependent measures (ME, RMSE, MAE) are more affected than the percentage-based measures (MPE, MAPE).
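One way to quantify this sensitivity is to move the split point and recompute the test-set accuracy. The sketch below uses `AirPassengers` as a stand-in for the retail data, with arbitrarily chosen split years. For each year, it trains up to December and evaluates the seasonal naïve forecasts on the remaining observations. The test set also gets shorter as the split moves later, which is itself part of the sensitivity.

```r
library(forecast)

y <- AirPassengers    # stand-in for myts
split_years <- 1955:1958

# For each candidate split, train up to December of that year, forecast the
# remainder with the seasonal naive method and record test-set accuracy
sensitivity <- t(sapply(split_years, function(yr) {
  train <- window(y, end = c(yr, 12))
  test  <- window(y, start = yr + 1)
  fc    <- snaive(train, h = length(test))
  accuracy(fc, test)["Test set", c("RMSE", "MAE", "MAPE")]
}))
rownames(sensitivity) <- paste("train to", split_years)
sensitivity
```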