questions

Question 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

  • usnetelec
  • usgdp
  • mcopper
  • enplanements

I decided to compare two different methods of calculating the Box-Cox transformation: the first from the forecast package used in the book, and the second from the MASS library. Their methods are not identical, which allows the results to be spot-checked against each other.
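Both functions estimate λ for the same underlying transformation. As a reference for the hand-applied MASS versions below, here is a minimal sketch of the transform itself (box_cox is my own helper name, not a function from either library):

box_cox <- function(y, lambda) {
  # Box-Cox transform: (y^lambda - 1) / lambda, with the log limit at lambda = 0
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
box_cox(100, 0.5)  # (10 - 1) / 0.5 = 18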

usnetelec

The suggested transformations are quite different between the two methods. In fact, the MASS profile includes 1.0 in its range of acceptable transformations, indicating that no transformation may be needed at all. The resulting plots are all very similar, however, with the scale being the major change.

library(fpp2)  # loads forecast, ggplot2, and the textbook data sets
library(dplyr) # mutate() used below

data(usnetelec)
lambda <- BoxCox.lambda(usnetelec)
lambda
## [1] 0.5167714
df <- data.frame(Y=as.matrix(usnetelec), date=time(usnetelec))
MASS::boxcox(df$Y ~ df$date, lambda=seq(0, 2, 0.1))
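Rather than reading the optimum off MASS's profile plot, the numeric maximiser can be extracted, since MASS::boxcox returns its λ grid and log-likelihoods; a small sketch (plotit = FALSE suppresses the plot):

bc <- MASS::boxcox(df$Y ~ df$date, lambda=seq(0, 2, 0.01), plotit=FALSE)
bc$x[which.max(bc$y)]  # lambda maximising the profile log-likelihood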

plots

original
usnetelec %>%
  autoplot()

forecast
usnetelec %>%
  BoxCox(lambda) %>%
  autoplot()

MASS
df %>%
  mutate(Y = (Y^0.9 - 1) / 0.9) %>%  # Box-Cox by hand, lambda ~ 0.9 read off the MASS profile
  ggplot(aes(date, Y)) +
  geom_line()

usgdp

The suggested transformations are very similar for both functions and therefore the resulting plots are very similar in scale.

data(usgdp)
lambda <- BoxCox.lambda(usgdp)
lambda
## [1] 0.366352
df <- data.frame(Y=as.matrix(usgdp), date=time(usgdp))
MASS::boxcox(df$Y ~ df$date, lambda=seq(0, 0.4, 0.1))

plots

original
usgdp %>%
  autoplot()

forecast
usgdp %>%
  BoxCox(lambda) %>%
  autoplot()

MASS
df %>%
  mutate(Y = (Y^0.2 - 1) / 0.2) %>%  # Box-Cox by hand, lambda ~ 0.2 read off the MASS profile
  ggplot(aes(date, Y)) +
  geom_line()

mcopper

Once again, the two functions are in close agreement with each other.

data(mcopper)
lambda <- BoxCox.lambda(mcopper)
lambda
## [1] 0.1919047
df <- data.frame(Y=as.matrix(mcopper), date=time(mcopper))
MASS::boxcox(df$Y ~ df$date, lambda=seq(-0.3, 0.3, 0.1))

plots

original
mcopper %>%
  autoplot()

forecast
mcopper %>%
  BoxCox(lambda) %>%
  autoplot()

MASS
df %>%
  mutate(Y = (Y^0.12 - 1) / 0.12) %>%  # Box-Cox by hand, lambda ~ 0.12 read off the MASS profile
  ggplot(aes(date, Y)) +
  geom_line()

enplanements

In this case the two functions suggest very different transformations (λ ≈ −0.23 from forecast versus λ ≈ 0.5 from MASS). Interestingly, both appear to be an improvement over the original data, which shows large variance; each is somewhat successful in stabilizing the variation.

data(enplanements)
lambda <- BoxCox.lambda(enplanements)
lambda
## [1] -0.2269461
df <- data.frame(Y=as.matrix(enplanements), date=time(enplanements))
MASS::boxcox(df$Y ~ df$date, lambda=seq(0, 1, 0.1))

plots

original
enplanements %>%
  autoplot()

forecast
enplanements %>%
  BoxCox(lambda) %>%
  autoplot()

MASS
df %>%
  mutate(Y = (Y^0.5 - 1) / 0.5) %>%  # Box-Cox by hand, lambda ~ 0.5 read off the MASS profile
  ggplot(aes(date, Y)) +
  geom_line()
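To compare the two suggestions directly, the transformed series can be drawn side by side; a sketch, assuming the gridExtra package is available:

library(gridExtra)
p1 <- autoplot(BoxCox(enplanements, lambda))  # forecast: lambda ~ -0.23
p2 <- df %>%
  mutate(Y = (Y^0.5 - 1) / 0.5) %>%           # MASS: lambda ~ 0.5
  ggplot(aes(date, Y)) +
  geom_line()
grid.arrange(p1, p2, ncol=2)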

Question 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

Following the suggested Box-Cox transformation does not result in a plot with roughly uniform variance. In fact, the original series' variance does not appear to be mitigated much, if at all, so the transformation has not been helpful in addressing the changing variance over time. This is likely because the variability does not increase steadily with the level of the series: it is largest in the middle of the series and smaller at both ends, so no single power transformation can stabilise it.

data(cangas)
lambda <- BoxCox.lambda(cangas)
lambda
## [1] 0.5767759
autoplot(cangas)

cangas %>%
  BoxCox(lambda) %>%
  autoplot()
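This can be checked by computing the within-year spread directly; a small base-R sketch (variable names are my own):

years <- floor(time(cangas))
sd_by_year <- tapply(cangas, years, sd)  # standard deviation within each calendar year
plot(as.numeric(names(sd_by_year)), sd_by_year, type="l",
     xlab="year", ylab="within-year SD")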

Question 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

I reused code from the last assignment to reimport my selected data.

library(knitr)      # kable()
library(kableExtra) # kable_styling()

retaildata <- readxl::read_excel('./retail.xlsx', skip=1)
myts <- retaildata %>%
  select(A3349335T) %>%
  ts(frequency=12, start=c(1982, 4))
myts %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
A3349335T
303.1
297.8
298.0
307.9
299.2
305.4

The plot very much appears to be in need of a transformation: the variance is increasing over time.

autoplot(myts)

lambda <- BoxCox.lambda(myts)
lambda
## [1] 0.193853
myts %>%
  BoxCox(lambda) %>%
  autoplot()

The Box-Cox transformation has had a pronounced effect on the resulting plot. The variance has been smoothed out and made more consistent, and the transformed series appears more linear. A transformation with λ ≈ 0.19 is recommended for this dataset.
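As an aside, the transformation is exactly invertible via forecast's InvBoxCox, so any forecasts produced on the transformed scale can be mapped back to the original units; a quick sketch:

transformed <- BoxCox(myts, lambda)
back <- InvBoxCox(transformed, lambda)
all.equal(as.numeric(back), as.numeric(myts))  # should be TRUE up to numerical tolerance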

Question 3.8

For your retail time series (from Exercise 3 in Section 2.10):

  1. Split the data into two parts using
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
  2. Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

  3. Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
  4. Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
##                    ME      RMSE       MAE      MPE     MAPE     MASE
## Training set 61.56787  72.20702  61.68438 6.388722 6.404105 1.000000
## Test set     97.44583 109.62545 100.02917 4.629852 4.751209 1.621629
##                   ACF1 Theil's U
## Training set 0.6018274        NA
## Test set     0.2686595 0.9036205
  5. Check the residuals.

Do the residuals appear to be uncorrelated and normally distributed?

No. There is ample evidence that the residuals are both correlated and not normally distributed. The ACF plot shows strong autocorrelation at many lags, and the p-value of the Ljung-Box test indicates that the residuals are distinguishable from a white noise series. Finally, the histogram of residuals is clearly neither normally distributed nor centered near 0.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 812.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
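The same Ljung-Box statistic can be reproduced directly with base R's Box.test, using 24 lags to match the output above:

Box.test(residuals(fc), lag=24, type="Ljung-Box")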
  6. How sensitive are the accuracy measures to the training/test split?

The test-set error is substantially higher than the training error for both MAE and RMSE, indicating that the model is not an appropriate fit for the data. This is supported by the residual plots above, which show that the preconditions for good forecasting have been violated.
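To probe this sensitivity directly, the same snaive forecast can be re-scored over several candidate split years; a quick sketch on the untransformed series (the split years here are arbitrary choices of mine):

for (yr in 2008:2011) {
  train <- window(myts, end=c(yr, 12))
  test  <- window(myts, start=yr + 1)
  fc_yr <- snaive(train, h=length(test))
  cat(yr, "test RMSE:", accuracy(fc_yr, test)["Test set", "RMSE"], "\n")
}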

Since the previous questions demonstrated a successful Box-Cox transformation of the data, I will attempt to create a better forecast using the transformed data.

my_transformed_ts <- BoxCox(myts, lambda)
myts.train <- window(my_transformed_ts, end=c(2010,12))
myts.test <- window(my_transformed_ts, start=2011)

autoplot(my_transformed_ts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

fc <- snaive(myts.train)
accuracy(fc,myts.test)
##                     ME      RMSE       MAE      MPE     MAPE     MASE
## Training set 0.2447664 0.2704803 0.2452941 1.762554 1.766778 1.000000
## Test set     0.2091097 0.2369097 0.2144393 1.189854 1.220087 0.874213
##                   ACF1 Theil's U
## Training set 0.3340681        NA
## Test set     0.2648355 0.9435551
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 289.92, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Applying the Box-Cox transformation from the previous question to this problem has produced promising results. The training and test errors are more closely aligned across the board and indicate a stronger fit. The residual diagnostics are promising as well. There is still, however, correlation in the ACF plot (supported by the Ljung-Box test), which indicates that additional work may be needed to achieve a fully satisfactory model.