questions

Question 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

  • usnetelec
  • usgdp
  • mcopper
  • enplanements

I decided to compare two different methods of calculating the Box-Cox transformation: the first from the forecast package used in the book, and the second from the MASS library. Their methods are not identical, which allows the results to be spot-checked against each other.
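Both functions estimate λ for the same underlying transformation. As a reference for the hand-applied MASS versions below, here is a minimal sketch of the transform itself (box_cox is my own helper name, not a function from either library):

box_cox <- function(y, lambda) {
  # Box-Cox transform: (y^lambda - 1) / lambda, with the log limit at lambda = 0
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
box_cox(100, 0.5)  # (10 - 1) / 0.5 = 18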

usnetelec

The suggested transformations are quite different between the two methods. In fact, the MASS profile includes 1.0 in its range of acceptable transformations, indicating that no transformation may be needed at all. The resulting plots are all very similar, however, with the scale being the major change.

library(fpp2)  # loads forecast, ggplot2, and the textbook data sets
library(dplyr) # mutate() used below

data(usnetelec)
lambda <- BoxCox.lambda(usnetelec)
lambda
## [1] 0.5167714
df <- data.frame(Y=as.matrix(usnetelec), date=time(usnetelec))
MASS::boxcox(df$Y ~ df$date, lambda=seq(0, 2, 0.1))
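Rather than reading the optimum off MASS's profile plot, the numeric maximiser can be extracted, since MASS::boxcox returns its λ grid and log-likelihoods; a small sketch (plotit = FALSE suppresses the plot):

bc <- MASS::boxcox(df$Y ~ df$date, lambda=seq(0, 2, 0.01), plotit=FALSE)
bc$x[which.max(bc$y)]  # lambda maximising the profile log-likelihood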

plots

original
usnetelec %>%
  autoplot()

forecast
usnetelec %>%
  BoxCox(lambda) %>%
  autoplot()

MASS
df %>%
  mutate(Y = (Y^0.9 - 1) / 0.9) %>%  # Box-Cox by hand, lambda ~ 0.9 read off the MASS profile
  ggplot(aes(date, Y)) +
  geom_line()

usgdp

The suggested transformations are very similar for both functions and therefore the resulting plots are very similar in scale.

data(usgdp)
lambda <- BoxCox.lambda(usgdp)
lambda
## [1] 0.366352
df <- data.frame(Y=as.matrix(usgdp), date=time(usgdp))
MASS::boxcox(df$Y ~ df$date, lambda=seq(0, 0.4, 0.1))

plots

original
usgdp %>%
  autoplot()

forecast
usgdp %>%
  BoxCox(lambda) %>%
  autoplot()

MASS
df %>%
  mutate(Y = (Y^0.2 - 1) / 0.2) %>%  # Box-Cox by hand, lambda ~ 0.2 read off the MASS profile
  ggplot(aes(date, Y)) +
  geom_line()

mcopper

Once again, the two functions are in close agreement with each other.

data(mcopper)
lambda <- BoxCox.lambda(mcopper)
lambda
## [1] 0.1919047
df <- data.frame(Y=as.matrix(mcopper), date=time(mcopper))
MASS::boxcox(df$Y ~ df$date, lambda=seq(-0.3, 0.3, 0.1))

plots

original
mcopper %>%
  autoplot()

forecast
mcopper %>%
  BoxCox(lambda) %>%
  autoplot()

MASS
df %>%
  mutate(Y = (Y^0.12 - 1) / 0.12) %>%  # Box-Cox by hand, lambda ~ 0.12 read off the MASS profile
  ggplot(aes(date, Y)) +
  geom_line()

enplanements

In this case the two functions suggest very different transformations (λ ≈ −0.23 from forecast versus λ ≈ 0.5 from MASS). Interestingly, both appear to be an improvement over the original data, which shows large variance; each is somewhat successful in stabilizing the variation.

data(enplanements)
lambda <- BoxCox.lambda(enplanements)
lambda
## [1] -0.2269461
df <- data.frame(Y=as.matrix(enplanements), date=time(enplanements))
MASS::boxcox(df$Y ~ df$date, lambda=seq(0, 1, 0.1))

plots

original
enplanements %>%
  autoplot()

forecast
enplanements %>%
  BoxCox(lambda) %>%
  autoplot()

MASS
df %>%
  mutate(Y = (Y^0.5 - 1) / 0.5) %>%  # Box-Cox by hand, lambda ~ 0.5 read off the MASS profile
  ggplot(aes(date, Y)) +
  geom_line()
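To compare the two suggestions directly, the transformed series can be drawn side by side; a sketch, assuming the gridExtra package is available:

library(gridExtra)
p1 <- autoplot(BoxCox(enplanements, lambda))  # forecast: lambda ~ -0.23
p2 <- df %>%
  mutate(Y = (Y^0.5 - 1) / 0.5) %>%           # MASS: lambda ~ 0.5
  ggplot(aes(date, Y)) +
  geom_line()
grid.arrange(p1, p2, ncol=2)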

Question 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

Following the suggested Box-Cox transformation does not result in a plot with roughly uniform variance. In fact, the original series' variance does not appear to be mitigated much, if at all, so the transformation has not been helpful in addressing the changing variance over time. This is likely because the variability does not increase steadily with the level of the series: it is largest in the middle of the series and smaller at both ends, so no single power transformation can stabilise it.

data(cangas)
lambda <- BoxCox.lambda(cangas)
lambda
## [1] 0.5767759
autoplot(cangas)

cangas %>%
  BoxCox(lambda) %>%
  autoplot()
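This can be checked by computing the within-year spread directly; a small base-R sketch (variable names are my own):

years <- floor(time(cangas))
sd_by_year <- tapply(cangas, years, sd)  # standard deviation within each calendar year
plot(as.numeric(names(sd_by_year)), sd_by_year, type="l",
     xlab="year", ylab="within-year SD")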

Question 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

I reused code from the last assignment to reimport my selected data.

library(knitr)      # kable()
library(kableExtra) # kable_styling()

retaildata <- readxl::read_excel('./retail.xlsx', skip=1)
myts <- retaildata %>%
  select(A3349335T) %>%
  ts(frequency=12, start=c(1982, 4))
myts %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
A3349335T
303.1
297.8
298.0
307.9
299.2
305.4

The plot very much appears to be in need of a transformation: the variance is increasing over time.

autoplot(myts)

lambda <- BoxCox.lambda(myts)
lambda
## [1] 0.193853
myts %>%
  BoxCox(lambda) %>%
  autoplot()

The Box-Cox transformation has had a pronounced effect on the resulting plot. The variance has been smoothed out and made more consistent, and the transformed series appears more linear. A transformation with λ ≈ 0.19 is recommended for this dataset.
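As an aside, the transformation is exactly invertible via forecast's InvBoxCox, so any forecasts produced on the transformed scale can be mapped back to the original units; a quick sketch:

transformed <- BoxCox(myts, lambda)
back <- InvBoxCox(transformed, lambda)
all.equal(as.numeric(back), as.numeric(myts))  # should be TRUE up to numerical tolerance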

Question 3.8

For your retail time series (from Exercise 3 in Section 2.10):

  1. Split the data into two parts using
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
  2. Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

  3. Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
  4. Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
##                    ME      RMSE       MAE      MPE     MAPE     MASE
## Training set 61.56787  72.20702  61.68438 6.388722 6.404105 1.000000
## Test set     97.44583 109.62545 100.02917 4.629852 4.751209 1.621629
##                   ACF1 Theil's U
## Training set 0.6018274        NA
## Test set     0.2686595 0.9036205
  5. Check the residuals.

Do the residuals appear to be uncorrelated and normally distributed?

No. There is ample evidence that the residuals are both correlated and not normally distributed. The ACF plot shows strong autocorrelation at many lags, and the p-value of the Ljung-Box test indicates that the residuals are distinguishable from a white noise series. Finally, the histogram of residuals is clearly neither normally distributed nor centered near 0.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 812.76, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
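The same Ljung-Box statistic can be reproduced directly with base R's Box.test, using 24 lags to match the output above:

Box.test(residuals(fc), lag=24, type="Ljung-Box")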
  6. How sensitive are the accuracy measures to the training/test split?

The test-set error is substantially higher than the training error for both MAE and RMSE, indicating that the model is not an appropriate fit for the data. This is supported by the residual plots above, which show that the preconditions for good forecasting have been violated.
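To probe this sensitivity directly, the same snaive forecast can be re-scored over several candidate split years; a quick sketch on the untransformed series (the split years here are arbitrary choices of mine):

for (yr in 2008:2011) {
  train <- window(myts, end=c(yr, 12))
  test  <- window(myts, start=yr + 1)
  fc_yr <- snaive(train, h=length(test))
  cat(yr, "test RMSE:", accuracy(fc_yr, test)["Test set", "RMSE"], "\n")
}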

Since the previous questions demonstrated a successful Box-Cox transformation of the data, I will attempt to create a better forecast using the transformed data.

my_transformed_ts <- BoxCox(myts, lambda)
myts.train <- window(my_transformed_ts, end=c(2010,12))
myts.test <- window(my_transformed_ts, start=2011)

autoplot(my_transformed_ts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

fc <- snaive(myts.train)
accuracy(fc,myts.test)
##                     ME      RMSE       MAE      MPE     MAPE     MASE
## Training set 0.2447664 0.2704803 0.2452941 1.762554 1.766778 1.000000
## Test set     0.2091097 0.2369097 0.2144393 1.189854 1.220087 0.874213
##                   ACF1 Theil's U
## Training set 0.3340681        NA
## Test set     0.2648355 0.9435551
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 289.92, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Applying the Box-Cox transformation from the previous question to this problem has produced promising results. The training and test errors are more closely aligned across the board and indicate a stronger fit. The residual diagnostics are promising as well. There is still, however, correlation in the ACF plot (supported by the Ljung-Box test), which indicates that additional work may be needed to achieve a fully satisfactory model.