CUNY DATA624 Homework 2

Question 3.1
- For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
Question 3.2
- Why is a Box-Cox transformation unhelpful for the cangas data?
Question 3.3
- What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
Question 3.8 – For your retail time series (from Exercise 3 in Section 2.10):

library(fpp2)

Question 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

I’ll create a helper function that uses `BoxCox.lambda` to choose the Box-Cox parameter for each dataset and plots the original and transformed series side by side.

BC <- function(x){
  # Plot a series next to its Box-Cox transform
  require(gridExtra)
  lambda <- BoxCox.lambda(x)  # estimate lambda once and reuse it below
  grid.arrange(autoplot(x) + ylab(''),
               autoplot(BoxCox(x, lambda)) +
                 ggtitle('BoxCox Transformation') +
                 ylab(paste('Lambda = ', lambda)),
               ncol=2
               )
}
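For reference, the transformation that `BoxCox` applies to positive data can be written out in a few lines of base R. This is a minimal sketch, not the fpp2 implementation: the names `box_cox` and `inv_box_cox` are illustrative, and the sketch only handles strictly positive values.

```r
# Hand-rolled Box-Cox transform (illustrative helper, not part of fpp2).
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))            # this sketch only handles positive data
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, to get back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 4, 9, 16)                # made-up values
w <- box_cox(y, lambda = 0.5)      # behaves like a shifted, scaled square root
w
inv_box_cox(w, lambda = 0.5)       # recovers y
```

A lambda of 1 leaves the shape of the series unchanged, while a lambda of 0 is the natural log; values in between compress large observations progressively more.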

`usnetelec`

BC(usnetelec)

## Loading required package: gridExtra

`usgdp`

BC(usgdp)

`mcopper`

BC(mcopper)

`enplanements`

BC(enplanements)

Question 3.2

Why is a Box-Cox transformation unhelpful for the `cangas` data?

When you compare the two plots, the transformation actually makes the series look more complex. The original series appears slightly more linear, whereas the Box-Cox-transformed series has a more pronounced curve. The spike around 1973 is also more prominent after the transformation, making it more complex than the original.

BC(cangas)
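A visual comparison can be backed up with a rough numeric check of how the within-year spread changes before and after a transformation. The sketch below is an alternative diagnostic, not the method used in this homework. It uses the built-in `AirPassengers` series as a stand-in so that it runs without fpp2, and the helper name `yearly_sd` is illustrative; with fpp2 loaded, `cangas` could be substituted for `y`.

```r
# Spread check on a built-in monthly series.
# AirPassengers stands in for cangas so the sketch runs without fpp2.
y <- AirPassengers

# Standard deviation within each calendar year.
yearly_sd <- function(x) as.numeric(aggregate(x, nfrequency = 1, FUN = sd))

spread_raw <- yearly_sd(y)
spread_log <- yearly_sd(log(y))    # log is the Box-Cox transform with lambda = 0

# Ratio of the most to the least variable year: closer to 1 means more stable.
ratio_raw <- max(spread_raw) / min(spread_raw)
ratio_log <- max(spread_log) / min(spread_log)
c(raw = ratio_raw, log = ratio_log)
```

For `AirPassengers` the ratio drops after the log transform, which is the situation where a Box-Cox transformation helps. If the same check on `cangas` shows little or no improvement, that would support the conclusion above.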

Question 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

For exercise 3 in section 2.10, I selected the “Turnover ; New South Wales ; Food retailing” data.

The original plot looks fairly simple, but when we apply a Box-Cox power transformation, we notice the variation in the later years (about 2005 and later) is reduced.

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349398A"],
  frequency=12, start=c(1982,4))
BC(myts)

Question 3.8 – For your retail time series (from Exercise 3 in Section 2.10):

Split the data into two parts

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
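Besides the plot, the split can be checked numerically. The following is a small sketch on a synthetic monthly series (made-up values, with the same April 1982 start as the retail data) showing that the two windows cover the series exactly once; the names `train` and `test` are illustrative.

```r
# Synthetic monthly series starting in April 1982; the values are made up.
set.seed(1)
y <- ts(cumsum(rnorm(381)) + 100, frequency = 12, start = c(1982, 4))

train <- window(y, end = c(2010, 12))
test  <- window(y, start = 2011)   # start = 2011 means January 2011

# The two pieces should cover the series with no gap and no overlap.
length(train) + length(test) == length(y)
end(train)
start(test)
```

Note that `window(y, start = 2011)` treats the bare year as the first period of 2011, so the test set begins in the month immediately after the training set ends.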

Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

Calculate forecasts using `snaive` applied to `myts.train`.

fc <- snaive(myts.train)
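To make clear what the seasonal naive method does, here is a minimal base R sketch of the same idea: every forecast repeats the value from the same month of the last observed year. The function `snaive_by_hand` and the toy data are illustrative assumptions, and the sketch produces point forecasts only, without the prediction intervals that `snaive` returns.

```r
# Seasonal naive forecast by hand: each future period repeats the value
# observed in the same season of the last available year.
snaive_by_hand <- function(train, h) {
  m <- frequency(train)                    # season length, 12 for monthly data
  last_season <- tail(as.numeric(train), m)
  fc <- rep(last_season, length.out = h)
  ts(fc, start = tsp(train)[2] + 1 / m, frequency = m)
}

# Toy monthly series (made-up values): January 2009 to December 2010.
train <- ts(c(1:12, 101:112), start = c(2009, 1), frequency = 12)
fc_hand <- snaive_by_hand(train, h = 24)
fc_hand
```

With `h = 24` the forecast is simply the 2010 values repeated for 2011 and again for 2012.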

Compare the accuracy of your forecasts against the actual values stored in `myts.test`

accuracy(fc,myts.test)

##                     ME      RMSE       MAE      MPE     MAPE     MASE      ACF1
## Training set  73.94114  88.31208  75.13514 6.068915 6.134838 1.000000 0.6312891
## Test set     115.00000 127.92727 115.00000 4.459712 4.459712 1.530576 0.2653013
##              Theil's U
## Training set        NA
## Test set     0.7267171
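The columns in this table can be reproduced by hand, which helps when interpreting them. The sketch below uses made-up quarterly numbers rather than the retail data, and assumes the usual definitions, with MASE scaled by the in-sample MAE of the seasonal naive method. That scaling is consistent with the table above, where the test MASE equals the test MAE divided by the training MAE (115.00000 / 75.13514 ≈ 1.5306).

```r
# Accuracy measures by hand for a seasonal naive forecast (toy numbers).
train  <- ts(c(10, 20, 30, 40, 12, 23, 33, 44), frequency = 4)  # quarterly
actual <- c(15, 25, 36, 48)              # held-out values for the next year
fc_q   <- tail(as.numeric(train), 4)     # seasonal naive forecast: 12 23 33 44

e    <- actual - fc_q                    # forecast errors
rmse <- sqrt(mean(e^2))
mae  <- mean(abs(e))
mape <- mean(abs(100 * e / actual))

# MASE scales the MAE by the in-sample MAE of the seasonal naive method.
scale <- mean(abs(diff(as.numeric(train), lag = 4)))
mase  <- mae / scale
c(RMSE = rmse, MAE = mae, MAPE = mape, MASE = mase)
```

A MASE above 1 on the test set means the out-of-sample errors are larger, on average, than the in-sample seasonal naive errors.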

Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 671.41, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

The residuals look close to normal, but the distribution has a slight right skew. The mean is not zero, which indicates that the forecasts are biased. The ACF plot also shows autocorrelation in the residuals, and the Ljung-Box test above (Q* = 671.41, p-value < 2.2e-16) rejects the hypothesis that they are uncorrelated, so the forecasting model could be improved upon.
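The Ljung-Box statistic reported by `checkresiduals` can be reproduced with base R. The sketch below computes it by hand from the sample autocorrelations and compares the result with `Box.test`. The "residuals" here are simulated from an AR(1) process purely for illustration; they are not the residuals of the retail model.

```r
# Ljung-Box statistic by hand, checked against stats::Box.test.
set.seed(42)
res <- as.numeric(arima.sim(list(ar = 0.6), n = 200))  # made-up autocorrelated series

n   <- length(res)
lag <- 24
r   <- acf(res, lag.max = lag, plot = FALSE)$acf[-1]   # drop the lag-0 value
q_by_hand <- n * (n + 2) * sum(r^2 / (n - seq_len(lag)))

bt <- Box.test(res, lag = lag, type = "Ljung-Box")
c(by_hand = q_by_hand, box_test = unname(bt$statistic))
bt$p.value   # a small p-value rejects the hypothesis of no autocorrelation
```

A large statistic relative to its degrees of freedom, as in the output above, means the residuals still contain structure that the model has not captured.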

How sensitive are the accuracy measures to the training/test split?

In general, accuracy measures can be quite sensitive to the training/test split. A common way to check this is time series cross-validation, which the `tsCV` function provides. In the plot produced by the code below, the mean squared error gets larger as the forecast horizon `h` gets longer.

e <- tsCV(myts, forecastfunction=snaive, h=8)  # cross-validated errors for horizons 1 to 8
mse <- colMeans(e^2, na.rm = TRUE)             # mean squared error at each horizon
data.frame(h = 1:8, MSE = mse) %>%
  ggplot(aes(x = h, y = MSE)) + geom_point()
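A more direct way to see the sensitivity to the split itself is to move the split point and recompute the test-set error each time. The sketch below does this for a seasonal naive forecast in base R. It uses the built-in `AirPassengers` series as a stand-in for `myts`, and the helper `test_rmse` and the chosen split years are illustrative assumptions.

```r
# Rolling the split point: one-year-ahead test RMSE of a seasonal naive
# forecast for several training end years, on a built-in monthly series.
y <- AirPassengers                       # stand-in for myts

test_rmse <- function(y, train_end_year) {
  m     <- frequency(y)
  train <- window(y, end = c(train_end_year, m))
  test  <- window(y, start = c(train_end_year + 1, 1),
                  end = c(train_end_year + 1, m))    # the following year
  fc    <- tail(as.numeric(train), m)    # seasonal naive forecast
  sqrt(mean((as.numeric(test) - fc)^2))
}

split_years <- 1953:1959
rmse_by_split <- sapply(split_years, function(yr) test_rmse(y, yr))
data.frame(train_end = split_years, RMSE = round(rmse_by_split, 1))
```

If the RMSE values vary a lot across rows, a single train/test split gives an unreliable picture of forecast accuracy, which is the motivation for cross-validation.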
