DATA 624: Home Work 02

Exercise 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

We can see from above that, even though we found the most appropriate lambda with the BoxCox.lambda function, the Box-Cox transformation did not make the series any simpler to model. The pattern of seasonal variability across the years remains largely unchanged.

Inspecting the cangas time series plot shows the following:

  • The seasonal variability was smaller for years before 1975
  • The seasonal variability began to increase after 1975 and reached its highest level around 1985
  • The seasonal variability began to decrease after 1990

So to stabilise the variance, the transformation would need to “stretch” the seasonal variability before 1975, “shrink” the variability between 1975 and 1985, and then “stretch” the variability again after 1990.

If we plot the Box-Cox transformation curve with lambda = 0.577, we have:

It is almost linear, so the transformation is relatively uniform and barely distinguishes between the values being transformed. This is not what we want for the cangas data. We want a larger transformation (steep transform curve) for smaller values (lower than 9); a smaller transformation (flat curve) for values between 7.5 and 10; and a larger transformation for values larger than 10. A single Box-Cox curve is monotone and cannot change its behaviour from one range of values to the next in this way. This is why the Box-Cox transformation is unhelpful.
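One rough way to check how close the curve is to a straight line is the correlation between a grid of values and their transforms. The sketch below is a minimal base R illustration, not the original plotting code. `box_cox` is a hypothetical helper implementing the standard Box-Cox formula, and the grid from 1 to 20 is an assumed stand-in for the range of the cangas values. Only lambda = 0.577 comes from the text.

```r
# Standard Box-Cox transform: log(y) when lambda is 0,
# (y^lambda - 1) / lambda otherwise.
# `box_cox` is a hypothetical helper written for this sketch.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Assumed grid of values; the real range of cangas may differ.
y <- seq(1, 20, by = 0.1)

w_bc  <- box_cox(y, 0.577)  # lambda quoted in the text
w_log <- box_cox(y, 0)      # log transform, for contrast

# Correlation with y: values near 1 mean the curve is close to a line.
cor_bc  <- cor(y, w_bc)
cor_log <- cor(y, w_log)
print(c(lambda_0.577 = cor_bc, log = cor_log))

# Uncomment to draw the transformation curve:
# plot(y, w_bc, type = "l", xlab = "y", ylab = "Box-Cox(y)")
```

A correlation close to 1 for lambda = 0.577, and a lower one for the log transform, is consistent with the curve being much flatter in its curvature than a log curve over this assumed range.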

Exercise 3.3

What Box-Cox transformation would you select for your retail data?

In the original data the variation increases over time. After a Box-Cox transformation with a lambda of 0.13 it is much more uniform. Because the variance was increasing steadily over time, unlike the rise and fall seen in cangas, a single power transformation was effective.
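A claim like this can be checked numerically by comparing the spread within each year before and after the transformation. The sketch below is an illustration under stated assumptions: the retail series is not available here, so the built-in `AirPassengers` series stands in for it, and `box_cox` and `yearly_sd` are hypothetical helpers. Only the lambda of 0.13 comes from the text.

```r
# Standard Box-Cox transform (hypothetical helper for this sketch).
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Standard deviation within each calendar year of a ts object
# (hypothetical helper for this sketch).
yearly_sd <- function(x) {
  # Number of periods elapsed since the first period of the start year
  idx  <- start(x)[2] - 1 + seq_along(x) - 1
  year <- start(x)[1] + idx %/% frequency(x)
  tapply(as.numeric(x), year, sd)
}

x <- AirPassengers                      # stand-in for the retail series

sd_raw <- yearly_sd(x)                  # spread per year, original scale
sd_bc  <- yearly_sd(box_cox(x, 0.13))   # spread per year, transformed scale

# Ratio of largest to smallest yearly spread; closer to 1 is more uniform.
ratio_raw <- max(sd_raw) / min(sd_raw)
ratio_bc  <- max(sd_bc) / min(sd_bc)
print(c(raw = ratio_raw, box_cox = ratio_bc))
```

A smaller ratio after the transformation indicates a more uniform seasonal spread. The same check applied to the actual retail series would use its own lambda.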

Exercise 3.8

For your retail time series:

Split the data into two parts using

Check that your data have been split appropriately by producing the following plot.
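A split of this kind can be sketched with base R's `window()`. The example below is a sketch under assumptions, not the original code: `AirPassengers` stands in for the retail series, and the split point (training through December 1957) is chosen only for illustration. The names `myts.train` and `myts.test` follow the exercise text.

```r
# Stand-in monthly series; the retail series itself is not available here.
myts <- AirPassengers

# Assumed split: train through December 1957, test from January 1958.
myts.train <- window(myts, end = c(1957, 12))
myts.test  <- window(myts, start = c(1958, 1))

# The two pieces should be contiguous and cover the whole series.
print(c(train = length(myts.train), test = length(myts.test)))

# Uncomment to overlay the two pieces on one plot:
# ts.plot(myts.train, myts.test, gpars = list(col = c("blue", "red")))
```

If the lengths add up to the length of the full series and the test set starts one period after the training set ends, the split is consistent.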

Calculate forecasts using snaive applied to myts.train.
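The idea behind a seasonal naive forecast is that each future period repeats the value observed in the same season of the last available year. The base R sketch below illustrates that idea only; it does not reproduce the internals of `snaive`. `snaive_sketch` is a hypothetical helper, `AirPassengers` is a stand-in series, and the split point is assumed.

```r
myts       <- AirPassengers                     # stand-in series
myts.train <- window(myts, end = c(1957, 12))   # assumed split

# Seasonal naive point forecasts for h periods ahead.
# `snaive_sketch` is a hypothetical helper written for this sketch.
snaive_sketch <- function(train, h) {
  m <- frequency(train)
  last_season <- tail(as.numeric(train), m)   # last full season observed
  fc <- rep(last_season, length.out = h)      # repeat it into the future
  # The forecast starts one period after the training data ends.
  e <- end(train)
  nxt <- if (e[2] == m) c(e[1] + 1, 1) else c(e[1], e[2] + 1)
  ts(fc, start = nxt, frequency = m)
}

fc <- snaive_sketch(myts.train, h = 36)
print(window(fc, end = c(1958, 12)))   # first forecast year
```

Every forecast year is identical here, which is why the method cannot follow a trend in the test period.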

Compare the accuracy of your forecasts against the actual values stored in myts.test.

##                     ME     RMSE      MAE       MPE      MAPE     MASE
## Training set  7.772973 20.24576 15.95676  4.702754  8.109777 1.000000
## Test set     55.300000 71.44309 55.78333 14.900996 15.082019 3.495907
##                   ACF1 Theil's U
## Training set 0.7385090        NA
## Test set     0.5315239  1.297866
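Several columns of this table can be computed by hand, which makes their meaning explicit. The sketch below uses the usual textbook definitions and assumes they match those behind the output above. It runs on the stand-in `AirPassengers` series with an assumed split, so its numbers will differ from the table. `acc` and `mase_scale` are hypothetical names.

```r
myts       <- AirPassengers                     # stand-in series
myts.train <- window(myts, end = c(1957, 12))   # assumed split
myts.test  <- window(myts, start = c(1958, 1))
m <- frequency(myts.train)

# Seasonal naive forecast over the test period and its errors
fc     <- rep(tail(as.numeric(myts.train), m), length.out = length(myts.test))
e_test <- as.numeric(myts.test) - fc

# In-sample seasonal naive errors: y[t] - y[t - m]
e_train <- diff(as.numeric(myts.train), lag = m)

# MASE scales errors by the in-sample seasonal naive MAE,
# so the training-set MASE of a seasonal naive forecast is 1.
mase_scale <- mean(abs(e_train))

# Error summaries; `actual` must be aligned with the errors `e`.
acc <- function(e, actual) {
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(100 * e / actual),
    MAPE = mean(abs(100 * e / actual)),
    MASE = mean(abs(e)) / mase_scale)
}

acc_table <- rbind(
  "Training set" = acc(e_train, as.numeric(myts.train)[-(1:m)]),
  "Test set"     = acc(e_test, as.numeric(myts.test))
)
print(round(acc_table, 3))
```

Under these definitions a test-set MASE above 1 means the forecast errors on the test set are larger than the typical in-sample seasonal naive error.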

Check the residuals.

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 624.45, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
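A comparable test can be run in base R with `Box.test`, using the same settings the output reports (24 lags, model df of 0). This is a sketch on the stand-in `AirPassengers` series with an assumed split, so the statistic will not match the value above. The residuals of a seasonal naive fit are taken to be the seasonal differences of the training data.

```r
myts       <- AirPassengers                     # stand-in series
myts.train <- window(myts, end = c(1957, 12))   # assumed split
m <- frequency(myts.train)

# Seasonal naive residuals: y[t] - y[t - m]
res <- diff(myts.train, lag = m)

# Ljung-Box test with 24 lags and no fitted parameters (fitdf = 0)
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
print(lb)

# Uncomment for a visual check of autocorrelation and distribution shape:
# acf(res)
# hist(res)
```

A small p-value rejects the null hypothesis that the residuals are uncorrelated up to lag 24.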

Do the residuals appear to be uncorrelated and normally distributed?

The residuals appear roughly normally distributed, with a slight positive skew. They do not appear to be uncorrelated: the Ljung-Box test gives a p-value below 2.2e-16, far less than 0.05, so the null hypothesis of no autocorrelation is rejected. This suggests that the residuals still contain information the seasonal naive method does not capture, and that it is not the best model for this series.

How sensitive are the accuracy measures to the training/test split?

The accuracy measures are quite sensitive to the training/test split. The test-set errors are far larger than the training-set errors (RMSE 71.44 versus 20.25, MASE 3.50 versus 1.00). This suggests that the model does not generalize well.
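Sensitivity to the split can also be examined directly by moving the split point and recomputing a test-set measure each time. The sketch below does this for RMSE with a hand-written seasonal naive forecast. `AirPassengers` is a stand-in series, the candidate split years are arbitrary, and `test_rmse` is a hypothetical helper.

```r
myts <- AirPassengers   # stand-in series
m <- frequency(myts)

# Test-set RMSE of a seasonal naive forecast when the training data
# end in the last period of `last_train_year`.
# `test_rmse` is a hypothetical helper written for this sketch.
test_rmse <- function(last_train_year) {
  train <- window(myts, end = c(last_train_year, m))
  test  <- window(myts, start = c(last_train_year + 1, 1))
  fc <- rep(tail(as.numeric(train), m), length.out = length(test))
  sqrt(mean((as.numeric(test) - fc)^2))
}

split_years <- 1954:1958   # arbitrary candidate split points
rmse_by_split <- sapply(split_years, test_rmse)
names(rmse_by_split) <- split_years
print(round(rmse_by_split, 2))
```

A wide spread in these values would indicate that conclusions about accuracy depend heavily on where the series is cut.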

Forhad Akbar

9/12/2020