DATA 624 Homework 2

Chapter 3 The forecaster’s toolbox

3.1 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

Annual US net electricity generation

This data set contains the annual US net electricity generation (billion kWh) for 1949-2003. The data are annual, so there is no seasonality, and we don't observe much change in the variation over the years. We tried a Box-Cox transformation with a lambda coefficient of 0.517. The resulting transformation does not seem to have stabilised the variance any further.
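For reference, the Box-Cox transformation used throughout this section maps a positive series y to log(y) when lambda = 0 and to (y^lambda - 1) / lambda otherwise. The sketch below is a hand-written base-R version. The helper name `box_cox` and the toy vector are illustrative assumptions, not the code behind the plots; the forecast package offers `BoxCox()` for the same purpose.

```r
# Illustrative base-R Box-Cox transform; `box_cox` is a hypothetical helper.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))                 # this sketch handles positive data only
  if (abs(lambda) < 1e-8) {
    log(y)                              # limiting case lambda = 0
  } else {
    (y^lambda - 1) / lambda
  }
}

# Toy values (not the electricity data), transformed with lambda = 0.517.
toy <- c(100, 400, 1600, 3200)
transformed <- box_cox(toy, lambda = 0.517)
print(transformed)
```

A lambda close to 1 leaves the shape of the series almost unchanged, while a lambda close to 0 behaves like a log transformation.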

Quarterly US GDP

This data set shows the quarterly United States gross domestic product (GDP) from 1947 to 2006. The initial data shows a strong upward trend; we could also try plotting the data after adjusting it with an inflation price index. However, we don't observe seasonal effects. We used a Box-Cox transformation with a lambda coefficient of 0.366, which straightened the bend in the center of the initial curve.

Monthly copper prices

This data set shows the monthly copper prices from January 1960 to December 2006. We tried two transformations:

Box-Cox transformation with a lambda coefficient of 0.192. The variation in the data is reduced compared to the initial data. However, most of the large structures (a cyclic pattern?) are still present.
Calendar adjustment. We see no visible change in the variability of the data. This may indicate that the different month lengths have no effect on the stability of the variance.
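A calendar adjustment divides each monthly value by the number of days in that month, so that months of different lengths become comparable. The sketch below shows the idea in base R on a toy series. The helper `days_per_month` and the toy data are assumptions made for illustration; the forecast package provides `monthdays()` for the same job.

```r
# Number of days in each month of a monthly ts (hypothetical helper).
days_per_month <- function(x) {
  stopifnot(frequency(x) == 12)
  first_day <- as.Date(sprintf("%d-%02d-01", start(x)[1], start(x)[2]))
  # One extra month so that diff() yields a length for every observation.
  month_starts <- seq(first_day, by = "month", length.out = length(x) + 1)
  ts(as.numeric(diff(month_starts)), start = start(x), frequency = 12)
}

# Toy monthly totals (not the copper data): a constant 3 units per day.
n_days <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)  # leap year 2000
toy <- ts(3 * n_days, start = c(2000, 1), frequency = 12)

# Average per day in each month; the month-length effect disappears.
adjusted <- toy / days_per_month(toy)
print(adjusted)
```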

Monthly US domestic enplanements

This data set contains monthly "Domestic Revenue Enplanements" in millions from 1996 to 2000. Let's try a Box-Cox transformation and a calendar adjustment.

We can observe a strong cyclic behavior that was not corrected with a Box-Cox transformation using the lambda coefficient of -0.227. The calendar adjustment did not reduce the variance either.

Instead of a Box-Cox transformation, we can try to account for the trend and cyclic behavior. We notice that the strength of the seasonality increases over time. This indicates that we should use a multiplicative decomposition (instead of an additive decomposition). Above, we can see how the variability has been reduced by first finding the trend in the data and then plotting the ratio of the initial data to the trend.

This significantly stabilises the variance over the range of the data. A Box-Cox transformation of the detrended data does not show any further improvement.
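The detrending step described above can be sketched with the base-R `decompose()` function. The built-in `AirPassengers` series is used here purely as a stand-in, since it also has seasonality that grows with the level; it is not the enplanements data, and the variable names are illustrative.

```r
# AirPassengers (built into R) stands in for the enplanements series.
x <- AirPassengers

# Multiplicative model: x = trend * seasonal * remainder.
dec <- decompose(x, type = "multiplicative")

# Ratio of the data to the trend, as described in the text.
detrended <- x / dec$trend

# The centred moving average leaves missing values at both ends of the trend.
print(sum(is.na(detrended)))
print(round(dec$figure, 3))   # one multiplicative seasonal factor per month
```

With `type = "additive"` the trend would be subtracted rather than divided out, which does not remove seasonal swings that grow with the level.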

3.2 Why is a Box-Cox transformation unhelpful for the cangas data?

Monthly Canadian gas production

This data set shows "Monthly Canadian gas production" in billions of cubic metres from January 1960 to February 2005.

A Box-Cox transformation is not helpful here. It changes the spread as a monotone function of the level of the series, so no value of lambda can account for a strength of seasonality that keeps increasing until 1990 and then starts decreasing. Let's see what happens if we try to account for the trend instead:

We can extract the trend from the data set. However, this did not help to account for the changing strength of seasonality with either additive or multiplicative decomposition.

3.3 What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

Calendar adjustment did not do much for our data in terms of reducing the variability in the strength of seasonality. However, a Box-Cox transformation with a lambda parameter value of -0.058 made the seasonal variability more stable.

3.8 For your retail time series (from Exercise 3 in Section 2.10):

a. Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

b. Check that your data have been split appropriately by producing the following plot.

c. Calculate forecasts using snaive applied to myts.train.

## 
## Forecast method: Seasonal naive method
## 
## Model Information:
## Call: snaive(y = myts.train) 
## 
## Residual sd: 10.1875 
## 
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE MASE      ACF1
## Training set 6.870871 12.27525 8.893093 5.476112 7.780981    1 0.6617306
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2011          233.7 217.9686 249.4314 209.6410 257.7590
## Feb 2011          199.0 183.2686 214.7314 174.9410 223.0590
## Mar 2011          222.6 206.8686 238.3314 198.5410 246.6590
## Apr 2011          222.2 206.4686 237.9314 198.1410 246.2590
## May 2011          213.4 197.6686 229.1314 189.3410 237.4590
## Jun 2011          206.3 190.5686 222.0314 182.2410 230.3590
## Jul 2011          206.7 190.9686 222.4314 182.6410 230.7590
## Aug 2011          212.9 197.1686 228.6314 188.8410 236.9590
## Sep 2011          224.3 208.5686 240.0314 200.2410 248.3590
## Oct 2011          238.7 222.9686 254.4314 214.6410 262.7590
## Nov 2011          251.2 235.4686 266.9314 227.1410 275.2590
## Dec 2011          380.0 364.2686 395.7314 355.9410 404.0590
## Jan 2012          233.7 211.4525 255.9475 199.6754 267.7246
## Feb 2012          199.0 176.7525 221.2475 164.9754 233.0246
## Mar 2012          222.6 200.3525 244.8475 188.5754 256.6246
## Apr 2012          222.2 199.9525 244.4475 188.1754 256.2246
## May 2012          213.4 191.1525 235.6475 179.3754 247.4246
## Jun 2012          206.3 184.0525 228.5475 172.2754 240.3246
## Jul 2012          206.7 184.4525 228.9475 172.6754 240.7246
## Aug 2012          212.9 190.6525 235.1475 178.8754 246.9246
## Sep 2012          224.3 202.0525 246.5475 190.2754 258.3246
## Oct 2012          238.7 216.4525 260.9475 204.6754 272.7246
## Nov 2012          251.2 228.9525 273.4475 217.1754 285.2246
## Dec 2012          380.0 357.7525 402.2475 345.9754 414.0246
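The interval widths in this table can be checked by hand. For the seasonal naive method the textbook gives the h-step forecast standard deviation as sigma * sqrt(k + 1), where k is the number of complete seasonal cycles before horizon h. The sketch below assumes that sigma equals the training RMSE (12.27525) reported above, which is consistent with the printed limits; the variable names are illustrative.

```r
sigma <- 12.27525            # training RMSE from the output above (assumed sigma)
m <- 12                      # seasonal period of the monthly data
h <- 1:24                    # forecast horizons, Jan 2011 to Dec 2012
k <- floor((h - 1) / m)      # complete seasonal cycles before horizon h

se <- sigma * sqrt(k + 1)    # forecast standard deviation at each horizon
half80 <- qnorm(0.90) * se   # half-width of the 80% interval
half95 <- qnorm(0.975) * se  # half-width of the 95% interval

# Jan 2011 (h = 1) and Jan 2012 (h = 13) share the point forecast 233.7.
print(round(233.7 + c(-1, 1) * half80[1], 4))
print(round(233.7 + c(-1, 1) * half80[13], 4))
```

The intervals are wider in 2012 only because k moves from 0 to 1; within a year every month has the same width.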

d. Compare the accuracy of your forecasts against the actual values stored in myts.test.

##                     ME     RMSE       MAE       MPE      MAPE    MASE
## Training set  6.870871 12.27525  8.893093  5.476112  7.780981 1.00000
## Test set     28.400000 29.39091 28.400000 11.015822 11.015822 3.19349
##                   ACF1 Theil's U
## Training set 0.6617306        NA
## Test set     0.5697915 0.7493485
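To make the columns in this table concrete, the sketch below computes the main error measures by hand. The function `accuracy_sketch` and the toy numbers are illustrative assumptions, not the forecast package's `accuracy()` implementation. The last lines re-derive the test-set MASE from the table: it is the test MAE divided by the training MAE of the seasonal naive method.

```r
# Hand-rolled error measures; `scale_mae` is the in-sample seasonal naive MAE.
accuracy_sketch <- function(actual, forecast, scale_mae) {
  e <- actual - forecast            # forecast errors
  pe <- 100 * e / actual            # percentage errors
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(pe),
    MAPE = mean(abs(pe)),
    MASE = mean(abs(e)) / scale_mae)
}

# Toy example (not the retail data).
toy <- accuracy_sketch(actual = c(110, 120), forecast = c(100, 100),
                       scale_mae = 5)
print(toy)

# Check against the table: 28.4 / 8.893093 reproduces the test-set MASE.
mase_test <- 28.4 / 8.893093
print(round(mase_test, 5))
```

In the table the test-set ME equals the MAE (and the MPE equals the MAPE), which can only happen when every forecast error has the same sign; here every forecast was below the actual value.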

e. Check the residuals.

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 591.71, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The residuals seem to have short-term autocorrelation, as we can see in the ACF plot. This autocorrelation is strong at short lags but drops below the significance threshold after 10 lags. The Ljung-Box test above (p-value < 2.2e-16) confirms that the residuals are not white noise. The residuals do seem to be approximately normally distributed, with a hint of positive skewness.
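The Ljung-Box statistic reported above can also be obtained with the base-R `Box.test()` function. The sketch below uses the built-in `AirPassengers` series as a stand-in for the retail data, and builds the seasonal naive residuals directly as the difference between each value and the value 12 months earlier. The lag of 24 and zero model degrees of freedom mirror the output above.

```r
# AirPassengers (built into R) stands in for the retail series.
x <- AirPassengers

# Seasonal naive residuals: y_t minus y_{t-12}.
res <- diff(x, lag = 12)

# Ljung-Box test on the first 24 autocorrelations; fitdf = 0 because the
# seasonal naive method has no estimated parameters.
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
print(lb)
```

A small p-value means the autocorrelations are jointly distinguishable from zero, so the method leaves information in the residuals.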

f. How sensitive are the accuracy measures to the training/test split?

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 591.71, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 352.69, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The results are indeed sensitive to the split that we use for the training and test data. As we can see above, these are the residual diagnostics for:

myts.train_1 <- window(myts, start =1990, end=c(2000,12))
myts.test_1 <- window(myts, start=2011)

and for:

myts.train_2 <- window(myts, end=c(2000,12))
myts.test_2 <- window(myts, start=2001)

The range of the residuals, the degree of autocorrelation and the normality all depend on the time interval we select. Strong cyclic behavior and outliers influence the trained model, as we can see above.
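One way to quantify this sensitivity is to repeat the split at several cut-off years and compare a test-set error measure. The sketch below does this in base R with a hand-written seasonal naive forecast. The helper `snaive_test_rmse`, the cut-off years and the use of the built-in `AirPassengers` series are all assumptions for illustration, not the retail analysis itself.

```r
# Test-set RMSE of a seasonal naive forecast for one training cut-off year.
# Assumes the training data ends in the last period of `train_end_year`.
snaive_test_rmse <- function(x, train_end_year, h = 24) {
  m <- frequency(x)
  train <- window(x, end = c(train_end_year, m))
  test <- window(x, start = train_end_year + 1)
  h <- min(h, length(test))
  last_season <- tail(as.numeric(train), m)   # last observed seasonal cycle
  fc <- rep(last_season, length.out = h)      # repeat it over the horizon
  sqrt(mean((as.numeric(test)[seq_len(h)] - fc)^2))
}

# Compare three hypothetical cut-off years on the stand-in series.
cutoffs <- c(1953, 1955, 1957)
rmse_by_split <- vapply(cutoffs,
                        function(yr) snaive_test_rmse(AirPassengers, yr),
                        numeric(1))
print(data.frame(train_end = cutoffs, test_rmse = rmse_by_split))
```

If the error measure changes a lot between cut-offs, conclusions drawn from a single split should be treated with caution.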

Extracting Seasonality and Trend from Data: Decomposition Using R *https://anomaly.io/seasonal-trend-decomposition-in-r/index.html*