Data 624 Week 2 Homework Exercises

Exercise 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

usnetelec, usgdp, mcopper, enplanements
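For reference, the Box-Cox transformation with parameter lambda is w_t = log(y_t) when lambda = 0 and w_t = (y_t^lambda - 1)/lambda otherwise. The base-R sketch below implements the transformation and its inverse. box_cox() and inv_box_cox() are hypothetical helper names, they assume strictly positive data, and they are simpler than forecast::BoxCox() and forecast::InvBoxCox(), which are what the exercises below actually use.

```r
# Box-Cox transform (sketch): log(y) when lambda == 0, else (y^lambda - 1) / lambda.
# Assumes strictly positive data; forecast::BoxCox() handles more edge cases.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform: maps transformed values back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 100, 1000)
box_cox(y, 0)                       # natural log: equal spacing on the new scale
box_cox(y, 0.5)                     # square-root-like: large values are compressed less
inv_box_cox(box_cox(y, 0.5), 0.5)   # the round trip recovers the original values
```

Values of lambda between 0 and 1 compress large values less strongly than the log, which is why the lambdas chosen below (for example about 0.52 for usnetelec) act as milder versions of a log transformation.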

Let's look at the usnetelec dataset first. It contains annual US net electricity generation (billion kWh), so its frequency is 1, i.e. one observation per year, as the output below confirms.

library(fpp2)

#help(usnetelec)
frequency(usnetelec)

## [1] 1

(lambda <- BoxCox.lambda(usnetelec))

## [1] 0.5167714

dframe <- cbind(BoxCox  = BoxCox(usnetelec,lambda),
                Original = usnetelec)
autoplot(dframe) + 
  xlab('Years') + 
  ylab('billion kwh') + 
  ggtitle('Annual US net electricity generation')

The graph below shows the series on the transformed scale only, which makes the effect of the transformation easier to see.

autoplot(BoxCox(usnetelec,lambda)) + 
  xlab('Years') + 
  ylab('BoxCox(billion kwh)') + 
  ggtitle('Annual US net electricity generation')

Observation

There is an upward trend, but no cyclic or seasonal pattern (the data are annual, so seasonality cannot appear). A Box-Cox transformation is applied to see whether it makes the series simpler and more consistent. BoxCox.lambda() suggested a lambda of about 0.52. The transformed series is more linear than the original, especially in its lower, earlier portion.
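BoxCox.lambda() uses Guerrero's method by default. The sketch below is a simplified, hypothetical re-implementation of the idea (guerrero_cv() and guerrero_lambda() are not package functions): split the series into consecutive blocks of length `period`, then choose the lambda that makes sd/mean^(1 - lambda) as similar as possible across blocks, measured by its coefficient of variation. It assumes blocks of two observations for non-seasonal data such as usnetelec and may not reproduce BoxCox.lambda() exactly on real data. The two synthetic series only check that it behaves sensibly.

```r
# Coefficient of variation of sd(block) / mean(block)^(1 - lambda) across blocks.
guerrero_cv <- function(lambda, x, period) {
  n_blocks <- floor(length(x) / period)
  # Keep the most recent n_blocks * period observations, one block per column
  x <- tail(as.numeric(x), n_blocks * period)
  blocks <- matrix(x, nrow = period, ncol = n_blocks)
  block_mean <- apply(blocks, 2, mean)
  block_sd <- apply(blocks, 2, sd)
  ratio <- block_sd / block_mean^(1 - lambda)
  sd(ratio) / mean(ratio)
}

# Pick the lambda in [lower, upper] that minimises that coefficient of variation.
guerrero_lambda <- function(x, period = max(2, frequency(x)), lower = -1, upper = 2) {
  optimize(guerrero_cv, interval = c(lower, upper), x = x, period = round(period))$minimum
}

idx <- 0:239
# Synthetic monthly series whose spread grows in proportion to its level
multiplicative <- ts(exp(0.05 * idx) * (1 + 0.2 * sin(2 * pi * idx / 12)), frequency = 12)
guerrero_lambda(multiplicative)   # close to 0, i.e. a log transform

# Synthetic monthly series whose seasonal swing does not depend on its level
additive <- ts(50 + idx + 5 * sin(2 * pi * idx / 12), frequency = 12)
guerrero_lambda(additive)         # close to 1, i.e. essentially no transformation
```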

Let's repeat the exercise for the usgdp dataset, which contains quarterly US GDP from 1947 to 2006.

#help(usgdp)
frequency(usgdp)

## [1] 4

(lambda <- BoxCox.lambda(usgdp))

## [1] 0.366352

dframe <- cbind(BoxCox  = BoxCox(usgdp,lambda),
                Original = usgdp)
autoplot(dframe) + 
  xlab('Years') + 
  ylab('GDP') + 
  ggtitle('Quarterly US GDP')

The graph below shows the series on the transformed scale only, which makes the effect of the transformation easier to see.

autoplot(BoxCox(usgdp,lambda)) + 
  xlab('Years') + 
  ylab('BoxCox(GDP)') + 
  ggtitle('Quarterly US GDP')

Observation

There is an upward trend, but no cyclic or seasonal pattern is visible. A Box-Cox transformation is applied to see whether it makes the series simpler and more consistent. BoxCox.lambda() suggested a lambda of about 0.37. The transformation makes the whole trend noticeably more linear than in the original series.

Next, the mcopper dataset, which contains monthly copper prices in pounds per ton.

#help(mcopper) 

(lambda <- BoxCox.lambda(mcopper))

## [1] 0.1919047

dframe <- cbind(BoxCox  = BoxCox(mcopper,lambda),
                Original = mcopper)
autoplot(dframe) + 
  xlab('Years') + 
  ylab('Pounds Per ton') + 
  ggtitle('Monthly copper prices')

The graph below shows the series on the transformed scale only, which makes the effect of the transformation easier to see.

autoplot(BoxCox(mcopper,lambda)) + 
  xlab('Years') + 
  ylab('BoxCox(Pounds Per ton)') + 
  ggtitle('Monthly copper prices')

Observation

The series has an upward trend with cyclic behaviour, but no seasonality. BoxCox.lambda() selected a lambda of about 0.19. Because there is no seasonal pattern whose size could be stabilised, the transformed plot does not look very different in shape from the original.

Lastly, the enplanements dataset, which contains monthly US domestic enplanements (in millions) from 1996 to 2000.

#help(enplanements)
(lambda <- BoxCox.lambda(enplanements))

## [1] -0.2269461

dframe <- cbind(BoxCox  = BoxCox(enplanements,lambda),
                Original = enplanements)
autoplot(dframe) + 
  xlab('Years') + 
  ylab('millions') + 
  ggtitle('Monthly US domestic enplanements')

The graph below shows the series on the transformed scale only, which makes the effect of the transformation easier to see.

autoplot(BoxCox(enplanements,lambda)) + 
  xlab('Years') + 
  ylab('BoxCox(millions)') + 
  ggtitle('Monthly US domestic enplanements')

Observation

The original plot clearly shows trend, cyclic behaviour and seasonality. BoxCox.lambda() selected a lambda of about -0.23, and the transformation is applied to make the seasonal variation more consistent. It succeeds over the entire period except for the last spike. That spike looks like an outlier: after it, the series returns to the same variance as before.

Exercise 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

#help(cangas)


(lambda <- BoxCox.lambda(cangas))

## [1] 0.5767759

dframe <- cbind(BoxCox  = BoxCox(cangas,lambda),
                Original = cangas)
autoplot(dframe) + 
  xlab('Years') + 
  ylab('billion cubic metres') + 
  ggtitle('Monthly Canadian gas production')

autoplot(BoxCox(cangas,lambda)) + 
  xlab('Years') + 
  ylab('BoxCox(billion cubic metres)') + 
  ggtitle('Monthly Canadian gas production')

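The Box-Cox transformation does not make the seasonal variation uniform across the series: the seasonal swings in the middle portion are still much wider than elsewhere, and the right-most portion remains very narrow. A Box-Cox transformation can only shrink or stretch variation as a function of the level of the series, so it helps when the variation rises (or falls) steadily with the level. In cangas the seasonal variation is small at the start, large in the middle and smaller again at the end, while production generally increases, so no single value of lambda can equalise it. The purpose of a transformation is to make the pattern more consistent across the whole data set, since simpler patterns usually lead to more accurate forecasts; here Box-Cox cannot achieve that.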
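A small synthetic example illustrates the point. The data below are made up (they are not the cangas series): the level rises every year, while the seasonal amplitude first grows and then shrinks back. For a grid of lambda values, the sketch compares the largest and smallest within-year standard deviation after transformation; a ratio near 1 would mean the seasonal variation had been made uniform. box_cox() is a hypothetical helper, not the forecast function.

```r
years <- 20
month <- 0:11
level <- 10 + 0:(years - 1)                                  # level increases every year
amplitude <- 1 + 4 * sin(pi * (0:(years - 1)) / (years - 1))  # grows, then shrinks back to 1
y <- unlist(lapply(seq_len(years), function(k) {
  level[k] + amplitude[k] * sin(2 * pi * month / 12)
}))

# Simple Box-Cox transform for positive data (illustrative helper)
box_cox <- function(y, lambda) if (lambda == 0) log(y) else (y^lambda - 1) / lambda

# Largest / smallest within-year standard deviation for each candidate lambda
spread_ratio <- sapply(seq(-1, 2, by = 0.25), function(lambda) {
  w <- matrix(box_cox(y, lambda), nrow = 12)   # one column per year
  year_sd <- apply(w, 2, sd)
  max(year_sd) / min(year_sd)
})
round(spread_ratio, 1)
```

Every ratio stays well above 1: a transformation that shrinks the wide middle years relative to the narrow early years also shrinks the narrow late years even further, because those have the highest level.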

Exercise 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel('C:\\Users\\charls.joseph\\Documents\\Cuny\\Data624\\week1\\retail.xlsx', skip=1)
myts <- ts(retaildata[,"A3349397X"],
  frequency=12, start=c(1982,4))

autoplot(myts) +
  xlab('Year') +
  ylab("Turnover")

Let's apply a Box-Cox transformation.

(lambda <- BoxCox.lambda(myts))

## [1] 0.4605998

autoplot(BoxCox(myts,lambda)) + 
  xlab('Years') + 
  ylab('BoxCox(Turnover)')

This time series has an upward trend and is seasonal. A Box-Cox transformation is applied to make the seasonal variance constant; BoxCox.lambda() selects a lambda of about 0.46, which is the transformation I would choose for this series. The transformed data has much more consistent seasonal variance than the original.

Exercise 3.8

For your retail time series (from Exercise 3 in Section 2.10):

a. Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
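Besides the plot, the split can be checked programmatically. The sketch below uses a synthetic monthly series: only the April 1982 start matches myts, while the values and the December 2013 end date are placeholder assumptions. It checks that the two windows do not overlap, leave no gap, and together cover the whole series.

```r
# Placeholder series: monthly, April 1982 to December 2013 (381 observations)
x_demo <- ts(seq_len(381), frequency = 12, start = c(1982, 4))

train_demo <- window(x_demo, end = c(2010, 12))
test_demo  <- window(x_demo, start = 2011)

# No observation lost or duplicated
stopifnot(length(train_demo) + length(test_demo) == length(x_demo))
# The test window starts exactly one month after the training window ends
stopifnot(isTRUE(all.equal(tsp(train_demo)[2] + 1 / 12, tsp(test_demo)[1])))

end(train_demo)    # last training period
start(test_demo)   # first test period
```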

c. Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test") +
  autolayer(fc, series="Prediction")
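The seasonal naive method sets each forecast equal to the last observed value from the same season: the forecast for T + h is y at time T + h - m(k + 1), where m is the seasonal period (12 for monthly data) and k is the integer part of (h - 1)/m. A base-R sketch of the point forecasts only; seasonal_naive() is a hypothetical helper and, unlike snaive(), it does not compute prediction intervals.

```r
# Seasonal naive point forecasts: repeat the last complete seasonal cycle.
seasonal_naive <- function(y, h, m = frequency(y)) {
  y <- as.numeric(y)
  n <- length(y)
  # Position of the matching season within the last observed cycle
  idx <- n - m + ((seq_len(h) - 1) %% m) + 1
  y[idx]
}

quarterly <- ts(c(3, 5, 9, 4, 6, 8, 12, 7), frequency = 4)
seasonal_naive(quarterly, h = 6)   # last cycle (6, 8, 12, 7), then repeated
```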

d. Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)

##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set  31.38979 51.12549 40.82222  5.159977 7.098791 1.0000000 0.6439376
## Test set     -12.48333 26.43446 21.06667 -1.260287 1.982606 0.5160588 0.4863980
##              Theil's U
## Training set        NA
## Test set     0.2186765
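The test-set measures follow the standard definitions: with errors e = actual - forecast, ME is mean(e), RMSE is sqrt(mean(e^2)), MAE is mean(|e|), MPE and MAPE use the percentage errors 100e/actual, and for seasonal data MASE divides MAE by the in-sample MAE of the seasonal naive method (mean absolute lag-m difference of the training data). That scaling is why the training-set MASE for snaive above is exactly 1. The base-R sketch below uses toy numbers; accuracy_measures() is a hypothetical helper and forecast::accuracy() may differ in details.

```r
# Accuracy measures for a test set; MASE is scaled by the seasonal naive
# MAE on the training data (mean absolute lag-m difference).
accuracy_measures <- function(actual, forecast, train, m) {
  e <- actual - forecast
  scale <- mean(abs(diff(train, lag = m)))
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(100 * e / actual),
    MAPE = mean(abs(100 * e / actual)),
    MASE = mean(abs(e)) / scale)
}

# Toy data with seasonal period m = 3
train_toy  <- c(10, 20, 30, 12, 22, 32)
actual_toy <- c(14, 24, 33)
fc_toy     <- c(12, 22, 32)   # seasonal naive forecasts for the toy data
toy_result <- accuracy_measures(actual_toy, fc_toy, train_toy, m = 3)
round(toy_result, 3)
```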

e. Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 588.46, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
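The Ljung-Box statistic reported by checkresiduals() is Q* = T(T + 2) times the sum over k = 1..h of r_k^2/(T - k), where r_k is the lag-k autocorrelation of the residuals and T the number of residuals; it is compared with a chi-squared distribution with h - K degrees of freedom (here h = 24 and K = 0). The sketch below computes it by hand and compares it with stats::Box.test(). ljung_box() is a hypothetical helper, and it assumes a residual series without missing values, unlike the snaive residuals, whose first seasonal cycle has no fitted values.

```r
# Ljung-Box statistic by hand (assumes no missing values in x)
ljung_box <- function(x, h) {
  n <- length(x)
  r <- acf(x, lag.max = h, plot = FALSE)$acf[-1]   # autocorrelations r_1 ... r_h
  n * (n + 2) * sum(r^2 / (n - seq_len(h)))
}

set.seed(1)
white_noise <- rnorm(200)
q_manual <- ljung_box(white_noise, h = 24)
q_builtin <- unname(Box.test(white_noise, lag = 24, type = "Ljung-Box")$statistic)
c(q_manual, q_builtin)

# p-value with h - K = 24 degrees of freedom (K = 0 for a method with no parameters)
pchisq(q_manual, df = 24, lower.tail = FALSE)
```

A value as large as the Q* = 588.46 above gives a p-value that is essentially zero, so the hypothesis of uncorrelated residuals is rejected.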

Do the residuals appear to be uncorrelated and normally distributed?

Side notes:

Residuals are useful in checking whether a model has adequately captured the information in the data. A good forecasting method will yield residuals with the following properties:

The residuals are uncorrelated. If there are correlations between residuals, then there is information left in the residuals which should be used in computing forecasts.
The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts are biased.

Two further properties are useful, though not essential:

The residuals have constant variance.
The residuals are normally distributed.

These last two properties make the calculation of prediction intervals easier. However, a forecasting method that does not satisfy them cannot necessarily be improved. Sometimes applying a Box-Cox transformation may assist with these properties.

Observation

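No. The residual ACF plot shows that the residuals are correlated, and the Ljung-Box test confirms it (Q* = 588.46, p-value < 2.2e-16). This means there are still patterns in the data that the model has not captured. The residuals are also not centred at zero (the training-set ME is 31.39) and their distribution is slightly right-skewed, which indicates that the forecasts are biased and the method can be further improved.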

Additionally, the residuals do not have constant variance, which affects the reliability of the prediction intervals.

f. How sensitive are the accuracy measures to the training/test split?

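To check whether the accuracy measures are sensitive to the split, I chose a different split (training data up to December 2012, test data from 2013) and recomputed the accuracy. Compared with the previous results, the test-set measures change considerably: RMSE rises from 26.43 to 32.48, MAPE from 1.98% to 2.62% and MASE from 0.52 to 0.72. This indicates that the accuracy measures are sensitive to the training/test split chosen.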

autoplot(myts)

myts.train2 <- window(myts, end=c(2012,12))
myts.test2 <- window(myts, start=2013)
fc2 <- snaive(myts.train2)
accuracy(fc2,myts.test2)

##                      ME     RMSE      MAE        MPE     MAPE     MASE
## Training set 28.7268908 49.74907 39.32577 4.75707814 6.738086 1.000000
## Test set      0.8333333 32.47879 28.20000 0.01699768 2.618190 0.717087
##                   ACF1 Theil's U
## Training set 0.6536728        NA
## Test set     0.5143458 0.3001948
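The size of the change can be quantified directly from the two accuracy() tables. The numbers below are copied from the outputs above; the sketch only computes the relative change of each test-set measure. Note that the two test periods also differ in length and in the years they cover, so part of the change reflects different data rather than the split point alone.

```r
# Test-set measures copied from the two accuracy() outputs above
split_2011 <- c(RMSE = 26.43446, MAE = 21.06667, MAPE = 1.982606, MASE = 0.5160588)
split_2013 <- c(RMSE = 32.47879, MAE = 28.20000, MAPE = 2.618190, MASE = 0.717087)

# Percentage change when the test period starts in 2013 instead of 2011
pct_change <- 100 * (split_2013 - split_2011) / split_2011
round(pct_change, 1)
```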

Charls Joseph

9/12/2020
