Data-624 Homework-2

—————————————————————————

Student Name : Sachid Deshmukh

Date : 02/08/2020

RPubs location of published file

—————————————————————————

3.1 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

usnetelec:
usgdp:
mcopper:
enplanements:

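library(fpp2)  # assumed setup: attaches forecast and ggplot2 and provides the example series

# Estimate a Box-Cox lambda for a series, print it, and plot the original
# series against its transformed version.
math.trans = function(ts, objname)
{
  lambda = BoxCox.lambda(ts)
  print(paste('Lambda value for time series', objname, '=', lambda))
  print(paste('Plotting Original vs Transformed time series for', objname))
  
  # Bind both versions into one multivariate series so they can be faceted
  df = cbind(Original = ts, Transformed = BoxCox(ts, lambda))
  
  autoplot(df, facets = TRUE) +
    xlab('Time') + ylab('Value') +
    ggtitle(paste('Original vs Transformed plot for', objname))
}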

math.trans(usnetelec, 'usnetelec')

## [1] "Lambda value for time series usnetelec = 0.516771443964645"
## [1] "Plotting Original vs Transformed time series for usnetelec"

math.trans(usgdp, 'usgdp')

## [1] "Lambda value for time series usgdp = 0.366352049520934"
## [1] "Plotting Original vs Transformed time series for usgdp"

math.trans(mcopper, 'mcopper')

## [1] "Lambda value for time series mcopper = 0.191904709003829"
## [1] "Plotting Original vs Transformed time series for mcopper"

math.trans(enplanements, 'enplanements')

## [1] "Lambda value for time series enplanements = -0.226946111237065"
## [1] "Plotting Original vs Transformed time series for enplanements"
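
For reference, the transformation applied by BoxCox can be written out in a few lines of base R. This is only a sketch for strictly positive data: box_cox and inv_box_cox are hypothetical helper names, not functions from the forecast package, and the input vector is made up for illustration.

```r
# Box-Cox transform for strictly positive data (hypothetical helper):
#   w = log(y)                    if lambda == 0
#   w = (y^lambda - 1) / lambda   otherwise
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0, na.rm = TRUE))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, used to bring values back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 50, 100, 500, 1000)            # made-up positive values
w <- box_cox(y, lambda = 0.5)             # square-root-like transformation
y_back <- inv_box_cox(w, lambda = 0.5)    # recovers y up to rounding error
```

A lambda of 1 leaves the shape of the series unchanged, while values closer to 0 compress large observations more strongly, which is why the series with smaller lambda values above change the most in shape.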

3.2 Why is a Box-Cox transformation unhelpful for the cangas data?

math.trans(cangas, 'cangas')

## [1] "Lambda value for time series cangas = 0.576775938228139"
## [1] "Plotting Original vs Transformed time series for cangas"

A Box-Cox transformation helps when the variation in a series changes with its level: it makes that variation roughly uniform over time, which simplifies forecasting. The cangas series does not follow this pattern. Its seasonal variation is small in the early years, larger in the middle of the series, and smaller again towards the end, so the variation does not move in step with the level and no single value of lambda can even it out. The Box-Cox transformation is therefore not really useful in this case.
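
One rough way to check whether the variation moves with the level is to compare the yearly mean with the yearly standard deviation. The sketch below assumes a monthly series; level_vs_spread is a hypothetical helper, and the built-in AirPassengers series is used only as a stand-in so that the code runs without fpp2.

```r
# Yearly mean (level) and yearly standard deviation (spread) of a monthly
# series. Hypothetical helper, written for illustration only.
level_vs_spread <- function(x) {
  yearly_mean <- aggregate(x, FUN = mean)
  yearly_sd   <- aggregate(x, FUN = sd)
  data.frame(
    year   = as.numeric(time(yearly_mean)),
    level  = as.numeric(yearly_mean),
    spread = as.numeric(yearly_sd)
  )
}

# AirPassengers ships with base R; its spread grows with its level,
# which is the situation where a Box-Cox transformation helps.
ls_air <- level_vs_spread(AirPassengers)
level_spread_cor <- cor(ls_air$level, ls_air$spread)

# With fpp2 loaded, the same check could be run on cangas:
# ls_gas <- level_vs_spread(cangas)
# plot(ls_gas$level, ls_gas$spread)
```

A strong positive correlation between level and spread suggests a power transformation is worth trying. A spread that rises and then falls while the level keeps rising points the other way.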

3.3 What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)

myts <- ts(retaildata[,"A3349627V"],
  frequency=12, start=c(1982,4))
math.trans(myts, 'retaildata')

## [1] "Lambda value for time series retaildata = -0.0579656021687643"
## [1] "Plotting Original vs Transformed time series for retaildata"

The original retail series plotted above has seasonal peaks that grow as the level of the series rises. In the transformed series the peaks are roughly the same size throughout. A Box-Cox transformation with a lambda of about -0.058 therefore stabilises the variance of this series. Because this lambda is close to 0, the transformation behaves much like a natural log.

3.8 For your retail time series (from Exercise 3 in Section 2.10):

a. Split the data into two parts using:

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
   autolayer(myts.train, series="Train") +
   autolayer(myts.test, series="Test")

c. Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)

d. Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)

##                     ME     RMSE       MAE       MPE      MAPE    MASE
## Training set  6.870871 12.27525  8.893093  5.476112  7.780981 1.00000
## Test set     28.400000 29.39091 28.400000 11.015822 11.015822 3.19349
##                   ACF1 Theil's U
## Training set 0.6617306        NA
## Test set     0.5697915 0.7493485
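
To make the numbers in this table less of a black box, the seasonal naive forecast and a few of the accuracy measures can be reproduced by hand. The sketch below uses the built-in AirPassengers series as a stand-in for the retail data (an assumption made so the code runs without retail.xlsx), and the variable names are illustrative.

```r
# Seasonal naive forecast by hand on a stand-in monthly series.
y     <- AirPassengers
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

m <- frequency(train)                 # seasonal period (12 for monthly data)
h <- length(test)                     # forecast horizon (24 months here)
last_season <- tail(as.numeric(train), m)

# Each forecast repeats the value from the same month of the last training year.
fc_manual <- ts(rep(last_season, length.out = h),
                start = start(test), frequency = m)

err  <- as.numeric(test) - as.numeric(fc_manual)
me   <- mean(err)                     # mean error: a non-zero value signals bias
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
mape <- mean(abs(err / as.numeric(test))) * 100

# MASE scales the test MAE by the in-sample MAE of the seasonal naive method,
# so the training-set MASE of snaive is 1 by construction.
scale_mae <- mean(abs(diff(as.numeric(train), lag = m)))
mase <- mae / scale_mae
```

When ME and MAE coincide, as they do for the retail test set above, every forecast lies below the actual value.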

e. Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 591.71, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

The ACF plot above shows that the residuals are correlated, and the Ljung-Box test confirms it (Q* = 591.71, p-value < 2.2e-16). The histogram shows that the residuals are roughly normally distributed but with a long right tail. They are also not centered on zero (the training-set mean error is 6.87), which indicates bias in the forecasts.
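
The same checks can be run without checkresiduals, using only base R. This is a sketch on the built-in AirPassengers series rather than the retail data (an assumption for runnability), and it relies on the fact that seasonal naive residuals are simply the seasonal differences of the training series.

```r
y     <- AirPassengers
train <- window(y, end = c(1958, 12))
m     <- frequency(train)

# Seasonal naive residuals: y_t - y_{t-m}
res <- diff(train, lag = m)

# Bias check: unbiased forecasts would give a residual mean close to zero.
res_mean <- mean(res)

# Ljung-Box test over 2 * m = 24 lags, mirroring the 24 lags reported above.
# A small p-value means the residuals are autocorrelated.
lb <- Box.test(res, lag = 2 * m, type = "Ljung-Box")

# Lag-1 autocorrelation (the first element of $acf is lag 0).
acf1 <- acf(res, plot = FALSE)$acf[2]

# Plots for a visual check, left commented out here:
# acf(res); hist(res)
```

For a trending series like this one, the seasonal naive method always lags behind the trend, so a positive residual mean and strong autocorrelation are expected.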

f. How sensitive are the accuracy measures to the training/test split?

As the plot above shows, the series has strong seasonality and the peaks grow over time. If we forecast the original (untransformed) series, the accuracy measures are very sensitive to the training/test split: accuracy depends heavily on which part of the series is used for training and which for testing, especially with the seasonal naive method. Let's check this using time series cross-validation.

e = tsCV(myts, snaive, h=1)
rmse = sqrt(mean(e^2, na.rm=TRUE))
print(rmse)

## [1] 12.60226

The RMSE from cross-validation is 12.60, while the RMSE from the train/test split is 29.39. This large difference suggests that the accuracy measures for this series are very sensitive to the training/test split. The two figures are not strictly like for like, though: the cross-validation errors come from one-step forecasts (h=1), whereas the test-set forecasts extend many months beyond the end of the training data.
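
Another way to look at split sensitivity is to move the split point and recompute the test-set RMSE each time. The sketch below does this in base R on the built-in AirPassengers series (a stand-in for the retail data), holding out the 12 months after each split. snaive_test_rmse is a hypothetical helper, not part of the forecast package.

```r
y <- AirPassengers
m <- frequency(y)

# Test-set RMSE of a seasonal naive forecast when training ends in December
# of last_train_year and the following 12 months are held out.
snaive_test_rmse <- function(y, last_train_year) {
  train <- window(y, end = c(last_train_year, 12))
  test  <- window(y, start = c(last_train_year + 1, 1),
                  end = c(last_train_year + 1, 12))
  fc <- tail(as.numeric(train), frequency(y))   # repeat the last training year
  sqrt(mean((as.numeric(test) - fc)^2))
}

split_years   <- 1949:1959
rmse_by_split <- sapply(split_years, function(yr) snaive_test_rmse(y, yr))
names(rmse_by_split) <- split_years

# One-step errors of the seasonal naive method by hand: the forecast for
# time t is the value at t - m, so each error is y_t - y_{t-m}.
yy <- as.numeric(y)
e  <- yy[(m + 1):length(yy)] - yy[1:(length(yy) - m)]
rmse_cv <- sqrt(mean(e^2))

# Spread of the RMSE across split points.
range(rmse_by_split)
```

In this setup the pooled one-step RMSE equals the root mean square of the per-split values, so a single cross-validated figure averages over differences between splits that the range makes visible.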