Homework 2

library(fpp2)

## Loading required package: ggplot2

## Loading required package: forecast

## Loading required package: fma

## Loading required package: expsmooth

library(gridExtra)

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

usnetelec, usgdp, mcopper, enplanements

We use the BoxCox.lambda function to choose a lambda value for each series, then apply the Box-Cox transformation with that lambda and plot the result below the original series.

usn1 <- autoplot(usnetelec)
lambda_usnetelec <- BoxCox.lambda(usnetelec)
usn_bc <- autoplot(BoxCox(usnetelec, lambda_usnetelec))
grid.arrange(grobs = list(usn1, usn_bc))

usg1 <- autoplot(usgdp)
lambda_usgdp <- BoxCox.lambda(usgdp)
usg_bc <- autoplot(BoxCox(usgdp, lambda_usgdp))
grid.arrange(grobs = list(usg1, usg_bc))

enp1 <- autoplot(enplanements)
lambda_enplanements <- BoxCox.lambda(enplanements)
enp_bc <- autoplot(BoxCox(enplanements, lambda_enplanements))
grid.arrange(grobs = list(enp1, enp_bc))

mc1 <- autoplot(mcopper)
lambda_mcopper <- BoxCox.lambda(mcopper)
mc_bc <- autoplot(BoxCox(mcopper, lambda_mcopper))
grid.arrange(grobs = list(mc1, mc_bc))
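
For reference, the transformation applied above can be written out by hand. This is a minimal sketch in base R for positive data only; `box_cox` and `inv_box_cox` are hypothetical helper names, not functions from the forecast package, and `AirPassengers` is used as a stand-in series so that the block runs without fpp2.

```r
# Hand-written Box-Cox transform for positive data (hypothetical helper).
# lambda = 0 gives the natural log; otherwise a scaled power transform.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, to map values back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (abs(lambda) < 1e-8) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- as.numeric(AirPassengers)   # stand-in series shipped with base R
w <- box_cox(y, 0.5)             # 0.5 is an arbitrary example value
back <- inv_box_cox(w, 0.5)      # should recover y up to rounding
```

A lambda of 1 only shifts the series, so the closer the chosen lambda is to 1, the less the transformed plot differs in shape from the original.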

Why is a Box-Cox transformation unhelpful for the cangas data?

lambda_cangas <- BoxCox.lambda(cangas)
cangas1 <- autoplot(cangas)
cangas2 <- autoplot(BoxCox(cangas, lambda_cangas))
grid.arrange(grobs = list(cangas1, cangas2))

Even with the lambda chosen by BoxCox.lambda, the Box-Cox transformation does not stabilize the variance of cangas: the two plots look almost the same. A Box-Cox transformation helps when the size of the variation rises or falls steadily with the level of the series. In cangas the seasonal variation does not follow the level in that way (it is larger in the middle of the series than at either end), so no single value of lambda can even it out.
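
To illustrate when a power transformation can and cannot even out seasonal swings, here is a small sketch on two made-up series. Neither is cangas; the levels, amplitudes and the helper `swing_ratio` are all assumptions chosen for illustration. The log is used as the Box-Cox case lambda = 0.

```r
# Two made-up monthly series, 20 years each (illustration only, not cangas).
month_wave <- sin(2 * pi * (1:12) / 12)
level <- rep(seq(100, 290, by = 10), each = 12)   # level rises year by year
wave <- rep(month_wave, times = 20)

# Series 1: the seasonal swing is always 20% of the level.
proportional <- level * (1 + 0.2 * wave)

# Series 2: the swing is small, then large, then small again.
amp <- rep(c(5, 60, 5), times = c(7, 6, 7) * 12)
hump <- level + amp * wave

# Ratio of the largest to the smallest within-year range (1 = perfectly even).
swing_ratio <- function(x) {
  year <- rep(seq_len(length(x) / 12), each = 12)
  swing <- tapply(x, year, function(v) diff(range(v)))
  max(swing) / min(swing)
}

c(raw = swing_ratio(proportional), log = swing_ratio(log(proportional)))
c(raw = swing_ratio(hump), log = swing_ratio(log(hump)))
```

For the first series the log brings the ratio down to 1, because the swing is proportional to the level. For the second series the ratio stays far from 1 after the log, since the swing does not move with the level.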

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("C:/Users/Gurpreet/Documents/Data624/retail.xlsx", skip = 1)

myts <- ts(retaildata[, "A3349627V"], frequency = 12, start = c(1982, 4))

autoplot(myts)

lambda_retaildat <- BoxCox.lambda(myts)
autoplot(BoxCox(myts, lambda_retaildat))

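We would select the Box-Cox transformation with the lambda returned by BoxCox.lambda(myts). In the transformed plot the seasonal variation is stabilized: the seasonal peaks are much more even in size across the series than in the original plot. The transformation does not remove the trend, so the series is not stationary, but its variance is far more stable.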
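
As far as we know, BoxCox.lambda uses Guerrero's method by default. The sketch below shows the idea in base R: split the series into years, and pick the lambda for which the ratio sd / mean^(1 - lambda) is as steady as possible from year to year. `guerrero_cv` is a hypothetical helper written for this note, `AirPassengers` stands in for myts, and the result is not guaranteed to match BoxCox.lambda exactly.

```r
# Sketch of the idea behind Guerrero's method (hypothetical helper name).
guerrero_cv <- function(lambda, y, m = frequency(y)) {
  n <- length(y)
  n_years <- floor(n / m)
  # Keep the most recent complete years, one column per year.
  vals <- as.numeric(y)[(n - n_years * m + 1):n]
  years <- matrix(vals, nrow = m, ncol = n_years)
  year_mean <- colMeans(years)
  year_sd <- apply(years, 2, sd)
  ratio <- year_sd / year_mean^(1 - lambda)
  sd(ratio) / mean(ratio)   # coefficient of variation: smaller is steadier
}

y <- AirPassengers   # stand-in for myts
# Search the interval (-1, 2) for the lambda with the steadiest ratio.
best <- optimize(guerrero_cv, interval = c(-1, 2), y = y)
lambda_hat <- best$minimum
lambda_hat
```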

For your retail time series (from Exercise 3 in Section 2.10):

Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myts) + autolayer(myts.train, series = "Training") + autolayer(myts.test, series = "Test")

Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
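
The seasonal naive method sets each forecast equal to the last observed value from the same month. A minimal base R sketch of the point forecasts is given below; `snaive_sketch` is a hypothetical helper (it produces no prediction intervals), the two-year default horizon is an assumption, and `AirPassengers` stands in for the retail series.

```r
# Point forecasts of a seasonal naive method (hypothetical helper).
snaive_sketch <- function(train, h = 2 * frequency(train)) {
  m <- frequency(train)
  n <- length(train)
  last_year <- as.numeric(train)[(n - m + 1):n]   # last full season observed
  fc <- rep(last_year, length.out = h)            # repeat it over the horizon
  ts(fc, start = tsp(train)[2] + 1 / m, frequency = m)
}

train_sketch <- window(AirPassengers, end = c(1958, 12))   # stand-in training set
fc_sketch <- snaive_sketch(train_sketch)
```

Every forecast for a given month is the same number, so any trend in the data after the training period shows up directly as forecast error.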

Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)

##                     ME     RMSE       MAE       MPE      MAPE    MASE
## Training set  6.870871 12.27525  8.893093  5.476112  7.780981 1.00000
## Test set     28.400000 29.39091 28.400000 11.015822 11.015822 3.19349
##                   ACF1 Theil's U
## Training set 0.6617306        NA
## Test set     0.5697915 0.7493485
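
The main columns of this table can be reproduced by hand. The sketch below computes them in base R for a seasonal naive forecast on `AirPassengers` (a stand-in, so the numbers differ from the retail results above). It assumes the usual definitions: errors are actual minus forecast, and MASE scales the MAE by the in-sample MAE of the seasonal naive method. That scaling is why the training-set MASE above is exactly 1 and the test-set MASE of 3.19 equals 28.40 / 8.89.

```r
y <- AirPassengers   # stand-in series
m <- frequency(y)
train <- as.numeric(window(y, end = c(1958, 12)))
test <- as.numeric(window(y, start = c(1959, 1)))

# Seasonal naive point forecasts: repeat the last observed year.
n <- length(train)
fc_vals <- rep(train[(n - m + 1):n], length.out = length(test))

e <- test - fc_vals                        # forecast errors
scale <- mean(abs(diff(train, lag = m)))   # in-sample seasonal naive MAE

measures <- c(
  ME   = mean(e),
  RMSE = sqrt(mean(e^2)),
  MAE  = mean(abs(e)),
  MPE  = mean(100 * e / test),
  MAPE = mean(abs(100 * e / test)),
  MASE = mean(abs(e)) / scale
)
round(measures, 3)
```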

Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 591.71, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
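
The Ljung-Box part of this output can also be obtained with base R. The residuals of a seasonal naive fit are the seasonal differences of the training data, so the sketch below applies `Box.test` to those, using 24 lags to match the output above. `AirPassengers` stands in for myts.train, so the statistic will differ from the retail result.

```r
train_lb <- window(AirPassengers, end = c(1958, 12))   # stand-in for myts.train
m <- frequency(train_lb)

# Seasonal naive residuals: each value minus the value one year earlier.
res <- diff(train_lb, lag = m)

# Ljung-Box test on 2 * m = 24 lags; fitdf = 0 because no parameters are fitted.
lb <- Box.test(res, lag = 2 * m, type = "Ljung-Box", fitdf = 0)
lb
```

A small p-value rejects the hypothesis that the residuals are uncorrelated.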

Do the residuals appear to be uncorrelated and normally distributed?

The residuals are correlated: the Ljung-Box test rejects the hypothesis of no autocorrelation (Q* = 591.71, p-value < 2.2e-16), so there is information left in the residuals that the forecasting method should have captured. Normality does not seem to hold either. The residuals are not centered on zero (the training-set mean error is 6.87), so the forecasts are biased, and the distribution is positively skewed with outliers in both tails. We could consider adjusting the forecasts for this bias.

How sensitive are the accuracy measures to the training/test split?

myts2.train <- window(myts, end=c(2011,12))
myts2.test <- window(myts, start=2012)
fc2 <- snaive(myts2.train)
accuracy(fc2, myts2.test)

##                    ME     RMSE       MAE      MPE     MAPE     MASE
## Training set 7.471014 12.92370  9.422899 5.612836 7.837536 1.000000
## Test set     9.825000 14.96852 12.808333 4.008724 4.871206 1.359277
##                   ACF1 Theil's U
## Training set 0.6958216        NA
## Test set     0.3652316 0.3676318

The accuracy measures are very sensitive to the training/test split. To check this, we moved the end of the training set from December 2010 to December 2011 and ran the accuracy check again. With only one more year of training data, the test-set RMSE falls from 29.39 to 14.97, the MAPE from 11.02 to 4.87, and the MASE from 3.19 to 1.36, while the training-set measures barely change.
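
Two splits give only two data points, so a fuller check would repeat the comparison over several split dates. The sketch below does this in base R for a seasonal naive forecast, scoring each forecast against the two years that follow its own training set. `test_rmse` is a hypothetical helper, the choice of split years is arbitrary, and `AirPassengers` stands in for myts.

```r
y <- AirPassengers   # stand-in for myts
m <- frequency(y)

# Test-set RMSE of a seasonal naive forecast when training ends in December
# of end_year and the test set is the following two years.
test_rmse <- function(end_year) {
  train <- as.numeric(window(y, end = c(end_year, 12)))
  test <- as.numeric(window(y, start = c(end_year + 1, 1),
                            end = c(end_year + 2, 12)))
  last_year <- train[(length(train) - m + 1):length(train)]
  fc_vals <- rep(last_year, length.out = length(test))
  sqrt(mean((test - fc_vals)^2))
}

end_years <- 1954:1958
rmse_by_split <- sapply(end_years, test_rmse)
names(rmse_by_split) <- end_years
rmse_by_split
```

The spread of these values across split dates gives a rough sense of how much a single test-set figure depends on where the split falls.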

Gurpreet Singh

February 13, 2019