Predictive Analytics Homework #2

For each of the following series, find an appropriate Box-Cox transformation to stabilise the variance.

usnetelec
usgdp
mcopper
enplanements
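For reference, the Box-Cox family used by BoxCox() is w = log(y) when lambda = 0 and w = (y^lambda - 1)/lambda otherwise. The sketch below writes the transform and its inverse out in base R, assuming strictly positive data. The names box_cox and inv_box_cox are hypothetical helpers for illustration, not fpp2 functions.

```r
# Hypothetical helper: Box-Cox transform, assuming y > 0
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical helper: inverse transform back to the original scale
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

# Round trip with lambda = 0.5, which is a shifted and scaled square root
y <- c(1, 4, 9, 16)
w <- box_cox(y, 0.5)   # 2 * (sqrt(y) - 1), i.e. 0 2 4 6
inv_box_cox(w, 0.5)    # 1 4 9 16
```

A lambda near 0.5 (as for usnetelec) therefore behaves much like a square root, and a lambda near 0 much like a log.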

library(fpp2)
lambda_netelec = BoxCox.lambda(usnetelec)
lambda_netelec   # lambda selected by BoxCox.lambda() (Guerrero's method by default)

## [1] 0.5167714

usnetelec_tran = BoxCox(usnetelec,lambda_netelec)
temp_netelec = cbind(usnetelec,usnetelec_tran)
autoplot(temp_netelec, facets = TRUE) + ggtitle("Annual US Net Electricity Generation") + xlab("Year")

The usnetelec time series gives the annual US net electricity generation (billion kWh) from 1949 through 2003. The selected lambda is 0.5167714, close to 0.5, so the transformation is roughly a square root. The faceted plot compares the original and transformed series.
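BoxCox.lambda() chooses lambda with Guerrero's method by default, searching lambda in [-1, 2]. As I understand the method, the series is cut into consecutive blocks (one seasonal period long, or two observations for non-seasonal data) and lambda is chosen so that sd/mean^(1 - lambda) is as constant as possible across blocks, i.e. the coefficient of variation of that ratio is minimised. The base-R sketch below is a simplified re-implementation under those assumptions; guerrero_cv and guerrero_lambda are hypothetical names, and the result is not guaranteed to match BoxCox.lambda() exactly.

```r
# Hypothetical sketch of Guerrero's method for choosing a Box-Cox lambda.
# Assumes a positive series; block length = max(2, frequency).
guerrero_cv <- function(lambda, x) {
  period <- max(2, round(frequency(x)))
  n_blocks <- floor(length(x) / period)
  # keep the most recent complete blocks, one block per column
  x_use <- tail(as.numeric(x), n_blocks * period)
  blocks <- matrix(x_use, nrow = period, ncol = n_blocks)
  block_mean <- apply(blocks, 2, mean)
  block_sd <- apply(blocks, 2, sd)
  ratio <- block_sd / block_mean^(1 - lambda)
  sd(ratio) / mean(ratio)   # coefficient of variation of the ratio
}

guerrero_lambda <- function(x, lower = -1, upper = 2) {
  optimize(guerrero_cv, interval = c(lower, upper), x = x)$minimum
}

# Toy data: within each pair the spread is proportional to the level,
# so a log transformation (lambda close to 0) should be chosen.
a <- 1:10
x_mult <- as.vector(rbind(a, 2 * a))   # 1, 2, 2, 4, 3, 6, ...
guerrero_lambda(x_mult)

# Constant spread instead: lambda close to 1 (no transformation needed)
x_add <- as.vector(rbind(a, a + 1))
guerrero_lambda(x_add)
```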

lambda_gdp = BoxCox.lambda(usgdp)
lambda_gdp   # lambda selected by BoxCox.lambda() (Guerrero's method by default)

## [1] 0.366352

usgdp_tran = BoxCox(usgdp,lambda_gdp)
temp_gdp = cbind(usgdp, usgdp_tran)
autoplot(temp_gdp, facets = TRUE) + ggtitle("US Quarterly GDP") + xlab("Year")

The usgdp time series shows quarterly United States gross domestic product from 1947 through 2006. The selected lambda is 0.366352.

lambda_copper = BoxCox.lambda(mcopper)
lambda_copper   # lambda selected by BoxCox.lambda() (Guerrero's method by default)

## [1] 0.1919047

mcopper_tran = BoxCox(mcopper,lambda_copper)
temp_copper = cbind(mcopper, mcopper_tran)
autoplot(temp_copper, facets = TRUE) + ggtitle("Monthly Copper Prices") + xlab("Year")

The mcopper time series shows monthly copper prices; the help page (?mcopper) does not specify the time period. The selected lambda is 0.1919047.

lambda_enplanements = BoxCox.lambda(enplanements)
lambda_enplanements   # lambda selected by BoxCox.lambda() (Guerrero's method by default)

## [1] -0.2269461

enplanements_tran = BoxCox(enplanements,lambda_enplanements)
temp_enplanements = cbind(enplanements, enplanements_tran)
autoplot(temp_enplanements, facets = TRUE) + ggtitle("Monthly US Domestic Enplanements") + xlab("Year")

The enplanements time series shows monthly US domestic enplanements for the period 1996 through 2000. The selected lambda is -0.2269461; a negative lambda gives a slightly stronger transformation than a log.

Why is a Box-Cox transformation unhelpful for the cangas data?

lambda_cangas = BoxCox.lambda(cangas)
lambda_cangas   # lambda selected by BoxCox.lambda() (Guerrero's method by default)

## [1] 0.5767759

cangas_tran = BoxCox(cangas,lambda_cangas)
temp_cangas = cbind(cangas, cangas_tran)
autoplot(temp_cangas, facets = TRUE) + ggtitle("Monthly Canadian Gas Production") + xlab("Year")

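In the previous plots the transformed series had more even variation than the originals. For cangas I don't see an improvement: the seasonal variation is larger in the middle of the series than at the beginning or the end, so it does not grow steadily with the level of the series. A Box-Cox transformation can only even out variation that increases (or decreases) with the level, so it is unhelpful here.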
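One way to check this numerically is to compare the level and the spread of a series year by year: a Box-Cox transformation helps when the spread rises or falls steadily with the level. The base-R sketch below computes per-calendar-year means and standard deviations. yearly_level_spread is a hypothetical helper, and the built-in AirPassengers series is used only as a contrasting example whose spread grows with its level; with fpp2 loaded, the same table can be computed for cangas and sd plotted against mean.

```r
# Hypothetical helper: mean and standard deviation for each calendar year
yearly_level_spread <- function(x) {
  # small offset guards against floating-point error in time(x)
  year <- floor(as.numeric(time(x)) + 1e-6)
  vals <- as.numeric(x)
  means <- tapply(vals, year, mean)
  sds <- tapply(vals, year, sd)
  data.frame(year = as.numeric(names(means)),
             mean = as.vector(means),
             sd = as.vector(sds))
}

air_stats <- yearly_level_spread(AirPassengers)
# Spread rises with the level here, so the correlation is strongly positive
cor(air_stats$mean, air_stats$sd)

# With fpp2 loaded, the same check for cangas:
# plot(sd ~ mean, data = yearly_level_spread(cangas))
```

If the sd-against-mean points rise and then fall, as the cangas plot suggests, no single lambda can make the spread constant.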

What Box-Cox transformation would you select for your retail data (from exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts_2 <- ts(retaildata[, "A3349336V"], frequency = 12, start = c(1982, 4))
lambda_retail = BoxCox.lambda(myts_2)
lambda_retail   # lambda selected by BoxCox.lambda() (Guerrero's method by default)

## [1] 0.02481794

retail_tran = BoxCox(myts_2,lambda_retail)
temp_retail = cbind(myts_2, retail_tran)
autoplot(temp_retail, facets = TRUE) + ggtitle("Retail Data") + xlab("Year")

With lambda = 0.02481794 (very close to 0, so almost a log transformation), I compare random-walk-with-drift forecasts back-transformed without and with bias adjustment.

retail <- rwf(myts_2, drift = TRUE, lambda = lambda_retail, h = 50, level = 80)
retail_BiasAdjusted <- rwf(myts_2, drift = TRUE, lambda = lambda_retail, h = 50, level = 80, biasadj = TRUE)

autoplot(myts_2) + autolayer(retail, series = "Box-Cox Transformation - Drift") + autolayer(retail_BiasAdjusted$mean, series = "Bias-Adjusted") + guides(colour = guide_legend(title = "Forecast Plot"))

I would go with the bias-adjusted forecasts.
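The two forecasts differ only in how the transformed-scale forecast w is back-transformed. Without adjustment the result is the median of the forecast distribution; with biasadj = TRUE it approximates the mean, using the adjustment given in Hyndman and Athanasopoulos, Forecasting: Principles and Practice (2nd ed.): exp(w)(1 + sigma^2/2) for lambda = 0, and (lambda*w + 1)^(1/lambda) * [1 + sigma^2 (1 - lambda) / (2 (lambda*w + 1)^2)] otherwise, where sigma^2 is the forecast variance on the transformed scale. A base-R sketch; back_transform is a hypothetical helper, not the forecast package's internal code.

```r
# Hypothetical helper: back-transform a Box-Cox forecast w with
# transformed-scale forecast variance sigma2.
# biasadj = FALSE gives the median, biasadj = TRUE the approximate mean.
back_transform <- function(w, lambda, sigma2 = 0, biasadj = FALSE) {
  if (lambda == 0) {
    median_fc <- exp(w)
    adj <- 1 + sigma2 / 2
  } else {
    base <- lambda * w + 1
    median_fc <- base^(1 / lambda)
    adj <- 1 + sigma2 * (1 - lambda) / (2 * base^2)
  }
  if (biasadj) median_fc * adj else median_fc
}

# With lambda near 0 (as for the retail series), the adjusted forecast
# is above the median whenever sigma2 > 0.
w <- log(200)
back_transform(w, lambda = 0, sigma2 = 0.01)                  # 200
back_transform(w, lambda = 0, sigma2 = 0.01, biasadj = TRUE)  # 201
```

Because sigma^2 grows with the forecast horizon, the gap between the two lines in the plot widens further into the future.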

For your retail time series (from Exercise 3 in Section 2.10):

Split the data into two parts using

myts_2.train <- window(myts_2, end = c(2010, 12))
myts_2.test <- window(myts_2, start = 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myts_2) + autolayer(myts_2.train, series = "Training") + autolayer(myts_2.test, series = "Test")

Calculate the forecasts using snaive applied to myts_2.train.

fc <- snaive(myts_2.train)
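snaive() sets each forecast equal to the last observed value from the same season: yhat(T+h) = y(T + h - m(k + 1)), where m is the seasonal period (12 here) and k = floor((h - 1)/m). A minimal base-R sketch of the point forecasts only (snaive() also computes prediction intervals); snaive_points is a hypothetical name, and AirPassengers is used only because it ships with R.

```r
# Hypothetical helper: seasonal naive point forecasts for a ts object
snaive_points <- function(x, h) {
  m <- frequency(x)
  last_season <- tail(as.numeric(x), m)
  # repeat the last observed season as many times as needed
  fc <- rep(last_season, length.out = h)
  ts(fc, start = tsp(x)[2] + 1 / m, frequency = m)
}

# Forecasts for 1961-1962 simply repeat the 1960 values
snaive_points(AirPassengers, h = 24)
```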

Compare the accuracy of your forecasts against the actual values stored in myts_2.test.

accuracy(fc, myts_2.test)
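The test-set row of the accuracy output comes from the forecast errors e = y - yhat over the test period. As I understand the definitions used by accuracy(), ME, RMSE and MAE are the mean, root mean squared and mean absolute error; MPE and MAPE apply the same idea to the percentage errors 100*e/y; and MASE divides the MAE by the in-sample MAE of seasonal naive forecasts on the training data, which is why the training-set MASE of snaive is exactly 1. A base-R sketch (accuracy_sketch is a hypothetical name; Theil's U and ACF1 are left out), checked on toy numbers:

```r
# Hypothetical helper reproducing the main accuracy measures
accuracy_sketch <- function(train, test, fc_mean) {
  y <- as.numeric(test)
  e <- y - as.numeric(fc_mean)                     # forecast errors
  m <- frequency(train)
  # MASE scale: in-sample MAE of seasonal naive forecasts
  scale <- mean(abs(diff(as.numeric(train), lag = m)))
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = 100 * mean(e / y),
    MAPE = 100 * mean(abs(e / y)),
    MASE = mean(abs(e)) / scale)
}

# Toy quarterly example: every error is 3 and every seasonal difference is 2
train <- ts(c(10, 20, 30, 40, 12, 22, 32, 42), frequency = 4)
test  <- c(15, 25, 35, 45)
fc_toy <- c(12, 22, 32, 42)   # seasonal naive: repeat the last year
accuracy_sketch(train, test, fc_toy)   # ME = RMSE = MAE = 3, MASE = 1.5
```

Read this way, the test-set MASE of 2.18 means the snaive test errors are about twice the typical one-season change in the training data.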

##                     ME     RMSE      MAE       MPE     MAPE     MASE
## Training set  13.51592 25.11534 19.49850  4.932556 7.778814 1.000000
## Test set     -42.55833 45.76343 42.55833 -9.898213 9.898213 2.182647
##                   ACF1 Theil's U
## Training set 0.5717573        NA
## Test set     0.1302427 0.5709353

Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 526.3, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
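The Ljung-Box statistic is Q* = T(T + 2) * sum over k = 1..h of r_k^2 / (T - k), where r_k is the lag-k autocorrelation of the residuals, T the number of residuals and h the number of lags (24 above). Under the null hypothesis of no autocorrelation it is compared with a chi-squared distribution on h minus the model degrees of freedom (24 - 0 for the seasonal naive method). The base-R sketch below computes Q* by hand and checks it against stats::Box.test(); the simulated AR(1) series is purely illustrative and is not the homework data.

```r
# Ljung-Box statistic computed directly from the sample autocorrelations
ljung_box <- function(res, h) {
  n <- length(res)
  r <- acf(res, lag.max = h, plot = FALSE)$acf[-1]   # drop lag 0
  n * (n + 2) * sum(r^2 / (n - seq_len(h)))
}

set.seed(1)
# Illustrative residuals with strong autocorrelation (simulated AR(1))
res <- as.numeric(arima.sim(model = list(ar = 0.8), n = 300))

q_manual <- ljung_box(res, h = 24)
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
c(manual = q_manual, box_test = unname(lb$statistic), p_value = lb$p.value)
```

A large Q* with a tiny p-value, as in the checkresiduals() output, is evidence that the residuals are autocorrelated.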

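The Ljung-Box test rejects the null hypothesis of no autocorrelation (Q* = 526.3 on 24 df, p < 2.2e-16), so the residuals are autocorrelated rather than white noise; the training-set ACF1 of 0.57 in the accuracy output points the same way. The residual distribution also does not appear to be normal.

How sensitive are the accuracy measures to the training/test split?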
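One way to approach this question is to repeat the forecast-and-evaluate step for several training-set end dates and compare the test-set measures. The base-R sketch below does this under simplifying assumptions: seasonal naive point forecasts only, a fixed horizon of h months after each split, and RMSE and MAPE as the measures. split_sensitivity is a hypothetical helper and AirPassengers stands in for myts_2; with fpp2 loaded, the same loop could call snaive() and accuracy() on window(myts_2, end = ...) instead.

```r
# Hypothetical helper: test-set RMSE and MAPE of seasonal naive forecasts
# for several training-set end years, each evaluated on the next h periods
split_sensitivity <- function(x, end_years, h = 12) {
  m <- frequency(x)
  rows <- lapply(end_years, function(yr) {
    train <- window(x, end = c(yr, m))             # up to the last period of yr
    test  <- head(as.numeric(window(x, start = c(yr + 1, 1))), h)
    fc    <- rep(tail(as.numeric(train), m), length.out = length(test))
    e     <- test - fc
    data.frame(train_end = yr,
               RMSE = sqrt(mean(e^2)),
               MAPE = 100 * mean(abs(e / test)))
  })
  do.call(rbind, rows)
}

split_sensitivity(AirPassengers, end_years = 1955:1959)
```

If the rows differ a lot from one split to the next, a single training/test split gives a fragile picture of forecast accuracy.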

Oluwakemi Omotunde

February 15, 2019