library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
library(gridExtra)
usnetelec, usgdp, mcopper, enplanements
We will use the BoxCox.lambda() function to select the lambda value that gives the most appropriate Box-Cox transformation for each series.
# usnetelec
usn1 <- autoplot(usnetelec)
lambda_usnetelec <- BoxCox.lambda(usnetelec)
usn_bc <- autoplot(BoxCox(usnetelec, lambda_usnetelec))
grid.arrange(grobs = list(usn1, usn_bc))
# usgdp
usg1 <- autoplot(usgdp)
lambda_usgdp <- BoxCox.lambda(usgdp)
usg_bc <- autoplot(BoxCox(usgdp, lambda_usgdp))
grid.arrange(grobs = list(usg1, usg_bc))
# enplanements
enp1 <- autoplot(enplanements)
lambda_enplanements <- BoxCox.lambda(enplanements)
enp_bc <- autoplot(BoxCox(enplanements, lambda_enplanements))
grid.arrange(grobs = list(enp1, enp_bc))
# mcopper
mc1 <- autoplot(mcopper)
lambda_mcopper <- BoxCox.lambda(mcopper)
mc_bc <- autoplot(BoxCox(mcopper, lambda_mcopper))
grid.arrange(mc1, mc_bc)
# cangas
lambda_cangas <- BoxCox.lambda(cangas)
cangas1 <- autoplot(cangas)
cangas2 <- autoplot(BoxCox(cangas, lambda_cangas))
grid.arrange(grobs = list(cangas1, cangas2))
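As a small addition (the lambdas vector below is a new helper, not part of the original code), the selected lambda values can be collected and printed together, which makes it easier to compare how strong a transformation BoxCox.lambda() suggests for each series:
lambdas <- c(usnetelec = lambda_usnetelec,
             usgdp = lambda_usgdp,
             enplanements = lambda_enplanements,
             mcopper = lambda_mcopper,
             cangas = lambda_cangas)
round(lambdas, 3)  # a lambda near 0 behaves like a log transformation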
After choosing the best value of lambda and applying the Box-Cox transformation to each series, we were still not able to stabilize the variance much. Comparing the two autoplots for each series, the Box-Cox transformation produces a picture very similar to the original; in other words, the transformation alone does not make these time series stationary.
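A rough way to quantify this claim, sketched here under assumptions rather than taken from the original analysis (yearly_sd and rel_spread are made-up helper names, and the year-by-year spread is only a heuristic), is to ask how much the within-year standard deviation varies, relative to its average, before and after the transformation, for example on cangas:
yearly_sd <- function(x) tapply(as.numeric(x), floor(time(x)), sd)          # sd within each calendar year
rel_spread <- function(x) { s <- yearly_sd(x); sd(s, na.rm = TRUE) / mean(s, na.rm = TRUE) }
rel_spread(cangas)                                                            # original series
rel_spread(BoxCox(cangas, lambda_cangas))                                    # transformed series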
retaildata <- readxl::read_excel("C:/Users/Gurpreet/Documents/Data624/retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349627V"],frequency=12, start=c(1982,4))
autoplot(myts)
lambda_retaildat <- BoxCox.lambda(myts)
autoplot(BoxCox(myts, lambda_retaildat))
The transformation has smoothed out the data to a large extent. The seasonal variation is stabilized and the transformed series looks much closer to stationary; the seasonal peaks are far more even than in the original plot.
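As an optional check, not part of the original analysis, season plots of the original and the transformed retail series make it easier to judge whether the seasonal swings really have become more uniform:
ggseasonplot(myts)                              # original retail series
ggseasonplot(BoxCox(myts, lambda_retaildat))    # Box-Cox transformed series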
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
b. Check that your data have been split appropriately by producing the following plot.
autoplot(myts) + autolayer(myts.train, series="Training") + autolayer(myts.test, series="Test")
fc <- snaive(myts.train)
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 6.870871 12.27525 8.893093 5.476112 7.780981 1.00000
## Test set 28.400000 29.39091 28.400000 11.015822 11.015822 3.19349
## ACF1 Theil's U
## Training set 0.6617306 NA
## Test set 0.5697915 0.7493485
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 591.71, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
The residuals are correlated; there is information left in them that should have been captured by the forecasting method. Normality does not appear to hold either: the residual distribution is not centered on zero and is positively skewed, which means the forecasts are biased. There are both negative and positive outliers. We can consider adjusting the forecasts for this bias.
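One simple way to act on that bias, sketched here as a possibility rather than as part of the original analysis (bias and fc_adj are new names), is to shift the seasonal naive point forecasts by the mean training residual and recheck the test-set accuracy:
bias <- mean(residuals(fc), na.rm = TRUE)   # average training residual (the bias)
fc_adj <- fc$mean + bias                    # shift the point forecasts by that amount
accuracy(fc_adj, myts.test)                 # compare with the unadjusted accuracy above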
myts2.train <- window(myts, end=c(2011,12))
myts2.test <- window(myts, start=2012)
fc2 <- snaive(myts2.train)
accuracy(fc2, myts2.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 7.471014 12.92370 9.422899 5.612836 7.837536 1.000000
## Test set 9.825000 14.96852 12.808333 4.008724 4.871206 1.359277
## ACF1 Theil's U
## Training set 0.6958216 NA
## Test set 0.3652316 0.3676318
The accuracy measures are quite sensitive to the training/test split. To check this, we moved the split point forward by one year and ran the accuracy check again. The new table shows much lower test-set error measures (for example, test MASE drops from 3.19 to 1.36); comparing it with the original table makes it clear that the measures are sensitive to the split.
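A less split-dependent check, not used above, is time series cross-validation: the forecast package's tsCV() averages the forecast errors over many rolling origins, so the resulting error estimate does not hinge on one particular train/test cut.
e <- tsCV(myts, snaive, h = 12)      # forecast errors from rolling origins, up to 12 months ahead
sqrt(mean(e^2, na.rm = TRUE))        # overall RMSE across all origins and horizons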