library(fpp2)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------------------------------ fpp2 2.4 --
## v ggplot2 3.1.0 v fma 2.4
## v forecast 8.12 v expsmooth 2.3
##
library(gridExtra)
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance.
#help(usnetelec)
#help(usgdp)
#help(mcopper)
#help(enplanements)
Annual US net electricity generation (billion kwh) for 1949-2003
Use the BoxCox.lambda function (Guerrero's method by default) to find the lambda value that makes the size of the variation about the same across the whole series. The function returns 0.5167714 as the best value.
(lambda1 <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
Plot of the Box-Cox transformation with lambda 0.5167714
plot1 <- autoplot(usnetelec)
plot2 <- autoplot(BoxCox(usnetelec,lambda1))
grid.arrange(plot1, plot2)
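For reference, the transformation itself can be written out by hand. The sketch below uses base R only and the standard Box-Cox definition for positive data; box_cox is an illustrative name, not a function from fpp2 or forecast, and the small vector y is made-up input chosen only to show how different lambda values compress large observations.

```r
# Box-Cox transformation written out by hand (base R only).
# box_cox() is an illustrative helper, not part of fpp2 or forecast.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))        # this sketch handles positive data only
  if (abs(lambda) < 1e-8) {
    log(y)                     # lambda = 0 is defined as the natural log
  } else {
    (y^lambda - 1) / lambda    # power transformation otherwise
  }
}

# Made-up values spanning several orders of magnitude.
y <- c(1, 10, 100, 1000)

box_cox(y, 1)    # only shifts the data down by 1; the shape is unchanged
box_cox(y, 0.5)  # square-root-like: compresses large values
box_cox(y, 0)    # log: compresses large values most strongly
```

Smaller lambda values pull large observations in more strongly, which is why a series whose variation grows with its level tends to get a lambda well below 1.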
Quarterly US GDP, 1947:1 - 2006:1.
Use the BoxCox.lambda function to find the lambda value that makes the size of the variation about the same across the whole series. The function returns 0.366352 as the best value.
(lambda2 <- BoxCox.lambda(usgdp))
## [1] 0.366352
Plot of the Box-Cox transformation with lambda 0.366352
plot1 <- autoplot(usgdp)
plot2 <- autoplot(BoxCox(usgdp,lambda2))
grid.arrange(plot1, plot2)
Monthly copper prices.
Use the BoxCox.lambda function to find the lambda value that makes the size of the variation about the same across the whole series. The function returns 0.1919047 as the best value.
(lambda3 <- BoxCox.lambda(mcopper))
## [1] 0.1919047
Plot of the Box-Cox transformation with lambda 0.1919047
plot1 <- autoplot(mcopper)
plot2 <- autoplot(BoxCox(mcopper,lambda3))
grid.arrange(plot1, plot2)
Monthly US Domestic Revenue Enplanements (millions): 1996-2000
Use the BoxCox.lambda function to find the lambda value that makes the size of the variation about the same across the whole series. The function returns -0.2269461 as the best value.
(lambda4 <- BoxCox.lambda(enplanements))
## [1] -0.2269461
Plot of the Box-Cox transformation with lambda -0.2269461
plot1 <- autoplot(enplanements)
plot2 <- autoplot(BoxCox(enplanements,lambda4))
grid.arrange(plot1, plot2)
Why is a Box-Cox transformation unhelpful for the cangas data?
help(cangas)
Monthly Canadian gas production, billions of cubic metres, January 1960 - February 2005
Box-Cox transformation using the best lambda to stabilize the variance. The value of lambda is 0.5767759.
(lambda <- BoxCox.lambda(cangas))
## [1] 0.5767759
cangas_plot <- autoplot(cangas)
cangas_boxcox <- autoplot(BoxCox(cangas,lambda))
grid.arrange(cangas_plot, cangas_boxcox)
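The plot of the Box-Cox transformed series does not look much different from the original plot. A Box-Cox transformation can only stabilize variation that grows or shrinks steadily with the level of the series. In cangas the seasonal variation is small at the start, larger in the middle of the series, and smaller again at the end, so no single lambda can even it out.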
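One way to check whether a single power transformation can help is to compare each year's spread with its level. The sketch below is base R; yearly_spread is a hypothetical helper introduced here for illustration, and the call on cangas assumes the fpp2 package is installed (it is skipped otherwise). If the spread rose steadily with the level, a Box-Cox transformation could even it out; a spread that rises and then falls again cannot be fixed by any single lambda.

```r
# Yearly mean (level) and standard deviation (spread) of a seasonal ts.
# yearly_spread() is a hypothetical helper, not part of fpp2 or forecast.
yearly_spread <- function(x) {
  stopifnot(is.ts(x))
  vals <- as.numeric(x)
  # Calendar year of each observation; the small offset guards against
  # floating-point rounding in time().
  year <- floor(as.numeric(time(x)) + 1e-6)
  n_obs <- tapply(vals, year, length)
  keep <- n_obs == frequency(x)          # drop incomplete years
  data.frame(
    year   = as.integer(names(n_obs))[keep],
    level  = as.numeric(tapply(vals, year, mean))[keep],
    spread = as.numeric(tapply(vals, year, sd))[keep]
  )
}

# Assumption: fpp2 is installed, as in the rest of this document.
if (requireNamespace("fpp2", quietly = TRUE)) {
  print(yearly_spread(fpp2::cangas))
}
```

Reading down the spread column next to the level column shows whether the within-year variation tracks the level of the series.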
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349335T"], frequency=12, start=c(1982,4))
Box-Cox transformation with lambda 0.193853
(lambda <- BoxCox.lambda(myts))
## [1] 0.193853
plot1 <- autoplot(myts)
plot2 <- autoplot(BoxCox(myts, lambda))
grid.arrange(plot1, plot2)
The transformation made the size of the variation more similar across the whole time series, so a lambda of about 0.19 is a reasonable choice for this retail series.
For your retail time series (from Exercise 3 in Section 2.10):
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
fc <- snaive(myts.train)
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 61.56787 72.20702 61.68438 6.388722 6.404105 1.000000
## Test set 97.44583 109.62545 100.02917 4.629852 4.751209 1.621629
## ACF1 Theil's U
## Training set 0.6018274 NA
## Test set 0.2686595 0.9036205
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 812.76, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
There appears to be some correlation in the residuals, which indicates that “there is information left in the residuals which should be used in computing forecasts”. The Ljung-Box test p-value is < 2.2e-16, so the residuals are very unlikely to be white noise.
To see how sensitive the accuracy measures are to the training/test split, we moved the split forward by one year. The accuracy measures differ between the two splits: for example, the test-set RMSE falls from 109.63 to 97.02 and the test-set MASE from 1.62 to 1.32. So the accuracy measures are sensitive to the choice of training/test split.
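The Ljung-Box statistic reported by checkresiduals can also be reproduced with base R. The sketch below is only an illustration: it uses the built-in AirPassengers series as stand-in data (not the retail series), forms seasonal naive residuals as the difference from the value one year earlier, and compares a hand-computed Q* with stats::Box.test. The choice of 24 lags mirrors the "Total lags used: 24" line in the output above.

```r
# Stand-in data: AirPassengers ships with base R (not the retail series).
# Seasonal naive residuals: each value minus the value 12 months earlier.
res <- diff(AirPassengers, lag = 12)

lag_max <- 24
n <- length(res)

# Sample autocorrelations at lags 1..lag_max (drop lag 0).
r <- acf(res, lag.max = lag_max, plot = FALSE)$acf[-1]

# Ljung-Box statistic: Q* = n (n + 2) * sum_k r_k^2 / (n - k)
q_manual <- n * (n + 2) * sum(r^2 / (n - seq_len(lag_max)))

# Under the white-noise hypothesis Q* is approximately chi-squared
# with lag_max degrees of freedom (model df = 0 for a naive method).
p_manual <- pchisq(q_manual, df = lag_max, lower.tail = FALSE)

# The same test from the stats package, for comparison.
lb <- Box.test(res, lag = lag_max, type = "Ljung-Box")
print(c(manual = q_manual, box_test = unname(lb$statistic)))
```

A large Q* (and hence a small p-value) means the autocorrelations are jointly too large to be consistent with white noise.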
New split:
myts.train2 <- window(myts, end=c(2011,12))
myts.test2 <- window(myts, start=2012)
fc2 <- snaive(myts.train2)
accuracy(fc2,myts.test2)
## ME RMSE MAE MPE MAPE MASE
## Training set 62.05884 72.75961 62.35101 6.293351 6.316641 1.000000
## Test set 80.76250 97.02262 82.36250 3.634259 3.709092 1.320949
## ACF1 Theil's U
## Training set 0.5866573 NA
## Test set 0.2515324 0.707386
Original split:
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 61.56787 72.20702 61.68438 6.388722 6.404105 1.000000
## Test set 97.44583 109.62545 100.02917 4.629852 4.751209 1.621629
## ACF1 Theil's U
## Training set 0.6018274 NA
## Test set 0.2686595 0.9036205
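The comparison of two splits can be extended to several splits in a loop. The sketch below is a base R illustration under stated assumptions: snaive_test_rmse is a hypothetical helper that computes seasonal naive forecasts by hand (it does not call forecast::snaive), it assumes the training set ends in the last period of a year, and it uses the built-in AirPassengers series as stand-in data because retail.xlsx is not bundled with R.

```r
# Test-set RMSE of a seasonal naive forecast for a given split year.
# snaive_test_rmse() is a hypothetical helper written for illustration.
snaive_test_rmse <- function(y, last_train_year) {
  m <- frequency(y)
  # Training data end with the final period of last_train_year;
  # everything after that is the test set.
  train <- window(y, end = c(last_train_year, m))
  test  <- window(y, start = c(last_train_year + 1, 1))
  # Seasonal naive: repeat the last observed year over the test period.
  last_season <- tail(as.numeric(train), m)
  fc <- rep(last_season, length.out = length(test))
  sqrt(mean((as.numeric(test) - fc)^2))
}

# Stand-in data: AirPassengers (monthly, 1949-1960) from base R.
split_years <- 1955:1959
rmse_by_split <- sapply(split_years, function(yr) {
  snaive_test_rmse(AirPassengers, yr)
})
names(rmse_by_split) <- split_years
print(rmse_by_split)
```

If the RMSE values change a lot from one split year to the next, conclusions drawn from a single split should be treated with caution.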