library(fpp2)
## ── Attaching packages ──────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2   3.3.2     ✓ fma       2.4
## ✓ forecast  8.13      ✓ expsmooth 2.3
library(ggplot2)  # already attached by fpp2, loaded explicitly here for clarity
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
usnetelec
usgdp
mcopper
enplanements
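Before working through the series one at a time, all four lambda values can be computed at once (a quick sketch of my own; the per-series plots follow below):
# Sketch: Guerrero-selected lambda for each of the four series at once.
sapply(list(usnetelec = usnetelec, usgdp = usgdp,
            mcopper = mcopper, enplanements = enplanements),
       BoxCox.lambda)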
plot(usnetelec)
lambda <- BoxCox.lambda(usnetelec)  # lambda selected by Guerrero's method (the default)
lambda
## [1] 0.5167714
autoplot(BoxCox(usnetelec, lambda))
plot(usgdp)
lambda2 <- BoxCox.lambda(usgdp)
lambda2
## [1] 0.366352
autoplot(BoxCox(usgdp, lambda2))
plot(mcopper)
lambda3 <- BoxCox.lambda(mcopper)
lambda3
## [1] 0.1919047
autoplot(BoxCox(mcopper, lambda3))
plot(enplanements)
lambda4 <- BoxCox.lambda(enplanements)
lambda4
## [1] -0.2269461
autoplot(BoxCox(enplanements, lambda4))
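Whichever lambda is used, forecasts produced on the transformed scale must eventually be back-transformed; the forecast package provides InvBoxCox() for this. A quick sanity check (my own sketch; `transformed` is just a scratch variable):
# Sketch: InvBoxCox() undoes BoxCox() for the same lambda.
transformed <- BoxCox(enplanements, lambda4)
all.equal(InvBoxCox(transformed, lambda4), enplanements)  # expect TRUE (up to tolerance)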
Why is a Box-Cox transformation unhelpful for the cangas data?
plot(cangas)
lambda5 <- BoxCox.lambda(cangas)
lambda5
## [1] 0.5767759
autoplot(BoxCox(cangas, lambda5))
Comparing the plots before and after applying the Box-Cox transformation, we do not see an obvious change. The variation in cangas is widest from roughly 1978 to 1993 and then narrows again, so it does not increase or decrease uniformly with the level of the series. Because a Box-Cox transformation can only rescale variation monotonically with the level, it cannot stabilise variance that behaves in this non-monotonic way.
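One way to see this non-monotonic behaviour directly is to plot the year-by-year spread of the series (a rough sketch of my own):
# Sketch: within-year standard deviation of cangas. If the reasoning above
# is right, the spread rises towards the late 1970s, stays wide until the
# early 1990s, and then narrows, rather than growing steadily with the level.
autoplot(aggregate(cangas, nfrequency = 1, FUN = sd))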
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
I will select column A3349874C from the retail data.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[, "A3349874C"],
           frequency = 12, start = c(1982, 4))
autoplot(myts)
lambda_myts <- BoxCox.lambda(myts)
lambda_myts
## [1] -0.01469377
autoplot(BoxCox(myts, lambda_myts))
The plot shows an upward trend and clear seasonality. The selected lambda, -0.01469377, is very close to zero, so the transformation is essentially a log transformation. The Box-Cox transformation smooths the drastic changes after 2010 and makes the seasonal variation look more uniform across the series.
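Because the selected lambda is so close to zero, a plain log transformation should give an almost identical picture (a quick sketch):
# Sketch: with lambda ~ 0, BoxCox() is essentially log(), so this plot
# should be nearly indistinguishable from the one above.
autoplot(log(myts))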
For your retail time series (from Exercise 3 in Section 2.10):
Split the data into two parts using
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
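Before computing accuracy, it can help to plot the forecasts against the held-out data (a quick sketch):
# Sketch: seasonal naive forecasts (default horizon of two seasonal cycles)
# overlaid with the actual test-period observations.
autoplot(fc) + autolayer(myts.test, series="Test data")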
Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 6.122823 12.45603 8.666967 5.906047 7.848706 1.000000
## Test set -31.379167 34.36303 31.379167 -18.882634 18.882634 3.620548
## ACF1 Theil's U
## Training set 0.5367142 NA
## Test set 0.1146989 0.8221791
Check the residuals.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 301.98, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
In the autocorrelation plot, lags 1 through 9 are all above the blue dashed significance bounds, indicating that the residuals are positively correlated. The histogram confirms that the mean of the residuals is greater than zero; their distribution is roughly normal apart from a few outliers. The Ljung-Box test gives a p-value below 2.2e-16, so we conclude that the residuals are distinguishable from a white noise series.
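The same conclusions can be reached by hand from the residuals themselves (a sketch; `res` is a scratch variable of mine):
# Sketch: extract the residuals, check their mean, and rerun the
# Ljung-Box test directly; it should match checkresiduals(fc) above.
res <- residuals(fc)
mean(res, na.rm = TRUE)                      # noticeably above zero
Box.test(res, lag = 24, type = "Ljung-Box")  # tiny p-value: not white noise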
How sensitive are the accuracy measures to the training/test split?
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 6.122823 12.45603 8.666967 5.906047 7.848706 1.000000
## Test set -31.379167 34.36303 31.379167 -18.882634 18.882634 3.620548
## ACF1 Theil's U
## Training set 0.5367142 NA
## Test set 0.1146989 0.8221791
The accuracy measures are quite sensitive to the training/test split. Every test-set measure is far worse than its training-set counterpart: the MASE rises from 1.00 on the training set to 3.62 on the test set, and the ME and MPE change sign because the seasonal naive forecasts consistently overshoot the post-2010 values, making all of the test-set residuals negative. The mean error shows the largest swing for the same reason.
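Rather than re-running the same split, the sensitivity can be checked directly by moving the split point and recomputing the measures (a sketch; the 2008/2009 cut-off is an arbitrary choice of mine):
# Sketch: an earlier, arbitrary split to see how much the accuracy
# measures move when the training/test boundary changes.
myts.train2 <- window(myts, end = c(2008, 12))
myts.test2  <- window(myts, start = 2009)
fc2 <- snaive(myts.train2, h = length(myts.test2))
accuracy(fc2, myts.test2)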