library(fpp2)
library(gridExtra)
lambda_test <- function(data) {
ap <- autoplot(data)
lambda <- BoxCox.lambda(data)
ap2 <- autoplot(BoxCox(data,lambda))
grid.arrange(ap, ap2, ncol = 2, nrow = 1)
}
lambda_test(usnetelec)
There was a slightly nonlinear trend pattern between 1950 and 1970. The Box-Cox transformation linearizes this.
lambda_test(usgdp)
The Box-Cox transformation again linearizes the trend pattern.
lambda_test(mcopper)
The transformation makes the fluctuations in the 1960s the same in magnitude as the rest of the series. It also somewhat reduces the extreme magnitude of the increase in the 2000s.
lambda_test(enplanements)
In the early 1980s the seasonal pattern seemed to depend on the level of the series. The Box-Cox fixes this.
lambda_test(cangas)
There is a period of extreme seasonal fluctuations in the 1970s and 1980s that the Box-Cox is not able to smooth. The magnitude of the seasonal pattern seems to be tied to a factor other than the level of the series, as the fluctions in the 2000s match those of the 1960s.
Retail Data
Suggested Lambda through BoxCox.lambda function:
library(tidyverse)
retaildata <- read_csv("https://raw.githubusercontent.com/TheFedExpress/Data/master/retail.csv", skip=1)
myts <- ts(retaildata[,12],frequency=12, start=c(1982,4))
BoxCox.lambda(myts)
## [1] 0.0106194
I would round to zero and use a log transform, making it easier to explain to a stakeholder.
ap <- autoplot(myts)
ap2 <- autoplot(BoxCox(myts,0))
grid.arrange(ap, ap2, ncol = 2, nrow = 1)
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
There appears to be a bit of a cyclical pattern starting around 2010. The peak seasons are steadily rising from 1980 through 2010, but are flat from the period 2010-2014.
fc <- snaive(myts.train)
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 15.13063 28.69434 22.20571 4.854662 6.873099 1.000000
## Test set -21.02500 31.50373 24.22500 -4.276868 4.878829 1.090936
## ACF1 Theil's U
## Training set 0.4670908 NA
## Test set 0.5491969 0.3020243
There is a clear difference between the test and training sets. The sign on the mean error flips, which is likely a result of the cyclical pattern observed earlier. If there is an increasing trend pattern then the mean error will be positive, but because of the down cycle from 2010-2014, the mean error turns negative. The magnitude of the errors is about the same indicating about the same level of variance.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 248.99, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
The residuals follow a fairly symmetric distriubtion, but there is clear serial correlation. This is because the series follows a random walk-like pattern without strong mean reversion. This is most clearly a problem during the downturn in the early 1990s. Nearly all of the residuals are negative during this time.
If we difference the series(ideally this would be seasonal differencing), the serial correaltion is somewhat minimized, though there is obvious heteroskedasticity:
diffed <- diff(myts.train) %>% snaive() %>% checkresiduals()
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 161.39, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Because we are using the seasonal naive method, the test set accuracy should be pretty sensitive to the split. The only factor in this model is the last value for each season. If we leave off at a good year, we expect every following year to be the same.
We see that chaning the end to 2007 drastically alters the accuracy:
myts.train <- window(myts, end=c(2007,12))
myts.test <- window(myts, start=2008)
m1 <- snaive(myts.train)
accuracy(m1,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 14.37003 24.11580 19.82121 4.996571 6.812931 1.00000
## Test set 59.19583 75.06786 60.57917 10.360162 10.645953 3.05628
## ACF1 Theil's U
## Training set 0.3786887 NA
## Test set 0.4319383 0.7357454
autoplot(myts.train) +
autolayer(m1, series="Forecast", PI=FALSE)