This week I will be constructing ETS models. The data I will use comes from the fpp2 package. It measures monthly retail debit card usage in Iceland in millions of ISK. Let’s take a preliminary look at the structure of the set.

library(forecast)
library(fpp2)
## Loading required package: ggplot2
## Loading required package: fma
## Loading required package: expsmooth
library(seasonal)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
str(debitcards)
##  Time-Series [1:164] from 2000 to 2014: 7.2 7.33 7.81 7.41 9.14 ...

Looks like it is already a time series. Let’s see what the scope of the series is.

debitcards
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 2000  7.204  7.335  7.812  7.413  9.136  8.725  8.751  9.609  8.601  8.930
## 2001  8.078  7.892  8.151  8.738  9.416  9.533  9.943 10.859  8.789  9.960
## 2002  8.620  8.401  8.546 10.004 10.675 10.115 11.206 11.555 10.453 10.421
## 2003  9.315  9.366  9.910 10.302 11.371 11.857 12.387 12.421 12.073 11.963
## 2004 10.586 10.558 12.064 11.899 12.077 13.918 13.611 14.132 13.509 13.152
## 2005 14.262 13.024 14.062 14.718 16.544 16.732 16.230 18.126 16.016 15.601
## 2006 14.991 14.908 17.459 14.501 18.271 17.963 17.026 18.111 15.989 16.735
## 2007 16.198 15.060 16.168 16.376 18.403 19.113 19.303 20.560 16.621 18.788
## 2008 16.658 16.214 16.043 16.418 17.644 17.705 18.107 17.975 17.598 17.658
## 2009 16.065 15.467 16.297 16.530 18.410 20.274 21.311 20.991 18.305 17.832
## 2010 15.964 16.606 19.216 16.419 19.638 19.773 21.052 22.011 19.039 17.893
## 2011 16.699 16.619 17.851 18.160 22.032 21.395 22.217 24.565 21.095 20.114
## 2012 18.580 18.492 19.724 20.123 22.582 22.595 23.379 24.920 20.325 22.038
## 2013 19.061 19.303 19.323 21.573 23.685 22.104 25.340 25.211              
##         Nov    Dec
## 2000  8.835 11.688
## 2001  9.619 12.900
## 2002  9.950 13.975
## 2003 10.666 15.613
## 2004 13.993 18.203
## 2005 15.394 20.439
## 2006 15.949 20.216
## 2007 17.970 22.464
## 2008 15.750 22.414
## 2009 18.223 23.987
## 2010 19.276 25.167
## 2011 19.931 26.120
## 2012 20.988 26.675
## 2013

Looks like we have monthly observations from January 2000 through August 2013. Let’s plot this series.

autoplot(debitcards)+
  ggtitle("Monthly Retail Debit Card Usage In Iceland") +
  ylab("million ISK") +
  scale_x_continuous(breaks=seq(2000, 2012, 1))+
  ggthemes::theme_calc()
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

There appears to be some seasonality and an increasing trend over time. We can decompose this series a number of ways to get some insight into both. Let’s use an additive and a multiplicative decomposition to start. (In an additive decomposition the series is treated as the sum of the trend, seasonal, and remainder components; in a multiplicative decomposition it is their product, which suits series whose seasonal swings grow with the level.)

autoplot(decompose(debitcards, "a")) + 
  ggtitle("Additive Decomposition") +
  ggthemes::theme_calc()

autoplot(decompose(debitcards, "m")) + 
  ggtitle("Multiplicative Decomposition") +
  ggthemes::theme_calc()

These two look pretty similar. That said, the remainder component seems to become less seasonal later in the multiplicative decomposition. Let’s take a look at a few more decompositions. First we will look at STL, which stands for “Seasonal and Trend decomposition using Loess”; Loess itself is a method for estimating nonlinear relationships.

debitcards %>%
  stl(t.window=13, s.window="periodic", robust=TRUE) %>%
  autoplot() +
  ggtitle("STL Decomposition") +
  ggthemes::theme_calc()

The remainder component is still pretty seasonal toward the end of the series. Finally, let’s take a look at the X11 method, whose features include trend-cycle estimates that are available for all observations, including the end points, and a seasonal component that is allowed to vary slowly over time. This method could be appealing here, because the increasing trend seems to be slowing down later in the series.

debitcards %>%
  seas(x11 = "") %>%
  autoplot() +
  ggtitle("X11 Decomposition") +
  ggthemes::theme_calc()

This does a nice job of removing leftover trend and seasonal structure from the remainder component. Let’s split our data into a training set and an evaluation set, and fit some models.

train <- window(debitcards, end = c(2010, 12))
test <- window(debitcards, start = c(2011, 1))
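As a quick sanity check of my own (not part of the original write-up), the training window should run from January 2000 through December 2010 and the test window from January 2011 through the last observation in August 2013, which is 32 months; that 32 is the forecast horizon used below.

# Sanity check on the split; the expected counts assume the window() calls above.
length(train)  # expect 132 observations (Jan 2000 - Dec 2010)
length(test)   # expect 32 observations (Jan 2011 - Aug 2013)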

Ok, so the first step is to iteratively fit ETS models and trim down the ones that do not perform well. We can do that using a nifty function called lapply. If you are not familiar with the apply family of functions, they are similar to loops. Here I am going to use lapply to create a list of accuracy() outputs for comparing ETS models. ETS is an acronym for (Error, Trend, Seasonal), and the label can also be read as ExponenTial Smoothing, since these are exponential smoothing models. Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation, the higher the associated weight.
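To make those decaying weights concrete, here is a minimal sketch of the weights that simple exponential smoothing places on the most recent observations; alpha = 0.3 is purely an illustrative value, not a parameter estimated from this data.

# Weight on the observation j periods ago is alpha * (1 - alpha)^j.
alpha <- 0.3   # illustrative smoothing parameter only
j <- 0:5
round(alpha * (1 - alpha)^j, 3)
## [1] 0.300 0.210 0.147 0.103 0.072 0.050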

The ETS models I will test are “ZZZ”, which is auto-selected; “ANN”, simple exponential smoothing with additive errors; “MNN”, simple exponential smoothing with multiplicative errors; “MAM”, the multiplicative Holt-Winters’ method with multiplicative errors; and “MAN”, Holt’s linear method with multiplicative errors.

ets_model_types <- c("ZZZ", "ANN", "MNN", "MAM", "MAN")

# Fit each ETS specification on the training set, forecast over the
# 32-month test window, and collect the accuracy() output for comparison.
ts_model_list <- lapply(setNames(ets_model_types, ets_model_types),
                        function(x){
                          model <- ets(y = train, model = x)
                          fcsts <- forecast(model, h = 32)
                          accuracy(object = fcsts, x = test)
                        })

kable(ts_model_list[["ZZZ"]], caption = "ZZZ")

ZZZ
                      ME      RMSE       MAE        MPE      MAPE      MASE       ACF1  Theil’s U
Training set  -0.0113675 0.6866424 0.5146408 -0.2388585  3.567102 0.3980951 -0.1337876         NA
Test set       0.1113424 0.8470315 0.6892889  0.2776684  3.206333 0.5331924  0.0792811  0.3020759

kable(ts_model_list[["ANN"]], caption = "ANN")

ANN
                      ME      RMSE       MAE        MPE      MAPE      MASE       ACF1  Theil’s U
Training set   0.4288732  1.859755  1.403262   1.924465  9.379717  1.085479 -0.0445358         NA
Test set       0.7430751  2.752153  2.244654   1.981945 10.318884  1.736329  0.2953674  0.9848035

kable(ts_model_list[["MNN"]], caption = "MNN")

MNN
                      ME      RMSE       MAE        MPE      MAPE      MASE       ACF1  Theil’s U
Training set   0.3906990  1.862208  1.410478   1.678175  9.431244  1.091061 -0.0600566         NA
Test set       0.5638757  2.709270  2.222254   1.129080 10.299906  1.719002  0.2953674  0.9718695

kable(ts_model_list[["MAM"]], caption = "MAM")

MAM
                      ME      RMSE       MAE        MPE      MAPE      MASE       ACF1  Theil’s U
Training set  -0.0113675 0.6866424 0.5146408 -0.2388585  3.567102 0.3980951 -0.1337876         NA
Test set       0.1113424 0.8470315 0.6892889  0.2776684  3.206333 0.5331924  0.0792811  0.3020759

kable(ts_model_list[["MAN"]], caption = "MAN")

MAN
                      ME      RMSE       MAE        MPE      MAPE      MASE       ACF1  Theil’s U
Training set  -0.0553899  1.775643  1.371454  -1.511517   9.29431  1.060874  0.0297447         NA
Test set      -0.9309842  2.548919  2.169781  -5.712057  10.67002  1.678412  0.1866413  0.9396888

Ok, so comparing those models in terms of RMSE and MAE, “MAM” (the multiplicative Holt-Winters’ method with multiplicative errors) and “MAN” (Holt’s linear method with multiplicative errors) performed the best. The “MAM” model is also the specification that “ZZZ” auto-selected, which is why their results are identical. The “MAM” model performed so well that I think it could be a case of overfitting. Let’s generate the training models’ AIC scores as another benchmark to see if it is worth considering an alternative to these two for forecasting.

# Refit each specification on the training data and extract its AIC.
ets_model_list_AIC <- lapply(setNames(ets_model_types, ets_model_types),
                             function(x){
                               model <- ets(y = train, model = x)
                               AIC(model)
                             })
aics <- data.frame("Model Type" = ets_model_types,
                   "AIC" = unlist(ets_model_list_AIC)) 

kable(aics, caption = "AIC Scores For Training Models")

AIC Scores For Training Models
      Model.Type        AIC
ZZZ   ZZZ          556.5116
ANN   ANN          814.3273
MNN   MNN          798.1591
MAM   MAM          556.5116
MAN   MAN          785.7836

Ok, so “MAM” and “MAN” performed the best once again. I think that comparing the AIC scores between those two also points to some possible overfitting in the “MAM” model: it beats the rest by a wide margin on this metric as well. Before forecasting with models fit to the full series, here is one more quick check.
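As a side check that is my own addition rather than part of the original workflow, the object returned by ets() also stores the small-sample-corrected AICc, which penalizes extra parameters somewhat more harshly than plain AIC. Pulling it for the two leading candidates refit on the training window would look like this:

# AICc for the two leading candidate specifications on the training data.
c(MAM = ets(train, model = "MAM")$aicc,
  MAN = ets(train, model = "MAN")$aicc)

If the gap between the two is as wide on AICc as it is on AIC, then “MAM” is at least not winning purely on parameter count. Now, let’s forecast.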

ets_MAM <- ets(debitcards, 'MAM')

ets_MAN <- ets(debitcards, 'MAN')

hist(ets_MAM$residuals, col = "grey", main = "MAM ETS Model Residuals", xlab = "MAM ETS MODEL")

hist(ets_MAN$residuals, col = "grey", main = "MAN ETS Model Residuals", xlab = "MAN ETS MODEL")

So in terms of the distribution of residuals, we see that the “MAM” residuals are nearly normally distributed, while the “MAN” residuals have a pretty significant left skew. How close to normal the “MAM” residuals are again leads me to believe we may be seeing some overfitting here. Let’s look at some more indicators of model performance from a visual perspective. We can plot the residuals in a line chart, and the forecast errors as well.
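First, though, to put a number on that normality impression (this is my own quick check, not part of the original comparison), a Shapiro-Wilk test on each set of residuals gives a rough formal read; small p-values suggest departures from normality.

# Shapiro-Wilk normality test on each model's residuals (my own check).
shapiro.test(as.numeric(ets_MAM$residuals))
shapiro.test(as.numeric(ets_MAN$residuals))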

cbind('Residuals' = residuals(ets_MAM),
      'Forecast errors' = residuals(ets_MAM, type = 'response')) %>%
  autoplot(facets = TRUE) + xlab("Year") + ylab("") +
  ggtitle("Residuals and Forecast Errors From 'MAM' ETS Model") +
  ggthemes::theme_calc()

cbind('Residuals' = residuals(ets_MAN),
      'Forecast errors' = residuals(ets_MAN, type = 'response')) %>%
  autoplot(facets = TRUE) + xlab("Year") + ylab("") +
  ggtitle("Residuals and Forecast Errors From 'MAN' ETS Model") +
  ggthemes::theme_calc()

Ok, so again, we see that the “MAM” model has nearly perfect white-noise residuals, while the “MAN” model has some seasonality left unaccounted for. Let’s see what the plots of the fitted values versus the actual values look like for both models, forecast 2 years into the future, and add their forecasts to the plot as well.
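Before those plots, the white-noise claim can be checked formally. As another quick addition of my own, checkresiduals() from the forecast package plots the residuals, their ACF, and a histogram, and runs a Ljung-Box test for leftover autocorrelation; a small p-value would point to structure, such as seasonality, that the model missed.

# Ljung-Box test plus residual diagnostic plots for each model (my own check).
checkresiduals(ets_MAM)
checkresiduals(ets_MAN)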

my_fcsts_MAM <- forecast(ets_MAM, h = 24)

my_fcsts_MAN <- forecast(ets_MAN, h = 24)

autoplot(debitcards) +
  autolayer(fitted(ets_MAM), series = "Fitted Values") + 
  autolayer(my_fcsts_MAM, series = "2 yr forecasts")+
  ggtitle("Fitted Values Versus Actual W/ Forecasts 'MAM'") + 
  ylab("million ISK")+
  theme_classic()

autoplot(debitcards) +
  autolayer(fitted(ets_MAN), series = "Fitted Values") + 
  autolayer(my_fcsts_MAN, series = "2 yr forecasts")+
  ggtitle("Fitted Values Versus Actual W/ Forecasts 'MAN'") + 
  ylab("million ISK") +
  theme_classic()

So it would appear, as suspected, that there may be some overfitting occurring in the “MAM” model. The “MAN” model does a really good job of forecasting the trend, but misses some of the seasonal component. I would also point out that the “MAN” forecasts capture the diminishing returns of the trend seen in the latter years of the series very well: the slope of the forecasts is still increasing, but appears somewhat flatter than the overall trend. Finally, let’s look at the models’ metrics to see which performed better, and by how much.

accuracy(ets_MAM)
##                      ME      RMSE       MAE         MPE     MAPE      MASE
## Training set 0.01998494 0.7239404 0.5423039 -0.01069241 3.456976 0.4232463
##                    ACF1
## Training set -0.1427266
accuracy(ets_MAN)
##                      ME     RMSE      MAE       MPE     MAPE     MASE
## Training set -0.0153459 1.951717 1.531244 -1.258842 9.498252 1.195074
##                    ACF1
## Training set 0.02628659
print(paste("The AIC Score for the 'MAM' model is", AIC(ets_MAM)))
## [1] "The AIC Score for the 'MAM' model is 738.938758708601"
print(paste("The AIC Score for the 'MAN' model is", AIC(ets_MAN)))
## [1] "The AIC Score for the 'MAN' model is 1038.10141980846"

As expected, the “MAM” model performed much better than the “MAN” on every accuracy metric and on AIC score. However, the “MAN” model also performed very well. I am curious what my classmates’ opinions are on the possible overfitting occurring in the “MAM” model!