The objective of this week's discussion is to build an ARIMA model and an ETS model, and compare their forecasts. The data I will use comes from the fpp2 package. It is a monthly time series of total expenditure on restaurants, cafes, and takeaway food services in Australia. Let's look at the series.
library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
auscafe
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1982 0.3424 0.3421 0.3287 0.3385 0.3315 0.3419 0.3584
## 1983 0.3686 0.3481 0.3658 0.3511 0.3605 0.3471 0.3645 0.3760 0.3776 0.3741
## 1984 0.3888 0.3771 0.3978 0.3833 0.4140 0.3815 0.3930 0.4090 0.3947 0.4245
## 1985 0.4256 0.3918 0.4161 0.4198 0.4462 0.4068 0.4485 0.4660 0.4552 0.4936
## 1986 0.5043 0.4529 0.4803 0.4972 0.5308 0.4848 0.5263 0.5384 0.5374 0.5704
## 1987 0.5716 0.5251 0.5436 0.5581 0.5651 0.5423 0.5986 0.5837 0.5927 0.6233
## 1988 0.6050 0.5861 0.6253 0.6119 0.6305 0.6355 0.6587 0.6564 0.6600 0.6702
## 1989 0.7328 0.6607 0.7132 0.6939 0.7100 0.7217 0.7413 0.7455 0.7669 0.7937
## 1990 0.8575 0.7644 0.8403 0.8046 0.8089 0.7990 0.8150 0.8280 0.8124 0.8330
## 1991 0.8617 0.7711 0.8133 0.7974 0.8206 0.8013 0.8294 0.8535 0.8820 0.9389
## 1992 0.9383 0.8615 0.9361 0.9320 0.9294 0.8685 0.8913 0.8755 0.9141 0.9357
## 1993 0.9178 0.8381 0.8698 0.8621 0.8521 0.8276 0.8823 0.8669 0.9049 0.9312
## 1994 0.9845 0.9016 1.0152 0.9391 0.9414 0.9348 1.0133 1.0175 1.0407 1.0618
## 1995 1.0763 0.9816 1.0989 1.0678 1.0833 1.0450 1.0944 1.1097 1.1261 1.1665
## 1996 1.2131 1.1280 1.1800 1.1693 1.1457 1.1090 1.1381 1.1457 1.1049 1.1422
## 1997 1.1801 1.0595 1.1480 1.1413 1.1703 1.1135 1.1646 1.1733 1.1541 1.2048
## 1998 1.1859 1.0500 1.1411 1.1067 1.1442 1.0882 1.1622 1.1448 1.1495 1.2586
## 1999 1.2440 1.1244 1.2454 1.2360 1.2714 1.2080 1.2192 1.2336 1.2613 1.3144
## 2000 1.2974 1.2067 1.3246 1.2525 1.2821 1.2747 1.3179 1.3288 1.4324 1.4608
## 2001 1.4708 1.3380 1.5212 1.4413 1.4461 1.3976 1.4774 1.5046 1.4372 1.4885
## 2002 1.4821 1.3298 1.4789 1.4604 1.4819 1.4249 1.5273 1.5516 1.4951 1.5566
## 2003 1.6204 1.4218 1.5742 1.5891 1.6574 1.5542 1.6994 1.7378 1.7094 1.8784
## 2004 1.8952 1.7658 1.8728 1.8737 1.8462 1.7585 1.8782 1.8501 1.9215 1.9145
## 2005 1.7987 1.6612 1.7803 1.8090 1.7889 1.7689 1.8690 1.8134 1.8458 1.9781
## 2006 1.9138 1.7501 1.9838 1.9665 2.0051 1.9436 2.0192 2.0432 2.0387 2.1133
## 2007 2.0971 1.9390 2.1919 2.1340 2.1455 2.1338 2.1784 2.2579 2.1795 2.2573
## 2008 2.1635 2.0867 2.1594 2.1482 2.1767 2.0899 2.1726 2.2040 2.1242 2.3024
## 2009 2.3183 2.0695 2.3032 2.3269 2.4017 2.2630 2.4151 2.4461 2.4307 2.6020
## 2010 2.5213 2.3239 2.6075 2.5358 2.5714 2.5049 2.7813 2.8000 2.7428 2.7302
## 2011 2.5982 2.3913 2.6489 2.5989 2.6041 2.5195 2.7019 2.7396 2.7443 2.8144
## 2012 2.7295 2.5560 2.8393 2.7374 2.8367 2.7848 2.9322 2.9627 2.8859 2.9667
## 2013 2.8496 2.6077 2.9240 2.8535 2.9015 2.8130 2.9684 3.0651 2.9887 3.1754
## 2014 3.2091 2.8799 3.2320 3.1821 3.2283 3.0675 3.3158 3.3507 3.3604 3.4016
## 2015 3.3913 3.0275 3.3619 3.2665 3.3140 3.2575 3.4453 3.4211 3.4444 3.5258
## 2016 3.4318 3.1869 3.4352 3.4517 3.4310 3.3139 3.5730 3.6475 3.6963 3.7166
## 2017 3.6214 3.2606 3.6190 3.5670 3.5986 3.5442 3.6981 3.7112 3.7297
## Nov Dec
## 1982 0.3747 0.4331
## 1983 0.3906 0.4590
## 1984 0.4365 0.4881
## 1985 0.5056 0.5639
## 1986 0.5638 0.6632
## 1987 0.6219 0.7398
## 1988 0.6708 0.7739
## 1989 0.8075 0.9427
## 1990 0.8515 0.9457
## 1991 0.9538 1.0483
## 1992 0.9318 1.0458
## 1993 0.9630 1.0995
## 1994 1.0702 1.1883
## 1995 1.1943 1.3147
## 1996 1.1294 1.2308
## 1997 1.1837 1.3067
## 1998 1.2008 1.3102
## 1999 1.3183 1.4553
## 2000 1.4163 1.5386
## 2001 1.4829 1.6114
## 2002 1.5719 1.7340
## 2003 1.8384 2.0483
## 2004 1.8399 2.0419
## 2005 1.9650 2.1528
## 2006 2.1094 2.2973
## 2007 2.2846 2.4611
## 2008 2.2428 2.4794
## 2009 2.5699 2.8971
## 2010 2.6844 3.0211
## 2011 2.7842 3.0466
## 2012 2.9731 3.1775
## 2013 3.2108 3.5368
## 2014 3.3749 3.6927
## 2015 3.4913 3.8199
## 2016 3.6785 4.0473
## 2017
As can be seen, it is a monthly time series that begins in April 1982 and ends in September 2017 (confirmed in the short check below). Let's plot this series and several of its decompositions.
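A minimal check of these attributes using the base time-series helpers:
start(auscafe)      # 1982 4  -> April 1982
end(auscafe)        # 2017 9  -> September 2017
frequency(auscafe)  # 12      -> monthly data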
library(dplyr)
library(fpp2)
library(scales)
my_ts_plot <-
  function(data, plt_name, y_axis){
    autoplot(data) +
      labs(title = plt_name, y = y_axis)
  }
my_ts_plot(auscafe, "Australian Dining Out 1982-04 to 2017-09",
           "Billions") +
  # upper limit of 2018 so the 2017 observations are not clipped from the plot
  scale_x_continuous(limits = c(1982, 2018), breaks = c(1982, 1990, 2000, 2010, 2017))
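Before decomposing, a seasonal plot can make the within-year pattern easier to see. This is a small optional check using ggseasonplot() from the forecast package:
ggseasonplot(auscafe, year.labels = FALSE) +
  ggtitle("Seasonal Plot of Australian Dining Out")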
So this series clearly has an increasing trend and some elements of seasonality. Let's plot some decompositions to get a sense of these components separately.
autoplot(decompose(auscafe, "a")) +
ggtitle("Additive Decomposition")
autoplot(decompose(auscafe, "m")) +
ggtitle("Multiplicative Decomposition")
Both decompositions seem to retain elements of the seasonality in the remainder component. I think the additive decomposition handles it reasonably well in the middle of the series, but at the beginning and end the seasonal pattern is still very apparent in the remainder. Let's see if more sophisticated methods of decomposition do a better job with this.
library(seasonal)
auscafe %>%
stl(t.window=13, s.window="periodic", robust=TRUE) %>%
autoplot() +
ggtitle("STL Decomposition")
auscafe %>%
seas(x11 = "") %>%
autoplot() +
ggtitle("X11 Decomposition")
STL removes nearly all of the seasonal component from the remainder, except toward the end. X11 removes more of it than the standard decomposition, but some is still present. We clearly have a series that is highly seasonal, and we will probably need to account for that in modeling.
We will now split the series into a training set and a test set. The training set will span from 1982 through 2009, and the test set from 2010 through 2017.
train <- window(auscafe, end = c(2009, 12))
test <- window(auscafe, start = c(2010, 1))
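As a quick sanity check on the split (a minimal sketch), we can confirm where each window starts and ends and how long the test set is:
end(train)    # 2009 12
start(test)   # 2010 1
length(test)  # 93 monthly observations, January 2010 through September 2017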
Ok, so we have been tasked with training an ETS model and an ARIMA model using auto.arima. Let's fit the ETS models first with the training set and select the best one based on performance metrics. ETS is an acronym for (Error, Trend, Seasonal), and the label can also be read as ExponenTial Smoothing. These are exponential smoothing models: forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation, the higher the associated weight.
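To make the decaying-weights idea concrete, here is a minimal sketch (the value of alpha is arbitrary, chosen purely for illustration): in simple exponential smoothing the weight placed on an observation j periods in the past is alpha * (1 - alpha)^j, so the weights shrink geometrically as we go further back in time.
alpha <- 0.3                          # illustrative smoothing parameter
weights <- alpha * (1 - alpha)^(0:9)  # weights on the 10 most recent observations
round(weights, 4)                     # 0.3000 0.2100 0.1470 0.1029 ...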
The ETS models I will test are "ZZZ", which is auto-selected; "ANN", simple exponential smoothing with additive errors; "MNN", simple exponential smoothing with multiplicative errors; "MAM", the multiplicative Holt-Winters' method with multiplicative errors; and "MAN", Holt's linear method with multiplicative errors. (In each three-letter code the components are Error, Trend, and Seasonal, where N means none, A additive, M multiplicative, and Z automatically chosen.)
library(kableExtra)
# Candidate ETS specifications to compare
ets_model_types <- c("ZZZ", "ANN", "MNN", "MAM", "MAN")
ts_model_list <- lapply(setNames(ets_model_types, ets_model_types),
                        function(x){
                          model <- ets(y = train, model = x)
                          # 32-month forecast horizon; note this is shorter than the
                          # 93-month test set, so these accuracy figures are computed
                          # over a shorter horizon than the later comparison that
                          # uses h = length(test)
                          fcsts <- forecast(model, h = 32)
                          eval_metrics <- accuracy(object = fcsts, x = test)
                          return(eval_metrics)
                        })
kable(ts_model_list[["ZZZ"]], caption = "ZZZ")
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0026892 | 0.0325982 | 0.0240223 | 0.1407121 | 2.055519 | 0.2731629 | 0.0161812 | NA |
| Test set | -0.0369013 | 0.0908579 | 0.0792221 | -1.4731475 | 2.972193 | 0.9008539 | 0.6720455 | 0.5010856 |
kable(ts_model_list[["ANN"]], caption = "ANN")
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0144490 | 0.0765183 | 0.0531780 | 0.9874391 | 4.570800 | 0.6047005 | 0.0013551 | NA |
| Test set | -0.0156719 | 0.1667139 | 0.1321269 | -0.9646250 | 4.962657 | 1.5024479 | 0.3815126 | 0.9347789 |
kable(ts_model_list[["MNN"]], caption = "MNN")
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0130442 | 0.0766605 | 0.0536317 | 0.8693043 | 4.613798 | 0.6098591 | -0.0423072 | NA |
| Test set | -0.0412970 | 0.1710361 | 0.1322676 | -1.9173815 | 5.014294 | 1.5040473 | 0.3815126 | 0.9610564 |
kable(ts_model_list[["MAM"]], caption = "MAM")
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0026892 | 0.0325982 | 0.0240223 | 0.1407121 | 2.055519 | 0.2731629 | 0.0161812 | NA |
| Test set | -0.0369013 | 0.0908579 | 0.0792221 | -1.4731475 | 2.972193 | 0.9008539 | 0.6720455 | 0.5010856 |
kable(ts_model_list[["MAN"]], caption = "MAN")
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0036645 | 0.0750516 | 0.0510645 | -0.2437217 | 4.43438 | 0.5806669 | 0.0339442 | NA |
| Test set | -0.0950234 | 0.1718955 | 0.1381713 | -3.8469389 | 5.29407 | 1.5711799 | 0.2196715 | 0.9764657 |
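Rather than eyeballing five separate tables, we can also pull a single metric, such as the test-set RMSE, out of each accuracy matrix in ts_model_list for a direct comparison:
sapply(ts_model_list, function(m) m["Test set", "RMSE"])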
It looks like the "MAM" model performs best (note that the auto-selected "ZZZ" model arrived at the same specification, as its accuracy metrics are identical). Let's fit that model and compare it to the ARIMA model we will also generate.
myETS <- ets(train, "MAM")
myARIMA <- auto.arima(train)
fcts_ets <- forecast(myETS, h = length(test))
fcts_arima <- forecast(myARIMA, h = length(test))
kable(accuracy(fcts_arima, test), caption = "ARIMA Performance")
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0016766 | 0.0330029 | 0.0239287 | 0.0928071 | 2.017529 | 0.272099 | -0.0000215 | NA |
| Test set | -0.4646644 | 0.5009389 | 0.4646644 | -14.6762595 | 14.676260 | 5.283812 | 0.8744104 | 2.452036 |
kable(accuracy(fcts_ets, test), caption = "MAM Performance")
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0026892 | 0.0325982 | 0.0240223 | 0.1407121 | 2.055519 | 0.2731629 | 0.0161812 | NA |
| Test set | 0.0814082 | 0.1600318 | 0.1345081 | 2.1589560 | 4.150107 | 1.5295244 | 0.8714623 | 0.7571392 |
summary(myARIMA)
## Series: train
## ARIMA(2,1,3)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 sma1
## -0.0898 0.5833 -0.3182 -0.607 0.4454 -0.6212
## s.e. 0.1219 0.1097 0.1198 0.094 0.0609 0.0454
##
## sigma^2 estimated as 0.001155: log likelihood=627.82
## AIC=-1241.63 AICc=-1241.27 BIC=-1215.25
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.001676571 0.03300293 0.02392869 0.09280713 2.017529 0.272099
## ACF1
## Training set -2.15187e-05
summary(myETS)
## ETS(M,A,M)
##
## Call:
## ets(y = train, model = "MAM")
##
## Smoothing parameters:
## alpha = 0.6213
## beta = 0.0058
## gamma = 0.138
##
## Initial states:
## l = 0.345
## b = 0.0037
## s = 0.9945 0.9435 1.0235 1.1688 1.0147 1.0041
## 0.9764 0.9839 0.9824 0.9435 0.9954 0.9693
##
## sigma: 0.026
##
## AIC AICc BIC
## -457.4763 -455.5334 -392.7379
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.002689214 0.03259818 0.02402225 0.1407121 2.055519 0.2731629
## ACF1
## Training set 0.01618115
In terms of performance metrics, the "MAM" model has a clear edge over the ARIMA model on the test set (a test-set RMSE of about 0.16 versus 0.50), while the two are nearly identical on the training data. The model selected by auto.arima is ARIMA(2,1,3)(0,1,1)[12], i.e. a non-seasonal order of (2,1,3) with a seasonal order of (0,1,1) at the monthly (lag-12) frequency.
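The selected orders can also be extracted programmatically with arimaorder() from the forecast package, which is handy if the model is re-fit on new data:
arimaorder(myARIMA)  # p, d, q plus the seasonal P, D, Q and the period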
Let’s plot residuals and forecast errors of the models.
hist(myETS$residuals, col = "grey", main = "MAM ETS Residuals", xlab = "Residuals")
hist(myARIMA$residuals, col = "grey", main = "ARIMA(2,1,3)(0,1,1)[12] Residuals",
     xlab = "Residuals")
cbind('Residuals' = residuals(myETS),
'Forecast errors' = residuals(myETS,type='response')) %>%
autoplot(facet=TRUE) + xlab("Year") + ylab("") +
ggtitle("Residuals and Forecast Errors From MAM ETS")
cbind('Residuals' = residuals(myARIMA),
'Forecast errors' = residuals(myARIMA,type='response')) %>%
autoplot(facet=TRUE) + xlab("Year") + ylab("") +
ggtitle("Residuals and Forecast Errors From ARIMA(2,1,3)(0,1,1)[12]")
The ARIMA residuals appear to be much more normally distributed than the "MAM" residuals. The plots of the forecast errors and residuals for both models resemble white noise, though it should be noted that the forecast errors and residuals grow larger later in the series for the ARIMA model. It could be the case that the later forecasts are overestimated because the model relies too heavily on the trend.
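The white-noise impression can be checked more formally with checkresiduals() from the forecast package, which plots the residuals, their ACF, and a histogram, and reports a Ljung-Box test (a small p-value would indicate autocorrelation remaining in the residuals):
checkresiduals(myARIMA)
checkresiduals(myETS)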
Finally, let's plot the forecasts against the test series.
autoplot(test) +
  autolayer(fcts_ets, series = "Forecasts", PI = FALSE) +
  ggtitle("Forecasts of 'MAM' Against Observed Values") +
  ylab("Billions")
autoplot(test) +
  autolayer(fcts_arima, series = "Forecasts", PI = FALSE) +
  ggtitle("Forecasts of ARIMA(2,1,3)(0,1,1)[12] Against Observed Values") +
  ylab("Billions")
These plots are very informative and clearly indicate that the "MAM" model performs better: its forecasts nearly overlap the observed values. The ARIMA model seems to have relied more heavily on the increasing trend, as it over-forecasts the test series by a good amount, especially later in the series.
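As a final visual comparison (a minimal sketch), both sets of forecasts can be overlaid on the full observed series in a single plot:
autoplot(auscafe) +
  autolayer(fcts_ets, series = "MAM ETS", PI = FALSE) +
  autolayer(fcts_arima, series = "ARIMA(2,1,3)(0,1,1)[12]", PI = FALSE) +
  ggtitle("ETS and ARIMA Forecasts Against Observed Values") +
  ylab("Billions")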