What are the differences between Auto ARIMA and ETS models?

Both are widely used approaches to forecasting time series data, but they differ in which feature of the data they focus on: ETS models describe the trend and seasonality in the data, while ARIMA models describe the autocorrelations in the data.
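To make that distinction concrete, here is a minimal sketch of the two perspectives, using R's built-in AirPassengers monthly series rather than the solar data modelled below: a decomposition surfaces the trend and seasonal components that ETS works with, while an ACF plot surfaces the autocorrelations that ARIMA works with.

#illustrative only: AirPassengers is a built-in monthly series, not the solar data
library(forecast)

#ETS-style view: split the series into trend, seasonal, and remainder components
autoplot(decompose(AirPassengers))

#ARIMA-style view: inspect the autocorrelations directly
ggAcf(AirPassengers)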

ETS Models

All ETS models are non-stationary. ETS models fall under the umbrella of exponential smoothing and are formulated as state-space models. A model of this type describes how the unobserved components of the data (error, trend, and seasonality) evolve over time.
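As a quick sketch of that component structure, ets() lets you either fix the error, trend, and seasonal letters yourself or let the function choose them. This example again uses the built-in AirPassengers series rather than the solar data, and the ETS(A,A,A) choice is purely illustrative.

library(forecast)

#force an additive error, additive trend, additive seasonality model
fit_aaa <- ets(AirPassengers, model = "AAA")

#let ets() select the error/trend/seasonal components automatically
fit_auto <- ets(AirPassengers)
summary(fit_auto)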

ARIMA Models

ARIMA stands for AutoRegressive Integrated Moving Average. ARIMA models assume stationarity, and the "integrated" part of the model is there to achieve it: the series is differenced until it is stationary, and the autoregressive and moving average terms are then fit to the differenced data.

From my understanding, there are two types of ARIMA models: non-seasonal and seasonal. A non-seasonal model combines differencing with an autoregressive component and a moving average component; a seasonal model adds the same three terms at the seasonal lag.
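A brief sketch of the "integrated" piece, again on the built-in AirPassengers series (illustrative, not the solar data): ndiffs() and nsdiffs() suggest how many ordinary and seasonal differences are needed, and diff() applies them.

library(forecast)

#how many first differences and seasonal differences the series appears to need
ndiffs(AirPassengers)
nsdiffs(AirPassengers)

#apply one seasonal difference, then one first difference, and plot the result
autoplot(diff(diff(AirPassengers, lag = 12)))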

Akaike’s Information Criterion

This was useful for selecting predictors in regression, and it is also useful for choosing the order of an ARIMA model: good models are obtained by minimizing the AIC (or the corrected AICc). One thing to remember, though, is that AICc is only meaningful for comparing models within the same class. Because ETS and ARIMA are different classes, I will use AICc to pick the best ARIMA model and the best ETS model separately, and then compare those two finalists using RMSE and other error metrics.
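As a sketch of that workflow, here are two candidate seasonal ARIMA fits to the built-in AirPassengers series compared by AICc; the orders are arbitrary choices for illustration, not recommendations for the solar data.

library(forecast)

#fit two candidate orders to the same series
fit1 <- Arima(AirPassengers, order = c(1, 1, 1), seasonal = c(0, 1, 1))
fit2 <- Arima(AirPassengers, order = c(2, 1, 0), seasonal = c(0, 1, 1))

#keep the model with the lower AICc (valid because both are ARIMA models)
c(fit1 = fit1$aicc, fit2 = fit2$aicc)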

The data that I will be modelling is the Total Distributed Solar Energy Consumption for Heat in the US, in trillions of BTUs.

#libraries
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
#import data
data <- read.csv("C:/Users/andre/Documents/Boston College/2. Summer 2020/Forecasting/week 4/solar_energy_consumption.csv")

#convert the Value column to numeric; as.character() guards against a factor column, and non-numeric entries become NA
data$Value <- as.numeric(as.character(data$Value)) #data of interest ends at row 376
## Warning: NAs introduced by coercion
#create time series
data_ts <- ts(data[1:376,2], start = c(1989, 01), end = c(2020, 03), frequency = 12)
autoplot(data_ts, 
         main = "Total Distributed Solar Energy Consumption for Heat in the US, January 1989 - March 2020", 
         ylab = "Consumption in Trillions of BTUs")

#training and test sets
train <- ts(data_ts[1:300], start = c(1989, 01), frequency = 12)
test <- ts(data_ts[301:376], start = c(2014, 01), frequency = 12)

ETS Model of Solar Energy Consumption

#ets model 
ets <- ets(train)
ets
## ETS(M,Ad,M) 
## 
## Call:
##  ets(y = train) 
## 
##   Smoothing parameters:
##     alpha = 0.5231 
##     beta  = 0.0243 
##     gamma = 1e-04 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 4.2284 
##     b = 0.0203 
##     s = 0.7943 0.8376 1.0213 1.1204 1.2318 1.2447
##            1.1866 1.1754 1.0643 0.9675 0.7027 0.6533
## 
##   sigma:  0.0093
## 
##        AIC       AICc        BIC 
## -162.91715 -160.48298  -96.24906
fc_ets <- forecast(ets, 12)
autoplot(fc_ets)

acc_ets <- accuracy(fc_ets, test[1:12])
acc_ets
##                        ME       RMSE        MAE         MPE      MAPE      MASE
## Training set 0.0003016468 0.02984880 0.01962933 0.007158707 0.5051085 0.0423235
## Test set     0.0886268758 0.09992101 0.08862688 1.928736588 1.9287366 0.1910916
##                   ACF1
## Training set 0.4584956
## Test set            NA

ARIMA

Based on the plot of the data, this is a seasonal time series and will require a seasonal ARIMA model. That makes sense: the need for heat, regardless of whether it comes from solar energy, fluctuates with the time of year.
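Before fitting anything, a seasonal plot is a quick way to check that claim; both calls below are a sketch using the train object defined above.

#one line per year to show the repeating within-year pattern
ggseasonplot(train)

#one panel per month to show how each month's level evolves
ggsubseriesplot(train)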

I decided to start with auto.arima() to see how well automatic fitting would perform.

myarima <- auto.arima(train)
summary(myarima)
## Series: train 
## ARIMA(1,1,2)(1,1,2)[12] 
## 
## Coefficients:
##          ar1      ma1     ma2     sar1    sma1    sma2
##       0.4703  -0.3192  0.2323  -0.4997  0.8367  0.5168
## s.e.  0.1299   0.1344  0.0595   0.1318  0.1180  0.0554
## 
## sigma^2 estimated as 0.0004093:  log likelihood=711.9
## AIC=-1409.79   AICc=-1409.39   BIC=-1384.17
## 
## Training set error measures:
##                         ME     RMSE        MAE          MPE      MAPE
## Training set -0.0001446744 0.019579 0.01023283 -0.003391747 0.2707296
##                    MASE        ACF1
## Training set 0.07880246 0.004802359
fc_arima1 <- forecast(myarima, 12)
autoplot(fc_arima1)

acc_arima1 <- accuracy(fc_arima1, test[1:12])
acc_arima1
##                         ME       RMSE        MAE          MPE      MAPE
## Training set -0.0001446744 0.01957900 0.01023283 -0.003391747 0.2707296
## Test set      0.0459922441 0.06293965 0.04972804  1.050850700 1.1094113
##                    MASE        ACF1
## Training set 0.02206337 0.004802359
## Test set     0.10722041          NA

This actually looks pretty good! The error metrics suggest the forecasts are fairly unbiased, and the MASE indicates the model beats a naive forecast, but let's see if we can bring the residual ACF1 down any further.
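One way to look at that more closely before trying a manual order is checkresiduals(), which plots the residuals and their ACF and runs a Ljung-Box test on the auto.arima() fit above; this is just a diagnostic sketch.

#residual time plot, ACF, histogram, and Ljung-Box test for the auto.arima() model
checkresiduals(myarima)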

myarima2 <- arima(train, order = c(1,1,1), 
                  seasonal = list(order = c(1,1,1), 
                                  period = 12))
summary(myarima2)
## 
## Call:
## arima(x = train, order = c(1, 1, 1), seasonal = list(order = c(1, 1, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1    sar1     sma1
##       0.6485  -0.4337  0.7724  -0.4689
## s.e.  0.0930   0.1016  0.0622   0.0725
## 
## sigma^2 estimated as 0.0004559:  log likelihood = 695.16,  aic = -1380.31
## 
## Training set error measures:
##                         ME       RMSE        MAE          MPE      MAPE
## Training set -0.0001160372 0.02090184 0.01046672 -0.002775187 0.2808014
##                    MASE        ACF1
## Training set 0.02256767 -0.07374552
fc_arima2 <- forecast(myarima2, 12)
autoplot(fc_arima2)

acc_arima2 <- accuracy(fc_arima2, test[1:12])
acc_arima2
##                         ME       RMSE        MAE          MPE      MAPE
## Training set -0.0001160372 0.02090184 0.01046672 -0.002775187 0.2808014
## Test set      0.0381438058 0.06138326 0.04700046  0.921229089 1.0602812
##                    MASE        ACF1
## Training set 0.02256767 -0.07374552
## Test set     0.10133938          NA

The improvement isn't large, but this model is slightly less biased, and the residual ACF1 has moved from positive to slightly negative. I am a little wary of comparing the two by their ACF1 values, though, since the first model was fit with auto.arima() and the second with arima(), so the objects are of different classes and have different orders. Both ARIMA models outperform the ETS model on bias, but only narrowly, which is also visible in how similar the forecast plots look. On the test set, however, the ARIMA models beat the ETS model by a wider margin, so ARIMA is the better choice for this solar consumption data.
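As a final sketch, the test-set metrics from the three accuracy objects above can be pulled into one table to make that comparison easier to read.

#side-by-side test-set error metrics for the ETS, auto.arima(), and manual ARIMA fits
rbind(ETS        = acc_ets["Test set", c("ME", "RMSE", "MAE", "MASE")],
      auto.arima = acc_arima1["Test set", c("ME", "RMSE", "MAE", "MASE")],
      manual     = acc_arima2["Test set", c("ME", "RMSE", "MAE", "MASE")])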