Both models are widely used approaches to forecasting time series data, but they focus on different aspects of the data: ETS models describe the trend and seasonality in the data, while ARIMA models describe the autocorrelations in the data.
All ETS models are non-stationary. ETS (Error, Trend, Seasonality) models are the state-space formulations of exponential smoothing methods: each model describes how unobserved components of the data (the level, trend, and seasonality) change over time, and how the error term enters those equations.
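To make the state-space idea concrete, here is the simplest case, ETS(A,N,N) (simple exponential smoothing with additive errors, no trend, and no seasonality). It pairs a measurement equation for the observation $y_t$ with a state equation for the unobserved level $\ell_t$, with smoothing parameter $\alpha$ and errors $\varepsilon_t \sim \text{NID}(0, \sigma^2)$:

$$
\begin{aligned}
y_t &= \ell_{t-1} + \varepsilon_t, \\
\ell_t &= \ell_{t-1} + \alpha\,\varepsilon_t.
\end{aligned}
$$

Because the level accumulates past errors, it behaves like a random walk, which is one way to see why ETS models are non-stationary. The ETS(M,Ad,M) model fitted later has the same structure, with extra state equations for a damped trend $b_t$ (damping parameter $\phi$) and seasonal states $s_t$, and with errors entering multiplicatively rather than additively.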
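Some ARIMA models are stationary, but not all of them. ARIMA stands for AutoRegressive Integrated Moving Average. The "integrated" part refers to differencing, which is used to make the data stationary before the autoregressive and moving average terms are applied; an ARIMA model that includes at least one difference is itself non-stationary.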
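From my understanding, there are two types of ARIMA models: non-seasonal and seasonal. The non-seasonal model combines differencing with an autoregressive model and a moving average model; the seasonal model adds seasonal versions of these terms, applied at multiples of the seasonal period.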
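In backshift notation ($B y_t = y_{t-1}$), a seasonal ARIMA$(p,d,q)(P,D,Q)_m$ model can be written as follows; setting $P = D = Q = 0$ gives the non-seasonal ARIMA$(p,d,q)$ model, where $(1-B)^d$ is the differencing, $\phi_p(B)$ the autoregressive part, and $\theta_q(B)$ the moving average part. The sign convention shown (plus signs in the MA polynomials) matches the one used by R's `arima()`:

$$
\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t = c + \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t,
$$

$$
\begin{aligned}
\phi_p(B) &= 1 - \phi_1 B - \cdots - \phi_p B^p, &\qquad \theta_q(B) &= 1 + \theta_1 B + \cdots + \theta_q B^q, \\
\Phi_P(B^m) &= 1 - \Phi_1 B^m - \cdots - \Phi_P B^{Pm}, &\qquad \Theta_Q(B^m) &= 1 + \Theta_1 B^m + \cdots + \Theta_Q B^{Qm}.
\end{aligned}
$$

For example, the ARIMA(1,1,2)(1,1,2)[12] model that auto.arima() selects later corresponds to

$$
(1-\phi_1 B)(1-\Phi_1 B^{12})(1-B)(1-B^{12})\,y_t = (1+\theta_1 B+\theta_2 B^2)(1+\Theta_1 B^{12}+\Theta_2 B^{24})\,\varepsilon_t,
$$

with no constant, because the forecast package omits it when two or more differences are taken. The reported coefficients `ar1`, `ma1`, `ma2`, `sar1`, `sma1`, `sma2` are $\phi_1, \theta_1, \theta_2, \Phi_1, \Theta_1, \Theta_2$.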
Akaike’s Information Criterion (AIC) was useful in selecting predictors for regression, and it is also useful for determining the order of an ARIMA model. Good models are obtained by minimizing the AIC, or its small-sample corrected version, the AICc, so we’ll look at it alongside RMSE and other error metrics in this comparison. One thing to remember about the AICc, though, is that it is only useful for selecting among models of the same class (and, for ARIMA, among models with the same orders of differencing, since differencing changes the data on which the likelihood is computed). Because ETS and ARIMA models are different classes, we can’t compare them by AICc directly; instead, we can use it to determine which of several ARIMA models and which of several ETS models is best, and then compare those two finalists using other metrics.
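A minimal sketch of that two-step workflow with the forecast package, assuming the built-in monthly `USAccDeaths` series as a stand-in for the solar data. The candidate orders and ETS specifications are arbitrary illustrations, not the models used in this analysis:

```r
library(forecast)

# Stand-in data: built-in monthly USAccDeaths series (1973-1978),
# used only because the solar CSV is not available here.
y <- USAccDeaths

# Candidate ARIMA models sharing the same differencing (d = 1, D = 1),
# so their AICc values are computed on the same data and are comparable.
arima_orders <- list(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1))
arima_fits <- lapply(arima_orders, function(o) {
  Arima(y, order = o, seasonal = c(0, 1, 1))
})
arima_aicc <- sapply(arima_fits, function(f) f$aicc)
best_arima <- arima_fits[[which.min(arima_aicc)]]

# Candidate ETS models, compared only with each other by AICc.
ets_specs <- c("ANA", "AAA", "MAM")
ets_fits <- lapply(ets_specs, function(m) ets(y, model = m))
ets_aicc <- sapply(ets_fits, function(f) f$aicc)
best_ets <- ets_fits[[which.min(ets_aicc)]]

# AICc picks a winner within each class; the two finalists are then
# compared with test-set error metrics, not with each other's AICc.
print(best_arima)
print(best_ets)
```

Both `Arima` and `ets` objects store an `$aicc` element; the sketch deliberately never compares `best_arima$aicc` with `best_ets$aicc`.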
The data that I will be modelling is the Total Distributed Solar Energy Consumption for Heat in the US, measured in trillions of BTUs, from January 1989 through March 2020.
#libraries
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
#import data
data <- read.csv("C:/Users/andre/Documents/Boston College/2. Summer 2020/Forecasting/week 4/solar_energy_consumption.csv")
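#convert the Value column to numbers; as.character() first keeps this correct
#whether read.csv() returned a factor or a character column
data$Value <- as.numeric(as.character(data$Value)) #data of interest ends at 376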
## Warning: NAs introduced by coercion
#create time series
data_ts <- ts(data[1:376,2], start = c(1989, 01), end = c(2020, 03), frequency = 12)
autoplot(data_ts,
main = "Total Distributed Solar Energy Consumption for Heat in the US, January 1989 - March 2020",
ylab = "Consumption in Trillions of BTUs")
#training and test sets
train <- window(data_ts, end = c(2013, 12)) #first 300 months: Jan 1989 - Dec 2013
test <- window(data_ts, start = c(2014, 01)) #Jan 2014 - Mar 2020, with no trailing NA
#ets model
ets <- ets(train)
ets
## ETS(M,Ad,M)
##
## Call:
## ets(y = train)
##
## Smoothing parameters:
## alpha = 0.5231
## beta = 0.0243
## gamma = 1e-04
## phi = 0.98
##
## Initial states:
## l = 4.2284
## b = 0.0203
## s = 0.7943 0.8376 1.0213 1.1204 1.2318 1.2447
## 1.1866 1.1754 1.0643 0.9675 0.7027 0.6533
##
## sigma: 0.0093
##
## AIC AICc BIC
## -162.91715 -160.48298 -96.24906
fc_ets <- forecast(ets, 12)
autoplot(fc_ets)
acc_ets <- accuracy(fc_ets, test[1:12])
acc_ets
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0003016468 0.02984880 0.01962933 0.007158707 0.5051085 0.0423235
## Test set 0.0886268758 0.09992101 0.08862688 1.928736588 1.9287366 0.1910916
## ACF1
## Training set 0.4584956
## Test set NA
Based on the plot of the data, this is a seasonal time series and will require a seasonal ARIMA model. This also makes sense because the need for heat, regardless of whether it comes from solar energy, fluctuates with the time of year.
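The visual impression can be backed up numerically. A minimal sketch using the forecast package’s `nsdiffs()` and `ndiffs()`, with the built-in `USAccDeaths` series standing in for the solar training data, estimates how many seasonal and first differences are needed before fitting a seasonal ARIMA model:

```r
library(forecast)

# Stand-in series: built-in monthly USAccDeaths. Replace `y` with `train`
# to run the same checks on the solar data.
y <- USAccDeaths

# Number of seasonal (lag-12) differences suggested by the strength
# of the seasonality (nsdiffs() default test).
D <- nsdiffs(y)

# After any seasonal differencing, number of first differences suggested
# by a KPSS unit-root test (ndiffs() default test).
y_sd <- if (D > 0) diff(y, lag = frequency(y), differences = D) else y
d <- ndiffs(y_sd)

cat("Seasonal differences (D):", D, "\nFirst differences (d):", d, "\n")
```

auto.arima() relies on these same tests to choose D and d, so this mainly serves as a sanity check on the visual reading.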
I started with auto.arima() to see how well automatic model selection would perform.
myarima <- auto.arima(train)
summary(myarima)
## Series: train
## ARIMA(1,1,2)(1,1,2)[12]
##
## Coefficients:
## ar1 ma1 ma2 sar1 sma1 sma2
## 0.4703 -0.3192 0.2323 -0.4997 0.8367 0.5168
## s.e. 0.1299 0.1344 0.0595 0.1318 0.1180 0.0554
##
## sigma^2 estimated as 0.0004093: log likelihood=711.9
## AIC=-1409.79 AICc=-1409.39 BIC=-1384.17
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.0001446744 0.019579 0.01023283 -0.003391747 0.2707296
## MASE ACF1
## Training set 0.07880246 0.004802359
fc_arima1 <- forecast(myarima, 12)
autoplot(fc_arima1)
acc_arima1 <- accuracy(fc_arima1, test[1:12])
acc_arima1
## ME RMSE MAE MPE MAPE
## Training set -0.0001446744 0.01957900 0.01023283 -0.003391747 0.2707296
## Test set 0.0459922441 0.06293965 0.04972804 1.050850700 1.1094113
## MASE ACF1
## Training set 0.02206337 0.004802359
## Test set 0.10722041 NA
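This actually looks pretty good! Judging by the error metrics, the forecasts are fairly unbiased (the ME is close to zero), and the MASE is well below 1, so the model does better than the naive benchmark that MASE is scaled by. The lag-1 autocorrelation of the residuals (ACF1) is already close to zero, but let’s see whether a different specification can bring it even closer.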
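ACF1 only looks at lag 1. A slightly fuller check of leftover autocorrelation is sketched below on the built-in `USAccDeaths` series as a stand-in (so `fit` is a hypothetical placeholder for `myarima`). It computes the same lag-1 residual autocorrelation and a Ljung-Box test over two seasonal cycles with base R’s `Box.test()`:

```r
library(forecast)

# Stand-in model on built-in data; with the solar data, use myarima instead.
fit <- auto.arima(USAccDeaths)
res <- residuals(fit)

# Lag-1 autocorrelation of the residuals (element 1 is lag 0).
# Values near zero, in either direction, indicate little leftover structure.
acf1 <- acf(res, lag.max = 1, plot = FALSE)$acf[2]

# Ljung-Box test over 24 lags; fitdf removes the p + q + P + Q estimated
# ARMA coefficients (fit$arma[1:4]) from the degrees of freedom.
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = sum(fit$arma[1:4]))

cat("ACF1:", round(acf1, 4), " Ljung-Box p-value:", round(lb$p.value, 4), "\n")
```

A large p-value means there is no strong evidence of remaining autocorrelation. The forecast package’s `checkresiduals()` wraps a similar Ljung-Box test together with residual plots.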
myarima2 <- arima(train, order = c(1,1,1),
seasonal = list(order = c(1,1,1),
period = 12))
summary(myarima2)
##
## Call:
## arima(x = train, order = c(1, 1, 1), seasonal = list(order = c(1, 1, 1), period = 12))
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.6485 -0.4337 0.7724 -0.4689
## s.e. 0.0930 0.1016 0.0622 0.0725
##
## sigma^2 estimated as 0.0004559: log likelihood = 695.16, aic = -1380.31
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.0001160372 0.02090184 0.01046672 -0.002775187 0.2808014
## MASE ACF1
## Training set 0.02256767 -0.07374552
fc_arima2 <- forecast(myarima2, 12)
autoplot(fc_arima2)
acc_arima2 <- accuracy(fc_arima2, test[1:12])
acc_arima2
## ME RMSE MAE MPE MAPE
## Training set -0.0001160372 0.02090184 0.01046672 -0.002775187 0.2808014
## Test set 0.0381438058 0.06138326 0.04700046 0.921229089 1.0602812
## MASE ACF1
## Training set 0.02256767 -0.07374552
## Test set 0.10133938 NA
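I don’t believe that it’s by much, but this model performs slightly better on bias (test-set ME of 0.038 versus 0.046) and on test-set RMSE (0.0614 versus 0.0629). Its training-set ACF1, however, has gone negative (-0.074) and is further from zero than the auto.arima() model’s 0.0048, so its residuals carry slightly more lag-1 autocorrelation. Both are seasonal ARIMA models with the same orders of differencing (d = 1, D = 1), so their AIC values can be compared directly, and the auto.arima() model has the lower AIC (-1409.79 versus -1380.31).

Compared with the ETS model, the ARIMA models come out ahead on training-set bias, but it’s very close (training-set ME of about -0.0001 versus 0.0003). This is also visible in how similar the forecast plots look. On the test set, however, the ARIMA models outperform the ETS model by a much clearer margin (RMSE of about 0.06 versus 0.10 trillion BTUs, and MAPE of about 1.1% versus 1.9%), so ARIMA is the better model for this solar consumption data.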
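To put an ETS vs. ARIMA comparison in one table, a sketch like the following collects the test-set rows of `accuracy()` side by side and adds a Diebold-Mariano test (`forecast::dm.test()`) to check whether the difference in accuracy is more than noise. It uses the built-in `USAccDeaths` series with a 12-month hold-out as a stand-in, since the solar CSV is not bundled here:

```r
library(forecast)

# Stand-in data: built-in USAccDeaths, split into training (1973-1977)
# and a 12-month test period (1978).
y <- USAccDeaths
train <- window(y, end = c(1977, 12))
test <- window(y, start = c(1978, 1))
h <- length(test)

fc_ets <- forecast(ets(train), h = h)
fc_arima <- forecast(auto.arima(train), h = h)

# Test-set rows of accuracy(), one row per model.
metrics <- c("ME", "RMSE", "MAE", "MAPE", "MASE")
comparison <- rbind(
  ETS = accuracy(fc_ets, test)["Test set", metrics],
  ARIMA = accuracy(fc_arima, test)["Test set", metrics]
)
print(round(comparison, 3))

# Diebold-Mariano test on the two sets of forecast errors (squared-error
# loss by default); with only 12 test points it has little power.
dm <- dm.test(test - fc_ets$mean, test - fc_arima$mean, h = 1)
print(dm)
```

A small p-value would suggest the gap in accuracy is unlikely to be chance alone; with a short test period, a large p-value is weak evidence either way.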