Using data from Federal Reserve Economic Data (FRED), I want to compare an SES model and an ARIMA model to see which one better explains Indonesia's GDP growth rate.

1 Input Data

1.1 Library

library(dplyr)
library(lubridate)
library(forecast)
library(tseries)
library(MLmetrics)
library(readxl)
library(ggplot2)
library(plotly)
library(TSstudio)

1.2 Data Setup

The data is quarterly, from 1991 to 2019.

gdp <- read_xls("growth rate quarterly.xls")
head(gdp)

># # A tibble: 6 x 2
>#   Date                `GDP Growth Rate`
>#   <dttm>                          <dbl>
># 1 1991-01-01 00:00:00              3.57
># 2 1991-04-01 00:00:00              2.56
># 3 1991-07-01 00:00:00              1.34
># 4 1991-10-01 00:00:00              1.53
># 5 1992-01-01 00:00:00              1.09
># 6 1992-04-01 00:00:00              1.29

2 Exploratory Data

Convert the data into a time series object.

gdpts <- ts(gdp$`GDP Growth Rate`, frequency = 4, start = c(1991,1))
ts_info(gdpts)
ts_plot(gdpts,
        title = "Indonesia GDP by Expenditure in Constant Prices",
        Ytitle = "Growth Rate Previous Period",
        Xtitle = "Year",
        slider = TRUE)

>#  The gdpts series is a ts object with 1 variable and 116 observations
>#  Frequency: 4 
>#  Start time: 1991 1 
>#  End time: 2019 4

As the plot shows, Indonesia's GDP growth rate is more volatile at the beginning of the sample and tends to flatten toward the end.

ts_decompose(gdpts)

From the trend perspective, the trend component is volatile at the beginning but flat at the end.

Now I will show the seasonal movement in GDP.

ts_seasonal(gdpts - decompose(gdpts)$trend,
            type = "all",
            title = "Seasonal Plot")

From the seasonal perspective, the seasonal movement looks mostly random. To analyse it in more detail, I will use a few more graphs.

ts_heatmap(gdpts)

Focusing on each quarter of the GDP growth rate, the pattern looks mostly random: it varies more at the beginning (as seen from the color density) and tends to flatten at the end. In 1998 Q1 and Q2 the color is much closer to white. This happened because there was a crisis in Asia, especially in Indonesia, and the economy began to recover from 1998.

More detail can be seen in the 3D graph of the GDP growth rate.

ts_surface(gdpts)

ts_lags(gdpts)

Looking at the relationship between the GDP growth rate and its lags, the relationship appears to be flat.

3 Time Series Modelling

3.1 Stationarity

adf.test(gdpts)
kpss.test(gdpts)

># 
>#  Augmented Dickey-Fuller Test
># 
># data:  gdpts
># Dickey-Fuller = -3.788, Lag order = 4, p-value = 0.02199
># alternative hypothesis: stationary
># 
># 
>#  KPSS Test for Level Stationarity
># 
># data:  gdpts
># KPSS Level = 0.098021, Truncation lag parameter = 4, p-value = 0.1

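From the ADF test, since the p-value < 0.05, we reject the null hypothesis of a unit root and conclude that the data is stationary.
From the KPSS test, since the p-value > 0.05, we fail to reject the null hypothesis of level stationarity, which leads to the same conclusion: the data is stationary.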
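
The two tests have opposite null hypotheses, so it is easy to misread them when they are used together. The sketch below shows one way to combine the two p-values into a single verdict. The helper `stationarity_verdict` and its 0.05 threshold are my own assumptions for illustration; it is not part of `tseries`.

```r
# Hypothetical helper: combine ADF and KPSS p-values into one verdict.
stationarity_verdict <- function(adf_p, kpss_p, alpha = 0.05) {
  adf_rejects_unit_root     <- adf_p < alpha   # ADF H0: unit root (non-stationary)
  kpss_rejects_stationarity <- kpss_p < alpha  # KPSS H0: level stationary
  if (adf_rejects_unit_root && !kpss_rejects_stationarity) {
    "stationary (both tests agree)"
  } else if (!adf_rejects_unit_root && kpss_rejects_stationarity) {
    "non-stationary (both tests agree)"
  } else {
    "inconclusive (tests disagree)"
  }
}

# p-values reported in the test output above for gdpts
stationarity_verdict(adf_p = 0.02199, kpss_p = 0.1)
```

With the p-values printed above, both tests point in the same direction, which is what the conclusion relies on.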

3.2 Cross Validation

I will divide the data into a train set (1991-2017) and a test set (2018-2019) for forecasting.

h1 <- 8
h2 <- 20

gdpts_split <- ts_split(gdpts, sample.out = h1)
train <- gdpts_split$train
test <- gdpts_split$test
ts_info(train)
ts_info(test)

>#  The train series is a ts object with 1 variable and 108 observations
>#  Frequency: 4 
>#  Start time: 1991 1 
>#  End time: 2017 4 
>#  The test series is a ts object with 1 variable and 8 observations
>#  Frequency: 4 
>#  Start time: 2018 1 
>#  End time: 2019 4
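
For readers without TSstudio, roughly the same split can be sketched with base R's `window()`. The series below is a random placeholder with the same frequency and date range as `gdpts`, not the GDP data, and the names `x`, `train_w` and `test_w` are only for illustration.

```r
# Synthetic quarterly series with the same shape as gdpts (116 quarters,
# 1991 Q1 to 2019 Q4); the values are random placeholders, not GDP data.
set.seed(1)
x <- ts(rnorm(116), frequency = 4, start = c(1991, 1))

# Keep everything up to 2017 Q4 for training, hold out 2018 Q1 onward.
train_w <- window(x, end = c(2017, 4))
test_w  <- window(x, start = c(2018, 1))

length(train_w)  # number of training quarters
length(test_w)   # number of held-out quarters
start(test_w)    # first period of the test set
```

Splitting by date keeps the time order intact, which matters because the test set must come strictly after the training set.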

3.3 Error, Trend, and Seasonal (ETS) Model

Because the data has no trend or seasonality, I will use simple exponential smoothing (SES).

modelets <- ets(train, model = "ZNN")
fc2 <- forecast(modelets, h = h1)
accuracy(fc2, test)

>#                       ME       RMSE        MAE       MPE       MAPE       MASE
># Training set -0.04149507 1.62921119 0.78494700 48.708225 184.379530 0.64833286
># Test set     -0.01220515 0.03544488 0.02730129 -1.059164   2.241803 0.02254971
>#                   ACF1 Theil's U
># Training set 0.1735786        NA
># Test set     0.1784716 0.9880781

The training RMSE for the ETS model is 1.629 (test RMSE 0.0354). Later I will compare it with the ARIMA model.

test_forecast(forecast.obj = fc2, actual = gdpts, test = test) %>% 
  layout(legend = list(x = 0.1, y = 0.95))

From the graph, the model does not explain the actual values very well.
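
One reason for this is that SES without trend or seasonality forecasts a single constant level for every future period. The sketch below writes the smoothing recursion out by hand to show this. The function `ses_level`, the value `alpha = 0.3` and the choice of initial level are assumptions for illustration; `ets()` estimates its own parameters and will not give the same numbers.

```r
# Minimal simple exponential smoothing, written out by hand for illustration.
# alpha and the initial level are fixed assumptions here; ets() estimates them.
ses_level <- function(y, alpha = 0.3, l0 = y[1]) {
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level  # update after each observation
  }
  level
}

y <- c(3.57, 2.56, 1.34, 1.53, 1.09, 1.29)  # first six values shown in head(gdp)
last_level <- ses_level(y)
rep(last_level, 8)  # the h = 8 step-ahead forecasts are all the same value
```

Because every forecast equals the last smoothed level, the forecast line is flat and cannot follow quarter-to-quarter movement in the test period.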

Now I will build another model using ARIMA.

3.4 Autoregressive Integrated Moving Average (ARIMA) Model

arima <- auto.arima(train)
fc1 <- forecast(arima, h = h1)
accuracy(fc1, test)

>#                        ME       RMSE        MAE        MPE       MAPE
># Training set -0.010801306 1.47274443 0.82016336 68.3008089 138.214551
># Test set      0.004117018 0.03155132 0.02867744  0.2647297   2.320384
>#                    MASE      ACF1 Theil's U
># Training set 0.67742008 0.0370900        NA
># Test set     0.02368635 0.1720567 0.7644466

The training RMSE for the ARIMA model is 1.4727 (test RMSE 0.0316).

test_forecast(forecast.obj = fc1, actual =  gdpts, test = test) %>% 
  layout(legend = list(x = 0.1, y = 0.95))

From the graph, the ARIMA model explains the actual values much better than the SES model.

Conclusions:
- Based on RMSE, the ARIMA model has a lower value than the SES model on the training set (ARIMA = 1.4727 < SES = 1.629) and on the test set (ARIMA = 0.0316 < SES = 0.0354).
- The ARIMA model looks better than the SES model at capturing the actual values and at forecasting.
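
For reference, the RMSE used in this comparison is the square root of the mean squared difference between actual and forecast values. A minimal sketch follows; the function name `rmse` and the toy vectors are my own and are not taken from the GDP data.

```r
# Root mean squared error between actual and forecast values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy numbers for illustration only (not taken from the GDP data)
actual    <- c(1.0, 2.0, 3.0)
predicted <- c(1.0, 2.0, 5.0)
rmse(actual, predicted)  # sqrt(4 / 3)
```

In the workflow above, something like `rmse(test, fc1$mean)` should correspond to the test-set RMSE row that `accuracy()` reports, assuming the forecast and test series cover the same quarters.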

3.5 Assumption Check

3.5.1 Normality

shapiro.test(arima$residuals)

># 
>#  Shapiro-Wilk normality test
># 
># data:  arima$residuals
># W = 0.79354, p-value = 0.00000000005197

Since the p-value < 0.05, we can conclude that the residuals are not normally distributed. This is common in time series: errors can arise from various unpredictable events and are largely unavoidable. The point forecasts are still usable, but prediction intervals that assume normality may be inaccurate.

3.5.2 Autocorrelation

Box.test(arima$residuals, type = "Ljung-Box")

># 
>#  Box-Ljung test
># 
># data:  arima$residuals
># X-squared = 0.15274, df = 1, p-value = 0.6959

Since the p-value > 0.05, we fail to reject the null hypothesis, so there is no evidence of autocorrelation in the forecast errors (tested at lag 1, the default).
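
The test above uses the default of a single lag. For quarterly data it can be worth testing several lags jointly, for example the first 8 (two years). The sketch below shows the call on a white-noise placeholder series, not on the GDP residuals; the choice of 8 lags and the name `res` are assumptions for illustration.

```r
# White-noise placeholder standing in for model residuals (not the GDP residuals)
set.seed(42)
res <- rnorm(108)

# Test the first 8 lags (two years of quarterly data) jointly.
# fitdf is the number of estimated ARMA coefficients; 0 here because
# the placeholder series was not produced by a fitted model.
lb <- Box.test(res, lag = 8, type = "Ljung-Box", fitdf = 0)
lb$parameter  # degrees of freedom = lag - fitdf
lb$p.value
```

For residuals from a fitted ARIMA model, `fitdf` would normally be set to the number of AR and MA coefficients. The `checkresiduals()` function in the forecast package handles this adjustment and may be a more convenient option.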

Indonesia GDP Growth Rate

Aji Putera Tanumihardja

3/22/2020