Using data from Federal Reserve Economic Data (FRED), I want to compare an SES model and an ARIMA model to see which one better explains Indonesia's GDP growth rate.
The data are quarterly, from 1991 to 2019.
># # A tibble: 6 x 2
># Date `GDP Growth Rate`
># <dttm> <dbl>
># 1 1991-01-01 00:00:00 3.57
># 2 1991-04-01 00:00:00 2.56
># 3 1991-07-01 00:00:00 1.34
># 4 1991-10-01 00:00:00 1.53
># 5 1992-01-01 00:00:00 1.09
># 6 1992-04-01 00:00:00 1.29
Convert the data into a time series object:
library(TSstudio)  # provides ts_info() and ts_plot()
gdpts <- ts(gdp$`GDP Growth Rate`, frequency = 4, start = c(1991, 1))
ts_info(gdpts)
ts_plot(gdpts,
        title = "Indonesia GDP by Expenditure in Constant Prices",
        Ytitle = "Growth Rate Previous Period",
        Xtitle = "Year",
        slider = TRUE)
># The gdpts series is a ts object with 1 variable and 116 observations
># Frequency: 4
># Start time: 1991 1
># End time: 2019 4
The plot shows that Indonesia's GDP growth rate was more volatile at the beginning of the sample and flattens out toward the end.
From the trend perspective, the trend component follows the same pattern: volatile at the beginning and flat at the end.
Now I will look at the seasonal movement in GDP.
From the seasonal perspective, the seasonal movement looks close to random. To analyse it in more detail, I will use a few more graphs.
Looking at each quarter of the GDP growth rate, the pattern looks mostly random, with more variation at the beginning (seen in the color density) and a flatter profile at the end. In Q1 and Q2 of 1998, the color shifts strongly toward white. This happened because there was a crisis in Asia, especially in Indonesia, and the economy began to recover from 1998.
More detail can be seen in the 3D graph of the GDP growth rate.
Looking at the relationship between the GDP growth rate and its lags, the relationship appears to be flat.
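The lag relationship can also be checked numerically. The sketch below is a minimal base R example on a simulated placeholder series (the values are not the FRED data, and the names `y` and `lag_cor` are illustrative). It computes the sample autocorrelation at the first four quarterly lags; values near zero correspond to a flat lag plot.

```r
# Placeholder quarterly series standing in for gdpts (simulated, not the FRED data)
set.seed(42)
y <- ts(rnorm(116, mean = 1.2, sd = 1.5), frequency = 4, start = c(1991, 1))

# Sample autocorrelation for lags 0 to 4 quarters, without drawing the plot
lag_acf <- acf(y, lag.max = 4, plot = FALSE)

# Drop lag 0 (always 1) and label the remaining lags in quarters
lag_cor <- setNames(as.numeric(lag_acf$acf)[-1], paste0("lag", 1:4))
print(round(lag_cor, 3))
```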
>#
># Augmented Dickey-Fuller Test
>#
># data: gdpts
># Dickey-Fuller = -3.788, Lag order = 4, p-value = 0.02199
># alternative hypothesis: stationary
>#
>#
># KPSS Test for Level Stationarity
>#
># data: gdpts
># KPSS Level = 0.098021, Truncation lag parameter = 4, p-value = 0.1
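From the ADF test, since the p-value (0.022) < 0.05, we reject the null hypothesis of a unit root and conclude that the data are stationary.
From the KPSS test, whose null hypothesis is stationarity, the p-value (0.1) > 0.05, so we fail to reject the null and reach the same conclusion: the data are stationary.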
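The calls that produced the two test outputs are not shown. A minimal sketch of how such tests can be run, assuming the tseries package (its `adf.test()` and `kpss.test()` print output in this layout) and a simulated placeholder series rather than the FRED data:

```r
library(tseries)  # assumed source of adf.test() and kpss.test()

# Placeholder quarterly series standing in for gdpts (simulated, not the FRED data)
set.seed(42)
y <- ts(rnorm(116, mean = 1.2, sd = 1.5), frequency = 4, start = c(1991, 1))

# ADF test: the null hypothesis is a unit root (non-stationary).
# suppressWarnings() hides the note tseries emits when the p-value
# falls outside its tabulated range.
adf_result <- suppressWarnings(adf.test(y))

# KPSS test: the null hypothesis is level stationarity.
kpss_result <- suppressWarnings(kpss.test(y, null = "Level"))

print(adf_result$p.value)
print(kpss_result$p.value)
```

Because the two tests have opposite null hypotheses, a small ADF p-value together with a large KPSS p-value is the combination that supports stationarity.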
I will divide the data into a training set (1991-2017) and a test set (2018-2019) for forecasting.
h1 <- 8
h2 <- 20
gdpts_split <- ts_split(gdpts, sample.out = h1)
train <- gdpts_split$train
test <- gdpts_split$test
ts_info(train)
ts_info(test)
># The train series is a ts object with 1 variable and 108 observations
># Frequency: 4
># Start time: 1991 1
># End time: 2017 4
># The test series is a ts object with 1 variable and 8 observations
># Frequency: 4
># Start time: 2018 1
># End time: 2019 4
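For readers without TSstudio, the same split can be sketched with base R's `window()`. The series below is a placeholder with the same length and start as `gdpts`, and the names `train_w` and `test_w` are illustrative.

```r
# Placeholder series with the same shape as gdpts: 116 quarters from 1991 Q1
gdp_placeholder <- ts(seq_len(116), frequency = 4, start = c(1991, 1))

# Training set: everything up to and including 2017 Q4
train_w <- window(gdp_placeholder, end = c(2017, 4))

# Test set: the last 8 quarters, 2018 Q1 to 2019 Q4
test_w <- window(gdp_placeholder, start = c(2018, 1))

print(c(train = length(train_w), test = length(test_w)))
```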
Because the data have neither a trend nor a seasonal pattern, I will use simple exponential smoothing (SES).
># ME RMSE MAE MPE MAPE MASE
># Training set -0.04149507 1.62921119 0.78494700 48.708225 184.379530 0.64833286
># Test set -0.01220515 0.03544488 0.02730129 -1.059164 2.241803 0.02254971
># ACF1 Theil's U
># Training set 0.1735786 NA
># Test set 0.1784716 0.9880781
The training-set RMSE for the SES model is 1.629; later I will compare it with the ARIMA model.
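The SES fit itself is not shown. A minimal sketch of how it could be produced, assuming the forecast package (the accuracy table appears to match the layout of its `accuracy()` output) and a simulated placeholder series instead of the FRED data. The object name `fc2` is taken from the plotting call; `acc_ses` and the other names are illustrative.

```r
library(forecast)  # assumed: provides ses() and accuracy()

# Placeholder quarterly series (simulated, not the FRED data)
set.seed(42)
y <- ts(rnorm(116, mean = 1.2, sd = 1.5), frequency = 4, start = c(1991, 1))
train <- window(y, end = c(2017, 4))
test  <- window(y, start = c(2018, 1))

# Simple exponential smoothing, forecasting over the test horizon
fc2 <- ses(train, h = length(test))

# Error measures for the training set and the test set
acc_ses <- accuracy(fc2, test)
print(acc_ses[, c("RMSE", "MAE")])
```

The "Training set" row measures in-sample fit, while the "Test set" row measures out-of-sample forecast error, so the two RMSE values answer different questions.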
test_forecast(forecast.obj = fc2, actual = gdpts, test = test) %>%
  layout(legend = list(x = 0.1, y = 0.95))
From the graph, the model does not explain the actual values very well.
Now I will build another model using ARIMA.
># ME RMSE MAE MPE MAPE
># Training set -0.010801306 1.47274443 0.82016336 68.3008089 138.214551
># Test set 0.004117018 0.03155132 0.02867744 0.2647297 2.320384
># MASE ACF1 Theil's U
># Training set 0.67742008 0.0370900 NA
># Test set 0.02368635 0.1720567 0.7644466
The training-set RMSE for the ARIMA model is 1.4727.
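The ARIMA fit is not shown either, and its order is not stated. The sketch below assumes the forecast package and lets `auto.arima()` choose the order on a simulated placeholder series; this is an assumption, not necessarily how the original model was specified. The name `fc1` is taken from the plotting call; `fit_arima` and `acc_arima` are illustrative.

```r
library(forecast)  # assumed: provides auto.arima(), forecast() and accuracy()

# Placeholder quarterly series (simulated, not the FRED data)
set.seed(42)
y <- ts(rnorm(116, mean = 1.2, sd = 1.5), frequency = 4, start = c(1991, 1))
train <- window(y, end = c(2017, 4))
test  <- window(y, start = c(2018, 1))

# Automatic order selection; the order used in the original is unknown
fit_arima <- auto.arima(train)

# Forecast over the test horizon and compute the error measures
fc1 <- forecast(fit_arima, h = length(test))
acc_arima <- accuracy(fc1, test)
print(acc_arima[, "RMSE"])
```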
test_forecast(forecast.obj = fc1, actual = gdpts, test = test) %>%
  layout(legend = list(x = 0.1, y = 0.95))
From the graph, the ARIMA model explains the actual values much better than the SES model.
Conclusions:
- Based on the training-set RMSE, the ARIMA model has a lower value than the SES model (ARIMA = 1.4727 < SES = 1.629). The test-set RMSE points the same way (ARIMA = 0.0316 < SES = 0.0354).
- The ARIMA model looks better than the SES model at capturing the actual values and at forecasting.
>#
># Shapiro-Wilk normality test
>#
># data: arima$residuals
># W = 0.79354, p-value = 0.00000000005197
Since the p-value < 0.05, we can conclude that the residuals are not normally distributed. This is common in time series, where errors can arise from various unpredictable events and are largely unavoidable.
>#
># Box-Ljung test
>#
># data: arima$residuals
># X-squared = 0.15274, df = 1, p-value = 0.6959
Since the p-value > 0.05, we fail to reject the null hypothesis of no autocorrelation, so there is no evidence of autocorrelation in the forecast errors.
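Both residual checks are available in base R. The sketch below is a minimal example on a simulated placeholder series with an illustrative AR(1) fit from `stats::arima()`; the order is an assumption, since the order of the original model is not stated. `Box.test()` uses `lag = 1` by default, which matches the `df = 1` in the output above.

```r
# Placeholder quarterly series (simulated, not the FRED data)
set.seed(42)
y <- ts(rnorm(116, mean = 1.2, sd = 1.5), frequency = 4, start = c(1991, 1))

# Illustrative ARIMA(1,0,0) fit; the original model's order is unknown
fit <- arima(y, order = c(1, 0, 0))
res <- residuals(fit)

# Shapiro-Wilk: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)

# Ljung-Box: the null hypothesis is no autocorrelation (lag = 1 by default)
lb <- Box.test(res, type = "Ljung-Box")

print(c(shapiro_p = sw$p.value, ljung_box_p = lb$p.value))
```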