The Covid-19 pandemic in Indonesia is a part of the ongoing worldwide pandemic of coronavirus desease 2019 caused by SARS-CoV2. On March 2th, 2020, President Joko Widodo anounced the first detection of Covid-19 case in Indonesia, marking the beginning of national health security crisis. By April 9th, the pandemic has spread to all 34 provinces in the country.
This mini research aims to build a suitable ARMA model for Covid-19 case and forecast future cases of the pandemic in Indonesia. ARMA model is chosen because it is fairly simple to estimate, produce reasonable forecasts, and it requires no knowledge of structural variables that might be required for more traditional econometric analysis. The result can be used to aid policy maker in constructing public healthcare around the Covid-19 pandemic.
Auto-Regression(AR) and Moving Average(MA) model provides a parsimonious description of a stationary stochastic process in terms of two polynomials, one for the autoregression and the second for the moving average. Using the datasets of Covid-19 Cases in 2020, we will create an ARMA model and forecast future Covid-19 cases in Indonesia.
AR - Auto Regressive
The AR model is one of the most widely used time series model. Its interpretation is similiar to simple linear regression, but here, each observation is regressed on the previous regression.
MA - Moving Average The MA model is parsimonious time series model which takes into account short-run autocorrelation. Here, each observation is regressed on the previous residual, which is not actually observed. A weighted sum of previous and current noise is called MA model.
Difference between AR and MA Models In simple term, AR model depends on the lagged values of the data, while MA model depends on the residuals of the previous period.
Load Library
covid
data# separated using semicolon
<- read_delim("covid.csv", delim = ";",
covid escape_double = FALSE, trim_ws = TRUE)
lubridate
$Date <- dmy(covid$Date) covid
<- covid[68:364,1:2] covid
Even though Covid-19 earliest date in Indonesia was March 3th, the following days were zero case, so we can drop the observations until March 9th.
colSums(is.na(covid))
## Date New_cases
## 0 0
$logcase <- log(covid$New_cases) covid
Using the xts
package, we will transform the dataframe covid
to an xts object.
<- xts(covid[,-1], order.by=as.POSIXct(covid$Date)) xts_covid
# check of if the data is an xts object
is.xts(xts_covid)
## [1] TRUE
xts_covid
datasummary(xts_covid)
## Index New_cases logcase
## Min. :2020-03-10 07:00:00 Min. : 4 Min. :1.386
## 1st Qu.:2020-05-23 07:00:00 1st Qu.: 568 1st Qu.:6.342
## Median :2020-08-05 07:00:00 Median :1882 Median :7.540
## Mean :2020-08-05 07:00:00 Mean :2502 Mean :7.221
## 3rd Qu.:2020-10-18 07:00:00 3rd Qu.:4056 3rd Qu.:8.308
## Max. :2020-12-31 07:00:00 Max. :8369 Max. :9.032
The series span from March 10th, 2020 to December 31th, 2020. The smallest increase of new case of Covid-19 in 2020 was 4, the highest being 8,369 new cases in a day, with an average of 2,502.
plot(xts_covid$New_cases, main = "Daily Cases of Covid-19 in Indonesia")
Daily Cases of Covid-19 in Indonesia had an exponential growth with a strong positive trends.
Time series decomposition using moving average is a fast way to view overall trends in time series data. Using a moving average of 14 days (2 weeks), the plot is as following:
<- SMA(xts_covid$New_cases, n=14)
covidSMA20 plot(covidSMA20)
From the plot above, we can infer that in early September 2020, daily new cases of Covid-19 has a subtrend of even stronger positive trend, which is followed by a downward trend by early October, 2020. However, it begins to return to its main positive trend by mid November, with an even steeper slope.
Stationarity means that the statistical property of a process generating a time series do not change over time. Before modelling ARMA, we need to make sure that the data is stationary. Using non-stationary time series data in financial models produces unreliable and spurious results and leads to poor understanding and forecasting.
From the graph above, we can infer that Covid-19 daily new cases grows exponentially. The variable would be at least non-stationary in means and variance. To be sure, we would use **Augmented Dickey-Fuller* test.
adf.test(xts_covid$New_cases)
##
## Augmented Dickey-Fuller Test
##
## data: xts_covid$New_cases
## Dickey-Fuller = -0.78987, Lag order = 6, p-value = 0.9618
## alternative hypothesis: stationary
The p-value is 0.9618, which is higher than the 5% signifinance level. As expected, the raw data variable doesn’t reject the null hypothesis (non-stationary).
Hence, our next best bet would be its log form, since it undo exponential effects.
adf.test(xts_covid$logcase)
## Warning in adf.test(xts_covid$logcase): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: xts_covid$logcase
## Dickey-Fuller = -6.8804, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
We reject the null hypothesis since P < 5% (0,01 < 0,05). The log variable of Covid Daily New Cases is stationary. Now, we will plot the log form:
plot(xts_covid$logcase)
Actually, the plot shows that log variable has a trend. However, referring the ADF test, it would still be considered stationary.
<- arima(covid$logcase, order=c(1, 0, 0))
ar_logcovid ar_logcovid
##
## Call:
## arima(x = covid$logcase, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.9939 6.5753
## s.e. 0.0065 1.6960
##
## sigma^2 estimated as 0.066: log likelihood = -19.99, aic = 45.98
<- arima(covid$logcase, order=c(0, 0, 1))
ma_logcovid ma_logcovid
##
## Call:
## arima(x = covid$logcase, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 0.8500 7.2184
## s.e. 0.0208 0.0875
##
## sigma^2 estimated as 0.666: log likelihood = -361.71, aic = 729.42
<- arima(covid$logcase, order=c(1, 0, 1)) arma_logcovid
## Warning in arima(covid$logcase, order = c(1, 0, 1)): possible convergence
## problem: optim gave code = 1
arma_logcovid
##
## Call:
## arima(x = covid$logcase, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.9991 -0.4479 6.2504
## s.e. 0.0013 0.0544 2.9510
##
## sigma^2 estimated as 0.05698: log likelihood = 1.35, aic = 5.3
We will choose the best model from AIC, BIC, and LogLikelihood value. Best model would have the smallest AIC and BIC value, also the highest LL value.
::glance(ar_logcovid) broom
::glance(ma_logcovid) broom
::glance(arma_logcovid) broom
As seen from the AIC, BIC, and LL values above, we can conclude that ARMA model is the best one among the three models.
Accuracy of prediction
accuracy(arma_logcovid)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.04002369 0.2387077 0.1615796 0.4704124 2.870368 0.9362276
## ACF1
## Training set 0.02759253
The MAPE is 2.87. On average, the forecast is off by 2,87%. Since MAPE is less than 20%, the ARMA model has an excellent predictive ability.
Forecasting the Next 5 Observations
forecast(arma_logcovid, 5)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 298 8.964009 8.658093 9.269926 8.496151 9.431868
## 299 8.961573 8.612268 9.310878 8.427358 9.495788
## 300 8.959139 8.571335 9.346943 8.366044 9.552233
## 301 8.956707 8.533955 9.379458 8.310164 9.603249
## 302 8.954277 8.499310 9.409244 8.258465 9.650089
autoplot(forecast(arma_logcovid, 5))
According to ARMA Forecast, Indonesia would have an increase of daily Covid-19 confirmed cases by around 9% five days following December 31th, 2020. Our recommendations are:
Combined with forecast of daily Covid-19 case recovery and Covid-19 deaths, we can prescribe the number of healthcare capacity increase such as hospital beds, nurse outsourcing, medicine import and production.
The forecasted daily Covid-case increase is way above global average of x%, we need to bring it down to at least the acceptable level.