Brief Background of Covid-19 Pandemic in Indonesia

The Covid-19 pandemic in Indonesia is a part of the ongoing worldwide pandemic of coronavirus desease 2019 caused by SARS-CoV2. On March 2th, 2020, President Joko Widodo anounced the first detection of Covid-19 case in Indonesia, marking the beginning of national health security crisis. By April 9th, the pandemic has spread to all 34 provinces in the country.

This mini research aims to build a suitable ARMA model for Covid-19 case and forecast future cases of the pandemic in Indonesia. ARMA model is chosen because it is fairly simple to estimate, produce reasonable forecasts, and it requires no knowledge of structural variables that might be required for more traditional econometric analysis. The result can be used to aid policy maker in constructing public healthcare around the Covid-19 pandemic.

ARMA Modelling

Definition

Auto-Regression(AR) and Moving Average(MA) model provides a parsimonious description of a stationary stochastic process in terms of two polynomials, one for the autoregression and the second for the moving average. Using the datasets of Covid-19 Cases in 2020, we will create an ARMA model and forecast future Covid-19 cases in Indonesia.

Introduction to ARMA

AR - Auto Regressive

The AR model is one of the most widely used time series model. Its interpretation is similiar to simple linear regression, but here, each observation is regressed on the previous regression.

MA - Moving Average The MA model is parsimonious time series model which takes into account short-run autocorrelation. Here, each observation is regressed on the previous residual, which is not actually observed. A weighted sum of previous and current noise is called MA model.

Difference between AR and MA Models In simple term, AR model depends on the lagged values of the data, while MA model depends on the residuals of the previous period.

Load Library

Read and Preprocess the Data

Read covid data

# separated using semicolon
covid <- read_delim("covid.csv", delim = ";", 
    escape_double = FALSE, trim_ws = TRUE)

Change column type to date using lubridate

covid$Date <- dmy(covid$Date)

Subset the row only from the date Covid-19 entered Indonesia, the the columns needed.

covid <- covid[68:364,1:2]

Even though Covid-19 earliest date in Indonesia was March 3th, the following days were zero case, so we can drop the observations until March 9th.

Checking for NA

colSums(is.na(covid))
##      Date New_cases 
##         0         0
covid$logcase <- log(covid$New_cases)

Exploration of Time Series Data

Using the xts package, we will transform the dataframe covid to an xts object.

xts_covid <- xts(covid[,-1], order.by=as.POSIXct(covid$Date))
# check of if the data is an xts object
is.xts(xts_covid)
## [1] TRUE

Summarize xts_covid data

summary(xts_covid)
##      Index                       New_cases       logcase     
##  Min.   :2020-03-10 07:00:00   Min.   :   4   Min.   :1.386  
##  1st Qu.:2020-05-23 07:00:00   1st Qu.: 568   1st Qu.:6.342  
##  Median :2020-08-05 07:00:00   Median :1882   Median :7.540  
##  Mean   :2020-08-05 07:00:00   Mean   :2502   Mean   :7.221  
##  3rd Qu.:2020-10-18 07:00:00   3rd Qu.:4056   3rd Qu.:8.308  
##  Max.   :2020-12-31 07:00:00   Max.   :8369   Max.   :9.032

The series span from March 10th, 2020 to December 31th, 2020. The smallest increase of new case of Covid-19 in 2020 was 4, the highest being 8,369 new cases in a day, with an average of 2,502.

Visualize the Daily Cases of Covid-19 in Indonesia

plot(xts_covid$New_cases, main = "Daily Cases of Covid-19 in Indonesia")

Daily Cases of Covid-19 in Indonesia had an exponential growth with a strong positive trends.

Decompose time series data using Moving Average

Time series decomposition using moving average is a fast way to view overall trends in time series data. Using a moving average of 14 days (2 weeks), the plot is as following:

covidSMA20 <- SMA(xts_covid$New_cases, n=14)
plot(covidSMA20)

From the plot above, we can infer that in early September 2020, daily new cases of Covid-19 has a subtrend of even stronger positive trend, which is followed by a downward trend by early October, 2020. However, it begins to return to its main positive trend by mid November, with an even steeper slope.

ARMA Modelling

Testing for Stationarity

Stationarity means that the statistical property of a process generating a time series do not change over time. Before modelling ARMA, we need to make sure that the data is stationary. Using non-stationary time series data in financial models produces unreliable and spurious results and leads to poor understanding and forecasting.

From the graph above, we can infer that Covid-19 daily new cases grows exponentially. The variable would be at least non-stationary in means and variance. To be sure, we would use **Augmented Dickey-Fuller* test.

adf.test(xts_covid$New_cases)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  xts_covid$New_cases
## Dickey-Fuller = -0.78987, Lag order = 6, p-value = 0.9618
## alternative hypothesis: stationary

The p-value is 0.9618, which is higher than the 5% signifinance level. As expected, the raw data variable doesn’t reject the null hypothesis (non-stationary).

Remediating non-stationarity

Hence, our next best bet would be its log form, since it undo exponential effects.

adf.test(xts_covid$logcase)
## Warning in adf.test(xts_covid$logcase): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  xts_covid$logcase
## Dickey-Fuller = -6.8804, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary

We reject the null hypothesis since P < 5% (0,01 < 0,05). The log variable of Covid Daily New Cases is stationary. Now, we will plot the log form:

plot(xts_covid$logcase)

Actually, the plot shows that log variable has a trend. However, referring the ADF test, it would still be considered stationary.

ARMA Model

ar_logcovid <- arima(covid$logcase, order=c(1, 0, 0))
ar_logcovid
## 
## Call:
## arima(x = covid$logcase, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.9939     6.5753
## s.e.  0.0065     1.6960
## 
## sigma^2 estimated as 0.066:  log likelihood = -19.99,  aic = 45.98
ma_logcovid <- arima(covid$logcase, order=c(0, 0, 1))
ma_logcovid
## 
## Call:
## arima(x = covid$logcase, order = c(0, 0, 1))
## 
## Coefficients:
##          ma1  intercept
##       0.8500     7.2184
## s.e.  0.0208     0.0875
## 
## sigma^2 estimated as 0.666:  log likelihood = -361.71,  aic = 729.42
arma_logcovid <- arima(covid$logcase, order=c(1, 0, 1))
## Warning in arima(covid$logcase, order = c(1, 0, 1)): possible convergence
## problem: optim gave code = 1
arma_logcovid
## 
## Call:
## arima(x = covid$logcase, order = c(1, 0, 1))
## 
## Coefficients:
##          ar1      ma1  intercept
##       0.9991  -0.4479     6.2504
## s.e.  0.0013   0.0544     2.9510
## 
## sigma^2 estimated as 0.05698:  log likelihood = 1.35,  aic = 5.3

Model Selection

We will choose the best model from AIC, BIC, and LogLikelihood value. Best model would have the smallest AIC and BIC value, also the highest LL value.

broom::glance(ar_logcovid)
broom::glance(ma_logcovid)
broom::glance(arma_logcovid)

As seen from the AIC, BIC, and LL values above, we can conclude that ARMA model is the best one among the three models.

Prediction

Accuracy of prediction

accuracy(arma_logcovid)
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.04002369 0.2387077 0.1615796 0.4704124 2.870368 0.9362276
##                    ACF1
## Training set 0.02759253

The MAPE is 2.87. On average, the forecast is off by 2,87%. Since MAPE is less than 20%, the ARMA model has an excellent predictive ability.

Forecasting the Next 5 Observations

forecast(arma_logcovid, 5)
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 298       8.964009 8.658093 9.269926 8.496151 9.431868
## 299       8.961573 8.612268 9.310878 8.427358 9.495788
## 300       8.959139 8.571335 9.346943 8.366044 9.552233
## 301       8.956707 8.533955 9.379458 8.310164 9.603249
## 302       8.954277 8.499310 9.409244 8.258465 9.650089
autoplot(forecast(arma_logcovid, 5))

Conclusion and Recommendation

According to ARMA Forecast, Indonesia would have an increase of daily Covid-19 confirmed cases by around 9% five days following December 31th, 2020. Our recommendations are:

  1. Combined with forecast of daily Covid-19 case recovery and Covid-19 deaths, we can prescribe the number of healthcare capacity increase such as hospital beds, nurse outsourcing, medicine import and production.

  2. The forecasted daily Covid-case increase is way above global average of x%, we need to bring it down to at least the acceptable level.