At the time of writing, the CDC has forecast between 100,000 and 240,000 deaths from Covid-19 in the coming months. We still do not know how the CDC reached that estimate, and outside experts and White House advisors now seem to doubt the models behind it.
Modeling a crisis like this is nearly impossible, as the statisticians at FiveThirtyEight have explained. Still, we will try to replicate the CDC models with simple ARIMA methods and see what kind of results we can get over a 30-day timeframe.
This program produces three models:
RED Model: A pessimistic ARIMA(1,2,1) model, chosen by auto.arima as the best fit.
BLUE Model: A less pessimistic ARIMA(2,1,1) model with a good fit that replicates the CDC’s forecasts.
GREEN Model: An optimistic ARIMA(2,1,0) model. I do not think it is correct, but visually it could represent a hypothetical scenario in which the US had responded aggressively a few weeks ago, producing a curve similar to those of South Korea and Japan.
acf(covid_USA_ts)
pacf(covid_USA_ts)
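A cumulative case series trends upward. Its raw ACF therefore tends to decay slowly whatever ARMA structure lies underneath, so it can help to also inspect the ACF and PACF after differencing. The sketch below is a minimal illustration on a simulated series. `y` is a hypothetical stand-in for `covid_USA_ts`, not the author's data.

```r
# Hypothetical stand-in for covid_USA_ts: a simulated ARIMA(1,1,1) path,
# NOT real Covid-19 data.
set.seed(42)
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = -0.3), n = 60)

# ACF of the raw (trending) series: typically decays slowly
acf_raw <- acf(y, plot = FALSE)

# ACF and PACF of the first difference: usually easier to read for p and q
dy <- diff(y)
acf_diff  <- acf(dy, plot = FALSE)
pacf_diff <- pacf(dy, plot = FALSE)

print(round(acf_diff$acf[2:6], 3))   # lags 1 to 5 (element 1 is lag 0)
print(round(pacf_diff$acf[1:5], 3))  # the PACF starts at lag 1
```

Call `acf(dy)` and `pacf(dy)` without `plot = FALSE` to get the usual correlograms.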
(AR <- arima(covid_USA_ts, order = c(2, 1, 1)))  # Blue Model
##
## Call:
## arima(x = covid_USA_ts, order = c(2, 1, 1))
##
## Coefficients:
## ar1 ar2 ma1
## 0.7150 0.2611 -0.7598
## s.e. 0.1477 0.1379 0.1125
##
## sigma^2 estimated as 0.0442: log likelihood = 9.28, aic = -10.57
(AR2 <- arima(covid_USA_ts, order = c(2, 1, 0)))  # Green Model
##
## Call:
## arima(x = covid_USA_ts, order = c(2, 1, 0))
##
## Coefficients:
## ar1 ar2
## 0.1894 0.4941
## s.e. 0.1028 0.1073
##
## sigma^2 estimated as 0.05064: log likelihood = 4.8, aic = -3.6
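One quick diagnostic on the two d = 1 fits is to check the roots of their AR polynomials. The AR part is stationary only if every root lies outside the unit circle.

The sketch below plugs in the coefficients printed above. The helper `ar_root_moduli()` is a hypothetical name introduced here for illustration. For the BLUE model, ar1 + ar2 = 0.9761, so one root sits only just above 1. That hints that the once-differenced series may still be close to a unit root, which is at least consistent with auto.arima differencing twice for the RED model.

```r
# Moduli of the roots of phi(z) = 1 - a1*z - ... - ap*z^p, using the sign
# convention of R's arima(). Stationarity requires every modulus > 1.
# (ar_root_moduli is a hypothetical helper, not part of any package.)
ar_root_moduli <- function(ar) {
  Mod(polyroot(c(1, -ar)))
}

# AR coefficients copied from the printed arima() output above
blue_ar  <- c(0.7150, 0.2611)  # BLUE,  ARIMA(2,1,1)
green_ar <- c(0.1894, 0.4941)  # GREEN, ARIMA(2,1,0)

blue_roots  <- ar_root_moduli(blue_ar)
green_roots <- ar_root_moduli(green_ar)

print(round(sort(blue_roots), 3))   # smallest modulus is only slightly above 1
print(round(sort(green_roots), 3))
```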
library(forecast)  # auto.arima() is provided by the forecast package
(AR3 <- auto.arima(covid_USA_ts))  # Red Model
## Series: covid_USA_ts
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2629 -0.7753
## s.e. 0.1400 0.0985
##
## sigma^2 estimated as 0.0459: log likelihood=8.86
## AIC=-11.71 AICc=-11.35 BIC=-4.97
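The printed coefficients can also be written out as equations. R's `arima()` documentation defines the model for the differenced series as X[t] = a[1]X[t-1] + … + a[p]X[t-p] + e[t] + b[1]e[t-1] + … + b[q]e[t-q]. The MA terms therefore carry a plus sign, and a negative `ma1` appears as a minus below.

The equations use the backshift operator B (B y_t = y_{t-1}), with coefficients rounded as printed. No constant term appears because none is reported in the outputs.

```latex
\begin{aligned}
&\text{General:} && \phi(B)\,(1-B)^d\,y_t = \theta(B)\,\varepsilon_t,
\quad \phi(B) = 1 - \sum_{i=1}^{p} a_i B^i,
\quad \theta(B) = 1 + \sum_{j=1}^{q} b_j B^j \\
&\text{BLUE } (2,1,1): && (1 - 0.7150B - 0.2611B^2)(1-B)\,y_t = (1 - 0.7598B)\,\varepsilon_t \\
&\text{GREEN } (2,1,0): && (1 - 0.1894B - 0.4941B^2)(1-B)\,y_t = \varepsilon_t \\
&\text{RED } (1,2,1): && (1 + 0.2629B)(1-B)^2\,y_t = (1 - 0.7753B)\,\varepsilon_t
\end{aligned}
```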
Based on the ACF and PACF plots, the series appears to contain at least an AR(1) and an MA(1) component.
We then fit several ARIMA variations and let auto.arima pick the model with the best fit, which turns out to be a (1,2,1) model. (auto.arima chooses the differencing order d with unit-root tests, KPSS by default, and then selects p and q by minimizing AICc.) This model later produces the highest forecast of cases and deaths, which seems unusually pessimistic. It is shown in “RED” in the graphs.
The “BLUE” (2,1,1) model appears to match the CDC’s forecasts, while the “GREEN” (2,1,0) model looks extremely optimistic and is almost certainly incorrect at this point.
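A caveat applies when comparing the printed AIC values. AIC is only comparable between models fitted with the same differencing order, because differencing changes the data on which the likelihood is computed.

The RED model's AIC (-11.71, d = 2) therefore cannot be ranked directly against BLUE (-10.57) or GREEN (-3.6), both with d = 1. BLUE and GREEN can, however, be compared with each other. The forecast package's `ndiffs()` exposes the unit-root test that auto.arima uses to choose d.

The sketch below shows a like-for-like comparison in base R. It runs on a simulated stand-in series, and both the data and the candidate list are assumptions for illustration.

```r
# Hypothetical stand-in for covid_USA_ts: a simulated ARIMA(2,1,1) path,
# NOT real Covid-19 data.
set.seed(1)
y <- arima.sim(model = list(order = c(2, 1, 1), ar = c(0.5, 0.2), ma = -0.4), n = 60)

# Candidate orders sharing d = 1, so their AICs are computed on the same data.
# method = "ML" skips the CSS pre-fit and keeps the AR part stationary.
orders <- list(c(2, 1, 1), c(2, 1, 0), c(1, 1, 1), c(0, 1, 1))
fits <- lapply(orders, function(o) arima(y, order = o, method = "ML"))

aic_table <- data.frame(
  model = vapply(orders, paste, character(1), collapse = ","),
  aic   = vapply(fits, AIC, numeric(1))
)
aic_table <- aic_table[order(aic_table$aic), ]  # lowest (best) AIC first
print(aic_table)
```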
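We also run a Durbin-Watson test, which returns a statistic of about 1.93 (output below). A value this close to 2 suggests at most weak positive first-order autocorrelation. The test on its own therefore does not explain why auto.arima picks a (1,2,1) model; that second difference comes from auto.arima's unit-root tests.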
## [1] 1.931167
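The code behind the Durbin-Watson result is not shown above. For ARIMA residuals, the Ljung-Box test is the more common check, and both are easy to compute in base R.

The sketch below runs on simulated data, so the series is an assumption. `fitdf = 3` is p + q for an ARIMA(2,1,1).

```r
# Hypothetical stand-in for covid_USA_ts: a simulated series, NOT real data
set.seed(7)
y <- arima.sim(model = list(order = c(2, 1, 1), ar = c(0.5, 0.2), ma = -0.4), n = 60)
fit <- arima(y, order = c(2, 1, 1), method = "ML")
e <- residuals(fit)

# Durbin-Watson statistic: sum of squared successive residual differences
# divided by the residual sum of squares. It always lies in [0, 4];
# values near 2 indicate little lag-1 autocorrelation.
dw <- sum(diff(e)^2) / sum(e^2)

# Ljung-Box test on the residuals; fitdf = p + q adjusts the degrees of freedom
lb <- Box.test(e, lag = 10, type = "Ljung-Box", fitdf = 3)

print(round(dw, 3))
print(lb)
```

A large Ljung-Box p-value means the test finds no evidence of remaining autocorrelation across the first 10 lags.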
This was merely an exercise to practice ARIMA modelling in R and to try to replicate the CDC numbers, which we were able to do with a (2,1,1) model. Over the next few weeks, the models will likely change drastically as the situation unfolds.
Assuming a 2% mortality rate, these models forecast between 7,000 and 600,000 deaths in the next 30 days. The “BLUE” (2,1,1) model is close to the CDC’s current forecasts, but these forecasts shift substantially from day to day.
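The death figures rest on simple arithmetic, assuming the 2% rate is applied to the forecast case count: deaths = 0.02 × cases. On that basis, 7,000 deaths correspond to about 350,000 cases, and 600,000 deaths to about 30 million.

The sketch below shows a 30-day forecast with `predict()` followed by that conversion. It assumes the series holds log cumulative cases, which the source does not state. If the real series is not logged, drop the `exp()`. The data are simulated.

```r
# Hypothetical stand-in for covid_USA_ts: simulated LOG cumulative cases
# with an upward drift (NOT real data; the log scale is an assumption)
set.seed(3)
log_cases <- ts(6 + cumsum(0.15 + arima.sim(model = list(ar = 0.6), n = 45, sd = 0.05)))

fit <- arima(log_cases, order = c(2, 1, 1), method = "ML")

# 30-day-ahead point forecasts on the log scale, back-transformed
fc <- predict(fit, n.ahead = 30)
cases_day30 <- exp(fc$pred[30])

mortality_rate <- 0.02                     # 2% rate, as in the text
deaths_day30 <- mortality_rate * cases_day30
print(round(deaths_day30))

# Case counts implied by the death bounds quoted in the text
implied_cases <- c(7000, 600000) / mortality_rate
print(implied_cases)
```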