Abstract
The Spanish Government declared a national emergency on March 14th, 2020 due to the coronavirus outbreak. Three weeks later, there’s no end in sight. The scarcity of hard data makes it hard to predict when the peak will be reached, and when it might end. In this report we will try and estimate the peak combining prediction based on different time series and the logistic growth curve.
For those of us who are lucky enough to not be sick with COVID-19, confinement and lack of expectations on when it will end (Wang et al. 2020,BSD and Wilder-Smith (n.d.)) might cause anxiety and decrease mental well-being due to the lack of control over what might happen in the immediate future. Having at least a rough estimation over how many days the situation will last will give at least a bit of control back to the individual person and their families.
Data has been extracted from the Datadista repository, in CSV format; the data is officially published by the Spanish Ministry of Health, and every quantity is clearly a sample of the real number. Reported cases are a sample of total cases, since testing is patchy and not extensive; discharges is also a sample, since it reveals mostly hospital discharges. The amount of people in hospitals and in ICUs is probably the one with the highest certainty, except for the fact that these are mostly overwhelmed and in some cases saturation point has been reached, so it does not say how many requests are there. In any case, let’s just assume that at least the dynamics has some similarity to reality, and let’s predict the future cases based on the current curve.
We will be using Prophet to create a model and compute these numbers. Number of cases predicted are below, along with the higher and lower bound. Underneath, prophet uses a series of models for prediction, including the Bayesian models included in Stan. We have capped the growth at 1 million initially.
Let’s try and predict future growth of the number of cases using a logistic growth model.
Fitting this model is actually a try-and-error process, where you need to check different cap values until one that fits the curve the best, at least for the last few days. Finally, 190000 was a good fit until April 1st, but with today’s data, 155000 yields the fit shown above, which might indicate that the cap is actually far away, and in fact could take place some time in mid-April. Since this is the series that has the most uncertainty, let’s try to check it against hospital occupation
Again, some time around April 15th might reach the peak, and it might be around 60000 accumulated people treated in hospitals (which is an increase over previous estimation, which was 57000). But in this case the problem is not sampling, which is probably accurate, but saturation of hospitals. So let us try a different approach: checking the progress of the resolution rate, that is, the relationship between discharges and deceases and new cases.
We set the cap of the resolution rate at 1.5. That would be a very good outcome, when the system is able to cope with more cases than there are actually. And in this case, the date is exactly April 15th, when the resolution rate will go over 1.
Unfortunately, we have to try and preview the number of deaths reported. Although this is going to be much lower than the real number or increment of deaths due to the overall situation, so this is mostly an academic exercise. We can see, however, how it evolves in the next few days and we’ll be able to check if we’re past the daily peak in deceases.
dc <- data.frame( y = data$fallecimientos,
ds = data$Fecha,
floor = 0,
cap = 15100)
model.d <- prophet(dc, growth='logistic')
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## n.changepoints greater than number of observations. Using 23
future.d <- make_future_dataframe(model.d, periods=30)
future.d$cap <- 15100
future.d$floor <- 0
forecast.d <- predict(model.d,future.d)
forecast.d$ds <- as.Date(forecast.d$ds, "%Y-%m-%d")
ggplot(forecast.d,aes(x=ds)) + geom_point(data=forecast.d,aes(y=trend,color="Trend"))+ geom_point(data=dc,aes(y=y,color="Actual")) + theme_economist()
With this model, we can try and predict future reported deceases
forecast.d <- forecast.d %>% mutate( Future.Deceases= trend - lag(trend, 1, default=1))
ggplot(forecast.d,aes(x=ds,y=Future.Deceases,color="Predicted")) + geom_point()+geom_line() + geom_point(data=data,aes(y=Fallecimientos.nuevos,color="Actual"))+ theme_economist()
## Warning: Removed 9 rows containing missing values (geom_point).
The fit is not too precise, but it seems clear that the peak was reached on March 31st or thereabouts; the data the next few days will be essential to determine if that’s the case or we will actually have to wait a few days more.
If data provided is correct, and there’s no hidden pocket of contagions somewhere, the accumulated number of cases of COVID-19 in Spain might be capped at approximately 190000 cases, with an accumulated 57000 check-ins into hospital and that might happen in or around April 15th.
Past that date, there will still be contagions and deceases, but the health system will be able to arrive to a resolution before new cases pile in.
The initial if, however, is a big if. Health system might be already past its breaking point, and this might result in more unwanted deceases, not to mention an almost total collapse of the economic system. And past the peak, the population will still be susceptible to get sick, since 190000 cases are a very small fraction of the population, so health measures will need to be taken. But this at least gives us a little hope that we might be able to hit the streets again on or before the beginning of May.
This report has been generated from data extracted from the Ministry of Health database and published by Datadista in GitHub under a free License. This paper uses RMarkdown, includes all code needed to generate it, and can be downloaded from this repository.
BSD, Raman Preet, and Annelies Wilder-Smith. n.d. “The Pandemic of Social Media Panic Travels Faster Than the Covid-19 Outbreak.”
Wang, Guanghai, Yunting Zhang, Jin Zhao, Jun Zhang, and Fan Jiang. 2020. “Mitigate the Effects of Home Confinement on Children During the Covid-19 Outbreak.” The Lancet 395 (10228). Elsevier: 945–47.