Abstract
In a pandemic such as this one, it’s impossible to ascertain the real number of cases except by testing the whole population. This leads to the problem of different death rates for different countries, but mainly to the more poignant problem of allocating resources for urgent and grave cases, or of assessing the local peak of the epidemic. In this report we will try to estimate the death rate by looking at the countries that have carried out more extensive testing, like Germany or South Korea. We will first try to estimate the time from onset to outcome and, from that, try to estimate the number of cases (and possible deaths) in other countries where testing has not been so extensive, like Spain.

Transparency in the management of a critical situation such as the one we are living through with the coronavirus is essential. It matters not only for the peace of mind of the population, but also for making informed decisions on the allocation of resources to those affected by the pandemic.
Knowing the real number of infected people, and how it evolves, is one of those things, and countries have reacted differently to this challenge. Their approaches range from testing only those who have symptoms, self-select and call health services, to testing massively. Massive testing has been done in South Korea (Shim et al. 2020), in Germany, and in Japan, at least with people repatriated from affected areas (Nishiura et al. 2020). This might be the reason why these countries report a lower Case Fatality Ratio (CFR) than others.
Let’s first check the reported case fatality ratio for the regions of the world that have at least 1000 cases. The maximum and minimum CFR are computed over the days on which a region already had at least 1000 cases.
## # A tibble: 20 x 4
## Country.Region max.CFR min.CFR last.CFR
## <fct> <dbl> <dbl> <dbl>
## 1 Italy 0.0901 0.0201 0.0901
## 2 Iran 0.0755 0.0249 0.0755
## 3 Spain 0.0542 0.0206 0.0542
## 4 US 0.0545 0.00505 0.0524
## 5 United Kingdom 0.0509 0.0184 0.0464
## 6 France 0.0394 0.0157 0.0394
## 7 Netherlands 0.0375 0.0170 0.0375
## 8 Japan 0.0348 0.0348 0.0348
## 9 Belgium 0.0238 0.00473 0.0238
## 10 Brazil 0.0147 0.0147 0.0147
## 11 Korea, South 0.0116 0.00455 0.0116
## 12 Switzerland 0.0114 0.00636 0.0114
## 13 Sweden 0.0113 0.00294 0.0113
## 14 Denmark 0.00980 0.00378 0.00980
## 15 Portugal 0.00938 0.00588 0.00938
## 16 Germany 0.00378 0 0.00378
## 17 Malaysia 0.00338 0.00291 0.00338
## 18 Norway 0.00401 0.00205 0.00331
## 19 Austria 0.00298 0.00225 0.00284
## 20 China 0.0534 0 0.000809
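The max, min and last CFR columns can be reproduced along these lines. This is a minimal base-R sketch, not the script that generated the table. It assumes a hypothetical data frame `jhu` with one row per country and day and with cumulative `Confirmed` and `Deaths` columns; the JHU CSSE files would first have to be reshaped into that form. `cfr_summary` is a name introduced here for illustration.

```r
# Sketch: per-country CFR summary over the days with at least `min_cases`
# confirmed cases. `jhu` is a hypothetical data frame with one row per
# country and day and cumulative Confirmed and Deaths columns.
cfr_summary <- function(jhu, min_cases = 1000) {
  # keep only the days on which the country already had min_cases or more
  d <- jhu[jhu$Confirmed >= min_cases, ]
  d <- d[order(d$Country.Region, d$Date), ]
  d$CFR <- d$Deaths / d$Confirmed
  rows <- lapply(split(d, d$Country.Region, drop = TRUE), function(x) {
    data.frame(Country.Region = as.character(x$Country.Region[1]),
               max.CFR  = max(x$CFR),
               min.CFR  = min(x$CFR),
               last.CFR = x$CFR[nrow(x)])  # most recent day
  })
  out <- do.call(rbind, rows)
  out <- out[order(-out$last.CFR), ]  # highest current CFR first
  rownames(out) <- NULL
  out
}

# Toy data: country A crosses 1000 cases on its second day, B starts above
jhu <- data.frame(
  Country.Region = rep(c("A", "B"), each = 3),
  Date      = rep(as.Date("2020-03-20") + 0:2, 2),
  Confirmed = c(900, 1000, 2000, 1500, 3000, 6000),
  Deaths    = c(5, 10, 40, 3, 9, 12)
)
s <- cfr_summary(jhu)
s
```

In this toy example, A's first day is excluded because it is still below the threshold. That is why its minimum CFR is 1% rather than the 0.56% of that day.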
The countries with the lowest CFR have peaks in the area of 0.3% to 0.4%, with some even below that; China and Germany have very low last reported CFRs. The two countries mentioned above, South Korea and Germany, have current CFRs of 1.16% and 0.38% respectively. This contrasts with the US, whose current CFR is around 5%, similar to Spain, the UK and the Netherlands (between 3.8% and 5.4%), and even more with Italy and Iran, at 9.0% and 7.6%.
But the case fatality ratio does not give the whole picture. Some patients might have been tested just one day before dying, others even post-mortem. Another quantity, the infection fatality ratio (IFR), would give a more accurate picture of what is happening. In the absence of individual-level data, however, we need to infer it from published aggregate data by computing correlations between cases and deaths. We’ll do this next.
The chart of the evolution of the CFR after 1000 cases have been reached, for the territories that have reached that figure, shows roughly two groups. In one, the CFR remains roughly constant after an initial growth; in the other, it keeps growing from the beginning. To illustrate these behaviors, let’s compare Germany and Italy:
While the CFR remains low and more or less constant in Germany, in Italy it grows because only the cases admitted to hospital are tested for coronavirus. This yields an ever-increasing CFR, which simply cannot be true.
What we need to know is, approximately, the expected time that elapses from infection to the final outcome. We’ll zero in on Germany and South Korea for this. Let’s first plot, for South Korea, the correlation between daily new cases and daily new deaths at different lags:
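For reference, a lagged correlation of this kind can be computed with base R’s `ccf()`. This is a minimal sketch and an assumption about the method, not necessarily how the original plots were produced. Daily new counts are taken as differences of the cumulative series, and `lagged_correlation` is a hypothetical helper. The toy series are synthetic, built so that deaths are 2% of the cases reported 10 days earlier.

```r
# Sketch: correlation between daily new cases and daily new deaths at
# lags from -lag.max to +lag.max, using base R's ccf().
lagged_correlation <- function(cum_cases, cum_deaths, lag.max = 15) {
  new_cases  <- diff(cum_cases)   # daily new cases from cumulative counts
  new_deaths <- diff(cum_deaths)  # daily new deaths from cumulative counts
  # ccf(x, y) at lag k estimates cor(x[t + k], y[t]); negative lags pair
  # each day's deaths with the cases reported |k| days earlier
  r <- ccf(new_cases, new_deaths, lag.max = lag.max, plot = FALSE)
  data.frame(lag = as.vector(r$lag), correlation = as.vector(r$acf))
}

# Synthetic example: deaths are 2% of the new cases reported 10 days earlier
set.seed(42)
n <- 80
base <- rpois(n + 10, 100)
new_cases  <- base[11:(n + 10)]
new_deaths <- 0.02 * base[1:n]
cc <- lagged_correlation(cumsum(new_cases), cumsum(new_deaths))
cc$lag[which.max(cc$correlation)]  # lag with the strongest positive correlation
```

With this construction, the strongest positive correlation should appear at lag -10. Real series are far noisier, which is why both positive and negative peaks are discussed below.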
There is negative correlation at lags of 12 and 2 days before, as well as positive correlation on the same day and at -3 days. That is, lower-than-average cases are followed by higher-than-average deaths 12 days later.
Let’s do the same for Germany:
There is a very strong positive correlation at a lag of 10 days, as well as a very strong negative correlation at lags of 12 and 9 days. Same-day correlation is also strong, but not as high as in the case of Korea.
Let’s try several more countries, Norway, Denmark and Malaysia, also chosen for their low CFR:
The top two, for Norway and Denmark, again show the strong correlation (or anti-correlation) in the -12 to -9 days range. In Malaysia the correlation is totally different. There is a very strong same-day correlation, and deaths precede cases by 4 days, probably indicating a surge of testing when figures are published.
It would be interesting to consider Iceland here. Although it is a small country, it has also performed extensive testing of its citizens.¹
In this case the correlation is inverted: deaths lead new cases, because testing has literally lagged behind deaths. Although this is not ideal, it is better than no testing, and a negative correlation between testing and deaths is clearly observed at +5 days. The linked report indicates that those who get tested are largely self-selected and that, except for a few, they test negative. This explanation is therefore probably correct.
Taking this into account, we will try to estimate the IFR from the ratio of deaths to the cases reported 10 days earlier, again using 3-day aggregates.
To find the relation between cases and deaths, and thus the IFR, let’s compute a three-day rolling sum for both series, since the effect is spread over three days, and compute the correlations again. Nearby positive and negative correlations might cancel each other out. However, since the data for a single day is spread over three days, we expect this to yield larger correlations, which will then help us compute ratios.
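A three-day rolling sum can be computed in base R with `stats::filter()`. This is a minimal sketch under two assumptions: the window is centred, which leaves one NA at each end of the series, and `rolling_sum_3` and `rolling_ccf` are hypothetical helpers rather than the report's own code.

```r
# Sketch: centred three-day rolling sum. stats:: is spelled out because
# dplyr::filter() masks stats::filter() when dplyr is attached.
rolling_sum_3 <- function(x) {
  # y[i] = x[i - 1] + x[i] + x[i + 1]; the first and last values are NA
  as.numeric(stats::filter(x, rep(1, 3), sides = 2))
}

# Correlations on the rolling sums. The NA ends sit at the same positions
# in both series, so they are dropped before calling ccf().
rolling_ccf <- function(new_cases, new_deaths, lag.max = 20) {
  rc <- rolling_sum_3(new_cases)
  rd <- rolling_sum_3(new_deaths)
  keep <- !is.na(rc) & !is.na(rd)
  ccf(rc[keep], rd[keep], lag.max = lag.max, plot = FALSE)
}

rolling_sum_3(c(1, 2, 3, 4, 5))  # c(NA, 6, 9, 12, NA)
```

Because correlation is insensitive to scale, using sums instead of three-day averages gives the same correlations.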
Let’s compute the correlations again, using these rolling sums:
This shows a positive correlation between cases and deaths that starts at -18 days and extends past lag zero, so that deaths are positively correlated with new cases up to a lag of +5 days. The peak is 3 days before death, which probably reflects the mode of the detection-to-outcome duration.
We can now try to estimate the IFR. The median delay seems to be around 11 days, so let’s plot the rolling sum of cases against the rolling sum of deaths 11 days later:
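The pairing of rolling deaths with lagged rolling cases can be sketched as follows. The column names match the models below, but the helper functions and the centred window are assumptions. Under those assumptions, a 60-day series with an 11-day lag leaves 13 incomplete rows. That is consistent with the 13 rows dropped in the warnings below. Lags of 8 and 5 days would leave the 10 and 7 rows dropped in the German models.

```r
# Hypothetical helpers, not the report's own code
rolling_sum_3 <- function(x) as.numeric(stats::filter(x, rep(1, 3), sides = 2))

# Value at position t becomes x[t - k]; the first k positions are NA
shift_back <- function(x, k) if (k == 0) x else c(rep(NA, k), head(x, -k))

# Pair each 3-day death sum with the 3-day case sum `lag` days earlier
make_lagged_data <- function(new_cases, new_deaths, lag) {
  data.frame(
    Rolling.Sum.New.Deaths   = rolling_sum_3(new_deaths),
    Lagged.Rolling.New.Cases = shift_back(rolling_sum_3(new_cases), lag)
  )
}

# 60 synthetic days with an 11-day lag: 11 rows lost to the lag, plus the
# NA ends of the rolling sums (one shifted to row 12, one at row 60)
set.seed(1)
lagged <- make_lagged_data(rpois(60, 100), rpois(60, 2), lag = 11)
sum(!complete.cases(lagged))
```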
## Warning: Removed 13 rows containing non-finite values (stat_smooth).
## Warning: Removed 13 rows containing missing values (geom_point).
Let’s create a linear model for this
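In equation form, the model fitted below can be written as follows. Here c_t and d_t are daily new cases and deaths, the three-day windows are assumed to be centred, and k is the lag in days: 11 for South Korea, and 8, 11 and 5 in the German models. The slope β₁ is the quantity read as the IFR estimate.

```latex
\begin{aligned}
C_t &= c_{t-1} + c_t + c_{t+1}, \qquad D_t = d_{t-1} + d_t + d_{t+1},\\
D_t &= \beta_0 + \beta_1\, C_{t-k} + \varepsilon_t,\\
\widehat{\mathrm{IFR}} &\approx \hat\beta_1
  \qquad (\text{South Korea, } k = 11:\ \hat\beta_1 = 0.0051702 \approx 0.517\%).
\end{aligned}
```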
# Regress 3-day death sums on 3-day case sums lagged 11 days (South Korea)
KO.lm <- lm(Rolling.Sum.New.Deaths ~ Lagged.Rolling.New.Cases, data = KO.data)
summary(KO.lm)
##
## Call:
## lm(formula = Rolling.Sum.New.Deaths ~ Lagged.Rolling.New.Cases,
## data = KO.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.399 -3.663 -1.927 2.344 12.269
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6529774 0.8254347 4.426 6.06e-05 ***
## Lagged.Rolling.New.Cases 0.0051702 0.0009849 5.249 4.00e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.664 on 45 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.3798, Adjusted R-squared: 0.366
## F-statistic: 27.55 on 1 and 45 DF, p-value: 3.996e-06
Taking the slope as the IFR estimate, the IFR would be 0.517% in this case, with a p-value of 4e-6.
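Reading the slope off a fitted model can be automated with a small helper. This is a sketch: `ifr_from_fit` is a hypothetical name, and the toy data are synthetic, with a true ratio of 0.5%. Applied to `KO.lm`, it would simply return 100 × 0.0051702 ≈ 0.517 together with the slope’s p-value.

```r
# Sketch: read the slope of the lagged model as an IFR estimate (in %),
# together with its p-value. ifr_from_fit is a hypothetical helper.
ifr_from_fit <- function(fit, term = "Lagged.Rolling.New.Cases") {
  co <- coef(summary(fit))  # matrix: Estimate, Std. Error, t value, Pr(>|t|)
  c(ifr.percent = 100 * co[term, "Estimate"],
    p.value     = co[term, "Pr(>|t|)"])
}

# Synthetic check: deaths are 0.5% of the lagged cases plus noise
set.seed(7)
toy <- data.frame(Lagged.Rolling.New.Cases = runif(50, 100, 3000))
toy$Rolling.Sum.New.Deaths <- 0.005 * toy$Lagged.Rolling.New.Cases + rnorm(50, sd = 1)
r <- ifr_from_fit(lm(Rolling.Sum.New.Deaths ~ Lagged.Rolling.New.Cases, data = toy))
r
```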
Let’s follow the same procedure for Germany, computing 3-day aggregates and correlations:
In Germany, the correlation seems to start later than in South Korea, so we’ll adjust the lag accordingly, using 8 days instead of the 11 used there:
## Warning: Removed 10 rows containing non-finite values (stat_smooth).
## Warning: Removed 10 rows containing missing values (geom_point).
Germany has far more cases, and the fit seems somewhat better, but let’s fit a linear model as we did for South Korea.
# Same model for Germany, with the case sums lagged 8 days
DE.lm <- lm(Rolling.Sum.New.Deaths ~ Lagged.Rolling.New.Cases, data = DE.data)
summary(DE.lm)
##
## Call:
## lm(formula = Rolling.Sum.New.Deaths ~ Lagged.Rolling.New.Cases,
## data = DE.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6492 0.0129 0.2245 0.2790 18.6773
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.279019 0.500006 -0.558 0.579
## Lagged.Rolling.New.Cases 0.027275 0.001238 22.036 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.271 on 48 degrees of freedom
## (10 observations deleted due to missingness)
## Multiple R-squared: 0.91, Adjusted R-squared: 0.9082
## F-statistic: 485.6 on 1 and 48 DF, p-value: < 2.2e-16
The slope, at 2.73%, is much higher than in South Korea. This might be due to some under-reporting of cases, a hypothesis reinforced by the 3-day difference in lag between the two countries. The intercept, however, is not significant. If we instead use the same delay as for South Korea (11 days), and also a shorter delay of 5 days, we get:
## Warning: Removed 13 rows containing non-finite values (stat_smooth).
## Warning: Removed 13 rows containing missing values (geom_point).
##
## Call:
## lm(formula = Rolling.Sum.New.Deaths ~ Lagged.Rolling.New.Cases,
## data = DE.data.11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.8237 0.1616 0.3360 0.3942 18.1422
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.394153 0.747002 -0.528 0.6
## Lagged.Rolling.New.Cases 0.058134 0.003959 14.683 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.661 on 45 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.8273, Adjusted R-squared: 0.8235
## F-statistic: 215.6 on 1 and 45 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Rolling.Sum.New.Deaths ~ Lagged.Rolling.New.Cases,
## data = DE.data.5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7180 0.0249 0.3671 0.3915 12.4272
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3915368 0.3854311 -1.016 0.315
## Lagged.Rolling.New.Cases 0.0122225 0.0004326 28.256 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.603 on 51 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.94, Adjusted R-squared: 0.9388
## F-statistic: 798.4 on 1 and 51 DF, p-value: < 2.2e-16
The linear models find a significant slope for lags of both 11 and 5 days. However, the intercept does not have a significant p-value, which means that we can’t really use these models for projections. The fact that cases can’t reliably predict deaths probably indicates that testing is not as extensive as it initially appeared. Also, the CFR is lower than the IFR, which probably indicates either under-reporting of cases or under-reporting of deaths. We would expect an IFR similar to Korea’s. However, the slope for the 5-day delay (1.22% versus 0.517%) indicates that it is more than twice as high.
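Rather than trying lags one at a time, the same model can be refitted for a whole range of lags, and the slopes, intercept p-values and R² compared side by side. This is a minimal sketch under the same assumptions as before: centred three-day windows and daily new counts as input. `scan_lags` is a hypothetical helper, and the toy data are synthetic, with deaths equal to 1% of the cases 10 days earlier plus a little noise.

```r
# Sketch: refit the lagged model for several lags and compare them.
# Helpers are hypothetical; windows are assumed centred, as before.
rolling_sum_3 <- function(x) as.numeric(stats::filter(x, rep(1, 3), sides = 2))
shift_back <- function(x, k) if (k == 0) x else c(rep(NA, k), head(x, -k))

scan_lags <- function(new_cases, new_deaths, lags = 0:15) {
  rows <- lapply(lags, function(k) {
    d <- data.frame(
      Rolling.Sum.New.Deaths   = rolling_sum_3(new_deaths),
      Lagged.Rolling.New.Cases = shift_back(rolling_sum_3(new_cases), k)
    )
    # lm() drops the incomplete rows (na.action is na.omit by default)
    s  <- summary(lm(Rolling.Sum.New.Deaths ~ Lagged.Rolling.New.Cases, data = d))
    co <- coef(s)
    data.frame(lag         = k,
               slope       = co["Lagged.Rolling.New.Cases", "Estimate"],
               intercept.p = co["(Intercept)", "Pr(>|t|)"],
               r.squared   = s$r.squared)
  })
  do.call(rbind, rows)
}

# Synthetic example: deaths are 1% of the cases 10 days earlier, plus noise
set.seed(3)
base <- rpois(70, 200)
new_cases  <- base[11:70]
new_deaths <- 0.01 * base[1:60] + rnorm(60, sd = 0.02)
res <- scan_lags(new_cases, new_deaths)
res[which.max(res$r.squared), ]
```

Adjacent lags share two of their three days, so R² decays gradually around the best lag rather than dropping to zero. The best lag should therefore be read from the peak of the scan.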
Different countries have different testing and reporting policies in the COVID-19 pandemic. Testing extensively and without a self-selection bias seems to be the best option; reporting all deaths of people who have been infected with the virus seems to be the best option too. South Korea seems to have followed these best practices. In this case, the estimated time from infection to death is around 11 days and, with this estimate, the IFR is 0.517%, against a CFR of 1.16%.
In the case of Germany, the lag from reporting to death is around 5 days; the estimated IFR for this delay is 1.22%, against a CFR of 0.378%. Unlike in Korea, CFR < IFR. This probably indicates either insufficient testing, under-reporting of deaths, or some other, unknown cause.
At any rate, using correlations from raw time series, together with rolling sums, seems to be a valid methodology for estimating the reliability of reporting in different countries and territories. It allows us to discover inconsistencies in the time series, as well as possible errors in pandemic-tackling policies. As future work, we will try to use these IFR estimates to find out the real infection rates in countries and territories where testing has not been so extensive.
This file has been generated from data published by JHU CSSE. It’s data-driven and it can be re-generated from the script in this repository.
Nishiura, Hiroshi, Tetsuro Kobayashi, Yichi Yang, Katsuma Hayashi, Takeshi Miyama, Ryo Kinoshita, Natalie M Linton, et al. 2020. “The Rate of Underascertainment of Novel Coronavirus (2019-nCoV) Infection: Estimation Using Japanese Passengers Data on Evacuation Flights.” Journal of Clinical Medicine. Multidisciplinary Digital Publishing Institute.
Shim, Eunha, Amna Tariq, Wongyeong Choi, Yiseul Lee, and Gerardo Chowell. 2020. “Transmission Potential and Severity of Covid-19 in South Korea.” International Journal of Infectious Diseases. doi:10.1016/j.ijid.2020.03.031.
¹ As published, for instance, in this Iceland Review article.