Disclaimer: I am data scientist but not an expert in the field of epidemics.
This is a analysis of the corona Covid-19 data, provided from: https://github.com/CSSEGISandData/COVID-19. It tries to answer the question if the actions which have been taken by the european goverment are effective.
Loading of the data (code for loading taken from https://towardsdatascience.com/create-a-coronavirus-app-using-r-shiny-and-plotly-6a6abf66091d).
library(lubridate)
source('load_data.R') #See also: https://gist.github.com/oduerr/9b69aac21463560679515ab51046b2c7
allData =
loadData("time_series_covid19_confirmed_global.csv", "CumConfirmed") %>%
inner_join(loadData("time_series_covid19_deaths_global.csv", "CumDeaths")) #%>%
#inner_join(
# loadData("time_series_19-covid-Recovered.csv","CumRecovered"))
last_date = max(allData$date)
print(paste0('Data from ', last_date))
## [1] "Data from 2020-04-26"
The lastest number of cases are (we included the Hubei Province in China for Reference)
germany = allData %>% filter(`Country/Region` == c('Germany'))
italy = allData %>% filter(`Country/Region` == 'Italy')
france = allData %>% filter(`Country/Region` == 'France') %>% filter(`Province/State` == '<all>') #includes non-european teretories?
china_hubei = allData %>% filter(`Country/Region` == 'China') %>% filter(`Province/State` == 'Hubei')
suisse = allData %>% filter(`Country/Region` == 'Switzerland')
austria = allData %>% filter(`Country/Region` == 'Austria')
spain = allData %>% filter(`Country/Region` == 'Spain')
europe = rbind(germany, italy, france, suisse, austria, spain)
europe_with_china = rbind(europe, china_hubei)
last_day = max(europe$date)
library(knitr) #For nicer plotting of the table
kable(filter(europe_with_china, date==last_day))
| Province/State | Country/Region | date | CumConfirmed | CumDeaths |
|---|---|---|---|---|
| Germany | 2020-04-26 | 157770 | 5976 | |
| Italy | 2020-04-26 | 197675 | 26644 | |
| France | 2020-04-26 | 160847 | 22856 | |
| Switzerland | 2020-04-26 | 29061 | 1610 | |
| Austria | 2020-04-26 | 15225 | 542 | |
| Spain | 2020-04-26 | 226629 | 23190 | |
| Hubei | China | 2020-04-26 | 68128 | 4512 |
Here we investigate the number of cases over time. We include some important dates in the plots.
Italy nationwide lockdown on 2020-03-09 https://en.wikipedia.org/wiki/2020_Italy_coronavirus_lockdown the first lockdown of lombardy and other regions were a bit earlier.
For the rest of europe the most meassures took place around Monday 16th of March. On that day Germany closed it schools, and other countries like France, Spain, and Austria, even imposed a curfew.
From China, we see that it takes about 14 days until the effects of a lockdown will become visible, this is shown by the corresponding second vertical line.
italy_nationwide = as.Date('2020-03-09')
italy_nationwide_effect = as.Date('2020-03-23')
other_lockdown = as.Date('2020-03-16')
other_lockdown_effect = as.Date('2020-03-30')
cols <- c("Austria" = "darkred", "Italy" = 'darkgreen', "Switzerland" = "red","Germany" = "black", "France"='darkblue', "China"='red', "Spain"='yellow3')
title = paste0('Corona Cases in Europe as of : ', last_day)
library(ggplot2)
library(ggpubr)
library(scales)
d_plot = ggplot(europe) +
geom_line(aes(x = date, y=CumConfirmed, color=`Country/Region`)) +
#geom_line(aes(x = date, y=CumDeaths, color=`Country/Region`)) +
#scale_y_continuous(name = 'Total Number of Confirmed Cases') +
scale_y_continuous(trans = 'log10', name = 'Total Number of Confirmed Cases') +
scale_x_date(labels=date_format("%d/%m"),
limits = c(as.Date('2020-02-20'), last_date)) +
ggtitle(title, subtitle = 'Focus on dates around the actions taken') +
scale_color_manual(values=cols) +
geom_vline(xintercept = italy_nationwide, col=cols['Italy']) +
geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy']) +
geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed') +
geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')
d_plot
d_plot +
geom_line(data = china_hubei, aes(x = date, y=CumConfirmed, color=`Country/Region`)) +
scale_x_date(labels=date_format("%d/%m")) + ggtitle(title, subtitle = 'Complete Time Span with China')
Let’s predict the numbers before govermental actions have been taken into the future. We assume an exponetial law of the from \[ y = a \cdot 2^{t/t_0} \] That formulae states that the number of cases \(y\) double after a period \(t_0\) of days. We try to estimate \(a\) and \(t_0\). We do a trick and take the log, this yields:
\[ \lg(y) = \frac{1}{t_0} \cdot t + \lg(a) \] We fit that model to all states.
fit = function(state, start_date, plot=TRUE, days_back = 7) {
f = state %>%
filter(date < (start_date + days(0))) %>%
filter(date > (start_date - days(days_back)))
t = as.numeric(f$date) - as.numeric(start_date) #-6 -5 -4 ... 4 5 6
y = log2(f$CumConfirmed)
lm = lm(y ~ t)
a = 2^(lm$coefficients[1])
t0 = 1/lm$coefficients[2]
name = state$`Country/Region`[1]
if (plot) {
plot(t, y, main=paste0('Before gov. action ', name, ' t0=', round(t0,2)), cex.main=0.8, xlab='time (0 is lockdown)', ylab='log number of cases')
lines(t, lm$coefficients[2]*t+lm$coefficients[1])
}
t = seq(-10,30) #project 20 days after the lockdown
return(data.frame(date = start_date + days(t), pred = a*2^(t/t0), state=name, t0))
}
europ_fit = fit(italy, italy_nationwide)
europ_fit = rbind(europ_fit,fit(germany, other_lockdown))
europ_fit = rbind(europ_fit,fit(france, other_lockdown))
europ_fit = rbind(europ_fit,fit(suisse, other_lockdown))
europ_fit = rbind(europ_fit,fit(austria, other_lockdown))
europ_fit = rbind(europ_fit,fit(spain, other_lockdown))
We see that the used model describes the data before the actions have been taken quite well. However, the slope of the individual countries is quite different. Before the lockdown \(t_0\), the number of days needed for doubling (doubleing time):
kable(europ_fit %>% select(c('state', 't0')) %>% group_by(state) %>% summarise_each(mean))
| state | t0 |
|---|---|
| Italy | 3.225488 |
| Germany | 2.401180 |
| France | 3.409355 |
| Switzerland | 2.364518 |
| Austria | 2.163248 |
| Spain | 2.098819 |
What would happen if no actions would have been taken?
d_fit = d_plot + geom_line(data=europ_fit, aes(x=date, y=pred, col=state), linetype='dashed') +
#scale_x_date(labels=date_format("%d/%m"), limits = as.Date(c('2020-03-02','2020-03-20')))
ggtitle(title, subtitle = 'Data with fitted values (dashed), projections')
d_fit
If no meassures would have been taken, we would have by the end of the month:
europ_fit %>% filter(date == as.Date("2020-03-31")) %>% select(date, state, pred)
## date state pred
## 1 2020-03-31 Italy 1022206.9
## 2 2020-03-31 Germany 599591.2
## 3 2020-03-31 France 127612.0
## 4 2020-03-31 Switzerland 214067.4
## 5 2020-03-31 Austria 148511.2
## 6 2020-03-31 Spain 1634716.3
The cases are in the millions. While there is much spread in the data, I believe this numbers are not totally off. Since the states have 10s of millions of inhabitants the spread of the virus would be still in an exponential phase (not all infected or recovered / deceased).
First, we compare the projections to the real data.
d_fit_2 = d_fit +
ggtitle(title, subtitle = 'Data with fitted values (dashed), focus on fitted regions') +
scale_x_date(labels=date_format("%d/%m"), limits = as.Date(c('2020-03-08',last_date))) +
coord_cartesian(ylim=c(500,150000), xlim = as.Date(c('2020-03-08',last_date)))
d_fit_2
d_fit_2 + scale_y_continuous()
We do the same fit as above, but now for many dates. We do a calculate a fit over 7 days in the past.
i = 0
add_state = function(double_times, date, state){
res = fit(state = state, date, plot=FALSE)
double_times = double_times %>% add_row(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
return (double_times)
}
for (i in 0:50) {
date = last_date - days(i)
res = fit(state = italy, date, plot=FALSE)
if (i == 0) {
double_times = tibble(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
} else {
double_times = double_times %>% add_row(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
}
double_times = add_state(double_times, date, germany)
double_times = add_state(double_times, date, france)
double_times = add_state(double_times, date, italy)
double_times = add_state(double_times, date, austria)
double_times = add_state(double_times, date, spain)
double_times = add_state(double_times, date, suisse)
}
d_plot = ggplot(double_times) +
geom_line(aes(x = date, y=t0, color=`Country/Region`)) +
scale_y_continuous(trans = 'log10', name = 'Doubeling Time of Confirmed Cases') +
scale_x_date(labels=date_format("%d/%m"),
limits = as.Date(c('2020-03-09',last_date))) +
ggtitle('Doubeling Times ', subtitle = 'Focus on dates around the actions taken') +
scale_color_manual(values=cols) +
geom_vline(xintercept = italy_nationwide, col=cols['Italy']) +
geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy']) +
geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed') +
geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')
d_plot
d = double_times
for (i in 0:52) {
date = last_date - days(i)
double_times = add_state(double_times, date, china_hubei)
}
d_plot = ggplot(double_times) +
geom_line(aes(x = date, y=t0, color=`Country/Region`)) +
scale_y_continuous(trans = 'log10', name = 'Doubeling Time of Confirmed Cases') +
scale_x_date(labels=date_format("%d/%m")) +
#limits = as.Date(c('2020-03-09','2020-03-23'))) +
ggtitle('Doubeling Times ', subtitle = 'Long time with Hubei region in China') +
scale_color_manual(values=cols) +
geom_vline(xintercept = italy_nationwide, col=cols['Italy']) +
geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy']) +
geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed') +
geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')
d_plot
As Lucas Kook pointed out, the above analysis is a bit naive. First, the data points are corralated since, we take the sum over the timepoints. Another issue is that we so far neglect the fact that we have count data, which has a distribution. If, we prefect e.g. 100 cases we just predict the expected number of cases, 101 would be also ok. A way to model this is using
It seems that the actions taken start to be effective and the grows slows down. However, it’s still quite early to tell how strong the effect is.