Disclaimer: I am data scientist but not an expert in the field of epidemics.

This is a analysis of the corona Covid-19 data, provided from: https://github.com/CSSEGISandData/COVID-19. It tries to answer the question if the actions which have been taken by the european goverment are effective.

Data

Loading of the data (code for loading taken from https://towardsdatascience.com/create-a-coronavirus-app-using-r-shiny-and-plotly-6a6abf66091d).

library(lubridate)
source('load_data.R') #See also: https://gist.github.com/oduerr/9b69aac21463560679515ab51046b2c7
allData = 
  loadData("time_series_covid19_confirmed_global.csv", "CumConfirmed") %>%
  inner_join(loadData("time_series_covid19_deaths_global.csv", "CumDeaths")) #%>%
  #inner_join(
  #  loadData("time_series_19-covid-Recovered.csv","CumRecovered"))

last_date = max(allData$date)
print(paste0('Data from ', last_date))
## [1] "Data from 2020-04-26"

Current Data

The lastest number of cases are (we included the Hubei Province in China for Reference)

germany = allData %>% filter(`Country/Region` == c('Germany'))
italy = allData %>% filter(`Country/Region` == 'Italy')
france = allData %>% filter(`Country/Region` == 'France') %>% filter(`Province/State` == '<all>') #includes non-european teretories?
china_hubei = allData %>% filter(`Country/Region` == 'China') %>% filter(`Province/State` == 'Hubei')
suisse = allData %>% filter(`Country/Region` == 'Switzerland')
austria = allData %>% filter(`Country/Region` == 'Austria')
spain = allData %>% filter(`Country/Region` == 'Spain')
europe = rbind(germany, italy, france, suisse, austria, spain)
europe_with_china = rbind(europe, china_hubei)
last_day = max(europe$date)

library(knitr) #For nicer plotting of the table
kable(filter(europe_with_china, date==last_day))
Province/State Country/Region date CumConfirmed CumDeaths
Germany 2020-04-26 157770 5976
Italy 2020-04-26 197675 26644
France 2020-04-26 160847 22856
Switzerland 2020-04-26 29061 1610
Austria 2020-04-26 15225 542
Spain 2020-04-26 226629 23190
Hubei China 2020-04-26 68128 4512

Europe over time

Here we investigate the number of cases over time. We include some important dates in the plots.

From China, we see that it takes about 14 days until the effects of a lockdown will become visible, this is shown by the corresponding second vertical line.

  italy_nationwide = as.Date('2020-03-09')
  italy_nationwide_effect = as.Date('2020-03-23')
  
  other_lockdown = as.Date('2020-03-16')
  other_lockdown_effect = as.Date('2020-03-30')
cols <- c("Austria" = "darkred", "Italy" = 'darkgreen', "Switzerland" = "red","Germany" = "black", "France"='darkblue', "China"='red', "Spain"='yellow3')
title = paste0('Corona Cases in Europe as of : ', last_day)

library(ggplot2)
library(ggpubr)
library(scales)
d_plot = ggplot(europe) + 
  geom_line(aes(x = date, y=CumConfirmed, color=`Country/Region`)) + 
  #geom_line(aes(x = date, y=CumDeaths, color=`Country/Region`)) + 
  #scale_y_continuous(name = 'Total Number of Confirmed Cases') + 
  scale_y_continuous(trans = 'log10', name = 'Total Number of Confirmed Cases') + 
  scale_x_date(labels=date_format("%d/%m"),
               limits = c(as.Date('2020-02-20'), last_date)) +
  ggtitle(title, subtitle = 'Focus on dates around the actions taken') +
  scale_color_manual(values=cols) +
  geom_vline(xintercept = italy_nationwide, col=cols['Italy'])  +
  geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy'])  +
  geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed')  +
  geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')  
d_plot

d_plot + 
  geom_line(data = china_hubei, aes(x = date, y=CumConfirmed, color=`Country/Region`)) + 
  scale_x_date(labels=date_format("%d/%m")) + ggtitle(title, subtitle = 'Complete Time Span with China') 

What would happen isf no meassures have been taken

Let’s predict the numbers before govermental actions have been taken into the future. We assume an exponetial law of the from \[ y = a \cdot 2^{t/t_0} \] That formulae states that the number of cases \(y\) double after a period \(t_0\) of days. We try to estimate \(a\) and \(t_0\). We do a trick and take the log, this yields:

\[ \lg(y) = \frac{1}{t_0} \cdot t + \lg(a) \] We fit that model to all states.

  fit = function(state, start_date, plot=TRUE, days_back = 7) {
    f = state %>% 
    filter(date < (start_date + days(0))) %>% 
    filter(date > (start_date - days(days_back)))
    t = as.numeric(f$date) - as.numeric(start_date) #-6 -5 -4 ... 4  5  6
    y = log2(f$CumConfirmed)
    lm = lm(y ~ t)
    a = 2^(lm$coefficients[1])
    t0 = 1/lm$coefficients[2]
    
    name = state$`Country/Region`[1]
    if (plot) {
      plot(t, y, main=paste0('Before gov. action ', name, ' t0=', round(t0,2)), cex.main=0.8, xlab='time (0 is lockdown)', ylab='log number of cases')
    lines(t, lm$coefficients[2]*t+lm$coefficients[1])
    }
    
    t = seq(-10,30) #project 20 days after the lockdown
    return(data.frame(date = start_date + days(t), pred = a*2^(t/t0), state=name, t0))
  }
  
  europ_fit = fit(italy, italy_nationwide)

  europ_fit = rbind(europ_fit,fit(germany, other_lockdown))

  europ_fit = rbind(europ_fit,fit(france, other_lockdown))

  europ_fit = rbind(europ_fit,fit(suisse, other_lockdown))

  europ_fit = rbind(europ_fit,fit(austria, other_lockdown))

  europ_fit = rbind(europ_fit,fit(spain, other_lockdown))

We see that the used model describes the data before the actions have been taken quite well. However, the slope of the individual countries is quite different. Before the lockdown \(t_0\), the number of days needed for doubling (doubleing time):

  kable(europ_fit %>% select(c('state', 't0')) %>% group_by(state) %>% summarise_each(mean))
state t0
Italy 3.225488
Germany 2.401180
France 3.409355
Switzerland 2.364518
Austria 2.163248
Spain 2.098819

Forecasts (without action taken)

What would happen if no actions would have been taken?

  d_fit = d_plot + geom_line(data=europ_fit, aes(x=date, y=pred, col=state), linetype='dashed') +
    #scale_x_date(labels=date_format("%d/%m"), limits = as.Date(c('2020-03-02','2020-03-20')))
    ggtitle(title, subtitle = 'Data with fitted values (dashed), projections') 
  d_fit

If no meassures would have been taken, we would have by the end of the month:

europ_fit %>% filter(date == as.Date("2020-03-31")) %>% select(date, state, pred)
##         date       state      pred
## 1 2020-03-31       Italy 1022206.9
## 2 2020-03-31     Germany  599591.2
## 3 2020-03-31      France  127612.0
## 4 2020-03-31 Switzerland  214067.4
## 5 2020-03-31     Austria  148511.2
## 6 2020-03-31       Spain 1634716.3

The cases are in the millions. While there is much spread in the data, I believe this numbers are not totally off. Since the states have 10s of millions of inhabitants the spread of the virus would be still in an exponential phase (not all infected or recovered / deceased).

Are the actions effective?

Comparing projections to real data

First, we compare the projections to the real data.

  d_fit_2 = d_fit + 
    ggtitle(title, subtitle = 'Data with fitted values (dashed), focus on fitted regions') +
    scale_x_date(labels=date_format("%d/%m"), limits = as.Date(c('2020-03-08',last_date))) +
  coord_cartesian(ylim=c(500,150000), xlim = as.Date(c('2020-03-08',last_date)))
  d_fit_2

  d_fit_2 + scale_y_continuous()

Calculation of doubeling times

We do the same fit as above, but now for many dates. We do a calculate a fit over 7 days in the past.

i = 0

add_state = function(double_times, date, state){
  res = fit(state = state, date, plot=FALSE)
  double_times = double_times %>%  add_row(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
  return (double_times)
}

for (i in 0:50) {
  date = last_date - days(i)
  res = fit(state = italy, date, plot=FALSE)
  if (i == 0) {
    double_times = tibble(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
  } else {
    double_times = double_times %>%  add_row(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
  }
  double_times = add_state(double_times, date, germany)
  double_times = add_state(double_times, date, france)
  double_times = add_state(double_times, date, italy)
  double_times = add_state(double_times, date, austria)
  double_times = add_state(double_times, date, spain)
  double_times = add_state(double_times, date, suisse)
}
d_plot = ggplot(double_times) + 
  geom_line(aes(x = date, y=t0, color=`Country/Region`)) + 
  scale_y_continuous(trans = 'log10', name = 'Doubeling Time of Confirmed Cases') + 
  scale_x_date(labels=date_format("%d/%m"),
               limits = as.Date(c('2020-03-09',last_date))) +
  ggtitle('Doubeling Times ', subtitle = 'Focus on dates around the actions taken') +
  scale_color_manual(values=cols) +
  geom_vline(xintercept = italy_nationwide, col=cols['Italy'])  +
  geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy'])  +
  geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed')  + 
  geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')
d_plot

d = double_times
for (i in 0:52) {
  date = last_date - days(i)
  double_times = add_state(double_times, date, china_hubei)
}
d_plot = ggplot(double_times) + 
  geom_line(aes(x = date, y=t0, color=`Country/Region`)) + 
  scale_y_continuous(trans = 'log10', name = 'Doubeling Time of Confirmed Cases') + 
  scale_x_date(labels=date_format("%d/%m")) + 
               #limits = as.Date(c('2020-03-09','2020-03-23'))) +
  ggtitle('Doubeling Times ', subtitle = 'Long time with Hubei region in China') +
  scale_color_manual(values=cols) +
  geom_vline(xintercept = italy_nationwide, col=cols['Italy'])  +
  geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy'])  +
  geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed')  + 
  geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')
d_plot

Advanced Analysis

As Lucas Kook pointed out, the above analysis is a bit naive. First, the data points are corralated since, we take the sum over the timepoints. Another issue is that we so far neglect the fact that we have count data, which has a distribution. If, we prefect e.g. 100 cases we just predict the expected number of cases, 101 would be also ok. A way to model this is using

First conclusions

It seems that the actions taken start to be effective and the grows slows down. However, it’s still quite early to tell how strong the effect is.