Disclaimer: I am data scientist but not an expert in the field of epidemics.

This is a analysis of the corona Covid-19 data, provided from: https://github.com/CSSEGISandData/COVID-19. It tries to answer the question if the actions which have been taken by the european goverment are effective.

Other links
- https://www.sueddeutsche.de/wissen/coronavirus-zahlen-daten-1.4844448
Earlier Versions
- 3/19: https://rpubs.com/Oliver4242/586828

Data

Loading of the data (code for loading taken from https://towardsdatascience.com/create-a-coronavirus-app-using-r-shiny-and-plotly-6a6abf66091d).

library(lubridate)
source('load_data.R') #See also: https://gist.github.com/oduerr/9b69aac21463560679515ab51046b2c7
allData = 
  loadData("time_series_covid19_confirmed_global.csv", "CumConfirmed") %>%
  inner_join(loadData("time_series_covid19_deaths_global.csv", "CumDeaths")) #%>%
  #inner_join(
  #  loadData("time_series_19-covid-Recovered.csv","CumRecovered"))

last_date = max(allData$date)
print(paste0('Data from ', last_date))

## [1] "Data from 2020-04-26"

Current Data

The lastest number of cases are (we included the Hubei Province in China for Reference)

germany = allData %>% filter(`Country/Region` == c('Germany'))
italy = allData %>% filter(`Country/Region` == 'Italy')
france = allData %>% filter(`Country/Region` == 'France') %>% filter(`Province/State` == '<all>') #includes non-european teretories?
china_hubei = allData %>% filter(`Country/Region` == 'China') %>% filter(`Province/State` == 'Hubei')
suisse = allData %>% filter(`Country/Region` == 'Switzerland')
austria = allData %>% filter(`Country/Region` == 'Austria')
spain = allData %>% filter(`Country/Region` == 'Spain')
europe = rbind(germany, italy, france, suisse, austria, spain)
europe_with_china = rbind(europe, china_hubei)
last_day = max(europe$date)

library(knitr) #For nicer plotting of the table
kable(filter(europe_with_china, date==last_day))

Province/State	Country/Region	date	CumConfirmed	CumDeaths
	Germany	2020-04-26	157770	5976
	Italy	2020-04-26	197675	26644
	France	2020-04-26	160847	22856
	Switzerland	2020-04-26	29061	1610
	Austria	2020-04-26	15225	542
	Spain	2020-04-26	226629	23190
Hubei	China	2020-04-26	68128	4512

Europe over time

Here we investigate the number of cases over time. We include some important dates in the plots.

Italy nationwide lockdown on 2020-03-09 https://en.wikipedia.org/wiki/2020_Italy_coronavirus_lockdown the first lockdown of lombardy and other regions were a bit earlier.
For the rest of europe the most meassures took place around Monday 16th of March. On that day Germany closed it schools, and other countries like France, Spain, and Austria, even imposed a curfew.

From China, we see that it takes about 14 days until the effects of a lockdown will become visible, this is shown by the corresponding second vertical line.

  italy_nationwide = as.Date('2020-03-09')
  italy_nationwide_effect = as.Date('2020-03-23')
  
  other_lockdown = as.Date('2020-03-16')
  other_lockdown_effect = as.Date('2020-03-30')

cols <- c("Austria" = "darkred", "Italy" = 'darkgreen', "Switzerland" = "red","Germany" = "black", "France"='darkblue', "China"='red', "Spain"='yellow3')
title = paste0('Corona Cases in Europe as of : ', last_day)

library(ggplot2)
library(ggpubr)
library(scales)
d_plot = ggplot(europe) + 
  geom_line(aes(x = date, y=CumConfirmed, color=`Country/Region`)) + 
  #geom_line(aes(x = date, y=CumDeaths, color=`Country/Region`)) + 
  #scale_y_continuous(name = 'Total Number of Confirmed Cases') + 
  scale_y_continuous(trans = 'log10', name = 'Total Number of Confirmed Cases') + 
  scale_x_date(labels=date_format("%d/%m"),
               limits = c(as.Date('2020-02-20'), last_date)) +
  ggtitle(title, subtitle = 'Focus on dates around the actions taken') +
  scale_color_manual(values=cols) +
  geom_vline(xintercept = italy_nationwide, col=cols['Italy'])  +
  geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy'])  +
  geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed')  +
  geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')  
d_plot

d_plot + 
  geom_line(data = china_hubei, aes(x = date, y=CumConfirmed, color=`Country/Region`)) + 
  scale_x_date(labels=date_format("%d/%m")) + ggtitle(title, subtitle = 'Complete Time Span with China')

What would happen isf no meassures have been taken

Let’s predict the numbers before govermental actions have been taken into the future. We assume an exponetial law of the from \[ y = a \cdot 2^{t/t_0} \] That formulae states that the number of cases \(y\) double after a period \(t_0\) of days. We try to estimate \(a\) and \(t_0\). We do a trick and take the log, this yields:

\[ \lg(y) = \frac{1}{t_0} \cdot t + \lg(a) \] We fit that model to all states.

  fit = function(state, start_date, plot=TRUE, days_back = 7) {
    f = state %>% 
    filter(date < (start_date + days(0))) %>% 
    filter(date > (start_date - days(days_back)))
    t = as.numeric(f$date) - as.numeric(start_date) #-6 -5 -4 ... 4  5  6
    y = log2(f$CumConfirmed)
    lm = lm(y ~ t)
    a = 2^(lm$coefficients[1])
    t0 = 1/lm$coefficients[2]
    
    name = state$`Country/Region`[1]
    if (plot) {
      plot(t, y, main=paste0('Before gov. action ', name, ' t0=', round(t0,2)), cex.main=0.8, xlab='time (0 is lockdown)', ylab='log number of cases')
    lines(t, lm$coefficients[2]*t+lm$coefficients[1])
    }
    
    t = seq(-10,30) #project 20 days after the lockdown
    return(data.frame(date = start_date + days(t), pred = a*2^(t/t0), state=name, t0))
  }
  
  europ_fit = fit(italy, italy_nationwide)

  europ_fit = rbind(europ_fit,fit(germany, other_lockdown))

  europ_fit = rbind(europ_fit,fit(france, other_lockdown))

  europ_fit = rbind(europ_fit,fit(suisse, other_lockdown))

  europ_fit = rbind(europ_fit,fit(austria, other_lockdown))

  europ_fit = rbind(europ_fit,fit(spain, other_lockdown))

We see that the used model describes the data before the actions have been taken quite well. However, the slope of the individual countries is quite different. Before the lockdown \(t_0\), the number of days needed for doubling (doubleing time):

  kable(europ_fit %>% select(c('state', 't0')) %>% group_by(state) %>% summarise_each(mean))

state	t0
Italy	3.225488
Germany	2.401180
France	3.409355
Switzerland	2.364518
Austria	2.163248
Spain	2.098819

Forecasts (without action taken)

What would happen if no actions would have been taken?

  d_fit = d_plot + geom_line(data=europ_fit, aes(x=date, y=pred, col=state), linetype='dashed') +
    #scale_x_date(labels=date_format("%d/%m"), limits = as.Date(c('2020-03-02','2020-03-20')))
    ggtitle(title, subtitle = 'Data with fitted values (dashed), projections') 
  d_fit

If no meassures would have been taken, we would have by the end of the month:

europ_fit %>% filter(date == as.Date("2020-03-31")) %>% select(date, state, pred)

##         date       state      pred
## 1 2020-03-31       Italy 1022206.9
## 2 2020-03-31     Germany  599591.2
## 3 2020-03-31      France  127612.0
## 4 2020-03-31 Switzerland  214067.4
## 5 2020-03-31     Austria  148511.2
## 6 2020-03-31       Spain 1634716.3

The cases are in the millions. While there is much spread in the data, I believe this numbers are not totally off. Since the states have 10s of millions of inhabitants the spread of the virus would be still in an exponential phase (not all infected or recovered / deceased).

Are the actions effective?

Comparing projections to real data

First, we compare the projections to the real data.

  d_fit_2 = d_fit + 
    ggtitle(title, subtitle = 'Data with fitted values (dashed), focus on fitted regions') +
    scale_x_date(labels=date_format("%d/%m"), limits = as.Date(c('2020-03-08',last_date))) +
  coord_cartesian(ylim=c(500,150000), xlim = as.Date(c('2020-03-08',last_date)))
  d_fit_2

  d_fit_2 + scale_y_continuous()

Calculation of doubeling times

We do the same fit as above, but now for many dates. We do a calculate a fit over 7 days in the past.

i = 0

add_state = function(double_times, date, state){
  res = fit(state = state, date, plot=FALSE)
  double_times = double_times %>%  add_row(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
  return (double_times)
}

for (i in 0:50) {
  date = last_date - days(i)
  res = fit(state = italy, date, plot=FALSE)
  if (i == 0) {
    double_times = tibble(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
  } else {
    double_times = double_times %>%  add_row(date=date, t0 = res$t0[1], `Country/Region` = res$state[1])
  }
  double_times = add_state(double_times, date, germany)
  double_times = add_state(double_times, date, france)
  double_times = add_state(double_times, date, italy)
  double_times = add_state(double_times, date, austria)
  double_times = add_state(double_times, date, spain)
  double_times = add_state(double_times, date, suisse)
}

d_plot = ggplot(double_times) + 
  geom_line(aes(x = date, y=t0, color=`Country/Region`)) + 
  scale_y_continuous(trans = 'log10', name = 'Doubeling Time of Confirmed Cases') + 
  scale_x_date(labels=date_format("%d/%m"),
               limits = as.Date(c('2020-03-09',last_date))) +
  ggtitle('Doubeling Times ', subtitle = 'Focus on dates around the actions taken') +
  scale_color_manual(values=cols) +
  geom_vline(xintercept = italy_nationwide, col=cols['Italy'])  +
  geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy'])  +
  geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed')  + 
  geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')
d_plot

d = double_times
for (i in 0:52) {
  date = last_date - days(i)
  double_times = add_state(double_times, date, china_hubei)
}
d_plot = ggplot(double_times) + 
  geom_line(aes(x = date, y=t0, color=`Country/Region`)) + 
  scale_y_continuous(trans = 'log10', name = 'Doubeling Time of Confirmed Cases') + 
  scale_x_date(labels=date_format("%d/%m")) + 
               #limits = as.Date(c('2020-03-09','2020-03-23'))) +
  ggtitle('Doubeling Times ', subtitle = 'Long time with Hubei region in China') +
  scale_color_manual(values=cols) +
  geom_vline(xintercept = italy_nationwide, col=cols['Italy'])  +
  geom_vline(xintercept = italy_nationwide_effect, col=cols['Italy'])  +
  geom_vline(xintercept = other_lockdown, col=cols['Germany'], linetype='dashed')  + 
  geom_vline(xintercept = other_lockdown_effect, col=cols['Germany'],linetype='dashed')
d_plot

Advanced Analysis

As Lucas Kook pointed out, the above analysis is a bit naive. First, the data points are corralated since, we take the sum over the timepoints. Another issue is that we so far neglect the fact that we have count data, which has a distribution. If, we prefect e.g. 100 cases we just predict the expected number of cases, 101 would be also ok. A way to model this is using

First conclusions

It seems that the actions taken start to be effective and the grows slows down. However, it’s still quite early to tell how strong the effect is.

Data Analysis of Corona Outbreak in Europe

Stay at home baby

Oliver Dürr

4/27/2020