In this vignette, we compare the results of the reproduction number estimation, Rt, obtained by the methods:
We will compare the different methods using the daily incidence reported cases in France, Germany, the USA and the UK. In this comparison, we will focus on the similarity of the different Rt estimations computing an optimal temporal shift between them. We will also analyze the regularity of the different estimations and the behavior of the estimations when we approach the current time. We do not intend, to conclude here, what is the best method. We want just, to contribute to a better understanding of the relation between the Rt estimations provided by these methods
We attach some required packages
library(ggplot2)
library(dplyr)
library(grid)
library(EpiEstim)
library(EpiNow2)
library(EpiInvert)Loading stored data on COVID-19 daily incidence up to 2022-05-05 for France, Germany, the USA and the UK:
data(incidence)
tail(incidence)
#> date FRA DEU USA UK
#> 828 2022-04-30 49482 11718 23349 0
#> 829 2022-05-01 36726 4032 16153 0
#> 830 2022-05-02 8737 113522 81644 32
#> 831 2022-05-03 67017 106631 61743 35518
#> 832 2022-05-04 47925 96167 114308 16924
#> 833 2022-05-05 44225 85073 72158 12460Loading some festive days for the same countries:
data(festives)
head(festives)
#> USA DEU FRA UK
#> 1 2020-01-01 2020-01-01 2020-01-01 2020-01-01
#> 2 2020-01-20 2020-04-10 2020-04-10 2020-04-10
#> 3 2020-02-17 2020-04-13 2020-04-13 2020-04-13
#> 4 2020-05-25 2020-05-01 2020-05-01 2020-05-08
#> 5 2020-06-21 2020-05-21 2020-05-08 2020-05-25
#> 6 2020-07-03 2020-06-01 2020-05-21 2020-06-21Loading the serial interval proposed in Nishiura et al. which follows a log-normal distribution with a median equals to 4 days, a mean equals to 4.7 days and standard deviation of 2.9 days:
si_distrib <- read.csv(url("https://www.ctim.es/covid19/nishiura.csv"),header=FALSE)
head(si_distrib)
#> V1
#> 1 0.00000000
#> 2 0.04169583
#> 3 0.16207936
#> 4 0.20316240
#> 5 0.17512594
#> 6 0.13039113In what follows we will estimate Rt using the different methods in the case of France. For the other countries, we can do it in the same way, changing, in the script ‘FRA’ by ‘DEU’, ‘USA’ or ‘UK’
We prepare the incidence data for EpiEstim:
FRA_incid <- data.frame(
dates = as.Date(incidence$date),
I = incidence$FRA
)
FRA_incid$I[incidence$FRA < 0] <- 0We compute the EpiEstim Rt estimate
N<-length(FRA_incid$I)
FRA_EpiEstim <- estimate_R(FRA_incid,
method="non_parametric_si",
config = make_config(list(
t_start = seq(30, N-6),
t_end = seq(30+6, N),
si_distr = si_distrib$V1))
)
tail(FRA_EpiEstim$R$Mean)
[1] 0.7337554 0.7319374 0.7672001 0.7490753 0.7473938 0.7532675We compute the Wallinga-Teunis Rt estimate:
FRA_Wall_Teun <- wallinga_teunis(FRA_incid,
method="non_parametric_si",
config = list(
t_start = seq(30, N-6),
t_end = seq(30+6, N),
si_distr =si_distrib$V1,
n_sim =0))
tail(FRA_Wall_Teun$R$`Mean(R)`)
[1] 0.6699915 0.6393988 0.6294066 0.5260823 0.4289424 0.3322501We compute the EpiInvert Rt estimate:
FRA_EpiInvert <- EpiInvert(incidence$FRA,
"2022-05-05",
festives$FRA,
select_params(list(si_distr =si_distrib$V1,
max_time_interval=9999)))
tail(FRA_EpiInvert$Rt)
[1] 0.7637433 0.7652671 0.7659675 0.7661722 0.7661985 0.7661985EpiNow2 does not use the serial interval. Instead, it uses the generation time and the incidence is computed backward using distributions for the incubation period and reporting delay.
First we upload the information about these new required distributions:
reporting_delay <- estimate_delay(rlnorm(1000, log(2), 1),
max_value = 15, bootstraps = 1)
generation_time <- get_generation_time(disease = "SARS-CoV-2", source = "ganyani")
incubation_period <- get_incubation_period(disease = "SARS-CoV-2", source = "lauer")We were unable to run EpiNow2 for the full sequences of the incidence data with 833 samples. We have used a subsample of 490 data corresponding to the incidence after January 1, 2021. In the following instruction we eliminate the data of previous dates:
d <- dplyr::filter(incidence,date >= "2021-01-01")We prepare the incidence data to call to EpiNow2
reported_cases <- data.frame(
date = as.Date(d$date),
confirm = d$FRA
)We execute EpiNow2 which includes a 7 day forecast of Rt
FRA_EpiNow2 <- epinow(reported_cases = reported_cases,
generation_time = generation_time,
delays = delay_opts(incubation_period, reporting_delay),
rt = rt_opts(prior = list(mean = 2, sd = 0.2)),
stan = stan_opts(cores = 4))
tail(summary(FRA_EpiNow2, type = "parameters", params = "R")$mean)
[1] 0.8171678 0.8171678 0.8171678 0.8171678 0.8171678 0.8171678Limitations of this study concerning the use of EpiNow2: The inputs of EpiNow2 are quite different to the ones of the previous methods. In particular, the reporting delay is country depending and we have not information about it for the countries we study, so, we use a synthetic example proposed in the EpiNow2 GitHub webpage. On the other hand the execution of EpiNow2 for the incidence data we use produced, in some cases, a warning. Next, we present a table with the summary of the EpiNow2 execution:
| Country | Execution time | Warning message |
|---|---|---|
| France | 10 hours | - |
| Germany | 4 hours | There were 3 divergent transitions after warmup |
| the USA | 6 hours | There were 2 divergent transitions after warmup |
| the UK | 2 hours | There were 15 divergent transitions after warmup |
First, we will compare the Rt estimations provided by the different methods by estimating the optimal shift between every pair of methods. We assume that the Rt estimates are defined in the time interval [0,T] where the value in T corresponds to the last value of Rt provided by the method. Let R and R’ be two estimations of Rt provided by 2 methods. We define the optimal shift, s(R,R’), by minimizing the average error d(R,R’:s) :
\((1) \ \ \ \ s(R,R^{\prime})=\underset{s\in\lbrack-25,25]}{\arg\min} \ \ d(R,R^{\prime}\!:\!s)\equiv\sum_{t=30}^{T-30}\frac{|R(t)-R^{\prime}(t+s)|+|R(t-s)-R^{\prime}(t)|}{2(T-60)}\)
The value 30 is introduced in the formula to avoid boundary effects in the comparison. The lower is the value of d(R,R’:s) the better is the match between R and R’ after a shift given by s. Notice that the optimal shift is anti-symmetric, that is s(R,R’)=-s(R’,R) and d(R,R’:s) is symmetric, that is d(R,R’:s)=d(R’,R:s). To evaluate R(t-s) or R’(t+s) we use linear interpolation.
In the following 4 figures, we show, for the different countries, the plot of Rt obtained by the methods EE (EpiEstim), EI (EpiInvert), WT (Wallinga-Teunis) and EN (EpiNow2) after applying the optimal shift estimated between EI and the other methods. That is, we compute s(EI,EE), s(EI,WT) and s(EI,EN).
We observe a quite good agreement between the different Rt estimates after applying the optimal shift. The obtained shifted Rt estimates that we show in the above figures can be downloaded from:
FRA_Rt_shift <- read.csv(url("https://www.ctim.es/covid19/FRA_Rt_shift.csv"),sep=";")
DEU_Rt_shift <- read.csv(url("https://www.ctim.es/covid19/DEU_Rt_shift.csv"),sep=";")
USA_Rt_shift <- read.csv(url("https://www.ctim.es/covid19/USA_Rt_shift.csv"),sep=";")
UK_Rt_shift <- read.csv(url("https://www.ctim.es/covid19/UK_Rt_shift.csv"),sep=";")Next, we present 2 tables with the mean and standard deviation of s(R,R’) and d(R,R’,s(R,R’)) for the different countries.
Table 1 : mean and standard deviation of s(R,R’) for the different countries
| EI | EE | WT | EN | |
|---|---|---|---|---|
| EI | 0 \(\pm\) 0 | 7.36 \(\pm\) 0.02 | 3.02 \(\pm\) 0.19 | -8.92 \(\pm\) 0.43 |
| EE | -7.36 \(\pm\) 0.02 | 0 \(\pm\) 0 | -4.32 \(\pm\) 0.03 | -16.71 \(\pm\) 0.41 |
| WT | -3.02 \(\pm\) 0.19 | 4.32 \(\pm\) 0.03 | 0 \(\pm\) 0 | -12.08 \(\pm\) 0.44 |
| EN | 8.92 \(\pm\) 0.43 | 16.71 \(\pm\) 0.41 | -12.08 \(\pm\) 0.44 | 0 \(\pm\) 0 |
Table 2 : mean and standard deviation of d(R,R’,s(R,R’)) for the different countries
| EI | EE | WT | EN | |
|---|---|---|---|---|
| EI | 0 \(\pm\) 0 | 0.034 \(\pm\) 0.011 | 0.026 \(\pm\) 0.009 | 0.057 \(\pm\) 0.021 |
| EE | 0.034 \(\pm\) 0.011 | 0 \(\pm\) 0 | 0.025 \(\pm\) 0.007 | 0.058 \(\pm\) 0.016 |
| WT | 0.026 \(\pm\) 0.009 | 0.025 \(\pm\) 0.007 | 0 \(\pm\) 0 | 0.046 \(\pm\) 0.015 |
| EN | 0.057 \(\pm\) 0.021 | 0.058 \(\pm\) 0.016 | 0.046 \(\pm\) 0.015 | 0 \(\pm\) 0 |
In
PNAS
(2021), we show that the expected value of
s(EI,EE) is about 3
+m where m is closed to a central value (median or mean)
of the serial interval distribution. Since the used serial interval has
a median of 4 and a mean of 4.7, the expected value of
s(EI,EE) is between
7 and 7.7. Notice that, as shown in table 1, the
obtained mean value of
s(EI,EE) for the 4 countries
we use is 7.36 which confirms our expectations. On the other
hand, in
PNAS
(2021), we also show that if R and R’ are the Rt
estimations for WT and EE
respectively, then
\(R(t)=\sum_k\Phi(k)R'(t+k)\)
the second term of this equation can be approximated in the following way:
\(\sum_k\Phi(k)R'(t+k) \approx R'(t+m)\)
where m is a central value of the serial interval \(\Phi\). Therefore, the expected value of s(WT,EE) is between 4 and 4.7 which is confirmed by the obtained value, 4.32, showed in table 1. On the other hand, as it is also expected, we have that, using the values of table 1, s(EI,EE)=7.36 is closed to s(EI,WT)+s(WT,EE)=3.02+4.32=7.34.
Concerning the shift of EN with respect to the other methods, we have that the most significant part of this shift is due to the 7-day forecast included in the EN estimation. In spite of the method EN is quite different to the other ones, the stability of the shift value between EN and the other methods for the different countries suggests that it must exist a theoretical reason which justify such shift value.
In the following plot we show the optimal shift s(R,R’) versus the average error d(R,R’,s(R,R’)) for any pair of methods and the different countries. As stated numerically in tables 1 and 2, we can observe that the value of s(R,R’) is very stable for the different countries and that the largest values of d(R,R’,s(R,R’)) corresponds to the comparison of EN with the other methods and the lowest values to the comparison of WT with EE and EI.
To measure the regularity of the different Rt estimates, we use the following averages of the absolute values of the first and second derivatives of Rt.
\((2) \ \ \ \ \text{FIRST DERIVATIVE AVERAGE:} \ \ ||dR/dt||=\sum_{t=30}^{T-30}\frac{|R(t+h)-R(t-h)|}{2(T-60)}\)
\((3) \ \ \ \ \text{SECOND DERIVATIVE AVERAGE:} \ \ ||d^2R/dt^2||=\sum_{t=30}^{T-30}\frac{|R(t+h)-2R(t)+R(t-h)|}{T-60}\)
In the following plot we compare the regularity of the different Rt estimates. We observe that by far, EN provides the smoother estimate, EE provides the most irregular estimates and EI and WT provide similar results. Concerning the regularity we have to take into account that we can tune the regularity of the methods changing the method parameters. For instance, in the case of WT and EE we can change the number of weeks to average the incidence and in the case of EI there exists a parameter to fit the regularity in the estimation. In this comparison we use the default parameters provided by the methods.
Next, we study, in more details, the behavior of the shifted Rt estimates when we approach the current time. In the next figure we show, for the different countries the shifted Rt estimates after March 1, 2022. We observe, as expected, that the precision of WT deteriorates as the current time approaches. The reason is that QT is designed to calculate Rt retrospectively, that is, to calculate Rt at time t it uses the incidence data up to time t+max(support(\(\Phi\))) (where support(\(\Phi\))=\(\{k: \Phi(k)>0\}\)). Therefore, the quality of the estimate deteriorates as we approach the current time. Regarding the methods EI, EE and EN, when the estimate varies smoothly in the 3 cases, as is the case for France, the final estimate of Rt is quite similar in the 3 cases. When the estimation of EE, which is the most irregular, has strong oscillations, the estimation of the 3 methods can be quite different. In the case of Germany, we can observe that the estimates of EE and EI are decreasing until reaching the last value of EE , but then the estimate of EI grows in the following days approaching the value of EN, this may be due, as showed in PNAS, 2021, to the fact that EI reacts before EE to a change in trend. As can be seen, the estimation of EN is very smooth. A smooth estimate has the advantage that it is more stable but on the other hand we can expect it to take longer to react to a change in trend. Therefore, it is necessary to seek a certain balance between the stability of the estimate and its ability to react to a change in trend. In any case, as already mentioned, by modifying the parameters of the methods it is possible to tune the regularity of the estimation.