A novel EVD outbreak in Ankh, Republic of Morporkia

A photo of the Ebola virus

A photo of the Ebola virus


A new EVD outbreak has been notified in the small city of Ankh, located in the Northern, rural district of the Republic of Morporkia. Public Health Morporkia (PHM) is in charge of coordinating the outbreak response, and have contracted you as a consultant in epidemics analysis to inform the response in real time.

Ankh, Republic of Morkporkia

Ankh, Republic of Morkporkia


Managing the Data

Importing Data Files

We’ll load in various R libraries, and three documents that the Ministry of Health has provided us. First, the linelist of 12 confirmed Ebola cases. Second, a list of 11 contacts from these cases. Last, a document that tells R how to clean the messy data.

library(rio) library(ggplot2) library(dplyr) library(magrittr) library(outbreaks) library(incidence) library(epicontacts) library(linelist) library(distcrete) library(epitrix) library(earlyR) library(projections) library(tidyverse) library(sjPlot) library(jtools)

linelist_raw <- rio::import(“phm_evd_linelist_2017-10-27.xlsx”) contacts_raw <- rio::import(“phm_evd_contacts_2017-10-27.xlsx”) cleaning_rules <- rio::import(“phm_evd_cleaning_rules.xlsx”)


Original, Messy Data

In the file we’ve been sent, the dates and the sex variable are not standardized. Location names are misspelled. In the contacts list, all we have is a case ID for the contact and it’s connection to the confirmed case.

##        id  Date of Onset     Date.Report.   SEX. Âge     location
## 1  39e9dc          43018            43030 Female  62  Pseudopolis
## 2  664549          43024            43032   Male  28   peudopolis
## 3  B4D8AA          43025     “23/10/2017”   male  54 Ankh-Morpork
## 4  51883d “18// 10/2017”       22-10-2017   male  57  PSEUDOPOLIS
## 5  947e40          43028     “2017/10/25”      f  23 Ankh Morpork
## 6  9aa197          43028     “2017-10-23”      f  66           AM
## 7  e4b0a2          43029     “2017-10-24” female  13 Ankh Morpork
## 8  AF0AC0   “2017-10-21”     “26-10-2017”      M  10 ankh morpork
## 9  185911   “2017-10-21”     “26-10-2017” female  34           AM
## 10 601D2E   “2017/10/22”     “28-10-2017”   <NA>  11           AM
## 11 605322          43030 “28 / 10 / 2017” FEMALE  23 Ankh Morpork
## 12 E399B1 23 / 10 / 2017     “28/10/2017” female  23 Ankh Morpork
##    Source Case #ID case.id
## 1           51883d  185911
## 2           b4d8aa  e4b0a2
## 3           39e9dc  b4d8aa
## 4           39E9DC  601d2e
## 5           51883d  9AA197
## 6           39e9dc  51883d
## 7           39E9DC  e399b1
## 8           b4d8aa  AF0AC0
## 9           39E9DC  947e40
## 10          39E9DC  664549
## 11          39e9dc  605322

Cleaned Data

Once we run it through our cleaning rules, the data looks better.

Linelist
id date of_onset date report sex age location
39e9dc 2017-10-10 2017-10-22 female 62 pseudopolis
664549 2017-10-16 2017-10-24 male 28 pseudopolis
b4d8aa 2017-10-17 2017-10-23 male 54 ankh_morpork
51883d 2017-10-18 2017-10-22 male 57 pseudopolis
947e40 2017-10-20 2017-10-25 female 23 ankh_morpork
9aa197 2017-10-20 2017-10-23 female 66 ankh_morpork
e4b0a2 2017-10-21 2017-10-24 female 13 ankh_morpork
af0ac0 2017-10-21 2017-10-26 male 10 ankh_morpork
185911 2017-10-21 2017-10-26 female 34 ankh_morpork
601d2e 2017-10-22 2017-10-28 unknown 11 ankh_morpork
605322 2017-10-22 2017-10-28 female 23 ankh_morpork
e399b1 2017-10-23 2017-10-28 female 23 ankh_morpork
Contacts
source case_id case id
51883d 185911
b4d8aa e4b0a2
39e9dc b4d8aa
39e9dc 601d2e
51883d 9aa197
39e9dc 51883d
39e9dc e399b1
b4d8aa af0ac0
39e9dc 947e40
39e9dc 664549
39e9dc 605322

Connecting Contacts to Cases

Next, we will connect the contacts to their cases, and map the linkages. We can see here that a few cases have spread EVD to multiple people. You can scroll over the icons to see the characteristics of these “superspreaders”. We have identified many contacts, but not all of them. There is lots of work to do.


Incidence

Next we want to find the incidence of Ebola in the community. We will make a simple epi curve by date of symptom onset. It is important to note that the graph includes data through Oct. 28, there just haven’t been any reported cases from Oct. 23 to Oct. 28.


Statistical Analysis

Log-linear model

The simplest model of incidence is probably the log-linear model. This is a linear regression on log-transformed incidences. We will estimate the parameters of the model (the model fit, f) based on the outbreak’s incidence (i). When we look at the model fit, we will derive the estimates of the growth rate (r) and the doubling time of the outbreak. Here, we’ve added the model to the incidence plot.

We see the estimated daily growth rate is 0.05 (-0.03, 0.14) and that the outbreak is doubling every 12 days.

## <incidence_fit object>
## 
## $model: regression of log-incidence over time
## 
## $info: list containing the following items:
##   $r (daily growth rate):
## [1] 0.05352107
## 
##   $r.conf (confidence interval):
##           2.5 %    97.5 %
## [1,] -0.0390633 0.1461054
## 
##   $doubling (doubling time in days):
## [1] 12.95092
## 
##   $doubling.conf (confidence interval):
##         2.5 %    97.5 %
## [1,] 4.744158 -17.74421
## 
##   $pred: data.frame of incidence predictions (8 rows, 5 columns)

Visualizing the model fit

We’ll now add the log-linear fit to our graph of outbreak data.

How would you interpret this result? What criticism would you make on this model?

It works fine.


Estimating R

The transmissibility of the disease can be assessed through the estimation of the reproduction number R, defined as the number of expected secondary cases per infected case. In the early stages of an outbreak, and assuming no immunity in the population, this quantity is also the basic reproduction number R0, i.e. R in a fully susceptible population.

The package earlyR implements a simple maximum-likelihood estimation of R, using dates of onset of symptoms and information on the serial interval distribution.


Estimating the serial interval from older Ebola data

As current data are insufficient to estimate the serial interval distribution, some colleague recommends using data from a past outbreak stored in the outbreaks package, as the dataset ebola_sim_clean.

We load in this large dataset (6,000 cases, 4,000 contacts) in order to estimate the serial number from this outbreak.

The function get_pairwise can be used to extract pairwise features of contacts based on attributes of the linelist. For instance, it could be used to test for assortativity, but also to compute delays between connected cases. Here, we use it to extract the serial interval:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    5.00    9.00   11.06   15.00   99.00

Removing the missing variables, we get this five number summary of the serial interval. The mean serial interval is 11.06.


Serial interval

Here is the empirical distribution of the serial intervals, if we were interested.

What do you think of this distribution?


Gamma distribution

Then we’ll use the function fit_disc_gamma from the package epitrix to fit a discretised Gamma distribution to these data.

## $mu
## [1] 11.55634
## 
## $cv
## [1] 0.6493035
## 
## $sd
## [1] 7.50357
## 
## $ll
## [1] -6939.76
## 
## $converged
## [1] TRUE
## 
## $distribution
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.37194455151125
##     scale: 4.87209431456314

si_fit contains various information about the fitted delays, including the estimated distribution in the $distribution slot.


You can compare this distribution to the empirical data in the following plot:

Would you trust this estimation of the generation time? How would you compare it to actual estimates from the West African EVD outbreak (WHO Ebola Response Team (2014) NEJM 371:1481–1495) with a mean of 15.3 days and a standard deviation 9.3 days?

The model somewhat fits the data. The serial interval we’ve estimated is lower than what WHO published in 2014, closer to 11 than 15.


Estimation of R0 in the current outbreak

Now that we have estimates of the serial interval based on a previous outbreak, we can use this information to estimate transmissibility of the disease (as measured by R0) in the current outbreak.

Using the estimates of the mean and standard deviation of the serial interval you just obtained, we’ll use the function get_R to estimate the reproduction number, specifying a maximum R of 10.

## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 1.821822
## 
## 
##  // $lambda:
##   0.0414587 0.05362817 0.06170817 0.06622619 0.06786168 0.06728274...
## 
##  // $dates:
## [1] "2017-10-11" "2017-10-12" "2017-10-13" "2017-10-14" "2017-10-15"
## [6] "2017-10-16"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.37194455151125
##     scale: 4.87209431456314

## NULL

This graph shows us the possible range of R values, with the most likely value being 1.822.


Force of Infection

We can also graph the force of infection over time, based on our current 12 cases.

Interpret these results: what do you make of the reproduction number? What does it reflect? Based on the last part of the epicurve, some colleagues suggest that incidence is going down and the outbreak may be under control. What is your opinion on this?

R0 is significantly higher than 1, which is concerning. Incidence may be 0 for the last few days, but that is not enough to say it is going down. The force of infection graph shows us that Ebola is still transmitting quickly in the community through the last days of October. This would suggest there will be many more cases through early November.


Short-term forecasting

The function project from the package projections can be used to simulate plausible epidemic trajectories by simulating daily incidence using the same branching process as the one used to estimate R0 in earlyR. All that is needed is one or several values of R0 and a serial interval distribution, stored as a distcrete object.

Here we have generated 5 random trajectories of Ebola over the next few days. In this case, we used a fixed R value. Here we are showing the random trajectories over the next four days.

## 
## /// Incidence projections //
## 
##   // class: projections, matrix
##   // 10 dates (rows); 5 simulations (columns)
## 
##  // first rows/columns:
##            [,1] [,2] [,3] [,4] [,5]
## 2017-10-28    2    2    3    1    0
## 2017-10-29    0    1    1    2    1
## 2017-10-30    0    3    4    0    2
## 2017-10-31    0    4    6    0    1
##  .
##  .
##  .
## 
##  // dates:
##  [1] "2017-10-28" "2017-10-29" "2017-10-30" "2017-10-31" "2017-11-01"
##  [6] "2017-11-02" "2017-11-03" "2017-11-04" "2017-11-05" "2017-11-06"

30 day projections

Next, we’ll generate a graph of 1,000 possible trajectories over the next 30 days. We’ll use an unfixed R value, which will produce slightly more varied (dramatic) projections.


Combining our Data and Projections

Now we’ll combine this with the case data we already have. This is what the Ministry of Health will be most interested in. In gray we see the currently collected data, and the light blue shows the confidence intervals of the potential trajectory of cases.


30 day projections

Last, we’ll look just at the projection of mean number of cases per day over the next 30 days.

## 2017-10-28 2017-10-29 2017-10-30 2017-10-31 2017-11-01 2017-11-02 2017-11-03 
##      0.708      0.734      0.729      0.742      0.750      0.779      0.770 
## 2017-11-04 2017-11-05 2017-11-06 2017-11-07 2017-11-08 2017-11-09 2017-11-10 
##      0.779      0.818      0.831      0.832      0.856      0.851      0.868 
## 2017-11-11 2017-11-12 2017-11-13 2017-11-14 2017-11-15 2017-11-16 2017-11-17 
##      0.888      0.919      0.910      0.926      0.929      0.936      0.931 
## 2017-11-18 2017-11-19 2017-11-20 2017-11-21 2017-11-22 2017-11-23 2017-11-24 
##      0.954      0.955      0.953      0.959      0.971      0.973      0.967 
## 2017-11-25 2017-11-26 
##      0.985      0.983

According to these results, what are the chances that more cases will appear in the near future? Is this outbreak being brought under control? Would you recommend scaling up / down the response?

There will be an average of 0.7 (and increasing) number of new cases per day for the next month. More cases will appear in the future.

In conclusion, this outbreak is not under control. It is likely in it’s initial increasing phase. I would recommend scaling up the response.