Background: This dataset is a simulation of an Ebola outbreak in Freetown, Sierra Leone. It contains each case’s unique identifier (case ID), location (longitude and latitude), generation (i.e., the number of transmission steps separating the case from the start of the outbreak), number of cases caused, outcome (i.e., whether the case recovered or died), and date of infection. The data span April 2014 to May 2015 and include 5,888 simulated cases.
Ebola is a viral infectious disease spread through direct contact with bodily fluids of an infected person. The disease has a high fatality rate, and the World Health Organization (WHO) has stated that the 2014-2016 West African Ebola outbreak was the largest outbreak since the disease’s discovery[1].
Here we simulate an outbreak in Freetown, Sierra Leone, to understand the dynamics of disease spread and outcomes. The dataset lets us examine the geographic distribution of the outbreak and the contribution of individual cases to its spread. We aim to understand how the disease spreads geographically and how outcomes differ with the number of cases each infected person causes.
[1] World Health Organization. (n.d.). Ebola outbreak 2014-2016 - West Africa. World Health Organization. https://www.who.int/emergencies/situations/ebola-outbreak-2014-2016-West-Africa
Our first goal is to understand the geographical distribution of the outbreak within Freetown. The map shows each case as a point that can be clicked for more information: the case ID, the generation of the case, the outcome of the infected person, the number of cases caused by the infected person, and the date of infection.
The map lets us visualize the spread of the disease across Freetown and examine the relationship between case location and outcome. Within the city limits, some areas contain clusters of cases while others contain none. However, it is difficult to see the relationship, if any, between the outcome and the number of cases caused.
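One way to make the visual impression of clusters and empty areas more concrete is to bin cases into a coarse grid and count cases and deaths per cell. The following is a minimal sketch, not the method used to build the map: the records, the tuple layout, and the 0.01-degree cell size are all illustrative assumptions, and the coordinates are made-up points rather than rows from the dataset.

```python
import math
from collections import Counter

# Hypothetical records (longitude, latitude, outcome); illustrative values only,
# not taken from the simulated dataset.
cases = [
    (-13.234, 8.484, "died"),
    (-13.231, 8.486, "recovered"),
    (-13.233, 8.481, "died"),
    (-13.198, 8.462, "recovered"),
]

CELL_DEG = 0.01  # assumed cell size in degrees (roughly 1.1 km at this latitude)


def cell_of(lon, lat, size=CELL_DEG):
    """Return the integer (column, row) index of the grid cell containing a point."""
    return (math.floor(lon / size), math.floor(lat / size))


# Count all cases and fatal cases per grid cell.
counts = Counter(cell_of(lon, lat) for lon, lat, _ in cases)
deaths = Counter(
    cell_of(lon, lat) for lon, lat, outcome in cases if outcome == "died"
)

# List cells from most to least populated, with the share of cases that died.
for cell, n in counts.most_common():
    print(cell, n, round(deaths[cell] / n, 2))
```

Cells with high counts correspond to visible clusters, and grid cells that never appear in `counts` correspond to areas with no cases. Comparing the death share across cells gives a rough check on whether outcome varies by location.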
Rather than answering this question geographically, we can group cases by the number of cases each one caused and compare the outcomes within each group.
Here we see that when an infected person does not spread the disease (i.e., the number of cases caused is 0), the majority of patients die. This makes intuitive sense: once a person dies, they can no longer be a vector for the disease. Additionally, there is a selection bias, as those who die early spend less time as vectors for infection than those who survive longer.
Conversely, as the number of cases caused increases, the proportion of those who survive also increases. This suggests that causing more cases is associated with a higher probability of surviving the disease.
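The grouping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the records are invented for the example, the (cases caused, outcome) tuple layout is hypothetical, and the outcome labels "recovered" and "died" simply mirror the wording of the dataset description.

```python
from collections import defaultdict

# Hypothetical records (cases_caused, outcome); illustrative values only,
# not taken from the simulated dataset.
records = [
    (0, "died"), (0, "died"), (0, "recovered"),
    (1, "died"), (1, "recovered"),
    (2, "recovered"), (2, "recovered"), (2, "died"),
]


def survival_by_cases_caused(rows):
    """Return {cases_caused: proportion recovered} for each group present."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for caused, outcome in rows:
        totals[caused] += 1
        if outcome == "recovered":
            survived[caused] += 1
    return {k: survived[k] / totals[k] for k in sorted(totals)}


proportions = survival_by_cases_caused(records)
for caused, share in proportions.items():
    print(f"cases caused = {caused}: {share:.2f} recovered")
```

With the real data, a rising share of recovered cases across groups would correspond to the pattern described here. Groups with very few cases would give unstable proportions and may need to be pooled.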
From the map, we can see geographical clusters of cases, as well as areas with no cases, within the city limits. The clusters appear to be distributed somewhat randomly, with no apparent relationship between outcome (i.e., whether the infected person survived or died) and location.
We do note an apparent relationship between the number of cases caused and the survival of the infected person. As the number of cases caused increases, the proportion of those who survive also increases, so causing more cases is associated with a higher probability of survival. Given the biologically plausible explanation that an infected person needs to be alive to transmit Ebola, this points to a causal relationship between the number of cases caused and the survival outcome, with survival enabling transmission rather than the reverse.