Hendra virus is a disease that naturaly occurs in fruit bats in Southeast Asia and Australia. Affected bats are completely asymptomatic, but they can still transmit the disease to other creatures through their infectious secretions. For example, if a bat urinates on hay kept for feeding horses, and if this urine contains Hendra virus, the horse could get infected. The symptoms range from nonspecific fever, anorexia and depression to more severe respiratory and neurological signs, and even death. This disease has been a huge problem in Eastern Australia, with horses from a number of cities and towns falling prey to it.
The problem with Hendra virus is that it goes deeper than just being a sickness of horses. It is a classic example of an emerging zoonosis because a sick horse can transmit the disease to its human owners through its bodily fluids, causing them to develop flu like symptoms, which could progress to breathing issues, neurological conditions and death. This transmission could happen in a variety of ways, for example, a human could kiss her sick horse affectionately and contract Hendra virus from the saliva, a veterinarian could perform an autopsy of a horse and be exposed to Hendra virus through the blood and so on.
How does One Health play a role in all this? The One Health principle states that the health of humans, animals and the environment are interlinked. This is well-illustrated in the case of paramyxoviruses like Hendra or Nipah virus. The destruction of the fruit bats’ forest habitat leads to their settlement in urban and suburban areas, increasing the rate of contact between them and humans and wildlife, driving up the risk of infection transmission.
The principles of One Health may seem intuitive, but it is challenging to communicate these concepts to laypersons with just numbers and data on infection rates. In this document, I will be using my skills as a programmer to examine a scientific paper focusing on Hendra virus shedding and will attempt to present spatial and temporal information in a clear manner.
Hume Field and Jordan David, from EcoHealth Alliance are the authors of a paper called ‘Spatiotemporal Aspects of Hendra Virus Infection in Pteropid Bats (Flying-Foxes) in Eastern Australia’. To summarize the study briefly, pooled urine samples from select bat roosting sites in towns in the eastern part of Australia were collected and tested for Hendra virus activity. The dates on which the samples were collected and the locations of the pooled urine collections sites is also noted.
This is a remarkable study because of the length of the study duration: very few studies on Hendra virus have involved the collection of samples over 3 years. What is even more fascinating about this paper is that Field and David have made the data freely available for download on the Queensland Government Data website. Click here to download the data and click here to obtain the data documentation.
The paper was extremely well written. Take a quick look at these two images.
Image included in the study
This image is trying to explain differences in probabilities of encountering Hendra virus across different sampling locations in eastern Australia. This is a very elegant image as it conveys both the estimate and confidence intervals with minimal ink, but I feel that the data would be more easily understood by the layperson if it was communicated spatially, as a map, perhaps.
Another image from the study
Take a look at this second image. Temporal data is being represented, and the authors are trying to convey the seasonality of the viral shedding patterns. This is complex information. I would like to make it simpler to look at using faceting of the dates.
Let’s dive into it. I will assume intermediate familiarity with R programming, especially around concepts such as filtering data using dplyr and visualizing data using ggplot2.
We will go from that Excel sheet to a shiny interactive app!
You will need these libraries to reproduce this analysis.
library(knitr)
library(dplyr)
library(ggplot2)
library(tmap)
library(knitr)
library(sp)
library(ggmap)
library(ggrepel)
library(shiny)
library(plotly)
library(DT)
library(scales)
library(ggthemes)
Download the files, place them in your working directory, and then read them in.
data <- read.csv("hendravirus_transmissiondata.csv", header = TRUE,
stringsAsFactors = FALSE) #data
docum <- read.csv("doc.csv") #documentation
Please refer to the documentation to understand what the variables signify.
I will change the column name ‘loc’ to ‘Place’ as I prefer that more.
colnames(data)[3] <- "Place"
Notice that R recognizes the date as a character. I will format it to a ‘date’ format.
data$date <- as.Date(data$date, "%d/%m/%Y")
temp <- data %>% filter(Place == "Sydney CP") %>% select(sam, date, Place, res)
kable(temp[c(1:6), ])
| sam | date | Place | res |
|---|---|---|---|
| N_2012-001_E_11 | 2012-04-29 | Sydney CP | 0 |
| N_2012-001_J_25 | 2012-04-29 | Sydney CP | 0 |
| N_2012-001_A_1 | 2012-04-29 | Sydney CP | 0 |
| N_2012-001_D_10 | 2012-04-29 | Sydney CP | 0 |
| N_2012-001_C_7 | 2012-04-29 | Sydney CP | 0 |
| N_2012-001_K_26 | 2012-04-29 | Sydney CP | 0 |
#refer to documentation to understand column names
Each location was visited approximately once every month, for several years. In any given visit, multiple samples could be obtained from the same location. Take a look at the res column. Each sample is assigned either a 0 or 1 in the res column based on whether the sample tested positive for Hendra virus. Thus, we can assign each location an average probability of encountering Hendra virus in any month of any year by taking the mean of the res column for that location for that month and year.
Seems confusing? To put it simply, we are going to group our rows by:
so that we can compute an average probability for each unique month-year-place combination.
data2 <- data %>%
mutate(Month = format(date, "%m"),
Year = format(date, "%Y")) %>%
group_by(Place, Month, Year) %>%
summarize(HendravirusProbability = round(mean(res), 3))
At this stage, I will be adding a “, Australia” to the locations to avoid ambiguity and enhance geocoding. Why ? Because I spotted a town called Avoca which is, apparently, also a town in New York, in the US.
data3 <- data2 %>% mutate(country = rep(", Australia", 1))
cols <- c("Place", "country")
data3$Location <- apply( data3[ , cols] , 1 , paste , collapse = "" )
As I have mentioned in earlier uploads, you need a google API to carry out the geocoding. Refer to my earlier post to learn how to introduce your key in the code.
for (i in 1:nrow(data3)) {
result <- geocode(data3$Location[i], output = "latlon", source = "google")
data3$Longitude[i] <- as.numeric(result[1])
data3$Latitude[i] <- as.numeric(result[2])
}
Making copies of the latitude and longitude column, and dropping unnecessary columns.
data3$Lat <- data3$Latitude
data3$Long <- data3$Longitude
drops <- c("country", "Location", "Longitude", "Latitude")
data3 <- data3[ , !(names(data3) %in% drops)]
Notice that the months are numbers. I would lke to label the month numbers as month names. Converting the column to factor and speifying levels is a good way of avoiding lengthy nested if-else statements.
data3$Month <- factor(data3$Month,
levels = c("01", "02", "03", "04", "05", "06",
"07", "08", "09", "10", "11", "12"),
labels = c("January", "February", "March", "April",
"May", "June", "July", "August", "September",
"October", "November", "December"))
#Important to make sure that the levels are quoted, because the month column
#was originally a character column
Our data is prepped and ready to go. Important before we begin mapping (for purposes of reproduction) to take a look at the data at this stage to see what it looks like.
head(kable(data3), 10)
## [1] "Place Month Year HendravirusProbability Lat Long"
## [2] "----------------- ---------- ----- ----------------------- ---------- ---------"
## [3] "Alstonville January 2014 0.032 -28.84087 153.4496"
## [4] "Alstonville February 2014 0.033 -28.84087 153.4496"
## [5] "Alstonville March 2014 0.045 -28.84087 153.4496"
## [6] "Alstonville April 2013 0.000 -28.84087 153.4496"
## [7] "Alstonville April 2014 0.062 -28.84087 153.4496"
## [8] "Alstonville May 2012 0.000 -28.84087 153.4496"
## [9] "Alstonville May 2013 0.133 -28.84087 153.4496"
## [10] "Alstonville May 2014 0.000 -28.84087 153.4496"
In order to represent this data spatially, we need a base layer, a base map. I found an interesting tutorial on this website that walks you through accessing a custom basemap.
Similar to the geocoding API, in order to obtain a basemap, you need to sign into your Google Cloud, enable the static maps API, associate it with your project and get a key. If this seems tedious, there are alternative ways to plot a map. For example, you can use the maps package to obtain base layers, as explained in this blog by Daniela Vasquez.
basemap <- get_map(location = " east australia",
source = "stamen",
maptype = "watercolor", crop = FALSE,
zoom = 6)
# plot map
ggmap(basemap)
Now that our dataset is clean and we have a basemap, we can go ahead and plot the points of sample collection on the map, with the size of the points representing the probability that a fruit bat urine sample from the region tested positive for Hendra virus.
ggmap(basemap) + labs(x = "Longitude", y = "Latitude") +
geom_point(data = data3, aes(x = Long, y = Lat, size = HendravirusProbability),
fill = "black", shape = 19, alpha = 0.2)
Let’s add a title, a subtitle and a caption to this map.
ggmap(basemap) + labs(x = "Longitude",
y = "Latitude",
caption = "Interpretation: 0.3 represents a 30 percent probability of encountering a fruit bat carrying Hendra virus") +
geom_point(data = data3, aes(x = Long, y = Lat, size = HendravirusProbability),
fill = "black", shape = 19, alpha = 0.2) +
ggtitle(expression(atop("Probability of Wild Fruit Bats Carrying Hendra Virus",
atop(italic("Select Cities in Eastern Australia")))))+
theme(plot.caption = element_text(hjust = 0))
Adding a north arrow would be a good next step.
ggmap(basemap) + labs(x = "Longitude",
y = "Latitude",
caption = "Interpretation: 0.3 represents a 30 percent probability of encountering a fruit bat carrying Hendra virus") +
geom_point(data = data3, aes(x = Long, y = Lat, size = HendravirusProbability),
fill = "black", shape = 19, alpha = 0.2) +
ggtitle(expression(atop("Probability of Wild Fruit Bats Carrying Hendra Virus",
atop(italic("Select Cities in Eastern Australia")))))+
theme(plot.caption = element_text(hjust = 0)) +
geom_segment(arrow=arrow(length=unit(4,"mm")), aes(x=156,xend=156,y=-17.9,yend=-16.9),
colour="black") +
annotate(x=156.1, y=-18.9, label="N", colour="black", geom="text", size=6)
#we manually drew the arrow
What is the data missing? That’s right, labels. We need to see the names of the cities. Let’s add the labels.
library(ggrepel)
ggmap(basemap) + labs(x = "Longitude",
y = "Latitude",
caption = "Interpretation: 0.3 represents a 30 percent probability of encountering a fruit bat carrying Hendra virus") +
geom_point(data = data3, aes(x = Long, y = Lat, size = HendravirusProbability),
fill = "black", shape = 19, alpha = 0.2) +
ggtitle(expression(atop("Probability of Wild Fruit Bats Carrying Hendra Virus",
atop(italic("Select Cities in Eastern Australia")))))+
theme(plot.caption = element_text(hjust = 0)) +
geom_segment(arrow=arrow(length=unit(4,"mm")), aes(x=156,xend=156,y=-17.9,yend=-16.9),
colour="black") +
annotate(x=156.1, y=-18.9, label="N", colour="black", geom="text", size=6)+
geom_text_repel(data = data3, aes(Long, Lat, label = Place), size = 3)
Oof. That’s such a mess. Why is this happening? My guess is that:
How can we fix this?
Can we facet instead?
data4 <- data3
data4$Year <- as.factor(data4$Year)
ggmap(basemap) + labs(x = "Longitude",
y = "Latitude",
caption = "Interpretation: 0.3 represents a 30 percent probability of encountering a fruit bat carrying Hendra virus") +
geom_point(data = data4, aes(x = Long, y = Lat, size = HendravirusProbability),
fill = "black", shape = 19, alpha = 0.2) +
ggtitle(expression(atop("Probability of Wild Fruit Bats Carrying Hendra Virus",
atop(italic("Select Cities in Eastern Australia")))))+
theme(plot.caption = element_text(hjust = 0)) +
geom_segment(arrow=arrow(length=unit(4,"mm")), aes(x=156,xend=156,y=-17.9,yend=-16.9),
colour="black") +
annotate(x=156.1, y=-18.9, label="N", colour="black", geom="text", size=6)+ facet_grid(.~Year)
This seems great, but the problem is that locations were visited multiple times in a year. These data are being plotted atop each other.
Best solution would be to use a month and year slider, to depict probabilities at each location for a unique combination of month and year. Dean Attali’s lessons on building interactive R Shiny apps have been very helpful for me in learning to fix my data visualization woes.
You can click on this link to view the app that I created to overcome the issues we have been having. The map is interactive, and adapts dynamically to your input.
For those interested, here is the code I used to create the app.
ui <- fluidPage(
titlePanel("Hendra Virus in East Australia"),
sidebarLayout(
sidebarPanel(radioButtons("MonthInput", "Month",
choices = c("January", "February", "March", "April",
"May", "June", "July", "August", "September",
"October", "November", "December"),
selected = "January"),
selectInput("YearInput", "Year",
choices = c("2011", "2012", "2013", "2014"))
),
mainPanel(plotOutput("coolplot"),
br(),
br(),
tableOutput("results"), height = "auto")
)
)
server <- function(input, output) {
output$coolplot <- renderPlot({
filtered <-
data3 %>%
filter(Month == input$MonthInput,
Year == input$YearInput)
ggmap(basemap) + labs(x = "Longitude",
y = "Latitude",
caption = "Interpretation: 0.3 represents a 30 percent probability of encountering a fruit bat carrying Hendra virus") +
geom_point(data = filtered, aes(x = Long, y = Lat, size = HendravirusProbability),
fill = "black", shape = 19, alpha = 0.5) +
ggtitle(expression(atop("Probability of Wild Fruit Bats Carrying Hendra Virus",
atop(italic("Select Cities in Eastern Australia")))))+
theme(plot.caption = element_text(hjust = 0)) +
geom_segment(arrow=arrow(length=unit(4,"mm")), aes(x=156,xend=156,y=-17.9,yend=-16.9),
colour="black") +
annotate(x=156.1, y=-18.9, label="N", colour="black", geom="text", size=6)+
geom_text_repel(data = filtered, aes(Long, Lat, label = Place), size = 3)
})
output$results <- renderTable({
filtered <-
data3 %>%
filter(Month == input$MonthInput,
Year == input$YearInput)
filtered
})
}
shinyApp(ui = ui, server = server)