Summary

Our goal in this tutorial is to create interactive plots of daily syndrome counts over time.

We’ll start by reading in a line-level syndromic dataset. We’ll then write simple syndrome definitions for influenza-like illness (ILI), respiratory (RESP), and gastrointestinal (GI) syndromes, and aggregate the data into daily counts. Finally, we’ll apply some plotting functions to get our interactive plots. You’ll see that creating interactive plots with R does not require much code.

First, we’ll need to get set up.

Install/load necessary packages

# Install packages if not already installed
for (pkg in c("tidyverse", "stringr", "lubridate", "plotly")) {
  if (!(pkg %in% rownames(installed.packages()))) {
    install.packages(pkg, repos = "http://cran.us.r-project.org")
  }
}

# Load necessary packages
library(tidyverse)
library(stringr)
library(lubridate)
library(plotly)

# Download data file from GitHub repository
download.file("https://github.com/haroldgil/SyS-tools/raw/master/data/syndromicData_raw.csv", "syndromicData_raw.csv")
syn_raw <- read_csv("syndromicData_raw.csv")

The syndromic dataset named syn_raw should now be imported.
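
Before moving on, it can help to confirm that the columns used later in this tutorial (Date and Chief.Complaint) are actually present. The following is only a sketch: toy_raw is a hypothetical stand-in for syn_raw, and its values and date format are invented for illustration, not taken from the real file.

```r
library(tibble)

# Hypothetical stand-in for syn_raw (values are invented for illustration)
toy_raw <- tibble(
  Date = c("01/02/2017", "01/02/2017", "01/03/2017"),
  Chief.Complaint = c("FLU SYMPTOMS", "NAUSEA AND VOMITING", "SHORTNESS OF BREATH")
)

# Columns the rest of the tutorial relies on
required_cols <- c("Date", "Chief.Complaint")
missing_cols <- setdiff(required_cols, names(toy_raw))

# Stop early with a readable message if a column is absent
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Compact overview of column names, types, and first values
str(toy_raw)
```

To run the same check on the real data, replace toy_raw with syn_raw.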

Define and apply syndrome definition

Now we’ll flag each record in which one of our three syndromes of interest (ILI, RESP, GI) is present.

We use str_detect() to detect the presence of certain substrings, defined by regular expressions, in the Chief.Complaint field. Note: ESSENCE also uses regular expressions. A short tutorial video on regular expressions is available from Stanford’s Natural Language Processing course.

# Define simple syndrome definitions
# (str_detect() is case-sensitive, so these patterns match upper-case text only)
syn_s <- syn_raw %>% 
          mutate(ILI = str_detect(Chief.Complaint, "FLU"), 
                 RESP = str_detect(Chief.Complaint, "RESP|LUNG|BREATH"),
                 GI = str_detect(Chief.Complaint, "GASTRO|NAUSEA|VOMIT|DIARRHEA"))

My new dataset has one column per syndrome. Each column is TRUE when the syndrome was detected in the record’s chief complaint and FALSE otherwise.
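
Two details of str_detect() are worth knowing when writing syndrome definitions: matching is case-sensitive by default, and a missing chief complaint returns NA rather than FALSE. The sketch below illustrates both on a made-up vector (cc and its values are hypothetical, not taken from the dataset); regex(ignore_case = TRUE) is stringr's helper for case-insensitive matching.

```r
library(stringr)

# Hypothetical chief complaints (invented for illustration)
cc <- c("FLU SYMPTOMS", "flu-like illness", "Nausea and vomiting", NA)

# Case-sensitive match, as in the tutorial: lower-case "flu" is missed
ili_strict <- str_detect(cc, "FLU")

# Case-insensitive match using stringr's regex() helper
ili <- str_detect(cc, regex("FLU", ignore_case = TRUE))

# A missing complaint gives NA; turn it into FALSE if NA should mean "no match"
ili_clean <- !is.na(ili) & ili

ili_strict
ili
ili_clean
```

Keep in mind that substring patterns can over-match. For example, "FLU" also matches "FLUID", so it is worth spot-checking the flagged records.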

Transform data for plotting

Now I want to get the daily counts for each defined syndrome from the line-level data. First I’ll convert the Date field from character (chr) type to Date type. Then I’ll count each syndrome by day, which gives the data in wide format (one column per syndrome). Wide format works great for many plots, but other plots require long format (one row per date and syndrome), so I’ll also create a copy of the data in long format for later use.

# Reformat Date field to be a Date data type (having dates as Date types makes life easier)
syn_d <- syn_s %>% mutate(Date = mdy(Date))

# WIDE FORMAT
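# Get aggregate counts for each syndrome by date.
# na.rm = TRUE skips records whose flag is NA (e.g., a missing chief complaint).
syn_agg_w <- syn_d %>%
  group_by(Date) %>%
  summarize(ILI = sum(ILI, na.rm = TRUE),
            RESP = sum(RESP, na.rm = TRUE),
            GI = sum(GI, na.rm = TRUE))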

# LONG FORMAT
# Sometimes a specific plot requires the data to be in `long` format.
# gather() still works but is superseded by pivot_longer() in tidyr 1.0.0 and later.
syn_agg_l <- syn_agg_w %>% gather(key = Syndrome, value = Count, ILI, RESP, GI)
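
To see the whole transformation at a glance, here is a minimal sketch on a toy dataset. The data frame toy_s and its values are hypothetical (invented for illustration), and the date strings are assumed to be in month/day/year order because the tutorial parses them with mdy(). The sketch uses pivot_longer(), the newer tidyr replacement for gather(), to build the long format.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical line-level records with syndrome flags already applied
toy_s <- tibble(
  Date = c("01/02/2017", "01/02/2017", "01/03/2017", "01/03/2017"),
  ILI  = c(TRUE, FALSE, TRUE, TRUE),
  RESP = c(FALSE, TRUE, FALSE, NA),
  GI   = c(FALSE, FALSE, FALSE, TRUE)
)

# WIDE FORMAT: one row per date, one column per syndrome
toy_w <- toy_s %>%
  mutate(Date = mdy(Date)) %>%
  group_by(Date) %>%
  summarize(ILI = sum(ILI, na.rm = TRUE),
            RESP = sum(RESP, na.rm = TRUE),
            GI = sum(GI, na.rm = TRUE))

# LONG FORMAT: one row per date and syndrome
toy_l <- toy_w %>%
  pivot_longer(cols = c(ILI, RESP, GI),
               names_to = "Syndrome",
               values_to = "Count")

toy_w
toy_l
```

The wide table has one row per day, while the long table has one row per day and syndrome, which is the shape that ggplot2's color aesthetic expects.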

Plot data using ggplot2 and then ggplotly()

We will create our first interactive plot today using the ggplotly() function from the package plotly.

Important: ggplotly() takes as its input a ggplot2 object (a plot created using ggplot2). If you call it with no arguments it uses the most recently created ggplot2 plot, but passing the plot explicitly is clearer. So I’ll create my plot (specifically a ggplot2 plot) and “save” it as an object named p (you can actually save it under any name you want; the variable p is used a lot because it’s short for plot).

Note that this plot uses the long format dataset syn_agg_l!

Note: If you haven’t already, take 30-60 minutes to learn basic ggplot2 plotting syntax from a good introductory tutorial.

# Static plot of daily counts, one line per syndrome
p <- ggplot(data = syn_agg_l) + geom_line(mapping = aes(x = Date, y = Count, color = Syndrome))

# Show plot, p
p

Now I can apply ggplotly() to p and I have my interactive plot!

# Apply ggplotly() to p which is a ggplot2 object
ggplotly(p)
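
ggplotly() also accepts a few arguments of its own. As a sketch, the example below passes the plot explicitly and uses the tooltip argument to limit the hover text to the x and y values. The data frame toy_l and the object names p_toy and p_int are hypothetical and exist only to keep the example self-contained.

```r
library(ggplot2)
library(plotly)

# Hypothetical long-format daily counts (invented for illustration)
toy_l <- data.frame(
  Date = as.Date(rep(c("2017-01-02", "2017-01-03"), times = 3)),
  Syndrome = rep(c("ILI", "RESP", "GI"), each = 2),
  Count = c(1, 2, 1, 0, 0, 1)
)

# Static ggplot2 plot with a title and axis labels
p_toy <- ggplot(toy_l, aes(x = Date, y = Count, color = Syndrome)) +
  geom_line() +
  labs(title = "Daily Count of Syndromes", x = "Date", y = "Count")

# Convert to an interactive plot; show only the x and y values on hover
p_int <- ggplotly(p_toy, tooltip = c("x", "y"))

# Display the plot
p_int
```

With the tutorial's data, the same call would be ggplotly(p, tooltip = c("x", "y")).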

Plot data using plot_ly()

Plotly has many other ways to create interactive plots.

Below, I’ll create a plot using the wide format dataset syn_agg_w and the function plot_ly().

# Interactive plot of daily ILI counts
p <- syn_agg_w %>% 
  plot_ly(x = ~Date) %>%
  add_lines(y = ~ILI, name = "ILI") %>%
  layout(
    title = "Daily ILI Count",
    xaxis = list(rangeslider = list(type = "date"))
  )
 
# Display the plot 
p

I can add lines for each of the other syndromes too.

# Interactive plot of daily counts for all three syndromes
p <- syn_agg_w %>% 
  plot_ly(x = ~Date) %>%
  add_lines(y = ~ILI, name = "ILI") %>%
  add_lines(y = ~RESP, name = "RESP") %>%
  add_lines(y = ~GI, name = "GI") %>%
  layout(
    title = "Daily Count of Syndromes",
    xaxis = list(rangeslider = list(type = "date"))
  )
  
# Display the plot 
p
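
plot_ly() can also work from long-format data. In that case a single call with color = ~Syndrome draws one line per syndrome, so there is no need to repeat add_lines(). This is only a sketch under the assumption that the data has Date, Syndrome, and Count columns like syn_agg_l; toy_l below is a hypothetical stand-in with invented values.

```r
library(plotly)

# Hypothetical long-format daily counts (invented for illustration)
toy_l <- data.frame(
  Date = as.Date(rep(c("2017-01-02", "2017-01-03"), times = 3)),
  Syndrome = rep(c("ILI", "RESP", "GI"), each = 2),
  Count = c(1, 2, 1, 0, 0, 1)
)

# One trace per syndrome, driven by the color mapping
p_long <- plot_ly(
  data = toy_l,
  x = ~Date,
  y = ~Count,
  color = ~Syndrome,
  type = "scatter",
  mode = "lines"
) %>%
  layout(
    title = "Daily Count of Syndromes",
    xaxis = list(rangeslider = list(type = "date"))
  )

# Display the plot
p_long
```

The wide-format version above is more explicit about each line, while the long-format version scales better if more syndromes are added later.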

Final Comments

You can make all sorts of neat interactive plots with plotly. See the official Plotly website for R for more examples.

A great, simple resource I recommend for plotly beginners is plotly for R by Carson Sievert.

Plotly function syntax on the official Plotly website may look a bit intimidating at first, but in many cases you can get to interactivity (with trend lines, bar graphs, maps, etc.) using only the ggplotly() function. You’ve learned that once you create a ggplot2 plot, you can easily make it interactive with ggplotly(). So remember to learn ggplot2 if you haven’t already!