Background + Purpose

The Ohio Department of Health (ODH) recently published a dataset of all Ohio COVID-19 cases by county of residence. The dataset is updated daily at 2 pm EST. ODH also provides a very useful Tableau dashboard of the data on their website.

The purpose of this document is to outline the process it took to develop the following three animated visualizations of Ohio counties’ COVID-19 cases using the ODH COVID-19 dataset.

Load Libraries

library(janitor)
library(gganimate)
library(ggrepel)
library(magick)
library(sf)
library(tidyverse)
date <- paste0(str_replace_all(Sys.Date(),"-","_"),"_")

Load and Prepare Data

STEP 1: Load Map of Ohio Data

First, we need a map of Ohio counties. We can get these data using the map function from maps package.

The map function provides us an object of class map. We will want to convert this to an sf object using the st_as_sf function from the sf package. This will allow us to more easily manipulate the Ohio map data (e.g. filtering for only Ohio counties, calculating new columns, and joining the map dataset with other datasets [e.g. the ODH COVID-19 dataset]).

In this step, we will also be calculating a centroid of each county using the st_centroid function from the sf package. We will need this centroid location so we can overlay the daily increases in cases as text when we build our map visualization.

ohio_county_map_df <-st_as_sf(maps::map("county",plot=F,fill=T)) %>%
  filter(str_detect(ID,"ohio,")) %>%
  separate(ID,into=c("state","county"),sep=",") %>%
  mutate_if(is.character,funs(tools::toTitleCase(.))) %>%
  mutate(centroid = st_centroid(geom),
         lat_long = map(.x=centroid, ~st_coordinates(.x) %>% as_tibble(.))) %>%
  unnest(lat_long) %>%
  rename(centroid_lat=X,
         centroid_long=Y)

Now we will quickly check to make sure we can generate a plot of Ohio counties.

ohio_county_map_df %>%
  ggplot()+
  geom_sf()

STEP 2: Load and Clean ODH COVID-19 Data

Here we will pull the ODH COVID dataset down directly from their website.

ohio_data_raw <- read_csv("https://coronavirus.ohio.gov/static/COVIDSummaryData.csv") %>%
  clean_names() %>%
  filter(!str_detect(county,"Total"))

Here we will take a quick glance at the dataset. There are a few important things to note:

The data appear to contain the number of cases/hospitalizations/deaths in each county on a given date + also provides info on the sex and age range of each new case.
Some counties are missing rows for certain dates - presumably there were no new cases in these counties on those dates
Some counties have more than one row for the same date - since each sex and age_range has its own row even if its on the same date
The variable onset_date is of class chr and not of class Date

ohio_data_raw %>%
  arrange(county,onset_date)

## # A tibble: 2,033 x 8
##    county sex   age_range onset_date date_of_death case_count death_count
##    <chr>  <chr> <chr>     <chr>      <chr>              <dbl>       <dbl>
##  1 Allen  Male  60-69     3/15/2020  <NA>                   1           0
##  2 Allen  Fema… 70-79     3/21/2020  <NA>                   1           0
##  3 Allen  Male  50-59     3/21/2020  <NA>                   1           0
##  4 Allen  Fema… 70-79     3/23/2020  <NA>                   2           0
##  5 Allen  Male  70-79     3/23/2020  <NA>                   1           0
##  6 Allen  Fema… 40-49     3/25/2020  <NA>                   1           0
##  7 Allen  Male  30-39     3/25/2020  <NA>                   1           0
##  8 Allen  Fema… 60-69     3/26/2020  <NA>                   1           0
##  9 Allen  Male  80+       3/26/2020  <NA>                   1           0
## 10 Ashla… Fema… 20-29     3/15/2020  <NA>                   1           0
## # … with 2,023 more rows, and 1 more variable: hospitalized_count <dbl>

These charactersitics are important to note because they will influence how we will go about cleaning our data. For our final cleaned dataset we will need the following:

One row for every county on every date (from earliest to latest date available) which has the daily and cumulative cases/hospitalizations/deaths for that particular date
The variable of onset_date to be of class Date

We will clean the data in two Parts:

Part 1: Convert onset_date to Date variable

Nothing exciting here. Pretty self-explanatory.

ohio_data_cleaned_part_1 <- ohio_data_raw %>%
  mutate(onset_date = as.Date(onset_date,format="%m/%d/%Y")) %>%
  filter(!is.na(onset_date)) %>%
  filter(onset_date<=as.Date("2020-03-31"))

Part 2: Expand and re-summarize dataframe so each county has a row for every date

First we will create a table which has a new row for each date for every county.

entire_date_range_df <- tibble(onset_date = seq(min(ohio_data_cleaned_part_1$onset_date),
                                         max(ohio_data_cleaned_part_1$onset_date),
                                         1)) %>%
  crossing(county=unique(ohio_county_map_df$county)) %>%
  arrange(county,onset_date)

entire_date_range_df %>% head()

## # A tibble: 6 x 2
##   onset_date county
##   <date>     <chr> 
## 1 2020-02-16 Adams 
## 2 2020-02-17 Adams 
## 3 2020-02-18 Adams 
## 4 2020-02-19 Adams 
## 5 2020-02-20 Adams 
## 6 2020-02-21 Adams

Next, we will join our this new dataframe with our dataframe from Part 1 using the left_join function from dplyr. We will also arrange the data by so all rows are ordered by the county and then by date (both in ascending order).

ohio_data_cleaned_part_2a <- entire_date_range_df %>%
  left_join(ohio_data_cleaned_part_1,
            by=c("county","onset_date"))%>%
  arrange(county,onset_date)

Lastly, we’ll resummarize the data in two steps:

First, we will group the data by county and onset_date to calculate the total number of cases/hospitalizations/deaths associated with each date. Note - it is very important to include the argument na.rm = T when summing up the case counts for each county + date since there are a lot of missing values.
Next, we will group the data by county to calculate the cumulative total of cases/hospitalizations/deaths for each date.

ohio_data_cleaned_part_2b <- ohio_data_cleaned_part_2a %>%
  group_by(county,onset_date) %>%
  summarize(case_count_daily = sum(case_count,na.rm=T),
            hospitalized_count_daily = sum(hospitalized_count,na.rm=T),
            death_count_daily = sum(death_count,na.rm=T)) %>%
  ungroup() %>%
  group_by(county) %>%
  mutate(case_count_total = cumsum(case_count_daily),
         hospitalized_count_total = cumsum(hospitalized_count_daily),
         death_count_total = cumsum(death_count_daily)) %>%
  ungroup()

Now we will do a quick check to show that each county has 54 rows which correspond to the daily and cumulative cases/hospitalzations/deaths for each date.

ohio_data_cleaned_part_2b %>% 
  count(county) %>%
  head()

## # A tibble: 6 x 2
##   county        n
##   <chr>     <int>
## 1 Adams        45
## 2 Allen        45
## 3 Ashland      45
## 4 Ashtabula    45
## 5 Athens       45
## 6 Auglaize     45

STEP 3: Join ODH dataframe to Ohio map dataframe

We will join the Ohio COVID data to the Ohio counties’ map data using the left_join function from dplyr.

In this step, we will also use the hue_pal function from scales package to assign a single color to each county to help us have consistent color scheme across different visualizations.

ohio_data_summarized <- ohio_data_cleaned_part_2b %>%
  left_join(ohio_county_map_df %>%
              mutate(color_pal = scales::hue_pal()(nrow(.))),
            by="county")

STEP 4: Summarize data for cumulative cases each date

We will also include a counter with total cumulative cases across all counties for a given date. Thus, we will group the data by date and count the total cases for each date.

total_by_date <- ohio_data_summarized %>%
  group_by(onset_date) %>%
  summarise(case_count_total=sum(case_count_total)) %>%
  ungroup()

STEP 5: Summarize data to count the top 10 counties on each date

For the bar plot, we only want to display the the top 10 counties with most cases for each date. Below is the code we used to quickly filter out only the 10 counties with the most cases for each date.

ohio_data_summarized_top_10 <-ohio_data_summarized %>%
  arrange(onset_date,desc(case_count_total)) %>%
  group_by(onset_date) %>%
  mutate(rank=row_number()) %>%
  ungroup() %>%
  filter(rank<=10)

Make Visualizations

All visualizations will be completed in 3 steps:

Generate static ggplot figure
Add gganimation components to ggplot
Create gif for each plot

We will not go over the details of the code used to generate each visualization - but encourage those interested to copy and experiment with the code themselves.

PLOT #1: Heat Map of Ohio COVID Cases by County

Heat maps are helpful for visualizing the geospatial distribution of cases.

We first saw a similar heat map of Ohio COVID cases posted by Anthony Reinhard, so credit to him for the original idea. Below is our attempt to recreate his work.

STEP 1: Generate static ggplot figure

p_ohio_county_map<-  ohio_data_summarized %>%
  mutate(case_count_total=ifelse(case_count_total==0,
                                           NA,
                                           case_count_total)) %>%
  ggplot()+
  geom_sf(mapping=aes(geometry=geom,
              fill=case_count_total),
          color="black")+
  geom_text(data=ohio_data_summarized %>%
            filter(case_count_daily!=0) %>%
              filter(onset_date!=max(onset_date)) %>%
              mutate(county_date = paste0(county,onset_date)),
            aes(x=centroid_lat,
                y=centroid_long,
                label=paste0("+",case_count_daily),
                group=county_date))+
    scale_fill_gradient(low="#fff2e5",
                       high="red",
                       na.value = "white")+
  theme_bw()+
  theme(title = element_text(size=14,face="bold"),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())+
  labs(title="Ohio Counties: # of Confirmed Cases",
       subtitle='Date: 3/31/20',
       x="",
       y="",
       fill="Total\nCases")

p_ohio_county_map

STEP 2: Add gganimation components to ggplot

ggan_p_ohio_county_map <- p_ohio_county_map +
  labs(title="Ohio Counties: # of Confirmed Cases",
       subtitle='Date: {format(frame_time,format="%m/%d/%y")}     Total Cases: {total_by_date$case_count_total[as.numeric(frame_time-min(total_by_date$onset_date))+1]}',
       x="",
       y="",
       fill="Total\nCases")+
  transition_time(onset_date)+
  enter_fade()+
  enter_drift(x_mod=0,y_mod=0.05)+
  exit_fade()

STEP 3: Create gif for each plot

Here is a quick explanation of the parameters we are providing to theanimate function to create the gif:

duration = 15 secs: the gif will last for 15 seconds
fps = 9: there will be 9 frames per second (fps), so 9 frames/sec * 15 seconds = 135 frames total
end_pause = 27: the last frame will display 27 times (27 frames / 9 frames/sec = 3 second pause at the end)
height = 400 and width = 500: the resulting gif will be 400 by 500 pixels

an_p_ohio_county_map <- animate(ggan_p_ohio_county_map,
                                duration=15,
                                fps=9,
                                end_pause = 27,
                                height=400,
                                width=500)
an_p_ohio_county_map

PLOT #2: Line Plot of Ohio Counties Cases Over Time

Line plots can show us how the data are trending over time for different groups of data - in this case we will be comparing counties.

We first saw a similar line graph in Patrick Rotzetter’s post. We also benefitted from reviewing the associated code on his github. Below is our attempt to repurpose his code for our dataset.

STEP 1: Generate static ggplot figure

p_county_line_plot <- ohio_data_summarized %>%
  ggplot(aes(x=onset_date,
             y=case_count_total,
             color=color_pal,
             group=county))+
  geom_line()+
  geom_segment(data=ohio_data_summarized %>%
              filter(case_count_total>=75),
            mapping=aes(y=case_count_total,
                        yend=case_count_total,
                        x=onset_date,
                        color=color_pal),
            xend=max(ohio_data_summarized$onset_date)+0.5,
            linetype=2
            )+
  geom_text_repel(data=ohio_data_summarized %>%
              filter(case_count_total>=75),
            mapping=aes(y=case_count_total,
                        color=color_pal,
                label=county),
            x=max(ohio_data_summarized$onset_date)+0.5,
            hjust=0,
            fontface="bold",
            nudge_y = 5,
            direction="y")+
  labs(y="# of Confirmed Cases",
       x="Date")+
  guides(color=F)+
  scale_color_identity()+
  scale_y_continuous(breaks=seq(0,1000,100))+
  scale_x_date(expand=expand_scale(mult=c(0,0.13)),
               breaks="1 week",
               labels = function(x) format(x,format="%m/%d"))+
  theme_bw()+
  theme(axis.text = element_text(size=12,face="bold"),
        title = element_text(size=14,face="bold"),
        panel.grid.minor = element_blank())

p_county_line_plot

STEP 2: Add gganimation components to ggplot

ggan_p_county_line_plot <- p_county_line_plot+
  labs(title = 'Ohio Counties: # of Confirmed Cases',  
       subtitle  =  'Date: {format(frame_along,format="%m/%d/%y")}')+
  transition_reveal(onset_date)

STEP 3: Create gif for each plot

Here is a quick explanation of the parameters we are providing to the animate function to create the gif:

duration = 15 secs: the gif will last for 15 seconds
fps = 9: there will be 9 frames per second (fps), so 9 frames/sec * 15 seconds = 135 frames total
end_pause = 27: the last frame will display 27 times (27 frames / 9 frames/sec = 3 second pause at the end)
height = 400 and width = 500: the resulting gif will be 400 by 500 pixels

an_p_county_line_plot <- animate(ggan_p_county_line_plot,
                                 duration=15,
                                 fps=9,
                                 end_pause=27,
                                 height=400,
                                 width=500)
an_p_county_line_plot

PLOT #3: Bar Plot for Top 10 Ohio Counties with Most Cases

Although much of what can be seen in a bar plot can be visualized using the above line plot, bar plots are useful for displaying total counts of data at one point in time.

We first saw a similar graph on a medium post, and benefitted tremendously from the code they shared in the post. Below is our attempt to repurpose their code for our dataset.

STEP 1: Generate static ggplot figure

p_bar_counties_cases <- ohio_data_summarized_top_10 %>%
  ggplot(aes(x=rank,y=case_count_total,fill=color_pal))+
  geom_col()+
  geom_text(aes(y=case_count_total,
                group=county,
                color=color_pal,
                label=as.character(case_count_total)),
            fontface="bold",
            size=4,
            show.legend = F,
            hjust=0)+
  geom_text(aes(group=county,
                color=color_pal,
                label=county),
            show.legend = F,
            fontface="bold",
            size=4,
            y=0,
            hjust=1.1)+
  scale_x_reverse(expand=expand_scale(mult=0))+
  scale_y_continuous(expand=expand_scale(mult=c(0.2,0.06)),
                     breaks=seq(0,10000,100))+
  scale_fill_identity()+
  scale_color_identity()+
  coord_flip()+
  guides(fill=F)+
  labs(y="# of Confirmed Cases")+
  theme_bw()+
  theme(axis.text = element_text(size=12,face="bold"),
        title = element_text(size=14,face="bold"),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank(),
        panel.border  = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank())
p_bar_counties_cases

STEP 2: Add gganimation components to ggplot

ggan_p_bar_counties_cases<- p_bar_counties_cases +
  transition_time(onset_date) +
  labs(title = 'Top 10 Ohio Counties with Most Cases',  
       subtitle  =  'Date: {format(frame_time,format="%m/%d/%y")}')

STEP 3: Create gif for each plot

Here is a quick explanation of the parameters we are providing to the animate function to create the gif:

duration = 15 secs: the gif will last for 15 seconds
fps = 9: there will be 9 frames per second (fps), so 9 frames/sec * 15 seconds = 135 frames total
end_pause = 27: the last frame will display 27 times (27 frames / 9 frames/sec = 3 second pause at the end)
height = 400 and width = 500: the resulting gif will be 400 by 500 pixels

an_p_bar_counties_cases <- animate(ggan_p_bar_counties_cases,
                                  duration=15,
                                  fps=9,
                                  end_pause = 27,
                                  height=400,
                                  width=500)

an_p_bar_counties_cases

Merge the Three .gif Files Together

Now that we have created our three individual .gif files we can merge them together using the magick package in R.

However, in our experience, this step was the biggest headache of all. Although we are not experts on the magick package and it does provide some very useful functionality for combining gganimate graphics, we found these functions were pretty unreliable. This step frequently led to endlessly running functions (literally > 10 hours with no results) or cause the R session to crash. Not sure if it is the computer we are using or how we are creating the files, but it was not a pleasant experience.

If you want to just combine two .gif files, we’d highly recommend using a gif combining website ( link here ) as it seems to be much more straightforward and much less headache.

However, if you really want to combine more than 2 .gif files, you are welcome to try your luck with the code below.

a_gif <- image_read(an_p_ohio_county_map)
b_gif <- image_read(an_p_county_line_plot)
c_gif <- image_read(an_p_bar_counties_cases)

new_gif_row <- image_append(c(a_gif[1],b_gif[1], c_gif[1]))
                          
for(i in 2:135){
  new_gif_row_i <- image_append(c(a_gif[i], b_gif[i], c_gif[i]))
  new_gif_row <- c(new_gif_row, new_gif_row_i)
}

new_gif_row

Case Western Reserve University School of Medicine, @rdwinkelman↩
The Ohio State University, @cwaltz38↩

Exploring the Temporal Evolution of COVID-19 Cases in the Ohio Counties

Robert Winkelman, MS¹

Colin Waltz²

Published: 2020-04-01
Last Updated: 2020-04-02

Background + Purpose

Load Libraries

Load and Prepare Data

STEP 1: Load Map of Ohio Data

STEP 2: Load and Clean ODH COVID-19 Data

Part 1: Convert onset_date to Date variable

Part 2: Expand and re-summarize dataframe so each county has a row for every date

STEP 3: Join ODH dataframe to Ohio map dataframe

STEP 4: Summarize data for cumulative cases each date

STEP 5: Summarize data to count the top 10 counties on each date

Make Visualizations

PLOT #1: Heat Map of Ohio COVID Cases by County

STEP 1: Generate static ggplot figure

STEP 2: Add gganimation components to ggplot

STEP 3: Create gif for each plot

PLOT #2: Line Plot of Ohio Counties Cases Over Time

STEP 1: Generate static ggplot figure

STEP 2: Add gganimation components to ggplot

STEP 3: Create gif for each plot

PLOT #3: Bar Plot for Top 10 Ohio Counties with Most Cases

STEP 1: Generate static ggplot figure

STEP 2: Add gganimation components to ggplot

STEP 3: Create gif for each plot

Merge the Three .gif Files Together

Exploring the Temporal Evolution of COVID-19 Cases in the Ohio Counties

Robert Winkelman, MS1

Colin Waltz2

Published: 2020-04-01 Last Updated: 2020-04-02

Background + Purpose

Load Libraries

Load and Prepare Data

STEP 1: Load Map of Ohio Data

STEP 2: Load and Clean ODH COVID-19 Data

Part 1: Convert onset_date to Date variable

Part 2: Expand and re-summarize dataframe so each county has a row for every date

STEP 3: Join ODH dataframe to Ohio map dataframe

STEP 4: Summarize data for cumulative cases each date

STEP 5: Summarize data to count the top 10 counties on each date

Make Visualizations

PLOT #1: Heat Map of Ohio COVID Cases by County

STEP 1: Generate static ggplot figure

STEP 2: Add gganimation components to ggplot

STEP 3: Create gif for each plot

PLOT #2: Line Plot of Ohio Counties Cases Over Time

STEP 1: Generate static ggplot figure

STEP 2: Add gganimation components to ggplot

STEP 3: Create gif for each plot

PLOT #3: Bar Plot for Top 10 Ohio Counties with Most Cases

STEP 1: Generate static ggplot figure

STEP 2: Add gganimation components to ggplot

STEP 3: Create gif for each plot

Merge the Three .gif Files Together

Robert Winkelman, MS¹

Colin Waltz²

Published: 2020-04-01
Last Updated: 2020-04-02