COVID-19 Visualizations

Introduction

Many public health and other research organizations are actively tracking COVID-19 cases around the world. One of the best is the Center for Systems Science and Engineering at Johns Hopkins University. They have a great interactive map here:

https://coronavirus.jhu.edu/map.html

They are also posting updated data to a GitHub repository:

https://github.com/CSSEGISandData/COVID-19

The following activity is modified from an assignment by my colleague Randall Pruim at Calvin University in Michigan. You can refer to it for additional background on the repository:

https://rpruim.github.io/ds303/S20/hw/covid-19/covid-19.html

Importing the Data

The daily reports are best for making single-day maps that cover many countries or regions. The three time series files are best for line graphs that pertain to a few countries or regions over a period of time.

When we import, we want to work from the raw files. To find the raw URL, open the file you would like to import in the repository and click the Raw button.

Let’s import the daily report for a particular day:
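
Here is a minimal sketch of such an import. It assumes the daily reports are stored as one MM-DD-YYYY.csv file per day in the csse_covid_19_daily_reports folder of the repository. The helper names daily_report_url() and read_daily_report() and the example date are ours, chosen only for illustration.

```r
# Raw folder that holds the daily reports (assumed layout: one MM-DD-YYYY.csv per day).
daily_base <- paste0(
  "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/",
  "csse_covid_19_data/csse_covid_19_daily_reports/"
)

# Hypothetical helper: build the raw URL for one day's report.
daily_report_url <- function(date) {
  paste0(daily_base, format(as.Date(date), "%m-%d-%Y"), ".csv")
}

# Hypothetical helper: download and parse one daily report.
# check.names = FALSE keeps column names such as "Province/State" intact.
read_daily_report <- function(date) {
  read.csv(daily_report_url(date), check.names = FALSE, stringsAsFactors = FALSE)
}

url <- daily_report_url("2020-03-21")  # illustrative date
print(url)
# daily <- read_daily_report("2020-03-21")  # needs an internet connection
```

Uncommenting the last line stores the report in daily, the name used in the rest of this activity.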

Let’s also get the time series data. First, the three URLs:
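
One way to build the three URLs is to paste a shared prefix onto the three series names. The variable names ts_base, series and urls are our own choices for this sketch.

```r
# Folder holding the three time-series files in the raw repository.
ts_base <- paste0(
  "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/",
  "csse_covid_19_data/csse_covid_19_time_series/"
)

# One URL per series; paste0() is vectorized over the series names.
series <- c("Confirmed", "Deaths", "Recovered")
urls <- paste0(ts_base, "time_series_19-covid-", series, ".csv")
print(urls)
```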

## [1] "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
## [2] "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"   
## [3] "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv"

Now we can read in the time-series data for confirmed cases of COVID-19, number of deaths and number of people who have recovered:

A U.S. Map for a Single Day

Let’s have a look at daily. (We will give the reactable package a try. It is a worthy alternative to the DT package; see its documentation for details.)

When we search for “US”, we see that data for the United States is available by state:

Looking through the table above, we see a few “locations” (including cruise ships!) that are not among the fifty states plus District of Columbia. For ease in mapping, let’s remove them:
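
A sketch of the filtering step, using base R’s built-in state.name vector as the list of states to keep. The small us table below, its column names and its counts are made up for illustration; the real table comes from the daily report.

```r
# Made-up stand-in for the US rows of the daily report (assumed column names).
us <- data.frame(
  state = c("New York", "Washington", "Diamond Princess", "Guam",
            "District of Columbia"),
  confirmed = c(100, 80, 45, 5, 10),
  stringsAsFactors = FALSE
)

# state.name holds the fifty state names; the District of Columbia is added by hand.
keep <- c(state.name, "District of Columbia")

# Keep only rows whose location is a state or DC.
us <- us[us$state %in% keep, ]
print(us$state)
```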

In the us data table, state names are given in full. In order to make maps it’s best to have abbreviations for the states, or even their FIPS codes. We could scrape FIPS codes and abbreviations from the web, but we’ll just grab them from the statepop dataset in the package usmap:

## Observations: 51
## Variables: 4
## $ fips     <chr> "01", "02", "04", "05", "06", "08", "09", "10", "11", "12", …
## $ abbr     <chr> "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", …
## $ full     <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", "C…
## $ pop_2015 <dbl> 4858979, 738432, 6828065, 2978204, 39144818, 5456574, 359088…

We see that the column holding the full state names in statepop is called full. We can now join us with statepop and prepare to make a choropleth map with plotly:
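
The join might look like the sketch below. To keep it self-contained we rebuild the first three rows of statepop from the output shown earlier; the us table, its state column and its counts are made up for illustration.

```r
library(dplyr)

# First three rows of statepop, copied from the output shown earlier.
statepop_sample <- tibble(
  fips = c("01", "02", "04"),
  abbr = c("AL", "AK", "AZ"),
  full = c("Alabama", "Alaska", "Arizona"),
  pop_2015 = c(4858979, 738432, 6828065)
)

# Made-up stand-in for the filtered daily report; `state` is an assumed column name.
us <- tibble(
  state = c("Alabama", "Alaska", "Arizona"),
  confirmed = c(10, 5, 20)
)

# Match the state name in `us` against the `full` column of statepop.
us_map <- us %>%
  left_join(statepop_sample, by = c("state" = "full"))

print(us_map)
```

With the real data, usmap::statepop would take the place of statepop_sample.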

Borrowing from examples in the Plotly R documentation, we try the following plot:
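
A minimal choropleth along the lines of the Plotly examples, assuming a joined table with a two-letter abbr column. The data, the hover text and the title below are illustrative, not the values from the actual report.

```r
library(plotly)

# Made-up stand-in for the joined table (assumed column names).
us_map <- data.frame(
  abbr = c("AL", "AK", "AZ"),
  state = c("Alabama", "Alaska", "Arizona"),
  confirmed = c(10, 5, 20),
  stringsAsFactors = FALSE
)
us_map$hover <- paste0(us_map$state, ": ", us_map$confirmed, " confirmed")

# Restrict the map to the United States and use the Albers USA projection.
geo <- list(scope = "usa", projection = list(type = "albers usa"))

p <- plot_ly(
  us_map,
  type = "choropleth",
  locationmode = "USA-states",  # locations are two-letter state abbreviations
  locations = ~abbr,
  z = ~confirmed,               # variable that drives the fill color
  text = ~hover,
  colorscale = "Reds"
) %>%
  layout(title = "Confirmed COVID-19 cases by state", geo = geo)
```

Printing p in the console or in an R Markdown chunk renders the interactive map.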

Time-Series Line Plots for Confirmed Cases

Let’s say that we are interested in just the following countries, for now:

A ggplotly() Approach

Let’s have a look at Confirmed:

For a line-plot with ggplot2, it would be best to have a data table where each row is a single country, on a single day, with variables being:

  • the date, and
  • the total number of confirmed cases in that country by that date.

Accordingly, we will reshape the data. Instead of tidyr::gather() we will experiment with its preferred successor, tidyr::pivot_longer():
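
The reshaping step can be sketched as follows. The tiny Confirmed table is made up, but it follows the wide layout of the time-series files: four identifying columns followed by one column per date. The output column names date and confirmed are our choice.

```r
library(tidyr)

# Made-up table in the wide layout of the time-series files.
Confirmed <- tibble::tibble(
  `Province/State` = c(NA, "Hubei"),
  `Country/Region` = c("Italy", "China"),
  Lat = c(43, 30.98),
  Long = c(12, 112.27),
  `1/22/20` = c(0, 10),
  `1/23/20` = c(1, 12)
)

# Every column except the four identifiers is a date; stack those columns
# into a `date` column and a `confirmed` column.
confirmed_long <- pivot_longer(
  Confirmed,
  cols = -c(`Province/State`, `Country/Region`, Lat, Long),
  names_to = "date",
  values_to = "confirmed"
)

print(confirmed_long)
```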

We should also:

  • filter to the desired countries;
  • transform the dates from strings to date objects;
  • group by country and add up the confirmed cases in each region of the country (if applicable);
  • rename some variables (for convenience);
  • add some tool-tip text for Plotly.

Here we go:
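
One possible pipeline for the steps listed above, run on a small made-up long table. The country selection, the new column names and the tool-tip wording are illustrative assumptions.

```r
library(dplyr)

# Made-up long-format stand-in for the reshaped table.
confirmed_long <- tibble(
  `Province/State` = c("Hubei", "Beijing", "Hubei", "Beijing", NA, NA),
  `Country/Region` = c("China", "China", "China", "China", "Italy", "Italy"),
  date = c("1/22/20", "1/22/20", "1/23/20", "1/23/20", "1/22/20", "1/23/20"),
  confirmed = c(10, 2, 12, 3, 0, 1)
)

countries <- c("China", "Italy")  # illustrative selection

confirmed_country <- confirmed_long %>%
  rename(country = `Country/Region`) %>%              # shorter name
  filter(country %in% countries) %>%                  # desired countries only
  mutate(date = as.Date(date, format = "%m/%d/%y")) %>%  # strings to Date objects
  group_by(country, date) %>%
  summarise(confirmed = sum(confirmed), .groups = "drop") %>%  # add up regions
  mutate(tooltip = paste0(country, "\n", date, "\n", confirmed, " confirmed"))

print(confirmed_country)
```

The .groups argument assumes a reasonably recent version of dplyr; with an older version, follow summarise() with ungroup() instead.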

As a first step, let’s make a regular, non-interactive line plot:
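
A sketch of the static plot, assuming a long table with columns country, date and confirmed. The numbers are made up.

```r
library(ggplot2)

# Made-up long table: one row per country per day.
df <- data.frame(
  country = rep(c("China", "Italy"), each = 3),
  date = rep(as.Date("2020-01-22") + 0:2, times = 2),
  confirmed = c(12, 15, 20, 0, 1, 3)
)

# One line per country, colored by country.
p <- ggplot(df, aes(x = date, y = confirmed, color = country)) +
  geom_line() +
  labs(x = "Date", y = "Confirmed cases", color = "Country")
```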

So for the interactive plot we only need the following:
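
The conversion can be sketched like this, again on made-up data. We assume the tool-tip text sits in a column called tooltip. Note the explicit group aesthetic: once a text aesthetic is mapped, geom_line() needs to be told which rows belong to the same line.

```r
library(ggplot2)
library(plotly)

# Made-up long table with a tool-tip column (assumed name: tooltip).
df <- data.frame(
  country = rep(c("China", "Italy"), each = 3),
  date = rep(as.Date("2020-01-22") + 0:2, times = 2),
  confirmed = c(12, 15, 20, 0, 1, 3)
)
df$tooltip <- paste0(df$country, "\n", df$date, "\n", df$confirmed, " confirmed")

p <- ggplot(
  df,
  aes(x = date, y = confirmed, color = country, group = country, text = tooltip)
) +
  geom_line()

# tooltip = "text" tells Plotly to show only our custom text on hover.
ip <- ggplotly(p, tooltip = "text")
```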

A dygraphs Approach

For a dygraph, we need the data to be a bit wider: a separate column for the confirmed cases in each country.
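
Widening is the reverse of the earlier reshaping, so tidyr::pivot_wider() is a natural fit. A sketch on made-up data, assuming columns named date, country and confirmed:

```r
library(tidyr)

# Made-up long table: one row per country per day.
long <- tibble::tibble(
  country = rep(c("China", "Italy"), each = 3),
  date = rep(as.Date("2020-01-22") + 0:2, times = 2),
  confirmed = c(12, 15, 20, 0, 1, 3)
)

# One row per date, one column of confirmed counts per country.
wide <- pivot_wider(long, names_from = country, values_from = confirmed)
print(wide)
```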

Let’s examine the results:

Dygraphs need the data as a recognizable time-series, with clear indication of the variable on the x-axis and a specification of how that variable is to be ordered.

We intend to put the date on the x-axis, so let’s pull out all of our dates, eliminate duplicate dates and arrange them in order:

The package xts is used for time-series analysis. It has a function xts() that converts a data frame to a class that is recognizable to time-series plotting packages like dygraphs. We will make use of it below:
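
A small illustration of xts() on a made-up wide table. The rows are deliberately out of date order to show that xts() sorts by the order.by index.

```r
library(xts)

# Made-up wide table, with rows deliberately out of date order.
wide <- data.frame(
  date = as.Date(c("2020-01-23", "2020-01-22", "2020-01-24")),
  China = c(15, 12, 20),
  Italy = c(1, 0, 3)
)

# The country columns are the data; the dates become the time index.
confirmed_xts <- xts(
  x = as.matrix(wide[, c("China", "Italy")]),
  order.by = wide$date
)

print(confirmed_xts)
```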

Thinking ahead to Shiny applications, let’s see whether we can encapsulate our work in a function:

Let’s test it:
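
One possible version of such a function, together with a test call on made-up data. The function name confirmed_dygraph(), its arguments and the plot title are hypothetical; it assumes a long table with columns country, date and confirmed.

```r
library(dplyr)
library(tidyr)
library(xts)
library(dygraphs)

# Hypothetical helper: long table in, dygraph of the chosen countries out.
confirmed_dygraph <- function(data, countries) {
  wide <- data %>%
    filter(country %in% countries) %>%
    select(date, country, confirmed) %>%
    pivot_wider(names_from = country, values_from = confirmed) %>%
    arrange(date)
  # Country columns become the series; dates become the time index.
  series <- xts(as.matrix(select(wide, -date)), order.by = wide$date)
  dygraph(series, main = "Confirmed COVID-19 cases") %>%
    dyRangeSelector()
}

# Made-up long table for the test call.
sample_long <- tibble(
  country = rep(c("China", "Italy", "Spain"), each = 3),
  date = rep(as.Date("2020-01-22") + 0:2, times = 3),
  confirmed = c(12, 15, 20, 0, 1, 3, 0, 0, 2)
)

g <- confirmed_dygraph(sample_long, countries = c("China", "Italy"))
```

Because the function takes the country selection as an argument, a Shiny app could pass it an input value directly.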

Note to Students: Where to Go From Here

The above investigations have not rendered us COVID-19 experts, but at least they have afforded a good review of many of the data science tools that we have studied over the course of the academic year. You can also mine them for hints on some current homework problems!

We are in the midst of selecting projects for the course. Hopefully our COVID-19 investigations will suggest ideas for further work that could be turned into a project proposal. Consider the following:

  • Figure out how to make a world map of confirmed cases (or deaths, or recoveries).
  • Extend the time-series plotting to deaths and recoveries.
  • Consider finding other data online that could be usefully joined with the CSSE data. For example, could one find how many people have been tested for COVID-19 and compute the rate at which tested persons are found to be infected?
  • Shiny-fy your work?

Homer White

23 March, 2020