How to create a COVID-19 Dashboard in R

By now you will be familiar with different dashboards that visualize the latest developments of the COVID-19 pandemic. The most famous coronavirus dashboard is arguable the dashboard created by Johns Hopkins University. But how can you build such a dashboard yourself? This tutorial will teach you how. We will use data from the EU Open Data Portal and build our own simple dashboard in R. Along the way we will learn about two very powerful packages called leaflet and shiny. Here’s a screenshot of what the final product of this tutorial will look like:

On the left of the dashboard you can select a variable and a color palette. The main body of the dashboard then changes accordingly. (Unfortunately, RPubs didn’t allow me to host the dashboard directly. But if you click on the image you will get redirected to the dashboard.)

Data

As always everything starts with the right kind of data. On the coronavirus page of the EU Open Data Portal you will find an Excel file called “COVID-19 cases worldwide”. I downloaded these data in an Excel file called “covid.xlsx” within a folder called “data”. We can then import these data into R using the readxl package. To manipulate the data we will use the tidyverse package.

library(readxl)
library(tidyverse)

# import data
covid <- read_excel("data/covid.xlsx")    # read excel file
  covid <- covid[complete.cases(covid),]  # remove NAs
  covid <- covid %>% 
    select(-day, -month, -year) %>%       # drop variables
    rename(                               # rename variables
      date = dateRep,
      country = countriesAndTerritories,
      code2 = geoId,
      code3 = countryterritoryCode
      )
  
head(covid)

## # A tibble: 6 x 8
##   date                cases deaths country  code2 code3 popData2019 continentExp
##   <dttm>              <dbl>  <dbl> <chr>    <chr> <chr>       <dbl> <chr>       
## 1 2020-07-01 00:00:00   279     13 Afghani… AF    AFG      38041757 Asia        
## 2 2020-06-30 00:00:00   271     12 Afghani… AF    AFG      38041757 Asia        
## 3 2020-06-29 00:00:00   351     18 Afghani… AF    AFG      38041757 Asia        
## 4 2020-06-28 00:00:00   165     20 Afghani… AF    AFG      38041757 Asia        
## 5 2020-06-27 00:00:00   276      8 Afghani… AF    AFG      38041757 Asia        
## 6 2020-06-26 00:00:00   460     36 Afghani… AF    AFG      38041757 Asia

As we can see, we are dealing with a panel data set. One observation refers to one instance in time for one specific country. For example, on July 1st, 279 people tested positive on coronavirus in Afghanistan. The day before, 271 people had tested positive on coronavirus. For our purposes we need to get rid of the time dimension and aggregate these data per country. To to do so we use the commands group_by() and summarise() which are both included in the tidyverse package.

# aggregate data by country
covid <- covid %>% 
  select(country,code3,cases,deaths,popData2019) %>%
  group_by(country) %>%
  summarise(code = code3[1],
            cases = sum(cases),
            deaths = sum(deaths),
            population = popData2019[1]
  )
head(covid)

## # A tibble: 6 x 5
##   country     code  cases deaths population
##   <chr>       <chr> <dbl>  <dbl>      <dbl>
## 1 Afghanistan AFG   31517    746   38041757
## 2 Albania     ALB    2535     62    2862427
## 3 Algeria     DZA   13907    912   43053054
## 4 Andorra     AND     855     52      76177
## 5 Angola      AGO     284     13   31825299
## 6 Anguilla    AIA       3      0      14872

As you can see, we are now left with a single row of data per country. This row summarizes how many cases and deaths were reported in that country during the first half of 2020. Next, we compute three new variables that will tell us how lethal coronavirus has been in the different countries and how many people were infected / died in relation to the population size.

# new variables
covid$fatality <- covid$deaths/covid$cases
covid$casesMill <- covid$cases/covid$population*1000000
covid$deathsMill <- covid$deaths/covid$population*1000000

Of course, if we want to visualize the data on a map, we will need some geographic data. One possible source can be found in this GitHub repository. Simply copy-paste these data and save the file as “geo.json”. This will create a .json file instead of the typical .txt file. We import these data into R using the read_json() function of the geojsonio package. The data come in a somewhat special format, the “SpatialPolygonsDataFrame”:

library(geojsonio)
geo <- geojson_read("data/geo.json", what = "sp")
class(geo)

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

Using the “@” operator, we can access the identifying variables in the larger spatial polygons data object:

# use "@" to view identifiers
head(geo@data)

##    id                 name
## 1 AFG          Afghanistan
## 2 AGO               Angola
## 3 ALB              Albania
## 4 ARE United Arab Emirates
## 5 ARG            Argentina
## 6 ARM              Armenia

Now, we need to add the COVID-19 data from above to these geographic data. We can do this using a very simple for-loop:

# add variables = 0
geo@data$cases <- 0
geo@data$deaths <- 0
geo@data$casesMill <- 0
geo@data$deathsMill <- 0
geo@data$fatality <- 0

# auxiliary function
'%!in%' <- function(x,y)!('%in%'(x,y))

# loop through each row and add covid data
for(i in 1:nrow(geo@data)){
  id = geo@data$id[i]
  if(id %!in% covid$code) next()
  geo@data$cases[i] <- covid$cases[covid$code == id]
  geo@data$deaths[i] <- covid$deaths[covid$code == id]
  geo@data$fatality[i] <- covid$fatality[covid$code == id]
  geo@data$casesMill[i] <- covid$casesMill[covid$code == id]
  geo@data$deathsMill[i] <- covid$deathsMill[covid$code == id]
}
rm(i, id)
head(geo@data)

##    id                 name cases deaths   casesMill  deathsMill    fatality
## 1 AFG          Afghanistan 31517    746  828.484342  19.6100301 0.023669766
## 2 AGO               Angola   284     13    8.923718   0.4084801 0.045774648
## 3 ALB              Albania  2535     62  885.612105  21.6599410 0.024457594
## 4 ARE United Arab Emirates 48667    315 4981.001023  32.2398200 0.006472558
## 5 ARG            Argentina 64517   1307 1440.733084  29.1866972 0.020258227
## 6 ARM              Armenia 25542    443 8635.682524 149.7771262 0.017343982

Making a map using `leaflet`

To draw a map with different colors using the leaflet package, we first have to define a function that turns the numeric values of a particular variable into a color code. We can do that using the colorBin() function. In this function, we first need to specify a “domain”, which is really just one of the variables in our dataset, e.g., “fatality”. Next, we have to determine how many different shades of colors we would like to have in our plot. Put differently, this means into how many “bins” or “slices” we group the different numerical values of our domain. Here, we will select 20. An argument called “palette” determines the overall color palette to be used. For now, we will work with the built-in “cm.colors” palette. Lastly, the Boolean argument “pretty” specifies if the different bins should correspond to nice round values. Note this can lead to a number of bins that is slightly different than what we specified with the “bins” argument. However, “pretty” is nonetheless very useful in case you would like to add a legend to your plot (which we will skip in this tutorial).

library(leaflet)

# function to make colors
makeCol <- colorBin(
    domain = geo@data$fatality, 
    bins = 20, 
    palette = heat.colors(20),   
    pretty = T 
  )

# colors for 0%, 1% and 2% fatality rate 
makeCol(c(0,0.01,0.02))

## [1] "#FF0000" "#FF0E00" "#FF1C00"

The three outputs “#FF0000”, “#FF0E00” and “#FF1C00” are the three color codes corresponding to fatality rates of 0%, 1% and 2%. Now, we are ready to create a “leaflet” map object. The different polygons of this object will be the different colors. The setView() function determines were the initial center of the map will be. Of course the user can move the map around and zoom in and out later on. Notice also the “smoothFactor” argument in the addPolygons() function. This argument controls how precise the different polygon borders should be drawn. If you choose a high value, the polygons will be very precise, but the map will adjust more slowly if the user changes the zoom level.

# make colors
colors <- makeCol(geo@data$fatality)

# make map
map <- leaflet(geo) %>%
  addPolygons(
    stroke = F,         # soft borders?
    smoothFactor = 0.5, # border resolution 
    fillOpacity = .85,  # opacity of colors
    color = colors      # which colors
  ) %>%
  setView(lng = 20, # longitude
          lat = 30, # latitude
          zoom = 1  # zoom-factor
  )

# show map
map

The basics of `shiny`

The shiny package allows you to create interactive web applications with R. Every “Shiny” consists of two elements: a “user interface” (or simply “UI”) script and a “server” function. The UI script determines the visual appearance of the shiny and creates all the buttons and input fields that the user can interactive with. The server function does all the work in the background. Its inputs will be the decision made by the user, its outputs will be whatever the shiny is supposed to do, e.g., plot a variable or make a map. Let’s open a new R script and create a very basic Shiny.

library(shiny)

# some data
df <- data.frame(
  name = c("Max","Michael","Tim"),
  height = c(180,170,190)
)

# ui script
ui <- fluidPage( # a bare bones layout without any structure
  
  # a simple dropdown menu
  selectInput(inputId = "color", 
              label = "Make your selection:",
              choices = c("Draw red bars" = 2,
                          "Draw green bars" = 3,
                          "Draw blue bars" = 4),
  ),
  
  # the output plot
  plotOutput("our_plot")
)

# server function
server <- function(input,output){
  
  observe({ # a wrapper that observes the user's input
    output$our_plot <- renderPlot( # send plot to output
      barplot(df$height, col = input$color) # the barplot
    )
  })
  
}

# run shiny
shinyApp(ui,server)

As before, I can only show you a screenshot:

Of course this does not replace a full tutorial series on how to use Shiny. But you should get the main idea. If you want to learn more about Shiny, you can click here.

Making our own coronavirus dashboard

We will now use the same basic building blocks to construct our own coronavirus dashboard. We will need three additional packages: viridis, RColorBrewer and shinydashboard. In our user interface we will now use the dashboardPage() layout with a header and a sidebar instead of the simple fluidpage() layout. In the main body, we will specify a leafletOutput() instead of the earlier plotOutput(). Lastly, within in server function, we need to wrap the reactive() function around our makeColor() function so that new colors are generated in case the user specifies a different variable or color palette.

library(viridis) # additional color palettes
library(RColorBrewer) # additional color palettes
library(shinydashboard) # make dashboards in shiny 

ui <- dashboardPage(
  
  # title
  dashboardHeader(title = "COVID-19 Dashboard"),
  
  # sidebar
  dashboardSidebar(
    
    selectInput("variable", "Variable:", 
                c("Fatality rate" = "fatality",
                  "Cases" = "cases", 
                  "Cases/Million" = "casesMill",
                  "Deaths" = "deaths",
                  "Deaths/Million" = "deathsMill"
                  )
    ),
    
    selectInput("palette", "Color palette:",
                c("Magma",
                  "Red",
                  "Blue",
                  "Viridis")
                )
  ),
  
  # main body
  dashboardBody(
    
    # make sure the leaflet output fills the entire screen
    tags$style(type = "text/css", "#map {height: calc(100vh - 80px) !important;}"),
    
    # output
    leafletOutput("map")
  )

)

server <- function(input, output){

  # function to make colors
  makeCol <- reactive({
    colorBin(
      if(input$palette == "Blue"){
        palette = brewer.pal(9, "Blues")
      } else if(input$palette == "Red"){
        palette = brewer.pal(9, "Reds")
      } else if(input$palette == "Magma"){
        palette = rev(magma(587))
      } else palette = rev(viridis(587)),
      domain = geo@data[,input$variable],
      bins = 20,
      pretty = T
    )
  })
  
  # make map
  observe({
    colors <- makeCol()
    map <- leaflet(geo) %>%
      addPolygons(
        stroke = F,
        smoothFactor = 0.5,
        fillOpacity = .85,
        color = ~colors(geo@data[,input$variable])
      ) %>%
      setView(lng = 20,
              lat = 30,
              zoom = 1
      )
    output$map <- renderLeaflet(map)
  })
  
}

# run shiny
shinyApp(ui,server)

And that’s it. We just created a simple COVID-19 dashboard. Of course, with better data, we could a lot more features to it. Users might for example be interested in how much testing is done in different countries or they might like to know about local and regional outbreaks of coronavirus. Lastly, and that is something you could also do with the coronavirus data used here, users might like to see how coronavirus spread over time. So have fun developing your own Shiny apps.

How to create a COVID-19 Dashboard in R

Martin Stefan

7/2/2020

Data

Making a map using `leaflet`

The basics of `shiny`

Making our own coronavirus dashboard

How to create a COVID-19 Dashboard in R

Martin Stefan

7/2/2020

Data

Making a map using leaflet

The basics of shiny

Making our own coronavirus dashboard

Making a map using `leaflet`

The basics of `shiny`