By now you will be familiar with different dashboards that visualize the latest developments of the COVID-19 pandemic. The most famous coronavirus dashboard is arguable the dashboard created by Johns Hopkins University. But how can you build such a dashboard yourself? This tutorial will teach you how. We will use data from the EU Open Data Portal and build our own simple dashboard in R. Along the way we will learn about two very powerful packages called leaflet
and shiny
. Here’s a screenshot of what the final product of this tutorial will look like:
On the left of the dashboard you can select a variable and a color palette. The main body of the dashboard then changes accordingly. (Unfortunately, RPubs didn’t allow me to host the dashboard directly. But if you click on the image you will get redirected to the dashboard.)
As always everything starts with the right kind of data. On the coronavirus page of the EU Open Data Portal you will find an Excel file called “COVID-19 cases worldwide”. I downloaded these data in an Excel file called “covid.xlsx” within a folder called “data”. We can then import these data into R using the readxl
package. To manipulate the data we will use the tidyverse
package.
library(readxl)
library(tidyverse)
# import data
covid <- read_excel("data/covid.xlsx") # read excel file
covid <- covid[complete.cases(covid),] # remove NAs
covid <- covid %>%
select(-day, -month, -year) %>% # drop variables
rename( # rename variables
date = dateRep,
country = countriesAndTerritories,
code2 = geoId,
code3 = countryterritoryCode
)
head(covid)
## # A tibble: 6 x 8
## date cases deaths country code2 code3 popData2019 continentExp
## <dttm> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 2020-07-01 00:00:00 279 13 Afghani… AF AFG 38041757 Asia
## 2 2020-06-30 00:00:00 271 12 Afghani… AF AFG 38041757 Asia
## 3 2020-06-29 00:00:00 351 18 Afghani… AF AFG 38041757 Asia
## 4 2020-06-28 00:00:00 165 20 Afghani… AF AFG 38041757 Asia
## 5 2020-06-27 00:00:00 276 8 Afghani… AF AFG 38041757 Asia
## 6 2020-06-26 00:00:00 460 36 Afghani… AF AFG 38041757 Asia
As we can see, we are dealing with a panel data set. One observation refers to one instance in time for one specific country. For example, on July 1st, 279 people tested positive on coronavirus in Afghanistan. The day before, 271 people had tested positive on coronavirus. For our purposes we need to get rid of the time dimension and aggregate these data per country. To to do so we use the commands group_by()
and summarise()
which are both included in the tidyverse
package.
# aggregate data by country
covid <- covid %>%
select(country,code3,cases,deaths,popData2019) %>%
group_by(country) %>%
summarise(code = code3[1],
cases = sum(cases),
deaths = sum(deaths),
population = popData2019[1]
)
head(covid)
## # A tibble: 6 x 5
## country code cases deaths population
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 31517 746 38041757
## 2 Albania ALB 2535 62 2862427
## 3 Algeria DZA 13907 912 43053054
## 4 Andorra AND 855 52 76177
## 5 Angola AGO 284 13 31825299
## 6 Anguilla AIA 3 0 14872
As you can see, we are now left with a single row of data per country. This row summarizes how many cases and deaths were reported in that country during the first half of 2020. Next, we compute three new variables that will tell us how lethal coronavirus has been in the different countries and how many people were infected / died in relation to the population size.
# new variables
covid$fatality <- covid$deaths/covid$cases
covid$casesMill <- covid$cases/covid$population*1000000
covid$deathsMill <- covid$deaths/covid$population*1000000
Of course, if we want to visualize the data on a map, we will need some geographic data. One possible source can be found in this GitHub repository. Simply copy-paste these data and save the file as “geo.json”. This will create a .json file instead of the typical .txt file. We import these data into R using the read_json()
function of the geojsonio
package. The data come in a somewhat special format, the “SpatialPolygonsDataFrame”:
library(geojsonio)
geo <- geojson_read("data/geo.json", what = "sp")
class(geo)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
Using the “@” operator, we can access the identifying variables in the larger spatial polygons data object:
# use "@" to view identifiers
head(geo@data)
## id name
## 1 AFG Afghanistan
## 2 AGO Angola
## 3 ALB Albania
## 4 ARE United Arab Emirates
## 5 ARG Argentina
## 6 ARM Armenia
Now, we need to add the COVID-19 data from above to these geographic data. We can do this using a very simple for-loop:
# add variables = 0
geo@data$cases <- 0
geo@data$deaths <- 0
geo@data$casesMill <- 0
geo@data$deathsMill <- 0
geo@data$fatality <- 0
# auxiliary function
'%!in%' <- function(x,y)!('%in%'(x,y))
# loop through each row and add covid data
for(i in 1:nrow(geo@data)){
id = geo@data$id[i]
if(id %!in% covid$code) next()
geo@data$cases[i] <- covid$cases[covid$code == id]
geo@data$deaths[i] <- covid$deaths[covid$code == id]
geo@data$fatality[i] <- covid$fatality[covid$code == id]
geo@data$casesMill[i] <- covid$casesMill[covid$code == id]
geo@data$deathsMill[i] <- covid$deathsMill[covid$code == id]
}
rm(i, id)
head(geo@data)
## id name cases deaths casesMill deathsMill fatality
## 1 AFG Afghanistan 31517 746 828.484342 19.6100301 0.023669766
## 2 AGO Angola 284 13 8.923718 0.4084801 0.045774648
## 3 ALB Albania 2535 62 885.612105 21.6599410 0.024457594
## 4 ARE United Arab Emirates 48667 315 4981.001023 32.2398200 0.006472558
## 5 ARG Argentina 64517 1307 1440.733084 29.1866972 0.020258227
## 6 ARM Armenia 25542 443 8635.682524 149.7771262 0.017343982
leaflet
To draw a map with different colors using the leaflet
package, we first have to define a function that turns the numeric values of a particular variable into a color code. We can do that using the colorBin()
function. In this function, we first need to specify a “domain”, which is really just one of the variables in our dataset, e.g., “fatality”. Next, we have to determine how many different shades of colors we would like to have in our plot. Put differently, this means into how many “bins” or “slices” we group the different numerical values of our domain. Here, we will select 20. An argument called “palette” determines the overall color palette to be used. For now, we will work with the built-in “cm.colors” palette. Lastly, the Boolean argument “pretty” specifies if the different bins should correspond to nice round values. Note this can lead to a number of bins that is slightly different than what we specified with the “bins” argument. However, “pretty” is nonetheless very useful in case you would like to add a legend to your plot (which we will skip in this tutorial).
library(leaflet)
# function to make colors
makeCol <- colorBin(
domain = geo@data$fatality,
bins = 20,
palette = heat.colors(20),
pretty = T
)
# colors for 0%, 1% and 2% fatality rate
makeCol(c(0,0.01,0.02))
## [1] "#FF0000" "#FF0E00" "#FF1C00"
The three outputs “#FF0000”, “#FF0E00” and “#FF1C00” are the three color codes corresponding to fatality rates of 0%, 1% and 2%. Now, we are ready to create a “leaflet” map object. The different polygons of this object will be the different colors. The setView()
function determines were the initial center of the map will be. Of course the user can move the map around and zoom in and out later on. Notice also the “smoothFactor” argument in the addPolygons()
function. This argument controls how precise the different polygon borders should be drawn. If you choose a high value, the polygons will be very precise, but the map will adjust more slowly if the user changes the zoom level.
# make colors
colors <- makeCol(geo@data$fatality)
# make map
map <- leaflet(geo) %>%
addPolygons(
stroke = F, # soft borders?
smoothFactor = 0.5, # border resolution
fillOpacity = .85, # opacity of colors
color = colors # which colors
) %>%
setView(lng = 20, # longitude
lat = 30, # latitude
zoom = 1 # zoom-factor
)
# show map
map
shiny
The shiny
package allows you to create interactive web applications with R. Every “Shiny” consists of two elements: a “user interface” (or simply “UI”) script and a “server” function. The UI script determines the visual appearance of the shiny and creates all the buttons and input fields that the user can interactive with. The server function does all the work in the background. Its inputs will be the decision made by the user, its outputs will be whatever the shiny is supposed to do, e.g., plot a variable or make a map. Let’s open a new R script and create a very basic Shiny.
library(shiny)
# some data
df <- data.frame(
name = c("Max","Michael","Tim"),
height = c(180,170,190)
)
# ui script
ui <- fluidPage( # a bare bones layout without any structure
# a simple dropdown menu
selectInput(inputId = "color",
label = "Make your selection:",
choices = c("Draw red bars" = 2,
"Draw green bars" = 3,
"Draw blue bars" = 4),
),
# the output plot
plotOutput("our_plot")
)
# server function
server <- function(input,output){
observe({ # a wrapper that observes the user's input
output$our_plot <- renderPlot( # send plot to output
barplot(df$height, col = input$color) # the barplot
)
})
}
# run shiny
shinyApp(ui,server)
As before, I can only show you a screenshot:
Of course this does not replace a full tutorial series on how to use Shiny. But you should get the main idea. If you want to learn more about Shiny, you can click here.