Introduction

This website was created in partial fulfilment of the Developing Data Products course, one of the five courses that make up the Data Science: Statistics and Machine Learning Specialization offered by Johns Hopkins University through Coursera. The assignment challenged candidates to create a web page using R Markdown that features a map created with Leaflet. Leaflet is one of the most popular open-source JavaScript libraries for interactive maps, and the leaflet R package makes it easy to integrate and control Leaflet maps in R. Once completed, candidates were required to host their web page on GitHub Pages, RPubs, or NeoCities.

Rationale

The Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in December 2019 in Wuhan, the capital of China’s Hubei province, and has since spread globally, resulting in the ongoing 2019–20 coronavirus pandemic. For this coursework project, I have used Leaflet to map the incidence of the novel coronavirus among CARICOM Member States. All CARICOM countries are classified as developing countries. They are all relatively small in population and land area, and diverse in geography, population, culture and levels of economic and social development. While the pandemic was slow to reach the CARICOM region, the beginning of March 2020 saw its onset among CARICOM Member States.

Data Sources

With a view to mapping the spread of the disease thus far, I have elected to use two main data sources. Firstly, to obtain the most current data on the incidence of COVID-19, I have used the data collected by the Johns Hopkins Coronavirus Resource Centre. The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE is compiled daily from a cross-section of sources. At the time of the preparation of this project, these included the following:

Secondly, to supplement this data with relevant socio-demographic data, I have used the World Development Indicators database maintained by the World Bank Group. The World Development Indicators is a compilation of relevant, high-quality, and internationally comparable statistics about global development and the fight against poverty. The database contains 1,600 time series indicators for 217 economies and more than 40 country groups, with data for many indicators going back more than 50 years.

The following code outlines the process for getting and cleaning the data relevant to mapping the incidence and spread of the novel coronavirus in the CARICOM region.

Importing and Cleaning Time Series Data from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository

In order to map the pandemic, the following script draws on the most recent data collected by the Johns Hopkins Coronavirus Resource Centre, then attaches ISO 3166-1 alpha-3 (three-letter) country codes to each country to facilitate merging with the World Bank data.

# Import Required Data ----------------------------------------------------
# Import the most recent COVID-19 confirmed cases and deaths
library(tidyverse)    # attaches readr, dplyr and tidyr; dplyr re-exports last_col()
library(countrycode)

jhu_url <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/"

# Total Confirmed Cases
# The file is wide: one row per Province/State and one column per date,
# so last_col() picks up the most recent cumulative count.
covid_19_confirmed <- read_csv(paste0(jhu_url, "time_series_covid19_confirmed_global.csv"),
                               col_types = cols()) %>% 
  select(province = `Province/State`,
         country = `Country/Region`,
         lat = Lat,
         lng = Long,
         confirmed = last_col())

# Total Deaths
covid_19_deaths <- read_csv(paste0(jhu_url, "time_series_covid19_deaths_global.csv"),
                            col_types = cols()) %>% 
  select(province = `Province/State`,
         country = `Country/Region`,
         deaths = last_col())

# Join on province and country rather than on iso3c: countries such as the
# United Kingdom and France span several rows, and entries such as cruise
# ships have no ISO code, so a country-level key would duplicate rows.
covid_19_confirmed_and_deaths <- covid_19_confirmed %>% 
  left_join(covid_19_deaths, 
            by = c("province", "country")) %>% 
  # Attach ISO 3166-1 alpha-3 codes for the merge with the World Bank data;
  # names without an ISO match (e.g. cruise ships) become NA, with a warning.
  mutate(iso3c = countrycode(country,
                             origin = "country.name",
                             destination = "iso3c"))
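The cleaning step relies on two properties of the JHU files: they are wide, with one column per reporting date, so `last_col()` returns the latest cumulative count; and some countries are split across several Province/State rows. The toy sketch below, built on invented country names and numbers (nothing here comes from the real files), shows why a join keyed on the country alone multiplies rows for such countries, while keying on province and country keeps one row per source row.

```r
library(dplyr)

# Hypothetical miniature JHU-style files: one row per Province/State and
# one column per reporting date. All names and numbers are invented.
confirmed_toy <- tibble(
  `Province/State` = c(NA, "Island X", NA),
  `Country/Region` = c("Alpha", "Beta", "Beta"),
  `6/2/20` = c(10, 3, 50),
  `6/3/20` = c(12, 4, 55)
)
deaths_toy <- confirmed_toy %>%
  mutate(`6/2/20` = c(0, 0, 2), `6/3/20` = c(1, 0, 3))

# last_col() selects the right-most column, i.e. the latest date.
conf <- confirmed_toy %>%
  select(province = `Province/State`, country = `Country/Region`,
         confirmed = last_col())
dths <- deaths_toy %>%
  select(province = `Province/State`, country = `Country/Region`,
         deaths = last_col())

# Keyed on country alone, each Beta row pairs with both Beta rows:
# 1 row for Alpha + 2 x 2 rows for Beta = 5 (dplyr >= 1.1 also warns).
by_country <- left_join(conf, select(dths, -province), by = "country")

# Keyed on province and country, there is one row per source row (3).
# dplyr matches NA to NA in keys by default, so national rows still pair.
by_province <- left_join(conf, dths, by = c("province", "country"))

nrow(by_country)   # 5
nrow(by_province)  # 3
```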

Importing and Cleaning Socioeconomic Data from the World Development Indicators Database

Once the data frame of confirmed COVID-19 cases and deaths has been created, one can proceed to import other relevant socioeconomic variables. For this analysis, I have imported the following indicators:

  • GDP per capita (constant 2010 US$)
  • Population, total
  • Population ages 0-14 (% of total population)
  • Population ages 15-64 (% of total population)
  • Population ages 65 and above (% of total population)
  • Diabetes prevalence (% of population ages 20 to 79)
  • Cause of death, by non-communicable diseases (% of total)
  • Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70 (%)
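
# Import Relevant Socio-Economic Data from the World Development Indicators Database
library(wbstats)    # wb() belongs to the pre-1.0 wbstats API; later versions provide wb_data()
library(tidyverse)  # dplyr and tidyr verbs used below

series <- c("NY.GDP.PCAP.KD", "SP.POP.TOTL", "SP.POP.0014.TO.ZS", "SP.POP.1564.TO.ZS",
            "SP.POP.65UP.TO.ZS", "SH.STA.DIAB.ZS", "SH.DTH.NCOM.ZS", "SH.DYN.NCOM.ZS")

# mrv = 1 returns the most recent value of each indicator for each economy;
# the "_2018" suffixes assume that the latest available year is 2018.
wb_data <- wb(indicator = series,
              mrv = 1) %>% 
  select(iso3c, value, indicatorID) %>% 
  # Long to wide: one row per economy, one column per indicator
  # (pivot_wider() supersedes spread()).
  pivot_wider(names_from = indicatorID, values_from = value) %>% 
  rename(gdp_capita_2018 = NY.GDP.PCAP.KD,
         pop_2018 = SP.POP.TOTL,
         pop_0_14_2018 = SP.POP.0014.TO.ZS,
         pop_15_64_2018 = SP.POP.1564.TO.ZS,
         pop_65_over_2018 = SP.POP.65UP.TO.ZS,
         diabetes_20_79 = SH.STA.DIAB.ZS,
         death_by_ncd = SH.DTH.NCOM.ZS,
         death_by_cvd_ca_dm_30_70 = SH.DYN.NCOM.ZS)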
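The World Bank indicators arrive in long form (one row per economy and indicator), which is why the script reshapes them before renaming. A sketch of the same reshaping with tidyr's `pivot_wider()` on a hypothetical two-economy extract is shown below; the codes "AAA"/"BBB" and all values are invented, and only the column names mirror the `wb()` output used above.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format extract shaped like the wb() output:
# one row per economy (iso3c) and indicator (indicatorID). Values are invented.
wb_long <- tibble(
  iso3c       = c("AAA", "AAA", "BBB", "BBB"),
  indicatorID = c("SP.POP.TOTL", "NY.GDP.PCAP.KD", "SP.POP.TOTL", "NY.GDP.PCAP.KD"),
  value       = c(100000, 5000, 250000, 12000)
)

# One row per economy, one column per indicator, then readable names.
wb_wide <- wb_long %>%
  pivot_wider(names_from = indicatorID, values_from = value) %>%
  rename(pop = SP.POP.TOTL, gdp_capita = NY.GDP.PCAP.KD)

wb_wide
```

The wide layout is what makes the later per-country join and the per-100,000 arithmetic straightforward, since every indicator becomes an ordinary column.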

Merging the COVID-19 cases with the World Bank Data

Now that both data frames have been generated, I merge the two and select the CARICOM Member States for mapping. To support a country-by-country comparison of the severity of the pandemic, three new variables are generated:

  • Confirmed cases per 100,000 population
  • Deaths per 100,000 population
  • Mortality rate (deaths as a percentage of confirmed cases, i.e. the case fatality rate)
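The three derived variables are simple ratios. Written as a small R helper (the name `covid_rates()` is hypothetical and not part of the original script) and checked against the Bahamas row of Table 1, they also show that the case fatality rate can be recovered from the two per-100,000 columns alone, because the population term cancels.

```r
# Hypothetical helper computing the three derived indicators used in the script.
covid_rates <- function(confirmed, deaths, population) {
  list(
    confirmed_per_100k = confirmed / population * 1e5,
    deaths_per_100k    = deaths / population * 1e5,
    # Deaths as a percentage of confirmed cases (case fatality rate);
    # the script stores this as mortality_rate.
    mortality_rate     = deaths / confirmed * 100
  )
}

# Usage with hypothetical inputs:
# confirmed_per_100k = 50, deaths_per_100k = 2, mortality_rate = 4
r <- covid_rates(confirmed = 50, deaths = 2, population = 1e5)

# Population cancels: (deaths / pop) / (confirmed / pop) = deaths / confirmed,
# so the rate follows from the per-100k columns (Bahamas row of Table 1).
bahamas_cfr <- 16.175330 / 698.62022 * 100
round(bahamas_cfr, 2)  # 2.32
```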

The script also corrects the coordinates for Belize, because the source database listed incorrect coordinates for the country.[1]

# Merge COVID-19 Confirmed Cases, World Bank Data and Filter CARICOM Countries
# Note: the JHU file lists Montserrat as a Province/State of "United Kingdom",
# so it does not match this country filter and is absent from Table 1.
caricom_members <- c("Antigua and Barbuda", "Bahamas", "Barbados", "Belize",
                     "Dominica", "Grenada", "Guyana", "Haiti", "Jamaica",
                     "Montserrat", "Saint Kitts and Nevis", "Saint Lucia",
                     "Saint Vincent and the Grenadines", "Suriname",
                     "Trinidad and Tobago")

caricom_covid <- covid_19_confirmed_and_deaths %>% 
  filter(country %in% caricom_members) %>% 
  left_join(wb_data, 
            by = 'iso3c') %>% 
  # Replace the incorrect Belize coordinates and derive the per-capita rates
  mutate(lat = if_else(country == "Belize", 17.1899, lat),
         lng = if_else(country == "Belize", -88.4976, lng),
         confirmed_per_100k = confirmed/pop_2018*100000,
         deaths_per_100k = deaths/pop_2018*100000,
         mortality_rate = deaths/confirmed*100) %>% 
  # "country,confirmed" label, moved to the last column
  unite(popup, 
        c("country","confirmed"), 
        sep = ",", 
        remove = FALSE) %>% 
  select(-popup, popup)
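The script patches Belize with two `if_else()` calls. If more corrections were ever needed, one alternative (my own suggestion, not what the script does) is to keep them in a small lookup table and apply them with a join and `coalesce()`, so each fix lives in one row rather than in repeated conditions. The sketch below reuses the Belize coordinates from the script; "Country B" and every other coordinate are hypothetical placeholders, not source values.

```r
library(dplyr)

# Hypothetical input: placeholder coordinates, not the JHU source values.
points <- tibble(
  country = c("Belize", "Country B"),
  lat     = c(10, 20),
  lng     = c(-80, -70)
)

# Lookup table of corrections (Belize coordinates taken from the script above).
coord_fixes <- tibble(
  country = "Belize",
  lat_fix = 17.1899,
  lng_fix = -88.4976
)

# Join the fixes on, prefer a corrected value where one exists,
# then drop the helper columns.
corrected <- points %>%
  left_join(coord_fixes, by = "country") %>%
  mutate(lat = coalesce(lat_fix, lat),
         lng = coalesce(lng_fix, lng)) %>%
  select(-lat_fix, -lng_fix)

corrected
```

Rows without a correction get `NA` in the helper columns after the join, so `coalesce()` falls back to the original coordinates.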

# Optionally save a dated copy of the data:
# write.csv(caricom_covid, sprintf("caricom_covid_data_%s.csv", Sys.Date()))

# Summary table: one row per country, sorted alphabetically
caricom_table <- caricom_covid %>% 
  select(country,
         confirmed_per_100k,
         deaths_per_100k,
         gdp_capita_2018) %>% 
  arrange(country)

library(knitr)
kable(caricom_table,
      caption = "Table 1: Confirmed COVID-19 Cases and Deaths per 100,000 by CARICOM Member State as at June 3rd 2020")
Table 1: Confirmed COVID-19 Cases and Deaths per 100,000 by CARICOM Member State as at June 3rd 2020

| country | confirmed_per_100k | deaths_per_100k | gdp_capita_2018 |
|:--------|-------------------:|----------------:|----------------:|
| Antigua and Barbuda | 97.81915 | 3.089026 | 15703.028 |
| Bahamas | 698.62022 | 16.175330 | 27477.960 |
| Barbados | 62.71231 | 2.438812 | 16099.828 |
| Belize | 349.68349 | 4.611211 | 4149.862 |
| Dominica | 30.63725 | 0.000000 | 7055.470 |
| Grenada | 21.42800 | 0.000000 | 9330.041 |
| Guyana | 217.56183 | 6.132101 | 4159.609 |
| Haiti | 74.43792 | 1.900014 | 714.483 |
| Jamaica | 116.57648 | 1.288887 | 4866.987 |
| Saint Kitts and Nevis | 32.18295 | 0.000000 | 17241.592 |
| Saint Lucia | 14.77105 | 0.000000 | 9350.748 |
| Saint Vincent and the Grenadines | 56.06344 | 0.000000 | 6852.500 |
| Suriname | 764.91472 | 15.824635 | 7966.683 |
| Trinidad and Tobago | 185.52330 | 2.795753 | 15105.119 |

Mapping COVID-19 Cases across CARICOM Member States

Now that the clean dataset is complete, I use Leaflet to depict the situation across CARICOM Member States. Clicking a marker opens a popup with the country, its confirmed cases and deaths, and selected World Bank indicators, while the yellow and red circles are scaled by confirmed cases and deaths per 100,000 persons respectively.

library(leaflet)

# Build the map from caricom_covid; the formula (~) notation makes each
# layer read its columns from that data frame.
# caricom_covid has no date, recovered or income columns, so those popup
# fields are omitted.
caricom_covid_map <- leaflet(caricom_covid) %>% 
  addProviderTiles(providers$CartoDB.DarkMatter) %>% 
  # Markers with a popup summarising each country
  addMarkers(lat = ~lat, 
             lng = ~lng,
             popup = ~paste("<b>", country, "</b> <br>",
                            "Confirmed Cases:", confirmed, "<br>",
                            "Deaths:", deaths, "<br>",
                            "Population Aged 65+ (%):", round(pop_65_over_2018, 1), "<br>",
                            "Diabetes prev. (% pop 20-79):", round(diabetes_20_79, 1))) %>% 
  # Yellow circles: confirmed cases per 100,000 on a log scale
  addCircleMarkers(lat = ~lat, 
                   lng = ~lng,
                   weight = 1,
                   radius = ~log(confirmed_per_100k) * 8,
                   color = 'yellow') %>% 
  # Red circles: deaths per 100,000 (zero radius where no deaths are recorded)
  addCircleMarkers(lat = ~lat, 
                   lng = ~lng,
                   weight = 1,
                   radius = ~deaths_per_100k,
                   color = 'red') %>%
  addLegend("bottomright", 
            colors = c('red', 'yellow'), 
            labels = c('Deaths per 100,000', 'Confirmed Cases per 100,000'), 
            title = "Legend") 
caricom_covid_map 
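The yellow circles use a logarithmic radius, `log(confirmed_per_100k) * 8`, which compresses the gap between the lowest and highest rates. The base-R sketch below compares that rule with square-root scaling, a common alternative swapped in here purely for illustration (its multiplier of 2 is a hypothetical choice), using the Saint Lucia and Suriname rates from Table 1.

```r
# Confirmed cases per 100,000 for Saint Lucia and Suriname (Table 1).
rates <- c(saint_lucia = 14.77105, suriname = 764.91472)

# Radius rule used in the map: natural log of the rate, times 8.
log_radius <- log(rates) * 8

# Alternative with a hypothetical multiplier of 2: radius proportional to
# sqrt(rate), so circle area (pi * r^2) is proportional to the rate itself.
sqrt_radius <- sqrt(rates) * 2

# How much larger is the Suriname circle under each rule?
rates[["suriname"]] / rates[["saint_lucia"]]              # about 51.8
log_radius[["suriname"]] / log_radius[["saint_lucia"]]    # about 2.5
sqrt_radius[["suriname"]] / sqrt_radius[["saint_lucia"]]  # about 7.2
```

The log rule keeps small-island circles visible but understates differences roughly twentyfold here. It also turns negative for rates below 1 and becomes `-Inf` at 0, so a country with very few cases would need a guard such as a minimum radius.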

Developer

Yohance Nicholas | Consultant Economist @ Kairi Consultants Limited | LinkedIn | GitHub


  1. This was done with the assistance of the vectorised if function, if_else(), provided by dplyr.