Covid-19: Top Ten Tolls

Using the R code shown below, I download Covid-19 data from the Johns Hopkins Covid-19 Dashboard, identify the ten countries with the most cumulative deaths, and generate a table of their cumulative cases and cumulative deaths. For each top-ten country I also show its cumulative cases (respectively, deaths) as a percentage of the world’s cumulative cases (respectively, deaths). I also download 2020 population data from the World Bank and show for each top-ten country its population, its share of the world’s population, and its cumulative cases (and deaths) per million people.

R Packages used here

A package is code that its author(s) have generously made available to all.

# install.packages("tidyverse")
# install.packages("lubridate")
library(tidyverse)
library(lubridate)
library(WDI)

Download the Data on Deaths and Prepare them for Tabulation

The read_csv command downloads the data from the place in the Internet where it has been stored. The downloaded data is then tidied up in several ways so that the R programming language can work with it. For example, a date expressed as the text string “1/22/2020” in the downloaded data needs to be made understandable to R as the calendar date January 22, 2020.

deaths <- read_csv("https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv") %>%
  rename(country = `Country/Region`) %>%
  pivot_longer(cols = !(1:4), names_to = "date", values_to = "cum.deaths") %>%
  select(-any_of(c("Lat", "Long", "Province/State"))) %>%
  mutate(date = mdy(date)) %>%
  filter(date == max(date)) %>%
  select(-date)

# This calculates cumulative deaths for the world.
deaths.world <- data.frame(country = "World", deaths = sum(deaths$cum.deaths)) 

# Next I find the ten highest national death totals
deaths <- deaths %>%
  group_by(country) %>%
  summarise(deaths = sum(cum.deaths)) %>%
  slice_max(order_by = deaths, n = 10)

# Next I combine the top ten death tolls and the world's death toll.
deaths <- rbind(deaths, deaths.world)

Download the Data on Confirmed Cases and Prepare them for Tabulation

I then repeat what I did above for cumulative deaths, but now for cumulative cases.

cases <- read_csv("https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")  %>%
  rename(country = `Country/Region`) %>%
  pivot_longer(cols = !(1:4), names_to = "date", values_to = "cum.cases") %>%
  select(-any_of(c("Lat", "Long", "Province/State"))) %>%
  mutate(date = mdy(date)) %>%
  filter(date == max(date)) %>%
  select(-date)

cases.world <- data.frame(country = "World", cases = sum(cases$cum.cases))

cases <- cases %>%
  group_by(country) %>%
  summarise(cases = sum(cum.cases)) %>%
  filter(country %in% deaths$country)

cases <- rbind(cases, cases.world)

Download Population Data for 2020

Here the code downloads the 2020 population data using the World Development Indicators package. This package, makes the task of downloading population data a lot less painful than the method I used above to download the data on cases and deaths. Naturally, I am grateful to those who wrote the code in the WDI package and made it available for use by everybody.

The country names in the deaths-and-cases data are not always the same as the country names in the population data. In particular, the former has “Russia” and “US” whereas the latter has “Russian Federation” and “United States”. So, I had to deal with this annoyance in the code below.

pop.2020 <- WDI(indicator = "SP.POP.TOTL", start = 2020, end = 2020, extra = FALSE) %>%
  rename(population.2020 = SP.POP.TOTL) %>%
  select(country, population.2020) %>%   
  filter(country %in% c(deaths$country, "Russian Federation", "United States"))

Combine Three Dataframes

Here I combine the data on cases, the data on deaths, and the data on population for all countries into one data object.

deaths$country[deaths$country == "US"] = "United States"
cases$country[cases$country == "US"] = "United States"
pop.2020$country[pop.2020$country == "Russian Federation"] = "Russia"
combined <- merge(merge(cases, deaths), pop.2020)

Do Some Calculations

For each country, I calculate cases.pct, which is its cumulative deaths as a percentage of the world’s. Similarly, I calculatedeaths.pct and population.pct. I also calculate cases.per.million and deaths.per.million for each country.

Finally, I prepare a table for display. Done!

combined <- combined %>%
  mutate(cases.pct = 100*cases/combined$cases[combined$country == "World"],
         deaths.pct = 100*deaths/combined$deaths[combined$country == "World"],
         population.pct = 100*population.2020/combined$population.2020[combined$country == "World"],
         cases.per.million = (10^6)*cases/population.2020,
         deaths.per.million = (10^6)*deaths/population.2020) %>%
  arrange(desc(deaths)) 

combined.display <- combined %>%
  mutate(cases.pct = format(cases.pct, digits=2, nsmall = 2),
         deaths.pct = format(deaths.pct, digits=2, nsmall = 2),
         population.pct = format(population.pct, digits=2, nsmall = 2),
         cases.per.million = format(cases.per.million, digits=2, nsmall = 2),
         deaths.per.million = format(deaths.per.million, digits=2, nsmall = 2)) %>%
  knitr::kable(caption = "The Pandemic's Top Ten National Death Tolls", format.args = list(big.mark = ",")) %>%
  kableExtra::kable_styling(full_width = FALSE)

combined.display

The Pandemic’s Top Ten National Death Tolls
country	cases	deaths	population.2020	cases.pct	deaths.pct	population.pct	cases.per.million	deaths.per.million
World	388,048,849	5,712,849	7,761,620,146	100.00	100.00	100.00	49995.86	736.04
United States	75,982,593	897,237	329,484,123	19.58	15.71	4.25	230610.79	2723.16
Brazil	26,107,894	630,301	212,559,409	6.73	11.03	2.74	122826.34	2965.29
India	41,952,712	500,055	1,380,004,385	10.81	8.75	17.78	30400.42	362.36
Russia	12,090,707	326,642	144,104,080	3.12	5.72	1.86	83902.60	2266.71
Mexico	5,027,870	307,493	128,932,753	1.30	5.38	1.66	38996.06	2384.91
Peru	3,308,693	206,406	32,971,846	0.85	3.61	0.42	100349.04	6260.07
United Kingdom	17,727,681	158,266	67,215,293	4.57	2.77	0.87	263744.76	2354.61
Italy	11,348,701	147,734	59,554,023	2.92	2.59	0.77	190561.45	2480.67
Indonesia	4,353,370	144,320	273,523,621	1.12	2.53	3.52	15915.88	527.63
Colombia	5,930,393	135,043	50,882,884	1.53	2.36	0.66	116549.86	2654.00

The United States has 19.58 percent of the world’s cases, 15.71 percent of the world’s deaths, and 4.25 percent of the world’s population.

Concluding note

I began posting this data table on my Facebook page in the spring of 2020. But back then I prepared the table in a somewhat laborious manner.

First, I would go to the Johns Hopkins Covid-19 Dashboard, figure out the top ten national death tolls by eyeballing the numbers, and write down their cases and deaths numbers in an Excel spreadsheet.
I would then cut and paste the population data from the World Bank web site.
Then I would do some calculations in Excel for the various columns of my table.
I would paste the Excel table into Microsoft Paint to prepare a PNG graphic of the data.
Finally, I would post the graphic to Facebook.

It all took some time.

But now it’s all automated! All I need to do is:

open the very document you are now reading in an app called RStudio and just press the knit button. RStudio downloads the latest data and prepares the table automatically in line with my coded instructions.
I then press the publish button to upload the latest table to this web page.

It all takes two minutes or so from start to finish. And I can update the table whenever I like, as many times as I like. Every update requires the same two clicks and takes the same two minutes.

This is the magic of automation. Maybe at some point in the future, I’ll learn to automate this process even further. I’ll tell the computer when to download the data, prepare the table, and then update this web page. The computer would be automatically updating the table on this web page on schedule while I am taking a nap. People will wonder how I find the time to update the table so often!

The End