Using the R code shown below, I download Covid-19 data from the Johns Hopkins Covid-19 Dashboard, identify the ten countries with the most cumulative deaths, and generate a table of their cumulative cases and cumulative deaths. For each top-ten country I also show its cumulative cases (respectively, deaths) as a percentage of the world’s cumulative cases (respectively, deaths). I also download 2020 population data from the World Bank and show for each top-ten country its population, its share of the world’s population, and its cumulative cases (and deaths) per million people.
A package is code that its author(s) have generously made available to all.
# install.packages("tidyverse")
# install.packages("lubridate")
library(tidyverse)
library(lubridate)
library(WDI)
The read_csv command downloads the data from the place in the Internet where it has been stored. The downloaded data is then tidied up in several ways so that the R programming language can work with it. For example, a date expressed as the text string “1/22/2020” in the downloaded data needs to be made understandable to R as the calendar date January 22, 2020.
deaths <- read_csv("https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv") %>%
rename(country = `Country/Region`) %>%
pivot_longer(cols = !(1:4), names_to = "date", values_to = "cum.deaths") %>%
select(-any_of(c("Lat", "Long", "Province/State"))) %>%
mutate(date = mdy(date)) %>%
filter(date == max(date)) %>%
select(-date)
# This calculates cumulative deaths for the world.
deaths.world <- data.frame(country = "World", deaths = sum(deaths$cum.deaths))
# Next I find the ten highest national death totals
deaths <- deaths %>%
group_by(country) %>%
summarise(deaths = sum(cum.deaths)) %>%
slice_max(order_by = deaths, n = 10)
# Next I combine the top ten death tolls and the world's death toll.
deaths <- rbind(deaths, deaths.world)
I then repeat what I did above for cumulative deaths, but now for cumulative cases.
cases <- read_csv("https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv") %>%
rename(country = `Country/Region`) %>%
pivot_longer(cols = !(1:4), names_to = "date", values_to = "cum.cases") %>%
select(-any_of(c("Lat", "Long", "Province/State"))) %>%
mutate(date = mdy(date)) %>%
filter(date == max(date)) %>%
select(-date)
cases.world <- data.frame(country = "World", cases = sum(cases$cum.cases))
cases <- cases %>%
group_by(country) %>%
summarise(cases = sum(cum.cases)) %>%
filter(country %in% deaths$country)
cases <- rbind(cases, cases.world)
Here the code downloads the 2020 population data using the World Development Indicators package. This package, makes the task of downloading population data a lot less painful than the method I used above to download the data on cases and deaths. Naturally, I am grateful to those who wrote the code in the WDI package and made it available for use by everybody.
The country names in the deaths-and-cases data are not always the same as the country names in the population data. In particular, the former has “Russia” and “US” whereas the latter has “Russian Federation” and “United States”. So, I had to deal with this annoyance in the code below.
pop.2020 <- WDI(indicator = "SP.POP.TOTL", start = 2020, end = 2020, extra = FALSE) %>%
rename(population.2020 = SP.POP.TOTL) %>%
select(country, population.2020) %>%
filter(country %in% c(deaths$country, "Russian Federation", "United States"))
Here I combine the data on cases, the data on deaths, and the data on population for all countries into one data object.
deaths$country[deaths$country == "US"] = "United States"
cases$country[cases$country == "US"] = "United States"
pop.2020$country[pop.2020$country == "Russian Federation"] = "Russia"
combined <- merge(merge(cases, deaths), pop.2020)
For each country, I calculate cases.pct, which is its cumulative deaths as a percentage of the world’s. Similarly, I calculatedeaths.pct and population.pct. I also calculate cases.per.million and deaths.per.million for each country.
Finally, I prepare a table for display. Done!
combined <- combined %>%
mutate(cases.pct = 100*cases/combined$cases[combined$country == "World"],
deaths.pct = 100*deaths/combined$deaths[combined$country == "World"],
population.pct = 100*population.2020/combined$population.2020[combined$country == "World"],
cases.per.million = (10^6)*cases/population.2020,
deaths.per.million = (10^6)*deaths/population.2020) %>%
arrange(desc(deaths))
combined.display <- combined %>%
mutate(cases.pct = format(cases.pct, digits=2, nsmall = 2),
deaths.pct = format(deaths.pct, digits=2, nsmall = 2),
population.pct = format(population.pct, digits=2, nsmall = 2),
cases.per.million = format(cases.per.million, digits=2, nsmall = 2),
deaths.per.million = format(deaths.per.million, digits=2, nsmall = 2)) %>%
knitr::kable(caption = "The Pandemic's Top Ten National Death Tolls", format.args = list(big.mark = ",")) %>%
kableExtra::kable_styling(full_width = FALSE)
combined.display
| country | cases | deaths | population.2020 | cases.pct | deaths.pct | population.pct | cases.per.million | deaths.per.million |
|---|---|---|---|---|---|---|---|---|
| World | 388,048,849 | 5,712,849 | 7,761,620,146 | 100.00 | 100.00 | 100.00 | 49995.86 | 736.04 |
| United States | 75,982,593 | 897,237 | 329,484,123 | 19.58 | 15.71 | 4.25 | 230610.79 | 2723.16 |
| Brazil | 26,107,894 | 630,301 | 212,559,409 | 6.73 | 11.03 | 2.74 | 122826.34 | 2965.29 |
| India | 41,952,712 | 500,055 | 1,380,004,385 | 10.81 | 8.75 | 17.78 | 30400.42 | 362.36 |
| Russia | 12,090,707 | 326,642 | 144,104,080 | 3.12 | 5.72 | 1.86 | 83902.60 | 2266.71 |
| Mexico | 5,027,870 | 307,493 | 128,932,753 | 1.30 | 5.38 | 1.66 | 38996.06 | 2384.91 |
| Peru | 3,308,693 | 206,406 | 32,971,846 | 0.85 | 3.61 | 0.42 | 100349.04 | 6260.07 |
| United Kingdom | 17,727,681 | 158,266 | 67,215,293 | 4.57 | 2.77 | 0.87 | 263744.76 | 2354.61 |
| Italy | 11,348,701 | 147,734 | 59,554,023 | 2.92 | 2.59 | 0.77 | 190561.45 | 2480.67 |
| Indonesia | 4,353,370 | 144,320 | 273,523,621 | 1.12 | 2.53 | 3.52 | 15915.88 | 527.63 |
| Colombia | 5,930,393 | 135,043 | 50,882,884 | 1.53 | 2.36 | 0.66 | 116549.86 | 2654.00 |
The United States has 19.58 percent of the world’s cases, 15.71 percent of the world’s deaths, and 4.25 percent of the world’s population.
I began posting this data table on my Facebook page in the spring of 2020. But back then I prepared the table in a somewhat laborious manner.
It all took some time.
But now it’s all automated! All I need to do is:
knit button. RStudio downloads the latest data and prepares the table automatically in line with my coded instructions.publish button to upload the latest table to this web page.It all takes two minutes or so from start to finish. And I can update the table whenever I like, as many times as I like. Every update requires the same two clicks and takes the same two minutes.
This is the magic of automation. Maybe at some point in the future, I’ll learn to automate this process even further. I’ll tell the computer when to download the data, prepare the table, and then update this web page. The computer would be automatically updating the table on this web page on schedule while I am taking a nap. People will wonder how I find the time to update the table so often!
The End