Setup

As a reminder, the instructions to this project are formatted in markdown. It will be easiest to read the instructions if you click on “Preview” in RStudio. Doing so will render the markdown to an HTML page for easier reading!

Welcome to your off-platform project focused on Data Manipulation in R! The goal of this project is to get experience working with multiple .csv files in RStudio. We’ll be creating several data frames and manipulating them to try to find new insights.

In this project we will be looking at data associated with the Coronavirus pandemic from 2020. While this topic isn’t your typical lighthearted Codecademy assignment, we felt like it is a excellent way to use your new R skills to understand the world around you. This assignment was written on March 23rd, 2020 — since you will be taking this assignment in the future, we’re hoping that this can be an interesting historical analysis rather than an ongoing issue.

Collect and Investigate the Data

We’ve provided three datasets named corona_confirmed.csv, corona_recovered.csv and corona_deaths.csv. For the entirety of this project, we’ll be using these datasets. You can find more recent versions of this data at Johns Hopkins’ data repository on GitHub.

This is a good opportunity to practice downloading data from GitHub or other repositories. We’re using files found in the csse_covid_19_time_series folder. In addition, we recommend looking at some of the projects linked on the GitHub page — that can help give you a better sense of what type of information you can extract from this data.

For now, load these three .csv files into three separate dataframes. Note that we used the read_csv() function from the readr library. If you load the csv files using a different method, you might have slightly different column names.

Inspect the dataframes to get a sense of how the data is stored (we recommend using head()).

library(dplyr)
library(readr)

# Load the data
df_confirmed <- read.csv('corona_confirmed.csv')
df_deaths <-read.csv('corona_deaths.csv')
df_recovered <- read.csv('corona_recovered.csv')

# Inspect the data
head(df_confirmed)

##     Province.State Country.Region     Lat      Long X1.22.20 X1.23.20 X1.24.20
## 1                        Thailand 15.0000  101.0000        2        3        5
## 2                           Japan 36.0000  138.0000        2        1        2
## 3                       Singapore  1.2833  103.8333        0        1        3
## 4                           Nepal 28.1667   84.2500        0        0        0
## 5                        Malaysia  2.5000  112.5000        0        0        0
## 6 British Columbia         Canada 49.2827 -123.1207        0        0        0
##   X1.25.20 X1.26.20 X1.27.20 X1.28.20 X1.29.20 X1.30.20 X1.31.20 X2.1.20
## 1        7        8        8       14       14       14       19      19
## 2        2        4        4        7        7       11       15      20
## 3        3        4        5        7        7       10       13      16
## 4        1        1        1        1        1        1        1       1
## 5        3        4        4        4        7        8        8       8
## 6        0        0        0        1        1        1        1       1
##   X2.2.20 X2.3.20 X2.4.20 X2.5.20 X2.6.20 X2.7.20 X2.8.20 X2.9.20 X2.10.20
## 1      19      19      25      25      25      25      32      32       32
## 2      20      20      22      22      45      25      25      26       26
## 3      18      18      24      28      28      30      33      40       45
## 4       1       1       1       1       1       1       1       1        1
## 5       8       8      10      12      12      12      16      16       18
## 6       1       1       1       2       2       4       4       4        4
##   X2.11.20 X2.12.20 X2.13.20 X2.14.20 X2.15.20 X2.16.20 X2.17.20 X2.18.20
## 1       33       33       33       33       33       34       35       35
## 2       26       28       28       29       43       59       66       74
## 3       47       50       58       67       72       75       77       81
## 4        1        1        1        1        1        1        1        1
## 5       18       18       19       19       22       22       22       22
## 6        4        4        4        4        4        4        5        5
##   X2.19.20 X2.20.20 X2.21.20 X2.22.20 X2.23.20 X2.24.20 X2.25.20 X2.26.20
## 1       35       35       35       35       35       35       37       40
## 2       84       94      105      122      147      159      170      189
## 3       84       84       85       85       89       89       91       93
## 4        1        1        1        1        1        1        1        1
## 5       22       22       22       22       22       22       22       22
## 6        5        5        6        6        6        6        7        7
##   X2.27.20 X2.28.20 X2.29.20 X3.1.20 X3.2.20 X3.3.20 X3.4.20 X3.5.20 X3.6.20
## 1       40       41       42      42      43      43      43      47      48
## 2      214      228      241     256     274     293     331     360     420
## 3       93       93      102     106     108     110     110     117     130
## 4        1        1        1       1       1       1       1       1       1
## 5       23       23       25      29      29      36      50      50      83
## 6        7        7        8       8       8       9      12      13      21
##   X3.7.20 X3.8.20 X3.9.20 X3.10.20 X3.11.20 X3.12.20 X3.13.20 X3.14.20 X3.15.20
## 1      50      50      50       53       59       70       75       82      114
## 2     461     502     511      581      639      639      701      773      839
## 3     138     150     150      160      178      178      200      212      226
## 4       1       1       1        1        1        1        1        1        1
## 5      93      99     117      129      149      149      197      238      428
## 6      21      27      32       32       39       46       64       64       73
##   X3.16.20 X3.17.20 X3.18.20 X3.19.20 X3.20.20 X3.21.20 X3.22.20
## 1      147      177      212      272      322      411      599
## 2      825      878      889      924      963     1007     1086
## 3      243      266      313      345      385      432      455
## 4        1        1        1        1        1        1        2
## 5      566      673      790      900     1030     1183     1306
## 6      103      103      186      231      271      424      424

head(df_deaths)

##     Province.State Country.Region     Lat      Long X1.22.20 X1.23.20 X1.24.20
## 1                        Thailand 15.0000  101.0000        0        0        0
## 2                           Japan 36.0000  138.0000        0        0        0
## 3                       Singapore  1.2833  103.8333        0        0        0
## 4                           Nepal 28.1667   84.2500        0        0        0
## 5                        Malaysia  2.5000  112.5000        0        0        0
## 6 British Columbia         Canada 49.2827 -123.1207        0        0        0
##   X1.25.20 X1.26.20 X1.27.20 X1.28.20 X1.29.20 X1.30.20 X1.31.20 X2.1.20
## 1        0        0        0        0        0        0        0       0
## 2        0        0        0        0        0        0        0       0
## 3        0        0        0        0        0        0        0       0
## 4        0        0        0        0        0        0        0       0
## 5        0        0        0        0        0        0        0       0
## 6        0        0        0        0        0        0        0       0
##   X2.2.20 X2.3.20 X2.4.20 X2.5.20 X2.6.20 X2.7.20 X2.8.20 X2.9.20 X2.10.20
## 1       0       0       0       0       0       0       0       0        0
## 2       0       0       0       0       0       0       0       0        0
## 3       0       0       0       0       0       0       0       0        0
## 4       0       0       0       0       0       0       0       0        0
## 5       0       0       0       0       0       0       0       0        0
## 6       0       0       0       0       0       0       0       0        0
##   X2.11.20 X2.12.20 X2.13.20 X2.14.20 X2.15.20 X2.16.20 X2.17.20 X2.18.20
## 1        0        0        0        0        0        0        0        0
## 2        0        0        1        1        1        1        1        1
## 3        0        0        0        0        0        0        0        0
## 4        0        0        0        0        0        0        0        0
## 5        0        0        0        0        0        0        0        0
## 6        0        0        0        0        0        0        0        0
##   X2.19.20 X2.20.20 X2.21.20 X2.22.20 X2.23.20 X2.24.20 X2.25.20 X2.26.20
## 1        0        0        0        0        0        0        0        0
## 2        1        1        1        1        1        1        1        2
## 3        0        0        0        0        0        0        0        0
## 4        0        0        0        0        0        0        0        0
## 5        0        0        0        0        0        0        0        0
## 6        0        0        0        0        0        0        0        0
##   X2.27.20 X2.28.20 X2.29.20 X3.1.20 X3.2.20 X3.3.20 X3.4.20 X3.5.20 X3.6.20
## 1        0        0        0       1       1       1       1       1       1
## 2        4        4        5       6       6       6       6       6       6
## 3        0        0        0       0       0       0       0       0       0
## 4        0        0        0       0       0       0       0       0       0
## 5        0        0        0       0       0       0       0       0       0
## 6        0        0        0       0       0       0       0       0       0
##   X3.7.20 X3.8.20 X3.9.20 X3.10.20 X3.11.20 X3.12.20 X3.13.20 X3.14.20 X3.15.20
## 1       1       1       1        1        1        1        1        1        1
## 2       6       6      10       10       15       16       19       22       22
## 3       0       0       0        0        0        0        0        0        0
## 4       0       0       0        0        0        0        0        0        0
## 5       0       0       0        0        0        0        0        0        0
## 6       0       0       1        1        1        1        1        1        1
##   X3.16.20 X3.17.20 X3.18.20 X3.19.20 X3.20.20 X3.21.20 X3.22.20
## 1        1        1        1        1        1        1        1
## 2       27       29       29       29       33       35       40
## 3        0        0        0        0        0        2        2
## 4        0        0        0        0        0        0        0
## 5        0        2        2        2        3        4       10
## 6        4        4        7        7        8       10       10

head(df_recovered)

##     Province.State Country.Region     Lat      Long X1.22.20 X1.23.20 X1.24.20
## 1                        Thailand 15.0000  101.0000        0        0        0
## 2                           Japan 36.0000  138.0000        0        0        0
## 3                       Singapore  1.2833  103.8333        0        0        0
## 4                           Nepal 28.1667   84.2500        0        0        0
## 5                        Malaysia  2.5000  112.5000        0        0        0
## 6 British Columbia         Canada 49.2827 -123.1207        0        0        0
##   X1.25.20 X1.26.20 X1.27.20 X1.28.20 X1.29.20 X1.30.20 X1.31.20 X2.1.20
## 1        0        2        2        5        5        5        5       5
## 2        0        1        1        1        1        1        1       1
## 3        0        0        0        0        0        0        0       0
## 4        0        0        0        0        0        0        0       0
## 5        0        0        0        0        0        0        0       0
## 6        0        0        0        0        0        0        0       0
##   X2.2.20 X2.3.20 X2.4.20 X2.5.20 X2.6.20 X2.7.20 X2.8.20 X2.9.20 X2.10.20
## 1       5       5       5       5       5       5      10      10       10
## 2       1       1       1       1       1       1       1       1        4
## 3       0       0       0       0       0       0       2       2        2
## 4       0       0       0       0       0       0       0       0        0
## 5       0       0       0       0       0       1       1       1        1
## 6       0       0       0       0       0       0       0       0        0
##   X2.11.20 X2.12.20 X2.13.20 X2.14.20 X2.15.20 X2.16.20 X2.17.20 X2.18.20
## 1       10       10       12       12       12       14       15       15
## 2        9        9        9        9       12       12       12       13
## 3        9       15       15       17       18       18       24       29
## 4        0        1        1        1        1        1        1        1
## 5        3        3        3        3        7        7        7       13
## 6        0        0        0        0        0        0        0        0
##   X2.19.20 X2.20.20 X2.21.20 X2.22.20 X2.23.20 X2.24.20 X2.25.20 X2.26.20
## 1       15       15       17       17       21       21       22       22
## 2       18       18       22       22       22       22       22       22
## 3       34       34       37       37       51       51       53       62
## 4        1        1        1        1        1        1        1        1
## 5       15       15       15       15       15       18       18       18
## 6        0        0        0        0        0        0        0        0
##   X2.27.20 X2.28.20 X2.29.20 X3.1.20 X3.2.20 X3.3.20 X3.4.20 X3.5.20 X3.6.20
## 1       22       28       28      28      31      31      31      31      31
## 2       22       22       32      32      32      43      43      43      46
## 3       62       62       72      72      78      78      78      78      78
## 4        1        1        1       1       1       1       1       1       1
## 5       18       18       18      18      18      22      22      22      22
## 6        3        3        3       3       3       3       3       3       3
##   X3.7.20 X3.8.20 X3.9.20 X3.10.20 X3.11.20 X3.12.20 X3.13.20 X3.14.20 X3.15.20
## 1      31      31      31       33       34       34       35       35       35
## 2      76      76      76      101      118      118      118      118      118
## 3      78      78      78       78       96       96       97      105      105
## 4       1       1       1        1        1        1        1        1        1
## 5      23      24      24       24       26       26       26       35       42
## 6       4       4       4        4        4        4        4        4        4
##   X3.16.20 X3.17.20 X3.18.20 X3.19.20 X3.20.20 X3.21.20 X3.22.20
## 1       35       41       42       42       42       42       44
## 2      144      144      144      150      191      232      235
## 3      109      114      114      114      124      140      144
## 4        1        1        1        1        1        1        1
## 5       42       49       60       75       87      114      139
## 6        4        4        4        4        4        4        4

Looking At March 22nd, 2020

The format of these three data frames are all the same — each row contains information about the number of cases in a certain province, state, or country. Every column (other than the columns containing the latitude, longitude, and country name) represents a date. We have data starting on January 1st, 2020 and ending on March 22nd, 2020 (the day this assignment was written).

Let’s warm up by finding the total number of confirmed cases on March 22nd, 2020. To do this, we’ll use both the select() and sum() functions. When using the select() function, make sure to put the name of the column in backticks (usually found in the top left corner of your keyboard)! This usually isn’t necessary, but since the name of the column contains the / character, we need backticks! Make sure to do this whenever you’re working with these columns.

# Sum the total number of cases on March 22nd
confirmed.X3.22.20 <- df_confirmed %>%
  select(X3.22.20) %>%
  sum()

sprintf('There are %d confirmed cases on March 22nd, 2020.', confirmed.X3.22.20)

## [1] "There are 335955 confirmed cases on March 22nd, 2020."

Filter By Values

Let’s start to filter the data a bit more. Give these three tasks a try:

How many confirmed cases are there in countries on March 22nd that are north of the equator? (If a country is north of the equator, its latitude is greater than 0)
How many confirmed cases are there in March 22nd in Australia?
Can you report both of those values as a percentage of the total number of cases in the world on March 22nd?

You may want to save these results to variables as they will be useful later!

# Filter for countries in the northern hemisphere
confirmed.X3.22.20.north <- df_confirmed %>%
  filter(Lat > 0) %>%
  select(X3.22.20) %>%
  sum()

sprintf('There were %d confirmed cases in the Northern Hemisphere on March 22nd, 2020.', confirmed.X3.22.20.north)

## [1] "There were 329794 confirmed cases in the Northern Hemisphere on March 22nd, 2020."

# Filter for Australia cases
confirmed.X3.22.20.australia <- df_confirmed %>%
  filter(`Country.Region` == 'Australia') %>%
  select(X3.22.20) %>%
  sum()

sprintf('There were %d confirmed cases in Australia on March 22nd, 2020.', confirmed.X3.22.20.australia)

## [1] "There were 1314 confirmed cases in Australia on March 22nd, 2020."

# Display percentages
sprintf('%.2f%% of the confirmed cases were from the Northern Hemisphere on March 22nd, 2020.', confirmed.X3.22.20.north / confirmed.X3.22.20 * 100)

## [1] "98.17% of the confirmed cases were from the Northern Hemisphere on March 22nd, 2020."

sprintf('%.2f%% of the confirmed cases were from Australia on March 22nd, 2020.', confirmed.X3.22.20.australia / confirmed.X3.22.20 * 100)

## [1] "0.39% of the confirmed cases were from Australia on March 22nd, 2020."

Group By Country

Notice that some countries have multiple rows of data. This happens when a country has information about specific states or provinces. While this information might be useful, it makes it a bit tricky to see the total number of cases by country.

We were able to get the total number of cases in Australia by using filter(), but what if we wanted to do this for every country? It would be a pain to have to type every country name into a filter function. Use group_by() to create a new data frame containing one row for every Country/Region.

Every column of those new rows should have the sum of the total number of cases for that country for every day. We can use the summarize_at() function to do this. summarize_at() takes two parameters:

A vector of the columns you want to summarize. In this case we want every column other than Lat, Long, and Province/State. summarize_at() only works with numbers, so we’ll want to remove the column containing country names. We also don’t need to add the latitudes and longitudes. Remember, you can use the - symbol to list columns you want to exclude. Also don’t forget to use backticks around columns with / symbols. For example -`Province/State` will exclude that column. These small details get complicated!
The function that you want to apply to the column. In this case, sum.

After creating this new data frame inspect it. To confirm you did the group_by() correctly, find the row for Australia and confirm the number of cases on March 22nd matches your results from the previous step

# Group by countries
df_confirmed_grouped <- df_confirmed %>%
  group_by(Country.Region) %>%
  summarise_at(vars(-Province.State, -Lat, -Long), sum)

df_confirmed_grouped

## # A tibble: 171 x 62
##    Country.Region X1.22.20 X1.23.20 X1.24.20 X1.25.20 X1.26.20 X1.27.20 X1.28.20
##    <chr>             <int>    <int>    <int>    <int>    <int>    <int>    <int>
##  1 Afghanistan           0        0        0        0        0        0        0
##  2 Albania               0        0        0        0        0        0        0
##  3 Algeria               0        0        0        0        0        0        0
##  4 Andorra               0        0        0        0        0        0        0
##  5 Angola                0        0        0        0        0        0        0
##  6 Antigua and B~        0        0        0        0        0        0        0
##  7 Argentina             0        0        0        0        0        0        0
##  8 Armenia               0        0        0        0        0        0        0
##  9 Australia             0        0        0        0        4        5        5
## 10 Austria               0        0        0        0        0        0        0
## # ... with 161 more rows, and 54 more variables: X1.29.20 <int>,
## #   X1.30.20 <int>, X1.31.20 <int>, X2.1.20 <int>, X2.2.20 <int>,
## #   X2.3.20 <int>, X2.4.20 <int>, X2.5.20 <int>, X2.6.20 <int>, X2.7.20 <int>,
## #   X2.8.20 <int>, X2.9.20 <int>, X2.10.20 <int>, X2.11.20 <int>,
## #   X2.12.20 <int>, X2.13.20 <int>, X2.14.20 <int>, X2.15.20 <int>,
## #   X2.16.20 <int>, X2.17.20 <int>, X2.18.20 <int>, X2.19.20 <int>,
## #   X2.20.20 <int>, X2.21.20 <int>, X2.22.20 <int>, X2.23.20 <int>,
## #   X2.24.20 <int>, X2.25.20 <int>, X2.26.20 <int>, X2.27.20 <int>,
## #   X2.28.20 <int>, X2.29.20 <int>, X3.1.20 <int>, X3.2.20 <int>,
## #   X3.3.20 <int>, X3.4.20 <int>, X3.5.20 <int>, X3.6.20 <int>, X3.7.20 <int>,
## #   X3.8.20 <int>, X3.9.20 <int>, X3.10.20 <int>, X3.11.20 <int>,
## #   X3.12.20 <int>, X3.13.20 <int>, X3.14.20 <int>, X3.15.20 <int>,
## #   X3.16.20 <int>, X3.17.20 <int>, X3.18.20 <int>, X3.19.20 <int>,
## #   X3.20.20 <int>, X3.21.20 <int>, X3.22.20 <int>

# Filter the grouped data set for Australia
df_confirmed_grouped %>%
  filter(Country.Region == 'Australia') %>%
  select(X3.22.20)

## # A tibble: 1 x 1
##   X3.22.20
##      <int>
## 1     1314

Investigating The Recovered Dataset

Do the same process of grouping by country using the recovered dataset. What percentage of the cases in the US have recovered on March 22nd?

# Group by countries
df_recovered_grouped <- df_recovered %>%
  group_by(Country.Region) %>%
  summarise_at(vars(-Province.State, -Lat, -Long), sum)

df_recovered_grouped

## # A tibble: 171 x 62
##    Country.Region X1.22.20 X1.23.20 X1.24.20 X1.25.20 X1.26.20 X1.27.20 X1.28.20
##    <chr>             <int>    <int>    <int>    <int>    <int>    <int>    <int>
##  1 Afghanistan           0        0        0        0        0        0        0
##  2 Albania               0        0        0        0        0        0        0
##  3 Algeria               0        0        0        0        0        0        0
##  4 Andorra               0        0        0        0        0        0        0
##  5 Angola                0        0        0        0        0        0        0
##  6 Antigua and B~        0        0        0        0        0        0        0
##  7 Argentina             0        0        0        0        0        0        0
##  8 Armenia               0        0        0        0        0        0        0
##  9 Australia             0        0        0        0        0        0        0
## 10 Austria               0        0        0        0        0        0        0
## # ... with 161 more rows, and 54 more variables: X1.29.20 <int>,
## #   X1.30.20 <int>, X1.31.20 <int>, X2.1.20 <int>, X2.2.20 <int>,
## #   X2.3.20 <int>, X2.4.20 <int>, X2.5.20 <int>, X2.6.20 <int>, X2.7.20 <int>,
## #   X2.8.20 <int>, X2.9.20 <int>, X2.10.20 <int>, X2.11.20 <int>,
## #   X2.12.20 <int>, X2.13.20 <int>, X2.14.20 <int>, X2.15.20 <int>,
## #   X2.16.20 <int>, X2.17.20 <int>, X2.18.20 <int>, X2.19.20 <int>,
## #   X2.20.20 <int>, X2.21.20 <int>, X2.22.20 <int>, X2.23.20 <int>,
## #   X2.24.20 <int>, X2.25.20 <int>, X2.26.20 <int>, X2.27.20 <int>,
## #   X2.28.20 <int>, X2.29.20 <int>, X3.1.20 <int>, X3.2.20 <int>,
## #   X3.3.20 <int>, X3.4.20 <int>, X3.5.20 <int>, X3.6.20 <int>, X3.7.20 <int>,
## #   X3.8.20 <int>, X3.9.20 <int>, X3.10.20 <int>, X3.11.20 <int>,
## #   X3.12.20 <int>, X3.13.20 <int>, X3.14.20 <int>, X3.15.20 <int>,
## #   X3.16.20 <int>, X3.17.20 <int>, X3.18.20 <int>, X3.19.20 <int>,
## #   X3.20.20 <int>, X3.21.20 <int>, X3.22.20 <int>

confirmed.X3.22.20.us <- df_confirmed_grouped %>%
  filter(Country.Region == 'US') %>%
  select(-Country.Region) %>%
  max()

confirmed.X3.22.20.us

## [1] 33272

recovered.X3.22.20.us <- df_recovered_grouped %>%
  filter(Country.Region == 'US') %>%
  select(-Country.Region) %>%
  max()

recovered.X3.22.20.us

## [1] 17

sprintf('%.2f%% of the confirmed cases from US on March 22nd, 2020 have recovered.', recovered.X3.22.20.us / confirmed.X3.22.20.us * 100)

## [1] "0.05% of the confirmed cases from US on March 22nd, 2020 have recovered."

You may see some surprising results — are there really zero recovered cases in the US? Let’s take a closer look at the US row in the recovered table. Either view the table in RStudio’s interface, or print the row in the cell block below. What do you notice about the values as you scroll to March 22nd?

# Filter to inspect the US row
View(df_recovered_grouped)

df_recovered_grouped %>%
  filter(Country.Region == 'US')

## # A tibble: 1 x 62
##   Country.Region X1.22.20 X1.23.20 X1.24.20 X1.25.20 X1.26.20 X1.27.20 X1.28.20
##   <chr>             <int>    <int>    <int>    <int>    <int>    <int>    <int>
## 1 US                    0        0        0        0        0        0        0
## # ... with 54 more variables: X1.29.20 <int>, X1.30.20 <int>, X1.31.20 <int>,
## #   X2.1.20 <int>, X2.2.20 <int>, X2.3.20 <int>, X2.4.20 <int>, X2.5.20 <int>,
## #   X2.6.20 <int>, X2.7.20 <int>, X2.8.20 <int>, X2.9.20 <int>, X2.10.20 <int>,
## #   X2.11.20 <int>, X2.12.20 <int>, X2.13.20 <int>, X2.14.20 <int>,
## #   X2.15.20 <int>, X2.16.20 <int>, X2.17.20 <int>, X2.18.20 <int>,
## #   X2.19.20 <int>, X2.20.20 <int>, X2.21.20 <int>, X2.22.20 <int>,
## #   X2.23.20 <int>, X2.24.20 <int>, X2.25.20 <int>, X2.26.20 <int>,
## #   X2.27.20 <int>, X2.28.20 <int>, X2.29.20 <int>, X3.1.20 <int>,
## #   X3.2.20 <int>, X3.3.20 <int>, X3.4.20 <int>, X3.5.20 <int>, X3.6.20 <int>,
## #   X3.7.20 <int>, X3.8.20 <int>, X3.9.20 <int>, X3.10.20 <int>,
## #   X3.11.20 <int>, X3.12.20 <int>, X3.13.20 <int>, X3.14.20 <int>,
## #   X3.15.20 <int>, X3.16.20 <int>, X3.17.20 <int>, X3.18.20 <int>,
## #   X3.19.20 <int>, X3.20.20 <int>, X3.21.20 <int>, X3.22.20 <int>

It seems like the number of recovered cases is steadily increasing to 17, until March 18th, when it suddenly drops back to 0. This is surprising, and not what we expected! Perhaps we’re misunderstanding the format of the data, or perhaps there’s an error in reporting the data.

What’s the best course of action at this point? One strategy is to investigate the source of the data to truly understand what it is reporting. When we went back to Johns Hopkins’ repository, we found a note saying that the data had moved into a different file. It might be best to use that file!

If we wanted to use the current file, instead of reporting the number of confirmed and recovered cases in the US on March 22nd, we could report the maximum number of confirmed and recovered cases. Let’s give that a shot using the max() function. Make sure to also use select() to remove the Country/Region column — we can’t take the max of a row if it contains a string.

# Find the maximum number of confirmed and recovered cases
confirmed_max_us <- df_confirmed_grouped %>%
  filter(Country.Region == 'US') %>%
  select(-Country.Region) %>%
  max()

confirmed_max_us

## [1] 33272

recovered_max_us <- df_recovered_grouped %>%
  filter(Country.Region == 'US') %>%
  select(-Country.Region) %>%
  max()

recovered_max_us

## [1] 17

sprintf('US maximum confirmed cases - %d', confirmed_max_us)

## [1] "US maximum confirmed cases - 33272"

sprintf('US maximum recovery cases - %d', recovered_max_us)

## [1] "US maximum recovery cases - 17"

sprintf('Recovery percentage - %.2f%%', recovered_max_us / confirmed_max_us * 100)

## [1] "Recovery percentage - 0.05%"

Transposing Data Frames

The code that you just wrote in the last prompt is a little ugly — we first have to select a particular row by the value in a particular column, remove that column from the row, and then find the maximum value of the remaining values. It would be much easier if we rotated the data frame so rows represented dates and columns represented countries. Try drawing a sketch of what this rotated data frame might look like!

We could then find the maximum value of a country by simply selecting the appropriate column and finding the maximum value in that column. Let’s try that! The t() function will transpose the data frame so all of the columns become rows. t() unfortunately returns a matrix instead of a data frame. If you pipe the result of t() through the as.data.frame() function, your result will be the transposed data frame. Try transposing one of your data frames by piping it through both t() and as.data.frame() and call head() to investigate it.

# Transpose the data frame
df_confirmed_transposed <- df_confirmed_grouped %>%
  t() %>%
  as.data.frame()

head(df_confirmed_transposed)

##                         V1      V2      V3      V4     V5                  V6
## Country.Region Afghanistan Albania Algeria Andorra Angola Antigua and Barbuda
## X1.22.20                 0       0       0       0      0                   0
## X1.23.20                 0       0       0       0      0                   0
## X1.24.20                 0       0       0       0      0                   0
## X1.25.20                 0       0       0       0      0                   0
## X1.26.20                 0       0       0       0      0                   0
##                       V7      V8        V9     V10        V11          V12
## Country.Region Argentina Armenia Australia Austria Azerbaijan Bahamas, The
## X1.22.20               0       0         0       0          0            0
## X1.23.20               0       0         0       0          0            0
## X1.24.20               0       0         0       0          0            0
## X1.25.20               0       0         0       0          0            0
## X1.26.20               0       0         4       0          0            0
##                    V13        V14      V15     V16     V17   V18    V19     V20
## Country.Region Bahrain Bangladesh Barbados Belarus Belgium Benin Bhutan Bolivia
## X1.22.20             0          0        0       0       0     0      0       0
## X1.23.20             0          0        0       0       0     0      0       0
## X1.24.20             0          0        0       0       0     0      0       0
## X1.25.20             0          0        0       0       0     0      0       0
## X1.26.20             0          0        0       0       0     0      0       0
##                                   V21    V22    V23      V24          V25
## Country.Region Bosnia and Herzegovina Brazil Brunei Bulgaria Burkina Faso
## X1.22.20                            0      0      0        0            0
## X1.23.20                            0      0      0        0            0
## X1.24.20                            0      0      0        0            0
## X1.25.20                            0      0      0        0            0
## X1.26.20                            0      0      0        0            0
##                       V26      V27      V28    V29        V30
## Country.Region Cabo Verde Cambodia Cameroon Canada Cape Verde
## X1.22.20                0        0        0      0          0
## X1.23.20                0        0        0      0          0
## X1.24.20                0        0        0      0          0
## X1.25.20                0        0        0      0          0
## X1.26.20                0        0        0      1          0
##                                     V31  V32   V33   V34      V35
## Country.Region Central African Republic Chad Chile China Colombia
## X1.22.20                              0    0     0   548        0
## X1.23.20                              0    0     0   643        0
## X1.24.20                              0    0     0   920        0
## X1.25.20                              0    0     0  1406        0
## X1.26.20                              0    0     0  2075        0
##                                V36              V37        V38           V39
## Country.Region Congo (Brazzaville) Congo (Kinshasa) Costa Rica Cote d'Ivoire
## X1.22.20                         0                0          0             0
## X1.23.20                         0                0          0             0
## X1.24.20                         0                0          0             0
## X1.25.20                         0                0          0             0
## X1.26.20                         0                0          0             0
##                    V40         V41  V42    V43     V44     V45      V46
## Country.Region Croatia Cruise Ship Cuba Cyprus Czechia Denmark Djibouti
## X1.22.20             0           0    0      0       0       0        0
## X1.23.20             0           0    0      0       0       0        0
## X1.24.20             0           0    0      0       0       0        0
## X1.25.20             0           0    0      0       0       0        0
## X1.26.20             0           0    0      0       0       0        0
##                     V47                V48        V49     V50   V51         V52
## Country.Region Dominica Dominican Republic East Timor Ecuador Egypt El Salvador
## X1.22.20              0                  0          0       0     0           0
## X1.23.20              0                  0          0       0     0           0
## X1.24.20              0                  0          0       0     0           0
## X1.25.20              0                  0          0       0     0           0
## X1.26.20              0                  0          0       0     0           0
##                              V53     V54     V55      V56      V57  V58     V59
## Country.Region Equatorial Guinea Eritrea Estonia Eswatini Ethiopia Fiji Finland
## X1.22.20                       0       0       0        0        0    0       0
## X1.23.20                       0       0       0        0        0    0       0
## X1.24.20                       0       0       0        0        0    0       0
## X1.25.20                       0       0       0        0        0    0       0
## X1.26.20                       0       0       0        0        0    0       0
##                   V60   V61         V62     V63     V64   V65    V66     V67
## Country.Region France Gabon Gambia, The Georgia Germany Ghana Greece Grenada
## X1.22.20            0     0           0       0       0     0      0       0
## X1.23.20            0     0           0       0       0     0      0       0
## X1.24.20            2     0           0       0       0     0      0       0
## X1.25.20            3     0           0       0       0     0      0       0
## X1.26.20            3     0           0       0       0     0      0       0
##                      V68    V69    V70   V71      V72      V73     V74     V75
## Country.Region Guatemala Guinea Guyana Haiti Holy See Honduras Hungary Iceland
## X1.22.20               0      0      0     0        0        0       0       0
## X1.23.20               0      0      0     0        0        0       0       0
## X1.24.20               0      0      0     0        0        0       0       0
## X1.25.20               0      0      0     0        0        0       0       0
## X1.26.20               0      0      0     0        0        0       0       0
##                  V76       V77  V78  V79     V80    V81   V82     V83   V84
## Country.Region India Indonesia Iran Iraq Ireland Israel Italy Jamaica Japan
## X1.22.20           0         0    0    0       0      0     0       0     2
## X1.23.20           0         0    0    0       0      0     0       0     1
## X1.24.20           0         0    0    0       0      0     0       0     2
## X1.25.20           0         0    0    0       0      0     0       0     2
## X1.26.20           0         0    0    0       0      0     0       0     4
##                   V85        V86   V87          V88    V89    V90        V91
## Country.Region Jordan Kazakhstan Kenya Korea, South Kosovo Kuwait Kyrgyzstan
## X1.22.20            0          0     0            1      0      0          0
## X1.23.20            0          0     0            1      0      0          0
## X1.24.20            0          0     0            2      0      0          0
## X1.25.20            0          0     0            2      0      0          0
## X1.26.20            0          0     0            3      0      0          0
##                   V92     V93     V94           V95       V96        V97
## Country.Region Latvia Lebanon Liberia Liechtenstein Lithuania Luxembourg
## X1.22.20            0       0       0             0         0          0
## X1.23.20            0       0       0             0         0          0
## X1.24.20            0       0       0             0         0          0
## X1.25.20            0       0       0             0         0          0
## X1.26.20            0       0       0             0         0          0
##                       V98      V99     V100  V101       V102       V103
## Country.Region Madagascar Malaysia Maldives Malta Martinique Mauritania
## X1.22.20                0        0        0     0          0          0
## X1.23.20                0        0        0     0          0          0
## X1.24.20                0        0        0     0          0          0
## X1.25.20                0        3        0     0          0          0
## X1.26.20                0        4        0     0          0          0
##                     V104   V105    V106   V107     V108       V109    V110
## Country.Region Mauritius Mexico Moldova Monaco Mongolia Montenegro Morocco
## X1.22.20               0      0       0      0        0          0       0
## X1.23.20               0      0       0      0        0          0       0
## X1.24.20               0      0       0      0        0          0       0
## X1.25.20               0      0       0      0        0          0       0
## X1.26.20               0      0       0      0        0          0       0
##                      V111    V112  V113        V114        V115      V116  V117
## Country.Region Mozambique Namibia Nepal Netherlands New Zealand Nicaragua Niger
## X1.22.20                0       0     0           0           0         0     0
## X1.23.20                0       0     0           0           0         0     0
## X1.24.20                0       0     0           0           0         0     0
## X1.25.20                0       0     1           0           0         0     0
## X1.26.20                0       0     1           0           0         0     0
##                   V118            V119   V120 V121     V122   V123
## Country.Region Nigeria North Macedonia Norway Oman Pakistan Panama
## X1.22.20             0               0      0    0        0      0
## X1.23.20             0               0      0    0        0      0
## X1.24.20             0               0      0    0        0      0
## X1.25.20             0               0      0    0        0      0
## X1.26.20             0               0      0    0        0      0
##                            V124     V125 V126        V127   V128     V129  V130
## Country.Region Papua New Guinea Paraguay Peru Philippines Poland Portugal Qatar
## X1.22.20                      0        0    0           0      0        0     0
## X1.23.20                      0        0    0           0      0        0     0
## X1.24.20                      0        0    0           0      0        0     0
## X1.25.20                      0        0    0           0      0        0     0
## X1.26.20                      0        0    0           0      0        0     0
##                   V131   V132   V133        V134
## Country.Region Romania Russia Rwanda Saint Lucia
## X1.22.20             0      0      0           0
## X1.23.20             0      0      0           0
## X1.24.20             0      0      0           0
## X1.25.20             0      0      0           0
## X1.26.20             0      0      0           0
##                                            V135       V136         V137    V138
## Country.Region Saint Vincent and the Grenadines San Marino Saudi Arabia Senegal
## X1.22.20                                      0          0            0       0
## X1.23.20                                      0          0            0       0
## X1.24.20                                      0          0            0       0
## X1.25.20                                      0          0            0       0
## X1.26.20                                      0          0            0       0
##                  V139       V140      V141     V142     V143    V144
## Country.Region Serbia Seychelles Singapore Slovakia Slovenia Somalia
## X1.22.20            0          0         0        0        0       0
## X1.23.20            0          0         1        0        0       0
## X1.24.20            0          0         3        0        0       0
## X1.25.20            0          0         3        0        0       0
## X1.26.20            0          0         4        0        0       0
##                        V145  V146      V147  V148     V149   V150        V151
## Country.Region South Africa Spain Sri Lanka Sudan Suriname Sweden Switzerland
## X1.22.20                  0     0         0     0        0      0           0
## X1.23.20                  0     0         0     0        0      0           0
## X1.24.20                  0     0         0     0        0      0           0
## X1.25.20                  0     0         0     0        0      0           0
## X1.26.20                  0     0         0     0        0      0           0
##                 V152    V153     V154     V155        V156 V157
## Country.Region Syria Taiwan* Tanzania Thailand Timor-Leste Togo
## X1.22.20           0       1        0        2           0    0
## X1.23.20           0       1        0        3           0    0
## X1.24.20           0       3        0        5           0    0
## X1.25.20           0       3        0        7           0    0
## X1.26.20           0       4        0        8           0    0
##                               V158    V159   V160   V161    V162
## Country.Region Trinidad and Tobago Tunisia Turkey Uganda Ukraine
## X1.22.20                         0       0      0      0       0
## X1.23.20                         0       0      0      0       0
## X1.24.20                         0       0      0      0       0
## X1.25.20                         0       0      0      0       0
## X1.26.20                         0       0      0      0       0
##                                V163           V164    V165 V166       V167
## Country.Region United Arab Emirates United Kingdom Uruguay   US Uzbekistan
## X1.22.20                          0              0       0    1          0
## X1.23.20                          0              0       0    1          0
## X1.24.20                          0              0       0    2          0
## X1.25.20                          0              0       0    2          0
## X1.26.20                          0              0       0    5          0
##                     V168    V169   V170     V171
## Country.Region Venezuela Vietnam Zambia Zimbabwe
## X1.22.20               0       0      0        0
## X1.23.20               0       2      0        0
## X1.24.20               0       2      0        0
## X1.25.20               0       2      0        0
## X1.26.20               0       2      0        0

Great! We’re almost there. But take a look at the column names. We want the column names to be the countries, but instead, the country names are in the first row. We need to figure out how to change the first row of a data frame into column names. Challenge yourself to try to figure this out on your own in the following code block. We’ll walk you through our solution in the following section.

names(df_confirmed_transposed) <- df_confirmed_transposed[1,]
df_confirmed_transposed <- df_confirmed_transposed[-1,]
head(df_confirmed_transposed)

##          Afghanistan Albania Algeria Andorra Angola Antigua and Barbuda
## X1.22.20           0       0       0       0      0                   0
## X1.23.20           0       0       0       0      0                   0
## X1.24.20           0       0       0       0      0                   0
## X1.25.20           0       0       0       0      0                   0
## X1.26.20           0       0       0       0      0                   0
## X1.27.20           0       0       0       0      0                   0
##          Argentina Armenia Australia Austria Azerbaijan Bahamas, The Bahrain
## X1.22.20         0       0         0       0          0            0       0
## X1.23.20         0       0         0       0          0            0       0
## X1.24.20         0       0         0       0          0            0       0
## X1.25.20         0       0         0       0          0            0       0
## X1.26.20         0       0         4       0          0            0       0
## X1.27.20         0       0         5       0          0            0       0
##          Bangladesh Barbados Belarus Belgium Benin Bhutan Bolivia
## X1.22.20          0        0       0       0     0      0       0
## X1.23.20          0        0       0       0     0      0       0
## X1.24.20          0        0       0       0     0      0       0
## X1.25.20          0        0       0       0     0      0       0
## X1.26.20          0        0       0       0     0      0       0
## X1.27.20          0        0       0       0     0      0       0
##          Bosnia and Herzegovina Brazil Brunei Bulgaria Burkina Faso Cabo Verde
## X1.22.20                      0      0      0        0            0          0
## X1.23.20                      0      0      0        0            0          0
## X1.24.20                      0      0      0        0            0          0
## X1.25.20                      0      0      0        0            0          0
## X1.26.20                      0      0      0        0            0          0
## X1.27.20                      0      0      0        0            0          0
##          Cambodia Cameroon Canada Cape Verde Central African Republic Chad
## X1.22.20        0        0      0          0                        0    0
## X1.23.20        0        0      0          0                        0    0
## X1.24.20        0        0      0          0                        0    0
## X1.25.20        0        0      0          0                        0    0
## X1.26.20        0        0      1          0                        0    0
## X1.27.20        1        0      1          0                        0    0
##          Chile China Colombia Congo (Brazzaville) Congo (Kinshasa) Costa Rica
## X1.22.20     0   548        0                   0                0          0
## X1.23.20     0   643        0                   0                0          0
## X1.24.20     0   920        0                   0                0          0
## X1.25.20     0  1406        0                   0                0          0
## X1.26.20     0  2075        0                   0                0          0
## X1.27.20     0  2877        0                   0                0          0
##          Cote d'Ivoire Croatia Cruise Ship Cuba Cyprus Czechia Denmark Djibouti
## X1.22.20             0       0           0    0      0       0       0        0
## X1.23.20             0       0           0    0      0       0       0        0
## X1.24.20             0       0           0    0      0       0       0        0
## X1.25.20             0       0           0    0      0       0       0        0
## X1.26.20             0       0           0    0      0       0       0        0
## X1.27.20             0       0           0    0      0       0       0        0
##          Dominica Dominican Republic East Timor Ecuador Egypt El Salvador
## X1.22.20        0                  0          0       0     0           0
## X1.23.20        0                  0          0       0     0           0
## X1.24.20        0                  0          0       0     0           0
## X1.25.20        0                  0          0       0     0           0
## X1.26.20        0                  0          0       0     0           0
## X1.27.20        0                  0          0       0     0           0
##          Equatorial Guinea Eritrea Estonia Eswatini Ethiopia Fiji Finland
## X1.22.20                 0       0       0        0        0    0       0
## X1.23.20                 0       0       0        0        0    0       0
## X1.24.20                 0       0       0        0        0    0       0
## X1.25.20                 0       0       0        0        0    0       0
## X1.26.20                 0       0       0        0        0    0       0
## X1.27.20                 0       0       0        0        0    0       0
##          France Gabon Gambia, The Georgia Germany Ghana Greece Grenada
## X1.22.20      0     0           0       0       0     0      0       0
## X1.23.20      0     0           0       0       0     0      0       0
## X1.24.20      2     0           0       0       0     0      0       0
## X1.25.20      3     0           0       0       0     0      0       0
## X1.26.20      3     0           0       0       0     0      0       0
## X1.27.20      3     0           0       0       1     0      0       0
##          Guatemala Guinea Guyana Haiti Holy See Honduras Hungary Iceland India
## X1.22.20         0      0      0     0        0        0       0       0     0
## X1.23.20         0      0      0     0        0        0       0       0     0
## X1.24.20         0      0      0     0        0        0       0       0     0
## X1.25.20         0      0      0     0        0        0       0       0     0
## X1.26.20         0      0      0     0        0        0       0       0     0
## X1.27.20         0      0      0     0        0        0       0       0     0
##          Indonesia Iran Iraq Ireland Israel Italy Jamaica Japan Jordan
## X1.22.20         0    0    0       0      0     0       0     2      0
## X1.23.20         0    0    0       0      0     0       0     1      0
## X1.24.20         0    0    0       0      0     0       0     2      0
## X1.25.20         0    0    0       0      0     0       0     2      0
## X1.26.20         0    0    0       0      0     0       0     4      0
## X1.27.20         0    0    0       0      0     0       0     4      0
##          Kazakhstan Kenya Korea, South Kosovo Kuwait Kyrgyzstan Latvia Lebanon
## X1.22.20          0     0            1      0      0          0      0       0
## X1.23.20          0     0            1      0      0          0      0       0
## X1.24.20          0     0            2      0      0          0      0       0
## X1.25.20          0     0            2      0      0          0      0       0
## X1.26.20          0     0            3      0      0          0      0       0
## X1.27.20          0     0            4      0      0          0      0       0
##          Liberia Liechtenstein Lithuania Luxembourg Madagascar Malaysia
## X1.22.20       0             0         0          0          0        0
## X1.23.20       0             0         0          0          0        0
## X1.24.20       0             0         0          0          0        0
## X1.25.20       0             0         0          0          0        3
## X1.26.20       0             0         0          0          0        4
## X1.27.20       0             0         0          0          0        4
##          Maldives Malta Martinique Mauritania Mauritius Mexico Moldova Monaco
## X1.22.20        0     0          0          0         0      0       0      0
## X1.23.20        0     0          0          0         0      0       0      0
## X1.24.20        0     0          0          0         0      0       0      0
## X1.25.20        0     0          0          0         0      0       0      0
## X1.26.20        0     0          0          0         0      0       0      0
## X1.27.20        0     0          0          0         0      0       0      0
##          Mongolia Montenegro Morocco Mozambique Namibia Nepal Netherlands
## X1.22.20        0          0       0          0       0     0           0
## X1.23.20        0          0       0          0       0     0           0
## X1.24.20        0          0       0          0       0     0           0
## X1.25.20        0          0       0          0       0     1           0
## X1.26.20        0          0       0          0       0     1           0
## X1.27.20        0          0       0          0       0     1           0
##          New Zealand Nicaragua Niger Nigeria North Macedonia Norway Oman
## X1.22.20           0         0     0       0               0      0    0
## X1.23.20           0         0     0       0               0      0    0
## X1.24.20           0         0     0       0               0      0    0
## X1.25.20           0         0     0       0               0      0    0
## X1.26.20           0         0     0       0               0      0    0
## X1.27.20           0         0     0       0               0      0    0
##          Pakistan Panama Papua New Guinea Paraguay Peru Philippines Poland
## X1.22.20        0      0                0        0    0           0      0
## X1.23.20        0      0                0        0    0           0      0
## X1.24.20        0      0                0        0    0           0      0
## X1.25.20        0      0                0        0    0           0      0
## X1.26.20        0      0                0        0    0           0      0
## X1.27.20        0      0                0        0    0           0      0
##          Portugal Qatar Romania Russia Rwanda Saint Lucia
## X1.22.20        0     0       0      0      0           0
## X1.23.20        0     0       0      0      0           0
## X1.24.20        0     0       0      0      0           0
## X1.25.20        0     0       0      0      0           0
## X1.26.20        0     0       0      0      0           0
## X1.27.20        0     0       0      0      0           0
##          Saint Vincent and the Grenadines San Marino Saudi Arabia Senegal
## X1.22.20                                0          0            0       0
## X1.23.20                                0          0            0       0
## X1.24.20                                0          0            0       0
## X1.25.20                                0          0            0       0
## X1.26.20                                0          0            0       0
## X1.27.20                                0          0            0       0
##          Serbia Seychelles Singapore Slovakia Slovenia Somalia South Africa
## X1.22.20      0          0         0        0        0       0            0
## X1.23.20      0          0         1        0        0       0            0
## X1.24.20      0          0         3        0        0       0            0
## X1.25.20      0          0         3        0        0       0            0
## X1.26.20      0          0         4        0        0       0            0
## X1.27.20      0          0         5        0        0       0            0
##          Spain Sri Lanka Sudan Suriname Sweden Switzerland Syria Taiwan*
## X1.22.20     0         0     0        0      0           0     0       1
## X1.23.20     0         0     0        0      0           0     0       1
## X1.24.20     0         0     0        0      0           0     0       3
## X1.25.20     0         0     0        0      0           0     0       3
## X1.26.20     0         0     0        0      0           0     0       4
## X1.27.20     0         1     0        0      0           0     0       5
##          Tanzania Thailand Timor-Leste Togo Trinidad and Tobago Tunisia Turkey
## X1.22.20        0        2           0    0                   0       0      0
## X1.23.20        0        3           0    0                   0       0      0
## X1.24.20        0        5           0    0                   0       0      0
## X1.25.20        0        7           0    0                   0       0      0
## X1.26.20        0        8           0    0                   0       0      0
## X1.27.20        0        8           0    0                   0       0      0
##          Uganda Ukraine United Arab Emirates United Kingdom Uruguay   US
## X1.22.20      0       0                    0              0       0    1
## X1.23.20      0       0                    0              0       0    1
## X1.24.20      0       0                    0              0       0    2
## X1.25.20      0       0                    0              0       0    2
## X1.26.20      0       0                    0              0       0    5
## X1.27.20      0       0                    0              0       0    5
##          Uzbekistan Venezuela Vietnam Zambia Zimbabwe
## X1.22.20          0         0       0      0        0
## X1.23.20          0         0       2      0        0
## X1.24.20          0         0       2      0        0
## X1.25.20          0         0       2      0        0
## X1.26.20          0         0       2      0        0
## X1.27.20          0         0       2      0        0

As usual, we used Google to help solve this problem. Learning how to properly search for answers to your programming questions is an essential skill. In this case, we specifically searched “r data frame set first row as column names”, which led us to a Stack Overflow page that suggested using the janitor library. That’s a cute name for a library — this janitor library will help us clean up our data frame.

First, we installed and loaded the janitor package. Then we passed our transposed data frame into the row_to_names() function and used the parameter row_number = 1.

if (!require('janitor')) {
  install.packages('janitor')
}

## Loading required package: janitor

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(janitor)

# Make the first row the column titles
df_confirmed_transposed <- df_confirmed_grouped %>%
  t() %>%
  as.data.frame()

df_confirmed_transposed <- df_confirmed_transposed %>%
  row_to_names(row_number = 1)

head(df_confirmed_transposed)

##          Afghanistan Albania Algeria Andorra Angola Antigua and Barbuda
## X1.22.20           0       0       0       0      0                   0
## X1.23.20           0       0       0       0      0                   0
## X1.24.20           0       0       0       0      0                   0
## X1.25.20           0       0       0       0      0                   0
## X1.26.20           0       0       0       0      0                   0
## X1.27.20           0       0       0       0      0                   0
##          Argentina Armenia Australia Austria Azerbaijan Bahamas, The Bahrain
## X1.22.20         0       0         0       0          0            0       0
## X1.23.20         0       0         0       0          0            0       0
## X1.24.20         0       0         0       0          0            0       0
## X1.25.20         0       0         0       0          0            0       0
## X1.26.20         0       0         4       0          0            0       0
## X1.27.20         0       0         5       0          0            0       0
##          Bangladesh Barbados Belarus Belgium Benin Bhutan Bolivia
## X1.22.20          0        0       0       0     0      0       0
## X1.23.20          0        0       0       0     0      0       0
## X1.24.20          0        0       0       0     0      0       0
## X1.25.20          0        0       0       0     0      0       0
## X1.26.20          0        0       0       0     0      0       0
## X1.27.20          0        0       0       0     0      0       0
##          Bosnia and Herzegovina Brazil Brunei Bulgaria Burkina Faso Cabo Verde
## X1.22.20                      0      0      0        0            0          0
## X1.23.20                      0      0      0        0            0          0
## X1.24.20                      0      0      0        0            0          0
## X1.25.20                      0      0      0        0            0          0
## X1.26.20                      0      0      0        0            0          0
## X1.27.20                      0      0      0        0            0          0
##          Cambodia Cameroon Canada Cape Verde Central African Republic Chad
## X1.22.20        0        0      0          0                        0    0
## X1.23.20        0        0      0          0                        0    0
## X1.24.20        0        0      0          0                        0    0
## X1.25.20        0        0      0          0                        0    0
## X1.26.20        0        0      1          0                        0    0
## X1.27.20        1        0      1          0                        0    0
##          Chile China Colombia Congo (Brazzaville) Congo (Kinshasa) Costa Rica
## X1.22.20     0   548        0                   0                0          0
## X1.23.20     0   643        0                   0                0          0
## X1.24.20     0   920        0                   0                0          0
## X1.25.20     0  1406        0                   0                0          0
## X1.26.20     0  2075        0                   0                0          0
## X1.27.20     0  2877        0                   0                0          0
##          Cote d'Ivoire Croatia Cruise Ship Cuba Cyprus Czechia Denmark Djibouti
## X1.22.20             0       0           0    0      0       0       0        0
## X1.23.20             0       0           0    0      0       0       0        0
## X1.24.20             0       0           0    0      0       0       0        0
## X1.25.20             0       0           0    0      0       0       0        0
## X1.26.20             0       0           0    0      0       0       0        0
## X1.27.20             0       0           0    0      0       0       0        0
##          Dominica Dominican Republic East Timor Ecuador Egypt El Salvador
## X1.22.20        0                  0          0       0     0           0
## X1.23.20        0                  0          0       0     0           0
## X1.24.20        0                  0          0       0     0           0
## X1.25.20        0                  0          0       0     0           0
## X1.26.20        0                  0          0       0     0           0
## X1.27.20        0                  0          0       0     0           0
##          Equatorial Guinea Eritrea Estonia Eswatini Ethiopia Fiji Finland
## X1.22.20                 0       0       0        0        0    0       0
## X1.23.20                 0       0       0        0        0    0       0
## X1.24.20                 0       0       0        0        0    0       0
## X1.25.20                 0       0       0        0        0    0       0
## X1.26.20                 0       0       0        0        0    0       0
## X1.27.20                 0       0       0        0        0    0       0
##          France Gabon Gambia, The Georgia Germany Ghana Greece Grenada
## X1.22.20      0     0           0       0       0     0      0       0
## X1.23.20      0     0           0       0       0     0      0       0
## X1.24.20      2     0           0       0       0     0      0       0
## X1.25.20      3     0           0       0       0     0      0       0
## X1.26.20      3     0           0       0       0     0      0       0
## X1.27.20      3     0           0       0       1     0      0       0
##          Guatemala Guinea Guyana Haiti Holy See Honduras Hungary Iceland India
## X1.22.20         0      0      0     0        0        0       0       0     0
## X1.23.20         0      0      0     0        0        0       0       0     0
## X1.24.20         0      0      0     0        0        0       0       0     0
## X1.25.20         0      0      0     0        0        0       0       0     0
## X1.26.20         0      0      0     0        0        0       0       0     0
## X1.27.20         0      0      0     0        0        0       0       0     0
##          Indonesia Iran Iraq Ireland Israel Italy Jamaica Japan Jordan
## X1.22.20         0    0    0       0      0     0       0     2      0
## X1.23.20         0    0    0       0      0     0       0     1      0
## X1.24.20         0    0    0       0      0     0       0     2      0
## X1.25.20         0    0    0       0      0     0       0     2      0
## X1.26.20         0    0    0       0      0     0       0     4      0
## X1.27.20         0    0    0       0      0     0       0     4      0
##          Kazakhstan Kenya Korea, South Kosovo Kuwait Kyrgyzstan Latvia Lebanon
## X1.22.20          0     0            1      0      0          0      0       0
## X1.23.20          0     0            1      0      0          0      0       0
## X1.24.20          0     0            2      0      0          0      0       0
## X1.25.20          0     0            2      0      0          0      0       0
## X1.26.20          0     0            3      0      0          0      0       0
## X1.27.20          0     0            4      0      0          0      0       0
##          Liberia Liechtenstein Lithuania Luxembourg Madagascar Malaysia
## X1.22.20       0             0         0          0          0        0
## X1.23.20       0             0         0          0          0        0
## X1.24.20       0             0         0          0          0        0
## X1.25.20       0             0         0          0          0        3
## X1.26.20       0             0         0          0          0        4
## X1.27.20       0             0         0          0          0        4
##          Maldives Malta Martinique Mauritania Mauritius Mexico Moldova Monaco
## X1.22.20        0     0          0          0         0      0       0      0
## X1.23.20        0     0          0          0         0      0       0      0
## X1.24.20        0     0          0          0         0      0       0      0
## X1.25.20        0     0          0          0         0      0       0      0
## X1.26.20        0     0          0          0         0      0       0      0
## X1.27.20        0     0          0          0         0      0       0      0
##          Mongolia Montenegro Morocco Mozambique Namibia Nepal Netherlands
## X1.22.20        0          0       0          0       0     0           0
## X1.23.20        0          0       0          0       0     0           0
## X1.24.20        0          0       0          0       0     0           0
## X1.25.20        0          0       0          0       0     1           0
## X1.26.20        0          0       0          0       0     1           0
## X1.27.20        0          0       0          0       0     1           0
##          New Zealand Nicaragua Niger Nigeria North Macedonia Norway Oman
## X1.22.20           0         0     0       0               0      0    0
## X1.23.20           0         0     0       0               0      0    0
## X1.24.20           0         0     0       0               0      0    0
## X1.25.20           0         0     0       0               0      0    0
## X1.26.20           0         0     0       0               0      0    0
## X1.27.20           0         0     0       0               0      0    0
##          Pakistan Panama Papua New Guinea Paraguay Peru Philippines Poland
## X1.22.20        0      0                0        0    0           0      0
## X1.23.20        0      0                0        0    0           0      0
## X1.24.20        0      0                0        0    0           0      0
## X1.25.20        0      0                0        0    0           0      0
## X1.26.20        0      0                0        0    0           0      0
## X1.27.20        0      0                0        0    0           0      0
##          Portugal Qatar Romania Russia Rwanda Saint Lucia
## X1.22.20        0     0       0      0      0           0
## X1.23.20        0     0       0      0      0           0
## X1.24.20        0     0       0      0      0           0
## X1.25.20        0     0       0      0      0           0
## X1.26.20        0     0       0      0      0           0
## X1.27.20        0     0       0      0      0           0
##          Saint Vincent and the Grenadines San Marino Saudi Arabia Senegal
## X1.22.20                                0          0            0       0
## X1.23.20                                0          0            0       0
## X1.24.20                                0          0            0       0
## X1.25.20                                0          0            0       0
## X1.26.20                                0          0            0       0
## X1.27.20                                0          0            0       0
##          Serbia Seychelles Singapore Slovakia Slovenia Somalia South Africa
## X1.22.20      0          0         0        0        0       0            0
## X1.23.20      0          0         1        0        0       0            0
## X1.24.20      0          0         3        0        0       0            0
## X1.25.20      0          0         3        0        0       0            0
## X1.26.20      0          0         4        0        0       0            0
## X1.27.20      0          0         5        0        0       0            0
##          Spain Sri Lanka Sudan Suriname Sweden Switzerland Syria Taiwan*
## X1.22.20     0         0     0        0      0           0     0       1
## X1.23.20     0         0     0        0      0           0     0       1
## X1.24.20     0         0     0        0      0           0     0       3
## X1.25.20     0         0     0        0      0           0     0       3
## X1.26.20     0         0     0        0      0           0     0       4
## X1.27.20     0         1     0        0      0           0     0       5
##          Tanzania Thailand Timor-Leste Togo Trinidad and Tobago Tunisia Turkey
## X1.22.20        0        2           0    0                   0       0      0
## X1.23.20        0        3           0    0                   0       0      0
## X1.24.20        0        5           0    0                   0       0      0
## X1.25.20        0        7           0    0                   0       0      0
## X1.26.20        0        8           0    0                   0       0      0
## X1.27.20        0        8           0    0                   0       0      0
##          Uganda Ukraine United Arab Emirates United Kingdom Uruguay   US
## X1.22.20      0       0                    0              0       0    1
## X1.23.20      0       0                    0              0       0    1
## X1.24.20      0       0                    0              0       0    2
## X1.25.20      0       0                    0              0       0    2
## X1.26.20      0       0                    0              0       0    5
## X1.27.20      0       0                    0              0       0    5
##          Uzbekistan Venezuela Vietnam Zambia Zimbabwe
## X1.22.20          0         0       0      0        0
## X1.23.20          0         0       2      0        0
## X1.24.20          0         0       2      0        0
## X1.25.20          0         0       2      0        0
## X1.26.20          0         0       2      0        0
## X1.27.20          0         0       2      0        0

We’re getting closer. If you printed the head of the data frame you just created, you might have noticed that the columns are now of type <fctr>, or factor. This was one of the side effects of rotating the data frame. We want to turn all of these columns back into doubles.

To do so, pass your data frame to the apply() function. apply() takes two parameters in addition to the data set that you’re using. The MARGIN parameter should be set to 2 — this tells apply() to work with columns rather than rows.

The second parameter should be the function you want to apply to all columns. In this case, we want as.numeric.

Finally, after calling apply(), pipe the result through as.data.frame() to ensure we’re still working with a data frame. Print out the head of your result to see what the data now looks like. Look at the data type of the columns now!

# Transform the columns to numeric values
df_confirmed_transposed <- df_confirmed_transposed %>%
  apply(MARGIN = 2, as.numeric) %>%
  as.data.frame()

head(df_confirmed_transposed)

##   Afghanistan Albania Algeria Andorra Angola Antigua and Barbuda Argentina
## 1           0       0       0       0      0                   0         0
## 2           0       0       0       0      0                   0         0
## 3           0       0       0       0      0                   0         0
## 4           0       0       0       0      0                   0         0
## 5           0       0       0       0      0                   0         0
## 6           0       0       0       0      0                   0         0
##   Armenia Australia Austria Azerbaijan Bahamas, The Bahrain Bangladesh Barbados
## 1       0         0       0          0            0       0          0        0
## 2       0         0       0          0            0       0          0        0
## 3       0         0       0          0            0       0          0        0
## 4       0         0       0          0            0       0          0        0
## 5       0         4       0          0            0       0          0        0
## 6       0         5       0          0            0       0          0        0
##   Belarus Belgium Benin Bhutan Bolivia Bosnia and Herzegovina Brazil Brunei
## 1       0       0     0      0       0                      0      0      0
## 2       0       0     0      0       0                      0      0      0
## 3       0       0     0      0       0                      0      0      0
## 4       0       0     0      0       0                      0      0      0
## 5       0       0     0      0       0                      0      0      0
## 6       0       0     0      0       0                      0      0      0
##   Bulgaria Burkina Faso Cabo Verde Cambodia Cameroon Canada Cape Verde
## 1        0            0          0        0        0      0          0
## 2        0            0          0        0        0      0          0
## 3        0            0          0        0        0      0          0
## 4        0            0          0        0        0      0          0
## 5        0            0          0        0        0      1          0
## 6        0            0          0        1        0      1          0
##   Central African Republic Chad Chile China Colombia Congo (Brazzaville)
## 1                        0    0     0   548        0                   0
## 2                        0    0     0   643        0                   0
## 3                        0    0     0   920        0                   0
## 4                        0    0     0  1406        0                   0
## 5                        0    0     0  2075        0                   0
## 6                        0    0     0  2877        0                   0
##   Congo (Kinshasa) Costa Rica Cote d'Ivoire Croatia Cruise Ship Cuba Cyprus
## 1                0          0             0       0           0    0      0
## 2                0          0             0       0           0    0      0
## 3                0          0             0       0           0    0      0
## 4                0          0             0       0           0    0      0
## 5                0          0             0       0           0    0      0
## 6                0          0             0       0           0    0      0
##   Czechia Denmark Djibouti Dominica Dominican Republic East Timor Ecuador Egypt
## 1       0       0        0        0                  0          0       0     0
## 2       0       0        0        0                  0          0       0     0
## 3       0       0        0        0                  0          0       0     0
## 4       0       0        0        0                  0          0       0     0
## 5       0       0        0        0                  0          0       0     0
## 6       0       0        0        0                  0          0       0     0
##   El Salvador Equatorial Guinea Eritrea Estonia Eswatini Ethiopia Fiji Finland
## 1           0                 0       0       0        0        0    0       0
## 2           0                 0       0       0        0        0    0       0
## 3           0                 0       0       0        0        0    0       0
## 4           0                 0       0       0        0        0    0       0
## 5           0                 0       0       0        0        0    0       0
## 6           0                 0       0       0        0        0    0       0
##   France Gabon Gambia, The Georgia Germany Ghana Greece Grenada Guatemala
## 1      0     0           0       0       0     0      0       0         0
## 2      0     0           0       0       0     0      0       0         0
## 3      2     0           0       0       0     0      0       0         0
## 4      3     0           0       0       0     0      0       0         0
## 5      3     0           0       0       0     0      0       0         0
## 6      3     0           0       0       1     0      0       0         0
##   Guinea Guyana Haiti Holy See Honduras Hungary Iceland India Indonesia Iran
## 1      0      0     0        0        0       0       0     0         0    0
## 2      0      0     0        0        0       0       0     0         0    0
## 3      0      0     0        0        0       0       0     0         0    0
## 4      0      0     0        0        0       0       0     0         0    0
## 5      0      0     0        0        0       0       0     0         0    0
## 6      0      0     0        0        0       0       0     0         0    0
##   Iraq Ireland Israel Italy Jamaica Japan Jordan Kazakhstan Kenya Korea, South
## 1    0       0      0     0       0     2      0          0     0            1
## 2    0       0      0     0       0     1      0          0     0            1
## 3    0       0      0     0       0     2      0          0     0            2
## 4    0       0      0     0       0     2      0          0     0            2
## 5    0       0      0     0       0     4      0          0     0            3
## 6    0       0      0     0       0     4      0          0     0            4
##   Kosovo Kuwait Kyrgyzstan Latvia Lebanon Liberia Liechtenstein Lithuania
## 1      0      0          0      0       0       0             0         0
## 2      0      0          0      0       0       0             0         0
## 3      0      0          0      0       0       0             0         0
## 4      0      0          0      0       0       0             0         0
## 5      0      0          0      0       0       0             0         0
## 6      0      0          0      0       0       0             0         0
##   Luxembourg Madagascar Malaysia Maldives Malta Martinique Mauritania Mauritius
## 1          0          0        0        0     0          0          0         0
## 2          0          0        0        0     0          0          0         0
## 3          0          0        0        0     0          0          0         0
## 4          0          0        3        0     0          0          0         0
## 5          0          0        4        0     0          0          0         0
## 6          0          0        4        0     0          0          0         0
##   Mexico Moldova Monaco Mongolia Montenegro Morocco Mozambique Namibia Nepal
## 1      0       0      0        0          0       0          0       0     0
## 2      0       0      0        0          0       0          0       0     0
## 3      0       0      0        0          0       0          0       0     0
## 4      0       0      0        0          0       0          0       0     1
## 5      0       0      0        0          0       0          0       0     1
## 6      0       0      0        0          0       0          0       0     1
##   Netherlands New Zealand Nicaragua Niger Nigeria North Macedonia Norway Oman
## 1           0           0         0     0       0               0      0    0
## 2           0           0         0     0       0               0      0    0
## 3           0           0         0     0       0               0      0    0
## 4           0           0         0     0       0               0      0    0
## 5           0           0         0     0       0               0      0    0
## 6           0           0         0     0       0               0      0    0
##   Pakistan Panama Papua New Guinea Paraguay Peru Philippines Poland Portugal
## 1        0      0                0        0    0           0      0        0
## 2        0      0                0        0    0           0      0        0
## 3        0      0                0        0    0           0      0        0
## 4        0      0                0        0    0           0      0        0
## 5        0      0                0        0    0           0      0        0
## 6        0      0                0        0    0           0      0        0
##   Qatar Romania Russia Rwanda Saint Lucia Saint Vincent and the Grenadines
## 1     0       0      0      0           0                                0
## 2     0       0      0      0           0                                0
## 3     0       0      0      0           0                                0
## 4     0       0      0      0           0                                0
## 5     0       0      0      0           0                                0
## 6     0       0      0      0           0                                0
##   San Marino Saudi Arabia Senegal Serbia Seychelles Singapore Slovakia Slovenia
## 1          0            0       0      0          0         0        0        0
## 2          0            0       0      0          0         1        0        0
## 3          0            0       0      0          0         3        0        0
## 4          0            0       0      0          0         3        0        0
## 5          0            0       0      0          0         4        0        0
## 6          0            0       0      0          0         5        0        0
##   Somalia South Africa Spain Sri Lanka Sudan Suriname Sweden Switzerland Syria
## 1       0            0     0         0     0        0      0           0     0
## 2       0            0     0         0     0        0      0           0     0
## 3       0            0     0         0     0        0      0           0     0
## 4       0            0     0         0     0        0      0           0     0
## 5       0            0     0         0     0        0      0           0     0
## 6       0            0     0         1     0        0      0           0     0
##   Taiwan* Tanzania Thailand Timor-Leste Togo Trinidad and Tobago Tunisia Turkey
## 1       1        0        2           0    0                   0       0      0
## 2       1        0        3           0    0                   0       0      0
## 3       3        0        5           0    0                   0       0      0
## 4       3        0        7           0    0                   0       0      0
## 5       4        0        8           0    0                   0       0      0
## 6       5        0        8           0    0                   0       0      0
##   Uganda Ukraine United Arab Emirates United Kingdom Uruguay US Uzbekistan
## 1      0       0                    0              0       0  1          0
## 2      0       0                    0              0       0  1          0
## 3      0       0                    0              0       0  2          0
## 4      0       0                    0              0       0  2          0
## 5      0       0                    0              0       0  5          0
## 6      0       0                    0              0       0  5          0
##   Venezuela Vietnam Zambia Zimbabwe
## 1         0       0      0        0
## 2         0       2      0        0
## 3         0       2      0        0
## 4         0       2      0        0
## 5         0       2      0        0
## 6         0       2      0        0

Nice work - we got there! Let’s see what our hard work allows us to do. Let’s once again find the maximum number of cases reported in the US. To do so, simply select the US column and pipe the result to the max() function. This should be the same value that we found before we rotate the data frame.

# Find the maximum number of confirmed cases in the US
confirmed_max_us <- df_confirmed_transposed$US %>%
  max()

confirmed_max_us

## [1] 33272

Extra Credit: Visualization

We put a lot of work into rotating that data frame, and you may be wondering if it was worth it. In this section, we’re going to make some basic graphs that demonstrate the value of rotating the data frame.

You may not have been introduced to visualization yet. If this is unfamiliar to you, feel free to skip this section — in the following section we offer some ideas for more data manipulation challenges using this data set. That being said, if visualization is unfamiliar to you, you can use this as an opportunity to do some independent learning! Challenge yourself to do some research to create the following graph.

Let’s build a line graph showing the number of confirmed cases over time for a particular country. To do this, we’ll first need to add a new column to our dataset to represent the date (notice that when we rotated our data frame, we lost the column containing the date). The first day in our dataset was January 22nd. Let’s represent that as day 1. January 23rd would then be day 2, and so on. We want to add a column named date that contains the numbers 1 through the total number of rows in the dataset. Use mutate() to do this. nrow() will help find the number of rows in the dataset. Print out the head of the date column to ensure this worked.

# Add the date column
df_confirmed_transposed <- df_confirmed_transposed %>%
  mutate(date = as.Date('2020-01-22') + 0:(nrow(df_confirmed_transposed)-1))

df_confirmed_transposed %>%
  select(date) %>%
  head()

##         date
## 1 2020-01-22
## 2 2020-01-23
## 3 2020-01-24
## 4 2020-01-25
## 5 2020-01-26
## 6 2020-01-27

Great! Our date column is all set. Let’s now graph the number of cases in Malaysia over the days in our dataset. We’ll use the ggplot2 library to do this. Use a combination of ggplot(), aes(), and geom_line() to make our graph! Once again, if you’re unfamiliar with graphing, try to do some online research to learn how to do this!

library(ggplot2)

# Create a line graph with date on the X axis and number of cases in Malaysia on the Y axis
df_confirmed_transposed %>%
  ggplot(aes(x = date, y = Malaysia)) + 
  geom_line()

That line of code is pretty concise. That’s one of the reasons why we wanted to rotate the data frame. Having a column containing only the confirmed cases from a particular country made this graph relatively simple to create.

Finally, let’s do a bit of work to add a title, and to fix the names of the axes. There are a few different ways to do this, but we used the labs() function which takes x, y and title parameters.

# Add a proper title, x label, and y label
df_confirmed_transposed %>%
  ggplot(aes(x = date, y = Malaysia)) + 
  geom_line() +
  labs(x = 'Date', y = 'Confirmed Cases', title = 'Confirmed Cases in Malaysia')

In the next section we’ll give you ideas on ways to continue visualizing this data.

Further Work

Great work! In this project you have truly refined your data manipulation skills using a real data set. In doing so, we hope that you’ve seen some of the difficulties you need to work through when working with data. Your data set might have annoying column names. Data may be missing or incorrect. You might need to reshape the data or change its type. Gaining the confidence to master these challenges unique to your dataset is an important part of being a data scientist.

If you’d like to continue working with this data, here are some challenges:

Find the rate of change. Instead of reporting the total number of confirmed cases in a country, report how many new cases there were that day. Which countries are slowing their rate of infection?
Graph multiple countries on a single graph. Let’s visually compare the difference between countries.
Create a side by side bar plot that shows the number of confirmed cases, recoveries, and deaths every day for a particular country.
Find more data. Now that you’ve gotten comfortable with this dataset, try finding another that can expand your understanding of this pandemic. The New York Times GitHub page has a dataset containing information about specific counties in the US. Trying to format this dataset to find more useful information would be a fantastic exercise!

Investigating the Coronavirus Pandemic

Ricky Soo

November 27th, 2020