VISUALIZATION OF PERU COVID-19 TIME SERIES

Keywords: Visualization, R coding, dygraph, plotly, manipulateWidget, Time Series, Covid-19, Peru.

Introduction

Hello! I’ve been learning R programming with RStudio, and this is my first completed visualization project, where I show what I’ve learned about data visualization. I studied communication, have experience in film production, and was drawn to the field of analytics through visualization. I am amazed by how much information can be contained in a single graph. Visualization is like a good movie: it shows the story and at the same time generates conversation and analysis.

This project is about the evolution of Covid-19 in Peru from March 6, 2020 to March 31, 2021. Peru, located in South America, has a population of 33 million and three distinct geographical regions: the desert west coast, the central high Andean mountains, and the eastern tropical Amazon region.

The data, presented at national level, are cumulative counts from daily reports of a) confirmed cases, b) recovered cases, and c) deaths. Since March 15, 2020, Peru has been under a series of lockdowns that were relaxed as the number of cases slowed down, or tightened as the number of cases started to increase again. The advance of Covid-19 responded to a combination of environmental conditions, health care measures, and human behavior under continuous restrictions that mainly affected social and economic interactions. An important driver of the early increase in people affected by Covid-19 was the need of a large segment of Peru’s population to keep working through the pandemic in very low-paying jobs, which made it impossible for them to follow the government’s ‘stay in place’ regulations.

My visualization objectives are to graph the three cumulative series in a single plot, to graph the daily cases of each series in its own interactive plot, and to combine all four graphs in a single figure that keeps the interactivity of the individual plots. Needless to say, I learned a lot completing these tasks and enjoyed each step of my learning curve.

Data

Datasets were downloaded from the GitHub repository of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. Each dataset contains information at national level: 274 rows in the confirmed and deaths datasets and 259 rows in the recovered dataset, with one column per day, for a total of 439 columns (4 identifier columns plus 435 dates). Each date column is named after its date in month-day-year order (for example 1.22.20 once cleaned in R). Each country corresponds to one row, except for Australia, Canada, China, Denmark, France, Netherlands and UK, which report information at province level. Last access was on 03-31-2021.

The downloaded datasets are in wide format, and after import each date column name carries the prefix ‘X’ (for example X1.22.20). I used base::substring to remove this prefix. Next, I used dplyr::inner_join to add to each dataset the variables continent and region, which give the continent and region where each country is located.
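The origin of the prefix can be reproduced on a small scale. The sketch below is only an illustration, not the project’s import code: the two-row CSV text is made up, and it assumes the source header writes dates with slashes (1/22/20), which read.csv() turns into syntactic column names.

```r
# Made-up two-row extract in the layout of the downloaded files (assumption).
csv_text <- "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Peru,-9.19,-75.0152,0,0
,Zambia,-13.1339,27.8493,0,0"

toy <- read.csv(text = csv_text)

# read.csv() replaces "/" with "." and prepends "X" to names starting with a digit.
names_before <- names(toy)
print(names_before)

# Drop the first character of every date column name, as done for the real data.
date_cols <- 5:ncol(toy)
names(toy)[date_cols] <- substring(names(toy)[date_cols], 2)
print(names(toy))
```

Reading the file with check.names = FALSE would keep the original headers instead, at the cost of non-syntactic column names.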

Then, I converted each dataset from wide format to long format using the command tidyr::pivot_longer. As a result, each dataset has 5 columns and 67860 rows. The columns are Country.Region, continent, region, dates, and confirmed (for the confirmed cases dataset), recovered (for the recovered cases dataset), or deaths (for the number of deaths dataset). Finally, the variable dates was converted from character to Date with the command lubridate::mdy.
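The date conversion can be checked by hand on a few column labels. The sketch below uses base R as.Date() with an explicit format as a rough stand-in for lubridate::mdy; the three labels are picked for illustration and the project itself uses mdy.

```r
# Column labels as they look after the "X" prefix is removed.
labels <- c("1.22.20", "3.6.20", "3.31.21")

# Month, day, and two-digit year separated by dots.
parsed <- as.Date(labels, format = "%m.%d.%y")
print(parsed)
```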

Technical Specifications

This project was made with:

R 4.0.3 GUI 1.73 Catalina build (7892).
RStudio Version 1.4.1103 for macOS.
MacBook Pro, macOS High Sierra Version 10.13.6

R Packages

library(readxl)
library(tidyverse)
library(magrittr)
library(lubridate)
library(dplyr)
library(xts)
library(tsibble)
library(slider)
library(dygraphs)
library(plotly)
library(manipulateWidget)
library(knitr)

Read Cumulative Confirmed Cases

confirmed <- read.csv("/Users/marcoarellano/Desktop/DATA SCIENCE/Covid 19/03.31.2021/DATA/time_series_covid19_confirmed_global.csv")
n_colsc <- dim(confirmed)[2]
n_rowsc <- dim(confirmed)[1]
names(confirmed)[5:n_colsc] <- substring(names(confirmed)[5:n_colsc],2)
tail(confirmed[, 1:6])

##     Province.State     Country.Region       Lat      Long 1.22.20 1.23.20
## 269                         Venezuela   6.42380 -66.58970       0       0
## 270                           Vietnam  14.05832 108.27720       0       2
## 271                West Bank and Gaza  31.95220  35.23320       0       0
## 272                             Yemen  15.55273  48.51639       0       0
## 273                            Zambia -13.13390  27.84933       0       0
## 274                          Zimbabwe -19.01544  29.15486       0       0

confirmed dataset has 274 rows and 439 columns.

Read Cumulative Recovered Cases

recovered <- read.csv("/Users/marcoarellano/Desktop/DATA SCIENCE/COVID 19/03.31.2021/DATA/time_series_covid19_recovered_global.csv")
n_colsr <- dim(recovered)[2]
n_rowsr <- dim(recovered)[1]
names(recovered)[5:n_colsr] <- substring(names(recovered)[5:n_colsr],2)
tail(recovered[, 1:6])

##     Province.State     Country.Region       Lat      Long 1.22.20 1.23.20
## 254                         Venezuela   6.42380 -66.58970       0       0
## 255                           Vietnam  14.05832 108.27720       0       0
## 256                West Bank and Gaza  31.95220  35.23320       0       0
## 257                             Yemen  15.55273  48.51639       0       0
## 258                            Zambia -13.13390  27.84933       0       0
## 259                          Zimbabwe -19.01544  29.15486       0       0

recovered dataset has 259 rows and 439 columns.

Read Cumulative Deaths Cases

deaths <- read.csv("/Users/marcoarellano/Desktop/DATA SCIENCE/COVID 19/03.31.2021/DATA/time_series_covid19_deaths_global.csv")
n_colsd <- dim(deaths)[2]
n_rowsd <- dim(deaths)[1]
names(deaths)[5:n_colsd] <- substring(names(deaths)[5:n_colsd],2)
tail(deaths[, 1:6])

##     Province.State     Country.Region       Lat      Long 1.22.20 1.23.20
## 269                         Venezuela   6.42380 -66.58970       0       0
## 270                           Vietnam  14.05832 108.27720       0       0
## 271                West Bank and Gaza  31.95220  35.23320       0       0
## 272                             Yemen  15.55273  48.51639       0       0
## 273                            Zambia -13.13390  27.84933       0       0
## 274                          Zimbabwe -19.01544  29.15486       0       0

deaths dataset has 274 rows and 439 columns.

Read Continent and country list

continents <- read_excel("~/Desktop/DATA SCIENCE/COVID 19/03.31.2021/DATA/continents_Corrected.xlsx")
head(continents)

## # A tibble: 6 x 3
##   Country.Region      continent                          region         
##   <chr>               <chr>                              <chr>          
## 1 Afghanistan         Asia                               Southern Asia  
## 2 Albania             Eastern Europe                     Southern Europe
## 3 Algeria             Africa                             Northern Africa
## 4 Andorra             Western Europe and other States    Northern Europe
## 5 Angola              Africa                             Middle Africa  
## 6 Antigua and Barbuda Latin America and Caribbean States Caribbean

Here I create the long format for each dataset following the steps:

Add continent variable to dataset, use dplyr::inner_join()
Use tidyr::pivot_longer to change from having each date as a column to having each date as a row, within each country.
Define the column dates as a Date variable.
Group dataset by Country.Region, continent, region, dates.
Sum the values of confirmed (or recovered, or deaths) across provinces for countries that have more than one row.
Cancel the grouping.

Create the long format for accumulated confirmed cases (confirmed_long)

confirmed_long <- confirmed %>%
  inner_join(continents, by = "Country.Region") %>%
  pivot_longer (
    cols = !c(Province.State, Country.Region, Lat, Long, continent, region),
    names_to = c("dates"),
    values_to = "confirmed") %>%
  mutate(dates = mdy(dates)) %>%
  group_by(Country.Region, continent, region, dates) %>%
  summarise(confirmed = sum(confirmed)) %>%
  ungroup()
n_colscl <- dim(confirmed_long)[2]
n_rowscl <- dim(confirmed_long)[1]
tail(confirmed_long)

## # A tibble: 6 x 5
##   Country.Region continent region         dates      confirmed
##   <chr>          <chr>     <chr>          <date>         <int>
## 1 Zimbabwe       Africa    Eastern Africa 2021-03-26     36805
## 2 Zimbabwe       Africa    Eastern Africa 2021-03-27     36818
## 3 Zimbabwe       Africa    Eastern Africa 2021-03-28     36822
## 4 Zimbabwe       Africa    Eastern Africa 2021-03-29     36839
## 5 Zimbabwe       Africa    Eastern Africa 2021-03-30     36839
## 6 Zimbabwe       Africa    Eastern Africa 2021-03-31     36882

confirmed_long dataset has 67860 rows and 5 columns.

Create the long format for accumulated recovered cases (recovered_long)

recovered_long <- recovered %>%
  inner_join(continents, by = "Country.Region") %>%
  pivot_longer (
    cols = !c(Province.State, Country.Region, Lat, Long, continent, region),
    names_to = c("dates"),
    values_to = "recovered") %>%
  mutate(dates = mdy(dates))%>%
  group_by(Country.Region, continent, region, dates) %>%
  summarise(recovered = sum(recovered)) %>%
  ungroup()
n_colsrl <- dim(recovered_long)[2]
n_rowsrl <- dim(recovered_long)[1]
tail(recovered_long)

## # A tibble: 6 x 5
##   Country.Region continent region         dates      recovered
##   <chr>          <chr>     <chr>          <date>         <int>
## 1 Zimbabwe       Africa    Eastern Africa 2021-03-26     34572
## 2 Zimbabwe       Africa    Eastern Africa 2021-03-27     34575
## 3 Zimbabwe       Africa    Eastern Africa 2021-03-28     34603
## 4 Zimbabwe       Africa    Eastern Africa 2021-03-29     34617
## 5 Zimbabwe       Africa    Eastern Africa 2021-03-30     34617
## 6 Zimbabwe       Africa    Eastern Africa 2021-03-31     34686

recovered_long dataset has 67860 rows and 5 columns.

Create the long format for accumulated deaths (deaths_long)

deaths_long <- deaths %>%
  inner_join(continents, by = "Country.Region") %>%
  pivot_longer (
    cols = !c(Province.State, Country.Region, Lat, Long, continent, region),
    names_to = c("dates"),
    values_to = "deaths") %>%
  mutate(dates = mdy(dates)) %>%
  group_by(Country.Region, continent, region, dates) %>%
  summarise(deaths= sum(deaths)) %>%
  ungroup()
n_colsdl <- dim(deaths_long)[2]
n_rowsdl <- dim(deaths_long)[1]
tail(deaths_long)

## # A tibble: 6 x 5
##   Country.Region continent region         dates      deaths
##   <chr>          <chr>     <chr>          <date>      <int>
## 1 Zimbabwe       Africa    Eastern Africa 2021-03-26   1518
## 2 Zimbabwe       Africa    Eastern Africa 2021-03-27   1519
## 3 Zimbabwe       Africa    Eastern Africa 2021-03-28   1520
## 4 Zimbabwe       Africa    Eastern Africa 2021-03-29   1520
## 5 Zimbabwe       Africa    Eastern Africa 2021-03-30   1520
## 6 Zimbabwe       Africa    Eastern Africa 2021-03-31   1523

deaths_long dataset has 67860 rows and 5 columns.

Below, I create a new column that corresponds to the number of daily cases in each dataset.

Daily cases are obtained by subtracting the cumulative count on day (j-1) from the cumulative count on day (j); the difference gives us the increase in a single day. To do this, I use the dplyr::lag command, which returns the number of cases on the day before. With the lag() option default = 0, the lagged value for the first observation is 0, so the daily value for the first date equals the cumulative value observed on that day.

Order sequentially dates from first date, 01-22-2020, to final date, 03-31-2021.
Group rows by Country.Region.
Create variable confirmed_dailycases, or recovered_dailycases, or deaths_dailycases.
Cancel the grouping within each dataset.
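The steps above can be sketched on a toy table. This is a base R illustration with made-up countries "A" and "B", using ave() and diff() in place of the dplyr verbs; prepending 0 to each series plays the role of lag(default = 0).

```r
# Made-up cumulative counts for two countries over three days.
toy <- data.frame(
  country   = c("A", "A", "A", "B", "B", "B"),
  dates     = as.Date(rep(c("2020-03-06", "2020-03-07", "2020-03-08"), 2)),
  confirmed = c(1, 1, 6, 2, 5, 9)
)

# Order by date within country so that each difference compares consecutive days.
toy <- toy[order(toy$country, toy$dates), ]

# Per country: cumulative value minus the previous day's value (0 before the first day).
toy$daily <- ave(toy$confirmed, toy$country,
                 FUN = function(x) diff(c(0, x)))
print(toy)
# daily is 1, 0, 5 for country A and 2, 3, 4 for country B
```

If a cumulative series is ever revised downward, the same subtraction yields a negative daily value, which is worth checking for before plotting.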

Create the variable confirmed_dailycases

confirmed_long <- confirmed_long %>%
  arrange(dates) %>%
  group_by(Country.Region) %>%
  mutate(confirmed_dailycases = confirmed - lag(confirmed, default = 0)) %>%
  ungroup()
tail(confirmed_long)

## # A tibble: 6 x 6
##   Country.Region continent       region    dates      confirmed confirmed_daily…
##   <chr>          <chr>           <chr>     <date>         <int>            <dbl>
## 1 Vanuatu        Asia            Melanesia 2021-03-31         3                0
## 2 Venezuela      Latin America … South Am… 2021-03-31    160497             1348
## 3 Vietnam        Asia            South-Ea… 2021-03-31      2603                9
## 4 Yemen          Asia            Western … 2021-03-31      4357              110
## 5 Zambia         Africa          Eastern … 2021-03-31     88418              219
## 6 Zimbabwe       Africa          Eastern … 2021-03-31     36882               43

Create the variable recovered_dailycases

recovered_long <- recovered_long %>%
  arrange(dates) %>%
  group_by(Country.Region) %>%
  mutate(recovered_dailycases = recovered - lag(recovered, default = 0)) %>%
  ungroup()
tail(recovered_long)

## # A tibble: 6 x 6
##   Country.Region continent       region    dates      recovered recovered_daily…
##   <chr>          <chr>           <chr>     <date>         <int>            <dbl>
## 1 Vanuatu        Asia            Melanesia 2021-03-31         1                0
## 2 Venezuela      Latin America … South Am… 2021-03-31    147846              683
## 3 Vietnam        Asia            South-Ea… 2021-03-31      2359                0
## 4 Yemen          Asia            Western … 2021-03-31      1676                9
## 5 Zambia         Africa          Eastern … 2021-03-31     84592               73
## 6 Zimbabwe       Africa          Eastern … 2021-03-31     34686               69

Create the variable deaths_dailycases

deaths_long <- deaths_long %>%
  arrange(dates) %>%
  group_by(Country.Region) %>%
  mutate(deaths_dailycases = deaths - lag(deaths, default = 0)) %>%
  ungroup()
tail(deaths_long)

## # A tibble: 6 x 6
##   Country.Region continent         region     dates      deaths deaths_dailycas…
##   <chr>          <chr>             <chr>      <date>      <int>            <dbl>
## 1 Vanuatu        Asia              Melanesia  2021-03-31      0                0
## 2 Venezuela      Latin America an… South Ame… 2021-03-31   1602               13
## 3 Vietnam        Asia              South-Eas… 2021-03-31     35                0
## 4 Yemen          Asia              Western A… 2021-03-31    888                6
## 5 Zambia         Africa            Eastern A… 2021-03-31   1208                6
## 6 Zimbabwe       Africa            Eastern A… 2021-03-31   1523                3

Here I select the data for my country, Peru, following these steps:

Use dplyr::filter to select Peru data in each dataset.
Select variables to be included in each Peru dataset: Peru_confirmed, Peru_recovered, Peru_deaths.
Eliminate the column Country.Region because all three datasets refer to Peru.
Use dplyr::full_join to create a single dataset with all three series: confirmed, recovered and deaths. The variable dates is used for joining the three Peru datasets.
Finally, I create a new variable, deaths_100k, which corresponds to (deaths / 32625948) × 10^5, as the population of Peru for March 2021 is estimated at 32 625 948 people. Values are rounded up to the next whole number with ceiling().
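As a worked example of the deaths per 100 000 formula, the sketch below plugs in the cumulative deaths reported for 2021-03-31 in this document’s Peru_global output (52008). The variable names are chosen for illustration only.

```r
population <- 32625948   # population estimate used in the text
deaths     <- 52008      # cumulative deaths on 2021-03-31

rate <- deaths / population * 10^5

rate_up   <- ceiling(rate)    # rounding up, as in the project code: 160
rate_dec1 <- round(rate, 1)   # rounding to one decimal instead: 159.4
print(c(rate_up, rate_dec1))
```

The two roundings differ by more than half a unit here, so it matters which one is reported alongside the graph.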

Peru_confirmed <- confirmed_long %>%
  filter(Country.Region %in% "Peru") %>%
  select(Country.Region, dates, confirmed, confirmed_dailycases)
Peru_confirmed <- Peru_confirmed[,-c(1)]

Peru_recovered <- recovered_long %>%
  filter(Country.Region %in% "Peru") %>%
  select(Country.Region, dates, recovered, recovered_dailycases)
Peru_recovered <- Peru_recovered[,-c(1)]

Peru_deaths <- deaths_long %>%
  filter(Country.Region %in% "Peru") %>%
  select(Country.Region, dates, deaths, deaths_dailycases)
Peru_deaths <- Peru_deaths[,-c(1)]

Combine the three datasets in the new dataset Peru_global

Peru_global <- Peru_confirmed %>%
  full_join(Peru_recovered, by = "dates") %>%
  full_join(Peru_deaths, by = "dates") %>%
  mutate(deaths_100k = ceiling((deaths/32625948)*10^5))
n_colspg <- dim(Peru_global)[2]
n_rowspg <- dim(Peru_global)[1]
tail(Peru_global)

## # A tibble: 6 x 8
##   dates      confirmed confirmed_daily… recovered recovered_daily… deaths
##   <date>         <int>            <dbl>     <int>            <dbl>  <int>
## 1 2021-03-26   1512384            11919   1423259            16955  51032
## 2 2021-03-27   1520973             8589   1432450             9191  51238
## 3 2021-03-28   1529882             8909   1442405             9955  51469
## 4 2021-03-29   1533121             3239   1451112             8707  51635
## 5 2021-03-30   1533121                0   1451112                0  51635
## 6 2021-03-31   1548807            15686   1468457            17345  52008
## # … with 2 more variables: deaths_dailycases <dbl>, deaths_100k <dbl>

Peru_global dataset has 435 rows and 8 columns.

Next, I create a 7-day Rolling Average variable for confirmed, recovered, and deaths variables in dataset Peru_global.

First, I transform the Peru_global dataset into a time series object with the command tsibble::as_tsibble.

nr <- nrow(Peru_global)
Peru_global$rid <- seq(1, nr ,1)
Peru_global_ts <- as_tsibble(Peru_global, 
                             key = rid,
                             index = dates)

The 7-day rolling average takes seven consecutive values and calculates their average. This average is paired with the central date of the 7-day interval, which corresponds to the 4th date. The following 7-day interval is created by dropping the earliest date of the interval and adding the next date after its latest date.

The 7-day rolling average is created with the command slider::slide_index_dbl.
The position of our 7-day Rolling Average corresponds to day 4 of consecutive 7-day intervals.
The 7-day Rolling Averages smooth the day to day observed variation in 7 consecutive days.
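The centered window can be illustrated in base R. The helper centered_mean below is hypothetical and written only for this sketch; it assumes one row per day with no gaps, and it averages whatever part of the window exists at the edges of the series, which is how slider behaves with its default .complete = FALSE. The input is the first ten values of confirmed_dailycases shown in this document.

```r
# Hypothetical helper: mean over a window of `before` rows back and `after` rows ahead.
centered_mean <- function(x, before = 3, after = 3) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    window <- x[max(1, i - before):min(n, i + after)]  # truncated at the edges
    mean(window)
  }, numeric(1))
}

daily <- c(1, 0, 5, 1, 4, 0, 4, 13, 10, 5)
avg7  <- centered_mean(daily)

# Day 4 is the first date with a full 7-day window: (1+0+5+1+4+0+4) / 7.
print(avg7[4])
# Day 1 only has itself and the 3 following days: (1+0+5+1) / 4.
print(avg7[1])
```

Because the first and last three dates are averaged over fewer than seven values, the smoothed line is less stable at both ends of the series.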

Peru_global_ts <- Peru_global_ts %>% 
  filter_index("2020-03-06" ~ .) %>% 
  mutate(confirmed7_dailycases = slide_index_dbl(.i = dates,
                                                 .x = confirmed_dailycases,
                                                 .f = mean,
                                                 .before = 3,
                                                 .after= 3),
          recovered7_dailycases = slide_index_dbl(.i = dates,
                                                  .x = recovered_dailycases,
                                                  .f = mean,
                                                  .before = 3,
                                                  .after = 3),
         deaths7_dailycases = slide_index_dbl(.i = dates,
                                              .x = deaths_dailycases,
                                              .f = mean,
                                              .before = 3,
                                              .after = 3),
         confirmed7 = slide_index_dbl(.i = dates,
                                      .x = confirmed,
                                      .f = mean,
                                      .before = 3,
                                      .after = 3),
         recovered7 = slide_index_dbl(.i = dates,
                                      .x = recovered,
                                      .f = mean,
                                      .before = 3,
                                      .after = 3),
         deaths7 = slide_index_dbl(.i = dates,
                                   .x = deaths,
                                   .f = mean,
                                   .before = 3,
                                   .after = 3))
head(Peru_global_ts)

## # A tsibble: 6 x 15 [1D]
## # Key:       rid [6]
##   dates      confirmed confirmed_daily… recovered recovered_daily… deaths
##   <date>         <int>            <dbl>     <int>            <dbl>  <int>
## 1 2020-03-06         1                1         0                0      0
## 2 2020-03-07         1                0         0                0      0
## 3 2020-03-08         6                5         0                0      0
## 4 2020-03-09         7                1         0                0      0
## 5 2020-03-10        11                4         0                0      0
## 6 2020-03-11        11                0         0                0      0
## # … with 9 more variables: deaths_dailycases <dbl>, deaths_100k <dbl>,
## #   rid <dbl>, confirmed7_dailycases <dbl>, recovered7_dailycases <dbl>,
## #   deaths7_dailycases <dbl>, confirmed7 <dbl>, recovered7 <dbl>, deaths7 <dbl>

print(Peru_global_ts)

## # A tsibble: 391 x 15 [1D]
## # Key:       rid [391]
##    dates      confirmed confirmed_daily… recovered recovered_daily… deaths
##    <date>         <int>            <dbl>     <int>            <dbl>  <int>
##  1 2020-03-06         1                1         0                0      0
##  2 2020-03-07         1                0         0                0      0
##  3 2020-03-08         6                5         0                0      0
##  4 2020-03-09         7                1         0                0      0
##  5 2020-03-10        11                4         0                0      0
##  6 2020-03-11        11                0         0                0      0
##  7 2020-03-12        15                4         0                0      0
##  8 2020-03-13        28               13         0                0      0
##  9 2020-03-14        38               10         0                0      0
## 10 2020-03-15        43                5         0                0      0
## # … with 381 more rows, and 9 more variables: deaths_dailycases <dbl>,
## #   deaths_100k <dbl>, rid <dbl>, confirmed7_dailycases <dbl>,
## #   recovered7_dailycases <dbl>, deaths7_dailycases <dbl>, confirmed7 <dbl>,
## #   recovered7 <dbl>, deaths7 <dbl>

Now let’s graph!

I use the dygraphs package to graph interactive time series of daily confirmed cases, recovered cases, and deaths. Each plot has 2 variables: the daily number of cases and its 7-day rolling average.

The interactive graph allows zooming in on selected time intervals for a more detailed view of the series.

First Graph is for the Number of Daily Confirmed Cases.

peru_int_confirmeddaily <- cbind(Peru_global_ts[c(1, 3, 10)])
peru_int_confirmeddaily$confirmed7_dailycases <- round(peru_int_confirmeddaily$confirmed7_dailycases, 0)
rownames( peru_int_confirmeddaily) <- as.POSIXlt( peru_int_confirmeddaily[, 1])
ts_peru_int_confirmeddaily <-  peru_int_confirmeddaily[, -1]
dygraph(ts_peru_int_confirmeddaily, 
        main = "Confirmed Covid-19 Daily cases") %>%
  dySeries("confirmed_dailycases", stepPlot = TRUE, 
           fillGraph = TRUE, color = "lightblue", label = "Confirmed Daily Cases") %>%
  dySeries("confirmed7_dailycases", drawPoints = TRUE, 
           pointShape = "square", color = "darkblue", label = "Rolling Avg 7") %>%
  dyRangeSelector(height = 20) %>%
  dyLegend(width = 300)
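dygraph() expects time series data such as an xts object. As an alternative to carrying the dates in the row names, the sketch below builds the xts object explicitly with xts::xts(). It is an illustration on made-up values, not the code used for the figures, and the names toy and daily_xts are hypothetical.

```r
library(xts)  # time series container that dygraphs can plot directly

# Made-up values in the shape of the daily confirmed series.
toy <- data.frame(
  dates                 = as.Date(c("2020-03-06", "2020-03-07", "2020-03-08")),
  confirmed_dailycases  = c(1, 0, 5),
  confirmed7_dailycases = c(2, 2, 2)
)

# Values go in as a matrix; the dates become the time index.
daily_xts <- xts(
  as.matrix(toy[, c("confirmed_dailycases", "confirmed7_dailycases")]),
  order.by = toy$dates
)
print(daily_xts)

# With the dygraphs package installed, the object can be plotted as-is:
# dygraphs::dygraph(daily_xts, main = "Confirmed Covid-19 Daily cases")
```

Keeping the dates in the index rather than in row names also makes it easy to subset by period, for example daily_xts["2020-03"].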

Second Graph is for the Number of Daily Recovered Cases.

peru_int_recovereddaily <- cbind(Peru_global_ts[c(1, 5, 11)])
peru_int_recovereddaily$recovered7_dailycases <- round(peru_int_recovereddaily$recovered7_dailycases, 0)
rownames( peru_int_recovereddaily) <- as.POSIXlt( peru_int_recovereddaily[, 1])
ts_peru_int_recovereddaily <-  peru_int_recovereddaily[, -1]
dygraph(ts_peru_int_recovereddaily, 
        main = " Recovered Covid-19 Daily cases") %>%
  dySeries("recovered_dailycases", stepPlot = TRUE, 
           fillGraph = TRUE, color = "turquoise", label = "Recovered Daily Cases") %>%
  dySeries("recovered7_dailycases", drawPoints = TRUE, 
           pointShape = "circle", color = "green", label = "Rolling Avg 7") %>%
  dyRangeSelector(height = 20) %>%
  dyLegend(width = 300)

Third Graph is for the Number of Daily Deaths.

peru_int_deathsdaily <- cbind(Peru_global_ts[c(1, 7, 12)])
peru_int_deathsdaily$deaths7_dailycases <- round(peru_int_deathsdaily$deaths7_dailycases, 0)
rownames( peru_int_deathsdaily) <- as.POSIXlt( peru_int_deathsdaily[, 1])
ts_peru_int_deathsdaily <-  peru_int_deathsdaily[, -1]
dygraph(ts_peru_int_deathsdaily,
        main = "Covid-19 Daily Deaths") %>%
  dySeries("deaths_dailycases", stepPlot = TRUE, fillGraph = TRUE, 
           color = "orange", label = "Deaths Daily Cases") %>%
  dySeries("deaths7_dailycases", drawPoints = TRUE, pointShape = "square",
           color = "red", label = "Rolling Avg 7") %>%
  dyRangeSelector(height = 20) %>%
  dyLegend(width = 285)

The plotly library is used to create the last graph. For this graph we use the accumulated values and the 7-day rolling average of our 3 variables: confirmed, recovered and deaths.

This chart has 2 y-axes. The y-axis on the left corresponds to the values of confirmed and recovered cases; the y-axis on the right corresponds to the values of deaths. I chose two y-axes because the confirmed and recovered values have a similar range, while the death values have a much lower range, so the second y-axis makes the trend in deaths easier to see.

plot_ly() %>%
  add_trace(x = ~Peru_global_ts$dates, y = ~ round(Peru_global_ts$confirmed7, 0), name = "Confirmed",  
            type = 'scatter', mode = 'lines', line = list(color = 'blue', width = 4),
            hoverinfo = "text",
            text = ~paste("Date: ", Peru_global_ts$dates,
                          "<br>",
                          "Confirmed: ", round(Peru_global_ts$confirmed7, 0))) %>%
  add_trace(x = ~Peru_global_ts$dates, y = ~round(Peru_global_ts$recovered7, 0), name = "Recovered",
            type = 'scatter', mode = 'lines', line = list(color = 'green', width = 4),
            hoverinfo = "text",
            text = ~paste("Date: ", Peru_global_ts$dates,
                          "<br>",
                          "Recovered: ", round(Peru_global_ts$recovered7, 0))) %>% 
  add_trace(x = ~Peru_global_ts$dates, y = ~round(Peru_global_ts$deaths7, 0), name = "Deaths", yaxis = "y2",
            type = 'scatter', mode = 'lines', line = list(color = 'red', width = 4),
            hoverinfo = "text",
            text = ~paste("Date: ",Peru_global_ts$dates,
                          "<br>",
                          "Deaths: ", round(Peru_global_ts$deaths7, 0))) %>% 
  layout( title = list(text ="7-day Rolling Average Cumulative Covid-19 cases Peru 2020-2021",
                       size = 10),
          yaxis2 = list(tickfont = list(color = "red"),
                        overlaying = "y",
                        side = "right",
                        title = "Cumulative Deaths",
                        showgrid = FALSE),
          xaxis = list(title = "Dates",
                       color = "black"),
          yaxis = list(tickangle = 0,
                      title = "Cumulative Confirmed and Recovered <br><br><br>",
                      standoff = 90,
                      showgrid = FALSE),
          legend = list(orientation = "h",   
                        xanchor = "center",  
                        x = 0.5,             
                        y = -0.2),
          autosize = T,
          margin = list(l = 100, r = 100, b = 100, t = 100, pad = 20))

After all our graphics are ready, I combine them in a single figure that has two columns; the left column has a combined cumulative cases series graph, and the right column has three individual graphs corresponding to daily cases.

I use the command manipulateWidget::combineWidgets. This command joins our interactive graphics in a single figure in a quick and easy way.

First, I create a function to combine in a single graph the three cumulative series.

cumulates_plotly <- function(id){plot_ly() %>%
  add_trace(x = ~Peru_global_ts$dates, y = ~ round(Peru_global_ts$confirmed7, 0), name = "Confirmed",  
            type = 'scatter', mode = 'lines', line = list(color = 'blue', width = 4),
            hoverinfo = "text",
            text = ~paste("Date: ", Peru_global_ts$dates,
                          "<br>",
                          "Confirmed: ", round(Peru_global_ts$confirmed7, 0))) %>%
  add_trace(x = ~Peru_global_ts$dates, y = ~round(Peru_global_ts$recovered7, 0), name = "Recovered",
            type = 'scatter', mode = 'lines', line = list(color = 'green', width = 4),
            hoverinfo = "text",
            text = ~paste("Date: ", Peru_global_ts$dates,
                          "<br>",
                          "Recovered: ", round(Peru_global_ts$recovered7, 0))) %>% 
  add_trace(x = ~Peru_global_ts$dates, y = ~round(Peru_global_ts$deaths7, 0), name = "Deaths", yaxis = "y2",
            type = 'scatter', mode = 'lines', line = list(color = 'red', width = 4),
            hoverinfo = "text",
            text = ~paste("Date: ", Peru_global_ts$dates,
                          "<br>",
                          "Deaths: ", round(Peru_global_ts$deaths7, 0))) %>% 
  layout( title = list(text = "7-day Rolling Average Cumulative Covid-19 cases Peru 2020-2021",
                       size = 10),
          yaxis2 = list(tickfont = list(color = "red"),
                        overlaying = "y",
                        side = "right",
                        title = "Cumulative Deaths",
                        showgrid= FALSE),
          xaxis = list(title = "Dates",
                       color = "black"),
          yaxis = list(tickangle =0,
                      title = "Cumulative Confirmed and Recovered <br><br><br>",
                      standoff = 90,
                      showgrid = FALSE),
          legend = list(orientation = "h",   
                        xanchor = "center",  
                        x = 0.5,             
                        y = -0.2),
          autosize = T,
          margin = list(l = 100, r = 100, b = 100, t = 100, pad = 20))}

Second, I create a function to define each component of the right column in the final figure.

c1 <-function(id){dygraph(ts_peru_int_confirmeddaily, 
        main = "Confirmed Covid-19 Daily cases") %>%
  dySeries("confirmed_dailycases", stepPlot = TRUE, 
           fillGraph = TRUE, color = "lightblue", label = "Confirmed Daily Cases") %>%
  dySeries("confirmed7_dailycases", drawPoints = TRUE, 
           pointShape = "square", color = "darkblue", label = "Rolling Avg 7") %>%
  dyRangeSelector(height = 20) %>%
  dyLegend(width = 300)}

r1<-function(id){dygraph(ts_peru_int_recovereddaily, 
        main = " Recovered Covid-19 Daily cases") %>%
  dySeries("recovered_dailycases", stepPlot = TRUE, 
           fillGraph = TRUE, color = "turquoise", label = "Recovered Daily Cases") %>%
  dySeries("recovered7_dailycases", drawPoints = TRUE, 
           pointShape = "circle", color = "green", label = "Rolling Avg 7") %>%
  dyRangeSelector(height = 20) %>%
  dyLegend(width = 300)}

d1<-function(id){dygraph(ts_peru_int_deathsdaily,
        main = "Covid-19 Daily Deaths") %>%
  dySeries("deaths_dailycases", stepPlot = TRUE, fillGraph = TRUE, 
           color = "orange", label = "Deaths Daily Cases") %>%
  dySeries("deaths7_dailycases", drawPoints = TRUE, pointShape = "square",
           color = "red", label = "Rolling Avg 7") %>%
  dyRangeSelector(height = 20) %>%
  dyLegend(width = 285)}

To conclude, I use combineWidgets to arrange the charts in the final figure. Additionally, I create the function write_alt_text to add an alternative text to the graph.

write_alt_text <- function(
  chart_type, 
  type_of_data, 
  reason, 
  source){glue::glue(
    "{chart_type} of {type_of_data} where {reason}.<br> \n\nData source from {source}")}
combineWidgets(
  ncol = 2, colsize = c(2,1),
  cumulates_plotly(1),
  title = "Covid-19 Peru Interactive Time Series",
  footer = write_alt_text(
  "<br/>Time Series", 
  "confirmed, recovered and deaths cases from Covid-19 in Peru", 
  "information about the evolution of Covid-19 is needed", 
  "MINSA-Peru / CSSE-Johns Hopkins University.<br>Made by Marco Arellano B. Twitter: marellanob93, Github: marellanob"),
  combineWidgets(
    ncol = 1,
    c1(2),
    r1(3),
    d1(4)))

VISUALIZATION OF PERU COVID-19 TIME SERIES

SOURCE: MINSA-PERU / CSSE-JOHNS HOPKINS UNIVERSITY

MARCO ARELLANO B.
Twitter: marellanob93
Github: marellanob

3/31/2021
