1.0 Introduction

The 2019–20 coronavirus pandemic is a pandemic of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in Wuhan, Hubei, China, in December 2019.

The pandemic has spread like wildfire, with around 14.3 million confirmed infections and more than 600,000 deaths worldwide to date. A graphical dashboard gives a bird's-eye view of the current status. The goal of this report is to understand the trajectory of confirmed cases, deaths and recoveries in the United States, the worst-affected country.

2.0 Major data and design challenges

3.0 Solution to data and design challenges listed above

COVID spread in US.

4.0 Step-by-step description of how the data visualization was prepared

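library(readxl)
# Global time series: one row per country/province, one column per date
Global_confirmed <- read.csv("time_series_covid19_confirmed_global.csv")
Global_deaths <- read.csv("time_series_covid19_deaths_global.csv")
Global_recovered <- read.csv("time_series_covid19_recovered_global.csv")
# US-only series (confirmed, deaths, recovered) prepared for the visualization
US <- read_excel("US.xlsx")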
Global_confirmed[1:5,1:4]
##   Province.State Country.Region       Lat     Long
## 1                   Afghanistan  33.93911 67.70995
## 2                       Albania  41.15330 20.16830
## 3                       Algeria  28.03390  1.65960
## 4                       Andorra  42.50630  1.52180
## 5                        Angola -11.20270 17.87390
Global_confirmed[1:5,179:183]
##   X7.14.20 X7.15.20 X7.16.20 X7.17.20 X7.18.20
## 1    34740    34994    35070    35229    35301
## 2     3667     3752     3851     3906     4008
## 3    20216    20770    21355    21948    22549
## 4      861      862      877      880      880
## 5      541      576      607      638      687
Global_deaths[1:5,1:4]
##   Province.State Country.Region       Lat     Long
## 1                   Afghanistan  33.93911 67.70995
## 2                       Albania  41.15330 20.16830
## 3                       Algeria  28.03390  1.65960
## 4                       Andorra  42.50630  1.52180
## 5                        Angola -11.20270 17.87390
Global_deaths[1:5,179:183]
##   X7.14.20 X7.15.20 X7.16.20 X7.17.20 X7.18.20
## 1     1048     1094     1113     1147     1164
## 2       97      101      104      107      111
## 3     1028     1040     1052     1057     1068
## 4       52       52       52       52       52
## 5       26       27       28       29       29
Global_recovered[1:5,1:4]
##   Province.State Country.Region       Lat     Long
## 1                   Afghanistan  33.93911 67.70995
## 2                       Albania  41.15330 20.16830
## 3                       Algeria  28.03390  1.65960
## 4                       Andorra  42.50630  1.52180
## 5                        Angola -11.20270 17.87390
Global_recovered[1:5,179:183]
##   X7.14.20 X7.15.20 X7.16.20 X7.17.20 X7.18.20
## 1    21454    22456    22824    23151    23273
## 2     2062     2091     2137     2214     2264
## 3    14295    14792    15107    15430    15744
## 4      803      803      803      803      803
## 5      118      124      124      199      210

Dataset for visualization

  • I created a single dataset containing only US cases over the entire available time period
US[1:5,]
## # A tibble: 5 x 4
##   Date                `Confirmed Cases` Deaths Recovered
##   <dttm>                          <dbl>  <dbl>     <dbl>
## 1 2020-01-22 00:00:00                 1      0         0
## 2 2020-01-23 00:00:00                 1      0         0
## 3 2020-01-24 00:00:00                 2      0         0
## 4 2020-01-25 00:00:00                 2      0         0
## 5 2020-01-26 00:00:00                 5      0         0
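
The report does not show how US.xlsx was assembled. As a rough sketch only, assuming the aim is one row per date for a single country, a table of this shape could be derived from a wide time-series file like the ones loaded above. The `wide` data frame, its values, and the helper name `country_series` are illustrative assumptions and not the author's actual method.

```r
# Made-up miniature of a wide time-series table (values are for illustration only)
wide <- data.frame(
  Province.State = c("", "", ""),
  Country.Region = c("Afghanistan", "US", "Albania"),
  Lat = c(33.9, 40.0, 41.2),
  Long = c(67.7, -100.0, 20.2),
  X1.22.20 = c(0, 1, 0),
  X1.23.20 = c(0, 1, 0),
  X1.24.20 = c(0, 2, 0)
)

# Hypothetical helper: collapse one country's rows into a Date / value table
country_series <- function(df, country, value_name) {
  rows <- df[df$Country.Region == country, ]
  # Date columns look like "X1.22.20" after read.csv() mangles "1/22/20"
  date_cols <- grep("^X[0-9]", names(df), value = TRUE)
  # Sum over rows so countries split into provinces are handled too
  totals <- colSums(rows[, date_cols, drop = FALSE])
  out <- data.frame(
    Date = as.Date(date_cols, format = "X%m.%d.%y"),
    value = unname(totals)
  )
  names(out)[2] <- value_name
  out
}

us_confirmed <- country_series(wide, "US", "Confirmed Cases")
```

Applying the same helper to the deaths and recovered tables and joining the results on `Date` would give a table with the four columns shown in the `US` preview.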
  • Using the data for each of the categories, I plotted individual charts to understand the scale and other important features of each series

US confirmed cases

library(ggplot2)
# Cumulative confirmed US cases; the label function keeps the y-axis out of scientific notation
p <- ggplot(US, aes(x = Date)) + geom_line(aes(y = `Confirmed Cases`), colour = 'orange', size = 1.5) +
  ggtitle("Spread of COVID - Confirmed Cases") + xlab("Date") + ylab("#Cases") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
p

US deaths

q <- ggplot(US, aes(x = Date)) + geom_line(aes(y = Deaths), colour = 'red', size = 1.5) +
  ggtitle("Spread of COVID - Deaths") + xlab("Date") + ylab("#Deaths") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
q

US recovered cases

r <- ggplot(US, aes(x = Date)) + geom_line(aes(y = Recovered), colour = 'green', size = 1.5) +
  ggtitle("Spread of COVID - Recovered Cases") + xlab("Date") + ylab("#Cases") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
r

  • However, to see the full picture of confirmed cases relative to deaths and recoveries, it helps to combine the three series into one chart, as below

Final Visualization

US confirmed cases, deaths and recoveries

s <- ggplot(US, aes(x = Date)) + geom_line(aes(y = `Confirmed Cases`), colour = 'orange', size = 1.5) +
  geom_line(aes(y = Deaths), colour = 'red', size = 1.5) + geom_line(aes(y = Recovered), colour = 'green', size = 1.5) +
  ggtitle("Spread of COVID Confirmed, Deaths and Recovered Cases") + xlab("Date") + ylab("#Cases") + scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
s
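
Because the line colours in the combined chart are fixed outside `aes()`, ggplot2 draws no legend, so a reader has to infer which line is which. One possible alternative, sketched here on the assumption that a legend is wanted, is to stack the three columns into long format and map colour to the series name. The `us_demo` frame and its values are made up for illustration, and `to_long` is a hypothetical helper, not part of the original analysis.

```r
# Made-up stand-in for the US table (illustrative values only)
us_demo <- data.frame(
  Date = as.Date(c("2020-01-22", "2020-01-23", "2020-01-24")),
  Confirmed = c(1, 1, 2),
  Deaths = c(0, 0, 0),
  Recovered = c(0, 0, 1)
)

# Hypothetical helper: stack the value columns into Date / Series / Count rows
to_long <- function(df, value_cols) {
  do.call(rbind, lapply(value_cols, function(col) {
    data.frame(Date = df$Date, Series = col, Count = df[[col]])
  }))
}

us_long <- to_long(us_demo, c("Confirmed", "Deaths", "Recovered"))

# Build the plot only if ggplot2 is installed; colour is mapped inside aes(),
# so a legend is generated. Call print(p_long) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_long <- ggplot(us_long, aes(x = Date, y = Count, colour = Series)) +
    geom_line() +
    scale_colour_manual(values = c(Confirmed = "orange", Deaths = "red", Recovered = "green")) +
    ggtitle("Spread of COVID - Confirmed, Deaths and Recovered") +
    xlab("Date") + ylab("Count")
}
```

The long layout also means adding a further series later only needs one more entry in `value_cols` rather than another `geom_line()` call.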

5.0 Insights

6.0 References

Data - “COVID