1.0 Introduction

The 2019–20 coronavirus pandemic is a pandemic of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in Wuhan, Hubei, China in December 2019.

The pandemic has spread rapidly, with around 14.3 million confirmed infections and more than 600,000 deaths as of the report date. A graphical dashboard gives a bird's-eye view of the current status. The goal of this report is to understand the trajectory of confirmed cases, deaths and recoveries in the United States, the worst-affected nation.

2.0 Major data and design challenges

3.0 Solution to data and design challenges listed above

COVID spread in US.

4.0 Step-by-step description of how the data visualization was prepared

load("Assignment4-19072020.R.RData")
Global_confirmed[1:5,1:4]
##   Province/State Country/Region       Lat     Long
## 1           <NA>    Afghanistan  33.93911 67.70995
## 2           <NA>        Albania  41.15330 20.16830
## 3           <NA>        Algeria  28.03390  1.65960
## 4           <NA>        Andorra  42.50630  1.52180
## 5           <NA>         Angola -11.20270 17.87390
Global_confirmed[1:5,179:183]
##   7/14/20 7/15/20 7/16/20 7/17/20 7/18/20
## 1   34740   34994   35070   35229   35301
## 2    3667    3752    3851    3906    4008
## 3   20216   20770   21355   21948   22549
## 4     861     862     877     880     880
## 5     541     576     607     638     687
Global_deaths[1:5,1:4]
##   Province/State Country/Region       Lat     Long
## 1           <NA>    Afghanistan  33.93911 67.70995
## 2           <NA>        Albania  41.15330 20.16830
## 3           <NA>        Algeria  28.03390  1.65960
## 4           <NA>        Andorra  42.50630  1.52180
## 5           <NA>         Angola -11.20270 17.87390
Global_deaths[1:5,179:183]
##   7/14/20 7/15/20 7/16/20 7/17/20 7/18/20
## 1    1048    1094    1113    1147    1164
## 2      97     101     104     107     111
## 3    1028    1040    1052    1057    1068
## 4      52      52      52      52      52
## 5      26      27      28      29      29
Global_recovered[1:5,1:4]
##   Province/State Country/Region       Lat     Long
## 1           <NA>    Afghanistan  33.93911 67.70995
## 2           <NA>        Albania  41.15330 20.16830
## 3           <NA>        Algeria  28.03390  1.65960
## 4           <NA>        Andorra  42.50630  1.52180
## 5           <NA>         Angola -11.20270 17.87390
Global_recovered[1:5,179:183]
##   7/14/20 7/15/20 7/16/20 7/17/20 7/18/20
## 1   21454   22456   22824   23151   23273
## 2    2062    2091    2137    2214    2264
## 3   14295   14792   15107   15430   15744
## 4     803     803     803     803     803
## 5     118     124     124     199     210

Dataset for visualization

  • I created a single dataset containing only US cases over the entire available time period
load("Assignment 4.R.RData")
US[1:5,]
##         Date Confirmed Cases Deaths Recovered
## 1 2020-01-22               1      0         0
## 2 2020-01-23               1      0         0
## 3 2020-01-24               2      0         0
## 4 2020-01-25               2      0         0
## 5 2020-01-26               5      0         0
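The report does not show how the US table was derived from the global tables. A minimal sketch of one way to do it in base R follows. It assumes the country is labelled "US" in the Country/Region column, that every non-metadata column is a date in m/d/yy form, and that the three tables share the same date columns. The `make_toy` and `us_totals` helpers and the toy tables are hypothetical stand-ins for illustration, not the author's code.

```r
# Hypothetical toy tables shaped like Global_confirmed, Global_deaths and
# Global_recovered; the values are illustrative only.
make_toy <- function(us_counts) {
  data.frame(
    `Province/State` = c(NA, NA),
    `Country/Region` = c("Afghanistan", "US"),
    Lat  = c(33.93911, 40),
    Long = c(67.70995, -100),
    `1/22/20` = c(0, us_counts[1]),
    `1/23/20` = c(0, us_counts[2]),
    `1/24/20` = c(0, us_counts[3]),
    check.names = FALSE  # keep the non-syntactic column names as they are
  )
}
toy_confirmed <- make_toy(c(1, 1, 2))
toy_deaths    <- make_toy(c(0, 0, 0))
toy_recovered <- make_toy(c(0, 0, 0))

# Hypothetical helper: sum every date column over the rows of one country.
us_totals <- function(wide, country = "US") {
  meta      <- c("Province/State", "Country/Region", "Lat", "Long")
  date_cols <- setdiff(names(wide), meta)
  rows      <- wide[["Country/Region"]] %in% country
  colSums(wide[rows, date_cols, drop = FALSE], na.rm = TRUE)
}

# One row per date, one column per category (assumes aligned date columns).
confirmed <- us_totals(toy_confirmed)
US_sketch <- data.frame(
  Date              = as.Date(names(confirmed), format = "%m/%d/%y"),
  `Confirmed Cases` = as.numeric(confirmed),
  Deaths            = as.numeric(us_totals(toy_deaths)),
  Recovered         = as.numeric(us_totals(toy_recovered)),
  check.names = FALSE
)
US_sketch
```

Summing over rows rather than picking a single row keeps the sketch valid if a country is split across several Province/State rows.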
  • Using the data for each category, I plotted individual charts to understand the scale and other important

US Confirmed cases

library(ggplot2)
p <- ggplot(US, aes(x = Date)) +
  geom_line(aes(y = `Confirmed Cases`), colour = 'orange', size = 1.5) +
  ggtitle("Spread of COVID - Confirmed Cases") +
  xlab("Date") + ylab("#Cases") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
p

US Deaths

# The y axis shows deaths, so it is labelled as such
q <- ggplot(US, aes(x = Date)) +
  geom_line(aes(y = Deaths), colour = 'red', size = 1.5) +
  ggtitle("Spread of COVID - Deaths") +
  xlab("Date") + ylab("#Deaths") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
q

US Recovered cases

# Refer to the column by name inside aes() rather than via US$Recovered
r <- ggplot(US, aes(x = Date)) +
  geom_line(aes(y = Recovered), colour = 'green', size = 1.5) +
  ggtitle("Spread of COVID - Recovered Cases") +
  xlab("Date") + ylab("#Cases") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
r

  • However, to see the full picture of confirmed cases alongside deaths and recoveries, the three charts need to be combined into a single graph, as below

Final Visualization

US Confirmed cases, Deaths and Recoveries

# Overlay the three series; columns are referenced by name inside aes()
s <- ggplot(US, aes(x = Date)) +
  geom_line(aes(y = `Confirmed Cases`), colour = 'orange', size = 1.5) +
  geom_line(aes(y = Deaths), colour = 'red', size = 1.5) +
  geom_line(aes(y = Recovered), colour = 'green', size = 1.5) +
  ggtitle("Spread of COVID Confirmed, Deaths and Recovered Cases") +
  xlab("Date") + ylab("#Cases") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
s
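Because the colours in the combined chart are set outside aes(), ggplot2 draws no legend, so a reader has to infer which line is which. One possible variation, sketched below under stated assumptions, reshapes the table to long form and maps the series name to colour so that a legend appears. `US_toy` is a hypothetical stand-in that copies the first five rows shown earlier, and `US_long` and `s_legend` are illustrative names, not part of the original analysis.

```r
library(ggplot2)

# Hypothetical stand-in for the US table (first five rows shown earlier).
US_toy <- data.frame(
  Date              = as.Date("2020-01-22") + 0:4,
  `Confirmed Cases` = c(1, 1, 2, 2, 5),
  Deaths            = 0,
  Recovered         = 0,
  check.names = FALSE
)

# Long form: one row per (date, series) pair, built with base R only.
series  <- c("Confirmed Cases", "Deaths", "Recovered")
US_long <- data.frame(
  Date   = rep(US_toy$Date, times = length(series)),
  Series = factor(rep(series, each = nrow(US_toy)), levels = series),
  Count  = unlist(US_toy[series], use.names = FALSE)
)

# Mapping Series to colour inside aes() is what produces the legend.
s_legend <- ggplot(US_long, aes(x = Date, y = Count, colour = Series)) +
  geom_line(size = 1.5) +
  scale_colour_manual(values = c("Confirmed Cases" = "orange",
                                 "Deaths"          = "red",
                                 "Recovered"       = "green")) +
  ggtitle("Spread of COVID Confirmed, Deaths and Recovered Cases") +
  xlab("Date") + ylab("#Cases") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
s_legend
```

The same colours are kept so the variation stays comparable with the individual charts. Replacing `US_toy` with the full US table would be the only change needed, assuming its column names match those printed above.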

5.0 Insights

6.0 References

Data - “COVID