The 2019–20 coronavirus pandemic is a pandemic of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in Wuhan, Hubei, China in December 2019.
The pandemic has spread like wildfire, infecting around 14.3 million people and causing more than 600,000 deaths to date. A graphical dashboard gives a bird's-eye view of the current status. The goal of this report is to understand the trajectory of confirmed cases, deaths and recovered cases in the United States, which is the worst-affected nation.
The data downloaded from the reference mentioned below is time series data. Its existing structure is not helpful for creating a visualization: each country occupies a single row, and the number of cases for each date from Jan 22 onward is spread across the columns of that row.
For the graph to be legible and self-explanatory, extra effort is required to clearly portray the intended information.
Per the instructions for the assignment, we may only build static visualizations, so a deep dive across dimensions is not possible.
I have manipulated the original dataset to combine information from the 3 files into a single data frame containing only US-specific details.
Below is my thought process on how I would like to showcase the data in the form of infographics.
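As a rough illustration of the reshaping and combining described above, here is a minimal sketch in base R. It is not the method actually used to build `US.xlsx` (those steps are not shown in this report). The helper names `toy_wide` and `wide_to_long_us`, the "Elsewhere" row and the column name `Confirmed` are hypothetical; the US counts are the first three days from the US table printed further down.

```r
# Hypothetical stand-in for one of the wide CSV files:
# one row per country, one column per date (X<month>.<day>.<year>).
toy_wide <- function(us_counts) {
  data.frame(
    Province.State = "",
    Country.Region = c("US", "Elsewhere"),  # "Elsewhere" is a placeholder row
    Lat = c(40, 0),
    Long = c(-100, 0),
    X1.22.20 = c(us_counts[1], 0),
    X1.23.20 = c(us_counts[2], 0),
    X1.24.20 = c(us_counts[3], 0)
  )
}

# Keep only the US rows and turn the date columns into one row per date.
wide_to_long_us <- function(wide, value_name) {
  us <- wide[wide$Country.Region == "US", , drop = FALSE]
  date_cols <- grep("^X[0-9]+\\.[0-9]+\\.[0-9]+$", names(us), value = TRUE)
  # Sum over rows in case a country is split across several province rows
  totals <- colSums(us[, date_cols, drop = FALSE])
  out <- data.frame(Date = as.Date(date_cols, format = "X%m.%d.%y"))
  out[[value_name]] <- as.numeric(totals)
  out
}

confirmed_long <- wide_to_long_us(toy_wide(c(1, 1, 2)), "Confirmed")
deaths_long    <- wide_to_long_us(toy_wide(c(0, 0, 0)), "Deaths")
recovered_long <- wide_to_long_us(toy_wide(c(0, 0, 0)), "Recovered")

# Combine the three series into a single data frame keyed by Date
us_combined <- Reduce(
  function(a, b) merge(a, b, by = "Date"),
  list(confirmed_long, deaths_long, recovered_long)
)
us_combined
```

With the real files, the three `toy_wide(...)` calls would be replaced by the data frames read from the CSVs, assuming they keep the `Country.Region` column and `X<month>.<day>.<year>` date columns shown in the output below.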
COVID spread in the US.
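# Global time series: one row per country/province, one column per date
Global_confirmed <- read.csv("time_series_covid19_confirmed_global.csv")
Global_deaths <- read.csv("time_series_covid19_deaths_global.csv")
Global_recovered <- read.csv("time_series_covid19_recovered_global.csv")
# US-only table prepared from the three files above: one row per date
library(readxl)
US <- read_excel("US.xlsx")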
Global_confirmed[1:5,1:4]
## Province.State Country.Region Lat Long
## 1 Afghanistan 33.93911 67.70995
## 2 Albania 41.15330 20.16830
## 3 Algeria 28.03390 1.65960
## 4 Andorra 42.50630 1.52180
## 5 Angola -11.20270 17.87390
Global_confirmed[1:5,179:183]
## X7.14.20 X7.15.20 X7.16.20 X7.17.20 X7.18.20
## 1 34740 34994 35070 35229 35301
## 2 3667 3752 3851 3906 4008
## 3 20216 20770 21355 21948 22549
## 4 861 862 877 880 880
## 5 541 576 607 638 687
Global_deaths[1:5,1:4]
## Province.State Country.Region Lat Long
## 1 Afghanistan 33.93911 67.70995
## 2 Albania 41.15330 20.16830
## 3 Algeria 28.03390 1.65960
## 4 Andorra 42.50630 1.52180
## 5 Angola -11.20270 17.87390
Global_deaths[1:5,179:183]
## X7.14.20 X7.15.20 X7.16.20 X7.17.20 X7.18.20
## 1 1048 1094 1113 1147 1164
## 2 97 101 104 107 111
## 3 1028 1040 1052 1057 1068
## 4 52 52 52 52 52
## 5 26 27 28 29 29
Global_recovered[1:5,1:4]
## Province.State Country.Region Lat Long
## 1 Afghanistan 33.93911 67.70995
## 2 Albania 41.15330 20.16830
## 3 Algeria 28.03390 1.65960
## 4 Andorra 42.50630 1.52180
## 5 Angola -11.20270 17.87390
Global_recovered[1:5,179:183]
## X7.14.20 X7.15.20 X7.16.20 X7.17.20 X7.18.20
## 1 21454 22456 22824 23151 23273
## 2 2062 2091 2137 2214 2264
## 3 14295 14792 15107 15430 15744
## 4 803 803 803 803 803
## 5 118 124 124 199 210
US[1:5,]
## # A tibble: 5 x 4
## Date `Confirmed Cases` Deaths Recovered
## <dttm> <dbl> <dbl> <dbl>
## 1 2020-01-22 00:00:00 1 0 0
## 2 2020-01-23 00:00:00 1 0 0
## 3 2020-01-24 00:00:00 2 0 0
## 4 2020-01-25 00:00:00 2 0 0
## 5 2020-01-26 00:00:00 5 0 0
library(ggplot2)
p=ggplot(US,aes(x=Date))+geom_line(aes(y=`Confirmed Cases`),colour='orange',size=1.5)+ggtitle("Spread of COVID - Confirmed Cases")+xlab("Date")+ylab("#Cases")+scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
p
q=ggplot(US,aes(x=Date))+geom_line(aes(y=Deaths),colour='red',size=1.5)+ggtitle("Spread of COVID - Deaths")+xlab("Date")+ylab("#Deaths")+scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
q
r=ggplot(US,aes(x=Date))+geom_line(aes(y=Recovered),colour='green',size=1.5)+ggtitle("Spread of COVID - Recovered Cases")+xlab("Date")+ylab("#Cases")+scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
r
s=ggplot(US,aes(x=Date))+geom_line(aes(y=`Confirmed Cases`),colour='orange',size=1.5)+geom_line(aes(y=Deaths),colour='red',size=1.5)+geom_line(aes(y=Recovered),colour='green',size=1.5)+ggtitle("Spread of COVID Confirmed, Deaths and Recovered Cases")+xlab("Date")+ylab("#Cases")+scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
s
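The combined chart draws three lines with fixed colours, so it has no legend telling the reader which line is which. One possible way to make it more self-explanatory, sketched here under assumptions, is to stack the three series into a long table and map the series name to colour so that ggplot2 builds the legend. The names `us_demo`, `us_stacked` and `s_legend` are hypothetical, and the demo table only holds the first five rows of the US data shown above.

```r
library(ggplot2)

# First five days of the US table printed earlier (illustration only)
us_demo <- data.frame(
  Date = as.Date("2020-01-22") + 0:4,
  Confirmed = c(1, 1, 2, 2, 5),
  Deaths = 0,
  Recovered = 0
)

# Stack the three measures into one column, labelled by series name
us_stacked <- data.frame(
  Date = rep(us_demo$Date, times = 3),
  Series = rep(c("Confirmed", "Deaths", "Recovered"), each = nrow(us_demo)),
  Count = c(us_demo$Confirmed, us_demo$Deaths, us_demo$Recovered)
)

# Mapping colour to Series lets ggplot2 generate the legend automatically
s_legend <- ggplot(us_stacked, aes(x = Date, y = Count, colour = Series)) +
  geom_line() +
  scale_colour_manual(values = c(Confirmed = "orange", Deaths = "red", Recovered = "green")) +
  ggtitle("Spread of COVID Confirmed, Deaths and Recovered Cases") +
  xlab("Date") +
  ylab("#Cases") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
s_legend
```

The colours match those used in the individual charts, so a reader can move between the separate and combined views without relearning the colour coding.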
It is quite evident that the number of confirmed cases started to rise significantly from mid-March, while deaths started to increase in the last week of March. Recovery numbers started to pick up about 2 weeks after cases were detected (reflecting the lag between a person acquiring COVID and starting to show symptoms).
From the last chart, it is easier to see that recoveries amount to around 33% of the confirmed cases.
It is also worth noting that, unlike past pandemics, although this pandemic has lasted a very long time and it is unclear when we will emerge from it, the mortality rate is still low, at around 2-3%.
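The percentages quoted above can be checked directly from the most recent row of the US table instead of being read off the chart. The sketch below assumes a data frame with `Date`, `Confirmed`, `Deaths` and `Recovered` columns. The helper `summarise_rates` is hypothetical, and the counts in `us_example` are made-up placeholders, not actual figures. Both rates use confirmed cases as the denominator, so the mortality figure is a case fatality ratio.

```r
# Recovery and mortality as a percentage of confirmed cases on the latest date
summarise_rates <- function(us) {
  latest <- us[which.max(us$Date), ]
  c(
    recovery_pct  = 100 * latest$Recovered / latest$Confirmed,
    mortality_pct = 100 * latest$Deaths / latest$Confirmed
  )
}

# Placeholder numbers for illustration only (NOT real data)
us_example <- data.frame(
  Date = as.Date(c("2020-07-17", "2020-07-18")),
  Confirmed = c(900, 1000),
  Deaths = c(22, 25),
  Recovered = c(290, 330)
)

rates <- summarise_rates(us_example)
rates
```

Applied to the real table, the `Confirmed Cases` column would first need to be renamed to `Confirmed`, or referenced with backticks inside the helper.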
Data - “COVID”