AIM

The primary objective and aim of this project is to analyze the spread of Corona Virus(COVID-19) worldwide. This dataset has data available from 22 Jan,2020 till 9th Feb 2020. The csv file is taken from WHO(World Health Organization).

What is Corona Virus(COVID-19)

Novel Coronavirus (COVID-19) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people

The GITHUB link to this Project is here.

Let’s begin with importing the data

require(dplyr)
require(highcharter)
require(ggplot2)
require(readr)
require(anytime) # to convert to date

nCOV<-read_csv("data/2019_nCoV_data.csv",col_names = TRUE)
nCOV$Date<-anydate(nCOV$Date)
#ommitting any NA values if there
nCOV<-na.omit(nCOV)

#chaiging the column name
colnames(nCOV)[3]="State"

#attaching the data frame
attach(nCOV)

Total deaths between that timeframe

summary(nCOV)
##       Sno              Date               State          
##  Min.   :   1.0   Min.   :2020-01-22   Length:837        
##  1st Qu.: 247.0   1st Qu.:2020-01-27   Class :character  
##  Median : 523.0   Median :2020-02-01   Mode  :character  
##  Mean   : 539.6   Mean   :2020-01-31                     
##  3rd Qu.: 829.0   3rd Qu.:2020-02-05                     
##  Max.   :1127.0   Max.   :2020-02-09                     
##    Country          Last Update          Confirmed           Deaths       
##  Length:837         Length:837         Min.   :    0.0   Min.   :  0.000  
##  Class :character   Class :character   1st Qu.:    2.0   1st Qu.:  0.000  
##  Mode  :character   Mode  :character   Median :   25.0   Median :  0.000  
##                                        Mean   :  342.4   Mean   :  7.319  
##                                        3rd Qu.:  132.0   3rd Qu.:  0.000  
##                                        Max.   :29631.0   Max.   :871.000  
##    Recovered      
##  Min.   :   0.00  
##  1st Qu.:   0.00  
##  Median :   0.00  
##  Mean   :  16.08  
##  3rd Qu.:   4.00  
##  Max.   :1795.00
sum(Deaths) #total deaths of 6126 confirmed between 22 Jan -  9th Feb
## [1] 6126
#Let's find sum of confirmed cases: 286573 people affected with nCOV worldwide between 22 Jan - 9 Feb
sum(Confirmed)
## [1] 286573
# let's see how many have recovered : out of 286573, only 13455 have recovered from it between 22 Jan - 9th Feb
sum(Recovered)
## [1] 13455

Let’s plot the data and make a time series plot of cases,deaths and recoveries in this time period:

# making a dataframe and grouping by date to get total deaths, confirmed cases and recoveris for that day
nCOVDate <- nCOV %>% 
  select(Date,Confirmed,Deaths,Recovered) %>% 
  group_by(Date) %>%
  summarise(nConfirmed=sum(Confirmed),nRecovery=sum(Recovered), nDeath=sum(Deaths)) 


hchart(nCOVDate, "spline", hcaes(x = Date,y = nConfirmed), name="Confirmed cases:",color="black") %>% 
  hc_exporting(enabled = TRUE) %>%
      hc_title(text="Increase in number of COVID-19 cases over time",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

Incraese in Deaths over time:

hchart(nCOVDate, "spline", hcaes(x = Date,y = nDeath), name="Confirmed deaths:",color="red") %>% 
  hc_exporting(enabled = TRUE) %>%
      hc_title(text="Increase in number of COVID-19 deaths over time",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

Let’s check the recoveries:

hchart(nCOVDate, "spline", hcaes(x = Date,y = nRecovery), name="Confirmed recoveries:",color="green") %>% 
  hc_exporting(enabled = TRUE) %>%
      hc_title(text="Increase in number of COVID-19 recoveries over time",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

Let’s make a data frame out of the above totals to understand better.

ncov_cases<-rbind(sum(Deaths),sum(Confirmed),sum(Recovered)) #combining by rows
ncov_cases <- as.data.frame(ncov_cases) #converting it to a data frame

#changing the names for better understanding
row.names(ncov_cases)<-c("Death","Confirmed","Recovered")
names(ncov_cases)[1]<-"Count"

ncov_cases
##            Count
## Death       6126
## Confirmed 286573
## Recovered  13455
#What is the recovery rate
recovery_rate<-(ncov_cases$Count[3]/ncov_cases$Count[2])*100 
recovery_rate
## [1] 4.695139
#hence, the recovery rate is pretty low i.e 4.69%

death_rate<-(100-recovery_rate)
death_rate
## [1] 95.30486

So we can notice that the recovery rate of perople affected with COVID-19 virus is approxinately 4.7 %, i.e only 4.7 % of people disgnosed with COVID-19 recovered and the rest died. Hence the death rate is 95.3 % for COVID-19.

#let's make a stacked line graph of all the 3 variables:

highchart() %>% 
  hc_xAxis(categories=nCOVDate$Date) %>% 
  hc_add_series(name="Deaths", data=nCOVDate$nDeath) %>% 
  hc_add_series(name="Recoveries",data=nCOVDate$nRecovery) %>% 
  hc_add_series(name="Confirmed Cases", data=nCOVDate$nConfirmed) %>% 
  hc_colors(c("red","green","blue")) %>% 
  hc_add_theme(hc_theme_elementary()) %>% 
  hc_exporting(enabled = TRUE) %>%
  hc_title(text="Analysis of count of deaths,recoveries and cases for COVID-19",align="center")

Time Series Analysis of Deaths and cases for Each majorly affected Countries

1) Mainland China
nCOVDateChina <- nCOV %>%
  filter(Country == "Mainland China") %>% 
  select(Date,Confirmed,Deaths,Recovered) %>% 
  group_by(Date) %>%
  summarise(nConfirmed=sum(Confirmed),nRecovery=sum(Recovered), nDeath=sum(Deaths)) 



 highchart() %>% 
        hc_xAxis(categories=nCOVDateChina$Date) %>% 
        hc_add_series(name = "Confirmed Cases", data = nCOVDateChina$nConfirmed) %>% 
        hc_add_series(name = "Recoveries", data = nCOVDateChina$nRecovery) %>%
        hc_add_series(name = "Deaths", data= nCOVDateChina$nDeath) %>% 
        #to add colors
        hc_colors(c("green","blue","red")) %>%
        hc_add_theme(hc_theme_elementary()) %>% 
        hc_exporting(enabled = TRUE) %>%
        hc_title(text="Analysis of count of deaths,recoveries and cases for COVID-19 in China",align="center")
2) United States
nCOVDateUS <- nCOV %>%
  filter(Country == "US") %>% 
  select(Date,Confirmed,Deaths,Recovered) %>% 
  group_by(Date) %>%
  summarise(nConfirmed=sum(Confirmed),nRecovery=sum(Recovered), nDeath=sum(Deaths)) 



 highchart() %>% 
        hc_xAxis(categories=nCOVDateUS$Date) %>% 
        hc_add_series(name = "Confirmed Cases", data = nCOVDateUS$nConfirmed) %>% 
        hc_add_series(name = "Recoveries", data = nCOVDateUS$nRecovery) %>%
        hc_add_series(name = "Deaths", data= nCOVDateUS$nDeath) %>% 
        #to add colors
        hc_colors(c("green","blue","red")) %>%
        hc_add_theme(hc_theme_elementary()) %>% 
        hc_exporting(enabled = TRUE) %>%
        hc_title(text="Analysis of count of deaths,recoveries and cases for COVID-19 in United States",align="center")
3) Hong Kong
nCOVDateHK <- nCOV %>%
  filter(Country == "Hong Kong") %>% 
  select(Date,Confirmed,Deaths,Recovered) %>% 
  group_by(Date) %>%
  summarise(nConfirmed=sum(Confirmed),nRecovery=sum(Recovered), nDeath=sum(Deaths)) 



 highchart() %>% 
        hc_xAxis(categories=nCOVDateUS$Date) %>% 
        hc_add_series(name = "Confirmed Cases", data = nCOVDateHK$nConfirmed) %>% 
        hc_add_series(name = "Recoveries", data = nCOVDateHK$nRecovery) %>%
        hc_add_series(name = "Deaths", data= nCOVDateHK$nDeath) %>% 
        #to add colors
        hc_colors(c("green","blue","red")) %>%
        hc_add_theme(hc_theme_elementary()) %>% 
        hc_exporting(enabled = TRUE) %>%
        hc_title(text="Analysis of count of deaths,recoveries and cases for COVID-19 in Hong Kong",align="center")
4) Macau
nCOVDateMacau <- nCOV %>%
  filter(Country == "Macau") %>% 
  select(Date,Confirmed,Deaths,Recovered) %>% 
  group_by(Date) %>%
  summarise(nConfirmed=sum(Confirmed),nRecovery=sum(Recovered), nDeath=sum(Deaths)) 



 highchart() %>% 
        hc_xAxis(categories=nCOVDateMacau$Date) %>% 
        hc_add_series(name = "Confirmed Cases", data = nCOVDateMacau$nConfirmed) %>% 
        hc_add_series(name = "Recoveries", data = nCOVDateMacau$nRecovery) %>%
        hc_add_series(name = "Deaths", data= nCOVDateMacau$nDeath) %>% 
        #to add colors
        hc_colors(c("green","blue","red")) %>%
        hc_add_theme(hc_theme_elementary()) %>% 
        hc_exporting(enabled = TRUE) %>%
        hc_title(text="Analysis of count of deaths,recoveries and cases for COVID-19 in Macau",align="center")
5) Canada
nCOVDateCanada <- nCOV %>%
  filter(Country == "Canada") %>% 
  select(Date,Confirmed,Deaths,Recovered) %>% 
  group_by(Date) %>%
  summarise(nConfirmed=sum(Confirmed),nRecovery=sum(Recovered), nDeath=sum(Deaths)) 



 highchart() %>% 
        hc_xAxis(categories=nCOVDateCanada$Date) %>% 
        hc_add_series(name = "Confirmed Cases", data = nCOVDateCanada$nConfirmed) %>% 
        hc_add_series(name = "Recoveries", data = nCOVDateCanada$nRecovery) %>%
        hc_add_series(name = "Deaths", data= nCOVDateCanada$nDeath) %>% 
        #to add colors
        hc_colors(c("green","blue","red")) %>%
        hc_add_theme(hc_theme_elementary()) %>% 
        hc_exporting(enabled = TRUE) %>%
        hc_title(text="Analysis of count of deaths,recoveries and cases for COVID-19 in Canada",align="center")
6) Australia
nCOVDateAUS <- nCOV %>%
  filter(Country == "Australia") %>% 
  select(Date,Confirmed,Deaths,Recovered) %>% 
  group_by(Date) %>%
  summarise(nConfirmed=sum(Confirmed),nRecovery=sum(Recovered), nDeath=sum(Deaths)) 



 highchart() %>% 
        hc_xAxis(categories=nCOVDateUS$Date) %>% 
        hc_add_series(name = "Confirmed Cases", data = nCOVDateAUS$nConfirmed) %>% 
        hc_add_series(name = "Recoveries", data = nCOVDateAUS$nRecovery) %>%
        hc_add_series(name = "Deaths", data= nCOVDateAUS$nDeath) %>% 
        #to add colors
        hc_colors(c("green","blue","red")) %>%
        hc_add_theme(hc_theme_elementary()) %>% 
        hc_exporting(enabled = TRUE) %>%
        hc_title(text="Analysis of count of deaths,recoveries and cases for COVID-19 in Australia",align="center")
7) Germany
nCOVDateGER <- nCOV %>%
  filter(Country == "Germany") %>% 
  select(Date,Confirmed,Deaths,Recovered) %>% 
  group_by(Date) %>%
  summarise(nConfirmed=sum(Confirmed),nRecovery=sum(Recovered), nDeath=sum(Deaths)) 



 highchart() %>% 
        hc_xAxis(categories=nCOVDateGER$Date) %>% 
        hc_add_series(name = "Confirmed Cases", data = nCOVDateGER$nConfirmed) %>% 
        hc_add_series(name = "Recoveries", data = nCOVDateGER$nRecovery) %>%
        hc_add_series(name = "Deaths", data= nCOVDateGER$nDeath) %>% 
        #to add colors
        hc_colors(c("green","blue","red")) %>%
        hc_add_theme(hc_theme_elementary()) %>% 
        hc_exporting(enabled = TRUE) %>%
        hc_title(text="Analysis of count of deaths,recoveries and cases for COVID-19 in Germany",align="center")

Let’s do some country wise analysis

Let’s first find out number of confirmed cases country wise. Obviously Mainland China is going to have the maximum number as it is the Epicenter of the breakout.

#let's check the number of nCOV cases per country
count_country<-nCOV %>% 
  group_by(Country) %>%
  select(Confirmed) %>% 
  summarise(nCases=sum(Confirmed))


  
#Mainland China has aroung 284905 confirmed cases out of total 286573 nCOV csases and this makes sense.
   
#let's plot these, and lets ingnore China 

countcountry1<-count_country %>% 
  filter(count_country$Country != "Mainland China") %>% 
  arrange(desc(nCases))
        

hchart(countcountry1 , type = "column", hcaes(x=countcountry1$Country, y= countcountry1$nCases ),color="purple",name="Count") %>% 
      hc_exporting(enabled = TRUE) %>%
      hc_title(text="Confirmed Cases grouped by Country",align="center") %>%
      hc_add_theme(hc_theme_elementary())

In in the above data frame we can notice that Mainland China has around 284905 confirmed cases out of total 286573 COVID-19 csases and this makes sense. That makes Mainland China having 99.4 % of the total confirmed cases.

Let’s find the number of Deaths occured in these countries.

count_deaths<-nCOV %>% 
  group_by(Country) %>% 
  select(Deaths) %>% 
  summarise(nDeaths=sum(Deaths)) %>% 
  arrange(desc(nDeaths))
  

count_deaths
## # A tibble: 10 x 2
##    Country        nDeaths
##    <chr>            <dbl>
##  1 Mainland China    6120
##  2 Hong Kong            6
##  3 Australia            0
##  4 Canada               0
##  5 China                0
##  6 Germany              0
##  7 Macau                0
##  8 Others               0
##  9 Taiwan               0
## 10 US                   0

As per this dataset between 22 Jan till 9th Feb all the deaths due to COVID-19 have only been in Mainland China and Hong Kong.

Let’s see how many have recovered in each Country

# Also adding the Recovery rates for each country
count_recovered<-nCOV %>% 
  group_by(Country) %>% 
  select(Recovered,Confirmed) %>% 
  summarise(nRecovered=sum(Recovered),RecoveryRate=(nRecovered/sum(Confirmed))*100) %>%
  arrange(desc(nRecovered))

Let’s plot the data:

hchart(count_recovered , type = "column", hcaes(x = count_recovered$Country , y = count_recovered$RecoveryRate),color="orange",name="Rate:") %>% 
      hc_exporting(enabled = TRUE) %>%
      hc_title(text="Recovery rate of each country out of total COVID-19 Cases",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

In the above plot we can notice that Australia has the highest Recovery rate followed by Mainland China. And other countries have recovery rates of 0.

Let’s Check the death rates in these countries now :

count_deaths<-nCOV %>% 
  group_by(Country) %>% 
  select(Recovered,Confirmed,Deaths) %>% 
  summarise(nRecovered=sum(Recovered),RecoveryRate=(nRecovered/sum(Confirmed))*100) %>%
  mutate(DeathRate=(100-RecoveryRate)) %>% 
  arrange(desc(DeathRate))


hchart(count_deaths , type = "column", hcaes(x = count_deaths$Country , y = count_deaths$DeathRate),color="black",name="Death Rate:") %>% 
      hc_exporting(enabled = TRUE) %>%
      hc_title(text="Death rate of each country out of total COVID-19 Cases",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

We can notice that in Countries like Canada,China,Germany,Hong Kong there have been no recoveries from COVID-19 virus, but only deaths. The death rate is lowest for Australia i.e, out of most people affected with COVID-19 virus, only 85% people died which is still a pretty large ratio. This explains the severity and how deadly this virus is.

Let’s check which province in China and Mainland China have been the most affected

StateCasesChina <- nCOV %>% 
  filter(Country == "Mainland China") %>% 
  group_by(State) %>% 
  select(Confirmed,Deaths,Recovered) %>% 
  summarise(nConfirmed=sum(Confirmed),nRecovered=sum(Recovered),nDeaths=sum(Deaths))

Let’s make a dataframe of States for all other countries as well.

AustraliaState <- nCOV %>% 
  filter(Country == "Australia") %>% 
  group_by(State) %>% 
  select(Confirmed,Deaths,Recovered) %>% 
  summarise(nConfirmed=sum(Confirmed),nRecovered=sum(Recovered),nDeaths=sum(Deaths))



USState <- nCOV %>% 
  filter(Country == "US") %>% 
  group_by(State) %>% 
  select(Confirmed,Deaths,Recovered) %>% 
  summarise(nConfirmed=sum(Confirmed),nRecovered=sum(Recovered),nDeaths=sum(Deaths))


CanadaState <- nCOV %>% 
  filter(Country == "Canada") %>% 
  group_by(State) %>% 
  select(Confirmed,Deaths,Recovered) %>% 
  summarise(nConfirmed=sum(Confirmed),nRecovered=sum(Recovered),nDeaths=sum(Deaths))


GermanyState<- nCOV %>% 
  filter(Country == "Germany") %>% 
  group_by(State) %>% 
  select(Confirmed,Deaths,Recovered) %>% 
  summarise(nConfirmed=sum(Confirmed),nRecovered=sum(Recovered),nDeaths=sum(Deaths))

Let’s now plot the Statewise data

1) States affected in Mainland China
hchart(StateCasesChina, "treemap", hcaes(x =State , value = nConfirmed,color = nConfirmed) ) %>% 
      hc_exporting(enabled = TRUE) %>% 
      hc_title(text="Tree map of States in China and confirmed Cases of COVID-19",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

We can notice in the above Tree map that most of the confirmed cases of COVID-19 virus are in State of Hubei whose capital city is Wuhan, as it the city where it all started.

Let’s see the most deaths were in which state:

hchart(StateCasesChina, "treemap", hcaes(x =State , value = nDeaths,color = nDeaths) ) %>% 
      hc_exporting(enabled = TRUE) %>% 
      hc_title(text="Tree map of States in China and confirmed deaths from COVID-19",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

As expected most of the deaths have been in Hubei state.

The most recoveries:

hchart(StateCasesChina, "treemap", hcaes(x =State , value = nRecovered,color = nRecovered) ) %>% 
      hc_exporting(enabled = TRUE) %>% 
      hc_title(text="Tree map of States in China and most recoveries",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

After Hubei state most recoveries have been in State of Zhejiand, Hunan and Guangdong.

2) United States
hchart(USState, "pie", hcaes(x =State , y = nConfirmed,color = nConfirmed), name="count:" ) %>% 
      hc_exporting(enabled = TRUE) %>% 
      hc_title(text="Tree map of States in US and confirmed Cases of COVID-19",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

We can notice that most confirmed cases were in Illinios(Chicago),followed by Santa Clara(California).

There have been no deaths in US during this time period and total of 3 people recovered, 2 in Chicago and 1 from Seattle.

3) Australia
hchart(AustraliaState, "pie", hcaes(x =State , y = nConfirmed,color = nConfirmed) ) %>% 
      hc_exporting(enabled = TRUE) %>% 
      hc_title(text="Tree map of States in Australia and confirmed Cases of COVID-19",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

For Australia most confirmed cases have been in New South Wales followed by Victoria. And 22 people have been recovered in New South Wales. No deaths in this duration of 22 Jan to 9th Feb 2020.

4) Canada
hchart(CanadaState, "pie", hcaes(x =State , y = nConfirmed,color = nConfirmed) ) %>% 
      hc_exporting(enabled = TRUE) %>% 
      hc_title(text="Tree map of States in Canada and confirmed Cases of COVID-19",align="center") %>%
      hc_add_theme(hc_theme_elementary()) 

For Canada we can notice that most confirmed cases were from British Columbia followed by Ontario state. There have been no fatalities or recoveries recorded between this time perios in Canada.

5) Germany

In Germany there have been only 20 confirmed cases that too only in state of Bavaria. No deaths or Recoveries recorded in this period(22 Jan to 9th Feb 2020).