Status of Coronavirus (Latest Data: Feb 17, 2020)

Dataset Source: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset

library(readr)
corona <- read_csv("R:/Datasets/novel-corona-virus-2019-dataset/2019_nCoV_data.csv")
## Parsed with column specification:
## cols(
##   Sno = col_double(),
##   Date = col_character(),
##   `Province/State` = col_character(),
##   Country = col_character(),
##   `Last Update` = col_character(),
##   Confirmed = col_double(),
##   Deaths = col_double(),
##   Recovered = col_double()
## )

Data

Viewing the head of the table

head(corona)
## # A tibble: 6 x 8
##     Sno Date   `Province/State` Country `Last Update` Confirmed Deaths Recovered
##   <dbl> <chr>  <chr>            <chr>   <chr>             <dbl>  <dbl>     <dbl>
## 1     1 01/22… Anhui            China   01/22/2020 1…         1      0         0
## 2     2 01/22… Beijing          China   01/22/2020 1…        14      0         0
## 3     3 01/22… Chongqing        China   01/22/2020 1…         6      0         0
## 4     4 01/22… Fujian           China   01/22/2020 1…         1      0         0
## 5     5 01/22… Gansu            China   01/22/2020 1…         0      0         0
## 6     6 01/22… Guangdong        China   01/22/2020 1…        26      0         0

Quick Descriptive Analysis

We will do a quick descriptive analysis of the data. We will find the total number of confirmed cases, recoveries and deaths across all rows of the table, and the maximum value of each.

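sum(corona$Confirmed) #Confirmed cases summed over all rows
## [1] 781452
sum(corona$Deaths) #Deaths summed over all rows
## [1] 17949
sum(corona$Recovered) #Recovered patients summed over all rows
## [1] 76258
max(corona$Deaths) #Largest death count in a single row
## [1] 1789
max(corona$Recovered) #Largest recovered count in a single row
## [1] 7862
max(corona$Confirmed) #Largest confirmed count in a single row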
## [1] 59989
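
The sums above add up every row in the table, so a location that appears once per reporting date is counted once per date. If the counts are cumulative (an assumption here, suggested by the Date column and the repeated provinces), a total taken from only the latest row per location may be closer to the real figure. The sketch below shows the idea on a made-up toy table; the column name `Province` and all values are hypothetical, and the date format is assumed to be month/day/year.

```r
# Toy data in the same shape as the dataset (values are made up for illustration).
toy <- data.frame(
  Date      = c("01/22/2020", "01/23/2020", "01/22/2020", "01/23/2020"),
  Province  = c("A", "A", "B", "B"),
  Confirmed = c(1, 3, 10, 14)
)

# Summing every row counts each province once per reporting date.
naive_total <- sum(toy$Confirmed)

# Keep only the most recent row for each province, then sum.
toy$Date <- as.Date(toy$Date, format = "%m/%d/%Y")
toy <- toy[order(toy$Province, toy$Date), ]
latest <- toy[!duplicated(toy$Province, fromLast = TRUE), ]
latest_total <- sum(latest$Confirmed)

print(c(naive = naive_total, latest = latest_total))
```

On the real data the same steps would need the actual column names (for example `Province/State` in backticks) and a check that every date string parses.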

Graphs

Let us graphically view the number of confirmed cases in different countries. We will use ggplot2 to construct a bar chart.

library(ggplot2)
options(scipen = 1) #To remove scientific notation
ggplot(corona)+ 
    aes(Country, Confirmed)+ 
    geom_bar(stat = "identity")+ 
    coord_flip()

Preparing the data

We can see that the number of cases in Mainland China is so large that it makes the cases in other countries very difficult to see. Therefore, we will exclude Mainland China and China from the graph and analyse them separately.

We remove the rows containing “Mainland China” and “China” from the dataset and create a dataset named “world” that contains the data for all other countries.

world = corona [!(corona$Country == "Mainland China" | corona$Country == "China"),]
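
The same exclusion can also be written with `%in%`, which keeps the list of labels in one place and never returns NA for a missing country (a comparison with `==` does, and NA in a row index produces all-NA rows). This is a minimal sketch on a made-up toy table, not the dataset itself; `toy`, `china_labels` and `world_toy` are hypothetical names.

```r
# Toy table with hypothetical values.
toy <- data.frame(
  Country   = c("Mainland China", "China", "Japan", "Singapore"),
  Confirmed = c(100, 5, 2, 3)
)

# Labels that should be treated as China.
china_labels <- c("Mainland China", "China")

# Keep every row whose Country is not one of those labels.
world_toy <- toy[!(toy$Country %in% china_labels), ]
print(world_toy)
```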

We will now aggregate the data by country for easier analysis.

world_confirm = aggregate(world$Confirmed~world$Country, FUN = sum)
colnames(world_confirm)= c("Country", "Confirmed")
head(world_confirm)
##     Country Confirmed
## 1 Australia       284
## 2   Belgium        14
## 3    Brazil         0
## 4  Cambodia        22
## 5    Canada       116
## 6     Egypt         4
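
As a side note, `aggregate()` also accepts a `data` argument, in which case the formula can use bare column names and the result keeps them, so the renaming step is not needed. A minimal sketch on made-up values (`toy` and `toy_confirm` are hypothetical names):

```r
# Toy table with hypothetical values.
toy <- data.frame(
  Country   = c("Japan", "Japan", "Singapore"),
  Confirmed = c(2, 4, 3)
)

# Sum Confirmed within each Country; the output columns are already
# named Country and Confirmed.
toy_confirm <- aggregate(Confirmed ~ Country, data = toy, FUN = sum)
print(toy_confirm)
```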

Visualizing the world data

ggplot(world_confirm)+ 
    aes(reorder (Country, Confirmed), Confirmed)+ 
    geom_bar(stat = "identity", fill = "Purple", color = "white")+
    geom_text(aes(Country, Confirmed, label = Confirmed), size = 3.5, hjust = -0.3)+ 
    coord_flip()+ 
    xlab ("Country")

When the data was analysed (Feb 17, 2020), the largest numbers of confirmed cases outside China were in Singapore, Hong Kong and Japan. Now it is time to look at the deaths in the rest of the world (excluding China). We will prepare the data for analysis and see the data in the graph. Similarly, we will also see the graph for recovered cases.

world_death = aggregate(world$Deaths~world$Country, FUN = sum) #Creating a separate data
colnames(world_death)= c("Country", "Deaths") #Changing column names
head (world_death)
##     Country Deaths
## 1 Australia      0
## 2   Belgium      0
## 3    Brazil      0
## 4  Cambodia      0
## 5    Canada      0
## 6     Egypt      0

Since some countries have zero deaths, we want to exclude them. Therefore, we use the dplyr package to keep only the countries with more than zero deaths.

library(dplyr)
library(kableExtra)
world_death = filter(world_death, Deaths > 0) #Keeping only countries with at least one death
kable(world_death) %>%
  kable_styling()
Country      Deaths
France            3
Hong Kong        14
Japan             5
Philippines      17
Taiwan            2

China

It is now time to analyse the Coronavirus cases in Mainland China.

First we combine the rows labelled “Mainland China” and “China” into a single table, excluding all other countries.

china = filter(corona, corona$Country == "China" | corona$Country == "Mainland China")

Aggregating the confirmed cases, deaths and recoveries by province.

confirmed = aggregate(china$Confirmed~china$`Province/State`, FUN = sum)
deaths = aggregate(china$Deaths~china$`Province/State`, FUN = sum)
recovered = aggregate(china$Recovered~china$`Province/State`, FUN = sum)
province = cbind.data.frame(confirmed, deaths$`china$Deaths`, recovered$`china$Recovered`)
colnames(province) = c("Province/State", "Confirmed", "Deaths", "Recovered")
colnames(deaths) = c("Province", "Deaths") #Used for the plots further down
kable(province) %>%
  kable_styling() %>%
  scroll_box(width = "800px", height = "300px")
Province/State  Confirmed  Deaths  Recovered
Anhui               13622      46       1729
Beijing              5953      42        876
Chongqing            8708      50       1340
Fujian               4614       1        614
Gansu                1424      19        374
Guangdong           20084      19       3389
Guangxi              3709      16        414
Guizhou              1806      13        297
Hainan               2425      48        361
Hebei                3917      42        747
Heilongjiang         5405     117        470
Henan               18153     126       3267
Hong Kong               0       0          0
Hubei              589921   17228      44863
Hunan               15347      19       3564
Inner Mongolia        999       0         77
Jiangsu              8603       0       1538
Jiangxi             13057       9       1906
Jilin                1243      11        227
Liaoning             1978       6        262
Macau                   1       0          0
Ningxia               930       0        245
Qinghai               324       0         93
Shaanxi              3676       0        500
Shandong             7934      17       1239
Shanghai             5382      24        965
Shanxi               1986       0        402
Sichuan              7324      24       1171
Taiwan                  1       0          0
Tianjin              1738      25        227
Tibet                  20       0          6
Xinjiang              903       6         52
Yunnan               2721       0        331
Zhejiang            19592       0       3894
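
The province table can also be built in a single `aggregate()` call by binding the three count columns on the left-hand side of the formula, and then sorted by confirmed cases with `order()`. The sketch below uses a made-up toy table; `toy`, `province_toy` and the column name `Province` are hypothetical, and on the real data the grouping column would be written as `Province/State` in backticks.

```r
# Toy table with hypothetical values, two rows per province.
toy <- data.frame(
  Province  = c("Hubei", "Hubei", "Anhui", "Anhui"),
  Confirmed = c(444, 549, 1, 9),
  Deaths    = c(17, 24, 0, 0),
  Recovered = c(28, 31, 0, 0)
)

# Sum all three count columns within each province in one step.
province_toy <- aggregate(cbind(Confirmed, Deaths, Recovered) ~ Province,
                          data = toy, FUN = sum)

# Sort from the most to the fewest confirmed cases.
province_toy <- province_toy[order(-province_toy$Confirmed), ]
print(province_toy)
```

Building the table in one call avoids relying on three separate aggregates returning their rows in the same order.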

Now let’s visualize. Even before plotting, we can be sure that the bar for Hubei province will completely overshadow the other provinces. We will visualize only the deaths.

ggplot(deaths)+ 
    aes(Province, Deaths, fill = Province)+ 
    geom_bar(stat = "identity")+ 
    labs(x = "Province", y = "Deaths")+ #labs() names the aesthetics, so x is still Province after coord_flip()
    coord_flip()

Let’s see the graph without Hubei province.

death = deaths[deaths$Province != "Hubei",] #Eliminating Hubei Province by name rather than by row position
ggplot(death)+ 
  aes(Province, Deaths)+ 
  geom_bar(stat = "identity")+
  labs(x = "Province/States", y = "Total Deaths")+
  coord_flip()
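
The province bars appear in alphabetical order. As with the world chart earlier, `reorder()` can sort them by value, which may make the ranking easier to read. A minimal sketch on made-up numbers (`toy_deaths` and `p` are hypothetical names; `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`):

```r
library(ggplot2)

# Toy table with hypothetical values.
toy_deaths <- data.frame(
  Province = c("Henan", "Anhui", "Heilongjiang"),
  Deaths   = c(20, 5, 12)
)

# reorder() sorts the provinces by their death count before plotting.
p <- ggplot(toy_deaths) +
  aes(reorder(Province, Deaths), Deaths) +
  geom_col() +
  labs(x = "Province", y = "Total deaths") +
  coord_flip()
print(p)
```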

Thank you for your time. I will add more analysis about the coronavirus in the future.