Introduction

We will analyze COVID-19 statistics using the R package “coronavirus”. The package provides the latest data from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository. Re-running the code retrieves the latest figures, provided the required packages are installed. For any kind of collaboration or help, reach me at

Installation of the coronavirus package

install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")
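If you re-run the analysis regularly, it can help to install only the packages that are missing. The following is a minimal sketch using base R's `requireNamespace()`. The helper name `missing_packages()` is hypothetical and is not part of the coronavirus package. The package list simply mirrors the `library()` calls in the next section.

```r
# Hypothetical helper (not from the coronavirus package): returns the subset
# of `pkgs` that cannot be loaded, i.e. packages that are not installed.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# CRAN packages used in this analysis
cran_pkgs <- c("devtools", "dplyr", "ggplot2", "lubridate",
               "countrycode", "kableExtra", "tidyr", "knitr")
to_install <- missing_packages(cran_pkgs)
print(to_install)
# To install only what is absent:
# if (length(to_install) > 0) install.packages(to_install)
```

The install step is left as a comment so the check itself never touches the network.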

Loading the packages

library(coronavirus)
library(dplyr)
library(ggplot2)
library(lubridate)
library(countrycode)
library(kableExtra)
library(tidyr)
library(knitr)

Importing the latest data

data = refresh_coronavirus_jhu()
head(data)
##         date    location location_type location_code location_code_type
## 1 2020-02-08 Afghanistan       country            AF         iso_3166_2
## 2 2020-02-07 Afghanistan       country            AF         iso_3166_2
## 3 2020-05-04 Afghanistan       country            AF         iso_3166_2
## 4 2020-02-14 Afghanistan       country            AF         iso_3166_2
## 5 2020-02-15 Afghanistan       country            AF         iso_3166_2
## 6 2020-05-05 Afghanistan       country            AF         iso_3166_2
##       data_type value      lat     long
## 1     cases_new     0 33.93911 67.70995
## 2     cases_new     0 33.93911 67.70995
## 3 recovered_new    52 33.93911 67.70995
## 4     cases_new     0 33.93911 67.70995
## 5     cases_new     0 33.93911 67.70995
## 6 recovered_new    24 33.93911 67.70995
str(data)
## 'data.frame':    148365 obs. of  9 variables:
##  $ date              : chr  "2020-02-08" "2020-02-07" "2020-05-04" "2020-02-14" ...
##  $ location          : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ location_type     : chr  "country" "country" "country" "country" ...
##  $ location_code     : chr  "AF" "AF" "AF" "AF" ...
##  $ location_code_type: chr  "iso_3166_2" "iso_3166_2" "iso_3166_2" "iso_3166_2" ...
##  $ data_type         : chr  "cases_new" "cases_new" "recovered_new" "cases_new" ...
##  $ value             : int  0 0 52 0 0 24 0 14 0 0 ...
##  $ lat               : num  33.9 33.9 33.9 33.9 33.9 ...
##  $ long              : num  67.7 67.7 67.7 67.7 67.7 ...

Cleaning

First, we convert the date column from character to Date. Next, we sort the rows by date, because the time series plots and cumulative sums below depend on chronological order. Finally, we rename the values in the data_type column, e.g. “cases” instead of “cases_new”.

# Converting character to Date
data$date = ymd(data$date)
# Ordering the data by date (oldest first)
data = data[order(data$date), ]
# Renaming the values in the data_type column, e.g. "cases_new" -> "cases"
data$data_type <- sub("_new$", "", data$data_type)
options(scipen = 999) # Avoid scientific notation in printed numbers and plot axes
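The three cleaning steps can be tried without downloading anything. This base-R sketch runs them on a hypothetical three-row data frame. `as.Date()` stands in for `lubridate::ymd()` here, since both parse ISO `YYYY-MM-DD` strings. A single `sub()` with the anchored pattern `"_new$"` strips the suffix from every data type at once.

```r
# Hypothetical rows in the same shape as the JHU data (date stored as text)
raw <- data.frame(
  date      = c("2020-02-08", "2020-01-22", "2020-02-07"),
  data_type = c("cases_new", "deaths_new", "recovered_new"),
  value     = c(5L, 1L, 3L),
  stringsAsFactors = FALSE
)

raw$date <- as.Date(raw$date)                     # character -> Date
raw <- raw[order(raw$date), ]                     # oldest first
raw$data_type <- sub("_new$", "", raw$data_type)  # drop the "_new" suffix

print(raw)
```

Sorting matters because `cumsum()` accumulates in row order, so a running total is only meaningful on chronologically ordered rows.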

Analysis

Let’s look at the worldwide totals of cases, recoveries and deaths.

data %>% 
  group_by(data_type) %>%
  summarise(total = sum(value)) %>% 
  kable(digits = 2, format = "html", row.names = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
data_type total
1 cases 16681996
2 deaths 659374
3 recovered 9711187
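Each row holds a daily new count. Summing value within each data_type therefore yields the cumulative total to date. In base R, the same grouping can be sketched with `tapply()`; the numbers below are hypothetical.

```r
# Hypothetical daily new counts for two days
daily <- data.frame(
  data_type = c("cases", "deaths", "recovered", "cases", "deaths", "recovered"),
  value     = c(100L, 2L, 10L, 150L, 3L, 40L)
)

# Base-R counterpart of group_by(data_type) %>% summarise(total = sum(value))
totals <- tapply(daily$value, daily$data_type, sum)
print(totals)
```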

Time series of daily new cases, deaths and recoveries worldwide.

data %>% 
    group_by(date, data_type) %>%
    summarise(count = sum(value)) %>% 
    ggplot()+
    aes(date, count, color = data_type)+
    geom_line(size = 1, alpha = 0.9)+
    scale_color_manual(values=c("blue", "red", "green"))+
    theme_minimal()+
    theme(legend.title = element_blank())+
    labs(x = "", y = "", title = "Daily New Cases, Deaths and Recovery", subtitle = today())

Continent-wise data

We use the “countrycode” package to look up the continent of each country in the dataset and store it in a new continent column.

countrydata = filter(data, location_type == 'country')
countrydata$continent = countrycode(sourcevar = countrydata[, "location"],
                               origin = "country.name",
                               destination = "continent")
## Warning in countrycode(sourcevar = countrydata[, "location"], origin = "country.name", : Some values were not matched unambiguously: Diamond Princess, Kosovo, MS Zaandam
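The filter above keeps only rows whose location_type is “country”. In the country and province table further down, China and Australia appear only by province (for example “Hubei, China”), and Canada appears by province apart from its recoveries. These countries may therefore be missing from the continent totals if their province rows carry a different location_type.

One possible workaround is sketched here as an assumption, not the author's method. It takes the text after the last comma as the country name before calling `countrycode()`. The helper `country_of()` is hypothetical.

```r
# Hypothetical helper: map "Province, Country" labels to the country part.
# Labels without a comma are returned unchanged.
country_of <- function(location) {
  trimws(sub("^.*,", "", location))
}

country_of(c("Hubei, China", "Quebec, Canada", "Brazil",
             "Channel Islands, United Kingdom"))
```

The result could then be passed to `countrycode(..., origin = "country.name", destination = "continent")` in place of the raw location.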
head(countrydata)
##         date    location location_type location_code location_code_type
## 1 2020-01-22 Afghanistan       country            AF         iso_3166_2
## 2 2020-01-22 Afghanistan       country            AF         iso_3166_2
## 3 2020-01-22 Afghanistan       country            AF         iso_3166_2
## 4 2020-01-22     Albania       country            AL         iso_3166_2
## 5 2020-01-22     Albania       country            AL         iso_3166_2
## 6 2020-01-22     Albania       country            AL         iso_3166_2
##   data_type value      lat     long continent
## 1     cases     0 33.93911 67.70995      Asia
## 2 recovered     0 33.93911 67.70995      Asia
## 3    deaths     0 33.93911 67.70995      Asia
## 4 recovered     0 41.15330 20.16830    Europe
## 5     cases     0 41.15330 20.16830    Europe
## 6    deaths     0 41.15330 20.16830    Europe
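The next step drops the cruise-ship rows. Removing rows with a negative index from `grep()` has a known pitfall. When nothing matches, `grep()` returns `integer(0)`, and `x[-integer(0)]` selects no elements at all, so the data is silently emptied. A logical filter built with `grepl()` avoids this. Here is a small base-R illustration on a hypothetical vector:

```r
locations <- c("Brazil", "Kosovo", "Peru")

# Nothing matches, so grep() returns integer(0) ...
idx <- grep("MS Zaandam", locations)
# ... and negating an empty index drops every element
with_minus_grep <- locations[-idx]
print(with_minus_grep)  # character(0)

# grepl() gives one TRUE/FALSE per element, so "no match" keeps everything
with_grepl <- locations[!grepl("MS Zaandam", locations)]
print(with_grepl)
```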
# Removing cruise ship data and assigning Kosovo to Europe
countrydata$continent[countrydata$location == "Kosovo"] = "Europe"
# A logical filter is safe even if neither ship is present in the data
countrydata = countrydata[!grepl("Diamond Princess|MS Zaandam", countrydata$location), ]
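# Cumulative values per continent: total each day first, then accumulate,
# so that each continent and data_type gets exactly one value per date
continent_df = countrydata %>%
    group_by(continent, data_type, date) %>%
    summarise(value = sum(value)) %>%
    arrange(date) %>%
    mutate(cumvalues = cumsum(value)) %>%
    select(date, continent, data_type, cumvalues)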
# Plot
ggplot(continent_df)+
    aes(date, cumvalues, color = data_type)+
    geom_line(size = 1)+
    facet_wrap(~continent, scales = "free_y")+
    theme_minimal()+
    labs(x = "", y = "Cumulative Value", title = "Situation of COVID-19 in different continents", subtitle = today())+
    theme(legend.background = element_rect(fill="#fcfcfc", size=.5, linetype="dotted"), legend.position = "bottom", legend.title = element_blank()) +
    scale_color_manual(values=c("blue", "red", "green"))
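A running sum over rows gives a clean daily series only if each group has exactly one row per date. When several countries report on the same day, `cumsum()` also accumulates within that day and produces several partial values for one date. Here is a base-R sketch on hypothetical numbers that aggregates to one row per continent and date first, then takes the running sum:

```r
# Hypothetical daily new cases: two countries in one continent, two days
rows <- data.frame(
  date      = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02")),
  continent = "Europe",
  value     = c(10L, 5L, 20L, 1L)
)

# Running sum over raw rows: two different values for each date
per_row <- ave(rows$value, rows$continent, FUN = cumsum)

# One row per continent and date first, then accumulate in date order
per_day <- aggregate(value ~ continent + date, data = rows, FUN = sum)
per_day <- per_day[order(per_day$continent, per_day$date), ]
per_day$cumvalue <- ave(per_day$value, per_day$continent, FUN = cumsum)
print(per_day)
```

Both approaches end at the same final total. Only the daily aggregate gives a single, monotone value per date, which is what a line plot over time expects.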

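Country- and province-wise data

The table below totals each data_type per location and adds an active column, computed as cases minus recovered minus deaths, with missing values counted as zero. Some locations do not report recoveries (recovered is 0 or NA), such as the United Kingdom, Sweden, the Netherlands and the Canadian provinces. For these the active count is overstated, and “Diamond Princess, Canada” even comes out at -1.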

df = data %>%
  group_by(location, data_type) %>%
  summarise(total = sum(value)) %>%
  pivot_wider(names_from =  data_type, values_from = total) %>%
  mutate(active = cases - ifelse(is.na(recovered), 0, recovered) - ifelse(is.na(deaths), 0, deaths)) %>%
  arrange(-cases) %>%
  ungroup() %>%
  mutate(location = if_else(location == "United Arab Emirates", "UAE", location)) %>%
  mutate(location = if_else(location == "Mainland China", "China", location)) %>%
  mutate(location = if_else(location == "North Macedonia", "N.Macedonia", location)) %>%
  mutate(location = trimws(location)) %>%
  mutate(location = factor(location, levels = location))
df %>% kable(digits = 2, format = "html", row.names = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T,
                font_size = 15) %>%
   scroll_box(height = "300px")
location cases deaths recovered active
1 US 4351997 149256 1355363 2847378
2 Brazil 2483191 88539 1868749 525903
3 India 1483156 33425 952743 496988
4 Russia 822060 13483 611109 197468
5 South Africa 459761 7257 287313 165191
6 Mexico 402697 44876 308142 49679
7 Peru 389717 18418 276452 94847
8 Chile 349800 9240 322332 18228
9 United Kingdom 300658 45878 0 254780
10 Iran 296273 16147 257019 23107
11 Spain 280610 28436 150376 101798
12 Pakistan 275225 5865 242436 26924
13 Saudi Arabia 270831 2789 225624 42418
14 Colombia 267385 9074 136690 121621
15 Italy 246488 35123 198756 12609
16 Bangladesh 229185 3000 127414 98771
17 Turkey 227982 5645 211561 10776
18 France 209342 30109 71667 107566
19 Germany 207707 9131 190711 7865
20 Argentina 173355 3179 75083 95093
21 Iraq 115332 4535 81062 29735
22 Qatar 109880 167 106603 3110
23 Indonesia 102051 4901 60539 36611
24 Egypt 92947 4691 35959 52297
25 Kazakhstan 86192 793 56638 28761
26 Philippines 83673 1947 26617 55109
27 Ecuador 82279 5584 35283 41412
28 Sweden 79494 5702 0 73792
29 Oman 77904 402 58587 18915
30 Bolivia 72327 2720 21971 47636
31 Hubei, China 68135 4512 63623 0
32 Ukraine 68030 1650 37852 28528
33 Belarus 67366 543 60669 6154
34 Belgium 66662 9833 17476 39353
35 Israel 66293 486 32182 33625
36 Kuwait 65149 442 55681 9026
37 Dominican Republic 64690 1101 32014 31575
38 Panama 62223 1349 36181 24693
39 UAE 59546 347 52905 6294
40 Quebec, Canada 58897 5670 NA 53227
41 Netherlands 53374 6145 0 47229
42 Singapore 51197 27 45893 5277
43 Portugal 50410 1722 35626 13062
44 Romania 47053 2239 26128 18686
45 Guatemala 46451 1782 33494 11175
46 Poland 43904 1682 33043 9179
47 Nigeria 41804 868 18764 22172
48 Ontario, Canada 40787 2812 NA 37975
49 Honduras 40460 1214 5103 34143
50 Bahrain 39921 141 36531 3249
51 Armenia 37629 719 27357 9553
52 Afghanistan 36368 1270 25358 9740
53 Switzerland 34609 1978 31000 1631
54 Ghana 34406 168 30621 3617
55 Kyrgyzstan 33844 1329 22296 10219
56 Japan 32116 1001 22636 8479
57 Azerbaijan 30858 430 23873 6555
58 Algeria 28615 1174 19233 8208
59 Ireland 25929 1764 23364 801
60 Serbia 24520 551 0 23969
61 Moldova 23521 753 16462 6306
62 Uzbekistan 21699 124 12026 9549
63 Morocco 21387 327 17066 3994
64 Austria 20677 713 18379 1585
65 Nepal 19063 49 13875 5139
66 Kenya 18581 299 7908 10374
67 Cameroon 17179 391 14539 2249
68 Venezuela 16571 151 10195 6225
69 Costa Rica 16344 125 3920 12299
70 Czechia 15799 374 11428 3997
71 Cote d’Ivoire 15713 98 10537 5078
72 El Salvador 15446 417 7903 7126
73 Ethiopia 15200 239 6526 8435
74 South Korea 14251 300 13069 882
75 Denmark 13577 613 12451 513
76 Sudan 11496 725 6001 4770
77 West Bank and Gaza 10938 79 3752 7107
78 Bulgaria 10871 355 5766 4750
79 Bosnia and Herzegovina 10766 297 5220 5249
80 Alberta, Canada 10470 187 NA 10283
81 N.Macedonia 10315 471 5663 4181
82 Madagascar 10104 93 6613 3398
83 Senegal 9805 198 6591 3016
84 Victoria, Australia 9304 92 4123 5089
85 Norway 9150 255 8752 143
86 Malaysia 8943 124 8607 212
87 Congo (Kinshasa) 8873 208 5930 2735
88 Kosovo 7652 192 4129 3331
89 French Guiana, France 7562 43 6106 1413
90 Finland 7404 329 6920 155
91 Haiti 7340 158 4365 2817
92 Tajikistan 7276 60 6065 1151
93 Gabon 7189 49 4682 2458
94 Guinea 7126 46 6312 768
95 Luxembourg 6375 113 4855 1407
96 Mauritania 6249 156 4683 1410
97 Djibouti 5068 58 4992 18
98 Zambia 5002 142 3195 1665
99 Albania 4997 148 2789 2060
100 Croatia 4923 140 4034 749
101 Paraguay 4674 45 3039 1590
102 Central African Republic 4599 59 1546 2994
103 Hungary 4456 596 3331 529
104 Greece 4279 203 1374 2702
105 Lebanon 4023 54 1710 2259
106 New South Wales, Australia 3718 49 2989 680
107 Malawi 3709 103 1667 1939
108 Nicaragua 3672 116 2492 1064
109 British Columbia, Canada 3523 194 NA 3329
110 Maldives 3506 15 2547 944
111 Thailand 3297 58 3111 128
112 Somalia 3212 93 1562 1557
113 Congo (Brazzaville) 3200 54 829 2317
114 Equatorial Guinea 3071 51 842 2178
115 Libya 3017 67 579 2371
116 Montenegro 2949 45 839 2065
117 Mayotte, France 2900 38 2672 190
118 Hong Kong, China 2884 23 1527 1334
119 Zimbabwe 2817 40 604 2173
120 Sri Lanka 2810 11 2296 503
121 Cuba 2555 87 2352 116
122 Mali 2520 124 1919 477
123 Eswatini 2404 39 1025 1340
124 Cabo Verde 2354 22 1616 716
125 South Sudan 2305 46 1175 1084
126 Slovakia 2204 28 1644 532
127 Slovenia 2101 117 1742 242
128 Estonia 2038 69 1924 45
129 Lithuania 2027 80 1623 324
130 Guinea-Bissau 1954 26 803 1125
131 Rwanda 1926 5 1005 916
132 Namibia 1917 8 104 1805
133 Iceland 1857 10 1823 24
134 Sierra Leone 1786 66 1336 384
135 Benin 1770 35 1036 699
136 Mozambique 1720 11 602 1107
137 Yemen 1703 484 840 379
138 Guangdong, China 1674 8 1648 18
139 New Zealand 1559 22 1514 23
140 Suriname 1510 24 965 521
141 Tunisia 1468 50 1168 250
142 Henan, China 1276 22 1254 0
143 Zhejiang, China 1270 1 1268 1
144 Latvia 1220 31 1052 137
145 Saskatchewan, Canada 1218 17 NA 1201
146 Uruguay 1218 35 958 225
147 Jordan 1182 11 1042 129
148 Liberia 1177 72 646 459
149 Georgia 1145 16 927 202
150 Uganda 1135 2 989 144
151 Niger 1132 69 1027 36
152 Burkina Faso 1105 53 926 126
153 Queensland, Australia 1078 6 1063 9
154 Cyprus 1067 19 852 196
155 Nova Scotia, Canada 1067 63 NA 1004
156 Hunan, China 1019 4 1015 0
157 Angola 1000 47 266 687
158 Anhui, China 991 6 985 0
159 Heilongjiang, China 947 13 934 0
160 Beijing, China 932 9 896 27
161 Jiangxi, China 932 1 931 0
162 Chad 926 75 810 41
163 Andorra 907 52 803 52
164 Togo 896 18 612 266
165 Sao Tome and Principe 867 14 759 94
166 Jamaica 855 10 724 121
167 Shandong, China 799 7 785 7
168 Shanghai, China 744 7 720 17
169 Botswana 739 2 63 674
170 Diamond Princess 712 13 651 48
171 Malta 708 9 665 34
172 San Marino 699 42 657 0
173 Syria 694 40 220 434
174 Western Australia, Australia 661 9 647 5
175 Reunion, France 657 4 592 61
176 Jiangsu, China 655 0 654 1
177 Sichuan, China 604 3 596 5
178 Channel Islands, United Kingdom 587 47 533 7
179 Chongqing, China 583 6 577 0
180 Tanzania 509 21 183 305
181 Lesotho 505 12 128 365
182 Taiwan* 467 7 440 20
183 South Australia, Australia 448 4 441 3
184 Bahamas 447 11 91 345
185 Vietnam 446 0 369 77
186 Manitoba, Canada 405 8 NA 397
187 Xinjiang, China 400 3 75 322
188 Guyana 396 20 181 195
189 Burundi 378 1 301 76
190 Fujian, China 366 1 361 4
191 Comoros 354 7 328 19
192 Burma 351 6 293 52
193 Hebei, China 349 6 343 0
194 Mauritius 344 10 332 2
195 Isle of Man, United Kingdom 336 24 312 0
196 Gambia 326 8 66 252
197 Shaanxi, China 323 3 318 2
198 Mongolia 291 0 225 66
199 Martinique, France 269 15 98 156
200 Newfoundland and Labrador, Canada 266 3 NA 263
201 Eritrea 265 0 191 74
202 Inner Mongolia, China 258 1 244 13
203 Guangxi, China 255 2 253 0
204 Tasmania, Australia 229 13 215 1
205 Cambodia 226 0 147 79
206 Faroe Islands, Denmark 220 0 188 32
207 Liaoning, China 217 2 160 55
208 Tianjin, China 204 3 195 6
209 Cayman Islands, United Kingdom 203 1 202 0
210 Guadeloupe, France 203 14 176 13
211 Shanxi, China 201 0 201 0
212 Yunnan, China 190 2 186 2
213 Gibraltar, United Kingdom 186 0 180 6
214 Hainan, China 171 6 165 0
215 New Brunswick, Canada 170 2 NA 168
216 Gansu, China 167 2 165 0
217 Jilin, China 157 2 153 2
218 Bermuda, United Kingdom 156 9 141 6
219 Trinidad and Tobago 153 8 128 17
220 Guizhou, China 147 2 145 0
221 Brunei 141 3 138 0
222 Aruba, Netherlands 119 3 102 14
223 Monaco 117 4 104 9
224 Seychelles 114 0 39 75
225 Sint Maarten, Netherlands 114 15 63 36
226 Australian Capital Territory, Australia 113 3 109 1
227 Barbados 110 7 94 9
228 Bhutan 99 0 86 13
229 Turks and Caicos Islands, United Kingdom 99 2 37 60
230 Liechtenstein 87 1 81 5
231 Antigua and Barbuda 86 3 65 18
232 Ningxia, China 75 0 75 0
233 Papua New Guinea 63 0 11 52
234 French Polynesia, France 62 0 60 2
235 Saint Vincent and the Grenadines 52 0 39 13
236 St Martin, France 49 3 41 5
237 Belize 48 2 27 19
238 Macau, China 46 0 46 0
239 Prince Edward Island, Canada 36 0 NA 36
240 Northern Territory, Australia 31 0 30 1
241 Curacao, Netherlands 29 1 24 4
242 Fiji 27 0 18 9
243 Saint Lucia 24 0 22 2
244 Timor-Leste 24 0 24 0
245 Grenada 23 0 23 0
246 New Caledonia, France 22 0 22 0
247 Laos 20 0 19 1
248 Dominica 18 0 18 0
249 Qinghai, China 18 0 18 0
250 Saint Kitts and Nevis 17 0 15 2
251 Greenland, Denmark 14 0 13 1
252 Yukon, Canada 14 0 NA 14
253 Falkland Islands (Malvinas), United Kingdom 13 0 13 0
254 Grand Princess, Canada 13 0 NA 13
255 Holy See 12 0 12 0
256 Montserrat, United Kingdom 12 1 10 1
257 Bonaire and Sint Eustatius and Saba, Netherlands 11 0 7 4
258 Western Sahara 10 1 8 1
259 MS Zaandam 9 2 0 7
260 British Virgin Islands, United Kingdom 8 1 7 0
261 Saint Barthelemy, France 7 0 6 1
262 Northwest Territories, Canada 5 0 NA 5
263 Saint Pierre and Miquelon, France 4 0 1 3
264 Anguilla, United Kingdom 3 0 3 0
265 Tibet, China 1 0 1 0
266 Diamond Princess, Canada 0 1 NA -1
267 Canada NA NA 101686 NA
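The NA handling behind the active column can be sketched in base R on hypothetical totals. Here `tapply()` with two grouping factors stands in for `tidyr::pivot_wider()`. It returns a location-by-data_type matrix with NA wherever a location never reported a type. The small helper `zero_if_na()` is hypothetical and mirrors the `ifelse(is.na(...), 0, ...)` calls.

```r
# Hypothetical long-format totals; location "B" never reports recoveries
long <- data.frame(
  location  = c("A", "A", "A", "B", "B"),
  data_type = c("cases", "deaths", "recovered", "cases", "deaths"),
  total     = c(100L, 5L, 60L, 40L, 2L)
)

# Long -> wide: rows are locations, columns are data types, NA if absent
wide <- tapply(long$total, list(long$location, long$data_type), sum)

# Treat a missing recovered/deaths total as zero before subtracting
zero_if_na <- function(x) ifelse(is.na(x), 0, x)
active <- wide[, "cases"] - zero_if_na(wide[, "recovered"]) - zero_if_na(wide[, "deaths"])
print(active)
```

For location B, the missing recoveries become zero, so its active count equals cases minus deaths.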
df %>% 
    filter(cases > 5000) %>%
    ggplot()+
    geom_text(aes(cases, deaths, label = location), check_overlap = T)+
    theme_minimal()+
    labs(title = "Cases vs Deaths plot", subtitle = today())

Bibliography

  1. CSSEGISandData. (2020). CSSEGISandData/COVID-19. https://github.com/CSSEGISandData/COVID-19 (Original work published 2020)
  2. Krispin, R. (2020). RamiKrispin/coronavirus [R]. https://github.com/RamiKrispin/coronavirus (Original work published 2020)