We have chosen to investigate the coronavirus data set provided by John Hopkins University which themselves source the data from DXY, an online platform run by members of the Chinese medical community.

We combine this data with data sourced through the API of Numbeo which provides country information on various indicators, including health care and quality of life.

Our goal is to investigate wether we can identify any correlation between any of these indicators and either the mortality rate (i.e. death per confirmed case) and the death per capita.

Loading required libraries and downloading data

We load all required libraries:

library(httr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(date)
library(tidyverse)
library(jsonlite)

We download three seperate CSV (for confirmed cases, recoveries and deaths) globally from the John Hopkins GitHub account and save them as data frames:

urlRemote  <- "https://raw.githubusercontent.com/"
pathGithub <- "CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/"

fileConfirmGlobal <- "time_series_covid19_confirmed_global.csv"
# fileConfirmUS <- "time_series_covid19_confirmed_US.csv"
fileDeathsGlobal <- "time_series_covid19_deaths_global.csv"
# fileDeathsUS <- "time_series_covid19_deaths_US.csv"
fileRecovGlobal <- "time_series_covid19_recovered_global.csv"


confsGlobal <- read.csv(paste0(urlRemote, pathGithub, fileConfirmGlobal),header = TRUE)
# confsUS <- read.csv(paste0(urlRemote, pathGithub, fileConfirmUS),header = TRUE)
deathsGlobal <- read.csv(paste0(urlRemote, pathGithub, fileDeathsGlobal),header = TRUE)
# deathsUS <- read.csv(paste0(urlRemote, pathGithub, fileDeathsUS),header = TRUE)
recovsGlobal<- read.csv(paste0(urlRemote, pathGithub, fileRecovGlobal),header = TRUE)

Download a table that will enable us to merge a unique country iso code and the population so that we can combine the data with the numbeo API data.

urlRemote  <- "https://raw.githubusercontent.com/"
pathGithub <- "chilleundso/DATA607/master/Final_Project/"

countryISOfile <- "UID_ISO_FIPS_LookUp_Table.csv"

countryISOfull <- read.csv(paste0(urlRemote, pathGithub, countryISOfile),header = TRUE)

Data Clean-up

As we can see in the above table the data is provide in a long format. In the below section we transform all 3 data frames to take long format:

#Creating a long table using gather
confs <- tidyr::gather(confsGlobal, "Date", "Amount", -Province.State, -Country.Region, -Lat, -Long)
confs <- confs %>%
  select(Country.Region, Date, Amount,) %>%
  group_by(Country.Region, Date) %>%
  summarise(Amount = sum(Amount))
#transforming the date column into a date structure:
confs$Date<-as.Date(confs$Date,format='X%m.%d.%y')
#adding a status label to flag data belongs to confirmed data set
confs$Status <- 'confirmed'

#Creating a long table using gather
recovs <- tidyr::gather(recovsGlobal, "Date", "Amount", -Province.State, -Country.Region, -Lat, -Long)
recovs <- recovs %>%
select(Country.Region, Date, Amount,) %>%
group_by(Country.Region, Date) %>%
summarise(Amount = sum(Amount))
#transforming the date column into a date structure:
recovs$Date<-as.Date(recovs$Date,format='X%m.%d.%y')
#adding a status label to flag data belongs to recovered data set
recovs$Status <- 'recovered'

#Creating a long table using gather
deaths <- tidyr::gather(deathsGlobal, "Date", "Amount", -Province.State, -Country.Region, -Lat, -Long)
deaths <- deaths %>%
select(Country.Region, Date, Amount,) %>%
group_by(Country.Region, Date) %>%
summarise(Amount = sum(Amount))
#transforming the date column into a date structure:
deaths$Date<-as.Date(deaths$Date,format='X%m.%d.%y')
#adding a status label to flag data belongs to deaddata set
deaths$Status <- 'dead'

kable(head(confs))
Country.Region Date Amount Status
Afghanistan 2020-01-22 0 confirmed
Afghanistan 2020-01-23 0 confirmed
Afghanistan 2020-01-24 0 confirmed
Afghanistan 2020-01-25 0 confirmed
Afghanistan 2020-01-26 0 confirmed
Afghanistan 2020-01-27 0 confirmed
#retrieve the latest date for each file
require(data.table)

latestconfs <- as.data.table(confs)
latestconfs <- latestconfs[latestconfs[, .I[Date == max(Date)], by=Country.Region]$V1]

latestrecovs <- as.data.table(recovs)
latestrecovs <- latestrecovs[latestrecovs[, .I[Date == max(Date)], by=Country.Region]$V1]

latestdeaths <- as.data.table(deaths)
latestdeaths <- latestdeaths[latestdeaths[, .I[Date == max(Date)], by=Country.Region]$V1]
#we stack all three data sets underneath each other
latestmaster <- rbind(latestconfs,latestrecovs,latestdeaths)
#we then widen the data set to have three new columns one for each amount of confirmed, recovered and dead cases:
latestmasterwide <- tidyr::spread(latestmaster,Status,Amount)

We merge the wide master data set with the country code and population

master <- merge(latestmasterwide, countryISOfull, by.x = "Country.Region", by.y = "Combined_Key",all.x = TRUE)

kable(head(master)) 
Country.Region Date confirmed dead recovered iso2 iso3 Population
Afghanistan 2020-05-09 4033 115 502 AF AFG 38928341
Albania 2020-05-09 856 31 627 AL ALB 2877800
Algeria 2020-05-09 5558 494 2546 DZ DZA 43851043
Andorra 2020-05-09 754 48 545 AD AND 77265
Angola 2020-05-09 43 2 13 AO AGO 32866268
Antigua and Barbuda 2020-05-09 25 3 19 AG ATG 97928

We add the death per confirmed case and a death per 10’000 inhabitants measure

master$death_per_10kcapita <- master$dead / master$Population *10000
master$death_per_confirmed <- master$dead / master$confirmed

Merge with numbeo data

request <- fromJSON("https://numbeo.com/api/rankings_by_country_historical?api_key=x2mmjp3a6651i3&section=12")
quality_life <- request$`2020`

obesity <- read.csv("https://raw.githubusercontent.com/pmalo46/DATA607-1/master/Final_Project/obesity.csv")
indices <- left_join(quality_life, obesity, by = "country")

kable(head(indices))
country traffic_time_index quality_of_life_index healthcare_index purchasing_power_incl_rent_index iso3166_country_code house_price_to_income_ratio pollution_index climate_index safety_index cpi_index obesity_rate
Denmark 28.84967 192.6729 79.99861 100.88455 DK 7.453022 21.32568 81.80102 74.89569 83.00263 19.7
Switzerland 29.08889 192.0097 72.43725 119.53407 CH 8.678198 22.39336 79.23562 78.40138 122.39881 19.5
Finland 29.90210 190.2231 75.79191 99.93179 FI 8.348263 11.54766 58.86849 76.67501 70.28992 22.2
Australia 34.73191 186.2143 77.38250 107.30969 AU 7.521587 23.46116 92.70246 58.63757 73.53912 29.0
Netherlands 29.43206 183.6697 74.64593 90.72990 NL 7.510502 27.40537 87.60780 72.37625 73.74621 20.4
Austria 26.27285 182.4979 78.72762 82.38178 AT 10.881314 22.18992 77.30450 76.27022 70.38106 20.1
names(indices) 
##  [1] "country"                          "traffic_time_index"              
##  [3] "quality_of_life_index"            "healthcare_index"                
##  [5] "purchasing_power_incl_rent_index" "iso3166_country_code"            
##  [7] "house_price_to_income_ratio"      "pollution_index"                 
##  [9] "climate_index"                    "safety_index"                    
## [11] "cpi_index"                        "obesity_rate"

Change name of coutry code column to merge with master

names(indices)[names(indices) == "iso3166_country_code"] <- "iso2"
names(indices)
##  [1] "country"                          "traffic_time_index"              
##  [3] "quality_of_life_index"            "healthcare_index"                
##  [5] "purchasing_power_incl_rent_index" "iso2"                            
##  [7] "house_price_to_income_ratio"      "pollution_index"                 
##  [9] "climate_index"                    "safety_index"                    
## [11] "cpi_index"                        "obesity_rate"

Merge the two data sets

df_merged <- left_join(indices, master, by = "iso2")
kable(head(df_merged))
country traffic_time_index quality_of_life_index healthcare_index purchasing_power_incl_rent_index iso2 house_price_to_income_ratio pollution_index climate_index safety_index cpi_index obesity_rate Country.Region Date confirmed dead recovered iso3 Population death_per_10kcapita death_per_confirmed
Denmark 28.84967 192.6729 79.99861 100.88455 DK 7.453022 21.32568 81.80102 74.89569 83.00263 19.7 Denmark 2020-05-09 10517 526 8291 DNK 5792203 0.9081173 0.0500143
Switzerland 29.08889 192.0097 72.43725 119.53407 CH 8.678198 22.39336 79.23562 78.40138 122.39881 19.5 Switzerland 2020-05-09 30251 1830 26400 CHE 8654618 2.1144781 0.0604939
Finland 29.90210 190.2231 75.79191 99.93179 FI 8.348263 11.54766 58.86849 76.67501 70.28992 22.2 Finland 2020-05-09 5880 265 4000 FIN 5540718 0.4782774 0.0450680
Australia 34.73191 186.2143 77.38250 107.30969 AU 7.521587 23.46116 92.70246 58.63757 73.53912 29.0 Australia 2020-05-09 6939 97 6141 AUS 25459700 0.0380994 0.0139790
Netherlands 29.43206 183.6697 74.64593 90.72990 NL 7.510502 27.40537 87.60780 72.37625 73.74621 20.4 Netherlands 2020-05-09 42581 5441 149 NLD 17134873 3.1753956 0.1277800
Austria 26.27285 182.4979 78.72762 82.38178 AT 10.881314 22.18992 77.30450 76.27022 70.38106 20.1 Austria 2020-05-09 15833 615 13928 AUT 9006400 0.6828478 0.0388429

Final data set:

df <- subset(df_merged, select = -c(iso3, Country.Region, Date))
kable(df)
country traffic_time_index quality_of_life_index healthcare_index purchasing_power_incl_rent_index iso2 house_price_to_income_ratio pollution_index climate_index safety_index cpi_index obesity_rate confirmed dead recovered Population death_per_10kcapita death_per_confirmed
Denmark 28.84967 192.67287 79.99861 100.88455 DK 7.453022 21.32568 81.80102 74.89569 83.00263 19.7 10517 526 8291 5792203 0.9081173 0.0500143
Switzerland 29.08889 192.00966 72.43725 119.53407 CH 8.678198 22.39336 79.23562 78.40138 122.39881 19.5 30251 1830 26400 8654618 2.1144781 0.0604939
Finland 29.90210 190.22307 75.79191 99.93179 FI 8.348263 11.54766 58.86849 76.67501 70.28992 22.2 5880 265 4000 5540718 0.4782774 0.0450680
Australia 34.73191 186.21425 77.38250 107.30969 AU 7.521587 23.46116 92.70246 58.63757 73.53912 29.0 6939 97 6141 25459700 0.0380994 0.0139790
Netherlands 29.43206 183.66966 74.64593 90.72990 NL 7.510502 27.40537 87.60780 72.37625 73.74621 20.4 42581 5441 149 17134873 3.1753956 0.1277800
Austria 26.27285 182.49792 78.72762 82.38178 AT 10.881314 22.18992 77.30450 76.27022 70.38106 20.1 15833 615 13928 9006400 0.6828478 0.0388429
Iceland 20.10484 181.74968 65.91694 79.43872 IS 6.743605 16.20970 68.80913 76.64268 100.47810 21.9 1801 10 1773 341250 0.2930403 0.0055525
New Zealand 31.10124 181.01737 73.81169 92.65930 NZ 8.524692 23.39718 95.45707 59.07275 72.52990 30.8 1494 21 1371 4822233 0.0435483 0.0140562
Germany 30.98199 179.78200 73.32131 102.35901 DE 9.383290 29.02659 83.00464 65.18606 65.26026 22.3 171324 7549 143300 83783945 0.9010079 0.0440627
Estonia 24.52688 177.82339 72.67463 71.29566 EE 9.061324 19.80567 64.27713 76.86227 50.93035 21.2 1733 60 747 1326539 0.4523048 0.0346220
Sweden 30.48990 175.94562 69.22929 101.73127 SE 9.273113 18.08990 73.97315 52.93246 69.84555 20.6 25921 3220 4971 10099270 3.1883493 0.1242236
Norway 26.99329 175.19405 74.35655 88.37700 NO 8.770038 20.34644 71.37195 64.56747 101.42778 23.1 8099 219 32 5421242 0.4039665 0.0270404
Slovenia 27.30000 172.15093 64.58003 66.31006 SI 10.488593 24.06167 77.56112 78.92749 53.42851 20.2 1454 101 255 2078932 0.4858264 0.0694635
United States 32.88929 172.11043 69.27048 109.52095 US 3.518853 36.87565 77.53833 52.80061 71.05298 36.2 1309550 78795 212534 329466283 2.3915953 0.0601695
Spain 29.10178 169.82456 78.88316 72.03297 ES 9.374174 39.98797 94.19409 68.04163 53.77027 23.8 223578 26478 133952 46754783 5.6631639 0.1184285
Japan 39.15038 167.99036 81.14303 87.27511 JP 13.003179 39.59446 84.79052 79.33819 83.34817 4.3 15663 607 5906 126476458 0.0479931 0.0387538
Oman 22.80000 167.08855 58.15424 80.96589 OM 9.082185 37.74093 67.21854 79.21087 49.28305 27.0 3224 17 1068 5106622 0.0332901 0.0052730
Canada 33.87402 163.47241 71.58088 95.09013 CA 7.968498 27.82909 50.57194 60.33483 67.62489 29.4 68918 4823 31262 37855702 1.2740485 0.0699817
Portugal 29.99605 162.90538 71.88357 49.43092 PT 12.699386 30.89074 97.31355 70.36983 49.51947 20.8 27406 1126 2499 10196707 1.1042781 0.0410859
United Kingdom 34.53073 162.71289 74.46215 91.73410 GB 10.088192 40.55618 87.62401 56.29144 67.28325 27.8 216525 31662 1001 67886004 4.6639952 0.1462279
Qatar 29.71717 162.28517 73.30257 111.68924 QA 5.819578 61.06374 36.02774 88.14151 64.04240 35.1 21331 13 2449 2881060 0.0451223 0.0006094
Lithuania 26.07759 159.42224 69.49028 57.84589 LT 10.904033 28.79538 67.76290 66.93645 44.28340 26.3 1444 49 828 2722291 0.1799955 0.0339335
Croatia 29.10780 159.00507 62.67970 50.41590 HR 13.730953 30.45654 89.04831 75.28562 49.69533 24.4 2176 87 1726 4105268 0.2119228 0.0399816
United Arab Emirates 36.84840 156.66702 67.03940 91.58078 AE 5.286096 51.15344 45.23352 84.30395 61.98273 31.7 17417 185 4295 9890400 0.1870501 0.0106218
Czech Republic 29.65184 156.24471 74.62371 62.82402 CZ 15.424539 40.22991 77.13160 74.48022 46.14629 26.0 8095 276 4447 10708982 0.2577276 0.0340951
France 34.75772 153.94875 79.98899 80.36465 FR 13.043785 43.56327 90.24704 53.20834 74.14391 21.6 176782 26313 56148 65273512 4.0311911 0.1488443
Ireland 37.67832 153.52718 51.88632 80.88163 IE 7.485769 33.99221 89.13089 54.57491 75.91348 25.3 22760 1446 17110 4937796 2.9284320 0.0635325
Belgium 36.14610 153.47188 74.33634 86.28467 BE 6.907948 52.94033 85.99443 56.02296 71.78262 22.1 52596 8581 13411 11589616 7.4040417 0.1631493
Slovakia 29.10784 152.53286 60.02377 56.94178 SK 10.249536 39.65924 78.12861 70.78489 44.45724 20.5 1455 26 919 5459643 0.0476222 0.0178694
Saudi Arabia 28.60947 150.56131 59.11352 99.99585 SA 2.786306 65.08902 45.98363 73.81545 48.34322 35.4 37136 239 10144 34813867 0.0686508 0.0064358
Latvia 32.89247 149.99764 62.90569 52.48140 LV 8.849483 33.73467 74.69898 63.04626 47.94486 23.6 930 18 464 1886202 0.0954299 0.0193548
Israel 35.91241 149.94250 73.28857 78.08508 IL 12.832165 57.24640 93.77764 70.40384 81.15454 26.1 16454 247 11376 8655541 0.2853663 0.0150115
Cyprus 23.95082 147.92626 51.75356 57.41215 CY 8.199260 53.55118 92.80229 69.98935 57.93337 21.8 892 15 401 1207361 0.1242379 0.0168161
Singapore 41.30631 144.38575 70.83623 88.96049 SG 22.312409 33.48362 57.45328 69.42849 81.10301 6.1 22460 20 2296 5850343 0.0341860 0.0008905
Taiwan 31.93182 143.22601 86.71453 65.67109 TW 23.692501 63.35052 84.37945 84.34950 61.37385 NA 440 6 361 23816775 0.0025192 0.0136364
Poland 31.71679 141.83018 61.00867 59.60825 PL 11.379831 54.45633 76.14059 71.49866 40.03665 23.1 15651 785 5437 37846605 0.2074162 0.0501565
Italy 34.42466 140.76202 66.59011 65.59359 IT 9.710983 55.63247 92.26544 55.74135 67.25658 19.9 218268 30395 103031 60461828 5.0271388 0.1392554
South Korea 39.57174 139.02389 81.96500 85.20521 KR 17.374395 62.48445 68.39446 71.98390 78.17735 4.7 10874 256 9610 51269183 0.0499325 0.0235424
Belarus 30.67677 134.82919 59.03668 36.99567 BY 14.645532 43.63330 64.36820 75.00650 34.70285 24.5 22052 126 6050 9449321 0.1333429 0.0057138
Greece 33.83740 133.07255 56.21411 43.68143 GR 10.598569 52.54869 94.17588 59.67720 55.66503 24.9 2710 151 1374 10423056 0.1448711 0.0557196
Romania 34.74878 132.43807 55.05578 48.85854 RO 11.327094 58.42431 77.62148 72.36012 35.30522 22.5 15131 939 6912 19237682 0.4881045 0.0620580
South Africa 39.43247 131.97051 64.14473 73.60740 ZA 3.933628 57.30304 95.25269 22.51455 42.86621 28.3 9420 186 3983 59308690 0.0313613 0.0197452
Bulgaria 29.39400 129.80196 55.39576 49.36811 BG 8.519949 65.33132 82.75672 61.50326 36.69626 25.0 1921 90 422 6948445 0.1295254 0.0468506
Hungary 35.77739 128.15532 47.79987 47.55153 HU 14.771760 48.28649 79.48140 64.91569 40.85438 26.4 3213 405 904 9660350 0.4192395 0.1260504
Turkey 44.64854 127.10061 69.80286 40.84560 TR 7.806966 67.35228 93.26165 60.51208 34.69271 32.1 137115 3739 89480 84339067 0.4433295 0.0272691
Uruguay 39.89130 125.98306 66.28120 37.45649 UY 16.342254 44.84363 98.03568 46.19342 51.03619 27.9 702 18 513 3473727 0.0518175 0.0256410
Ecuador 37.54762 125.14458 70.58602 36.07997 EC 12.704691 57.00090 94.51074 49.10271 40.97648 19.9 29071 1717 3433 17643060 0.9731872 0.0590623
Bosnia And Herzegovina 26.66667 121.88667 52.26250 41.83162 BA 12.578326 62.33096 80.47768 56.97052 35.96506 17.9 2090 102 1059 3280815 0.3108984 0.0488038
Chile 35.44318 119.76466 65.44293 42.49976 CL 14.928576 65.78316 90.20527 54.77397 43.62215 28.0 27219 304 12667 19116209 0.1590273 0.0111687
Mexico 39.39162 118.54743 70.12283 41.81023 MX 10.666892 66.10374 86.28794 46.03030 35.71728 28.9 33460 3353 21824 128932753 0.2600580 0.1002092
Malaysia 37.03030 118.44042 68.10438 64.49155 MY 9.940264 63.17563 57.91647 41.16223 39.12052 15.6 6589 108 4929 32365998 0.0333684 0.0163910
Serbia 30.42699 116.30082 51.26556 36.72400 RS 18.938084 60.31775 83.23310 62.59142 35.71666 21.5 10032 215 2732 8737370 0.2460695 0.0214314
Georgia 36.02000 115.94622 51.23671 24.87582 GE 14.212180 71.08707 84.20425 79.78994 28.48285 21.7 626 10 297 3989175 0.0250678 0.0159744
Kuwait 34.44000 115.75207 56.20828 85.59116 KW 12.279476 68.69143 20.22263 65.24526 50.37446 37.9 7623 49 2622 4270563 0.1147390 0.0064279
Argentina 43.08232 115.31457 69.25247 47.21916 AR 24.531862 50.66676 98.27900 38.22513 32.95458 28.3 5776 300 1728 45195777 0.0663779 0.0519391
Jordan 42.02899 112.39625 64.60227 34.87689 JO 8.423914 77.78229 89.04694 59.16597 53.67444 35.5 522 9 387 10203140 0.0088208 0.0172414
North Macedonia 27.61404 110.45624 56.37830 37.15416 MK 12.600614 80.23085 76.29654 61.32916 31.58639 22.4 1622 91 1112 2083380 0.4367902 0.0561036
India 46.99106 108.62720 67.12699 54.30393 IN 11.375989 78.86750 64.87236 56.67775 24.58314 3.9 62808 2101 19301 1380004385 0.0152246 0.0334512
Panama 36.48000 108.36490 59.92786 34.23454 PA 12.602192 63.08894 67.83655 52.81013 54.15820 22.7 8282 237 4501 4314768 0.5492763 0.0286163
Lebanon 37.38462 106.55515 64.37542 44.46948 LB 13.227057 88.36627 94.73887 56.63634 60.50396 32.0 809 26 234 6825442 0.0380928 0.0321384
Colombia 47.49246 105.82676 67.24011 31.11585 CO 19.653250 62.82861 96.69306 45.20911 30.65978 22.3 10495 445 2569 50882884 0.0874557 0.0424011
Brazil 41.69870 105.64633 56.29039 32.80873 BR 16.414097 54.97914 97.16268 31.11512 40.22228 22.1 156061 10656 61685 212559409 0.5013187 0.0682810
Morocco 36.88889 105.45943 45.71935 35.24548 MA 14.177770 70.64057 91.69621 51.31186 34.31907 26.1 5910 186 2461 36910558 0.0503921 0.0314721
Pakistan 38.56047 105.44024 60.58675 30.57127 PK 11.981666 74.25446 71.93268 55.92482 21.98070 8.6 28736 636 7809 220892331 0.0287923 0.0221325
Ukraine 38.65314 104.76968 52.33359 31.79817 UA 12.228967 65.08189 71.41618 51.14551 33.17713 24.1 14710 376 2909 43733759 0.0859748 0.0255608
China 41.81445 102.81119 64.48305 60.88229 CN 29.057122 80.77383 79.19380 68.16920 40.03911 6.2 83990 4637 79127 1404676330 0.0330112 0.0552090
Russia 45.30420 102.30563 57.59408 38.94246 RU 10.772440 62.78721 40.36441 58.87879 39.20503 23.1 198676 1827 31916 145934460 0.1251932 0.0091959
Thailand 38.22807 101.87665 77.94772 35.45190 TH 22.260435 75.06562 69.44944 59.52478 49.77184 10.0 3004 56 2787 69799978 0.0080229 0.0186418
Hong Kong 41.46212 99.05359 66.07876 65.32260 HK 47.456833 67.68583 83.63913 79.29503 77.22394 NA NA NA NA NA NA NA
Indonesia 43.11157 97.47430 60.48300 25.05041 ID 18.880380 66.56103 74.14789 54.16487 37.27280 6.9 13645 959 2607 273523621 0.0350610 0.0702822
Kazakhstan 31.08000 88.30704 50.69575 38.36194 KZ 10.864093 75.14845 39.77913 37.98342 30.64392 21.0 4975 31 1776 18776707 0.0165098 0.0062312
Vietnam 30.16912 87.47921 57.70066 28.14149 VN 21.363412 86.47206 71.23916 54.65269 38.34373 2.1 288 0 241 97338583 0.0000000 0.0000000
Egypt 49.77674 86.53973 45.84033 22.40610 EG 13.013813 85.64919 91.97840 53.07509 29.54210 32.0 8964 514 2002 102334403 0.0502275 0.0573405
Peru 48.32653 85.45794 56.15480 33.95162 PE 14.954493 84.13382 97.69300 31.84627 38.65127 19.7 65015 1814 20246 32971846 0.5501663 0.0279013
Philippines 44.63087 85.37364 67.47425 23.48214 PH 24.602093 74.28124 60.81474 57.84267 37.63475 6.4 10610 704 1842 109581085 0.0642447 0.0663525
Sri Lanka 59.01235 85.16271 72.53332 24.45454 LK 31.135533 59.13779 59.11472 59.78198 31.61438 5.2 847 9 260 21413250 0.0042030 0.0106257
Iran 48.01420 74.14397 51.69924 22.69149 IR 25.110528 77.44633 70.98750 50.74892 39.00771 25.8 106220 6589 85064 83992953 0.7844706 0.0620316
Kenya 56.65306 70.55778 55.58817 27.32203 KE 31.628077 76.59695 99.78948 38.34092 40.20684 7.1 649 30 207 53771300 0.0055792 0.0462250
Bangladesh 56.73148 70.03254 42.79613 33.21100 BD 13.098089 86.21459 71.28995 36.06337 32.25093 3.6 13770 214 2414 164689383 0.0129942 0.0155410
Nigeria 61.08333 55.65243 51.59093 13.52465 NG 16.698723 87.62972 60.75307 36.23109 31.00172 8.9 4151 128 745 206139587 0.0062094 0.0308359

Comparing death per confirmed case and per capita

m_confrimVcapita <- lm(death_per_confirmed ~ death_per_10kcapita, df)
summary(m_confrimVcapita)
## 
## Call:
## lm(formula = death_per_confirmed ~ death_per_10kcapita, data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.031326 -0.015039 -0.004693  0.014968  0.088656 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         0.028412   0.002869   9.903 2.22e-15 ***
## death_per_10kcapita 0.021427   0.001808  11.853  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02271 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.646,  Adjusted R-squared:  0.6414 
## F-statistic: 140.5 on 1 and 77 DF,  p-value: < 2.2e-16

With an adjusted R squarred of 64% we can see that these two measures of the response to corona do explain a large proportion of their respective variability. We need to keep in mind that, cleary, the death per confirmed case highly depends on the amount of testing that is being done in each of the countries.

ggplot(df, aes(x= death_per_confirmed, y= death_per_10kcapita)) + 
                  geom_point() +
                  geom_text(aes(label=country),hjust=0, vjust=-0.5) + 
                  geom_smooth(method='lm', formula= y  ~ x )

Analysis of death per confirmed cases

We start by investigating all scatterplots:

pair_var <- c(2:5, 7:12, 17, 18)
pairs(df[pair_var])

judging by these pair plots, it looks like the most interesting correlations for death rates among confirmed cases might be cpi_index, quality_of_life_index, climate_index, obesity_rate, and possibly pollution_index and healthcare_index

m_health <- lm(death_per_confirmed ~ healthcare_index, df)
summary(m_health)
## 
## Call:
## lm(formula = death_per_confirmed ~ healthcare_index, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.04879 -0.02748 -0.01146  0.01541  0.11308 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)      0.0017337  0.0281954   0.061    0.951
## healthcare_index 0.0006503  0.0004300   1.512    0.135
## 
## Residual standard error: 0.03761 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.02884,    Adjusted R-squared:  0.01623 
## F-statistic: 2.287 on 1 and 77 DF,  p-value: 0.1346
m_pol <- lm(death_per_confirmed ~ pollution_index, df)
summary(m_pol)
## 
## Call:
## lm(formula = death_per_confirmed ~ pollution_index, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.05098 -0.02680 -0.01166  0.01436  0.11923 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.0620991  0.0116320   5.339 9.17e-07 ***
## pollution_index -0.0003433  0.0002044  -1.680    0.097 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03748 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.03536,    Adjusted R-squared:  0.02283 
## F-statistic: 2.822 on 1 and 77 DF,  p-value: 0.09703
m_qol <- lm(death_per_confirmed ~ quality_of_life_index, df)
summary(m_qol)
## 
## Call:
## lm(formula = death_per_confirmed ~ quality_of_life_index, data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.049091 -0.024744 -0.009786  0.016445  0.115296 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)  
## (Intercept)           0.0156955  0.0173634   0.904   0.3688  
## quality_of_life_index 0.0002095  0.0001252   1.674   0.0982 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03749 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.03511,    Adjusted R-squared:  0.02258 
## F-statistic: 2.802 on 1 and 77 DF,  p-value: 0.09823
m_clim <- lm(death_per_confirmed ~ climate_index, df)
summary(m_clim)
## 
## Call:
## lm(formula = death_per_confirmed ~ climate_index, data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.043911 -0.024494 -0.008463  0.010148  0.112508 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -0.0159346  0.0193363  -0.824  0.41244   
## climate_index  0.0007742  0.0002447   3.164  0.00223 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0359 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.115,  Adjusted R-squared:  0.1035 
## F-statistic: 10.01 on 1 and 77 DF,  p-value: 0.002231
m_obes <- lm(death_per_confirmed ~ obesity_rate, df)
summary(m_obes)
## 
## Call:
## lm(formula = death_per_confirmed ~ obesity_rate, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.04562 -0.02683 -0.01021  0.01362  0.11880 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0411568  0.0117408   3.505 0.000768 ***
## obesity_rate 0.0001445  0.0005058   0.286 0.775840    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03823 on 76 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.001073,   Adjusted R-squared:  -0.01207 
## F-statistic: 0.08166 on 1 and 76 DF,  p-value: 0.7758
m_cpi <- lm(death_per_confirmed ~ cpi_index, df)
summary(m_cpi)
## 
## Call:
## lm(formula = death_per_confirmed ~ cpi_index, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.05366 -0.02694 -0.00789  0.01465  0.11304 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 0.0273541  0.0120517   2.270    0.026 *
## cpi_index   0.0003170  0.0002164   1.465    0.147  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03764 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.02713,    Adjusted R-squared:  0.01449 
## F-statistic: 2.147 on 1 and 77 DF,  p-value: 0.1469

It seems like climate index is he only variable that has any statistical significance at all. Let’s take a closer look at that.

ggplot(df, aes(x=climate_index, y=death_per_confirmed)) + geom_point() + geom_smooth(method = "lm", formula = y~x)

This suggests that climate may have an effect on death rates. However, Numbeo’s climate index statistic is not an average temperature, but rather a “climate-likeability” index, meaning it is more along the lines of a probability that the average person will enjoy this climate. Thus, warm but temperate climates are more likely to score higher, whereas more extreme or uncomfortable climates score lower. So this could either mean that the virus may be less lethal in more extreme climates, or it could be that death rates are higher in areas people are more likely to live.

Analysis of death per capita

We repeat the analysis above with deaths per capita (per 10’000 inhabitants) and compare the results to the above model fitting

m_health2 <- lm(death_per_10kcapita ~ healthcare_index, df)
summary(m_health2)
## 
## Call:
## lm(formula = death_per_10kcapita ~ healthcare_index, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6376 -0.7040 -0.4093  0.1038  6.2830 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      -1.99576    1.02644  -1.944  0.05551 . 
## healthcare_index  0.04193    0.01565   2.678  0.00904 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.369 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.08522,    Adjusted R-squared:  0.07334 
## F-statistic: 7.174 on 1 and 77 DF,  p-value: 0.009041
m_pol2 <- lm(death_per_10kcapita ~ pollution_index, df)
summary(m_pol2)
## 
## Call:
## lm(formula = death_per_10kcapita ~ pollution_index, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2448 -0.6064 -0.3384 -0.0574  6.6798 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1.72755    0.42689   4.047 0.000122 ***
## pollution_index -0.01895    0.00750  -2.527 0.013563 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.376 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.07657,    Adjusted R-squared:  0.06457 
## F-statistic: 6.385 on 1 and 77 DF,  p-value: 0.01356
m_qol2 <- lm(death_per_10kcapita ~ quality_of_life_index, df)
summary(m_qol2)
## 
## Call:
## lm(formula = death_per_10kcapita ~ quality_of_life_index, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4272 -0.7567 -0.2750  0.0550  6.4096 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           -1.212477   0.622863  -1.947  0.05523 . 
## quality_of_life_index  0.014380   0.004491   3.202  0.00198 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.345 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1175, Adjusted R-squared:  0.1061 
## F-statistic: 10.25 on 1 and 77 DF,  p-value: 0.001984
m_clim2 <- lm(death_per_10kcapita ~ climate_index, df)
summary(m_clim2)
## 
## Call:
## lm(formula = death_per_10kcapita ~ climate_index, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1457 -0.7116 -0.4619  0.0076  6.5156 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   -0.749941   0.751667  -0.998   0.3215  
## climate_index  0.019053   0.009513   2.003   0.0487 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.396 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.04951,    Adjusted R-squared:  0.03717 
## F-statistic: 4.011 on 1 and 77 DF,  p-value: 0.04872
m_obes2 <- lm(death_per_10kcapita ~ obesity_rate, df)
summary(m_obes2)
## 
## Call:
## lm(formula = death_per_10kcapita ~ obesity_rate, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9634 -0.6642 -0.4571 -0.2262  6.6635 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.35846    0.43932   0.816    0.417
## obesity_rate  0.01729    0.01893   0.913    0.364
## 
## Residual standard error: 1.431 on 76 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.01086,    Adjusted R-squared:  -0.002154 
## F-statistic: 0.8345 on 1 and 76 DF,  p-value: 0.3639
m_cpi2 <- lm(death_per_10kcapita ~ cpi_index, df)
summary(m_cpi2)
## 
## Call:
## lm(formula = death_per_10kcapita ~ cpi_index, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7138 -0.5866 -0.2376  0.0222  6.1599 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.66376    0.42613  -1.558 0.123420    
## cpi_index    0.02658    0.00765   3.474 0.000845 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.331 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1355, Adjusted R-squared:  0.1243 
## F-statistic: 12.07 on 1 and 77 DF,  p-value: 0.0008446

Given that a lot of focus during the crisis has been on the health care system we will investigate this relationship.

With an adjusted R squared of 7% we cannot conclude that health care rating is a specifically strong indicator of the amount of people that die in a country. However, we want to investigate this further to see if we can see find a reason for why it could be such a small predictive power.

ggplot(df, aes(x= healthcare_index, y= death_per_10kcapita)) + 
                  geom_point() +
                  geom_text(aes(label=country),hjust=0, vjust=-0.5) + 
                  geom_smooth(method='lm', formula= y  ~ x ) 

We can see that contrary to what we would have anticipated we have a potivie correlation between health care rating and the deaths per capita. The countries that stick out especially are Belgium, Spain, Italy, the UK and France. THey all have a death per capita of more than 4 per 10 thousand. while having a relatively well rated health care system. It seems like there is a large divide between the European countries and the rest of the world (specifically the Asian countries). Looking at the countries with the best rated healthcare we clearly see that Asian countries have a much lower death rate (less than 5 per million inhabitants) while the Western countries with similar ratings have 20 to 100 times higher rate of death per capita. THis might have more to do with the response and ability to lock down the economy effectively versus western countries:

head(arrange(df, desc(healthcare_index))[,c(1,4,17)])

Finally, we need to add that this is not the end of this epidemic and numbers will surely be updated and corrected for the foreseeable future.

GitHub: https://github.com/chilleundso/DATA607/blob/master/Final_Project/Data607_Final_merged.Rmd