We have chosen to investigate the coronavirus data set provided by John Hopkins University which themselves source the data from DXY, an online platform run by members of the Chinese medical community.
We combine this data with data sourced through the API of Numbeo which provides country information on various indicators, including health care and quality of life.
Our goal is to investigate wether we can identify any correlation between any of these indicators and either the mortality rate (i.e. death per confirmed case) and the death per capita.
We load all required libraries:
library(httr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(date)
library(tidyverse)
library(jsonlite)
We download three seperate CSV (for confirmed cases, recoveries and deaths) globally from the John Hopkins GitHub account and save them as data frames:
urlRemote <- "https://raw.githubusercontent.com/"
pathGithub <- "CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/"
fileConfirmGlobal <- "time_series_covid19_confirmed_global.csv"
# fileConfirmUS <- "time_series_covid19_confirmed_US.csv"
fileDeathsGlobal <- "time_series_covid19_deaths_global.csv"
# fileDeathsUS <- "time_series_covid19_deaths_US.csv"
fileRecovGlobal <- "time_series_covid19_recovered_global.csv"
confsGlobal <- read.csv(paste0(urlRemote, pathGithub, fileConfirmGlobal),header = TRUE)
# confsUS <- read.csv(paste0(urlRemote, pathGithub, fileConfirmUS),header = TRUE)
deathsGlobal <- read.csv(paste0(urlRemote, pathGithub, fileDeathsGlobal),header = TRUE)
# deathsUS <- read.csv(paste0(urlRemote, pathGithub, fileDeathsUS),header = TRUE)
recovsGlobal<- read.csv(paste0(urlRemote, pathGithub, fileRecovGlobal),header = TRUE)
Download a table that will enable us to merge a unique country iso code and the population so that we can combine the data with the numbeo API data.
urlRemote <- "https://raw.githubusercontent.com/"
pathGithub <- "chilleundso/DATA607/master/Final_Project/"
countryISOfile <- "UID_ISO_FIPS_LookUp_Table.csv"
countryISOfull <- read.csv(paste0(urlRemote, pathGithub, countryISOfile),header = TRUE)
As we can see in the above table the data is provide in a long format. In the below section we transform all 3 data frames to take long format:
#Creating a long table using gather
confs <- tidyr::gather(confsGlobal, "Date", "Amount", -Province.State, -Country.Region, -Lat, -Long)
confs <- confs %>%
select(Country.Region, Date, Amount,) %>%
group_by(Country.Region, Date) %>%
summarise(Amount = sum(Amount))
#transforming the date column into a date structure:
confs$Date<-as.Date(confs$Date,format='X%m.%d.%y')
#adding a status label to flag data belongs to confirmed data set
confs$Status <- 'confirmed'
#Creating a long table using gather
recovs <- tidyr::gather(recovsGlobal, "Date", "Amount", -Province.State, -Country.Region, -Lat, -Long)
recovs <- recovs %>%
select(Country.Region, Date, Amount,) %>%
group_by(Country.Region, Date) %>%
summarise(Amount = sum(Amount))
#transforming the date column into a date structure:
recovs$Date<-as.Date(recovs$Date,format='X%m.%d.%y')
#adding a status label to flag data belongs to recovered data set
recovs$Status <- 'recovered'
#Creating a long table using gather
deaths <- tidyr::gather(deathsGlobal, "Date", "Amount", -Province.State, -Country.Region, -Lat, -Long)
deaths <- deaths %>%
select(Country.Region, Date, Amount,) %>%
group_by(Country.Region, Date) %>%
summarise(Amount = sum(Amount))
#transforming the date column into a date structure:
deaths$Date<-as.Date(deaths$Date,format='X%m.%d.%y')
#adding a status label to flag data belongs to deaddata set
deaths$Status <- 'dead'
kable(head(confs))
| Country.Region | Date | Amount | Status |
|---|---|---|---|
| Afghanistan | 2020-01-22 | 0 | confirmed |
| Afghanistan | 2020-01-23 | 0 | confirmed |
| Afghanistan | 2020-01-24 | 0 | confirmed |
| Afghanistan | 2020-01-25 | 0 | confirmed |
| Afghanistan | 2020-01-26 | 0 | confirmed |
| Afghanistan | 2020-01-27 | 0 | confirmed |
#retrieve the latest date for each file
require(data.table)
latestconfs <- as.data.table(confs)
latestconfs <- latestconfs[latestconfs[, .I[Date == max(Date)], by=Country.Region]$V1]
latestrecovs <- as.data.table(recovs)
latestrecovs <- latestrecovs[latestrecovs[, .I[Date == max(Date)], by=Country.Region]$V1]
latestdeaths <- as.data.table(deaths)
latestdeaths <- latestdeaths[latestdeaths[, .I[Date == max(Date)], by=Country.Region]$V1]
#we stack all three data sets underneath each other
latestmaster <- rbind(latestconfs,latestrecovs,latestdeaths)
#we then widen the data set to have three new columns one for each amount of confirmed, recovered and dead cases:
latestmasterwide <- tidyr::spread(latestmaster,Status,Amount)
We merge the wide master data set with the country code and population
master <- merge(latestmasterwide, countryISOfull, by.x = "Country.Region", by.y = "Combined_Key",all.x = TRUE)
kable(head(master))
| Country.Region | Date | confirmed | dead | recovered | iso2 | iso3 | Population |
|---|---|---|---|---|---|---|---|
| Afghanistan | 2020-05-09 | 4033 | 115 | 502 | AF | AFG | 38928341 |
| Albania | 2020-05-09 | 856 | 31 | 627 | AL | ALB | 2877800 |
| Algeria | 2020-05-09 | 5558 | 494 | 2546 | DZ | DZA | 43851043 |
| Andorra | 2020-05-09 | 754 | 48 | 545 | AD | AND | 77265 |
| Angola | 2020-05-09 | 43 | 2 | 13 | AO | AGO | 32866268 |
| Antigua and Barbuda | 2020-05-09 | 25 | 3 | 19 | AG | ATG | 97928 |
We add the death per confirmed case and a death per 10’000 inhabitants measure
master$death_per_10kcapita <- master$dead / master$Population *10000
master$death_per_confirmed <- master$dead / master$confirmed
Merge with numbeo data
request <- fromJSON("https://numbeo.com/api/rankings_by_country_historical?api_key=x2mmjp3a6651i3§ion=12")
quality_life <- request$`2020`
obesity <- read.csv("https://raw.githubusercontent.com/pmalo46/DATA607-1/master/Final_Project/obesity.csv")
indices <- left_join(quality_life, obesity, by = "country")
kable(head(indices))
| country | traffic_time_index | quality_of_life_index | healthcare_index | purchasing_power_incl_rent_index | iso3166_country_code | house_price_to_income_ratio | pollution_index | climate_index | safety_index | cpi_index | obesity_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Denmark | 28.84967 | 192.6729 | 79.99861 | 100.88455 | DK | 7.453022 | 21.32568 | 81.80102 | 74.89569 | 83.00263 | 19.7 |
| Switzerland | 29.08889 | 192.0097 | 72.43725 | 119.53407 | CH | 8.678198 | 22.39336 | 79.23562 | 78.40138 | 122.39881 | 19.5 |
| Finland | 29.90210 | 190.2231 | 75.79191 | 99.93179 | FI | 8.348263 | 11.54766 | 58.86849 | 76.67501 | 70.28992 | 22.2 |
| Australia | 34.73191 | 186.2143 | 77.38250 | 107.30969 | AU | 7.521587 | 23.46116 | 92.70246 | 58.63757 | 73.53912 | 29.0 |
| Netherlands | 29.43206 | 183.6697 | 74.64593 | 90.72990 | NL | 7.510502 | 27.40537 | 87.60780 | 72.37625 | 73.74621 | 20.4 |
| Austria | 26.27285 | 182.4979 | 78.72762 | 82.38178 | AT | 10.881314 | 22.18992 | 77.30450 | 76.27022 | 70.38106 | 20.1 |
names(indices)
## [1] "country" "traffic_time_index"
## [3] "quality_of_life_index" "healthcare_index"
## [5] "purchasing_power_incl_rent_index" "iso3166_country_code"
## [7] "house_price_to_income_ratio" "pollution_index"
## [9] "climate_index" "safety_index"
## [11] "cpi_index" "obesity_rate"
Change name of coutry code column to merge with master
names(indices)[names(indices) == "iso3166_country_code"] <- "iso2"
names(indices)
## [1] "country" "traffic_time_index"
## [3] "quality_of_life_index" "healthcare_index"
## [5] "purchasing_power_incl_rent_index" "iso2"
## [7] "house_price_to_income_ratio" "pollution_index"
## [9] "climate_index" "safety_index"
## [11] "cpi_index" "obesity_rate"
Merge the two data sets
df_merged <- left_join(indices, master, by = "iso2")
kable(head(df_merged))
| country | traffic_time_index | quality_of_life_index | healthcare_index | purchasing_power_incl_rent_index | iso2 | house_price_to_income_ratio | pollution_index | climate_index | safety_index | cpi_index | obesity_rate | Country.Region | Date | confirmed | dead | recovered | iso3 | Population | death_per_10kcapita | death_per_confirmed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Denmark | 28.84967 | 192.6729 | 79.99861 | 100.88455 | DK | 7.453022 | 21.32568 | 81.80102 | 74.89569 | 83.00263 | 19.7 | Denmark | 2020-05-09 | 10517 | 526 | 8291 | DNK | 5792203 | 0.9081173 | 0.0500143 |
| Switzerland | 29.08889 | 192.0097 | 72.43725 | 119.53407 | CH | 8.678198 | 22.39336 | 79.23562 | 78.40138 | 122.39881 | 19.5 | Switzerland | 2020-05-09 | 30251 | 1830 | 26400 | CHE | 8654618 | 2.1144781 | 0.0604939 |
| Finland | 29.90210 | 190.2231 | 75.79191 | 99.93179 | FI | 8.348263 | 11.54766 | 58.86849 | 76.67501 | 70.28992 | 22.2 | Finland | 2020-05-09 | 5880 | 265 | 4000 | FIN | 5540718 | 0.4782774 | 0.0450680 |
| Australia | 34.73191 | 186.2143 | 77.38250 | 107.30969 | AU | 7.521587 | 23.46116 | 92.70246 | 58.63757 | 73.53912 | 29.0 | Australia | 2020-05-09 | 6939 | 97 | 6141 | AUS | 25459700 | 0.0380994 | 0.0139790 |
| Netherlands | 29.43206 | 183.6697 | 74.64593 | 90.72990 | NL | 7.510502 | 27.40537 | 87.60780 | 72.37625 | 73.74621 | 20.4 | Netherlands | 2020-05-09 | 42581 | 5441 | 149 | NLD | 17134873 | 3.1753956 | 0.1277800 |
| Austria | 26.27285 | 182.4979 | 78.72762 | 82.38178 | AT | 10.881314 | 22.18992 | 77.30450 | 76.27022 | 70.38106 | 20.1 | Austria | 2020-05-09 | 15833 | 615 | 13928 | AUT | 9006400 | 0.6828478 | 0.0388429 |
Final data set:
df <- subset(df_merged, select = -c(iso3, Country.Region, Date))
kable(df)
| country | traffic_time_index | quality_of_life_index | healthcare_index | purchasing_power_incl_rent_index | iso2 | house_price_to_income_ratio | pollution_index | climate_index | safety_index | cpi_index | obesity_rate | confirmed | dead | recovered | Population | death_per_10kcapita | death_per_confirmed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Denmark | 28.84967 | 192.67287 | 79.99861 | 100.88455 | DK | 7.453022 | 21.32568 | 81.80102 | 74.89569 | 83.00263 | 19.7 | 10517 | 526 | 8291 | 5792203 | 0.9081173 | 0.0500143 |
| Switzerland | 29.08889 | 192.00966 | 72.43725 | 119.53407 | CH | 8.678198 | 22.39336 | 79.23562 | 78.40138 | 122.39881 | 19.5 | 30251 | 1830 | 26400 | 8654618 | 2.1144781 | 0.0604939 |
| Finland | 29.90210 | 190.22307 | 75.79191 | 99.93179 | FI | 8.348263 | 11.54766 | 58.86849 | 76.67501 | 70.28992 | 22.2 | 5880 | 265 | 4000 | 5540718 | 0.4782774 | 0.0450680 |
| Australia | 34.73191 | 186.21425 | 77.38250 | 107.30969 | AU | 7.521587 | 23.46116 | 92.70246 | 58.63757 | 73.53912 | 29.0 | 6939 | 97 | 6141 | 25459700 | 0.0380994 | 0.0139790 |
| Netherlands | 29.43206 | 183.66966 | 74.64593 | 90.72990 | NL | 7.510502 | 27.40537 | 87.60780 | 72.37625 | 73.74621 | 20.4 | 42581 | 5441 | 149 | 17134873 | 3.1753956 | 0.1277800 |
| Austria | 26.27285 | 182.49792 | 78.72762 | 82.38178 | AT | 10.881314 | 22.18992 | 77.30450 | 76.27022 | 70.38106 | 20.1 | 15833 | 615 | 13928 | 9006400 | 0.6828478 | 0.0388429 |
| Iceland | 20.10484 | 181.74968 | 65.91694 | 79.43872 | IS | 6.743605 | 16.20970 | 68.80913 | 76.64268 | 100.47810 | 21.9 | 1801 | 10 | 1773 | 341250 | 0.2930403 | 0.0055525 |
| New Zealand | 31.10124 | 181.01737 | 73.81169 | 92.65930 | NZ | 8.524692 | 23.39718 | 95.45707 | 59.07275 | 72.52990 | 30.8 | 1494 | 21 | 1371 | 4822233 | 0.0435483 | 0.0140562 |
| Germany | 30.98199 | 179.78200 | 73.32131 | 102.35901 | DE | 9.383290 | 29.02659 | 83.00464 | 65.18606 | 65.26026 | 22.3 | 171324 | 7549 | 143300 | 83783945 | 0.9010079 | 0.0440627 |
| Estonia | 24.52688 | 177.82339 | 72.67463 | 71.29566 | EE | 9.061324 | 19.80567 | 64.27713 | 76.86227 | 50.93035 | 21.2 | 1733 | 60 | 747 | 1326539 | 0.4523048 | 0.0346220 |
| Sweden | 30.48990 | 175.94562 | 69.22929 | 101.73127 | SE | 9.273113 | 18.08990 | 73.97315 | 52.93246 | 69.84555 | 20.6 | 25921 | 3220 | 4971 | 10099270 | 3.1883493 | 0.1242236 |
| Norway | 26.99329 | 175.19405 | 74.35655 | 88.37700 | NO | 8.770038 | 20.34644 | 71.37195 | 64.56747 | 101.42778 | 23.1 | 8099 | 219 | 32 | 5421242 | 0.4039665 | 0.0270404 |
| Slovenia | 27.30000 | 172.15093 | 64.58003 | 66.31006 | SI | 10.488593 | 24.06167 | 77.56112 | 78.92749 | 53.42851 | 20.2 | 1454 | 101 | 255 | 2078932 | 0.4858264 | 0.0694635 |
| United States | 32.88929 | 172.11043 | 69.27048 | 109.52095 | US | 3.518853 | 36.87565 | 77.53833 | 52.80061 | 71.05298 | 36.2 | 1309550 | 78795 | 212534 | 329466283 | 2.3915953 | 0.0601695 |
| Spain | 29.10178 | 169.82456 | 78.88316 | 72.03297 | ES | 9.374174 | 39.98797 | 94.19409 | 68.04163 | 53.77027 | 23.8 | 223578 | 26478 | 133952 | 46754783 | 5.6631639 | 0.1184285 |
| Japan | 39.15038 | 167.99036 | 81.14303 | 87.27511 | JP | 13.003179 | 39.59446 | 84.79052 | 79.33819 | 83.34817 | 4.3 | 15663 | 607 | 5906 | 126476458 | 0.0479931 | 0.0387538 |
| Oman | 22.80000 | 167.08855 | 58.15424 | 80.96589 | OM | 9.082185 | 37.74093 | 67.21854 | 79.21087 | 49.28305 | 27.0 | 3224 | 17 | 1068 | 5106622 | 0.0332901 | 0.0052730 |
| Canada | 33.87402 | 163.47241 | 71.58088 | 95.09013 | CA | 7.968498 | 27.82909 | 50.57194 | 60.33483 | 67.62489 | 29.4 | 68918 | 4823 | 31262 | 37855702 | 1.2740485 | 0.0699817 |
| Portugal | 29.99605 | 162.90538 | 71.88357 | 49.43092 | PT | 12.699386 | 30.89074 | 97.31355 | 70.36983 | 49.51947 | 20.8 | 27406 | 1126 | 2499 | 10196707 | 1.1042781 | 0.0410859 |
| United Kingdom | 34.53073 | 162.71289 | 74.46215 | 91.73410 | GB | 10.088192 | 40.55618 | 87.62401 | 56.29144 | 67.28325 | 27.8 | 216525 | 31662 | 1001 | 67886004 | 4.6639952 | 0.1462279 |
| Qatar | 29.71717 | 162.28517 | 73.30257 | 111.68924 | QA | 5.819578 | 61.06374 | 36.02774 | 88.14151 | 64.04240 | 35.1 | 21331 | 13 | 2449 | 2881060 | 0.0451223 | 0.0006094 |
| Lithuania | 26.07759 | 159.42224 | 69.49028 | 57.84589 | LT | 10.904033 | 28.79538 | 67.76290 | 66.93645 | 44.28340 | 26.3 | 1444 | 49 | 828 | 2722291 | 0.1799955 | 0.0339335 |
| Croatia | 29.10780 | 159.00507 | 62.67970 | 50.41590 | HR | 13.730953 | 30.45654 | 89.04831 | 75.28562 | 49.69533 | 24.4 | 2176 | 87 | 1726 | 4105268 | 0.2119228 | 0.0399816 |
| United Arab Emirates | 36.84840 | 156.66702 | 67.03940 | 91.58078 | AE | 5.286096 | 51.15344 | 45.23352 | 84.30395 | 61.98273 | 31.7 | 17417 | 185 | 4295 | 9890400 | 0.1870501 | 0.0106218 |
| Czech Republic | 29.65184 | 156.24471 | 74.62371 | 62.82402 | CZ | 15.424539 | 40.22991 | 77.13160 | 74.48022 | 46.14629 | 26.0 | 8095 | 276 | 4447 | 10708982 | 0.2577276 | 0.0340951 |
| France | 34.75772 | 153.94875 | 79.98899 | 80.36465 | FR | 13.043785 | 43.56327 | 90.24704 | 53.20834 | 74.14391 | 21.6 | 176782 | 26313 | 56148 | 65273512 | 4.0311911 | 0.1488443 |
| Ireland | 37.67832 | 153.52718 | 51.88632 | 80.88163 | IE | 7.485769 | 33.99221 | 89.13089 | 54.57491 | 75.91348 | 25.3 | 22760 | 1446 | 17110 | 4937796 | 2.9284320 | 0.0635325 |
| Belgium | 36.14610 | 153.47188 | 74.33634 | 86.28467 | BE | 6.907948 | 52.94033 | 85.99443 | 56.02296 | 71.78262 | 22.1 | 52596 | 8581 | 13411 | 11589616 | 7.4040417 | 0.1631493 |
| Slovakia | 29.10784 | 152.53286 | 60.02377 | 56.94178 | SK | 10.249536 | 39.65924 | 78.12861 | 70.78489 | 44.45724 | 20.5 | 1455 | 26 | 919 | 5459643 | 0.0476222 | 0.0178694 |
| Saudi Arabia | 28.60947 | 150.56131 | 59.11352 | 99.99585 | SA | 2.786306 | 65.08902 | 45.98363 | 73.81545 | 48.34322 | 35.4 | 37136 | 239 | 10144 | 34813867 | 0.0686508 | 0.0064358 |
| Latvia | 32.89247 | 149.99764 | 62.90569 | 52.48140 | LV | 8.849483 | 33.73467 | 74.69898 | 63.04626 | 47.94486 | 23.6 | 930 | 18 | 464 | 1886202 | 0.0954299 | 0.0193548 |
| Israel | 35.91241 | 149.94250 | 73.28857 | 78.08508 | IL | 12.832165 | 57.24640 | 93.77764 | 70.40384 | 81.15454 | 26.1 | 16454 | 247 | 11376 | 8655541 | 0.2853663 | 0.0150115 |
| Cyprus | 23.95082 | 147.92626 | 51.75356 | 57.41215 | CY | 8.199260 | 53.55118 | 92.80229 | 69.98935 | 57.93337 | 21.8 | 892 | 15 | 401 | 1207361 | 0.1242379 | 0.0168161 |
| Singapore | 41.30631 | 144.38575 | 70.83623 | 88.96049 | SG | 22.312409 | 33.48362 | 57.45328 | 69.42849 | 81.10301 | 6.1 | 22460 | 20 | 2296 | 5850343 | 0.0341860 | 0.0008905 |
| Taiwan | 31.93182 | 143.22601 | 86.71453 | 65.67109 | TW | 23.692501 | 63.35052 | 84.37945 | 84.34950 | 61.37385 | NA | 440 | 6 | 361 | 23816775 | 0.0025192 | 0.0136364 |
| Poland | 31.71679 | 141.83018 | 61.00867 | 59.60825 | PL | 11.379831 | 54.45633 | 76.14059 | 71.49866 | 40.03665 | 23.1 | 15651 | 785 | 5437 | 37846605 | 0.2074162 | 0.0501565 |
| Italy | 34.42466 | 140.76202 | 66.59011 | 65.59359 | IT | 9.710983 | 55.63247 | 92.26544 | 55.74135 | 67.25658 | 19.9 | 218268 | 30395 | 103031 | 60461828 | 5.0271388 | 0.1392554 |
| South Korea | 39.57174 | 139.02389 | 81.96500 | 85.20521 | KR | 17.374395 | 62.48445 | 68.39446 | 71.98390 | 78.17735 | 4.7 | 10874 | 256 | 9610 | 51269183 | 0.0499325 | 0.0235424 |
| Belarus | 30.67677 | 134.82919 | 59.03668 | 36.99567 | BY | 14.645532 | 43.63330 | 64.36820 | 75.00650 | 34.70285 | 24.5 | 22052 | 126 | 6050 | 9449321 | 0.1333429 | 0.0057138 |
| Greece | 33.83740 | 133.07255 | 56.21411 | 43.68143 | GR | 10.598569 | 52.54869 | 94.17588 | 59.67720 | 55.66503 | 24.9 | 2710 | 151 | 1374 | 10423056 | 0.1448711 | 0.0557196 |
| Romania | 34.74878 | 132.43807 | 55.05578 | 48.85854 | RO | 11.327094 | 58.42431 | 77.62148 | 72.36012 | 35.30522 | 22.5 | 15131 | 939 | 6912 | 19237682 | 0.4881045 | 0.0620580 |
| South Africa | 39.43247 | 131.97051 | 64.14473 | 73.60740 | ZA | 3.933628 | 57.30304 | 95.25269 | 22.51455 | 42.86621 | 28.3 | 9420 | 186 | 3983 | 59308690 | 0.0313613 | 0.0197452 |
| Bulgaria | 29.39400 | 129.80196 | 55.39576 | 49.36811 | BG | 8.519949 | 65.33132 | 82.75672 | 61.50326 | 36.69626 | 25.0 | 1921 | 90 | 422 | 6948445 | 0.1295254 | 0.0468506 |
| Hungary | 35.77739 | 128.15532 | 47.79987 | 47.55153 | HU | 14.771760 | 48.28649 | 79.48140 | 64.91569 | 40.85438 | 26.4 | 3213 | 405 | 904 | 9660350 | 0.4192395 | 0.1260504 |
| Turkey | 44.64854 | 127.10061 | 69.80286 | 40.84560 | TR | 7.806966 | 67.35228 | 93.26165 | 60.51208 | 34.69271 | 32.1 | 137115 | 3739 | 89480 | 84339067 | 0.4433295 | 0.0272691 |
| Uruguay | 39.89130 | 125.98306 | 66.28120 | 37.45649 | UY | 16.342254 | 44.84363 | 98.03568 | 46.19342 | 51.03619 | 27.9 | 702 | 18 | 513 | 3473727 | 0.0518175 | 0.0256410 |
| Ecuador | 37.54762 | 125.14458 | 70.58602 | 36.07997 | EC | 12.704691 | 57.00090 | 94.51074 | 49.10271 | 40.97648 | 19.9 | 29071 | 1717 | 3433 | 17643060 | 0.9731872 | 0.0590623 |
| Bosnia And Herzegovina | 26.66667 | 121.88667 | 52.26250 | 41.83162 | BA | 12.578326 | 62.33096 | 80.47768 | 56.97052 | 35.96506 | 17.9 | 2090 | 102 | 1059 | 3280815 | 0.3108984 | 0.0488038 |
| Chile | 35.44318 | 119.76466 | 65.44293 | 42.49976 | CL | 14.928576 | 65.78316 | 90.20527 | 54.77397 | 43.62215 | 28.0 | 27219 | 304 | 12667 | 19116209 | 0.1590273 | 0.0111687 |
| Mexico | 39.39162 | 118.54743 | 70.12283 | 41.81023 | MX | 10.666892 | 66.10374 | 86.28794 | 46.03030 | 35.71728 | 28.9 | 33460 | 3353 | 21824 | 128932753 | 0.2600580 | 0.1002092 |
| Malaysia | 37.03030 | 118.44042 | 68.10438 | 64.49155 | MY | 9.940264 | 63.17563 | 57.91647 | 41.16223 | 39.12052 | 15.6 | 6589 | 108 | 4929 | 32365998 | 0.0333684 | 0.0163910 |
| Serbia | 30.42699 | 116.30082 | 51.26556 | 36.72400 | RS | 18.938084 | 60.31775 | 83.23310 | 62.59142 | 35.71666 | 21.5 | 10032 | 215 | 2732 | 8737370 | 0.2460695 | 0.0214314 |
| Georgia | 36.02000 | 115.94622 | 51.23671 | 24.87582 | GE | 14.212180 | 71.08707 | 84.20425 | 79.78994 | 28.48285 | 21.7 | 626 | 10 | 297 | 3989175 | 0.0250678 | 0.0159744 |
| Kuwait | 34.44000 | 115.75207 | 56.20828 | 85.59116 | KW | 12.279476 | 68.69143 | 20.22263 | 65.24526 | 50.37446 | 37.9 | 7623 | 49 | 2622 | 4270563 | 0.1147390 | 0.0064279 |
| Argentina | 43.08232 | 115.31457 | 69.25247 | 47.21916 | AR | 24.531862 | 50.66676 | 98.27900 | 38.22513 | 32.95458 | 28.3 | 5776 | 300 | 1728 | 45195777 | 0.0663779 | 0.0519391 |
| Jordan | 42.02899 | 112.39625 | 64.60227 | 34.87689 | JO | 8.423914 | 77.78229 | 89.04694 | 59.16597 | 53.67444 | 35.5 | 522 | 9 | 387 | 10203140 | 0.0088208 | 0.0172414 |
| North Macedonia | 27.61404 | 110.45624 | 56.37830 | 37.15416 | MK | 12.600614 | 80.23085 | 76.29654 | 61.32916 | 31.58639 | 22.4 | 1622 | 91 | 1112 | 2083380 | 0.4367902 | 0.0561036 |
| India | 46.99106 | 108.62720 | 67.12699 | 54.30393 | IN | 11.375989 | 78.86750 | 64.87236 | 56.67775 | 24.58314 | 3.9 | 62808 | 2101 | 19301 | 1380004385 | 0.0152246 | 0.0334512 |
| Panama | 36.48000 | 108.36490 | 59.92786 | 34.23454 | PA | 12.602192 | 63.08894 | 67.83655 | 52.81013 | 54.15820 | 22.7 | 8282 | 237 | 4501 | 4314768 | 0.5492763 | 0.0286163 |
| Lebanon | 37.38462 | 106.55515 | 64.37542 | 44.46948 | LB | 13.227057 | 88.36627 | 94.73887 | 56.63634 | 60.50396 | 32.0 | 809 | 26 | 234 | 6825442 | 0.0380928 | 0.0321384 |
| Colombia | 47.49246 | 105.82676 | 67.24011 | 31.11585 | CO | 19.653250 | 62.82861 | 96.69306 | 45.20911 | 30.65978 | 22.3 | 10495 | 445 | 2569 | 50882884 | 0.0874557 | 0.0424011 |
| Brazil | 41.69870 | 105.64633 | 56.29039 | 32.80873 | BR | 16.414097 | 54.97914 | 97.16268 | 31.11512 | 40.22228 | 22.1 | 156061 | 10656 | 61685 | 212559409 | 0.5013187 | 0.0682810 |
| Morocco | 36.88889 | 105.45943 | 45.71935 | 35.24548 | MA | 14.177770 | 70.64057 | 91.69621 | 51.31186 | 34.31907 | 26.1 | 5910 | 186 | 2461 | 36910558 | 0.0503921 | 0.0314721 |
| Pakistan | 38.56047 | 105.44024 | 60.58675 | 30.57127 | PK | 11.981666 | 74.25446 | 71.93268 | 55.92482 | 21.98070 | 8.6 | 28736 | 636 | 7809 | 220892331 | 0.0287923 | 0.0221325 |
| Ukraine | 38.65314 | 104.76968 | 52.33359 | 31.79817 | UA | 12.228967 | 65.08189 | 71.41618 | 51.14551 | 33.17713 | 24.1 | 14710 | 376 | 2909 | 43733759 | 0.0859748 | 0.0255608 |
| China | 41.81445 | 102.81119 | 64.48305 | 60.88229 | CN | 29.057122 | 80.77383 | 79.19380 | 68.16920 | 40.03911 | 6.2 | 83990 | 4637 | 79127 | 1404676330 | 0.0330112 | 0.0552090 |
| Russia | 45.30420 | 102.30563 | 57.59408 | 38.94246 | RU | 10.772440 | 62.78721 | 40.36441 | 58.87879 | 39.20503 | 23.1 | 198676 | 1827 | 31916 | 145934460 | 0.1251932 | 0.0091959 |
| Thailand | 38.22807 | 101.87665 | 77.94772 | 35.45190 | TH | 22.260435 | 75.06562 | 69.44944 | 59.52478 | 49.77184 | 10.0 | 3004 | 56 | 2787 | 69799978 | 0.0080229 | 0.0186418 |
| Hong Kong | 41.46212 | 99.05359 | 66.07876 | 65.32260 | HK | 47.456833 | 67.68583 | 83.63913 | 79.29503 | 77.22394 | NA | NA | NA | NA | NA | NA | NA |
| Indonesia | 43.11157 | 97.47430 | 60.48300 | 25.05041 | ID | 18.880380 | 66.56103 | 74.14789 | 54.16487 | 37.27280 | 6.9 | 13645 | 959 | 2607 | 273523621 | 0.0350610 | 0.0702822 |
| Kazakhstan | 31.08000 | 88.30704 | 50.69575 | 38.36194 | KZ | 10.864093 | 75.14845 | 39.77913 | 37.98342 | 30.64392 | 21.0 | 4975 | 31 | 1776 | 18776707 | 0.0165098 | 0.0062312 |
| Vietnam | 30.16912 | 87.47921 | 57.70066 | 28.14149 | VN | 21.363412 | 86.47206 | 71.23916 | 54.65269 | 38.34373 | 2.1 | 288 | 0 | 241 | 97338583 | 0.0000000 | 0.0000000 |
| Egypt | 49.77674 | 86.53973 | 45.84033 | 22.40610 | EG | 13.013813 | 85.64919 | 91.97840 | 53.07509 | 29.54210 | 32.0 | 8964 | 514 | 2002 | 102334403 | 0.0502275 | 0.0573405 |
| Peru | 48.32653 | 85.45794 | 56.15480 | 33.95162 | PE | 14.954493 | 84.13382 | 97.69300 | 31.84627 | 38.65127 | 19.7 | 65015 | 1814 | 20246 | 32971846 | 0.5501663 | 0.0279013 |
| Philippines | 44.63087 | 85.37364 | 67.47425 | 23.48214 | PH | 24.602093 | 74.28124 | 60.81474 | 57.84267 | 37.63475 | 6.4 | 10610 | 704 | 1842 | 109581085 | 0.0642447 | 0.0663525 |
| Sri Lanka | 59.01235 | 85.16271 | 72.53332 | 24.45454 | LK | 31.135533 | 59.13779 | 59.11472 | 59.78198 | 31.61438 | 5.2 | 847 | 9 | 260 | 21413250 | 0.0042030 | 0.0106257 |
| Iran | 48.01420 | 74.14397 | 51.69924 | 22.69149 | IR | 25.110528 | 77.44633 | 70.98750 | 50.74892 | 39.00771 | 25.8 | 106220 | 6589 | 85064 | 83992953 | 0.7844706 | 0.0620316 |
| Kenya | 56.65306 | 70.55778 | 55.58817 | 27.32203 | KE | 31.628077 | 76.59695 | 99.78948 | 38.34092 | 40.20684 | 7.1 | 649 | 30 | 207 | 53771300 | 0.0055792 | 0.0462250 |
| Bangladesh | 56.73148 | 70.03254 | 42.79613 | 33.21100 | BD | 13.098089 | 86.21459 | 71.28995 | 36.06337 | 32.25093 | 3.6 | 13770 | 214 | 2414 | 164689383 | 0.0129942 | 0.0155410 |
| Nigeria | 61.08333 | 55.65243 | 51.59093 | 13.52465 | NG | 16.698723 | 87.62972 | 60.75307 | 36.23109 | 31.00172 | 8.9 | 4151 | 128 | 745 | 206139587 | 0.0062094 | 0.0308359 |
m_confrimVcapita <- lm(death_per_confirmed ~ death_per_10kcapita, df)
summary(m_confrimVcapita)
##
## Call:
## lm(formula = death_per_confirmed ~ death_per_10kcapita, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.031326 -0.015039 -0.004693 0.014968 0.088656
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.028412 0.002869 9.903 2.22e-15 ***
## death_per_10kcapita 0.021427 0.001808 11.853 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02271 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.646, Adjusted R-squared: 0.6414
## F-statistic: 140.5 on 1 and 77 DF, p-value: < 2.2e-16
With an adjusted R squarred of 64% we can see that these two measures of the response to corona do explain a large proportion of their respective variability. We need to keep in mind that, cleary, the death per confirmed case highly depends on the amount of testing that is being done in each of the countries.
ggplot(df, aes(x= death_per_confirmed, y= death_per_10kcapita)) +
geom_point() +
geom_text(aes(label=country),hjust=0, vjust=-0.5) +
geom_smooth(method='lm', formula= y ~ x )
We start by investigating all scatterplots:
pair_var <- c(2:5, 7:12, 17, 18)
pairs(df[pair_var])
judging by these pair plots, it looks like the most interesting correlations for death rates among confirmed cases might be cpi_index, quality_of_life_index, climate_index, obesity_rate, and possibly pollution_index and healthcare_index
m_health <- lm(death_per_confirmed ~ healthcare_index, df)
summary(m_health)
##
## Call:
## lm(formula = death_per_confirmed ~ healthcare_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.04879 -0.02748 -0.01146 0.01541 0.11308
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0017337 0.0281954 0.061 0.951
## healthcare_index 0.0006503 0.0004300 1.512 0.135
##
## Residual standard error: 0.03761 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.02884, Adjusted R-squared: 0.01623
## F-statistic: 2.287 on 1 and 77 DF, p-value: 0.1346
m_pol <- lm(death_per_confirmed ~ pollution_index, df)
summary(m_pol)
##
## Call:
## lm(formula = death_per_confirmed ~ pollution_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.05098 -0.02680 -0.01166 0.01436 0.11923
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0620991 0.0116320 5.339 9.17e-07 ***
## pollution_index -0.0003433 0.0002044 -1.680 0.097 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03748 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.03536, Adjusted R-squared: 0.02283
## F-statistic: 2.822 on 1 and 77 DF, p-value: 0.09703
m_qol <- lm(death_per_confirmed ~ quality_of_life_index, df)
summary(m_qol)
##
## Call:
## lm(formula = death_per_confirmed ~ quality_of_life_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.049091 -0.024744 -0.009786 0.016445 0.115296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0156955 0.0173634 0.904 0.3688
## quality_of_life_index 0.0002095 0.0001252 1.674 0.0982 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03749 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.03511, Adjusted R-squared: 0.02258
## F-statistic: 2.802 on 1 and 77 DF, p-value: 0.09823
m_clim <- lm(death_per_confirmed ~ climate_index, df)
summary(m_clim)
##
## Call:
## lm(formula = death_per_confirmed ~ climate_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.043911 -0.024494 -0.008463 0.010148 0.112508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0159346 0.0193363 -0.824 0.41244
## climate_index 0.0007742 0.0002447 3.164 0.00223 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0359 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.115, Adjusted R-squared: 0.1035
## F-statistic: 10.01 on 1 and 77 DF, p-value: 0.002231
m_obes <- lm(death_per_confirmed ~ obesity_rate, df)
summary(m_obes)
##
## Call:
## lm(formula = death_per_confirmed ~ obesity_rate, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.04562 -0.02683 -0.01021 0.01362 0.11880
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0411568 0.0117408 3.505 0.000768 ***
## obesity_rate 0.0001445 0.0005058 0.286 0.775840
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03823 on 76 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.001073, Adjusted R-squared: -0.01207
## F-statistic: 0.08166 on 1 and 76 DF, p-value: 0.7758
m_cpi <- lm(death_per_confirmed ~ cpi_index, df)
summary(m_cpi)
##
## Call:
## lm(formula = death_per_confirmed ~ cpi_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.05366 -0.02694 -0.00789 0.01465 0.11304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0273541 0.0120517 2.270 0.026 *
## cpi_index 0.0003170 0.0002164 1.465 0.147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03764 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.02713, Adjusted R-squared: 0.01449
## F-statistic: 2.147 on 1 and 77 DF, p-value: 0.1469
It seems like climate index is he only variable that has any statistical significance at all. Let’s take a closer look at that.
ggplot(df, aes(x=climate_index, y=death_per_confirmed)) + geom_point() + geom_smooth(method = "lm", formula = y~x)
This suggests that climate may have an effect on death rates. However, Numbeo’s climate index statistic is not an average temperature, but rather a “climate-likeability” index, meaning it is more along the lines of a probability that the average person will enjoy this climate. Thus, warm but temperate climates are more likely to score higher, whereas more extreme or uncomfortable climates score lower. So this could either mean that the virus may be less lethal in more extreme climates, or it could be that death rates are higher in areas people are more likely to live.
We repeat the analysis above with deaths per capita (per 10’000 inhabitants) and compare the results to the above model fitting
m_health2 <- lm(death_per_10kcapita ~ healthcare_index, df)
summary(m_health2)
##
## Call:
## lm(formula = death_per_10kcapita ~ healthcare_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6376 -0.7040 -0.4093 0.1038 6.2830
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.99576 1.02644 -1.944 0.05551 .
## healthcare_index 0.04193 0.01565 2.678 0.00904 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.369 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.08522, Adjusted R-squared: 0.07334
## F-statistic: 7.174 on 1 and 77 DF, p-value: 0.009041
m_pol2 <- lm(death_per_10kcapita ~ pollution_index, df)
summary(m_pol2)
##
## Call:
## lm(formula = death_per_10kcapita ~ pollution_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2448 -0.6064 -0.3384 -0.0574 6.6798
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.72755 0.42689 4.047 0.000122 ***
## pollution_index -0.01895 0.00750 -2.527 0.013563 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.376 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.07657, Adjusted R-squared: 0.06457
## F-statistic: 6.385 on 1 and 77 DF, p-value: 0.01356
m_qol2 <- lm(death_per_10kcapita ~ quality_of_life_index, df)
summary(m_qol2)
##
## Call:
## lm(formula = death_per_10kcapita ~ quality_of_life_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4272 -0.7567 -0.2750 0.0550 6.4096
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.212477 0.622863 -1.947 0.05523 .
## quality_of_life_index 0.014380 0.004491 3.202 0.00198 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.345 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1175, Adjusted R-squared: 0.1061
## F-statistic: 10.25 on 1 and 77 DF, p-value: 0.001984
m_clim2 <- lm(death_per_10kcapita ~ climate_index, df)
summary(m_clim2)
##
## Call:
## lm(formula = death_per_10kcapita ~ climate_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1457 -0.7116 -0.4619 0.0076 6.5156
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.749941 0.751667 -0.998 0.3215
## climate_index 0.019053 0.009513 2.003 0.0487 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.396 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.04951, Adjusted R-squared: 0.03717
## F-statistic: 4.011 on 1 and 77 DF, p-value: 0.04872
m_obes2 <- lm(death_per_10kcapita ~ obesity_rate, df)
summary(m_obes2)
##
## Call:
## lm(formula = death_per_10kcapita ~ obesity_rate, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9634 -0.6642 -0.4571 -0.2262 6.6635
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.35846 0.43932 0.816 0.417
## obesity_rate 0.01729 0.01893 0.913 0.364
##
## Residual standard error: 1.431 on 76 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.01086, Adjusted R-squared: -0.002154
## F-statistic: 0.8345 on 1 and 76 DF, p-value: 0.3639
m_cpi2 <- lm(death_per_10kcapita ~ cpi_index, df)
summary(m_cpi2)
##
## Call:
## lm(formula = death_per_10kcapita ~ cpi_index, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7138 -0.5866 -0.2376 0.0222 6.1599
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.66376 0.42613 -1.558 0.123420
## cpi_index 0.02658 0.00765 3.474 0.000845 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.331 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1355, Adjusted R-squared: 0.1243
## F-statistic: 12.07 on 1 and 77 DF, p-value: 0.0008446
Given that a lot of focus during the crisis has been on the health care system we will investigate this relationship.
With an adjusted R squared of 7% we cannot conclude that health care rating is a specifically strong indicator of the amount of people that die in a country. However, we want to investigate this further to see if we can see find a reason for why it could be such a small predictive power.
ggplot(df, aes(x= healthcare_index, y= death_per_10kcapita)) +
geom_point() +
geom_text(aes(label=country),hjust=0, vjust=-0.5) +
geom_smooth(method='lm', formula= y ~ x )
We can see that contrary to what we would have anticipated we have a potivie correlation between health care rating and the deaths per capita. The countries that stick out especially are Belgium, Spain, Italy, the UK and France. THey all have a death per capita of more than 4 per 10 thousand. while having a relatively well rated health care system. It seems like there is a large divide between the European countries and the rest of the world (specifically the Asian countries). Looking at the countries with the best rated healthcare we clearly see that Asian countries have a much lower death rate (less than 5 per million inhabitants) while the Western countries with similar ratings have 20 to 100 times higher rate of death per capita. THis might have more to do with the response and ability to lock down the economy effectively versus western countries:
head(arrange(df, desc(healthcare_index))[,c(1,4,17)])
Finally, we need to add that this is not the end of this epidemic and numbers will surely be updated and corrected for the foreseeable future.
GitHub: https://github.com/chilleundso/DATA607/blob/master/Final_Project/Data607_Final_merged.Rmd