Introduction

What is your probability of surviving if you are infected with COVID-19? This is a complicated question, because the answer depends on a large number of social and economic variables. However, you can get a rough answer based on the country where you live.

Data on COVID-19 are generally available only at an aggregate level, with too little detail to build individual-level estimates. Country-level data, however, are easy to find. Based on these, we can calculate the rate at which infected people die, day by day, from complications of COVID-19.
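As a minimal sketch of that rate, assuming it is simply cumulative deaths divided by cumulative confirmed cases, the calculation looks like this. The numbers are made up for illustration and `fatality_rate` is a hypothetical name, not a variable from the analysis.

```r
# Hypothetical cumulative counts for one country over five days (not real data).
cumulative_cases  <- c(0, 10, 50, 200, 400)
cumulative_deaths <- c(0,  0,  1,   6,  16)

# Rate = cumulative deaths / cumulative cases. It is set to 0 while either
# count is still 0, so the first days do not produce NaN from 0/0.
fatality_rate <- ifelse(cumulative_cases == 0 | cumulative_deaths == 0,
                        0, cumulative_deaths / cumulative_cases)

print(fatality_rate)
```

Because both counts are cumulative, each value describes the whole epidemic up to that day rather than the deaths of a single day.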

Getting the data

The first step is to import the data on COVID-19 cases in several countries. For this, I am using the data made available by the Johns Hopkins University Center for Systems Science and Engineering. You can also browse this database at https://github.com/CSSEGISandData/COVID-19.

I use R for this process; the packages used are listed below.

library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(tidyverse)
library(plotly)

The script below imports the data on confirmed COVID-19 cases by country.

data_address <- paste("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv", sep = "")

cases <- read_csv(data_address) %>% rename(province = "Province/State", 
                                           country_region = "Country/Region") %>%
  pivot_longer(-c(province, country_region, Lat, Long), names_to = "Date", 
               values_to = "cumulative_cases")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Province/State` = col_character(),
##   `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
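To make the reshaping step concrete, here is a small sketch of what `pivot_longer()` does to a file with one column per date. The two rows and their values are invented for illustration; only the column layout follows the import above.

```r
library(tidyr)
library(tibble)

# Hypothetical miniature of the wide file: one column per date (made-up values).
wide <- tribble(
  ~province,     ~country_region, ~Lat, ~Long, ~`1/22/20`, ~`1/23/20`,
  NA_character_, "Italy",         41.9, 12.6,  0,          2,
  "Hubei",       "China",         31.0, 112.3, 444,        549
)

# Every column except the four identifiers becomes a (Date, cumulative_cases) pair.
long <- pivot_longer(wide, -c(province, country_region, Lat, Long),
                     names_to = "Date", values_to = "cumulative_cases")

print(long)
```

Each original row turns into one row per date column, so two rows with two dates give four rows.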

Now I will choose some countries for this analysis (China, Italy, Brazil, Japan, Germany, Iran and France).

# Keep only the selected countries; %in% replaces the chain of == comparisons.
cases <- cases %>%
  filter(country_region %in% c('China', 'Italy', 'Brazil', 'Japan',
                               'Germany', 'Iran', 'France')) %>%
  select(Date, cumulative_cases)
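Some countries in this file are split across several province rows, so the filter above keeps one series per province rather than one per country. If a single series per country is wanted, one possible variation (not what this post does) is to sum the provinces by country and date. The sketch below uses invented rows, and `toy` and `by_country` are hypothetical names.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format rows (made-up numbers): China split into two provinces.
toy <- tribble(
  ~province,     ~country_region, ~Date,     ~cumulative_cases,
  "P1",          "China",         "1/22/20", 10,
  "P2",          "China",         "1/22/20", 5,
  "P1",          "China",         "1/23/20", 20,
  "P2",          "China",         "1/23/20", 8,
  NA_character_, "Italy",         "1/22/20", 0,
  NA_character_, "Italy",         "1/23/20", 2
)

# Sum the provinces so that each country has one row per date.
by_country <- toy %>%
  filter(country_region %in% c("China", "Italy")) %>%
  group_by(country_region, Date) %>%
  summarise(cumulative_cases = sum(cumulative_cases), .groups = "drop")

print(by_country)
```

Aggregating first would change the statistics reported later, since the rate would then be computed on country totals instead of province rows.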

Now I will import the dataset with COVID-19 deaths.

deaths_address <- paste("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv", sep = "")

deaths <- read_csv(deaths_address) %>%
  rename(province = "Province/State", country = "Country/Region") %>%
  pivot_longer(-c(province, country, Lat, Long), names_to = "Date",
               values_to = "cumulative_deaths") %>%
  # Same seven countries as in the cases data.
  filter(country %in% c('China', 'Italy', 'Brazil', 'Japan',
                        'Germany', 'Iran', 'France')) %>%
  rename(day = 'Date') %>%
  select(country, day, cumulative_deaths)
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Province/State` = col_character(),
##   `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.

Now I will join the two datasets into a single data frame and compute the sacrifice rate, that is, cumulative deaths divided by cumulative confirmed cases.

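# cbind() pairs rows by position, so this assumes both files list the same
# province/date rows in the same order.
data <- cbind(cases, deaths)
data <- data %>%
  mutate(Date_infection = mdy(Date),
         # Days since the first date in the data (1 = first day). Factor levels of
         # the raw "m/d/yy" strings would sort alphabetically, not by date.
         time = as.numeric(Date_infection - min(Date_infection)) + 1,
         sacrifice = ifelse(cumulative_deaths == 0 | cumulative_cases == 0,
                            0, cumulative_deaths / cumulative_cases))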
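The step above pairs rows by position with `cbind()`, which is only correct while both files keep exactly the same row order. A more defensive sketch, assuming the identifying columns are kept in both tables, joins on those keys instead. The tables `cases_toy` and `deaths_toy` are hypothetical and hold made-up numbers; with the real files the province column would also have to be part of the key.

```r
library(dplyr)
library(tibble)

# Hypothetical cases table that keeps its key columns (made-up numbers).
cases_toy <- tribble(
  ~country, ~day,     ~cumulative_cases,
  "Italy",  "3/1/20", 100,
  "Italy",  "3/2/20", 200,
  "Japan",  "3/1/20", 50
)

# Hypothetical deaths table, deliberately in a different row order.
deaths_toy <- tribble(
  ~country, ~day,     ~cumulative_deaths,
  "Japan",  "3/1/20", 1,
  "Italy",  "3/1/20", 5,
  "Italy",  "3/2/20", 20
)

# Match rows on country and day, so the order within each file no longer matters.
joined <- inner_join(cases_toy, deaths_toy, by = c("country", "day")) %>%
  mutate(sacrifice = ifelse(cumulative_deaths == 0 | cumulative_cases == 0,
                            0, cumulative_deaths / cumulative_cases))

print(joined)
```

With a positional `cbind()` the first row here would have combined Italy's cases with Japan's deaths; the keyed join avoids that kind of silent mismatch.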

The dataset has information from 1/22/2020 to the current date. Descriptive statistics of the sacrifice rate are shown in the table below:

# A named list of functions replaces funs(), which is deprecated since dplyr 0.8.0.
dt <- data %>%
  group_by(country) %>%
  summarise_at(vars(sacrifice), list(mean = mean, max = max, min = min, sd = sd))
show(dt)
## # A tibble: 7 x 5
##   country    mean    max   min      sd
##   <chr>     <dbl>  <dbl> <dbl>   <dbl>
## 1 Brazil  0.00682 0.0437     0 0.0130 
## 2 China   0.00974 1          0 0.0255 
## 3 France  0.00495 0.111      0 0.0163 
## 4 Germany 0.00222 0.0158     0 0.00401
## 5 Iran    0.0636  1          0 0.128  
## 6 Italy   0.0421  0.123      0 0.0438 
## 7 Japan   0.0163  0.0372     0 0.0131

Now I will plot the sacrifice rate by country with the script below. The size of each point grows with the number of confirmed cases.

ggplotly(ggplot(data=data, aes(x = time, y = sacrifice, group = cumulative_cases, color = country, size = cumulative_cases)) + 
  geom_point(aes(frame = time, ids = country)) + 
  ylim(0,0.25)+ theme_classic() + 
  scale_x_log10()) %>% 
   animation_opts(
    1000, easing = "elastic", redraw = FALSE)
## Warning: Ignoring unknown aesthetics: frame, ids

Conclusions

If you live in Italy or Iran, you have a higher probability of dying from complications of COVID-19. For now, living in Brazil, Germany or France is a more comfortable option.