Let's first load the data set.
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
data <- read.csv("C:/Users/dilip/Downloads/covid_R/COVID19_line_list_data.csv")
describe(data) # Hmisc command to check the dataset entries
## data 
## 
##  27  Variables      1085  Observations
## --------------------------------------------------------------------------------
## ï..id 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1085        0     1085        1      543      362     55.2    109.4 
##      .25      .50      .75      .90      .95 
##    272.0    543.0    814.0    976.6   1030.8 
## 
## lowest :    1    2    3    4    5, highest: 1081 1082 1083 1084 1085
## --------------------------------------------------------------------------------
## case_in_country 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      888      197      197        1    48.84    54.99     2.00     4.00 
##      .25      .50      .75      .90      .95 
##    11.00    28.00    67.25   110.30   153.65 
## 
## lowest :    1    2    3    4    5, highest:  365  443  875  925 1443
##                                                                             
## Value          0    20    40    60    80   100   120   140   160   180   200
## Frequency    215   241   137    81    84    40    22    19    22    19     1
## Proportion 0.242 0.271 0.154 0.091 0.095 0.045 0.025 0.021 0.025 0.021 0.001
##                                                     
## Value        280   300   360   440   880   920  1440
## Frequency      1     1     1     1     1     1     1
## Proportion 0.001 0.001 0.001 0.001 0.001 0.001 0.001
## 
## For the frequency table, variable is rounded to the nearest 20
## --------------------------------------------------------------------------------
## reporting.date 
##        n  missing distinct 
##     1084        1       43 
## 
## lowest : 02/01/20  02/02/20  02/03/20  02/04/20  02/05/20 
## highest: 2/24/2020 2/25/2020 2/26/2020 2/27/2020 2/28/2020
## --------------------------------------------------------------------------------
## summary 
##        n  missing distinct 
##     1080        5      967 
## 
## lowest : confirmed COVID-19 pneumonia patient No.11 in Tianjin: female, 55, symptom onset on 01/23/2020, hospitalized on 01/23/2020, confirmed on 01/26/2020
##          confirmed COVID-19 pneumonia patient No.12 in Tianjin: female, 79, symptom onset on 01/24/2020, hospitalized on 01/24/2020, confirmed on 01/26/2020
##          confirmed COVID-19 pneumonia patient No.13 in Tianjin: female, 19, symptom onset on 01/19/2020, hospitalized on 01/20/2020, confirmed on 01/26/2020
##          confirmed COVID-19 pneumonia patient No.14 in Tianjin: male, 71, Wuhan resident, visited Malaysia from 01/19/2020 to 01/25/2020, arrived in Tianjin on 01/25/2020, symptom onset on 01/25/2020, hospitalized on 01/25/2020, confirmed on 01/26/2020
##          confirmed imported COVID-19 pneumonia patient in Gansu: female, 20, lives in Wuhan, arrived in Gansu on 01/18/2020, symptom onset on 01/19/2020, visit clinic on 01/24/2020, hospitalized on 01/24/2020.
## highest: new recovered imported COVID-19 pneumonia patient in Beijing: female, returned to Beijing from Wuhan on 01/08/2020, symptom onset afterwards, recovered on 01/24/2020.
##          new recovered imported COVID-19 pneumonia patient in Beijing: male, returned to Beijing from Wuhan on 01/08/2020, symptom onset afterwards, recovered on 01/25/2020.
##          Second confirmed imported COVID-19 pneumonia patient in Guangxi: male, 46, in contact with individuals from Wuhan before symptom onset. symptom onset on 01/20/2020.
##          Second confirmed imported COVID-19 pneumonia patient in Liaoning: male, 40, works in Wuhan, visit Fushun, Liaoning on 01/12/2020, symptom onset on 01/14/2020, visit clinic in Fushun Dalian on 01/19/2020.
##          Second confirmed imported COVID-19 pneumonia patient in Sichuan: male, 57, Wuhan resident, visited Sichuan on 01/15/2020, symptom onset on 01/16/2020 and hospitalized.
## --------------------------------------------------------------------------------
## location 
##        n  missing distinct 
##     1085        0      156 
## 
## lowest : Afghanistan      Aichi Prefecture Alappuzha        Algeria          Amiens          
## highest: Yunnan           Zabaikalsky      Zaragoza         Zhejiang         Zhuhai          
## --------------------------------------------------------------------------------
## country 
##        n  missing distinct 
##     1085        0       38 
## 
## lowest : Afghanistan Algeria     Australia   Austria     Bahrain    
## highest: Thailand    UAE         UK          USA         Vietnam    
## --------------------------------------------------------------------------------
## gender 
##        n  missing distinct 
##      902      183        2 
##                         
## Value      female   male
## Frequency     382    520
## Proportion  0.424  0.576
## --------------------------------------------------------------------------------
## age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      843      242       85    0.999    49.48    20.79     22.0     25.0 
##      .25      .50      .75      .90      .95 
##     35.0     51.0     64.0     75.0     78.9 
## 
## lowest :  0.25  0.50  1.00  2.00  4.00, highest: 86.00 87.00 89.00 91.00 96.00
## --------------------------------------------------------------------------------
## symptom_onset 
##        n  missing distinct 
##      563      522       70 
## 
## lowest : 01/02/20  01/03/20  01/04/20  01/05/20  01/06/20 
## highest: 2/22/2020 2/23/2020 2/24/2020 2/25/2020 2/26/2020
## --------------------------------------------------------------------------------
## If_onset_approximated 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      560      525        2    0.123       24  0.04286  0.08219 
## 
## --------------------------------------------------------------------------------
## hosp_visit_date 
##        n  missing distinct 
##      507      578       60 
## 
## lowest : 01/01/20  01/03/20  01/05/20  01/06/20  01/08/20 
## highest: 2/24/2020 2/25/2020 2/26/2020 2/27/2020 2/28/2020
## --------------------------------------------------------------------------------
## exposure_start 
##        n  missing distinct 
##      128      957       39 
## 
## lowest : 01/03/20  01/06/20  01/08/20  01/09/20  01/10/20 
## highest: 2/15/2020 2/17/2020 2/19/2020 2/20/2020 2/21/2020
## --------------------------------------------------------------------------------
## exposure_end 
##        n  missing distinct 
##      341      744       52 
## 
## lowest : 01/02/20  01/03/20  01/04/20  01/05/20  01/06/20 
## highest: 2/21/2020 2/22/2020 2/23/2020 2/24/2020 2/25/2020
## --------------------------------------------------------------------------------
## visiting.Wuhan 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1085        0        2    0.437      192    0.177   0.2916 
## 
## --------------------------------------------------------------------------------
## from.Wuhan 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1081        4        2     0.37      156   0.1443   0.2472 
## 
## --------------------------------------------------------------------------------
## death 
##        n  missing distinct 
##     1085        0       14 
## 
## lowest : 0         02/01/20  1         2/13/2020 2/14/2020
## highest: 2/24/2020 2/25/2020 2/26/2020 2/27/2020 2/28/2020
## 
## 0 (1022, 0.942), 02/01/20 (1, 0.001), 1 (42, 0.039), 2/13/2020 (1, 0.001),
## 2/14/2020 (1, 0.001), 2/19/2020 (2, 0.002), 2/21/2020 (2, 0.002), 2/22/2020 (1,
## 0.001), 2/23/2020 (4, 0.004), 2/24/2020 (1, 0.001), 2/25/2020 (2, 0.002),
## 2/26/2020 (3, 0.003), 2/27/2020 (2, 0.002), 2/28/2020 (1, 0.001)
## --------------------------------------------------------------------------------
## recovered 
##        n  missing distinct 
##     1085        0       32 
## 
## lowest : 0         02/02/20  02/04/20  02/05/20  02/06/20 
## highest: 2/24/2020 2/25/2020 2/26/2020 2/27/2020 2/28/2020
## --------------------------------------------------------------------------------
## symptom 
##        n  missing distinct 
##      270      815      108 
## 
## lowest : chest discomfort                    chills                              cold, fever, pneumonia              cough                               cough with sputum                  
## highest: throat pain, chills                 throat pain, fever                  tired                               vomiting, cough, fever, sore throat vomiting, diarrhea, fever, cough   
## --------------------------------------------------------------------------------
## source 
##        n  missing distinct 
##     1085        0       85 
## 
## lowest : 央视新闻        ABC                 ABC News            新浪              Al Arabiya         
## highest: Wa.de               Washington Examiner Xin Hua Net         Yahoo News          Yonnhap News Agency
## --------------------------------------------------------------------------------
## link 
##        n  missing distinct 
##     1085        0      490 
## 
## lowest : http://behdasht.gov.ir/news/%DA%A9%D8%B1%D9%88%D9%86%D8%A7+%D9%88%DB%8C%D8%B1%D9%88%D8%B3/199807/%D8%AF%D8%B1+%D8%B1%D9%88%D8%B2%D9%87%D8%A7%DB%8C+%DA%AF%D8%B0%D8%B4%D8%AA%D9%87+735+%D8%A8%DB%8C%D9%85%D8%A7%D8%B1+%D8%A8%D8%A7+%D8%B9%D9%84%D8%A7%D8%A6%D9%85+%D8%B4%D8%A8%D9%87+%D8%A2%D9%86%D9%81%D9%84%D9%88%D8%A2%D9%86%D8%B2%D8%A7+%D8%AF%D8%B1+%DA%A9%D8%B4%D9%88%D8%B1+%D8%A8%D8%B3%D8%AA%D8%B1%DB%8C+%D8%B4%D8%AF%D9%86%D8%AF+%D8%A8%D8%B1+%D8%A7%D8%B3%D8%A7%D8%B3+%D8%A2%D8%AE%D8%B1%DB%8C%D9%86+%D9%86%D8%AA%D8%A7%DB%8C%D8%AC+%D8%A2%D8%B2%D9%85%D8%A7%DB%8C%D8%B4+%D9%87%D8%A7+%D8%A7%D8%A8%D8%AA%D9%84%D8%A7%DB%8C+13+%D9%85%D9%88%D8%B1%D8%AF+%D8%AF%DB%8C%DA%AF%D8%B1+%D8%A8%D9%87+%DA%A9%D9%88%D9%88%DB%8C%D8%AF19+%D9%82%D8%B7%D8%B9%DB%8C+%D8%A8%D9%87+%D9%86%D8%B8%D8%B1+%D9%85%DB%8C+%D8%B1%D8%B3%D8%AF
##          http://english.alarabiya.net/en/News/gulf/2020/02/25/Number-of-Kuwait-coronavirus-cases-rises-to-eight-KUNA.html
##          http://sxwjw.shaanxi.gov.cn/art/2020/1/27/art_9_67483.html
##          http://wjw.beijing.gov.cn/xwzx_20031/wnxw/202001/t20200121_1620353.html
##          http://wjw.sz.gov.cn/wzx/202001/t20200120_18987787.htm
## highest: https://www3.nhk.or.jp/nhkworld/en/news/20200116_23/
##          https://www3.nhk.or.jp/nhkworld/en/news/20200124_14/
##          https://www3.nhk.or.jp/nhkworld/en/news/20200126_31/
##          https://www3.nhk.or.jp/nhkworld/en/news/20200130_02/
##          https://www3.nhk.or.jp/nhkworld/en/news/20200131_01/
## --------------------------------------------------------------------------------
## 
## Variables with all observations missing:
## 
## [1] X   X.1 X.2 X.3 X.4 X.5 X.6
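The describe() output above lists seven columns (X to X.6) in which every observation is missing. A minimal sketch of how such columns could be dropped before analysis is shown below. The toy data frame and the names toy, all_missing and toy_clean are illustrative assumptions, not part of the original analysis.

```r
# Toy stand-in for the real data: two useful columns and two empty ones
toy <- data.frame(id = 1:3, age = c(30, NA, 55), X = NA, X.1 = NA)

# TRUE for every column whose entries are all missing
all_missing <- vapply(toy, function(col) all(is.na(col)), logical(1))

# Keep only the columns that contain at least one observed value
toy_clean <- toy[, !all_missing, drop = FALSE]
names(toy_clean)
```

The age column survives even though it has one NA, because only columns that are entirely missing are removed.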
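Next we clean the column required for the analysis. As the describe() output shows, the death column mixes the values 0 and 1 with dates of death, so we derive a binary indicator from it and use that to calculate the death rate.
data$death_dummy <- as.integer(data$death != 0)
# Let's calculate the death rate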
sum(data$death_dummy)/nrow(data)
## [1] 0.05806452
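To see why comparing against 0 is enough to build the indicator, here is a small sketch on a hypothetical vector that mimics the mixed coding of the death column (0, 1, or a date). The values are made up for illustration and are not rows from the dataset.

```r
# Hypothetical values mimicking the mixed coding of the death column
death <- c("0", "1", "2/13/2020", "0", "02/01/20")

# Anything other than "0" (a 1 or a date of death) counts as a death
death_dummy <- as.integer(death != "0")
death_dummy

# The mean of a 0/1 indicator is the proportion of ones, i.e. the death rate
mean(death_dummy)
```

Because the indicator is 0 or 1, mean(death_dummy) gives the same value as sum(death_dummy) / length(death_dummy).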
AGE
Claim: the people who died were older than those who survived.
dead <- subset(data, death_dummy == 1)
alive <- subset(data, death_dummy == 0)

# Mean age of the people who died and of those who are alive
mean(dead$age, na.rm = TRUE)
## [1] 68.58621
mean(alive$age, na.rm = TRUE)
## [1] 48.07229
t.test(dead$age, alive$age, alternative = "two.sided", conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  dead$age and alive$age
## t = 10.839, df = 72.234, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  16.74114 24.28669
## sample estimates:
## mean of x mean of y 
##  68.58621  48.07229
After calculating the mean age of the people who died and of those who survived, we find a difference of about 20.5 years (68.6 versus 48.1). To check whether this difference is statistically significant, we run a two-sample t-test on the two age vectors.
By convention, we reject the null hypothesis when the p-value is below 0.05.
Here the p-value is below 2.2e-16, effectively 0, so we reject the null hypothesis of equal mean ages.
Hence the data support our claim: the people who died were, on average, older than those who survived, with a 95% confidence interval of 16.7 to 24.3 years for the difference.
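The output is labelled "Welch Two Sample t-test" because t.test() does not assume equal variances by default. As a rough sketch of what that test computes, the statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value can be reproduced by hand. The helper name welch_t and the toy vectors below are assumptions made for illustration only.

```r
# Hand-rolled Welch two-sample t-test (illustrative helper, not from the original)
welch_t <- function(x, y) {
  x <- x[!is.na(x)]                    # drop missing values, as t.test() does
  y <- y[!is.na(y)]
  vx <- var(x) / length(x)             # squared standard error of mean(x)
  vy <- var(y) / length(y)             # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- 2 * pt(-abs(t_stat), df)        # two-sided p-value
  c(t = t_stat, df = df, p.value = p)
}

# Toy ages, made up for the example
x <- c(70, 65, 82, NA, 59, 77)
y <- c(45, 30, 52, 61, 38, 49, NA, 27)

by_hand  <- welch_t(x, y)
built_in <- t.test(x, y, alternative = "two.sided", conf.level = 0.95)
by_hand
```

On the same input, by_hand should agree with the statistic, parameter and p.value fields of built_in up to floating-point error.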
GENDER
Claim: gender affects the death rate.
men <- subset(data, gender == "male")
women <- subset(data, gender == "female")

# Death rate among men and among women
mean(men$death_dummy, na.rm = TRUE)
## [1] 0.08461538
mean(women$death_dummy, na.rm = TRUE)
## [1] 0.03664921
# we can see that the death rate is about 8.5% for men versus 3.7% for women
# So is this statistically significant?
t.test(men$death_dummy, women$death_dummy, alternative = "two.sided", conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  men$death_dummy and women$death_dummy
## t = 3.084, df = 894.06, p-value = 0.002105
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.01744083 0.07849151
## sample estimates:
##  mean of x  mean of y 
## 0.08461538 0.03664921
After calculating the death rate by gender, we see about 8.5% for men and 3.7% for women, a gap of about 4.8 percentage points. To check whether this gap is statistically significant, we again run a t-test, this time on the death_dummy values of the two groups.
At the 95% confidence level, the death rate for men is 1.7 to 7.8 percentage points higher than for women.
The p-value is 0.002105 < 0.05, hence we reject the null hypothesis that gender has no effect on the death rate, and we refine the claim: in this dataset, men are more likely to die than women.
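Since death_dummy is a 0/1 indicator, a two-sample test of proportions is a common alternative to the t-test used above. The sketch below uses prop.test() from base R. The counts are an assumption reconstructed from the reported output: 520 men and 382 women from the describe() table, and 44 and 14 deaths obtained by multiplying those totals by the reported rates (0.08461538 and 0.03664921).

```r
# Counts reconstructed from the reported rates (an assumption, see above)
deaths <- c(men = 44, women = 14)
totals <- c(men = 520, women = 382)

# Two-sample test for equality of proportions (continuity-corrected by default)
res <- prop.test(x = deaths, n = totals,
                 alternative = "two.sided", conf.level = 0.95)

res$estimate   # sample death rates for men and women
res$p.value    # p-value for the null hypothesis of equal death rates
res$conf.int   # confidence interval for the difference in rates
```

The exact p-value and interval will differ somewhat from the t-test, because prop.test() uses a chi-squared statistic with a continuity correction, but both tests address the same question of whether the two death rates differ.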