Introduction

Cardiovascular disease is one of the leading causes of death among men and women of all racial and ethnic groups around the world. Heart failure occurs when the heart cannot pump enough blood to meet the body’s needs. According to the Centers for Disease Control and Prevention (United States), one person dies every 36 seconds in the United States from cardiovascular disease. In Australia, over one million people are living with heart disease, stroke or vascular conditions, according to the Australian Government Department of Health.

In the first part of the analysis, the aim was to examine the relationships between the variables considered for predicting and analysing which people are at high risk of cardiovascular disease. We analyse the records of 299 patients with heart failure from the dataset ‘heart_failure.csv’. The dataset was sourced from GitHub: vaksakalli/datasets. (2020). Retrieved 27 September 2020, from https://github.com/vaksakalli/datasets/blob/master/heart_failure.csv. These medical records contain the patients’ physical characteristics and certain laboratory test results. A logistic regression model can be fitted to predict whether a person suffering from heart failure will survive or not.

In this part of the analysis, the aim is to examine the probability of a patient’s survival status based on their medical records. For this analysis, a binomial logistic regression model was formulated. The response variable is the binary variable ‘Death Event of a Patient’ (DEATH_EVENT), which has two levels: 0 (‘No’, the patient survived) and 1 (‘Yes’, the patient died), as coded in the factor conversion in the code. It helps to predict the survival status of a patient following a cardiovascular event. The variables ‘Does Patient Suffer from Anaemia’, ‘Does Patient Suffer from Diabetes’, ‘Does Patient Suffer from High Blood Pressure’, ‘Gender of the Patient’ and ‘Does the Patient Smoke’ are used in the analysis. The significant predictors from the binomial logistic regression model are then analysed to compare the survival status of patients across the different levels of each predictor.
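
As a compact reference, the binomial logistic regression described above can be written as follows. The symbols are notation assumed here rather than taken from the analysis: p_i is the probability that patient i has DEATH_EVENT = 1, x_ij is the value of predictor j for that patient, k is the number of predictors, and the beta_j are the coefficients to be estimated.

```latex
\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
p_i = \frac{1}{1 + \exp\left(-\beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\right)}
```

For a binary predictor, exp(beta_j) is the odds ratio comparing its two levels with the other predictors held fixed.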

library(dplyr)
library(tidyr)
library(car)
library(knitr)
library(readr)
library(ResourceSelection)
library(ggplot2)
library(oddsratio)

** Loading the data file into RStudio

risk <- read_csv("~/R/ProjectGroup74_Data.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   age = col_double(),
##   anaemia = col_double(),
##   creatinine_phosphokinase = col_double(),
##   diabetes = col_double(),
##   ejection_fraction = col_double(),
##   high_blood_pressure = col_double(),
##   platelets = col_double(),
##   serum_creatinine = col_double(),
##   serum_sodium = col_double(),
##   sex = col_double(),
##   smoking = col_double(),
##   time = col_double(),
##   DEATH_EVENT = col_double()
## )
head(risk)
## Warning: `...` is not empty.
## 
## We detected these problematic arguments:
## * `needs_dots`
## 
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
## # A tibble: 6 x 13
##     age anaemia creatinine_phos~ diabetes ejection_fracti~ high_blood_pres~
##   <dbl>   <dbl>            <dbl>    <dbl>            <dbl>            <dbl>
## 1    75       0              582        0               20                1
## 2    55       0             7861        0               38                0
## 3    65       0              146        0               20                0
## 4    50       1              111        0               20                0
## 5    65       1              160        1               20                0
## 6    90       1               47        0               40                1
## # ... with 7 more variables: platelets <dbl>, serum_creatinine <dbl>,
## #   serum_sodium <dbl>, sex <dbl>, smoking <dbl>, time <dbl>, DEATH_EVENT <dbl>

** Checking structure of Dataset

str(risk)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 299 obs. of  13 variables:
##  $ age                     : num  75 55 65 50 65 90 75 60 65 80 ...
##  $ anaemia                 : num  0 0 0 1 1 1 1 1 0 1 ...
##  $ creatinine_phosphokinase: num  582 7861 146 111 160 ...
##  $ diabetes                : num  0 0 0 0 1 0 0 1 0 0 ...
##  $ ejection_fraction       : num  20 38 20 20 20 40 15 60 65 35 ...
##  $ high_blood_pressure     : num  1 0 0 0 0 1 0 0 0 1 ...
##  $ platelets               : num  265000 263358 162000 210000 327000 ...
##  $ serum_creatinine        : num  1.9 1.1 1.3 1.9 2.7 2.1 1.2 1.1 1.5 9.4 ...
##  $ serum_sodium            : num  130 136 129 137 116 132 137 131 138 133 ...
##  $ sex                     : num  1 1 1 1 0 1 1 1 0 1 ...
##  $ smoking                 : num  0 0 1 0 0 1 0 1 0 1 ...
##  $ time                    : num  4 6 7 7 8 8 10 10 10 10 ...
##  $ DEATH_EVENT             : num  1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   age = col_double(),
##   ..   anaemia = col_double(),
##   ..   creatinine_phosphokinase = col_double(),
##   ..   diabetes = col_double(),
##   ..   ejection_fraction = col_double(),
##   ..   high_blood_pressure = col_double(),
##   ..   platelets = col_double(),
##   ..   serum_creatinine = col_double(),
##   ..   serum_sodium = col_double(),
##   ..   sex = col_double(),
##   ..   smoking = col_double(),
##   ..   time = col_double(),
##   ..   DEATH_EVENT = col_double()
##   .. )

** Converting binary variables to factors

risk$anaemia <- factor(risk$anaemia, levels = c(0,1), labels = c("No","Yes"))
risk$diabetes <- factor(risk$diabetes, levels = c(0,1), labels = c("No","Yes"))
risk$high_blood_pressure <- factor(risk$high_blood_pressure, levels = c(0,1), labels = c("No","Yes"))
risk$sex <- factor(risk$sex, levels = c(0,1), labels = c("Female","Male"))
risk$smoking <- factor(risk$smoking, levels = c(0,1), labels = c("No","Yes"))
risk$DEATH_EVENT <- factor(risk$DEATH_EVENT,levels = c(0,1), labels = c("No","Yes"))
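
The No/Yes conversions above all follow one pattern, so they could also be written as a loop. The sketch below is only an illustration on a small made-up data frame (`toy`, its values and the helper names are hypothetical, not the study data). It recodes the 0/1 columns and cross-checks one of them against the raw codes.

```r
# Hypothetical toy data with the same 0/1 coding as the study variables
toy <- data.frame(anaemia = c(0, 1, 1, 0),
                  smoking = c(1, 0, 0, 0),
                  age     = c(75, 55, 65, 50))

# Keep a copy of the raw codes so the recoding can be checked afterwards
raw_anaemia <- toy$anaemia

# Recode every listed 0/1 column to a No/Yes factor
yes_no_cols <- c("anaemia", "smoking")
for (col in yes_no_cols) {
  toy[[col]] <- factor(toy[[col]], levels = c(0, 1), labels = c("No", "Yes"))
}

# Cross-tabulate raw codes against the new labels: 0 should pair only with "No"
check <- table(raw = raw_anaemia, label = toy$anaemia)
print(check)
```

A cross-tabulation like this is a cheap way to confirm that the labels were attached to the intended codes before any model is fitted.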

min(risk$age,na.rm = FALSE)
## [1] 40
max(risk$age,na.rm = FALSE)
## [1] 95
# Assign each patient to a 10-year age band (observed ages range from 40 to 95)
risk <- risk %>%
  mutate(Age_Group = case_when(
    age >= 35 & age <= 44 ~ '35 to 44 yrs',
    age >= 45 & age <= 54 ~ '45 to 54 yrs',
    age >= 55 & age <= 64 ~ '55 to 64 yrs',
    age >= 65 & age <= 74 ~ '65 to 74 yrs',
    age >= 75 & age <= 84 ~ '75 to 84 yrs',
    age >= 85 & age <= 95 ~ '85 and above'
  ))

# Store the age bands as a factor with levels in ascending order
risk$Age_Group <- factor(risk$Age_Group,
                         levels = c('35 to 44 yrs', '45 to 54 yrs', '55 to 64 yrs',
                                    '65 to 74 yrs', '75 to 84 yrs', '85 and above'))
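
An alternative way to build the same age bands is base R's `cut()`, sketched below on a few example ages (the vector `ages` is illustrative, not the study data). One difference is worth noting: with half-open intervals a non-integer age such as 64.5 is still assigned to a band, whereas the integer-bounded conditions above would leave it as NA.

```r
# Illustrative ages, including non-integer values
ages <- c(40, 60.667, 64.5, 85, 95)

band_labels <- c('35 to 44 yrs', '45 to 54 yrs', '55 to 64 yrs',
                 '65 to 74 yrs', '75 to 84 yrs', '85 and above')

# right = FALSE gives intervals [35,45), [45,55), ..., [85,Inf)
age_group <- cut(ages,
                 breaks = c(35, 45, 55, 65, 75, 85, Inf),
                 labels = band_labels,
                 right = FALSE)

print(data.frame(ages, age_group))
```

`cut()` returns a factor whose levels are already in ascending order, so no separate `factor()` call is needed in this variant.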

head(risk)
## Warning: `...` is not empty.
## 
## We detected these problematic arguments:
## * `needs_dots`
## 
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
## # A tibble: 6 x 14
##     age anaemia creatinine_phos~ diabetes ejection_fracti~ high_blood_pres~
##   <dbl> <fct>              <dbl> <fct>               <dbl> <fct>           
## 1    75 No                   582 No                     20 Yes             
## 2    55 No                  7861 No                     38 No              
## 3    65 No                   146 No                     20 No              
## 4    50 Yes                  111 No                     20 No              
## 5    65 Yes                  160 Yes                    20 No              
## 6    90 Yes                   47 No                     40 Yes             
## # ... with 8 more variables: platelets <dbl>, serum_creatinine <dbl>,
## #   serum_sodium <dbl>, sex <fct>, smoking <fct>, time <dbl>,
## #   DEATH_EVENT <fct>, Age_Group <fct>

** Summary of Data

mlr::summarizeColumns(risk) %>% knitr::kable(caption = "Summary of Dataset")
Summary of Dataset

| name | type | na | mean | disp | median | mad | min | max | nlevs |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| age | numeric | 0 | 60.83389 | 1.189481e+01 | 60.0 | 14.82600 | 40.0 | 95.0 | 0 |
| anaemia | factor | 0 | NA | 4.314381e-01 | NA | NA | 129.0 | 170.0 | 2 |
| creatinine_phosphokinase | numeric | 0 | 581.83946 | 9.702879e+02 | 250.0 | 269.83320 | 23.0 | 7861.0 | 0 |
| diabetes | factor | 0 | NA | 4.180602e-01 | NA | NA | 125.0 | 174.0 | 2 |
| ejection_fraction | numeric | 0 | 38.08361 | 1.183484e+01 | 38.0 | 11.86080 | 14.0 | 80.0 | 0 |
| high_blood_pressure | factor | 0 | NA | 3.511706e-01 | NA | NA | 105.0 | 194.0 | 2 |
| platelets | numeric | 0 | 263358.02926 | 9.780424e+04 | 262000.0 | 65234.40000 | 25100.0 | 850000.0 | 0 |
| serum_creatinine | numeric | 0 | 1.39388 | 1.034510e+00 | 1.1 | 0.29652 | 0.5 | 9.4 | 0 |
| serum_sodium | numeric | 0 | 136.62542 | 4.412477e+00 | 137.0 | 4.44780 | 113.0 | 148.0 | 0 |
| sex | factor | 0 | NA | 3.511706e-01 | NA | NA | 105.0 | 194.0 | 2 |
| smoking | factor | 0 | NA | 3.210702e-01 | NA | NA | 96.0 | 203.0 | 2 |
| time | numeric | 0 | 130.26087 | 7.761421e+01 | 115.0 | 105.26460 | 4.0 | 285.0 | 0 |
| DEATH_EVENT | factor | 0 | NA | 3.210702e-01 | NA | NA | 96.0 | 203.0 | 2 |
| Age_Group | factor | 0 | NA | 7.023411e-01 | NA | NA | 14.0 | 89.0 | 6 |

** Checking levels of Factor Variables

levels(risk$anaemia)
## [1] "No"  "Yes"
levels(risk$diabetes)
## [1] "No"  "Yes"
levels(risk$high_blood_pressure)
## [1] "No"  "Yes"
levels(risk$sex)
## [1] "Female" "Male"
levels(risk$smoking)
## [1] "No"  "Yes"
levels(risk$DEATH_EVENT)
## [1] "No"  "Yes"
levels(risk$Age_Group)
## [1] "35 to 44 yrs" "45 to 54 yrs" "55 to 64 yrs" "65 to 74 yrs" "75 to 84 yrs"
## [6] "85 and above"

Statistical Modelling

Model Fitting

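# Null model (intercept only) and full model (all available predictors)
model_one <- glm(DEATH_EVENT ~ 1, data = risk, family = 'binomial')
model_two <- glm(DEATH_EVENT ~ ., data = risk, family = 'binomial')
# Forward stepwise selection by AIC, starting from the null model
final_model <- step(model_one,
                    scope = list(lower = model_one,
                                 upper = model_two),
                    direction = "forward")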
## Start:  AIC=377.35
## DEATH_EVENT ~ 1
## 
##                            Df Deviance    AIC
## + time                      1   279.07 283.07
## + serum_creatinine          1   347.25 351.25
## + ejection_fraction         1   351.97 355.97
## + age                       1   355.99 359.99
## + Age_Group                 5   350.23 362.23
## + serum_sodium              1   364.02 368.02
## <none>                          375.35 377.35
## + high_blood_pressure       1   373.49 377.49
## + anaemia                   1   374.04 378.04
## + creatinine_phosphokinase  1   374.23 378.23
## + platelets                 1   374.61 378.61
## + smoking                   1   375.30 379.30
## + sex                       1   375.34 379.34
## + diabetes                  1   375.35 379.35
## 
## Step:  AIC=283.07
## DEATH_EVENT ~ time
## 
##                            Df Deviance    AIC
## + ejection_fraction         1   256.08 262.08
## + serum_creatinine          1   259.64 265.64
## + serum_sodium              1   269.83 275.83
## + age                       1   271.46 277.46
## + Age_Group                 5   267.46 281.46
## <none>                          279.07 283.07
## + creatinine_phosphokinase  1   277.90 283.90
## + platelets                 1   277.92 283.92
## + smoking                   1   278.81 284.81
## + high_blood_pressure       1   278.96 284.96
## + sex                       1   279.02 285.02
## + diabetes                  1   279.06 285.06
## + anaemia                   1   279.07 285.07
## 
## Step:  AIC=262.08
## DEATH_EVENT ~ time + ejection_fraction
## 
##                            Df Deviance    AIC
## + serum_creatinine          1   235.41 243.41
## + age                       1   244.51 252.51
## + Age_Group                 5   240.59 256.59
## + serum_sodium              1   249.73 257.73
## <none>                          256.08 262.08
## + sex                       1   254.98 262.98
## + smoking                   1   255.20 263.20
## + platelets                 1   255.22 263.22
## + creatinine_phosphokinase  1   255.33 263.33
## + high_blood_pressure       1   255.93 263.93
## + diabetes                  1   256.05 264.05
## + anaemia                   1   256.08 264.08
## 
## Step:  AIC=243.41
## DEATH_EVENT ~ time + ejection_fraction + serum_creatinine
## 
##                            Df Deviance    AIC
## + age                       1   226.30 236.30
## + Age_Group                 5   221.91 239.91
## + serum_sodium              1   232.02 242.02
## <none>                          235.41 243.41
## + creatinine_phosphokinase  1   234.63 244.63
## + sex                       1   234.69 244.69
## + platelets                 1   234.90 244.90
## + smoking                   1   235.20 245.20
## + diabetes                  1   235.33 245.33
## + high_blood_pressure       1   235.41 245.41
## + anaemia                   1   235.41 245.41
## 
## Step:  AIC=236.3
## DEATH_EVENT ~ time + ejection_fraction + serum_creatinine + age
## 
##                            Df Deviance    AIC
## + serum_sodium              1   223.49 235.49
## <none>                          226.30 236.30
## + sex                       1   225.08 237.08
## + creatinine_phosphokinase  1   225.12 237.12
## + diabetes                  1   225.87 237.87
## + platelets                 1   225.93 237.93
## + smoking                   1   225.95 237.95
## + high_blood_pressure       1   226.27 238.27
## + anaemia                   1   226.28 238.28
## + Age_Group                 5   221.89 241.89
## 
## Step:  AIC=235.49
## DEATH_EVENT ~ time + ejection_fraction + serum_creatinine + age + 
##     serum_sodium
## 
##                            Df Deviance    AIC
## <none>                          223.49 235.49
## + sex                       1   222.04 236.04
## + creatinine_phosphokinase  1   222.18 236.18
## + smoking                   1   223.09 237.09
## + diabetes                  1   223.25 237.25
## + platelets                 1   223.26 237.26
## + high_blood_pressure       1   223.46 237.46
## + anaemia                   1   223.48 237.48
## + Age_Group                 5   219.72 241.72
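
The AIC values in the trace above can be reproduced by hand. Because the response is binary with one observation per patient, the AIC reported by `step()` equals the residual deviance plus twice the number of estimated coefficients. The sketch below checks this against three rows copied from the trace. The helper name `aic_from_deviance` and the variable names are made up for this illustration.

```r
# AIC = residual deviance + 2 * (number of estimated coefficients)
# (holds here because the response is binary, one observation per patient)
aic_from_deviance <- function(deviance, n_coef) deviance + 2 * n_coef

# Deviances copied from the step() trace
aic_null  <- aic_from_deviance(375.35, n_coef = 1)  # intercept only
aic_time  <- aic_from_deviance(279.07, n_coef = 2)  # intercept + time
aic_final <- aic_from_deviance(223.49, n_coef = 6)  # intercept + 5 predictors

print(c(null = aic_null, time = aic_time, final = aic_final))
```

The same arithmetic shows why Age_Group is never selected: it costs five coefficients, so its AIC is higher than that of age at each step even where its deviance is lower.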
summary(final_model)
## 
## Call:
## glm(formula = DEATH_EVENT ~ time + ejection_fraction + serum_creatinine + 
##     age + serum_sodium, family = "binomial", data = risk)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1590  -0.5888  -0.2281   0.5144   2.7959  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        9.493034   5.405768   1.756  0.07907 .  
## time              -0.020895   0.002916  -7.166 7.74e-13 ***
## ejection_fraction -0.073430   0.015785  -4.652 3.29e-06 ***
## serum_creatinine   0.685990   0.174044   3.941 8.10e-05 ***
## age                0.042466   0.015030   2.825  0.00472 ** 
## serum_sodium      -0.064557   0.038377  -1.682  0.09254 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 375.35  on 298  degrees of freedom
## Residual deviance: 223.49  on 293  degrees of freedom
## AIC: 235.49
## 
## Number of Fisher Scoring iterations: 6
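
Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to interpret. The sketch below uses the estimates and standard errors copied from the summary above and adds approximate 95% Wald intervals (estimate ± 1.96 × standard error). The object names are illustrative; with the fitted object available, `exp(coef(final_model))` would give the same point estimates.

```r
# Estimates and standard errors copied from summary(final_model)
est <- c(time = -0.020895, ejection_fraction = -0.073430,
         serum_creatinine = 0.685990, age = 0.042466, serum_sodium = -0.064557)
se  <- c(time = 0.002916, ejection_fraction = 0.015785,
         serum_creatinine = 0.174044, age = 0.015030, serum_sodium = 0.038377)

# Odds ratio per one-unit increase, with approximate 95% Wald limits
odds_ratios <- data.frame(
  odds_ratio = exp(est),
  lower_95   = exp(est - 1.96 * se),
  upper_95   = exp(est + 1.96 * se)
)
print(round(odds_ratios, 3))
```

An odds ratio above 1 (serum_creatinine, age) means higher values are associated with higher odds of a death event, and a ratio below 1 (time, ejection_fraction) with lower odds. The interval for serum_sodium spans 1, consistent with its p-value of 0.09.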
final_model$coefficients
##       (Intercept)              time ejection_fraction  serum_creatinine 
##        9.49303414       -0.02089486       -0.07342996        0.68599031 
##               age      serum_sodium 
##        0.04246630       -0.06455724
final_model$residuals
##          1          2          3          4          5          6          7 
##   1.017654   1.411569   1.040665   1.085390   1.006866   1.043732   1.035209 
##          8          9         10         11         12         13         14 
##   2.318037   2.837611   1.000344   1.018955   1.190020   1.414152   1.602653 
##         15         16         17         18         19         20         21 
##  -3.453061   1.318497   1.185609   1.087730   1.140229   1.511043  -9.420945 
##         22         23         24         25         26         27         28 
##   1.142225   1.378260  -1.275774   1.074332   1.196605   1.154780   1.481879 
##         29         30         31         32         33         34         35 
##   1.019244   1.079600   1.064890   1.063948   1.413165  -3.058110   2.456635 
##         36         37         38         39         40         41         42 
##   1.050965   1.349247   1.997885  -7.991555   1.219881   1.052129   1.454654 
##         43         44         45         46         47         48         49 
##   1.725040  -2.252196   5.335578   1.505178   1.285358   2.085918   1.006756 
##         50         51         52         53         54         55         56 
##   1.618099   1.240844   1.255739   1.160534   2.094040   1.273070   1.043713 
##         57         58         59         60         61         62         63 
##  -6.532160  -1.593443   1.394210   1.140688   1.973254   1.579124  -1.807407 
##         64         65         66         67         68         69         70 
##   4.050872  -1.021892   1.048522   1.394946   1.366166   1.403764   1.210604 
##         71         72         73         74         75         76         77 
##  -1.199265  -1.635356   1.303424  -1.416804   1.201533   1.574747  -1.138683 
##         78         79         80         81         82         83         84 
##  -1.241322  -2.344529  -1.236334  -2.415893  -1.384387   1.166997  -2.215607 
##         85         86         87         88         89         90         91 
##   1.890104  -1.108479  -1.617661  -1.101027  -1.179137  -1.891827  -1.476516 
##         92         93         94         95         96         97         98 
##  -1.433854  -1.056279   1.389520  -1.284943  -1.074231  -3.315221  -1.175278 
##         99        100        101        102        103        104        105 
##  -2.503876  -1.427770  -2.286287  -1.619360  -3.003874  -1.334153  -1.332124 
##        106        107        108        109        110        111        112 
##   1.464755  -1.281684  -1.273546  -1.746620  -1.278004   3.415807  -1.598131 
##        113        114        115        116        117        118        119 
##  -2.243416   6.889758  -1.897051  -1.310361  -1.072187  -2.019207  -1.079088 
##        120        121        122        123        124        125        126 
##   1.377390  -1.130545  -1.563642  -1.284262  -1.410530   1.373290  -1.110509 
##        127        128        129        130        131        132        133 
##   1.213489  -1.054024  -1.553177  -1.952472  -1.036677 -10.285516  -1.171815 
##        134        135        136        137        138        139        140 
##  -1.052752  -2.096509  -1.588713  -1.073156  -4.805649  -1.451787  -1.300201 
##        141        142        143        144        145        146        147 
##   2.076399  -1.161746  -1.463822  -1.194498   1.555295  -1.212428  -1.319361 
##        148        149        150        151        152        153        154 
##  -1.066245   1.487395  -1.334761   2.169881  -1.045931  -1.112267  -1.228765 
##        155        156        157        158        159        160        161 
##  -1.348164  -1.840338  -1.302538  -1.440177  -1.424862  -1.128356  -1.368930 
##        162        163        164        165        166        167        168 
##  -1.120155  -1.208919   9.467110   6.546604   2.552355  -1.028216   1.494184 
##        169        170        171        172        173        174        175 
##  -1.160086  -1.309917  -1.184190  -1.066724  -1.027406  -1.265122  -1.203049 
##        176        177        178        179        180        181        182 
##  -1.026296  -1.198936  -1.029564  -1.021353  -1.064707  -1.103721   3.901498 
##        183        184        185        186        187        188        189 
##   3.131468   2.541868   5.275709   5.294704  49.820074   4.470562  -1.077648 
##        190        191        192        193        194        195        196 
##  -1.015468  -1.591255  -1.029692  -1.055593  -1.177540   5.234647  12.341878 
##        197        198        199        200        201        202        203 
##  -1.037375  -1.111235  -1.182068  -1.621470  -1.033826  -1.006860  -1.016276 
##        204        205        206        207        208        209        210 
##  -1.922290  -1.090421  -1.024994  -1.014896  -1.172492  -1.039951  -1.064110 
##        211        212        213        214        215        216        217 
##  -1.447239  -1.004351  -1.059225   8.382024  -1.084400  -1.163596  -1.022677 
##        218        219        220        221        222        223        224 
##   2.127165  -1.110618  -1.032861   2.503832  -1.014946  -1.024425  -1.070754 
##        225        226        227        228        229        230        231 
##  -1.091995  -1.044135  -1.166479  -1.033449  -4.095248  -1.237435   5.570897 
##        232        233        234        235        236        237        238 
##  -1.095695  -1.018829  -1.036973  -1.018000  -1.034550  -1.015601  -1.161458 
##        239        240        241        242        243        244        245 
##  -1.042190  -1.019854  -1.073113  -1.103595  -1.021639  -1.037632  -1.056664 
##        246        247        248        249        250        251        252 
##  -1.047767  14.913772  -1.311853  -1.013154  -1.039392  -1.025009  -1.019908 
##        253        254        255        256        257        258        259 
##  -1.016899  -1.100748  -1.004331  -1.030317  -1.052806  -1.019692  -1.033241 
##        260        261        262        263        264        265        266 
##  -1.006809  -1.014997  -1.022693   7.424996  -1.004848  -1.019481  -1.012412 
##        267        268        269        270        271        272        273 
##   8.931869  -1.020458  -1.007407  -1.013403  -1.039378  -1.012155  -1.035642 
##        274        275        276        277        278        279        280 
##  -1.004807  -1.032081  -1.008221  -1.026354  -1.023560  -1.017841  -1.018504 
##        281        282        283        284        285        286        287 
##  -1.022575  -1.072596  -1.164218  -1.019940  -1.006107  -1.014636  -1.026660 
##        288        289        290        291        292        293        294 
##  -1.003366  -1.016936  -1.030099  -1.001418  -1.019585  -1.007971  -1.014911 
##        295        296        297        298        299 
##  -1.008370  -1.008443  -1.000769  -1.004920  -1.004867
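
Note that the `$residuals` component of a `glm` object holds the working residuals from the final fitting iteration, not the deviance residuals shown in the model summary, which is why values as large as 49.82 appear. For a binomial model with a logit link the working residual is (y - mu) / (mu * (1 - mu)). The sketch below reproduces two of the printed values from the fitted probabilities the model reports for the same patients. Patient 1 has DEATH_EVENT = 1 in the data, and the same is inferred for patient 187 from its positive residual; the helper name `working_residual` is hypothetical.

```r
# Working residual for a logit-link binomial GLM
working_residual <- function(y, mu) (y - mu) / (mu * (1 - mu))

# Fitted probabilities copied from final_model$fitted.values
r1   <- working_residual(y = 1, mu = 0.9826518339)  # patient 1
r187 <- working_residual(y = 1, mu = 0.0200722304)  # patient 187

print(c(patient_1 = r1, patient_187 = r187))

# Deviance or Pearson residuals are usually preferred for diagnostics, e.g.
# residuals(final_model, type = "deviance")
```

A large working residual therefore flags a patient whose outcome the model considered very unlikely, here a death with a fitted probability of about 0.02.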
final_model$fitted.values
##            1            2            3            4            5            6 
## 0.9826518339 0.7084312920 0.9609238764 0.9213277502 0.9931804525 0.9581000445 
##            7            8            9           10           11           12 
## 0.9659885517 0.4313994974 0.3524091133 0.9996556600 0.9813975072 0.8403218170 
##           13           14           15           16           17           18 
## 0.7071376317 0.6239655388 0.7104018673 0.7584392886 0.8434483918 0.9193458897 
##           19           20           21           22           23           24 
## 0.8770169306 0.6617944261 0.8938535364 0.8754844882 0.7255526185 0.2161623776 
##           25           26           27           28           29           30 
## 0.9308112200 0.8356973657 0.8659655578 0.6748190824 0.9811191910 0.9262687613 
##           31           32           33           34           35           36 
## 0.9390639307 0.9398952837 0.7076313370 0.6730006900 0.4070608714 0.9515061862 
##           37           38           39           40           41           42 
## 0.7411543604 0.5005292468 0.8748679068 0.8197523148 0.9504534707 0.6874486126 
##           43           44           45           46           47           48 
## 0.5796966257 0.5559889743 0.1874211070 0.6643732333 0.7779934925 0.4794051349 
##           49           50           51           52           53           54 
## 0.9932891332 0.6180091519 0.8059029943 0.7963440941 0.8616726198 0.4775458579 
##           55           56           57           58           59           60 
## 0.7855028311 0.9581180184 0.8469112806 0.3724281338 0.7172518820 0.8766642836 
##           61           62           63           64           65           66 
## 0.5067772232 0.6332623261 0.4467212731 0.2468604445 0.0214231088 0.9537236907 
##           67           68           69           70           71           72 
## 0.7168735346 0.7319755764 0.7123705434 0.8260336160 0.1661561334 0.3885123707 
##           73           74           75           76           77           78 
## 0.7672102471 0.2941860454 0.8322701660 0.6350224518 0.1217925094 0.1944069575 
##           79           80           81           82           83           84 
## 0.5734750855 0.1911569538 0.5860744175 0.2776587159 0.8569001169 0.5486564820 
##           85           86           87           88           89           90 
## 0.5290713505 0.0978630377 0.3818236385 0.0917568623 0.1519220935 0.4714103576 
##           91           92           93           94           95           96 
## 0.3227298082 0.3025787513 0.0532805606 0.7196730094 0.2217550825 0.0691015807 
##           97           98           99          100          101          102 
## 0.6983610286 0.1491375468 0.6006192505 0.2996069650 0.5626095891 0.3824721554 
##          103          104          105          106          107          108 
## 0.6670965989 0.2504607070 0.2493191330 0.6827080691 0.2197763297 0.2147905691 
##          109          110          111          112          113          114 
## 0.4274655336 0.2175298218 0.2927566009 0.3742691933 0.5542511027 0.1451429717 
##          115          116          117          118          119          120 
## 0.4728659430 0.2368517756 0.0673269354 0.5047559534 0.0732915452 0.7260110019 
##          121          122          123          124          125          126 
## 0.1154706083 0.3604676071 0.2213425870 0.2910463805 0.7281784898 0.0995123264 
##          127          128          129          130          131          132 
## 0.8240700981 0.0512546648 0.3561582448 0.4878288477 0.0353791907 0.9027758988 
##          133          134          135          136          137          138 
## 0.1466227941 0.0501088551 0.5230166795 0.3705598598 0.0681694002 0.7919115502 
##          139          140          141          142          143          144 
## 0.3111938817 0.2308883275 0.4816029144 0.1392267366 0.3168568352 0.1628280473 
##          145          146          147          148          149          150 
## 0.6429648430 0.1752087478 0.2420572807 0.0621294572 0.6723162987 0.2508020275 
##          151          152          153          154          155          156 
## 0.4608548535 0.0439138922 0.1009355915 0.1861744706 0.2582505598 0.4566216285 
##          157          158          159          160          161          162 
## 0.2322679856 0.3056409757 0.2981778465 0.1137548286 0.2695023317 0.1072663028 
##          163          164          165          166          167          168 
## 0.1728150402 0.1056288521 0.1527509622 0.3917950881 0.0274413007 0.6692614106 
##          169          170          171          172          173          174 
## 0.1379948692 0.2365930632 0.1555408767 0.0625504636 0.0266752066 0.2095625209 
##          175          176          177          178          179          180 
## 0.1687789520 0.0256220836 0.1659271591 0.0287154653 0.0209069876 0.0607746139 
##          181          182          183          184          185          186 
## 0.0939743279 0.2563117914 0.3193390678 0.3934115276 0.1895479773 0.1888679616 
##          187          188          189          190          191          192 
## 0.0200722304 0.2236855277 0.0720530224 0.0152322516 0.3715653028 0.0288360569 
##          193          194          195          196          197          198 
## 0.0526651557 0.1507718701 0.1910348345 0.0810249444 0.0360287550 0.1001006388 
##          199          200          201          202          203          204 
## 0.1540250239 0.3832758589 0.0327190766 0.0068134553 0.0160151526 0.4797872447 
##          205          206          207          208          209          210 
## 0.0829228258 0.0243842096 0.0146778294 0.1471160255 0.0384160513 0.0602475616 
##          211          212          213          214          215          216 
## 0.3090289287 0.0043321759 0.0559131158 0.1193029329 0.0778311491 0.1405951121 
##          217          218          219          220          221          222 
## 0.0221743335 0.4701093138 0.0996001546 0.0318152664 0.3993878821 0.0147255265 
##          223          224          225          226          227          228 
## 0.0238422585 0.0660787809 0.0842445492 0.0422692499 0.1427193845 0.0323663787 
##          229          230          231          232          233          234 
## 0.7558145424 0.1918769560 0.1795042923 0.0873375086 0.0184813355 0.0356545587 
##          235          236          237          238          239          240 
## 0.0176818008 0.0333964753 0.0153616498 0.1390129312 0.0404821162 0.0194673491 
##          241          242          243          244          245          246 
## 0.0681313942 0.0938707028 0.0211804444 0.0362668586 0.0536252867 0.0455892744 
##          247          248          249          250          251          252 
## 0.0670521196 0.2377194775 0.0129835800 0.0378986548 0.0243989930 0.0195189623 
##          253          254          255          256          257          258 
## 0.0166178968 0.0915265434 0.0043125179 0.0294249569 0.0501570358 0.0193116951 
##          259          260          261          262          263          264 
## 0.0321711470 0.0067630330 0.0147750511 0.0221899321 0.1346802078 0.0048247437 
##          265          266          267          268          269          270 
## 0.0191085604 0.0122598592 0.1119586474 0.0200481117 0.0073528299 0.0132257621 
##          271          272          273          274          275          276 
## 0.0378860103 0.0120091009 0.0344154340 0.0047841953 0.0310841423 0.0081535718 
##          277          278          279          280          281          282 
## 0.0256768309 0.0230172683 0.0175279065 0.0181674064 0.0220767464 0.0676822223 
##          283          284          285          286          287          288 
## 0.1410544395 0.0195503001 0.0060697844 0.0144248516 0.0259679124 0.0033551046 
##          289          290          291          292          293          294 
## 0.0166536995 0.0292192205 0.0014161604 0.0192089831 0.0079078897 0.0146920998 
##          295          296          297          298          299 
## 0.0083005108 0.0083721761 0.0007682252 0.0048956042 0.0048436298
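
Each fitted value above is the inverse logit of that patient's linear predictor, that is, the intercept plus the coefficient-weighted sum of the patient's measurements. As a worked check, the sketch below recomputes the first fitted value from the coefficients printed by `final_model$coefficients` and the first patient's measurements shown in the `str(risk)` output (time 4, ejection fraction 20, serum creatinine 1.9, age 75, serum sodium 130). The object names are illustrative.

```r
# Coefficients copied from final_model$coefficients
coefs <- c(intercept = 9.49303414, time = -0.02089486,
           ejection_fraction = -0.07342996, serum_creatinine = 0.68599031,
           age = 0.04246630, serum_sodium = -0.06455724)

# First patient's measurements; the leading 1 multiplies the intercept
patient_1 <- c(1, time = 4, ejection_fraction = 20, serum_creatinine = 1.9,
               age = 75, serum_sodium = 130)

linear_predictor <- sum(coefs * patient_1)    # log-odds of a death event
fitted_prob      <- plogis(linear_predictor)  # inverse logit: 1 / (1 + exp(-x))

print(c(linear_predictor = linear_predictor, fitted_prob = fitted_prob))
```

The result agrees with the first printed fitted value (about 0.9827) up to rounding of the printed coefficients.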
final_model$linear.predictors
##            1            2            3            4            5            6 
##  4.036768072  0.887777395  3.202383561  2.460525351  4.981119257  3.129667439 
##            7            8            9           10           11           12 
##  3.346454800 -0.276143514 -0.608466395  7.973536532  3.965681990  1.660624500 
##           13           14           15           16           17           18 
##  0.881522551  0.506414349  0.897336608  1.144141920  1.684112997  2.433492659 
##           19           20           21           22           23           24 
##  1.964479598  0.671301056  2.130722060  1.950347133  0.972174043 -1.288172011 
##           25           26           27           28           29           30 
##  2.599217774  1.626556486  1.865748338  0.730062932  3.950547945  2.530737859 
##           31           32           33           34           35           36 
##  2.735058290  2.749680156  0.883907697  0.721788294 -0.376129008  2.976609949 
##           37           38           39           40           41           42 
##  1.051977019  0.002116988  1.944702986  1.514670305  2.954026994  0.788218186 
##           43           44           45           46           47           48 
##  0.321528137  0.224899069 -1.466855014  0.682844361  1.254011465 -0.082426096 
##           49           50           51           52           53           54 
##  4.997293668  0.481106616  1.423605317  1.363599538  1.829252211 -0.089877021 
##           55           56           57           58           59           60 
##  1.298027522  3.130115261  1.710578325 -0.521814102  0.930870620  1.961214079 
##           61           62           63           64           65           66 
##  0.027110553  0.546237946 -0.213927051 -1.115427368 -3.821629174  3.025743846 
##           67           68           69           70           71           72 
##  0.929005772  1.004669039  0.906925159  1.557773385 -1.613118265 -0.453569713 
##           73           74           75           76           77           78 
##  1.192625182 -0.875139308  1.601802553  0.553824515 -1.975564034 -1.421625024 
##           79           80           81           82           83           84 
##  0.296043712 -1.442510050  0.347760569 -0.956105001  1.789778492  0.195243799 
##           85           86           87           88           89           90 
##  0.116416705 -2.221197425 -0.481814969 -2.292369838 -1.719604656 -0.114483445 
##           91           92           93           94           95           96 
## -0.741254829 -0.835048024 -2.877431239  0.942840210 -1.255467738 -2.600572555 
##           97           98           99          100          101          102 
##  0.839505359 -1.741381473  0.408045986 -0.849170157  0.251759755 -0.479068306 
##          103          104          105          106          107          108 
##  0.695082500 -1.096156692 -1.102246883  0.766245076 -1.266970289 -1.296287017 
##          109          110          111          112          113          114 
## -0.292199275 -1.280119858 -0.882033327 -0.513944951  0.217862055 -1.773214968 
##          115          116          117          118          119          120 
## -0.108642964 -1.170017751 -2.628494341  0.019024387 -2.537193754  0.974477216 
##          121          122          123          124          125          126 
## -2.036039719 -0.573335176 -1.257859503 -0.890307470  0.985400558 -2.202654956 
##          127          128          129          130          131          132 
##  1.544169966 -2.918333778 -0.592077834 -0.048694229 -3.305611265  2.228455713 
##          133          134          135          136          137          138 
## -1.761338399 -2.942149652  0.092131833 -0.529815761 -2.615155252  1.336486478 
##          139          140          141          142          143          144 
## -0.794543704 -1.203302014 -0.073621578 -1.821727326 -0.768254402 -1.637334769 
##          145          146          147          148          149          150 
##  0.588255790 -1.549152219 -1.141433420 -2.714391698  0.718680095 -1.094339372 
##          151          152          153          154          155          156 
## -0.156901681 -3.080617258 -2.186872071 -1.475061758 -1.055081229 -0.173950792 
##          157          158          159          160          161          162 
## -1.195548917 -0.820578022 -0.855989921 -2.052948123 -0.997148997 -2.118973771 
##          163          164          165          166          167          168 
## -1.565806431 -2.136189289 -1.713185777 -0.439772882 -3.567881228  0.704846429 
##          169          170          171          172          173          174 
## -1.832044718 -1.171449595 -1.691787762 -2.707189281 -3.596983290 -1.327564439 
##          175          176          177          178          179          180 
## -1.594305880 -3.638344613 -1.614771846 -3.521183620 -3.846543207 -2.737883311 
##          181          182          183          184          185          186 
## -2.266046004 -1.065227234 -0.756810838 -0.432994383 -1.452949974 -1.457382695 
##          187          188          189          190          191          192 
## -3.888141578 -1.244316515 -2.555572324 -4.168990829 -0.525507490 -3.516868713 
##          193          194          195          196          197          198 
## -2.889698560 -1.728559954 -1.443300066 -2.428501916 -3.286744096 -2.196106869 
##          199          200          201          202          203          204 
## -1.703374698 -0.475666839 -3.386530674 -4.982019123 -4.118075189 -0.080895107 
##          205          206          207          208          209          210 
## -2.403281263 -3.689133077 -4.206630514 -1.757401953 -3.220106496 -2.747154377 
##          211          212          213          214          215          216 
## -0.804663065 -5.437343760 -2.826419219 -1.999047803 -2.472186617 -1.810355944 
##          217          218          219          220          221          222 
## -3.786395929 -0.119705482 -2.201675221 -3.415476660 -0.408016251 -4.203337773 
##          223          224          225          226          227          228 
## -3.712164621 -2.648544406 -2.386025485 -3.120506816 -1.792884948 -3.397733336 
##          229          230          231          232          233          234 
##  1.129868023 -1.437860017 -1.519709559 -2.346586120 -3.972339701 -3.297572561 
##          235          236          237          238          239          240 
## -4.017379382 -3.365338040 -4.160400285 -1.823512527 -3.165570654 -3.919357288 
##          241          242          243          244          245          246 
## -2.615753715 -2.267263675 -3.833268989 -3.279910095 -2.870617871 -3.041421630 
##          247          248          249          250          251          252 
## -2.632879115 -1.165223315 -4.331001190 -3.234204174 -3.688511836 -3.916656888 
##          253          254          255          256          257          258 
## -4.080517521 -2.295136649 -5.441911493 -3.496045536 -2.941137871 -3.927543801 
##          259          260          261          262          263          264 
## -3.403985273 -4.989497818 -4.199929965 -3.785676768 -1.860196003 -5.329161254 
##          265          266          267          268          269          270 
## -3.938325369 -4.389089202 -2.070888726 -3.889368507 -4.905290027 -4.312274676 
##          271          272          273          274          275          276 
## -3.234551012 -4.410008718 -3.334228561 -5.337641768 -3.439479979 -4.801112199 
##          277          278          279          280          281          282 
## -3.636153981 -3.748224246 -4.026277672 -3.989791690 -3.790906340 -2.622850169 
##          283          284          285          286          287          288 
## -1.806559633 -3.915020706 -5.098343911 -4.224272855 -3.624582610 -5.693911592 
##          289          290          291          292          293          294 
## -4.078328964 -3.503273945 -6.558388862 -3.932981366 -4.831954995 -4.205644260 
##          295          296          297          298          299 
## -4.783103066 -4.774434020 -7.170659072 -5.314509946 -5.325235478
final_model$deviance
## [1] 223.4863
final_model$aic
## [1] 235.4863
final_model$null.deviance
## [1] 375.3488
final_model$iter
## [1] 6
final_model$df.residual
## [1] 293
final_model$df.null
## [1] 298
final_model
## 
## Call:  glm(formula = DEATH_EVENT ~ time + ejection_fraction + serum_creatinine + 
##     age + serum_sodium, family = "binomial", data = risk)
## 
## Coefficients:
##       (Intercept)               time  ejection_fraction   serum_creatinine  
##           9.49303           -0.02089           -0.07343            0.68599  
##               age       serum_sodium  
##           0.04247           -0.06456  
## 
## Degrees of Freedom: 298 Total (i.e. Null);  293 Residual
## Null Deviance:       375.3 
## Residual Deviance: 223.5     AIC: 235.5
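
As a quick consistency check, the reported AIC can be recovered from the residual deviance. For a 0/1 response, the residual deviance equals -2 times the log-likelihood, so AIC = deviance + 2k, where k is the number of estimated coefficients. The sketch below only re-uses the numbers printed above; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the printed model output (not recomputed from the data)
residual_deviance <- 223.4863
n_coefficients <- 6   # intercept + 5 predictors

# For a binary (0/1) response: AIC = residual deviance + 2 * number of coefficients
aic_check <- residual_deviance + 2 * n_coefficients
aic_check
```

The result agrees with the value returned by final_model$aic (235.4863).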

Final Model: logit(Probability of Death by Cardiovascular Disease) = 9.49303 - 0.02089 x time - 0.07343 x ejection_fraction + 0.68599 x serum_creatinine + 0.04247 x age - 0.06456 x serum_sodium
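
To show how the fitted equation turns into a probability, the sketch below evaluates the linear predictor for one illustrative patient and applies the inverse logit. The coefficients are the rounded values from the equation; the patient values and the names b, x, logit and prob are assumptions made for this example.

```r
# Coefficients copied (rounded) from the final model equation
b <- c(intercept = 9.49303, time = -0.02089, ejection_fraction = -0.07343,
       serum_creatinine = 0.68599, age = 0.04247, serum_sodium = -0.06456)

# Illustrative patient (assumed values, not a record from the dataset)
x <- c(intercept = 1, time = 10, ejection_fraction = 60,
       serum_creatinine = 1.9, age = 45, serum_sodium = 130)

logit <- sum(b * x)      # linear predictor on the log-odds scale
prob  <- plogis(logit)   # inverse logit: 1 / (1 + exp(-logit))
round(c(logit = logit, probability = prob), 3)
```

For this patient the log-odds are about -0.30, which corresponds to a predicted probability of death of roughly 0.43.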

Residual Analysis

summary(final_model)
## 
## Call:
## glm(formula = DEATH_EVENT ~ time + ejection_fraction + serum_creatinine + 
##     age + serum_sodium, family = "binomial", data = risk)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1590  -0.5888  -0.2281   0.5144   2.7959  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        9.493034   5.405768   1.756  0.07907 .  
## time              -0.020895   0.002916  -7.166 7.74e-13 ***
## ejection_fraction -0.073430   0.015785  -4.652 3.29e-06 ***
## serum_creatinine   0.685990   0.174044   3.941 8.10e-05 ***
## age                0.042466   0.015030   2.825  0.00472 ** 
## serum_sodium      -0.064557   0.038377  -1.682  0.09254 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 375.35  on 298  degrees of freedom
## Residual deviance: 223.49  on 293  degrees of freedom
## AIC: 235.49
## 
## Number of Fisher Scoring iterations: 6
par(mfrow = c(2,2))
plot(final_model)

plot(density(resid(final_model, type='response')))
lines(density(resid(final_model, type='response')), col='red')

plot(density(resid(final_model, type='pearson')))
lines(density(resid(final_model, type='pearson')), col='red')

plot(density(rstandard(final_model, type='pearson')))
lines(density(rstandard(final_model, type='pearson')), col='red')

In the Residuals vs Fitted plot, the predicted values lie mostly between -4 and 1, with some outliers beyond 1, and the values show an approximately linear pattern. In the Normal Q-Q plot, the values range from -3 to 3 and follow a roughly linear trend. In the Scale-Location plot, the predicted values lie between -4 and 2, with some outliers beyond 2; the trend of the square root of the standardised deviance residuals against the predicted values passes through roughly 0 on the x-axis and 1 on the y-axis. In the Residuals vs Leverage plot, most leverage values lie between 0.00 and 0.05, with a few outliers beyond 0.05.
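
To complement the visual reading of the Residuals vs Leverage plot, influential observations can also be flagged numerically. The sketch below is illustrative only: it fits a logistic model to simulated data (not the heart failure records), and the 4/n cut-off for Cook's distance is a common rule of thumb rather than a threshold used in this report.

```r
# Simulated data stand in for the real records (assumption for illustration)
set.seed(42)
n <- 150
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.4 - 0.9 * x))
sim_model <- glm(y ~ x, family = binomial)

lev     <- hatvalues(sim_model)                     # leverage of each observation
std_res <- rstandard(sim_model, type = "pearson")   # standardised Pearson residuals
cook    <- cooks.distance(sim_model)                # influence on the fitted coefficients

# Rule-of-thumb flag: Cook's distance above 4 / n
flagged <- which(cook > 4 / n)
head(data.frame(leverage = lev, std_res = std_res, cook = cook)[flagged, ])
```

The same three functions can be applied to final_model to list the observations that appear as outliers in the diagnostic plots.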

Response Analysis

summary(final_model)
## 
## Call:
## glm(formula = DEATH_EVENT ~ time + ejection_fraction + serum_creatinine + 
##     age + serum_sodium, family = "binomial", data = risk)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1590  -0.5888  -0.2281   0.5144   2.7959  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        9.493034   5.405768   1.756  0.07907 .  
## time              -0.020895   0.002916  -7.166 7.74e-13 ***
## ejection_fraction -0.073430   0.015785  -4.652 3.29e-06 ***
## serum_creatinine   0.685990   0.174044   3.941 8.10e-05 ***
## age                0.042466   0.015030   2.825  0.00472 ** 
## serum_sodium      -0.064557   0.038377  -1.682  0.09254 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 375.35  on 298  degrees of freedom
## Residual deviance: 223.49  on 293  degrees of freedom
## AIC: 235.49
## 
## Number of Fisher Scoring iterations: 6
# The bar colour is set inside geom_bar(); an argument passed to ggplot() outside aes() is ignored
survival_fracejec <- ggplot(data = risk, mapping = aes(x = DEATH_EVENT, y = ejection_fraction)) +
  geom_bar(stat = "identity", fill = "blue") +
  theme_light() +
  labs(x = "Survival Status")
survival_fracejec

survival_creatinine <- ggplot(data = risk, mapping = aes(x = DEATH_EVENT, y = serum_creatinine)) +
  geom_bar(stat = "identity", fill = "blue") +
  theme_light() +
  labs(x = "Survival Status")
survival_creatinine

survival_sodium <- ggplot(data = risk, mapping = aes(x = DEATH_EVENT, y = serum_sodium)) +
  geom_bar(stat = "identity")+
  theme_light()+
  labs(x = "Survival Status")
survival_sodium

Goodness of Fit

modelone_res <- model_one$deviance
modeltwo_res <- model_two$deviance
finalmodel_res <- final_model$deviance

modelone_dfres <- model_one$df.residual
modeltwo_dfres <- model_two$df.residual
finalmodel_dfres <- final_model$df.residual

modelone_resdf <- modelone_res/modelone_dfres
modeltwo_resdf <- modeltwo_res/modeltwo_dfres
finalmodel_resdf <- finalmodel_res/finalmodel_dfres

# Each vector holds: residual deviance, residual degrees of freedom, and their ratio
modelone_fit <- c(modelone_res, modelone_dfres, modelone_resdf)
modeltwo_fit <- c(modeltwo_res, modeltwo_dfres, modeltwo_resdf)
finalmodel_fit <- c(finalmodel_res, finalmodel_dfres, finalmodel_resdf)

modelone_fit
## [1] 375.34878 298.00000   1.25956
modeltwo_fit
## [1] 215.4878046 281.0000000   0.7668605
finalmodel_fit
## [1] 223.4862746 293.0000000   0.7627518
gender.table <- with(risk, table(DEATH_EVENT, sex)) 
gender.table
##            sex
## DEATH_EVENT Female Male
##         No      71  132
##         Yes     34   62
chisq.test(gender.table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  gender.table
## X-squared = 0, df = 1, p-value = 1
smoke.table <- with(risk, table(DEATH_EVENT, smoking)) 
smoke.table
##            smoking
## DEATH_EVENT  No Yes
##         No  137  66
##         Yes  66  30
chisq.test(smoke.table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  smoke.table
## X-squared = 0.0073315, df = 1, p-value = 0.9318
anaemia.table <- with(risk, table(DEATH_EVENT, anaemia)) 
anaemia.table
##            anaemia
## DEATH_EVENT  No Yes
##         No  120  83
##         Yes  50  46
chisq.test(anaemia.table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  anaemia.table
## X-squared = 1.0422, df = 1, p-value = 0.3073
diabetes.table <- with(risk, table(DEATH_EVENT, diabetes)) 
diabetes.table
##            diabetes
## DEATH_EVENT  No Yes
##         No  118  85
##         Yes  56  40
chisq.test(diabetes.table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  diabetes.table
## X-squared = 2.1617e-30, df = 1, p-value = 1
highbp.table <- with(risk, table(DEATH_EVENT, high_blood_pressure)) 
highbp.table
##            high_blood_pressure
## DEATH_EVENT  No Yes
##         No  137  66
##         Yes  57  39
chisq.test(highbp.table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  highbp.table
## X-squared = 1.5435, df = 1, p-value = 0.2141
highcreatinine.table <- with(risk, table(DEATH_EVENT, serum_creatinine)) 
highcreatinine.table
##            serum_creatinine
## DEATH_EVENT 0.5 0.6 0.7 0.75 0.8 0.9  1 1.1 1.18 1.2 1.3 1.4 1.5 1.6 1.7 1.8
##         No    1   2  18    1  23  27 35  23   11  15  13   7   3   3   5   3
##         Yes   0   2   1    0   1   5 15   9    0   9   7   2   2   3   4   1
##            serum_creatinine
## DEATH_EVENT 1.83 1.9  2 2.1 2.2 2.3 2.4 2.5 2.7 2.9  3 3.2 3.4 3.5 3.7 3.8  4
##         No     0   0  0   2   0   2   1   0   2   0  0   1   1   1   0   1  0
##         Yes    8   5  1   3   1   1   1   3   1   1  2   0   0   1   1   0  1
##            serum_creatinine
## DEATH_EVENT 4.4  5 5.8 6.1 6.8  9 9.4
##         No    0  1   0   1   0  0   0
##         Yes   1  0   1   0   1  1   1
chisq.test(highcreatinine.table)
## Warning in chisq.test(highcreatinine.table): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  highcreatinine.table
## X-squared = 92.428, df = 39, p-value = 3.145e-06
highejecfrac.table <- with(risk, table(DEATH_EVENT, ejection_fraction)) 
highejecfrac.table
##            ejection_fraction
## DEATH_EVENT 14 15 17 20 25 30 35 38 40 45 50 55 60 62 65 70 80
##         No   0  0  1  2 18 21 42 25 33 15 15  2 27  1  0  0  1
##         Yes  1  2  1 16 18 13  7 15  4  5  6  1  4  1  1  1  0
chisq.test(highejecfrac.table)
## Warning in chisq.test(highejecfrac.table): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  highejecfrac.table
## X-squared = 65.332, df = 16, p-value = 6.459e-08
highsodium.table <- with(risk, table(DEATH_EVENT, serum_sodium)) 
highsodium.table
##            serum_sodium
## DEATH_EVENT 113 116 121 124 125 126 127 128 129 130 131 132 133 134 135 136 137
##         No    1   0   0   0   1   1   0   1   0   6   2   6   8  15  10  29  31
##         Yes   0   1   1   1   0   0   3   1   2   3   3   8   2  17   6  11   7
##            serum_sodium
## DEATH_EVENT 138 139 140 141 142 143 144 145 146 148
##         No   17  16  28  11   7   3   3   6   0   1
##         Yes   6   6   7   1   4   0   2   3   1   0
chisq.test(highsodium.table)
## Warning in chisq.test(highsodium.table): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  highsodium.table
## X-squared = 45.801, df = 26, p-value = 0.009601
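
The overall fit of the final model can also be summarised with a likelihood-ratio test against the intercept-only model, using the difference between the null and residual deviance. The sketch below re-uses the deviances and degrees of freedom reported for final_model; the variable names are illustrative.

```r
# Values copied from the final_model output
null_deviance  <- 375.3488; df_null  <- 298
resid_deviance <- 223.4863; df_resid <- 293

lr_stat <- null_deviance - resid_deviance   # drop in deviance due to the 5 predictors
lr_df   <- df_null - df_resid               # number of predictors added

# Upper-tail chi-squared probability of a deviance drop at least this large
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value)
```

A deviance drop of about 151.9 on 5 degrees of freedom gives a very small p-value, which suggests that the five predictors together improve substantially on the null model.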

Confidence Intervals

summary(final_model)
## 
## Call:
## glm(formula = DEATH_EVENT ~ time + ejection_fraction + serum_creatinine + 
##     age + serum_sodium, family = "binomial", data = risk)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1590  -0.5888  -0.2281   0.5144   2.7959  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        9.493034   5.405768   1.756  0.07907 .  
## time              -0.020895   0.002916  -7.166 7.74e-13 ***
## ejection_fraction -0.073430   0.015785  -4.652 3.29e-06 ***
## serum_creatinine   0.685990   0.174044   3.941 8.10e-05 ***
## age                0.042466   0.015030   2.825  0.00472 ** 
## serum_sodium      -0.064557   0.038377  -1.682  0.09254 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 375.35  on 298  degrees of freedom
## Residual deviance: 223.49  on 293  degrees of freedom
## AIC: 235.49
## 
## Number of Fisher Scoring iterations: 6
CInt <- exp(confint(final_model))
## Waiting for profiling to be done...
CI <- exp(confint.default(final_model))

CInt
##                       2.5 %       97.5 %
## (Intercept)       0.3340162 6.934776e+08
## time              0.9733671 9.846074e-01
## ejection_fraction 0.8994926 9.571500e-01
## serum_creatinine  1.4214664 2.874270e+00
## age               1.0138088 1.075593e+00
## serum_sodium      0.8681426 1.011048e+00
CI
##                       2.5 %       97.5 %
## (Intercept)       0.3321806 5.298714e+08
## time              0.9737409 9.849349e-01
## ejection_fraction 0.9008933 9.583986e-01
## serum_creatinine  1.4118068 2.792983e+00
## age               1.0130935 1.074574e+00
## serum_sodium      0.8695534 1.010718e+00
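# Patient profile used for prediction; the DEATH_EVENT column is not used by predict()
final.predict <- data.frame(DEATH_EVENT = 0, ejection_fraction = 60, time = 10,
                            serum_creatinine = 1.90, age = 45, serum_sodium = 130)
alpha <- 0.05
# Predicted probability of death and its standard error on the response scale
model.predict <- predict(object = final_model, newdata = final.predict, type = "response", se.fit = TRUE)
# Wald-type 95% confidence interval for the predicted probability
Interval <- model.predict$fit + qnorm(p = c(alpha/2, 1 - alpha/2)) * model.predict$se.fit
Interval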
## [1] 0.1294244 0.7217940
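
An interval built directly on the response scale, as above, is symmetric around the predicted probability and can in principle extend below 0 or above 1. A common alternative is to build the interval on the link (log-odds) scale and then back-transform it. The sketch below illustrates this on simulated data, since it is meant as a self-contained example; the model, data and names (fit, new_obs, ci_prob) are assumptions and not part of the original analysis.

```r
# Simulated logistic data (assumption for illustration only)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)

new_obs <- data.frame(x = 0.3)
alpha <- 0.05
z <- qnorm(1 - alpha / 2)

# Interval on the log-odds scale, then mapped back to probabilities
pred_link <- predict(fit, newdata = new_obs, type = "link", se.fit = TRUE)
ci_link <- pred_link$fit + c(-z, z) * pred_link$se.fit
ci_prob <- plogis(ci_link)
ci_prob
```

Because plogis() maps any real number into (0, 1), the back-transformed limits always stay within the valid range for a probability.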

Hypothesis Test

t.test(risk$age~risk$DEATH_EVENT)
## 
##  Welch Two Sample t-test
## 
## data:  risk$age by risk$DEATH_EVENT
## t = -4.1862, df = 155.29, p-value = 4.735e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.498546 -3.408204
## sample estimates:
##  mean in group No mean in group Yes 
##          58.76191          65.21528
t.test(risk$ejection_fraction~risk$DEATH_EVENT)
## 
##  Welch Two Sample t-test
## 
## data:  risk$ejection_fraction by risk$DEATH_EVENT
## t = 4.567, df = 164.76, p-value = 9.647e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.858566 9.735953
## sample estimates:
##  mean in group No mean in group Yes 
##          40.26601          33.46875
t.test(risk$serum_creatinine~risk$DEATH_EVENT)
## 
##  Welch Two Sample t-test
## 
## data:  risk$serum_creatinine by risk$DEATH_EVENT
## t = -4.1526, df = 113.19, p-value = 6.399e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9615153 -0.3403977
## sample estimates:
##  mean in group No mean in group Yes 
##          1.184877          1.835833
summary(anova(final_model))
##        Df       Deviance        Resid. Df       Resid. Dev   
##  Min.   :1   Min.   : 2.815   Min.   :293.0   Min.   :223.5  
##  1st Qu.:1   1st Qu.: 9.113   1st Qu.:294.2   1st Qu.:228.6  
##  Median :1   Median :20.670   Median :295.5   Median :245.7  
##  Mean   :1   Mean   :30.372   Mean   :295.5   Mean   :266.0  
##  3rd Qu.:1   3rd Qu.:22.990   3rd Qu.:296.8   3rd Qu.:273.3  
##  Max.   :1   Max.   :96.275   Max.   :298.0   Max.   :375.3  
##  NA's   :1   NA's   :1
Anova(final_model , test = "LR")
## Analysis of Deviance Table (Type II tests)
## 
## Response: DEATH_EVENT
##                   LR Chisq Df Pr(>Chisq)    
## time                79.603  1  < 2.2e-16 ***
## ejection_fraction   26.341  1  2.862e-07 ***
## serum_creatinine    16.077  1  6.081e-05 ***
## age                  8.530  1   0.003493 ** 
## serum_sodium         2.815  1   0.093381 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
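
The p-values reported for a single coefficient can be reproduced by hand. The sketch below does this for serum_sodium, comparing the Wald z-test from summary(final_model) with the likelihood-ratio test from Anova(); the numbers are copied from the outputs above and the variable names are illustrative.

```r
# serum_sodium: estimate and standard error from summary(final_model)
estimate  <- -0.064557
std_error <- 0.038377

# Wald test: z = estimate / standard error, two-sided normal p-value
z_value <- estimate / std_error
p_wald  <- 2 * pnorm(-abs(z_value))

# Likelihood-ratio test: LR chi-squared from Anova(), 1 degree of freedom
lr_chisq <- 2.815
p_lr     <- pchisq(lr_chisq, df = 1, lower.tail = FALSE)

round(c(z = z_value, p_wald = p_wald, p_lr = p_lr), 4)
```

Both tests give a p-value of about 0.09, so serum_sodium is not significant at the 0.05 level under either test.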

Sensitivity Analysis

summary(final_model)
## 
## Call:
## glm(formula = DEATH_EVENT ~ time + ejection_fraction + serum_creatinine + 
##     age + serum_sodium, family = "binomial", data = risk)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1590  -0.5888  -0.2281   0.5144   2.7959  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        9.493034   5.405768   1.756  0.07907 .  
## time              -0.020895   0.002916  -7.166 7.74e-13 ***
## ejection_fraction -0.073430   0.015785  -4.652 3.29e-06 ***
## serum_creatinine   0.685990   0.174044   3.941 8.10e-05 ***
## age                0.042466   0.015030   2.825  0.00472 ** 
## serum_sodium      -0.064557   0.038377  -1.682  0.09254 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 375.35  on 298  degrees of freedom
## Residual deviance: 223.49  on 293  degrees of freedom
## AIC: 235.49
## 
## Number of Fisher Scoring iterations: 6
exp(cbind(Odds_and_OR=coef(final_model), confint(final_model)))
## Waiting for profiling to be done...
##                    Odds_and_OR     2.5 %       97.5 %
## (Intercept)       1.326699e+04 0.3340162 6.934776e+08
## time              9.793219e-01 0.9733671 9.846074e-01
## ejection_fraction 9.292012e-01 0.8994926 9.571500e-01
## serum_creatinine  1.985737e+00 1.4214664 2.874270e+00
## age               1.043381e+00 1.0138088 1.075593e+00
## serum_sodium      9.374825e-01 0.8681426 1.011048e+00
exp(coef(final_model))
##       (Intercept)              time ejection_fraction  serum_creatinine 
##      1.326699e+04      9.793219e-01      9.292012e-01      1.985737e+00 
##               age      serum_sodium 
##      1.043381e+00      9.374825e-01
exp(final_model$coefficients[2])
##      time 
## 0.9793219
Anova(final_model)
## Analysis of Deviance Table (Type II tests)
## 
## Response: DEATH_EVENT
##                   LR Chisq Df Pr(>Chisq)    
## time                79.603  1  < 2.2e-16 ***
## ejection_fraction   26.341  1  2.862e-07 ***
## serum_creatinine    16.077  1  6.081e-05 ***
## age                  8.530  1   0.003493 ** 
## serum_sodium         2.815  1   0.093381 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
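
The odds ratios above refer to a one-unit increase in each predictor. As a minimal sketch of how to read them, the code below converts two of the coefficients into an odds ratio for a 10-unit increase in time and a percentage change in odds for serum creatinine. The coefficients are copied from the model summary; the choice of a 10-unit step is an assumption made for illustration.

```r
# Coefficients copied from summary(final_model)
coef_time       <- -0.020895
coef_creatinine <- 0.685990

# Odds ratio for a k-unit increase is exp(k * coefficient)
or_time_10 <- exp(10 * coef_time)

# Percentage change in the odds of death per 1-unit increase in serum creatinine
pct_creatinine <- (exp(coef_creatinine) - 1) * 100

round(c(or_time_10 = or_time_10, pct_creatinine = pct_creatinine), 2)
```

Holding the other predictors fixed, a 10-unit increase in time multiplies the odds of death by about 0.81, while each additional unit of serum creatinine raises the odds by roughly 99%.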

Critique Limitations

In this phase, the risk of death from cardiovascular disease was modelled using the features selected in the final model, i.e. follow-up time, serum creatinine, serum sodium, age and ejection fraction of the patient. An important point to recognise is that a low predicted risk, e.g. less than 5% or 10%, does not mean that there is no risk to the patient at all. A common critique is that prediction models always predict on the basis of the population provided for the analysis, so the predictions may not carry over to patients who differ from that population.

One limitation of this analysis is that the dataset is quite small (299 patient medical records). Using a larger dataset might help to predict the outcome more accurately. In this analysis, the most important features turned out to be serum creatinine, serum sodium, ejection fraction percentage and age of the patient. However, these features might change with a larger dataset: some of them might no longer be meaningful, while other features might turn out to be useful alongside them for predicting the outcome. Hence, the results would be more reliable if a larger dataset were used. In addition, a wider range of information about each patient would also help the prediction. For example, the height and weight of the patient, body mass index, employment history, a stress survey and other medical history (such as any medication the patient is taking) would help to predict the outcome more accurately.

Summary and Conclusion

Summary

Firstly, we analysed the age distribution of the patients using a histogram. It shows that most patients who suffer from cardiovascular disease are in the 50 to 60 age group. With the help of histograms, we also examined the distributions of creatine phosphokinase (CPK) levels, platelet count, creatinine levels and sodium levels in the patients' blood. The results showed that CPK levels in blood were between 500 and 1250 mcg/L. The higher the CPK level, the greater the risk of heart failure. The distribution of platelet count showed that the largest number of patients had a platelet count between 125000 and 275000 kiloplatelets/mL. A platelet count lower than 150000 suggests thrombocytopenia, and a platelet count higher than 450000 suggests thrombocytosis. Creatinine and sodium levels in blood were mostly between 0 and 2.5 mg/dL (ideal range for creatinine: 0.8 to 1.2 mg/dL) and between 130 and 150 mEq/L (ideal range for sodium: 135 to 145 mEq/L) respectively. Increased creatinine levels are associated with kidney damage. Later, we analysed the relationships between different variables, such as the age group of patients and their CPK level, platelet count, creatinine, sodium and ejection fraction percentage. The ejection fraction is the percentage of blood that the left ventricle pumps out with each contraction. An ejection fraction of 55% or above is considered normal. An ejection fraction lower than 55% indicates that the patient is at risk of a stroke or heart failure. In addition, patients' smoking status and whether they suffer from hypertension, diabetes or anaemia were compared against their platelet count, sodium and creatinine levels and ejection fraction percentage.
Finally, the patient's follow-up period was compared with survival status, and the ejection fraction percentage was compared with both survival status and the follow-up period at the clinical facility. There was no linear correlation between survival status and ejection fraction percentage, or between survival status and the follow-up period.

In Phase 2, we used stepwise selection (forward selection method) to choose a binomial logistic regression model for predicting the survival of a patient at risk of a cardiovascular attack. In this model, the age of the patient, the ejection fraction percentage (the percentage of blood that the left ventricle pumps out with each contraction), serum creatinine and serum sodium levels were the most important features. The least important features were anaemia (whether the patient suffers from anaemia) and diabetes (whether the patient suffers from diabetes). In the stepwise selection, we selected the binomial regression model with the lowest Akaike Information Criterion (AIC). After selecting the model, we carried out a number of steps: residual analysis, response analysis, a goodness of fit test, confidence intervals, hypothesis testing and finally an odds ratio analysis on the binomial regression model. In the residual analysis, plots of the standardised Pearson residuals and deviance residuals were shown. In the response analysis, bar plots were shown of the response variable (survival status of the patient) against the predictors included in the model (serum creatinine levels, ejection fraction percentage, serum sodium levels). A goodness of fit test was conducted to test whether the features serum creatinine, serum sodium, ejection fraction and age help in predicting the survival status of a patient at risk of a cardiovascular attack. Similar to the goodness of fit test, hypothesis testing was also performed.
Later, we found confidence intervals for ejection fraction and serum creatinine levels, and carried out an odds ratio analysis to check whether males or females are more prone to the risk of a cardiovascular attack depending on their medical history.

Conclusion

An analysis of the prediction of a patient's survival when at risk of a cardiovascular attack was performed in Phase 1 and Phase 2 of the project. The analysis shows that it is possible to predict the risk of a cardiovascular attack, and the patient's survival status after such an attack, merely from the patient's electronic medical records. From these medical records, the most important features were serum creatinine, serum sodium, ejection fraction percentage and age of the patient. From analysing these features with the binomial regression above, it was found that they would help medical institutions to predict the risk of a cardiovascular attack. In Phase 1 of the analysis, features such as the smoking, diabetes, anaemia and high blood pressure status of a patient also contributed to the study. However, in Phase 2 of the analysis, none of these features proved to be important in predicting the survival status. In the stepwise selection method, all of these features were removed at earlier stages, as their p-values were considerably higher than 0.05. Hence it is evident that, with machine learning or logistic regression analysis, it is quite possible to predict such outcomes. Logistic regression models can be used to classify medical records in order to predict the risk of such a disease. This approach is not limited to this analysis; such models can also be used to predict the survival status of, or risk to, patients with other diseases such as cancers and tumours.

Reference

Dataset from GitHub by Dr. Vural Aksakalli. Reference of Website: vaksakalli/datasets. (2020). Retrieved 27 September 2020, from https://github.com/vaksakalli/datasets/blob/master/heart_failure.csv