we will try to evaluate factors affecting mortality in HRS . We have already calculated descritive statistics, done ANOVA of NGAL and done comparative ROC Aanalysis of biomarkers like FeNa,NGAL,Urine Na,Urine Creatinine which are accessible along with master chart here

Let us do survival analysis now, first dplyr::select limited variables

Lets prepare our data for survival analysis

## Call: survfit(formula = Surv(days, status) ~ 1, data = master2)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    30     94      18    0.809  0.0406        0.733        0.892
##    90     76       5    0.755  0.0443        0.673        0.847

survival plot from same

classifying CTP n MELD for further use in survival curves

Plot survival by diagnosetic categories

## Call: survfit(formula = Surv(days, status) ~ diagnoses, data = master3)
## 
##                 diagnoses=CKD 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    30      9       1    0.889   0.105        0.706            1
##    90      8       1    0.778   0.139        0.549            1
## 
##                 diagnoses=HRS 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    30     33      10    0.697  0.0800        0.557        0.873
##    90     23       3    0.606  0.0851        0.460        0.798
## 
##                 diagnoses=iAKI 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    30     10       5      0.5   0.158        0.269        0.929
##    90      5       1      0.4   0.155        0.187        0.855
## 
##                 diagnoses=Prerenal 
##         time       n.risk      n.event     survival      std.err 
##      30.0000      32.0000       2.0000       0.9375       0.0428 
## lower 95% CI upper 95% CI 
##       0.8573       1.0000

Plot survival by MELD category

## Call: survfit(formula = Surv(days, status) ~ MELD_CAT, data = master3)
## 
##                 MELD_CAT=MELD<=10 
##         time       n.risk      n.event     survival      std.err 
##       90.000        3.000        1.000        0.667        0.272 
## lower 95% CI upper 95% CI 
##        0.300        1.000 
## 
##                 MELD_CAT=MELD(10-18) 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    30     27       3    0.889  0.0605        0.778        1.000
##    90     24       1    0.852  0.0684        0.728        0.997
## 
##                 MELD_CAT=MELD(19-24) 
##         time       n.risk      n.event     survival      std.err 
##      30.0000      28.0000       5.0000       0.8214       0.0724 
## lower 95% CI upper 95% CI 
##       0.6911       0.9763 
## 
##                 MELD_CAT=MELD>=25 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    30     26      10    0.615  0.0954        0.454        0.834
##    90     16       3    0.500  0.0981        0.340        0.734

Plot survival by CTP

## Call: survfit(formula = Surv(days, status) ~ CTP_CATEGORIES, data = master3)
## 
##                 CTP_CATEGORIES=B 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    30     47       8    0.830  0.0548        0.729        0.944
##    90     39       3    0.766  0.0618        0.654        0.897
## 
##                 CTP_CATEGORIES=C 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    30     37      10    0.730   0.073         0.60        0.888
##    90     27       2    0.676   0.077         0.54        0.845

Let us do multivariate analysis of mortality with predictive factor

## 
## Call:
## glm(formula = as.factor(outcome_30) ~ age + MELD, family = binomial(), 
##     data = master)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2479  -0.6459  -0.4810  -0.3371   2.3154  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -3.35611    1.54531  -2.172  0.02987 * 
## age         -0.01254    0.02341  -0.536  0.59227   
## MELD         0.11346    0.03990   2.843  0.00446 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 91.815  on 93  degrees of freedom
## Residual deviance: 81.183  on 91  degrees of freedom
## AIC: 87.183
## 
## Number of Fisher Scoring iterations: 4

Lets add NGAL to model with MELD only and see improvement in AIC has been used often

## 
## Call:
## glm(formula = as.factor(outcome_30) ~ age + NGAL + MELD, family = binomial(), 
##     data = master)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7072  -0.6101  -0.3770  -0.2836   2.5033  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -4.4395768  1.6723988  -2.655  0.00794 **
## age          0.0007788  0.0241383   0.032  0.97426   
## NGAL         0.0021566  0.0007445   2.897  0.00377 **
## MELD         0.0879044  0.0424967   2.068  0.03859 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 91.815  on 93  degrees of freedom
## Residual deviance: 72.054  on 90  degrees of freedom
## AIC: 80.054
## 
## Number of Fisher Scoring iterations: 5

so adding NGAL to model leads to decrease in AIC (>1) more than implying its an effective variable

## 
## Call:
## glm(formula = as.factor(outcome_30) ~ age + NGAL + MELD + serum_sodium, 
##     family = binomial(), data = master)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7422  -0.5964  -0.3833  -0.2638   2.4996  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -6.7983469  5.5066227  -1.235   0.2170   
## age           0.0014307  0.0241404   0.059   0.9527   
## NGAL          0.0021394  0.0007454   2.870   0.0041 **
## MELD          0.0934692  0.0445050   2.100   0.0357 * 
## serum_sodium  0.0167365  0.0369826   0.453   0.6509   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 91.815  on 93  degrees of freedom
## Residual deviance: 71.848  on 89  degrees of freedom
## AIC: 81.848
## 
## Number of Fisher Scoring iterations: 5

Adding serum sodium doesnt lead to fall in AIC, so even adding an extra predictor isnt improving predictive power of our model

## Analysis of Deviance Table
## 
## Model 1: as.factor(outcome_30) ~ age + MELD
## Model 2: as.factor(outcome_30) ~ age + NGAL + MELD
##   Resid. Df Resid. Dev Df Deviance
## 1        91     81.183            
## 2        90     72.054  1   9.1292

Let’s do ROC curve analysis of our model with MELD,NGAL

## Area under the curve: 0.818

thus AUC of combined MELD n NGAL model is 0.818

## Logistic Regression Model
##  
##  lrm(formula = outcome_30 ~ age + NGAL + MELD, data = master4, 
##      x = TRUE, y = TRUE)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs             94    LR chi2     19.76    R2       0.304    C       0.817    
##   a              76    d.f.            3    g        1.351    Dxy     0.635    
##   d              18    Pr(> chi2) 0.0002    gr       3.861    gamma   0.636    
##  max |deriv| 0.0002                         gp       0.189    tau-a   0.199    
##                                             Brier    0.121                     
##  
##            Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept -4.4396 1.6724 -2.65  0.0079  
##  age        0.0008 0.0241  0.03  0.9743  
##  NGAL       0.0022 0.0007  2.90  0.0038  
##  MELD       0.0879 0.0425  2.07  0.0386  
## 

Nomogram of model

Let’s do internal validation bootstrap

##           index.orig training    test optimism index.corrected   n
## Dxy           0.6360   0.6533  0.5919   0.0614          0.5746 200
## R2            0.3041   0.3465  0.2700   0.0765          0.2276 200
## Intercept     0.0000   0.0000 -0.0732   0.0732         -0.0732 200
## Slope         1.0000   1.0000  0.8843   0.1157          0.8843 200
## Emax          0.0000   0.0000  0.0400   0.0400          0.0400 200
## D             0.1996   0.2349  0.1740   0.0609          0.1387 200
## U            -0.0213  -0.0213  0.0210  -0.0422          0.0210 200
## Q             0.2209   0.2562  0.1531   0.1032          0.1177 200
## B             0.1213   0.1097  0.1285  -0.0188          0.1401 200
## g             1.3509   1.5624  1.2310   0.3314          1.0194 200
## gp            0.1889   0.1930  0.1767   0.0163          0.1726 200

so our bootstrapped model shows considerable optimism implying on external validation our model is going to fare worse

lets’s see calibration estimates

## 
## n=94   Mean absolute error=0.031   Mean squared error=0.0024
## 0.9 Quantile of absolute error=0.091

we can see while negative predictive probability is good the positive predictive of our model fares a bit worse

Lets calculate adjusted odds ratio from our final model

##                               2.5 %    97.5 %
## (Intercept) 0.01180093 0.0003549703 0.2734832
## age         1.00077911 0.9534518891 1.0495262
## NGAL        1.00215897 1.0007490347 1.0037256
## MELD        1.09188369 1.0053542303 1.1927184