we will try to evaluate factors affecting mortality in HRS . We have already calculated descritive statistics, done ANOVA of NGAL and done comparative ROC Aanalysis of biomarkers like FeNa,NGAL,Urine Na,Urine Creatinine which are accessible along with master chart here
Let us do survival analysis now, first dplyr::select limited variables
Lets prepare our data for survival analysis
## Call: survfit(formula = Surv(days, status) ~ 1, data = master2)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 30 94 18 0.809 0.0406 0.733 0.892
## 90 76 5 0.755 0.0443 0.673 0.847
survival plot from same
classifying CTP n MELD for further use in survival curves
Plot survival by diagnosetic categories
## Call: survfit(formula = Surv(days, status) ~ diagnoses, data = master3)
##
## diagnoses=CKD
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 30 9 1 0.889 0.105 0.706 1
## 90 8 1 0.778 0.139 0.549 1
##
## diagnoses=HRS
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 30 33 10 0.697 0.0800 0.557 0.873
## 90 23 3 0.606 0.0851 0.460 0.798
##
## diagnoses=iAKI
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 30 10 5 0.5 0.158 0.269 0.929
## 90 5 1 0.4 0.155 0.187 0.855
##
## diagnoses=Prerenal
## time n.risk n.event survival std.err
## 30.0000 32.0000 2.0000 0.9375 0.0428
## lower 95% CI upper 95% CI
## 0.8573 1.0000
Plot survival by MELD category
## Call: survfit(formula = Surv(days, status) ~ MELD_CAT, data = master3)
##
## MELD_CAT=MELD<=10
## time n.risk n.event survival std.err
## 90.000 3.000 1.000 0.667 0.272
## lower 95% CI upper 95% CI
## 0.300 1.000
##
## MELD_CAT=MELD(10-18)
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 30 27 3 0.889 0.0605 0.778 1.000
## 90 24 1 0.852 0.0684 0.728 0.997
##
## MELD_CAT=MELD(19-24)
## time n.risk n.event survival std.err
## 30.0000 28.0000 5.0000 0.8214 0.0724
## lower 95% CI upper 95% CI
## 0.6911 0.9763
##
## MELD_CAT=MELD>=25
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 30 26 10 0.615 0.0954 0.454 0.834
## 90 16 3 0.500 0.0981 0.340 0.734
Plot survival by CTP
## Call: survfit(formula = Surv(days, status) ~ CTP_CATEGORIES, data = master3)
##
## CTP_CATEGORIES=B
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 30 47 8 0.830 0.0548 0.729 0.944
## 90 39 3 0.766 0.0618 0.654 0.897
##
## CTP_CATEGORIES=C
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 30 37 10 0.730 0.073 0.60 0.888
## 90 27 2 0.676 0.077 0.54 0.845
Let us do multivariate analysis of mortality with predictive factor
##
## Call:
## glm(formula = as.factor(outcome_30) ~ age + MELD, family = binomial(),
## data = master)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2479 -0.6459 -0.4810 -0.3371 2.3154
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.35611 1.54531 -2.172 0.02987 *
## age -0.01254 0.02341 -0.536 0.59227
## MELD 0.11346 0.03990 2.843 0.00446 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 91.815 on 93 degrees of freedom
## Residual deviance: 81.183 on 91 degrees of freedom
## AIC: 87.183
##
## Number of Fisher Scoring iterations: 4
Lets add NGAL to model with MELD only and see improvement in AIC has been used often
##
## Call:
## glm(formula = as.factor(outcome_30) ~ age + NGAL + MELD, family = binomial(),
## data = master)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7072 -0.6101 -0.3770 -0.2836 2.5033
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.4395768 1.6723988 -2.655 0.00794 **
## age 0.0007788 0.0241383 0.032 0.97426
## NGAL 0.0021566 0.0007445 2.897 0.00377 **
## MELD 0.0879044 0.0424967 2.068 0.03859 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 91.815 on 93 degrees of freedom
## Residual deviance: 72.054 on 90 degrees of freedom
## AIC: 80.054
##
## Number of Fisher Scoring iterations: 5
so adding NGAL to model leads to decrease in AIC (>1) more than implying its an effective variable
##
## Call:
## glm(formula = as.factor(outcome_30) ~ age + NGAL + MELD + serum_sodium,
## family = binomial(), data = master)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7422 -0.5964 -0.3833 -0.2638 2.4996
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.7983469 5.5066227 -1.235 0.2170
## age 0.0014307 0.0241404 0.059 0.9527
## NGAL 0.0021394 0.0007454 2.870 0.0041 **
## MELD 0.0934692 0.0445050 2.100 0.0357 *
## serum_sodium 0.0167365 0.0369826 0.453 0.6509
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 91.815 on 93 degrees of freedom
## Residual deviance: 71.848 on 89 degrees of freedom
## AIC: 81.848
##
## Number of Fisher Scoring iterations: 5
Adding serum sodium doesnt lead to fall in AIC, so even adding an extra predictor isnt improving predictive power of our model
## Analysis of Deviance Table
##
## Model 1: as.factor(outcome_30) ~ age + MELD
## Model 2: as.factor(outcome_30) ~ age + NGAL + MELD
## Resid. Df Resid. Dev Df Deviance
## 1 91 81.183
## 2 90 72.054 1 9.1292
Let’s do ROC curve analysis of our model with MELD,NGAL
## Area under the curve: 0.818
thus AUC of combined MELD n NGAL model is 0.818
## Logistic Regression Model
##
## lrm(formula = outcome_30 ~ age + NGAL + MELD, data = master4,
## x = TRUE, y = TRUE)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 94 LR chi2 19.76 R2 0.304 C 0.817
## a 76 d.f. 3 g 1.351 Dxy 0.635
## d 18 Pr(> chi2) 0.0002 gr 3.861 gamma 0.636
## max |deriv| 0.0002 gp 0.189 tau-a 0.199
## Brier 0.121
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -4.4396 1.6724 -2.65 0.0079
## age 0.0008 0.0241 0.03 0.9743
## NGAL 0.0022 0.0007 2.90 0.0038
## MELD 0.0879 0.0425 2.07 0.0386
##
Nomogram of model
Let’s do internal validation bootstrap
## index.orig training test optimism index.corrected n
## Dxy 0.6360 0.6533 0.5919 0.0614 0.5746 200
## R2 0.3041 0.3465 0.2700 0.0765 0.2276 200
## Intercept 0.0000 0.0000 -0.0732 0.0732 -0.0732 200
## Slope 1.0000 1.0000 0.8843 0.1157 0.8843 200
## Emax 0.0000 0.0000 0.0400 0.0400 0.0400 200
## D 0.1996 0.2349 0.1740 0.0609 0.1387 200
## U -0.0213 -0.0213 0.0210 -0.0422 0.0210 200
## Q 0.2209 0.2562 0.1531 0.1032 0.1177 200
## B 0.1213 0.1097 0.1285 -0.0188 0.1401 200
## g 1.3509 1.5624 1.2310 0.3314 1.0194 200
## gp 0.1889 0.1930 0.1767 0.0163 0.1726 200
so our bootstrapped model shows considerable optimism implying on external validation our model is going to fare worse
lets’s see calibration estimates
##
## n=94 Mean absolute error=0.031 Mean squared error=0.0024
## 0.9 Quantile of absolute error=0.091
we can see while negative predictive probability is good the positive predictive of our model fares a bit worse
Lets calculate adjusted odds ratio from our final model
## 2.5 % 97.5 %
## (Intercept) 0.01180093 0.0003549703 0.2734832
## age 1.00077911 0.9534518891 1.0495262
## NGAL 1.00215897 1.0007490347 1.0037256
## MELD 1.09188369 1.0053542303 1.1927184