This project is about Predicting mortality in intensive care unit patients infected with Klebsiella pneumoniae. This study aimed to develop a prediction model and simplified risk score to predict 14-day mortality in ICU patients infected with Klebsiella pneumoniae.A retrospective cohort study was conducted using data from 249 ICU patients infected with Klebsiella pneumoniae in Vietnam. The outcome of this study was 14-day mortality. Eighteen variables were selected to be predictors of the outcome, including both clinical and non-clinical information and treatment information. A prediction model using logistic regression was applied, and a simplified risk score was also constructed based on the result from the prediction model. Calibration (area under the receiver operating characteristic curve, or AUC) and discrimination (Hosmer-Lemeshow goodness-of-fit test) were used to test the model performance.This study’s findings provided a valuable tool for predicting 14-day mortality in ICU patients infected with Klebsiella pneumoniae.
pacman::p_load(
rio, # File import
here, # File locator
lubridate, janitor, magrittr,
ggplot2, pROC, epiDisplay,readr,gtsummary, mlr,PredictABEL,table1,
sjPlot,sjmisc,sjlabelled, MASS, flextable, officer, tidyverse)
setwd("E:\\Clinical epi_assignment 3\\Clinical epi_assignment 3\\Clinical epi_assignment 3\\Clinical epi_assignment 3\\Huyen")
data <- read_csv("Kleb.csv")which(is.na(data$ID))## integer(0)
which(is.na(data$Age1))## integer(0)
which(is.na(data$Gender))## integer(0)
which(is.na(data$ReferalRoute1))## integer(0)
which(is.na(data$Bacteraemia))## integer(0)
which(is.na(data$SepticShock))## integer(0)
which(is.na(data$Apache1))## integer(0)
which(is.na(data$SOFA1))## integer(0)
which(is.na(data$Charlson1))## integer(0)
which(is.na(data$CVC))## integer(0)
which(is.na(data$UrinaryCatheter))## integer(0)
which(is.na(data$EndotrachealTube))## integer(0)
which(is.na(data$Dialysis))## integer(0)
which(is.na(data$ICHOperation))## integer(0)
which(is.na(data$CarbapenemResistance))## integer(0)
which(is.na(data$CoInfection))## integer(0)
names(data) ## [1] "ID" "Age1" "Gender"
## [4] "ReferalRoute1" "Bacteraemia" "SepticShock"
## [7] "Apache1" "SOFA1" "Charlson1"
## [10] "CVC" "UrinaryCatheter" "EndotrachealTube"
## [13] "Dialysis" "ICHOperation" "CarbapenemResistance"
## [16] "CoInfection" "EmpiricalAB" "DefinitiveAB"
## [19] "AdjunctiveTreatment" "Status14"
data$Age1 = as.factor(data$Age1)
data$Gender = as.factor(data$Gender)
data$ReferalRoute1 = as.factor(data$ReferalRoute1)
data$Bacteraemia = as.factor(data$Bacteraemia)
data$SepticShock = as.factor(data$SepticShock)
data$Apache1 = as.factor(data$Apache1)
data$SOFA1 = as.factor(data$SOFA1)
data$Charlson1 = as.factor(data$Charlson1)
data$CVC = as.factor(data$CVC)
data$UrinaryCatheter = as.factor(data$UrinaryCatheter)
data$EndotrachealTube = as.factor(data$EndotrachealTube)
data$CoInfection = as.factor(data$CoInfection)
data$EmpiricalAB = as.factor(data$EmpiricalAB)
data$DefinitiveAB = as.factor(data$DefinitiveAB)
data$AdjunctiveTreatment = as.factor(data$AdjunctiveTreatment)
data$Dialysis = as.factor(data$Dialysis)
data$ICHOperation = as.factor(data$ICHOperation)
data$CarbapenemResistance = as.factor(data$CarbapenemResistance)Table 1. Characteristics of ICU patients infected with Klebsiella pneumoniae in this study
data %>%
# keep only the columns of interest
dplyr::select(Status14, Age1, Gender, ReferalRoute1,
Bacteraemia, SepticShock, Apache1, SOFA1, Charlson1,
CVC, UrinaryCatheter,EndotrachealTube, Dialysis, ICHOperation,
CarbapenemResistance, CoInfection, EmpiricalAB, DefinitiveAB,
AdjunctiveTreatment) %>%
tbl_summary(by = Status14, percent = "row") %>%
add_p()| Characteristic | 0, N = 1771 | 1, N = 721 | p-value2 |
|---|---|---|---|
| Age1 | 0.11 | ||
| 0 | 122 (74%) | 42 (26%) | |
| 1 | 55 (65%) | 30 (35%) | |
| Gender | 0.3 | ||
| 0 | 49 (77%) | 15 (23%) | |
| 1 | 128 (69%) | 57 (31%) | |
| ReferalRoute1 | 0.023 | ||
| 0 | 63 (81%) | 15 (19%) | |
| 1 | 114 (67%) | 57 (33%) | |
| Bacteraemia | 0.3 | ||
| 0 | 146 (73%) | 55 (27%) | |
| 1 | 31 (65%) | 17 (35%) | |
| SepticShock | 0.053 | ||
| 0 | 153 (74%) | 55 (26%) | |
| 1 | 24 (59%) | 17 (41%) | |
| Apache1 | 0.008 | ||
| 0 | 97 (80%) | 25 (20%) | |
| 1 | 39 (68%) | 18 (32%) | |
| 2 | 41 (59%) | 29 (41%) | |
| SOFA1 | <0.001 | ||
| 0 | 59 (89%) | 7 (11%) | |
| 1 | 107 (69%) | 49 (31%) | |
| 2 | 11 (41%) | 16 (59%) | |
| Charlson1 | 0.005 | ||
| 0 | 70 (84%) | 13 (16%) | |
| 1 | 30 (67%) | 15 (33%) | |
| 2 | 77 (64%) | 44 (36%) | |
| CVC | <0.001 | ||
| 0 | 90 (83%) | 18 (17%) | |
| 1 | 87 (62%) | 54 (38%) | |
| UrinaryCatheter | 0.3 | ||
| 0 | 157 (72%) | 60 (28%) | |
| 1 | 20 (62%) | 12 (38%) | |
| EndotrachealTube | 0.12 | ||
| 0 | 62 (78%) | 18 (22%) | |
| 1 | 115 (68%) | 54 (32%) | |
| Dialysis | 0.073 | ||
| 0 | 133 (74%) | 46 (26%) | |
| 1 | 44 (63%) | 26 (37%) | |
| ICHOperation | 0.010 | ||
| 0 | 169 (73%) | 62 (27%) | |
| 1 | 8 (44%) | 10 (56%) | |
| CarbapenemResistance | 0.4 | ||
| 0 | 51 (75%) | 17 (25%) | |
| 1 | 126 (70%) | 55 (30%) | |
| CoInfection | 0.4 | ||
| 0 | 93 (73%) | 34 (27%) | |
| 1 | 84 (69%) | 38 (31%) | |
| EmpiricalAB | >0.9 | ||
| 0 | 116 (71%) | 47 (29%) | |
| 1 | 61 (71%) | 25 (29%) | |
| DefinitiveAB | 0.7 | ||
| 0 | 153 (71%) | 61 (29%) | |
| 1 | 24 (69%) | 11 (31%) | |
| AdjunctiveTreatment | <0.001 | ||
| 0 | 169 (76%) | 53 (24%) | |
| 1 | 8 (30%) | 19 (70%) | |
| 1 n (%) | |||
| 2 Pearson's Chi-squared test | |||
The characteristics of were summarized in Table 1. Among 249 patients, the overall 14-day all-cause mortality rate in the total population was 28.9%. In terms of demographic factors, under-65-year-olds, men, and intra-hospital referred patients accounted for the majority of the population, accounting for 65.9%, 71.3%, and 68.7% of the total population, respectively. Patients who were referred within the hospital experienced a 14-day all-cause mortality rate that was significantly higher than patients who were referred outside the hospital (p-value = 0.023). In terms of health conditions, 19.3% of participants reported having bacteremia, and 16.5 %reported having septic shock. There was no significant difference of these two factors among mortality status. Additionally, most participants reported having a SOFA score greater than or equal to 4 (73%), a Charlson index of 1 and greater than 1 (67%), and an APACHE II score of 15 or above (51%). The SOFA score, Charlson index, and APACHE II score were all significantly higher in patients who died recently within 14 days. The two most frequently carried out invasive procedures were the insertion of an endotracheal tube (68%) and a central venous catheter (57%). Regarding pathogen factors, 72.7% and 49% of participants reported to have carbapenem resistance and co-infection respectively. There was no significant difference of these two factors among mortality status. In terms of treatment factors, 34.5%, 14.1% and 10.8% of participants reported to have inadequate empirical antibiotic therapy, inadequate definitive antibiotic therapy, and absence of adjunctive therapy respectively.
# Step 1: univariable analysis, selecting variables at p<0.15 to include in multivariable analysis.
fit_null = glm(Status14 ~ 1, family=binomial, data=data)
fit_age = glm(Status14 ~ Age1, family=binomial, data=data)
fit_gender = glm(Status14 ~ Gender, family=binomial, data=data)
fit_ReferalRoute1 = glm(Status14 ~ ReferalRoute1, family=binomial, data=data)
fit_Bacteraemia = glm(Status14 ~ Bacteraemia, family=binomial, data=data)
fit_SepticShock = glm(Status14 ~ SepticShock, family=binomial, data=data)
fit_Apache1 = glm(Status14 ~ Apache1, family=binomial, data=data)
fit_SOFA1 = glm(Status14 ~ SOFA1, family=binomial, data=data)
fit_Charlson1 = glm(Status14 ~ Charlson1, family=binomial, data=data)
fit_CVC= glm(Status14 ~ CVC, family=binomial, data=data)
fit_UrinaryCatheter = glm(Status14 ~ UrinaryCatheter, family=binomial, data=data)
fit_EndotrachealTube = glm(Status14 ~ EndotrachealTube, family=binomial, data=data)
fit_Dialysis = glm(Status14 ~ Dialysis, family=binomial, data=data)
fit_ICHOperation = glm(Status14 ~ ICHOperation, family=binomial, data=data)
fit_CarbapenemResistance = glm(Status14 ~ CarbapenemResistance, family=binomial, data=data)
fit_CoInfection = glm(Status14 ~ CoInfection, family=binomial, data=data)
fit_EmpiricalAB = glm(Status14 ~ EmpiricalAB, family=binomial, data=data)
fit_DefinitiveAB = glm(Status14 ~ DefinitiveAB, family=binomial, data=data)
fit_AdjunctiveTreatment = glm(Status14 ~ AdjunctiveTreatment, family=binomial, data=data)
lrtest(fit_age,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 2.51159 , P value = 0.1130118
lrtest(fit_gender,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 1.293413 , P value = 0.2554199
lrtest(fit_ReferalRoute1,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 5.436675 , P value = 0.01971817
lrtest(fit_Bacteraemia,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 1.18699 , P value = 0.2759375
lrtest(fit_SepticShock,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 3.562357 , P value = 0.05910375
lrtest(fit_Apache1,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 2 d.f. = 9.681283 , P value = 0.007901983
lrtest(fit_SOFA1,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 2 d.f. = 24.18238 , P value = 5.608708e-06
lrtest(fit_Charlson1,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 2 d.f. = 11.53223 , P value = 0.003131905
lrtest(fit_CVC,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 14.5014 , P value = 0.0001400558
lrtest(fit_UrinaryCatheter,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 1.261648 , P value = 0.2613394
lrtest(fit_EndotrachealTube,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 2.427302 , P value = 0.1192379
lrtest(fit_Dialysis,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 3.118748 , P value = 0.07739623
lrtest(fit_ICHOperation,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 6.037446 , P value = 0.01400553
lrtest(fit_CarbapenemResistance,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 0.7108019 , P value = 0.3991777
lrtest(fit_CoInfection,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 0.5797528 , P value = 0.4464092
lrtest(fit_EmpiricalAB,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 0.001517121 , P value = 0.9689301
lrtest(fit_DefinitiveAB,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 0.1234125 , P value = 0.7253625
lrtest(fit_AdjunctiveTreatment,fit_null)## Likelihood ratio test for MLE method
## Chi-squared 1 d.f. = 22.64712 , P value = 1.946516e-06
The result of univariate and multivariate logistic regression models with the 14-day mortality status as the outcome was presented in Table 2. The univariable analysis were developed among all eighteen predictors, in which, each univariable model was compared with the null model by a likelihood ratio test. As a result, eleven variables with p-value <0.15 from univariate analyses were included in the initial multivariable model, which included age, route of referral, APACHE II score, SOFA score, Charlson comorbidity index, septic shock, central venous catheter, endotracheal tube, dialysis, intracerebral haemorrhage operation and adjunctive treatment.
# Step 2. backward selection using the likelihood ratio test with p-value threshold of 0.1.
# Full model with Oral_tem, Breath_min had p<0.15 from univariable analysis
model_full <- glm(Status14 ~ Age1 + ReferalRoute1 +SepticShock
+Apache1 + SOFA1 + Charlson1
+CVC + EndotrachealTube + Dialysis
+ ICHOperation + AdjunctiveTreatment
, family=binomial, data=data)
car::vif(model_full)## GVIF Df GVIF^(1/(2*Df))
## Age1 1.203367 1 1.096981
## ReferalRoute1 1.133702 1 1.064755
## SepticShock 1.252033 1 1.118943
## Apache1 1.383559 2 1.084550
## SOFA1 1.401880 2 1.088122
## Charlson1 1.310337 2 1.069906
## CVC 1.621824 1 1.273509
## EndotrachealTube 1.171515 1 1.082365
## Dialysis 1.373008 1 1.171754
## ICHOperation 1.246831 1 1.116616
## AdjunctiveTreatment 1.287982 1 1.134893
model_full <- tbl_regression(model_full, exponentiate = TRUE)
model_full| Characteristic | OR1 | 95% CI1 | p-value |
|---|---|---|---|
| Age1 | |||
| 0 | — | — | |
| 1 | 2.20 | 1.04, 4.77 | 0.042 |
| ReferalRoute1 | |||
| 0 | — | — | |
| 1 | 2.70 | 1.20, 6.50 | 0.020 |
| SepticShock | |||
| 0 | — | — | |
| 1 | 0.91 | 0.35, 2.24 | 0.8 |
| Apache1 | |||
| 0 | — | — | |
| 1 | 0.97 | 0.41, 2.27 | >0.9 |
| 2 | 0.64 | 0.25, 1.53 | 0.3 |
| SOFA1 | |||
| 0 | — | — | |
| 1 | 3.79 | 1.46, 11.4 | 0.010 |
| 2 | 11.4 | 2.77, 51.9 | 0.001 |
| Charlson1 | |||
| 0 | — | — | |
| 1 | 1.75 | 0.61, 5.02 | 0.3 |
| 2 | 2.07 | 0.87, 5.12 | 0.10 |
| CVC | |||
| 0 | — | — | |
| 1 | 5.67 | 2.35, 14.8 | <0.001 |
| EndotrachealTube | |||
| 0 | — | — | |
| 1 | 1.42 | 0.64, 3.24 | 0.4 |
| Dialysis | |||
| 0 | — | — | |
| 1 | 0.72 | 0.31, 1.66 | 0.4 |
| ICHOperation | |||
| 0 | — | — | |
| 1 | 4.88 | 1.37, 18.3 | 0.015 |
| AdjunctiveTreatment | |||
| 0 | — | — | |
| 1 | 21.2 | 6.80, 77.0 | <0.001 |
| 1 OR = Odds Ratio, CI = Confidence Interval | |||
The two strongest predictors of 14-day mortality in the study population were adjunctive treatment and SOFA score. Multicollinearity of the initial multivariable model was checked by generalized variance-inflation factors (GVIFs). Since GVIF<5 (1.13-1.62) of all eleven variables, multicollinearity was not an issue in this initial multivariable model. ## Backward selection
model_full <- glm(Status14 ~ Age1 + ReferalRoute1 +SepticShock
+Apache1 + SOFA1 + Charlson1
+CVC + EndotrachealTube + Dialysis
+ ICHOperation + AdjunctiveTreatment
, family=binomial, data=data)
stepAIC(model_full) ## Start: AIC=244.05
## Status14 ~ Age1 + ReferalRoute1 + SepticShock + Apache1 + SOFA1 +
## Charlson1 + CVC + EndotrachealTube + Dialysis + ICHOperation +
## AdjunctiveTreatment
##
## Df Deviance AIC
## - Apache1 2 215.17 241.17
## - SepticShock 1 214.09 242.09
## - Dialysis 1 214.63 242.63
## - EndotrachealTube 1 214.79 242.79
## - Charlson1 2 216.81 242.81
## <none> 214.05 244.05
## - Age1 1 218.26 246.26
## - ReferalRoute1 1 219.90 247.90
## - ICHOperation 1 220.05 248.05
## - SOFA1 2 226.68 252.68
## - CVC 1 229.80 257.80
## - AdjunctiveTreatment 1 245.62 273.62
##
## Step: AIC=241.17
## Status14 ~ Age1 + ReferalRoute1 + SepticShock + SOFA1 + Charlson1 +
## CVC + EndotrachealTube + Dialysis + ICHOperation + AdjunctiveTreatment
##
## Df Deviance AIC
## - SepticShock 1 215.19 239.19
## - Charlson1 2 217.48 239.48
## - EndotrachealTube 1 215.69 239.69
## - Dialysis 1 215.93 239.93
## <none> 215.17 241.17
## - Age1 1 219.11 243.11
## - ICHOperation 1 220.79 244.79
## - ReferalRoute1 1 221.49 245.49
## - SOFA1 2 226.98 248.98
## - CVC 1 229.81 253.81
## - AdjunctiveTreatment 1 245.88 269.88
##
## Step: AIC=239.19
## Status14 ~ Age1 + ReferalRoute1 + SOFA1 + Charlson1 + CVC + EndotrachealTube +
## Dialysis + ICHOperation + AdjunctiveTreatment
##
## Df Deviance AIC
## - Charlson1 2 217.49 237.49
## - EndotrachealTube 1 215.72 237.72
## - Dialysis 1 215.99 237.99
## <none> 215.19 239.19
## - Age1 1 219.15 241.15
## - ICHOperation 1 220.97 242.97
## - ReferalRoute1 1 221.49 243.49
## - SOFA1 2 227.57 247.57
## - CVC 1 229.97 251.97
## - AdjunctiveTreatment 1 245.92 267.92
##
## Step: AIC=237.49
## Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + EndotrachealTube +
## Dialysis + ICHOperation + AdjunctiveTreatment
##
## Df Deviance AIC
## - EndotrachealTube 1 218.00 236.00
## - Dialysis 1 218.12 236.12
## <none> 217.49 237.49
## - Age1 1 224.05 242.05
## - ReferalRoute1 1 224.64 242.64
## - ICHOperation 1 224.92 242.92
## - SOFA1 2 231.12 247.12
## - CVC 1 232.17 250.17
## - AdjunctiveTreatment 1 249.85 267.85
##
## Step: AIC=236
## Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + Dialysis + ICHOperation +
## AdjunctiveTreatment
##
## Df Deviance AIC
## - Dialysis 1 218.81 234.81
## <none> 218.00 236.00
## - Age1 1 224.16 240.16
## - ICHOperation 1 225.15 241.15
## - ReferalRoute1 1 225.84 241.84
## - SOFA1 2 233.23 247.23
## - CVC 1 234.20 250.20
## - AdjunctiveTreatment 1 249.90 265.90
##
## Step: AIC=234.81
## Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + ICHOperation +
## AdjunctiveTreatment
##
## Df Deviance AIC
## <none> 218.81 234.81
## - Age1 1 224.88 238.88
## - ICHOperation 1 226.58 240.58
## - ReferalRoute1 1 226.71 240.71
## - SOFA1 2 233.23 245.23
## - CVC 1 234.54 248.54
## - AdjunctiveTreatment 1 250.41 264.41
##
## Call: glm(formula = Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC +
## ICHOperation + AdjunctiveTreatment, family = binomial, data = data)
##
## Coefficients:
## (Intercept) Age11 ReferalRoute11
## -4.6828 0.8812 1.1065
## SOFA11 SOFA12 CVC1
## 1.3027 2.2960 1.5223
## ICHOperation1 AdjunctiveTreatment1
## 1.7313 2.9714
##
## Degrees of Freedom: 248 Total (i.e. Null); 241 Residual
## Null Deviance: 299.5
## Residual Deviance: 218.8 AIC: 234.8
model_final <- glm(formula = Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC +
ICHOperation + AdjunctiveTreatment, family = binomial, data = data)
model_final <- tbl_regression(model_final, exponentiate = TRUE)
model_final| Characteristic | OR1 | 95% CI1 | p-value |
|---|---|---|---|
| Age1 | |||
| 0 | — | — | |
| 1 | 2.41 | 1.20, 4.96 | 0.015 |
| ReferalRoute1 | |||
| 0 | — | — | |
| 1 | 3.02 | 1.38, 7.12 | 0.008 |
| SOFA1 | |||
| 0 | — | — | |
| 1 | 3.68 | 1.47, 10.7 | 0.009 |
| 2 | 9.93 | 2.85, 38.7 | <0.001 |
| CVC | |||
| 0 | — | — | |
| 1 | 4.58 | 2.11, 10.8 | <0.001 |
| ICHOperation | |||
| 0 | — | — | |
| 1 | 5.65 | 1.68, 20.0 | 0.006 |
| AdjunctiveTreatment | |||
| 0 | — | — | |
| 1 | 19.5 | 6.46, 68.7 | <0.001 |
| 1 OR = Odds Ratio, CI = Confidence Interval | |||
The backward stepwise method was used to select the final set of
variables in the most fitted multivariable model, which had the lowest
AIC. The final logistic regression models included six variables: age,
referral route, SOFA score, central venous catheter, intracerebral
haemorrhage operation and absence of adjunctive therapy. Absence of
adjunctive therapy (OR 19.5, 95%CI: 6.46–68.7) and SOFA score (score
4–11 vs score 0-3: OR 3.68, 95%CI: 1.47–10.7; score ≥12 vs score 0-3: OR
9.93, 95%CI: 2.85–38.74) were the two strongest predictors of 14-day
mortality in the study population. They were followed by having
intracerebral haemorrhage operation (OR 5.65, 95%CI: 1.68–20.0) and
having central venous catheter (OR 4.58, 95%CI: 2.11–10.8). Based on the
regression coefficients and intercept, the prediction model to predict
14-day mortality of ICU patients infected with K. pneumoniae was
presented as below:
Probability of 14-day mortality = 1/(1+exp-(-4.68 + 0.88 × Age ≥65 +
1.11 × Intra-hospital referral + @1.30 ×
SOFA score 4–11 + 2.29 × SOFA score ≥12 + 1.52 × Central venous catheter
+ 1.73 × Intracerebral haemorrhage surgery + 2.97 × Absence of
adjunctive therapy)
# Q3: Model performance assessment ------------------------------------------------------------------------------
model_final = glm(formula = Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC +
ICHOperation + AdjunctiveTreatment, family = binomial, data = data)
summary(model_final)##
## Call:
## glm(formula = Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC +
## ICHOperation + AdjunctiveTreatment, family = binomial, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8796 -0.7995 -0.3974 0.4541 2.6129
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.6828 0.6923 -6.764 1.34e-11 ***
## Age11 0.8812 0.3610 2.441 0.014655 *
## ReferalRoute11 1.1065 0.4154 2.664 0.007724 **
## SOFA11 1.3027 0.4997 2.607 0.009134 **
## SOFA12 2.2960 0.6604 3.477 0.000508 ***
## CVC1 1.5223 0.4147 3.671 0.000242 ***
## ICHOperation1 1.7313 0.6257 2.767 0.005657 **
## AdjunctiveTreatment1 2.9714 0.5984 4.966 6.85e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 299.49 on 248 degrees of freedom
## Residual deviance: 218.81 on 241 degrees of freedom
## AIC: 234.81
##
## Number of Fisher Scoring iterations: 5
preds = predict(model_final, type="response")
preds## 1 2 3 4 5 6
## 0.134962074 0.032923400 0.532444750 0.032923400 0.560220974 0.040679615
## 7 8 9 10 11 12
## 0.504191613 0.075937470 0.320546177 0.009168121 0.027217619 0.075937470
## 13 14 15 16 17 18
## 0.027217619 0.113650703 0.032923400 0.113650703 0.040679615 0.113650703
## 19 20 21 22 23 24
## 0.075937470 0.320546177 0.181599568 0.032923400 0.399234377 0.093335037
## 25 26 27 28 29 30
## 0.420016385 0.367652200 0.161268657 0.152985499 0.367652200 0.093335037
## 31 32 33 34 35 36
## 0.583928729 0.704083066 0.367652200 0.367652200 0.320546177 0.367652200
## 37 38 39 40 41 42
## 0.113650703 0.399234377 0.236357034 0.754597256 0.583928729 0.027217619
## 43 44 45 46 47 48
## 0.273575881 0.032923400 0.532444750 0.532444750 0.199032398 0.273575881
## 49 50 51 52 53 54
## 0.040679615 0.134962074 0.113650703 0.754597256 0.027217619 0.532444750
## 55 56 57 58 59 60
## 0.113650703 0.532444750 0.032923400 0.865441049 0.092854134 0.320546177
## 61 62 63 64 65 66
## 0.273575881 0.560220974 0.754597256 0.199032398 0.320546177 0.320546177
## 67 68 69 70 71 72
## 0.040679615 0.113650703 0.320546177 0.320546177 0.961339036 0.134962074
## 73 74 75 76 77 78
## 0.320546177 0.009168121 0.320546177 0.902046649 0.320546177 0.320546177
## 79 80 81 82 83 84
## 0.532444750 0.532444750 0.134962074 0.273575881 0.032923400 0.113650703
## 85 86 87 88 89 90
## 0.134962074 0.027217619 0.075937470 0.667713222 0.093335037 0.021847311
## 91 92 93 94 95 96
## 0.199032398 0.320546177 0.273575881 0.092854134 0.063264812 0.093335037
## 97 98 99 100 101 102
## 0.113650703 0.063264812 0.273575881 0.945553850 0.532444750 0.199032398
## 103 104 105 106 107 108
## 0.199032398 0.296408847 0.113650703 0.560220974 0.009168121 0.093335037
## 109 110 111 112 113 114
## 0.902046649 0.902046649 0.152985499 0.134962074 0.320546177 0.134962074
## 115 116 117 118 119 120
## 0.532444750 0.296408847 0.040679615 0.755177097 0.093335037 0.040679615
## 121 122 123 124 125 126
## 0.919022230 0.303613326 0.320546177 0.667713222 0.320546177 0.093335037
## 127 128 129 130 131 132
## 0.273575881 0.504191613 0.560220974 0.093335037 0.754597256 0.320546177
## 133 134 135 136 137 138
## 0.040679615 0.199032398 0.032923400 0.040679615 0.236357034 0.320546177
## 139 140 141 142 143 144
## 0.113650703 0.021847311 0.199032398 0.992097754 0.093335037 0.880258758
## 145 146 147 148 149 150
## 0.320546177 0.961339036 0.952038399 0.752810497 0.063264812 0.093335037
## 151 152 153 154 155 156
## 0.532444750 0.093335037 0.063264812 0.320546177 0.199032398 0.891579760
## 157 158 159 160 161 162
## 0.296408847 0.320546177 0.560220974 0.113650703 0.754597256 0.532444750
## 163 164 165 166 167 168
## 0.027217619 0.093335037 0.027217619 0.075937470 0.093335037 0.199032398
## 169 170 171 172 173 174
## 0.032923400 0.667713222 0.532444750 0.727110661 0.136458593 0.199032398
## 175 176 177 178 179 180
## 0.320546177 0.532444750 0.027217619 0.296408847 0.063264812 0.063264812
## 181 182 183 184 185 186
## 0.075937470 0.532444750 0.093335037 0.113650703 0.113650703 0.009168121
## 187 188 189 190 191 192
## 0.532444750 0.027217619 0.532444750 0.399234377 0.093335037 0.320546177
## 193 194 195 196 197 198
## 0.021847311 0.113650703 0.320546177 0.093335037 0.320546177 0.217506721
## 199 200 201 202 203 204
## 0.093335037 0.320546177 0.027217619 0.063264812 0.273575881 0.320546177
## 205 206 207 208 209 210
## 0.027217619 0.032923400 0.320546177 0.320546177 0.075937470 0.021847311
## 211 212 213 214 215 216
## 0.320546177 0.956950433 0.857988295 0.093335037 0.532444750 0.199032398
## 217 218 219 220 221 222
## 0.063264812 0.667713222 0.199032398 0.320546177 0.199032398 0.032923400
## 223 224 225 226 227 228
## 0.320546177 0.134962074 0.320546177 0.134962074 0.320546177 0.075937470
## 229 230 231 232 233 234
## 0.063264812 0.093335037 0.199032398 0.560220974 0.320546177 0.063264812
## 235 236 237 238 239 240
## 0.273575881 0.320546177 0.320546177 0.113650703 0.296408847 0.367652200
## 241 242 243 244 245 246
## 0.320546177 0.560220974 0.009168121 0.134962074 0.829074883 0.320546177
## 247 248 249
## 0.353233033 0.752810497 0.320546177
# --> probabilities on logit scale
#Discrimination:
roc1 <- roc(data$Status14, preds, plot=TRUE, main="ROC curve of final model", col= "red", print.auc=TRUE)## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
roc1##
## Call:
## roc.default(response = data$Status14, predictor = preds, plot = TRUE, main = "ROC curve of final model", col = "red", print.auc = TRUE)
##
## Data: preds in 177 controls (data$Status14 0) < 72 cases (data$Status14 1).
## Area under the curve: 0.8262
ci.auc(roc1)## 95% CI: 0.7692-0.8831 (DeLong)
Figure 1 showed the discrimination and calibration plot of the final model. The figure 1.A shows showed high predictive accuracy of the final model. The ROC area of the reduced model was 0.826 (95% confidence interval [CI], 0.7692-0.8831). It means that there was 82.6% chance that the model will be able to distinguish between presence and absence of 14-day mortality.
#Calibration:
data <- as.data.frame(data)
#data$Status14 = as.numeric(data$Status14)
plotCalibration(data,20,preds,groups=10) ## $Table_HLtest
## total meanpred meanobs predicted observed
## [0.00917,0.0407) 31 0.026 0.032 0.79 1
## [0.04068,0.0929) 26 0.060 0.115 1.57 3
## [0.09285,0.1137) 21 0.093 0.048 1.96 1
## [0.11365,0.1365) 26 0.122 0.115 3.17 3
## [0.13646,0.2736) 22 0.194 0.227 4.26 5
## [0.27358,0.3532) 55 0.310 0.291 17.07 16
## [0.35323,0.5602) 30 0.475 0.433 14.24 13
## [0.56022,0.7271) 14 0.605 0.714 8.46 10
## [0.72711,0.9921] 24 0.853 0.833 20.48 20
##
## $Chi_square
## [1] 3.229
##
## $df
## [1] 8
##
## $p_value
## [1] 0.9192
Figure 1.B presented good calibration as the predicted probabilities are reasonably similar to the observed probabilities across the distribution. The line represents the perfect model calibration. All spots represent 10% of the patients. Most spots were precisely within the reference line. It means that the predicted probabilities were closed to observed probabilities, which was confirmed by the Hosmer-Lemeshow test. The p-value is above alpha = 0.05 (accepted H0), so the null hypothesis that the observed and predicted risks are the same. It means that this model indicated a good calibration.
#4.1. . Construct a simplified risk score
data$Age1_RC=as.numeric(levels(data$Age1)[data$Age1])
data$ReferalRoute1=as.numeric(levels(data$ReferalRoute1)[data$ReferalRoute1])
data$CVC=as.numeric(levels(data$CVC)[data$CVC])
data$ICHOperation=as.numeric(levels(data$ICHOperation)[data$ICHOperation])
data$AdjunctiveTreatment=as.numeric(levels(data$AdjunctiveTreatment)[data$AdjunctiveTreatment])
data$SOFA1_RC1[data$SOFA1=="0"|data$SOFA1=="2"]=0
data$SOFA1_RC1[data$SOFA1=="1"]=1
data$SOFA1_RC2[data$SOFA1=="0"|data$SOFA1=="1"]=0
data$SOFA1_RC2[data$SOFA1=="2"]=1
data$TotalScore = data$Age1_RC + 1* data$ReferalRoute1 + 2* data$CVC +
2* data$ICHOperation +3* data$AdjunctiveTreatment +
1* data$SOFA1_RC1 + 3* data$SOFA1_RC2
#Vidualization the rík score
risk <- data %>%
dplyr::select(Status14, TotalScore) %>%
mutate(TotalScore = as_factor(TotalScore)) %>%
group_by(TotalScore) %>%
summarise(number = n())
risk1 <- data %>%
dplyr::select(TotalScore, Status14) %>%
mutate(TotalScore = as_factor(TotalScore)) %>%
group_by(TotalScore, Status14) %>%
summarise(number = n()) %>%
mutate(pct = number*100/sum(number)) %>%
filter(Status14 == "1") %>% dplyr::select(-number)## `summarise()` has grouped output by 'TotalScore'. You can override using the
## `.groups` argument.
risk <- risk %>%
left_join(risk1, by = "TotalScore")
head(risk)data <- data %>%
mutate(score = case_when(TotalScore < 2 ~ 'Low risk',
TotalScore < 4 & TotalScore >1 ~ 'Moderate risk',
TotalScore < 7 & TotalScore >3 ~ 'High risk',
TotalScore >6 ~ 'Very high risk'))
data <- data %>%
mutate(score = fct_relevel(score, c("Low risk", "Moderate risk", "High risk", "Very high risk")))
risk3 <- data %>%
dplyr::select(Status14, score) %>%
mutate(score = as_factor(score)) %>%
group_by(score) %>%
summarise(number = n())
risk4 <- data %>%
dplyr::select(score, Status14) %>%
mutate(score = as_factor(score)) %>%
group_by(score, Status14) %>%
summarise(number = n()) %>%
mutate(pct = number*100/sum(number)) %>%
filter(Status14 == "1") %>% dplyr::select(-number)## `summarise()` has grouped output by 'score'. You can override using the
## `.groups` argument.
risk3 <- risk3 %>%
left_join(risk4, by = "score")
head(risk3)The distribution of simplified risk score was presented in Figure 2. The score ranged from 0 to 10. There were no patients with scores of 11 or 12. The simplified risk score gradually increased from 0 to 4, then decreased dramatically from 4 to 10. The majority of patients reported having a 4 score, accounting for 25.7% of all participants. Following that, 18.1% and 18.5% of patients reported having a score of 2 or 3 respectively. The higher simplified risk score reported lowest patients, specifically 8 (0.8%), 9 (1.6%), and 10 (0.4 percent). Four risk categories of simplified risk score were defined including (1) low risk (score 0–1), (2) moderate risk (score 2–3), (3) high risk (score 4–6) and (4) very high risk (score 7–12). The simplified risk score category gradually increased from low to high, then decreased dramatically from high to very high risk. The mortality risk corresponding to a specific score is measured by the number of patients who died out of the total number of patients having that score. The disctribution of the mortality risk corresponding to a specific score was presented in Figure 2. Figure 2.A showed that when the score increases from 0 to 10, the probability of 14-day mortality increases from 0 to 100%. Notably, all patients have risk score more than 7 died within 14 days of observation. The similar pattern was observed among risk categories, 14-day mortality probabilities of low risk, moderate risk, high risk and very high risk were graduated increased from 3.2% in low-risk group to 85% in very high-risk group. In conclusion, as the score rises from 0 to 10, the probability of 14-day mortality rises from 0% to 100%.
ggplot(data = risk) +
geom_bar(aes(x = TotalScore, y = number), fill = "orange",
stat = "identity", position="dodge") +
geom_line(aes(x = TotalScore, y = pct)) +
geom_point(aes(x = TotalScore, y = pct)) +
theme_bw() +
scale_y_continuous(
# Features of the first axis
name = "Number and Percentage patients with each score (n%)",
# Add a second axis and specify its features
sec.axis = sec_axis(~./1, name="Mortality risk (%)")
)## Warning: Removed 1 row(s) containing missing values (geom_path).
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(data = risk3) +
geom_bar(aes(x = score, y = number), fill = "orange",
stat = "identity", position="dodge") +
geom_line(aes(x = score, y = pct)) +
geom_point(aes(x = score, y = pct)) +
theme_bw() +
scale_y_continuous(
# Features of the first axis
name = "Percentage patients with each category (n%)",
# Add a second axis and specify its features
sec.axis = sec_axis(~./1, name="Mortality risk (%)")
)## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
m_scr<-glm(Status14~TotalScore, family = binomial(), data = data)
tab_model(m_scr)| Status 14 | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.02 | 0.01 – 0.04 | <0.001 |
| TotalScore | 2.26 | 1.82 – 2.91 | <0.001 |
| Observations | 249 | ||
| R2 Tjur | 0.294 | ||
p_scr<-predict(m_scr, type = "response")
plotCalibration(data,which(colnames(data) == "Status14"),p_scr,groups = 10)## $Table_HLtest
## total meanpred meanobs predicted observed
## [0.0159,0.0765) 31 0.032 0.032 1.00 1
## 0.0765 45 0.077 0.089 3.44 4
## 0.1579 46 0.158 0.109 7.26 5
## 0.2977 64 0.298 0.328 19.06 21
## 0.4895 29 0.490 0.517 14.20 15
## 0.6844 14 0.684 0.643 9.58 9
## [0.8307,0.9827] 20 0.873 0.850 17.46 17
##
## $Chi_square
## [1] 1.509
##
## $df
## [1] 8
##
## $p_value
## [1] 0.9926
roc_scr<-roc(data$Status14,p_scr, plot=TRUE, xlim = c(1,0)) ## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
lines.roc(data$Status14, preds, col="red")## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
ci.auc(roc_scr)## 95% CI: 0.7637-0.8759 (DeLong)
roc.test(roc1,roc_scr)##
## DeLong's test for two correlated ROC curves
##
## data: roc1 and roc_scr
## Z = 0.72104, p-value = 0.4709
## alternative hypothesis: true difference in AUC is not equal to 0
## 95 percent confidence interval:
## -0.01085367 0.02348707
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.8261535 0.8198368
Figure 3 showed the similar predictive accuracy of the the risk score model compared to the final model above. Compared with the original model, the simplified score yielded a similar predictive performance (AUC 0.82, 95% CI 0.76–0.88). This was confirmed by the DeLong’s test for two correlated ROC curves with p-value = 0.4709 > 0.05. The calibration plot in Figure 3.B showed that the spots on the right represents the higher predicted probability of mortality and a somewhat higher observed mortality in the same patients on the y-axis. This indicates that the score model was slight overestimated in those patients in the higher estimated disease probability range. However, differences between predicted and observed probabilities was not significant, which was presented in the goodness-of-fit test. The p-value is above alpha = 0.05 (accepted H0), so the null hypothesis that the observed and predicted risks are the same. It means that this model still well predicted the outcome.
A work by Khanh Huyen - 30 October 2022