This project is about Predicting mortality in intensive care unit patients infected with Klebsiella pneumoniae. This study aimed to develop a prediction model and simplified risk score to predict 14-day mortality in ICU patients infected with Klebsiella pneumoniae.A retrospective cohort study was conducted using data from 249 ICU patients infected with Klebsiella pneumoniae in Vietnam. The outcome of this study was 14-day mortality. Eighteen variables were selected to be predictors of the outcome, including both clinical and non-clinical information and treatment information. A prediction model using logistic regression was applied, and a simplified risk score was also constructed based on the result from the prediction model. Calibration (area under the receiver operating characteristic curve, or AUC) and discrimination (Hosmer-Lemeshow goodness-of-fit test) were used to test the model performance.This study’s findings provided a valuable tool for predicting 14-day mortality in ICU patients infected with Klebsiella pneumoniae.

1 Data analysis

1.1 Import data

pacman::p_load(
  rio,          # File import
  here,         # File locator
  lubridate, janitor, magrittr,
  ggplot2, pROC,   epiDisplay,readr,gtsummary, mlr,PredictABEL,table1,
  sjPlot,sjmisc,sjlabelled, MASS, flextable, officer, tidyverse)
setwd("E:\\Clinical epi_assignment 3\\Clinical epi_assignment 3\\Clinical epi_assignment 3\\Clinical epi_assignment 3\\Huyen")

data <- read_csv("Kleb.csv")

1.2 Data investigation

Checking for missing data

which(is.na(data$ID))

## integer(0)

which(is.na(data$Age1))

## integer(0)

which(is.na(data$Gender))

## integer(0)

which(is.na(data$ReferalRoute1))

## integer(0)

which(is.na(data$Bacteraemia))

## integer(0)

which(is.na(data$SepticShock))

## integer(0)

which(is.na(data$Apache1))

## integer(0)

which(is.na(data$SOFA1))

## integer(0)

which(is.na(data$Charlson1))

## integer(0)

which(is.na(data$CVC))

## integer(0)

which(is.na(data$UrinaryCatheter))

## integer(0)

which(is.na(data$EndotrachealTube))

## integer(0)

which(is.na(data$Dialysis))

## integer(0)

which(is.na(data$ICHOperation))

## integer(0)

which(is.na(data$CarbapenemResistance))

## integer(0)

which(is.na(data$CoInfection))

## integer(0)

Convert variable to factor

names(data)

##  [1] "ID"                   "Age1"                 "Gender"              
##  [4] "ReferalRoute1"        "Bacteraemia"          "SepticShock"         
##  [7] "Apache1"              "SOFA1"                "Charlson1"           
## [10] "CVC"                  "UrinaryCatheter"      "EndotrachealTube"    
## [13] "Dialysis"             "ICHOperation"         "CarbapenemResistance"
## [16] "CoInfection"          "EmpiricalAB"          "DefinitiveAB"        
## [19] "AdjunctiveTreatment"  "Status14"

data$Age1 = as.factor(data$Age1)
data$Gender = as.factor(data$Gender)
data$ReferalRoute1 = as.factor(data$ReferalRoute1)
data$Bacteraemia = as.factor(data$Bacteraemia)
data$SepticShock = as.factor(data$SepticShock)
data$Apache1 = as.factor(data$Apache1)
data$SOFA1 = as.factor(data$SOFA1)
data$Charlson1 = as.factor(data$Charlson1)
data$CVC = as.factor(data$CVC)
data$UrinaryCatheter = as.factor(data$UrinaryCatheter)
data$EndotrachealTube = as.factor(data$EndotrachealTube)
data$CoInfection = as.factor(data$CoInfection)
data$EmpiricalAB = as.factor(data$EmpiricalAB)
data$DefinitiveAB = as.factor(data$DefinitiveAB)
data$AdjunctiveTreatment = as.factor(data$AdjunctiveTreatment)
data$Dialysis = as.factor(data$Dialysis)
data$ICHOperation = as.factor(data$ICHOperation)
data$CarbapenemResistance = as.factor(data$CarbapenemResistance)

1.3 Descriptive analysis

Table 1. Characteristics of ICU patients infected with Klebsiella pneumoniae in this study

data %>% 
  # keep only the columns of interest
  dplyr::select(Status14, Age1, Gender, ReferalRoute1, 
         Bacteraemia, SepticShock, Apache1, SOFA1, Charlson1, 
         CVC, UrinaryCatheter,EndotrachealTube, Dialysis, ICHOperation,
         CarbapenemResistance, CoInfection, EmpiricalAB, DefinitiveAB,
         AdjunctiveTreatment) %>%  
  tbl_summary(by = Status14, percent = "row") %>% 
  add_p()

Characteristic	0, N = 177¹	1, N = 72¹	p-value²
Age1			0.11
0	122 (74%)	42 (26%)
1	55 (65%)	30 (35%)
Gender			0.3
0	49 (77%)	15 (23%)
1	128 (69%)	57 (31%)
ReferalRoute1			0.023
0	63 (81%)	15 (19%)
1	114 (67%)	57 (33%)
Bacteraemia			0.3
0	146 (73%)	55 (27%)
1	31 (65%)	17 (35%)
SepticShock			0.053
0	153 (74%)	55 (26%)
1	24 (59%)	17 (41%)
Apache1			0.008
0	97 (80%)	25 (20%)
1	39 (68%)	18 (32%)
2	41 (59%)	29 (41%)
SOFA1			<0.001
0	59 (89%)	7 (11%)
1	107 (69%)	49 (31%)
2	11 (41%)	16 (59%)
Charlson1			0.005
0	70 (84%)	13 (16%)
1	30 (67%)	15 (33%)
2	77 (64%)	44 (36%)
CVC			<0.001
0	90 (83%)	18 (17%)
1	87 (62%)	54 (38%)
UrinaryCatheter			0.3
0	157 (72%)	60 (28%)
1	20 (62%)	12 (38%)
EndotrachealTube			0.12
0	62 (78%)	18 (22%)
1	115 (68%)	54 (32%)
Dialysis			0.073
0	133 (74%)	46 (26%)
1	44 (63%)	26 (37%)
ICHOperation			0.010
0	169 (73%)	62 (27%)
1	8 (44%)	10 (56%)
CarbapenemResistance			0.4
0	51 (75%)	17 (25%)
1	126 (70%)	55 (30%)
CoInfection			0.4
0	93 (73%)	34 (27%)
1	84 (69%)	38 (31%)
EmpiricalAB			>0.9
0	116 (71%)	47 (29%)
1	61 (71%)	25 (29%)
DefinitiveAB			0.7
0	153 (71%)	61 (29%)
1	24 (69%)	11 (31%)
AdjunctiveTreatment			<0.001
0	169 (76%)	53 (24%)
1	8 (30%)	19 (70%)
¹ n (%)
² Pearson's Chi-squared test

The characteristics of were summarized in Table 1. Among 249 patients, the overall 14-day all-cause mortality rate in the total population was 28.9%. In terms of demographic factors, under-65-year-olds, men, and intra-hospital referred patients accounted for the majority of the population, accounting for 65.9%, 71.3%, and 68.7% of the total population, respectively. Patients who were referred within the hospital experienced a 14-day all-cause mortality rate that was significantly higher than patients who were referred outside the hospital (p-value = 0.023). In terms of health conditions, 19.3% of participants reported having bacteremia, and 16.5 %reported having septic shock. There was no significant difference of these two factors among mortality status. Additionally, most participants reported having a SOFA score greater than or equal to 4 (73%), a Charlson index of 1 and greater than 1 (67%), and an APACHE II score of 15 or above (51%). The SOFA score, Charlson index, and APACHE II score were all significantly higher in patients who died recently within 14 days. The two most frequently carried out invasive procedures were the insertion of an endotracheal tube (68%) and a central venous catheter (57%). Regarding pathogen factors, 72.7% and 49% of participants reported to have carbapenem resistance and co-infection respectively. There was no significant difference of these two factors among mortality status. In terms of treatment factors, 34.5%, 14.1% and 10.8% of participants reported to have inadequate empirical antibiotic therapy, inadequate definitive antibiotic therapy, and absence of adjunctive therapy respectively.

2 Develop a diganostic model

2.1 Univariate analysis

# Step 1: univariable analysis, selecting variables at p<0.15 to include in multivariable analysis. 

fit_null = glm(Status14 ~ 1, family=binomial, data=data)
fit_age = glm(Status14 ~ Age1, family=binomial, data=data)
fit_gender = glm(Status14 ~ Gender, family=binomial, data=data)
fit_ReferalRoute1 = glm(Status14 ~ ReferalRoute1, family=binomial, data=data)
fit_Bacteraemia = glm(Status14 ~ Bacteraemia, family=binomial, data=data)
fit_SepticShock = glm(Status14 ~ SepticShock, family=binomial, data=data)
fit_Apache1 = glm(Status14 ~ Apache1, family=binomial, data=data)
fit_SOFA1 = glm(Status14 ~ SOFA1, family=binomial, data=data)
fit_Charlson1 = glm(Status14 ~ Charlson1, family=binomial, data=data)
fit_CVC= glm(Status14 ~ CVC, family=binomial, data=data)
fit_UrinaryCatheter = glm(Status14 ~ UrinaryCatheter, family=binomial, data=data)
fit_EndotrachealTube = glm(Status14 ~ EndotrachealTube, family=binomial, data=data)
fit_Dialysis = glm(Status14 ~ Dialysis, family=binomial, data=data)
fit_ICHOperation = glm(Status14 ~ ICHOperation, family=binomial, data=data)
fit_CarbapenemResistance = glm(Status14 ~ CarbapenemResistance, family=binomial, data=data)
fit_CoInfection = glm(Status14 ~ CoInfection, family=binomial, data=data)
fit_EmpiricalAB = glm(Status14 ~ EmpiricalAB, family=binomial, data=data)
fit_DefinitiveAB = glm(Status14 ~ DefinitiveAB, family=binomial, data=data)
fit_AdjunctiveTreatment = glm(Status14 ~ AdjunctiveTreatment, family=binomial, data=data)


lrtest(fit_age,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  2.51159 , P value =  0.1130118

lrtest(fit_gender,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  1.293413 , P value =  0.2554199

lrtest(fit_ReferalRoute1,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  5.436675 , P value =  0.01971817

lrtest(fit_Bacteraemia,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  1.18699 , P value =  0.2759375

lrtest(fit_SepticShock,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  3.562357 , P value =  0.05910375

lrtest(fit_Apache1,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 2 d.f. =  9.681283 , P value =  0.007901983

lrtest(fit_SOFA1,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 2 d.f. =  24.18238 , P value =  5.608708e-06

lrtest(fit_Charlson1,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 2 d.f. =  11.53223 , P value =  0.003131905

lrtest(fit_CVC,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  14.5014 , P value =  0.0001400558

lrtest(fit_UrinaryCatheter,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  1.261648 , P value =  0.2613394

lrtest(fit_EndotrachealTube,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  2.427302 , P value =  0.1192379

lrtest(fit_Dialysis,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  3.118748 , P value =  0.07739623

lrtest(fit_ICHOperation,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  6.037446 , P value =  0.01400553

lrtest(fit_CarbapenemResistance,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  0.7108019 , P value =  0.3991777

lrtest(fit_CoInfection,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  0.5797528 , P value =  0.4464092

lrtest(fit_EmpiricalAB,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  0.001517121 , P value =  0.9689301

lrtest(fit_DefinitiveAB,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  0.1234125 , P value =  0.7253625

lrtest(fit_AdjunctiveTreatment,fit_null)

## Likelihood ratio test for MLE method 
## Chi-squared 1 d.f. =  22.64712 , P value =  1.946516e-06

The result of univariate and multivariate logistic regression models with the 14-day mortality status as the outcome was presented in Table 2. The univariable analysis were developed among all eighteen predictors, in which, each univariable model was compared with the null model by a likelihood ratio test. As a result, eleven variables with p-value <0.15 from univariate analyses were included in the initial multivariable model, which included age, route of referral, APACHE II score, SOFA score, Charlson comorbidity index, septic shock, central venous catheter, endotracheal tube, dialysis, intracerebral haemorrhage operation and adjunctive treatment.

# Step 2. backward selection using the likelihood ratio test with p-value threshold of 0.1.
# Full model with Oral_tem, Breath_min had p<0.15 from univariable analysis
model_full <-  glm(Status14 ~ Age1 + ReferalRoute1 +SepticShock
                   +Apache1 + SOFA1 + Charlson1 
                   +CVC + EndotrachealTube + Dialysis 
                   + ICHOperation + AdjunctiveTreatment
                   , family=binomial, data=data)
car::vif(model_full)

##                         GVIF Df GVIF^(1/(2*Df))
## Age1                1.203367  1        1.096981
## ReferalRoute1       1.133702  1        1.064755
## SepticShock         1.252033  1        1.118943
## Apache1             1.383559  2        1.084550
## SOFA1               1.401880  2        1.088122
## Charlson1           1.310337  2        1.069906
## CVC                 1.621824  1        1.273509
## EndotrachealTube    1.171515  1        1.082365
## Dialysis            1.373008  1        1.171754
## ICHOperation        1.246831  1        1.116616
## AdjunctiveTreatment 1.287982  1        1.134893

model_full <- tbl_regression(model_full, exponentiate = TRUE)
model_full

Characteristic	OR¹	95% CI¹	p-value
Age1
0	—	—
1	2.20	1.04, 4.77	0.042
ReferalRoute1
0	—	—
1	2.70	1.20, 6.50	0.020
SepticShock
0	—	—
1	0.91	0.35, 2.24	0.8
Apache1
0	—	—
1	0.97	0.41, 2.27	>0.9
2	0.64	0.25, 1.53	0.3
SOFA1
0	—	—
1	3.79	1.46, 11.4	0.010
2	11.4	2.77, 51.9	0.001
Charlson1
0	—	—
1	1.75	0.61, 5.02	0.3
2	2.07	0.87, 5.12	0.10
CVC
0	—	—
1	5.67	2.35, 14.8	<0.001
EndotrachealTube
0	—	—
1	1.42	0.64, 3.24	0.4
Dialysis
0	—	—
1	0.72	0.31, 1.66	0.4
ICHOperation
0	—	—
1	4.88	1.37, 18.3	0.015
AdjunctiveTreatment
0	—	—
1	21.2	6.80, 77.0	<0.001
¹ OR = Odds Ratio, CI = Confidence Interval

The two strongest predictors of 14-day mortality in the study population were adjunctive treatment and SOFA score. Multicollinearity of the initial multivariable model was checked by generalized variance-inflation factors (GVIFs). Since GVIF<5 (1.13-1.62) of all eleven variables, multicollinearity was not an issue in this initial multivariable model. ## Backward selection

model_full <-  glm(Status14 ~ Age1 + ReferalRoute1 +SepticShock
                   +Apache1 + SOFA1 + Charlson1 
                   +CVC + EndotrachealTube + Dialysis 
                   + ICHOperation + AdjunctiveTreatment
                   , family=binomial, data=data)
stepAIC(model_full)

## Start:  AIC=244.05
## Status14 ~ Age1 + ReferalRoute1 + SepticShock + Apache1 + SOFA1 + 
##     Charlson1 + CVC + EndotrachealTube + Dialysis + ICHOperation + 
##     AdjunctiveTreatment
## 
##                       Df Deviance    AIC
## - Apache1              2   215.17 241.17
## - SepticShock          1   214.09 242.09
## - Dialysis             1   214.63 242.63
## - EndotrachealTube     1   214.79 242.79
## - Charlson1            2   216.81 242.81
## <none>                     214.05 244.05
## - Age1                 1   218.26 246.26
## - ReferalRoute1        1   219.90 247.90
## - ICHOperation         1   220.05 248.05
## - SOFA1                2   226.68 252.68
## - CVC                  1   229.80 257.80
## - AdjunctiveTreatment  1   245.62 273.62
## 
## Step:  AIC=241.17
## Status14 ~ Age1 + ReferalRoute1 + SepticShock + SOFA1 + Charlson1 + 
##     CVC + EndotrachealTube + Dialysis + ICHOperation + AdjunctiveTreatment
## 
##                       Df Deviance    AIC
## - SepticShock          1   215.19 239.19
## - Charlson1            2   217.48 239.48
## - EndotrachealTube     1   215.69 239.69
## - Dialysis             1   215.93 239.93
## <none>                     215.17 241.17
## - Age1                 1   219.11 243.11
## - ICHOperation         1   220.79 244.79
## - ReferalRoute1        1   221.49 245.49
## - SOFA1                2   226.98 248.98
## - CVC                  1   229.81 253.81
## - AdjunctiveTreatment  1   245.88 269.88
## 
## Step:  AIC=239.19
## Status14 ~ Age1 + ReferalRoute1 + SOFA1 + Charlson1 + CVC + EndotrachealTube + 
##     Dialysis + ICHOperation + AdjunctiveTreatment
## 
##                       Df Deviance    AIC
## - Charlson1            2   217.49 237.49
## - EndotrachealTube     1   215.72 237.72
## - Dialysis             1   215.99 237.99
## <none>                     215.19 239.19
## - Age1                 1   219.15 241.15
## - ICHOperation         1   220.97 242.97
## - ReferalRoute1        1   221.49 243.49
## - SOFA1                2   227.57 247.57
## - CVC                  1   229.97 251.97
## - AdjunctiveTreatment  1   245.92 267.92
## 
## Step:  AIC=237.49
## Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + EndotrachealTube + 
##     Dialysis + ICHOperation + AdjunctiveTreatment
## 
##                       Df Deviance    AIC
## - EndotrachealTube     1   218.00 236.00
## - Dialysis             1   218.12 236.12
## <none>                     217.49 237.49
## - Age1                 1   224.05 242.05
## - ReferalRoute1        1   224.64 242.64
## - ICHOperation         1   224.92 242.92
## - SOFA1                2   231.12 247.12
## - CVC                  1   232.17 250.17
## - AdjunctiveTreatment  1   249.85 267.85
## 
## Step:  AIC=236
## Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + Dialysis + ICHOperation + 
##     AdjunctiveTreatment
## 
##                       Df Deviance    AIC
## - Dialysis             1   218.81 234.81
## <none>                     218.00 236.00
## - Age1                 1   224.16 240.16
## - ICHOperation         1   225.15 241.15
## - ReferalRoute1        1   225.84 241.84
## - SOFA1                2   233.23 247.23
## - CVC                  1   234.20 250.20
## - AdjunctiveTreatment  1   249.90 265.90
## 
## Step:  AIC=234.81
## Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + ICHOperation + 
##     AdjunctiveTreatment
## 
##                       Df Deviance    AIC
## <none>                     218.81 234.81
## - Age1                 1   224.88 238.88
## - ICHOperation         1   226.58 240.58
## - ReferalRoute1        1   226.71 240.71
## - SOFA1                2   233.23 245.23
## - CVC                  1   234.54 248.54
## - AdjunctiveTreatment  1   250.41 264.41

## 
## Call:  glm(formula = Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + 
##     ICHOperation + AdjunctiveTreatment, family = binomial, data = data)
## 
## Coefficients:
##          (Intercept)                 Age11        ReferalRoute11  
##              -4.6828                0.8812                1.1065  
##               SOFA11                SOFA12                  CVC1  
##               1.3027                2.2960                1.5223  
##        ICHOperation1  AdjunctiveTreatment1  
##               1.7313                2.9714  
## 
## Degrees of Freedom: 248 Total (i.e. Null);  241 Residual
## Null Deviance:       299.5 
## Residual Deviance: 218.8     AIC: 234.8

model_final <- glm(formula = Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + 
      ICHOperation + AdjunctiveTreatment, family = binomial, data = data)
model_final <- tbl_regression(model_final, exponentiate = TRUE)
model_final

Characteristic	OR¹	95% CI¹	p-value
Age1
0	—	—
1	2.41	1.20, 4.96	0.015
ReferalRoute1
0	—	—
1	3.02	1.38, 7.12	0.008
SOFA1
0	—	—
1	3.68	1.47, 10.7	0.009
2	9.93	2.85, 38.7	<0.001
CVC
0	—	—
1	4.58	2.11, 10.8	<0.001
ICHOperation
0	—	—
1	5.65	1.68, 20.0	0.006
AdjunctiveTreatment
0	—	—
1	19.5	6.46, 68.7	<0.001
¹ OR = Odds Ratio, CI = Confidence Interval

The backward stepwise method was used to select the final set of variables in the most fitted multivariable model, which had the lowest AIC. The final logistic regression models included six variables: age, referral route, SOFA score, central venous catheter, intracerebral haemorrhage operation and absence of adjunctive therapy. Absence of adjunctive therapy (OR 19.5, 95%CI: 6.46–68.7) and SOFA score (score 4–11 vs score 0-3: OR 3.68, 95%CI: 1.47–10.7; score ≥12 vs score 0-3: OR 9.93, 95%CI: 2.85–38.74) were the two strongest predictors of 14-day mortality in the study population. They were followed by having intracerebral haemorrhage operation (OR 5.65, 95%CI: 1.68–20.0) and having central venous catheter (OR 4.58, 95%CI: 2.11–10.8). Based on the regression coefficients and intercept, the prediction model to predict 14-day mortality of ICU patients infected with K. pneumoniae was presented as below:
Probability of 14-day mortality = 1/(1+exp-(-4.68 + 0.88 × Age ≥65 + 1.11 × Intra-hospital referral + @1.30 × SOFA score 4–11 + 2.29 × SOFA score ≥12 + 1.52 × Central venous catheter + 1.73 × Intracerebral haemorrhage surgery + 2.97 × Absence of adjunctive therapy)

2.2 Model performance assessment

# Q3: Model performance assessment ------------------------------------------------------------------------------
model_final = glm(formula = Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + 
                     ICHOperation + AdjunctiveTreatment, family = binomial, data = data)
summary(model_final)

## 
## Call:
## glm(formula = Status14 ~ Age1 + ReferalRoute1 + SOFA1 + CVC + 
##     ICHOperation + AdjunctiveTreatment, family = binomial, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8796  -0.7995  -0.3974   0.4541   2.6129  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -4.6828     0.6923  -6.764 1.34e-11 ***
## Age11                  0.8812     0.3610   2.441 0.014655 *  
## ReferalRoute11         1.1065     0.4154   2.664 0.007724 ** 
## SOFA11                 1.3027     0.4997   2.607 0.009134 ** 
## SOFA12                 2.2960     0.6604   3.477 0.000508 ***
## CVC1                   1.5223     0.4147   3.671 0.000242 ***
## ICHOperation1          1.7313     0.6257   2.767 0.005657 ** 
## AdjunctiveTreatment1   2.9714     0.5984   4.966 6.85e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 299.49  on 248  degrees of freedom
## Residual deviance: 218.81  on 241  degrees of freedom
## AIC: 234.81
## 
## Number of Fisher Scoring iterations: 5

preds = predict(model_final, type="response")
preds

##           1           2           3           4           5           6 
## 0.134962074 0.032923400 0.532444750 0.032923400 0.560220974 0.040679615 
##           7           8           9          10          11          12 
## 0.504191613 0.075937470 0.320546177 0.009168121 0.027217619 0.075937470 
##          13          14          15          16          17          18 
## 0.027217619 0.113650703 0.032923400 0.113650703 0.040679615 0.113650703 
##          19          20          21          22          23          24 
## 0.075937470 0.320546177 0.181599568 0.032923400 0.399234377 0.093335037 
##          25          26          27          28          29          30 
## 0.420016385 0.367652200 0.161268657 0.152985499 0.367652200 0.093335037 
##          31          32          33          34          35          36 
## 0.583928729 0.704083066 0.367652200 0.367652200 0.320546177 0.367652200 
##          37          38          39          40          41          42 
## 0.113650703 0.399234377 0.236357034 0.754597256 0.583928729 0.027217619 
##          43          44          45          46          47          48 
## 0.273575881 0.032923400 0.532444750 0.532444750 0.199032398 0.273575881 
##          49          50          51          52          53          54 
## 0.040679615 0.134962074 0.113650703 0.754597256 0.027217619 0.532444750 
##          55          56          57          58          59          60 
## 0.113650703 0.532444750 0.032923400 0.865441049 0.092854134 0.320546177 
##          61          62          63          64          65          66 
## 0.273575881 0.560220974 0.754597256 0.199032398 0.320546177 0.320546177 
##          67          68          69          70          71          72 
## 0.040679615 0.113650703 0.320546177 0.320546177 0.961339036 0.134962074 
##          73          74          75          76          77          78 
## 0.320546177 0.009168121 0.320546177 0.902046649 0.320546177 0.320546177 
##          79          80          81          82          83          84 
## 0.532444750 0.532444750 0.134962074 0.273575881 0.032923400 0.113650703 
##          85          86          87          88          89          90 
## 0.134962074 0.027217619 0.075937470 0.667713222 0.093335037 0.021847311 
##          91          92          93          94          95          96 
## 0.199032398 0.320546177 0.273575881 0.092854134 0.063264812 0.093335037 
##          97          98          99         100         101         102 
## 0.113650703 0.063264812 0.273575881 0.945553850 0.532444750 0.199032398 
##         103         104         105         106         107         108 
## 0.199032398 0.296408847 0.113650703 0.560220974 0.009168121 0.093335037 
##         109         110         111         112         113         114 
## 0.902046649 0.902046649 0.152985499 0.134962074 0.320546177 0.134962074 
##         115         116         117         118         119         120 
## 0.532444750 0.296408847 0.040679615 0.755177097 0.093335037 0.040679615 
##         121         122         123         124         125         126 
## 0.919022230 0.303613326 0.320546177 0.667713222 0.320546177 0.093335037 
##         127         128         129         130         131         132 
## 0.273575881 0.504191613 0.560220974 0.093335037 0.754597256 0.320546177 
##         133         134         135         136         137         138 
## 0.040679615 0.199032398 0.032923400 0.040679615 0.236357034 0.320546177 
##         139         140         141         142         143         144 
## 0.113650703 0.021847311 0.199032398 0.992097754 0.093335037 0.880258758 
##         145         146         147         148         149         150 
## 0.320546177 0.961339036 0.952038399 0.752810497 0.063264812 0.093335037 
##         151         152         153         154         155         156 
## 0.532444750 0.093335037 0.063264812 0.320546177 0.199032398 0.891579760 
##         157         158         159         160         161         162 
## 0.296408847 0.320546177 0.560220974 0.113650703 0.754597256 0.532444750 
##         163         164         165         166         167         168 
## 0.027217619 0.093335037 0.027217619 0.075937470 0.093335037 0.199032398 
##         169         170         171         172         173         174 
## 0.032923400 0.667713222 0.532444750 0.727110661 0.136458593 0.199032398 
##         175         176         177         178         179         180 
## 0.320546177 0.532444750 0.027217619 0.296408847 0.063264812 0.063264812 
##         181         182         183         184         185         186 
## 0.075937470 0.532444750 0.093335037 0.113650703 0.113650703 0.009168121 
##         187         188         189         190         191         192 
## 0.532444750 0.027217619 0.532444750 0.399234377 0.093335037 0.320546177 
##         193         194         195         196         197         198 
## 0.021847311 0.113650703 0.320546177 0.093335037 0.320546177 0.217506721 
##         199         200         201         202         203         204 
## 0.093335037 0.320546177 0.027217619 0.063264812 0.273575881 0.320546177 
##         205         206         207         208         209         210 
## 0.027217619 0.032923400 0.320546177 0.320546177 0.075937470 0.021847311 
##         211         212         213         214         215         216 
## 0.320546177 0.956950433 0.857988295 0.093335037 0.532444750 0.199032398 
##         217         218         219         220         221         222 
## 0.063264812 0.667713222 0.199032398 0.320546177 0.199032398 0.032923400 
##         223         224         225         226         227         228 
## 0.320546177 0.134962074 0.320546177 0.134962074 0.320546177 0.075937470 
##         229         230         231         232         233         234 
## 0.063264812 0.093335037 0.199032398 0.560220974 0.320546177 0.063264812 
##         235         236         237         238         239         240 
## 0.273575881 0.320546177 0.320546177 0.113650703 0.296408847 0.367652200 
##         241         242         243         244         245         246 
## 0.320546177 0.560220974 0.009168121 0.134962074 0.829074883 0.320546177 
##         247         248         249 
## 0.353233033 0.752810497 0.320546177

# --> probabilities on logit scale

#Discrimination:
roc1 <- roc(data$Status14, preds, plot=TRUE, main="ROC curve of final model", col= "red", print.auc=TRUE)

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

roc1

## 
## Call:
## roc.default(response = data$Status14, predictor = preds, plot = TRUE,     main = "ROC curve of final model", col = "red", print.auc = TRUE)
## 
## Data: preds in 177 controls (data$Status14 0) < 72 cases (data$Status14 1).
## Area under the curve: 0.8262

  ci.auc(roc1)

## 95% CI: 0.7692-0.8831 (DeLong)

Figure 1 showed the discrimination and calibration plot of the final model. The figure 1.A shows showed high predictive accuracy of the final model. The ROC area of the reduced model was 0.826 (95% confidence interval [CI], 0.7692-0.8831). It means that there was 82.6% chance that the model will be able to distinguish between presence and absence of 14-day mortality.

#Calibration:
data <- as.data.frame(data)
#data$Status14 = as.numeric(data$Status14)
plotCalibration(data,20,preds,groups=10)

## $Table_HLtest
##                  total meanpred meanobs predicted observed
## [0.00917,0.0407)    31    0.026   0.032      0.79        1
## [0.04068,0.0929)    26    0.060   0.115      1.57        3
## [0.09285,0.1137)    21    0.093   0.048      1.96        1
## [0.11365,0.1365)    26    0.122   0.115      3.17        3
## [0.13646,0.2736)    22    0.194   0.227      4.26        5
## [0.27358,0.3532)    55    0.310   0.291     17.07       16
## [0.35323,0.5602)    30    0.475   0.433     14.24       13
## [0.56022,0.7271)    14    0.605   0.714      8.46       10
## [0.72711,0.9921]    24    0.853   0.833     20.48       20
## 
## $Chi_square
## [1] 3.229
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0.9192

Figure 1.B presented good calibration as the predicted probabilities are reasonably similar to the observed probabilities across the distribution. The line represents the perfect model calibration. All spots represent 10% of the patients. Most spots were precisely within the reference line. It means that the predicted probabilities were closed to observed probabilities, which was confirmed by the Hosmer-Lemeshow test. The p-value is above alpha = 0.05 (accepted H0), so the null hypothesis that the observed and predicted risks are the same. It means that this model indicated a good calibration.

3 Simplified risk score & mortality risk

3.1 Construct a simplified risk score and Performance of 14-day mortality risk score

#4.1. . Construct a simplified risk score
data$Age1_RC=as.numeric(levels(data$Age1)[data$Age1])
data$ReferalRoute1=as.numeric(levels(data$ReferalRoute1)[data$ReferalRoute1])
data$CVC=as.numeric(levels(data$CVC)[data$CVC])
data$ICHOperation=as.numeric(levels(data$ICHOperation)[data$ICHOperation])
data$AdjunctiveTreatment=as.numeric(levels(data$AdjunctiveTreatment)[data$AdjunctiveTreatment])

data$SOFA1_RC1[data$SOFA1=="0"|data$SOFA1=="2"]=0
data$SOFA1_RC1[data$SOFA1=="1"]=1

data$SOFA1_RC2[data$SOFA1=="0"|data$SOFA1=="1"]=0
data$SOFA1_RC2[data$SOFA1=="2"]=1

data$TotalScore = data$Age1_RC + 1* data$ReferalRoute1 + 2* data$CVC + 
  2* data$ICHOperation +3* data$AdjunctiveTreatment +
  1* data$SOFA1_RC1 + 3* data$SOFA1_RC2
  #Vidualization the rík score
  risk <- data %>% 
    dplyr::select(Status14, TotalScore) %>% 
    mutate(TotalScore = as_factor(TotalScore)) %>% 
    group_by(TotalScore) %>% 
    summarise(number = n())
  
  risk1 <- data %>% 
    dplyr::select(TotalScore, Status14) %>% 
    mutate(TotalScore = as_factor(TotalScore)) %>% 
    group_by(TotalScore, Status14) %>% 
    summarise(number = n()) %>% 
    mutate(pct = number*100/sum(number)) %>% 
    filter(Status14 == "1") %>% dplyr::select(-number)

## `summarise()` has grouped output by 'TotalScore'. You can override using the
## `.groups` argument.

  risk <- risk %>% 
    left_join(risk1, by = "TotalScore")
head(risk)

data <- data %>%
    mutate(score = case_when(TotalScore < 2 ~ 'Low risk',
                             TotalScore < 4 & TotalScore >1 ~ 'Moderate risk',
                             TotalScore < 7 & TotalScore >3 ~ 'High risk',
                             TotalScore >6 ~ 'Very high risk'))
  data <- data %>% 
    mutate(score = fct_relevel(score, c("Low risk", "Moderate risk", "High risk", "Very high risk")))

   risk3 <- data %>% 
    dplyr::select(Status14, score) %>% 
    mutate(score = as_factor(score)) %>% 
    group_by(score) %>% 
    summarise(number = n())
  
  risk4 <- data %>% 
    dplyr::select(score, Status14) %>% 
    mutate(score = as_factor(score)) %>% 
    group_by(score, Status14) %>% 
    summarise(number = n()) %>% 
    mutate(pct = number*100/sum(number)) %>% 
    filter(Status14 == "1") %>% dplyr::select(-number)

## `summarise()` has grouped output by 'score'. You can override using the
## `.groups` argument.

  risk3 <- risk3 %>% 
    left_join(risk4, by = "score")
    head(risk3)

The distribution of simplified risk score was presented in Figure 2. The score ranged from 0 to 10. There were no patients with scores of 11 or 12. The simplified risk score gradually increased from 0 to 4, then decreased dramatically from 4 to 10. The majority of patients reported having a 4 score, accounting for 25.7% of all participants. Following that, 18.1% and 18.5% of patients reported having a score of 2 or 3 respectively. The higher simplified risk score reported lowest patients, specifically 8 (0.8%), 9 (1.6%), and 10 (0.4 percent). Four risk categories of simplified risk score were defined including (1) low risk (score 0–1), (2) moderate risk (score 2–3), (3) high risk (score 4–6) and (4) very high risk (score 7–12). The simplified risk score category gradually increased from low to high, then decreased dramatically from high to very high risk. The mortality risk corresponding to a specific score is measured by the number of patients who died out of the total number of patients having that score. The disctribution of the mortality risk corresponding to a specific score was presented in Figure 2. Figure 2.A showed that when the score increases from 0 to 10, the probability of 14-day mortality increases from 0 to 100%. Notably, all patients have risk score more than 7 died within 14 days of observation. The similar pattern was observed among risk categories, 14-day mortality probabilities of low risk, moderate risk, high risk and very high risk were graduated increased from 3.2% in low-risk group to 85% in very high-risk group. In conclusion, as the score rises from 0 to 10, the probability of 14-day mortality rises from 0% to 100%.

3.2 Vidualization the risk score

 ggplot(data = risk) + 
    geom_bar(aes(x = TotalScore, y = number), fill = "orange",
             stat = "identity", position="dodge") +
    geom_line(aes(x = TotalScore, y = pct)) + 
    geom_point(aes(x = TotalScore, y = pct)) + 
    theme_bw() +
    scale_y_continuous(
      
      # Features of the first axis
      name = "Number and Percentage patients with each score (n%)",
      
      # Add a second axis and specify its features
      sec.axis = sec_axis(~./1, name="Mortality risk (%)")
    )

## Warning: Removed 1 row(s) containing missing values (geom_path).

## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

## Warning: Removed 1 rows containing missing values (geom_point).

  ggplot(data = risk3) + 
    geom_bar(aes(x = score, y = number), fill = "orange",
             stat = "identity", position="dodge") +
    geom_line(aes(x = score, y = pct)) + 
    geom_point(aes(x = score, y = pct)) + 
    theme_bw() +
    scale_y_continuous(
      
      # Features of the first axis
      name = "Percentage patients with each category (n%)",
      
      # Add a second axis and specify its features
      sec.axis = sec_axis(~./1, name="Mortality risk (%)")
    )

## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

3.3 Model performance assessment

m_scr<-glm(Status14~TotalScore, family = binomial(), data = data)
tab_model(m_scr)

	Status 14
Predictors	Odds Ratios	CI	p
(Intercept)	0.02	0.01 – 0.04	<0.001
TotalScore	2.26	1.82 – 2.91	<0.001
Observations	249
R² Tjur	0.294

p_scr<-predict(m_scr, type = "response")
plotCalibration(data,which(colnames(data) == "Status14"),p_scr,groups = 10)

## $Table_HLtest
##                 total meanpred meanobs predicted observed
## [0.0159,0.0765)    31    0.032   0.032      1.00        1
## 0.0765             45    0.077   0.089      3.44        4
## 0.1579             46    0.158   0.109      7.26        5
## 0.2977             64    0.298   0.328     19.06       21
## 0.4895             29    0.490   0.517     14.20       15
## 0.6844             14    0.684   0.643      9.58        9
## [0.8307,0.9827]    20    0.873   0.850     17.46       17
## 
## $Chi_square
## [1] 1.509
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0.9926

roc_scr<-roc(data$Status14,p_scr, plot=TRUE, xlim = c(1,0))

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

lines.roc(data$Status14, preds, col="red")

## Setting levels: control = 0, case = 1
## Setting direction: controls < cases

ci.auc(roc_scr)

## 95% CI: 0.7637-0.8759 (DeLong)

roc.test(roc1,roc_scr)

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc1 and roc_scr
## Z = 0.72104, p-value = 0.4709
## alternative hypothesis: true difference in AUC is not equal to 0
## 95 percent confidence interval:
##  -0.01085367  0.02348707
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.8261535   0.8198368

Figure 3 showed the similar predictive accuracy of the the risk score model compared to the final model above. Compared with the original model, the simplified score yielded a similar predictive performance (AUC 0.82, 95% CI 0.76–0.88). This was confirmed by the DeLong’s test for two correlated ROC curves with p-value = 0.4709 > 0.05. The calibration plot in Figure 3.B showed that the spots on the right represents the higher predicted probability of mortality and a somewhat higher observed mortality in the same patients on the y-axis. This indicates that the score model was slight overestimated in those patients in the higher estimated disease probability range. However, differences between predicted and observed probabilities was not significant, which was presented in the goodness-of-fit test. The p-value is above alpha = 0.05 (accepted H0), so the null hypothesis that the observed and predicted risks are the same. It means that this model still well predicted the outcome.

A work by Khanh Huyen - 30 October 2022

khanhhuyenhnhuph@gmail.com

Assignment 3

Khanh Huyen - 30 October 2022 -