Risk-Adjusted Mortality Calculations

Author

Zabdi Hernandez

Risk-Adjusted Mortality Calculations

Dataset Design

To produce an accurate risk-adjusted mortality probability for each encounter, we consider only Inpatient-classified encounters; the dataset being loaded already contains only encounters labeled as Inpatient. Dropping the columns that are not needed leaves the following variables of interest:

Mortality Flag (signals an inpatient death)
Age
ICU Flag (patient assigned to ICU)
Ventilation Flag (patient utilized mechanical ventilation)
DRG or Service Line
Admission Type
CCI (Charlson Comorbidity Index)

The initial data structure can be seen below:

Rows: 24506 Columns: 74
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (41): encounter_id, patient_id, sex, race, ethnicity, facility, encounte...
dbl (23): age, zip3, admission_n, days_to_readmit, in_hospital_death_flag, i...
lgl (10): ed_provider_first_contact_ts, ed_decision_to_admit_ts, ed_bed_assi...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 6 × 9
  encounter_id in_hospital_death_flag   age icu_stay_flag mechanical_ventilati…¹
  <chr>                         <dbl> <dbl> <fct>         <fct>                 
1 E0011828                          0    21 0             0                     
2 E0017629                          0    21 0             0                     
3 E0050859                          0    21 0             0                     
4 E0074829                          0    54 0             0                     
5 E0006401                          0    69 0             0                     
6 E0016505                          0    69 0             0                     
# ℹ abbreviated name: ¹mechanical_ventilation_flag
# ℹ 4 more variables: ms_drg_code <fct>, service_line <fct>,
#   admission_type <fct>, cci <dbl>
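
The column-selection code itself is not shown in this report. A minimal base-R sketch of that step, run on a toy data frame rather than the real file, might look like the following. The toy values are made up, and treating the flag, DRG, service line, and admission type columns as factors is inferred from the column types in the tibble printout above.

```r
# Toy stand-in for the loaded inpatient file (illustrative values only).
raw <- data.frame(
  encounter_id = c("E1", "E2", "E3"),
  sex = c("F", "M", "F"),                  # extra column to be dropped
  facility = c("A", "A", "B"),             # extra column to be dropped
  in_hospital_death_flag = c(0, 1, 0),
  age = c(21, 83, 69),
  icu_stay_flag = c(0, 1, 0),
  mechanical_ventilation_flag = c(0, 1, 0),
  ms_drg_code = c("D001", "D002", "D001"), # dummy codes, not real MS-DRGs
  service_line = c("SL1", "SL2", "SL1"),
  admission_type = c("Elective", "Emergency", "Urgent"),
  cci = c(0, 6, 3),
  stringsAsFactors = FALSE
)

# Keep the identifier plus the variables of interest.
keep <- c("encounter_id", "in_hospital_death_flag", "age", "icu_stay_flag",
          "mechanical_ventilation_flag", "ms_drg_code", "service_line",
          "admission_type", "cci")
inpatient_toy <- raw[, keep]

# Treat flags and grouping variables as categorical predictors.
factor_cols <- c("icu_stay_flag", "mechanical_ventilation_flag",
                 "ms_drg_code", "service_line", "admission_type")
inpatient_toy[factor_cols] <- lapply(inpatient_toy[factor_cols], factor)

str(inpatient_toy)
```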

Mortality Model

We will develop a logistic regression with the binary variable Mortality Flag as the dependent/response variable.
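
Written out, a model of this kind relates the log-odds of in-hospital death to the predictors listed above. The notation below is illustrative and not taken from the original analysis: $p_i$ is the death probability for encounter $i$, $\gamma$ stands for the DRG (or service line) effect, and $\delta$ for the admission type effect.

```latex
\[
\log\frac{p_i}{1-p_i}
  = \beta_0
  + \beta_1\,\mathrm{age}_i
  + \beta_2\,\mathrm{ICU}_i
  + \beta_3\,\mathrm{vent}_i
  + \beta_4\,\mathrm{CCI}_i
  + \gamma_{\mathrm{group}(i)}
  + \delta_{\mathrm{adm}(i)},
\qquad
p_i = \Pr(\mathrm{death}_i = 1 \mid x_i)
\]
```

The predicted probability for an encounter is then recovered from the linear predictor $\eta_i$ on the right-hand side as $p_i = 1 / (1 + e^{-\eta_i})$.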

Question: At this point, it is not clear whether service line or DRG code produces the better model, so I will fit both and compare their performance.

## Set up the training data
## (no hold-out split: the full inpatient dataset is used for fitting)
inpatient_train <- inpatient

### Model using DRG code
mortality_train_drg <- glm(
  in_hospital_death_flag ~ age + icu_stay_flag + mechanical_ventilation_flag +
    ms_drg_code + admission_type + cci,
  data = inpatient_train,
  family = binomial
)

summary(mortality_train_drg)

### Model using service line
mortality_train_sl <- glm(
  in_hospital_death_flag ~ age + icu_stay_flag + mechanical_ventilation_flag +
    service_line + admission_type + cci,
  data = inpatient_train,
  family = binomial
)

summary(mortality_train_sl)

Both models have now been fitted, and the coefficient estimates for the predictors they share (age, ICU flag, ventilation flag, admission type, and CCI) appear similar.

Model Testing

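Although the service line model may be preferred because it is easier for hospital management to interpret, I need to assess how both models perform on three properties: multicollinearity (validity), discrimination, and prediction (calibration of the predicted probabilities). All checks below are run on the same data used to fit the models, so they describe in-sample performance.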

### DRG model checks
library(car)              # provides vif()
vif(mortality_train_drg)  # low multicollinearity
### Service line model checks
vif(mortality_train_sl)   # low multicollinearity
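
For intuition about what this check measures: for a single numeric predictor, the variance inflation factor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The base-R sketch below illustrates this on simulated data. It is not the computation `vif()` applies to the models above, which reports generalized VIFs when terms such as `ms_drg_code` have more than one degree of freedom. The helper `vif_one` and all variables are hypothetical.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)                  # independent of x1
x3 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1

# VIF of one predictor: regress it on the others and use 1 / (1 - R^2).
vif_one <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vif_independent <- vif_one(x2, data.frame(x1 = x1))
vif_collinear   <- vif_one(x3, data.frame(x1 = x1, x2 = x2))

vif_independent  # close to 1: no inflation
vif_collinear    # large: x3 is almost fully explained by x1
```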


### DRG discrimination
library(pROC)  # provides roc() and auc()
roc_drg <- roc(inpatient$in_hospital_death_flag, predict(mortality_train_drg, type = "response"))

Setting levels: control = 0, case = 1

Setting direction: controls < cases

### Service line discrimination
roc_sl <- roc(inpatient$in_hospital_death_flag, predict(mortality_train_sl, type = "response"))

Setting levels: control = 0, case = 1
Setting direction: controls < cases
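
The area under these ROC curves can be read as the probability that a randomly chosen death receives a higher predicted probability than a randomly chosen survivor. A small base-R sketch of that rank-based calculation on made-up values follows; the helper `auc_rank` is hypothetical and not part of pROC.

```r
# AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(y, p) {
  r  <- rank(p)          # average ranks, so ties count as half
  n1 <- sum(y == 1)      # number of cases (deaths)
  n0 <- sum(y == 0)      # number of controls (survivors)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

y <- c(0, 0, 1, 1)
p <- c(0.10, 0.40, 0.35, 0.80)

# Of the 4 case-control pairs, 3 are ranked correctly.
auc_rank(y, p)  # 0.75
```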

## DRG prediction: observed vs. predicted mortality by decile
inpatient$pred_drg <- predict(mortality_train_drg, newdata = inpatient, type = "response")
inpatient |>
  mutate(decile = ntile(pred_drg, 10)) |>
  group_by(decile) |>
  summarise(
    obs = mean(in_hospital_death_flag),
    pred = mean(pred_drg),
    n = n()
  )
### Decile table for visualizing the DRG predictions
calib_drg <- inpatient |>
  mutate(decile = ntile(pred_drg, 10)) |>
  group_by(decile) |>
  summarise(obs = mean(in_hospital_death_flag),
            pred_drg = mean(pred_drg))


## Service line prediction: observed vs. predicted mortality by decile
inpatient$pred_sl <- predict(mortality_train_sl, newdata = inpatient, type = "response")
inpatient |>
  mutate(decile = ntile(pred_sl, 10)) |>
  group_by(decile) |>
  summarise(
    obs = mean(in_hospital_death_flag),
    pred = mean(pred_sl),
    n = n()
  )
### Decile table for visualizing the service line predictions
calib_sl <- inpatient |>
  mutate(decile = ntile(pred_sl, 10)) |>
  group_by(decile) |>
  summarise(obs = mean(in_hospital_death_flag),
            pred_sl = mean(pred_sl))
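
The comments above refer to a visualization, but no plotting code appears in this report. One way to draw a decile calibration plot is sketched below in base R on simulated data. The helper `calibration_table` and all values are hypothetical, and its binning is similar in spirit to `ntile()` without being guaranteed to match it when group sizes are uneven.

```r
set.seed(42)
n     <- 2000
p_hat <- plogis(rnorm(n, mean = -3))    # simulated predicted probabilities
y     <- rbinom(n, size = 1, prob = p_hat)  # outcomes drawn from them

# Observed vs. mean predicted risk within rank-based bins.
calibration_table <- function(y, p, bins = 10) {
  bin <- ceiling(rank(p, ties.method = "first") * bins / length(p))
  data.frame(
    bin  = sort(unique(bin)),
    obs  = as.numeric(tapply(y, bin, mean)),
    pred = as.numeric(tapply(p, bin, mean)),
    n    = as.numeric(tapply(y, bin, length))
  )
}

calib <- calibration_table(y, p_hat)

# Points near the dashed diagonal indicate good calibration.
plot(calib$pred, calib$obs,
     xlab = "Mean predicted probability",
     ylab = "Observed mortality rate",
     main = "Calibration by decile (simulated data)")
abline(0, 1, lty = 2)
```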


### Formal comparison of models
# Comparing discrimination: essentially identical models
auc(roc_drg)
auc(roc_sl)

calib_drg$error <- abs(calib_drg$obs - calib_drg$pred_drg)
calib_sl$error  <- abs(calib_sl$obs  - calib_sl$pred_sl)

# Comparing calibration: mean absolute error across deciles
mean(calib_drg$error)  # performs slightly better
mean(calib_sl$error)


# Brier score (mean squared error of the predicted probabilities)
brier_drg <- mean((inpatient$in_hospital_death_flag - inpatient$pred_drg)^2)
brier_sl  <- mean((inpatient$in_hospital_death_flag - inpatient$pred_sl)^2)

brier_drg
brier_sl
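
When the outcome is rare, a raw Brier score can look small even for an uninformative model, so it helps to compare it against a null model that predicts the overall mortality rate for every encounter. The toy base-R illustration below uses made-up values and is not output from the models above.

```r
y <- c(0, 0, 0, 1)               # made-up outcomes
p <- c(0.10, 0.20, 0.10, 0.60)   # made-up predicted probabilities

brier_model <- mean((y - p)^2)           # 0.055
brier_null  <- mean((y - mean(y))^2)     # 0.1875: always predict the observed rate

# Scaled Brier score: 0 means no better than the null model, 1 is perfect.
brier_scaled <- 1 - brier_model / brier_null

brier_model
brier_null
brier_scaled
```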

Because the DRG and service line models performed similarly, with the DRG model showing slightly better calibration, I chose the DRG-based model to generate the final predictions.

Prediction Creation for Final Dataset

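## Predict on the full dataset (newdata keeps one prediction per row of inpatient)
inpatient$predicted_prob <- predict(mortality_train_drg, newdata = inpatient, type = "response")

## Keep only the columns needed for the join
inpatient <- inpatient |>
  select(encounter_id, predicted_prob)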

Predictions have been computed and will be joined to the Generalized SQL Table. Joining the new predictions onto the original database by encounter_id allows them to be used together as a single coherent table in SQL.
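
As a sketch of that hand-off, assuming a CSV export and a join on `encounter_id`, the step could be checked in base R before loading. The toy tables, their values, and the temporary file are hypothetical; the actual join is described above as happening in SQL.

```r
# Toy stand-ins for the prediction output and the original encounter table.
predictions <- data.frame(
  encounter_id   = c("E1", "E2", "E3"),
  predicted_prob = c(0.02, 0.15, 0.40)
)
encounters <- data.frame(
  encounter_id = c("E1", "E2", "E3", "E4"),
  facility     = c("A", "A", "B", "B")
)

# Export the predictions for loading into the database.
out_file <- tempfile(fileext = ".csv")
write.csv(predictions, out_file, row.names = FALSE)

# Equivalent of a SQL LEFT JOIN on encounter_id; unmatched encounters get NA.
joined <- merge(encounters, read.csv(out_file), by = "encounter_id", all.x = TRUE)

# The key should stay unique after the join.
stopifnot(!anyDuplicated(joined$encounter_id))
joined
```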