To produce an accurate risk-adjusted mortality probability for each encounter, we consider only encounters classified as Inpatient, so the dataset loaded here contains Inpatient encounters only. Dropping the columns we do not need leaves the following variables of interest:
Mortality Flag (signals an inpatient death)
Age
ICU Flag (patient assigned to ICU)
Ventilation Flag (patient utilized mechanical ventilation)
DRG or Service Line
Admission Type
CCI
The initial data structure can be seen below:
Rows: 24506 Columns: 74
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (41): encounter_id, patient_id, sex, race, ethnicity, facility, encounte...
dbl (23): age, zip3, admission_n, days_to_readmit, in_hospital_death_flag, i...
lgl (10): ed_provider_first_contact_ts, ed_decision_to_admit_ts, ed_bed_assi...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
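The reduction to the variables of interest listed above can be sketched as follows. This is a minimal illustration in base R on a made-up three-row table, not the report's actual loading code: `inpatient_toy`, `extra_column`, and all values are hypothetical, while the remaining column names are the ones used in the model formulas in this report.

```r
# Toy stand-in for the loaded inpatient data; all values are made up.
inpatient_toy <- data.frame(
  encounter_id = c("E1", "E2", "E3"),
  in_hospital_death_flag = c(0, 1, 0),
  age = c(54, 81, 67),
  icu_stay_flag = c(0, 1, 1),
  mechanical_ventilation_flag = c(0, 1, 0),
  ms_drg_code = c("291", "871", "470"),
  service_line = c("Cardiology", "Medicine", "Orthopedics"),
  admission_type = c("Emergency", "Emergency", "Elective"),
  cci = c(2, 6, 1),
  extra_column = c("a", "b", "c")  # hypothetical column that is not needed
)

# Identifier plus the variables of interest
model_vars <- c(
  "encounter_id", "in_hospital_death_flag", "age", "icu_stay_flag",
  "mechanical_ventilation_flag", "ms_drg_code", "service_line",
  "admission_type", "cci"
)

# Keep only those columns
inpatient_model <- inpatient_toy[, model_vars]
str(inpatient_model)
```

Keeping `encounter_id` alongside the modeling variables makes it possible to attach predictions back to individual encounters later.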
We will fit a logistic regression model with the binary variable Mortality Flag as the dependent (response) variable.
Question: At this point it is not clear whether service line or DRG code yields the better model, so I will fit one model with each and compare them.
## Setting up training
inpatient_train <- inpatient

### Model trained using DRG
mortality_train_drg <- glm(
  in_hospital_death_flag ~ age + icu_stay_flag + mechanical_ventilation_flag +
    ms_drg_code + admission_type + cci,
  data = inpatient_train,
  family = binomial
)
summary(mortality_train_drg)

### Model trained using service line
mortality_train_sl <- glm(
  in_hospital_death_flag ~ age + icu_stay_flag + mechanical_ventilation_flag +
    service_line + admission_type + cci,
  data = inpatient_train,
  family = binomial
)
summary(mortality_train_sl)
The two models have now been trained, and the predictors they share appear to have similar coefficient estimates.
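One way to make the coefficient comparison concrete is to line up the terms the two models share and view them as odds ratios, with AIC as a rough overall fit measure. The sketch below runs on simulated data only: `toy`, `group_a`, `group_b`, `death`, `fit_a`, and `fit_b` are hypothetical stand-ins, and the simulated effect sizes are arbitrary rather than taken from the real models.

```r
set.seed(42)
n <- 2000

# Simulated stand-in data; group_a and group_b play the role of two
# alternative grouping variables (for example DRG versus service line).
toy <- data.frame(
  age = round(runif(n, 18, 95)),
  icu_stay_flag = rbinom(n, 1, 0.2),
  group_a = factor(sample(c("A1", "A2", "A3"), n, replace = TRUE)),
  group_b = factor(sample(c("B1", "B2"), n, replace = TRUE))
)
toy$death <- rbinom(n, 1, plogis(-6 + 0.05 * toy$age + 1.2 * toy$icu_stay_flag))

fit_a <- glm(death ~ age + icu_stay_flag + group_a, data = toy, family = binomial)
fit_b <- glm(death ~ age + icu_stay_flag + group_b, data = toy, family = binomial)

# Terms present in both models
shared <- intersect(names(coef(fit_a)), names(coef(fit_b)))

# Side-by-side odds ratios for the shared terms
comparison <- data.frame(
  term = shared,
  odds_ratio_a = exp(coef(fit_a)[shared]),
  odds_ratio_b = exp(coef(fit_b)[shared]),
  row.names = NULL
)
print(comparison)

# Lower AIC suggests a better trade-off between fit and complexity
aic_values <- c(model_a = AIC(fit_a), model_b = AIC(fit_b))
print(aic_values)
```

Applied to the real models, the same pattern would use `mortality_train_drg` and `mortality_train_sl` in place of `fit_a` and `fit_b`.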
Model Testing
Although the service line model may be preferred because it is simpler for hospital management to interpret, I need to assess how both models perform on three criteria: multicollinearity (validity), discrimination, and prediction.
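As background for the multicollinearity check, the textbook variance inflation factor for a numeric predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The sketch below computes it by hand on simulated data; `toy`, `x1`, `x2`, `x3`, and `vif_manual` are hypothetical names. It is an illustration of the idea, not a reimplementation of the `vif()` call used in this report, whose source package is not shown here and which may report generalized VIFs for factor terms.

```r
set.seed(1)
n <- 500

# Simulated predictors: x2 is built to be strongly correlated with x1,
# x3 is independent of both.
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# VIF for each column: regress it on the remaining columns and
# convert the resulting R-squared into 1 / (1 - R^2).
vif_manual <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(toy)
print(round(vifs, 2))
```

In this toy setup `x1` and `x2` get large values because each is well explained by the other, while `x3` stays close to 1.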
### DRG model checks
vif(mortality_train_drg)  # low multicollinearity

### Service line model checks
vif(mortality_train_sl)  # low multicollinearity

### DRG discrimination
roc_drg <- roc(
  inpatient$in_hospital_death_flag,
  predict(mortality_train_drg, type = "response")
)
Setting levels: control = 0, case = 1
Setting direction: controls < cases
### Service line discrimination
roc_sl <- roc(
  inpatient$in_hospital_death_flag,
  predict(mortality_train_sl, type = "response")
)
Setting levels: control = 0, case = 1
Setting direction: controls < cases
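Discrimination is usually summarized by the area under the ROC curve (AUC): the probability that a randomly chosen death receives a higher predicted probability than a randomly chosen survivor. For the ROC objects above, pROC's `auc()` returns this value. To show what the number means, here is a minimal base R sketch that computes AUC from ranks on a four-row toy example; `auc_rank`, `y_toy`, `p_a`, and `p_b` are hypothetical and the values are made up.

```r
# Rank-based (Mann-Whitney) AUC for a 0/1 outcome y and predicted scores p.
auc_rank <- function(y, p) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  r <- rank(p)  # average ranks, so tied predictions count as half
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy outcome and two competing sets of predicted probabilities
y_toy <- c(0, 0, 1, 1)
p_a <- c(0.10, 0.40, 0.35, 0.80)  # one death is ranked below one survivor
p_b <- c(0.10, 0.20, 0.70, 0.90)  # every death is ranked above every survivor

auc_a <- auc_rank(y_toy, p_a)  # 0.75
auc_b <- auc_rank(y_toy, p_b)  # 1
print(c(model_a = auc_a, model_b = auc_b))
```

Two models with near-identical AUC values discriminate about equally well, in which case the choice between them can rest on other criteria such as interpretability.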
Because the DRG and service line models performed similarly, I chose to generate predictions with the DRG-based model.
Prediction Creation for Final Dataset
inpatient$predicted_prob <- predict(mortality_train_drg, type = "response")

## Trim the table to the essentials
inpatient <- inpatient |>
  select(encounter_id, predicted_prob)
Predictions have been generated and will be joined to the Generalized SQL Table. Joining the new predictions onto the original database lets them be used as a single coherent file in SQL.
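The join itself happens in SQL, but the same logic can be checked in R first. The sketch below is a hedged illustration on made-up tables: `encounters_toy` and `predictions_toy` are hypothetical, and only `encounter_id`, `facility`, and `predicted_prob` are column names taken from this report. It performs a left join on `encounter_id` and counts encounters that received no prediction.

```r
# Toy stand-ins for the original encounter table and the trimmed predictions
encounters_toy <- data.frame(
  encounter_id = c("E1", "E2", "E3"),
  facility = c("North", "South", "North")
)
predictions_toy <- data.frame(
  encounter_id = c("E1", "E2"),
  predicted_prob = c(0.02, 0.31)
)

# Left join: keep every encounter, attach a prediction where one exists
joined <- merge(encounters_toy, predictions_toy, by = "encounter_id", all.x = TRUE)

# Encounters without a prediction show up as NA and are worth reviewing
n_missing <- sum(is.na(joined$predicted_prob))
print(joined)
print(n_missing)
```

A left join keeps encounters that were excluded from modeling, so a nonzero `n_missing` flags rows to investigate rather than rows that silently disappear.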