Project Significance
1. Frontline clinicians urgently need support to triage patients effectively
2. Healthcare resources are limited while the demand for care is increasing
Step 1: Install and Load the R packages
#install.packages("readxl")
#install.packages("caret")
#install.packages("pscl")
#install.packages("Hmisc")
library(readxl)
library(caret) #helps us to split our data into training and testing sets
## Loading required package: ggplot2
## Loading required package: lattice
library(pscl) #gets the pseudo R-square for logistic regression
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
library(Hmisc) #used to find the correlation and its p-values
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
Step 2: Import & summarize the data
hospital_df <- read_excel("Hospital Data.xlsx")
head(hospital_df)
## # A tibble: 6 × 21
## ICU_Admit ESI Age Sex SBP HR Temp RR Spo2 qSOFA BMI MI
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 3 27 Female 125 115 102. 18 99 0 18.9 0
## 2 0 2 44 Male 92 97 103. 20 97 1 29.6 0
## 3 0 2 39 Male 109 100 103. 44 97 1 35.9 0
## 4 0 2 46 Female 113 98 98.7 18 96 0 41.4 0
## 5 0 2 34 Female 109 101 99.4 24 98 1 22.6 0
## 6 0 3 69 Female 152 113 99.4 16 97 0 29.3 0
## # ℹ 9 more variables: CHF <dbl>, Stroke <dbl>, DM <dbl>, CKD <dbl>,
## # Cancer <dbl>, Asthma <dbl>, HTN <dbl>, LowIncome <dbl>, obese <dbl>
summary(hospital_df)
## ICU_Admit ESI Age Sex
## Min. :0.0000 Min. :1.000 Min. : 17.00 Length:1175
## 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.: 47.00 Class :character
## Median :0.0000 Median :2.000 Median : 60.00 Mode :character
## Mean :0.1047 Mean :2.205 Mean : 57.85
## 3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.: 70.00
## Max. :1.0000 Max. :4.000 Max. :103.00
## NA's :10
## SBP HR Temp RR
## Min. : 65 Min. : 12.00 Min. : 93.90 Min. : 10.00
## 1st Qu.:115 1st Qu.: 86.00 1st Qu.: 98.20 1st Qu.: 18.00
## Median :130 Median : 99.00 Median : 99.00 Median : 20.00
## Mean :132 Mean : 98.68 Mean : 99.34 Mean : 22.11
## 3rd Qu.:146 3rd Qu.:111.00 3rd Qu.:100.40 3rd Qu.: 24.00
## Max. :251 Max. :176.00 Max. :103.80 Max. :135.00
## NA's :1 NA's :7 NA's :1
## Spo2 qSOFA BMI MI
## Min. : 2.00 Min. :0.0000 Min. :13.68 Min. :0.00000
## 1st Qu.: 94.00 1st Qu.:0.0000 1st Qu.:25.84 1st Qu.:0.00000
## Median : 96.00 Median :0.0000 Median :31.01 Median :0.00000
## Mean : 95.17 Mean :0.5685 Mean :32.27 Mean :0.07745
## 3rd Qu.: 98.00 3rd Qu.:1.0000 3rd Qu.:36.81 3rd Qu.:0.00000
## Max. :100.00 Max. :3.0000 Max. :78.38 Max. :1.00000
## NA's :2 NA's :2
## CHF Stroke DM CKD
## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.00000 Median :0.0000 Median :0.0000
## Mean :0.1753 Mean :0.08766 Mean :0.4085 Mean :0.2477
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.0000
##
## Cancer Asthma HTN LowIncome
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000
## Median :0.00000 Median :0.0000 Median :1.0000 Median :1.0000
## Mean :0.08255 Mean :0.1072 Mean :0.6153 Mean :0.8817
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
##
## obese
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.5434
## 3rd Qu.:1.0000
## Max. :1.0000
## NA's :1
#Note: Make sure your Excel file and R Markdown file are in the same folder
Data Description: A description of some of the features is presented in the table below.
Variable | Definition
------------- | -------------
1. ICU_Admit | whether the patient was admitted to the ICU (1) or not (0)
2. ESI | Emergency Severity Index (1 to 4 in this dataset)
3. SBP | systolic blood pressure
4. HR | heart rate
5. Temp | body temperature
Step 3: Data visualization
# Visualizing the target variable (i.e., ICU Admit) using a column chart
counts <- table(hospital_df$ICU_Admit)
barplot(counts)

# Interpretation: Based on the column chart, the majority of patients were not admitted to the ICU (about 90%, since the mean of ICU_Admit is 0.1047)
# Meaning of the target variable's values
# 0: Do not admit
# 1: Admit
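The imbalance seen in the chart can also be expressed as proportions. The following is a minimal sketch, not the project's code: `icu_admit` is a hypothetical stand-in rebuilt from the Step 2 summary (1,175 rows with a mean of 0.1047, which corresponds to 123 admissions), so the counts are an assumption rather than the real column.

```r
# Hypothetical stand-in for hospital_df$ICU_Admit, rebuilt from the Step 2
# summary (n = 1175, mean = 0.1047); this is not the real data.
icu_admit <- rep(c(0, 1), times = c(1052, 123))

counts <- table(icu_admit)        # absolute number of patients per class
shares <- prop.table(counts)      # share of patients per class

print(counts)
print(round(shares, 4))
```

Passing `names.arg = c("0: Do not admit", "1: Admit")` to `barplot()` would label the two bars directly on the chart.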
Step 4: Feature engineering - pre-processing the data
# We count the number of missing values
colSums(is.na(hospital_df))
## ICU_Admit ESI Age Sex SBP HR Temp RR
## 0 10 0 1 1 0 7 1
## Spo2 qSOFA BMI MI CHF Stroke DM CKD
## 2 0 2 0 0 0 0 0
## Cancer Asthma HTN LowIncome obese
## 0 0 0 0 1
# Interpretation: Eight of the variables have missing values.
# We use the na.omit function to drop all the rows with missing values
hosp_df <- na.omit(hospital_df)
summary(hosp_df)
## ICU_Admit ESI Age Sex
## Min. :0.0000 Min. :1.000 Min. : 17.00 Length:1152
## 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.: 47.00 Class :character
## Median :0.0000 Median :2.000 Median : 60.00 Mode :character
## Mean :0.1033 Mean :2.205 Mean : 57.86
## 3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.: 70.00
## Max. :1.0000 Max. :4.000 Max. :103.00
## SBP HR Temp RR
## Min. : 65.0 Min. : 12.00 Min. : 93.90 Min. : 10.00
## 1st Qu.:115.0 1st Qu.: 86.00 1st Qu.: 98.20 1st Qu.: 18.00
## Median :129.5 Median : 99.00 Median : 99.00 Median : 20.00
## Mean :132.0 Mean : 98.76 Mean : 99.35 Mean : 22.06
## 3rd Qu.:146.2 3rd Qu.:111.00 3rd Qu.:100.40 3rd Qu.: 24.00
## Max. :251.0 Max. :176.00 Max. :103.80 Max. :135.00
## Spo2 qSOFA BMI MI
## Min. : 2.00 Min. :0.0000 Min. :13.68 Min. :0.00000
## 1st Qu.: 94.00 1st Qu.:0.0000 1st Qu.:25.84 1st Qu.:0.00000
## Median : 96.00 Median :0.0000 Median :31.00 Median :0.00000
## Mean : 95.18 Mean :0.5668 Mean :32.25 Mean :0.07899
## 3rd Qu.: 98.00 3rd Qu.:1.0000 3rd Qu.:36.81 3rd Qu.:0.00000
## Max. :100.00 Max. :3.0000 Max. :78.38 Max. :1.00000
## CHF Stroke DM CKD
## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.00
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00
## Median :0.0000 Median :0.00000 Median :0.0000 Median :0.00
## Mean :0.1762 Mean :0.08681 Mean :0.4106 Mean :0.25
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:0.25
## Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.00
## Cancer Asthma HTN LowIncome
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000
## Median :0.00000 Median :0.0000 Median :1.0000 Median :1.0000
## Mean :0.08333 Mean :0.1094 Mean :0.6155 Mean :0.8819
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## obese
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.5417
## 3rd Qu.:1.0000
## Max. :1.0000
# Interpretation: The median age for patients in our dataset is 60 years
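In the output above, dropping incomplete rows reduces the data from 1,175 to 1,152 rows, so 23 rows (about 2%) are lost. The sketch below shows the same mechanics on a small, made-up data frame; the column values are hypothetical and only the functions (`is.na`, `na.omit`, `complete.cases`) match the step above.

```r
# Toy data frame (hypothetical values) with a few missing entries
toy <- data.frame(
  ESI  = c(2, NA, 3, 2, 1),
  Temp = c(98.6, 99.1, NA, 100.2, 98.9),
  RR   = c(18, 20, 22, 16, 24)
)

print(colSums(is.na(toy)))        # missing values per column

n_before  <- nrow(toy)
toy_clean <- na.omit(toy)         # keeps only rows without any missing value
n_after   <- nrow(toy_clean)

cat("Rows dropped:", n_before - n_after, "of", n_before, "\n")

# na.omit() keeps the same rows as filtering on complete.cases()
same_rows <- identical(rownames(toy_clean),
                       rownames(toy[complete.cases(toy), ]))
print(same_rows)
```

Reporting the number of dropped rows makes it easier to judge whether listwise deletion is acceptable or whether imputation should be considered.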
# Create dummy or indicator variables for the patient sex
hosp_df$Sex <- ifelse(hosp_df$Sex == 'Male', 1, 0)
head(hosp_df) #Outputs a snapshot of our new columns. Compare the "Sex" column with the one shown in Step 2 to see the difference.
## # A tibble: 6 × 21
## ICU_Admit ESI Age Sex SBP HR Temp RR Spo2 qSOFA BMI MI
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 3 27 0 125 115 102. 18 99 0 18.9 0
## 2 0 2 44 1 92 97 103. 20 97 1 29.6 0
## 3 0 2 39 1 109 100 103. 44 97 1 35.9 0
## 4 0 2 46 0 113 98 98.7 18 96 0 41.4 0
## 5 0 2 34 0 109 101 99.4 24 98 1 22.6 0
## 6 0 3 69 0 152 113 99.4 16 97 0 29.3 0
## # ℹ 9 more variables: CHF <dbl>, Stroke <dbl>, DM <dbl>, CKD <dbl>,
## # Cancer <dbl>, Asthma <dbl>, HTN <dbl>, LowIncome <dbl>, obese <dbl>
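The indicator coding used above can be checked on a small vector. This is an illustrative sketch with made-up labels, assuming the column only contains the strings "Male" and "Female" as in the Step 2 preview.

```r
# Hypothetical labels; the real column comes from the Excel file
sex <- c("Male", "Female", "Female", "Male")

# Inspect the distinct labels first: any label that is not exactly "Male"
# (a typo, different capitalisation) would be coded as 0 by the rule below.
print(table(sex, useNA = "ifany"))

sex_dummy <- ifelse(sex == "Male", 1, 0)   # Male -> 1, everything else -> 0
print(sex_dummy)
```

With this coding, a positive coefficient for `Sex` in a later model refers to male patients relative to female patients.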
Step 5: Feature selection - identifying the contributing variables using correlation
# Correlation analysis - shows the relationship between the target and independent variables
corr <- rcorr(as.matrix(hosp_df)) # We use the rcorr function for correlation analysis
corr #Outputs the correlation results
## ICU_Admit ESI Age Sex SBP HR Temp RR Spo2 qSOFA BMI
## ICU_Admit 1.00 -0.31 0.12 0.09 -0.02 0.06 -0.07 0.34 -0.32 0.36 -0.03
## ESI -0.31 1.00 -0.25 -0.09 -0.04 -0.13 0.00 -0.31 0.23 -0.40 0.04
## Age 0.12 -0.25 1.00 0.09 -0.03 -0.19 -0.11 0.16 -0.17 0.33 -0.26
## Sex 0.09 -0.09 0.09 1.00 -0.04 -0.07 -0.03 0.04 -0.05 0.07 -0.16
## SBP -0.02 -0.04 -0.03 -0.04 1.00 0.09 0.05 -0.01 0.04 -0.23 0.14
## HR 0.06 -0.13 -0.19 -0.07 0.09 1.00 0.22 0.13 -0.12 0.09 0.14
## Temp -0.07 0.00 -0.11 -0.03 0.05 0.22 1.00 0.00 -0.03 -0.07 0.19
## RR 0.34 -0.31 0.16 0.04 -0.01 0.13 0.00 1.00 -0.31 0.55 0.01
## Spo2 -0.32 0.23 -0.17 -0.05 0.04 -0.12 -0.03 -0.31 1.00 -0.27 -0.04
## qSOFA 0.36 -0.40 0.33 0.07 -0.23 0.09 -0.07 0.55 -0.27 1.00 -0.14
## BMI -0.03 0.04 -0.26 -0.16 0.14 0.14 0.19 0.01 -0.04 -0.14 1.00
## MI 0.11 -0.10 0.15 0.08 -0.01 -0.13 -0.07 0.01 0.02 0.09 -0.04
## CHF 0.03 -0.10 0.26 0.04 -0.01 -0.15 0.01 0.00 -0.05 0.12 0.02
## Stroke 0.11 -0.16 0.21 0.04 0.06 -0.04 0.00 0.07 -0.02 0.20 -0.14
## DM 0.15 -0.13 0.23 0.05 0.02 -0.02 0.01 0.05 -0.08 0.11 0.08
## CKD 0.10 -0.14 0.32 0.12 0.02 -0.12 -0.03 0.01 -0.08 0.10 -0.03
## Cancer 0.03 -0.08 0.17 0.08 -0.08 -0.03 -0.06 -0.01 -0.06 0.04 -0.11
## Asthma -0.06 0.01 -0.09 -0.13 -0.02 0.10 0.04 -0.02 -0.01 -0.05 0.18
## HTN 0.10 -0.18 0.48 0.04 0.10 -0.02 -0.03 0.07 -0.12 0.17 0.03
## LowIncome 0.02 -0.08 0.04 -0.04 0.00 -0.02 0.00 0.01 -0.01 0.01 0.08
## obese 0.00 0.05 -0.20 -0.17 0.07 0.15 0.13 0.01 -0.04 -0.13 0.74
## MI CHF Stroke DM CKD Cancer Asthma HTN LowIncome obese
## ICU_Admit 0.11 0.03 0.11 0.15 0.10 0.03 -0.06 0.10 0.02 0.00
## ESI -0.10 -0.10 -0.16 -0.13 -0.14 -0.08 0.01 -0.18 -0.08 0.05
## Age 0.15 0.26 0.21 0.23 0.32 0.17 -0.09 0.48 0.04 -0.20
## Sex 0.08 0.04 0.04 0.05 0.12 0.08 -0.13 0.04 -0.04 -0.17
## SBP -0.01 -0.01 0.06 0.02 0.02 -0.08 -0.02 0.10 0.00 0.07
## HR -0.13 -0.15 -0.04 -0.02 -0.12 -0.03 0.10 -0.02 -0.02 0.15
## Temp -0.07 0.01 0.00 0.01 -0.03 -0.06 0.04 -0.03 0.00 0.13
## RR 0.01 0.00 0.07 0.05 0.01 -0.01 -0.02 0.07 0.01 0.01
## Spo2 0.02 -0.05 -0.02 -0.08 -0.08 -0.06 -0.01 -0.12 -0.01 -0.04
## qSOFA 0.09 0.12 0.20 0.11 0.10 0.04 -0.05 0.17 0.01 -0.13
## BMI -0.04 0.02 -0.14 0.08 -0.03 -0.11 0.18 0.03 0.08 0.74
## MI 1.00 0.26 0.22 0.12 0.22 0.10 -0.02 0.15 0.06 0.00
## CHF 0.26 1.00 0.16 0.21 0.35 0.12 0.04 0.28 0.06 -0.03
## Stroke 0.22 0.16 1.00 0.12 0.18 0.01 -0.01 0.20 0.06 -0.14
## DM 0.12 0.21 0.12 1.00 0.30 0.10 0.00 0.36 0.11 0.05
## CKD 0.22 0.35 0.18 0.30 1.00 0.15 0.00 0.37 0.05 -0.04
## Cancer 0.10 0.12 0.01 0.10 0.15 1.00 0.01 0.12 -0.02 -0.10
## Asthma -0.02 0.04 -0.01 0.00 0.00 0.01 1.00 0.05 0.05 0.14
## HTN 0.15 0.28 0.20 0.36 0.37 0.12 0.05 1.00 0.09 0.03
## LowIncome 0.06 0.06 0.06 0.11 0.05 -0.02 0.05 0.09 1.00 0.07
## obese 0.00 -0.03 -0.14 0.05 -0.04 -0.10 0.14 0.03 0.07 1.00
##
## n= 1152
##
##
## P
## ICU_Admit ESI Age Sex SBP HR Temp RR Spo2
## ICU_Admit 0.0000 0.0000 0.0020 0.4564 0.0502 0.0161 0.0000 0.0000
## ESI 0.0000 0.0000 0.0016 0.1995 0.0000 0.9360 0.0000 0.0000
## Age 0.0000 0.0000 0.0024 0.2500 0.0000 0.0002 0.0000 0.0000
## Sex 0.0020 0.0016 0.0024 0.1693 0.0138 0.2836 0.1300 0.0717
## SBP 0.4564 0.1995 0.2500 0.1693 0.0038 0.1035 0.7642 0.1526
## HR 0.0502 0.0000 0.0000 0.0138 0.0038 0.0000 0.0000 0.0000
## Temp 0.0161 0.9360 0.0002 0.2836 0.1035 0.0000 0.9415 0.2548
## RR 0.0000 0.0000 0.0000 0.1300 0.7642 0.0000 0.9415 0.0000
## Spo2 0.0000 0.0000 0.0000 0.0717 0.1526 0.0000 0.2548 0.0000
## qSOFA 0.0000 0.0000 0.0000 0.0109 0.0000 0.0020 0.0270 0.0000 0.0000
## BMI 0.3904 0.1818 0.0000 0.0000 0.0000 0.0000 0.0000 0.7369 0.2343
## MI 0.0001 0.0012 0.0000 0.0058 0.6330 0.0000 0.0126 0.7453 0.4941
## CHF 0.3062 0.0012 0.0000 0.2181 0.7854 0.0000 0.7370 0.8740 0.1146
## Stroke 0.0002 0.0000 0.0000 0.1311 0.0310 0.1300 0.9121 0.0175 0.4142
## DM 0.0000 0.0000 0.0000 0.0678 0.5954 0.4831 0.6349 0.0697 0.0052
## CKD 0.0006 0.0000 0.0000 0.0000 0.5186 0.0000 0.3616 0.6653 0.0040
## Cancer 0.2806 0.0054 0.0000 0.0050 0.0100 0.2609 0.0277 0.7825 0.0411
## Asthma 0.0296 0.8420 0.0032 0.0000 0.4546 0.0009 0.1422 0.4299 0.7748
## HTN 0.0004 0.0000 0.0000 0.1423 0.0009 0.4165 0.2991 0.0223 0.0000
## LowIncome 0.5392 0.0087 0.1657 0.1624 0.8919 0.4223 0.9653 0.7010 0.6577
## obese 0.9291 0.0890 0.0000 0.0000 0.0195 0.0000 0.0000 0.6878 0.1858
## qSOFA BMI MI CHF Stroke DM CKD Cancer Asthma HTN
## ICU_Admit 0.0000 0.3904 0.0001 0.3062 0.0002 0.0000 0.0006 0.2806 0.0296 0.0004
## ESI 0.0000 0.1818 0.0012 0.0012 0.0000 0.0000 0.0000 0.0054 0.8420 0.0000
## Age 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0032 0.0000
## Sex 0.0109 0.0000 0.0058 0.2181 0.1311 0.0678 0.0000 0.0050 0.0000 0.1423
## SBP 0.0000 0.0000 0.6330 0.7854 0.0310 0.5954 0.5186 0.0100 0.4546 0.0009
## HR 0.0020 0.0000 0.0000 0.0000 0.1300 0.4831 0.0000 0.2609 0.0009 0.4165
## Temp 0.0270 0.0000 0.0126 0.7370 0.9121 0.6349 0.3616 0.0277 0.1422 0.2991
## RR 0.0000 0.7369 0.7453 0.8740 0.0175 0.0697 0.6653 0.7825 0.4299 0.0223
## Spo2 0.0000 0.2343 0.4941 0.1146 0.4142 0.0052 0.0040 0.0411 0.7748 0.0000
## qSOFA 0.0000 0.0015 0.0000 0.0000 0.0002 0.0011 0.1477 0.0725 0.0000
## BMI 0.0000 0.1375 0.5799 0.0000 0.0076 0.3161 0.0001 0.0000 0.2951
## MI 0.0015 0.1375 0.0000 0.0000 0.0000 0.0000 0.0009 0.4947 0.0000
## CHF 0.0000 0.5799 0.0000 0.0000 0.0000 0.0000 0.0000 0.2350 0.0000
## Stroke 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.8009 0.7535 0.0000
## DM 0.0002 0.0076 0.0000 0.0000 0.0000 0.0000 0.0007 0.8881 0.0000
## CKD 0.0011 0.3161 0.0000 0.0000 0.0000 0.0000 0.0000 0.9133 0.0000
## Cancer 0.1477 0.0001 0.0009 0.0000 0.8009 0.0007 0.0000 0.8645 0.0000
## Asthma 0.0725 0.0000 0.4947 0.2350 0.7535 0.8881 0.9133 0.8645 0.1011
## HTN 0.0000 0.2951 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.1011
## LowIncome 0.6894 0.0061 0.0519 0.0563 0.0598 0.0002 0.0918 0.5823 0.0858 0.0032
## obese 0.0000 0.0000 0.9491 0.3555 0.0000 0.0755 0.1333 0.0006 0.0000 0.2766
## LowIncome obese
## ICU_Admit 0.5392 0.9291
## ESI 0.0087 0.0890
## Age 0.1657 0.0000
## Sex 0.1624 0.0000
## SBP 0.8919 0.0195
## HR 0.4223 0.0000
## Temp 0.9653 0.0000
## RR 0.7010 0.6878
## Spo2 0.6577 0.1858
## qSOFA 0.6894 0.0000
## BMI 0.0061 0.0000
## MI 0.0519 0.9491
## CHF 0.0563 0.3555
## Stroke 0.0598 0.0000
## DM 0.0002 0.0755
## CKD 0.0918 0.1333
## Cancer 0.5823 0.0006
## Asthma 0.0858 0.0000
## HTN 0.0032 0.2766
## LowIncome 0.0203
## obese 0.0203
# Drop all the columns whose correlation with ICU_Admit is not significant (p-value > 0.05) and store the data in a new data frame
hosp_df2 <- subset(hosp_df, select = -c(SBP, HR, BMI, CHF, Cancer, LowIncome, obese))
hosp_df2
## # A tibble: 1,152 × 14
## ICU_Admit ESI Age Sex Temp RR Spo2 qSOFA MI Stroke DM CKD
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 3 27 0 102. 18 99 0 0 0 0 0
## 2 0 2 44 1 103. 20 97 1 0 0 0 0
## 3 0 2 39 1 103. 44 97 1 0 0 0 0
## 4 0 2 46 0 98.7 18 96 0 0 0 0 0
## 5 0 2 34 0 99.4 24 98 1 0 0 0 0
## 6 0 3 69 0 99.4 16 97 0 0 0 0 0
## 7 0 2 51 0 101. 18 100 0 0 0 0 0
## 8 0 2 59 1 98.5 16 95 1 0 0 0 0
## 9 0 3 34 1 98.8 18 100 0 0 0 0 0
## 10 0 2 78 1 103. 24 96 1 1 0 1 1
## # ℹ 1,142 more rows
## # ℹ 2 more variables: Asthma <dbl>, HTN <dbl>
# Interpretation: The data set is reduced from 21 columns to 14 columns (13 predictors plus the target ICU_Admit)
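The columns above were picked by reading the `ICU_Admit` column of the p-value matrix by eye. The same filter can be written programmatically. The sketch below is one possible way to do it and runs on simulated data, not the hospital data: the data frame `toy` and the columns `signal` and `noise` are hypothetical, and base R's `cor.test()` is swapped in for `rcorr()` so the example needs no extra packages (both test Pearson correlations by default).

```r
set.seed(42)
n <- 300

# Simulated data: 'signal' depends on the target, 'noise' is generated
# independently of it.
toy <- data.frame(target = rbinom(n, 1, 0.3))
toy$signal <- toy$target + rnorm(n, sd = 0.5)
toy$noise  <- rnorm(n)

predictors <- setdiff(names(toy), "target")

# p-value of the Pearson correlation between each predictor and the target
p_values <- sapply(predictors,
                   function(v) cor.test(toy[[v]], toy$target)$p.value)
print(round(p_values, 4))

# Keep the target plus every predictor with p <= 0.05
keep <- c("target", predictors[p_values <= 0.05])
toy_selected <- toy[, keep, drop = FALSE]
print(names(toy_selected))
```

An independent column such as `noise` will usually be filtered out, but at a 0.05 threshold roughly one in twenty such columns passes by chance, which is worth keeping in mind when many variables are screened this way.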
Step 6: Logistic Regression Model building
# Splitting the data into training and testing sets
set.seed(3456)
trainIndex <- createDataPartition(hosp_df2$ICU_Admit, p = .70, list = FALSE, times = 1)
Train <- hosp_df2[ trainIndex,] #We use 70% of the data to train the model
Test <- hosp_df2[-trainIndex,] #The remaining 30% is used to test the model's performance
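Because only about 10% of patients were admitted, it helps if both sets keep that admission rate. `createDataPartition()` samples within each class when the outcome is a factor; for a numeric outcome it groups by percentiles instead. The sketch below shows the idea of a class-stratified 70/30 split in base R only. It is an illustration on a made-up data frame (`toy`, `outcome` are hypothetical names), not a replacement for the call above.

```r
set.seed(3456)

# Hypothetical imbalanced outcome: 90 non-admissions and 10 admissions
toy <- data.frame(id = 1:100, outcome = rep(c(0, 1), times = c(90, 10)))

# Sample 70% of the row indices separately within each outcome class
rows_by_class <- split(seq_len(nrow(toy)), toy$outcome)
train_idx <- unlist(lapply(rows_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))

train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Class shares in each set
print(prop.table(table(train$outcome)))
print(prop.table(table(test$outcome)))
```

In this toy case the training set receives 63 + 7 rows and the test set 27 + 3 rows, so both retain the original 10% admission rate.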
# First model - all the significant variables after correlation analysis.
model <- glm(ICU_Admit ~ ESI + Age + Sex + Temp + RR + Spo2 + qSOFA + MI + Stroke + DM + CKD + Asthma + HTN, data = Train, family = binomial)
summary(model)
##
## Call:
## glm(formula = ICU_Admit ~ ESI + Age + Sex + Temp + RR + Spo2 +
## qSOFA + MI + Stroke + DM + CKD + Asthma + HTN, family = binomial,
## data = Train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 23.77633 9.27732 2.563 0.010382 *
## ESI -0.87551 0.29013 -3.018 0.002547 **
## Age -0.01519 0.01049 -1.448 0.147561
## Sex 0.11622 0.28752 0.404 0.686063
## Temp -0.18126 0.08610 -2.105 0.035260 *
## RR 0.04752 0.01494 3.182 0.001463 **
## Spo2 -0.08767 0.02351 -3.729 0.000192 ***
## qSOFA 0.68665 0.22189 3.095 0.001971 **
## MI 0.39902 0.43297 0.922 0.356753
## Stroke 0.42227 0.40106 1.053 0.292393
## DM 0.73881 0.30939 2.388 0.016943 *
## CKD 0.18152 0.31826 0.570 0.568435
## Asthma -0.68316 0.57291 -1.192 0.233095
## HTN 0.73034 0.41132 1.776 0.075802 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 508.23 on 806 degrees of freedom
## Residual deviance: 362.39 on 793 degrees of freedom
## AIC: 390.39
##
## Number of Fisher Scoring iterations: 6
# First model results: Age, Sex, myocardial infarction (MI), Stroke, chronic kidney disease (CKD), Asthma, and hypertension (HTN) are not statistically significant (p > 0.05).
# Second model - we drop all the insignificant variables and rebuild the model
model2 <- glm(ICU_Admit ~ ESI + Temp + RR + Spo2 + qSOFA + DM, data = Train, family = binomial)
summary(model2)
##
## Call:
## glm(formula = ICU_Admit ~ ESI + Temp + RR + Spo2 + qSOFA + DM,
## family = binomial, data = Train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 22.49099 8.74746 2.571 0.010136 *
## ESI -0.96206 0.28494 -3.376 0.000735 ***
## Temp -0.17718 0.08359 -2.120 0.034030 *
## RR 0.04398 0.01469 2.994 0.002755 **
## Spo2 -0.07904 0.02195 -3.601 0.000317 ***
## qSOFA 0.68941 0.20579 3.350 0.000808 ***
## DM 0.95479 0.28853 3.309 0.000936 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 508.23 on 806 degrees of freedom
## Residual deviance: 372.62 on 800 degrees of freedom
## AIC: 386.62
##
## Number of Fisher Scoring iterations: 6
# Interpretation: All the variables are significant (p < 0.05)
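The two models are nested, so the effect of dropping the seven variables can be checked with a likelihood-ratio (deviance) test and an AIC comparison. The sketch below only re-uses the numbers printed in the two `summary()` outputs above; it does not refit anything, and the variable names are chosen here for illustration.

```r
# Values copied from the two summary() outputs above
dev_full    <- 362.39; df_full    <- 793   # first model, 13 predictors
dev_reduced <- 372.62; df_reduced <- 800   # second model, 6 predictors

# Likelihood-ratio test of the reduced model against the full model
lr_stat <- dev_reduced - dev_full          # increase in deviance
lr_df   <- df_reduced - df_full            # number of dropped coefficients
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
cat("LR statistic:", lr_stat, "on", lr_df, "df; p-value:",
    round(p_value, 3), "\n")

# AIC comparison (lower is better); values as reported by summary()
aic <- c(full = 390.39, reduced = 386.62)
print(aic)
```

A p-value above 0.05 here indicates that removing the seven variables does not significantly worsen the fit, which agrees with the lower AIC of the second model. With the fitted objects available, `anova(model2, model, test = "Chisq")` performs the same comparison.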
Result Interpretation
1. Emergency severity index (ESI): each one-unit increase in ESI (for example, a severity index of 2 versus 1) changes the log odds of ICU admission by -0.96.
2. Respiratory rate (RR): for every one-unit increase in patient RR, the log odds of ICU admission (versus non-admission) increase by 0.04.
3. Percent oxygen (Spo2): for a one-unit increase in Spo2, the log odds of being admitted to the ICU decrease by 0.08.
4. Quick sepsis related organ failure assessment (qSOFA): qSOFA is entered as a numeric score (0 to 3 in this dataset), so each one-point increase in the score increases the log odds of ICU admission by 0.69.
5. Diabetes Mellitus (DM): a patient diagnosed with diabetes (1), versus no diabetes (0), has log odds of ICU admission that are higher by 0.95.
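Log odds are hard to read directly, so the coefficients are often exponentiated into odds ratios. The sketch below does this for the second model's coefficients as printed in its `summary()` output; the vector `log_odds` is typed in by hand for illustration rather than extracted from the fitted object.

```r
# Coefficients of the second model, copied from the summary() output above
log_odds <- c(ESI = -0.96206, Temp = -0.17718, RR = 0.04398,
              Spo2 = -0.07904, qSOFA = 0.68941, DM = 0.95479)

# Odds ratio: multiplicative change in the odds per one-unit increase
odds_ratios <- exp(log_odds)

# Same information expressed as a percent change in the odds
pct_change <- (odds_ratios - 1) * 100

print(round(data.frame(log_odds, odds_ratios, pct_change), 3))
```

An odds ratio below 1 (ESI, Temp, Spo2) means the odds of ICU admission fall as the variable increases, and a ratio above 1 (RR, qSOFA, DM) means they rise. With the fitted object, `exp(coef(model2))` gives the same odds ratios.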
Project Conclusion
1. The significant variables that impact patient admission to the ICU include the following: patient emergency severity index (ESI), patient's temperature, respiratory rate (RR), percent oxygen (Spo2), quick sepsis related organ failure assessment (qSOFA), and diabetes mellitus (DM).
2. The developed model has a McFadden pseudo R-square of 0.27, which is consistent with the deviances reported for the second model (1 - 372.62/508.23), and an accuracy of 89.8% on the test data.
3. The confusion matrix indicates that the logistic regression model identifies patients who do not need ICU care well but is weaker at identifying those who do. True Positives are few relative to False Negatives (False Negatives = 31), so the model is conservative in predicting ICU admission and misses some patients who actually need ICU care. The low number of False Positives indicates that when the model does predict ICU admission it is generally correct, but there is room for improvement in catching more of the true ICU admission cases.
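The two headline metrics can be reproduced in a few lines. The first part of the sketch below recovers the McFadden pseudo R-square from the deviances printed for the second model (for ungrouped 0/1 outcomes the deviance equals -2 times the log-likelihood). The second part shows how accuracy, sensitivity, and specificity follow from a confusion matrix; the `actual` and `prob` vectors are hypothetical and the 0.5 cutoff is an assumption, since the project's test-set predictions are not shown here.

```r
# McFadden pseudo R-square from the deviances reported for the second model
null_dev  <- 508.23
resid_dev <- 372.62
mcfadden  <- 1 - resid_dev / null_dev
cat("McFadden pseudo R-square:", round(mcfadden, 2), "\n")

# Hypothetical labels and predicted probabilities (not the project's test set)
actual <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 0)
prob   <- c(0.05, 0.10, 0.20, 0.02, 0.60, 0.15, 0.70, 0.30, 0.40, 0.08)
predicted <- ifelse(prob > 0.5, 1, 0)    # 0.5 cutoff is an assumption

# Fixing the factor levels guarantees a 2 x 2 table even if a class is absent
cm <- table(Predicted = factor(predicted, levels = c(0, 1)),
            Actual    = factor(actual,    levels = c(0, 1)))
print(cm)

TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]

accuracy    <- (TP + TN) / sum(cm)
sensitivity <- TP / (TP + FN)   # share of true admissions the model catches
specificity <- TN / (TN + FP)   # share of non-admissions correctly ruled out
cat("Accuracy:", accuracy, " Sensitivity:", sensitivity,
    " Specificity:", specificity, "\n")
```

With an imbalanced outcome, accuracy alone can look high even when sensitivity is low, which is the pattern described in point 3. Lowering the cutoff trades some False Positives for fewer False Negatives.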