The COVID-19 pandemic has challenged health systems all over the world. In Mexico, as of April 2022, the health authorities had reported 5,671,144 confirmed cases and 323,318 deaths (overall mortality rate of 5.7%). A total of 672,987 patients required hospitalization and 302,285 of them died, an in-hospital mortality rate of 44.9% (1). In-hospital mortality from COVID-19 varied widely between hospitals and institutions in Mexico. The Mexican Institute of Social Security (IMSS, after its initials in Spanish) carries the largest burden of public health care and has the highest mortality among patients hospitalized with COVID-19 (2). Prediction models for COVID-19 can help guide evidence-based clinical decision making, but because the mortality of hospitalized patients varies so widely by country and context, it is imperative to develop prognostic prediction models tailored to the local reality.
Previous studies have evaluated laboratory, radiological and clinical data to develop prognostic prediction models with limited success, and some advanced diagnostic tests may not be available in low- and middle-income countries (LMICs) (3). The prognostic models or scores developed in Mexico to date have used an epidemiologic data set of all patients with COVID-19 (outpatients and inpatients), have focused on predicting ICU admission, or, for hospitalized patients, have relied on integrating AI with chest tomography, which is not always available in public sector hospitals. Ours is the first study to specifically examine prognostic prediction models for in-hospital mortality among patients hospitalized in public sector hospitals in the state of Puebla, Mexico. Following the recommendations of the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement (4), we aimed to develop and validate a prognostic prediction model using sociodemographic, clinical and laboratory data that are easily obtained and are part of usual clinical management in Mexico. This work also highlights the importance of some sociodemographic variables for the outcome of patients hospitalized with COVID-19.
Once the relevant inputs for the model are identified, a rigorous statistical approach is needed to test the prognostic prediction model. In machine learning, decisions are made through models built from data (5). For binary outcomes, logistic regression has been used extensively; it can be regarded as the simplest prediction model for a categorical outcome. More complex models have emerged more recently. Random forests is an algorithm that extends classification and regression trees (CART). The CART algorithm is popular because it is simple to run and interpret, but it is unstable: small changes in the data may lead to large changes in the results. This problem can be mitigated by growing many trees from the same data set, which is what the random forests algorithm does (6). The goal is to improve prediction performance by building a number of individual trees that differ randomly from one another through bootstrapping (7).
This is a retrospective cohort study conducted at the Hospital de Especialidades de Puebla, IMSS (HEP), and the Hospital General Dr. Eduardo Vazquez Navarro (HGEVN), Mexico. The first site is a 315-bed multi-specialty hospital and a referral center for workers from the states of Puebla, Tlaxcala and Oaxaca. In March 2020 it was converted to a hybrid COVID-19 hospital. The second site is a ###-bed general hospital that serves the general population, mostly people below the poverty line; it was also converted to a COVID-19 hospital. Data for developing the model were collected from the charts of patients admitted to the HEP between April 3 and October 17, 2020, with a clinical diagnosis of COVID-19 and a positive SARS-CoV-2 RT-PCR nasopharyngeal swab. This set is referred to as the model-building data set. Patients transferred to other facilities were not included. To validate the model, data from an independent set of 100 patients from the IMSS hospital were collected prospectively during November 2020 (nonrandom data set). In addition, the model was tested with data from patients at the HGEVN, which belongs to the Secretaria de Salud del Estado de Puebla (external data set).
The data were recorded at admission, and vital status (alive or dead) was recorded at discharge. The data gathered included sociodemographic variables, baseline comorbidities, medications, vital signs on admission and initial laboratory tests (table 1S of the supplementary material (SM)). In addition, the Systematic COronary Risk Evaluation (SCORE) index, which estimates the 10-year risk of fatal cardiovascular disease, was calculated. This score is endorsed by the European Society of Cardiology for high-risk patients and was used because of the high prevalence of cardiovascular risk factors in the Mexican population. The score considers age (from 40 years), gender, systolic blood pressure, smoking and total cholesterol level (8,9). Socioeconomic level was classified as high, medium-high, medium-low or low using an ad hoc scale that profiled the household from the educational level, current job position and the company where the person works.
This study was approved as minimal-risk research by the institutional review boards of both the HEP (No. R-2020-785-126) and the HGEVN (###), and informed consent was waived. To preserve confidentiality, data were deidentified.
The statistical analysis was performed with R version 4.1.1 (10) and RStudio version 2021.09.0 (11). The logistic regression and random forests models were built using the caret package with the glm and rf methods, respectively. The tidyverse and ggplot2 packages were used for exploratory data analysis and descriptive statistics.
Sample size. The criteria proposed by Riley for developing a multivariable model were used (12). Using the pmsampsize package, a sample size of 512 patients was considered adequate for the model-building data set. The parameters were chosen as follows: a Nagelkerke R² of 0.4 was deemed appropriate because direct measures were used; the overall outcome proportion (mortality rate) was assumed to be 0.4; and 25 candidate predictor parameters were estimated for potential inclusion in the model.
Preprocessing. Once the database was curated, a bivariate analysis was performed using the χ², Mann-Whitney U or Student t test for categorical, ordinal or numerical variables, respectively. Features with p < 0.01 were considered candidate predictors. Variables with near-zero variance were excluded. Single imputation with the median was used to handle missing values.
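As an illustration of the imputation step, the following is a minimal Python sketch of single imputation with the median. The study itself was run in R; the function name `impute_median` and the potassium values are hypothetical and are not taken from the study data.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical potassium values (mEq/L) with two missing measurements.
potassium = [4.1, None, 3.8, 4.6, None, 4.0]
print(impute_median(potassium))
```

Each missing value receives the same constant, so the imputed column keeps its center but loses some spread, which matters most for variables with many missing cases.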
Resampling. Because the data used to build a prediction model are a random sample, the estimated parameters and the resulting performance measures are also random variables: if the experiment is repeated, the results will differ. According to the law of large numbers, the average of the results converges as the number of repetitions grows. Thus, one of the most important concepts in machine learning is the use of a resampling technique to stabilize the results. Bootstrapping was used as the resampling method.
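The idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the procedure used in the study: it resamples the hit/miss flags of a fixed classifier with replacement, whereas a full bootstrap validation would refit the model on every resample. The function name, the flags and the choice of 25 repetitions as a default are assumptions made for the example.

```python
import random
from statistics import mean, stdev


def bootstrap_accuracies(correct, repetitions=25, seed=1):
    """Resample hit/miss flags with replacement; return one accuracy per resample."""
    rng = random.Random(seed)
    n = len(correct)
    accuracies = []
    for _ in range(repetitions):
        # Draw n patients with replacement from the original n.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        accuracies.append(sum(resample) / n)
    return accuracies


# Hypothetical flags: 1 = prediction matched the outcome, 0 = it did not.
flags = [1] * 82 + [0] * 18
accs = bootstrap_accuracies(flags)
print(f"mean accuracy over {len(accs)} resamples: "
      f"{mean(accs):.3f} (SD {stdev(accs):.3f})")
```

Individual resamples give different accuracies, while their mean is a more stable summary; this is the stabilizing effect described above.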
Internal validation. The data set was partitioned into training and test sets; the training set comprised 85% of the data. During model building, accuracy was used as the performance metric.
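A partition of this kind can be sketched as follows in Python. The sketch assumes a split stratified by outcome that rounds the training count up within each class; whether the study's partition worked exactly this way is an assumption. The outcome counts (203 deaths and 438 survivors among 641 patients) are those of the model-building data set reported in the supplementary tables; the function name is hypothetical.

```python
import math
import random


def stratified_split(labels, train_fraction=0.85, seed=1):
    """Return (train_idx, test_idx), drawing train_fraction of each class for training."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Assumption: the training count is rounded up within each class.
        n_train = math.ceil(len(members) * train_fraction)
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


# 1 = dead, 0 = alive, using the outcome counts of the model-building data set.
labels = [1] * 203 + [0] * 438
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 546 95
```

Under these assumptions the test set holds 95 patients (30 deaths and 65 survivors), so the outcome proportion is nearly the same in both partitions.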
External validation. Two separate data sets were used for model validation. The nonrandom data set comprised patients from the same hospital (HEP) but from a later period: those admitted in November 2020. The external data set was taken from a hospital with a different population coverage (HGEVN). The performance of the models was evaluated with overall accuracy, sensitivity, specificity and the F1 score (the harmonic mean of positive predictive value and sensitivity). According to the TRIPOD statement, this design fulfills the requirements for both type 2b (validation set split by time) and type 3 (evaluation of the prediction model on separate data) analyses.
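For reference, these metrics can all be derived from the four cells of a confusion matrix, as in the Python sketch below. F1 is computed in its usual form, as the harmonic mean of positive predictive value and sensitivity. The counts in the example are illustrative: they were chosen for this sketch to add up to 100 patients and are not confusion-matrix counts reported by the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute binary-classification metrics from confusion-matrix counts.

    tp/fn: deaths predicted as dead/alive; tn/fp: survivors predicted as alive/dead.
    """
    sensitivity = tp / (tp + fn)  # recall for the "dead" class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Illustrative counts for a validation set of 100 patients (an assumption).
metrics = classification_metrics(tp=25, fn=12, tn=57, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because sensitivity and specificity are conditioned on the true outcome while the predictive values are conditioned on the prediction, the latter change with the mortality rate of the population being validated.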
After curating the databases, a total of 641 patients were included in the model-building data set, 100 in the nonrandom data set and 107 in the external data set. Table 1S of the SM lists the variables included in the model-building database, together with their coding and units. The baseline features of the patients used to build the models are shown in table @ref(tab:tb-blinefeat). It is worth noting that patients hospitalized because of COVID-19 tended to be overweight males above 50 years old.
The analyses of the categorical and numerical variables grouped by outcome are shown in tables 2S and 3S, respectively. A p value below 0.01 was used to select the predictors. Table @ref(tab:tb-varfinalmodel) summarizes the variables selected for building the prediction models. In both algorithms the resampling technique was the simple bootstrap with 25 repetitions.
The logistic regression model was trained without tuning parameters. The features of the final model are presented in table @ref(tab:tb-varglm).
The random forests algorithm was trained varying the number of randomly selected predictors and the number of trees. The values with the best accuracy were used in the final model (figures 1S and 2S of the supplementary material). The impact of the predictors in the final model is shown in table @ref(tab:tb-VarImpRF).
The performance of the logistic regression and random forests models is presented in table @ref(tab:tb-perfboth).

| Variable | Value | Missing cases |
|---|---|---|
| Somatic variables: | Median, (IQR), [range] | |
| Age (years) | 55, (44 - 64) [19 - 96] | 0 |
| Weight (kg) | 77, (68 - 86) [39 - 136] | 3 |
| Height (m) | 1.6, (1.6 - 1.7) [1.4 - 2] | 3 |
| BMI (kg/m²) | 28.8, (25.6 - 32) [17.3 - 49.8] | 3 |
| Sociodemographic variables: | No. (%) | |
| Gender | | 0 |
| Female | 241 (37.6) | |
| Male | 400 (62.4) | |
| Occupation | | 1 |
| Outdoor work | 40 (6.2) | |
| Health care worker | 97 (15.2) | |
| Unemployed | 112 (17.5) | |
| Office job | 113 (17.7) | |
| Work at home | 129 (20.2) | |
| Work in a public area | 149 (23.3) | |
| Schooling | | 34 |
| Illiteracy | 40 (6.6) | |
| Primary school | 137 (22.6) | |
| Junior high school | 120 (19.8) | |
| High school | 150 (24.7) | |
| College | 147 (24.2) | |
| Postgraduate studies | 13 (2.1) | |
| Socioeconomic level | | 0 |
| Low, Medium-low | 449 (70) | |
| Medium-high, High | 192 (30) | |
| Initial symptom | | 3 |
| Cough | 122 (19.1) | |
| Fever | 166 (26) | |
| Headache | 35 (5.5) | |
| Anosmia | 2 (0.3) | |
| Malaise | 143 (22.4) | |
| Muscular weakness | 3 (0.5) | |
| Diarrhea | 23 (3.6) | |
| Rash | 2 (0.3) | |
| Dyspnea | 80 (12.5) | |
| Chest pain | 14 (2.2) | |
| Incapability to move | 2 (0.3) | |
| Sore throat | 46 (7.2) | |

Variable | Missing cases | p^a |
|---|---|---|
Occupation | 1 | < 0.000001 |
Age | 0 | < 0.000001 |
Respiratory rate | 4 | < 0.000001 |
Oxygen saturation | 4 | < 0.000001 |
Blood urea nitrogen | 9 | < 0.000001 |
Serum creatinine | 9 | < 0.000001 |
Blood glucose | 9 | < 0.000001 |
White blood cells | 9 | < 0.000001 |
Lymphocytes | 9 | < 0.000001 |
Neutrophils | 9 | < 0.000001 |
Potassium | 23 | < 0.000001 |
pH | 53 | < 0.000001 |
Arterial oxygen partial pressure | 53 | < 0.000001 |
Arterial carbon dioxide partial pressure | 53 | < 0.000001 |
Serum lactate dehydrogenase | 135 | < 0.000001 |
Schooling | 34 | < 0.000001 |
SCORE | 7 | < 0.000001 |
Socioeconomic level | 0 | 0.00008 |
Diabetes mellitus | 0 | 0.00203 |
D-dimer | 29 | 0.00400 |
Gender | 0 | 0.00554 |
Past smoker | 0 | 0.00612 |
Arterial pressure, diastolic | 4 | 0.00700 |
Chloride | 23 | 0.00900 |
^a p value calculated with the Student t test for numeric variables and χ² for categorical variables. | ||

Predictor | Overall importance | Estimate | p value |
|---|---|---|---|
Arterial oxygen partial pressure | 100.00 | -0.020 | 0.001 |
Oxygen saturation | 95.76 | -0.047 | 0.001 |
Serum lactate dehydrogenase | 88.27 | 0.002 | 0.002 |
Arterial carbon dioxide partial pressure | 83.63 | 0.045 | 0.004 |
Blood urea nitrogen | 67.06 | 0.029 | 0.019 |
Arterial pressure, diastolic | 59.82 | -0.024 | 0.036 |
Respiratory rate | 58.65 | 0.054 | 0.039 |
Lymphocytes | 58.31 | -0.001 | 0.040 |
SCORE | 57.52 | 0.168 | 0.043 |
Socioeconomic level = Medium-high, High | 41.14 | -0.582 | 0.141 |
Age | 39.67 | 0.018 | 0.154 |
pH | 35.56 | -2.035 | 0.198 |
Occupation = Work at home | 35.21 | -0.677 | 0.202 |
Past smoker = Yes | 32.26 | 0.543 | 0.240 |
Schooling | 27.58 | -0.135 | 0.308 |
Chloride | 26.92 | -0.025 | 0.319 |
Neutrophils | 25.17 | 0.000 | 0.348 |
Serum creatinine | 23.73 | -0.112 | 0.374 |
Gender = Male | 23.44 | -0.298 | 0.379 |
White blood cells | 19.76 | 0.000 | 0.449 |
Diabetes mellitus = Yes | 17.52 | 0.239 | 0.496 |
Occupation = Outdoor work | 17.02 | 0.418 | 0.506 |
D-dimer | 15.37 | 0.000 | 0.543 |
Blood glucose | 14.69 | -0.001 | 0.558 |
Potassium | 13.54 | 0.115 | 0.584 |
Occupation = Work in public area | 6.49 | 0.146 | 0.756 |
Socioeconomic level = Medium-high, High | 0.85 | 0.063 | 0.903 |
Occupation = Office job | 0.00 | 0.045 | 0.925 |
Intercept | | 18.641 | 0.144 |

Variable | Overall^a |
|---|---|
Serum lactate dehydrogenase | 100.0 |
D-dimer | 94.3 |
Oxygen saturation | 90.2 |
Arterial oxygen partial pressure | 84.9 |
Neutrophils | 68.3 |
Blood urea nitrogen | 65.9 |
Arterial carbon dioxide partial pressure | 65.6 |
White blood cells | 62.3 |
pH | 56.6 |
Age | 49.9 |
Serum creatinine | 49.7 |
Lymphocytes | 48.2 |
Respiratory rate | 41.1 |
Arterial pressure, diastolic | 40.3 |
Potassium | 36.3 |
Blood glucose | 35.5 |
Schooling | 35.5 |
Chloride | 30.2 |
SCORE | 25.4 |
Sex = Male | 3.5 |
Socioeconomic level = Medium-high, High | 3.1 |
Occupation = Work in public area | 2.9 |
Occupation = Work at home | 2.5 |
Occupation = Unemployed | 1.0 |
Occupation = Office job | 0.6 |
Diabetes mellitus = Yes | 0.5 |
Occupation = Outdoor work | 0.2 |
Past smoker = Yes | 0.0 |
^a Numbers represent the relative overall impact of the variables on the prediction model. | |

| Metric | Internal data set | Nonrandom data set | External data set |
|---|---|---|---|
| Logistic Regression Model | | | |
| Accuracy^a | 0.83 (0.74 - 0.9) | 0.82 (0.73 - 0.89) | 0.68 (0.59 - 0.77) |
| Sensitivity | 0.667 | 0.676 | 0.697 |
| Specificity | 0.908 | 0.905 | 0.659 |
| Positive predictive value | 0.769 | 0.806 | 0.767 |
| Negative predictive value | 0.855 | 0.826 | 0.574 |
| Random Forests Model | | | |
| Accuracy^a | 0.85 (0.77 - 0.92) | 0.82 (0.73 - 0.89) | 0.62 (0.52 - 0.71) |
| Sensitivity | 0.633 | 0.595 | 0.667 |
| Specificity | 0.954 | 0.952 | 0.537 |
| Positive predictive value | 0.864 | 0.88 | 0.698 |
| Negative predictive value | 0.849 | 0.8 | 0.5 |

^a Values in parentheses represent the 95% confidence interval.

Code | Description | Units or coding |
|---|---|---|
sexo | Gender | 0 = Female, 1 = Male |
ocupacion | Occupation | 1 = Health care worker, 2 = Office job, 3 = Outdoor work, 4 = Work in public area, 5 = Work at home, 6 = Unemployed |
escolaridad | Schooling | 1 = Illiterate, 2 = Primary school, 3 = Junior high school, 4 = High school, 5 = College, 6 = Postgraduate studies |
nivsoc | Socioeconomic level | 0 = Low, Medium-low, 1 = Medium-high, High |
app_0 | Hypertension | 1 = Yes, 0 = No |
app_1 | Allergic rhinitis | 1 = Yes, 0 = No |
app_2 | Asthma | 1 = Yes, 0 = No |
app_3 | Conjunctivitis | 1 = Yes, 0 = No |
app_4 | Current smoker | 1 = Yes, 0 = No |
app_5 | Past smoker | 1 = Yes, 0 = No |
app_6 | Use of medication | 1 = Yes, 0 = No |
app_7 | Use of dietary supplements | 1 = Yes, 0 = No |
app_8 | Cardiovascular disease | 1 = Yes, 0 = No |
app_9 | Diabetes mellitus | 1 = Yes, 0 = No |
app_10 | Insulin resistance | 1 = Yes, 0 = No |
app_11 | COPD | 1 = Yes, 0 = No |
app_12 | Renal disease | 1 = Yes, 0 = No |
app_13 | Cancer | 1 = Yes, 0 = No |
app_14 | Lung disease other than COPD | 1 = Yes, 0 = No |
app_15 | AIDS | 1 = Yes, 0 = No |
app_16 | Autoimmune disease | 1 = Yes, 0 = No |
app_17 | Cerebrovascular disease | 1 = Yes, 0 = No |
app_18 | Overweight | 1 = Yes, 0 = No |
app_19 | Obesity | 1 = Yes, 0 = No |
edad | Age | Years old |
peso | Weight | kg |
talla | Height | m |
imc | Body mass index | kg/m² |
temp | Temperature | °C |
fc | Heart rate | Beats per minute |
fr | Respiratory rate | Breaths per minute |
tas | Arterial pressure, systolic | mm Hg |
tad | Arterial pressure, diastolic | mm Hg |
score | SCORE | Score points (0-16) |
ing_disnea | Shortness of breath at the moment of hospitalization | 1 = Yes, 0 = No |
sintoma1 | Prevalent symptom | 1 = Cough, 2 = Fever, 3 = Headache, 4 = Anosmia, 5 = Malaise, 6 = Dizziness, 7 = Muscular weakness, 8 = Diarrhea, 9 = Ageusia, 10 = Rash, 11 = Dyspnea, 12 = Chest pain, 13 = Incapability to move, 14 = Sore throat |
sato2sin | Oxygen saturation | Percentage of saturated hemoglobin |
urea | Blood urea | mg/dL |
bun | Blood urea nitrogen | mg/dL |
creat | Serum creatinine | mg/dL |
colesterol | Serum cholesterol | mg/dL |
gluc | Blood glucose | mg/dL |
hb | Hemoglobin concentration | g/dL |
leucos | White blood cells | number of cells/µL |
plaq | Platelets | number of cells/µL |
linfos | Lymphocytes | number of cells/µL |
monos | Monocytes | number of cells/µL |
eos | Eosinophils | number of cells/µL |
basof | Basophils | number of cells/µL |
neutros | Neutrophils | number of cells/µL |
k | Potassium | mEq/L |
na | Sodium | mEq/L |
cl | Chloride | mEq/L |
ca | Calcium | mEq/L |
ph | pH | Units of pH |
pao2 | Arterial oxygen partial pressure | mm Hg |
paco2 | Arterial carbon dioxide partial pressure | mm Hg |
hco3 | Arterial bicarbonate | mEq/L |
dhl | Serum lactate dehydrogenase | IU/L |
alat | Serum Alanine aminotransferase | IU/L |
aat | Serum Aspartate aminotransferase | IU/L |
dimd | D-dimer | µg/mL |
diasretraso | Hospitalization delay | Days |
motivoegre | Vital status at the moment of discharge | 0 = Alive, 1 = Dead |

| Variable | Value | Total | Dead^a | Alive^a | p^b |
|---|---|---|---|---|---|
| Gender | Female | 241 | 60 (24.9) | 181 (75.1) | 0.006 |
| | Male | 400 | 143 (35.8) | 257 (64.2) | |
| Occupation | Health care worker | 97 | 16 (16.5) | 81 (83.5) | < 0.001 |
| | Office job | 113 | 22 (19.5) | 91 (80.5) | |
| | Outdoor work | 40 | 20 (50) | 20 (50) | |
| | Work in public area | 149 | 49 (32.9) | 100 (67.1) | |
| | Work at home | 129 | 40 (31) | 89 (69) | |
| | Unemployed | 112 | 56 (50) | 56 (50) | |
| Socioeconomic level | Low or medium-low | 449 | 164 (36.5) | 285 (63.5) | < 0.001 |
| | Medium-high or high | 192 | 39 (20.3) | 153 (79.7) | |
| Hypertension | No | 418 | 121 (28.9) | 297 (71.1) | 0.052 |
| | Yes | 223 | 82 (36.8) | 141 (63.2) | |
| Allergic rhinitis | No | 635 | 202 (31.8) | 433 (68.2) | 0.724 |
| | Yes | 6 | 1 (16.7) | 5 (83.3) | |
| Insulin resistance | No | 634 | 198 (31.2) | 436 (68.8) | 0.062 |
| | Yes | 7 | 5 (71.4) | 2 (28.6) | |
| COPD | No | 636 | 200 (31.4) | 436 (68.6) | 0.376 |
| | Yes | 5 | 3 (60) | 2 (40) | |
| Renal disease | No | 602 | 185 (30.7) | 417 (69.3) | 0.067 |
| | Yes | 39 | 18 (46.2) | 21 (53.8) | |
| Cancer | No | 622 | 194 (31.2) | 428 (68.8) | 0.214 |
| | Yes | 19 | 9 (47.4) | 10 (52.6) | |
| Lung disease other than COPD | No | 636 | 202 (31.8) | 434 (68.2) | 0.936 |
| | Yes | 5 | 1 (20) | 4 (80) | |
| Autoimmune disease | No | 619 | 192 (31) | 427 (69) | 0.099 |
| | Yes | 22 | 11 (50) | 11 (50) | |
| Cerebrovascular disease | No | 629 | 199 (31.6) | 430 (68.4) | 1 |
| | Yes | 12 | 4 (33.3) | 8 (66.7) | |
| Overweight | No | 393 | 116 (29.5) | 277 (70.5) | 0.165 |
| | Yes | 248 | 87 (35.1) | 161 (64.9) | |
| Obesity | No | 385 | 123 (31.9) | 262 (68.1) | 0.921 |
| | Yes | 256 | 80 (31.2) | 176 (68.8) | |
| Asthma | No | 634 | 201 (31.7) | 433 (68.3) | 1 |
| | Yes | 7 | 2 (28.6) | 5 (71.4) | |
| Current smoker | No | 573 | 177 (30.9) | 396 (69.1) | 0.274 |
| | Yes | 68 | 26 (38.2) | 42 (61.8) | |
| Past smoker | No | 596 | 180 (30.2) | 416 (69.8) | 0.006 |
| | Yes | 45 | 23 (51.1) | 22 (48.9) | |
| Use of medication | No | 449 | 135 (30.1) | 314 (69.9) | 0.215 |
| | Yes | 192 | 68 (35.4) | 124 (64.6) | |
| Use of dietary supplements | No | 633 | 200 (31.6) | 433 (68.4) | 1 |
| | Yes | 8 | 3 (37.5) | 5 (62.5) | |
| Cardiovascular disease | No | 622 | 195 (31.4) | 427 (68.6) | 0.458 |
| | Yes | 19 | 8 (42.1) | 11 (57.9) | |
| Diabetes mellitus | No | 509 | 146 (28.7) | 363 (71.3) | 0.002 |
| | Yes | 132 | 57 (43.2) | 75 (56.8) | |
| Shortness of breath | No | 49 | 8 (16.3) | 41 (83.7) | 0.025 |
| | Yes | 592 | 195 (32.9) | 397 (67.1) | |

^a Values in parentheses represent the percentage for the corresponding category.
^b p value calculated with χ².

| Variable | Valid cases | Status | Mean ± SD | p^a |
|---|---|---|---|---|
| D-dimer | 425 | Alive | 841.7 ± 2271.01 | 0.004 |
| | 187 | Dead | 2297.73 ± 6738.31 | 0.004 |
| Arterial pressure, diastolic | 437 | Alive | 76.17 ± 10.81 | 0.007 |
| | 200 | Dead | 73.34 ± 12.81 | 0.007 |
| Chloride | 424 | Alive | 103.74 ± 5.46 | 0.009 |
| | 194 | Dead | 102.47 ± 5.67 | 0.009 |
| Serum aspartate aminotransferase | 408 | Alive | 48.48 ± 37.92 | 0.01 |
| | 185 | Dead | 73.33 ± 127.89 | 0.01 |
| Sodium | 424 | Alive | 136.67 ± 4.21 | 0.02 |
| | 194 | Dead | 135.62 ± 5.54 | 0.02 |
| Heart rate | 437 | Alive | 97.69 ± 17.57 | 0.032 |
| | 200 | Dead | 101.21 ± 19.89 | 0.032 |
| Blood urea | 436 | Alive | 29 ± 24.87 | 0.032 |
| | 196 | Dead | 66.57 ± 243.04 | 0.032 |
| Monocytes | 435 | Alive | 558.24 ± 520.18 | 0.058 |
| | 196 | Dead | 646.1 ± 544.48 | 0.058 |
| Basophils | 435 | Alive | 47.15 ± 96.2 | 0.068 |
| | 196 | Dead | 65.49 ± 124.59 | 0.068 |
| Days of delay | 438 | Alive | 8.13 ± 5.01 | 0.131 |
| | 203 | Dead | 7.51 ± 4.76 | 0.131 |
| Hemoglobin concentration | 436 | Alive | 14.4 ± 1.89 | 0.151 |
| | 196 | Dead | 14.14 ± 2.23 | 0.151 |
| Height | 438 | Alive | 1.64 ± 0.1 | 0.312 |
| | 200 | Dead | 1.63 ± 0.09 | 0.312 |
| Arterial pressure, systolic | 437 | Alive | 123.8 ± 19.36 | 0.316 |
| | 199 | Dead | 125.86 ± 25.8 | 0.316 |
| Body mass index | 438 | Alive | 29.13 ± 5.13 | 0.323 |
| | 200 | Dead | 29.57 ± 5.22 | 0.323 |
| Arterial bicarbonate | 395 | Alive | 18.01 ± 3.84 | 0.422 |
| | 193 | Dead | 17.68 ± 5.05 | 0.422 |
| Serum alanine aminotransferase | 406 | Alive | 52.04 ± 40.41 | 0.523 |
| | 179 | Dead | 56.06 ± 79.66 | 0.523 |
| Calcium | 33 | Alive | 7.25 ± 2.41 | 0.572 |
| | 12 | Dead | 7.51 ± 0.52 | 0.572 |
| Serum cholesterol | 436 | Alive | 141.87 ± 52.71 | 0.593 |
| | 201 | Dead | 139.68 ± 45.73 | 0.593 |
| Weight | 438 | Alive | 78 ± 15.48 | 0.662 |
| | 200 | Dead | 78.6 ± 16.26 | 0.662 |
| Platelets | 436 | Alive | 273077.75 ± 111349.37 | 0.685 |
| | 196 | Dead | 268900.51 ± 123348.71 | 0.685 |
| Temperature | 437 | Alive | 37.06 ± 0.82 | 0.774 |
| | 200 | Dead | 37.09 ± 0.93 | 0.774 |
| Blood urea nitrogen | 436 | Alive | 13.72 ± 12.03 | < 0.001 |
| | 196 | Dead | 23 ± 19.35 | < 0.001 |
| Serum creatinine | 436 | Alive | 1.01 ± 1.1 | < 0.001 |
| | 196 | Dead | 1.57 ± 1.87 | < 0.001 |
| Serum lactate dehydrogenase | 355 | Alive | 371.35 ± 200.29 | < 0.001 |
| | 151 | Dead | 565.43 ± 250.71 | < 0.001 |
| Age | 438 | Alive | 51.55 ± 13.87 | < 0.001 |
| | 203 | Dead | 60.31 ± 13.17 | < 0.001 |
| Schooling | 411 | Alive | 3.64 ± 1.25 | < 0.001 |
| | 196 | Dead | 3.02 ± 1.34 | < 0.001 |
| Respiratory rate | 437 | Alive | 24.46 ± 4.47 | < 0.001 |
| | 200 | Dead | 26.48 ± 5.42 | < 0.001 |
| Blood glucose | 436 | Alive | 142.09 ± 82.07 | < 0.001 |
| | 196 | Dead | 173.9 ± 101.57 | < 0.001 |
| Potassium | 424 | Alive | 4.01 ± 0.55 | < 0.001 |
| | 194 | Dead | 4.34 ± 0.84 | < 0.001 |
| White blood cells | 436 | Alive | 9088.81 ± 4586.03 | < 0.001 |
| | 196 | Dead | 12463.13 ± 6413.31 | < 0.001 |
| Lymphocytes | 436 | Alive | 1021.38 ± 510.79 | < 0.001 |
| | 196 | Dead | 836.01 ± 463.22 | < 0.001 |
| Neutrophils | 436 | Alive | 7439.26 ± 4360.39 | < 0.001 |
| | 196 | Dead | 10593.56 ± 5988.25 | < 0.001 |
| Arterial carbon dioxide partial pressure | 395 | Alive | 27.28 ± 7.33 | < 0.001 |
| | 193 | Dead | 32.24 ± 15.14 | < 0.001 |
| Arterial oxygen partial pressure | 395 | Alive | 74.37 ± 29.15 | < 0.001 |
| | 193 | Dead | 58.2 ± 22.73 | < 0.001 |
| pH | 395 | Alive | 7.43 ± 0.07 | < 0.001 |
| | 193 | Dead | 7.36 ± 0.15 | < 0.001 |
| Oxygen saturation | 437 | Alive | 89.64 ± 6.4 | < 0.001 |
| | 200 | Dead | 83.01 ± 12.11 | < 0.001 |
| Severity score | 435 | Alive | 1.24 ± 1.75 | < 0.001 |
| | 199 | Dead | 2.56 ± 2.9 | < 0.001 |

^a p value calculated with the Student t test, except for schooling, number of comorbidities and severity score, for which the Mann-Whitney U test was used.

Figure 1S. Random forests model. Effect of the number of randomly selected predictors on accuracy.
Figure 2S. Random forests model. Effect of the number of trees on the stabilization of the out-of-bag error.