Variable analysis COVID-19 Imaging database

In this document I take a quick look at the variables we decided to collect for the remaining patients, focusing on value counts and missing values. This lets us reevaluate our list of variables to collect again, see the weaknesses in the data collected so far, and avoid the same mistakes in future data collection.

I think an important thing to consider is that even if we collect an additional 300 patients and end up with ~400, once the models include the imaging and demographic variables plus important cardiovascular variables such as HTN and smoking, there will not be much room to add other variables before we start overfitting. Also, for every missing value in any variable in the model, the whole observation is dropped from the model. Therefore I believe that, in general, we will not be using many variables beyond the imaging, demographic, and typical cardiovascular ones. Tracy, please correct me if I am getting something wrong.
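To make the point about dropped observations concrete, here is a minimal Python sketch of complete-case (listwise) deletion. The five records and the helper `complete_cases` are invented for illustration and are not study data; the only thing the sketch is meant to show is that every variable added to a model can only shrink the usable sample.

```python
# Toy illustration of complete-case (listwise) deletion.
# These records are invented for the example; they are not study data.
records = [
    {"age": 64, "htn": "Yes", "smoking": "No"},
    {"age": 71, "htn": None, "smoking": "Yes"},
    {"age": 58, "htn": "No", "smoking": None},
    {"age": 49, "htn": "No", "smoking": "No"},
    {"age": 80, "htn": None, "smoking": None},
]


def complete_cases(rows, variables):
    """Keep only rows with no missing value (None) in any listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


# Each added variable can only shrink the usable sample.
n_age = len(complete_cases(records, ["age"]))
n_age_htn = len(complete_cases(records, ["age", "htn"]))
n_all = len(complete_cases(records, ["age", "htn", "smoking"]))
print(n_age, n_age_htn, n_all)
```

In this toy example the usable sample goes from 5 rows to 3 and then to 2 as HTN and smoking are added, even though no single variable is missing in more than 2 rows.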

Link to the Variables to choose spreadsheet: https://docs.google.com/spreadsheets/d/1vj9DYLBjopVMnk5B_sInAX56oxSK1bmCzrAMzrzkAsQ/edit#gid=587039776

Level of care at time of admission and Highest level of care during hospital stay

Here I combined level_of_care_at_time_of_admission with highest_level_of_care_during_hospital_stay for each patient with " - " between them:

##   [1] "Med/surg - Med/surg"           "Step-down/PCU - Step-down/PCU"
##   [3] "Step-down/PCU - Step-down/PCU" "Med/surg - Med/surg"          
##   [5] "Med/surg - Med/surg"           "Step-down/PCU - Step-down/PCU"
##   [7] "Med/surg - ICU"                "NA - NA"                      
##   [9] "ICU - ICU"                     "Med/surg - Med/surg"          
##  [11] "ICU - ICU"                     "Med/surg - Step-down/PCU"     
##  [13] "Med/surg - Med/surg"           "Step-down/PCU - ICU"          
##  [15] "Step-down/PCU - ICU"           "Step-down/PCU - Med/surg"     
##  [17] "Step-down/PCU - Step-down/PCU" "Med/surg - ICU"               
##  [19] "Med/surg - Med/surg"           "ICU - ICU"                    
##  [21] "Med/surg - Med/surg"           "ICU - ICU"                    
##  [23] "Med/surg - Step-down/PCU"      "Step-down/PCU - Step-down/PCU"
##  [25] "Med/surg - Med/surg"           "ICU - ICU"                    
##  [27] "Step-down/PCU - Step-down/PCU" "Med/surg - Step-down/PCU"     
##  [29] "NA - NA"                       "NA - NA"                      
##  [31] "Step-down/PCU - ICU"           "Med/surg - Med/surg"          
##  [33] "NA - NA"                       "Med/surg - Step-down/PCU"     
##  [35] "NA - NA"                       "Med/surg - Med/surg"          
##  [37] "Med/surg - Med/surg"           "NA - NA"                      
##  [39] "NA - NA"                       "NA - NA"                      
##  [41] "NA - NA"                       "NA - NA"                      
##  [43] "NA - NA"                       "NA - NA"                      
##  [45] "NA - NA"                       "NA - NA"                      
##  [47] "NA - NA"                       "NA - NA"                      
##  [49] "NA - NA"                       "NA - NA"                      
##  [51] "NA - NA"                       "NA - NA"                      
##  [53] "NA - NA"                       "NA - NA"                      
##  [55] "NA - NA"                       "NA - NA"                      
##  [57] "NA - NA"                       "NA - NA"                      
##  [59] "NA - NA"                       "NA - NA"                      
##  [61] "NA - NA"                       "NA - NA"                      
##  [63] "NA - NA"                       "NA - NA"                      
##  [65] "NA - NA"                       "NA - NA"                      
##  [67] "NA - NA"                       "NA - NA"                      
##  [69] "Step-down/PCU - NA"            "NA - NA"                      
##  [71] "NA - NA"                       "NA - NA"                      
##  [73] "NA - NA"                       "NA - NA"                      
##  [75] "NA - NA"                       "NA - NA"                      
##  [77] "NA - NA"                       "ICU - ICU"                    
##  [79] "NA - NA"                       "Step-down/PCU - Step-down/PCU"
##  [81] "Med/surg - Med/surg"           "NA - NA"                      
##  [83] "NA - NA"                       "ICU - ICU"                    
##  [85] "Med/surg - Med/surg"           "Med/surg - Med/surg"          
##  [87] "Med/surg - Med/surg"           "Med/surg - Med/surg"          
##  [89] "NA - NA"                       "NA - NA"                      
##  [91] "NA - NA"                       "Med/surg - Med/surg"          
##  [93] "Med/surg - Med/surg"           "Med/surg - Med/surg"          
##  [95] "Step-down/PCU - Step-down/PCU" "NA - NA"                      
##  [97] "NA - NA"                       "Med/surg - Med/surg"          
##  [99] "Step-down/PCU - Step-down/PCU" "Med/surg - Med/surg"          
## [101] "Med/surg - Med/surg"           "NA - NA"                      
## [103] "NA - NA"                       "NA - NA"                      
## [105] "Med/surg - Med/surg"           "NA - NA"                      
## [107] "NA - NA"                       "NA - NA"                      
## [109] "NA - NA"                       "NA - NA"                      
## [111] "NA - NA"                       "NA - NA"                      
## [113] "NA - NA"                       "NA - NA"                      
## [115] "NA - NA"                       "NA - NA"                      
## [117] "NA - NA"                       "NA - NA"                      
## [119] "NA - NA"                       "NA - NA"                      
## [121] "NA - NA"                       "NA - NA"                      
## [123] "NA - NA"                       "NA - NA"                      
## [125] "NA - NA"                       "NA - NA"                      
## [127] "NA - NA"                       "Med/surg - ICU"

I would suggest keeping just one of them, as they are mostly the same, with a few exceptions. At our data granularity, I don’t think these small differences will have much influence on outcomes. Given so many missing values, we could also consider removing both.
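As a rough way to quantify "mostly the same", the pairs can be tallied by whether the two levels agree. The Python sketch below uses only the first 12 patients from the listing above, with `None` standing in for NA; the function `classify` and its category labels are my own naming, not part of the dataset.

```python
from collections import Counter

# First 12 patients from the listing above, as (admission, highest) pairs.
# None stands in for NA.
pairs = [
    ("Med/surg", "Med/surg"),
    ("Step-down/PCU", "Step-down/PCU"),
    ("Step-down/PCU", "Step-down/PCU"),
    ("Med/surg", "Med/surg"),
    ("Med/surg", "Med/surg"),
    ("Step-down/PCU", "Step-down/PCU"),
    ("Med/surg", "ICU"),
    (None, None),
    ("ICU", "ICU"),
    ("Med/surg", "Med/surg"),
    ("ICU", "ICU"),
    ("Med/surg", "Step-down/PCU"),
]


def classify(admission, highest):
    """Label a pair by missingness and by whether the two levels agree."""
    if admission is None and highest is None:
        return "both missing"
    if admission is None or highest is None:
        return "one missing"
    return "same" if admission == highest else "different"


summary = Counter(classify(a, h) for a, h in pairs)
for label, count in summary.most_common():
    print(f"{label}: {count}")
```

Running the same tally on all 128 patients would give the exact share of patients whose level of care changed, which is the number to look at before deciding which of the two variables to keep.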

Admission diagnosis

primary_admission_diagnosis	n
COVID-19 - confirmed	17
COVID-19 - rule-out	33
Other	11
NA	67

I don’t think that distinguishing rule-out vs. confirmed COVID-19 would make much difference to our outcomes, so I would recommend dropping this variable.

Other chronic lung disease

other_chronic_lung_disease	n
No	55
Yes	1
NA	72

I would suggest dropping this variable: only 1 patient has it, and it is very non-specific.

OSA and Asthma

obstructive_sleep_apnea	n
No	53
Yes	5
NA	70

asthma	n
No	46
Yes	12
NA	70

Both are very important predictors of COVID severity. I would suggest keeping asthma as there is a solid number of patients with it (12), but I would suggest removing OSA as only 5 patients have it.

Baseline EF

baseline_ejection_fraction_prior_to_presentation	n
28	1
30	1
51	1
53	1
61	1
65	1
70	1
75	1
77	1
82	1
NA	118

I would drop this important variable because it has so many missing values (118 of 128).

Important baseline characteristics with a lot of missing values

We can see in the Variables to choose spreadsheet that several of our very important cardiac baseline variables have many missing values (perc_missing).

For instance, the CAD variable for each patient looks like this:

##   [1] "No"  NA    NA    NA    "No"  "No"  NA    NA    NA    "No"  NA    NA   
##  [13] "No"  NA    NA    NA    "No"  "No"  NA    NA    NA    NA    "Yes" "Yes"
##  [25] "No"  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    "No" 
##  [37] "Yes" NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
##  [49] NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
##  [61] NA    NA    NA    NA    NA    NA    NA    NA    NA    "No"  NA    NA   
##  [73] NA    NA    NA    NA    NA    "No"  "Yes" "No"  "No"  "No"  "No"  "No" 
##  [85] "No"  "No"  "Yes" "No"  "No"  "No"  "No"  "No"  "Yes" "No"  "No"  "No" 
##  [97] "No"  "No"  "No"  "Yes" "No"  "No"  "No"  "No"  "No"  NA    NA    NA   
## [109] NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
## [121] NA    NA    NA    NA    NA    NA    NA    "Yes"

The pattern is similar across other important variables such as hypertension and smoking. We badly need these variables, and they need to be complete, because in modeling each missing value drops the whole observation (as I wrote above). I also believe that if so many values are missing, reviewers will question the quality of our data. We should discuss with the data collection team what problems they encountered in collecting these variables and how to ensure that the percentage missing in these crucial variables is < 5%. I suspect that some of the data collectors wrote “NA” when they did not find a history of the condition, so potentially those entries could be “0” (i.e., “No”) instead?
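To show how much the NA question matters, here is a small Python sketch comparing two readings of the same data, using only the first 24 CAD entries from the listing above (`None` stands in for NA). The second scenario, where NA is recoded to “No”, is purely an assumption that would need to be confirmed with the data collectors before we use it; the helper `prevalence` is hypothetical.

```python
# First 24 CAD entries from the listing above; None stands in for NA.
cad = [
    "No", None, None, None, "No", "No", None, None, None, "No", None, None,
    "No", None, None, None, "No", "No", None, None, None, None, "Yes", "Yes",
]


def prevalence(values):
    """Share of "Yes" among non-missing entries; None if nothing is observed."""
    observed = [v for v in values if v is not None]
    if not observed:
        return None
    return sum(v == "Yes" for v in observed) / len(observed)


# Scenario 1: treat NA as truly unknown (complete cases only).
observed_only = prevalence(cad)

# Scenario 2 (assumption, to be confirmed with the collectors): NA means "No".
recoded = ["No" if v is None else v for v in cad]
na_as_no = prevalence(recoded)

print(f"usable rows: {sum(v is not None for v in cad)} vs {len(recoded)}")
print(f"CAD prevalence: {observed_only:.1%} vs {na_as_no:.1%}")
```

For these 24 patients, recoding raises the usable rows from 9 to 24 but lowers the estimated CAD prevalence from about 22% to about 8%, so the choice changes both the sample size and the estimates and should not be made silently.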

Of the variables in the image above, the one I would consider dropping is cancer, given how few patients have it:

malignancy_active_treatment_or_within_last_year	n
No	58
Yes	3
NA	67

Other missing variables

Many values are missing in other important variables that seem very easy to extract from Epic, such as SBP:

SBP_n_missing	n_observatiions	perc_miss
66	128	51.5625

We should discuss with the data collection team any issues with obtaining these variables, to make sure they are collected for future patients.
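For reference, the perc_miss figure is simply 100 × n_missing / n_observations. The Python sketch below applies that arithmetic to the NA counts from the tables earlier in this document and compares each variable with the < 5% target; the dictionary layout and the function `percent_missing` are my own, and only the counts come from the tables.

```python
N_PATIENTS = 128   # total observations, as in the SBP table above
TARGET_PCT = 5.0   # the "< 5% missing" goal for crucial variables

# NA counts copied from the tables earlier in this document.
missing_counts = {
    "primary_admission_diagnosis": 67,
    "other_chronic_lung_disease": 72,
    "obstructive_sleep_apnea": 70,
    "asthma": 70,
    "baseline_ejection_fraction_prior_to_presentation": 118,
    "malignancy_active_treatment_or_within_last_year": 67,
    "SBP": 66,
}


def percent_missing(n_missing, n_total):
    """Percentage of observations with a missing value."""
    return 100.0 * n_missing / n_total


report = {name: percent_missing(n, N_PATIENTS) for name, n in missing_counts.items()}
over_target = sorted(name for name, pct in report.items() if pct >= TARGET_PCT)

# Print the variables from most to least missing.
for name, pct in sorted(report.items(), key=lambda item: -item[1]):
    flag = "over target" if pct >= TARGET_PCT else "ok"
    print(f"{name}: {pct:.1f}% missing ({flag})")
```

Every variable listed here is more than half missing, so all of them are far from the 5% target.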

Some of the outcomes

tracheostomy	n
Yes	1
NA	127

We might consider dropping this variable, given how few patients have it.

suspected_confirmed_myocarditis	n
No	55
Yes	2
NA	71

We might consider dropping this variable, given how few patients have it.

Readmissions

was_the_patient_re_admitted_after_the_index_hospitalization_including_to_outside_hospital_if_records_available	n
No	33
Yes	14
NA	81

We could consider updating this variable for the previous 125 patients after we pick a specific “end of follow-up” date. It could be useful as a potential outcome variable or as part of a composite outcome variable.

Please let me know your thoughts.

COVID-19 Variables

MT

3/18/2021
