Thieu Nguyen - May-03-2025

COVID-19 REPORTS 2020 - PYTHONΒΆ

Use the SyntheaMass data and rerun the code from this website to practice analyzing healthcare data using Python. I modified some code blocks to handle analysis involving several related tables in the healthcare database.

In this post, I learned how to:

  • Define specific groups of patients, such as those who contracted COVID-19, were hospitalized, admitted to the ICU, placed on ventilators, and more
  • Calculate age, age at death, time points of COVID-19 events, symptom timelines, and hospital stay lengths
  • Aggregate metrics such as counts, means, and sums of symptoms and COVID-19 outcomes
  • Explore lab values
  • Plot key metrics
  • Use Python functions to simplify code blocks in the report by calling external .py files such as analysis.py

Loading data and packagesΒΆ

InΒ [Β ]:
%pip install pandas numpy matplotlib seaborn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
import datetime
InΒ [101]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 1
# for loading analysis.py containing the analysis functions
%aimport analysis
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
InΒ [79]:
# Load the data
conditions = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/conditions.csv")
patients = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/patients.csv")
observations = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/observations.csv")
care_plans = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/careplans.csv")
encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")
devices = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/devices.csv")
supplies = pd.read_csv('/Users/nnthieu/SyntheaData/SyntheaCovid19/supplies.csv')
procedures = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/procedures.csv")
medications = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/medications.csv")

Firstly, create patient Ids for 'covid-19', 'isolation', 'admitted', 'icu_admitted', 'ventilated', 'recovered' and 'death' groups, then create related columns in the covid_patients table. To get that, I have to know CODEs, DESCRIPTIONs related to patient groups, for instance, conditions.CODE = '840539006' for covid-19.

Grab the IDs of patients that have been diagnosed with covid-19.

InΒ [3]:
covid_patient_ids = conditions[conditions.CODE == 840539006].PATIENT.unique()
inpatient_ids = encounters[(encounters.REASONCODE == 840539006) & (encounters.CODE == 1505002)].PATIENT.unique()
isolation_ids = care_plans[(care_plans.CODE == 736376001) & (care_plans.REASONCODE == 840539006)].PATIENT.unique()
icu_patient_ids = encounters[(encounters.REASONCODE == 840539006) & (encounters.CODE == 305351004)].PATIENT.unique()
vent_ids = procedures[procedures.CODE == 26763009].PATIENT.unique()
deceased_ids = patients[patients.DEATHDATE.notna()].Id.unique()
negative_covid_patient_ids = observations[(observations.CODE == '94531-1') & 
                                    (observations.VALUE == 'Not detected (qualifier value)')].PATIENT.unique()
completed_isolation_patients = care_plans[(care_plans.CODE == 736376001) & (care_plans.STOP.notna()) & 
                                          (care_plans.REASONCODE == 840539006)].PATIENT.unique()
survivor_ids = np.union1d(completed_isolation_patients, negative_covid_patient_ids)

# Filter patients who have COVID-19
covid_patients = patients[patients.Id.isin(covid_patient_ids)]  # Correct column name to 'Id'
len(covid_patient_ids)
Out[3]:
88166

The number of inpatient survivors

InΒ [80]:
np.intersect1d(inpatient_ids, survivor_ids).shape
Out[80]:
(14654,)

The number of inpatient non-survivors

InΒ [97]:
np.intersect1d(inpatient_ids, deceased_ids).shape[0]
Out[97]:
3548

Mortality rate of covid-19 patients

InΒ [81]:
(np.intersect1d(covid_patient_ids, deceased_ids).shape[0]/len(covid_patient_ids))* 100
Out[81]:
4.129709865481025
InΒ [4]:
import datetime
# 1. Find the earliest COVID-19 start date
covid_start_date = pd.to_datetime(conditions[conditions['DESCRIPTION'] == 'COVID-19']['START'].min())

# 2. Set the maximum allowed COVID-19 date (80 days later)
covid_max_date = covid_start_date + pd.Timedelta(days=80)

# Ensure the 'START' column is in datetime format
conditions['START'] = pd.to_datetime(conditions['START'], errors='coerce')
observations['DATE'] = pd.to_datetime(observations['DATE'], errors='coerce')

# 3. Create a boolean mask for rows within the COVID-19 date range and for COVID-19 diagnosis
covid_date_mask = (conditions['START'] >= covid_start_date) & (conditions['START'] <= covid_max_date)

# 4. Filter `conditions` and `observations` based on the date range
covid_cons = conditions[covid_date_mask]
covid_obs = observations[
    (observations['DATE'] >= covid_start_date) & (observations['DATE'] <= covid_max_date)
]

# 5. Display first few rows
print(covid_cons.head(3))
covid_cons.shape
       START        STOP                               PATIENT  \
2 2020-03-01  2020-03-30  1ff7f10f-a204-4bb1-aa72-dd763fa99482   
3 2020-03-01  2020-03-01  1ff7f10f-a204-4bb1-aa72-dd763fa99482   
4 2020-03-01  2020-03-30  1ff7f10f-a204-4bb1-aa72-dd763fa99482   

                              ENCOUNTER       CODE         DESCRIPTION  
2  60584c6a-e26b-4176-a660-86ad84c7ceca  386661006     Fever (finding)  
3  60584c6a-e26b-4176-a660-86ad84c7ceca  840544004  Suspected COVID-19  
4  60584c6a-e26b-4176-a660-86ad84c7ceca  840539006            COVID-19  
Out[4]:
(654569, 6)

Health outcomes experienced by COVID-19 patientsΒΆ

The following table shows different health outcomes that were experienced by COVID-19 patients during the course of the disease.

File analysis.py is written and saved in a local working folder.

write a function in analysis.py like import analysis analysis.outcome_table(inpatient_ids, survivor_ids, deceased_patients, covid_cons) to create a table as: outcome total percent of inpatient survivors percent survivors non survivors percent non survivors with outcome are values of covid_condition.DESCRIPTION containing (disorder)

InΒ [103]:
analysis.outcome_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons)
Out[103]:
outcome total percent of inpatient survivors percent survivors non survivors percent non survivors
1 Acute respiratory failure 8701 47.87 5237 5.99 3473 14.38
0 Sepsis caused by virus 6941 38.19 3419 3.91 3526 14.60
6 Acute respiratory distress syndrome 2400 13.20 85 0.10 2315 9.59
2 Septic shock 1746 9.61 0 0.00 1746 7.23
5 Heart failure 1431 7.87 122 0.14 1309 5.42
4 Injury of heart 1288 7.09 20 0.02 1268 5.25
3 Injury of kidney 1252 6.89 8 0.01 1244 5.15
InΒ [Β ]:
#import analysis 
def outcome_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons):
    # Filter conditions containing "(disorder)" in the DESCRIPTION
    disorder_conditions = covid_cons[
        (covid_cons['DESCRIPTION'].str.contains(r'\(disorder\)', na=False)) &
        (~covid_cons['DESCRIPTION'].str.contains(
            r'pneumonia|hypoxemia|chronic|Hypertriglyceridemia|Hyperlipidemia|Hypercholesterolemia|Metabolic|osteoporosis|Alzheimer|Anemia|sinusitis|thrombosis|embolism|pharyngitis|bronchitis', case=False, na=False))
        ]

    # Initialize the result table
    table_rows = []

    for outcome in disorder_conditions['DESCRIPTION'].unique():
        # Filter patients with the specific outcome
        outcome_patients = disorder_conditions[disorder_conditions['DESCRIPTION']
                                               == outcome]['PATIENT'].unique()

        # Calculate metrics
        total = len(outcome_patients)
        inpatient = len(set(outcome_patients) & set(inpatient_ids))
        survivors = len(set(outcome_patients) & set(survivor_ids))
        non_survivors = len(set(outcome_patients) & set(deceased_ids))

        percent_inpatient = (inpatient / len(inpatient_ids)) * \
            100 if len(inpatient_ids) > 0 else 0
        percent_survivors = (survivors / len(survivor_ids)) * \
            100 if len(survivor_ids) > 0 else 0
        percent_non_survivors = (
            non_survivors / len(deceased_ids)) * 100 if len(deceased_ids) > 0 else 0

        # Append row to the table
        table_rows.append({
            'outcome': outcome,
            'total': total,
            'percent of inpatient': round(percent_inpatient, 2),
            'survivors': survivors,
            'percent survivors': round(percent_survivors, 2),
            'non survivors': non_survivors,
            'percent non survivors': round(percent_non_survivors, 2)
        })

    # Convert to DataFrame
    outcome_table = pd.DataFrame(table_rows).head(7)
    outcome_table = outcome_table.sort_values(by='total', ascending=False)
    outcome_table['outcome'] = outcome_table['outcome'].str.replace(
    r'\(disorder\)', '', regex=True).str.strip()

    return outcome_table

outcome_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons)

Outcomes for all COVID-19 PatientsΒΆ

This code builds a new DataFrame for the purposes of display. The DataFrame contains the percentages of patients that experience a particular outcome. Percentages are then provided for only hospitalized patients, ICU admitted patients and ventilated patients.

InΒ [13]:
# Recode the boolean columns from True/False to text to read easily
hospitalized = (cp.admit == True)
icu = (cp.icu_admit == True)
vent = (cp.ventilated == True)
covid_count = cp.Id.size
row_filters = {'Home Isolation': (cp.isolation == True), 
               'Hospital Admission': hospitalized, 
               'ICU Admission': icu, 
               'Ventilated': vent, 
               'Recovered': (cp.recovered == True), 
               'Death': (cp.death == True)}

table_rows = []
for category, row_filter in row_filters.items():
    row = {'Outcome': category}
    row['All Patients'] = round(cp[row_filter].Id.size / covid_count, 2)
    row['Hospitalized'] = round(cp[row_filter & hospitalized].Id.size / hospitalized.value_counts()[True], 2)
    row['ICU Admitted'] = round(cp[row_filter & icu].Id.size / icu.value_counts()[True], 2)
    row['Ventilated'] = round(cp[row_filter & vent].Id.size / vent.value_counts()[True], 2)
    table_rows.append(row)
    
pd.DataFrame.from_records(table_rows)
Out[13]:
Outcome All Patients Hospitalized ICU Admitted Ventilated
0 Home Isolation 0.80 0.03 0.03 0.03
1 Hospital Admission 0.21 1.00 1.00 1.00
2 ICU Admission 0.04 0.20 1.00 1.00
3 Ventilated 0.03 0.16 0.79 1.00
4 Recovered 0.96 0.81 0.32 0.15
5 Death 0.04 0.20 0.68 0.85

Outcomes for ICU Admitted PatientsΒΆ

Essentially a sub table from above, looking only at ICU patients.

InΒ [14]:
icu_only = cp[cp.icu_admit == True]
vent = (icu_only.ventilated == True)
covid_count = icu_only.Id.size
row_filters = {'Ventilated': vent, 
               'Recovered': (icu_only.recovered == True), 
               'Death': (icu_only.death == True)}

table_rows = []
for category, row_filter in row_filters.items():
    row = {'Outcome': category}
    row['ICU Admitted'] = round(icu_only[row_filter].Id.size / covid_count, 2)
    row['Ventilated'] = round(icu_only[row_filter & vent].Id.size / vent.value_counts()[True], 2)
   
    table_rows.append(row)
    
pd.DataFrame.from_records(table_rows)
Out[14]:
Outcome ICU Admitted Ventilated
0 Ventilated 0.79 1.00
1 Recovered 0.32 0.15
2 Death 0.68 0.85

Start to build a DataFrame that we can use to look at other conditions in relation to COVID-19

InΒ [15]:
covid_info = cp[['PATIENT', 'recovered', 'death', 'START', 'DEATHDATE', 'BIRTHDATE', 'GENDER', 'admit', 'icu_admit']]
covid_info = covid_info.rename(columns={'START': 'covid_start'})
covid_info.head(2)
Out[15]:
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False
1 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False

Grab all of the conditions starting after January 20, 2020. This is a hack to get only conditions that are related to COVID-19. We will end up merging these with the COVID patients.

InΒ [16]:
covid_related_conditions = conditions[pd.to_datetime(conditions.START) > pd.to_datetime('2020-01-20')]

This DataFrame will contain all conditions for COVID-19 patients, where START can be compared to covid_start to see how long after the COVID-19 diagnosis something happened.

InΒ [17]:
covid_patient_conditions = covid_info.merge(covid_related_conditions, on='PATIENT')
covid_patient_conditions.head(2)
Out[17]:
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit START STOP ENCOUNTER CODE DESCRIPTION
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 2020-03-01 2020-03-30 60584c6a-e26b-4176-a660-86ad84c7ceca 386661006 Fever (finding)
1 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 2020-03-01 2020-03-01 60584c6a-e26b-4176-a660-86ad84c7ceca 840544004 Suspected COVID-19

Lab values for COVID-19 patientsΒΆ

The following code presents lab values taken for COVID-19 patients. Values are separated into survivors and non survivors. The first block of code selects lab values of interest from all observations in the simulation.

InΒ [82]:
lab_obs = observations[(observations.CODE == '48065-7') | (observations.CODE == '26881-3') | 
                          (observations.CODE == '2276-4') | (observations.CODE == '89579-7') |
                           (observations.CODE == '731-0') | (observations.CODE == '14804-9')
                      ]
lab_obs.head(3)
Out[82]:
DATE PATIENT ENCOUNTER CODE DESCRIPTION VALUE UNITS TYPE
254 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 731-0 Lymphocytes [#/volume] in Blood by Automated c... 1.1 10*3/uL numeric
273 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 48065-7 Fibrin D-dimer FEU [Mass/volume] in Platelet p... 0.4 ug/mL numeric
274 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 2276-4 Ferritin [Mass/volume] in Serum or Plasma 332.4 ug/L numeric

Select COVID-19 conditions out of all conditions in the simulation

InΒ [Β ]:
covid_conditions = conditions[conditions.CODE == 840539006]
covid_patients = covid_conditions.merge(patients, how='left', left_on='PATIENT', right_on='Id')
covid_patients['survivor'] = covid_patients.PATIENT.isin(survivor_ids)
covid_patients = covid_patients[['START', 'PATIENT', 'survivor', 'CODE']]

Calculate attributes needed to support the plot. Also coerce all lab values into a numeric data type.

InΒ [84]:
covid_patients_obs = covid_patients.merge(lab_obs, on='PATIENT')
covid_patients_obs['START'] = pd.to_datetime(covid_patients_obs.START)
covid_patients_obs['DATE'] = pd.to_datetime(covid_patients_obs.DATE)
covid_patients_obs['lab_days'] = covid_patients_obs.DATE - covid_patients_obs.START
covid_patients_obs['days'] = covid_patients_obs.lab_days / np.timedelta64(1, 'D')
covid_patients_obs['VALUE'] = pd.to_numeric(covid_patients_obs['VALUE'], errors='coerce')
InΒ [85]:
loinc_to_display = {'CODE_y = 48065-7': 'D-dimer', 'CODE_y = 2276-4': 'Serum Ferritin',
                    'CODE_y = 89579-7': 'High Sensitivity Cardiac Troponin I',
                    'CODE_y = 26881-3': 'IL-6', 'CODE_y = 731-0': 'Lymphocytes',
                    'CODE_y = 14804-9': 'Lactate dehydrogenase'}
catplt = sns.catplot(x="days", y="VALUE", hue="survivor", kind="box", col='CODE_y', 
            col_wrap=2, sharey=False, sharex=False, data=covid_patients_obs, palette=["C1", "C0"])

# Set the title for the entire figure
catplt.fig.suptitle("Labs Values of Covid-19 Patients", y=1.02)

for axis in catplt.fig.axes:
    axis.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
    axis.xaxis.set_major_locator(ticker.MultipleLocator(base=4))
    axis.set_title(loinc_to_display[axis.get_title()])
        
plt.show()
No description has been provided for this image

Line plots

InΒ [86]:
loinc_to_display = {'CODE_y = 48065-7': 'D-dimer', 'CODE_y = 2276-4': 'Serum Ferritin',
                    'CODE_y = 89579-7': 'High Sensitivity Cardiac Troponin I',
                    'CODE_y = 26881-3': 'IL-6', 'CODE_y = 731-0': 'Lymphocytes',
                    'CODE_y = 14804-9': 'Lactate dehydrogenase'}
g = sns.FacetGrid(covid_patients_obs, col="CODE_y", col_wrap=2, sharey=False, height=3, aspect=2)
g = g.map_dataframe(sns.lineplot, x="days", y="VALUE", marker=".", hue="survivor", palette=["C1", "C0"], alpha=0.5)
g.fig.suptitle("Labs Values of Covid-19 Patients", y=1.02)

for axis in g.axes.flat:
    axis.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
    axis.xaxis.set_major_locator(ticker.MultipleLocator(base=4))
    title = axis.get_title().replace("DESCRIPTION = ", "").strip()
    if title in loinc_to_display:
        axis.set_title(loinc_to_display[title])
    else:
        axis.set_title(title)  # Fallback to the original title if not found in the dictionary
No description has been provided for this image

Another way to create line plots

InΒ [87]:
loinc_to_display = {'CODE_y = 48065-7': 'D-dimer', 'CODE_y = 2276-4': 'Serum Ferritin',
                    'CODE_y = 89579-7': 'High Sensitivity Cardiac Troponin I',
                    'CODE_y = 26881-3': 'IL-6', 'CODE_y = 731-0': 'Lymphocytes',
                    'CODE_y = 14804-9': 'Lactate dehydrogenase'}
catplt = sns.catplot(x="days", y="VALUE", hue="survivor", kind="point", col='CODE_y', 
            col_wrap=2, sharey=False, sharex=False, data=covid_patients_obs, palette=["C1", "C0"])


# Set the title for the entire figure
catplt.fig.suptitle("Labs Values of Covid-19 Patients", y=1.02)

for axis in catplt.fig.axes:
    axis.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
    axis.xaxis.set_major_locator(ticker.MultipleLocator(base=4))
    axis.set_title(loinc_to_display[axis.title.get_text()])

plt.show()
No description has been provided for this image

Set up a new DataFrame with boolean columns representing various outcomes, like admit, recovery or death

First, create patient Ids for 'isolation', 'admitted', 'icu_admitted', 'ventilated', 'recovered' and 'death', then create related columns in the covid_patient_ids table.

InΒ [88]:
cp = covid_conditions.merge(patients, how='left', left_on='PATIENT', right_on='Id')
isolation_ids = care_plans[(care_plans.CODE == 736376001) & (care_plans.REASONCODE == 840539006)].PATIENT
InΒ [89]:
cp['isolation'] = cp.Id.isin(isolation_ids)
cp['admit'] = cp.Id.isin(inpatient_ids)
cp['recovered'] = cp.Id.isin(survivor_ids)
cp['death'] = cp.DEATHDATE.notna()
icu_ids = encounters[encounters.CODE == 305351004].PATIENT
cp['icu_admit'] = cp.Id.isin(icu_ids)
vent_ids = procedures[procedures.CODE == 26763009].PATIENT
cp['ventilated'] = cp.Id.isin(vent_ids)
cp.head(2)
Out[89]:
START STOP PATIENT ENCOUNTER CODE DESCRIPTION Id BIRTHDATE DEATHDATE SSN ... LAT LON HEALTHCARE_EXPENSES HEALTHCARE_COVERAGE isolation admit recovered death icu_admit ventilated
0 2020-03-01 2020-03-30 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 840539006 COVID-19 1ff7f10f-a204-4bb1-aa72-dd763fa99482 2017-08-24 NaN 999-68-6630 ... 42.151961 -72.598959 8446.49 1499.08 True False True False False False
1 2020-03-13 2020-04-14 9bcf6ed5-d808-44af-98a0-7d78a29ede72 a78e78d9-33bb-40bc-9e42-e47ff7e910cc 840539006 COVID-19 9bcf6ed5-d808-44af-98a0-7d78a29ede72 2016-08-01 NaN 999-15-5895 ... 42.177370 -71.281353 94568.40 1870.72 True False True False False False

2 rows Γ— 37 columns

Symptoms for all COVID-19 PatientsΒΆ

Generates a DataFrame with percentages of co-occurring conditions

InΒ [18]:
icu_patient_ids = encounters[(encounters.REASONCODE == 840539006) & (encounters.CODE == 305351004)].PATIENT
icu_patient_ids.shape
Out[18]:
(3677,)
InΒ [105]:
analysis.symptom_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons)
Out[105]:
outcome total percent of inpatient survivors percent survivors non survivors percent non survivors
0 Fever 80865 92.07 77629 88.73 3317 13.73
1 Cough 61710 70.81 59170 67.63 2601 10.77
4 Loss of taste 46121 51.25 44299 50.64 1868 7.73
8 Fatigue 34951 39.86 33551 38.35 1432 5.93
2 Sputum finding 30653 34.78 29418 33.63 1264 5.23
7 Respiratory distress 18177 100.00 14654 16.75 3548 14.69
6 Wheezing 17939 37.86 16621 19.00 1339 5.54
5 Dyspnea 17939 37.86 16621 19.00 1339 5.54
3 Diarrhea symptom 3521 5.80 3319 3.79 208 0.86
9 Hemoptysis 874 2.56 781 0.89 95 0.39
InΒ [23]:
def symptom_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons):
    # Filter conditions containing "(disorder)" in the DESCRIPTION
    disorder_conditions = covid_cons[
        (covid_cons['DESCRIPTION'].str.contains(r'\(finding\)', na=False)) &
        (~covid_cons['DESCRIPTION'].str.contains(
            r'pneumonia|hypoxemia|chronic|Hypertriglyceridemia|Hyperlipidemia|Hypercholesterolemia|Metabolic|osteoporosis|Alzheimer|Anemia|sinusitis|thrombosis|embolism|pharyngitis|bronchitis', case=False, na=False))
        ]

    # Initialize the result table
    table_rows = []

    for outcome in disorder_conditions['DESCRIPTION'].unique():
        # Filter patients with the specific outcome
        outcome_patients = disorder_conditions[disorder_conditions['DESCRIPTION']
                                               == outcome]['PATIENT'].unique()

        # Calculate metrics
        total = len(outcome_patients)
        inpatient = len(set(outcome_patients) & set(inpatient_ids))
        survivors = len(set(outcome_patients) & set(survivor_ids))
        non_survivors = len(set(outcome_patients) & set(deceased_ids))

        percent_inpatient = (inpatient / len(inpatient_ids)) * \
            100 if len(inpatient_ids) > 0 else 0
        percent_survivors = (survivors / len(survivor_ids)) * \
            100 if len(survivor_ids) > 0 else 0
        percent_non_survivors = (
            non_survivors / len(deceased_ids)) * 100 if len(deceased_ids) > 0 else 0

        # Append row to the table
        table_rows.append({
            'outcome': outcome,
            'total': total,
            'percent of inpatient': round(percent_inpatient, 2),
            'survivors': survivors,
            'percent survivors': round(percent_survivors, 2),
            'non survivors': non_survivors,
            'percent non survivors': round(percent_non_survivors, 2)
        })

    # Convert to DataFrame
    outcome_table = pd.DataFrame(table_rows).head(10)
    outcome_table = outcome_table.sort_values(by='total', ascending=False)
    outcome_table['outcome'] = outcome_table['outcome'].str.replace(
    r'\(finding\)', '', regex=True).str.strip()

    return outcome_table

symptom_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons)
Out[23]:
outcome total percent of inpatient survivors percent survivors non survivors percent non survivors
0 Fever 80865 92.07 77629 88.73 3317 13.73
1 Cough 61710 70.81 59170 67.63 2601 10.77
4 Loss of taste 46121 51.25 44299 50.64 1868 7.73
8 Fatigue 34951 39.86 33551 38.35 1432 5.93
2 Sputum finding 30653 34.78 29418 33.63 1264 5.23
7 Respiratory distress 18177 100.00 14654 16.75 3548 14.69
6 Wheezing 17939 37.86 16621 19.00 1339 5.54
5 Dyspnea 17939 37.86 16621 19.00 1339 5.54
3 Diarrhea symptom 3521 5.80 3319 3.79 208 0.86
9 Hemoptysis 874 2.56 781 0.89 95 0.39

Symptoms for ICU-admitted COVID-19 PatientsΒΆ

InΒ [108]:
covid_conditions = covid_cons
icu_patient_ids = encounters[(encounters.REASONCODE == 840539006) & (encounters.CODE == 305351004)].PATIENT
icu_patients = covid_conditions[covid_conditions.PATIENT.isin(icu_patient_ids)]
len(icu_patient_ids)
Out[108]:
3677
InΒ [109]:
analysis.calculate_symptom_stats(icu_patient_ids, survivor_ids, deceased_ids, covid_conditions)
Out[109]:
symptom icu_count percent_icu survivor_count percent_survivor deceased_count percent_deceased
7 Respiratory distress 3677 100.00 1180 1.35 2498 10.34
0 Fever 3345 90.97 1068 1.22 2278 9.43
1 Cough 2637 71.72 853 0.98 1785 7.39
4 Loss of taste 1940 52.76 637 0.73 1303 5.40
8 Fatigue 1482 40.30 498 0.57 985 4.08
6 Wheezing 1381 37.56 466 0.53 916 3.79
5 Dyspnea 1381 37.56 466 0.53 916 3.79
2 Sputum finding 1271 34.57 401 0.46 870 3.60
12 Joint pain 656 17.84 205 0.23 451 1.87
11 Muscle pain 656 17.84 205 0.23 451 1.87
InΒ [110]:
import pandas as pd

def calculate_symptom_stats(icu_patient_ids, survivor_ids, deceased_ids, covid_conditions):
    # Filter conditions for "(finding)" only
    finding_conditions = covid_conditions[
        covid_conditions['DESCRIPTION'].str.contains(r'\(finding\)', na=False)
    ]
    
    # Ensure there are symptoms to process
    if finding_conditions.empty:
        print("No symptoms found with '(finding)' in the description.")
        return pd.DataFrame(columns=[
            'symptom', 'icu_count', 'percent_icu',
            'survivor_count', 'percent_survivor',
            'deceased_count', 'percent_deceased'
        ])
    
    # Convert inputs to sets for faster lookup
    icu_set = set(icu_patient_ids)
    survivor_set = set(survivor_ids)
    deceased_set = set(deceased_ids)

    table_rows = []

    for symptom in finding_conditions['DESCRIPTION'].unique():
        # Get all patients with the symptom
        symptom_patients = finding_conditions[
            finding_conditions['DESCRIPTION'] == symptom
        ]['PATIENT'].unique()

        symptom_patients_set = set(symptom_patients)

        # Intersections
        icu_with_symptom = symptom_patients_set & icu_set
        survivors_with_symptom = icu_with_symptom & survivor_set
        deceased_with_symptom = icu_with_symptom & deceased_set

        icu_count = len(icu_with_symptom)
        survivor_count = len(survivors_with_symptom)
        deceased_count = len(deceased_with_symptom)

        # Safe percentage calculations
        percent_icu = (icu_count / len(icu_patient_ids)) * 100 if len(icu_patient_ids) > 0 else 0
        percent_survivor = (survivor_count / len(survivor_ids)) * 100 if len(survivor_ids) > 0 else 0
        percent_deceased = (deceased_count / len(deceased_ids)) * 100 if len(deceased_ids) > 0 else 0

        table_rows.append({
            'symptom': symptom.replace('(finding)', '').strip(),
            'icu_count': icu_count,
            'percent_icu': round(percent_icu, 2),
            'survivor_count': survivor_count,
            'percent_survivor': round(percent_survivor, 2),
            'deceased_count': deceased_count,
            'percent_deceased': round(percent_deceased, 2)
        })

    # Create and sort the DataFrame
    df = pd.DataFrame(table_rows)
    if df.empty:
        print("No matching symptoms found for ICU patients.")
        return pd.DataFrame(columns=[
            'symptom', 'icu_count', 'percent_icu',
            'survivor_count', 'percent_survivor',
            'deceased_count', 'percent_deceased'
        ])
    
    df = df.sort_values(by='icu_count', ascending=False).head(10)

    return df

# Call the function
calculate_symptom_stats(icu_patient_ids, survivor_ids, deceased_ids, covid_conditions)
Out[110]:
symptom icu_count percent_icu survivor_count percent_survivor deceased_count percent_deceased
7 Respiratory distress 3677 100.00 1180 1.35 2498 10.34
0 Fever 3345 90.97 1068 1.22 2278 9.43
1 Cough 2637 71.72 853 0.98 1785 7.39
4 Loss of taste 1940 52.76 637 0.73 1303 5.40
8 Fatigue 1482 40.30 498 0.57 985 4.08
6 Wheezing 1381 37.56 466 0.53 916 3.79
5 Dyspnea 1381 37.56 466 0.53 916 3.79
2 Sputum finding 1271 34.57 401 0.46 870 3.60
12 Joint pain 656 17.84 205 0.23 451 1.87
11 Muscle pain 656 17.84 205 0.23 451 1.87

Create a DataFrame with columns that show a condition's start and end in days relative to COVID-19 diagnosis. Also create a column that calculates the number of days between COVID-19 diagnosis and a person's death.

InΒ [19]:
covid_patient_conditions['start_days'] = (pd.to_datetime(covid_patient_conditions.START) - pd.to_datetime(covid_patient_conditions.covid_start)) / np.timedelta64(1, 'D')
covid_patient_conditions['end_days'] = (pd.to_datetime(covid_patient_conditions.STOP) - pd.to_datetime(covid_patient_conditions.covid_start)) / np.timedelta64(1, 'D')
covid_patient_conditions['death_days'] = (pd.to_datetime(covid_patient_conditions.DEATHDATE) - pd.to_datetime(covid_patient_conditions.covid_start)) / np.timedelta64(1, 'D')
InΒ [20]:
covid_patient_conditions = covid_patient_conditions[covid_patient_conditions['DESCRIPTION'].str.contains(r'\(finding\)', na=False)]
covid_patient_conditions.head(2)
Out[20]:
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit START STOP ENCOUNTER CODE DESCRIPTION start_days end_days death_days
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 2020-03-01 2020-03-30 60584c6a-e26b-4176-a660-86ad84c7ceca 386661006 Fever (finding) 0.0 29.0 NaN
4 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False 2020-03-13 2020-04-14 a78e78d9-33bb-40bc-9e42-e47ff7e910cc 49727002 Cough (finding) 0.0 32.0 NaN
InΒ [42]:
# Step 1 & 2: Filter and remove duplicate patient-condition pairs
filtered = covid_patient_conditions[
    (covid_patient_conditions['death'] == False) &
    (covid_patient_conditions['icu_admit'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

# Step 3: Group by condition and count unique patients
df_survivor = filtered.groupby('DESCRIPTION') \
    .agg(survivor_count=('PATIENT', 'nunique')) \
    .reset_index()

# Step 4: Add percentage column
total_unique_patients = filtered['PATIENT'].nunique()
df_survivor['survivor_percent'] = round((df_survivor['survivor_count'] / total_unique_patients) * 100, 2)

# Step 5: Sort and take top 10
df_survivor = df_survivor.sort_values(by='survivor_count', ascending=False).head(15)
df_survivor
Out[42]:
DESCRIPTION survivor_count survivor_percent
15 Respiratory distress (finding) 1179 100.00
6 Fever (finding) 1067 90.50
2 Cough (finding) 852 72.26
10 Loss of taste (finding) 637 54.03
5 Fatigue (finding) 497 42.15
19 Wheezing (finding) 465 39.44
4 Dyspnea (finding) 465 39.44
17 Sputum finding (finding) 401 34.01
9 Joint pain (finding) 205 17.39
11 Muscle pain (finding) 205 17.39
1 Chill (finding) 171 14.50
7 Headache (finding) 166 14.08
16 Sore throat symptom (finding) 139 11.79
13 Nausea (finding) 82 6.96
18 Vomiting symptom (finding) 82 6.96
InΒ [43]:
# Step 1 & 2: Filter and remove duplicate patient-condition pairs
filtered = covid_patient_conditions[
    (covid_patient_conditions['death'] == True) &
    (covid_patient_conditions['icu_admit'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

# Step 3: Group by condition and count unique patients
df_death = filtered.groupby('DESCRIPTION') \
    .agg(death_count=('PATIENT', 'nunique')) \
    .reset_index()

# Step 4: Add percentage column
total_unique_patients = filtered['PATIENT'].nunique()
df_death['death_percent'] = round((df_death['death_count'] / total_unique_patients) * 100, 2)

# Step 5: Sort and take top 10
df_death = df_death.sort_values(by='death_count', ascending=False).head(15)
df_death
Out[43]:
DESCRIPTION death_count death_percent
15 Respiratory distress (finding) 2498 100.00
6 Fever (finding) 2278 91.19
2 Cough (finding) 1785 71.46
10 Loss of taste (finding) 1303 52.16
5 Fatigue (finding) 985 39.43
19 Wheezing (finding) 916 36.67
4 Dyspnea (finding) 916 36.67
17 Sputum finding (finding) 870 34.83
9 Joint pain (finding) 451 18.05
11 Muscle pain (finding) 451 18.05
7 Headache (finding) 412 16.49
1 Chill (finding) 370 14.81
16 Sore throat symptom (finding) 348 13.93
13 Nausea (finding) 158 6.33
18 Vomiting symptom (finding) 158 6.33
InΒ [44]:
# Step 1 & 2: Filter and remove duplicate patient-condition pairs
filtered = covid_patient_conditions[
    (covid_patient_conditions['icu_admit'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

# Step 3: Group by condition and count unique patients
df_total = filtered.groupby('DESCRIPTION') \
    .agg(total_count=('PATIENT', 'nunique')) \
    .reset_index()

# Step 4: Add percentage column
total_unique_patients = filtered['PATIENT'].nunique()
df_total['total_percent'] = round((df_total['total_count'] / total_unique_patients) * 100, 2)

# Step 5: Sort and take top 10
df_total = df_total.sort_values(by='total_count', ascending=False).head(15)
df_total
Out[44]:
DESCRIPTION total_count total_percent
15 Respiratory distress (finding) 3677 100.00
6 Fever (finding) 3345 90.97
2 Cough (finding) 2637 71.72
10 Loss of taste (finding) 1940 52.76
5 Fatigue (finding) 1482 40.30
19 Wheezing (finding) 1381 37.56
4 Dyspnea (finding) 1381 37.56
17 Sputum finding (finding) 1271 34.57
9 Joint pain (finding) 656 17.84
11 Muscle pain (finding) 656 17.84
7 Headache (finding) 578 15.72
1 Chill (finding) 541 14.71
16 Sore throat symptom (finding) 487 13.24
13 Nausea (finding) 240 6.53
18 Vomiting symptom (finding) 240 6.53
InΒ [45]:
# Merge df_total and df_survivor on 'DESCRIPTION'
merged_df = pd.merge(df_total, df_survivor, on='DESCRIPTION', how='left')

# Merge the result with df_death on 'DESCRIPTION'
merged_df = pd.merge(merged_df, df_death, on='DESCRIPTION', how='left')
merged_df['DESCRIPTION'] = merged_df['DESCRIPTION'].str.replace(
        r'\(finding\)', '', regex=True).str.strip()
# Display the merged DataFrame
merged_df.head(10)
Out[45]:
DESCRIPTION total_count total_percent survivor_count survivor_percent death_count death_percent
0 Respiratory distress 3677 100.00 1179 100.00 2498 100.00
1 Fever 3345 90.97 1067 90.50 2278 91.19
2 Cough 2637 71.72 852 72.26 1785 71.46
3 Loss of taste 1940 52.76 637 54.03 1303 52.16
4 Fatigue 1482 40.30 497 42.15 985 39.43
5 Wheezing 1381 37.56 465 39.44 916 36.67
6 Dyspnea 1381 37.56 465 39.44 916 36.67
7 Sputum finding 1271 34.57 401 34.01 870 34.83
8 Joint pain 656 17.84 205 17.39 451 18.05
9 Muscle pain 656 17.84 205 17.39 451 18.05

Combined tableΒΆ

InΒ [46]:
# --- TOTAL ICU PATIENTS ---
filtered_total = covid_patient_conditions[
    covid_patient_conditions['icu_admit'] == True
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

df_total = filtered_total.groupby('DESCRIPTION') \
    .agg(total_count=('PATIENT', 'nunique')) \
    .reset_index()

total_unique_patients = filtered_total['PATIENT'].nunique()
df_total['total_percent'] = round((df_total['total_count'] / total_unique_patients) * 100, 2)

# --- ICU SURVIVORS ---
filtered_survivor = covid_patient_conditions[
    (covid_patient_conditions['death'] == False) &
    (covid_patient_conditions['icu_admit'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

df_survivor = filtered_survivor.groupby('DESCRIPTION') \
    .agg(survivor_count=('PATIENT', 'nunique')) \
    .reset_index()

survivor_unique_patients = filtered_survivor['PATIENT'].nunique()
df_survivor['survivor_percent'] = round((df_survivor['survivor_count'] / survivor_unique_patients) * 100, 2)

# --- ICU DEATHS ---
filtered_death = covid_patient_conditions[
    (covid_patient_conditions['death'] == True) &
    (covid_patient_conditions['icu_admit'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

df_death = filtered_death.groupby('DESCRIPTION') \
    .agg(death_count=('PATIENT', 'nunique')) \
    .reset_index()

death_unique_patients = filtered_death['PATIENT'].nunique()
df_death['death_percent'] = round((df_death['death_count'] / death_unique_patients) * 100, 2)

# --- MERGE ALL ---
from functools import reduce

dfs = [df_total, df_survivor, df_death]
df_combined = reduce(lambda left, right: pd.merge(left, right, on='DESCRIPTION', how='outer'), dfs)

# Optional: Fill NaNs with 0
df_combined = df_combined.fillna(0)

# Sort by total_count and show top 15
df_combined = df_combined.sort_values(by='total_count', ascending=False).head(10)
df_combined['DESCRIPTION'] = df_combined['DESCRIPTION'].str.replace(
        r'\(finding\)', '', regex=True).str.strip()
# Final output
df_combined.head(10)
Out[46]:
DESCRIPTION total_count total_percent survivor_count survivor_percent death_count death_percent
15 Respiratory distress 3677 100.00 1179 100.00 2498 100.00
6 Fever 3345 90.97 1067 90.50 2278 91.19
2 Cough 2637 71.72 852 72.26 1785 71.46
10 Loss of taste 1940 52.76 637 54.03 1303 52.16
5 Fatigue 1482 40.30 497 42.15 985 39.43
19 Wheezing 1381 37.56 465 39.44 916 36.67
4 Dyspnea 1381 37.56 465 39.44 916 36.67
17 Sputum finding 1271 34.57 401 34.01 870 34.83
9 Joint pain 656 17.84 205 17.39 451 18.05
11 Muscle pain 656 17.84 205 17.39 451 18.05

Symptom timelines for hospitalized patientsΒΆ

These plots show the progression of COVID-19 related complications in hospitalized patients. The bars represent the average start and end time for the particular item.

InΒ [21]:
covid_patient_conditions.head(2)
Out[21]:
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit START STOP ENCOUNTER CODE DESCRIPTION start_days end_days death_days
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 2020-03-01 2020-03-30 60584c6a-e26b-4176-a660-86ad84c7ceca 386661006 Fever (finding) 0.0 29.0 NaN
4 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False 2020-03-13 2020-04-14 a78e78d9-33bb-40bc-9e42-e47ff7e910cc 49727002 Cough (finding) 0.0 32.0 NaN
InΒ [76]:
symptoms = ['Fever (finding)', 'Cough (finding)', 'Dyspnea (finding)']
filtered = covid_patient_conditions[(covid_patient_conditions['DESCRIPTION'].isin(symptoms)) &
    (covid_patient_conditions['icu_admit'] == True)]
filtered.groupby('DESCRIPTION')['end_days'].mean().round(2)
Out[76]:
DESCRIPTION
Cough (finding)      16.29
Dyspnea (finding)    16.43
Fever (finding)      16.33
Name: end_days, dtype: float64
InΒ [55]:
# Step 1: Create Boolean mask for "(finding)" and exclusion terms
condition_mask = (
    covid_patient_conditions['DESCRIPTION'].str.contains(r'\(finding\)', na=False) &
    ~covid_patient_conditions['DESCRIPTION'].str.contains(
        r'pneumonia|hypoxemia|chronic|Hypertriglyceridemia|Hyperlipidemia|Hypercholesterolemia|Metabolic|osteoporosis|Alzheimer|Anemia|sinusitis|thrombosis|embolism|pharyngitis|bronchitis',
        case=False, na=False)
)

# Step 2: Define function with additional filtering for specific descriptions
def select_condition_averages(df, mask):
    mask = mask.reindex(df.index, fill_value=False)
    filtered_df = df[mask]
    # Filter for specific conditions only
    symptoms = ['Fever (finding)', 'Cough (finding)', 'Dyspnea (finding)']
    filtered_df = filtered_df[filtered_df['DESCRIPTION'].isin(symptoms)]
    
    averages = filtered_df.groupby('DESCRIPTION').agg(
        avg_start_days= ('start_days', 'mean'),
        avg_end_days= ('end_days', 'mean'),
        count=('PATIENT', 'size')
    ).reset_index()
    return averages

# Step 3: Apply function
averages = select_condition_averages(covid_patient_conditions, condition_mask)

# Step 4: View result
averages.head(3)
Out[55]:
DESCRIPTION avg_start_days avg_end_days count
0 Cough (finding) -0.041698 22.529629 59763
1 Dyspnea (finding) -0.041395 20.302278 17514
2 Fever (finding) -0.041540 22.541085 78334
InΒ [56]:
averages = select_condition_averages(
    covid_patient_conditions[
        (covid_patient_conditions['recovered'] == True) & 
        (covid_patient_conditions['icu_admit'] == True)
    ],
    condition_mask
)
averages.head(3)
Out[56]:
DESCRIPTION avg_start_days avg_end_days count
0 Cough (finding) -0.046893 16.293083 853
1 Dyspnea (finding) -0.049356 16.431330 466
2 Fever (finding) -0.048689 16.327715 1068
InΒ [90]:
averages = select_condition_averages(covid_patient_conditions[(covid_patient_conditions['death'] == True) & (covid_patient_conditions['icu_admit'] == True )], condition_mask)
averages.head(3)
Out[90]:
DESCRIPTION avg_start_days avg_end_days count
0 Cough (finding) -0.042017 21.0 1785
1 Dyspnea (finding) -0.036026 21.0 916
2 Fever (finding) -0.046971 21.0 2278
InΒ [30]:
# Deaths of icu patients
np.intersect1d(icu_patient_ids, deceased_ids).shape
Out[30]:
(2498,)

Add an age column to the DataFrame for rows where the patient has died

InΒ [26]:
covid_info.loc[covid_info.death == True, 'age'] = (pd.to_datetime(covid_info.DEATHDATE) - pd.to_datetime(covid_info.BIRTHDATE)).dt.days / 365.25

Populate ages for survivors based on the current date

InΒ [27]:
covid_info.loc[covid_info.recovered == True, 'age'] = (datetime.datetime.now() - pd.to_datetime(covid_info.BIRTHDATE)).dt.days / 365.25

Create an age_range column that places individuals into 10 year age ranges, such as 0 - 10, 10 - 20, etc.

InΒ [28]:
bins = list(range(0, 120, 10))
covid_info['age_range'] = pd.cut(covid_info.age, bins=bins)
InΒ [31]:
covid_info.head(3)
Out[31]:
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit age age_range
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 7.685147 (0, 10]
1 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False 8.747433 (0, 10]
2 5163c501-353c-4a82-b863-a3f1df2d6cf1 True False 2020-03-10 NaN 2004-01-09 F False False 21.308693 (20, 30]

Mortality by Age and SexΒΆ

A plot of deaths grouped by age range and gender.

InΒ [96]:
# Mortality rate
rate = (np.intersect1d(covid_patient_ids, deceased_ids).shape[0]/len(covid_patient_ids))* 100
print(round(rate, 2),'% for all covid-19 patients')
4.13 % for all covid-19 patients
InΒ [32]:
import seaborn as sns

chart = sns.catplot(x="age_range", kind="count", hue="GENDER", data=covid_info[covid_info.death == True])
for axes in chart.axes.flat:
    axes.tick_params(axis='x', rotation=90)
No description has been provided for this image

A table view of the same information from above

InΒ [29]:
covid_info[covid_info.death==True].groupby(['age_range', 'GENDER'], observed=True).count()[['PATIENT']]
Out[29]:
PATIENT
age_range GENDER
(0, 10] F 1
M 1
(10, 20] F 2
M 4
(20, 30] F 16
M 102
(30, 40] F 35
M 61
(40, 50] F 32
M 128
(50, 60] F 200
M 299
(60, 70] F 250
M 425
(70, 80] F 431
M 479
(80, 90] F 289
M 437
(90, 100] F 112
M 117
(100, 110] F 103
M 106

Another table view of the mortality data, this time just grouped by age range

InΒ [30]:
covid_info[covid_info.death == True].groupby(['age_range'], observed=True).count()[['PATIENT']]
Out[30]:
PATIENT
age_range
(0, 10] 2
(10, 20] 6
(20, 30] 118
(30, 40] 96
(40, 50] 160
(50, 60] 499
(60, 70] 675
(70, 80] 910
(80, 90] 726
(90, 100] 229
(100, 110] 209

Build a DataFrame that shows the total count of a supply used on a given day

InΒ [219]:
supplies.head(3)
Out[219]:
DATE PATIENT ENCOUNTER CODE DESCRIPTION QUANTITY
0 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 409534002 Disposable air-purifying respirator (physical ... 2
1 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 713779008 Nitrile examination/treatment glove non-powde... 24
2 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 469673003 Isolation gown single-use (physical object) 12
InΒ [212]:
grouped_supplies = supplies.groupby(['DESCRIPTION', 'DATE']).sum()

Supply UsageΒΆ

Small multiples plot of supply usage over time.

InΒ [213]:
gs = grouped_supplies.reset_index()
gs['DATE'] = pd.to_datetime(gs.DATE)
g = sns.FacetGrid(gs, col="DESCRIPTION", col_wrap=3, sharey=False, height=3, aspect=2)
g = g.map(sns.lineplot, "DATE", "QUANTITY", marker=".")
for axes in g.axes.flat:
    title = axes.get_title()
    if 'glove' in title:
        axes.set_title('Gloves')
    else:    
        axes.set_title(title.replace("DESCRIPTION = ", "").replace(" (physical object)", ""))
    for tick in axes.get_xticklabels():
        tick.set_rotation(90)
No description has been provided for this image

A table showing total supplies used over the entire simulation

Build a DataFrame that has cumulative case counts over time

InΒ [31]:
case_counts = conditions[conditions.CODE == 840539006].groupby('START').count()[['PATIENT']]
case_counts['total'] = case_counts['PATIENT'].cumsum()
case_counts = case_counts.rename(columns={'PATIENT': 'daily'})
case_counts = case_counts.reset_index()
case_counts['START'] = pd.to_datetime(case_counts.START)

Cumulative Case CountΒΆ

Show total cases over time

InΒ [32]:
axes = sns.lineplot(x='START', y='total', data=case_counts)
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image

Medication DispensesΒΆ

This table shows medications dispensed to patients with COVID-19 since January 20, 2020.

InΒ [33]:
covid_meds = medications[pd.to_datetime(medications.START) > pd.to_datetime('2020-01-20')]
covid_meds = covid_info.merge(covid_meds, on='PATIENT')
InΒ [34]:
# Use only numeric columns for aggregation
covid_meds.groupby(['DESCRIPTION'])[['DISPENSES']].sum().sort_values('DISPENSES', ascending=False).head(10)
Out[34]:
DISPENSES
DESCRIPTION
Acetaminophen 500 MG Oral Tablet 35509
0.4 ML Enoxaparin sodium 100 MG/ML Prefilled Syringe 34733
1 ML Enoxaparin sodium 150 MG/ML Prefilled Syringe 20332
NDA020503 200 ACTUAT Albuterol 0.09 MG/ACTUAT Metered Dose Inhaler 17163
Hydrochlorothiazide 25 MG Oral Tablet 6832
Simvastatin 10 MG Oral Tablet 5779
insulin human isophane 70 UNT/ML / Regular Insulin Human 30 UNT/ML Injectable Suspension [Humulin] 5392
1 ML Epoetin Alfa 4000 UNT/ML Injection [Epogen] 4817
amLODIPine 5 MG / Hydrochlorothiazide 12.5 MG / Olmesartan medoxomil 20 MG Oral Tablet 4799
Atenolol 50 MG / Chlorthalidone 25 MG Oral Tablet 4550

Hospital Day StatsΒΆ

For patients with COVID-19, calculate the average hospital length of stay as well as total hospital days for all COVID-19 patients. Provide the same information for ICU patients

InΒ [35]:
covid_info.head(3)
Out[35]:
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit age age_range
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 7.685147 (0, 10]
1 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False 8.747433 (0, 10]
2 5163c501-353c-4a82-b863-a3f1df2d6cf1 True False 2020-03-10 NaN 2004-01-09 F False False 21.308693 (20, 30]
InΒ [Β ]:
covid_hosp = covid_info[covid_info['admit'] == True].merge(encounters, on='PATIENT')
covid_hosp = covid_hosp[covid_hosp['REASONCODE'] == 840539006]
covid_hosp['START'] = pd.to_datetime(covid_hosp.START)
covid_hosp['STOP'] = pd.to_datetime(covid_hosp.STOP)
covid_hosp['days'] = (covid_hosp.STOP - covid_hosp.START) / np.timedelta64(1, 'D')
covid_hosp = covid_hosp[['PATIENT', 'icu_admit', 'START', 'STOP', 'days']]
covid_hosp = covid_hosp.rename(columns={'START': 'hospital_start', 'STOP': 'hospital_stop'})
covid_hosp = covid_hosp.drop_duplicates(subset=['PATIENT', 'hospital_start', 'hospital_stop'])
covid_hosp = covid_hosp.sort_values(by='hospital_start')
covid_hosp = covid_hosp.reset_index(drop=True)
covid_hosp.head(3)
Out[Β ]:
PATIENT icu_admit hospital_start hospital_stop days
0 641699c3-03c0-40fc-9f14-38247e0a8cfa False 2020-01-18 20:55:04+00:00 2020-02-02 05:14:04+00:00 14.346528
1 d19da9db-c5a7-468f-aadf-3c7cd5b5a8d2 False 2020-01-20 12:24:28+00:00 2020-02-08 22:59:28+00:00 19.440972
2 a654fdb6-0796-4fb5-9ad8-435e38ed5168 False 2020-01-22 23:57:08+00:00 2020-01-25 00:27:08+00:00 2.020833
InΒ [61]:
# Group by hospital admission and ICU admission
grouped_data = covid_hosp.groupby(['icu_admit']).agg(
    mean_days=('days', 'mean'),
    total_days=('days', 'sum'),
    patient_count=('PATIENT', 'nunique')
).reset_index()

# Rename the columns for better readability
grouped_data = grouped_data.rename(columns={
    'icu_admit': 'Admission'
})
grouped_data['Admission'] = grouped_data['Admission'].map({False: 'Non-ICU Inpatients', True: 'ICU Admission'})

# Calculate the total row
total_row = {
    'Admission': 'Total',
    'mean_days': grouped_data['mean_days'].mean(),
    'total_days': grouped_data['total_days'].sum(),
    'patient_count': grouped_data['patient_count'].sum()
}

# Append the total row to the DataFrame using pd.concat
grouped_total = pd.concat(
    [grouped_data, pd.DataFrame([total_row])],
    ignore_index=True
)

# Display the updated DataFrame
grouped_total
Out[61]:
Admission mean_days total_days patient_count
0 Non-ICU Inpatients 13.817881 214412.061111 14498
1 ICU Admission 5.376755 59305.611806 3677
2 Total 9.597318 273717.672917 18175
InΒ [51]:
device_codes = [448907002, 449071006, 36965003]
grouped_dev = devices[devices.CODE.isin(device_codes)].groupby(['DESCRIPTION', 'START']).count()
grouped_dev = grouped_dev.reset_index()
grouped_dev['START'] = pd.to_datetime(grouped_dev.START)

Device UsageΒΆ

Show the number of devices used to treat COVID-19 over time.

InΒ [52]:
g = sns.FacetGrid(grouped_dev.reset_index(), col="DESCRIPTION", col_wrap=3, sharey=False, height=3, aspect=2)
g = g.map(sns.lineplot, "START", "PATIENT", marker=".")
for axes in g.axes.flat:
    title = axes.get_title()
    axes.set_title(title.replace("DESCRIPTION = ", "").replace(" (physical object)", ""))
    for tick in axes.get_xticklabels():
        tick.set_rotation(90)
No description has been provided for this image

ConclusionΒΆ

In this project, I learned how to:

  • extract data for specific analyses, such as filtering, merging, and recoding data based on fields like 'CODE', 'REASONCODE', and 'DESCRIPTION', for different groups of patients including inpatients, COVID patients, ICU-admitted, and ventilated individuals.