import pandas as pd
import numpy as np
Chapter 4 - Define Specific Patient Groups in Hospital Database
Introduction
When performing data analysis in healthcare, we often need to define specific groups of patients, such as outpatients, inpatients, ICU-admitted patients (those admitted to intensive care units), and ventilated patients (those on mechanical ventilation).
Therefore, it is important to know how to accurately define these patient groups. Here, I show some methods to do that.
Load necessary python packages and data
# Ensure all columns are shown
'display.max_columns', None)
pd.set_option('display.width', None) pd.set_option(
= pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/conditions.csv")
conditions = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/patients.csv")
patients = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/careplans.csv")
care_plans = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/observations.csv")
observations = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")
encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/procedures.csv")
procedures = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/medications.csv") medications
Common patient groups
Inpatient, outpatient
These patients are defined in ‘encounters’.ENCOUNTERCLASS
'ENCOUNTERCLASS'].value_counts() encounters[
ENCOUNTERCLASS
wellness 1569498
ambulatory 835566
outpatient 574583
inpatient 95273
emergency 73484
urgentcare 40271
Name: count, dtype: int64
‘wellness’: Preventive care visits, such as annual physicals, vaccinations, or routine screenings. These are typically scheduled in advance and not related to any acute illness.
‘ambulatory’: Non-urgent care in outpatient settings where patients are not admitted to a hospital. Often includes visits to specialists or primary care physicians.
‘outpatient’: Medical services provided without a hospital stay, such as minor procedures, diagnostic tests, or consultations. Sometimes overlaps with ambulatory.
‘inpatient’: Hospital stays where the patient is formally admitted for one or more nights, often for surgeries, serious illnesses, or intensive treatment.
‘emergency’: Visits to the emergency department (ER) for urgent or life-threatening conditions. Often unplanned.
‘urgentcare’: Walk-in care for conditions that need prompt attention but are not emergencies, like minor injuries or infections. Less severe than ER visits.
To find inpatients, we can filter encounters where ENCOUNTERCLASS is ‘inpatient’.
= encounters[encounters['ENCOUNTERCLASS'] == 'inpatient'].PATIENT.unique()
inpatient_ids = patients[patients.Id.isin(inpatient_ids)]
inpatients len(inpatient_ids)
3) inpatients.head(
Id | BIRTHDATE | DEATHDATE | SSN | DRIVERS | PASSPORT | PREFIX | FIRST | LAST | SUFFIX | MAIDEN | MARITAL | RACE | ETHNICITY | GENDER | BIRTHPLACE | ADDRESS | CITY | STATE | COUNTY | ZIP | LAT | LON | HEALTHCARE_EXPENSES | HEALTHCARE_COVERAGE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | bd1c4ffc-7f1d-4590-adbb-1d6533fb623e | 2019-06-12 | NaN | 999-81-4349 | NaN | NaN | NaN | Karyn217 | Mueller846 | NaN | NaN | NaN | white | nonhispanic | F | New Bedford Massachusetts US | 570 Abshire Forge Suite 32 | Colrain | Massachusetts | Franklin County | NaN | 42.740220 | -72.722648 | 24130.00 | 903.28 |
10 | 09303bc9-3bcc-49fe-b57b-005471833c43 | 1978-06-24 | NaN | 999-85-4926 | S99970861 | X9925405X | Mrs. | Mariana775 | Gulgowski816 | NaN | Williamson769 | M | white | nonhispanic | F | Yarmouth Massachusetts US | 999 Kuhn Forge | Lowell | Massachusetts | Middlesex County | 1851.0 | 42.636143 | -71.343255 | 883040.92 | 5566.08 |
13 | da454f92-9e90-43f8-9514-550fbaa6de3d | 1982-09-29 | NaN | 999-21-5604 | S99921528 | X87641071X | Ms. | Christal240 | Brown30 | NaN | NaN | S | white | nonhispanic | F | Bellingham Massachusetts US | 1060 Hansen Overpass Suite 86 | Boston | Massachusetts | Suffolk County | 2118.0 | 42.284598 | -71.134497 | 761931.85 | 5849.02 |
ICU-admitted patients
To find patients who were admitted to the ICU, we can search for the encounter with the description “Admission to intensive care unit (procedure)”.
ICU-admitted patients belong to the inpatient group. So I search for DESCRIPTION, more detailed for the ‘encounters’[ENCOUNTERCLASS== ‘inpatient’], the reason for hospitalization.
= encounters[encounters.DESCRIPTION == 'Admission to intensive care unit (procedure)'].PATIENT.unique()
icu_patient_ids = patients[patients.Id.isin(icu_patient_ids)]
icu_patients print(len(icu_patient_ids))
3) icu_patients.head(
3677
Id | BIRTHDATE | DEATHDATE | SSN | DRIVERS | PASSPORT | PREFIX | FIRST | LAST | SUFFIX | MAIDEN | MARITAL | RACE | ETHNICITY | GENDER | BIRTHPLACE | ADDRESS | CITY | STATE | COUNTY | ZIP | LAT | LON | HEALTHCARE_EXPENSES | HEALTHCARE_COVERAGE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
39 | a89e2d41-db41-4bf2-8e7d-f315743219dd | 1959-06-17 | 2020-03-11 | 999-59-5266 | S99980358 | X6040407X | Mr. | Emerson869 | Hagenes547 | NaN | NaN | M | white | nonhispanic | M | Worcester Massachusetts US | 436 Mills Wynd Apt 14 | Saugus | Massachusetts | Essex County | NaN | 42.471678 | -71.004501 | 1624622.98 | 5812.88 |
43 | a2a8e809-cc34-4f8e-93b8-74e4a87f35ef | 1969-03-22 | NaN | 999-12-7459 | S99953931 | X68702974X | Mr. | Stanford577 | Goodwin327 | NaN | NaN | M | white | nonhispanic | M | Arlington Massachusetts US | 314 Torp Byway | Somerville | Massachusetts | Middlesex County | 2140.0 | 42.440865 | -71.086388 | 1113259.27 | 6304.04 |
68 | 87537cb1-92e1-4a11-8dd7-e62631a3d8a8 | 1940-01-09 | 2020-03-15 | 999-49-7659 | S99985960 | X58311625X | Mr. | Sammie902 | Greenholt190 | NaN | NaN | M | white | nonhispanic | M | Chelsea Massachusetts US | 807 Bauch Viaduct Suite 72 | Plymouth | Massachusetts | Plymouth County | 2360.0 | 41.928964 | -70.665402 | 1406880.47 | 20322.88 |
Ventilated patients
To find ventilated patients, search for the procedure with code 26763009, which is “Mechanical ventilation (procedure)”.
= procedures[procedures.CODE == 26763009].PATIENT.unique()
vent_ids = patients[patients.Id.isin(vent_ids)]
vented_patients print(len(vent_ids))
3) vented_patients.head(
2914
Id | BIRTHDATE | DEATHDATE | SSN | DRIVERS | PASSPORT | PREFIX | FIRST | LAST | SUFFIX | MAIDEN | MARITAL | RACE | ETHNICITY | GENDER | BIRTHPLACE | ADDRESS | CITY | STATE | COUNTY | ZIP | LAT | LON | HEALTHCARE_EXPENSES | HEALTHCARE_COVERAGE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
39 | a89e2d41-db41-4bf2-8e7d-f315743219dd | 1959-06-17 | 2020-03-11 | 999-59-5266 | S99980358 | X6040407X | Mr. | Emerson869 | Hagenes547 | NaN | NaN | M | white | nonhispanic | M | Worcester Massachusetts US | 436 Mills Wynd Apt 14 | Saugus | Massachusetts | Essex County | NaN | 42.471678 | -71.004501 | 1624622.98 | 5812.88 |
68 | 87537cb1-92e1-4a11-8dd7-e62631a3d8a8 | 1940-01-09 | 2020-03-15 | 999-49-7659 | S99985960 | X58311625X | Mr. | Sammie902 | Greenholt190 | NaN | NaN | M | white | nonhispanic | M | Chelsea Massachusetts US | 807 Bauch Viaduct Suite 72 | Plymouth | Massachusetts | Plymouth County | 2360.0 | 41.928964 | -70.665402 | 1406880.47 | 20322.88 |
82 | 51d777f5-60fd-42fd-9722-63407a3db63f | 1940-01-09 | 2020-03-19 | 999-11-6530 | S99998232 | X12665812X | Mr. | Brad867 | Greenholt190 | NaN | NaN | M | white | nonhispanic | M | Barnstable Massachusetts US | 458 Nader Passage Apt 46 | Plymouth | Massachusetts | Plymouth County | NaN | 41.848351 | -70.594020 | 1380713.48 | 7207.60 |
Death patients
To find patients who died, we can search for the patients with DEATHDATE exists.
= patients[patients.DEATHDATE.notna()].Id.unique()
deceased_ids = patients[patients.Id.isin(deceased_ids)]
deceased_patients print(len(deceased_ids))
3) deceased_patients.head(
24150
Id | BIRTHDATE | DEATHDATE | SSN | DRIVERS | PASSPORT | PREFIX | FIRST | LAST | SUFFIX | MAIDEN | MARITAL | RACE | ETHNICITY | GENDER | BIRTHPLACE | ADDRESS | CITY | STATE | COUNTY | ZIP | LAT | LON | HEALTHCARE_EXPENSES | HEALTHCARE_COVERAGE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
39 | a89e2d41-db41-4bf2-8e7d-f315743219dd | 1959-06-17 | 2020-03-11 | 999-59-5266 | S99980358 | X6040407X | Mr. | Emerson869 | Hagenes547 | NaN | NaN | M | white | nonhispanic | M | Worcester Massachusetts US | 436 Mills Wynd Apt 14 | Saugus | Massachusetts | Essex County | NaN | 42.471678 | -71.004501 | 1624622.98 | 5812.88 |
65 | 87b26350-3798-4289-a2ee-f6d88145537f | 1940-05-03 | 2010-04-20 | 999-30-7132 | S99937049 | X73545664X | Mrs. | Tia76 | Spinka232 | NaN | Crona259 | M | white | nonhispanic | F | Hopedale Massachusetts US | 537 Steuber Heights Unit 8 | Middleborough | Massachusetts | Plymouth County | NaN | 41.855252 | -70.847341 | 1433642.99 | 20767.46 |
68 | 87537cb1-92e1-4a11-8dd7-e62631a3d8a8 | 1940-01-09 | 2020-03-15 | 999-49-7659 | S99985960 | X58311625X | Mr. | Sammie902 | Greenholt190 | NaN | NaN | M | white | nonhispanic | M | Chelsea Massachusetts US | 807 Bauch Viaduct Suite 72 | Plymouth | Massachusetts | Plymouth County | 2360.0 | 41.928964 | -70.665402 | 1406880.47 | 20322.88 |
Patients with specific conditions such as covid19
To find patients with specific conditions like COVID-19, we can filter the ‘conditions’ table for the relevant condition code.
= conditions[conditions.CODE == 840539006].PATIENT.unique()
covid_patient_ids = patients[patients.Id.isin(covid_patient_ids)]
covid_patients print(len(covid_patient_ids))
3) covid_patients.head(
88166
Id | BIRTHDATE | DEATHDATE | SSN | DRIVERS | PASSPORT | PREFIX | FIRST | LAST | SUFFIX | MAIDEN | MARITAL | RACE | ETHNICITY | GENDER | BIRTHPLACE | ADDRESS | CITY | STATE | COUNTY | ZIP | LAT | LON | HEALTHCARE_EXPENSES | HEALTHCARE_COVERAGE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 2017-08-24 | NaN | 999-68-6630 | NaN | NaN | NaN | Jacinto644 | Kris249 | NaN | NaN | NaN | white | nonhispanic | M | Beverly Massachusetts US | 888 Hickle Ferry Suite 38 | Springfield | Massachusetts | Hampden County | 1106.0 | 42.151961 | -72.598959 | 8446.49 | 1499.08 |
1 | 9bcf6ed5-d808-44af-98a0-7d78a29ede72 | 2016-08-01 | NaN | 999-15-5895 | NaN | NaN | NaN | Alva958 | Krajcik437 | NaN | NaN | NaN | white | nonhispanic | F | Boston Massachusetts US | 1048 Skiles Trailer | Walpole | Massachusetts | Norfolk County | 2081.0 | 42.177370 | -71.281353 | 94568.40 | 1870.72 |
2 | 5163c501-353c-4a82-b863-a3f1df2d6cf1 | 2004-01-09 | NaN | 999-73-2461 | S99999063 | NaN | NaN | Jimmie93 | Harris789 | NaN | NaN | NaN | white | nonhispanic | F | Worcester Massachusetts US | 201 Mitchell Lodge Unit 67 | Pembroke | Massachusetts | Plymouth County | NaN | 42.075292 | -70.757035 | 375754.62 | 3131.44 |
Patients completed home isolation
To find patients who completed home isolation, we can filter the ‘care_plans’ DataFrame for the relevant code.
= care_plans[(care_plans.CODE == 736376001) & (care_plans.STOP.notna()) &
completed_isolation_patient_ids == 840539006)].PATIENT.unique()
(care_plans.REASONCODE = patients[patients.Id.isin(completed_isolation_patient_ids)]
completed_isolation_patients print(len(completed_isolation_patient_ids))
3) completed_isolation_patients.head(
70398
Id | BIRTHDATE | DEATHDATE | SSN | DRIVERS | PASSPORT | PREFIX | FIRST | LAST | SUFFIX | MAIDEN | MARITAL | RACE | ETHNICITY | GENDER | BIRTHPLACE | ADDRESS | CITY | STATE | COUNTY | ZIP | LAT | LON | HEALTHCARE_EXPENSES | HEALTHCARE_COVERAGE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 2017-08-24 | NaN | 999-68-6630 | NaN | NaN | NaN | Jacinto644 | Kris249 | NaN | NaN | NaN | white | nonhispanic | M | Beverly Massachusetts US | 888 Hickle Ferry Suite 38 | Springfield | Massachusetts | Hampden County | 1106.0 | 42.151961 | -72.598959 | 8446.49 | 1499.08 |
1 | 9bcf6ed5-d808-44af-98a0-7d78a29ede72 | 2016-08-01 | NaN | 999-15-5895 | NaN | NaN | NaN | Alva958 | Krajcik437 | NaN | NaN | NaN | white | nonhispanic | F | Boston Massachusetts US | 1048 Skiles Trailer | Walpole | Massachusetts | Norfolk County | 2081.0 | 42.177370 | -71.281353 | 94568.40 | 1870.72 |
2 | 5163c501-353c-4a82-b863-a3f1df2d6cf1 | 2004-01-09 | NaN | 999-73-2461 | S99999063 | NaN | NaN | Jimmie93 | Harris789 | NaN | NaN | NaN | white | nonhispanic | F | Worcester Massachusetts US | 201 Mitchell Lodge Unit 67 | Pembroke | Massachusetts | Plymouth County | NaN | 42.075292 | -70.757035 | 375754.62 | 3131.44 |
Patient with negative test results
= observations[(observations.CODE == '94531-1') &
negative_covid_patient_ids == 'Not detected (qualifier value)')].PATIENT.unique()
(observations.VALUE len(negative_covid_patient_ids)
17524
Patients who survived covid19
= np.union1d(completed_isolation_patient_ids, negative_covid_patient_ids)
survivor_ids len(survivor_ids)
87485
Conclusion
In this chapter, we have defined various patient groups based on their encounter types, conditions, and care plans. This allows us to analyze specific populations within the healthcare dataset, such as inpatients, ICU-admitted patients, ventilated patients, deceased patients, and those with specific conditions like COVID-19. Understanding these groups is crucial for targeted healthcare analysis and interventions.