import pandas as pd
import numpy as np
import datetime as dt
from datetime import datetime
Chapter 3 - Process Datetime in Healthcare Data Analysis
Introduction
Statistical analysis techniques such as sum, mean, and pivot_table, etc, can be learned from statistics textbooks. However, the ability to derive meaningful insights from data analysis often comes from real-world experience in specific industries, such as healthcare.
In this chapter, I will demonstrate how to work with datetime variables in healthcare data analysis. The focus will be on handling datetime fields in healthcare databases using real-world examples.
Almost fact tables such as ‘conditions’, ‘encounters’, ‘observations’ and more in hospital database are long data table with datetime are the index columns. Patient records are repeated at many points in time forming time series. Datetime series can be irregular without a fixed unit of time depending on the medical events unexpectedly happen in hospitals.
I use covid data from SyntheaMass database to demonstrate how to proccess datetime variables in this chapter.
Load necessary python packages and data files
# Ensure all columns are shown
'display.max_columns', None)
pd.set_option('display.width', None) pd.set_option(
# Load the data
= pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/conditions.csv")
conditions = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/patients.csv")
patients = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/careplans.csv")
care_plans = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/observations.csv")
observations = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")
encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/procedures.csv")
procedures = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/medications.csv") medications
Date and Datetime in tables
‘START’ and ‘STOP’ columns in the tables are datetime variables. They are used to indicate the time when a medical event starts and ends. The ‘START’ column is the datetime when the event begins, while the ‘STOP’ column is the datetime when the event ends. These two columns are timestamps data type, ‘2019-02-16T01:43:20Z’. These columns can be transformed to date type as ‘2019-02-16’.
3) encounters.head(
Id | START | STOP | PATIENT | ORGANIZATION | PROVIDER | PAYER | ENCOUNTERCLASS | CODE | DESCRIPTION | BASE_ENCOUNTER_COST | TOTAL_CLAIM_COST | PAYER_COVERAGE | REASONCODE | REASONDESCRIPTION | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | d317dacb-d801-4136-be7d-440094f7ae83 | 2019-02-16T01:43:20Z | 2019-02-16T01:58:20Z | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 5103c940-0c08-392f-95cd-446e0cea042a | 8b532fbe-4254-3a60-a442-33028916d24e | 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a | outpatient | 185345009 | Encounter for symptom | 129.16 | 129.16 | 69.16 | 65363002.0 | Otitis media |
1 | 52051c30-c6c3-45fe-b5da-a790f1680e91 | 2019-08-02T01:43:20Z | 2019-08-02T02:13:20Z | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 0b9f3f7c-8ab6-30a5-b3ae-4dc0e0c00cb3 | 872f1318-45f7-34cc-adc7-26d59fc7c203 | 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a | wellness | 410620009 | Well child visit (procedure) | 129.16 | 129.16 | 129.16 | NaN | NaN |
2 | 759cfd9c-b40b-45df-a5b2-fca418027a6a | 2019-10-31T01:43:20Z | 2019-10-31T01:58:20Z | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 5103c940-0c08-392f-95cd-446e0cea042a | 8b532fbe-4254-3a60-a442-33028916d24e | 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a | outpatient | 185345009 | Encounter for symptom | 129.16 | 129.16 | 69.16 | 65363002.0 | Otitis media |
‘START’ and ‘STOP’ in table ‘conditions’ are ‘date’ type, which is a date without time information. The ‘START’ column indicates the date when the diagnosis begins, while the ‘STOP’ column indicates the date when the disease ends. These columns are in the format ‘YYYY-MM-DD’, such as ‘2019-02-16’.
3) conditions.head(
START | STOP | PATIENT | ENCOUNTER | CODE | DESCRIPTION | |
---|---|---|---|---|---|---|
0 | 2019-02-15 | 2019-08-01 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | d317dacb-d801-4136-be7d-440094f7ae83 | 65363002 | Otitis media |
1 | 2019-10-30 | 2020-01-30 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 759cfd9c-b40b-45df-a5b2-fca418027a6a | 65363002 | Otitis media |
2 | 2020-03-01 | 2020-03-30 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 60584c6a-e26b-4176-a660-86ad84c7ceca | 386661006 | Fever (finding) |
Convert datetime to date
To convert the ‘START’ and ‘STOP’ columns in the ‘encounters’ table from datetime to date, we can use the pd.to_datetime()
function to parse the datetime strings and then use the .dt.date
accessor to extract just the date part.
= encounters.copy()
encounters
# Convert 'START' and 'STOP' columns to datetime
'START'] = pd.to_datetime(encounters['START'])
encounters['STOP'] = pd.to_datetime(encounters['STOP'])
encounters[
# Convert to date
'START2'] = encounters['START'].dt.date
encounters['STOP2'] = encounters['STOP'].dt.date
encounters[
# Display the first few rows to verify the conversion
3) encounters.head(
Id | START | STOP | PATIENT | ORGANIZATION | PROVIDER | PAYER | ENCOUNTERCLASS | CODE | DESCRIPTION | BASE_ENCOUNTER_COST | TOTAL_CLAIM_COST | PAYER_COVERAGE | REASONCODE | REASONDESCRIPTION | START2 | STOP2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | d317dacb-d801-4136-be7d-440094f7ae83 | 2019-02-16 01:43:20+00:00 | 2019-02-16 01:58:20+00:00 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 5103c940-0c08-392f-95cd-446e0cea042a | 8b532fbe-4254-3a60-a442-33028916d24e | 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a | outpatient | 185345009 | Encounter for symptom | 129.16 | 129.16 | 69.16 | 65363002.0 | Otitis media | 2019-02-16 | 2019-02-16 |
1 | 52051c30-c6c3-45fe-b5da-a790f1680e91 | 2019-08-02 01:43:20+00:00 | 2019-08-02 02:13:20+00:00 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 0b9f3f7c-8ab6-30a5-b3ae-4dc0e0c00cb3 | 872f1318-45f7-34cc-adc7-26d59fc7c203 | 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a | wellness | 410620009 | Well child visit (procedure) | 129.16 | 129.16 | 129.16 | NaN | NaN | 2019-08-02 | 2019-08-02 |
2 | 759cfd9c-b40b-45df-a5b2-fca418027a6a | 2019-10-31 01:43:20+00:00 | 2019-10-31 01:58:20+00:00 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 5103c940-0c08-392f-95cd-446e0cea042a | 8b532fbe-4254-3a60-a442-33028916d24e | 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a | outpatient | 185345009 | Encounter for symptom | 129.16 | 129.16 | 69.16 | 65363002.0 | Otitis media | 2019-10-31 | 2019-10-31 |
Set the date when disease is diagnosed for patients
In the ‘conditions’ table, the ‘START’ column indicates the date when a disease is diagnosed for a patient. To set this date, we can simply use the ‘START’ column as it already contains the date information. We can also convert the ‘START’ column to a date type if it is not already in that format.
# Filter COVID-19 rows
= conditions[conditions['DESCRIPTION'] == 'COVID-19'].copy()
covid_conds
# Convert START to datetime
'START'] = pd.to_datetime(conditions['START'])
covid_conds[
# Group by PATIENT and get the earliest COVID date
= covid_conds.groupby('PATIENT')['START'].min().reset_index()
covid_dates ={'START': 'covid_date'}, inplace=True)
covid_dates.rename(columns
# Merge back to the original conditions table
= conditions.merge(covid_dates, on='PATIENT', how='left')
covid_conditions
# Preview result
7) covid_conditions.head(
START | STOP | PATIENT | ENCOUNTER | CODE | DESCRIPTION | covid_date | |
---|---|---|---|---|---|---|---|
0 | 2019-02-15 | 2019-08-01 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | d317dacb-d801-4136-be7d-440094f7ae83 | 65363002 | Otitis media | 2020-03-01 |
1 | 2019-10-30 | 2020-01-30 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 759cfd9c-b40b-45df-a5b2-fca418027a6a | 65363002 | Otitis media | 2020-03-01 |
2 | 2020-03-01 | 2020-03-30 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 60584c6a-e26b-4176-a660-86ad84c7ceca | 386661006 | Fever (finding) | 2020-03-01 |
3 | 2020-03-01 | 2020-03-01 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 60584c6a-e26b-4176-a660-86ad84c7ceca | 840544004 | Suspected COVID-19 | 2020-03-01 |
4 | 2020-03-01 | 2020-03-30 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 60584c6a-e26b-4176-a660-86ad84c7ceca | 840539006 | COVID-19 | 2020-03-01 |
5 | 2020-02-12 | 2020-02-26 | 9bcf6ed5-d808-44af-98a0-7d78a29ede72 | 5d9dfe80-7edf-4181-9753-770e361934d1 | 44465007 | Sprain of ankle | 2020-03-13 |
6 | 2020-03-13 | 2020-04-14 | 9bcf6ed5-d808-44af-98a0-7d78a29ede72 | a78e78d9-33bb-40bc-9e42-e47ff7e910cc | 49727002 | Cough (finding) | 2020-03-13 |
Convert a specific date to datetime
import pytz
'DATE'] = pd.to_datetime(observations['DATE'], errors='coerce')
observations[
# Use a UTC-aware timestamp for comparison
= pd.to_datetime("2020-02-19")
cutoff
# Filter
= observations[observations['DATE'] >= cutoff]
observations 3) observations.head(
DATE | PATIENT | ENCOUNTER | CODE | DESCRIPTION | VALUE | UNITS | TYPE | |
---|---|---|---|---|---|---|---|---|
22 | 2020-03-01 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 60584c6a-e26b-4176-a660-86ad84c7ceca | 8310-5 | Body temperature | 39.7 | Cel | numeric |
23 | 2020-03-01 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 60584c6a-e26b-4176-a660-86ad84c7ceca | 9279-1 | Respiratory rate | 32.9 | /min | numeric |
24 | 2020-03-01 | 1ff7f10f-a204-4bb1-aa72-dd763fa99482 | 60584c6a-e26b-4176-a660-86ad84c7ceca | 8867-4 | Heart rate | 164.2 | /min | numeric |
Identify MIN, MAX date in the ‘START’ column of the ‘encounters’ table. This will give us the earliest and latest dates in the dataset.
# Make a copy to avoid modifying the original DataFrame
= encounters.copy()
encounters
# Ensure DATE column is datetime type
'START'] = pd.to_datetime(encounters['START'], errors='coerce')
encounters[
# Print the min and max date
print(encounters['START'].dt.date.min())
print(encounters['START'].dt.date.max())
1909-08-15
2020-06-02
Create a variable of date difference
to calculate the number of days since diagnosed with COVID for each patient. This can be done by subtracting the ‘START’ date by ‘covid_date’.
= conditions[conditions.CODE == 840539006].PATIENT.unique()
covid_patient_ids
# Further filter by COVID patient IDs
= observations[observations['PATIENT'].isin(covid_patient_ids)]
lab_obs
# Filter observations by specific CODE values
= observations[
lab_obs 'CODE'].isin(['48065-7', '26881-3', '2276-4', '89579-7', '731-0', '14804-9'])
observations[
]
# Ensure the DATE column is in datetime format
'DATE'] = pd.to_datetime(lab_obs['DATE'], errors='coerce')
lab_obs[
# Filter rows after 2020-01-20 for covid-related data
= lab_obs[lab_obs['DATE'] > datetime(2020, 1, 20)].copy()
lab_obs
= lab_obs.merge(covid_dates, on = 'PATIENT').copy()
lab_obs
# Calculate the number of days since the COVID diagnosis
'days'] = (pd.to_datetime(lab_obs['DATE']) - pd.to_datetime(lab_obs['covid_date'])).dt.days.copy()
lab_obs[
# Preview first 3 rows
3) lab_obs.head(
/var/folders/cx/3wbhcqyd3cld6gvk_xjkvr_40000gn/T/ipykernel_2575/3494572798.py:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
DATE | PATIENT | ENCOUNTER | CODE | DESCRIPTION | VALUE | UNITS | TYPE | covid_date | days | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2020-02-19 | bd1c4ffc-7f1d-4590-adbb-1d6533fb623e | b7455838-3607-47f4-aaa5-fd89abea7d29 | 731-0 | Lymphocytes [#/volume] in Blood by Automated c... | 1.1 | 10*3/uL | numeric | 2020-02-19 | 0 |
1 | 2020-02-19 | bd1c4ffc-7f1d-4590-adbb-1d6533fb623e | b7455838-3607-47f4-aaa5-fd89abea7d29 | 48065-7 | Fibrin D-dimer FEU [Mass/volume] in Platelet p... | 0.4 | ug/mL | numeric | 2020-02-19 | 0 |
2 | 2020-02-19 | bd1c4ffc-7f1d-4590-adbb-1d6533fb623e | b7455838-3607-47f4-aaa5-fd89abea7d29 | 2276-4 | Ferritin [Mass/volume] in Serum or Plasma | 332.4 | ug/L | numeric | 2020-02-19 | 0 |
'days'].describe() lab_obs[
count 704747.000000
mean 6.129017
std 4.249655
min 0.000000
25% 2.000000
50% 6.000000
75% 9.000000
max 26.000000
Name: days, dtype: float64
Create a time-point variable
# Assign timepoints based on number of days
def assign_timepoint(days):
if days == 0:
return 'baseline'
elif 6 <= days <= 8:
return 'week_1'
else:
return None
# Apply the timepoint assignment
'TIMEPOINT'] = lab_obs['days'].apply(assign_timepoint)
lab_obs[
# Drop rows without a valid timepoint
= lab_obs.dropna(subset=['TIMEPOINT']).copy()
lab_obs
'VALUE'] = pd.to_numeric(lab_obs['VALUE'], errors='coerce')
lab_obs[
# Aggregate the data
= lab_obs.pivot_table(
summary ="DESCRIPTION",
index="TIMEPOINT",
columns="VALUE",
values="mean"
aggfuncround(2)
).
summary
TIMEPOINT | baseline | week_1 |
---|---|---|
DESCRIPTION | ||
Ferritin [Mass/volume] in Serum or Plasma | 516.66 | 720.05 |
Fibrin D-dimer FEU [Mass/volume] in Platelet poor plasma | 0.53 | 1.38 |
Interleukin 6 [Mass/volume] in Serum or Plasma | NaN | 7.35 |
Lactate dehydrogenase [Enzymatic activity/volume] in Serum or Plasma by Lactate to pyruvate reaction | 245.22 | 256.27 |
Lymphocytes [#/volume] in Blood by Automated count | 0.96 | 0.85 |
Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method | 3.57 | 7.13 |
Lag comparison - Create a variable of readmission to hospital
Patients who are readmitted to the hospital within 30 days of their last discharge. This can be done by checking if the difference between the ‘START’ date of the current encounter and the ‘STOP’ date of the previous encounter is less than or equal to 30 days. The purpose is to find how many patients are readmitted to the hospital within 30 days of their last discharge. This is a common metric in healthcare to assess the quality of care and patient outcomes.
#encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")
= encounters[encounters.ENCOUNTERCLASS == 'inpatient'].copy()
inpatients inpatients.head()
Id | START | STOP | PATIENT | ORGANIZATION | PROVIDER | PAYER | ENCOUNTERCLASS | CODE | DESCRIPTION | BASE_ENCOUNTER_COST | TOTAL_CLAIM_COST | PAYER_COVERAGE | REASONCODE | REASONDESCRIPTION | START2 | STOP2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
23 | b7455838-3607-47f4-aaa5-fd89abea7d29 | 2020-02-19 06:11:02+00:00 | 2020-03-05 15:37:02+00:00 | bd1c4ffc-7f1d-4590-adbb-1d6533fb623e | fbf6180e-b800-3ebe-b91d-93d0288c400e | 7502b497-e32c-32d3-a0b8-f18f540a8a45 | 4d71f845-a6a9-3c39-b242-14d25ef86a8d | inpatient | 1505002 | Hospital admission for isolation (procedure) | 129.16 | 129.16 | 64.16 | 840539006.0 | COVID-19 | 2020-02-19 | 2020-03-05 |
49 | 40336433-df4f-4f5e-9827-258890c2dd40 | 2019-10-26 23:01:01+00:00 | 2019-10-27 23:16:01+00:00 | 09303bc9-3bcc-49fe-b57b-005471833c43 | b0e04623-b02c-3f8b-92ea-943fc4db60da | ae9cd143-8fee-3acc-ab3c-ab3861be85d7 | 047f6ec3-6215-35eb-9608-f9dda363a44c | inpatient | 305408004 | Admission to surgical department | 129.16 | 129.16 | 69.16 | NaN | NaN | 2019-10-26 | 2019-10-27 |
95 | 71089860-4c03-4f9a-a67d-bb2e12df3365 | 2020-05-13 16:53:45+00:00 | 2020-05-14 17:08:45+00:00 | da454f92-9e90-43f8-9514-550fbaa6de3d | db0acede-4abe-3c01-8d03-5c68a190d8c7 | 9cb2439a-df04-3e7b-88b1-6f686d68f590 | d47b3510-2895-3b70-9897-342d681c769d | inpatient | 305411003 | Admission to thoracic surgery department | 129.16 | 129.16 | 49.16 | 87433001.0 | Pulmonary emphysema (disorder) | 2020-05-13 | 2020-05-14 |
105 | 3af37864-e57e-4c9b-91bc-3313aa3fe518 | 2010-12-17 19:21:23+00:00 | 2010-12-18 19:21:23+00:00 | 29581d13-2687-402f-9c4c-06141a319e61 | 465de31f-3098-365c-af70-48a071e1f5aa | 84a692b5-42c9-39cf-9ffa-9a8304e00181 | 6e2f1a2d-27bd-3701-8d08-dae202c58632 | inpatient | 185345009 | Encounter for symptom (procedure) | 129.16 | 129.16 | 0.00 | NaN | NaN | 2010-12-17 | 2010-12-18 |
107 | b6c55c14-95c5-4264-9b24-4e9eaf69f64e | 2010-12-23 19:21:23+00:00 | 2010-12-24 22:17:23+00:00 | 29581d13-2687-402f-9c4c-06141a319e61 | 465de31f-3098-365c-af70-48a071e1f5aa | 84a692b5-42c9-39cf-9ffa-9a8304e00181 | 6e2f1a2d-27bd-3701-8d08-dae202c58632 | inpatient | 185347001 | Encounter for problem | 129.16 | 129.16 | 54.16 | 254837009.0 | Malignant neoplasm of breast (disorder) | 2010-12-23 | 2010-12-24 |
# Filter for inpatients and explicitly make a copy
= encounters[encounters.ENCOUNTERCLASS == 'inpatient'].copy()
inpatients
# Convert date columns
'START'] = pd.to_datetime(inpatients['START'])
inpatients['STOP'] = pd.to_datetime(inpatients['STOP'])
inpatients[
# Sort by PATIENT and START date
= inpatients.sort_values(['PATIENT', 'START'])
inpatients
# Get the previous STOP date per patient
'PREV_STOP'] = inpatients.groupby('PATIENT')['STOP'].shift(1)
inpatients[
# Calculate the gap in days since the last discharge
'DAYS_SINCE_LAST_DISCHARGE'] = (inpatients['START'] - inpatients['PREV_STOP']).dt.days
inpatients[
# Identify readmissions within 30 days
'READMITTED_WITHIN_30_DAYS'] = (
inpatients['DAYS_SINCE_LAST_DISCHARGE'] > 0) &
(inpatients['DAYS_SINCE_LAST_DISCHARGE'] <= 30)
(inpatients[
)
# Display relevant columns
print(inpatients[['PATIENT', 'START', 'PREV_STOP', 'DAYS_SINCE_LAST_DISCHARGE', 'READMITTED_WITHIN_30_DAYS']])
# Calculate patient-level readmission rate
= inpatients.groupby('PATIENT')['READMITTED_WITHIN_30_DAYS'].max().mean()
readmission_rate print(f"Readmission rate: {readmission_rate:.2%}")
PATIENT START \
2245164 0001049f-9248-47fe-b479-ea80eb51ce4a 2020-03-11 16:17:49+00:00
267214 00051dc6-38dc-4781-a6db-b69479990236 2020-03-05 07:11:26+00:00
480342 0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-03-05 13:54:35+00:00
480343 0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-04-25 13:54:35+00:00
480344 0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-11-03 13:54:35+00:00
... ... ...
102046 fffca037-ba70-414e-9930-bc3c6a9023c3 2020-02-19 07:28:42+00:00
332980 fffde06d-4cf8-462a-9429-b3946224c517 2020-03-04 15:21:25+00:00
332981 fffde06d-4cf8-462a-9429-b3946224c517 2020-03-12 15:21:25+00:00
269955 ffff888c-383e-428d-8f53-9aaf40104f9c 2020-02-14 09:41:03+00:00
1938371 ffffadc8-7106-4b51-b003-1307776e9790 2020-03-07 15:05:29+00:00
PREV_STOP DAYS_SINCE_LAST_DISCHARGE \
2245164 NaT NaN
267214 NaT NaN
480342 NaT NaN
480343 1990-03-06 14:09:35+00:00 49.0
480344 1990-04-26 13:54:35+00:00 191.0
... ... ...
102046 NaT NaN
332980 NaT NaN
332981 2020-03-12 20:13:25+00:00 -1.0
269955 NaT NaN
1938371 NaT NaN
READMITTED_WITHIN_30_DAYS
2245164 False
267214 False
480342 False
480343 False
480344 False
... ...
102046 False
332980 False
332981 False
269955 False
1938371 False
[95273 rows x 5 columns]
Readmission rate: 5.75%
Note that the readmission rate can be calculated at two levels: patient-level and encounter-level. The patient-level readmission rate considers whether a patient has been readmitted within 30 days of their last discharge, while the encounter-level readmission rate considers each individual encounter.
# Patient-level readmission rate
= inpatients.groupby('PATIENT')['READMITTED_WITHIN_30_DAYS'].max().mean()
patient_readmission_rate
# Encounter-level readmission rate
= inpatients['READMITTED_WITHIN_30_DAYS'].mean()
encounter_readmission_rate
print(f"Patient-level readmission rate: {patient_readmission_rate:.2%}")
print(f"Encounter-level readmission rate: {encounter_readmission_rate:.2%}")
Patient-level readmission rate: 5.75%
Encounter-level readmission rate: 16.98%
Look at data of readmitted patients to see how many patients are readmitted within 30 days of their last discharge.
# Filter only true readmissions
= inpatients[inpatients['READMITTED_WITHIN_30_DAYS']]
readmitted_patients
print(readmitted_patients[['PATIENT', 'START', 'PREV_STOP', 'DAYS_SINCE_LAST_DISCHARGE', 'READMITTED_WITHIN_30_DAYS']])
len(readmitted_patients['PATIENT'].unique())
PATIENT START \
898016 00465335-b351-40aa-a37c-4fd851b77b50 1993-03-13 02:18:56+00:00
898019 00465335-b351-40aa-a37c-4fd851b77b50 1993-04-13 02:18:56+00:00
898032 00465335-b351-40aa-a37c-4fd851b77b50 1993-07-17 02:18:56+00:00
898044 00465335-b351-40aa-a37c-4fd851b77b50 1993-12-29 02:18:56+00:00
898046 00465335-b351-40aa-a37c-4fd851b77b50 1994-01-29 02:18:56+00:00
... ... ...
1326164 fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-08-11 12:53:54+00:00
1326165 fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-09-04 12:53:54+00:00
1326166 fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-09-28 12:53:54+00:00
1326167 fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-10-21 12:53:54+00:00
1326168 fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-11-13 12:53:54+00:00
PREV_STOP DAYS_SINCE_LAST_DISCHARGE \
898016 1993-02-14 06:18:56+00:00 26.0
898019 1993-03-14 04:18:56+00:00 29.0
898032 1993-06-19 06:18:56+00:00 27.0
898044 1993-11-30 04:18:56+00:00 28.0
898046 1993-12-30 06:18:56+00:00 29.0
... ... ...
1326164 2015-07-21 12:53:54+00:00 21.0
1326165 2015-08-12 14:53:54+00:00 22.0
1326166 2015-09-05 13:53:54+00:00 22.0
1326167 2015-09-29 12:53:54+00:00 22.0
1326168 2015-10-22 15:53:54+00:00 21.0
READMITTED_WITHIN_30_DAYS
898016 True
898019 True
898032 True
898044 True
898046 True
... ...
1326164 True
1326165 True
1326166 True
1326167 True
1326168 True
[16181 rows x 5 columns]
2304
Group by month to count readmitted patients.
# Ensure START is datetime
'START'] = pd.to_datetime(inpatients['START'])
inpatients[
# Extract year and month (e.g., '2025-06') for grouping
'MONTH'] = inpatients['START'].dt.tz_localize(None).dt.to_period('M')
inpatients[
# Filter for readmitted cases
= inpatients[inpatients['READMITTED_WITHIN_30_DAYS'] == True]
readmitted
# Group by MONTH and count
= readmitted.groupby('MONTH').size().reset_index(name='readmitted_count')
readmitted_per_month
print(readmitted_per_month)
MONTH readmitted_count
0 1919-07 1
1 1919-08 1
2 1919-09 2
3 1919-10 2
4 1919-11 1
.. ... ...
734 2020-01 119
735 2020-02 148
736 2020-03 318
737 2020-04 164
738 2020-05 85
[739 rows x 2 columns]