Chapter 3 - Process Datetime in Healthcare Data Analysis

Author

Thieu Nguyen

Introduction

Statistical analysis techniques such as sum, mean, and pivot_table, etc, can be learned from statistics textbooks. However, the ability to derive meaningful insights from data analysis often comes from real-world experience in specific industries, such as healthcare.

In this chapter, I will demonstrate how to work with datetime variables in healthcare data analysis. The focus will be on handling datetime fields in healthcare databases using real-world examples.

Almost fact tables such as ‘conditions’, ‘encounters’, ‘observations’ and more in hospital database are long data table with datetime are the index columns. Patient records are repeated at many points in time forming time series. Datetime series can be irregular without a fixed unit of time depending on the medical events unexpectedly happen in hospitals.

I use covid data from SyntheaMass database to demonstrate how to proccess datetime variables in this chapter.

Load necessary python packages and data files

import pandas as pd
import numpy as np
import datetime as dt
from datetime import datetime

# Ensure all columns are shown
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Load the data
conditions = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/conditions.csv")
patients = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/patients.csv")
care_plans = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/careplans.csv")
observations = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/observations.csv")
encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")
procedures = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/procedures.csv")
medications = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/medications.csv")

Date and Datetime in tables

‘START’ and ‘STOP’ columns in the tables are datetime variables. They are used to indicate the time when a medical event starts and ends. The ‘START’ column is the datetime when the event begins, while the ‘STOP’ column is the datetime when the event ends. These two columns are timestamps data type, ‘2019-02-16T01:43:20Z’. These columns can be transformed to date type as ‘2019-02-16’.

encounters.head(3)

	Id	START	STOP	PATIENT	ORGANIZATION	PROVIDER	PAYER	ENCOUNTERCLASS	CODE	DESCRIPTION	BASE_ENCOUNTER_COST	TOTAL_CLAIM_COST	PAYER_COVERAGE	REASONCODE	REASONDESCRIPTION
0	d317dacb-d801-4136-be7d-440094f7ae83	2019-02-16T01:43:20Z	2019-02-16T01:58:20Z	1ff7f10f-a204-4bb1-aa72-dd763fa99482	5103c940-0c08-392f-95cd-446e0cea042a	8b532fbe-4254-3a60-a442-33028916d24e	7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a	outpatient	185345009	Encounter for symptom	129.16	129.16	69.16	65363002.0	Otitis media
1	52051c30-c6c3-45fe-b5da-a790f1680e91	2019-08-02T01:43:20Z	2019-08-02T02:13:20Z	1ff7f10f-a204-4bb1-aa72-dd763fa99482	0b9f3f7c-8ab6-30a5-b3ae-4dc0e0c00cb3	872f1318-45f7-34cc-adc7-26d59fc7c203	7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a	wellness	410620009	Well child visit (procedure)	129.16	129.16	129.16	NaN	NaN
2	759cfd9c-b40b-45df-a5b2-fca418027a6a	2019-10-31T01:43:20Z	2019-10-31T01:58:20Z	1ff7f10f-a204-4bb1-aa72-dd763fa99482	5103c940-0c08-392f-95cd-446e0cea042a	8b532fbe-4254-3a60-a442-33028916d24e	7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a	outpatient	185345009	Encounter for symptom	129.16	129.16	69.16	65363002.0	Otitis media

‘START’ and ‘STOP’ in table ‘conditions’ are ‘date’ type, which is a date without time information. The ‘START’ column indicates the date when the diagnosis begins, while the ‘STOP’ column indicates the date when the disease ends. These columns are in the format ‘YYYY-MM-DD’, such as ‘2019-02-16’.

conditions.head(3)

	START	STOP	PATIENT	ENCOUNTER	CODE	DESCRIPTION
0	2019-02-15	2019-08-01	1ff7f10f-a204-4bb1-aa72-dd763fa99482	d317dacb-d801-4136-be7d-440094f7ae83	65363002	Otitis media
1	2019-10-30	2020-01-30	1ff7f10f-a204-4bb1-aa72-dd763fa99482	759cfd9c-b40b-45df-a5b2-fca418027a6a	65363002	Otitis media
2	2020-03-01	2020-03-30	1ff7f10f-a204-4bb1-aa72-dd763fa99482	60584c6a-e26b-4176-a660-86ad84c7ceca	386661006	Fever (finding)

Convert datetime to date

To convert the ‘START’ and ‘STOP’ columns in the ‘encounters’ table from datetime to date, we can use the pd.to_datetime() function to parse the datetime strings and then use the .dt.date accessor to extract just the date part.

encounters = encounters.copy()

# Convert 'START' and 'STOP' columns to datetime
encounters['START'] = pd.to_datetime(encounters['START'])
encounters['STOP'] = pd.to_datetime(encounters['STOP'])

# Convert to date
encounters['START2'] = encounters['START'].dt.date
encounters['STOP2'] = encounters['STOP'].dt.date

# Display the first few rows to verify the conversion
encounters.head(3)

	Id	START	STOP	PATIENT	ORGANIZATION	PROVIDER	PAYER	ENCOUNTERCLASS	CODE	DESCRIPTION	BASE_ENCOUNTER_COST	TOTAL_CLAIM_COST	PAYER_COVERAGE	REASONCODE	REASONDESCRIPTION	START2	STOP2
0	d317dacb-d801-4136-be7d-440094f7ae83	2019-02-16 01:43:20+00:00	2019-02-16 01:58:20+00:00	1ff7f10f-a204-4bb1-aa72-dd763fa99482	5103c940-0c08-392f-95cd-446e0cea042a	8b532fbe-4254-3a60-a442-33028916d24e	7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a	outpatient	185345009	Encounter for symptom	129.16	129.16	69.16	65363002.0	Otitis media	2019-02-16	2019-02-16
1	52051c30-c6c3-45fe-b5da-a790f1680e91	2019-08-02 01:43:20+00:00	2019-08-02 02:13:20+00:00	1ff7f10f-a204-4bb1-aa72-dd763fa99482	0b9f3f7c-8ab6-30a5-b3ae-4dc0e0c00cb3	872f1318-45f7-34cc-adc7-26d59fc7c203	7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a	wellness	410620009	Well child visit (procedure)	129.16	129.16	129.16	NaN	NaN	2019-08-02	2019-08-02
2	759cfd9c-b40b-45df-a5b2-fca418027a6a	2019-10-31 01:43:20+00:00	2019-10-31 01:58:20+00:00	1ff7f10f-a204-4bb1-aa72-dd763fa99482	5103c940-0c08-392f-95cd-446e0cea042a	8b532fbe-4254-3a60-a442-33028916d24e	7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a	outpatient	185345009	Encounter for symptom	129.16	129.16	69.16	65363002.0	Otitis media	2019-10-31	2019-10-31

Set the date when disease is diagnosed for patients

In the ‘conditions’ table, the ‘START’ column indicates the date when a disease is diagnosed for a patient. To set this date, we can simply use the ‘START’ column as it already contains the date information. We can also convert the ‘START’ column to a date type if it is not already in that format.

# Filter COVID-19 rows
covid_conds = conditions[conditions['DESCRIPTION'] == 'COVID-19'].copy()

# Convert START to datetime
covid_conds['START'] = pd.to_datetime(conditions['START'])

# Group by PATIENT and get the earliest COVID date
covid_dates = covid_conds.groupby('PATIENT')['START'].min().reset_index()
covid_dates.rename(columns={'START': 'covid_date'}, inplace=True)

# Merge back to the original conditions table
covid_conditions = conditions.merge(covid_dates, on='PATIENT', how='left')

# Preview result
covid_conditions.head(7)

	START	STOP	PATIENT	ENCOUNTER	CODE	DESCRIPTION	covid_date
0	2019-02-15	2019-08-01	1ff7f10f-a204-4bb1-aa72-dd763fa99482	d317dacb-d801-4136-be7d-440094f7ae83	65363002	Otitis media	2020-03-01
1	2019-10-30	2020-01-30	1ff7f10f-a204-4bb1-aa72-dd763fa99482	759cfd9c-b40b-45df-a5b2-fca418027a6a	65363002	Otitis media	2020-03-01
2	2020-03-01	2020-03-30	1ff7f10f-a204-4bb1-aa72-dd763fa99482	60584c6a-e26b-4176-a660-86ad84c7ceca	386661006	Fever (finding)	2020-03-01
3	2020-03-01	2020-03-01	1ff7f10f-a204-4bb1-aa72-dd763fa99482	60584c6a-e26b-4176-a660-86ad84c7ceca	840544004	Suspected COVID-19	2020-03-01
4	2020-03-01	2020-03-30	1ff7f10f-a204-4bb1-aa72-dd763fa99482	60584c6a-e26b-4176-a660-86ad84c7ceca	840539006	COVID-19	2020-03-01
5	2020-02-12	2020-02-26	9bcf6ed5-d808-44af-98a0-7d78a29ede72	5d9dfe80-7edf-4181-9753-770e361934d1	44465007	Sprain of ankle	2020-03-13
6	2020-03-13	2020-04-14	9bcf6ed5-d808-44af-98a0-7d78a29ede72	a78e78d9-33bb-40bc-9e42-e47ff7e910cc	49727002	Cough (finding)	2020-03-13

Convert a specific date to datetime

import pytz

observations['DATE'] = pd.to_datetime(observations['DATE'], errors='coerce')

# Use a UTC-aware timestamp for comparison
cutoff = pd.to_datetime("2020-02-19")

# Filter
observations = observations[observations['DATE'] >= cutoff]
observations.head(3)

	DATE	PATIENT	ENCOUNTER	CODE	DESCRIPTION	VALUE	UNITS	TYPE
22	2020-03-01	1ff7f10f-a204-4bb1-aa72-dd763fa99482	60584c6a-e26b-4176-a660-86ad84c7ceca	8310-5	Body temperature	39.7	Cel	numeric
23	2020-03-01	1ff7f10f-a204-4bb1-aa72-dd763fa99482	60584c6a-e26b-4176-a660-86ad84c7ceca	9279-1	Respiratory rate	32.9	/min	numeric
24	2020-03-01	1ff7f10f-a204-4bb1-aa72-dd763fa99482	60584c6a-e26b-4176-a660-86ad84c7ceca	8867-4	Heart rate	164.2	/min	numeric

Identify MIN, MAX date in the ‘START’ column of the ‘encounters’ table. This will give us the earliest and latest dates in the dataset.

# Make a copy to avoid modifying the original DataFrame
encounters = encounters.copy()

# Ensure DATE column is datetime type
encounters['START'] = pd.to_datetime(encounters['START'], errors='coerce')

# Print the min and max date
print(encounters['START'].dt.date.min())
print(encounters['START'].dt.date.max())

1909-08-15
2020-06-02

Create a variable of date difference

to calculate the number of days since diagnosed with COVID for each patient. This can be done by subtracting the ‘START’ date by ‘covid_date’.

covid_patient_ids = conditions[conditions.CODE == 840539006].PATIENT.unique()

# Further filter by COVID patient IDs
lab_obs = observations[observations['PATIENT'].isin(covid_patient_ids)]

# Filter observations by specific CODE values
lab_obs = observations[
    observations['CODE'].isin(['48065-7', '26881-3', '2276-4', '89579-7', '731-0', '14804-9'])
]

# Ensure the DATE column is in datetime format
lab_obs['DATE'] = pd.to_datetime(lab_obs['DATE'], errors='coerce')

# Filter rows after 2020-01-20 for covid-related data
lab_obs = lab_obs[lab_obs['DATE'] > datetime(2020, 1, 20)].copy()

lab_obs = lab_obs.merge(covid_dates, on = 'PATIENT').copy()

# Calculate the number of days since the COVID diagnosis
lab_obs['days'] = (pd.to_datetime(lab_obs['DATE']) - pd.to_datetime(lab_obs['covid_date'])).dt.days.copy()

# Preview first 3 rows
lab_obs.head(3)

/var/folders/cx/3wbhcqyd3cld6gvk_xjkvr_40000gn/T/ipykernel_2575/3494572798.py:12: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

	DATE	PATIENT	ENCOUNTER	CODE	DESCRIPTION	VALUE	UNITS	TYPE	covid_date
0	2020-02-19	bd1c4ffc-7f1d-4590-adbb-1d6533fb623e	b7455838-3607-47f4-aaa5-fd89abea7d29	731-0	Lymphocytes [#/volume] in Blood by Automated c...	1.1	10*3/uL	numeric	2020-02-19
1	2020-02-19	bd1c4ffc-7f1d-4590-adbb-1d6533fb623e	b7455838-3607-47f4-aaa5-fd89abea7d29	48065-7	Fibrin D-dimer FEU [Mass/volume] in Platelet p...	0.4	ug/mL	numeric	2020-02-19
2	2020-02-19	bd1c4ffc-7f1d-4590-adbb-1d6533fb623e	b7455838-3607-47f4-aaa5-fd89abea7d29	2276-4	Ferritin [Mass/volume] in Serum or Plasma	332.4	ug/L	numeric	2020-02-19

lab_obs['days'].describe()

count    704747.000000
mean          6.129017
std           4.249655
min           0.000000
25%           2.000000
50%           6.000000
75%           9.000000
max          26.000000
Name: days, dtype: float64

Create a time-point variable

# Assign timepoints based on number of days
def assign_timepoint(days):
    if days == 0:
        return 'baseline'
    elif 6 <= days <= 8:
        return 'week_1'
    else:
        return None

# Apply the timepoint assignment
lab_obs['TIMEPOINT'] = lab_obs['days'].apply(assign_timepoint)

# Drop rows without a valid timepoint
lab_obs = lab_obs.dropna(subset=['TIMEPOINT']).copy()

lab_obs['VALUE'] = pd.to_numeric(lab_obs['VALUE'], errors='coerce')

# Aggregate the data
summary = lab_obs.pivot_table(
    index="DESCRIPTION",
    columns="TIMEPOINT",
    values="VALUE",
    aggfunc="mean"
).round(2)

summary

TIMEPOINT	baseline	week_1
DESCRIPTION
Ferritin [Mass/volume] in Serum or Plasma	516.66	720.05
Fibrin D-dimer FEU [Mass/volume] in Platelet poor plasma	0.53	1.38
Interleukin 6 [Mass/volume] in Serum or Plasma	NaN	7.35
Lactate dehydrogenase [Enzymatic activity/volume] in Serum or Plasma by Lactate to pyruvate reaction	245.22	256.27
Lymphocytes [#/volume] in Blood by Automated count	0.96	0.85
Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method	3.57	7.13

Lag comparison - Create a variable of readmission to hospital

Patients who are readmitted to the hospital within 30 days of their last discharge. This can be done by checking if the difference between the ‘START’ date of the current encounter and the ‘STOP’ date of the previous encounter is less than or equal to 30 days. The purpose is to find how many patients are readmitted to the hospital within 30 days of their last discharge. This is a common metric in healthcare to assess the quality of care and patient outcomes.

#encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")

inpatients = encounters[encounters.ENCOUNTERCLASS == 'inpatient'].copy()
inpatients.head()

	Id	START	STOP	PATIENT	ORGANIZATION	PROVIDER	PAYER	ENCOUNTERCLASS	CODE	DESCRIPTION	BASE_ENCOUNTER_COST	TOTAL_CLAIM_COST	PAYER_COVERAGE	REASONCODE	REASONDESCRIPTION	START2	STOP2
23	b7455838-3607-47f4-aaa5-fd89abea7d29	2020-02-19 06:11:02+00:00	2020-03-05 15:37:02+00:00	bd1c4ffc-7f1d-4590-adbb-1d6533fb623e	fbf6180e-b800-3ebe-b91d-93d0288c400e	7502b497-e32c-32d3-a0b8-f18f540a8a45	4d71f845-a6a9-3c39-b242-14d25ef86a8d	inpatient	1505002	Hospital admission for isolation (procedure)	129.16	129.16	64.16	840539006.0	COVID-19	2020-02-19	2020-03-05
49	40336433-df4f-4f5e-9827-258890c2dd40	2019-10-26 23:01:01+00:00	2019-10-27 23:16:01+00:00	09303bc9-3bcc-49fe-b57b-005471833c43	b0e04623-b02c-3f8b-92ea-943fc4db60da	ae9cd143-8fee-3acc-ab3c-ab3861be85d7	047f6ec3-6215-35eb-9608-f9dda363a44c	inpatient	305408004	Admission to surgical department	129.16	129.16	69.16	NaN	NaN	2019-10-26	2019-10-27
95	71089860-4c03-4f9a-a67d-bb2e12df3365	2020-05-13 16:53:45+00:00	2020-05-14 17:08:45+00:00	da454f92-9e90-43f8-9514-550fbaa6de3d	db0acede-4abe-3c01-8d03-5c68a190d8c7	9cb2439a-df04-3e7b-88b1-6f686d68f590	d47b3510-2895-3b70-9897-342d681c769d	inpatient	305411003	Admission to thoracic surgery department	129.16	129.16	49.16	87433001.0	Pulmonary emphysema (disorder)	2020-05-13	2020-05-14
105	3af37864-e57e-4c9b-91bc-3313aa3fe518	2010-12-17 19:21:23+00:00	2010-12-18 19:21:23+00:00	29581d13-2687-402f-9c4c-06141a319e61	465de31f-3098-365c-af70-48a071e1f5aa	84a692b5-42c9-39cf-9ffa-9a8304e00181	6e2f1a2d-27bd-3701-8d08-dae202c58632	inpatient	185345009	Encounter for symptom (procedure)	129.16	129.16	0.00	NaN	NaN	2010-12-17	2010-12-18
107	b6c55c14-95c5-4264-9b24-4e9eaf69f64e	2010-12-23 19:21:23+00:00	2010-12-24 22:17:23+00:00	29581d13-2687-402f-9c4c-06141a319e61	465de31f-3098-365c-af70-48a071e1f5aa	84a692b5-42c9-39cf-9ffa-9a8304e00181	6e2f1a2d-27bd-3701-8d08-dae202c58632	inpatient	185347001	Encounter for problem	129.16	129.16	54.16	254837009.0	Malignant neoplasm of breast (disorder)	2010-12-23	2010-12-24

# Filter for inpatients and explicitly make a copy
inpatients = encounters[encounters.ENCOUNTERCLASS == 'inpatient'].copy()

# Convert date columns
inpatients['START'] = pd.to_datetime(inpatients['START'])
inpatients['STOP'] = pd.to_datetime(inpatients['STOP'])

# Sort by PATIENT and START date
inpatients = inpatients.sort_values(['PATIENT', 'START'])

# Get the previous STOP date per patient
inpatients['PREV_STOP'] = inpatients.groupby('PATIENT')['STOP'].shift(1)

# Calculate the gap in days since the last discharge
inpatients['DAYS_SINCE_LAST_DISCHARGE'] = (inpatients['START'] - inpatients['PREV_STOP']).dt.days

# Identify readmissions within 30 days
inpatients['READMITTED_WITHIN_30_DAYS'] = (
    (inpatients['DAYS_SINCE_LAST_DISCHARGE'] > 0) &
    (inpatients['DAYS_SINCE_LAST_DISCHARGE'] <= 30)
)

# Display relevant columns
print(inpatients[['PATIENT', 'START', 'PREV_STOP', 'DAYS_SINCE_LAST_DISCHARGE', 'READMITTED_WITHIN_30_DAYS']])

# Calculate patient-level readmission rate
readmission_rate = inpatients.groupby('PATIENT')['READMITTED_WITHIN_30_DAYS'].max().mean()
print(f"Readmission rate: {readmission_rate:.2%}")

                                      PATIENT                     START  \
2245164  0001049f-9248-47fe-b479-ea80eb51ce4a 2020-03-11 16:17:49+00:00   
267214   00051dc6-38dc-4781-a6db-b69479990236 2020-03-05 07:11:26+00:00   
480342   0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-03-05 13:54:35+00:00   
480343   0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-04-25 13:54:35+00:00   
480344   0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-11-03 13:54:35+00:00   
...                                       ...                       ...   
102046   fffca037-ba70-414e-9930-bc3c6a9023c3 2020-02-19 07:28:42+00:00   
332980   fffde06d-4cf8-462a-9429-b3946224c517 2020-03-04 15:21:25+00:00   
332981   fffde06d-4cf8-462a-9429-b3946224c517 2020-03-12 15:21:25+00:00   
269955   ffff888c-383e-428d-8f53-9aaf40104f9c 2020-02-14 09:41:03+00:00   
1938371  ffffadc8-7106-4b51-b003-1307776e9790 2020-03-07 15:05:29+00:00   

                        PREV_STOP  DAYS_SINCE_LAST_DISCHARGE  \
2245164                       NaT                        NaN   
267214                        NaT                        NaN   
480342                        NaT                        NaN   
480343  1990-03-06 14:09:35+00:00                       49.0   
480344  1990-04-26 13:54:35+00:00                      191.0   
...                           ...                        ...   
102046                        NaT                        NaN   
332980                        NaT                        NaN   
332981  2020-03-12 20:13:25+00:00                       -1.0   
269955                        NaT                        NaN   
1938371                       NaT                        NaN   

         READMITTED_WITHIN_30_DAYS  
2245164                      False  
267214                       False  
480342                       False  
480343                       False  
480344                       False  
...                            ...  
102046                       False  
332980                       False  
332981                       False  
269955                       False  
1938371                      False  

[95273 rows x 5 columns]
Readmission rate: 5.75%

Note that the readmission rate can be calculated at two levels: patient-level and encounter-level. The patient-level readmission rate considers whether a patient has been readmitted within 30 days of their last discharge, while the encounter-level readmission rate considers each individual encounter.

# Patient-level readmission rate
patient_readmission_rate = inpatients.groupby('PATIENT')['READMITTED_WITHIN_30_DAYS'].max().mean()

# Encounter-level readmission rate
encounter_readmission_rate = inpatients['READMITTED_WITHIN_30_DAYS'].mean()

print(f"Patient-level readmission rate: {patient_readmission_rate:.2%}")
print(f"Encounter-level readmission rate: {encounter_readmission_rate:.2%}")

Patient-level readmission rate: 5.75%
Encounter-level readmission rate: 16.98%

Look at data of readmitted patients to see how many patients are readmitted within 30 days of their last discharge.

# Filter only true readmissions
readmitted_patients = inpatients[inpatients['READMITTED_WITHIN_30_DAYS']]

print(readmitted_patients[['PATIENT', 'START', 'PREV_STOP', 'DAYS_SINCE_LAST_DISCHARGE', 'READMITTED_WITHIN_30_DAYS']])
len(readmitted_patients['PATIENT'].unique())

                                      PATIENT                     START  \
898016   00465335-b351-40aa-a37c-4fd851b77b50 1993-03-13 02:18:56+00:00   
898019   00465335-b351-40aa-a37c-4fd851b77b50 1993-04-13 02:18:56+00:00   
898032   00465335-b351-40aa-a37c-4fd851b77b50 1993-07-17 02:18:56+00:00   
898044   00465335-b351-40aa-a37c-4fd851b77b50 1993-12-29 02:18:56+00:00   
898046   00465335-b351-40aa-a37c-4fd851b77b50 1994-01-29 02:18:56+00:00   
...                                       ...                       ...   
1326164  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-08-11 12:53:54+00:00   
1326165  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-09-04 12:53:54+00:00   
1326166  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-09-28 12:53:54+00:00   
1326167  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-10-21 12:53:54+00:00   
1326168  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-11-13 12:53:54+00:00   

                        PREV_STOP  DAYS_SINCE_LAST_DISCHARGE  \
898016  1993-02-14 06:18:56+00:00                       26.0   
898019  1993-03-14 04:18:56+00:00                       29.0   
898032  1993-06-19 06:18:56+00:00                       27.0   
898044  1993-11-30 04:18:56+00:00                       28.0   
898046  1993-12-30 06:18:56+00:00                       29.0   
...                           ...                        ...   
1326164 2015-07-21 12:53:54+00:00                       21.0   
1326165 2015-08-12 14:53:54+00:00                       22.0   
1326166 2015-09-05 13:53:54+00:00                       22.0   
1326167 2015-09-29 12:53:54+00:00                       22.0   
1326168 2015-10-22 15:53:54+00:00                       21.0   

         READMITTED_WITHIN_30_DAYS  
898016                        True  
898019                        True  
898032                        True  
898044                        True  
898046                        True  
...                            ...  
1326164                       True  
1326165                       True  
1326166                       True  
1326167                       True  
1326168                       True  

[16181 rows x 5 columns]

Group by month to count readmitted patients.

# Ensure START is datetime
inpatients['START'] = pd.to_datetime(inpatients['START'])

# Extract year and month (e.g., '2025-06') for grouping
inpatients['MONTH'] = inpatients['START'].dt.tz_localize(None).dt.to_period('M')

# Filter for readmitted cases
readmitted = inpatients[inpatients['READMITTED_WITHIN_30_DAYS'] == True]

# Group by MONTH and count
readmitted_per_month = readmitted.groupby('MONTH').size().reset_index(name='readmitted_count')

print(readmitted_per_month)

       MONTH  readmitted_count
0    1919-07                 1
1    1919-08                 1
2    1919-09                 2
3    1919-10                 2
4    1919-11                 1
..       ...               ...
734  2020-01               119
735  2020-02               148
736  2020-03               318
737  2020-04               164
738  2020-05                85

[739 rows x 2 columns]