Chapter 3 - Process Datetime in Healthcare Data Analysis

Author

Thieu Nguyen

Introduction

Statistical analysis techniques such as sum, mean, and pivot_table, etc, can be learned from statistics textbooks. However, the ability to derive meaningful insights from data analysis often comes from real-world experience in specific industries, such as healthcare.

In this chapter, I will demonstrate how to work with datetime variables in healthcare data analysis. The focus will be on handling datetime fields in healthcare databases using real-world examples.

Almost fact tables such as ‘conditions’, ‘encounters’, ‘observations’ and more in hospital database are long data table with datetime are the index columns. Patient records are repeated at many points in time forming time series. Datetime series can be irregular without a fixed unit of time depending on the medical events unexpectedly happen in hospitals.

I use covid data from SyntheaMass database to demonstrate how to proccess datetime variables in this chapter.

Load necessary python packages and data files

import pandas as pd
import numpy as np
import datetime as dt
from datetime import datetime
# Ensure all columns are shown
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
# Load the data
conditions = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/conditions.csv")
patients = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/patients.csv")
care_plans = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/careplans.csv")
observations = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/observations.csv")
encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")
procedures = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/procedures.csv")
medications = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/medications.csv")

Date and Datetime in tables

‘START’ and ‘STOP’ columns in the tables are datetime variables. They are used to indicate the time when a medical event starts and ends. The ‘START’ column is the datetime when the event begins, while the ‘STOP’ column is the datetime when the event ends. These two columns are timestamps data type, ‘2019-02-16T01:43:20Z’. These columns can be transformed to date type as ‘2019-02-16’.

encounters.head(3)
Id START STOP PATIENT ORGANIZATION PROVIDER PAYER ENCOUNTERCLASS CODE DESCRIPTION BASE_ENCOUNTER_COST TOTAL_CLAIM_COST PAYER_COVERAGE REASONCODE REASONDESCRIPTION
0 d317dacb-d801-4136-be7d-440094f7ae83 2019-02-16T01:43:20Z 2019-02-16T01:58:20Z 1ff7f10f-a204-4bb1-aa72-dd763fa99482 5103c940-0c08-392f-95cd-446e0cea042a 8b532fbe-4254-3a60-a442-33028916d24e 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a outpatient 185345009 Encounter for symptom 129.16 129.16 69.16 65363002.0 Otitis media
1 52051c30-c6c3-45fe-b5da-a790f1680e91 2019-08-02T01:43:20Z 2019-08-02T02:13:20Z 1ff7f10f-a204-4bb1-aa72-dd763fa99482 0b9f3f7c-8ab6-30a5-b3ae-4dc0e0c00cb3 872f1318-45f7-34cc-adc7-26d59fc7c203 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a wellness 410620009 Well child visit (procedure) 129.16 129.16 129.16 NaN NaN
2 759cfd9c-b40b-45df-a5b2-fca418027a6a 2019-10-31T01:43:20Z 2019-10-31T01:58:20Z 1ff7f10f-a204-4bb1-aa72-dd763fa99482 5103c940-0c08-392f-95cd-446e0cea042a 8b532fbe-4254-3a60-a442-33028916d24e 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a outpatient 185345009 Encounter for symptom 129.16 129.16 69.16 65363002.0 Otitis media

‘START’ and ‘STOP’ in table ‘conditions’ are ‘date’ type, which is a date without time information. The ‘START’ column indicates the date when the diagnosis begins, while the ‘STOP’ column indicates the date when the disease ends. These columns are in the format ‘YYYY-MM-DD’, such as ‘2019-02-16’.

conditions.head(3)
START STOP PATIENT ENCOUNTER CODE DESCRIPTION
0 2019-02-15 2019-08-01 1ff7f10f-a204-4bb1-aa72-dd763fa99482 d317dacb-d801-4136-be7d-440094f7ae83 65363002 Otitis media
1 2019-10-30 2020-01-30 1ff7f10f-a204-4bb1-aa72-dd763fa99482 759cfd9c-b40b-45df-a5b2-fca418027a6a 65363002 Otitis media
2 2020-03-01 2020-03-30 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 386661006 Fever (finding)

Convert datetime to date

To convert the ‘START’ and ‘STOP’ columns in the ‘encounters’ table from datetime to date, we can use the pd.to_datetime() function to parse the datetime strings and then use the .dt.date accessor to extract just the date part.

encounters = encounters.copy()

# Convert 'START' and 'STOP' columns to datetime
encounters['START'] = pd.to_datetime(encounters['START'])
encounters['STOP'] = pd.to_datetime(encounters['STOP'])

# Convert to date
encounters['START2'] = encounters['START'].dt.date
encounters['STOP2'] = encounters['STOP'].dt.date

# Display the first few rows to verify the conversion
encounters.head(3)
Id START STOP PATIENT ORGANIZATION PROVIDER PAYER ENCOUNTERCLASS CODE DESCRIPTION BASE_ENCOUNTER_COST TOTAL_CLAIM_COST PAYER_COVERAGE REASONCODE REASONDESCRIPTION START2 STOP2
0 d317dacb-d801-4136-be7d-440094f7ae83 2019-02-16 01:43:20+00:00 2019-02-16 01:58:20+00:00 1ff7f10f-a204-4bb1-aa72-dd763fa99482 5103c940-0c08-392f-95cd-446e0cea042a 8b532fbe-4254-3a60-a442-33028916d24e 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a outpatient 185345009 Encounter for symptom 129.16 129.16 69.16 65363002.0 Otitis media 2019-02-16 2019-02-16
1 52051c30-c6c3-45fe-b5da-a790f1680e91 2019-08-02 01:43:20+00:00 2019-08-02 02:13:20+00:00 1ff7f10f-a204-4bb1-aa72-dd763fa99482 0b9f3f7c-8ab6-30a5-b3ae-4dc0e0c00cb3 872f1318-45f7-34cc-adc7-26d59fc7c203 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a wellness 410620009 Well child visit (procedure) 129.16 129.16 129.16 NaN NaN 2019-08-02 2019-08-02
2 759cfd9c-b40b-45df-a5b2-fca418027a6a 2019-10-31 01:43:20+00:00 2019-10-31 01:58:20+00:00 1ff7f10f-a204-4bb1-aa72-dd763fa99482 5103c940-0c08-392f-95cd-446e0cea042a 8b532fbe-4254-3a60-a442-33028916d24e 7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a outpatient 185345009 Encounter for symptom 129.16 129.16 69.16 65363002.0 Otitis media 2019-10-31 2019-10-31

Set the date when disease is diagnosed for patients

In the ‘conditions’ table, the ‘START’ column indicates the date when a disease is diagnosed for a patient. To set this date, we can simply use the ‘START’ column as it already contains the date information. We can also convert the ‘START’ column to a date type if it is not already in that format.

# Filter COVID-19 rows
covid_conds = conditions[conditions['DESCRIPTION'] == 'COVID-19'].copy()

# Convert START to datetime
covid_conds['START'] = pd.to_datetime(conditions['START'])

# Group by PATIENT and get the earliest COVID date
covid_dates = covid_conds.groupby('PATIENT')['START'].min().reset_index()
covid_dates.rename(columns={'START': 'covid_date'}, inplace=True)

# Merge back to the original conditions table
covid_conditions = conditions.merge(covid_dates, on='PATIENT', how='left')

# Preview result
covid_conditions.head(7)
START STOP PATIENT ENCOUNTER CODE DESCRIPTION covid_date
0 2019-02-15 2019-08-01 1ff7f10f-a204-4bb1-aa72-dd763fa99482 d317dacb-d801-4136-be7d-440094f7ae83 65363002 Otitis media 2020-03-01
1 2019-10-30 2020-01-30 1ff7f10f-a204-4bb1-aa72-dd763fa99482 759cfd9c-b40b-45df-a5b2-fca418027a6a 65363002 Otitis media 2020-03-01
2 2020-03-01 2020-03-30 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 386661006 Fever (finding) 2020-03-01
3 2020-03-01 2020-03-01 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 840544004 Suspected COVID-19 2020-03-01
4 2020-03-01 2020-03-30 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 840539006 COVID-19 2020-03-01
5 2020-02-12 2020-02-26 9bcf6ed5-d808-44af-98a0-7d78a29ede72 5d9dfe80-7edf-4181-9753-770e361934d1 44465007 Sprain of ankle 2020-03-13
6 2020-03-13 2020-04-14 9bcf6ed5-d808-44af-98a0-7d78a29ede72 a78e78d9-33bb-40bc-9e42-e47ff7e910cc 49727002 Cough (finding) 2020-03-13

Convert a specific date to datetime

import pytz

observations['DATE'] = pd.to_datetime(observations['DATE'], errors='coerce')

# Use a UTC-aware timestamp for comparison
cutoff = pd.to_datetime("2020-02-19")

# Filter
observations = observations[observations['DATE'] >= cutoff]
observations.head(3)
DATE PATIENT ENCOUNTER CODE DESCRIPTION VALUE UNITS TYPE
22 2020-03-01 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 8310-5 Body temperature 39.7 Cel numeric
23 2020-03-01 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 9279-1 Respiratory rate 32.9 /min numeric
24 2020-03-01 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 8867-4 Heart rate 164.2 /min numeric

Identify MIN, MAX date in the ‘START’ column of the ‘encounters’ table. This will give us the earliest and latest dates in the dataset.

# Make a copy to avoid modifying the original DataFrame
encounters = encounters.copy()

# Ensure DATE column is datetime type
encounters['START'] = pd.to_datetime(encounters['START'], errors='coerce')

# Print the min and max date
print(encounters['START'].dt.date.min())
print(encounters['START'].dt.date.max())
1909-08-15
2020-06-02

Create a variable of date difference

to calculate the number of days since diagnosed with COVID for each patient. This can be done by subtracting the ‘START’ date by ‘covid_date’.

covid_patient_ids = conditions[conditions.CODE == 840539006].PATIENT.unique()

# Further filter by COVID patient IDs
lab_obs = observations[observations['PATIENT'].isin(covid_patient_ids)]

# Filter observations by specific CODE values
lab_obs = observations[
    observations['CODE'].isin(['48065-7', '26881-3', '2276-4', '89579-7', '731-0', '14804-9'])
]

# Ensure the DATE column is in datetime format
lab_obs['DATE'] = pd.to_datetime(lab_obs['DATE'], errors='coerce')

# Filter rows after 2020-01-20 for covid-related data
lab_obs = lab_obs[lab_obs['DATE'] > datetime(2020, 1, 20)].copy()

lab_obs = lab_obs.merge(covid_dates, on = 'PATIENT').copy()

# Calculate the number of days since the COVID diagnosis
lab_obs['days'] = (pd.to_datetime(lab_obs['DATE']) - pd.to_datetime(lab_obs['covid_date'])).dt.days.copy()

# Preview first 3 rows
lab_obs.head(3)
/var/folders/cx/3wbhcqyd3cld6gvk_xjkvr_40000gn/T/ipykernel_2575/3494572798.py:12: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
DATE PATIENT ENCOUNTER CODE DESCRIPTION VALUE UNITS TYPE covid_date days
0 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 731-0 Lymphocytes [#/volume] in Blood by Automated c... 1.1 10*3/uL numeric 2020-02-19 0
1 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 48065-7 Fibrin D-dimer FEU [Mass/volume] in Platelet p... 0.4 ug/mL numeric 2020-02-19 0
2 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 2276-4 Ferritin [Mass/volume] in Serum or Plasma 332.4 ug/L numeric 2020-02-19 0
lab_obs['days'].describe()
count    704747.000000
mean          6.129017
std           4.249655
min           0.000000
25%           2.000000
50%           6.000000
75%           9.000000
max          26.000000
Name: days, dtype: float64

Create a time-point variable

# Assign timepoints based on number of days
def assign_timepoint(days):
    if days == 0:
        return 'baseline'
    elif 6 <= days <= 8:
        return 'week_1'
    else:
        return None

# Apply the timepoint assignment
lab_obs['TIMEPOINT'] = lab_obs['days'].apply(assign_timepoint)

# Drop rows without a valid timepoint
lab_obs = lab_obs.dropna(subset=['TIMEPOINT']).copy()

lab_obs['VALUE'] = pd.to_numeric(lab_obs['VALUE'], errors='coerce')

# Aggregate the data
summary = lab_obs.pivot_table(
    index="DESCRIPTION",
    columns="TIMEPOINT",
    values="VALUE",
    aggfunc="mean"
).round(2)

summary
TIMEPOINT baseline week_1
DESCRIPTION
Ferritin [Mass/volume] in Serum or Plasma 516.66 720.05
Fibrin D-dimer FEU [Mass/volume] in Platelet poor plasma 0.53 1.38
Interleukin 6 [Mass/volume] in Serum or Plasma NaN 7.35
Lactate dehydrogenase [Enzymatic activity/volume] in Serum or Plasma by Lactate to pyruvate reaction 245.22 256.27
Lymphocytes [#/volume] in Blood by Automated count 0.96 0.85
Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method 3.57 7.13

Lag comparison - Create a variable of readmission to hospital

Patients who are readmitted to the hospital within 30 days of their last discharge. This can be done by checking if the difference between the ‘START’ date of the current encounter and the ‘STOP’ date of the previous encounter is less than or equal to 30 days. The purpose is to find how many patients are readmitted to the hospital within 30 days of their last discharge. This is a common metric in healthcare to assess the quality of care and patient outcomes.

#encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")

inpatients = encounters[encounters.ENCOUNTERCLASS == 'inpatient'].copy()
inpatients.head()
Id START STOP PATIENT ORGANIZATION PROVIDER PAYER ENCOUNTERCLASS CODE DESCRIPTION BASE_ENCOUNTER_COST TOTAL_CLAIM_COST PAYER_COVERAGE REASONCODE REASONDESCRIPTION START2 STOP2
23 b7455838-3607-47f4-aaa5-fd89abea7d29 2020-02-19 06:11:02+00:00 2020-03-05 15:37:02+00:00 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e fbf6180e-b800-3ebe-b91d-93d0288c400e 7502b497-e32c-32d3-a0b8-f18f540a8a45 4d71f845-a6a9-3c39-b242-14d25ef86a8d inpatient 1505002 Hospital admission for isolation (procedure) 129.16 129.16 64.16 840539006.0 COVID-19 2020-02-19 2020-03-05
49 40336433-df4f-4f5e-9827-258890c2dd40 2019-10-26 23:01:01+00:00 2019-10-27 23:16:01+00:00 09303bc9-3bcc-49fe-b57b-005471833c43 b0e04623-b02c-3f8b-92ea-943fc4db60da ae9cd143-8fee-3acc-ab3c-ab3861be85d7 047f6ec3-6215-35eb-9608-f9dda363a44c inpatient 305408004 Admission to surgical department 129.16 129.16 69.16 NaN NaN 2019-10-26 2019-10-27
95 71089860-4c03-4f9a-a67d-bb2e12df3365 2020-05-13 16:53:45+00:00 2020-05-14 17:08:45+00:00 da454f92-9e90-43f8-9514-550fbaa6de3d db0acede-4abe-3c01-8d03-5c68a190d8c7 9cb2439a-df04-3e7b-88b1-6f686d68f590 d47b3510-2895-3b70-9897-342d681c769d inpatient 305411003 Admission to thoracic surgery department 129.16 129.16 49.16 87433001.0 Pulmonary emphysema (disorder) 2020-05-13 2020-05-14
105 3af37864-e57e-4c9b-91bc-3313aa3fe518 2010-12-17 19:21:23+00:00 2010-12-18 19:21:23+00:00 29581d13-2687-402f-9c4c-06141a319e61 465de31f-3098-365c-af70-48a071e1f5aa 84a692b5-42c9-39cf-9ffa-9a8304e00181 6e2f1a2d-27bd-3701-8d08-dae202c58632 inpatient 185345009 Encounter for symptom (procedure) 129.16 129.16 0.00 NaN NaN 2010-12-17 2010-12-18
107 b6c55c14-95c5-4264-9b24-4e9eaf69f64e 2010-12-23 19:21:23+00:00 2010-12-24 22:17:23+00:00 29581d13-2687-402f-9c4c-06141a319e61 465de31f-3098-365c-af70-48a071e1f5aa 84a692b5-42c9-39cf-9ffa-9a8304e00181 6e2f1a2d-27bd-3701-8d08-dae202c58632 inpatient 185347001 Encounter for problem 129.16 129.16 54.16 254837009.0 Malignant neoplasm of breast (disorder) 2010-12-23 2010-12-24
# Filter for inpatients and explicitly make a copy
inpatients = encounters[encounters.ENCOUNTERCLASS == 'inpatient'].copy()

# Convert date columns
inpatients['START'] = pd.to_datetime(inpatients['START'])
inpatients['STOP'] = pd.to_datetime(inpatients['STOP'])

# Sort by PATIENT and START date
inpatients = inpatients.sort_values(['PATIENT', 'START'])

# Get the previous STOP date per patient
inpatients['PREV_STOP'] = inpatients.groupby('PATIENT')['STOP'].shift(1)

# Calculate the gap in days since the last discharge
inpatients['DAYS_SINCE_LAST_DISCHARGE'] = (inpatients['START'] - inpatients['PREV_STOP']).dt.days

# Identify readmissions within 30 days
inpatients['READMITTED_WITHIN_30_DAYS'] = (
    (inpatients['DAYS_SINCE_LAST_DISCHARGE'] > 0) &
    (inpatients['DAYS_SINCE_LAST_DISCHARGE'] <= 30)
)

# Display relevant columns
print(inpatients[['PATIENT', 'START', 'PREV_STOP', 'DAYS_SINCE_LAST_DISCHARGE', 'READMITTED_WITHIN_30_DAYS']])

# Calculate patient-level readmission rate
readmission_rate = inpatients.groupby('PATIENT')['READMITTED_WITHIN_30_DAYS'].max().mean()
print(f"Readmission rate: {readmission_rate:.2%}")
                                      PATIENT                     START  \
2245164  0001049f-9248-47fe-b479-ea80eb51ce4a 2020-03-11 16:17:49+00:00   
267214   00051dc6-38dc-4781-a6db-b69479990236 2020-03-05 07:11:26+00:00   
480342   0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-03-05 13:54:35+00:00   
480343   0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-04-25 13:54:35+00:00   
480344   0006bd25-4d45-41ce-9285-c02d38d57c4d 1990-11-03 13:54:35+00:00   
...                                       ...                       ...   
102046   fffca037-ba70-414e-9930-bc3c6a9023c3 2020-02-19 07:28:42+00:00   
332980   fffde06d-4cf8-462a-9429-b3946224c517 2020-03-04 15:21:25+00:00   
332981   fffde06d-4cf8-462a-9429-b3946224c517 2020-03-12 15:21:25+00:00   
269955   ffff888c-383e-428d-8f53-9aaf40104f9c 2020-02-14 09:41:03+00:00   
1938371  ffffadc8-7106-4b51-b003-1307776e9790 2020-03-07 15:05:29+00:00   

                        PREV_STOP  DAYS_SINCE_LAST_DISCHARGE  \
2245164                       NaT                        NaN   
267214                        NaT                        NaN   
480342                        NaT                        NaN   
480343  1990-03-06 14:09:35+00:00                       49.0   
480344  1990-04-26 13:54:35+00:00                      191.0   
...                           ...                        ...   
102046                        NaT                        NaN   
332980                        NaT                        NaN   
332981  2020-03-12 20:13:25+00:00                       -1.0   
269955                        NaT                        NaN   
1938371                       NaT                        NaN   

         READMITTED_WITHIN_30_DAYS  
2245164                      False  
267214                       False  
480342                       False  
480343                       False  
480344                       False  
...                            ...  
102046                       False  
332980                       False  
332981                       False  
269955                       False  
1938371                      False  

[95273 rows x 5 columns]
Readmission rate: 5.75%

Note that the readmission rate can be calculated at two levels: patient-level and encounter-level. The patient-level readmission rate considers whether a patient has been readmitted within 30 days of their last discharge, while the encounter-level readmission rate considers each individual encounter.

# Patient-level readmission rate
patient_readmission_rate = inpatients.groupby('PATIENT')['READMITTED_WITHIN_30_DAYS'].max().mean()

# Encounter-level readmission rate
encounter_readmission_rate = inpatients['READMITTED_WITHIN_30_DAYS'].mean()

print(f"Patient-level readmission rate: {patient_readmission_rate:.2%}")
print(f"Encounter-level readmission rate: {encounter_readmission_rate:.2%}")
Patient-level readmission rate: 5.75%
Encounter-level readmission rate: 16.98%

Look at data of readmitted patients to see how many patients are readmitted within 30 days of their last discharge.

# Filter only true readmissions
readmitted_patients = inpatients[inpatients['READMITTED_WITHIN_30_DAYS']]

print(readmitted_patients[['PATIENT', 'START', 'PREV_STOP', 'DAYS_SINCE_LAST_DISCHARGE', 'READMITTED_WITHIN_30_DAYS']])
len(readmitted_patients['PATIENT'].unique())
                                      PATIENT                     START  \
898016   00465335-b351-40aa-a37c-4fd851b77b50 1993-03-13 02:18:56+00:00   
898019   00465335-b351-40aa-a37c-4fd851b77b50 1993-04-13 02:18:56+00:00   
898032   00465335-b351-40aa-a37c-4fd851b77b50 1993-07-17 02:18:56+00:00   
898044   00465335-b351-40aa-a37c-4fd851b77b50 1993-12-29 02:18:56+00:00   
898046   00465335-b351-40aa-a37c-4fd851b77b50 1994-01-29 02:18:56+00:00   
...                                       ...                       ...   
1326164  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-08-11 12:53:54+00:00   
1326165  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-09-04 12:53:54+00:00   
1326166  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-09-28 12:53:54+00:00   
1326167  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-10-21 12:53:54+00:00   
1326168  fff6961e-173d-4429-bf21-4c46e3ef7e2e 2015-11-13 12:53:54+00:00   

                        PREV_STOP  DAYS_SINCE_LAST_DISCHARGE  \
898016  1993-02-14 06:18:56+00:00                       26.0   
898019  1993-03-14 04:18:56+00:00                       29.0   
898032  1993-06-19 06:18:56+00:00                       27.0   
898044  1993-11-30 04:18:56+00:00                       28.0   
898046  1993-12-30 06:18:56+00:00                       29.0   
...                           ...                        ...   
1326164 2015-07-21 12:53:54+00:00                       21.0   
1326165 2015-08-12 14:53:54+00:00                       22.0   
1326166 2015-09-05 13:53:54+00:00                       22.0   
1326167 2015-09-29 12:53:54+00:00                       22.0   
1326168 2015-10-22 15:53:54+00:00                       21.0   

         READMITTED_WITHIN_30_DAYS  
898016                        True  
898019                        True  
898032                        True  
898044                        True  
898046                        True  
...                            ...  
1326164                       True  
1326165                       True  
1326166                       True  
1326167                       True  
1326168                       True  

[16181 rows x 5 columns]
2304

Group by month to count readmitted patients.

# Ensure START is datetime
inpatients['START'] = pd.to_datetime(inpatients['START'])

# Extract year and month (e.g., '2025-06') for grouping
inpatients['MONTH'] = inpatients['START'].dt.tz_localize(None).dt.to_period('M')

# Filter for readmitted cases
readmitted = inpatients[inpatients['READMITTED_WITHIN_30_DAYS'] == True]

# Group by MONTH and count
readmitted_per_month = readmitted.groupby('MONTH').size().reset_index(name='readmitted_count')

print(readmitted_per_month)
       MONTH  readmitted_count
0    1919-07                 1
1    1919-08                 1
2    1919-09                 2
3    1919-10                 2
4    1919-11                 1
..       ...               ...
734  2020-01               119
735  2020-02               148
736  2020-03               318
737  2020-04               164
738  2020-05                85

[739 rows x 2 columns]