The raw OR case dataset was imported from Excel and standardized prior to analysis. Key steps included:

- Column cleaning and removal of identifiers
  - Column names were standardized (lowercase, snake_case).
  - Patient identifiers, provider names, and procedure descriptions were excluded to ensure de-identification.
- Parsing dates and times
  - Dates were converted to Date format.
  - All time fields (scheduled start, pre-op arrival, MD available, room ready, anesthesia sign-off, pre-op complete, in/out room times) were parsed consistently into hms objects.
  - Date–time stamps (e.g., consent completion, H&P completion) were converted to POSIXct for accurate interval calculations.
- Filtering
  - Only first cases were retained (first_case == “Yes”).
  - Cases were restricted to those scheduled for 07:30 (Mon–Wed, Fri) or 08:30 (Thu) to ensure a uniform start reference.
- Compliance flags
- Lag variables
  - Interval (lag) variables were created to capture elapsed minutes between milestones (e.g., scheduled start → pre-op arrival, pre-op → anesthesia sign-off, in-room → out-room).
  - All lags were rounded up to the nearest minute for conservative estimates.
- Outcome variable
- Handling missing values (for modeling)
  - Numeric lags: imputed using the median.
  - Categorical variables (day of week, patient class): missing values replaced with “Unknown”.
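
The steps above could be sketched in base R roughly as follows. This is a minimal illustration on toy data, not the pipeline used for the analysis. The column names (`scheduled_start`, `preop_arrival`, `patient_class`), the helper `to_dt`, and the sign convention (negative minutes mean arrival before the scheduled start) are assumptions, and base R date-times stand in for the hms objects.

```r
# Toy data; column names and values are illustrative, not the study's fields.
cases <- data.frame(
  first_case      = c("Yes", "Yes", "No", "Yes"),
  case_date       = c("2024-01-08", "2024-01-11", "2024-01-08", "2024-01-12"),
  scheduled_start = c("07:30:00", "08:30:00", "09:45:00", "07:30:00"),
  preop_arrival   = c("06:02:10", "07:41:30", "08:50:00", NA),
  patient_class   = c("Inpatient", NA, "Outpatient", "Outpatient"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: combine a date and a clock time into POSIXct.
to_dt <- function(date, time) {
  as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}
cases$sched_dt <- to_dt(cases$case_date, cases$scheduled_start)
cases$preop_dt <- to_dt(cases$case_date, cases$preop_arrival)

# Filtering: first cases only, scheduled at 07:30 (Mon-Wed, Fri) or 08:30 (Thu).
fc  <- cases[cases$first_case == "Yes", ]
dow <- as.integer(format(fc$sched_dt, "%u"))  # 1 = Monday ... 7 = Sunday
expected <- ifelse(dow == 4, "08:30:00",
                   ifelse(dow %in% c(1, 2, 3, 5), "07:30:00", NA))
fc <- fc[!is.na(expected) & fc$scheduled_start == expected, ]

# Lag in minutes, rounded up with ceiling(); negative = before scheduled start.
fc$mins_sched_to_preop <- ceiling(
  as.numeric(difftime(fc$preop_dt, fc$sched_dt, units = "mins"))
)

# Missing values: median for numeric lags, "Unknown" for categories.
lag_median <- median(fc$mins_sched_to_preop, na.rm = TRUE)
fc$mins_sched_to_preop[is.na(fc$mins_sched_to_preop)] <- lag_median
fc$patient_class[is.na(fc$patient_class)] <- "Unknown"

print(fc[, c("case_date", "mins_sched_to_preop", "patient_class")])
```

Note that `ceiling()` rounds toward positive infinity, so with this sign convention an arrival 87.8 minutes early is recorded as -87, which slightly understates how early the patient arrived.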
Boxplots of the lags (in minutes) from pre-op arrival to each subsequent process or task. We also calculated the interval from scheduled start to pre-op arrival, which shows how early a patient is arriving.
Compliance rate based on when each piece of work should be done by. This shows how we performed, though it gives only a rough picture.
| Status | Count | Percent |
|---|---|---|
| Late | 182 | 25% |
| On-time | 532 | 75% |
The table shows that we started on time in 75% of cases. Great job, and kudos to the team. Let's get better.
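
As a quick check, the percentages can be recomputed from the counts in the table. A minimal base R sketch, in which the vector `status` is rebuilt from the reported counts rather than taken from the case-level data:

```r
# Rebuild a status vector from the counts reported in the table.
status <- rep(c("Late", "On-time"), times = c(182, 532))

counts <- table(status)
compliance <- data.frame(
  Status  = names(counts),
  Count   = as.vector(counts),
  Percent = round(100 * as.vector(counts) / sum(counts))
)
print(compliance)
```

Unrounded, the shares are about 25.5% late and 74.5% on time, out of 714 cases.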
Top left is an example of the t-test used to find which factors (variables) differed significantly between the two groups (on-time vs. late to start). The t-test works only on numeric variables, not categories. Top right is a comparable test (rank-sum) that works on ranks rather than raw values, so it also suits skewed or ordered variables. Bottom left and right show a Random Forest model fitted to all the variables we loaded. The model ranks the variables (alone or in combination) by how much weight they carry in predicting the outcome, late vs. on time.
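
To illustrate the two group comparisons, here is a minimal base R sketch on simulated data. The variable `lag_min`, the group sizes, and the simulated distributions are made up for the example. Only `t.test()` and `wilcox.test()` from base R's stats package are used, and the Random Forest step is not shown.

```r
set.seed(1)  # simulated data, for illustration only

sim <- data.frame(
  status  = rep(c("Late", "On-time"), each = 40),
  lag_min = c(rnorm(40, mean = -60, sd = 20),   # late group
              rnorm(40, mean = -80, sd = 20))   # on-time group
)

# Welch two-sample t-test: compares group means of a numeric variable.
t_res <- t.test(lag_min ~ status, data = sim)

# Wilcoxon rank-sum test: compares the groups using ranks instead of means.
w_res <- wilcox.test(lag_min ~ status, data = sim)

print(c(t_test_p = t_res$p.value, rank_sum_p = w_res$p.value))
```

Running both tests on the same variable is a reasonable cross-check: if they disagree sharply, outliers or a skewed distribution are likely driving the t-test result.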
This plot shows how early we need to tell patients to arrive for their case to belong to the on-time-start class. In short, patients who arrive only 30 minutes early tend not to start on time. Various factors (age, IV access, etc.) could be at play, but a simple intervention could be to test an earlier arrival time on a select group of cases for X amount of time.
##
## Call:
## roc.default(response = df_rf$is_on_time, predictor = df_rf$mins_sched_to_preop)
##
## Data: df_rf$mins_sched_to_preop in 182 controls (df_rf$is_on_time 0) > 532 cases (df_rf$is_on_time 1).
## Area under the curve: 0.6008
## threshold sensitivity specificity
## 1 -83.5 0.8909774 0.2692308
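
To make the threshold row easier to read, the sketch below shows how sensitivity and specificity at a single cutoff can be computed by hand in base R. It uses toy values, not the study data. The function name `sens_spec` is hypothetical, and the rule "predict on-time when the predictor is at or below the threshold" is an assumption based on the `controls > cases` direction reported above.

```r
# Toy values: minutes from scheduled start to pre-op arrival (negative = early).
mins    <- c(-120, -100, -95, -90, -60, -85, -40, -30)
on_time <- c(1, 1, 1, 1, 1, 0, 0, 0)  # 1 = on time (case), 0 = late (control)

# Hypothetical helper: sensitivity and specificity at one threshold, assuming
# lower predictor values point to the on-time class.
sens_spec <- function(predictor, response, threshold) {
  predicted_on_time <- predictor <= threshold
  c(
    sensitivity = mean(predicted_on_time[response == 1]),
    specificity = mean(!predicted_on_time[response == 0])
  )
}

res <- sens_spec(mins, on_time, threshold = -83.5)
print(res)
```

Read the same way, and if negative values do mean arrival before the scheduled start, the reported sensitivity of 0.891 corresponds to 474 of the 532 on-time cases arriving at least 83.5 minutes early, and the specificity of 0.269 to 49 of the 182 late cases arriving later than that. With an area under the curve of about 0.60, arrival time alone separates the two groups only weakly.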
This shows the probability of starting on time as a function of the day of the week.