# Business Scenario: Emergency Department Volume Analysis
Case 1: How many patients arrive at the emergency department (ED) in each hour and in each minute? (arrival volume forecast)
Case 2: How many patients are in the ED at any given minute? (patient census)
library(tidyverse)  # dplyr, tidyr, tibble, ggplot2
library(lubridate)  # mdy_hm(), floor_date()
ed_data <- read.csv("Simulated_ed_data.csv")
head(ed_data)
## arrival_times depart_times
## 1 10/1/18 0:05 10/1/18 2:34
## 2 10/1/18 0:15 10/1/18 3:04
## 3 10/1/18 0:16 10/1/18 2:36
## 4 10/1/18 0:19 10/1/18 2:45
## 5 10/1/18 0:26 10/1/18 3:15
## 6 10/1/18 0:35 10/1/18 3:02
ed_data$arrival_times <- mdy_hm(ed_data$arrival_times)
ed_data$depart_times <- mdy_hm(ed_data$depart_times)
volumes_per_hour <- ed_data %>%
  mutate(timestamp = floor_date(arrival_times, unit = 'hour')) %>%
  count(timestamp)
volumes_per_hour %>%
  ggplot(mapping = aes(x = timestamp, y = n)) +
  geom_line() +
  labs(title = "Emergency Department Arrival Volumes per hour",
       subtitle = 'simulated data for three days',
       caption = 'Data Source:500 Patients') +
  theme(
    plot.title = element_text(color = "red", size = 12, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(color = "blue", hjust = 0.5),
    plot.caption = element_text(color = "green", face = "italic", hjust = 1)
  ) +
  xlab('Date') +
  ylab('Volume')
volumes_per_minute <- ed_data %>%
  mutate(timestamp = floor_date(arrival_times, unit = 'minute')) %>%
  count(timestamp) %>%
  select(timestamp, volume = n)
# Create a sequence of every minute from the first to the last arrival.
start <- min(volumes_per_minute$timestamp)
end <- max(volumes_per_minute$timestamp)
complete_window <- tibble(timestamp = seq(start, end, by = 'mins'))
# Left join onto the full window so minutes with no arrivals are kept, then set their volume to 0.
(total_volumes_minute <- complete_window %>%
  left_join(volumes_per_minute, by = 'timestamp') %>%
  mutate(volume = ifelse(is.na(volume), 0, volume)))
## # A tibble: 3,051 x 2
## timestamp volume
## <dttm> <dbl>
## 1 2018-10-01 00:05:00 1
## 2 2018-10-01 00:06:00 0
## 3 2018-10-01 00:07:00 0
## 4 2018-10-01 00:08:00 0
## 5 2018-10-01 00:09:00 0
## 6 2018-10-01 00:10:00 0
## 7 2018-10-01 00:11:00 0
## 8 2018-10-01 00:12:00 0
## 9 2018-10-01 00:13:00 0
## 10 2018-10-01 00:14:00 0
## # ... with 3,041 more rows
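The same gap filling can also be written in one step with `tidyr::complete()`. The following is a minimal sketch under stated assumptions, not a replacement for the join above: `toy_counts` and `toy_filled` are hypothetical names, and the two timestamps are made up for illustration rather than taken from the simulated CSV.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)

# Hypothetical toy counts: arrivals at 00:05 and 00:08, nothing in between
toy_counts <- tibble(
  timestamp = ymd_hm(c("2018-10-01 00:05", "2018-10-01 00:08")),
  volume    = c(1L, 2L)
)

# Expand to every minute in the observed range; minutes without arrivals get volume 0
toy_filled <- toy_counts %>%
  complete(
    timestamp = seq(min(timestamp), max(timestamp), by = "min"),
    fill = list(volume = 0L)
  )

toy_filled
```

The result has one row per minute from 00:05 to 00:08, with zeros in the two minutes where nobody arrived.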
total_volumes_minute %>%
  ggplot(mapping = aes(x = timestamp, y = volume)) +
  geom_line() +
  labs(title = "Emergency Department Arrival Volumes per minute",
       subtitle = 'simulated data for three days',
       caption = 'Data Source:500 Patients') +
  theme(
    plot.title = element_text(color = "red", size = 12, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(color = "blue", hjust = 0.5),
    plot.caption = element_text(color = "green", face = "italic", hjust = 1)
  ) +
  xlab('Date') +
  ylab('Volume')
# In this plot, the denser regions mark the periods with the highest arrival volume.
# Three days of data are not enough to establish patterns or trends; that requires data covering several years.
# Seasonality and trend must be accounted for before forecasting.
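One simple way to look for a daily pattern before any forecasting is to average the hourly counts by hour of day. The sketch below is only an illustration: `toy_hourly`, `arrivals` and `hourly_profile` are hypothetical names, the counts are invented, and it assumes that hours with zero arrivals have already been filled in, since otherwise the averages would be biased upward.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical hourly arrival counts for two hours on two consecutive days
toy_hourly <- tibble(
  timestamp = ymd_h(c("2018-10-01 00", "2018-10-01 14",
                      "2018-10-02 00", "2018-10-02 14")),
  arrivals  = c(10L, 6L, 14L, 8L)
)

# Average arrivals for each hour of the day across all observed days
hourly_profile <- toy_hourly %>%
  mutate(hour_of_day = hour(timestamp)) %>%
  group_by(hour_of_day) %>%
  summarise(mean_arrivals = mean(arrivals), days_observed = n())

hourly_profile
```

With only a few days of data, such a profile is descriptive at best; it becomes a usable seasonal baseline only once many more days are available.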
Number of patients in the ED at any given time
We have to account for both the patients who arrive at the ED and the patients who leave it.
To do this, we keep a counter that is updated every time a patient enters or leaves the ED.
When a patient walks in the door we add one to the overall count,
and when a patient leaves we subtract one.
Since we have the timestamps of when patients enter and leave, this is straightforward.
We split the data into two chunks, one for arrival times and one for departure times.
For each chunk, we create a counter variable.
This variable takes the value 1 in the arrival chunk and -1 in the departure chunk.
We then bind the two chunks back together, arrange them by time,
and take a cumulative sum of the counter variable.
As in the previous example, we need to fill in the minutes where no arrivals or departures occur. This time, instead of filling the gaps with zeros, we carry the last observation forward, because the patients who were present in the previous minute are still in the ED.
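The counter idea described above can be checked by hand on a small example. This is a minimal sketch with three made-up visits; `toy_visits` and `events` are hypothetical names and the timestamps are not from the simulated CSV.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical toy visits: three patients with arrival and departure times
toy_visits <- tibble(
  arrival_times = ymd_hm(c("2018-10-01 00:00", "2018-10-01 00:01", "2018-10-01 00:03")),
  depart_times  = ymd_hm(c("2018-10-01 00:02", "2018-10-01 00:05", "2018-10-01 00:05"))
)

# Stack arrivals (+1) and departures (-1), order by time, and take the running total
events <- bind_rows(
  toy_visits %>% transmute(timestamp = arrival_times, counter = 1),
  toy_visits %>% transmute(timestamp = depart_times, counter = -1)
) %>%
  arrange(timestamp, counter) %>%
  mutate(volume = cumsum(counter))

events
```

Reading down the `volume` column gives 1, 2, 1, 2, 1, 0: the census rises with each arrival, falls with each departure, and returns to zero once the last patient has left.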
ed_data <- ed_data %>%
  mutate(arrival_times = floor_date(arrival_times, unit = 'minute'),
         depart_times = floor_date(depart_times, unit = 'minute'))
# Arrivals
arrivals <- ed_data %>%
  select(timestamp = arrival_times) %>%
  mutate(counter = 1)
# Departures
departures <- ed_data %>%
  select(timestamp = depart_times) %>%
  mutate(counter = -1)
# ED census volumes per minute
census_volumes <- arrivals %>%
  bind_rows(departures) %>%
  arrange(timestamp, counter) %>%   # order by time; within a minute, departures sort before arrivals
  mutate(volume = cumsum(counter))  # running total gives the census after each event
# Create a sequence of every minute from the first to the last event.
start <- min(census_volumes$timestamp)
end <- max(census_volumes$timestamp)
full_time_window <- tibble(timestamp = seq(start, end, by = 'mins'))
# Right join onto the full window to add the minutes with no events.
census_volumes <- census_volumes %>%
  right_join(full_time_window, by = 'timestamp') %>%
  arrange(timestamp) %>%
  fill(volume, .direction = 'down')  # last observation carried forward
# Carrying forward is appropriate here: when nobody arrives or departs in a minute,
# the patients counted at the previous timestamp are still in the ED.
census_volumes %>%
  ggplot(mapping = aes(x = timestamp, y = volume)) +
  geom_line() +
  labs(title = "Emergency Department Census Volumes per minute",
       subtitle = 'simulated data for three days',
       caption = 'Data Source:500 Patients') +
  theme(
    plot.title = element_text(color = "red", size = 12, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(color = "blue", hjust = 0.5),
    plot.caption = element_text(color = "green", face = "italic", hjust = 1)
  ) +
  xlab('Date') +
  ylab('Volume')
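One caveat about the census table: when several patients arrive or depart within the same minute, the event log contains several rows for that timestamp, each holding an intermediate running total. The line plot is unaffected, but a table with exactly one row per minute may be preferable for later modelling. A minimal sketch of one way to get there is to sum the counters per minute before taking the cumulative sum; `toy_events`, `net_change` and `toy_census` are hypothetical names and the data are made up.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)

# Hypothetical event log: two arrivals, then two departures sharing the 00:05 minute
toy_events <- tibble(
  timestamp = ymd_hm(c("2018-10-01 00:00", "2018-10-01 00:01",
                       "2018-10-01 00:05", "2018-10-01 00:05")),
  counter   = c(1, 1, -1, -1)
)

toy_census <- toy_events %>%
  group_by(timestamp) %>%
  summarise(net_change = sum(counter)) %>%   # one row per minute that has events
  mutate(volume = cumsum(net_change)) %>%    # census at the end of each such minute
  complete(timestamp = seq(min(timestamp), max(timestamp), by = "min")) %>%
  fill(volume, .direction = "down") %>%      # carry the census through quiet minutes
  select(timestamp, volume)

toy_census
```

Here the census is 1 at 00:00, stays at 2 from 00:01 through 00:04, and drops to 0 at 00:05, with a single row for each minute.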
With the simulated data, the arrival volume graphs suggest that most ED visits occur around midnight and in the afternoon of each day.
Still, to understand the patterns clearly we need data on more patients over a longer period.
The patient census fluctuates throughout each day.
There are some patterns, but they are not conclusive enough to use for planning doctors' or nurses' schedules.
The fluctuations we see are expected in an emergency department, but to identify a pattern we need at least two seasons of patient data.
That would let us check whether seasonal effects, such as temperature changes, a pandemic, or natural disasters, are causing patient volumes to increase.
The data set is also too small for forecasting; planning based on so little data is not advisable and could adversely affect ED planning and operations.
However, the same volume calculations can be applied to much larger data sets, even terabytes of data, to understand ED patterns, produce forecasts, and support future planning.