We are to determine at various levels of detail where the agents are spending time. In general, we want to know for each agent and date how much time was spent “on queue” and how much time was on appointments.
We will also look into errors, missing data, and quirks that may come from the inherent messiness of the real data, and try to either explain or account for those erroneous points.
First, let us look at the data, starting with the shifts. We do not see any values explicitly listed as NA, nor any duplicated rows, on import. Below we see the general shape of each data set.
shifts_ %>%
  summarise_all(~ sum(is.na(.)))
appt_ %>%
  summarise_all(~ sum(is.na(.)))
status_ %>%
  summarise_all(~ sum(is.na(.)))
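The duplicate-row half of that claim is not checked by the code above; a minimal sketch of such a check, reusing the same three data frames, could look like this.
# Count fully duplicated rows in each data set; all zeros supports the claim above.
c(shifts = sum(duplicated(shifts_)),
  appointments = sum(duplicated(appt_)),
  status = sum(duplicated(status_)))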
info <-
  as.data.frame(cbind(shifts_ %>% dim(),
                      appt_ %>% dim(),
                      status_ %>% dim()),
                row.names = c("rows", "columns"))
info %>% rename("shifts" = V1, "appointments" = V2, "status" = V3)
 | shifts | appointments | status |
---|---|---|---|
rows | 1217 | 4110 | 30557 |
columns | 3 | 4 | 5 |
The following are the numbers of unique agents and dates in each data set. Given the structure of the conclusion tab, it seems some dates and agents will be excluded. It also seems reasonable that shifts and appointments are scheduled much further in advance than the day-to-day logging in and out of answering calls, which may account for the differences in dimensions.
info_2 <-
  data.frame(x = shifts_ %>% distinct(`Agent Id`) %>% count(),
             y = shifts_ %>% distinct(as.Date(`Start Time`, tz = "UTC")) %>% count()) %>%
  rename("unique_agents" = n, "unique_dates" = n.1)
appt_info <-
  data.frame(x = appt_ %>% distinct(`Attendee Agent Id`) %>% count(),
             y = appt_ %>% distinct(as.Date(`StartTime1`, tz = "UTC")) %>% count()) %>%
  rename("unique_agents" = n, "unique_dates" = n.1)
stat_info <-
  data.frame(x = status_ %>% distinct(`Agent Id`) %>% count(),
             y = status_ %>% distinct(as.Date(`Interval Start Date Time`, tz = "UTC")) %>% count()) %>%
  rename("unique_agents" = n, "unique_dates" = n.1)
info_2 <- rbind(info_2, appt_info, stat_info)
row.names(info_2) <- c("shifts", "appointments", "status")
info_2
 | unique_agents | unique_dates |
---|---|---|
shifts | 30 | 74 |
appointments | 31 | 143 |
status | 69 | 51 |
For the first task we are to determine the number of on-shift hours per agent per day. Let us first make sure our times have imported correctly and check whether the shift lengths make sense.
shifts_ %>%
  mutate(shift_hours = as.double(`End Time` - `Start Time`, units = "hours")) %>%
  group_by(shift_hours) %>%
  summarize(count = n())
shift_hours | count |
---|---|
1 | 2 |
8 | 15 |
9 | 1200 |
The vast majority of shifts are nine hours in length. No one would normally be scheduled for a one-hour shift, so let us look at the two one-hour shifts alongside the appointment data.
appt_check <-
  appt_ %>%
  mutate(start_date = as.Date(StartTime1, tz = "UTC"))
shift_check <-
  shifts_ %>%
  mutate(start_date = as.Date(`Start Time`, tz = "UTC")) %>%
  mutate(shift_hours = as.double(`End Time` - `Start Time`, units = "hours"))
shift_check %>%
  left_join(appt_check, by = c("Agent Id" = "Attendee Agent Id",
                               "start_date" = "start_date")) %>%
  filter(shift_hours == 1 | Description == "Ill/Sick", start_date == "2019-01-14")
Agent Id | Start Time | End Time | start_date | shift_hours | Description | StartTime1 | EndTime1 |
---|---|---|---|---|---|---|---|
13567703 | 2019-01-14 19:00:00 | 2019-01-14 20:00:00 | 2019-01-14 | 1 | Transition | 2019-01-14 18:45:00 | 2019-01-14 19:00:00 |
13567703 | 2019-01-14 19:00:00 | 2019-01-14 20:00:00 | 2019-01-14 | 1 | Lunch | 2019-01-14 19:00:00 | 2019-01-14 20:00:00 |
17460544 | 2019-01-14 19:00:00 | 2019-01-14 20:00:00 | 2019-01-14 | 1 | Transition | 2019-01-14 19:15:00 | 2019-01-14 19:30:00 |
17460544 | 2019-01-14 19:00:00 | 2019-01-14 20:00:00 | 2019-01-14 | 1 | Lunch | 2019-01-14 19:30:00 | 2019-01-14 20:30:00 |
21586554 | 2019-01-14 13:00:00 | 2019-01-14 22:00:00 | 2019-01-14 | 9 | Ill/Sick | 2019-01-14 05:00:00 | 2019-01-15 05:00:00 |
The two one-hour shifts appear to exist because another agent called in sick that day, as shown above.
Let's check whether the shifts are correctly ordered, that is, whether any shift ends before it starts. We will also check whether any shift spans multiple days or doubles up on a day.
shifts <-
  shifts_ %>%
  mutate(shift_hours = as.double(`End Time` - `Start Time`, units = "hours")) %>%
  mutate(start_date = as.Date(`Start Time`, tz = "UTC"),
         end_date = as.Date(`End Time`, tz = "UTC"))
shifts %>%
  filter(start_date <= end_date) %>%
  dim()
## [1] 1217 6
Returning the same dimensions as the complete data set means the shifts are correctly ordered: there are no shifts that end before they start. All shifts also end on the same day they start, except for 123 shifts that end at exactly midnight. This means we can simply focus on the start date and compare total times across each day, since no shift dates will overlap with appointment dates.
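As a quick check on the midnight claim, here is a minimal sketch (reusing the shifts frame built above) that counts the shifts whose end date differs from their start date and confirms whether they all end at exactly 00:00:00 UTC.
# Shifts that roll over to the next calendar date, and whether every one of them
# ends exactly at midnight UTC.
shifts %>%
  filter(end_date > start_date) %>%
  summarise(rollover_shifts = n(),
            all_end_at_midnight = all(format(`End Time`, "%H:%M:%S", tz = "UTC") == "00:00:00"))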
As stated, we want the number of on-shift hours per agent per day. We also drop weekend days, and from the conclusion tab it seems we are to filter out any days beyond 02/19/2019.
shifts <-
  shifts_ %>%
  mutate(shift_hours = as.double(`End Time` - `Start Time`, units = "hours")) %>%
  mutate(start_date = as.Date(`Start Time`, tz = "UTC"),
         end_date = as.Date(`End Time`, tz = "UTC")) %>%
  # drop weekend days
  filter(!(weekdays(start_date) %in% c('Saturday', 'Sunday')))
shifts <-
  shifts %>%
  filter(start_date <= "2019-02-19") %>%
  group_by(`Agent Id`, start_date, shift_hours) %>%
  summarize(count = n()) %>%
  select(!count)
shifts %>%
  head()
Agent Id | start_date | shift_hours |
---|---|---|
13216880 | 2019-01-01 | 9 |
13216880 | 2019-01-02 | 9 |
13216880 | 2019-01-03 | 9 |
13216880 | 2019-01-04 | 9 |
13216880 | 2019-01-07 | 9 |
13216880 | 2019-01-08 | 9 |
Now we can pivot the data to a wider format for the desired output. Days with no shift hours are filled with zeros.
task_1 <-
  shifts %>%
  filter(start_date <= "2019-02-19") %>%
  pivot_wider(id_cols = `Agent Id`,
              names_from = start_date,
              values_from = shift_hours,
              values_fill = 0)
task_1[1:5, 1:5]
Agent Id | 2019-01-01 | 2019-01-02 | 2019-01-03 | 2019-01-04 |
---|---|---|---|---|
13216880 | 9 | 9 | 9 | 9 |
13567703 | 9 | 0 | 0 | 0 |
13567727 | 9 | 9 | 9 | 9 |
14294766 | 9 | 9 | 9 | 9 |
14295138 | 9 | 9 | 9 | 9 |
Since no shift spans multiple days, an appointment can be of any length. Additionally, because appointments are counted against their start date, an agent's appointment hours for a single day can sum to more than 24 (someone can be on vacation and also have another appointment) without affecting appointments on neighboring days. As such, we can structure the data similarly to the shifts and calculate fields based on the day and the sum of the appointment hours. A subset of the raw data looks as follows.
appt <-
  appt_ %>%
  mutate(appt_hours = as.double(EndTime1 - StartTime1, units = "hours")) %>%
  mutate(start_date_appt = as.Date(StartTime1, tz = "UTC"),
         end_date_appt = as.Date(EndTime1, tz = "UTC")) %>%
  # drop weekend days
  filter(!(weekdays(start_date_appt) %in% c('Saturday', 'Sunday')))
appt %>%
  select(`Attendee Agent Id`,
         start_date_appt,
         appt_hours,
         Description) %>%
  head()
Attendee Agent Id | start_date_appt | appt_hours | Description |
---|---|---|---|
16136582 | 2019-01-01 | 24 | Holiday |
18778054 | 2019-01-01 | 24 | Holiday |
13216880 | 2019-01-01 | 24 | Holiday |
13567703 | 2019-01-01 | 24 | Holiday |
13567727 | 2019-01-01 | 24 | Holiday |
14294766 | 2019-01-01 | 24 | Holiday |
There are no agents scheduled for meetings while not on-shift.
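That claim is not verified by any code above; as a minimal sketch of such a check, reusing the appt_check and shift_check frames from earlier, an anti-join of appointments against shifts on agent and date should return zero rows if it holds.
# Appointment rows whose agent/date combination has no matching shift;
# zero rows supports the claim above.
appt_check %>%
  anti_join(shift_check, by = c("Attendee Agent Id" = "Agent Id",
                                "start_date" = "start_date")) %>%
  nrow()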
We can re-frame this data into the per agent per day format with the sum of the appointment hours.
appt <-
  appt %>%
  group_by(`Attendee Agent Id`, start_date_appt) %>%
  summarize(appt_hours = sum(appt_hours))
appt %>% head()
Attendee Agent Id | start_date_appt | appt_hours |
---|---|---|
13216880 | 2019-01-01 | 24.00 |
13216880 | 2019-01-02 | 1.50 |
13216880 | 2019-01-03 | 4.50 |
13216880 | 2019-01-04 | 2.25 |
13216880 | 2019-01-07 | 5.00 |
13216880 | 2019-01-08 | 5.75 |
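As a quick illustration of the over-24-hour point made earlier, a small sketch on the appt frame just built counts the agent-days whose summed appointment hours exceed a full day.
# Agent-days where summed appointment hours exceed 24,
# e.g. a 24-hour vacation entry plus an additional appointment.
appt %>%
  filter(appt_hours > 24) %>%
  nrow()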
Now that both shifts and appointments are structured the same way, we can join the data together and calculate the difference between shift hours and appointment hours. This gives us our desired On Queue hours.
Negative On Queue Hours: These were forced to a value of zero. If an agent is fully booked with appointments, then that agent is expected to be on queue for zero hours. Below are the first six rows of data.
on_queue <-
  shifts %>%
  left_join(appt, by = c("Agent Id" = "Attendee Agent Id",
                         "start_date" = "start_date_appt")) %>%
  mutate(on_queue_hours = shift_hours - appt_hours) %>%
  mutate(on_queue_hours = case_when(
    on_queue_hours < 0 ~ 0,
    TRUE ~ on_queue_hours))
on_queue %>% head()
Agent Id | start_date | shift_hours | appt_hours | on_queue_hours |
---|---|---|---|---|
13216880 | 2019-01-01 | 9 | 24.00 | 0.00 |
13216880 | 2019-01-02 | 9 | 1.50 | 7.50 |
13216880 | 2019-01-03 | 9 | 4.50 | 4.50 |
13216880 | 2019-01-04 | 9 | 2.25 | 6.75 |
13216880 | 2019-01-07 | 9 | 5.00 | 4.00 |
13216880 | 2019-01-08 | 9 | 5.75 | 3.25 |
Again, we can now pivot the data to the desired format.
task_2 <-
  on_queue %>%
  filter(start_date <= "2019-02-19") %>%
  pivot_wider(id_cols = `Agent Id`,
              names_from = start_date,
              values_from = on_queue_hours,
              values_fill = 0)
task_2[1:5, 1:5]
Agent Id | 2019-01-01 | 2019-01-02 | 2019-01-03 | 2019-01-04 |
---|---|---|---|---|
13216880 | 0 | 7.50 | 4.50 | 6.75 |
13567703 | 0 | 0.00 | 0.00 | 0.00 |
13567727 | 0 | 0.00 | 0.00 | 0.00 |
14294766 | 0 | 7.75 | 6.75 | 7.25 |
14295138 | 0 | 1.25 | 2.25 | 1.25 |
The third task covers time on queue fielding calls. To log hours as ADD, an agent needs to be on shift and not in a meeting. This has a few ramifications: a call may run past the start of a scheduled meeting, an agent may drop out of a meeting early to return to the queue, an agent may log ADD time entirely within a meeting, or an agent may log ADD time on a day they are marked as off (holiday, time off, or sick). We will investigate the possibility of each of the above situations while calculating the total hours worked on the queue.
The status data is precise to the second and is a simple time tracker for logging in and out of ADD status.
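As a structural sanity check on that description, a small sketch on the raw status_ frame verifies that, within each agent, the logged intervals do not overlap, i.e. each interval starts at or after the previous one ends.
# For each agent, compare an interval's start with the previous interval's end;
# a simple log-in/log-out tracker should have no negative gaps (overlaps).
status_ %>%
  arrange(`Agent Id`, `Interval Start Date Time`) %>%
  group_by(`Agent Id`) %>%
  mutate(gap_seconds = as.double(`Interval Start Date Time` - lag(`Interval End Date Time`),
                                 units = "secs")) %>%
  ungroup() %>%
  summarise(overlapping_intervals = sum(gap_seconds < 0, na.rm = TRUE))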
appt_ <-
  appt_ %>%
  mutate(start_date = as.Date(StartTime1, tz = "UTC"))
status <-
  status_ %>%
  mutate(work_date = as.Date(`Interval Start Date Time`, tz = "UTC"))
status_[1:5, 1:5]
Agent Id | Interval Start Date Time | Interval End Date Time | Agent Status Id | Agent Status Name |
---|---|---|---|---|
13216880 | 2019-01-02 14:14:01 | 2019-01-02 14:14:07 | 1 | ADD |
13216880 | 2019-01-02 14:14:07 | 2019-01-02 14:16:32 | 3 | HOLD |
13216880 | 2019-01-02 14:16:32 | 2019-01-02 14:28:59 | 1 | ADD |
13216880 | 2019-01-02 14:28:59 | 2019-01-02 14:31:45 | 3 | HOLD |
13216880 | 2019-01-02 14:31:45 | 2019-01-02 14:32:29 | 1 | ADD |
We will left join the status data onto the appointment data and create some calculated fields based on how ADD hours might leak from meeting times. Our data then looks as follows; note the truncation of the interval times to preserve the meetings.
appt_stat <-
  appt_ %>%
  left_join(status, by = c("Attendee Agent Id" = "Agent Id",
                           "start_date" = "work_date"))
actual_work_ <-
  appt_stat %>%
  mutate(
    edge_check = case_when(
      # ADD interval begins before the meeting and runs into it
      `Interval Start Date Time` < StartTime1 &
        `Interval End Date Time` > StartTime1 ~ "call extended",
      # ADD interval begins during the meeting, i.e. the agent dropped out early
      `Interval Start Date Time` < EndTime1 &
        `Interval Start Date Time` > StartTime1 ~ "meeting drop",
      # ADD interval falls entirely inside the meeting
      `Interval Start Date Time` >= StartTime1 &
        `Interval End Date Time` <= EndTime1 ~ "call during meeting",
      # ADD time logged on a day the agent is marked as off
      Description %in% c("Holiday", "Time Off Day", "Ill/Sick") &
        `Agent Status Id` == 1 ~ "work while off",
      TRUE ~ "normal"
    )
  ) %>%
  mutate(
    # truncate the ADD intervals so that time overlapping a meeting is not counted
    `Interval Start Date Time` = case_when(
      edge_check == "meeting drop" ~ EndTime1,
      edge_check == "work while off" ~ as.POSIXct(NA),
      edge_check == "call during meeting" ~ as.POSIXct(NA),
      TRUE ~ `Interval Start Date Time`),
    `Interval End Date Time` = case_when(
      edge_check == "call extended" ~ StartTime1,
      edge_check == "work while off" ~ as.POSIXct(NA),
      edge_check == "call during meeting" ~ as.POSIXct(NA),
      TRUE ~ `Interval End Date Time`
    ),
    work_hours = as.double(`Interval End Date Time` - `Interval Start Date Time`,
                           units = "hours"),
    # negative or zero durations contribute no ADD time
    work_hours = case_when(work_hours <= 0 ~ 0, TRUE ~ work_hours)
  ) %>%
  filter(`Agent Status Id` == 1)
actual_work_ %>%
  select(c("Interval Start Date Time", "Interval End Date Time",
           "edge_check", "work_hours")) %>%
  head()
Interval Start Date Time | Interval End Date Time | edge_check | work_hours |
---|---|---|---|
2019-01-01 14:07:08 | 2019-01-01 15:38:58 | normal | 1.5305556 |
2019-01-01 15:45:33 | 2019-01-01 15:45:35 | normal | 0.0005556 |
2019-01-01 15:52:43 | 2019-01-01 16:00:00 | call extended | 0.1213889 |
2019-01-01 18:37:44 | 2019-01-01 21:29:49 | normal | 2.8680556 |
2019-01-01 21:32:20 | 2019-01-01 21:57:02 | normal | 0.4116667 |
2019-01-01 13:00:57 | 2019-01-01 15:09:42 | normal | 2.1458333 |
Because the join repeats each status row once for every meeting an agent has that day, the summed working hours are inflated by the meeting count. We therefore count the number of meetings per agent per day, divide the summed working hours by that count, and keep the distinct per agent per day values.
appt_count <-
  appt_ %>%
  group_by(`Attendee Agent Id`, start_date) %>%
  summarise(appt_count = n())
actual_work <-
  actual_work_ %>%
  inner_join(appt_count, by = c("Attendee Agent Id" = "Attendee Agent Id",
                                "start_date" = "start_date")) %>%
  group_by(`Attendee Agent Id`, start_date) %>%
  summarize(work_hours = sum(work_hours) / appt_count) %>%
  distinct()
actual_work %>%
  head()
Attendee Agent Id | start_date | work_hours |
---|---|---|
13216880 | 2019-01-02 | 5.563333 |
13216880 | 2019-01-03 | 2.899722 |
13216880 | 2019-01-04 | 4.448333 |
13216880 | 2019-01-07 | 3.431250 |
13216880 | 2019-01-08 | 2.646722 |
13216880 | 2019-01-10 | 4.638611 |
We can now clean up and pivot the data to the desired output. Non-working days are left as NA. This is intentional: working zero hours and not being expected to work are two different states and should be treated as such. Additionally, one agent had either no on-queue hours or had all ADD hours truncated due to working during meetings; that agent's row was omitted from the final output.
task_3 <-
  actual_work %>%
  mutate(work_hours = na_if(work_hours, 0)) %>%
  na.omit() %>%
  # drop weekend days
  filter(!(weekdays(start_date) %in% c('Saturday', 'Sunday'))) %>%
  arrange(`Attendee Agent Id`) %>%
  pivot_wider(id_cols = `Attendee Agent Id`,
              names_from = start_date,
              values_from = work_hours)
task_3[1:5, 1:5]
Attendee Agent Id | 2019-01-02 | 2019-01-03 | 2019-01-04 | 2019-01-07 |
---|---|---|---|---|
13216880 | 5.563333 | 2.899722 | 4.4483333 | 3.431250 |
13567703 | NA | NA | NA | NA |
13567727 | NA | NA | NA | NA |
14294766 | 6.257222 | 5.917153 | 6.1904630 | NA |
14295138 | 1.067222 | 0.741875 | 0.3641667 | 1.974444 |
Time leakage occurs when an agent is working the queue when they should not be. Who is leaking meeting time? Which meeting types lose the most time to extended calls or early drop-outs?
plot_status <-
  status %>%
  filter(`Agent Status Id` == 1) %>%
  mutate(raw_work_hours =
           as.double(`Interval End Date Time` - `Interval Start Date Time`,
                     units = "hours")) %>%
  group_by(`Agent Id`, work_date) %>%
  summarise(raw_work_hours = sum(raw_work_hours))
plot_actual_work <-
  actual_work %>%
  left_join(plot_status,
            by = c("Attendee Agent Id" = "Agent Id",
                   "start_date" = "work_date"))
actual_work_ %>%
  filter(edge_check != "normal",
         edge_check != "work while off") %>%
  left_join(plot_actual_work, by = c("Attendee Agent Id" = "Attendee Agent Id",
                                     "start_date" = "start_date")) %>%
  mutate(lost_hours = raw_work_hours - work_hours.y) %>%
  group_by(Description, edge_check) %>%
  summarize(total_lost = sum(lost_hours)) %>%
  ggplot(aes(x = reorder(Description, desc(total_lost)),
             y = total_lost, fill = edge_check)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        legend.position = "none") +
  geom_text(aes(label = round(total_lost, 0)), vjust = -0.3, hjust = 0.5, size = 3) +
  labs(title = "Meeting Time Loss by Type",
       x = "Meeting Type",
       y = "Lost Hours",
       fill = "Reason") +
  facet_wrap(~edge_check)
Training meetings are by far the largest source of agents working the queue outside of their scheduled time. Transition time is a distant second. We will now look to see which agents run into meeting times and which agents drop meetings early. For brevity, the following graph shows only lost hours greater than one.
actual_work_ %>%
  filter(edge_check != "normal") %>%
  left_join(plot_actual_work, by = c("Attendee Agent Id" = "Attendee Agent Id",
                                     "start_date" = "start_date")) %>%
  mutate(lost_hours = raw_work_hours - work_hours.y) %>%
  group_by(`Attendee Agent Id`, Description, edge_check) %>%
  summarize(total_lost = sum(lost_hours)) %>%
  filter(total_lost > 1) %>%
  ggplot(aes(x = reorder(`Attendee Agent Id`, desc(total_lost)),
             y = total_lost, fill = edge_check)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 60)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
        legend.position = "top",
        legend.justification = c(-0.02, 0)) +
  geom_text(aes(label = round(total_lost, 0)),
            vjust = -0.3, hjust = 0.5, size = 3) +
  labs(title = "Meeting Time Loss by Agent",
       x = "Agent",
       y = "Total Hours",
       fill = "Reason") +
  facet_wrap(~Description)
Here we see that a handful of agents account for the vast majority of the meeting time leakage: three agents make up 75 of the 79 leaked meeting hours. Lastly, we will determine the percentage of lost ADD hours attributable to individual agents.
last_plot <-
  actual_work_ %>%
  filter(edge_check != "normal") %>%
  left_join(plot_actual_work, by = c("Attendee Agent Id" = "Attendee Agent Id",
                                     "start_date" = "start_date")) %>%
  mutate(lost_hours = raw_work_hours - work_hours.y)
actual_work_ %>%
  filter(edge_check != "normal") %>%
  left_join(plot_actual_work, by = c("Attendee Agent Id" = "Attendee Agent Id",
                                     "start_date" = "start_date")) %>%
  mutate(lost_hours = raw_work_hours - work_hours.y) %>%
  group_by(`Attendee Agent Id`) %>%
  summarize(percent_lost = sum(lost_hours) / sum(last_plot$lost_hours,
                                                 na.rm = TRUE)) %>%
  filter(percent_lost >= 0.01) %>%
  ggplot(aes(x = reorder(`Attendee Agent Id`, desc(percent_lost)),
             y = percent_lost)) +
  # a constant fill colour belongs outside aes()
  geom_col(fill = "#F8766D") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
        legend.position = "none",
        legend.justification = c(-0.02, 0)) +
  geom_text(aes(label = round(percent_lost, 2)),
            vjust = -0.3, hjust = 0.5, size = 3) +
  labs(title = "Percent of All Meeting Time Lost by Agent",
       x = "Agent",
       y = "Percent")
A single agent is responsible for 41% of the total number of hours lost due to working during a meeting.
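As a cross-check of the figures quoted above, a small sketch reusing the last_plot frame tabulates the total lost hours per agent and each agent's share of the grand total; the top rows should roughly line up with the 75-of-79 and 41% figures.
# Total leaked hours per agent and each agent's share of the overall total.
last_plot %>%
  group_by(`Attendee Agent Id`) %>%
  summarize(lost_hours = sum(lost_hours, na.rm = TRUE)) %>%
  mutate(share = lost_hours / sum(lost_hours)) %>%
  arrange(desc(lost_hours)) %>%
  head()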