Time On Queue

We are to determine at various levels of detail where the agents are spending time. In general, we want to know for each agent and date how much time was spent “on queue” and how much time was on appointments.

We will also look into errors, missing data, and quirks that may come from the inherent messiness of the real data, and try to either explain or account for those erroneous points.

Data Exploration

First, let us look at the data starting with the Shifts. Do not see any values explicitly listed as NA or duplicate (row-wise) on the import. Below we see the general shape of each data set.

shifts_ %>%
  summarise_all(~ sum(is.na(.)))
appt_ %>%
  summarise_all(~ sum(is.na(.)))
status_ %>%
  summarise_all(~ sum(is.na(.)))
info<- 
  as.data.frame(cbind(shifts_ %>% dim(),
                      appt_ %>% dim(), 
                      status_ %>% dim()),
                      row.names = c("rows", "columns"))

info %>% rename("shifts"=V1, "appointments"=V2, "status"=V3)
shifts appointments status
rows 1217 4110 30557
columns 3 4 5

The following are the number of unique agents and dates we have for each data set. Given the structure of the conclusion tab, it seems like there are some dates and agents that will be excluded. It seems reasonable that shifts and appointments will be scheduled much further in advance than the day-to-day log in/out of answering calls, which may account for the dimension differences.

info_2<- 
  data.frame(x = shifts_ %>% distinct(`Agent Id`) %>% count(),
             y = shifts_ %>% distinct(as.Date(`Start Time`, tz="UTC")) %>%count()) %>%
  rename("unique_agents"=n, "unique_dates"=n.1)
  
    
appt_info<- 
  data.frame(x=appt_ %>% distinct(`Attendee Agent Id`) %>% count(),
             y = appt_ %>% distinct(as.Date(`StartTime1`, tz="UTC")) %>%count()) %>%
  rename("unique_agents"=n, "unique_dates"=n.1)

stat_info<-
  data.frame(x=status_ %>% distinct(`Agent Id`) %>% count(),
             y = status_ %>% distinct(as.Date(`Interval Start Date Time`,
                                             tz="UTC")) %>% count()) %>%
  rename("unique_agents"=n, "unique_dates"=n.1)

info_2<- rbind(info_2, appt_info, stat_info)
row.names(info_2) <- c("shifts", "appointments", "status")

info_2
unique_agents unique_dates
shifts 30 74
appointments 31 143
status 69 51

On Shift Hours

For the first task we are to determine the number of on shift hours per agent per day. Let us first make sure our times have imported correctly and check if the length of shift makes sense.

shifts_ %>%
  mutate(shift_hours = as.double(`End Time`- `Start Time`, 
                                 units="hours")) %>%
  group_by(shift_hours)%>% 
  summarize(count = n())
shift_hours count
1 2
8 15
9 1200

The vast majority of shifts are nine hours in length. No one would be scheduled for a one hour shift.

appt_check<-
  appt_ %>%
  mutate(start_date = as.Date(StartTime1, tz="UTC")) 

shift_check<-
  shifts_ %>%
  mutate(start_date = as.Date(`Start Time`, tz="UTC")) %>%
  mutate(shift_hours = as.double(`End Time`- `Start Time`,
                                 units = "hours"))
shift_check %>%
  left_join(appt_check, by=c("Agent Id"="Attendee Agent Id",
                       "start_date"="start_date"))  %>%
  filter(shift_hours ==1 | Description == "Ill/Sick", start_date=="2019-01-14")
Agent Id Start Time End Time start_date shift_hours Description StartTime1 EndTime1
13567703 2019-01-14 19:00:00 2019-01-14 20:00:00 2019-01-14 1 Transition 2019-01-14 18:45:00 2019-01-14 19:00:00
13567703 2019-01-14 19:00:00 2019-01-14 20:00:00 2019-01-14 1 Lunch 2019-01-14 19:00:00 2019-01-14 20:00:00
17460544 2019-01-14 19:00:00 2019-01-14 20:00:00 2019-01-14 1 Transition 2019-01-14 19:15:00 2019-01-14 19:30:00
17460544 2019-01-14 19:00:00 2019-01-14 20:00:00 2019-01-14 1 Lunch 2019-01-14 19:30:00 2019-01-14 20:30:00
21586554 2019-01-14 13:00:00 2019-01-14 22:00:00 2019-01-14 9 Ill/Sick 2019-01-14 05:00:00 2019-01-15 05:00:00

The two shifts that are one hour in length, appear to be because someone called in sick as shown above.

Shift Integrity

Let’s check if the shifts are correctly ordered. As in, are there any shifts that end before they start. We will also check if any shift spans multiple days or doubles up on a day.

shifts<-
  shifts_ %>%
  mutate(shift_hours = as.double(`End Time`- `Start Time`,
                                 units = "hours")) %>%
  mutate(start_date = as.Date(`Start Time`, tz="UTC"), 
         end_date = as.Date(`End Time`, tz="UTC")) 


shifts %>%
  filter(start_date <= end_date) %>%
  dim()
## [1] 1217    6
# for wrongly labeled start and end time
# filter(start_date != end_date, end_date != start_date) 

Returning the same dimensions as the complete data set means that the shifts are correctly ordered. There are no shifts that ended before they started. We also have that shifts end on the same day except for 123 shifts that ended at exactly midnight. This means we can simply focus on the start date and compare total times across each day, since no shift dates will overlap with appointment dates.

Structure Shift Data

As stated, we want the per agent per day number of on-shift hours. From the conclusion tab, it seems like we are to filter out any days beyond 02/19/2019.

shifts<-
  shifts_ %>%
  mutate(shift_hours = as.double(`End Time`- `Start Time`,
                                 units = "hours")) %>%
  mutate(start_date = as.Date(`Start Time`, tz="UTC"), 
         end_date = as.Date(`End Time`, tz="UTC")) %>%
  filter(start_date != (weekdays(as.Date(`Start Time`))
                          %in% c('Saturday','Sunday')))

shifts<-
  shifts %>%
  filter(start_date <= "2019-02-19") %>%
  group_by(`Agent Id`, start_date, shift_hours) %>%
  summarize(count = n()) %>%
  select(!count)


shifts %>%
  head()
Agent Id start_date shift_hours
13216880 2019-01-01 9
13216880 2019-01-02 9
13216880 2019-01-03 9
13216880 2019-01-04 9
13216880 2019-01-07 9
13216880 2019-01-08 9

Task 1

Now we can pivot the data to a wider format for the desired output. Days with no shift hours were filled as zeros.

task_1<-
  shifts %>%
  filter(start_date <= "2019-02-19") %>%
  pivot_wider(id_cols = `Agent Id`,
              names_from = start_date,
              values_from = shift_hours,
              values_fill = 0)

task_1[0:5,0:5]
Agent Id 2019-01-01 2019-01-02 2019-01-03 2019-01-04
13216880 9 9 9 9
13567703 9 0 0 0
13567727 9 9 9 9
14294766 9 9 9 9
14295138 9 9 9 9
write_excel_csv(task_1, "task_1_shifts.csv")

Appointments

Since no shifts span multiple days, the appointment can be any length. Additionally, since they are counted by individual days, the sum can be larger than 24 hours (someone can be on vacation and have another appointment), without affecting appointments for neighboring days. As such, we can structure the data similarly to shifts and calculate fields based on days and the sum of the appointment hours. A subset of the raw data looks as follows.

appt<-
  appt_ %>%
  mutate(appt_hours = as.double(EndTime1- StartTime1,
                                 units = "hours")) %>%
  mutate(start_date_appt = as.Date(StartTime1, tz="UTC"), 
         end_date_appt = as.Date(EndTime1, tz="UTC")) %>%
  filter(start_date_appt != (weekdays(as.Date(`StartTime1`))
                          %in% c('Saturday','Sunday')))


appt %>% 
  select(`Attendee Agent Id`, 
         start_date_appt,
         appt_hours, 
         Description) %>%
  head()
Attendee Agent Id start_date_appt appt_hours Description
16136582 2019-01-01 24 Holiday
18778054 2019-01-01 24 Holiday
13216880 2019-01-01 24 Holiday
13567703 2019-01-01 24 Holiday
13567727 2019-01-01 24 Holiday
14294766 2019-01-01 24 Holiday

Appointment Integrity

There are no agents scheduled for meetings while not on-shift.

shifts_ %>%
  mutate(start_date = as.Date(`Start Time`, tz="UTC")) %>%
  mutate(shift_hours = as.double(`End Time`- `Start Time`,
                                 units = "hours")) %>%
  left_join(appt, by=c("Agent Id"="Attendee Agent Id",
                       "start_date"="start_date_appt")) %>%
  filter(shift_hours == 0, appt_hours > 0)

Structure Appointment Data

We can re-frame this data into the per agent per day format with the sum of the appointment hours.

appt<-
  appt %>% 
  group_by(`Attendee Agent Id`, start_date_appt) %>%
  summarize(appt_hours =  sum(appt_hours)) 
appt %>% head()
Attendee Agent Id start_date_appt appt_hours
13216880 2019-01-01 24.00
13216880 2019-01-02 1.50
13216880 2019-01-03 4.50
13216880 2019-01-04 2.25
13216880 2019-01-07 5.00
13216880 2019-01-08 5.75

On Queue Hours

Now that both shifts and appointments are structured the same way, we can join the data together and calculate the difference between shift hours and appointment hours. This gives us our desired On Queue hours.

Negative On Queue Hours: These were forced to a value of zero. If an agent is fully booked with appointments, then that agent is expected to be on queue for zero hours. Below are the first six rows of data.

on_queue<-
  shifts %>%
  left_join(appt, by = c("Agent Id" = "Attendee Agent Id", 
                         "start_date"= "start_date_appt")) %>%
  mutate(on_queue_hours = shift_hours - appt_hours) %>%
  mutate(on_queue_hours = case_when(
                  on_queue_hours< 0 ~ 0,
                  TRUE ~ on_queue_hours)) 
on_queue %>%
  head()
Agent Id start_date shift_hours appt_hours on_queue_hours
13216880 2019-01-01 9 24.00 0.00
13216880 2019-01-02 9 1.50 7.50
13216880 2019-01-03 9 4.50 4.50
13216880 2019-01-04 9 2.25 6.75
13216880 2019-01-07 9 5.00 4.00
13216880 2019-01-08 9 5.75 3.25

Task 2

Again, we can now pivot the data to the desired format.

task_2<- 
  on_queue %>%
  filter(start_date<="2019-02-19") %>%
  pivot_wider(id_cols = `Agent Id`,
              names_from = start_date,
              values_from = on_queue_hours,
              values_fill = 0)
task_2[0:5, 0:5]
Agent Id 2019-01-01 2019-01-02 2019-01-03 2019-01-04
13216880 0 7.50 4.50 6.75
13567703 0 0.00 0.00 0.00
13567727 0 0.00 0.00 0.00
14294766 0 7.75 6.75 7.25
14295138 0 1.25 2.25 1.25
write_excel_csv(task_2, "task_2_queue_hours.csv")

Status

On queue and fielding calls. To log hours as ADD the agent needs to be on shift and not have a meeting. This has a few ramifications;

  1. Calls that extend beyond the start of a meeting will be truncated to the meeting start time.
  2. Switching to ADD status while in a meeting should not count towards ADD hours.
  3. No working during a day off/sick day.
  4. ADD hours only count before/after a meeting (even if the call started before the end of the meeting).

We will investigate the possibility of the above situations while calculating the total hours working the queue.

Structure Status Data

The data is precise to the second and is a simple time tracker for logging in and out of ADD status.

appt_<-
  appt_ %>%
  mutate(start_date = as.Date(StartTime1, tz="UTC")) 

status<- 
  status_ %>%
  mutate(work_date = as.Date(`Interval Start Date Time`, tz="UTC"))

status_[0:5, 0:5]
Agent Id Interval Start Date Time Interval End Date Time Agent Status Id Agent Status Name
13216880 2019-01-02 14:14:01 2019-01-02 14:14:07 1 ADD
13216880 2019-01-02 14:14:07 2019-01-02 14:16:32 3 HOLD
13216880 2019-01-02 14:16:32 2019-01-02 14:28:59 1 ADD
13216880 2019-01-02 14:28:59 2019-01-02 14:31:45 3 HOLD
13216880 2019-01-02 14:31:45 2019-01-02 14:32:29 1 ADD

We will join the appointment data to the status data (left join) and create some calculated fields based on how ADD hours might leak from meeting times. Our data now looks as follows, noting the truncation of the interval times to preserve the meetings.

appt_stat<-
  appt_ %>%
  left_join(status, by=c("Attendee Agent Id"="Agent Id", 
                         "start_date"="work_date"))

actual_work_<-
  appt_stat %>%
  mutate(
    edge_check = case_when(
              `Interval Start Date Time`< StartTime1 & 
                `Interval End Date Time` > StartTime1 ~ "call extended",
              `Interval Start Date Time` < EndTime1  &
                `Interval Start Date Time` > StartTime1 ~ "meeting drop",
              `Interval Start Date Time` >= StartTime1 &
                `Interval End Date Time` <= EndTime1 ~ "call during meeting",
              Description %in% c("Holiday", "Time Off Day", "Ill/Sick") & 
                `Agent Status Id` == 1 ~ "work while off",
              TRUE ~ "normal"
                          )
          ) %>%
  mutate(
    `Interval Start Date Time` = case_when(
                          edge_check == "meeting drop" ~ EndTime1,
                          edge_check == "work while off" ~ as.POSIXct(NA),
                          edge_check == "call during meeting" ~ as.POSIXct(NA),
                          TRUE ~ `Interval Start Date Time`),
    `Interval End Date Time` = case_when(
                          edge_check == "call extended" ~ StartTime1,
                          edge_check == "work while off" ~ as.POSIXct(NA),
                          edge_check == "call during meeting" ~ as.POSIXct(NA),
                          TRUE ~ `Interval End Date Time`
                          ),
    work_hours = as.double(`Interval End Date Time` - `Interval Start Date Time`,
                           units="hours"),
    work_hours = case_when(work_hours <= 0 ~ 0, TRUE ~ work_hours)
         ) %>%
  filter(`Agent Status Id` == 1) 
actual_work_ %>% 
  select(c("Interval Start Date Time", "Interval End Date Time",
           "edge_check", "work_hours")) %>%
  head()
Interval Start Date Time Interval End Date Time edge_check work_hours
2019-01-01 14:07:08 2019-01-01 15:38:58 normal 1.5305556
2019-01-01 15:45:33 2019-01-01 15:45:35 normal 0.0005556
2019-01-01 15:52:43 2019-01-01 16:00:00 call extended 0.1213889
2019-01-01 18:37:44 2019-01-01 21:29:49 normal 2.8680556
2019-01-01 21:32:20 2019-01-01 21:57:02 normal 0.4116667
2019-01-01 13:00:57 2019-01-01 15:09:42 normal 2.1458333

Since the status data will repeat itself for each meeting, we need to calculate the number of meetings each agent has each day and divide their working hours by the number of meetings, and take the distinct per agent per day working hours.

appt_count<- 
  appt_ %>% 
  group_by(`Attendee Agent Id`, start_date) %>%
  summarise(appt_count = n())

actual_work<-
  actual_work_ %>%
  inner_join(appt_count, by=c("Attendee Agent Id"="Attendee Agent Id", 
                         "start_date"="start_date")) %>%
  group_by(`Attendee Agent Id`, start_date) %>%
  summarize(work_hours = sum(work_hours)/appt_count) %>%
  distinct() 
  
  
actual_work %>%
  head()
Attendee Agent Id start_date work_hours
13216880 2019-01-02 5.563333
13216880 2019-01-03 2.899722
13216880 2019-01-04 4.448333
13216880 2019-01-07 3.431250
13216880 2019-01-08 2.646722
13216880 2019-01-10 4.638611

Task 3

We can now select and pivot the data to the desired output. Non-working hours are left as NA. This is intentional as zero working hours and not expected to work are two different states and should be treated as such. Additionally, one agent had either no on queue hours or had ADD hours truncated due to work during meetings. That agent row was omitted for the final output.

task_3<-
  actual_work %>%
  na_if(0) %>%
  na.omit() %>%
  filter(start_date != (weekdays(as.Date(start_date))
                           %in% c('Saturday','Sunday'))) %>%
  arrange(`Attendee Agent Id`) %>%
  pivot_wider(id_cols = `Attendee Agent Id`,
              names_from = start_date,
              values_from = work_hours) 
  
  


task_3[0:5, 0:5]
Attendee Agent Id 2019-01-02 2019-01-03 2019-01-04 2019-01-07
13216880 5.563333 2.899722 4.4483333 3.431250
13567703 NA NA NA NA
13567727 NA NA NA NA
14294766 6.257222 5.917153 6.1904630 NA
14295138 1.067222 0.741875 0.3641667 1.974444
write_excel_csv(task_3, "task_3_working_hours.csv")

Meeting Time Leakage

Time leakage is when an agent is working the queue when they should not. Who is leaking meeting time? What are the top meetings that get eaten by extended calls or early drop outs?

plot_status<- 
  status %>%
  filter(`Agent Status Id` == 1) %>%
  mutate(raw_work_hours = 
              as.double(`Interval End Date Time` - `Interval Start Date Time`,
              units="hours")) %>%
  group_by(`Agent Id`, work_date) %>%
  summarise(raw_work_hours = sum(raw_work_hours))
            
plot_actual_work<-
  actual_work %>%
  left_join(plot_status, 
            by=c("Attendee Agent Id" = "Agent Id",
                "start_date"= "work_date"))
actual_work_ %>%
  filter(edge_check != "normal",
         edge_check != "work while off") %>%
  left_join(plot_actual_work, by=c("Attendee Agent Id" = "Attendee Agent Id",
                "start_date"= "start_date")) %>%
  mutate(lost_hours = raw_work_hours - work_hours.y) %>%
  group_by(Description, edge_check) %>%
  summarize(total_lost = sum(lost_hours)) %>%
  ggplot(aes(x=reorder(Description, desc(total_lost)),
             y=total_lost, fill=edge_check)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        legend.position = "none") +
  geom_text(aes(label=round(total_lost,0)), vjust=-0.3, hjust=0.5, size=3) +
  labs(title="Meeting Time Loss by Type",
       x="Meeting Type",
       y="Lost Hours",
       fill="Reason") +
  facet_wrap(~edge_check)

Training meetings are by far the largest source of agents working the queue outside of their scheduled time. Transition time is a distant second. We will now look to see which agents run into meeting times and which agents drop meetings early. For brevity, the following graph shows only lost hours greater than one.

actual_work_ %>%
  filter(edge_check != "normal") %>%
  left_join(plot_actual_work, by=c("Attendee Agent Id" = "Attendee Agent Id",
                "start_date"= "start_date")) %>%
  mutate(lost_hours = raw_work_hours - work_hours.y) %>%
  group_by(`Attendee Agent Id`, Description, edge_check) %>%
  summarize(total_lost = sum(lost_hours)) %>%
  filter(total_lost > 1) %>%
  ggplot(aes(x=reorder(`Attendee Agent Id`, desc(total_lost)),
             y=total_lost, fill=edge_check)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 60)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), 
        legend.position = "top",
        legend.justification = c(-0.02,0)) +
  geom_text(aes(label=round(total_lost,0)), 
            vjust=-0.3, hjust=0.5, size=3) +
  labs(title="Meeting Time Loss by Agent",
       x="Agent",
       y="Total Hours",
       fill="Reason") +
  facet_wrap(~Description)

Here we see a handful of people account for the vast majority of the meeting time leakage. Three agents make up 75 out of 79 leaked meeting hours. Lastly, we will determine the percentage of lost ADD hours attributable to individual agents.

last_plot<- 
  actual_work_ %>%
  filter(edge_check != "normal") %>%
  left_join(plot_actual_work, by=c("Attendee Agent Id" = "Attendee Agent Id",
                "start_date"= "start_date")) %>%
  mutate(lost_hours = raw_work_hours - work_hours.y)

actual_work_ %>%
  filter(edge_check != "normal") %>%
  left_join(plot_actual_work, by=c("Attendee Agent Id" = "Attendee Agent Id",
                "start_date"= "start_date")) %>%
  mutate(lost_hours = raw_work_hours - work_hours.y) %>%
  group_by(`Attendee Agent Id`) %>%
  summarize(percent_lost = sum(lost_hours)/sum(last_plot$lost_hours,
                   na.rm = TRUE)) %>%
  filter(percent_lost >= 0.01) %>%
  ggplot(aes(x=reorder(`Attendee Agent Id`, desc(percent_lost)),
             y=percent_lost, fill="#F8766D")) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none",
        legend.justification = c(-0.02,0)) +
  geom_text(aes(label=round(percent_lost,2)),
            vjust=-0.3, hjust=0.5, size=3) +
  labs(title="Percent of All Meeting Time Lost by Agent",
       x="Agent",
       y="Percent")

A single agent is responsible for 41% of the total number of hours lost due to working during a meeting.