Call Centre Queue Simulator
Background
This notebook sets up a hypothetical call centre analysis with simulated data. Imgaine a business whose lines are open from 8:00 am to 6:00 pm, Monday to Friday. 4 agents are on duty at any given time and each call takes an average of 5 minutes to resolve
The call centre manager is required to meet a performance target: 90% of calls must be answered within 1 minute. Lately, the performance has slipped. As the data analytic expert, you have been brought in to analyze their performance and make recommendations to return the centre back to its target.
The dataset records timestamps for when a call was placed, when it was answered, and when the call was completed. The total waiting and service times are calculated, as well as a logical for whether the call was answered within the performance standard.
Discrete-Event Simulation
Discrete-Event Simulation allows us to model real calling behaviour with a few simple variables.
Arrival Rate
Service Rate
Number of Agents
The simulations in this dataset are performed using the package simmer.
Well need a few libaries for this to work
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(simmer)
## Warning: package 'simmer' was built under R version 4.2.2
##
## Attaching package: 'simmer'
##
## The following object is masked from 'package:MASS':
##
## select
##
## The following objects are masked from 'package:lubridate':
##
## now, rollback
##
## The following object is masked from 'package:dplyr':
##
## select
##
## The following object is masked from 'package:tidyr':
##
## separate
A random seed is set to ensure each run of this notebook will produce the same call centre results
set.seed(42)
We want to simulate one year of call centre data. Each business day will get its own simulation as we vary call demand. A sequence of dates for 2021 is created, followed by some lubridate functions to extract different elements of the date - month, date, weekday. These elements will be used to vary demand through the year.
Finally, weekends are dopped from the data frame to only include business days. We wont brother dropping holidays from this data frame.
# Initialize the empty df
df <- NULL
# Generate a sequence of dates
df$dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = 1)
df <- as.data.frame(df)
df %>% head(10)
## dates
## 1 2021-01-01
## 2 2021-01-02
## 3 2021-01-03
## 4 2021-01-04
## 5 2021-01-05
## 6 2021-01-06
## 7 2021-01-07
## 8 2021-01-08
## 9 2021-01-09
## 10 2021-01-10
df <- df %>%
mutate(weekday_str = wday(dates, label = TRUE, abbr = FALSE),
month_str = month(dates, label = TRUE, abbr = FALSE),
day = mday(dates),
year = year(dates),
weekday = wday(dates, label = FALSE),
weeknum = week(dates),
monthnum = month(dates, label = FALSE))
df <- df %>% filter(!weekday %in% c(7,1))
Within the elements of each date in columns, we can now calculate some coefficients for each date, adding some stochasticity, that will driven the demand (number of calls) in the simulations. As you will see in the next section, this coefficient will modify the Arrival Rate - i.e increasing the rate which customers call, thereby increasing the total vall volume in the simulation.
The mutations of the date elements were chosen somewhat arbitrarily, in a trial-and-error approach to arrive at a calls dataset that supports data visualization exercises. The objective here isnt to make a linear model of call volume by date, but rather to focus on how well the call centre performs, and creative ways you may find to visualize these data.
# Calculate coefficinets to modify call centre busy -ness
df <-
df %>%
mutate(month_coef = monthnum / 10,
weeknum_coef = weeknum / 35,
weekday_coef = weekday / 100,
total_coef = month_coef * (1 + weeknum_coef) + weekday_coef)
# "Jitter" the coefficients - adds stochasticity to demand
for(i in 1:nrow(df)){
df$total_coef[i] <- (df$total_coef[i] + rnorm(n = 1, mean = 0, sd = 0.05)) / 3
}
Plot the demand
The queue simulation takes in argument of Number of Agents, Arrival Rate, and Service Rate. Here we will run simulations with 4 agents. The service rate will be fixed at 5 (i.e the range service portion of each call is sampled from exponential distribution with an average of 5 minutes).
The arrival date will vary each day.The base arrival rate will start at 13.7 calls per hour. For each day the demand coeffient calculated above will be applied to the arrival and modify demand for that day.
Each day will be run for 600 minutes, corresponding 10 hours business day. A maximum of 500 calls will be simulated. At the end of each day, the results are appended to the main dataframe, which will be prepared for export
## Queue Model ####
agents <- 4
base_arrival_rate <- 13.7/60
service_rate <- 5
# Initialize a dataframe to store simulation results
df.calls <- data.frame(matrix(ncol = 8, nrow = 0))
# Loop through each date and simulate calls
for(i in 1:nrow(df)){
inner.date <- df$dates[i]
day_arrival_rate <- base_arrival_rate * (1 + df$total_coef[i])
customer <-
trajectory("Customer's path") %>%
seize("counter") %>%
timeout(function() {rexp(1, 1/service_rate)}) %>%
release("counter")
centre <-
simmer("centre") %>%
add_resource("counter", agents) %>%
add_generator("Customer", customer, function() {c(0,rexp(500, day_arrival_rate), -1)})
centre %>% run(until = 600)
result <-
centre %>%
get_mon_arrivals() %>%
transform(waiting_time = end_time - start_time - activity_time)
result_attributes <-
centre %>%
get_mon_attributes()
for(j in 1:nrow(result)){
result$date[j] <- inner.date
}
# Append each day's results to the main dataframe
df.calls <- rbind(df.calls, result)
}
The simulation has now given us 261 business days of call centre data. We should perform some EDA of these data before proceeding with any analysis, just to check that we trust the simulatiosn make sense.
par(mar = c(5, 4, 2, 2) + 0.1)
x <- c('Customer', ' start_time', 'end_time', 'activity_time', 'finished', 'replication', 'waiting_time', 'date')
colnames(df.calls) <- x
df.calls$dtes <- as.Date(df.calls$date, format = "%Y0%m-%d", origin = "1970-01-01")
df.calls %>%
group_by(date) %>%
summarise(calls_per_days = n()) %>%
plot(main = "",
xlab = "",
ylab = "Calls per day")
These volumes look like a reasonable outcome of our demand coefficients. Remember that the calling behaviour is also allowed to vary randomly, so different days will see more or fewer calls by chance.
Next, let’s see the waiting time for every call in the dataset.
Most of the calls are still answered immediately. Some calls wait in the queue, at most for just over 8 minutes. These waiting times skew the average wait up to 0.72 minutes or about 43 seconds.
Remember that these calls and queue times are a function of just the number of agents available, the arrival rate, and the service rate. The behaviour of this system resembles what we would see in a real call centre under similar conditions. What patterns do you see in this plot? Do these repeat in other days? Any ideas why some calls are answered immediately and some wait for several minutes?