Call Centre Queue Simulator


Background

Discrete-Event Simulation

The simulations in this dataset are performed using the package simmer.

Intialize the R Session

Well need a few libaries for this to work

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(MASS)
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
library(simmer)
## Warning: package 'simmer' was built under R version 4.2.2
## 
## Attaching package: 'simmer'
## 
## The following object is masked from 'package:MASS':
## 
##     select
## 
## The following objects are masked from 'package:lubridate':
## 
##     now, rollback
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## The following object is masked from 'package:tidyr':
## 
##     separate

A random seed is set to ensure each run of this notebook will produce the same call centre results

set.seed(42)

Create a Data Frame

We want to simulate one year of call centre data. Each business day will get its own simulation as we vary call demand. A sequence of dates for 2021 is created, followed by some lubridate functions to extract different elements of the date - month, date, weekday. These elements will be used to vary demand through the year.

Finally, weekends are dopped from the data frame to only include business days. We wont brother dropping holidays from this data frame.

# Initialize the empty df 

df <- NULL 

# Generate a sequence of dates 

df$dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = 1)


df <- as.data.frame(df)

df %>% head(10)
##         dates
## 1  2021-01-01
## 2  2021-01-02
## 3  2021-01-03
## 4  2021-01-04
## 5  2021-01-05
## 6  2021-01-06
## 7  2021-01-07
## 8  2021-01-08
## 9  2021-01-09
## 10 2021-01-10
df <- df %>% 
  mutate(weekday_str = wday(dates, label = TRUE, abbr = FALSE),
         month_str = month(dates, label = TRUE, abbr = FALSE),
         day = mday(dates),
         year  = year(dates),
         weekday = wday(dates, label = FALSE),
         weeknum = week(dates),
         monthnum = month(dates, label = FALSE))


df <- df %>% filter(!weekday %in% c(7,1))

Within the elements of each date in columns, we can now calculate some coefficients for each date, adding some stochasticity, that will driven the demand (number of calls) in the simulations. As you will see in the next section, this coefficient will modify the Arrival Rate - i.e increasing the rate which customers call, thereby increasing the total vall volume in the simulation.

The mutations of the date elements were chosen somewhat arbitrarily, in a trial-and-error approach to arrive at a calls dataset that supports data visualization exercises. The objective here isnt to make a linear model of call volume by date, but rather to focus on how well the call centre performs, and creative ways you may find to visualize these data.

# Calculate coefficinets to modify call centre busy -ness

df <-
  df %>%
  mutate(month_coef = monthnum / 10,
         weeknum_coef = weeknum / 35,
         weekday_coef = weekday / 100,
         total_coef = month_coef  * (1 + weeknum_coef)  + weekday_coef)

# "Jitter" the coefficients - adds stochasticity to demand
for(i in 1:nrow(df)){
  df$total_coef[i] <- (df$total_coef[i] + rnorm(n = 1, mean = 0, sd = 0.05)) / 3
}

Plot the demand

Queue Simulator

The queue simulation takes in argument of Number of Agents, Arrival Rate, and Service Rate. Here we will run simulations with 4 agents. The service rate will be fixed at 5 (i.e the range service portion of each call is sampled from exponential distribution with an average of 5 minutes).

The arrival date will vary each day.The base arrival rate will start at 13.7 calls per hour. For each day the demand coeffient calculated above will be applied to the arrival and modify demand for that day.

Each day will be run for 600 minutes, corresponding 10 hours business day. A maximum of 500 calls will be simulated. At the end of each day, the results are appended to the main dataframe, which will be prepared for export

## Queue Model ####
agents <- 4
base_arrival_rate <- 13.7/60
service_rate <- 5

# Initialize a dataframe to store simulation results
df.calls <- data.frame(matrix(ncol = 8, nrow = 0))

# Loop through each date and simulate calls
for(i in 1:nrow(df)){
  
  inner.date <- df$dates[i]
  
  day_arrival_rate <- base_arrival_rate * (1 + df$total_coef[i])
  
  customer <-
    trajectory("Customer's path") %>%
    seize("counter") %>%
    timeout(function() {rexp(1, 1/service_rate)}) %>%
    release("counter")
  
  centre <-
    simmer("centre") %>%
    add_resource("counter", agents) %>%
    add_generator("Customer", customer, function() {c(0,rexp(500, day_arrival_rate), -1)})
  
  centre %>% run(until = 600)
  
  result <-
    centre %>%
    get_mon_arrivals() %>%
    transform(waiting_time = end_time - start_time - activity_time)
  
  result_attributes <-
    centre %>%
    get_mon_attributes()
  
  for(j in 1:nrow(result)){
    result$date[j] <- inner.date
  }
  
  # Append each day's results to the main dataframe
  df.calls <- rbind(df.calls, result)
}

Sanity Check

The simulation has now given us 261 business days of call centre data. We should perform some EDA of these data before proceeding with any analysis, just to check that we trust the simulatiosn make sense.

par(mar = c(5, 4, 2, 2) + 0.1)

x <- c('Customer', ' start_time', 'end_time', 'activity_time', 'finished', 'replication', 'waiting_time', 'date')

colnames(df.calls) <- x 

df.calls$dtes <- as.Date(df.calls$date, format = "%Y0%m-%d", origin = "1970-01-01")

df.calls %>% 
  group_by(date) %>%
  summarise(calls_per_days = n()) %>% 
  plot(main = "",
  xlab = "",
  ylab  = "Calls per day")

These volumes look like a reasonable outcome of our demand coefficients. Remember that the calling behaviour is also allowed to vary randomly, so different days will see more or fewer calls by chance.

Next, let’s see the waiting time for every call in the dataset.

Most of the calls are still answered immediately. Some calls wait in the queue, at most for just over 8 minutes. These waiting times skew the average wait up to 0.72 minutes or about 43 seconds.

Remember that these calls and queue times are a function of just the number of agents available, the arrival rate, and the service rate. The behaviour of this system resembles what we would see in a real call centre under similar conditions. What patterns do you see in this plot? Do these repeat in other days? Any ideas why some calls are answered immediately and some wait for several minutes?