1 Results

1.1 Customer Segmentation

Customer segmentation: Identify the group customers that may stop purchasing from the company and identify the group of customers that may be suitable for additional campaigns so they increase the amount of money spent on the products.

Let’s see how do we do it!

1.1.1 Loading the data and functions

To do that we load the necessary functions and data.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)

source(file = "Scripts/Customer Segmentation.R")

customer <- read_csv(  file = "data/Datasets for DS - Customer segmentation.csv")

1.1.2 Calling the functions

And then we call the necessary functions

First, we prepare the data with pre_process_customers_table. Then we find the customers that may stop purchasing from the company with leaving_clients().

And finally, we find the customers that may be suitable for additional campaigns so they increase the amount of money spent on the products with top_clients().

customer_time <- pre_process_customers_table(customer)

leaving_clients(customer_time)
## # A tibble: 1,089 x 4
##    CustomerID last_trans          transactions `last(Country)`
##         <dbl> <dttm>                     <int> <chr>          
##  1      13113 2011-12-09 12:49:00           40 United Kingdom 
##  2      15804 2011-12-09 12:31:00           19 United Kingdom 
##  3      13777 2011-12-09 12:25:00           41 United Kingdom 
##  4      17581 2011-12-09 12:21:00           31 United Kingdom 
##  5      12748 2011-12-09 12:20:00          225 United Kingdom 
##  6      16705 2011-12-09 12:08:00           29 United Kingdom 
##  7      15311 2011-12-09 12:00:00          118 United Kingdom 
##  8      12662 2011-12-09 11:59:00           12 Germany        
##  9      17315 2011-12-09 11:58:00           43 United Kingdom 
## 10      16626 2011-12-09 11:56:00           20 United Kingdom 
## # ... with 1,079 more rows
top_clients(customer_time)
## # A tibble: 100 x 3
##    CustomerID amount_spend frequency
##         <dbl>        <dbl>     <int>
##  1      14646      279489.        77
##  2      18102      256438.        62
##  3      17450      187482.        55
##  4      14911      132573.       248
##  5      12415      123725.        26
##  6      14156      113384.        66
##  7      17511       88125.        46
##  8      16684       65892.        31
##  9      13694       62653.        60
## 10      15311       59419.       118
## # ... with 90 more rows

Those functions also generate in the folder we are working in the folder results_clients with a CSV with all the results.

If I had access to a table with contact information I would append it to the output of top_clients and leaving_clients so we could see the contact information that we need, for example, e-mail, telephone or address.

1.2 Agent Performance

Based on the 3 datasets provided, identify the agents that are the lowest performers and the areas where they need support and the top performers and the areas where they excel.

To solve this problem I chose to use the CRM table because I feel that the most relevant metrics to judge effectiveness where the hard numbers leads taken and, so we have a ladder for the ratio of deals closed to leads taken I like to call it effectivity ladder, and another for deals closed that I call deals ladder.

By merging the top and bottom of those ladders I select who are the best/worst performers.

Now let’s see how it goes!

1.2.1 CRM Table and Results

1.2.1.1 Loading the data and functions

As before we start loading the data (CRM table) and the necessary functions.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)

source(file = "Scripts/Employee performance_CRM.R")

classification_df <- read_csv2(file = "data/CRM_employee.csv")

1.2.1.2 Calling the functions

And then we use Performance_function that puts in a folder named results_CRM the files: top_dealer, worst_dealers who shows who are historically the best/worst dealers. most_efficient, less_efficient which represent who has de best/worst probability to transform a lead into a deal.

Performance_function(classification_df)
## $most_efficient
## # A tibble: 15 x 4
##    Name               avg_request_d closing_rate avg_deals_day_rating
##    <chr>                      <dbl>        <dbl>                <dbl>
##  1 Loan Tomita                 18.4         35.2                 6.49
##  2 Leeanna Elza                18.4         29.2                 5.36
##  3 Shelby Schmitmeyer          17.7         27.3                 4.82
##  4 Alfonso Cheshire            22.8         21.0                 4.79
##  5 Tonette Dammann             14.6         32.6                 4.76
##  6 Ashly Vaught                18.8         24.0                 4.51
##  7 Lieselotte Riemer           15.6         28.8                 4.48
##  8 Stasia Koski                15.1         29.3                 4.44
##  9 Ricarda Claywell            14.9         29.4                 4.40
## 10 Edie Fitzwater              14.4         26.8                 3.85
## 11 Wade Garten                 17.2         21.6                 3.73
## 12 Stasia Ecton                13.6         24.2                 3.30
## 13 Maisie Mcpherson            13.3         24.4                 3.24
## 14 Caitlin Coffey              14.2         22.3                 3.15
## 15 Nubia Ordway                13.4         21.6                 2.90
## 
## $less_efficient
## # A tibble: 10 x 4
##    Name               avg_request_d closing_rate avg_deals_day_rating
##    <chr>                      <dbl>        <dbl>                <dbl>
##  1 Voncile Grimaldi            7.02         6.27               0.440 
##  2 Noel Villicana              6.81         6.31               0.430 
##  3 Kathyrn Douglas             4.62         6.71               0.310 
##  4 Dia Roux                    4.98         6.02               0.300 
##  5 Rolanda Bombardier          4.65         6.02               0.280 
##  6 Raymundo Veltri             5.02         3.98               0.200 
##  7 Sage Clodfelter             3.71         5.12               0.190 
##  8 Xuan Shupp                  2.48         6.45               0.160 
##  9 Maurita Christina           5.61         2.67               0.150 
## 10 Francine Gulotta            6.06         1.32               0.0800
## 
## $top_dealers
## # A tibble: 15 x 5
##    Name          Deals_signed avg_request_d closing_rate avg_deals_day_rat~
##    <chr>                <dbl>         <dbl>        <dbl>              <dbl>
##  1 Loan Tomita            714          18.4         35.2               6.49
##  2 Leeanna Elza           552          18.4         29.2               5.36
##  3 Alfonso Ches~          536          22.8         21.0               4.79
##  4 Shelby Schmi~          525          17.7         27.3               4.82
##  5 Tonette Damm~          514          14.6         32.6               4.76
##  6 Ricarda Clay~          488          14.9         29.4               4.40
##  7 Stasia Koski           480          15.1         29.3               4.44
##  8 Lieselotte R~          475          15.6         28.8               4.48
##  9 Ashly Vaught           469          18.8         24.0               4.51
## 10 Edie Fitzwat~          412          14.4         26.8               3.85
## 11 Wade Garten            395          17.2         21.6               3.73
## 12 Maisie Mcphe~          347          13.3         24.4               3.24
## 13 Caitlin Coff~          346          14.2         22.3               3.15
## 14 Stasia Ecton           346          13.6         24.2               3.30
## 15 Nubia Ordway           313          13.4         21.6               2.90
## 
## $bottom_dealers
## # A tibble: 10 x 5
##    Name          Deals_signed avg_request_d closing_rate avg_deals_day_rat~
##    <chr>                <dbl>         <dbl>        <dbl>              <dbl>
##  1 Noel Villica~           45          6.81         6.31             0.430 
##  2 Voncile Grim~           42          7.02         6.27             0.440 
##  3 Kathyrn Doug~           28          4.62         6.71             0.310 
##  4 Rolanda Bomb~           27          4.65         6.02             0.280 
##  5 Dia Roux                18          4.98         6.02             0.300 
##  6 Xuan Shupp              11          2.48         6.45             0.160 
##  7 Maurita Chri~           10          5.61         2.67             0.150 
##  8 Raymundo Vel~            9          5.02         3.98             0.200 
##  9 Sage Clodfel~            9          3.71         5.12             0.190 
## 10 Francine Gul~            5          6.06         1.32             0.0800

But the fact that I believe that the best table to determine who is the best/worse agent is the CRM does not mean that the other tables are useless or uninteresting.

So I made specific functions to analyze each one of them.

1.2.2 Audits Table

For the audits table, we have a Customer Opinion Ladder where we have ranked all our agents who have more than X reviews (the default number of minimum reviews is 10) by the opinion of our customers.

And now let me show you how to get that data!

1.2.2.1 Loading the data and functions

As always we start with loading the table and the functions we are going to need

if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)

source(file = "Scripts/Audits.R")

audits <- read_csv2(file = "data/Audits_Employee_performance.csv")

1.2.2.2 Calling the functions

And then we call the necesary function customer_opinion_ladder() that generated the folder results_audits where we store the CSV file for the ladder.

customer_opinion_ladder(audits)
## # A tibble: 154 x 2
##    Employee          average_score
##    <chr>                     <dbl>
##  1 Christel Selvidge          3.98
##  2 Eugene Coltharp            3.98
##  3 Michell Meek               3.97
##  4 Haydee Maestas             3.93
##  5 Christie Rohr              3.90
##  6 Billye Pittman             3.90
##  7 Henry Neuberger            3.89
##  8 Rayna Harkless             3.86
##  9 Holley Pitzer              3.86
## 10 Lou Wilkerson              3.85
## # ... with 144 more rows

1.2.3 Phone Table

For the phone table, we have several ladders the ones I like the most are the Break_time_ladder ranking the time spend on breaks and the phone_time_ladder ranking the time spend on the phone.

And now let me show you how to get that ladders!

1.2.3.1 Loading the data and functions

As always we start with loading the table and the functions we are going to need

if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)

source(file = "Scripts/telephone.R")

telephone <- read_csv2(file = "data/Telephone_Employee_performance.csv")

1.2.3.2 Calling the functions

And then we call the necessary function interesting_ladders() that generated the folder results_telephone where we store the CSV files for the ladders.

interesting_ladders(telephone)
## $phone_time_ladder
## # A tibble: 227 x 2
##    User              `Time at the Phone`             
##    <chr>             <chr>                           
##  1 Adalberto Byrd    24 days 13 hours 39 mins 27 secs
##  2 Armando Magoon    23 days 20 hours 23 mins 44 secs
##  3 Beula Luse        23 days 16 hours 58 mins 48 secs
##  4 Eugene Coltharp   23 days 16 hours 51 mins 00 secs
##  5 Ricarda Claywell  23 days 13 hours 47 mins 01 secs
##  6 Rosenda Duby      23 days 06 hours 04 mins 00 secs
##  7 Opal Vose         22 days 19 hours 37 mins 35 secs
##  8 Fernande Pounders 22 days 16 hours 51 mins 16 secs
##  9 Jacquie Mansfield 22 days 16 hours 49 mins 30 secs
## 10 Nancy Semmes      22 days 09 hours 20 mins 43 secs
## # ... with 217 more rows
## 
## $Backoffice_time_ladder
## # A tibble: 227 x 2
##    User                `Time at the Backoffice`        
##    <chr>               <chr>                           
##  1 Amira Mcniel        13 days 02 hours 53 mins 37 secs
##  2 Yon Ogden           12 days 16 hours 46 mins 02 secs
##  3 Edie Fitzwater      09 days 04 hours 50 mins 01 secs
##  4 Heath Hendriks      08 days 12 hours 00 mins 41 secs
##  5 Noel Villicana      08 days 10 hours 04 mins 53 secs
##  6 Sanford Brandenburg 08 days 05 hours 14 mins 04 secs
##  7 Sybil Macaraeg      07 days 14 hours 30 mins 41 secs
##  8 Fay Brand           07 days 14 hours 21 mins 52 secs
##  9 Geoffrey Ashley     07 days 08 hours 16 mins 27 secs
## 10 Diana Bolinger      07 days 06 hours 41 mins 11 secs
## # ... with 217 more rows
## 
## $Break_time_ladder
## # A tibble: 227 x 2
##    User            `Time at Break`                 
##    <chr>           <chr>                           
##  1 Margo Corrales  06 days 12 hours 35 mins 05 secs
##  2 Zena Mallet     05 days 10 hours 38 mins 35 secs
##  3 Katy Mcghie     04 days 22 hours 54 mins 10 secs
##  4 Kylie Serpa     04 days 22 hours 17 mins 18 secs
##  5 Harriet Dunkin  04 days 18 hours 02 mins 26 secs
##  6 Kayleigh Noblin 04 days 17 hours 53 mins 30 secs
##  7 Delisa Meador   04 days 17 hours 00 mins 17 secs
##  8 Renata Sokoloff 04 days 15 hours 38 mins 33 secs
##  9 Henry Neuberger 04 days 13 hours 57 mins 21 secs
## 10 Kathey Giefer   04 days 12 hours 53 mins 43 secs
## # ... with 217 more rows
## 
## $Logged_time_ladder
## # A tibble: 227 x 2
##    User            `Time Logged in`                
##    <chr>           <chr>                           
##  1 Nancy Semmes    30 days 21 hours 29 mins 09 secs
##  2 Armando Magoon  30 days 18 hours 28 mins 57 secs
##  3 Christy Eddie   30 days 08 hours 47 mins 14 secs
##  4 Charlotte Mager 29 days 23 hours 41 mins 14 secs
##  5 Adalberto Byrd  29 days 21 hours 08 mins 53 secs
##  6 Eugene Coltharp 29 days 19 hours 44 mins 45 secs
##  7 Monet Escamilla 29 days 18 hours 28 mins 24 secs
##  8 Lacie Lyle      29 days 18 hours 15 mins 08 secs
##  9 Danna Desoto    29 days 14 hours 48 mins 06 secs
## 10 Renata Sokoloff 29 days 14 hours 35 mins 55 secs
## # ... with 217 more rows
## 
## $Calls_handled_ladder
## # A tibble: 227 x 2
##    User                `Calls handled`
##    <chr>                         <dbl>
##  1 Ashly Vaught                  23156
##  2 Sanford Brandenburg           20756
##  3 Vikki Drakeford               20245
##  4 Travis Bosse                  20147
##  5 Charlotte Mager               19523
##  6 Alfonso Cheshire              19127
##  7 Ricarda Claywell              18700
##  8 Danna Desoto                  18662
##  9 Jolene Childers               18582
## 10 Lyndsey Biehl                 18524
## # ... with 217 more rows

2 Explaining the functions

Now I’m going to show and comment on the code of those functions and I’m going to go table by table and then function by function.

2.1 Customers

For the table Customers we have a total of 5 functions which are:

2.1.1 Pre_process_customers_table

  • In this function:

    • We give a name to the columns of the table.
    • Format the date.
    • Generate the column profit
    • And aggregate the rows by transaction.
pre_process_customers_table <- function (customer)
{
  tags <- c("InvoiceNo","StockCode","Description","Quantity","InvoiceDate",
            "UnitPrice","CustomerID","Country")
  
  names(customer) <- (tags)
  customer <- customer[!is.na(customer$CustomerID),]
  # we remove the anonymus coustomers because they aren't usefull for our goals.
  
  customer$InvoiceDate <- dmy_hms(customer$InvoiceDate)
  customer <- customer %>% mutate(profit = Quantity*UnitPrice)
  
  # as a general rule I like to work with a different variable than the RAW data
  # so I do not have to re-read the data if I mess something in the process ^_^.
  customer_time  <- customer %>%  group_by(InvoiceNo, InvoiceDate) %>%
    summarise(CustomerID = unique(CustomerID),
              profit = sum(profit),
              Country = last(Country)) 
  return(customer_time)
}

2.1.2 leaving_clients

  • In this function:

    • We select the time since the last purchase (default 50 days) to consider a client to be in critic condition
    • We select the minimum number of purchases to consider a repeated customer (default 5)
    • Then we aggregate by Customer, select the last transaction that it did and count the number of transactions.
    • Then we filter all clients that hadn’t had the last transaction before the current date minus the number of days necessary to be considered critical and has several transactions bigger than the limit for the repeated customer because we don’t want to waste effort on a one-hit-wonder.
    • The remaining customers we arrange in descending order by date, so the ones who just entered the critical zone will be shown before the older ones.
# Leaving customers function

# so if we consider a repeated customer one that has at least purchased 5 times
# the "kinda" conservative estimation of the average time between purchases is
# around 16.32 days so if we leave around 3 times that timeframe,
# the potential customers that we care about are customers that at least
# purchased 6 times in our shop we need to filter the repeated customers that
# have purchased at least 6 times and haven't purchased in the last
# 50 days, to calculate this we made a function called leaving_clients

leaving_clients <- function(df_like_customer_time,
                            time_with_out_purchase = 50,
                            min_num_transactions = 5){
  leaving_clients_var <- df_like_customer_time %>%
    group_by(CustomerID) %>% summarise(last_trans = max(InvoiceDate),
                                     transactions = n(),
                                     last(Country)) %>%
    filter(transactions > min_num_transactions &
             last_trans<(Sys.Date() - days(time_with_out_purchase))) %>%
    arrange(desc(last_trans))
  mainDir <- getwd()
  subDir <- "results_clients"
  dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
  setwd(file.path(mainDir, subDir))
  write.csv(x = leaving_clients_var, file = "leaving_clients.csv" )
  setwd(mainDir)
  return(leaving_clients_var)
}

2.1.3 top_spenders

  • In this function:

    • We feed the number of top spenders we want to know (default 150).
    • Then we aggregate by Customer and sum the profit we made in all its transactions.
    • Then we arrange by the total amount that they have spend and return the top X we have feed the function before.
top_spenders <- function(df_with_CustomerID_and_profit,
                         amount = 150){
  temp <- df_with_CustomerID_and_profit %>%
    group_by(CustomerID) %>%
    summarise(amount_spend = sum(profit)) %>% arrange(desc(amount_spend)) %>% 
    head(n=amount)
  return(temp)
  }

2.1.4 most_frecuent_buyers

This function is nearly identical to the previous only that we don’t sum the total amount spend instead we count the number of transactions that they have done and return the top Y (150 by default) that have done more transactions.

# most_frequent_clients

most_frecuent_buyers <- function(df_with_CustomerID_and_InvoiceNo,
                                 amount = 150){
  Invoices_by_ID <- df_with_CustomerID_and_InvoiceNo %>% group_by(InvoiceNo) %>% 
    summarise(CustomerID = last(CustomerID)) %>% arrange(desc(CustomerID)) %>%
    group_by(CustomerID) %>% summarise(frequency = n()) %>%
    arrange(desc(frequency)) %>% head(n= amount)
  return(Invoices_by_ID)
}

2.1.5 top_clients

In top_clients we just full_join the result of the previous two functions (top_spenders, most_frecuent_buyers) with the logic that those are the most susceptible to promotions and advertisement because they trust our service and have the money to spend.

# Top clients

top_clients <- function(df_with_CustomerID_profit_and_InvoiceNo){
  submit <- inner_join(
    x = top_spenders(df_with_CustomerID_profit_and_InvoiceNo),
    y = most_frecuent_buyers(df_with_CustomerID_profit_and_InvoiceNo),
    by = "CustomerID")
  mainDir <- getwd()
  subDir <- "results_clients"
  dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
  setwd(file.path(mainDir, subDir))
  write.csv(x = submit, file = "top_clients.csv" )
  setwd(mainDir)
  
  return(submit)
}

2.2 CRM

2.2.1 Performance_function

  • This is a big one:

    • We start aggregation by Date and Name of our agents and returning the sum of Customer_requests, Visits_scheduled, Visits_completed and Deals_signed. So we have the information about what they did on any given date.
    • Once we have done that we repeat the process but aggregating by name and counting the number of dates we have aggregated. This gives us the total amount of work they have done and in what number of days they did it. +Then I did a boxplot graph of the days worked because I did not want to start ranking employees until I had a significant amount of data. The final value was set in 45 days. +Then we filter out all employees that have less than 45 days recorded. We calculate the amount of Customer_requests, Visits_scheduled, Visits_completed and Deals_signed they do on an average day. And finally, we calculate the closing rate each employe has. +Then we make 3 different rankings dedication_ladder, deals_ladder, and efectivity_ladder +Then we calculate the best/worst performers by inner_join the head/tail of the dedication_ladder and the efectivity_ladder. +In the next step we do the same for top/bottom dealers with the inner_join the head/tail of the deals_ladder and the worst_performers. +Finally we return the results we got in the last 2 points.
Performance_function <- function(classification){
  
  # to start I'm going to calculate the ration of Customer_requests,
  # Visits_scheduled, Visits_completed and Deals_signed on average by day and 
  # seller.
  
  clas_agr_day <- classification %>% 
    group_by(Date,Name) %>%
    summarise(Customer_requests = sum(Customer_requests),
              Visits_scheduled = sum(Visits_scheduled),
              Visits_completed = sum(Visits_completed),
              Deals_signed = sum(Deals_signed))
  
  metrics_sellers <- clas_agr_day %>% 
    group_by(Name) %>%
    summarise(Customer_requests = sum(Customer_requests),
              Visits_scheduled = sum(Visits_scheduled),
              Visits_completed = sum(Visits_completed),
              Deals_signed = sum(Deals_signed),
              days = n())
  # box <- boxplot(metrics_sellers$days)
  # box$out
  # In the boxplot we see that the users with less than "35 days of work"
  # are outliers by lower range, and because we dont really want to start
  # ranking them till we have some significative amount of data, I am going to
  # remove all sellers that don't have 45 at least 45 days registered 
  # which eyeballing the number feels like 2 months of working days 
  # and I like it.
  
  metrics_sellers_2 <- metrics_sellers %>%
    filter(days >= 45 )%>% 
    mutate("avg_request_d" = round(Customer_requests/days,2),
           "avg_visits_d" = round(Visits_scheduled/days,2),
           "avg_v_compl_d" = round(Visits_completed/days,2),
           "avg_dealst_d" = round(Deals_signed/days,2),
           "closing_rate" = round((avg_dealst_d/avg_request_d)*100,2) )
  
  # metrics_sellers_2 %>% select(starts_with("avg"),closing_rate)
  
  # and now I'm going to make 3 different rankings 
  # Dedication Ladder: in this rank will be the X sellers that take more requests/day.
  # Sells Ladder: this will show the Y sellers with more deals from a date, 
  #               it will count sales since we have a record.
  # Effectivity Ladder: this will rank the Z with a higher deals/requests ratio.
  
  dedication_ladder <- metrics_sellers_2 %>%
    arrange(desc(avg_request_d)) %>% 
    select(Name,avg_request_d) 
  
  deals_ladder <- metrics_sellers_2 %>%
    arrange(desc(Deals_signed)) %>% 
    select(Name,Deals_signed) 
  
  
  efectivity_ladder <- metrics_sellers_2 %>%
    arrange(desc(closing_rate)) %>% 
    select(Name,closing_rate) 
  
  # and the best performers are the joint of the heads of dedication ladder and
  # effectivity ladder. 
  
  
  best_performers <- inner_join(head(dedication_ladder,n=50),
                                                  head(efectivity_ladder,50),"Name") %>%
    mutate (avg_deals_day_rating = avg_request_d *(closing_rate/100)) %>%
    arrange(desc(avg_deals_day_rating))

  worst_performers <- inner_join(tail(dedication_ladder,n=50),
                                                   tail(efectivity_ladder,50),"Name") %>%
    mutate (avg_deals_day_rating = avg_request_d *(closing_rate/100)) %>%
    arrange(desc(avg_deals_day_rating))
  
  
  mainDir <- getwd()
  subDir <- "results_CRM"
  dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
  setwd(file.path(mainDir, subDir))
  ladders_to_return <- list()
  ladders_to_return$most_efficient <- best_performers
  write.csv(x = best_performers, file = paste0("most_efficient",".csv"))
  
  ladders_to_return$less_efficient <- worst_performers
  write.csv(x = worst_performers, file = paste0("less_efficient",".csv"))
  
  top_dealers <- inner_join(head(deals_ladder,50), best_performers,"Name")
  
  ladders_to_return$top_dealers <- top_dealers
  write.csv(x = top_dealers, file = paste0("top_dealers",".csv"))
  
  bottom_dealers <- inner_join(tail(deals_ladder,50),worst_performers,"Name")
  
  ladders_to_return$bottom_dealers <- bottom_dealers
  write.csv(x = bottom_dealers, file = paste0("bottom_dealers",".csv"))
  
  setwd(mainDir)
  return(ladders_to_return)
  }

2.3 Telephone table

For the Telephone table we have 2 functions

2.3.1 format_telephone

The objective of this function is to format the names of the agent to be in the same format that the CRM Table, I’m sure there is a way to do it with regular expressions but I’m not very fluent with them and I did not have much time, so I did it in this roundabout way.

format_telephone <- function(telephone_df){
  

rename <- separate(telephone, User ,
                   sep = " ",into = c("User","b")) %>% select(User)

telephone$User <- rename$User

names(telephone) <- telephone %>% names() %>% gsub(pattern = " ",replacement = "_")
return (telephone)
}

2.3.2 interesting_ladders

  • In this function:

    • We start defining the function Fmt that I found on stack overflow and customized to format a timediff variable into a string consisting on XX days, YY hours, ZZ minutes and SS seconds.
    • Then we format the data with the previous function, remove all no numerical variables except user, group by user and sum the other attributes.
    • Then with this data, we make all the interesting rankings by arranging the camp of interest in descending order and transmuting the sum of the dates (which is a difftime variable) and formating it with the previously defined FTM function.
    • All those rankings are saved in its respective CSVs and stored in a list which we return at the end of the function.
interesting_ladders <- function(telephone_df){

  
   Fmt <- function(x) UseMethod("Fmt")
  
   Fmt.difftime <- function(x) {
    units(x) <- "secs"
    x <- unclass(x)
    NextMethod()
    }
   Fmt.default <- function(x) {
    y <- abs(x)
    sprintf("%s%02d days %02d hours %02d mins %02d secs", 
            ifelse(x < 0, "-", ""), # sign
            y %/% 86400,  # days
            y %% 86400 %/% 3600,  # hours 
            y %% 3600 %/% 60,  # minutes
            y %% 60 %/% 1) # seconds
   }
   
  data <- format_telephone(telephone_df)
  agg_by_user <- data %>% select(-Day,-Manager,-Hub) %>%
    group_by(User) %>% summarise_all(sum)
  rankings_list <- list()

  rankings_list$phone_time_ladder <- agg_by_user %>% arrange(desc(Phone_time)) %>%
    transmute(User =  User,"Time at the Phone" =  Fmt(Phone_time))
  
  rankings_list$Backoffice_time_ladder <- agg_by_user %>%
    arrange(desc(Backoffice)) %>% transmute(User = User,
                                            "Time at the Backoffice" =
                                              Fmt(Backoffice))
  
  rankings_list$Break_time_ladder <- agg_by_user %>%
    arrange(desc(Break)) %>% transmute(User = User,
                                            "Time at Break" =
                                              Fmt(Break))
  rankings_list$Logged_time_ladder <- agg_by_user %>%
    arrange(desc(Logged_in)) %>% transmute(User = User,
                                            "Time Logged in" =
                                              Fmt(Logged_in))
  rankings_list$Calls_handled_ladder <- agg_by_user %>%
    arrange(desc(Calls_handled)) %>% transmute(User = User,
                                            "Calls handled" =
                                              Calls_handled)
  mainDir <- getwd()
  subDir <- "results_telephone"
  dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
  setwd(file.path(mainDir, subDir))
  for (i in 1:length(rankings_list)) {
    write.csv(x = rankings_list[[i]],
              file = paste0(names(rankings_list)[i],".csv"))
  }
  setwd(mainDir)
  return(rankings_list)  
}

2.4 Audits table

2.4.1 customer_opinion_ladder

  • In this function:

    • In this function we can feed it the minimum number of reviews that an employee has to have to enter the ladder (the default is a minimum of 10) because we don’t want that the leader of the ladder being an employee with a single review of maximum values.
    • We start formating the Employee to be in the same format that the CRM Table, I did it inefficiently because I’m not proficient enough with regular expressions and I did not have the luxury of spending time learning how to do it.
    • Because there are a lot of attributes to aggregate, we split aggregation in 2 in the first one we count the number of reviews of each employee and in the second we calculate the mean mark of each employee in all the camps of the survey, then we join both parts and filter out all employees that have less than 10 reviews.
    • Then we calculate the average of all the scores minus the number of reviews.
    • With that information, we arrange by overall score descending and select the name, overall mark and number of reviews. +Finally we write the CSV and return the last results.
customer_opinion_ladder<- function(audits,min_reviews = 10)
{

  
  audits$Employee <- gsub(pattern = "@iwg", replacement = "", audits$Employee)
  
  simpleCap <- function(x) {
    s <- strsplit(x, " ")[[1]]
    paste(toupper(substring(s, 1,1)), substring(s, 2),
          sep="", collapse=" ")
  }
  
  audits$Employee <- audits %>% separate(Employee, c("A", "B")) %>% select(A,B) %>%
    apply(1:2, simpleCap) %>% apply(1,function(x){paste(x[1],x[2])})
  
  # first we group by employe 
  
  # names(audits)
  # 
  # class(audits$AnsweredCustomer)
  count <- audits %>% group_by(Employee) %>% summarise(reviews = n())
  scores <- audits %>% select(names(audits)[c(2,5:15)]) %>%
    group_by(Employee) %>% summarise_all(mean)
  evaluation <-  left_join(scores,count,"Employee") %>% filter(reviews >=10) %>%
    select(-reviews)
  # evaluation %>% ggplot(aes(x = reviews, fill = "red")) +
  #   geom_histogram(binwidth = 1)
  # giving a fast look to the histogram i fill filter all the users that
  # dont have at least 10 reviews so the comparision it has some ground.
  
  evaluation$average_score <- rowMeans(evaluation[,-1],
                                       na.rm = TRUE)
  customer_ladder <- evaluation %>% select(Employee,average_score) %>%
    arrange(desc(average_score))
  
  mainDir <- getwd()
  subDir <- "results_audits"
  dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
  setwd(file.path(mainDir, subDir))
  write.csv(x = customer_ladder,
            file = paste0("customer_opinion_ladder",".csv"))
  setwd(mainDir)
  
  return(customer_ladder)

}

3 Shiny Dashboard

3.1 The preparation

library(shiny)
library(shinydashboard)
library(DT)
library(rstudioapi)
library(rsconnect)


source(file = "Scripts/Customer Segmentation.R")

customer <- read_csv(file = "data/Datasets for DS - Customer segmentation.csv")

customer_time <- pre_process_customers_table(customer)
# customer_time <- as.data.frame(customer_time)
min_date_clients <- floor_date(x = customer_time$InvoiceDate, unit = "day") %>% min()
max_date_clients <- floor_date(x = customer_time$InvoiceDate, unit = "day") %>% max()

rm(customer)

# CRM block
source(file = "Scripts/Employee performance_CRM.R")

classification_df <- read_csv2(file = "data/CRM_employee.csv")

min_date_CRM <- floor_date(x = classification_df$Date, unit = "day") %>% min()
max_date_CRM <- floor_date(x = classification_df$Date, unit = "day") %>% max()

# Performance_function(classification_df)

# audits block

source(file = "Scripts/Audits.R")

audits <- read_csv2(file = "data/Audits_Employee_performance.csv")
audits$TimeCompleted <- dmy(audits$TimeCompleted)
min_date_audits <- floor_date(x = audits$TimeCompleted, unit = "day") %>% min()
max_date_audits <- floor_date(x = audits$TimeCompleted, unit = "day") %>% max()
# customer_opinion_ladder(audits)

# telephone block

source(file = "Scripts/telephone.R")

telephone <- read_csv2(file = "data/Telephone_Employee_performance.csv")
telephone$Day <- dmy(telephone$Day)
min_date_telephone <- floor_date(x = telephone$Day, unit = "day") %>% min()
max_date_telephone <- floor_date(x = telephone$Day, unit = "day") %>% max()

3.2 The UI

# 1 Dashbord ----------------------------------------------------------------

# USER INTERFACE
ui <- dashboardPage(
    dashboardHeader(title="IWG Regus"),
    
    # _1.1 sidebar --------------------------------------------------------------
    
    dashboardSidebar(
        sidebarMenu(id = "pestaña",
                    menuItem("Clients", tabName = "Clients",
                             icon = icon("dashboard")),
                    
                    menuItem("Agents", tabName = "Agents",
                             icon = icon("align-justify"))
        )#cierro sidebarMenu
    ), #cierro  dashboardSidebar
    
    # _1.2 body -----------------------------------------------------------------
    
    dashboardBody(
        tabItems(
            
            #   _1.2.1 body Clients----------------------------------------------------------
            
            tabItem(tabName = "Clients", 
                    # First row
                    fluidRow(
                        column(3,
                               dateRangeInput(inputId = "range_clients", 
                                              label = "Choose Date Range for clients: yyyy-mm-dd",
                                              start = min_date_clients,
                                              end = max_date_clients,
                                              min = min_date_clients,
                                              max = max_date_clients)),
                        column(3,
                               radioButtons(inputId = "client_buttons", label = h3("Select Table"),
                                            choices = list("Top Customers" = 1,
                                                           "Leaving Customers" = 2), 
                                            selected = 1,
                                            inline = TRUE))
                    ),#cierro fluidRow
                    fluidRow( #second row
                        wellPanel(dataTableOutput('table_clients'))
                    )
            ),#cierro tabItem
            
            #   _1.2.2 body Agents----------------------------------------------------------
            
            tabItem(tabName = "Agents",
                    tabsetPanel(type = "tabs", id = "Agents",
                                tabPanel("CRM",
                                         # First row
                                         fluidRow(
                                             column(3,
                                                    dateRangeInput(inputId = "range_CRM", 
                                                                   label = "Choose Date Range for CRM data: yyyy-mm-dd",
                                                                   start = min_date_CRM,
                                                                   end = max_date_CRM,
                                                                   min = min_date_CRM,
                                                                   max = max_date_CRM)),
                                             column(3,
                                                    radioButtons(inputId = "CRM_buttons", label = h3("Select Table"),
                                                                 choices = list("Most Efficient Agents" = 1,
                                                                                "Least Efficient Agents" = 2,
                                                                                "Most Deals Closed" = 3,
                                                                                "Least Deals Closed" = 4), 
                                                                 selected = 1,
                                                                 inline = TRUE))
                                         ),#cierro fluidRow
                                         fluidRow( #second row
                                             wellPanel(dataTableOutput('table_CRM'))
                                         )#cierro fluidRow
                                ),#cierro tabpanel CRM
                                tabPanel("Audits",
                                         # First row
                                         fluidRow(
                                             column(3,
                                                    dateRangeInput(inputId = "range_Audits", 
                                                                   label = "Choose Date Range for the Audits: yyyy-mm-dd",
                                                                   start = min_date_audits,
                                                                   end = max_date_audits,
                                                                   min = min_date_audits,
                                                                   max = max_date_audits)),
                                             column(3,
                                                    numericInput(inputId = "min_reviews",
                                                                 label = "Observations:",
                                                                 value = 10,
                                                                 min = 1,
                                                                 max = 100)
                                             )
                                         ),#cierro fluidRow
                                         fluidRow( #second row
                                             wellPanel(dataTableOutput('table_Audits'))
                                         )#cierro fluidRow
                                ),
                                tabPanel("Telephone",
                                         # First row
                                         fluidRow(
                                             column(3,
                                                    dateRangeInput(inputId = "range_telephone", 
                                                                   label = "Choose Date Range for Telephone data: yyyy-mm-dd",
                                                                   start = min_date_telephone,
                                                                   end = max_date_telephone,
                                                                   min = min_date_telephone,
                                                                   max = max_date_telephone)),
                                             column(9,
                                                    radioButtons(inputId = "Telephone_buttons", label = h3("Select Ladder"),
                                                                 choices = list("Time on the Phone" = 1,
                                                                                "Time on the Backoffice" = 2,
                                                                                "Time on Break" = 3,
                                                                                "Time logged" = 4,
                                                                                "Number of calls handled" = 5), 
                                                                 selected = 1,
                                                                 inline = TRUE))
                                         ),#cierro fluidRow
                                         fluidRow( #second row
                                             wellPanel(dataTableOutput('table_Telephone'))
                                         )#cierro fluidRow
                                )#cierro tabpanel
                    )#cierro tabsetPanel Agents
            )#cierro tabItem
        )#cierro tabItems
    )#cierro dashboardBody
)#cierro dashboardPage

3.3 The Server and aplication

# server ------------------------------------------------------------------
# SERVER
server <- function(input, output) { 
    # we process the clients data
    clients_data <- reactive({
        customers_data_reactive <- customer_time %>%
            filter(InvoiceDate > input$range_clients[1]) %>% 
            filter(InvoiceDate < input$range_clients[2])
        if(input$client_buttons == 1){
            customers_data_reactive2 <- top_clients(customers_data_reactive)
        }
        else{
            customers_data_reactive2 <- leaving_clients(customers_data_reactive)
        }
        return(customers_data_reactive2)
    })
    
    # we process the Agents CRM data
    CRM_data <- reactive({
        CRM_data_reactive <- classification_df %>%
            filter(Date > input$range_CRM[1]) %>%
            filter(Date < input$range_CRM[2]) %>%
            Performance_function()
        if(input$CRM_buttons == 1){toreturn <- CRM_data_reactive[[1]]}
        if(input$CRM_buttons == 2){toreturn <- CRM_data_reactive[[2]]}
        if(input$CRM_buttons == 3){toreturn <- CRM_data_reactive[[3]]}
        if(input$CRM_buttons == 4){toreturn <- CRM_data_reactive[[4]]}
        return(toreturn)
    })
    
    # we process the Agents Audits data
    Audits_data <- reactive({
        Audits_data_reactive <- audits %>%
            filter(TimeCompleted > input$range_Audits[1]) %>%
            filter(TimeCompleted < input$range_Audits[2])
        opinion <- customer_opinion_ladder(Audits_data_reactive,
                                           min_reviews = input$min_reviews)
        return(opinion)
    })
    
    # we process the Agents Telephone data
    Tel_data <- reactive({
        Tel_data_reactive <- telephone %>%
            filter(Day > input$range_telephone[1]) %>%
            filter(Day < input$range_telephone[2]) %>%
            interesting_ladders()
        if(input$Telephone_buttons == 1){ladder_to_return <- Tel_data_reactive[[1]]}
        if(input$Telephone_buttons == 2){ladder_to_return <- Tel_data_reactive[[2]]}
        if(input$Telephone_buttons == 3){ladder_to_return <- Tel_data_reactive[[3]]}
        if(input$Telephone_buttons == 4){ladder_to_return <- Tel_data_reactive[[4]]}
        if(input$Telephone_buttons == 5){ladder_to_return <- Tel_data_reactive[[5]]}
        return(ladder_to_return)
    })
    
    output$table_clients <- renderDataTable  ({ clients_data() })
    output$table_CRM <- renderDataTable ({ CRM_data() })
    output$table_Audits <- renderDataTable ({ Audits_data() })
    output$table_Telephone <- renderDataTable ({ Tel_data() })
    
}#cerramos sistem

# My app
shinyApp(ui, server)

4 Future Tasks

This is a Minimum Viable Product, some of the options of the functions aren’t implemented in the general one, and we aren’t asking for input on the options, the functions are pretty fragile and there is a lot of information that it’s not used. Especially the managers have been ignored in all this task, and there could be some really interesting rankings to be done with them. Also, the region is largely ignored.

Also, I missed tables with the contact information of the clients so the outputs could be more complete.

There is also a lot of potential for improving the Dashboard. Starting with refreshing the data automatically, to graphs showing the temporal progression of leads, visits scheduled, visits done and deals closed by timeframe in the company, so we could see a visual representation of the growth of the company/employee. Just to mention a few possibilities.

Also we could do a cohort rentetion analysis in the CRM table to observe the evolution of the results of the company over time.

Some side approaches would be interesting to perform like personalized recommendations/offers, which could be done with R using the apriori function from the arules package or with deep learning using collaborative filtering and fast.ai a library constructed on pytorch.