Customer segmentation: Identify the group customers that may stop purchasing from the company and identify the group of customers that may be suitable for additional campaigns so they increase the amount of money spent on the products.
Let’s see how do we do it!
To do that we load the necessary functions and data.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)
source(file = "Scripts/Customer Segmentation.R")
customer <- read_csv( file = "data/Datasets for DS - Customer segmentation.csv")
And then we call the necessary functions
First, we prepare the data with pre_process_customers_table. Then we find the customers that may stop purchasing from the company with leaving_clients().
And finally, we find the customers that may be suitable for additional campaigns so they increase the amount of money spent on the products with top_clients().
customer_time <- pre_process_customers_table(customer)
leaving_clients(customer_time)
## # A tibble: 1,089 x 4
## CustomerID last_trans transactions `last(Country)`
## <dbl> <dttm> <int> <chr>
## 1 13113 2011-12-09 12:49:00 40 United Kingdom
## 2 15804 2011-12-09 12:31:00 19 United Kingdom
## 3 13777 2011-12-09 12:25:00 41 United Kingdom
## 4 17581 2011-12-09 12:21:00 31 United Kingdom
## 5 12748 2011-12-09 12:20:00 225 United Kingdom
## 6 16705 2011-12-09 12:08:00 29 United Kingdom
## 7 15311 2011-12-09 12:00:00 118 United Kingdom
## 8 12662 2011-12-09 11:59:00 12 Germany
## 9 17315 2011-12-09 11:58:00 43 United Kingdom
## 10 16626 2011-12-09 11:56:00 20 United Kingdom
## # ... with 1,079 more rows
top_clients(customer_time)
## # A tibble: 100 x 3
## CustomerID amount_spend frequency
## <dbl> <dbl> <int>
## 1 14646 279489. 77
## 2 18102 256438. 62
## 3 17450 187482. 55
## 4 14911 132573. 248
## 5 12415 123725. 26
## 6 14156 113384. 66
## 7 17511 88125. 46
## 8 16684 65892. 31
## 9 13694 62653. 60
## 10 15311 59419. 118
## # ... with 90 more rows
Those functions also generate in the folder we are working in the folder results_clients with a CSV with all the results.
If I had access to a table with contact information I would append it to the output of top_clients and leaving_clients so we could see the contact information that we need, for example, e-mail, telephone or address.
Based on the 3 datasets provided, identify the agents that are the lowest performers and the areas where they need support and the top performers and the areas where they excel.
To solve this problem I chose to use the CRM table because I feel that the most relevant metrics to judge effectiveness where the hard numbers leads taken and, so we have a ladder for the ratio of deals closed to leads taken I like to call it effectivity ladder, and another for deals closed that I call deals ladder.
By merging the top and bottom of those ladders I select who are the best/worst performers.
Now let’s see how it goes!
As before we start loading the data (CRM table) and the necessary functions.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)
source(file = "Scripts/Employee performance_CRM.R")
classification_df <- read_csv2(file = "data/CRM_employee.csv")
And then we use Performance_function that puts in a folder named results_CRM the files: top_dealer, worst_dealers who shows who are historically the best/worst dealers. most_efficient, less_efficient which represent who has de best/worst probability to transform a lead into a deal.
Performance_function(classification_df)
## $most_efficient
## # A tibble: 15 x 4
## Name avg_request_d closing_rate avg_deals_day_rating
## <chr> <dbl> <dbl> <dbl>
## 1 Loan Tomita 18.4 35.2 6.49
## 2 Leeanna Elza 18.4 29.2 5.36
## 3 Shelby Schmitmeyer 17.7 27.3 4.82
## 4 Alfonso Cheshire 22.8 21.0 4.79
## 5 Tonette Dammann 14.6 32.6 4.76
## 6 Ashly Vaught 18.8 24.0 4.51
## 7 Lieselotte Riemer 15.6 28.8 4.48
## 8 Stasia Koski 15.1 29.3 4.44
## 9 Ricarda Claywell 14.9 29.4 4.40
## 10 Edie Fitzwater 14.4 26.8 3.85
## 11 Wade Garten 17.2 21.6 3.73
## 12 Stasia Ecton 13.6 24.2 3.30
## 13 Maisie Mcpherson 13.3 24.4 3.24
## 14 Caitlin Coffey 14.2 22.3 3.15
## 15 Nubia Ordway 13.4 21.6 2.90
##
## $less_efficient
## # A tibble: 10 x 4
## Name avg_request_d closing_rate avg_deals_day_rating
## <chr> <dbl> <dbl> <dbl>
## 1 Voncile Grimaldi 7.02 6.27 0.440
## 2 Noel Villicana 6.81 6.31 0.430
## 3 Kathyrn Douglas 4.62 6.71 0.310
## 4 Dia Roux 4.98 6.02 0.300
## 5 Rolanda Bombardier 4.65 6.02 0.280
## 6 Raymundo Veltri 5.02 3.98 0.200
## 7 Sage Clodfelter 3.71 5.12 0.190
## 8 Xuan Shupp 2.48 6.45 0.160
## 9 Maurita Christina 5.61 2.67 0.150
## 10 Francine Gulotta 6.06 1.32 0.0800
##
## $top_dealers
## # A tibble: 15 x 5
## Name Deals_signed avg_request_d closing_rate avg_deals_day_rat~
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Loan Tomita 714 18.4 35.2 6.49
## 2 Leeanna Elza 552 18.4 29.2 5.36
## 3 Alfonso Ches~ 536 22.8 21.0 4.79
## 4 Shelby Schmi~ 525 17.7 27.3 4.82
## 5 Tonette Damm~ 514 14.6 32.6 4.76
## 6 Ricarda Clay~ 488 14.9 29.4 4.40
## 7 Stasia Koski 480 15.1 29.3 4.44
## 8 Lieselotte R~ 475 15.6 28.8 4.48
## 9 Ashly Vaught 469 18.8 24.0 4.51
## 10 Edie Fitzwat~ 412 14.4 26.8 3.85
## 11 Wade Garten 395 17.2 21.6 3.73
## 12 Maisie Mcphe~ 347 13.3 24.4 3.24
## 13 Caitlin Coff~ 346 14.2 22.3 3.15
## 14 Stasia Ecton 346 13.6 24.2 3.30
## 15 Nubia Ordway 313 13.4 21.6 2.90
##
## $bottom_dealers
## # A tibble: 10 x 5
## Name Deals_signed avg_request_d closing_rate avg_deals_day_rat~
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Noel Villica~ 45 6.81 6.31 0.430
## 2 Voncile Grim~ 42 7.02 6.27 0.440
## 3 Kathyrn Doug~ 28 4.62 6.71 0.310
## 4 Rolanda Bomb~ 27 4.65 6.02 0.280
## 5 Dia Roux 18 4.98 6.02 0.300
## 6 Xuan Shupp 11 2.48 6.45 0.160
## 7 Maurita Chri~ 10 5.61 2.67 0.150
## 8 Raymundo Vel~ 9 5.02 3.98 0.200
## 9 Sage Clodfel~ 9 3.71 5.12 0.190
## 10 Francine Gul~ 5 6.06 1.32 0.0800
But the fact that I believe that the best table to determine who is the best/worse agent is the CRM does not mean that the other tables are useless or uninteresting.
So I made specific functions to analyze each one of them.
For the audits table, we have a Customer Opinion Ladder where we have ranked all our agents who have more than X reviews (the default number of minimum reviews is 10) by the opinion of our customers.
And now let me show you how to get that data!
As always we start with loading the table and the functions we are going to need
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)
source(file = "Scripts/Audits.R")
audits <- read_csv2(file = "data/Audits_Employee_performance.csv")
And then we call the necesary function customer_opinion_ladder() that generated the folder results_audits where we store the CSV file for the ladder.
customer_opinion_ladder(audits)
## # A tibble: 154 x 2
## Employee average_score
## <chr> <dbl>
## 1 Christel Selvidge 3.98
## 2 Eugene Coltharp 3.98
## 3 Michell Meek 3.97
## 4 Haydee Maestas 3.93
## 5 Christie Rohr 3.90
## 6 Billye Pittman 3.90
## 7 Henry Neuberger 3.89
## 8 Rayna Harkless 3.86
## 9 Holley Pitzer 3.86
## 10 Lou Wilkerson 3.85
## # ... with 144 more rows
For the phone table, we have several ladders the ones I like the most are the Break_time_ladder ranking the time spend on breaks and the phone_time_ladder ranking the time spend on the phone.
And now let me show you how to get that ladders!
As always we start with loading the table and the functions we are going to need
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)
source(file = "Scripts/telephone.R")
telephone <- read_csv2(file = "data/Telephone_Employee_performance.csv")
And then we call the necessary function interesting_ladders() that generated the folder results_telephone where we store the CSV files for the ladders.
interesting_ladders(telephone)
## $phone_time_ladder
## # A tibble: 227 x 2
## User `Time at the Phone`
## <chr> <chr>
## 1 Adalberto Byrd 24 days 13 hours 39 mins 27 secs
## 2 Armando Magoon 23 days 20 hours 23 mins 44 secs
## 3 Beula Luse 23 days 16 hours 58 mins 48 secs
## 4 Eugene Coltharp 23 days 16 hours 51 mins 00 secs
## 5 Ricarda Claywell 23 days 13 hours 47 mins 01 secs
## 6 Rosenda Duby 23 days 06 hours 04 mins 00 secs
## 7 Opal Vose 22 days 19 hours 37 mins 35 secs
## 8 Fernande Pounders 22 days 16 hours 51 mins 16 secs
## 9 Jacquie Mansfield 22 days 16 hours 49 mins 30 secs
## 10 Nancy Semmes 22 days 09 hours 20 mins 43 secs
## # ... with 217 more rows
##
## $Backoffice_time_ladder
## # A tibble: 227 x 2
## User `Time at the Backoffice`
## <chr> <chr>
## 1 Amira Mcniel 13 days 02 hours 53 mins 37 secs
## 2 Yon Ogden 12 days 16 hours 46 mins 02 secs
## 3 Edie Fitzwater 09 days 04 hours 50 mins 01 secs
## 4 Heath Hendriks 08 days 12 hours 00 mins 41 secs
## 5 Noel Villicana 08 days 10 hours 04 mins 53 secs
## 6 Sanford Brandenburg 08 days 05 hours 14 mins 04 secs
## 7 Sybil Macaraeg 07 days 14 hours 30 mins 41 secs
## 8 Fay Brand 07 days 14 hours 21 mins 52 secs
## 9 Geoffrey Ashley 07 days 08 hours 16 mins 27 secs
## 10 Diana Bolinger 07 days 06 hours 41 mins 11 secs
## # ... with 217 more rows
##
## $Break_time_ladder
## # A tibble: 227 x 2
## User `Time at Break`
## <chr> <chr>
## 1 Margo Corrales 06 days 12 hours 35 mins 05 secs
## 2 Zena Mallet 05 days 10 hours 38 mins 35 secs
## 3 Katy Mcghie 04 days 22 hours 54 mins 10 secs
## 4 Kylie Serpa 04 days 22 hours 17 mins 18 secs
## 5 Harriet Dunkin 04 days 18 hours 02 mins 26 secs
## 6 Kayleigh Noblin 04 days 17 hours 53 mins 30 secs
## 7 Delisa Meador 04 days 17 hours 00 mins 17 secs
## 8 Renata Sokoloff 04 days 15 hours 38 mins 33 secs
## 9 Henry Neuberger 04 days 13 hours 57 mins 21 secs
## 10 Kathey Giefer 04 days 12 hours 53 mins 43 secs
## # ... with 217 more rows
##
## $Logged_time_ladder
## # A tibble: 227 x 2
## User `Time Logged in`
## <chr> <chr>
## 1 Nancy Semmes 30 days 21 hours 29 mins 09 secs
## 2 Armando Magoon 30 days 18 hours 28 mins 57 secs
## 3 Christy Eddie 30 days 08 hours 47 mins 14 secs
## 4 Charlotte Mager 29 days 23 hours 41 mins 14 secs
## 5 Adalberto Byrd 29 days 21 hours 08 mins 53 secs
## 6 Eugene Coltharp 29 days 19 hours 44 mins 45 secs
## 7 Monet Escamilla 29 days 18 hours 28 mins 24 secs
## 8 Lacie Lyle 29 days 18 hours 15 mins 08 secs
## 9 Danna Desoto 29 days 14 hours 48 mins 06 secs
## 10 Renata Sokoloff 29 days 14 hours 35 mins 55 secs
## # ... with 217 more rows
##
## $Calls_handled_ladder
## # A tibble: 227 x 2
## User `Calls handled`
## <chr> <dbl>
## 1 Ashly Vaught 23156
## 2 Sanford Brandenburg 20756
## 3 Vikki Drakeford 20245
## 4 Travis Bosse 20147
## 5 Charlotte Mager 19523
## 6 Alfonso Cheshire 19127
## 7 Ricarda Claywell 18700
## 8 Danna Desoto 18662
## 9 Jolene Childers 18582
## 10 Lyndsey Biehl 18524
## # ... with 217 more rows
Now I’m going to show and comment on the code of those functions and I’m going to go table by table and then function by function.
For the table Customers we have a total of 5 functions which are:
In this function:
pre_process_customers_table <- function (customer)
{
tags <- c("InvoiceNo","StockCode","Description","Quantity","InvoiceDate",
"UnitPrice","CustomerID","Country")
names(customer) <- (tags)
customer <- customer[!is.na(customer$CustomerID),]
# we remove the anonymus coustomers because they aren't usefull for our goals.
customer$InvoiceDate <- dmy_hms(customer$InvoiceDate)
customer <- customer %>% mutate(profit = Quantity*UnitPrice)
# as a general rule I like to work with a different variable than the RAW data
# so I do not have to re-read the data if I mess something in the process ^_^.
customer_time <- customer %>% group_by(InvoiceNo, InvoiceDate) %>%
summarise(CustomerID = unique(CustomerID),
profit = sum(profit),
Country = last(Country))
return(customer_time)
}
In this function:
# Leaving customers function
# so if we consider a repeated customer one that has at least purchased 5 times
# the "kinda" conservative estimation of the average time between purchases is
# around 16.32 days so if we leave around 3 times that timeframe,
# the potential customers that we care about are customers that at least
# purchased 6 times in our shop we need to filter the repeated customers that
# have purchased at least 6 times and haven't purchased in the last
# 50 days, to calculate this we made a function called leaving_clients
leaving_clients <- function(df_like_customer_time,
time_with_out_purchase = 50,
min_num_transactions = 5){
leaving_clients_var <- df_like_customer_time %>%
group_by(CustomerID) %>% summarise(last_trans = max(InvoiceDate),
transactions = n(),
last(Country)) %>%
filter(transactions > min_num_transactions &
last_trans<(Sys.Date() - days(time_with_out_purchase))) %>%
arrange(desc(last_trans))
mainDir <- getwd()
subDir <- "results_clients"
dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
setwd(file.path(mainDir, subDir))
write.csv(x = leaving_clients_var, file = "leaving_clients.csv" )
setwd(mainDir)
return(leaving_clients_var)
}
In this function:
top_spenders <- function(df_with_CustomerID_and_profit,
amount = 150){
temp <- df_with_CustomerID_and_profit %>%
group_by(CustomerID) %>%
summarise(amount_spend = sum(profit)) %>% arrange(desc(amount_spend)) %>%
head(n=amount)
return(temp)
}
This function is nearly identical to the previous only that we don’t sum the total amount spend instead we count the number of transactions that they have done and return the top Y (150 by default) that have done more transactions.
# most_frequent_clients
most_frecuent_buyers <- function(df_with_CustomerID_and_InvoiceNo,
amount = 150){
Invoices_by_ID <- df_with_CustomerID_and_InvoiceNo %>% group_by(InvoiceNo) %>%
summarise(CustomerID = last(CustomerID)) %>% arrange(desc(CustomerID)) %>%
group_by(CustomerID) %>% summarise(frequency = n()) %>%
arrange(desc(frequency)) %>% head(n= amount)
return(Invoices_by_ID)
}
In top_clients we just full_join the result of the previous two functions (top_spenders, most_frecuent_buyers) with the logic that those are the most susceptible to promotions and advertisement because they trust our service and have the money to spend.
# Top clients
top_clients <- function(df_with_CustomerID_profit_and_InvoiceNo){
submit <- inner_join(
x = top_spenders(df_with_CustomerID_profit_and_InvoiceNo),
y = most_frecuent_buyers(df_with_CustomerID_profit_and_InvoiceNo),
by = "CustomerID")
mainDir <- getwd()
subDir <- "results_clients"
dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
setwd(file.path(mainDir, subDir))
write.csv(x = submit, file = "top_clients.csv" )
setwd(mainDir)
return(submit)
}
This is a big one:
Performance_function <- function(classification){
# to start I'm going to calculate the ration of Customer_requests,
# Visits_scheduled, Visits_completed and Deals_signed on average by day and
# seller.
clas_agr_day <- classification %>%
group_by(Date,Name) %>%
summarise(Customer_requests = sum(Customer_requests),
Visits_scheduled = sum(Visits_scheduled),
Visits_completed = sum(Visits_completed),
Deals_signed = sum(Deals_signed))
metrics_sellers <- clas_agr_day %>%
group_by(Name) %>%
summarise(Customer_requests = sum(Customer_requests),
Visits_scheduled = sum(Visits_scheduled),
Visits_completed = sum(Visits_completed),
Deals_signed = sum(Deals_signed),
days = n())
# box <- boxplot(metrics_sellers$days)
# box$out
# In the boxplot we see that the users with less than "35 days of work"
# are outliers by lower range, and because we dont really want to start
# ranking them till we have some significative amount of data, I am going to
# remove all sellers that don't have 45 at least 45 days registered
# which eyeballing the number feels like 2 months of working days
# and I like it.
metrics_sellers_2 <- metrics_sellers %>%
filter(days >= 45 )%>%
mutate("avg_request_d" = round(Customer_requests/days,2),
"avg_visits_d" = round(Visits_scheduled/days,2),
"avg_v_compl_d" = round(Visits_completed/days,2),
"avg_dealst_d" = round(Deals_signed/days,2),
"closing_rate" = round((avg_dealst_d/avg_request_d)*100,2) )
# metrics_sellers_2 %>% select(starts_with("avg"),closing_rate)
# and now I'm going to make 3 different rankings
# Dedication Ladder: in this rank will be the X sellers that take more requests/day.
# Sells Ladder: this will show the Y sellers with more deals from a date,
# it will count sales since we have a record.
# Effectivity Ladder: this will rank the Z with a higher deals/requests ratio.
dedication_ladder <- metrics_sellers_2 %>%
arrange(desc(avg_request_d)) %>%
select(Name,avg_request_d)
deals_ladder <- metrics_sellers_2 %>%
arrange(desc(Deals_signed)) %>%
select(Name,Deals_signed)
efectivity_ladder <- metrics_sellers_2 %>%
arrange(desc(closing_rate)) %>%
select(Name,closing_rate)
# and the best performers are the joint of the heads of dedication ladder and
# effectivity ladder.
best_performers <- inner_join(head(dedication_ladder,n=50),
head(efectivity_ladder,50),"Name") %>%
mutate (avg_deals_day_rating = avg_request_d *(closing_rate/100)) %>%
arrange(desc(avg_deals_day_rating))
worst_performers <- inner_join(tail(dedication_ladder,n=50),
tail(efectivity_ladder,50),"Name") %>%
mutate (avg_deals_day_rating = avg_request_d *(closing_rate/100)) %>%
arrange(desc(avg_deals_day_rating))
mainDir <- getwd()
subDir <- "results_CRM"
dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
setwd(file.path(mainDir, subDir))
ladders_to_return <- list()
ladders_to_return$most_efficient <- best_performers
write.csv(x = best_performers, file = paste0("most_efficient",".csv"))
ladders_to_return$less_efficient <- worst_performers
write.csv(x = worst_performers, file = paste0("less_efficient",".csv"))
top_dealers <- inner_join(head(deals_ladder,50), best_performers,"Name")
ladders_to_return$top_dealers <- top_dealers
write.csv(x = top_dealers, file = paste0("top_dealers",".csv"))
bottom_dealers <- inner_join(tail(deals_ladder,50),worst_performers,"Name")
ladders_to_return$bottom_dealers <- bottom_dealers
write.csv(x = bottom_dealers, file = paste0("bottom_dealers",".csv"))
setwd(mainDir)
return(ladders_to_return)
}
For the Telephone table we have 2 functions
The objective of this function is to format the names of the agent to be in the same format that the CRM Table, I’m sure there is a way to do it with regular expressions but I’m not very fluent with them and I did not have much time, so I did it in this roundabout way.
format_telephone <- function(telephone_df){
rename <- separate(telephone, User ,
sep = " ",into = c("User","b")) %>% select(User)
telephone$User <- rename$User
names(telephone) <- telephone %>% names() %>% gsub(pattern = " ",replacement = "_")
return (telephone)
}
In this function:
interesting_ladders <- function(telephone_df){
Fmt <- function(x) UseMethod("Fmt")
Fmt.difftime <- function(x) {
units(x) <- "secs"
x <- unclass(x)
NextMethod()
}
Fmt.default <- function(x) {
y <- abs(x)
sprintf("%s%02d days %02d hours %02d mins %02d secs",
ifelse(x < 0, "-", ""), # sign
y %/% 86400, # days
y %% 86400 %/% 3600, # hours
y %% 3600 %/% 60, # minutes
y %% 60 %/% 1) # seconds
}
data <- format_telephone(telephone_df)
agg_by_user <- data %>% select(-Day,-Manager,-Hub) %>%
group_by(User) %>% summarise_all(sum)
rankings_list <- list()
rankings_list$phone_time_ladder <- agg_by_user %>% arrange(desc(Phone_time)) %>%
transmute(User = User,"Time at the Phone" = Fmt(Phone_time))
rankings_list$Backoffice_time_ladder <- agg_by_user %>%
arrange(desc(Backoffice)) %>% transmute(User = User,
"Time at the Backoffice" =
Fmt(Backoffice))
rankings_list$Break_time_ladder <- agg_by_user %>%
arrange(desc(Break)) %>% transmute(User = User,
"Time at Break" =
Fmt(Break))
rankings_list$Logged_time_ladder <- agg_by_user %>%
arrange(desc(Logged_in)) %>% transmute(User = User,
"Time Logged in" =
Fmt(Logged_in))
rankings_list$Calls_handled_ladder <- agg_by_user %>%
arrange(desc(Calls_handled)) %>% transmute(User = User,
"Calls handled" =
Calls_handled)
mainDir <- getwd()
subDir <- "results_telephone"
dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
setwd(file.path(mainDir, subDir))
for (i in 1:length(rankings_list)) {
write.csv(x = rankings_list[[i]],
file = paste0(names(rankings_list)[i],".csv"))
}
setwd(mainDir)
return(rankings_list)
}
In this function:
customer_opinion_ladder<- function(audits,min_reviews = 10)
{
audits$Employee <- gsub(pattern = "@iwg", replacement = "", audits$Employee)
simpleCap <- function(x) {
s <- strsplit(x, " ")[[1]]
paste(toupper(substring(s, 1,1)), substring(s, 2),
sep="", collapse=" ")
}
audits$Employee <- audits %>% separate(Employee, c("A", "B")) %>% select(A,B) %>%
apply(1:2, simpleCap) %>% apply(1,function(x){paste(x[1],x[2])})
# first we group by employe
# names(audits)
#
# class(audits$AnsweredCustomer)
count <- audits %>% group_by(Employee) %>% summarise(reviews = n())
scores <- audits %>% select(names(audits)[c(2,5:15)]) %>%
group_by(Employee) %>% summarise_all(mean)
evaluation <- left_join(scores,count,"Employee") %>% filter(reviews >=10) %>%
select(-reviews)
# evaluation %>% ggplot(aes(x = reviews, fill = "red")) +
# geom_histogram(binwidth = 1)
# giving a fast look to the histogram i fill filter all the users that
# dont have at least 10 reviews so the comparision it has some ground.
evaluation$average_score <- rowMeans(evaluation[,-1],
na.rm = TRUE)
customer_ladder <- evaluation %>% select(Employee,average_score) %>%
arrange(desc(average_score))
mainDir <- getwd()
subDir <- "results_audits"
dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
setwd(file.path(mainDir, subDir))
write.csv(x = customer_ladder,
file = paste0("customer_opinion_ladder",".csv"))
setwd(mainDir)
return(customer_ladder)
}
library(shiny)
library(shinydashboard)
library(DT)
library(rstudioapi)
library(rsconnect)
source(file = "Scripts/Customer Segmentation.R")
customer <- read_csv(file = "data/Datasets for DS - Customer segmentation.csv")
customer_time <- pre_process_customers_table(customer)
# customer_time <- as.data.frame(customer_time)
min_date_clients <- floor_date(x = customer_time$InvoiceDate, unit = "day") %>% min()
max_date_clients <- floor_date(x = customer_time$InvoiceDate, unit = "day") %>% max()
rm(customer)
# CRM block
source(file = "Scripts/Employee performance_CRM.R")
classification_df <- read_csv2(file = "data/CRM_employee.csv")
min_date_CRM <- floor_date(x = classification_df$Date, unit = "day") %>% min()
max_date_CRM <- floor_date(x = classification_df$Date, unit = "day") %>% max()
# Performance_function(classification_df)
# audits block
source(file = "Scripts/Audits.R")
audits <- read_csv2(file = "data/Audits_Employee_performance.csv")
audits$TimeCompleted <- dmy(audits$TimeCompleted)
min_date_audits <- floor_date(x = audits$TimeCompleted, unit = "day") %>% min()
max_date_audits <- floor_date(x = audits$TimeCompleted, unit = "day") %>% max()
# customer_opinion_ladder(audits)
# telephone block
source(file = "Scripts/telephone.R")
telephone <- read_csv2(file = "data/Telephone_Employee_performance.csv")
telephone$Day <- dmy(telephone$Day)
min_date_telephone <- floor_date(x = telephone$Day, unit = "day") %>% min()
max_date_telephone <- floor_date(x = telephone$Day, unit = "day") %>% max()
# 1 Dashbord ----------------------------------------------------------------
# USER INTERFACE
ui <- dashboardPage(
dashboardHeader(title="IWG Regus"),
# _1.1 sidebar --------------------------------------------------------------
dashboardSidebar(
sidebarMenu(id = "pestaña",
menuItem("Clients", tabName = "Clients",
icon = icon("dashboard")),
menuItem("Agents", tabName = "Agents",
icon = icon("align-justify"))
)#cierro sidebarMenu
), #cierro dashboardSidebar
# _1.2 body -----------------------------------------------------------------
dashboardBody(
tabItems(
# _1.2.1 body Clients----------------------------------------------------------
tabItem(tabName = "Clients",
# First row
fluidRow(
column(3,
dateRangeInput(inputId = "range_clients",
label = "Choose Date Range for clients: yyyy-mm-dd",
start = min_date_clients,
end = max_date_clients,
min = min_date_clients,
max = max_date_clients)),
column(3,
radioButtons(inputId = "client_buttons", label = h3("Select Table"),
choices = list("Top Customers" = 1,
"Leaving Customers" = 2),
selected = 1,
inline = TRUE))
),#cierro fluidRow
fluidRow( #second row
wellPanel(dataTableOutput('table_clients'))
)
),#cierro tabItem
# _1.2.2 body Agents----------------------------------------------------------
tabItem(tabName = "Agents",
tabsetPanel(type = "tabs", id = "Agents",
tabPanel("CRM",
# First row
fluidRow(
column(3,
dateRangeInput(inputId = "range_CRM",
label = "Choose Date Range for CRM data: yyyy-mm-dd",
start = min_date_CRM,
end = max_date_CRM,
min = min_date_CRM,
max = max_date_CRM)),
column(3,
radioButtons(inputId = "CRM_buttons", label = h3("Select Table"),
choices = list("Most Efficient Agents" = 1,
"Least Efficient Agents" = 2,
"Most Deals Closed" = 3,
"Least Deals Closed" = 4),
selected = 1,
inline = TRUE))
),#cierro fluidRow
fluidRow( #second row
wellPanel(dataTableOutput('table_CRM'))
)#cierro fluidRow
),#cierro tabpanel CRM
tabPanel("Audits",
# First row
fluidRow(
column(3,
dateRangeInput(inputId = "range_Audits",
label = "Choose Date Range for the Audits: yyyy-mm-dd",
start = min_date_audits,
end = max_date_audits,
min = min_date_audits,
max = max_date_audits)),
column(3,
numericInput(inputId = "min_reviews",
label = "Observations:",
value = 10,
min = 1,
max = 100)
)
),#cierro fluidRow
fluidRow( #second row
wellPanel(dataTableOutput('table_Audits'))
)#cierro fluidRow
),
tabPanel("Telephone",
# First row
fluidRow(
column(3,
dateRangeInput(inputId = "range_telephone",
label = "Choose Date Range for Telephone data: yyyy-mm-dd",
start = min_date_telephone,
end = max_date_telephone,
min = min_date_telephone,
max = max_date_telephone)),
column(9,
radioButtons(inputId = "Telephone_buttons", label = h3("Select Ladder"),
choices = list("Time on the Phone" = 1,
"Time on the Backoffice" = 2,
"Time on Break" = 3,
"Time logged" = 4,
"Number of calls handled" = 5),
selected = 1,
inline = TRUE))
),#cierro fluidRow
fluidRow( #second row
wellPanel(dataTableOutput('table_Telephone'))
)#cierro fluidRow
)#cierro tabpanel
)#cierro tabsetPanel Agents
)#cierro tabItem
)#cierro tabItems
)#cierro dashboardBody
)#cierro dashboardPage
# server ------------------------------------------------------------------
# SERVER
server <- function(input, output) {
# we process the clients data
clients_data <- reactive({
customers_data_reactive <- customer_time %>%
filter(InvoiceDate > input$range_clients[1]) %>%
filter(InvoiceDate < input$range_clients[2])
if(input$client_buttons == 1){
customers_data_reactive2 <- top_clients(customers_data_reactive)
}
else{
customers_data_reactive2 <- leaving_clients(customers_data_reactive)
}
return(customers_data_reactive2)
})
# we process the Agents CRM data
CRM_data <- reactive({
CRM_data_reactive <- classification_df %>%
filter(Date > input$range_CRM[1]) %>%
filter(Date < input$range_CRM[2]) %>%
Performance_function()
if(input$CRM_buttons == 1){toreturn <- CRM_data_reactive[[1]]}
if(input$CRM_buttons == 2){toreturn <- CRM_data_reactive[[2]]}
if(input$CRM_buttons == 3){toreturn <- CRM_data_reactive[[3]]}
if(input$CRM_buttons == 4){toreturn <- CRM_data_reactive[[4]]}
return(toreturn)
})
# we process the Agents Audits data
Audits_data <- reactive({
Audits_data_reactive <- audits %>%
filter(TimeCompleted > input$range_Audits[1]) %>%
filter(TimeCompleted < input$range_Audits[2])
opinion <- customer_opinion_ladder(Audits_data_reactive,
min_reviews = input$min_reviews)
return(opinion)
})
# we process the Agents Telephone data
Tel_data <- reactive({
Tel_data_reactive <- telephone %>%
filter(Day > input$range_telephone[1]) %>%
filter(Day < input$range_telephone[2]) %>%
interesting_ladders()
if(input$Telephone_buttons == 1){ladder_to_return <- Tel_data_reactive[[1]]}
if(input$Telephone_buttons == 2){ladder_to_return <- Tel_data_reactive[[2]]}
if(input$Telephone_buttons == 3){ladder_to_return <- Tel_data_reactive[[3]]}
if(input$Telephone_buttons == 4){ladder_to_return <- Tel_data_reactive[[4]]}
if(input$Telephone_buttons == 5){ladder_to_return <- Tel_data_reactive[[5]]}
return(ladder_to_return)
})
output$table_clients <- renderDataTable ({ clients_data() })
output$table_CRM <- renderDataTable ({ CRM_data() })
output$table_Audits <- renderDataTable ({ Audits_data() })
output$table_Telephone <- renderDataTable ({ Tel_data() })
}#cerramos sistem
# My app
shinyApp(ui, server)
This is a Minimum Viable Product, some of the options of the functions aren’t implemented in the general one, and we aren’t asking for input on the options, the functions are pretty fragile and there is a lot of information that it’s not used. Especially the managers have been ignored in all this task, and there could be some really interesting rankings to be done with them. Also, the region is largely ignored.
Also, I missed tables with the contact information of the clients so the outputs could be more complete.
There is also a lot of potential for improving the Dashboard. Starting with refreshing the data automatically, to graphs showing the temporal progression of leads, visits scheduled, visits done and deals closed by timeframe in the company, so we could see a visual representation of the growth of the company/employee. Just to mention a few possibilities.
Also we could do a cohort rentetion analysis in the CRM table to observe the evolution of the results of the company over time.
Some side approaches would be interesting to perform like personalized recommendations/offers, which could be done with R using the apriori function from the arules package or with deep learning using collaborative filtering and fast.ai a library constructed on pytorch.