Question:

The team wants to find the month they’re likely to contact the most clients, so they can schedule a product upgrade announcement. Which month does the team tend to contact the greatest percentage of its clients?

Answer: Oct,2013 for detailed information see below.

(In addition to providing an answer, please either include a public link to a spreadsheet showing your work, or describe your process for answering the question.)

library(sqldf)
## Warning: package 'sqldf' was built under R version 3.6.1
## Loading required package: gsubfn
## Warning: package 'gsubfn' was built under R version 3.6.1
## Loading required package: proto
## Warning: package 'proto' was built under R version 3.6.1
## Loading required package: RSQLite
## Warning: package 'RSQLite' was built under R version 3.6.1
library(tidyverse)
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
## -- Attaching packages ---------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.1     v purrr   0.3.2
## v tibble  2.1.1     v dplyr   0.8.1
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(dplyr)
library(readxl)
library(ggplot2)

load data and package

DA <- read_excel("C:/Users/jiayi/personal/study/Job/Amaterial/HireArt/DA_assigment for HireArt.xlsx", col_types = c("text", "text", "date"))
summary(DA)
##  Account manager    Client Name        Date of Contact              
##  Length:1001        Length:1001        Min.   :2013-10-03 00:00:00  
##  Class :character   Class :character   1st Qu.:2014-09-16 00:00:00  
##  Mode  :character   Mode  :character   Median :2015-09-01 00:00:00  
##                                        Mean   :2015-08-22 02:36:48  
##                                        3rd Qu.:2016-09-02 00:00:00  
##                                        Max.   :2017-09-29 00:00:00
head(DA)

change the column name and add month_year column with R

DA = DA %>% rename(manager = `Account manager`, client= `Client Name`, date = `Date of Contact`)
DA$mon_y = format(as.Date(DA$date), "%Y-%m")
DA %>% select(manager)%>% distinct()

order the number of distinct client for each month with SQL

sqldf('SELECT COUNT(DISTINCT client) AS count_client, mon_y FROM DA 
      GROUP BY mon_y ORDER BY COUNT(DISTINCT client) DESC LIMIT 10')

Answer: Here we can find that the 2013-10 had contacted most clients which is 31. Furthermore, we could find some time series patten that each October has more contact clients than other month in each year except 2017.

month_num <- sqldf('SELECT COUNT(DISTINCT client) AS count_client, mon_y FROM DA 
                   GROUP BY mon_y ORDER BY  COUNT(DISTINCT client) DESC')
month_num %>% ggplot(aes(mon_y,count_client))+geom_bar(stat = "identity", 
                                                       aes(fill= count_client))+coord_flip()

order the number of distinct client for each month and each manager with SQL

sqldf('SELECT COUNT(DISTINCT client) AS count_client, mon_y, manager FROM DA 
      GROUP BY mon_y, manager ORDER BY  COUNT(DISTINCT client) DESC LIMIT 10')
sqldf('SELECT COUNT(DISTINCT client) AS count_client, mon_y, manager FROM DA 
      GROUP BY manager ORDER BY  COUNT(DISTINCT client) DESC LIMIT 10')

Answer: Accounting to “month year” and manager group information, we can find that" Chauncey Dach" contacted most customers in Oct, 2013 (2013-10) compare other manager base on every manager had the same number of clients which was 35.