Salesforce Pipe Analysis

Introduction

The goal of this project is to identify attributes of contact data that correlate with over performance and under-performance in terms of funnel conversion

Funnel Stages: Inquiry → MQL → SAL → SQL → SQO → Won

Notes on nuance in the data

For all Campaigns except “US_FY17_HDC_SignUp”, here are the definitions:

Inquiry - Exclude these as these contacts will not be worked by sellers^1
MQL - Marketing qualified leads that are routed to sellers to be worked
SAL - Meeting is booked with the contact and an opportunity is created
SQL - Opportunity moved to at least stage 2 “Qualified”^2
SQO - Opportunity moved beyond stage 2
Won - Either opportunity was closed won with Opportunity Amount > $0 OR opportunity was closed lost but closed lost reason = Signed Up for Access^3

“US_FY17_HDC_SignUp” Campaign needs to be analyzed by itself as there is an automation in Salesforce that automatically moves contacts who confirm their emails in the customer’s product from the MQL to SQL stage.

For these contacts, we need to look at SQO and Win conversion which have the same definitions as the other campaigns listed above.

Some Contacts are marked as disqualified. These contacts where Person or Contact Status = Disqualified should be removed from the analysis.

Average win percentage is 45%.

Filtering Data

In order to optimize time working on columns that have relevant information filtering columns that have a threshold of fill rate higher than an arbitrary percentage. The amounts were for 50%, 70% and 90%, respectively, 185, 72 and 60 attributes. The selected threshold is 90%.

	col	uniques
Acquisition Date	Acquisition Date	2851
Acquisition Program	Acquisition Program	55
Add to Marketo Campaign	Add to Marketo Campaign	2842
Admin Update Counter	Admin Update Counter	23
Behavior Score	Behavior Score	7
Campaign Type	Campaign Type	15
City	City	631
Company Name	Company Name	2798
Country	Country	108
Created At	Created At	2842
DF Account Score (Stamped)	DF Account Score (Stamped)	255
DeletedFromDiscoverOrg	DeletedFromDiscoverOrg	2
Email Address	Email Address	2852
Entered Raw	Entered Raw	2798
First Name	First Name	1880
Full Name	Full Name	2805
IT Org Chart	IT Org Chart	90
Job Function Score	Job Function Score	3
Job Level Score	Job Level Score	5
Last Name	Last Name	2340
Last Response Date	Last Response Date	387
Lead Currency	Lead Currency	2
Sales Owner Email Address	Sales Owner Email Address	54
Sales Owner First Name	Sales Owner First Name	53
Sales Owner Last Name	Sales Owner Last Name	55
Person Score	Person Score	14
Person Source	Person Source	6
Lead Source Description	Lead Source Description	10
MQL Day of Week	MQL Day of Week	8
Marketo SFDC ID	Marketo SFDC ID	2842
Marketo Unique Code	Marketo Unique Code	2856
Marketo Unique Name	Marketo Unique Name	2856
Most Recent Campaign	Most Recent Campaign	79
Most Recent Campaign Member Status	Most Recent Campaign Member Status	21
Name Created Date	Name Created Date	453
Number of Opportunities	Number of Opportunities	11
Original Source Info	Original Source Info	81
Original Source Type	Original Source Type	6
Person Time Zone	Person Time Zone	111
Registration Source Info	Registration Source Info	56
Registration Source Type	Registration Source Type	4
Response Score	Response Score	4
Response count	Response count	24
SFDC Campaigns	SFDC Campaigns	324
SFDC Type	SFDC Type	3
Sales Created Date	Sales Created Date	2837
Segmentation_ActiveMetropolis_1005	Segmentation_ActiveMetropolis_1005	18
Segmentation_Approved_1001	Segmentation_Approved_1001	2
Segmentation_ClientsVsCandidates_1008	Segmentation_ClientsVsCandidates_1008	1
Segmentation_CommonEmailExclusions_1004	Segmentation_CommonEmailExclusions_1004	7
Segmentation_DFAccountTier_1012	Segmentation_DFAccountTier_1012	4
Segmentation_GDPROptin_1006	Segmentation_GDPROptin_1006	2
Segmentation_NurturePersona2023_1014	Segmentation_NurturePersona2023_1014	6
Segmentation_Personas_1007	Segmentation_Personas_1007	5
Sourced from Data.com	Sourced from Data.com	2
State	State	223
Status Age	Status Age	325
Status Last Set	Status Last Set	2445
Updated At	Updated At	927
Usertype	Usertype	3
lead_won	lead_won	2

With this in mind a few columns have been selected for a correlation analysis in the next section.

Correlation Analysis

The analysis in this section will be split for each stage of the opportunity pipeline, aiming to find which attribute has highest correlation winning the lead. For numerical attributes the method selected to draw a comparison between itself and the success metric is ANOVA, as for categorical attributes, chi-squared test.

MQL Stage

mql_contact_data <- subset(contact_data,`Marketing Funnel State`=='MQL')

mql_numeric <- sapply(numeric_cols, function(x,cdata=mql_contact_data){
        t_test <- t.test(formula = get(x) ~ factor(lead_won),data = cdata, paired=FALSE)
        x.aov <- aov(formula = get(x) ~ factor(lead_won), data = cdata)
        x.anova <- anova(x.aov)
        list(t_test$p.value, x.anova[1,5])
})
df_num <- as.data.frame(t(data.frame(mql_numeric))) %>% arrange(V1) %>% filter(V1<0.05)
colnames(df_num) <- c('t.test_pvalue','anova_p_value')
knitr::kable(df_num)

	t.test_pvalue	anova_p_value
Admin.Update.Counter	0.001229178	0.0002423441
Behavior.Score	1.450705e-20	2.102016e-09
Job.Function.Score	9.029178e-08	1.459987e-18
Person.Score	1.606398e-11	2.415799e-12
Number.of.Opportunities	1.358728e-30	5.880638e-192

mql_categorical <- sapply(cat_cols, function(x,cdata=mql_contact_data){
        cdata <- na.omit(cdata[,c(x,'lead_won')])
        if(nrow(unique(cdata[x]))>1 & length(unique(cdata$lead_won))>1){
                chisq_test <- chisq.test(cdata[x],cdata$lead_won, simulate.p.value = TRUE)
                list(chisq_test[3]$p.value)
        } else 1
})
df_cat <- as.data.frame(t(data.frame(mql_categorical))) %>% arrange(V1) %>% filter(V1<0.05)
colnames(df_cat) <- c('chisq.test_pvalue')
knitr::kable(df_cat)

	chisq.test_pvalue
Acquisition.Program	0.0004998
Campaign.Type	0.0004998
Person.Source	0.0004998
Most.Recent.Campaign.Member.Status	0.0004998
Original.Source.Type	0.0004998
Registration.Source.Info	0.0004998
Registration.Source.Type	0.0004998
SFDC.Campaigns	0.0004998
SFDC.Type	0.0004998
Usertype	0.0004998
IT.Org.Chart	0.0064968
Region	0.0094953
Sales.Owner.Email.Address	0.0134933
Lead.Source.Description	0.0344828
Job.Level	0.0479760

SQO Stage

sqo_contact_data <- subset(contact_data,`Marketing Funnel State`=='SQO')

sqo_numeric <- sapply(numeric_cols, function(x,cdata=mql_contact_data){
        t_test <- t.test(formula = get(x) ~ factor(lead_won),data = cdata, paired=FALSE)
        x.aov <- aov(formula = get(x) ~ factor(lead_won), data = cdata)
        x.anova <- anova(x.aov)
        list(t_test$p.value, x.anova[1,5])
})
df_num <- as.data.frame(t(data.frame(sqo_numeric))) %>% arrange(V1) %>% filter(V1<0.05)
colnames(df_num) <- c('t.test_pvalue','anova_p_value')
knitr::kable(df_num)

	t.test_pvalue	anova_p_value
Admin.Update.Counter	0.001229178	0.0002423441
Behavior.Score	1.450705e-20	2.102016e-09
Job.Function.Score	9.029178e-08	1.459987e-18
Person.Score	1.606398e-11	2.415799e-12
Number.of.Opportunities	1.358728e-30	5.880638e-192

sqo_categorical <- sapply(cat_cols, function(x,cdata=sqo_contact_data){
        cdata <- na.omit(cdata[,c(x,'lead_won')])
        if(nrow(unique(cdata[x]))>1 & length(unique(cdata$lead_won))){
                chisq_test <- chisq.test(cdata[,x],cdata$lead_won, simulate.p.value = TRUE)
                list(chisq_test[3]$p.value)
        } else 1
})
df_cat <- as.data.frame(t(data.frame(sqo_categorical))) %>% arrange(V1) %>% filter(V1<0.05)
colnames(df_cat) <- c('chisq.test_pvalue')
knitr::kable(df_cat)

	chisq.test_pvalue
Acquisition.Program	0.0004998
Campaign.Type	0.0004998
Sales.Owner.Email.Address	0.0004998
Most.Recent.Campaign.Member.Status	0.0004998
Registration.Source.Type	0.0004998
SFDC.Campaigns	0.0004998
Region	0.0004998
UTMSource	0.0004998
Person.Time.Zone	0.0009995
State	0.0009995
Original.Source.Type	0.0019990
Registration.Source.Info	0.0034983
IT.Org.Chart	0.0059970

In this step we noticed that in the MQL stage all numeric variables have a general correlation to the result (5%), but as for categorical, none of the UTM, Job, City and State parameters are statistically relevant.

In the SQO step there is a notable appearence of the UTMSource variable as relevant.

As for specific analysis for each campaign, its possible to filter data and redo the analysis to see changes in the correlated variables.

UTMSource conversion changes

df <- contact_data %>% filter(!is.na(UTMSource)) %>%
        mutate(quarter=floor_date(ymd(`MQL Date`),'quarter')) %>% 
        filter(year(quarter)==2022) %>%
        group_by(UTMSource,quarter) %>% 
        summarise(.by = ,lead_won_rate=mean(lead_won))
p <- ggplot(df, aes(x=quarter, y=lead_won_rate, group=UTMSource, color=UTMSource)) +
  geom_line(linewidth=1.5) + geom_point() +
  xlab("")
p

Looking at the trends it seems that myPresences has a notable increase in conversion rate