Statistical Analysis of Sales Data

Julian Hatwell
Aug 2016

Introduction

The client’s sales department is organised into groups A and B. Members of Group B are set much lower quotas (sales targets) on average, reflecting a difference in difficulty to sell to group B customers.

Business Problem

Is there any difference performance between the two groups? What (if any) anomalies are present? Make recommendations based on the findings.

Executive Summary

Members of Group B where much more sensitive to quota size, with diminishing chances of meeting their quota as quota increased over a very small range. This reflected the additional challenge of selling to this customer segment.

Methodology

We will present the descriptive statistics. We will then analyse groups using a Bayesian statistical test. Bayesian methods are preferred for this analysis, because the data is a complete census (the entire sales department). The interpretation is more intuitive and we do not intend to make inferences about a larger population. Furthermore, checking assumptions and diagnostics for packaged Bayesian models is much less time consuming.

Descriptive

The following listing shows the top 6 rows of the data file followed by a standard summary for the whole sample, and separately by group.

## # A tibble: 6 × 5
##   Group Attainment KSales KQuota MetTarget
##   <fct>      <dbl>  <dbl>  <dbl> <lgl>    
## 1 A          104.    33.7   32.4 TRUE     
## 2 A           92.6   25.0   26.9 FALSE    
## 3 A          109.    29.9   27.4 TRUE     
## 4 A          107.    24.2   22.6 TRUE     
## 5 A           92.9   23.2   25.0 FALSE    
## 6 A           98.2   24.7   25.1 FALSE
## Both Groups
##  Group     Attainment         KSales           KQuota       MetTarget      
##  A:346   Min.   : 55.35   Min.   : 6.282   Min.   : 6.813   Mode :logical  
##  B: 76   1st Qu.: 93.25   1st Qu.:20.279   1st Qu.:20.714   FALSE:233      
##          Median : 98.68   Median :28.271   Median :27.791   TRUE :189      
##          Mean   : 99.37   Mean   :30.140   Mean   :30.363                  
##          3rd Qu.:104.00   3rd Qu.:37.284   3rd Qu.:37.786                  
##          Max.   :150.12   Max.   :86.803   Max.   :82.044
## Group A only
##  Group     Attainment         KSales           KQuota       MetTarget      
##  A:346   Min.   : 73.73   Min.   : 8.969   Min.   : 9.873   Mode :logical  
##  B:  0   1st Qu.: 93.88   1st Qu.:23.605   1st Qu.:24.513   FALSE:190      
##          Median : 99.11   Median :30.601   Median :31.171   TRUE :156      
##          Mean   : 99.37   Mean   :32.949   Mean   :33.154                  
##          3rd Qu.:103.75   3rd Qu.:40.309   3rd Qu.:39.982                  
##          Max.   :139.73   Max.   :86.803   Max.   :82.044
## Group B only
##  Group    Attainment         KSales           KQuota       MetTarget      
##  A: 0   Min.   : 55.35   Min.   : 6.282   Min.   : 6.813   Mode :logical  
##  B:76   1st Qu.: 87.72   1st Qu.:13.770   1st Qu.:13.089   FALSE:43       
##         Median : 95.00   Median :16.462   Median :17.322   TRUE :33       
##         Mean   : 99.33   Mean   :17.353   Mean   :17.656                  
##         3rd Qu.:107.80   3rd Qu.:20.534   3rd Qu.:20.392                  
##         Max.   :150.12   Max.   :39.890   Max.   :43.988
## Quota to Sales Correlation
## [1] 0.9776268

Remarks

As expected, given the brief, Group B has lower Quota and Sales than Group A. There appears to be a very strong correlation between Quota and Sales. We asked the client to consider what this correlation implies. Does the person setting the quota (targets) at the start of the sales year have an uncanny grasp of the annual sales process? Are sales team members working up to the quotas but taking their foot off the gas when they know they’ve qualified for a bonus? If so, could a stretch target change the outcomes? These are causal questions that we could not possibly answer with this simple data set but it led to some interesting follow up discussions.

Differences in Performance Within Groups

We will consider the two groups separately, given the known difference between groups already described.

For those who made their targets compared to those who didn’t, it’s useful to know how much their quotas were influential. A boxplot helps to visualise this for the two groups:

In Group A, there is no real difference in the mean Quotas (\(\approx\) 0.1K sales quota units) between those who met their target and those who didn’t. For Group B, this difference is notably \(\approx\) -2.36K sales quota units i.e. somewhat lower among those who met their target than those who did not.

However, with such a large range of quotas (6.81, 32.32), is this difference important? A Bayesian t-test can help to answer this question. For comparison, we show the test results for both groups.

Bayesian t-test Group A

The bottom two charts show the posterior (distribution) of the mean difference (Group A: Met Target vs Group A: Target not Met). We can see the posterior is centred very close to zero difference.

Bayesian t-test Group B

The bottom two charts show the posterior distribution of the mean difference (Group B: Met Target vs Group B: Target not Met). We can see the posterior is centred at -2.27 which aligns with the above box plot. The effect size (relative to the variance) is considered medium at -0.41. Zero is inside the credible interval, but there is a \(\approx 95\%\) that the Met Target sub group have a lower median target than the Target not Met sub group.

Remarks

The obvious difference between groups is expected, as per the client brief (different customer segments), so the groups were considered separately here. A Bayesian t-test confirms the visual assessment that there is no significant difference in the median quotas set for the sub-groups in Group A (those who met their target vs those who missed it). One the other hand, we have evidence to suggest that this is not the case for Group B. Those who met their targets (taken as a group) may have had slightly more achievable targets than those who did not. Note that the evidence is borderline credible.

A standard t-test (not shown) produces a very similar result but the difference in Group B would be reported as non-significant based on the p-values (p = 0.056). Again, we might consider this a borderline result but the Bayesian interpretation is more intuitive.

A visual analysis of the MCMC diagnostics (not shown) revealed no problems with the test convergence.

Closing Remarks

In light of this finding we suggest a review of the quota setting for group B, in order to ensure fairness across the team.

Appendix: Code

library(readr)
library(dplyr)
library(lattice)
library(vcd)
library(BayesianFirstAid)
source("TC_Theme.R")
link <- "TC_SalesQuota.csv"
raw.data <- read_csv(link) %>%
  mutate(Group = as.factor(Group)
            , KSales = Sales/1000
            , KQuota = Quota/1000
            , Sales = as.integer(Sales)
            , Quota = as.integer(Quota)
            , MetTarget = Attainment >= 100)

get.col <- function(colnm, grp) {
  as.matrix(raw.data %>% filter(Group == grp) %>% dplyr::select(colnm))
}
set.seed(12321)
k.data <- raw.data %>% dplyr::select(-Sales, -Quota)
head(k.data)
cat("Both Groups")
summary(k.data)
cat("Group A only")
summary(k.data %>% filter(Group == "A"))
cat("Group B only")
summary(k.data %>%  filter(Group == "B"))
cat("Quota to Sales Correlation")
with(raw.data, cor(KSales, KQuota))
xyplot(KSales~KQuota, data=raw.data
       , groups = Group
       , par.settings = MyLatticeTheme)
mns <- round(tapply(raw.data$KQuota, list(raw.data$Group, raw.data$MetTarget), mean),2)
mns <- mns[, c("TRUE", "FALSE")]
bwplot(KQuota ~ factor(MetTarget, levels = c(TRUE, FALSE)) | Group
       , data = raw.data
       , scales = list(format = list(digits = 10))
       , par.settings = MyLatticeTheme
       , strip = MyLatticeStrip
       , xlab = "Met target"
       , ylab = "Quota (1000's units)"
       , main = "Group performance to targets"
       , panel = function(x, y, ...) {
         panel.bwplot(x, y, ...)
         panel.average(x, y
                       , lwd = 2
                       , lty = 1
                       , col = myPalDark[1]
                       , ...)
         panel.text(1:2
                    , mns[which.packet(),] + 5
                    , mns[which.packet(),]
                    , col = myPalDark[1]
                    , cex = 1.1
                    , fontface = "bold"
                    , ...)
       }
)
tta <- bayes.t.test(get.col("KQuota","A")[get.col("MetTarget", "A")]
            , get.col("KQuota","A")[!get.col("MetTarget", "A")]
            , var.equal = TRUE)

tta$x_name <- "Group A: Met Target"
tta$y_name <- "Group A: Target not Met"

plot(tta)
ttb <- bayes.t.test(get.col("KQuota","B")[get.col("MetTarget", "B")]
            , get.col("KQuota","B")[!get.col("MetTarget", "B")]
            , var.equal = TRUE)

ttb$x_name <- "Group B: Met Target"
ttb$y_name <- "Group B: Target not Met"

plot(ttb)