Data set preparation and inspection

First six observations of the newly created data sets

Client     Region   Customer.Type      Product  Status  Dealsize  exp_time
Acesonzap  Austria  Existing Customer  BI       Lost    26000     1
Applex     Austria  Existing Customer  BI       Won     74000     2
bam-hex    Austria  New Customer       BI       Lost    21000     2
Basecare   Germany  New Customer       ERP      Won     111000    1
Biozumzap  Austria  Existing Customer  BI       Lost    214000    3
Blacklax   Germany  New Customer       ERP      Lost    5000      2

Contingency tables of the proportions of lost and won deals (each column sums to 1)

Status  Existing Customer  New Customer
Lost    0.58               0.70
Won     0.42               0.30

Status  Austria  Germany  Switzerland
Lost    0.77     0.44     0.57
Won     0.23     0.56     0.43

Status  BI    ERP
Lost    0.77  0.46
Won     0.23  0.54
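
The proportions above are column-wise shares: within each category, the number of lost (or won) deals divided by the number of deals in that category. The sketch below illustrates the calculation in Python on the six observations printed earlier only, so its numbers differ from the full-sample table. The function name `column_proportions` is hypothetical and not taken from the original analysis.

```python
from collections import Counter

# (Product, Status) for the six observations printed at the top of the report.
rows = [
    ("BI", "Lost"),   # Acesonzap
    ("BI", "Won"),    # Applex
    ("BI", "Lost"),   # bam-hex
    ("ERP", "Won"),   # Basecare
    ("BI", "Lost"),   # Biozumzap
    ("ERP", "Lost"),  # Blacklax
]


def column_proportions(pairs):
    """Return {category: {status: share}}; the shares of each category sum to 1."""
    counts = Counter(pairs)                               # deals per (category, status)
    totals = Counter(category for category, _ in pairs)   # deals per category
    table = {}
    for (category, status), n in counts.items():
        table.setdefault(category, {})[status] = n / totals[category]
    return table


props = column_proportions(rows)
for category in sorted(props):
    shares = {status: round(p, 2) for status, p in sorted(props[category].items())}
    print(category, shares)
```

The same function applies to Region or Customer.Type by passing pairs of that variable with Status.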

Create a sales revenue forecast for 10-18

Because the revenue depends on the probability of winning each deal, I first forecast the wins (1) and losses (0) for 10-18 by fitting a logistic regression to the deals from the previous months.

Variables considered for the analysis: Status (outcome), Customer.Type, Region, Product, exp_time and Dealsize (predictors).

Logistic regression results (dependent variable: Status; standard errors in parentheses)

Term                        Estimate   (Std. error)
Constant                    -1.640*    (0.892)
Customer.TypeNew Customer   -0.528     (0.603)
RegionGermany                1.287*    (0.691)
RegionSwitzerland            0.824     (0.747)
ProductERP                   1.224**   (0.583)
exp_time                    -0.066     (0.270)
Dealsize                     0.00000   (0.00000)

N                           63
Log Likelihood              -35.224
Akaike Inf. Crit.           84.449

Notes: ***Significant at the 1 percent level.
**Significant at the 5 percent level.
*Significant at the 10 percent level.

Summary

In a nutshell, the results suggest that:

  • Customer type has no statistically significant effect on winning or losing a deal, even though I expected deals with existing customers to be won more often
  • The client wins significantly more often in Germany than in Austria (p-value < 0.1), and there is no significant difference in the probability of winning between Austria and Switzerland
  • The probability of winning a deal is significantly higher for ERP products than for BI products (p-value < 0.05)
  • Expected time and deal size have no significant effect on the probability of winning or losing a deal
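
To make the coefficients easier to read, the sketch below converts them to odds ratios and to a predicted probability through the inverse logit. It is a Python illustration and not the code behind the report. The coefficient values are copied from the regression table; the baselines (Austria, Existing Customer, BI) are inferred from the term names. The Dealsize coefficient is printed as 0.00000, which is a rounded value, so it is set to 0.0 here as an explicit assumption. The function names are hypothetical.

```python
import math

# Log-odds coefficients as printed in the regression table.
# ASSUMPTION: Dealsize is printed as 0.00000 (rounded); 0.0 is used here,
# so predictions ignore the deal size term of the fitted model.
coef = {
    "Constant": -1.640,
    "Customer.TypeNew Customer": -0.528,
    "RegionGermany": 1.287,
    "RegionSwitzerland": 0.824,
    "ProductERP": 1.224,
    "exp_time": -0.066,
    "Dealsize": 0.0,
}


def odds_ratio(term):
    """Multiplicative change in the odds of winning for a one-unit change in term."""
    return math.exp(coef[term])


def win_probability(region, customer_type, product, exp_time, dealsize=0.0):
    """Inverse logit of the linear predictor (baselines: Austria, Existing Customer, BI)."""
    eta = coef["Constant"]
    eta += coef.get("Region" + region, 0.0)                # 0 for the baseline Austria
    eta += coef.get("Customer.Type" + customer_type, 0.0)  # 0 for Existing Customer
    eta += coef.get("Product" + product, 0.0)              # 0 for BI
    eta += coef["exp_time"] * exp_time
    eta += coef["Dealsize"] * dealsize
    return 1.0 / (1.0 + math.exp(-eta))


or_germany = odds_ratio("RegionGermany")
or_erp = odds_ratio("ProductERP")
p_best = win_probability("Germany", "Existing Customer", "ERP", exp_time=2)

print(f"Odds ratio Germany vs Austria: {or_germany:.2f}")
print(f"Odds ratio ERP vs BI:          {or_erp:.2f}")
print(f"P(win | Germany, existing, ERP, exp_time=2): {p_best:.3f}")
```

Read this way, the odds of winning are roughly 3.6 times higher in Germany than in Austria and roughly 3.4 times higher for ERP than for BI, holding the other variables fixed. Because the deal size term is dropped, the probability printed here is somewhat lower than the one the fitted model reports for the same combination.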

Forecast for 10-18

I built a confidence interval for each deal's predicted probability of winning in 10-18 using a t-distribution with 11 degrees of freedom (n - 1, where n = 12 is the number of open deals).

Client        Dealsize  Region       Customer.Type      Product  exp_time  lwr_prob  Prob   upr_prob  Status  Status_bin  Revenue
Toolcompany   50000     Austria      New Customer       ERP      1         0.110     0.294  0.477     Lost    0           0
tanteemma     77000     Germany      Existing Customer  ERP      2         0.538     0.720  0.902     Won     1           77000
Swups         80000     Switzerland  Existing Customer  ERP      1         0.386     0.635  0.885     Won     1           80000
Superdoopa    33000     Austria      New Customer       ERP      1         0.100     0.285  0.469     Lost    0           0
fixidea       222000    Germany      New Customer       BI       2         0.143     0.396  0.648     Lost    0           0
Tortenmacher  44000     Austria      Existing Customer  ERP      0         0.149     0.426  0.703     Lost    0           0
tobado        55000     Austria      Existing Customer  ERP      0         0.158     0.433  0.708     Lost    0           0
Rumsdidums    300000    Switzerland  Existing Customer  BI       0         0.102     0.495  0.889     Lost    0           0
MaxSteel      22000     Austria      New Customer       ERP      0         0.073     0.292  0.511     Lost    0           0
beatmeat      111000    Germany      Existing Customer  BI       0         0.164     0.486  0.807     Lost    0           0
Ausdiemaus    99000     Switzerland  New Customer       BI       0         0.014     0.253  0.493     Lost    0           0
Arrow         200000    Austria      Existing Customer  BI       0         0.033     0.248  0.464     Lost    0           0

  • The deal with “Swups” is forecast as won, but its confidence interval (0.386, 0.885) extends below 0.5, so the client may still lose it
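
The Status and Revenue columns can be reproduced from the probabilities with a simple cut-off rule. The Python sketch below assumes a threshold of 0.5, which is consistent with every row of the table but is not stated in the report. It also lists the deals whose interval contains 0.5, and adds a probability-weighted revenue figure as an alternative view that is not part of the original forecast. The names `Deal`, `THRESHOLD` and `forecast` are hypothetical.

```python
from typing import NamedTuple


class Deal(NamedTuple):
    client: str
    dealsize: int
    lwr_prob: float
    prob: float
    upr_prob: float


# Values copied from the forecast table.
deals = [
    Deal("Toolcompany", 50000, 0.110, 0.294, 0.477),
    Deal("tanteemma", 77000, 0.538, 0.720, 0.902),
    Deal("Swups", 80000, 0.386, 0.635, 0.885),
    Deal("Superdoopa", 33000, 0.100, 0.285, 0.469),
    Deal("fixidea", 222000, 0.143, 0.396, 0.648),
    Deal("Tortenmacher", 44000, 0.149, 0.426, 0.703),
    Deal("tobado", 55000, 0.158, 0.433, 0.708),
    Deal("Rumsdidums", 300000, 0.102, 0.495, 0.889),
    Deal("MaxSteel", 22000, 0.073, 0.292, 0.511),
    Deal("beatmeat", 111000, 0.164, 0.486, 0.807),
    Deal("Ausdiemaus", 99000, 0.014, 0.253, 0.493),
    Deal("Arrow", 200000, 0.033, 0.248, 0.464),
]

THRESHOLD = 0.5  # ASSUMPTION: cut-off inferred from the Status column


def forecast(deals, threshold=THRESHOLD):
    """Return (revenue of deals forecast as won, clients whose interval contains the threshold)."""
    revenue = sum(d.dealsize for d in deals if d.prob > threshold)
    uncertain = [d.client for d in deals if d.lwr_prob < threshold < d.upr_prob]
    return revenue, uncertain


revenue, uncertain = forecast(deals)

# Alternative (not the report's method): weight each deal size by its win probability.
expected_revenue = sum(d.dealsize * d.prob for d in deals)

print("Forecast revenue (won deals only):", revenue)
print("Deals whose interval contains 0.5:", uncertain)
print("Probability-weighted revenue:", round(expected_revenue))
```

Under the cut-off rule only "tanteemma" and "Swups" contribute revenue, and "tanteemma" is the only deal whose whole interval lies above 0.5. The probability-weighted figure is much larger because several large deals have probabilities just below the cut-off.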

Probability of winning an opportunity for different combinations of criteria in a given month

Client  Dealsize  Region       Customer.Type      Product  exp_time  Prob   Status
A       98619.05  Austria      Existing Customer  BI       2         0.181  Lost
A       98619.05  Germany      Existing Customer  BI       2         0.445  Lost
A       98619.05  Switzerland  Existing Customer  BI       2         0.335  Lost
A       98619.05  Austria      New Customer       BI       2         0.115  Lost
A       98619.05  Germany      New Customer       BI       2         0.321  Lost
A       98619.05  Switzerland  New Customer       BI       2         0.229  Lost
A       98619.05  Austria      Existing Customer  ERP      2         0.429  Lost
A       98619.05  Germany      Existing Customer  ERP      2         0.732  Won
A       98619.05  Switzerland  Existing Customer  ERP      2         0.632  Won
A       98619.05  Austria      New Customer       ERP      2         0.307  Lost
A       98619.05  Germany      New Customer       ERP      2         0.616  Won
A       98619.05  Switzerland  New Customer       ERP      2         0.503  Won

Summary

  • The Germany / existing customer / ERP combination has the highest predicted probability of winning, 0.732 (forecast: won)
  • The Switzerland / new customer / BI combination has a predicted probability of winning of 0.229 (forecast: lost)
  • The Austria / existing customer / ERP combination has a predicted probability of winning of 0.429 (forecast: lost)
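
Because deal size and expected time are held fixed in the scenario table, two rows that differ in a single factor should differ on the log-odds scale by that factor's regression coefficient. The Python sketch below checks this with probabilities copied from the table; small deviations are expected because the table is rounded to three decimals. The helper name `logit` and the dictionary layout are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


# Predicted probabilities copied from the scenario table
# (Dealsize 98619.05 and exp_time 2 in every row).
prob = {
    ("Austria", "Existing Customer", "BI"): 0.181,      # baseline combination
    ("Germany", "Existing Customer", "BI"): 0.445,
    ("Switzerland", "Existing Customer", "BI"): 0.335,
    ("Austria", "New Customer", "BI"): 0.115,
    ("Austria", "Existing Customer", "ERP"): 0.429,
}

# Coefficients as printed in the regression table.
reported = {
    "RegionGermany": 1.287,
    "RegionSwitzerland": 0.824,
    "Customer.TypeNew Customer": -0.528,
    "ProductERP": 1.224,
}

base = logit(prob[("Austria", "Existing Customer", "BI")])
implied = {
    "RegionGermany": logit(prob[("Germany", "Existing Customer", "BI")]) - base,
    "RegionSwitzerland": logit(prob[("Switzerland", "Existing Customer", "BI")]) - base,
    "Customer.TypeNew Customer": logit(prob[("Austria", "New Customer", "BI")]) - base,
    "ProductERP": logit(prob[("Austria", "Existing Customer", "ERP")]) - base,
}

for term, value in implied.items():
    print(f"{term:27s} implied {value:7.3f}   reported {reported[term]:7.3f}")
```

The implied and reported values agree to about two decimals, which indicates that the scenario probabilities come from the same fitted model and that the ranking of combinations is driven by region and product.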

Alternative forecasting methods

kNN algorithm for forecasting

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Lost Won
##       Lost    9   5
##       Won     0   1
##                                           
##                Accuracy : 0.6667          
##                  95% CI : (0.3838, 0.8818)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.40322         
##                                           
##                   Kappa : 0.1935          
##  Mcnemar's Test P-Value : 0.07364         
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.1667          
##          Pos Pred Value : 0.6429          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.6000          
##          Detection Rate : 0.6000          
##    Detection Prevalence : 0.9333          
##       Balanced Accuracy : 0.5833          
##                                           
##        'Positive' Class : Lost            
## 
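
The statistics in this output follow directly from the four cell counts, with "Lost" treated as the positive class. The Python sketch below recomputes them as a reading aid; the variable names are hypothetical, and the formulas are the standard definitions of accuracy, sensitivity, specificity, predictive values and Cohen's kappa.

```python
# Cell counts from the confusion matrix above ('Lost' is the positive class).
tp = 9  # predicted Lost, reference Lost
fp = 5  # predicted Lost, reference Won
fn = 0  # predicted Won,  reference Lost
tn = 1  # predicted Won,  reference Won

n = tp + fp + fn + tn

accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)       # share of actual Lost deals predicted as Lost
specificity = tn / (tn + fp)       # share of actual Won deals predicted as Won
pos_pred_value = tp / (tp + fp)
neg_pred_value = tn / (tn + fn)
prevalence = (tp + fn) / n
balanced_accuracy = (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for agreement expected by chance.
expected_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
kappa = (accuracy - expected_agreement) / (1 - expected_agreement)

for name, value in [
    ("Accuracy", accuracy),
    ("Kappa", kappa),
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Pos Pred Value", pos_pred_value),
    ("Neg Pred Value", neg_pred_value),
    ("Prevalence", prevalence),
    ("Balanced Accuracy", balanced_accuracy),
]:
    print(f"{name:18s} {value:.4f}")
```

The classifier predicts "Lost" for 14 of the 15 test deals and identifies only 1 of the 6 won deals, so its accuracy of 0.6667 is not significantly above the no-information rate of 0.6 (p-value 0.40).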

Time-series analysis (ARIMA)

Further methods

  • Bayesian model for binary Markov chains
  • Hidden Markov Models for Time Series