I will show you how to predict the propensity of prospects to buy. I will use web click data, that is, the links a user clicks while browsing, to predict that user's propensity to buy the product. Using that propensity, you can decide whether or not to offer the customer a chat with an agent.

1 Loading and Viewing Data

Load the data file for this example and check out its columns and summary statistics.

## Observations: 500
## Variables: 12
## $ SESSION_ID      <int> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008...
## $ IMAGES          <int> 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1...
## $ REVIEWS         <int> 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1...
## $ FAQ             <int> 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0...
## $ SPECS           <int> 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0...
## $ SHIPPING        <int> 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1...
## $ BOUGHT_TOGETHER <int> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1...
## $ COMPARE_SIMILAR <int> 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1...
## $ VIEW_SIMILAR    <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1...
## $ WARRANTY        <int> 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1...
## $ SPONSORED_LINKS <int> 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0...
## $ BUY             <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1...

1.1 Data Structure

The data contains information about the various links on the website that the user clicked while browsing. This is past data that will be used to build the model.

  • SESSION_ID: a unique identifier for the web browsing session
  • BUY: whether the prospect ended up buying the product
  • Other columns: a 0 or 1 indicator showing whether the prospect visited that particular page or did the activity mentioned
##   SESSION_ID IMAGES REVIEWS FAQ SPECS SHIPPING BOUGHT_TOGETHER
## 1       1001      0       0   1     0        1               0
## 2       1002      0       1   1     0        0               0
## 3       1003      1       0   1     1        1               0
## 4       1004      1       0   0     0        1               1
## 5       1005      1       1   1     0        1               0
## 6       1006      1       0   0     1        0               1
##   COMPARE_SIMILAR VIEW_SIMILAR WARRANTY SPONSORED_LINKS BUY
## 1               0            0        1               0   0
## 2               0            0        0               1   0
## 3               0            0        1               0   0
## 4               1            0        0               0   0
## 5               1            0        0               0   0
## 6               1            0        0               0   0

##    SESSION_ID       IMAGES        REVIEWS          FAQ      
##  Min.   :1001   Min.   :0.00   Min.   :0.00   Min.   :0.00  
##  1st Qu.:1126   1st Qu.:0.00   1st Qu.:0.00   1st Qu.:0.00  
##  Median :1250   Median :1.00   Median :1.00   Median :0.00  
##  Mean   :1250   Mean   :0.51   Mean   :0.52   Mean   :0.44  
##  3rd Qu.:1375   3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:1.00  
##  Max.   :1500   Max.   :1.00   Max.   :1.00   Max.   :1.00  
##      SPECS         SHIPPING     BOUGHT_TOGETHER COMPARE_SIMILAR
##  Min.   :0.00   Min.   :0.000   Min.   :0.0     Min.   :0.00   
##  1st Qu.:0.00   1st Qu.:0.000   1st Qu.:0.0     1st Qu.:0.00   
##  Median :0.00   Median :1.000   Median :0.5     Median :1.00   
##  Mean   :0.48   Mean   :0.528   Mean   :0.5     Mean   :0.58   
##  3rd Qu.:1.00   3rd Qu.:1.000   3rd Qu.:1.0     3rd Qu.:1.00   
##  Max.   :1.00   Max.   :1.000   Max.   :1.0     Max.   :1.00   
##   VIEW_SIMILAR      WARRANTY     SPONSORED_LINKS      BUY      
##  Min.   :0.000   Min.   :0.000   Min.   :0.00    Min.   :0.00  
##  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.00    1st Qu.:0.00  
##  Median :0.000   Median :1.000   Median :1.00    Median :0.00  
##  Mean   :0.468   Mean   :0.532   Mean   :0.55    Mean   :0.37  
##  3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.00    3rd Qu.:1.00  
##  Max.   :1.000   Max.   :1.000   Max.   :1.00    Max.   :1.00
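
The code that produced the output above is not shown. As a minimal sketch, assuming the data sits in a data frame, the same kind of inspection can be done with base R. Because the data file name is not given here, the sketch rebuilds only the six rows displayed above; the object name `browsing_head` is made up for illustration, and the "Observations / Variables" listing looks like the output of a `glimpse()`-style function, for which `str()` is used as a stand-in.

```r
# The six rows shown in the head() output above, typed in by hand.
# With the real file you would call read.csv() on its path instead.
browsing_head <- data.frame(
  SESSION_ID      = 1001:1006,
  IMAGES          = c(0L, 0L, 1L, 1L, 1L, 1L),
  REVIEWS         = c(0L, 1L, 0L, 0L, 1L, 0L),
  FAQ             = c(1L, 1L, 1L, 0L, 1L, 0L),
  SPECS           = c(0L, 0L, 1L, 0L, 0L, 1L),
  SHIPPING        = c(1L, 0L, 1L, 1L, 1L, 0L),
  BOUGHT_TOGETHER = c(0L, 0L, 0L, 1L, 0L, 1L),
  COMPARE_SIMILAR = c(0L, 0L, 0L, 1L, 1L, 1L),
  VIEW_SIMILAR    = c(0L, 0L, 0L, 0L, 0L, 0L),
  WARRANTY        = c(1L, 0L, 1L, 0L, 0L, 0L),
  SPONSORED_LINKS = c(0L, 1L, 0L, 0L, 0L, 0L),
  BUY             = c(0L, 0L, 0L, 0L, 0L, 0L)
)

str(browsing_head)      # column names and types
head(browsing_head)     # first rows
summary(browsing_head)  # per-column summary statistics

# Sanity check: every column except SESSION_ID should be a 0/1 indicator.
indicator_cols <- setdiff(names(browsing_head), "SESSION_ID")
all_binary <- all(unlist(browsing_head[indicator_cols]) %in% c(0L, 1L))
print(all_binary)
```

For 0/1 indicator columns the mean reported by `summary()` is simply the share of sessions in which that page was visited, which is how the means in the summary above can be read.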

2 Perform Correlation Analysis

##      SESSION_ID     IMAGES   REVIEWS         FAQ       SPECS    SHIPPING
## [1,] 0.02667655 0.04681922 0.4046284 -0.09513567 0.009949879 -0.02223851
##      BOUGHT_TOGETHER COMPARE_SIMILAR VIEW_SIMILAR  WARRANTY
## [1,]      -0.1035616       0.1905224  -0.09613658 0.1791561
##      SPONSORED_LINKS BUY
## [1,]       0.1103284   1

Looking at the correlations above, we can see that some features, such as REVIEWS, BOUGHT_TOGETHER, COMPARE_SIMILAR, WARRANTY and SPONSORED_LINKS, have a weak to moderate correlation with the target variable BUY. We will reduce our feature set to that list of variables.
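
As a rough sketch of how such a selection could be automated, the snippet below copies the printed correlations into a named vector and keeps the features whose absolute correlation with BUY exceeds a cutoff. The cutoff of 0.1 is an assumption made for illustration, not a value stated in this example; the variable names are also made up.

```r
# Correlations with BUY, copied from the output above and rounded to 4 decimals.
# With the full data frame, a call of the form cor(df$BUY, df) yields a one-row
# matrix like the one printed above (df is a placeholder name).
cor_with_buy <- c(
  SESSION_ID = 0.0267, IMAGES = 0.0468, REVIEWS = 0.4046, FAQ = -0.0951,
  SPECS = 0.0099, SHIPPING = -0.0222, BOUGHT_TOGETHER = -0.1036,
  COMPARE_SIMILAR = 0.1905, VIEW_SIMILAR = -0.0961, WARRANTY = 0.1792,
  SPONSORED_LINKS = 0.1103
)

cutoff <- 0.1  # hypothetical cutoff on the absolute correlation

# Keep features by strength of association, regardless of sign.
selected <- names(cor_with_buy)[abs(cor_with_buy) > cutoff]
print(selected)
```

Using the absolute value matters here: BOUGHT_TOGETHER is negatively correlated with BUY, but a negative association is still informative for a classifier.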

3 Training and Testing Split

We now split the data into training and testing sets in the ratio of 70:30.

## [1] 351   5
## [1] 149   5
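
The splitting code is not shown. The row counts above (351 and 149 rather than 350 and 150) are what you get from a split that is stratified by class and rounds up within each class. Here is a minimal base R sketch of that idea; the helper `stratified_index` is written for this illustration, and the class counts (315 non-buyers, 185 buyers) are inferred from the mean of BUY and the confusion matrix totals rather than read from the data file.

```r
set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# Stand-in for the BUY column: 500 sessions, 185 of them buyers (mean 0.37).
buy <- rep(c(0L, 1L), times = c(315L, 185L))

# Pick a fraction p of the row indices within each class, rounding up.
stratified_index <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_idx <- stratified_index(buy, 0.7)
test_idx  <- setdiff(seq_along(buy), train_idx)

print(length(train_idx))  # rows in the training set
print(length(test_idx))   # rows in the testing set
print(table(buy[test_idx]))
```

Stratifying keeps the share of buyers roughly the same in both sets, which makes the accuracy measured on the testing set easier to interpret.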

4 Build Model and Check Accuracy

4.1 Build Naive Bayes Classifier

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 81 28
##          1 13 27
##                                           
##                Accuracy : 0.7248          
##                  95% CI : (0.6457, 0.7947)
##     No Information Rate : 0.6309          
##     P-Value [Acc > NIR] : 0.009873        
##                                           
##                   Kappa : 0.3738          
##  Mcnemar's Test P-Value : 0.028784        
##                                           
##             Sensitivity : 0.8617          
##             Specificity : 0.4909          
##          Pos Pred Value : 0.7431          
##          Neg Pred Value : 0.6750          
##              Prevalence : 0.6309          
##          Detection Rate : 0.5436          
##    Detection Prevalence : 0.7315          
##       Balanced Accuracy : 0.6763          
##                                           
##        'Positive' Class : 0               
## 

4.2 Accuracy

## [1] 0.7248322
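
To make the statistics above less of a black box, the following sketch recomputes them by hand from the four cells of the confusion matrix. Note that the report treats class 0 (did not buy) as the "positive" class, so sensitivity here measures how well non-buyers are recognised. The variable names are chosen for this illustration.

```r
# Confusion matrix from the output above: rows are predictions, columns are truth.
cm <- matrix(
  c(81, 13, 28, 27), nrow = 2,
  dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1"))
)

accuracy    <- sum(diag(cm)) / sum(cm)          # correct predictions / all
sensitivity <- cm["0", "0"] / sum(cm[, "0"])    # true 0s predicted as 0
specificity <- cm["1", "1"] / sum(cm[, "1"])    # true 1s predicted as 1
ppv         <- cm["0", "0"] / sum(cm["0", ])    # predicted 0s that are truly 0
npv         <- cm["1", "1"] / sum(cm["1", ])    # predicted 1s that are truly 1
balanced    <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, ppv = ppv, npv = npv,
              balanced = balanced), 4))
```

For the chat use case the specificity is the number to watch: only about half of the actual buyers in the testing set (27 of 55) are predicted as buyers.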

4.3 Get Soft Predictions

## [1] 0.6386626

The probability above can be read as a 64% chance that the prospect will buy the product.
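
To illustrate where such a soft prediction comes from, here is a minimal hand-written naive Bayes for 0/1 features in base R. It is a sketch, not the code used for the results above: the functions `fit_bernoulli_nb` and `predict_proba` are written for this illustration, the training data is simulated, and the Laplace smoothing constant `alpha = 1` is an assumption. Packages such as e1071 provide a ready-made `naiveBayes()` function that you would normally use instead.

```r
# Fit: class priors and P(feature = 1 | class), with Laplace smoothing.
fit_bernoulli_nb <- function(x, y, alpha = 1) {
  classes <- sort(unique(y))
  prior <- vapply(classes, function(k) mean(y == k), numeric(1))
  names(prior) <- as.character(classes)
  p1 <- do.call(rbind, lapply(classes, function(k) {
    rows <- x[y == k, , drop = FALSE]
    (colSums(rows) + alpha) / (nrow(rows) + 2 * alpha)
  }))
  rownames(p1) <- as.character(classes)
  list(prior = prior, p1 = p1)
}

# Predict: posterior class probabilities for one named 0/1 feature vector.
predict_proba <- function(model, newx) {
  log_post <- vapply(names(model$prior), function(k) {
    p <- model$p1[k, names(newx)]
    log(model$prior[[k]]) + sum(newx * log(p) + (1 - newx) * log(1 - p))
  }, numeric(1))
  post <- exp(log_post - max(log_post))  # subtract max for numerical stability
  post / sum(post)
}

# Simulated sessions (not the real data): buyers read reviews more often.
set.seed(1)
n <- 400
buy <- rbinom(n, 1, 0.37)
x <- cbind(
  REVIEWS         = rbinom(n, 1, ifelse(buy == 1, 0.8, 0.4)),
  COMPARE_SIMILAR = rbinom(n, 1, ifelse(buy == 1, 0.7, 0.5))
)

model  <- fit_bernoulli_nb(x, buy)
p_none <- predict_proba(model, c(REVIEWS = 0, COMPARE_SIMILAR = 0))
p_both <- predict_proba(model, c(REVIEWS = 1, COMPARE_SIMILAR = 1))
print(p_none)
print(p_both)
```

The element named `"1"` of the returned vector is the soft prediction, the estimated probability of a purchase; the element named `"0"` is its complement.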

5 Real time predictions

Now that the model has been built, let us use it for real-time predictions. As the customer visits pages one by one, we collect the list of pages visited and use it to compute the probability. We do that for every new click that comes in.

Let us start. The prospect has just arrived at your website and has made no significant clicks yet. Let us compute the probability. The array passed to the model holds the values for REVIEWS, BOUGHT_TOGETHER, COMPARE_SIMILAR, WARRANTY and SPONSORED_LINKS, so it is all zeros to begin with.

## # A tibble: 1 x 5
##   REVIEWS BOUGHT_TOGETHER COMPARE_SIMILAR WARRANTY SPONSORED_LINKS
## * <fctr>  <fctr>          <fctr>          <fctr>   <fctr>         
## 1 0       0               0               0        0
##            1
## 1 0.05498068

So the initial probability is about 5%. Now suppose the customer compares similar products. The array changes to include a 1 for COMPARE_SIMILAR. The new probability is:

##           1
## 1 0.1245868

It goes up to about 12%. Next, the customer checks out the reviews.

##           1
## 1 0.4009554

It shoots up to about 40%. You can set a threshold for when to offer chat and keep checking this probability against it to decide whether to pop up a chat window.
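
The click-by-click check can be sketched as follows. The session state is a named 0/1 vector that is updated on each click, and the probabilities are the three values reported above rather than fresh model output. The threshold of 0.35 and the helper names `register_click` and `offer_chat` are assumptions made for this illustration.

```r
# Session state: one 0/1 flag per model feature, all zeros on arrival.
state <- c(REVIEWS = 0, BOUGHT_TOGETHER = 0, COMPARE_SIMILAR = 0,
           WARRANTY = 0, SPONSORED_LINKS = 0)

# Mark a page as visited; clicks on pages the model does not use are ignored.
register_click <- function(state, page) {
  if (page %in% names(state)) state[[page]] <- 1
  state
}

offer_chat <- function(prob, threshold) prob >= threshold

threshold <- 0.35  # hypothetical; tune it to your chat capacity

# Clicks in the order described above, with the probabilities reported above.
clicks <- c("COMPARE_SIMILAR", "REVIEWS")
probs  <- c(arrival = 0.05498068, COMPARE_SIMILAR = 0.1245868, REVIEWS = 0.4009554)

for (page in clicks) {
  state <- register_click(state, page)
}

decisions <- offer_chat(probs, threshold)
first_trigger <- names(probs)[which(decisions)[1]]

print(state)
print(decisions)
print(first_trigger)
```

In a live system the reported probabilities would be replaced by a call to the trained model with the current `state` after every click. A lower threshold offers chat to more prospects at the cost of more agent time spent on sessions that would not convert.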

This example shows how you can use predictive analytics in real time to decide whether a prospect has a high propensity to convert and, if so, offer them a chat with a sales rep or agent.