In this example I will show how to predict the propensity of prospects to buy. I will use web click data, that is, the links a user clicks while browsing, to predict that user's propensity to buy the product. Using that propensity, you can decide whether or not to offer the customer a chat with an agent.
Load the data file for this example and check out the summary statistics and columns for that file.
## Observations: 500
## Variables: 12
## $ SESSION_ID <int> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008...
## $ IMAGES <int> 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1...
## $ REVIEWS <int> 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1...
## $ FAQ <int> 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0...
## $ SPECS <int> 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0...
## $ SHIPPING <int> 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1...
## $ BOUGHT_TOGETHER <int> 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1...
## $ COMPARE_SIMILAR <int> 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1...
## $ VIEW_SIMILAR <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1...
## $ WARRANTY <int> 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1...
## $ SPONSORED_LINKS <int> 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0...
## $ BUY <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1...
The data contains information about the various links on the website that the user clicked while browsing. This is past data that will be used to build the model.
Session ID : A unique identifier for the web browsing session
Buy : Whether the prospect ended up buying the product
Remaining columns : 0 or 1 indicator to show whether the prospect visited that particular page or did the activity mentioned.
## SESSION_ID IMAGES REVIEWS FAQ SPECS SHIPPING BOUGHT_TOGETHER
## 1 1001 0 0 1 0 1 0
## 2 1002 0 1 1 0 0 0
## 3 1003 1 0 1 1 1 0
## 4 1004 1 0 0 0 1 1
## 5 1005 1 1 1 0 1 0
## 6 1006 1 0 0 1 0 1
## COMPARE_SIMILAR VIEW_SIMILAR WARRANTY SPONSORED_LINKS BUY
## 1 0 0 1 0 0
## 2 0 0 0 1 0
## 3 0 0 1 0 0
## 4 1 0 0 0 0
## 5 1 0 0 0 0
## 6 1 0 0 0 0
## SESSION_ID IMAGES REVIEWS FAQ
## Min. :1001 Min. :0.00 Min. :0.00 Min. :0.00
## 1st Qu.:1126 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00
## Median :1250 Median :1.00 Median :1.00 Median :0.00
## Mean :1250 Mean :0.51 Mean :0.52 Mean :0.44
## 3rd Qu.:1375 3rd Qu.:1.00 3rd Qu.:1.00 3rd Qu.:1.00
## Max. :1500 Max. :1.00 Max. :1.00 Max. :1.00
## SPECS SHIPPING BOUGHT_TOGETHER COMPARE_SIMILAR
## Min. :0.00 Min. :0.000 Min. :0.0 Min. :0.00
## 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.0 1st Qu.:0.00
## Median :0.00 Median :1.000 Median :0.5 Median :1.00
## Mean :0.48 Mean :0.528 Mean :0.5 Mean :0.58
## 3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.0 3rd Qu.:1.00
## Max. :1.00 Max. :1.000 Max. :1.0 Max. :1.00
## VIEW_SIMILAR WARRANTY SPONSORED_LINKS BUY
## Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.00
## Median :0.000 Median :1.000 Median :1.00 Median :0.00
## Mean :0.468 Mean :0.532 Mean :0.55 Mean :0.37
## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.00 3rd Qu.:1.00
## Max. :1.000 Max. :1.000 Max. :1.00 Max. :1.00
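The loading code itself is not shown in this write-up. As a rough illustration only, here is a minimal Python sketch that reads session data from CSV text and computes per-column means, similar in spirit to the `Mean` rows above. It assumes an in-memory CSV built from the six preview rows printed earlier; the real file name and loader are unknown, so `PREVIEW_CSV` and `column_means` are hypothetical names.

```python
import csv
import io

# Assumption: the six preview rows shown above, typed in as CSV text.
# The actual data file (500 sessions) is not reproduced here.
PREVIEW_CSV = """SESSION_ID,IMAGES,REVIEWS,FAQ,SPECS,SHIPPING,BOUGHT_TOGETHER,COMPARE_SIMILAR,VIEW_SIMILAR,WARRANTY,SPONSORED_LINKS,BUY
1001,0,0,1,0,1,0,0,0,1,0,0
1002,0,1,1,0,0,0,0,0,0,1,0
1003,1,0,1,1,1,0,0,0,1,0,0
1004,1,0,0,0,1,1,1,0,0,0,0
1005,1,1,1,0,1,0,1,0,0,0,0
1006,1,0,0,1,0,1,1,0,0,0,0
"""

def load_sessions(text):
    """Parse CSV text into a list of dicts with integer values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: int(value) for name, value in row.items()} for row in reader]

def column_means(rows):
    """Mean of every column; for a 0/1 column this is the share of sessions with a 1."""
    columns = rows[0].keys()
    return {name: sum(row[name] for row in rows) / len(rows) for name in columns}

rows = load_sessions(PREVIEW_CSV)
means = column_means(rows)
for name, value in means.items():
    if name != "SESSION_ID":
        print(f"{name:16s} {value:.2f}")
```

With only six rows the means will not match the full-data summary; the point is that the mean of a 0/1 column is simply the fraction of sessions that visited that page.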
## SESSION_ID IMAGES REVIEWS FAQ SPECS SHIPPING
## [1,] 0.02667655 0.04681922 0.4046284 -0.09513567 0.009949879 -0.02223851
## BOUGHT_TOGETHER COMPARE_SIMILAR VIEW_SIMILAR WARRANTY
## [1,] -0.1035616 0.1905224 -0.09613658 0.1791561
## SPONSORED_LINKS BUY
## [1,] 0.1103284 1
Looking at the correlations above, we can see that some features, namely REVIEWS, BOUGHT_TOGETHER, COMPARE_SIMILAR, WARRANTY and SPONSORED_LINKS, have a weak to moderate correlation with the target variable BUY (absolute values of roughly 0.10 to 0.40). We will reduce our feature set to that list of variables.
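To make the correlation step concrete, here is a minimal Python sketch of the Pearson correlation between one 0/1 feature and the target. It is an illustration, not the code that produced the output above: it uses only the first 16 REVIEWS and BUY values visible in the truncated column preview, so the result differs from the full-data value of about 0.40. The function name `pearson` is my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Assumption: only the first 16 sessions, copied from the column preview.
reviews = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1]
buy     = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]

r = pearson(reviews, buy)
print(f"REVIEWS vs BUY on 16 rows: {r:.3f}")
```

Running the same function over every feature column and keeping those whose absolute correlation clears some cut-off is one simple way to arrive at a reduced feature list.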
We now split the data into training and testing sets in the ratio of 70:30.
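As a sketch of what a 70:30 split involves, the Python below shuffles the rows with a fixed seed and cuts the list at 70%. This is a plain random split under my own assumptions (the helper name `train_test_split` and the seed are hypothetical). The original split procedure is not shown, and a plain shuffle of 500 rows yields 350 and 150 rows rather than exactly the sizes reported in the output.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 500 sessions: just their SESSION_ID values 1001..1500.
session_ids = list(range(1001, 1501))
train_ids, test_ids = train_test_split(session_ids)
print(len(train_ids), len(test_ids))  # 350 150
```

Fixing the seed matters when you want to compare models, because every run then trains and tests on the same sessions.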
## [1] 351 5
## [1] 149 5
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 81 28
## 1 13 27
##
## Accuracy : 0.7248
## 95% CI : (0.6457, 0.7947)
## No Information Rate : 0.6309
## P-Value [Acc > NIR] : 0.009873
##
## Kappa : 0.3738
## Mcnemar's Test P-Value : 0.028784
##
## Sensitivity : 0.8617
## Specificity : 0.4909
## Pos Pred Value : 0.7431
## Neg Pred Value : 0.6750
## Prevalence : 0.6309
## Detection Rate : 0.5436
## Detection Prevalence : 0.7315
## Balanced Accuracy : 0.6763
##
## 'Positive' Class : 0
##
## [1] 0.7248322
## [1] 0.6386626
The probability above can be read as a 64% chance that the prospect will buy the product.
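The statistics in the confusion matrix report can be re-derived from its four cell counts, which helps when reading them. The Python sketch below does that arithmetic; the variable names are my own, and the only inputs are the counts printed in the report. Note that the report treats class 0 (did not buy) as the positive class.

```python
# Cell counts copied from the confusion matrix (rows = prediction, columns = reference).
pred0_ref0, pred0_ref1 = 81, 28
pred1_ref0, pred1_ref1 = 13, 27

total = pred0_ref0 + pred0_ref1 + pred1_ref0 + pred1_ref1

# Class 0 ("did not buy") is the positive class in the report.
accuracy = (pred0_ref0 + pred1_ref1) / total
sensitivity = pred0_ref0 / (pred0_ref0 + pred1_ref0)     # actual 0s predicted as 0
specificity = pred1_ref1 / (pred0_ref1 + pred1_ref1)     # actual 1s predicted as 1
pos_pred_value = pred0_ref0 / (pred0_ref0 + pred0_ref1)  # predicted 0s that are truly 0
neg_pred_value = pred1_ref1 / (pred1_ref0 + pred1_ref1)  # predicted 1s that are truly 1
balanced_accuracy = (sensitivity + specificity) / 2

for name, value in [
    ("Accuracy", accuracy),
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Pos Pred Value", pos_pred_value),
    ("Neg Pred Value", neg_pred_value),
    ("Balanced Accuracy", balanced_accuracy),
]:
    print(f"{name:18s} {value:.4f}")
```

Because class 0 is the positive class, the specificity of about 0.49 is the share of actual buyers the model identifies: 27 of the 55 buyers in the test set.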
Now that the model has been built, let us use it for real-time predictions. As the customer visits pages one by one, we collect the list of pages visited so far and use it to compute the probability. We do that for every new click that comes in.
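The per-click bookkeeping described above can be sketched as a small piece of session state that turns clicks into the 0/1 array the model expects. This is an illustrative Python sketch under my own assumptions: the class `SessionState` and its methods are hypothetical, and scoring with the trained model is left out because the model code is not shown here.

```python
# The five features kept for the model, in a fixed order.
FEATURES = ["REVIEWS", "BOUGHT_TOGETHER", "COMPARE_SIMILAR", "WARRANTY", "SPONSORED_LINKS"]

class SessionState:
    """Tracks which of the model's pages a session has visited so far."""

    def __init__(self):
        self.visited = {name: 0 for name in FEATURES}

    def record_click(self, page):
        # Pages the model does not use (e.g. IMAGES, FAQ) are ignored.
        if page in self.visited:
            self.visited[page] = 1

    def as_vector(self):
        """Return the 0/1 values in the fixed FEATURES order."""
        return [self.visited[name] for name in FEATURES]

session = SessionState()
print(session.as_vector())            # [0, 0, 0, 0, 0]
session.record_click("COMPARE_SIMILAR")
session.record_click("IMAGES")        # not a model feature, no change
session.record_click("REVIEWS")
print(session.as_vector())            # [1, 0, 1, 0, 0]
```

After each `record_click`, the vector from `as_vector()` would be handed to the trained model to get an updated probability.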
So let us start. The prospect has just arrived on your website and there are no significant clicks yet. Let us compute the probability. The array passed to the model holds the values for REVIEWS, BOUGHT_TOGETHER, COMPARE_SIMILAR, WARRANTY and SPONSORED_LINKS, so it is all zeros to begin with.
## # A tibble: 1 x 5
## REVIEWS BOUGHT_TOGETHER COMPARE_SIMILAR WARRANTY SPONSORED_LINKS
## * <fctr> <fctr> <fctr> <fctr> <fctr>
## 1 0 0 0 0 0
## 1
## 1 0.05498068
So the initial probability is about 5.5%. Now, suppose the customer compares similar products. The array changes to include a 1 for COMPARE_SIMILAR. The new probability will be
## 1
## 1 0.1245868
It goes up to about 12.5%. Next, the customer checks out reviews.
## 1
## 1 0.4009554
It shoots up to just over 40%. You can set a threshold for when you want to offer chat, and keep checking this probability against that threshold to decide whether to pop up a chat window.
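The threshold check can be sketched in a few lines of Python. The probabilities below are the three values printed above; the 0.40 cut-off is a hypothetical choice for illustration, as are the names `CHAT_THRESHOLD` and `should_offer_chat`.

```python
CHAT_THRESHOLD = 0.40  # hypothetical cut-off, not taken from the example

def should_offer_chat(probability, threshold=CHAT_THRESHOLD):
    """True once the buy probability reaches the threshold."""
    return probability >= threshold

# Probabilities reported above after each step of the session.
scores = [
    ("session start", 0.05498068),
    ("COMPARE_SIMILAR", 0.1245868),
    ("REVIEWS", 0.4009554),
]

decisions = []
for step, probability in scores:
    offer = should_offer_chat(probability)
    decisions.append(offer)
    print(f"{step:16s} p={probability:.3f} offer chat: {offer}")
```

In practice the cut-off would be tuned against the cost of an agent's time and the value of a conversion, for example by checking how many test-set buyers fall above each candidate threshold.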
This example shows how you can use predictive analytics in real time to decide whether a prospect has a high propensity to convert and, if so, offer a chat with a sales rep or agent.