Project #5 - Evaluating Predictive Performance
Do Textbook Problems 5.6 and 5.7
A firm that sells software services has been piloting a new product and has records of 500 customers who have either bought the services or decided not to. The target value is the estimated profit from each sale (excluding sales costs). The global mean is $2128. However, the cost of the sales effort is not cheap—the company figures it comes to $2500 for each of the 500 customers (whether they buy or not). The firm developed a predictive model in hopes of being able to identify the top spenders in the future. The lift and decile charts for the validation set are shown in Figure 5.13.
If the company begins working with a new set of 1000 leads to sell the same services, similar to the 500 in the pilot study, without any use of predictive modeling to target sales efforts, what is the estimated profit?
Without predictive modeling, the expected gross profit from 1000 leads would be 1000 * $2,128 = $2,128,000, but contacting all of them would cost 1000 * $2,500 = $2,500,000. The estimated net profit is therefore $2,128,000 - $2,500,000 = -$372,000, a loss of $372 per lead. The aim of building a model and analyzing the lift chart is to concentrate the sales effort on the leads most likely to be profitable, keeping profit high while cutting the cost of obtaining it.
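The baseline arithmetic can be checked in a few lines of R. This is a minimal sketch with illustrative variable names. Its only inputs are the $2,128 global mean and the $2,500 per-lead cost from the problem statement.

```r
# Baseline: contact every one of the 1000 new leads (no model)
n_leads   <- 1000
mean_gain <- 2128   # global mean profit per lead, excluding sales cost
cost      <- 2500   # sales effort cost per lead, whether they buy or not

gross      <- n_leads * mean_gain   # expected gross profit
total_cost <- n_leads * cost        # total sales effort cost
net        <- gross - total_cost    # expected net profit (negative = loss)
net                                 # [1] -372000
```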
If the firm wants the average profit on each sale to at least double the sales effort cost, and applies an appropriate cutoff with this predictive model to a new set of 1000 leads, how far down the new list of 1000 should it proceed (how many deciles)?
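For the average profit per sale to be at least double the sales effort cost, each targeted lead must bring in at least 2 * $2,500 = $5,000. That corresponds to a lift of at least $5,000 / $2,128 ≈ 2.35 over the global mean. According to the decile-wise lift chart, the firm should use its predictive model to target only the first decile (the top 10%, i.e., the first 100 of the 1000 leads). In that decile the average profit is slightly more than 2.0 times the sales effort cost.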
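Here is a minimal R sketch of the threshold behind this answer. The inputs come from the problem statement. `deciles_to_target()` is a hypothetical helper, not part of any package. It assumes you enter the per-decile lift values yourself, read from Figure 5.13 with the top decile first. It also assumes the firm stops at the first decile that falls short.

```r
# Threshold for part (b): average profit per lead >= 2 x sales effort cost
global_mean <- 2128                 # global mean profit per lead (problem statement)
cost        <- 2500                 # sales effort cost per lead
min_lift    <- (2 * cost) / global_mean
round(min_lift, 2)                  # 2.35

# Hypothetical helper: given per-decile lifts read off the decile-wise chart
# (top decile first), count the leading deciles whose lift clears min_lift.
# cumprod() turns the first FALSE into 0 and keeps it 0 afterwards.
deciles_to_target <- function(decile_lift, min_lift) {
  sum(cumprod(decile_lift >= min_lift))
}
```

To turn the count into a number of leads, multiply it by 100, since each decile of the 1000 new leads holds 100 of them.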
Still considering the new list of 1000 leads, if the company applies this predictive model with a lower cutoff of $2500, how far should it proceed down the ranked leads, in terms of deciles?
A cutoff of $2,500 means each targeted lead must bring in at least 1.0 times the sales effort cost. That is a lift of at least $2,500 / $2,128 ≈ 1.17 over the global mean. Looking at the decile-wise lift chart, the firm could target the top 600 leads (6 deciles, or 60% of the list) and obtain an average profit of at least $2,500.
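The same threshold arithmetic for the $2,500 cutoff, as a short R sketch. The inputs are from the problem statement. The count of six deciles is the value read off the chart in the answer above; the code does not derive it.

```r
# Threshold for part (c): average profit per lead >= the $2,500 cutoff
global_mean <- 2128   # global mean profit per lead (problem statement)
cutoff      <- 2500   # profit cutoff, equal to the sales effort cost per lead
min_lift    <- cutoff / global_mean
round(min_lift, 2)    # 1.17
# Any decile whose bar sits below min_lift brings in less than $2,500
# per lead on average, i.e., it loses money once sales cost is paid.

n_deciles <- 6                # read off the decile-wise chart, not computed
n_deciles * (1000 / 10)       # leads to contact: 100 per decile
```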
Why use this two-stage process for predicting sales—why not simply develop a model for predicting profit for the 1000 new leads?
Not all customers are equally likely to purchase the services: some will buy 99 times out of 100, while others might buy only 1 time out of 100. If we simply developed a model to predict profit for the 1000 new leads, we would in effect spread the expected profit evenly across all 1000 of them. Instead, we can build the model on the pilot data and evaluate it on a validation (hold-out) set. That shows which leads are more or less likely to buy, so we can rank them accordingly. Targeting the highest-ranked subset maximizes the profit-to-cost ratio. Additionally, contacting all 1000 leads may take too much money, time, or resources, so the firm may only be able to afford to target a subset of them.
Table 5.7 shows a small set of predictive model validation results for a classification model, with both actual values and propensities.
propensity = c(.03, .52, .38, .82, .33, .42, .55, .59, .09, .21, .43, .04, .08, .13, .01, .79, .42, .29, .08, .02)
actual = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
table5.7 <- data.frame(propensity, actual)
table5.7[order(table5.7$propensity, decreasing = TRUE), ]
##    propensity actual
## 4        0.82      1
## 16       0.79      1
## 8        0.59      0
## 7        0.55      1
## 2        0.52      0
## 11       0.43      0
## 6        0.42      0
## 17       0.42      0
## 3        0.38      0
## 5        0.33      0
## 18       0.29      0
## 10       0.21      0
## 14       0.13      0
## 9        0.09      0
## 13       0.08      0
## 19       0.08      0
## 12       0.04      0
## 1        0.03      0
## 20       0.02      0
## 15       0.01      0
Part a. Calculate error rates, sensitivity, and specificity using cutoffs of 0.25, 0.5, and 0.75.
Confusion Matrix interpretation:
Prediction: 1 if propensity > cutoff, 0 otherwise (a propensity equal to the cutoff is classified as 0)
Reference: 1 if actual == 1, 0 if actual == 0
Top Left: True Positive - Correctly Classified as ‘1’
Bottom Left: False Negative (Type II error) - Incorrectly classified as ‘0’ when it is actually a ‘1’
Top Right: False Positive (Type I error) - Incorrectly classified as ‘1’ when it is actually a ‘0’
Bottom Right: True Negative - Correctly Classified as ‘0’
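As a cross-check on the layout described above, the following sketch builds the same 2×2 matrix with base R's `table()` instead of caret, at an illustrative cutoff of 0.5. Ordering both factors' levels as `c(1, 0)` should reproduce caret's orientation: predictions in rows, actual values in columns, class 1 first. The cell names `TP`, `FP`, `FN`, `TN` are our own labels.

```r
# Base-R confusion matrix in the same orientation as caret's M$table
propensity <- c(.03, .52, .38, .82, .33, .42, .55, .59, .09, .21,
                .43, .04, .08, .13, .01, .79, .42, .29, .08, .02)
actual     <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
                0, 0, 0, 0, 0, 1, 0, 0, 0, 0)

cutoff <- 0.5
pred   <- ifelse(propensity > cutoff, 1, 0)   # ties at the cutoff go to 0

cm <- table(Prediction = factor(pred,   levels = c(1, 0)),
            Reference  = factor(actual, levels = c(1, 0)))

TP <- cm["1", "1"]   # top left
FP <- cm["1", "0"]   # top right
FN <- cm["0", "1"]   # bottom left
TN <- cm["0", "0"]   # bottom right
cm
```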
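library(caret)
cutoffs <- c(0.25, 0.5, 0.75)
for (x in cutoffs) {
  # Predict class 1 when the propensity exceeds the cutoff, otherwise 0
  p <- ifelse(propensity > x, 1, 0)
  # Levels ordered 1, 0 so that "1" is treated as the positive class
  # and appears in the top-left cell of the table
  M <- confusionMatrix(factor(p, levels = 1:0), factor(actual, levels = 1:0))
  print(paste0("Cutoff: ", x))
  print(M$table)
}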
## [1] "Cutoff: 0.25"
##           Reference
## Prediction 1 0
##          1 3 8
##          0 0 9
## [1] "Cutoff: 0.5"
##           Reference
## Prediction 1  0
##          1 3  2
##          0 0 15
## [1] "Cutoff: 0.75"
##           Reference
## Prediction 1  0
##          1 2  0
##          0 1 17
Write TP, FP, FN, and TN for the four cells described above and n for the number of records (here n = 20).
The error rate is the proportion of records that are misclassified, counting both Type I and Type II errors: (FP + FN) / n.
Sensitivity is the proportion of actual positives that are correctly identified as positive: TP / (TP + FN).
Specificity is the proportion of actual negatives that are correctly identified as negative: TN / (TN + FP).
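When the cutoff is 0.25, error rate = (8 + 0) / 20 = 0.40, sensitivity = 3 / (3 + 0) = 1.00, and specificity = 9 / (9 + 8) ≈ 0.53.
When the cutoff is 0.5, error rate = (2 + 0) / 20 = 0.10, sensitivity = 3 / (3 + 0) = 1.00, and specificity = 15 / (15 + 2) ≈ 0.88.
When the cutoff is 0.75, error rate = (0 + 1) / 20 = 0.05, sensitivity = 2 / (2 + 1) ≈ 0.67, and specificity = 17 / (17 + 0) = 1.00.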
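The three metrics can also be computed directly from their definitions, without caret. The sketch below uses base R only. `metrics_at()` is a hypothetical helper. It treats class 1 as positive and assigns a propensity exactly equal to the cutoff to class 0, matching the `>` rule used in the caret loop.

```r
# Error rate, sensitivity, and specificity for several cutoffs
propensity <- c(.03, .52, .38, .82, .33, .42, .55, .59, .09, .21,
                .43, .04, .08, .13, .01, .79, .42, .29, .08, .02)
actual     <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
                0, 0, 0, 0, 0, 1, 0, 0, 0, 0)

metrics_at <- function(cutoff) {
  pred <- as.integer(propensity > cutoff)   # 1 if above cutoff, else 0
  TP <- sum(pred == 1 & actual == 1)
  FP <- sum(pred == 1 & actual == 0)
  FN <- sum(pred == 0 & actual == 1)
  TN <- sum(pred == 0 & actual == 0)
  c(cutoff      = cutoff,
    error_rate  = (FP + FN) / length(actual),
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP))
}

# One row per cutoff, one column per metric
results <- t(sapply(c(0.25, 0.5, 0.75), metrics_at))
round(results, 3)
```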
Part b. Create a decile-wise lift chart in R.
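library(gains)
# gains() ranks records by propensity and splits them into groups (10 by default)
gain <- gains(table5.7$actual, table5.7$propensity)
# Decile-wise lift = mean response within each group / overall mean response
barplot(gain$mean.resp / mean(table5.7$actual),
        names.arg = gain$depth, xlab = "Percentile",
        ylab = "Lift (mean response / overall mean)", main = "Decile-wise lift chart")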
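As the decile-wise lift chart shows, all three 1's fall within the top 20% of records (the first two deciles). The two highest-propensity records are both 1's, which gives the first decile a lift of 1.0 / 0.15 ≈ 6.67. The second decile contains one 1 out of two records, for a lift of 0.5 / 0.15 ≈ 3.33. The remaining eight deciles have a lift of 0.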
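To show what the chart measures, here is a base-R sketch that computes the decile-wise lift by hand. It sorts the records by propensity, splits the 20 records into ten groups of two, and divides each group's mean response by the overall mean response. It assumes the number of records is a multiple of 10 and keeps tied records in their original order. It illustrates the idea rather than reproducing `gains()` exactly, although none of the tied propensities in this data straddle a group boundary.

```r
# Decile-wise lift computed by hand (sketch; assumes n is a multiple of 10)
propensity <- c(.03, .52, .38, .82, .33, .42, .55, .59, .09, .21,
                .43, .04, .08, .13, .01, .79, .42, .29, .08, .02)
actual     <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
                0, 0, 0, 0, 0, 1, 0, 0, 0, 0)

ord    <- order(propensity, decreasing = TRUE)     # highest propensity first
sorted <- actual[ord]                              # actual outcomes in ranked order
decile <- rep(1:10, each = length(actual) / 10)    # 2 records per decile here
lift   <- tapply(sorted, decile, mean) / mean(actual)
round(lift, 2)   # first two deciles: about 6.67 and 3.33; the rest are 0
```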