Read predictions and true values
library(dplyr)
library(readr)
res = read_csv("meetup_presentation/warm_prediction_result.csv")
head(res)
## # A tibble: 6 × 2
## pred truth
## <dbl> <chr>
## 1 0.02988453 no
## 2 0.02965572 no
## 3 0.02965572 no
## 4 0.03435142 no
## 5 0.02965572 no
## 6 0.02922404 no
library(plotly)
plot_ly(data = res, x = ~truth, y = ~pred, type = "box") %>%
  layout(title = "Predicted probability of positive response vs. true response")
Next, calculate the minimum probability of “yes” at which calling a customer pays off.
The value of a positive answer is 50 €.
The cost of each call is 2.50 €.
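We can formulate this as a cost-benefit matrix with the true-positive benefit (tpb), false-negative cost (fnc), true-negative benefit (tnb) and false-positive cost (fpc):
tpb = 50 - 2.50   # called and answered "yes": revenue minus call cost
fnc = 0           # not called, would have said "yes": no revenue, no cost
tnb = 0           # not called, would have said "no": no revenue, no cost
fpc = -2.50       # called and answered "no": only the call cost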
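If pR is the probability that a customer answers “yes”, calling them is worth it when the expected value of the call is positive:
pR * (50 - 2.50) + (1 - pR) * (-2.50) > 0
Expanding gives 50 * pR - 2.50 > 0. In general, with vR = 47.50 (net value of a “yes”) and vNR = -2.50 (value of a “no”), this is
pR > -vNR / (vR - vNR) = 2.50 / 50
pR > 0.05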
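The threshold can also be computed directly from the cost-benefit values, which makes it easy to re-run if the prices change. This is a minimal sketch; `break_even_prob()` is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: the probability p at which the expected value of a call,
# p * tpb + (1 - p) * fpc, is exactly zero
break_even_prob = function(tpb, fpc) {
  -fpc / (tpb - fpc)
}

tpb = 50 - 2.50   # net benefit of a call answered with "yes"
fpc = -2.50       # cost of a call answered with "no"

pR = break_even_prob(tpb, fpc)
pR
## [1] 0.05

# Sanity check: at the threshold the expected value of a call is (numerically) zero
pR * tpb + (1 - pR) * fpc
```

Any customer whose predicted probability exceeds this value has a positive expected value, so the model should recommend calling them.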
Now we can use this threshold to turn the model's probabilities into decisions (call or don't call) and calculate a confusion matrix.
pR = 0.05
res = res %>%
mutate(decision = ifelse(pred > pR, yes = "model.yes", no = "model.no")) %>%
mutate(decision = as.factor(decision),
truth = as.factor(truth))
conf.mat = table(res$decision, res$truth)
conf.mat
##
## no yes
## model.no 13310 478
## model.yes 23227 4161
conf.mat.rate = conf.mat/sum(conf.mat)
conf.mat.rate
##
## no yes
## model.no 0.3232466 0.0116087
## model.yes 0.5640907 0.1010540
Model accuracy = correct predictions / total predictions = (13310 + 4161) / 41176 ≈ 0.42. But what does it tell us?
We see that the model is heavily biased towards a yes decision: it makes a lot of false positives (23227), far more than true positives (4161). However, its “no” decisions are reliable: 13310 of the 13788 model.no decisions are true negatives (about 96.5%), and only 478 of the 4639 customers who actually said “yes” are missed.
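These summary numbers can be recomputed from the counts in the confusion matrix. A minimal sketch that rebuilds the matrix by hand from the printed output so it runs on its own; in the analysis session the `conf.mat` table created earlier has the same row and column names and can be used instead.

```r
# Confusion matrix from the output above (rows = model decision, columns = truth)
conf.mat = matrix(c(13310, 23227, 478, 4161), nrow = 2,
                  dimnames = list(c("model.no", "model.yes"), c("no", "yes")))

# Share of all decisions that match the true response: (TN + TP) / total
accuracy = sum(diag(conf.mat)) / sum(conf.mat)
# Share of real "yes" customers that the model would call
recall.yes = conf.mat["model.yes", "yes"] / sum(conf.mat[, "yes"])
# Share of "model.no" decisions that are correct
precision.no = conf.mat["model.no", "no"] / sum(conf.mat["model.no", ])

round(c(accuracy = accuracy, recall.yes = recall.yes, precision.no = precision.no), 3)
```

Low accuracy together with high recall on “yes” is exactly what a low threshold like 0.05 should produce: the model calls almost everyone who might answer positively.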
Is there a point in using this inaccurate model? Let’s calculate the expected profit.
First, we need to calculate the class priors …
Total = nrow(res)
Pos = 478 + 4161
Neg = Total - Pos
pP = Pos/Total
pN = 1 - pP
… and the rates
tpr = 4161/Pos
fnr = 1 - tpr
tnr = 13310 / Neg
fpr = 1 - tnr
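Instead of typing the counts in by hand, the priors and rates can be read straight from the confusion matrix, which avoids transcription errors if the threshold changes. A minimal sketch; as before, the matrix is rebuilt from the printed output so the block is self-contained, and the `conf.mat` table from the session can be substituted.

```r
# Confusion matrix from the output above (rows = model decision, columns = truth)
conf.mat = matrix(c(13310, 23227, 478, 4161), nrow = 2,
                  dimnames = list(c("model.no", "model.yes"), c("no", "yes")))

# Class priors
Total = sum(conf.mat)
Pos = sum(conf.mat[, "yes"])   # customers who actually answered "yes"
Neg = sum(conf.mat[, "no"])    # customers who actually answered "no"
pP = Pos / Total
pN = Neg / Total

# Rates, conditioned on the true class
tpr = conf.mat["model.yes", "yes"] / Pos
fnr = conf.mat["model.no", "yes"] / Pos
tnr = conf.mat["model.no", "no"] / Neg
fpr = conf.mat["model.yes", "no"] / Neg

round(c(pP = pP, tpr = tpr, tnr = tnr, fpr = fpr), 3)
```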
Now we are ready to calculate the expected profit using the formula:
$\text{Expected Profit} = p(P)\,(\mathit{tpr}\cdot\mathit{tpb} + \mathit{fnr}\cdot\mathit{fnc}) + p(N)\,(\mathit{tnr}\cdot\mathit{tnb} + \mathit{fpr}\cdot\mathit{fpc})$
expP = pP * (tpr * tpb + fnr * fnc) + pN * (tnr * tnb + fpr * fpc)
Expected Profit = 3.39 € per customer
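Because fnc and tnb are both zero, only the true-positive and false-positive terms remain, and p(P)·tpr = TP/N and p(N)·fpr = FP/N, where TP = 4161, FP = 23227 and N = 41176 are the counts from the confusion matrix. Worked through by hand, this reproduces the per-customer figure (in €) and the campaign total:

$$
\begin{aligned}
\text{Expected Profit} &= p(P)\,\mathit{tpr}\cdot\mathit{tpb} + p(N)\,\mathit{fpr}\cdot\mathit{fpc} \qquad (\mathit{fnc} = \mathit{tnb} = 0)\\
&= \frac{TP}{N}\,\mathit{tpb} + \frac{FP}{N}\,\mathit{fpc}
 = \frac{4161 \cdot 47.50 - 23227 \cdot 2.50}{41176}
 = \frac{139580}{41176} \approx 3.39\\
N \cdot \text{Expected Profit} &= 139580
\end{aligned}
$$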
So, using the model to plan a campaign for 41176 customers, we can expect to earn 139.58 k€ (about $150k) in total.
Not bad!