Model Predictions vs Test Data

Read predictions and true values

library(dplyr)
library(readr)

res = read_csv("meetup_presentation/warm_prediction_result.csv")
head(res)
## # A tibble: 6 × 2
##         pred truth
##        <dbl> <chr>
## 1 0.02988453    no
## 2 0.02965572    no
## 3 0.02965572    no
## 4 0.03435142    no
## 5 0.02965572    no
## 6 0.02922404    no
library(plotly)
# Formula syntax (~) is required by plotly >= 4.0; `group` was replaced by `color`
plot_ly(res, x = ~pred, color = ~truth, type = "box") %>%
  layout(title = "Predicted probability of Positive Response vs True Response")

Minimum Required Class Probability & Confusion Matrix

Calculate the minimum required probability of “yes”.

The value of a positive answer is 50€.
Each call costs 2.50€, i.e. it has a value of -2.50€.

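We can formulate this as benefits and costs of the four possible outcomes: true positive benefit (tpb), false negative cost (fnc), true negative benefit (tnb) and false positive cost (fpc).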

tpb = 50 - 2.50
fnc = 0
tnb = 0
fpc = - 2.50

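A call pays off when its expected value is positive, where pR is the probability of a positive response:

pR * (50 - 2.50) + (1 - pR) * (-2.50) > 0

With vR = 50 - 2.50 as the value of a response and vNR = -2.50 as the value of no response, this rearranges to

pR > -vNR / (vR - vNR) ==> pR > 2.5 / 50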

pR > 0.05
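
The break-even probability can also be checked numerically. This is only a minimal sketch: the helper name `break_even_prob` and its argument names are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: smallest response probability at which a call
# has a non-negative expected value.
#   value_response:    net value of a positive answer (here 50 - 2.50)
#   value_no_response: net value of a negative answer (here -2.50)
break_even_prob = function(value_response, value_no_response) {
  -value_no_response / (value_response - value_no_response)
}

p_min = break_even_prob(50 - 2.50, -2.50)
p_min

# Expected value of a call exactly at the threshold (should be about zero)
p_min * (50 - 2.50) + (1 - p_min) * (-2.50)
```

Writing the threshold as a function makes it easy to see how it moves if the value of a response or the cost of a call changes.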

Now we can use this threshold to turn the model probabilities into decisions and calculate a confusion matrix.

pR = 0.05

res = res %>% 
  mutate(decision = ifelse(pred > pR, yes = "model.yes", no = "model.no")) %>% 
  mutate(decision = as.factor(decision),
         truth = as.factor(truth))

conf.mat = table(res$decision, res$truth)

conf.mat
##            
##                no   yes
##   model.no  13310   478
##   model.yes 23227  4161
conf.mat.rate = conf.mat/sum(conf.mat)
conf.mat.rate
##            
##                    no       yes
##   model.no  0.3232466 0.0116087
##   model.yes 0.5640907 0.1010540

Model accuracy = correct class predictions / total predictions = 0.42. But what does it tell us?
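
As a sketch of where the 0.42 comes from, the accuracy can be read off the diagonal of the confusion matrix. The matrix is rebuilt here from the printed counts so the snippet runs on its own. The always-"no" comparison is an added illustration, not part of the original analysis.

```r
# Confusion matrix rebuilt from the counts printed above
# (rows: model decision, columns: true response; filled column by column)
conf.mat = matrix(c(13310, 23227, 478, 4161), nrow = 2,
                  dimnames = list(c("model.no", "model.yes"), c("no", "yes")))

# Correct predictions sit on the diagonal: (model.no, no) and (model.yes, yes)
accuracy = sum(diag(conf.mat)) / sum(conf.mat)
round(accuracy, 2)

# Illustration only: accuracy of a rule that always answers "no"
baseline.accuracy = sum(conf.mat[, "no"]) / sum(conf.mat)
round(baseline.accuracy, 2)
```

With classes this imbalanced, a rule that never calls anyone scores far higher on accuracy while earning nothing, which is why accuracy alone says little here.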

We see that the model is heavily biased towards a yes decision and produces many false positives, more than five times as many as true positives (23227 vs 4161). However, it rarely misses a positive (only 478 of 4639), and when it does decide no it is almost always right (13310 of 13788 cases), although negatives are the majority anyway.

Is there a point in using this inaccurate model? Let’s calculate the expected profit.

Estimating Class Priors and Model T/F Rates

First, we need to calculate the class priors …

Total = nrow(res)
Pos = 478 + 4161
Neg = Total - Pos

pP = Pos/Total
pN = 1 - pP

… and the rates

tpr = 4161/Pos
fnr = 1 - tpr

tnr = 13310 / Neg
fpr = 1 - tnr
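
The same priors and rates can be derived by indexing the confusion matrix by name instead of typing the counts in by hand. A minimal sketch, with the matrix rebuilt from the printed counts so that it is self-contained:

```r
# Confusion matrix rebuilt from the counts printed earlier
# (rows: model decision, columns: true response)
conf.mat = matrix(c(13310, 23227, 478, 4161), nrow = 2,
                  dimnames = list(c("model.no", "model.yes"), c("no", "yes")))

# Class totals are the column sums
Pos = sum(conf.mat[, "yes"])
Neg = sum(conf.mat[, "no"])
Total = Pos + Neg

# Class priors
pP = Pos / Total
pN = Neg / Total

# True/false rates, each normalised by its true class total
tpr = conf.mat["model.yes", "yes"] / Pos
fnr = conf.mat["model.no", "yes"] / Pos
tnr = conf.mat["model.no", "no"] / Neg
fpr = conf.mat["model.yes", "no"] / Neg

round(c(pP = pP, pN = pN, tpr = tpr, fnr = fnr, tnr = tnr, fpr = fpr), 3)
```

Indexing by row and column name avoids mixing up cells if the threshold changes and the counts are recomputed.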

Expected profit

Now we are ready to calculate the expected profit using the formula:

Expected Profit = $p(P) \cdot (tpr \cdot tpb + fnr \cdot fnc) + p(N) \cdot (tnr \cdot tnb + fpr \cdot fpc)$

expP = pP * (tpr * tpb + fnr * fnc) + pN * (tnr * tnb + fpr * fpc)

Expected Profit = 3.39 € per customer
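
As a cross-check, the expected profit can be computed directly from the confusion matrix: since p(P) * tpr is simply the share of true positives among all customers (and likewise for the other cells), the formula reduces to the cell rates weighted by the cell values. A minimal sketch; `value.mat` and `total.profit` are names introduced here for illustration.

```r
# Confusion matrix rebuilt from the counts printed earlier
# (rows: model decision, columns: true response)
conf.mat = matrix(c(13310, 23227, 478, 4161), nrow = 2,
                  dimnames = list(c("model.no", "model.yes"), c("no", "yes")))

# Value of each cell in the same layout (filled column by column):
#   (model.no, no)   = tnb = 0       (model.no, yes)  = fnc = 0
#   (model.yes, no)  = fpc = -2.50   (model.yes, yes) = tpb = 50 - 2.50
value.mat = matrix(c(0, -2.50, 0, 50 - 2.50), nrow = 2,
                   dimnames = dimnames(conf.mat))

# Expected profit per customer: cell rates weighted by cell values
expP = sum(conf.mat / sum(conf.mat) * value.mat)
round(expP, 2)

# Total expected profit over all customers in the test data
total.profit = expP * sum(conf.mat)
total.profit
```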

So, using the model to plan a campaign for 41176 customers, we can expect to earn a total of
139.58 k€ (about $150k).

Not bad!