Purpose/Context: Project Everlast staff completed about 25 PAVE screenings during their pilot period. For each screening, they also indicated whether they agreed with PAVE’s risk assessment. This gives us a valuable “golden” dataset: fully completed screens alongside a practitioner’s assessment of risk. We can use this information to train a model that mimics the risk assessment, based on the decision-making criteria that Project Everlast staff employ.

Understanding the Training Data

Training data includes case attributes alongside Project Everlast staff’s risk assessment. We can think of this as manually labeled data.

Example training data, with staff assessment
Gender  Black.or.African.American  Limited.independence  staff_assessment
M       0                          0                     0
F       0                          0                     1
F       1                          1                     0
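
One way the staff_assessment label could be built is sketched below. This is an assumption, not the project's confirmed method: it supposes the label is derived from PAVE's binary risk flag and the staff response (the PAVE_Generated_Risk and User_Agree columns that appear in the comparison tables of this report), keeping PAVE's value when staff agreed, flipping it when they disagreed, and leaving it missing when they were unsure. The helper derive_staff_assessment is hypothetical and the toy rows are invented for illustration.

```r
# Hypothetical helper: derive a 0/1 staff label from PAVE's flag and the staff response.
# Assumption: "Agree" keeps PAVE's value, "Disagree" flips it, anything else becomes NA.
derive_staff_assessment <- function(pave_risk, user_agree) {
  ifelse(user_agree == "Agree", pave_risk,
         ifelse(user_agree == "Disagree", 1 - pave_risk, NA))
}

# Invented toy cases, not real screening data
cases <- data.frame(
  PAVE_Generated_Risk = c(1, 0, 1, 1),
  User_Agree = c("Agree", "Agree", "Disagree", "Unsure"),
  stringsAsFactors = FALSE
)
cases$staff_assessment <- derive_staff_assessment(cases$PAVE_Generated_Risk,
                                                  cases$User_Agree)

# Keep only cases where staff actually made an assessment
training_data <- cases[!is.na(cases$staff_assessment), ]
print(training_data)
```

Under this reading, the "Unsure" case drops out of the training set, which would be consistent with training only on cases that have a staff assessment.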

Train the Model

Train a Naive Bayes model to predict trafficking risk from all case information, using Project Everlast staff’s assessment of each case as the label. Train the model exclusively on cases where Project Everlast staff assessed the case’s risk level.

library(naivebayes) # provides naive_bayes()

risk_model <- naive_bayes(staff_assessment ~ # staff assessment of risk is the outcome of interest
                            ., # use every other column as a predictor ("." refers to all other variables)
                            laplace = 1, # Laplace (add-one) smoothing so unseen feature values don't get zero probability
                            data=training_data)
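
To see why smoothing matters with only about 20 training cases, here is a minimal base-R sketch of additive (Laplace) smoothing for one binary feature. It illustrates the general technique rather than the internals of naive_bayes(); the counts are invented and smoothed_prob is a hypothetical helper.

```r
# Hypothetical helper: additive (Laplace) smoothing for a categorical feature.
# count    = cases in the class that have this feature value
# n_class  = total cases in the class
# n_levels = number of distinct values the feature can take
smoothed_prob <- function(count, n_class, n_levels, laplace = 1) {
  (count + laplace) / (n_class + laplace * n_levels)
}

# Invented example: none of 8 high-risk cases has the feature set to 1
unsmoothed <- 0 / 8                   # a zero here would zero out the whole product
smoothed   <- smoothed_prob(0, 8, 2)  # (0 + 1) / (8 + 2)
print(c(unsmoothed = unsmoothed, smoothed = smoothed))
```

Without smoothing, a single feature value that never occurred in a class would force that class's probability to zero, no matter what the other features say.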

Look under the hood of the model

summary(risk_model)
## 
## ================================== Naive Bayes ================================== 
##  
## - Call: naive_bayes.formula(formula = staff_assessment ~ ., data = training_data,      laplace = 1) 
## - Laplace: 1 
## - Classes: 2 
## - Samples: 20 
## - Features: 64 
## - Conditional distributions: 
##     - Bernoulli: 35
##     - Categorical: 29
## - Prior probabilities: 
##     - 0: 0.6
##     - 1: 0.4
## 
## ---------------------------------------------------------------------------------
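
The prior probabilities in this summary (0.6 and 0.4) are one half of the calculation; the other half is the per-class conditional probability of each feature value. The following is a minimal sketch of how a Naive Bayes classifier combines the two for a single case. The two features and their conditional probabilities are invented for illustration and are not taken from risk_model.

```r
# Priors as reported in the model summary
prior <- c(`0` = 0.6, `1` = 0.4)

# Invented conditional probabilities P(observed feature value | class)
# for one hypothetical case; each vector is ordered (class 0, class 1)
likelihoods <- list(
  feature_a = c(`0` = 0.2, `1` = 0.7),
  feature_b = c(`0` = 0.5, `1` = 0.6)
)

# Naive Bayes assumes features are independent given the class,
# so the joint score is the prior times the product of the likelihoods
joint <- prior * Reduce(`*`, likelihoods)

# Normalize so the class scores sum to 1
posterior <- joint / sum(joint)
predicted_class <- names(posterior)[which.max(posterior)]
print(round(posterior, 3))
print(predicted_class)
```

In this made-up case the evidence outweighs the prior: class 1 starts out less likely (0.4) but ends up with the higher posterior.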

Use the model to predict risk assessment

library(modelr) # provides add_predictions()
test_data <- add_predictions(data, risk_model, var = "risk_prediction", type = NULL)

See how the model did compared to PAVE risk assessments

In general, the model’s predicted risk values agree with PAVE-generated risk assessments: the two match in about 77% of cases. Here are some examples of the output:

Comparing predictions to PAVE assessments
    Case.No.  Gender  PAVE_Generated_Risk  User_Agree  Bayes_Model_Generated_Risk  model_agrees_with_PAVE
18  MAFE1226  F       1                    Agree       1                           1
19  tySm0317  M       1                    Unsure      1                           1
20  JaBe0527  N       1                    Agree       1                           1
21  ALMC0103  F       0                    Agree       0                           1
22  DERO1299  F       0                    Agree       0                           1
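
The agreement rate can be computed as the share of cases where the two risk columns match. The sketch below uses only the five example rows shown above, so it yields 100% for this excerpt rather than the roughly 77% reported for the full dataset. The data frame name comparison is an assumption.

```r
# The five example rows shown above (an excerpt, not the full dataset)
comparison <- data.frame(
  Case.No. = c("MAFE1226", "tySm0317", "JaBe0527", "ALMC0103", "DERO1299"),
  PAVE_Generated_Risk = c(1, 1, 1, 0, 0),
  Bayes_Model_Generated_Risk = c(1, 1, 1, 0, 0)
)

# Flag each case where the model's prediction matches PAVE's assessment
comparison$model_agrees_with_PAVE <-
  as.integer(comparison$Bayes_Model_Generated_Risk == comparison$PAVE_Generated_Risk)

# Share of cases in agreement
agreement_rate <- mean(comparison$model_agrees_with_PAVE)
print(agreement_rate)
```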

Consider cases where staff disagreed with PAVE’s risk assessment

The newly trained model can generate risk assessments for cases where staff disagreed with PAVE’s assessment. Because the model was trained on staff assessments, its predictions reflect how Project Everlast staff reasoned through risk themselves. As expected, in these cases the model’s predicted risk level agrees with the staff’s assessment and disagrees with PAVE’s.

Comparing PAVE Risk, User Assessment, and Bayes-generated Risk
Case.No.  Gender  PAVE_Generated_Risk  User_Agree  Bayes_Model_Generated_Risk
ISBU1008  F       1                    Disagree    0
SiKe1222  F       1                    Disagree    0
LASH0211  F       1                    Disagree    0
COOL0912  M       1                    Disagree    0
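
A quick check on the four rows above: if a "Disagree" response on a binary risk flag is read as staff choosing the opposite level (an interpretation assumed here, not stated in the source), the model's prediction should equal the flipped PAVE value in every row.

```r
# The four disagreement cases listed above
disagreements <- data.frame(
  Case.No. = c("ISBU1008", "SiKe1222", "LASH0211", "COOL0912"),
  PAVE_Generated_Risk = c(1, 1, 1, 1),
  User_Agree = "Disagree",
  Bayes_Model_Generated_Risk = c(0, 0, 0, 0)
)

# Assumption: disagreeing with a 0/1 flag implies the opposite level
disagreements$implied_staff_risk <- 1 - disagreements$PAVE_Generated_Risk

# Does the model side with staff in every row, and with PAVE in any row?
model_matches_staff <- all(disagreements$Bayes_Model_Generated_Risk ==
                             disagreements$implied_staff_risk)
model_matches_pave  <- any(disagreements$Bayes_Model_Generated_Risk ==
                             disagreements$PAVE_Generated_Risk)
print(c(matches_staff = model_matches_staff, matches_pave = model_matches_pave))
```

Since the model was trained on cases with a staff assessment, these rows were presumably part of the training data, so the match shows the model fits the staff labels rather than how it would perform on unseen cases.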
